This repository has been archived by the owner on May 1, 2024. It is now read-only.

Ansible Configuration to Dockerfile #1333

Merged: 1 commit, Nov 3, 2022

@aht007 aht007 commented Jul 13, 2022

ISSUE: #1328
This PR is part of an effort to remove the Ansible-based configuration and replace it with a Dockerfile. Devstack Docker images are currently built from Ansible-based configuration in the configurations repository. With this change, the repo gets its own Dockerfile containing all the configuration needed to set up small production and dev environments.

Steps to run this Image with Devstack:

  • Build the image locally using the dev target, i.e. docker build -t image-name-of-choice --target dev .
  • Once the image builds successfully, open devstack's docker compose file and replace the existing insights image with the one you built, leaving all other configuration there unchanged.
  • Run make dev.up.insights in a terminal from the devstack directory.
  • Additional note: if you hit auth-related errors or a 500 when accessing insights, also run provisioning for the insights and analyticsapi services.

@aht007 aht007 marked this pull request as draft July 13, 2022 10:24
@@ -1,11 +1,13 @@
FROM ubuntu:xenial as openedx
Contributor Author

We currently use Ubuntu 20.04, the focal release. The xenial base image was outdated and causing issues, so I updated it.

Dockerfile Outdated
apt-get upgrade -qy && \
apt install -y git-core language-pack-en build-essential python3.8-dev python3.8-distutils libmysqlclient-dev && \
apt install -y git-core language-pack-en build-essential python3.8-dev python3-virtualenv \
Contributor Author

Added some dependencies that the Ansible configuration installs but that were missing here.

Contributor

Hmm..our edx-platform image uses venv, Tutor uses pyenv, and our cookiecutter template just uses the system Python installation. I think we still need to hash out a standard here.

Dockerfile Outdated
@@ -14,20 +16,116 @@ RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV ANALYTICS_DASHBOARD_CFG /edx/etc/insights.yml
ENV LMS_HOST http://localhost:18000
Contributor Author

Added some more environment variables. These are only part of the configuration; the other variables the service needs are in the /edx/etc/insights.yml file, which analytics_dashboard/settings/yaml_config.py loads via the ANALYTICS_DASHBOARD_CFG environment variable.
Database-related settings are overridden from devstack's docker-compose file.

Contributor

Hmm, I don't think we want all these explicitly in the Dockerfile (especially things like localhost references). These feel like they should live in the YAML file described in https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0045-arch-ops-and-config.html#configuration if there's no reasonable universal default, and in the default Django settings otherwise.
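
A minimal sketch of that split, assuming the only thing the image itself needs to know is where the YAML file lives. The file name and destination come from lines elsewhere in this PR; where LMS_HOST ends up is a hypothetical placement, not a decision made here:

```dockerfile
# Sketch only: keep deployment-specific values out of the image.
FROM ubuntu:focal

# The image records just the location of the configuration file.
ENV ANALYTICS_DASHBOARD_CFG=/edx/etc/insights.yml

# Ship a default file that operators can replace or mount over at run time.
COPY configuration_files/insights.yml /edx/etc/insights.yml

# Values such as LMS_HOST would then be set in insights.yml, or in the
# default Django settings when a universal default exists, rather than
# through ENV instructions here (hypothetical placement).
```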

WORKDIR /edx/app/analytics_dashboard
COPY requirements /edx/app/analytics_dashboard/requirements
RUN python3.8 -m pip install -r requirements/production.txt
ARG COMMON_APP_DIR="/edx/app"
Contributor Author

These variables are defined here only so they can be reused throughout the Dockerfile. They are defined as ARG instead of ENV because an ARG exists only while the Dockerfile is being built, whereas ENV variables persist as environment variables for the life of the container. Since these variables are not needed as environment variables, it made more sense to define them as ARG with default values.
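
To illustrate the difference, here is a small sketch. EXAMPLE_APP_DIR is a hypothetical name used only for this example; COMMON_APP_DIR and its default are the ones from the diff:

```dockerfile
# Sketch only: EXAMPLE_APP_DIR is a hypothetical variable.
FROM ubuntu:focal

# ARG: usable by later instructions in this build stage, but not set as
# an environment variable in containers started from the image.
ARG COMMON_APP_DIR="/edx/app"
ARG EXAMPLE_APP_DIR="${COMMON_APP_DIR}/example"

# ENV: stored in the image and visible to every process in the container.
ENV LANG=en_US.UTF-8

# Both kinds can be referenced while the image is being built.
WORKDIR ${EXAMPLE_APP_DIR}
RUN echo "building in ${EXAMPLE_APP_DIR} with LANG=${LANG}"
```

Running env inside a container built from this sketch should list LANG but neither of the ARG names.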

Contributor

Maybe this context should be added as a comment in the file?

Contributor Author

That comment was written well before our initial review round with Kyle. It is still one of the reasons to use ARG instead of ENV here, but the more important reason is path-structure compatibility with Tutor and other Open edX installations, since these values can be supplied when the image is built.
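
A rough sketch of how such an override could work. The value given to INSIGHTS_CODE_DIR and the /openedx override mentioned in the comments are illustrative assumptions, not tested settings:

```dockerfile
# Sketch only: the INSIGHTS_CODE_DIR value and the /openedx override
# below are examples, not tested configurations.
FROM ubuntu:focal

# Defaults follow the /edx/... layout used in this PR.
ARG COMMON_APP_DIR="/edx/app"
ARG INSIGHTS_CODE_DIR="${COMMON_APP_DIR}/insights/edx_analytics_dashboard"

# Every path is derived from the ARGs, so no directory is hard-coded below.
WORKDIR ${INSIGHTS_CODE_DIR}
COPY . ${INSIGHTS_CODE_DIR}/

# A different installation could pick another prefix at build time, e.g.:
#   docker build --build-arg COMMON_APP_DIR=/openedx .
```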

Dockerfile Outdated
RUN npm install -g npm@${INSIGHTS_NPM_VERSION}

# Tried to cache the dependencies by copying related files after the npm install step but npm post install fails in that case.
# COPY package.json ${INSIGHTS_CODE_DIR}/package.json
Contributor Author

I tried to cache the dependencies by copying the rest of the code after the npm install step, but the npm post-install step fails in that case, so all the files have to be copied before installing dependencies.
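
For reference, the two orderings look roughly like this. This is a fragment that assumes Node.js and npm were installed in an earlier step, and the INSIGHTS_CODE_DIR value is a placeholder:

```dockerfile
# Sketch only: assumes npm is already installed; the path is a placeholder.
ARG INSIGHTS_CODE_DIR="/edx/app/insights/edx_analytics_dashboard"
WORKDIR ${INSIGHTS_CODE_DIR}

# Usual cache-friendly ordering, which did not work here (presumably the
# post-install step needs files from the rest of the repository):
#   COPY package.json package-lock.json ${INSIGHTS_CODE_DIR}/
#   RUN npm ci
#   COPY . ${INSIGHTS_CODE_DIR}/

# Ordering used instead: copy everything, then install.
# Any source change invalidates this layer, so the install is re-run.
COPY . ${INSIGHTS_CODE_DIR}/
RUN npm set progress=false && npm ci
```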

Dockerfile Outdated
ENV ANALYTICS_DASHBOARD_CFG /edx/etc/insights.yml
ENV LMS_HOST http://localhost:18000
ENV HOME /root
ENV PATH "/edx/app/insights/venvs/insights/bin:/edx/app/insights/nodeenvs/insights/bin:$PATH"
Contributor Author

@aht007 aht007 Jul 19, 2022

Added the Python venv bin and nodeenv bin directories to PATH, so we don't have to spell out the full path or activate the environment every time we install dependencies or run packages from them.

RUN virtualenv -p python3.8 --always-copy ${SUPERVISOR_VENV_DIR}
COPY requirements ${INSIGHTS_CODE_DIR}/requirements

ENV PATH="${INSIGHTS_CODE_DIR}/node_modules/.bin:$PATH"
Contributor Author

Added node_modules/.bin to PATH so that installed packages can be run directly anywhere in the image instead of through their full paths.

@@ -0,0 +1,11 @@
[program:cms]
Contributor Author

All the configuration files that follow were previously generated from Jinja templates in the Ansible configuration. We decided not to use Jinja templates with the Dockerfile and instead copied the contents of those files, as they aren't modified regularly.

Contributor

Why are these cms.conf and lms.conf files here? It doesn't look like the shell scripts they wrap exist in the container, and I certainly hope edx-platform isn't installed in it.

@@ -0,0 +1,26 @@
#!/usr/bin/env bash
Contributor Author

This script is used as the entry point to the service from devstack and was previously hosted in the configurations repo.

@iamsobanjaved
Contributor

@kdmccormick we created this PR against openedx-unsupported/devstack#943. Tagging you in case you can review it.

@kdmccormick
Contributor

Sorry @iamsobanjaved, I don't have the capacity to review any Devstack work.

@kdmccormick
Contributor

Jeremy gave me some additional context on this. I'll take a look soon.

Contributor

@jmbowman jmbowman left a comment

Not done reviewing yet, but here are my questions and comments so far.

Dockerfile Outdated
@@ -1,11 +1,13 @@
FROM ubuntu:xenial as openedx
FROM ubuntu:focal as production
Contributor

Should the alias be "app" instead of "production"? That seems to be the most common choice in other service repos: https://github.com/search?q=user%3Aopenedx+%22from+ubuntu%3Afocal%22&type=code . It's also used in our cookiecutter template: https://github.com/openedx/edx-cookiecutters/blob/master/cookiecutter-django-ida/%7B%7Bcookiecutter.repo_name%7D%7D/Dockerfile

Contributor Author

I chose the name because we are going to have two images, one for production and one for dev environments, and the dev image is built on top of the base image, which is production in this case. If there is consensus that the base image should be called app instead, I will go ahead and change it.
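
The layering described here can be sketched as a two-stage Dockerfile. The stage contents are placeholders, and the base stage could just as well be named app:

```dockerfile
# Sketch only: the RUN lines stand in for the real setup steps.
FROM ubuntu:focal as production
# System packages, Python requirements, and the application go here.
RUN echo "production setup goes here"

# The dev stage starts from the production stage and adds dev-only extras.
FROM production as dev
RUN echo "dev-only requirements go here"
```

With this layout, docker build --target production . stops after the base stage, while --target dev (as in the steps in the PR description) builds both.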

Dockerfile Outdated
apt-get upgrade -qy && \
apt install -y git-core language-pack-en build-essential python3.8-dev python3.8-distutils libmysqlclient-dev && \
apt install -y git-core language-pack-en build-essential python3.8-dev python3-virtualenv \
python3.8-distutils libmysqlclient-dev libssl-dev openjdk-8-jdk gettext && \
Contributor

Do we actually need openjdk-8-jdk here? I'm not sure about gettext either. Can we switch to the format from the edx-cookiecutters template that has comments explaining what each package is needed for?
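
Something along these lines, as a sketch of the commented format. The package list is a subset of the one in the diff, and the stated purposes are my own reading of each package, so they should be confirmed before anything is dropped:

```dockerfile
# Sketch only: purposes below are assumptions to be verified by the owning team.
FROM ubuntu:focal

# Packages installed:
# build-essential: compiler toolchain for Python packages with C extensions
# git-core: lets pip install requirements pinned to git URLs (assumption)
# language-pack-en: provides the en_US locale data
# libmysqlclient-dev: headers needed to build the mysqlclient Python package
# python3.8-dev: Python headers needed to build C extensions
RUN apt-get update && \
    apt-get upgrade -qy && \
    apt-get install -qy \
        build-essential \
        git-core \
        language-pack-en \
        libmysqlclient-dev \
        python3.8-dev && \
    rm -rf /var/lib/apt/lists/*
```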

Contributor Author

I listed these packages because the Ansible configuration for insights already installs them. I will remove them if they are unnecessary, but I need input from someone who might know why they were added in the first place.

@jmbowman
Contributor

Also, I've updated https://openedx.atlassian.net/wiki/spaces/COMM/pages/3483205633/Dockerfile+Best+Practices with a few notes, references, and open questions. Feel free to contribute there as issues come up and get resolved in this PR.

@aht007 aht007 linked an issue Jul 26, 2022 that may be closed by this pull request
Contributor

@jmbowman jmbowman left a comment

A couple more notes, but not a full review. I'm hoping some of the people I've poked again to provide feedback can do such a full review, or at least help us get to a point where we're comfortable with it as a first attempt.

Dockerfile Outdated
@@ -14,20 +16,116 @@ RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV ANALYTICS_DASHBOARD_CFG /edx/etc/insights.yml
ENV LMS_HOST http://localhost:18000
Contributor

Hmm, I don't think we want all these explicitly in the Dockerfile (especially things like localhost references). These feel like they should live in the YAML file described in https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0045-arch-ops-and-config.html#configuration if there's no reasonable universal default, and in the default Django settings otherwise.

@@ -0,0 +1,11 @@
[program:cms]
Contributor

Why are these cms.conf and lms.conf files here? It doesn't look like the shell scripts they wrap exist in the container, and I certainly hope edx-platform isn't installed in it.

@kdmccormick
Contributor

@aht007 @jmbowman a couple questions, which will affect how I review this:

  • Are you trying to make something that would be a suitable base image for Tutor as well? Or is this just for Devstack and edx.org production?
  • Will this, once merged, become a template for other repos to follow? Or are you just trying to get a first iteration merged so you can try it out?

@aht007
Contributor Author

aht007 commented Aug 4, 2022

Why are these cms.conf and lms.conf files here? It doesn't look like the shell scripts they wrap exist in the container, and I certainly hope edx-platform isn't installed in it.

cms.conf and lms.conf are present in the current insights image that we use with devstack, so I copied them too, to avoid leaving anything out of the current configuration. I couldn't find any use for these files myself and was waiting on review from someone with context on why they are in the current image and whether we can safely remove them.

@aht007
Contributor Author

aht007 commented Aug 4, 2022

@aht007 @jmbowman a couple questions, which will affect how I review this:

  • Are you trying to make something that would be a suitable base image for Tutor as well? Or is this just for Devstack and edx.org production?
  • Will this, once merged, become a template for other repos to follow? Or are you just trying to get a first iteration merged so you can try it out?

@kdmccormick We are aiming for these images to be suitable base images for Tutor as well. This will also serve as a template for other repos once it is merged, so we can apply what we learn here to other tasks. You can also have a look at this GitHub issue for more context.

@kdmccormick
Contributor

Thanks @aht007 @jmbowman.

In terms of Tutor compatibility, a couple issues jump out at me.

the /edx/... path structure

Most Tutor images have a structure like:

  • /openedx/$APP_NAME/ -> application repository
  • /openedx/venv/ -> Python virtual environment
  • /openedx/nodeenv/ -> Nodejs environment
  • /openedx/staticfiles/ -> built static assets
  • /openedx/config/ -> config YAML files
  • /openedx/bin/ -> special scripts for managing container (added to PATH)

(They are not all 100% consistent, but that is the general scheme.)

Tutor and its plugins count on this directory structure in certain ways. For example, in the LMS/CMS image, Tutor's YAML config files are mounted into the container at /openedx/config. And when the user mounts an edx-platform folder, it is mounted by default at /openedx/edx-platform.

As far as I know, there is no Insights plugin for Tutor. So, the paths in this image are OK because there's nothing in Tutor for them to be incompatible with. However, if this Dockerfile is to be used as a template for other Tutor-compatible Dockerfiles, then I think it needs to match Tutor's path structure.

patchability

Tutor's Dockerfiles, docker-compose files, etc., are all Jinja2 templates. Tutor exposes a custom patch(...) function to the templates, which allows user-defined plugins to inject custom code ("patches") into the rendered template. There are patch(...) points throughout every Dockerfile, including the edx-platform Dockerfile.

Unless Tutor drops the patching system, I am not sure how it would be able to use these upstream Dockerfiles as base images. (This is something that hadn't occurred to me when we last talked, Jeremy, sorry!) @regisb is on vacation until the end of August, but when he gets back I'm really curious what he'd think about this.

@jmbowman
Contributor

jmbowman commented Aug 5, 2022

I'm inclined to leave out the lms/cms.conf files and see if anything breaks, although I'm fine if you want to wait for owning squad review to do that; just flag it explicitly as a question for them. I'd like to end up with the simplest/smallest working image, which may involve dropping stuff that's only there for historical reasons; this seems like a good opportunity to attempt that cleanup, even if we ultimately have to add back a few things that turn out to still have valid but non-obvious reasons for being included. (Especially since we could then document why they're present.)

On the paths - can we make the path prefix an argument to the Dockerfile? I suspect installations other than Tutor and edx.org have even different choices, and it would be nice to accommodate those differences if we can do so without adding major complexity to the Dockerfile. I suspect changing the entire root path of an installation would be challenging due to paths saved in database fields, etc., but we might be able to move or rename certain subdirectories if that gets us to better compatibility across the installations.

On the patching - I guess it depends on how much value is being obtained from it. One option could be to have both the template and a default output in the source repos, so Tutor could still leverage that extension option. But it does add some complexity to the source template files that makes them harder to understand for people who aren't familiar with Jinja2 yet, and a little harder to reason about regardless.

@jmbowman
Contributor

Reminder of this note regarding Kyle's prior art in revamping the edx-platform Dockerfile, in case any of the changes there are relevant here: openedx-unsupported/devstack#943 (comment)

@aht007 aht007 force-pushed the aht007/Ansible-to-Docker branch 2 times, most recently from 2186d96 to 380e973 Compare August 23, 2022 10:43
@aht007
Contributor Author

aht007 commented Aug 23, 2022

@kdmccormick Can you please take another look at this?
Some pre-review notes:

  • We couldn't completely revamp the directory paths because file paths are saved in the database, etc., as Jeremy mentioned here. However, we had already made the base directory paths build arguments, so the image can be compatible across installations while keeping default paths for our use case.
  • On patching, we couldn't reach a conclusion about what value the added complexity would give us. One option is to add patching in a later PR, with the template and a default rendered output, but that is still open to discussion and no decision has been made.
  • Should we still be using "service"-env files (in this case insights-env) to source environment variables? To me they look redundant, and the variables should be defined directly in the Dockerfile. I have made some changes along these lines: I defined the settings variable in the Dockerfile itself and removed it from the env file to avoid having two env files. I haven't eliminated the file completely, though, and would like your suggestion on this.
  • Additionally, I have added comments in the Dockerfile at the changes that specifically need reviewers' attention.

@aht007 aht007 marked this pull request as ready for review August 23, 2022 11:28
Dockerfile Outdated
COPY . ${INSIGHTS_CODE_DIR}/
RUN npm set progress=false && npm ci

COPY configuration_files/insights.yml /edx/etc/insights.yml

Here we're using a relative path with the COPY command.
Can we do the same in the other places, for consistency?

For example, here:

COPY /scripts/insights.sh ${INSIGHTS_APP_DIR}/insights.sh
COPY /configuration_files/insights.conf ${SUPERVISOR_AVAILABLE_DIR}/insights.conf
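
A sketch of the consistent form. The ARG values are placeholders for the ones defined in this PR, and as far as I understand COPY sources are resolved against the build context with or without the leading slash, so this should only change the style:

```dockerfile
# Sketch only: ARG values are placeholders for those defined in the PR.
FROM ubuntu:focal
ARG INSIGHTS_APP_DIR="/edx/app/insights"
ARG SUPERVISOR_AVAILABLE_DIR="/edx/app/supervisor/conf.available.d"

# All sources written relative to the build context, without a leading slash.
COPY configuration_files/insights.yml /edx/etc/insights.yml
COPY scripts/insights.sh ${INSIGHTS_APP_DIR}/insights.sh
COPY configuration_files/insights.conf ${SUPERVISOR_AVAILABLE_DIR}/insights.conf
```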

Contributor Author

Noted.

@aht007
Contributor Author

aht007 commented Aug 24, 2022

  • Should we still be using "service"-env files (in this case insights-env) to source environment variables? To me they look redundant, and the variables should be defined directly in the Dockerfile. I have made some changes along these lines: I defined the settings variable in the Dockerfile itself and removed it from the env file to avoid having two env files. I haven't eliminated the file completely, though, and would like your suggestion on this.

Regarding this comment, I just realized that we would still need an empty service-env file for the time being even if we go with the approach of leaving this file out as this pattern is being used for all services in the generic provisioning script i.e. provision_ida.sh in devstack. For now we can move the content out of the file and keep an empty placeholder file and once done with all the work we can go ahead and modify our provisioning scripts to stop using this file.

@kdmccormick
Contributor

Should we still be using "service"-env files (in this case insights-env) to source environment variables? To me they look redundant, and the variables should be defined directly in the Dockerfile. I have made some changes along these lines: I defined the settings variable in the Dockerfile itself and removed it from the env file to avoid having two env files. I haven't eliminated the file completely, though, and would like your suggestion on this.

Regarding this comment, I just realized that we would still need an empty service-env file for the time being even if we go with the approach of leaving this file out as this pattern is being used for all services in the generic provisioning script i.e. provision_ida.sh in devstack. For now we can move the content out of the file and keep an empty placeholder file and once done with all the work we can go ahead and modify our provisioning scripts to stop using this file.

Yup, agreed with all of this. That is exactly what I did in the edx-platform Dockerfile: https://github.com/openedx/edx-platform/blob/c1009b56f6a0b3f8acfab815e3fc55a6a1820612/Dockerfile#L189-L195. Based on my manual testing, it worked fine with Devstack!
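
For this repo, the placeholder could be created with something like the sketch below. The directory and the insights_env file name are hypothetical stand-ins for whatever provision_ida.sh actually sources:

```dockerfile
# Sketch only: the path and file name are hypothetical stand-ins.
FROM ubuntu:focal
ARG INSIGHTS_APP_DIR="/edx/app/insights"

# Devstack's generic provisioning script sources a per-service env file,
# so create an empty one until the script stops relying on it.
RUN mkdir -p ${INSIGHTS_APP_DIR} && touch ${INSIGHTS_APP_DIR}/insights_env
```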

@aht007 aht007 requested a review from jmbowman August 25, 2022 11:35
@aht007 aht007 force-pushed the aht007/Ansible-to-Docker branch 2 times, most recently from c2c3a71 to af7ece5 Compare October 11, 2022 10:31
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
target: dev
repository: openedx/insights-dev
Contributor Author

Question for reviewers: Should we push to edxops or openedx? It made sense to me that for openedx repos the images should also be in openedx docker hub org.

Contributor

Great question, @aht007. Let me check with my team today and see what they think.

Contributor

@aht007 After discussing, we'd prefer if the images are pushed to edxops, for now at least.

Our reasoning is that these images will be used first and foremost by 2U (for Devstack and edX.org), so it makes sense if 2U has full administrative access over the image repositories. In the future, if we find that there's a general use case for the images that the community would like to support (whether that be a basis for Tutor or as an alternative installation method), at that point it would make sense to host the images from the openedx organization.

I know there are currently some images hosted in the openedx organization, but the community doesn't support them and I don't know of anyone who uses them, so it's a bit misleading to have them there. I think it is OK to stop pushing those images to openedx and start pushing them to edxops instead.

All that said, I don't want to slow down edX's containerization project, so let me know if that causes a lot of logistical issues, and we can figure something else out.

@aht007 aht007 requested a review from a team October 13, 2022 06:42
Contributor

@schenedx schenedx left a comment

Overall the change looks good to me except for where to push the docker image to.
Please make that change and I'll approve.

username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
target: prod
repository: openedx/insights
Contributor

I agree with @kdmccormick so let's make the change to edxops here as well.

Contributor Author

I will comment out the code that pushes the prod Docker image for now, because the edxops/insights repository has long been used for the devstack image, and pushing a prod image there would break the development environment of anyone in the community who uses it. Prod images are also not a priority right now; I will discuss them with Jeremy next. Either way, I will change the org to edxops for both images.

Contributor Author

@schenedx I have made the suggested change. Can you also take a look at this PR which introduces the usage of this image in devstack?

@ashultz0
Contributor

I hit some weirdness trying to get this all to work locally; see the comment in openedx-unsupported/devstack#976

Contributor

@ashultz0 ashultz0 left a comment

Testing with the Docker changes in openedx-unsupported/devstack#976 worked.

Contributor

@jmbowman jmbowman left a comment

One minor suggestion, otherwise looks good to me.

WORKDIR /edx/app/analytics_dashboard
COPY requirements /edx/app/analytics_dashboard/requirements
RUN python3.8 -m pip install -r requirements/production.txt
ARG COMMON_APP_DIR="/edx/app"
Contributor

Maybe this context should be added as a comment in the file?

@aht007 aht007 merged commit f0ccb9f into master Nov 3, 2022
@aht007 aht007 deleted the aht007/Ansible-to-Docker branch November 3, 2022 11:09
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
target: dev
repository: edxops/insights-dev

Hmm, can this be in the openedx Dockerhub?

@aht007 aht007 restored the aht007/Ansible-to-Docker branch February 2, 2023 06:16
@aht007 aht007 mentioned this pull request Feb 2, 2023
Development

Successfully merging this pull request may close these issues.

Switch to Ansible-free Docker image of insights
8 participants