Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker: extract heredocs from multi-process init and update docker-compose files #2298

Merged
merged 3 commits into from
Aug 27, 2018

Conversation

baip
Copy link
Contributor

@baip baip commented May 25, 2018

Three aspects:

  • No regex matching for location in nginx configurations

  • Extract heredocs from multi-process init script

  • Update docker-compose files to version 3 format

I cleaned up the setup_env script slightly, and also renamed develop.yml to docker-compose.yml in test and single-process to be consistent with what's in multi-process.

Copy link
Collaborator

@dsander dsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract heredocs from multi-process init script

Very nice!

I cleaned up the setup_env script slightly, and also renamed develop.yml to docker-compose.yml in test and single-process to be consistent with what's in multi-process.

I like the idea to unify the file names. What do you think about inverting the change you made? I like the develop.yml files for compose configurations that build the image from the current source and docker-compose.yml that can be used without first building the image.

@@ -28,11 +28,17 @@ if [[ -n "${MYSQL_PORT_3306_TCP_ADDR}" || ("${DATABASE_ADAPTER}" == "mysql2" &&
DATABASE_HOST=${DATABASE_HOST:-${MYSQL_PORT_3306_TCP_ADDR}}
DATABASE_PORT=${DATABASE_PORT:-${MYSQL_PORT_3306_TCP_PORT:-3306}}
DATABASE_ENCODING=${DATABASE_ENCODING:-utf8mb4}
DATABASE_NAME=${DATABASE_NAME:-${MYSQL_DATABASE}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those new variable names needed? We could also put DATABASE_NAME and mysql.env and postgres.env couldn't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mysql and postgres images use these variables to setup their database. In the old docker-compose.yml files, the same data were entered twice for the different containers (e.g., under MYSQL_PASSWORD and HUGINN_DATABASE_PASSWORD). Yes, we could do the same in mysql.env, but I feel that now with the env_file option, it makes more sense to keep it DRY.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not realize it was part of the old docker-compose file, I think we can get rid of those env variables in the compose file. For postgres I know that the image works without giving it a database name. In case of mysql the docs say

This variable is optional and allows you to specify the name of a database to be created on image startup.

I think we don't want this. The container might create the database with the wrong encoding/collation and we normally handle the database creation. Can you test if the compose files work for mysql without passing MYSQL_DATABASE?

EOF

# Default to the environment variable values set in .env.example
IFS="="
sed -n -r -e 's/^#?([A-Za-z0-9_]+ *=.*)/\1/p' /app/.env.example | \
sed -n -r -e 's/^ *#?([A-Za-z0-9_]+=.*)/\1/p' /tmp/.env.example | \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed? afaik we don't have lines starting with a space in .env.example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in someone...'s local version, they may decide to add lines with spaces at the beginning (e.g., '[space][space]ABC=abc'). This is correct; you can source it in bash for example. On the other hand, 'ABC[space]=abc' is incorrect. This change basically reflects the correct parsing behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm the dotenv gem we use to load the .env file in non docker setup does not allow leading spaces in the version we are using currently. I'd like the two parsing methods to behave the same, so i'd say either we keep disallowing the leading space or bump the dotenv version.

On the other hand, 'ABC[space]=abc' is incorrect

Makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention (because the comment is hidden now) that the regex was modified to not allow leading white space.

while read var value ; do
eval "echo \"$var=\${HUGINN_$var:-\${$var:-\$value}}\""
value="$(sed -e 's/^["'\'']//' -e 's/["'\'']$//' <<<"$value")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this line does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right-hand side sometimes needs to be quoted. But then if value is from .env.example (vs. being passed in as environment variables), it may already be quoted. This line removes the leading/trailing quotes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think that had that problem once or twice before 😄

sed -ri 's/^stdout_path.*$//' config/unicorn.rb
sed -ri 's/^pid.*$//' config/unicorn.rb
sed -r config/unicorn.rb.example \
-e 's/^ *listen .+/listen ENV["DOMAIN"] || "127.0.0.1:3000"/' \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should stay PORT, DOMAIN is supposed to be the external URL of the Huginn server like https://youdomain.com and is used in the application for redirects. I don't think there is a use case in which the unicorn server inside docker should listen do anything but localhost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake: I should have written listen $IP:$PORT (to achieve similar behavior as in the multi-process image). Localhost may not be sufficient for all use cases. For example, one can add an nginx reverse proxy to one of the docker-compose.yml files. The inter-container communication would not be on the loopback interface.


services:
mysqldata:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this change, if someone is currently using this compose file and updates to this version mysql would start with an empty database.

Are there benefits in using volumes instead of "volume containers"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose people will have to export data -- is there a release schedule to pace this kind of changes? I imagine it would make sense to eventually abandon the rather hacky and legacy data-only container setup. See here for some discussions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is we don't have a release cycle, we try to not introduce breaking changes as good as we can because master is always supposed to work the way it did before.

@baip
Copy link
Contributor Author

baip commented May 29, 2018

I like the idea to unify the file names. What do you think about inverting the change you made? I like the develop.yml files for compose configurations that build the image from the current source and docker-compose.yml that can be used without first building the image.

It makes sense. I inverted the name change. Compared to before, only multi-process/docker-compose.yml was renamed to multi-process/develop.yml. (But we don't have the mysql.yml and postgresql.yml parity any more :P). Please take a look at the other change. If they look good, I can squash the last two commits.

@dsander
Copy link
Collaborator

dsander commented May 29, 2018

Please take a look at the other change. If they look good, I can squash the last two commits.

Yep they look good.

@baip
Copy link
Contributor Author

baip commented May 30, 2018

I think we don't want this. The container might create the database with the wrong encoding/collation and we normally handle the database creation. Can you test if the compose files work for mysql without passing MYSQL_DATABASE?

Yes, I believe both mysql and postgres images can run without needing to create a database themselves. The user and password variables still need to be repeated though, and since database names were included previously in docker-compose.yml, they would be created by database images instead of huginn. Also, docker-compose v3 apparently removed the volumes_from option, so it looks like this effort is going to end mostly in vain...

Anyway, I've split the last commit to isolate the breaking changes. Hopefully commits up to 5e1a1ce can be merged without too much concern (I've kept v3 for test/develop.yml as I don't think the backward compatibility issue matters much for this image). I'm fine with whatever you want to do with aa51ebe.

@dsander
Copy link
Collaborator

dsander commented Jun 3, 2018

since database names were included previously in docker-compose.yml, they would be created by database images instead of huginn

That is want I meant by "I think we don't want this", we do have a migration that converts the database to the right collation and encoding for mysql, but if we would configure the database containers to not create the db we could have rails create the database with the right collation/encoding.

Also, docker-compose v3 apparently removed the volumes_from option, so it looks like this effort is going to end mostly in vain...

I see, so they really don't want to support that feature anymore. If we want to migrate to v3 it has to be a breaking change in the compose files.

Anyway, I've split the last commit to isolate the breaking changes.

Thanks, what do you think about splitting the changes into two PRs? I really like the cleanup of the multi-process init script, but am not sure about the compose-file version change. It's nice to use the most recent version and features, but I am not sure if it's worth potentially breaking setups.

@baip
Copy link
Contributor Author

baip commented Jun 4, 2018

Done. I'll create another PR once this one is merged.

Copy link
Collaborator

@dsander dsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long to get back to you and thanks for splitting up the changes.

while read var value ; do
eval "echo \"$var=\${HUGINN_$var:-\${$var:-\$value}}\""
value="$(sed -e 's/^["'\'']//' -e 's/["'\'']$//' <<<"$value")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is causing problems on my existing setup, I have TIMEZONE="Berlin" in my .env file and I assume it is double quoted because rails complains about ArgumentError: Invalid Timezone: "Berlin" which should be Berlin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting --- I just tested again with both what's in .env.example: TIMEZONE="Pacific Time (US & Canada)", and TIMEZONE="Berlin", and they all worked fine. What steps should I take to reproduce the problem?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that is odd, I used that .env file and docker run -it --env-file=~/code/ruby/huginn-serverspec/spec/single_process/.env huginnbuilder/huginn:pr-2298 fails when it's trying to migrate the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I tested by using the actual .env file. It looks like when .env is instead provided via docker --env-file, whatever is on the right-hand side of = (including quotes ") will become the value of the left-hand variable. So even though the current huginn/huginn image works fine, if you start a shell in the container and check: echo $TIMEZONE, you will see "Berlin", which I think is incorrect. But then this value is saved to .env as TIMEZONE="Berlin", and dotenv prefers .env over environment variables, so things are fine. Now that scripts/setup_env quote environment variables, it becomes TIMEZONE='"Berlin"' in .env, which causes what you saw.

scripts/setup_env only "dequotes" .env.example, not environment variables. I think this would be the desirable behavior. For example, if you do docker --env TIMEZONE="Berlin" and echo $TIMEZONE inside the container, you will get Berlin (without quotes). What do you think? If using .env via --env-file is documented somewhere as a sanctioned way of providing environment variables, then I can revert this change as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even though the current huginn/huginn image works fine, if you start a shell in the container and check: echo $TIMEZONE, you will see "Berlin", which I think is incorrect.

Not sure if it is incorrect, there are cases in which one needs to escape the the string for dotenv to function properly. The docker behavior seems consistent to me if you pass a environment variable in a way that the shell does not unquotes the variable you get the same result -e TIMEZONE='"BERLIN"'

But then this value is saved to .env as TIMEZONE="Berlin", and dotenv prefers .env over environment variables, so things are fine.

I think dotenv only loads the .env file, but every variable defined there overrides the actual environment variable.

scripts/setup_env only "dequotes" .env.example, not environment variables. I think this would be the desirable behavior.

I don't think that is needed because dotenv dequotes the variables itself when loading the .env file. It is kind of messy, for example if your password contains s $ you need to pass it quoted to dotenv, i.e. it has to be quoted in the .env file we write in setup_env.

What we could try is to remove the dotenv usage in the docker containers, but to stay compatible we would need to do the "dequoting" ourselves which could cause other errors.

What would be useful to do here is to quote all real environment variables that are not quoted yet to ensure dotenv does not choke on $ characters (we had a few bug reports about that because it is a valid environment variable without quote and only causes problems due to the variable substitution dotenv attempts).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- if we can give up the option of supplying .env via --env-file (is this actually supported, in the sense of being a documented approach?).

We can not remove support for that it is documented and in my opinion the easiest way to configure the container.
It is one of the two ways docker supports passing environment variables and we should keep supporting it.

I meant if someone tries docker -e VAR='"abc"', they'd expect $VAR to be "abc" inside the container, but $VAR will be abc instead.

Like I said before that is nothing we do in setup_env, it's the shell dequoting the variable. docker run -it -e VAR="abc" busybox and $VAR will be set to abc.

To be honest the only thing I find a bit confusing is that --env-file passes VAR="abc" as "abc" inside the container because it is different from the shell and dotenv (which has been a thing long before --env-file was introduced), but that's nothing we can change 😄

Of course, in the case of --env-file, one would use a proper Docker env file with the correct Docker syntax (i.e., no extraneous quoting).

Which we can not guarantee because Huginn did the dequoting since the first huginn/huginn image was created. If we would start from scratch I would probably only support "bare" (no quoting/dequoting) --env and --env-file.

In short we can not remove support for --env, --env-file and the dequoting because it would probably break for every current docker user.
If it does not add anything that isn't already possible with --env-file I would rather not support volume mounting the .env file.

With the "quote if necessary" changes you made we now support every combination of "bare" and quoted --env-files with the one exception that VAR="abc" is dequoted. Since it has always been like this and nobody ever reported a bug about it I think it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I understand the need to keep the support if it has been documented.

However, what you said

Like I said before that is nothing we do in setup_env, it's the shell dequoting the variable. docker run -it -e VAR="abc" busybox and $VAR will be set to abc.

is incorrect. I now realized that probably you have misread my example:

I meant if someone tries docker -e VAR='"abc"', they'd expect $VAR to be "abc" inside the container, but $VAR will be abc instead.

My example was a Bash-capable user (who understands shell quoting) trying to pass "abc" by enclosing it inside single quotes, thus '"abc"'. Per your busybox example, with docker -e VAR='"abc"' busybox, $VAR will be set to "abc", but with docker -e VAR='"abc"' huginn/huginn, $VAR will be set to abc. And this is a collateral damage when we needed to do the shell hacking in order to correctly handle variables defined in .env per dotenv syntax, but passed through --env-file Docker option. And that is why I so passionately advocated against it. And there I rest my case :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I over read the double quotes in your example. I know of this one issue we have with the "auto dequoting":

With the "quote if necessary" changes you made we now support every combination of "bare" and quoted --env-files with the one exception that VAR="abc" is dequoted. Since it has always been like this and nobody ever reported a bug about it I think it's fine.

Especially seasoned bash users will probably figure that out on their own (maybe that's the reason we never got an issue about it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'm wondering if there is anything else you need for this PR.

I don't have anything else to add to the discussion. If passing the dotenv .env file as a Docker variable listing has been documented and supported, I understood the desire to continue supporting it. I had just hoped that this could be switched to the -v volume mount, because in an ideal world we certainly would like to have minimal surprises and not have to dig into the code (especially Bash) to solve surprises. I think the "abc" use cases are probably rare enough that no Huginn users needed it yet. Even if they did encounter the issue, most users would probably try to change the variable than to file a bug report.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had just hoped that this could be switched to the -v volume mount, because in an ideal world we certainly would like to have minimal surprises and not have to dig into the code (especially Bash) to solve surprises.

Using the volume definitely is "purer". I found another use case for it, the test docker image could be used with it without having it override the local .env file (which happened to me a few times 😄 ).

I'll give the PR a once over on the weekend but think we should be good.

@baip baip force-pushed the master branch 3 times, most recently from bde48ff to 69ed552 Compare July 1, 2018 17:24
@dsander
Copy link
Collaborator

dsander commented Jul 30, 2018

The huginn and single-process images seem to work well now. I found issues with the test image, the docker-compose file version bump changed the container names and the usage of existing .env files broke the postgres test. (I think mysql was broken before, we don't need to fix that here).

We could display warning and fail hard here if a .env file exists.

@baip
Copy link
Contributor Author

baip commented Aug 3, 2018

An existing .env file would be appended to the one generated by setup_env here. Do you have an example .env that breaks the postgres test?

I think I changed the container names because the old name (e.g., huginn_test_mysql) felt unnecessarily verbose, and if you are outside the directory containing the develop.yml file, you have to refer to these running containers as something like huginn_huginn_test_mysql. While I get it that we want to preserve everything user-facing so that they don't need to tweak anything after a Huginn upgrade, do we allow some slack here and say that anyone who wants to run this test image is considered developers and expected to be able do minor tweaking / debugging? Is this image used by CI tools?

@dsander
Copy link
Collaborator

dsander commented Aug 4, 2018

Do you have an example .env that breaks the postgres test?

Pretty much any .env that does not match the database connection required for the test image. I have my local .env connect to a postgresql database and ran

cd docker/test
docker-compose -f develop.yml run huginn_test_postgres
docker-compose -f develop.yml down

Changing the name of the test images is totally fine with me, I just meant to say the names also need to be updated in the test readme 😄

An existing .env file would be appended to the one generated by setup_env here.

I see, I guess when we inverted the logic the test images should work. I don't have a preference, inverting the append logic or failing with a warning prompting the user to move the local .env our of the way is fine with me.

@baip
Copy link
Contributor Author

baip commented Aug 21, 2018

Please pardon me for my tardy response. For the test image, I have now added a check to fail the container if /app/.env exists and modified its README.md. I felt that it's a legitimate use case that someone would want to supply variables by mounting /app/.env, so inverting the logic (i.e., letting .env.example potentially overriding .env) may be un-intuitive. People might also use the test image this way without relying on develop.yml that mounts the entire /app directory, so maybe instead of failing hard, we can also print a warning message. What do you think of this alternative?

Copy link
Collaborator

@dsander dsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe instead of failing hard, we can also print a warning message.

Not sure that I follow, my idea was to prevent the user from overriding the local (non docker) development setup configuration (which I do every time when I run the test image because I forget that it modifies the .env file 😄 ). Would you print a warning and then not modify the .env if it's present?

When I try to run the containers now they just do nothing :/

$ docker run -it huginnbuilder/huginn:pr-2298
$

@@ -1,6 +1,8 @@
#!/bin/bash
set -e

[ -e '/app/.env' ] && { echo 'This image does not support mounting /app/.env.' >&2; exit 1; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like Existing .env detected, please rename it temporarily before running the test container to prevent data loss.?

@baip
Copy link
Contributor Author

baip commented Aug 23, 2018

Oh, I noticed that as well and have actually changed it so that .env (and .env.example) are no longer modified in place. So now /app/.env.example is first copied to /tmp/.env.example, and /app/.env is then appended to the /tmp/.env.example. Finally /tmp/.env is generated from this file and environment variables. There is no danger of overriding the local development setup configuration.

I've changed the warning message as suggested and fixed a minor problem when I rebased using my local branch instead of this PR.

@dsander
Copy link
Collaborator

dsander commented Aug 24, 2018

Nice, testing the images went well. Give me the weekend to read through the changes again.

Finally /tmp/.env is generated from this file and environment variables. There is no danger of overriding the local development setup configuration.

But /tmp/.env is still copied over /app/.env here, which is fine by me with the warning.

@baip
Copy link
Contributor Author

baip commented Aug 24, 2018

Right, I remember something complaining about not finding .env, but /tmp/.env is only copied over when /app/.env does not already exist, so there still shouldn't be any danger of overwriting existing files.

@dsander dsander merged commit 7c90b83 into huginn:master Aug 27, 2018
@dsander
Copy link
Collaborator

dsander commented Aug 27, 2018

Thank you @baip! Attempts to get docker users to tests PRs have been not very successful in the past, so I'll just hope we found all the breaking changes 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants