Enable feeding the server configuration from environment variables #765

lfoppiano · 2021-05-30T23:42:19Z

Following up #762. This PR enables the substitution of certain configuration item from environment variables.
In future it can be reused as long as the configuration is loaded from dropwizard.

coveralls · 2021-05-30T23:54:36Z

Coverage increased (+0.003%) to 39.077% when pulling 2c0e061 on feature/enable-env-variables into 5486030 on master.

kermitt2 · 2021-05-31T01:53:32Z

In the new yaml configuration, there is no more specific server config file, just a unique one.

While this mechanism could work for the new yaml config, I am trying to have something more generic using Java reflection which does not require changing the yaml file and follow the same format as for the current property file (e.g. --env GROBID__NB_THREADS=10 ).
So this PR will probably not be merged, because the yaml-config branch is almost ready.

kermitt2 · 2021-05-31T04:31:26Z

I am trying to have something more generic using Java reflection which does not require changing the yaml file and follow the same format as for the current property file (e.g. --env GROBID__NB_THREADS=10).

Ok I did not manage to find a generic solution. I think nothing reasonable can work with list. It would be probably better to forget about overriding config. parameter with environment variables? I think this can work only for trivial configuration.

lfoppiano · 2021-05-31T06:44:35Z

No problem.

For the YAML de-serialisation perhaps using Jackson would solve some of the problems in the mapping. If it works as the JSON/XML mapper it should be rather simple to implement even with complex nested YAML config files...

kermitt2 · 2021-05-31T07:03:42Z

For the YAML de-serialisation perhaps using Jackson would solve some of the problems in the mapping. If it works as the JSON/XML mapper it should be rather simple to implement even with complex nested YAML config files...

The problem is the mapping of environment variable (which are kind of linear serialization) to override the configuration class/object, which is nested with lists.

lfoppiano · 2021-06-07T07:02:13Z

The solution adopted from dropwizard (we could borrow the idea.. and the code) is to specify the variable directly in the configuration, so that the complexity is left out, without losing flexibility.

kermitt2 · 2021-06-11T06:33:34Z

Luca, how do you see the process with the new yaml config? It would be only user-driven? I don't think we can hack the yaml file for all the possible parameter values, because it has potentially much more than hundred of possible parameter keys now (each model has its own set of nested parameters).

lfoppiano · 2021-06-11T07:25:56Z

Luca, how do you see the process with the new yaml config? It would be only user-driven? I don't think we can hack the yaml file for all the possible parameter values, because it has potentially much more than hundred of possible parameter keys now (each model has its own set of nested parameters).

Yes, let's keep it simple. User can always define their own configuration and decide which variables can be read from the env.

lfoppiano · 2021-08-04T06:18:50Z

I've updated this PR. Now the users can select which information can be read from environment variables by using

${VARIABLE_NAME:- $$default_value$$}

E.g. for the grobid-home:

${GROBID__GROBID_HOME:- "grobid-home"}

To keep it simple we can just leave the user to decide what to replace (or we can add some variables that can be useful e.g. for the docker container"

kermitt2 · 2021-08-05T23:54:26Z

When I tries to write the documentation for this, I realized that there is no clear way to get it working with docker, for example:

docker run -t --rm --init -p 8080:8070 -p 8081:8071 \
    --env GROBID_MODEL_PRELOAD=true \
    --env GROBID_GROBID_HOME="/some/where/grobid-home" \
    lfoppiano/grobid:0.7.0

... because the grobid.yaml in the image does not have these environment variable definitions. Of course if we need to modify the configuration file, we are in the case where the configuration file that is mounted when running the container, so we don't need environment variables.

The user would need not only its own definitions in the configuration file, but also rebuild a new image with the modified configuration file.

lfoppiano · 2021-08-06T00:52:02Z

Yes indeed. My idea would be that once you've set up your configuration file with the parameters, then you can easily run the docker by changing the parameters at the command line, instead of editing the file.
Maybe others that are using more the docker architecture could provide more useful use cases? @de-code, what do you think?

de-code · 2021-08-06T08:11:35Z

Yes, for me the main use-case would also look like that. Mainly with docker / container (but not strictly).
There would be a default configuration and environment variables make it easy to override some of those values (including setting previously unset values). This is to make deployment easier but could also make experimentation easier in the same way.

This to potentially consider:

Allow overriding only specific configuration options rather than providing everything (this could also be possible by merging multiple configuration files)
Running a single docker command setting environment variables may be easier to copy and paste by users as an example, than a multi-step example with a separate config file
Deploying container in Kubernetes would require be easier with environment variables than a more bespoke deployment config supporting the configuration file
Environment variables can almost always be set

(see also #480 - I had my previous implementation based on Airflow's config)

kermitt2 · 2021-08-06T19:35:30Z

Sorry if I was not clear, the problem I raise is the complexity of the solution:

the idea it to have a mechanism to easily run the docker by changing the parameters at the command line, instead of editing the file,
but to realize this solution we have to

edit the config file (anyway)
rebuild an image (which takes ages for the complete one with DL)
then we can run the docker by changing the parameters at the command line

The consequences are that the solution can never be compatible with the images on dockerhub and it requires some heavy preliminary work that finally makes it look more complicated that just specifying a config file as parameters at the command line.

I initially started to document it in the Docker section, but this might need to go to the "developer notes" because the existing docker images cannot support this.

de-code · 2021-08-08T06:21:34Z

I guess you could include a default config file with all of the placeholders in Docker Hub?
Sub-projects would need to do the same, or anyone with a different default config.
But it does indeed look like a bit of effort to maintain all of those placeholders (and prone to copy and paste errors).

That is why Airflow' config opted for a more general pattern.
You mentioned that it doesn't work well with lists, which is true. One could try to make it work maybe. One challenge would be to select the right item, not based on the index (e.g. by model name).
But maybe it will be easier to revise the config to avoid lists for things that could avoid it.
For example I would maybe want to override the model parameters which are currently a list, but could also be a map (with the name as the key, using underscore instead of hyphen).

Perhaps a bit similar (although not environment variables), Helm install supports --set (and variations) to override values in the yaml file. It also has some limitations but works for many cases.

lfoppiano · 2021-08-10T02:42:42Z

Sorry if I was not clear, the problem I raise is the complexity of the solution:
[...]
Yes, the solution is supposed to be for advanced users not for simple users of the docker version... But IMHO I don't think a rebuild of the docker image is necessary.

I guess you could include a default config file with all of the placeholders in Docker Hub?

We can include some placeholder by default so that certain basic configurations can be modified from the docker image without supplying an additional config file. E.g. grobid home, number of threads, lazy/eager load of models... etc...

Personally, for more advance usege, I would modify the config file and add placeholders where I need to, supply such values to the docker image and I will have the possibility to run a different configuration without modifying the configuration file.

enable reading env variables as optional source for configuration

2c0e061

lfoppiano mentioned this pull request May 30, 2021

Support loading models eagerly #762

Closed

lfoppiano linked an issue May 30, 2021 that may be closed by this pull request

Support loading models eagerly #762

Closed

lfoppiano added 2 commits August 4, 2021 15:15

Merge branch 'master' into feature/enable-env-variables

af2b8b8

read grobid-home and modelPreview from env variables

55ca5ee

lfoppiano added the enhancement label Aug 4, 2021

kermitt2 added this to the 0.7.1 milestone Aug 5, 2021

kermitt2 modified the milestones: 0.7.1, 0.7.2 Aug 6, 2021

kermitt2 removed this from the 0.7.2 milestone Oct 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable feeding the server configuration from environment variables #765

Enable feeding the server configuration from environment variables #765

lfoppiano commented May 30, 2021

coveralls commented May 30, 2021

kermitt2 commented May 31, 2021

kermitt2 commented May 31, 2021 •

edited

lfoppiano commented May 31, 2021

kermitt2 commented May 31, 2021

lfoppiano commented Jun 7, 2021

kermitt2 commented Jun 11, 2021

lfoppiano commented Jun 11, 2021

lfoppiano commented Aug 4, 2021

kermitt2 commented Aug 5, 2021 •

edited

lfoppiano commented Aug 6, 2021

de-code commented Aug 6, 2021

kermitt2 commented Aug 6, 2021

de-code commented Aug 8, 2021

lfoppiano commented Aug 10, 2021

Enable feeding the server configuration from environment variables #765

Are you sure you want to change the base?

Enable feeding the server configuration from environment variables #765

Conversation

lfoppiano commented May 30, 2021

coveralls commented May 30, 2021

kermitt2 commented May 31, 2021

kermitt2 commented May 31, 2021 • edited

lfoppiano commented May 31, 2021

kermitt2 commented May 31, 2021

lfoppiano commented Jun 7, 2021

kermitt2 commented Jun 11, 2021

lfoppiano commented Jun 11, 2021

lfoppiano commented Aug 4, 2021

kermitt2 commented Aug 5, 2021 • edited

lfoppiano commented Aug 6, 2021

de-code commented Aug 6, 2021

kermitt2 commented Aug 6, 2021

de-code commented Aug 8, 2021

lfoppiano commented Aug 10, 2021

kermitt2 commented May 31, 2021 •

edited

kermitt2 commented Aug 5, 2021 •

edited