Data init #205

Merged
merged 17 commits into from
Apr 24, 2024

manning-ncsa
Collaborator

Fixes #203

Description of the Change

The data initialization routine has been heavily refactored, simplifying and further parameterizing the locations of the data files required by Blast. The integrity of every file is verified using md5sums, which detects both missing files and any change to a file's contents. mc, the S3-compatible CLI client from MinIO, is now installed in the application image so that mc mirror can pull data in a bandwidth- and time-efficient manner, downloading only the files that are invalid or missing. A FORCE_INITIALIZATION env var was added as a convenience for clearing the initialization lock files that can be left on the persistent data volume when initialization fails, typically during code development. A data source page was added to the acknowledgements section of the docs.
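
For readers who want a concrete picture of the flow described above, here is a minimal sketch, not the actual Blast implementation. It assumes GNU coreutils `md5sum` and a manifest in the standard `<hash>  <relative path>` format. The function names, the manifest and lock-file paths, the remote `mc` alias, and the use of `true` as the enabling value for FORCE_INITIALIZATION are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names, paths, and the "true" value are assumptions,
# not taken from the Blast code base.

# Return 0 only if every file listed in the manifest exists under data_root
# and matches its recorded md5sum. The manifest must be an absolute path,
# because the check runs from inside data_root.
verify_data() {
  local data_root="$1" manifest="$2"
  # --check reads "<hash>  <relative path>" lines; --quiet prints only failures.
  (cd "$data_root" && md5sum --check --quiet "$manifest" >/dev/null 2>&1)
}

# Remove a leftover lock file, but only when FORCE_INITIALIZATION is "true".
clear_stale_lock() {
  local lock_file="$1"
  if [ "${FORCE_INITIALIZATION:-false}" = "true" ] && [ -e "$lock_file" ]; then
    rm -f "$lock_file"
  fi
}

# Verify first, pull with mc mirror only if verification fails, then verify
# again. "remote" is a hypothetical mc alias/bucket path such as
# "myalias/mybucket/data".
initialize_data() {
  local remote="$1" data_root="$2" manifest="$3"
  if ! verify_data "$data_root" "$manifest"; then
    mc mirror --overwrite "$remote" "$data_root"
    verify_data "$data_root" "$manifest"
  fi
}
```

Running the verification before and after the mirror step means a failed or partial download is reported as an error instead of being silently accepted.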

@manning-ncsa
Collaborator Author

In the process of fixing regressions caught by running the unit tests (Compose "ci" profile), I discovered more problems related to the way Compose includes files specified in the env_file list. Basically, even if you run the tests like so:

$ COMPOSE_ARGS="--project-name blast -f docker/docker-compose.yml --env-file env/.env.default --env-file env/.env.ci --profile ci"
$ docker compose $COMPOSE_ARGS up --build

Compose will still load the env vars in env/.env.dev because that file is listed in the service definition's env_file. However, if you omit .env.dev from the env_file list, Compose will not allow you to include it on the CLI with --env-file env/.env.dev! So, the redundant Compose services combined with the execution command shown above appear to ensure that only the desired environment files are loaded.
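
One way to check which variables a service actually receives is to render the resolved configuration rather than starting the stack. The sketch below is an illustration, not part of this PR. It keeps the same arguments in a bash array, which avoids relying on word splitting of an unquoted string, and wraps them in a hypothetical helper named `compose_ci`.

```shell
#!/usr/bin/env bash
# Sketch only: compose_ci is a hypothetical helper name. The arguments are
# the ones from the command shown above.
COMPOSE_ARGS=(
  --project-name blast
  -f docker/docker-compose.yml
  --env-file env/.env.default
  --env-file env/.env.ci
  --profile ci
)

# Forward any subcommand (up, config, down, ...) with the shared arguments.
compose_ci() {
  docker compose "${COMPOSE_ARGS[@]}" "$@"
}
```

With this in place, `compose_ci config` should print the fully resolved Compose file, including each service's `environment` section, so it is possible to see whether values from env/.env.dev were picked up. `compose_ci up --build` is equivalent to the command above.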

Passes the tests as of c5a527f.

@manning-ncsa manning-ncsa marked this pull request as ready for review April 24, 2024 18:36
@manning-ncsa manning-ncsa merged commit d9a989d into main Apr 24, 2024
2 of 4 checks passed
@manning-ncsa manning-ncsa deleted the data-init branch April 24, 2024 18:36
Development

Successfully merging this pull request may close these issues.

Verify file integrity during data initialization