Bootstrap via http #169
```diff
 _id = result.inserted_id

-log.info('Running %s as job %s to process %s %s' % (gear.name, str(_id), input.container_type, input.container_id))
+log.info('Enqueuing %s as job %s to process %s %s' % (gear.name, str(_id), input.container_type, input.container_id))
```
Seems reasonable. Not reviewing … Can you speak to the schema changes? Lengths changed and restrictions were removed?
Our own bootstrap file did not pass schema validation; the current bootstrapping circumvents validation. The input schemas are for input validation, while the mongo schemas are sanity checks on our own code to prevent database corruption. Length and pattern restrictions don't constitute database corruption. @rentzso: Could you please give a practical example of how the mongo schemas help? Can all the mongo schemas be reduced to the required fields, since that is all we actually care about in terms of internal consistency? Additional fields certainly don't affect us.
@gsfr A practical example is permissions, where you want to validate that they are an array and that each entry follows a certain schema (_id + site). I think this problem actually happened a while ago and was blocked by the mongo schema. In project creation, the permissions are copied from group roles. The `additionalProperties` restriction is also important to protect the API from potential field misspellings in the code.
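To make the permissions example concrete, here is a minimal, stdlib-only sketch. The schema fragment is an assumption modeled on the description above (an array of objects with `_id` and `site`, no extra keys); it is not the project's actual mongo schema, and `check_permissions` is a hypothetical helper that enforces only that subset of JSON Schema rather than the full specification.

```python
# Hypothetical JSON Schema fragment for a permissions array, modeled on the
# comment above; not the project's actual mongo schema.
PERMISSIONS_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["_id", "site"],
        "additionalProperties": False,
        "properties": {"_id": {"type": "string"}, "site": {"type": "string"}},
    },
}


def check_permissions(value, schema=PERMISSIONS_SCHEMA):
    """Return a list of error strings; an empty list means the value passed.

    Enforces only the subset of JSON Schema used above: array type,
    object items, required keys, and additionalProperties: false.
    Property value types are not checked.
    """
    if not isinstance(value, list):
        return ["permissions must be an array"]
    item_schema = schema["items"]
    allowed = set(item_schema["properties"])
    errors = []
    for i, item in enumerate(value):
        if not isinstance(item, dict):
            errors.append("item %d must be an object" % i)
            continue
        for key in item_schema["required"]:
            if key not in item:
                errors.append("item %d is missing %r" % (i, key))
        if item_schema["additionalProperties"] is False:
            # Any key outside "properties" is rejected, e.g. a typo like "stie".
            for key in sorted(set(item) - allowed):
                errors.append("item %d has unexpected key %r" % (i, key))
    return errors
```

A correct entry such as `{"_id": "user@example.com", "site": "local"}` yields no errors, while a misspelled `"stie"` key produces both a missing-key and an unexpected-key error. That second error is the field-misspelling case `additionalProperties` guards against.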
SGTM 👍
bin/bootstrap.py
```python
log.info('Packaging %s' % dirpath)
filepath = create_archive(dirpath, os.path.basename(dirpath), metadata, tempdir, filenames)
filename = os.path.basename(filepath)
metadata.get('acquisition', {}).get('files', [{}])[0]['name'] = filename
```
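`metadata` could be undefined if there is no metadata.json.
This line also doesn't seem to update the metadata if it lacks the 'acquisition' field or the 'files' subfield: `.get()` returns a fresh default dict or list that is never attached to `metadata`, so the assignment is lost.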
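Both review points can be addressed by creating the intermediate keys instead of reading them through `.get()` defaults. A minimal sketch, assuming `metadata` is either a dict loaded from metadata.json or `None` when that file is absent; `set_archive_filename` is a hypothetical helper, not the code that was eventually merged.

```python
def set_archive_filename(metadata, filename):
    """Record the archive filename as the first acquisition file.

    Unlike chained .get() calls with defaults, setdefault() attaches any
    missing 'acquisition' dict or 'files' list to metadata, so the
    assignment is never made on a throwaway object.
    """
    if metadata is None:  # no metadata.json was found
        metadata = {}
    acquisition = metadata.setdefault('acquisition', {})
    files = acquisition.setdefault('files', [])
    if not files:
        files.append({})
    files[0]['name'] = filename
    return metadata
```

`set_archive_filename(None, 'scan.zip')` returns `{'acquisition': {'files': [{'name': 'scan.zip'}]}}`. An existing files list keeps its other entries and only the first entry's `name` changes. Callers should use the return value, because a new dict is created when `metadata` is `None`.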
LGTM pending a comment
I'm testing now. Some tweaks will be required for the docker environment.
Added proper schema validation.
@gsfr The required changes for the /docker folder are pushed. I'd like your thoughts on the commit below. If you approve, we should do something similar in bin/run.sh, or take this opportunity to refactor the bootstrapping out of bin/run.sh so that both it and docker reference the same scripts. Re download size:
- run paster asynchronously
- improve mongo installation
- pull out mongo version
Requires #171.
Now supports API-enabled bootstrapping.
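API-enabled bootstrapping means each record is created through an HTTP request instead of a direct mongo insert. A minimal stdlib-only sketch, assuming a JSON API that accepts unauthenticated POSTs; `post_json`, the base URL, and any route passed to it are hypothetical, not the project's actual client code or endpoints.

```python
import json
import urllib.request


def post_json(base_url, path, payload, timeout=10):
    """POST payload as JSON to base_url + path and return the decoded reply.

    Assumption: the API accepts unauthenticated JSON POSTs. A real
    deployment will likely need an auth header added to `headers`.
    """
    body = json.dumps(payload).encode('utf-8')
    request = urllib.request.Request(
        base_url.rstrip('/') + path,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    # payload rejected by the API's input validation surfaces as an exception.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

For example, `post_json('http://localhost:8080', '/api/groups', {'_id': 'demo'})` (hypothetical URL and route). Because the request goes through the API, the API's input validation now applies to bootstrap data too, which direct mongo writes bypassed.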
Don't reference branches from within the source. This allows the core and testdata repos to move forward without breaking out-of-date versions of core. It also allows branches to be fully merged. There is still a similar branch reference in bin/run.sh.
Thanks, @ryansanford. I added my final commit and rebased. I don't like the commit hash in the code. My idea for … An alternative approach could be to use the testdata branch that matches the current core branch, if it exists, and otherwise fall back to master. The large download size is due to several iterations of zip files polluting the git history. I just updated the bootstrap branch on testdata to start fresh, without history. With that, a … I guess that obliterated the commit hash you are referencing. 😧
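The branch-matching alternative could look roughly like this. It is a sketch only: the helper names are hypothetical, it shells out to the real `git ls-remote --heads` and `git rev-parse --abbrev-ref HEAD` commands, and the testdata repo URL in the usage line is assumed from the repo names in this thread.

```python
import subprocess


def current_branch(repo_dir='.'):
    """Name of the checked-out core branch ('HEAD' when detached)."""
    return subprocess.run(
        ['git', 'rev-parse', '--abbrev-ref', 'HEAD'],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()


def list_remote_branches(repo_url):
    """Return branch names on repo_url via `git ls-remote --heads`."""
    output = subprocess.run(
        ['git', 'ls-remote', '--heads', repo_url],
        check=True, capture_output=True, text=True,
    ).stdout
    # Each line looks like "<sha>\trefs/heads/<branch>".
    prefix = 'refs/heads/'
    return [line.split('\t', 1)[1][len(prefix):]
            for line in output.splitlines() if '\t' in line]


def choose_testdata_branch(core_branch, testdata_branches, default='master'):
    """Prefer the testdata branch named like the core branch, else default."""
    return core_branch if core_branch in testdata_branches else default
```

Usage: `choose_testdata_branch(current_branch(), list_remote_branches('https://github.com/scitran/testdata.git'))`. A detached HEAD reports `HEAD`, which never matches a branch, so it falls back to master.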
Nice work on reducing the download size for the bootstrap branch of scitran/testdata. Right now, downloading the "master" branch comes in at 444.69 MiB. Do we run into the same issue there as we merge changes in?

You did invalidate that commit hash, but that's OK. =) The final commit hash that would get pushed to the bootstrap branch of scitran/core is the one produced after scitran/testdata is merged to master. See the comments here: https://github.com/scitran/core/blob/bootstrap/docker/bootstrap-data.sh#L22-L23

The benefit is that, moving forward, merging changes to "master" of scitran/testdata would never break users who aren't at the tip of scitran/core. When a user of scitran/core decides to pull or rebase onto the tip of "master", they get the new hash for the compatible testdata, and off they go. When breaking changes are made to scitran/testdata, this makes life easier for downstream scitran/core developers, because they no longer rely on "getting the memo". It also means people merging scitran/testdata no longer need to worry about others "receiving the memo" or even properly understanding it.
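The pinning approach can be sketched as a single pinned commit in core from which the download URL is derived. `PINNED_TESTDATA_COMMIT` is a placeholder rather than a real scitran/testdata commit, `testdata_archive_url` is a hypothetical helper, and the URL follows GitHub's `https://github.com/<owner>/<repo>/archive/<ref>.tar.gz` tarball pattern rather than whatever bootstrap-data.sh actually does.

```python
import re

# Placeholder, not a real scitran/testdata commit. Under the approach above,
# this value is bumped in core only after testdata is merged to master.
PINNED_TESTDATA_COMMIT = '0' * 40

_SHA1 = re.compile(r'[0-9a-f]{40}')


def testdata_archive_url(commit=PINNED_TESTDATA_COMMIT, repo='scitran/testdata'):
    """Build a GitHub tarball URL for an exact testdata commit.

    A full 40-character SHA is required so the download cannot silently
    follow a moving branch such as master.
    """
    if not _SHA1.fullmatch(commit):
        raise ValueError('expected a full commit SHA, got %r' % commit)
    return 'https://github.com/%s/archive/%s.tar.gz' % (repo, commit)
```

Rejecting branch names is the design point: an old core checkout keeps downloading the exact testdata it was tested against, no matter what lands on testdata's master later.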
I'm ready to drop the hammer on this today. Re testdata: I'm planning to blow away the history on master as well and temporarily keep the old stuff on a legacy branch. @ryansanford: Standing by for your go-ahead on merging testdata.
@gsfr
This PR converts our bootstrapping logic to use HTTP rather than manipulating mongo directly. As a result, bootstrapping now requires the API to be up. Ping @ryansanford re Flywheel infra.
The code still contains a reference to the bootstrap branch of the testdata repo. When this is ready to merge, testdata should be merged and the branch reference removed.
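Because bootstrapping now depends on a running API, and paster is started asynchronously, a bootstrap script has to wait until the server answers before sending requests. A stdlib-only sketch; `wait_for_api` and its polling parameters are hypothetical, and the URL to poll is whatever endpoint the deployment exposes.

```python
import time
import urllib.error
import urllib.request


def wait_for_api(url, timeout=60.0, interval=1.0):
    """Poll url until it answers with an HTTP response or timeout expires.

    Any HTTP status, including an error status, counts as "up" because it
    means a server is listening. Connection errors mean it is not up yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # HTTPError subclasses URLError, so it must be caught first.
            return True  # server answered, e.g. 401 on an authenticated route
        except (urllib.error.URLError, OSError):
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A bootstrap script would call this once, abort if it returns `False`, and only then start issuing HTTP requests.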
To test this code, `persistent/testdata` needs to be removed, as it is not currently a git repo.
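The manual cleanup step could be automated along these lines. `remove_stale_testdata` is a hypothetical helper; the default path is the one named in this PR, and leaving directories that contain `.git` untouched assumes a later step clones or updates the testdata repo in place.

```python
import shutil
from pathlib import Path


def remove_stale_testdata(path='persistent/testdata'):
    """Delete path if it exists but is not a git checkout.

    Returns True if something was removed. A directory containing .git is
    left alone so an existing clone can be updated instead of re-downloaded.
    """
    target = Path(path)
    if not target.exists() or (target / '.git').exists():
        return False
    if target.is_dir() and not target.is_symlink():
        shutil.rmtree(target)
    else:
        target.unlink()  # plain file or symlink: remove the entry itself
    return True
```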