Bootstrap via http #169

gsfr · 2016-02-23T22:52:23Z

This PR converts our bootstrapping logic to use http rather than manipulating mongo directly. As such, bootstrapping now requires the api to be up. Ping @ryansanford re flywheel infra.

The code still contains a reference to the bootstrap branch of the testdata repo. When ready to merge this, testdata should be merged and the branch reference should be removed.

To test this code, persistent/testdata needs to be removed as it is not currently a git repo.

kofalt · 2016-02-23T23:23:29Z

api/jobs.py

    _id = result.inserted_id

-    log.info('Running %s as job %s to process %s %s' % (gear.name, str(_id), input.container_type, input.container_id))
+    log.info('Enqueuing %s as job %s to process %s %s' % (gear.name, str(_id), input.container_type, input.container_id))


kofalt · 2016-02-23T23:25:43Z

Seems reasonable. Not reviewing run.sh.

Can you speak to the schema changes? Lengths changed, restrictions were removed?

gsfr · 2016-02-23T23:46:01Z

Can you speak to the schema changes? Lengths changed, restrictions were removed?

Our own bootstrap file did not pass schema validation. Current bootstrapping circumvents validation.

The input schemas are for input validation; the mongo schemas are sanity checks on our own code to prevent database corruption. Length and pattern restrictions don't constitute database corruption.

@rentzso: Could you please give a practical example of how the mongo schemas help? Can all the mongo schemas be reduced to the required fields, as that is all we actually care about in terms of internal consistency? Additional fields certainly don't affect us.

rentzso · 2016-02-24T00:16:47Z

@gsfr a practical example is permissions where you want to validate that they are an array and that they follow a certain schema (_id + site).
This is a basic requirement for the API code and it's better to fail a request than to continue running in a corrupted state.

I think that this problem actually happened and was blocked by the mongo schema a while ago. In project creation the permissions are copied from group roles,
and group roles, at the time generated by the bootstrap process, were missing the site.

additionalProperties is also important to protect the API from potential field misspelling in the code.

kofalt · 2016-02-24T13:42:08Z

SGTM 👍

rentzso · 2016-02-24T19:21:23Z

bin/bootstrap.py

+                    log.info('Packaging   %s' % dirpath)
+                    filepath = create_archive(dirpath, os.path.basename(dirpath), metadata, tempdir, filenames)
+                    filename = os.path.basename(filepath)
+                    metadata.get('acquisition', {}).get('files', [{}])[0]['name'] = filename


metadata could be undefined if there is no metadata.json.
this line doesn't seem also to update the metadata if it doesn't have the 'acquisition' field or the 'files' subfield.

rentzso · 2016-02-24T19:24:14Z

LGTM pending a comment

ryansanford · 2016-02-24T21:11:41Z

I'm testing now. some tweaks will be required for docker env.

gsfr · 2016-02-25T02:16:12Z

Added proper schema validation.

ryansanford · 2016-02-25T14:16:20Z

@gsfr Required changes for /docker folder are pushed.

I'd like to know your thoughts on the below commit. If you approve, we should do similar in bin/run.sh, or take this as an opportunity to refactor the bootstrapping out of bin/run.sh and have both that, and docker reference the same scripts.

029ec55

RE Download size:
I question the use of git rather than curl to pull down the test data, because the current method with git resulted in a 660 MiB download (using transport compression) rather than 210 MB with curl. For now, I didn't change the use of curl for the docker bootstrap.

- run paster asynchronously - improve mongo installation - pull out mongo version

requires #171

Now supports API enabled bootstrapping.

Don't reference branches from within source. This allows core and testdata repo to move without breaking not up-to-date versions of core. Also allows branches to be fully merged. Still a similar branch reference in bin/run.sh

gsfr · 2016-02-26T01:25:04Z

Thanks, @ryansanford. I added my final commit and rebased.

I don't like the commit hash in the code. My idea for run.sh is to use the tip of master by default and otherwise stay on whatever branch the user has manually checked out. I also don't like that we are currently implementing our own versioning in run.sh, based on curl and ETags.

An alternative approach could be to get the testdata branch that matches the current core branch, if it exists, else fall back to master.

The large data size is due to several iterations of zip files polluting the git history. I just updated the bootstrap branch on testdata to start fresh, without history. With that, a --single-branch clone comes in at 216MiB.

I guess that obliterated the commit hash you are referencing. 😧

ryansanford · 2016-02-26T12:48:14Z

@gsfr

Nice work on reducing the download size for bootstrap branch of scitran/testdata. Right now downloading "master" branch comes in at 444.69 MiB. Do we run into the same issue there as we merge changes in?

You did invalidate that commit hash, but that's ok. =)

The final commit hash that would get pushed to bootstrap branch of scitran/core is the one after scitran/testdata is merged to master. See comments here. https://github.com/scitran/core/blob/bootstrap/docker/bootstrap-data.sh#L22-L23

The benefit is that, moving forward, merging changes to "master" of scitran/testdata would never break users who aren't at the tip of scitran/core. When the user of scitran/core decides to pull or rebase off the tip of "master" they get the new hash for the compatible testdata, and off they go.

When breaking changes are made to scitran/testdata, this makes the lives of downstream scitran/core developers easier, because they are no longer reliant on "getting the memo". It also means people merging scitran/testdata are no longer concerned with others "receiving the memo" or even properly understanding it.

gsfr · 2016-02-26T17:28:02Z

I'm ready to drop the hammer on this today. Re testdata: I'm planning to blow away the history on master as well and temporarily keep the old stuff on a legacy branch.

@ryansanford: Standing by for your go-ahead on merging testdata.

ryansanford · 2016-02-26T18:35:05Z

@gsfr
No objections to merging scitran/testdata

Bootstrap via http

kofalt reviewed Feb 23, 2016
View reviewed changes

rentzso reviewed Feb 24, 2016
View reviewed changes

gsfr force-pushed the bootstrap branch from 01db4a6 to 5198b9c Compare February 25, 2016 02:14

Gunnar Schaefer and others added 9 commits February 25, 2016 16:07

Prep for http-based bootstrapping

13dfa40

- run paster asynchronously - improve mongo installation - pull out mongo version

Reword job logging

f142f6d

Improve mongo and paster async behavior

afef7b5

Remove some input vs mongo schema redundancy

e023dd7

Bootstrap users, groups, and data via http

ef00d4a

Add metadata schema validation

3fe74fc

requires #171

Docker bootstrap updates

079e9f9

Now supports API enabled bootstrapping.

Remove branch reference

5f0f5ec

Don't reference branches from within source. This allows core and testdata repo to move without breaking not up-to-date versions of core. Also allows branches to be fully merged. Still a similar branch reference in bin/run.sh

Finialize metadata validation

cf13e16

gsfr force-pushed the bootstrap branch from 029ec55 to cf13e16 Compare February 26, 2016 01:24

Get site id from api

22183e1

Update testdata label (NOT FINAL)

0964355

gsfr pushed a commit that referenced this pull request Feb 26, 2016

Merge pull request #169 from scitran/bootstrap

680930f

Bootstrap via http

gsfr merged commit 680930f into master Feb 26, 2016

gsfr deleted the bootstrap branch February 26, 2016 21:51

gsfr pushed a commit that referenced this pull request Feb 26, 2016

Merge pull request #169 from scitran/bootstrap

3032f23

Bootstrap via http

gsfr pushed a commit that referenced this pull request Feb 26, 2016

Merge pull request #169 from scitran/bootstrap

aaf2936

Bootstrap via http

gsfr pushed a commit that referenced this pull request Feb 26, 2016

Merge pull request #169 from scitran/bootstrap

e80226f

Bootstrap via http

nagem mentioned this pull request Jun 28, 2016

Engine Placer #357

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bootstrap via http #169

Bootstrap via http #169

Uh oh!

gsfr commented Feb 23, 2016

Uh oh!

kofalt Feb 23, 2016

Uh oh!

kofalt commented Feb 23, 2016

Uh oh!

gsfr commented Feb 23, 2016

Uh oh!

rentzso commented Feb 24, 2016

Uh oh!

kofalt commented Feb 24, 2016

Uh oh!

rentzso Feb 24, 2016

Uh oh!

rentzso commented Feb 24, 2016

Uh oh!

ryansanford commented Feb 24, 2016

Uh oh!

gsfr commented Feb 25, 2016

Uh oh!

ryansanford commented Feb 25, 2016

Uh oh!

gsfr commented Feb 26, 2016

Uh oh!

ryansanford commented Feb 26, 2016

Uh oh!

gsfr commented Feb 26, 2016

Uh oh!

ryansanford commented Feb 26, 2016

Uh oh!

Uh oh!

Bootstrap via http #169

Bootstrap via http #169

Uh oh!

Conversation

gsfr commented Feb 23, 2016

Uh oh!

kofalt Feb 23, 2016

Choose a reason for hiding this comment

Uh oh!

kofalt commented Feb 23, 2016

Uh oh!

gsfr commented Feb 23, 2016

Uh oh!

rentzso commented Feb 24, 2016

Uh oh!

kofalt commented Feb 24, 2016

Uh oh!

rentzso Feb 24, 2016

Choose a reason for hiding this comment

Uh oh!

rentzso commented Feb 24, 2016

Uh oh!

ryansanford commented Feb 24, 2016

Uh oh!

gsfr commented Feb 25, 2016

Uh oh!

ryansanford commented Feb 25, 2016

Uh oh!

gsfr commented Feb 26, 2016

Uh oh!

ryansanford commented Feb 26, 2016

Uh oh!

gsfr commented Feb 26, 2016

Uh oh!

ryansanford commented Feb 26, 2016

Uh oh!

Uh oh!