Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic provenance tracking #28

Merged
merged 4 commits into from
May 10, 2016
Merged

Add basic provenance tracking #28

merged 4 commits into from
May 10, 2016

Conversation

tavinathanson
Copy link
Member

Since I'm regenerating a shared cache right now, I figured I'd knock out some basic provenance tracking this time around.

What this does:

  • Records the versions of a few key modules alongside each cached file.
  • If those versions don't match up with the environment when loading from the cache, simply prints the diff for each patient.

Future work:

  • Allow multiple cached versions to co-exist, keyed by the provenance. Currently, this is just an alert rather than a useful tool.
  • Somehow track NetMHC* versions.
  • Use logging rather than print statements and have a way to suppress and/or merge the many warnings.

@tavinathanson
Copy link
Member Author

@armish I'm writing up a small test right now, and decided I should probably disable the warnings by default.

provenance_set_previous = provenance_set_from_df(df_provenance_previous)
new_provenance_diff = provenance_set.difference(provenance_set_previous)
old_provenance_diff = provenance_set_previous.difference(provenance_set)
if new_provenance_diff != set():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about something like:

def check_diff(diff):
  if diff != set():
    print("Current provenance for patient %s: %s" % (
      patient_id, provenance_str_from_set(diff)))

check_diff(new_provenance_diff)
check_diff(old_provenance_diff)

to reduce the redundancy here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced the redundancy somewhat differently, let me know if it looks good?

@armish
Copy link
Member

armish commented May 10, 2016

nice one! 👏

Just a few minor questions/comments for you. Feel free to merge it when/if you address/ignore them.

@tavinathanson
Copy link
Member Author

Updates:

  • Added a test.
  • Added a check_provenance flag that defaults to False.

@tavinathanson
Copy link
Member Author

@armish Good to merge on green?

@tavinathanson
Copy link
Member Author

Fixes #15

@armish
Copy link
Member

armish commented May 10, 2016

Go for it!

BTW: I think this looks much better with df -> dict switch. Thanks for doing that.

@coveralls
Copy link

coveralls commented May 10, 2016

Coverage Status

Coverage increased (+2.8%) to 59.267% when pulling 526030f on provenance into cb01071 on master.

@tavinathanson tavinathanson merged commit 23278c6 into master May 10, 2016
@tavinathanson tavinathanson deleted the provenance branch May 10, 2016 21:55
This was referenced May 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants