Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve augur export #165

Merged
merged 2 commits into from
Jul 11, 2018
Merged

improve augur export #165

merged 2 commits into from
Jul 11, 2018

Conversation

jameshadfield
Copy link
Member

@jameshadfield jameshadfield commented Jul 11, 2018

This PR rewrites most of augur/export.py, incorporating ideas @rneher and I discussed in Basel. I think it's easier to understand and, as a bonus, will be easier to modify as we change the schema. This PR was necessitated by work on WNV which didn't work with current master (see slack for WNV stuff).

Main changes:

  • attach_tree_meta_data is removed from utils and not imported by titers.py (where it was never called)
  • The attrs attached to nodes are determined by scanning the (config) meta.json
  • Null values (for traits / attrs) are no longer exported
  • Missing geographic deme lat/longs are now fatal
  • Colour map hex values which are in colors.tsv but not in any tree nodes are not exported.
  • The main code block is more linear in style (which I find easier to read)
  • More error checking is incorporated. Notably, both WNV and zika now show multiple errors such as "Error - White et al had contradictory title(s): Isolation of Zika virus from mosquitoes in Haiti vs Zika viruses in Haiti during an outbreak of Chikungunya Fever in mid 2014". This is because we use non non unique author lookups like "White et al". Note that the error messages are new, not the errors themselves.
  • There are no changes to the augur export API

I've tested this on WNV & zika. @emmahodcroft could you take a look with TB?

@rneher
Copy link
Member

rneher commented Jul 11, 2018

looks good to me, but haven't had time to test.

Copy link
Member

@emmahodcroft emmahodcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work fine for TB :)

My only comment is that I get a ream of errors for sometimes having a space after the paper title and sometimes not. I don't think adding .strip() to either side of the comparison would hurt anything for these fields - but feel free to disregard if it does.

@rneher
Copy link
Member

rneher commented Jul 11, 2018

This line should remove prefix whitespace from the csv or tsv metadata. Any trailing whitespace remains (I think). strip() seems useful to me.

@jameshadfield jameshadfield merged commit 32037dc into master Jul 11, 2018
@jameshadfield jameshadfield deleted the export branch July 11, 2018 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants