-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes Qiita's 2237 issue #18
Conversation
This is ready for review. State Unifrac is failing to install and I don't know why. |
.travis.yml
Outdated
# Installing state unifrac | ||
- conda install --yes -c conda-forge cython | ||
- conda install --yes -c biocore unifrac | ||
# - sed -i 's/^CXXBASE=.*/CXXBASE=clang++/' `which h5c++` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove please
'i-table': ('artifact', ['BIOM']), | ||
'p-sampling-depth': ['integer', 1000] | ||
'BIOM table': ('artifact', ['BIOM']), | ||
'Sampling depth': ['integer', 1000] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 1000 the default? it looks like this is a replicated value with the dflt_param_set
? why 1000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a default per parameter while the dflt_param_set
is a set of defaults for all parameters. In this case, this command only has one parameter and that's why it looks replicated. It is also possible, however, to add more than one set of dflt_param_set
in which we change this value.
qp_qiime2/__init__.py
Outdated
'jaccard'], | ||
'i-tree': ['choice:["default", "None"]', 'None']} | ||
'Diversity metric': [ | ||
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for .keys()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
qp_qiime2/__init__.py
Outdated
'i-tree': ['choice:["default", "None"]', 'None']} | ||
'Diversity metric': [ | ||
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())), | ||
'Jaccard similarity index'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unifrac metrics + jaccard only? what about bray curtis or others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default value, and we can only choose one. The way it is done in Qiita is that each command executes only 1 metric, because we don't support an arbitrary number of outputs. Since by default we don't have a tree, we've chosen a non-phylogenetic metric as default. The user can, however, change this value on the interface.
qp_qiime2/__init__.py
Outdated
'Diversity metric': [ | ||
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())), | ||
'Jaccard similarity index'], | ||
'Phylogenetic tree': ['choice:["default", "None"]', 'None']} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's possible to perform a phylogenetic metric without a tree?
@@ -41,9 +41,10 @@ def tearDown(self): | |||
remove(fp) | |||
|
|||
def test_rarefy(self): | |||
params = {'p-sampling-depth': 2, 'i-table': 5} | |||
params = {'Sampling depth': 2, 'BIOM table': 5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5
doesn't look like a biom table or path to a biom table. Would it be more correct to describe this as "BIOM table artifact ID"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think from the point of view of the interface this makes sense - we agreed that we don't necessarily want to expose the word Artifacts to the user interface to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the blending of UI and compute logic is confusing personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the way I see this, is that the plugin is doing the translation between the compute logic and the UI. At the end, the actual logic is in Q2 not here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the responsibilities of this plugin are to:
- perform computational work directly using the qiime2 API
- interact with the qiita database for parameters, artifacts, etc
- ensure the above fulfills requirements for a singular and logically distinct user interface
Is that accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close but not exactly.
The plugin is the interface between Qiita and the underlying tool (in this case Qiime2).
It is not using the Qiime2 API, but the Qiime2 CLI (@antgonza may have more information about this decision).
It doesn't interact with the Database directly, it uses the Qiita REST api to retrieve the different values that it needs.
Since Qiita doesn't know anything about the plugin, the plugin itself needs to make sure to tell Qiita how he wants the different parts of the plugin to be shown to the user.
params['p-metric'] = 'unweighted UniFrac' | ||
params['i-tree'] = join( | ||
params['Diversity metric'] = 'Unweighted UniFrac' | ||
params['Phylogenetic tree'] = join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels inconsistent. The tree is a file path, but the BIOM table is an internal ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIOM is an artifact ID, trees are not stored as artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inconsistency on this is confusing but I guess it is what it is.
qp_qiime2/tests/test_qiime2.py
Outdated
'i-table': aid, 'p-metric': 'euclidean', | ||
'i-tree': 'None'} | ||
'BIOM table': aid, 'Diversity metric': 'Euclidean distance', | ||
'Phylogenetic tree': 'None'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an enforcement that absolutely ensures a phylogenetic metric will error appropriately if "Phylogenetic tree" is "None", and similarly, will it error appropriately if a tree is specified with a non-phylogenetic metric? I'm not seeing these tests but perhaps I'm missing them, would it be possible to add if they're not already included?
qp_qiime2/tests/test_qiime2.py
Outdated
@@ -328,7 +342,7 @@ def test_alpha(self): | |||
|
|||
# To avoid having to set up all these files, we are gonna test | |||
# that if phylogenetic and no tree it fails | |||
params['i-tree'] = None | |||
params['Phylogenetic tree'] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, above this is set as "None"
but now its set as None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed - given how the code works it was behaving correctly anyways.
qp_qiime2/tests/test_qiime2.py
Outdated
'Minimum feature frequency across samples': '5', | ||
'Maximum feature frequency across samples': '10', | ||
'Minimum features per sample': '5', | ||
'Maximum features per sample': '9223372036854775807', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, these were int
above but now are str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - the code forces a cast anyways so it doesn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wasade !
'i-table': ('artifact', ['BIOM']), | ||
'p-sampling-depth': ['integer', 1000] | ||
'BIOM table': ('artifact', ['BIOM']), | ||
'Sampling depth': ['integer', 1000] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a default per parameter while the dflt_param_set
is a set of defaults for all parameters. In this case, this command only has one parameter and that's why it looks replicated. It is also possible, however, to add more than one set of dflt_param_set
in which we change this value.
qp_qiime2/__init__.py
Outdated
'jaccard'], | ||
'i-tree': ['choice:["default", "None"]', 'None']} | ||
'Diversity metric': [ | ||
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
qp_qiime2/__init__.py
Outdated
'i-tree': ['choice:["default", "None"]', 'None']} | ||
'Diversity metric': [ | ||
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())), | ||
'Jaccard similarity index'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default value, and we can only choose one. The way it is done in Qiita is that each command executes only 1 metric, because we don't support an arbitrary number of outputs. Since by default we don't have a tree, we've chosen a non-phylogenetic metric as default. The user can, however, change this value on the interface.
qp_qiime2/__init__.py
Outdated
opt_params = {} | ||
outputs = {'o-pcoa': 'ordination_results'} | ||
dflt_param_set = { | ||
'Defaults': {} | ||
} | ||
qiime_cmd = QiitaCommand( | ||
"pcoa", "Principal Coordinate Analysis", | ||
"Generate principal coordinates analysis (PCoA)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - Done!
qp_qiime2/__init__.py
Outdated
outputs = {'q2_visualization': 'q2_visualization'} | ||
dflt_param_set = { | ||
'Defaults': { | ||
'p-method': 'spearman', | ||
'p-permutations': 999} | ||
'Correlation method': 'spearman', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
p_min_features = int(parameters['p-min-features']) | ||
p_where = parameters['p-where'] | ||
artifact_id = int(parameters['BIOM table']) | ||
p_max_frequency = int( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're passes as JSON - but Qiita stores them always as a string because that is how they're reported from the interface.
@@ -41,9 +41,10 @@ def tearDown(self): | |||
remove(fp) | |||
|
|||
def test_rarefy(self): | |||
params = {'p-sampling-depth': 2, 'i-table': 5} | |||
params = {'Sampling depth': 2, 'BIOM table': 5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think from the point of view of the interface this makes sense - we agreed that we don't necessarily want to expose the word Artifacts to the user interface to avoid confusion.
params['p-metric'] = 'unweighted UniFrac' | ||
params['i-tree'] = join( | ||
params['Diversity metric'] = 'Unweighted UniFrac' | ||
params['Phylogenetic tree'] = join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BIOM is an artifact ID, trees are not stored as artifacts.
qp_qiime2/tests/test_qiime2.py
Outdated
@@ -328,7 +342,7 @@ def test_alpha(self): | |||
|
|||
# To avoid having to set up all these files, we are gonna test | |||
# that if phylogenetic and no tree it fails | |||
params['i-tree'] = None | |||
params['Phylogenetic tree'] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed - given how the code works it was behaving correctly anyways.
qp_qiime2/tests/test_qiime2.py
Outdated
'Minimum feature frequency across samples': '5', | ||
'Maximum feature frequency across samples': '10', | ||
'Minimum features per sample': '5', | ||
'Maximum features per sample': '9223372036854775807', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - the code forces a cast anyways so it doesn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple questions
qp_qiime2/__init__.py
Outdated
'Number of jobs': ['integer', 1], | ||
'Adjust variance (phylogenetic only)': ['boolean', False], | ||
'Alpha value (Generalized Unifrac only)': ['float', 0], | ||
'Bypass tips (phylogenetic only)': ['boolean', False]} | ||
outputs = {'distance_matrix': 'distance_matrix'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've renamed the input dictionary keys (e.g. i-tree
to Phylogenetic tree
) to be more human-friendly -- should that also be the case for the output dict keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...see also lines 73, 91, 110, 127, 138, 157, 174, and 192
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antgonza @adswafford what do you think?
qp_qiime2/__init__.py
Outdated
'p-where': ('string', '')} | ||
'Minimum feature frequency across samples': ('integer', 1), | ||
'Maximum feature frequency across samples': | ||
('integer', 9223372036854775807), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more polite / less typo-able way to represent max 64 bit signed integer value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antgonza you used this value. Is there any specific reason you used that value instead of using sys.maxint
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming - removed the hardcoded value
q2_metrics = beta_methods.union(beta_alt_methods) | ||
qp_metrics = set(BETA_DIVERSITY_METRICS.values()).union( | ||
STATE_UNIFRAC_METRICS.values()).difference(STATE_UNIFRAC_METRICS) | ||
self.assertEqual(q2_metrics, qp_metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I parse the logic and the above conversation correctly,
- we are hardcoding the dictionaries like
BETA_DIVERSITY_METRICS
so that we have human-friendly labels and will need to update them, but - this test will check in with the installed Qiime2 version and complain to us if they've added or removed one of the option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct! The test will fail if there is a metric present in the plugin not present in Qiime 2 and viceversa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, although it sure would be nice if the human-readable element of the pair could be totally imported from Qiime2!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I think one of the main drawbacks is that the API itself is mainly tailored for programmers and CLI, which have different limitations than a GUI, which is probably why it is harder to provide those human readable elements from Q2 directly
I've decided to change the output names after @tanaes comment since it makes sense and I think it will improve the GUI and information to the user. |
Awesome, that's much more intelligible for me! Thanks! |
🍻 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 more comment.
dflt_param_set = { | ||
'Defaults': {} | ||
} | ||
qiime_cmd = QiitaCommand( | ||
"pcoa", "Principal Coordinate Analysis", | ||
"Perform Principal Coordinates Analysis (PCoA)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only one that is Perform vs. Calculate and the only one that has capital initial letters, fine if we want to leave as this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was changed as request by @wasade to match the capitalization of PCoA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Just want to make sure that tests are executing correctly here