Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes Qiita's 2237 issue #18

Merged
merged 22 commits into from
Oct 23, 2017
Merged

Fixes Qiita's 2237 issue #18

merged 22 commits into from
Oct 23, 2017

Conversation

josenavas
Copy link
Contributor

Just want to make sure that tests are executing correctly here

@josenavas josenavas changed the title [WIP] Fixes Qiita's 2237 issue Fixes Qiita's 2237 issue Oct 23, 2017
@josenavas
Copy link
Contributor Author

This is ready for review. State Unifrac is failing to install and I don't know why.

@josenavas josenavas mentioned this pull request Oct 23, 2017
.travis.yml Outdated
# Installing state unifrac
- conda install --yes -c conda-forge cython
- conda install --yes -c biocore unifrac
# - sed -i 's/^CXXBASE=.*/CXXBASE=clang++/' `which h5c++`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove please

'i-table': ('artifact', ['BIOM']),
'p-sampling-depth': ['integer', 1000]
'BIOM table': ('artifact', ['BIOM']),
'Sampling depth': ['integer', 1000]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 1000 the default? it looks like this is a replicated value with the dflt_param_set? why 1000?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a default per parameter while the dflt_param_set is a set of defaults for all parameters. In this case, this command only has one parameter and that's why it looks replicated. It is also possible, however, to add more than one set of dflt_param_set in which we change this value.

'jaccard'],
'i-tree': ['choice:["default", "None"]', 'None']}
'Diversity metric': [
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for .keys()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

'i-tree': ['choice:["default", "None"]', 'None']}
'Diversity metric': [
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())),
'Jaccard similarity index'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unifrac metrics + jaccard only? what about bray curtis or others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default value, and we can only choose one. The way it is done in Qiita is that each command executes only 1 metric, because we don't support an arbitrary number of outputs. Since by default we don't have a tree, we've chosen a non-phylogenetic metric as default. The user can, however, change this value on the interface.

'Diversity metric': [
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())),
'Jaccard similarity index'],
'Phylogenetic tree': ['choice:["default", "None"]', 'None']}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's possible to perform a phylogenetic metric without a tree?

@@ -41,9 +41,10 @@ def tearDown(self):
remove(fp)

def test_rarefy(self):
params = {'p-sampling-depth': 2, 'i-table': 5}
params = {'Sampling depth': 2, 'BIOM table': 5}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 doesn't look like a biom table or path to a biom table. Would it be more correct to describe this as "BIOM table artifact ID"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from the point of view of the interface this makes sense - we agreed that we don't necessarily want to expose the word Artifacts to the user interface to avoid confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the blending of UI and compute logic is confusing personally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the way I see this, is that the plugin is doing the translation between the compute logic and the UI. At the end, the actual logic is in Q2 not here...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the responsibilities of this plugin are to:

  • perform computational work directly using the qiime2 API
  • interact with the qiita database for parameters, artifacts, etc
  • ensure the above fulfills requirements for a singular and logically distinct user interface

Is that accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close but not exactly.
The plugin is the interface between Qiita and the underlying tool (in this case Qiime2).

It is not using the Qiime2 API, but the Qiime2 CLI (@antgonza may have more information about this decision).

It doesn't interact with the Database directly, it uses the Qiita REST api to retrieve the different values that it needs.

Since Qiita doesn't know anything about the plugin, the plugin itself needs to make sure to tell Qiita how he wants the different parts of the plugin to be shown to the user.

params['p-metric'] = 'unweighted UniFrac'
params['i-tree'] = join(
params['Diversity metric'] = 'Unweighted UniFrac'
params['Phylogenetic tree'] = join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels inconsistent. The tree is a file path, but the BIOM table is an internal ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIOM is an artifact ID, trees are not stored as artifacts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inconsistency on this is confusing but I guess it is what it is.

'i-table': aid, 'p-metric': 'euclidean',
'i-tree': 'None'}
'BIOM table': aid, 'Diversity metric': 'Euclidean distance',
'Phylogenetic tree': 'None'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an enforcement that absolutely ensures a phylogenetic metric will error appropriately if "Phylogenetic tree" is "None", and similarly, will it error appropriately if a tree is specified with a non-phylogenetic metric? I'm not seeing these tests but perhaps I'm missing them, would it be possible to add if they're not already included?

@@ -328,7 +342,7 @@ def test_alpha(self):

# To avoid having to set up all these files, we are gonna test
# that if phylogenetic and no tree it fails
params['i-tree'] = None
params['Phylogenetic tree'] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, above this is set as "None" but now its set as None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed - given how the code works it was behaving correctly anyways.

'Minimum feature frequency across samples': '5',
'Maximum feature frequency across samples': '10',
'Minimum features per sample': '5',
'Maximum features per sample': '9223372036854775807',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, these were int above but now are str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - the code forces a cast anyways so it doesn't really matter.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 92.063% when pulling 04002a4 on josenavas:fix-2237 into ab4f1b7 on qiita-spots:master.

Copy link
Contributor Author

@josenavas josenavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wasade !

'i-table': ('artifact', ['BIOM']),
'p-sampling-depth': ['integer', 1000]
'BIOM table': ('artifact', ['BIOM']),
'Sampling depth': ['integer', 1000]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a default per parameter while the dflt_param_set is a set of defaults for all parameters. In this case, this command only has one parameter and that's why it looks replicated. It is also possible, however, to add more than one set of dflt_param_set in which we change this value.

'jaccard'],
'i-tree': ['choice:["default", "None"]', 'None']}
'Diversity metric': [
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

'i-tree': ['choice:["default", "None"]', 'None']}
'Diversity metric': [
'choice:%s' % dumps(list(BETA_DIVERSITY_METRICS.keys())),
'Jaccard similarity index'],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default value, and we can only choose one. The way it is done in Qiita is that each command executes only 1 metric, because we don't support an arbitrary number of outputs. Since by default we don't have a tree, we've chosen a non-phylogenetic metric as default. The user can, however, change this value on the interface.

opt_params = {}
outputs = {'o-pcoa': 'ordination_results'}
dflt_param_set = {
'Defaults': {}
}
qiime_cmd = QiitaCommand(
"pcoa", "Principal Coordinate Analysis",
"Generate principal coordinates analysis (PCoA)",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - Done!

outputs = {'q2_visualization': 'q2_visualization'}
dflt_param_set = {
'Defaults': {
'p-method': 'spearman',
'p-permutations': 999}
'Correlation method': 'spearman',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

p_min_features = int(parameters['p-min-features'])
p_where = parameters['p-where']
artifact_id = int(parameters['BIOM table'])
p_max_frequency = int(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're passes as JSON - but Qiita stores them always as a string because that is how they're reported from the interface.

@@ -41,9 +41,10 @@ def tearDown(self):
remove(fp)

def test_rarefy(self):
params = {'p-sampling-depth': 2, 'i-table': 5}
params = {'Sampling depth': 2, 'BIOM table': 5}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from the point of view of the interface this makes sense - we agreed that we don't necessarily want to expose the word Artifacts to the user interface to avoid confusion.

params['p-metric'] = 'unweighted UniFrac'
params['i-tree'] = join(
params['Diversity metric'] = 'Unweighted UniFrac'
params['Phylogenetic tree'] = join(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BIOM is an artifact ID, trees are not stored as artifacts.

@@ -328,7 +342,7 @@ def test_alpha(self):

# To avoid having to set up all these files, we are gonna test
# that if phylogenetic and no tree it fails
params['i-tree'] = None
params['Phylogenetic tree'] = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed - given how the code works it was behaving correctly anyways.

'Minimum feature frequency across samples': '5',
'Maximum feature frequency across samples': '10',
'Minimum features per sample': '5',
'Maximum features per sample': '9223372036854775807',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - the code forces a cast anyways so it doesn't really matter.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 92.063% when pulling aa42544 on josenavas:fix-2237 into ab4f1b7 on qiita-spots:master.

@josenavas
Copy link
Contributor Author

@wasade @antgonza any other comments here?

Copy link

@tanaes tanaes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple questions

'Number of jobs': ['integer', 1],
'Adjust variance (phylogenetic only)': ['boolean', False],
'Alpha value (Generalized Unifrac only)': ['float', 0],
'Bypass tips (phylogenetic only)': ['boolean', False]}
outputs = {'distance_matrix': 'distance_matrix'}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've renamed the input dictionary keys (e.g. i-tree to Phylogenetic tree) to be more human-friendly -- should that also be the case for the output dict keys?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...see also lines 73, 91, 110, 127, 138, 157, 174, and 192

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antgonza @adswafford what do you think?

'p-where': ('string', '')}
'Minimum feature frequency across samples': ('integer', 1),
'Maximum feature frequency across samples':
('integer', 9223372036854775807),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more polite / less typo-able way to represent max 64 bit signed integer value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antgonza you used this value. Is there any specific reason you used that value instead of using sys.maxint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming - removed the hardcoded value

q2_metrics = beta_methods.union(beta_alt_methods)
qp_metrics = set(BETA_DIVERSITY_METRICS.values()).union(
STATE_UNIFRAC_METRICS.values()).difference(STATE_UNIFRAC_METRICS)
self.assertEqual(q2_metrics, qp_metrics)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I parse the logic and the above conversation correctly,

  1. we are hardcoding the dictionaries like BETA_DIVERSITY_METRICS so that we have human-friendly labels and will need to update them, but
  2. this test will check in with the installed Qiime2 version and complain to us if they've added or removed one of the option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct! The test will fail if there is a metric present in the plugin not present in Qiime 2 and viceversa.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, although it sure would be nice if the human-readable element of the pair could be totally imported from Qiime2!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I think one of the main drawbacks is that the API itself is mainly tailored for programmers and CLI, which have different limitations than a GUI, which is probably why it is harder to provide those human readable elements from Q2 directly

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 92.278% when pulling 2f59226 on josenavas:fix-2237 into ab4f1b7 on qiita-spots:master.

@josenavas
Copy link
Contributor Author

I've decided to change the output names after @tanaes comment since it makes sense and I think it will improve the GUI and information to the user.

@tanaes
Copy link

tanaes commented Oct 23, 2017

Awesome, that's much more intelligible for me! Thanks!

@josenavas
Copy link
Contributor Author

🍻

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 92.278% when pulling 6a6646e on josenavas:fix-2237 into ab4f1b7 on qiita-spots:master.

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 more comment.

dflt_param_set = {
'Defaults': {}
}
qiime_cmd = QiitaCommand(
"pcoa", "Principal Coordinate Analysis",
"Perform Principal Coordinates Analysis (PCoA)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only one that is Perform vs. Calculate and the only one that has capital initial letters, fine if we want to leave as this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was changed as request by @wasade to match the capitalization of PCoA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0005%) to 92.298% when pulling fbd3bff on josenavas:fix-2237 into ab4f1b7 on qiita-spots:master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants