Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imp: adds helpers to get metadata/artifacts from URLs in usage examples #658

Merged
merged 12 commits into from
Nov 22, 2022

Conversation

gregcaporaso
Copy link
Member

@gregcaporaso gregcaporaso commented Oct 24, 2022

Addresses #657

Still needs:

  • doc strings
  • unit tests
  • error detection and reporting on failed downloads (e.g., report 404 with the URL rather than failing when trying to load nonsense data as Metadata)
  • init_metadata_column_from_url this is hooked up by passing column to init_metadata_from_url going to skip this in this PR altogether
  • update handling of integration of {epoch} in URLs to replace with the most recent epoch
  • is this encoding/replacement of {epoch} a reasonable way to handle this? we decided to use f-strings instead, based on feedback from @ebolyen
  • address new TODOs throughout

Submitting a draft PR so we can use it in our usage sprint example.

@gregcaporaso
Copy link
Member Author

gregcaporaso commented Oct 24, 2022

This can be used as follows:

def feature_table_filter_samples(use):
    feature_table = use.init_artifact_from_url(
        'feature_table',
        'https://docs.qiime2.org/{epoch}/data/tutorials/moving-pictures/table.qza'
        )
    sample_metadata = use.init_metadata_from_url(
        'sample_metadata',
        'https://data.qiime2.org/{epoch}/tutorials/moving-pictures/sample_metadata.tsv'
        )

    filtered_table, = use.action(
        use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
        use.UsageInputs(table=feature_table, metadata=sample_metadata,
                        where="[body-site] IN ('left palm', 'right palm')"),
        use.UsageOutputNames(filtered_table='filtered_table')
    )

Copy link
Member

@lizgehret lizgehret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This great, thanks for adding this @gregcaporaso! Everything looks reasonable to me, and it seems like you have good test coverage for the common use-cases.

Let's let @ebolyen take a final look over this one before it's merged (in case there's any missing context on something that could cause problems).

# Obtaining epoch modeled on qiime2.metadata.io.MetadataFileError
import qiime2

epoch = qiime2.__release__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach!

return data

def init_artifact_from_url(self, name: str, url: str,
replace_url_epoch: bool = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this too - on the off-chance that we ever don't need the epoch in a given artifact URL, this will be helpful.


return self.init_artifact(name, factory)

def init_metadata_from_url(self, name: str, url: str, column: str = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the addition of the column parameter, this is so helpful!

@@ -922,6 +922,146 @@ def init_format(self, name: str,
"""
return self._usage_variable(name, factory, 'format')

def _replace_url_epoch(self, url):
# Obtaining epoch modeled on qiime2.metadata.io.MetadataFileError
import qiime2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import qiime2

qiime2 is being imported at the top of this file, so it shouldn't be needed as an additional import here.

return url.replace('{epoch}', epoch)

def _request_url(self, url):
import requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting - the requests module isn't being found when CI is running your tests. I'm wondering if it actually does need to be added at the top of the file, or maybe in the recipe file as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I wonder if it's not actually a framework dependency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might recommend using stdlib for this (even if it's slightly unpleasant), since there's not too much going on. If we were doing lots of URL requests, then a 3rd party lib might be a good choice.

Since almost everyone uses requests, it's easy for us to end up in a dependency bind someday, so not using it would avoid that.

Copy link
Member

@lizgehret lizgehret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of additional comments inline re: CI failures.

@ebolyen
Copy link
Member

ebolyen commented Oct 25, 2022

Big picture questions:

  • These seems like the first example of "utility" methods. It seems these are not generally expected to be overridden by each interface. It occurs to me that it would be perfectly valid for them to do so however. Are we cool with that? It seems reasonable enough.

  • I think with python's f-strings we might be able to not worry about the epoch replacement, is it very much work to do:

    def example(use):
        import qiime2
        use.init_artifact_from_url(f'http://whatever.org/{qiime2.__release__}/foo.qza')

    This avoids any unanticipated behavior, and we should be able to consider __release__ to be stable. Otherwise, I think the default should probably be False. The epoch replacement parameter also makes implementing interface overriding more complicated, as they would all need some mechanism to repeat the epoch logic.

  • The column parameter on init_metadata_from_url is a pretty good idea, however will the lack of friction cause any confusion when relating it to other methods? Said differently, is the current "syntactic salt" of having to split the two operations beneficial as it requires acknowledging the difference, or should we go the other way and also add use.init_metadata_column which is a similar utility method? If that happened, should we have init_metadata_column_from_url to keep things clear?

    • An alternative would be to add column to init_metadata as well, but this would break the API for each interface which implements those, so I don't think we want to do that.

@gregcaporaso
Copy link
Member Author

Thanks for the feedback @lizgehret and @ebolyen!

@ebolyen:
| use.init_artifact_from_url(f'http://whatever.org/{qiime2.__epoch__}/foo.qza')

Did you mean qiime2.__release__ here? qiime2.__epoch__ doesn't appear to be defined.

@ebolyen
Copy link
Member

ebolyen commented Oct 25, 2022

Oh yeah, sorry. That's what I get for doing quick drive-by comments.

@gregcaporaso
Copy link
Member Author

gregcaporaso commented Oct 26, 2022

@lizgehret, @ebolyen - I partially addressed the comments here.

the requests module isn't being found
I might recommend using stdlib

Updated to use urllib instead of requests.

These seems like the first example of "utility" methods. It seems these are not generally expected to be overridden by each interface. It occurs to me that it would be perfectly valid for them to do so however. Are we cool with that? It seems reasonable enough.

I agree - generally don't expect them to be overridden, but I don't think there is harm if they are.

I think with python's f-strings...

That makes sense to me, and I implemented this change. For folks using the previous version from this PR, you would change something like this:

metadata_url = \
            'https://data.qiime2.org/{epoch}/tutorials/moving-pictures/sample_metadata.tsv'
use.init_metadata_from_url('md', metadata_url)

to something like this:

import qiime2
metadata_url = \
            f'https://data.qiime2.org/{qiime2.__release__}/tutorials/moving-pictures/sample_metadata.tsv'
use.init_metadata_from_url('md', metadata_url)

The column parameter on init_metadata_from_url is a pretty good idea, however...

I think you're right that we shouldn't take this approach. Something I didn't notice until today is that the var_type on the result was incorrect in the current implementation in this PR (it's metadata when it should be column).

Going with use.init_metadata_column and use.init_metadata_column_from_url seems to make the most sense to me. I spent some time on this this afternoon but am stuck without working code at the moment - not sure what I'm doing wrong. I probably won't have time to get back to this till late this week - if someone else wants to take a crack and commit to this PR that's fine with me. Here's a diff of the code that I have locally right now where this is not working.

Examples
--------
>>> import qiime2
>>> url = f'https://data.qiime2.org/{qiime2.__release__}/tutorials/moving-pictures/sample_metadata.tsv' # noqa: E501
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't figure out how to get a multi-line string working as input in a doc string. As it stands, I'm disabling the max line length check for flake8. Open to other ideas though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will do what you want. Assuming I remembered how to show continued expressions in doctest.

Suggested change
>>> url = f'https://data.qiime2.org/{qiime2.__release__}/tutorials/moving-pictures/sample_metadata.tsv' # noqa: E501
>>> url = (f'https://data.qiime2.org/{qiime2.__release__}/tutorials/'
... 'moving-pictures/sample_metadata.tsv')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested @ebolyen's suggestion locally, and that worked for me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have sworn I tried that but I guess not! Or maybe something unrelated was failing when I tried and mistook it for this failing. In any case, thank you both!

@gregcaporaso
Copy link
Member Author

I decided to drop the column selection support from this PR for now as I was continuing to run into trouble with it as I developed unit tests and real-world examples and I don't want to hold this PR any more due to this. So, if you were previously doing something like:

md = use.init_metadata_from_url('md', metadata_url, column='body-site')

You should now do:

column = 'body-site'
md = use.init_metadata_from_url('md', metadata_url)
mdc = use.get_metadata_column('mdc', column, md)

@gregcaporaso gregcaporaso marked this pull request as ready for review November 1, 2022 20:46
@lizgehret
Copy link
Member

These changes look great @gregcaporaso! Excited to have this available for usage examples moving forward. Unless @ebolyen has any change requests, this looks g2g 👍

@lizgehret lizgehret changed the title adds helpers to get metadata and artifacts from URLs in usage examples imp: adds helpers to get metadata/artifacts from URLs in usage examples Nov 1, 2022
@lizgehret
Copy link
Member

Oh hmm that's interesting @Oddant1 - I wonder if this needs to be __epoch__ instead of __release__. I'll take a look at this later today and see if I get the same result as you do!

@lizgehret
Copy link
Member

lizgehret commented Nov 17, 2022

So after a make dev on my machine, this is the QIIME 2 version that I'm seeing:

QIIME 2 version: 2021.8.0.dev0+68.g1339294

@Oddant1 what QIIME 2 release is your conda env created from? I'm wondering if that is (at least part of) the inconsistency here. --> Yeah it looks like this is related to the conda env being used - I'm not sure if it's pulled from anything else, so I could totally be missing something.

@Oddant1
Copy link
Member

Oddant1 commented Nov 17, 2022

@lizgehret I figured it out and deleted my comment (probably shoulda just edited it). I just needed to do git fetch --all --tags

Copy link
Member

@ebolyen ebolyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews everyone!

@ebolyen ebolyen merged commit 77df12a into qiime2:master Nov 22, 2022
@gregcaporaso gregcaporaso deleted the usage-update branch November 22, 2022 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

imp: support for obtaining artifacts/metadata from URLs in usage examples
4 participants