Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/isa data type #5787

Merged
merged 27 commits into from Oct 19, 2018

Conversation

Projects
None yet
9 participants
@pkrog
Copy link
Contributor

commented Mar 27, 2018

New data types ISA-tab and ISA-json, proposed by the PhenoMeNal team. These new types have been successfully used inside the PhenoMeNal project, in tools such as
mtbls-dwnld and isa2w4m.

The uploader has been modified in order to be able to update the name of the composite dataset after uploading and opening the archive.

isa-rwval has been added as a new dependency. This package is required in order to read ISA archive content and display a summary to the user. It triggers the installation of packages networkx and six.

This PR is in relation with issue #3134 .

@galaxybot galaxybot added the triage label Mar 27, 2018

@galaxybot galaxybot added this to the 18.05 milestone Mar 27, 2018

@pcm32

This comment has been minimized.

Copy link
Member

commented Mar 28, 2018

The following PR on starforge creates the wheel needed for the ISA Datatype, please review:
galaxyproject/starforge#197

"""
ISA datatype
See http://isa-tools.org

This comment has been minimized.

Copy link
@martenson

martenson Apr 13, 2018

Member

This url seems down to me (and my colleague @natefoo), redirects to a non existing page at http://apps1047.gold-traffic94.loan/?utm_medium=NQ3aDvyuBCtafRQJPeFC66tm%2bMNW8T%2baflxP0d0AJGo%3d&t=main4

Not sure what is up with that.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 16, 2018

Author Contributor

The website address is correct. Except the site has been hacked. I will replace the URL with the GiHub URL for the organisation ISA tools.

@@ -88,3 +88,7 @@ ecdsa==0.13

# GenomeSpace dependencies
python-genomespaceclient==0.1.8

# ISA
git+https://github.com/ISA-tools/isa-rwval.git@develop#egg=isa-rwval

This comment has been minimized.

Copy link
@martenson

martenson Apr 13, 2018

Member

Galaxy dependencies are handled using pipenv so in order to add deps we need to modify https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/dependencies/pipfiles/default/Pipfile and run the proper makefile target to rebuild this file - isntead of manual modification.

Also as mentioned on the starforge PR I think this should get to pypi first.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

Is the wheel ready? May I replace this line with isa-rwval==0.10.0 and test it?

@martenson

This comment has been minimized.

Copy link
Member

commented Apr 22, 2018

Unless there are compelling reasons for not doing so I am pushing this back to 18.09 since we most probably won't have time to properly test and review this PR for 18.05.

@pkrog please let me know if this complicates things significantly for you or the PhenoMeNal team.

@martenson martenson modified the milestones: 18.05, 18.09 Apr 22, 2018

@sneumann

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2018

Hi, the pushback to 18.09 is a bit unfortunate, since our next release 18.08 will be based on Galaxy-18.05. Is there anything where @pkrog or @ilveroluca could help ?

@pkrog

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2018

Hi, if the wheel/pip package is ready, I think there's not much to do except updating requirement line. Am I wrong?

logger.handlers = []
logger.propagate = False
logger.addHandler(ch)
logger.setLevel(logging.ERROR)

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

I sort of understand the logging above - does this logging need to be configured also?

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

Sorry @jmchilton , I fail to understand your remark. Could you elaborate?

rval.append('<div>ISA Dataset composed of the following files:<p/><ul>')
for cmp_file in os.listdir(dataset.extra_files_path):
opt_text = ''
rval.append('<li><a href="%s" type="text/plain">%s</a>%s</li>' % (cmp_file, cmp_file, opt_text))

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

You escape above but not here, I think should escape these file names and such.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

I'll escape the displayed filename, not the link, and remove opt_text, which is empty.

html += '<p>Data files:</p>'
html += '<ul>'
for data_file in assay.data_files:
html += '<li>' + str(data_file.id) + ' - ' + str(data_file.filename) + ' - ' + str(data_file.label) + '</li>'

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

More escaping is needed in this method as well I believe.

# extract archive if the file corresponds to the ISA archive
if basename == ISA_ARCHIVE_NAME:
# perform extraction
with open(file_name, 'rb') as stream:

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

Does

from galaxy.util.compression_utils import CompressedFile
CompressedFile(file_name).extract(output_path)

work here? It'd simplify the implementation a lot I think if you did that. Could then get rid of extract* helpers, maybe that regex at the top, etc....

shutil.move(dp, file_output_path)
# groom the dataset file content if required by the corresponding datatype definition
if datatype.dataset_content_needs_grooming(file_output_path):
datatype.groom_dataset_content(file_output_path)

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

Any chance I could get you to write a test case for this - we have really been working on increasing our test coverage for upload and datatype options and such over the last year.

The file https://github.com/galaxyproject/galaxy/blob/dev/test/api/test_tools_upload.py has a lot of test cases, in particular there is a composite upload test case here https://github.com/galaxyproject/galaxy/blob/dev/test/api/test_tools_upload.py#L119 that might be a useful template - I'm not sure. If you don't have the will or time to implement such a test case can you place a small example isa archive in test-data and I could maybe take a stab at it. I guess this change is needed to ensure the archive is uncompressed as part of the upload - I'm not sure groom_dataset_content is the right place to do that but I'm not sure it is wrong either - so a test case really might help clarify and make sure we don't break anything if we need to refactor later.

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

I don't know if this is relevant but if you want to set the name dynamically from https://github.com/workflow4metabolomics/mtbls-dwnld/blob/develop/mtbls-dwnld.xml - I think I could help with that. Rather than change the data type I'd recommend setting the name in a tool provided metadata file - I could give you some links for that. If the problem is really about upload though that isn't helpful.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

I'm interested in writing the test. I'll have a look at your pointers.

The name of the dataset is set correctly within mtbls-dwnld. So, no, the issue is with the uploader.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 26, 2018

Author Contributor

Hi @jmchilton , I've added a small zip file for testing the uploader, and started to write a method test_composite_datatype_isatab. Howerver I haven't figured out yet how to submit the content of the zip file to the uploader (or the file path if this is sufficient). Could you help me please?

This comment has been minimized.

Copy link
@pkrog

pkrog May 7, 2018

Author Contributor

Hi @jmchilton , I see that our PR is still scheduled for the 18.09 release, however as stated by @sneumann it is really important for us that it gets included in the 18.05 release. The only issue remaining seems to be this uploader test. If you help me on this, I can finish it.

This comment has been minimized.

Copy link
@djcomlab

djcomlab Jun 29, 2018

Contributor

@pkrog @jmchilton I've updated the test.

if investigation is not None:
dataset.name = investigation.identifier
else:
dataset.name = 'ISA DATASET'

This comment has been minimized.

Copy link
@jmchilton

jmchilton Apr 23, 2018

Member

Would it be better to create some metadata on this datatype and set this there - I'm a bit worried about overriding the behavior of the upload datatype and the names it creates on a per-datatype basis. I do get this increases the usability of this datatype - but suddenly the upload API isn't going to work the way on would expect if you set a name explicitly or something - and we are constantly debating how to build default names from URL uploads - this would seem to override any decision on that as well. Even if you can make a case for using the investigation if set, it shouldn't be setting the default like this is the investigation is not set right?

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

The else part is certainly not useful.
I've noticed that the uploader set the dataset name to the file name (for instance MTBLS354.zip), now. I don't remember it was doing that before, but maybe I'm wrong. The filename should be sufficient for the user I guess.

html += '<p>Measurement type: %s</p>' % assay.measurement_type.term # OntologyAnnotation
html += '<p>Technology type: %s</p>' % assay.technology_type.term # OntologyAnnotation
html += '<p>Technology platform: %s</p>' % assay.technology_platform
if assay.data_files is not None:

This comment has been minimized.

Copy link
@djcomlab

djcomlab Apr 24, 2018

Contributor

Note that this bit doesn't actually output any of the assay data files as it stands, because you're only using the investigation file parser from isa-rwval. So in all cases this data files list will be empty

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

Yes, I've noted that it's always empty now. That's unfortunate. Any workaround?

This comment has been minimized.

Copy link
@djcomlab

djcomlab Apr 24, 2018

Contributor

I think this has become like this because the original request for isa-rwval was to support Python 2.7 (since full isatools is Python 3), but then it turns out that for Galaxy we don't want to add pandas to the requirements. The study and assay table parsing relies on pandas currently.

I can probably add to isa-rwval some features to extract the data files (and possibly a little bit more other useful metadata). Once I push a new version, the pinned-requirements.txt may need to be updated.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 24, 2018

Author Contributor

That would be great. But maybe not for this release, unless you have time to do it now. I'll comment these lines in the meantime.

This comment has been minimized.

Copy link
@djcomlab

djcomlab Apr 25, 2018

Contributor

I can do it today.

This comment has been minimized.

Copy link
@pkrog

pkrog Apr 26, 2018

Author Contributor

Great job @djcomlab , I've just tested the modifications on my computer. It works well.

for data_file in assay.data_files:
html += '<li>' + escape(util.unicodify(str(data_file.id), 'utf-8')) + ' - ' + escape(util.unicodify(str(data_file.filename), 'utf-8')) + ' - ' + escape(util.unicodify(str(data_file.label), 'utf-8')) + '</li>'
html += '</ul>'
# The list of Raw data files contained in the assay file cannot be obtained anymore, due to limitation in isa-rwval library. David Johnson (@djcomlab) is working on a solution, and will update isa-rwval soon.

This comment has been minimized.

Copy link
@djcomlab

djcomlab Apr 25, 2018

Contributor

It's technically not a limitation of isa-rwval. It does it, but needs Pandas to do it. The limitation is because Galaxy does not want to add Pandas to its requirements 😁

@@ -91,4 +91,3 @@ python-genomespaceclient==0.1.8

# ISA
git+https://github.com/ISA-tools/isa-rwval.git@develop#egg=isa-rwval

This comment has been minimized.

Copy link
@djcomlab

djcomlab Apr 26, 2018

Contributor

Actually can now revert this line back to the package name.

isa-rwval==0.10.1 should now allow some table parsing to reintroduce the data files displaying.

pkrog and others added some commits Jan 31, 2018

Add grooming for composite files.
Add call to groom_dataset_content() for composite files.
Add ISA composite data type.
Add new data types isa-tab and isa-json.
Add isa-rwval library (isatools Python 2 version) in requirements.
[ISA datatype] adds logger setting for isatools.isatab to avoid faili…
…ng uploads when no pandas (#31)

* [ISA datatype] adds logger setting for isatools.isatab to avoid failing uploads when no pandas.

* Update isa.py
Installs isa-rwval from develop
Otherwise failing currently when building in the container.
Change ISA Tools web site link.
Replace ISA Tools .org web site that has just been hacked, by the GitHub
page of the organization.

@pkrog pkrog force-pushed the phnmnl:feature/isa_data_type branch from 3858c90 to 832f51b Jun 7, 2018

@martenson

This comment has been minimized.

Copy link
Member

commented Sep 6, 2018

@jmchilton can you please give this a review?

@dannon dannon modified the milestones: 18.09, 19.01 Sep 6, 2018

@fgiacomoni

This comment has been minimized.

Copy link

commented Oct 18, 2018

@martenson and @jmchilton, do you have a short moment to review this PR.
I'm back on this thread because the isa datatype is central in our two galaxy public servers (Workflow4metabolomics and H2020 Phenomenal) dedicated in metabolomics.
In which galaxy release (18.09?) do you plan to merge the PR? A big thanks

pkrog added some commits Oct 18, 2018

Various corrections for py34-lint
Add second blank line.
Remove useless imports.
@martenson

This comment has been minimized.

Copy link
Member

commented Oct 19, 2018

@galaxybot test this

@martenson martenson merged commit 99bc4b3 into galaxyproject:dev Oct 19, 2018

6 checks passed

api test Build finished. 434 tests run, 1 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 187 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 123 tests run, 10 skipped, 0 failed.
Details
selenium test Build finished. 147 tests run, 3 skipped, 0 failed.
Details
toolshed test Build finished. 577 tests run, 0 skipped, 0 failed.
Details
@martenson

This comment has been minimized.

Copy link
Member

commented Oct 19, 2018

@pkrog @fgiacomoni @djcomlab This will be part of 19.01 Galaxy release. Thank you for the contribution and sorry for the delay.

We might get back to you asking for some tests I think, but I don't expect any big problems. Thanks again. 👍

ping @jmchilton

@fgiacomoni

This comment has been minimized.

Copy link

commented Oct 20, 2018

Great - @martenson thanks for the review (thanks to @bgruening for the ping 😄 ) and thanks @pkrog and @djcomlab and @jmchilton for isa effort integration. @proccaserra should be interested by this feature!

@pkrog

This comment has been minimized.

Copy link
Contributor Author

commented Oct 20, 2018

Many thanks @martenson , @bgruening and @jmchilton .
And we should not forget @kikkomep for his great work too in this PR, thanks Marco !

nsoranzo added a commit to nsoranzo/galaxy that referenced this pull request Oct 25, 2018

nsoranzo added a commit to nsoranzo/galaxy that referenced this pull request Oct 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.