Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sniffing of xlsx file as blastxml #6849

Closed
martenson opened this issue Oct 10, 2018 · 9 comments

Comments

Projects
None yet
3 participants
@martenson
Copy link
Member

commented Oct 10, 2018

reported on galaxy-dev, example file: Book1.xlsx
screenshot 2018-10-10 14 18 04

mvdbeek added a commit to mvdbeek/galaxy that referenced this issue Oct 13, 2018

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Oct 14, 2018

On top of #6867 which should fix the issue it also looks like the blastxml sniffer on usegalaxy.org is a bit overeager, on my dev instance this is being sniffed as a regular xml datatype. Which is weird, since we ship the blastxml datatype. Any chance that somehow a tool shed installed datatype somehow interfered here ? They shouldn't, but might be worth checking.

Or maybe main is using a non-default sniff order ?

@peterjc

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2018

As per https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/blast.py#L64 the BLAST XML sniffer should be looking for files starting as follows (allowing for white space):

<?xml version="1.0"?>':
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>

or,

<?xml version="1.0"?>':
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>

This has not diverged meaningfully from the standalone datatype definition which is still available on the tool shed:

https://github.com/peterjc/galaxy_blast/blob/master/datatypes/blast_datatypes/blast.py#L22

From a visual inspection of the contents of the Excel file once unzipped into individual XML files, I struggle to understand how this could go wrong.

Could there be something off in the datatypes_conf.xml perhaps?

@peterjc

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2018

Further more, even going back to the BLAST XML datatype when it was originally taken out of Galaxy, the essential BLAST XML sniffer logic has not changed:

peterjc/galaxy_blast@7d82452#diff-a32d54c60ab84531f6a6791591abffe5

i.e. Even some older version of the ToolShed definition from http://toolshed.g2.bx.psu.edu/view/devteam/blast_datatypes is installed, the logic ought never to recognise the Excel file.

@peterjc

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2018

Uploading the test Excel file locally worked under Galaxy release_16.01, it was sniffed as format xlsx, peek text "uploaded xlsx file"

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Oct 15, 2018

Thanks @peterjc, this is not a blastxml issue and it is specific to usegalaxy.org. The major issue is the lack of compressed = True on the xlsx datatype, which is fixed by #6867 and was introduced with the dynamic compression framework in 18.05, I believe.

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Oct 15, 2018

I've set up some more advanced testing in #6869 and I don't see any issue with the blastxml datatype

@peterjc

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2018

Progress, although I still fail to understand why BLAST XML is picked... but I will leave this in your capable hands. Thanks!

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Oct 15, 2018

Thanks, we'll try to figure this out!

@martenson

This comment has been minimized.

Copy link
Member Author

commented Oct 15, 2018

Resolved by the above-linked PR.

@martenson martenson closed this Oct 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.