InterProScan results can't be incorporated if PANTHER analysis is run, gene_caller_id is missing #2

Closed
brymerr921 opened this issue Jan 29, 2018 · 2 comments


brymerr921 commented Jan 29, 2018

Hi,

I downloaded InterProScan from here (https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload) and also downloaded the PANTHER dataset (Step 2). I then used InterProScan to annotate several of my genomes. However, while this parser works on GhostKOALA annotations alone, it fails on my InterProScan annotations whenever the PANTHER files were available to InterProScan. The command I used to run InterProScan is:

./interproscan.sh -cpu 16 -f tsv --goterms --iprlookup --pathways -i protein-sequences.fa -o interproscan-results.txt

I also cloned this repository and am running the scripts inside a conda environment (Python 2.7) with pandas 0.22.0 and Biopython 1.70 installed. When I run KEGG-to-anvio, I get this error message:

KEGG-to-anvio --KeggDB KO_Orthology_ko00001.txt -i user_ko.txt -o KeggAnnotations-AnviImportable.txt --interproscan interproscan-results.txt
Traceback (most recent call last):
  File "/home/bmerrill/miniconda3/envs/ghostkoala/bin/KEGG-to-anvio", line 41, in <module>
    interpro = pd.read_table(arg_dict["interproscan"],header=None)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 978, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2208, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 11 fields in line 3, saw 15

I've attached the files I used for these commands (with PANTHER enabled) that resulted in this error message.
protein-sequences.fa.txt
interproscan-results.txt
user_ko.txt
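For context, the ParserError is raised by pandas rather than by InterProScan: with header=None, pandas sizes the table from the first line it reads (11 tab-separated fields here) and then fails when line 3 has 15. According to the InterProScan 5 documentation, the TSV output has 11 core columns, and --iprlookup, --goterms and --pathways can append up to four more that are not necessarily present on every row. A minimal workaround sketch, assuming that 15-column layout, is to declare all 15 column labels up front. It is an illustration, not the fix adopted in this repository; `read_interproscan_tsv` and `MAX_INTERPRO_FIELDS` are hypothetical names.

```python
import io

import pandas as pd

# Assumed InterProScan 5 TSV layout (per the InterProScan docs):
#   0-10  core columns: protein accession, MD5, length, analysis,
#         signature accession, signature description, start, stop,
#         score, status, date
#   11-12 InterPro accession and description (--iprlookup)
#   13    GO terms (--goterms)
#   14    pathway annotations (--pathways)
# The optional trailing columns can be missing on some rows.
MAX_INTERPRO_FIELDS = 15


def read_interproscan_tsv(path_or_buffer):
    """Read an InterProScan TSV whose rows have between 11 and 15 fields.

    Declaring all 15 column labels up front makes pandas size the table
    for the widest row, so shorter rows are padded with NaN instead of
    raising "Expected 11 fields in line 3, saw 15". Integer labels 0-14
    mirror what header=None alone would produce.
    """
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,
        names=list(range(MAX_INTERPRO_FIELDS)),
        dtype=str,  # keep accessions, scores and dates as raw text
    )


if __name__ == "__main__":
    # Two made-up rows: one with only the 11 core fields, one with all 15.
    core = ["prot1", "md5", "120", "Pfam", "PF00001", "desc",
            "1", "90", "1.0E-10", "T", "29-01-2018"]
    extra = ["IPR000001", "entry", "GO:0000001", "pathway"]
    text = "\t".join(core) + "\n" + "\t".join(core + extra) + "\n"
    table = read_interproscan_tsv(io.StringIO(text))
    print(table.shape)  # (2, 15)
```

Because the integer labels 0-14 match what `header=None` alone produces, positional column lookups should be unaffected; whether the rest of KEGG-to-anvio handles the extra columns correctly is not verified here.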

However, when InterProScan does not have access to the PANTHER files, KEGG-to-anvio parses its output without errors and writes KeggAnnotations-AnviImportable-nopanther.txt:

KEGG-to-anvio --KeggDB KO_Orthology_ko00001.txt -i user_ko.txt -o KeggAnnotations-AnviImportable-nopanther.txt --interproscan interproscan-results-nopanther.txt

That said, in KeggAnnotations-AnviImportable-nopanther.txt the gene_caller_id column appears to be empty for every row whose source is "KeggGhostKoala".

InterProScan annotations with PANTHER disabled:
interproscan-results-nopanther.txt

Output of KEGG-to-anvio (using above command):
KeggAnnotations-AnviImportable-nopanther.txt
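To check how widespread the empty gene_caller_id entries are, a small pandas sketch can list the affected rows. It assumes the output is tab-separated with a header row using anvi'o's functions-import column names (gene_callers_id, source, accession, function, e_value); those names and the helper `rows_missing_gene_ids` are assumptions, not something confirmed by KEGG-to-anvio.

```python
import io

import pandas as pd

# Assumed column names from anvi'o's functions-import format
# (gene_callers_id, source, accession, function, e_value). The report
# calls the ID column "gene_caller_id"; change GENE_ID_COL to match the
# header actually written by KEGG-to-anvio.
GENE_ID_COL = "gene_callers_id"
SOURCE_COL = "source"


def rows_missing_gene_ids(path_or_buffer, source="KeggGhostKoala"):
    """Return the rows from `source` whose gene ID field is empty."""
    # Empty fields are read as NaN by default, which isna() detects.
    table = pd.read_csv(path_or_buffer, sep="\t", dtype=str)
    from_source = table[table[SOURCE_COL] == source]
    return from_source[from_source[GENE_ID_COL].isna()]


if __name__ == "__main__":
    # Made-up example: one KEGG row without an ID, one Pfam row with one.
    text = (
        "gene_callers_id\tsource\taccession\tfunction\te_value\n"
        "\tKeggGhostKoala\tK00001\tenzyme A\t0\n"
        "12\tPfam\tPF00001\tdomain B\t1e-10\n"
    )
    print(len(rows_missing_gene_ids(io.StringIO(text))))  # 1
```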

Do you have any suggestions for how to fix this? Thanks for the great parser; I'm excited to use it!

Best,
Bryan

@brymerr921 brymerr921 changed the title InterProScan results can't be incorporated if PANTHER analysis is run InterProScan results can't be incorporated if PANTHER analysis is run, gene calls missing Jan 29, 2018
@brymerr921 brymerr921 changed the title InterProScan results can't be incorporated if PANTHER analysis is run, gene calls missing InterProScan results can't be incorporated if PANTHER analysis is run, gene_caller_id is missing Jan 29, 2018
@edgraham
Owner

edgraham commented Jan 29, 2018 via email

@brymerr921
Author

Thanks, this works great when PANTHER results are present!
