InterProScan results can't be incorporated if PANTHER analysis is run, gene_caller_id is missing #2
brymerr921 changed the title from "InterProScan results can't be incorporated if PANTHER analysis is run" to "InterProScan results can't be incorporated if PANTHER analysis is run, gene calls missing" (Jan 29, 2018)
brymerr921 changed the title from "InterProScan results can't be incorporated if PANTHER analysis is run, gene calls missing" to "InterProScan results can't be incorporated if PANTHER analysis is run, gene_caller_id is missing" (Jan 29, 2018)
Hello Bryan,
I believe I see where the issue is. I just updated the GitHub version to
account for this (it was a small issue that I hadn't run into since I
wasn't using the PANTHER database!). Just pull the newest version down and
you should be good to go! If you have further issues, let me know!
--
Elaina
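The thread does not show what the actual change to KEGG-to-anvio was. As a hedged illustration only, one way to make a parser tolerant of the variable-width rows behind the ParserError in the quoted report below is to pad every row to a fixed width before building records. The 15 column names follow the InterProScan 5 TSV layout as described in the InterProScan documentation (columns 12 to 15 are optional and, as an assumption here, missing on rows without an InterPro entry); the helper name `read_interproscan_tsv` is hypothetical, and the sketch is written for Python 3. In pandas, passing a fixed-length `names` list to `read_table` should have a similar padding effect for short rows.

```python
import csv
import io

# Assumed InterProScan 5 TSV layout. Columns 12-15 come from --iprlookup,
# --goterms and --pathways and may be absent on some rows.
IPRSCAN_COLUMNS = [
    "protein_accession", "sequence_md5", "sequence_length", "analysis",
    "signature_accession", "signature_description", "start", "stop",
    "score", "status", "date", "interpro_accession",
    "interpro_description", "go_terms", "pathways",
]


def read_interproscan_tsv(handle):
    """Yield one dict per row, padding short rows with empty strings."""
    width = len(IPRSCAN_COLUMNS)
    # QUOTE_NONE so quote characters inside descriptions are kept literally.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) > width:
            raise ValueError("unexpected row with %d fields" % len(row))
        padded = row + [""] * (width - len(row))
        yield dict(zip(IPRSCAN_COLUMNS, padded))


# Hypothetical 11-field row, like a PANTHER hit with no InterPro entry.
sample = io.StringIO(
    "p1\tmd5\t100\tPANTHER\tPTHR1\tdesc\t1\t90\t1e-10\tT\t29-01-2018\n"
)
rec = next(read_interproscan_tsv(sample))
print(rec["analysis"], repr(rec["interpro_accession"]))  # PANTHER ''
```

Padding (rather than dropping short rows) keeps the PANTHER hits that lack InterPro annotations, which is usually what a downstream import wants.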
On Sun, Jan 28, 2018 at 10:14 PM, brymerr921 ***@***.***> wrote:
Hi,
I downloaded InterProScan from here
(https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload) and downloaded the PANTHER dataset
as well (Step 2). I then used InterProScan to annotate several of my
genomes. However, while this parser does work on GhostKOALA annotations
alone, it does not work on my InterProScan annotations if the PANTHER files
were available to InterProScan. The command I used to run InterProScan is:
./interproscan.sh -cpu 16 -f tsv --goterms --iprlookup --pathways -i protein-sequences.fa -o interproscan-results.txt
I also cloned this repository and am running the scripts inside a conda
environment (python 2.7) with pandas 0.22.0 and Biopython 1.70 also
installed. When I run KEGG-to-anvio, I get this error message:
KEGG-to-anvio --KeggDB KO_Orthology_ko00001.txt -i user_ko.txt -o KeggAnnotations-AnviImportable.txt --interproscan interproscan-results.txt
Traceback (most recent call last):
  File "/home/bmerrill/miniconda3/envs/ghostkoala/bin/KEGG-to-anvio", line 41, in <module>
    interpro = pd.read_table(arg_dict["interproscan"],header=None)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "/home/bmerrill/miniconda3/envs/ghostkoala/lib/python2.7/site-packages/pandas/io/parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 978, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2208, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 11 fields in line 3, saw 15
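The ParserError above is consistent with how pandas infers the width of a header-less table: with header=None and no names, read_table takes the number of columns from the first line, so a file whose first lines have 11 tab-separated fields and whose later lines have 15 cannot be tokenized. A plausible cause (an assumption, not confirmed in this thread) is that InterProScan only writes the optional InterPro, GO and pathway columns on rows whose signature belongs to an InterPro entry, and many PANTHER signatures do not. A quick way to check a results file is to tally field counts per line; `field_count_histogram` is a hypothetical helper, written for Python 3.

```python
import io
from collections import Counter


def field_count_histogram(handle):
    """Count how many lines have each number of tab-separated fields.

    Blank lines are skipped; line endings are stripped before splitting.
    """
    counts = Counter()
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        counts[len(line.split("\t"))] += 1
    return counts


# Hypothetical two-row sample mimicking the shape of the failing file:
# an 11-field row followed by a 15-field row.
sample = io.StringIO(
    "\t".join(["prot1"] + ["x"] * 10) + "\n"
    + "\t".join(["prot2"] + ["x"] * 14) + "\n"
)
print(sorted(field_count_histogram(sample).items()))  # [(11, 1), (15, 1)]
```

On the real file, `with open("interproscan-results.txt") as fh: print(field_count_histogram(fh))` would show whether more than one row width is present.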
I've attached the files I used for these commands (with PANTHER enabled)
that resulted in this error message.
protein-sequences.fa.txt
<https://github.com/edgraham/GhostKoalaParser/files/1672419/protein-sequences.fa.txt>
interproscan-results.txt
<https://github.com/edgraham/GhostKoalaParser/files/1672421/interproscan-results.txt>
user_ko.txt
<https://github.com/edgraham/GhostKoalaParser/files/1672422/user_ko.txt>
However, when InterProScan no longer has access to the PANTHER files, the
output file can be parsed by KEGG-to-anvio and everything works as
expected:
KEGG-to-anvio --KeggDB KO_Orthology_ko00001.txt -i user_ko.txt -o KeggAnnotations-AnviImportable.txt --interproscan interproscan-results-nopanther.txt
InterProScan annotations with PANTHER disabled:
interproscan-results-nopanther.txt
<https://github.com/edgraham/GhostKoalaParser/files/1672433/interproscan-results-nopanther.txt>
Best,
Bryan
Thanks, this works great when PANTHER results are present!
Hi,
I downloaded InterProScan from here (https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload) and downloaded the PANTHER dataset as well (Step 2). I then used InterProScan to annotate several of my genomes. However, while this parser does work on GhostKOALA annotations alone, it does not work on my InterProScan annotations if the PANTHER files were available to InterProScan. The command I used to run InterProScan is:
./interproscan.sh -cpu 16 -f tsv --goterms --iprlookup --pathways -i protein-sequences.fa -o interproscan-results.txt
I also cloned this repository and am running the scripts inside a conda environment (python 2.7) with pandas 0.22.0 and Biopython 1.70 also installed. When I run KEGG-to-anvio, I get this error message:
I've attached the files I used for these commands (with PANTHER enabled) that resulted in this error message.
protein-sequences.fa.txt
interproscan-results.txt
user_ko.txt
However, when InterProScan no longer has access to the PANTHER files, its output can be parsed, and KEGG-to-anvio produces KeggAnnotations-AnviImportable-nopanther.txt.
Looking at KeggAnnotations-AnviImportable-nopanther.txt, though, it appears that the gene_caller_id column is empty for every row whose source is "KeggGhostKoala".
InterProScan annotations with PANTHER disabled:
interproscan-results-nopanther.txt
Output of KEGG-to-anvio (using above command):
KeggAnnotations-AnviImportable-nopanther.txt
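To see how widespread the empty gene_caller_id entries are, a small script can list the affected rows. This sketch assumes the output is tab-separated with a header row containing columns named `gene_caller_id` and `source` (the names used in the report above); the real header of KeggAnnotations-AnviImportable-nopanther.txt is not shown here, the other column names in the sample are hypothetical, and `rows_missing_gene_caller_id` is a hypothetical helper written for Python 3.

```python
import csv
import io


def rows_missing_gene_caller_id(handle, source="KeggGhostKoala"):
    """Return rows from `source` whose gene_caller_id field is empty.

    Assumes a tab-separated file with a header row that includes
    'gene_caller_id' and 'source' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row for row in reader
        if row.get("source") == source
        # `or ""` covers short rows, where DictReader fills missing fields with None
        and not (row.get("gene_caller_id") or "").strip()
    ]


# Hypothetical sample: one KEGG row with an empty ID, one other row with an ID.
sample = io.StringIO(
    "gene_caller_id\tsource\taccession\tfunction\te_value\n"
    "\tKeggGhostKoala\tK00001\texample function\t0\n"
    "7\tInterProScan\tIPR000001\texample domain\t0\n"
)
missing = rows_missing_gene_caller_id(sample)
print(len(missing), missing[0]["accession"])  # 1 K00001
```

If every KeggGhostKoala row is returned, the IDs are probably being lost when the GhostKOALA user_ko.txt lines are parsed, rather than in the InterProScan merge.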
Do you have any suggestions for how to fix this? Thanks for the great parser, I'm excited to use it!
Best,
Bryan