Issues reading SAM file #20

Closed
jhawkey opened this issue Dec 20, 2019 · 6 comments

@jhawkey

jhawkey commented Dec 20, 2019

Hi,

I'm currently attempting to test out syri but I'm having issues using a SAM file as input.

I've used minimap2 to align my two fasta files:
minimap2 -ax asm5 MINF_9D.fasta MSB1_6J.fasta > MINF_9D_vs_MSB1_6J_minimap.sam

I then pass this sam file to syri:
./syri/syri/bin/syri -c MINF_9D_vs_MSB1_6J_minimap.sam -r MINF_9D.fasta -q MSB1_6J.fasta -F S

However I get this error:

syri - WARNING - starting
Reading Coords - ERROR - Error in reading the SAM file

I can't see anything in the log file that might help me work out what's going on - I set the log level to debug, and this is what's inside the log file:

2019-12-20 11:43:15,324 - syri - WARNING - <module>:115 - starting
2019-12-20 11:43:15,324 - syri - DEBUG - <module>:115 - memory usage: 0.07046127319335938
2019-12-20 11:43:15,325 - Reading Coords - DEBUG - <module>:115 - S
2019-12-20 11:43:15,325 - Reading Coords - INFO - <module>:115 - Reading input from .tsv file
2019-12-20 11:43:15,344 - Reading Coords - ERROR - <module>:115 - Error in reading the SAM file

I'm attaching my sam file here, in case there's something in it that's preventing syri from reading it?
MINF_9D_vs_MSB1_6J_minimap.sam.txt

Any help you could give would be greatly appreciated!

@mnshgl0110
Member

Hi. SyRI requires the CIGAR strings to use '=' for a match and 'X' for a mismatch. When running minimap2, please add the --eqx parameter (https://lh3.github.io/minimap2/minimap2.html). That should generate the SAM file in the required format.
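
For the command in this thread, that would presumably mean rerunning the alignment as `minimap2 -ax asm5 --eqx MINF_9D.fasta MSB1_6J.fasta > MINF_9D_vs_MSB1_6J_minimap.sam`. To check whether an existing SAM file already uses the '='/'X' form, a minimal sketch along these lines may help. It is not part of SyRI; the function names `cigar_ops` and `count_m_records` are hypothetical, and the sample records are made-up data for illustration only.

```python
import re

# One CIGAR element: a length followed by an operation letter (SAM spec ops).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Return the set of operation letters used in one CIGAR string."""
    return {op for _, op in CIGAR_OP.findall(cigar)}


def count_m_records(sam_lines):
    """Count alignment records whose CIGAR still uses the ambiguous 'M' op."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line, no CIGAR
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":  # CIGAR is the 6th column
            continue
        if "M" in cigar_ops(fields[5]):
            count += 1
    return count


# Made-up records: the first uses 'M', the second uses '=' and 'X'.
sample = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "q1\t0\tchr1\t1\t60\t100M\t*\t0\t0\t*\t*\n",
    "q2\t0\tchr1\t1\t60\t90=1X9=\t*\t0\t0\t*\t*\n",
]
print(count_m_records(sample))
```

On a real file, passing the open file handle to `count_m_records` should work the same way. A non-zero count suggests the alignment was produced without --eqx.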

@jhawkey
Author

jhawkey commented Dec 20, 2019

Thanks! I clearly missed this when looking at the working example page in the documentation.

I can now get syri to run (using the same command as above), but it's now apparently having issues with chromosome IDs (see error below).

I'm also attaching the log file in case that is helpful.
syri.log

./syri/syri/bin/syri -c MINF_9D_vs_MSB1_6J_minimap.sam -r MINF_9D.fasta -q MSB1_6J.fasta -F S --log DEBUG
syri - WARNING - starting
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - Matching them automatically. For each reference genome, most similar query genome will be selected. Check mapids.txt for mapping used.
('invOut.txt', ' is empty. Skipping analysing it.')
('TLOut.txt', ' is empty. Skipping analysing it.')
ctxOut.txt is empty. Skipping analysing it.
('inv', 'Out.txt is empty. Skipping analysing it.')
('TL', 'Out.txt is empty. Skipping analysing it.')
('dup', 'Out.txt is empty. Skipping analysing it.')
('invDup', 'Out.txt is empty. Skipping analysing it.')
ctxOut.txt is empty. Skipping analysing it.
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - Matching them automatically. For each reference genome, most similar query genome will be selected. Check mapids.txt for mapping used.
('invOut.txt', ' is empty. Skipping analysing it.')
('TLOut.txt', ' is empty. Skipping analysing it.')
ctxOut.txt is empty. Skipping analysing it.
Traceback (most recent call last):
  File "/Users/jane/miniconda3/envs/syri/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./syri/syri/bin/syri", line 179, in <module>
    getTSV(args.dir, args.prefix, args.ref.name)
  File "syri/pyxFiles/writeout.pyx", line 171, in syri.writeout.getTSV
  File "/Users/jane/miniconda3/envs/syri/lib/python3.5/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/Users/jane/miniconda3/envs/syri/lib/python3.5/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/jane/miniconda3/envs/syri/lib/python3.5/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/Users/jane/miniconda3/envs/syri/lib/python3.5/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/Users/jane/miniconda3/envs/syri/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'id'

Thanks!

@mnshgl0110
Member

Can you please also share the corrected SAM file? Thanks

mnshgl0110 self-assigned this Dec 20, 2019

@jhawkey
Author

jhawkey commented Dec 20, 2019

MINF_9D_vs_MSB1_6J_minimap.sam.txt
Yes, here it is!

@mnshgl0110
Member

mnshgl0110 commented Dec 20, 2019

Hi Jane, thanks for sharing the file. There were some uncaught exceptions which were causing issues with your data ('genomes' without structural variations were not expected). I have modified the script and it should finish properly without complaining about it. Please download and re-install SyRI, and rerun the analysis.
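
To illustrate the kind of failure described here, the sketch below shows one way a reader for a possibly empty tab-separated table can return an empty result instead of failing later with `KeyError: 'id'`. This is not the actual change made to SyRI. The helper name `read_table` and the table layout are assumptions; only the column name `id` is taken from the traceback above.

```python
import csv
import io


def read_table(handle, required=("id",)):
    """Read a tab-separated table with a header row into a list of dict rows.

    An empty input (no header at all) yields an empty list, so callers do
    not hit a missing-column error when no records were written.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None:  # nothing to read, not even a header
        return []
    missing = [col for col in required if col not in reader.fieldnames]
    if missing:
        raise ValueError("missing expected column(s): " + ", ".join(missing))
    return list(reader)


# Hypothetical inputs: an empty file and a file with one record.
print(read_table(io.StringIO("")))
print(read_table(io.StringIO("id\tstart\nSYN1\t10\n")))
```

Checking for the empty case up front keeps the "no structural variations found" situation separate from a genuinely malformed file, which still raises an error.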

@jhawkey
Author

jhawkey commented Jan 7, 2020 via email
