SyRI crashing for incomplete assemblies #79
Comments
Hi Joaquim, I updated the comment here, so this should be OK. The issue is that SyRI is designed for chromosome-level assemblies, so it is expected to crash on incomplete assemblies. You can use chroder or another homology-based scaffolding method to generate pseudo-chromosome-level assemblies and then run SyRI on those. This may lose some SV information, because scaffolds could be placed incorrectly, but I think that in the absence of de novo chromosome-level assemblies they still provide quite a lot of information (check SyRI's manuscript). I hope this helps. Let me know if you have more questions.
I actually tried generating these pseudo-chromosome-level assemblies and SyRI worked just fine, but I wanted to integrate the gene and repeat annotation of the current reference into the downstream analyses, to find out whether these features are related to the generation of structural rearrangements. The out.ref.fasta and out.qry.fasta assemblies no longer have the same genomic coordinates as the original reference, and this becomes a big burden for the study of SVs. In addition, if I have six assemblies to compare to the reference, the generation of pseudo-chromosome-level assemblies is an independent process that is repeated six times, resulting in six different out.ref.fasta assemblies that cannot be directly compared. If SyRI currently crashes with "incomplete assemblies", what exactly is the --no-chrmatch argument doing? I guess there is no "trick" to overcome this limitation, am I right?
You can run You would still need to map the gene coords to the pseudo-chromosome level assembly.
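To illustrate what mapping gene coordinates onto a pseudo-chromosome involves, here is a minimal sketch. It assumes you know, for each scaffold, where it was placed and in which orientation. The `Placement` record and the `lift_interval` function are hypothetical names made up for this example and are not part of SyRI or chroder; the scaffolder's actual output would first have to be parsed into such records, with any gap sequence between scaffolds already included in the offset.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Where one scaffold sits inside a pseudo-chromosome (hypothetical record)."""
    chrom: str     # pseudo-chromosome name
    offset: int    # 0-based start of the scaffold within the pseudo-chromosome
    length: int    # scaffold length in bp
    reverse: bool  # True if the scaffold was reverse-complemented when placed


def lift_interval(start, end, strand, placement):
    """Map a 1-based inclusive interval on a scaffold to pseudo-chromosome coordinates."""
    if not (1 <= start <= end <= placement.length):
        raise ValueError("interval lies outside the scaffold")
    if placement.reverse:
        # On a reverse-complemented scaffold the interval is mirrored,
        # so the old end becomes the new start and the strand flips.
        new_start = placement.offset + placement.length - end + 1
        new_end = placement.offset + placement.length - start + 1
        new_strand = {"+": "-", "-": "+"}.get(strand, strand)
    else:
        new_start = placement.offset + start
        new_end = placement.offset + end
        new_strand = strand
    return placement.chrom, new_start, new_end, new_strand


# Example: a gene at 1..10 on a 500 bp scaffold placed at offset 1000 of "Chr1".
forward = Placement("Chr1", 1000, 500, False)
flipped = Placement("Chr1", 1000, 500, True)
print(lift_interval(1, 10, "+", forward))
print(lift_interval(1, 10, "+", flipped))
```

Applying this to every feature of a GFF/BED file gives an annotation in pseudo-chromosome coordinates. Features that span a scaffold boundary, or that sit on scaffolds left unplaced, would need separate handling.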
Thanks for your answer, I will give it a try.
Hi, sorry that my first message was so vague. Let me explain in a little more detail. First, I had no problem reproducing the example analysis; it worked as expected. Then I tried to run the protocol on my own genomes. I have several genome assemblies that I want to compare to the current reference, but none of them is at the chromosome level, and they have varying numbers of scaffolds. I aligned each of my queries independently to the reference using minimap2, and then tried to call structural rearrangements (SRs) using SyRI with the following parameters:
./syri/bin/syri -c path/to/sam -r path/to/reference -q path/to/query -F S -k -f --no-chrmatch
SyRI started running, reported that the reference and the query have different numbers of scaffolds and that some were not aligned, and continued to run until it crashed. The error message was as follows:
SAM reader - WARNING - A1_scaffold0151 do not align with any reference sequence and cannot be analysed. Remove all unplaced scaffolds and contigs from the assemblies.
Reading Coords - WARNING - Chromosomes IDs do not match.
Reading Coords - WARNING - --no-chrmatch is set. Not matching chromosomes automatically.
Reading Coords - WARNING - BDIQ01000126.1, BDIQ01000179.1, BDIQ01000057.1, BDIQ01000041.1, BDIQ01000183.1, BDIQ01000086.1, BDIQ01000074.1, BDIQ01000107.1, BDIQ01000051.1, BDIQ01000167.1, BDIQ01000205.1, BDIQ01000152.1, BDIQ01000040.1, BDIQ01000030.1, BDIQ01000163.1, BDIQ01000135.1, BDIQ01000080.1, BDIQ01000117.1, BDIQ01000116.1, BDIQ01000088.1, BDIQ01000136.1, BDIQ01000169.1, BDIQ01000038.1, BDIQ01000055.1, BDIQ01000171.1, BDIQ01000160.1, BDIQ01000028.1, BDIQ01000193.1, BDIQ01000067.1, BDIQ01000191.1, BDIQ01000132.1, BDIQ01000014.1, BDIQ01000130.1, BDIQ01000121.1, BDIQ01000144.1, BDIQ01000024.1, BDIQ01000134.1, BDIQ01000139.1, BDIQ01000185.1, BDIQ01000058.1, BDIQ01000129.1, BDIQ01000032.1, BDIQ01000123.1, BDIQ01000063.1, BDIQ01000166.1, BDIQ01000035.1, BDIQ01000190.1, BDIQ01000102.1, BDIQ01000095.1, BDIQ01000076.1, BDIQ01000100.1, BDIQ01000125.1, BDIQ01000112.1, BDIQ01000006.1, BDIQ01000174.1, BDIQ01000178.1, BDIQ01000085.1, BDIQ01000011.1, BDIQ01000090.1, BDIQ01000151.1, BDIQ01000066.1, BDIQ01000108.1, BDIQ01000013.1, BDIQ01000017.1, BDIQ01000061.1, BDIQ01000098.1, BDIQ01000031.1, BDIQ01000198.1, BDIQ01000075.1, BDIQ01000137.1, BDIQ01000048.1, BDIQ01000156.1, BDIQ01000147.1, BDIQ01000127.1, BDIQ01000054.1, BDIQ01000164.1, BDIQ01000060.1, BDIQ01000081.1, BDIQ01000170.1, BDIQ01000012.1, BDIQ01000010.1, BDIQ01000068.1, BDIQ01000165.1, BDIQ01000184.1, BDIQ01000016.1, BDIQ01000050.1, BDIQ01000131.1, BDIQ01000077.1, BDIQ01000020.1, BDIQ01000042.1, BDIQ01000140.1, BDIQ01000158.1, BDIQ01000097.1, BDIQ01000168.1, BDIQ01000180.1, BDIQ01000105.1, BDIQ01000138.1, BDIQ01000022.1, BDIQ01000089.1, BDIQ01000122.1, BDIQ01000197.1, BDIQ01000161.1, BDIQ01000096.1, BDIQ01000047.1, BDIQ01000146.1, BDIQ01000128.1, BDIQ01000201.1, BDIQ01000001.1, BDIQ01000007.1, BDIQ01000194.1, BDIQ01000154.1, BDIQ01000091.1, BDIQ01000188.1, BDIQ01000120.1, BDIQ01000149.1, BDIQ01000083.1, BDIQ01000109.1, BDIQ01000079.1, BDIQ01000043.1, BDIQ01000143.1, BDIQ01000059.1, BDIQ01000114.1, BDIQ01000070.1, 
BDIQ01000118.1, BDIQ01000056.1, BDIQ01000033.1, BDIQ01000115.1, BDIQ01000025.1, BDIQ01000052.1, BDIQ01000093.1, BDIQ01000141.1, BDIQ01000199.1, BDIQ01000133.1, BDIQ01000065.1, BDIQ01000071.1, BDIQ01000195.1, BDIQ01000053.1, BDIQ01000039.1, BDIQ01000177.1, BDIQ01000073.1, BDIQ01000162.1, BDIQ01000192.1, BDIQ01000082.1, BDIQ01000034.1, BDIQ01000159.1, BDIQ01000101.1, BDIQ01000106.1, BDIQ01000157.1, BDIQ01000046.1, BDIQ01000145.1, BDIQ01000124.1, BDIQ01000111.1, BDIQ01000148.1, BDIQ01000104.1, BDIQ01000153.1, BDIQ01000150.1, BDIQ01000155.1, BDIQ01000002.1, BDIQ01000187.1, BDIQ01000196.1, BDIQ01000062.1, BDIQ01000078.1, BDIQ01000173.1, BDIQ01000186.1, BDIQ01000110.1, BDIQ01000027.1, BDIQ01000182.1, BDIQ01000021.1, BDIQ01000044.1, BDIQ01000172.1, BDIQ01000023.1, BDIQ01000064.1, BDIQ01000092.1, A1_scaffold0145, A1_scaffold0028, A1_scaffold0004, A1_scaffold0065, A1_scaffold0097, A1_scaffold0096, A1_scaffold0012, A1_scaffold0025, A1_scaffold0136, A1_scaffold0040, A1_scaffold0102, A1_scaffold0044, A1_scaffold0049, A1_scaffold0130, A1_scaffold0123, A1_scaffold0019, A1_scaffold0138, A1_scaffold0017, A1_scaffold0135, A1_scaffold0053, A1_scaffold0127, A1_scaffold0092, A1_scaffold0034, A1_scaffold0041, A1_scaffold0036, A1_scaffold0003, A1_scaffold0133, A1_scaffold0117, A1_scaffold0108, A1_scaffold0061, A1_scaffold0057, A1_scaffold0113, A1_scaffold0119, A1_scaffold0099, A1_scaffold0089, A1_scaffold0093, A1_scaffold0079, A1_scaffold0144, A1_scaffold0018, A1_scaffold0005, A1_scaffold0002, A1_scaffold0048, A1_scaffold0128, A1_scaffold0142, A1_scaffold0084, A1_scaffold0087, A1_scaffold0116, A1_scaffold0141, A1_scaffold0082, A1_scaffold0022, A1_scaffold0043, A1_scaffold0148, A1_scaffold0132, A1_scaffold0143, A1_scaffold0029, A1_scaffold0023, A1_scaffold0088, A1_scaffold0106, A1_scaffold0075, A1_scaffold0045, A1_scaffold0147, A1_scaffold0068, A1_scaffold0067, A1_scaffold0105, A1_scaffold0059, A1_scaffold0125, A1_scaffold0046, A1_scaffold0121, A1_scaffold0140, A1_scaffold0115, 
A1_scaffold0101, A1_scaffold0078, A1_scaffold0033, A1_scaffold0024, A1_scaffold0085, A1_scaffold0052, A1_scaffold0009, A1_scaffold0026, A1_scaffold0006, A1_scaffold0081, A1_scaffold0071, A1_scaffold0076, A1_scaffold0015, A1_scaffold0124, A1_scaffold0030, A1_scaffold0062, A1_scaffold0011, A1_scaffold0060, A1_scaffold0031, A1_scaffold0055, A1_scaffold0047, A1_scaffold0080, A1_scaffold0131, A1_scaffold0070, A1_scaffold0035, A1_scaffold0074, A1_scaffold0064, A1_scaffold0146, A1_scaffold0014, A1_scaffold0066, A1_scaffold0137, A1_scaffold0069, A1_scaffold0016, A1_scaffold0104, A1_scaffold0122, A1_scaffold0110, A1_scaffold0008, A1_scaffold0126, A1_scaffold0086, A1_scaffold0032, A1_scaffold0027, A1_scaffold0129, A1_scaffold0073, A1_scaffold0109, A1_scaffold0098, A1_scaffold0063, A1_scaffold0090, A1_scaffold0149, A1_scaffold0037, A1_scaffold0042, A1_scaffold0058, A1_scaffold0077, A1_scaffold0020, A1_scaffold0118, A1_scaffold0054, A1_scaffold0111, A1_scaffold0095, A1_scaffold0094, A1_scaffold0001, A1_scaffold0100, A1_scaffold0091, A1_scaffold0007, A1_scaffold0072, A1_scaffold0021, A1_scaffold0050, A1_scaffold0120, A1_scaffold0039, A1_scaffold0083, A1_scaffold0038, A1_scaffold0150, A1_scaffold0010, A1_scaffold0051, A1_scaffold0107, A1_scaffold0134, A1_scaffold0013, A1_scaffold0103, A1_scaffold0114, A1_scaffold0056, A1_scaffold0112, A1_scaffold0139 present in only one genome. Removing corresponding alignments
Traceback (most recent call last):
  File "/scratch/jcruzcor/04_SyRI_analysis/syri/syri/bin/syri", line 250, in <module>
    startSyri(args, coords[["aStart", "aEnd", "bStart", "bEnd", "aLen", "bLen", "iden", "aDir", "bDir", "aChr", "bChr"]])
  File "syri/pyxFiles/synsearchFunctions.pyx", line 467, in syri.pyxFiles.synsearchFunctions.startSyri
  File "syri/pyxFiles/synsearchFunctions.pyx", line 860, in syri.pyxFiles.synsearchFunctions.outSyn
  File "/scratch/jcruzcor/trial_minimap/SYRI/lib/python3.5/site-packages/pandas/core/generic.py", line 4389, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/scratch/jcruzcor/trial_minimap/SYRI/lib/python3.5/site-packages/pandas/core/generic.py", line 646, in _set_axis
    self._data.set_axis(axis, labels)
  File "/scratch/jcruzcor/trial_minimap/SYRI/lib/python3.5/site-packages/pandas/core/internals.py", line 3323, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements
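The warnings above ask for unplaced scaffolds and contigs to be removed from the assemblies. As a rough sketch of how that could be done from the SAM file, the following collects the sequence names that take part in at least one alignment and keeps only those FASTA records. The function names `aligned_names` and `filter_fasta` are illustrative and not part of SyRI. Note that this only addresses the warnings: since SyRI is designed for chromosome-level assemblies, filtering alone may not prevent the crash.

```python
def aligned_names(sam_lines):
    """Return (reference names, query names) seen in at least one mapped SAM record."""
    refs, queries = set(), set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # record is flagged as unmapped
        queries.add(qname)
        refs.add(rname)
    return refs, queries


def filter_fasta(fasta_lines, keep):
    """Yield only the FASTA records whose ID (first word of the header) is in keep."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing:
            yield line


# Tiny in-memory example; real input would come from open("path/to/sam") etc.
sam = [
    "@SQ\tSN:BDIQ01000001.1\tLN:1000\n",
    "A1_scaffold0001\t0\tBDIQ01000001.1\t1\t60\t8=\t*\t0\t0\tACGTACGT\t*\n",
    "A1_scaffold0151\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n",
]
fasta = [">A1_scaffold0001 len=8\n", "ACGTACGT\n", ">A1_scaffold0151\n", "ACGT\n"]

ref_ids, qry_ids = aligned_names(sam)
print("".join(filter_fasta(fasta, qry_ids)), end="")
```

The same `filter_fasta` call with `ref_ids` would trim the reference. Bear in mind that removing sequences from the reference also removes any annotation attached to them.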
Originally posted by @Jquimcrz in #42 (comment)