Formula decomposition of higher masses #27

Closed
nirshahaf opened this issue Feb 24, 2021 · 3 comments

@nirshahaf

Sirius team,

Apart from the blocking issues, which seem to be (independently) due to connection problems with the CSI server, I see very slow convergence of the algorithm when running on an .ms file of a larger compound with m/z = 1388, corresponding to [M+NH4]+.
The run parameters were optimized for this situation by reducing the PPM threshold and reserving sufficient resources. Nevertheless, the Sirius command line took roughly a day(!) to finish; see:

sirius -i 015.ms -o ./Sirius/015 --cores=12 --ignore-formula formula --ppm-max=5 --ppm-max-ms2=15 -e='CHNOPSCl[3]Br[2]Na' --candidates=10 --ilp-solver='GUROBI'

The input spectra are below. Do you have any idea what is going on? Could it be related to the Java runtime parameters, or to the large number of putative fragments (originating from the DIA method)?

compound NP-008069
formula C61H94O34
parentmass 1388.59904067667
ionization [M+NH4]+

ms1
1371.57494392328 1025.9755859375
1372.57849610784 663.23046875
1373.56091308594 133.44140625
1373.58071729741 333.5439453125
1374.58309329047 97.9529418945312
1375.5654840212 39.0334777832031
1375.61694335938 30.9311218261719
1388.59904067667 1393.7451171875
1389.60537001044 919.021484375
1390.61263799646 394.663330078125
1391.02517184432 17.2550659179688
1391.61883311232 93.4871826171875
1391.64147949219 74.007080078125
1392.65447228716 23.4032440185547

ms2
83.0464075469008 245.248413085938
84.0492461711597 12.1519012451172
105.068158640417 32.4050598144531
111.042515340518 1430.7685546875
111.407310723258 8.61387634277344
111.443288110774 10.126579284668
111.49466964541 6.84843826293945
111.56425560776 8.10126495361328
111.654724121821 8.10126495361328
111.794356094135 6.07595062255859
111.880957345395 11.1485443115234
111.954912147972 15.1898727416992
112.047352787194 123.935729980469
112.483057235946 8.10126495361328
113.04924080486 18.2278442382812
133.101032172488 41.5189819335938
147.11757193408 53.6708984375
153.053653994456 2304.45703125
153.388392130322 9.11392211914062
153.516252004507 12.1139068603516
153.545457032096 8.71035003662109
153.593770181811 15.3144912719727
153.64470464468 19.5127868652344
153.704257084699 13.1645584106445
153.728306154381 7.65116882324219
153.769238996538 11.1392440795898
153.832549266069 10.6513290405273
153.871194687431 15.4863815307617
153.93441710211 12.716682434082
153.967132805929 9.68875122070312
154.059091006073 249.80126953125
154.119863921841 12.1519012451172
154.167065513166 8.40129089355469
154.204067041253 10.1253128051758
154.238483539865 8.60770416259766
154.287281503401 8.63239288330078
154.354295312583 9.11392211914062
154.442366571632 12.802001953125
154.458579268121 13.6596527099609
154.516618876556 7.53665161132812
154.597967058099 9.56630706787109
154.741877511563 7.60279083251953
154.867932118902 8.80154418945312
155.057186257394 33.417724609375
161.132132245247 39.7950744628906
171.066157321442 184.842407226562
172.068898882286 15.1898727416992
203.179434771839 439.214111328125
204.185131411275 77.8417358398438
205.195457121799 71.0940551757812
206.199091397854 13.1645584106445
213.076358931849 413.51806640625
214.081604128092 61.6261596679688
221.19191080542 56.7674255371094
225.150462624481 28.5277709960938
253.145769250348 30.3797454833984
271.017300561015 8.10126495361328
273.097346889741 1529.84765625
273.915130615234 16.1784820556641
274.104743725598 241.557006835938
274.313159180022 8.10126495361328
274.532276354022 8.10126495361328
274.801666259482 13.1645584106445
275.104747385923 55.3226013183594
275.667171499961 8.93072509765625
276.108852648378 7.22447967529297
327.01580132435 10.126579284668
335.223267414442 40.77490234375
375.12562966097 16.0544128417969
387.044061882768 31.8549346923828
388.041061401409 7.08860778808594
389.049219385066 13.3026580810547
390.044051096872 11.0856628417969
391.037421516486 13.8551330566406
429.045277913892 19.2405090332031
435.149311018644 180.959106445312
436.155138304358 42.5316467285156
437.156342045052 20.2531585693359
447.062735513126 41.5189819335938
448.059813189891 15.1898727416992
449.059782854491 33.7029724121094
451.053728262932 10.126579284668
459.063733139018 31.2515411376953
461.053361060316 10.3263168334961
489.076601895619 22.2784881591797
507.073951707729 14.4237823486328
513.172417463455 14.4203948974609
567.197453937355 62.7848205566406
568.195965237054 16.2025299072266
585.196779097862 16.6740417480469
637.323992485113 67.8480834960938
638.326947745576 33.417724609375
639.126516966879 73.3114013671875
641.128830943476 37.8892211914062
641.152369988391 32.8050537109375
647.219895535169 44.4820861816406
647.724347795395 13.1645584106445
648.213351767236 14.8246917724609
655.333107352198 220.65283203125
656.334681209509 95.7450561523438
677.236541863835 16.2025299072266
705.256645257799 22.1676177978516
707.238495741311 552.73828125
708.244755768619 203.0302734375
709.247389400702 114.836059570312
710.254508101981 26.5077514648438
712.293399741442 24.42236328125
713.244267236952 65.3881225585938
713.739835267102 55.22412109375
714.247355511125 82.9747314453125
714.746865060184 60.5936584472656
715.241382631612 49.7239685058594
715.734835319859 46.4164428710938
716.232969030739 10.9199066162109
716.753662479995 12.1519012451172
717.251519725418 38.7474060058594
727.313944260083 11.7802963256836
769.381899986102 25.3446044921875
787.373217106504 70.6663208007812
788.387105843535 32.4050598144531
805.408736998906 13.9015274047852
833.280428532967 24.7292022705078
839.285756703998 37.0708618164062
840.282628786684 15.9805755615234
841.288116921971 42.3271179199219
842.293898423669 16.7073364257812
843.318412584578 28.8607635498047
869.296843485573 453.271728515625
870.298953955147 186.107055664062
871.310912499797 67.4124145507812
872.321418289423 24.3453369140625
887.298386765163 27.2509918212891
909.414671463409 25.3164520263672
919.424468858016 16.1927032470703
927.423522949219 35.0611572265625
927.437877308387 40.5063171386719
928.426100877904 23.3436737060547
1001.33264226971 40.2743530273438
1020.35634155276 16.2025299072266
1071.47793216169 15.6688461303711
1072.48083496094 20.0883941650391
1089.48340305908 153.455444335938
1090.479354639 93.2432861328125
1091.4862340367 40.8655395507812
1106.44499180906 12.8054504394531
1107.48572357247 76.3225708007812
1108.4899059212 55.7180480957031
1124.52558032729 13.0770568847656
1160.38052034597 39.3897094726562
1160.39770507812 37.4783630371094
1161.39329244002 26.2115325927734
1221.51934349623 61.9869384765625
1222.29489839963 15.1898727416992
1222.49035644531 30.8860778808594
1222.53234863281 32.4050598144531
1223.54123535173 14.1772155761719
1239.5293695344 90.3782958984375
1240.53419873005 66.0807495117188
1241.52443343886 29.3670959472656
1256.55279828367 24.3453826904297
1292.43559755503 54.3342895507812
1293.45704348615 59.1096496582031
1294.4746067796 28.8607635498047
1332.48965731506 29.3670959472656
1332.53833007812 21.2658233642578
1333.01085486806 21.2658233642578
1333.50175548727 22.2784881591797
1333.9893951729 14.3023986816406
1353.57559841698 19.1224975585938
1354.55534323493 14.1149215698242
1371.57709521761 82.7237548828125
1372.58424410134 50.5753784179688
1373.57772084761 20.9331665039062

@kaibioinfo
Contributor

Hi,

I think compounds above 1000 Da (and with so many high mass fragments) are too much for the ILP. I would suggest that we compute high mass compounds with the heuristic. Yes, computing the exact solution is nice, but it is ridiculous that 99% of the running time of one analysis is spent on a few high mass compounds :/

@nirshahaf
Author

Hi Kai,

I'm not sure about the heuristic you mention, but I found a surprising outcome when testing in NI mode (full spectra below).
The NI-mode spectrum has fewer fragments in this case, so I expected Sirius to finish much faster. It did not: after roughly an hour of calculation I stopped it and looked at the generated spectra and trees, where I did not find the correct formula (N=10).
I then truncated the original .ms file into two versions: one with just the two highest-mass fragments (plus isotopes) and another with the remaining two lower-mass fragments (one without detected isotopes). I then ran Sirius with the same parameters on each truncated input and was glad to see that in both cases it completed normally within a couple of minutes AND identified the correct formula at either rank 1 (the lower-mass fragments) or rank 2 (the higher-mass fragments). I therefore think something funky is going on, and you might want to add a heuristic that, for compounds above 1000 or 900 Da, truncates the MS2 peaks in a more rational way than I did and leaves just enough data for the algorithm to converge efficiently.
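
The manual truncation described above could be scripted roughly as follows. This is only a sketch, not part of Sirius: the function names (`parse_ms`, `render`, `split_ms2`) and the single m/z cutoff are hypothetical, and the parser assumes the simple layout shown in this thread (header lines, then `ms1` and `ms2` sections of `m/z intensity` pairs). A leading `>` on section markers is tolerated, which is an assumption about the on-disk format.

```python
def parse_ms(text):
    """Split .ms-style text into header lines, peak lists, and section markers."""
    header, peaks, markers, current = [], {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key = line.lstrip(">").strip().lower()
        if key in ("ms1", "ms2"):
            # Start of a peak section; remember the marker as it was written.
            current = key
            peaks.setdefault(key, [])
            markers[key] = line
        elif current is None:
            header.append(line)  # compound, formula, parentmass, ionization
        else:
            mz, intensity = line.split()[:2]
            peaks[current].append((float(mz), float(intensity)))
    return header, peaks, markers


def render(header, peaks, markers):
    """Write the parsed pieces back out in the same simple layout."""
    lines = list(header)
    for key in ("ms1", "ms2"):
        if key in peaks:
            lines += ["", markers.get(key, key)]
            lines += [f"{mz!r} {inten!r}" for mz, inten in peaks[key]]
    return "\n".join(lines) + "\n"


def split_ms2(text, cutoff_mz):
    """Return two texts: MS2 peaks below cutoff_mz, and MS2 peaks at or above it."""
    header, peaks, markers = parse_ms(text)
    ms2 = peaks.get("ms2", [])
    low = dict(peaks, ms2=[p for p in ms2 if p[0] < cutoff_mz])
    high = dict(peaks, ms2=[p for p in ms2 if p[0] >= cutoff_mz])
    return render(header, low, markers), render(header, high, markers)


# A shortened excerpt of the negative-mode file, for illustration only.
example = """compound NP-008069
formula C61H94O34
parentmass 1415.56745241152
ionization [M+FA-H]-

ms1
1415.56745241152 2769.6484375

ms2
723.178848771134 44.1402587890625
911.499147031881 588.01171875
1328.54187990961 113.484313964844
1372.56727934847 850.6328125
"""

low_text, high_text = split_ms2(example, cutoff_mz=1000.0)
print(low_text)
print(high_text)
```

Each returned text keeps the original header and MS1 block, so both halves can be saved as separate .ms files and run with the same command line. A cutoff of 1000 is arbitrary here; the split I did by hand grouped fragments with their isotope peaks rather than using a fixed m/z.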

Here is the full input .ms file (same compound as in the first thread):

compound NP-008069
formula C61H94O34
parentmass 1415.56745241152
ionization [M+FA-H]-

ms1
1369.5620664683 2609.126953125
1370.10416027068 18.6966857910156
1370.56446981945 1801.8505859375
1371.56618519547 844.20751953125
1372.08129882812 21.1581573486328
1372.57047210081 294.842041015625
1373.27502441406 19.445068359375
1373.5701012032 100.971496582031
1373.60245745325 42.7364807128906
1374.27172851562 22.0960540771484
1374.55733493081 31.4545288085938
1374.89514160156 22.106689453125
1375.24062502631 16.9260559082031
1375.55367487289 14.6835479736328
1405.54140239552 70.5264282226562
1406.53388919922 42.8983764648438
1407.52470046546 41.3442077636719
1408.56120447965 28.3544311523438
1415.56745241152 2769.6484375
1416.56816971123 1913.974609375
1417.57561788284 903.10400390625
1418.23605588669 21.3117828369141
1418.57918878837 339.12109375
1418.92863923066 24.0499725341797
1419.56505776432 50.0323791503906
1419.58984662733 113.657287597656
1419.95555610225 24.2431182861328

ms2
723.178848771134 44.1402587890625
911.499147031881 588.01171875
912.502626093085 265.8583984375
913.513163759151 99.551025390625
914.502358631551 29.5118255615234
1140.4976410677 11.6455688476562
1311.52404785156 84.174560546875
1313.58103594598 16.0462646484375
1328.54187990961 113.484313964844
1329.55023359777 59.5953063964844
1330.58057369567 21.9123382568359
1371.18811035156 81.0126342773438
1372.12414550781 40.5063171386719
1372.56727934847 850.6328125
1372.87884447927 81.0126342773438
1373.29724121094 27.4158782958984
1373.57458496094 171.71630859375
1373.89599609375 31.0443572998047
1374.11962890625 27.3267669677734
1374.58142089844 49.1774597167969
1375.13537597656 18.7820587158203
1375.31176757812 17.7598571777344
1376.44311523438 18.0995178222656
1377.12356556204 14.66064453125
1391.51037597656 23.0181121826172
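
As a sanity check on the `--ppm-max=5` setting from the first comment, the precursor m/z values in the two .ms files can be compared with the theoretical values for C61H94O34. The sketch below is a plain calculation, not a Sirius feature; the helper names are hypothetical, the element masses are standard monoisotopic values, and the adducts are handled by hand ([M+NH4]+ as M + NH4 minus one electron, [M+FA-H]- as M + CH2O2 minus one proton).

```python
import re

# Monoisotopic masses (u) of the elements needed here, plus the electron mass.
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}
ELECTRON = 0.000548579909


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C61H94O34'."""
    return sum(MONO[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


M = mono_mass("C61H94O34")
proton = MONO["H"] - ELECTRON

mz_nh4 = M + mono_mass("NH4") - ELECTRON  # [M+NH4]+
mz_fa = M + mono_mass("CH2O2") - proton   # [M+FA-H]-

# Observed precursor m/z values copied from the parentmass lines in this thread.
ppm_pos = ppm_error(1388.59904067667, mz_nh4)
ppm_neg = ppm_error(1415.56745241152, mz_fa)

print(f"[M+NH4]+  theoretical {mz_nh4:.5f}, error {ppm_pos:.2f} ppm")
print(f"[M+FA-H]- theoretical {mz_fa:.5f}, error {ppm_neg:.2f} ppm")
```

With these constants the positive-mode precursor comes out at roughly 1.8 ppm and the negative-mode precursor at roughly 4.7 ppm, so the latter sits close to the 5 ppm limit. Whether that plays any role in the missing formula is not something this calculation can show.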

@nirshahaf
Author

BTW,

If you need to retrain some model on higher-mass compounds, I can generate 275 spectra of chemical standards with a mass range between 1001 and 1964 Da. Following Sebastian's comment from a few years ago, I have increased the sensitivity of the peak extraction method, i.e. there are more low-mass putative fragments now.

mfleisch pushed a commit that referenced this issue Mar 8, 2024
Resolve "Transform GUI to Nightsky API"

Closes #81, #123, #87, #84, #83, #60, #44, #42, #27, #16, #13, and #11

See merge request bright-giant/sirius/sirius-frontend!26
mfleisch pushed a commit that referenced this issue Aug 2, 2024
…mono-repo' into 'master'

Resolve "Merge sirius-libs into sirius-frontend to create a mono repo."

Closes #300, #132, #126, #123, #128, #125, #127, #124, #122, #121, #120, #109, #115, #106, #99, #112, #111, #61, #110, #103, #108, #107, #104, #221, #102, #98, #198, #92, #94, #93, #85, #84, #87, #76, #56, #73, #32, #70, #65, #43, #63, #33, #53, #58, #59, #57, #54, #52, #41, #47, #50, #49, #48, #39, #46, #37, #38, #27, #34, #22, #31, #28, #18, #23, #21, #19, #9, #240, #280, #254, #242, #246, #238, #156, #25, #155, #189, #172, #154, #11, #6, #16, and #4

See merge request bright-giant/sirius/sirius-frontend!106

2 participants