Unable to process more than one mzML file at a time #356

Closed
shefalilathwal opened this issue Aug 29, 2018 · 9 comments

shefalilathwal commented Aug 29, 2018

I am using MSnbase to read MS data from multiple .mzML files. The data is read without problems, but I am unable to extract the mz and intensity values when the MSnbase object contains more than one file.

For example, in the following code, I get the output for rtime(raw_data), but mz(raw_data) does not give any output and the code keeps running for hours with no errors or warnings. I have to forcibly stop the run without any results.

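library(MSnbase)

## Two mzML files to be read together
filepath <- c("test_file1.mzML", "test_file2.mzML")
## Phenodata: one row per file, all samples in group "A"
pd <- data.frame(sample_name = sub(basename(filepath), pattern = ".mzML",
                                   replacement = "", fixed = TRUE),
                 sample_group = rep("A", length(filepath)),
                 stringsAsFactors = FALSE)
raw_data <- readMSData(files = filepath, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk", msLevel = 1)
head(rtime(raw_data))  # returns output
head(mz(raw_data))     # keeps running for hours with two files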

However, if I use only one file at a time, the code runs.

library(MSnbase)

## Same code, but with a single mzML file
filepath <- "test_file1.mzML"
pd <- data.frame(sample_name = sub(basename(filepath), pattern = ".mzML",
                                   replacement = "", fixed = TRUE),
                 sample_group = rep("A", length(filepath)),
                 stringsAsFactors = FALSE)
raw_data <- readMSData(files = filepath, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk", msLevel = 1)
head(rtime(raw_data))
head(mz(raw_data))

In this case, I get the output in about 15 seconds, as shown below. What is going on here, and is there a way to get around it? I would like to process multiple files together for downstream analysis (RT alignment and peak grouping across samples).

F1.S0001 F1.S0002 F1.S0003 F1.S0004 F1.S0005 F1.S0006 
1.225701 2.564204 4.266830 5.618955 6.982832 8.241583
$F1.S0001
  [1]   66.33385   66.33392   66.33400   66.33407   67.07291   67.07298   67.07305   67.07313
  [9]   67.07320   67.07327   67.07335   67.07343   67.07349   67.07354   67.07362   67.07369
 [17]   67.07376   70.32546   70.32554   70.32561   70.32570   70.32578   70.32586   70.32594
 [25]   70.32601   70.32610   70.32617   70.32625   70.32633   70.32641   70.41705   70.41713
 [33]   70.41721   70.41729   70.41737   70.41744   70.41753   70.41760   70.41769   70.41770
 [41]   70.41779   70.41786   70.41794   77.41087   77.41096   77.41105   77.41114   77.41123
 [49]   77.41132   77.41142   77.41151   77.41160   77.41167   77.41176   77.41185   77.41194
 [57]   80.45426   80.45436   80.45446   80.45455   80.45465   80.45475   80.45484   80.45494
 [65]   80.45504   80.45505   80.45512   80.45521   80.45531   94.32063   94.32075   94.32087
 [73]   94.32099   94.32111   94.32124   94.32136   94.32148   94.32161   94.32169   94.32182
 [81]   94.32194   94.32206   98.60455   98.60468   98.60481   98.60494   98.60507   98.60520
 [89]   98.60534   98.60547   98.60560   98.60572   98.60585   98.60598   98.60611   99.95339
 [97]   99.95353   99.95366   99.95380   99.95393   99.95406   99.95419   99.95433   99.95447
[105]   99.95456   99.95469   99.95483   99.95496  104.93644  104.93658  104.93673  104.93687
[113]  104.93702  104.93716  104.93731  104.93745  104.93760  104.93772  104.93786  104.93800
[121]  104.93815  107.52412  107.52428  107.52442  107.52457  107.52472  107.52487  107.52502
[129]  107.52517  107.52532  107.52539  107.52554  107.52569  107.52584  116.59905  116.59923
[137]  116.59940  116.59956  116.59973  116.59990  116.60007  116.60023  116.60041  116.60053
[145]  116.60071  116.60088  116.60104  121.29865  121.29884  121.29901  121.29919  121.29937
[153]  121.29955  121.29973  121.29991  121.30009  121.30019  121.30036  121.30054  121.30072
[161]  131.10820  131.10841  131.10861  131.10881  131.10901  131.10921  131.10942  131.10962
[169]  131.10982  131.10995  131.11015  131.11035  131.11055  139.17911  139.17934  139.17955
[177]  139.17978  139.17999  139.18021  139.18044  139.18065  139.18088  139.18112  139.18135
[185]  139.18156  139.18179  141.94484  141.94505  141.94528  141.94551  141.94574  141.94597
[193]  141.94620  141.94643  141.94666  141.94685  141.94707  141.94730  141.94753  145.53978
[201]  145.54001  145.54025  145.54048  145.54071  145.54095  145.54118  145.54143  145.54166
[209]  145.54189  145.54213  145.54236  145.54260  154.22588  154.22612  154.22638  154.22664
[217]  154.22690  154.22716  154.22742  154.22766  154.22792  154.22823  154.22849  154.22874
[225]  154.22900  165.85605  165.85634  165.85661  165.85690  165.85719  165.85748  165.85777
[233]  165.85805  165.85834  165.85861  165.85890  165.85919  165.85948  166.72433  166.72462
[241]  166.72491  166.72520  166.72549  166.72578  166.72607  166.72636  166.72665  166.72696
[249]  166.72725  166.72754  166.72783  175.08791  175.08823  175.08853  175.08884  175.08916
[257]  175.08946  175.08978  175.09009  175.09039  175.09070  175.09102  175.09132  175.09164
[265]  175.79558  175.79588  175.79620  175.79651  175.79683  175.79713  175.79745  175.79776
[273]  175.79808  175.79837  175.79868  175.79900  175.79930  179.77280  179.77312  179.77344
[281]  179.77376  179.77409  179.77441  179.77473  179.77505  179.77538  179.77570  179.77602
[289]  179.77635  179.77667  180.66933  180.66966  180.66998  180.67030  180.67064  180.67096
[297]  180.67128  180.67162  180.67194  180.67229  180.67262  180.67294  180.67326  201.98289
[305]  201.98328  201.98367  201.98405  201.98444  201.98482  201.98521  201.98560  201.98598
[313]  201.98642  201.98680  201.98718  201.98758  208.22681  208.22720  208.22762  208.22801
[321]  208.22842  208.22882  208.22922  208.22963  208.23003  208.23042  208.23083  208.23123
[329]  208.23164  209.09169  209.09209  209.09250  209.09291  209.09331  209.09372  209.09412
[337]  209.09453  209.09494  209.09534  209.09575  209.09615  209.09656  209.22647  209.22688
[345]  209.22728  209.22769  209.22810  209.22850  209.22891  209.22931  209.22972  209.23016
[353]  209.23058  209.23097  209.23138  220.45947  220.45992  220.46036  220.46080  220.46123
[361]  220.46167  220.46211  220.46255  220.46298  220.46342  220.46387  220.46431  220.46475
[369]  220.46518  220.46562  220.46606  220.46651  220.46695  220.46738  220.46782  220.46826
[377]  220.46870  220.46915  220.46957  220.47002  220.47046  220.47090  220.47136  220.47179
[385]  220.47223  220.47267  239.17918  239.17967  239.18018  239.18068  239.18117  239.18167
[393]  239.18216  239.18266  239.18315  239.18370  239.18419  239.18469  239.18518  253.84538
[401]  253.84593  253.84647  253.84702  253.84756  253.84810  253.84865  253.84918  253.84973
[409]  253.85027  253.85080  253.85135  253.85190  256.16595  256.16650  256.16705  256.16760
[417]  256.16815  256.16870  256.16925  256.16980  256.17035  256.17099  256.17154  256.17209
[425]  256.17264  264.37512  264.37570  264.37628  264.37686  264.37744  264.37802  264.37857
[433]  264.37915  264.37973  264.38046  264.38104  264.38159  264.38217  300.56500  300.56570
[441]  300.56641  300.56711  300.56781  300.56851  300.56921  300.56992  300.57062  300.57132
[449]  300.57199  300.57269  300.57339  300.57410  306.88086  306.88156  306.88229  306.88303
[457]  306.88373  306.88446  306.88519  306.88589  306.88663  306.88815  306.88885  306.88959
[465]  306.89032  497.50345  497.50491  497.50641  497.50790  497.50940  497.51089  497.51239
[473]  497.51385  497.51535  497.51724  497.51871  497.52020  497.52170  654.90851  654.91077
[481]  654.91302  654.91528  654.91754  654.91974  654.92200  654.92426  654.92651  654.92896
[489]  654.93121  654.93347  654.93573  778.57568  778.57861  778.58148  778.58441  778.58734
[497]  778.59027  778.59314  778.59607  778.59900  778.60193  778.60492  778.60785  778.61078
[505]  778.61371  846.42523  846.42853  846.43188  846.43518  846.43848  846.44177  846.44507
[513]  846.44836  846.45166  846.45502  846.45831  846.46161  846.46490  856.73773  856.74109
[521]  856.74445  856.74780  856.75116  856.75452  856.75793  856.76129  856.76465  856.76801
[529]  856.77136  856.77478  856.77814  875.44867  875.45215  875.45563  875.45911  875.46259
[537]  875.46606  875.46954  875.47302  875.47650  875.47998  875.48346  875.48694  875.49042
[545]  877.04309  877.04657  877.05005  877.05353  877.05707  877.06055  877.06403  877.06750
[553]  877.07098  877.07446  877.07794  877.08142  877.08496 1010.07391 1010.07825 1010.08252
[561] 1010.08685

$F1.S0002
   [1]  66.33383  66.33391  66.3 ...

The sessionInfo() output for my R session:

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_1.5        pander_0.6.2        RColorBrewer_1.1-2  MSnbase_2.6.3      
 [5] ProtGenerics_1.12.0 BiocParallel_1.14.2 mzR_2.14.0          Rcpp_0.12.18       
 [9] Biobase_2.40.0      BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] pillar_1.3.0          compiler_3.5.0        BiocInstaller_1.30.0  plyr_1.8.4           
 [5] iterators_1.0.10      tools_3.5.0           zlibbioc_1.26.0       MALDIquant_1.18      
 [9] digest_0.6.16         tibble_1.4.2          preprocessCore_1.42.0 gtable_0.2.0         
[13] lattice_0.20-35       rlang_0.2.2           foreach_1.4.4         S4Vectors_0.18.3     
[17] IRanges_2.14.11       stats4_3.5.0          grid_3.5.0            impute_1.54.0        
[21] snow_0.4-2            XML_3.98-1.16         limma_3.36.3          ggplot2_3.0.0        
[25] scales_1.0.0          pcaMethods_1.72.0     codetools_0.2-15      MASS_7.3-50          
[29] mzID_1.18.0           colorspace_1.3-2      affy_1.58.0           lazyeval_0.2.1       
[33] munsell_0.5.0         doParallel_1.0.11     vsn_3.48.1            crayon_1.3.4         
[37] affyio_1.50.0 
Owner

lgatto commented Aug 30, 2018

Thank you for your message, @shefalilathwal

Here's what I get when reproducing your use case with 3 mzML files:

> suppressPackageStartupMessages(library("MSnbase"))
> fls <- dir("~/Data2/Thermo_HELA_PRT/", full.names = TRUE, pattern = "mzML")
> fls
[1] "/home/lg390/Data2/Thermo_HELA_PRT//Thermo_Hela_PRTC_1.mzML"
[2] "/home/lg390/Data2/Thermo_HELA_PRT//Thermo_Hela_PRTC_2.mzML"
[3] "/home/lg390/Data2/Thermo_HELA_PRT//Thermo_Hela_PRTC_3.mzML"
> x <- readMSData(fls, mode = "onDisk", msLevel = 1)
> system.time(xrt <- rtime(x))
   user  system elapsed 
      0       0       0 
> head(xrt)
F1.S00001 F1.S00002 F1.S00003 F1.S00004 F1.S00005 F1.S00006 
0.3287012 0.7814142 1.0613962 1.3288742 1.5962302 1.8637102 
> system.time(xmz <- mz(x))
   user  system elapsed 
  4.314   2.900 132.740 
> head(xmz)
$F1.S00001
  [1] 396.0173 396.0191 396.0210 396.0229 400.9157 400.9176 400.9195 400.9214
  [9] 400.9233 400.9252 400.9271 400.9290 400.9309 400.9329 400.9348 400.9367
 [17] 400.9386 402.1674 402.1694 402.1713 402.1732 402.1751 402.1770 402.1789
 [25] 402.1808 402.1827 402.1847 402.1866 402.1885 402.1904 410.8786 410.8806
 [33] 410.8826 410.8846 410.8865 410.8885 410.8905 410.8925 410.8944 410.8964
 [41] 410.8984 410.9004 410.9023 413.2510 413.2530 413.2550 413.2570 413.2590
 [49] 413.2610 413.2630 413.2650 413.2670 413.2690 413.2710 413.2729 413.2749
 [57] 413.2769 413.2789 413.2809 415.0198 415.0218 415.0238 415.0258 415.0278
 [65] 415.0298 415.0318 415.0338 415.0358 415.0378 415.0398 415.0418 415.0438
 [73] 415.0458 415.0478 415.0499 415.0519 416.0221 416.0241 416.0261 416.0281
 [81] 416.0301 416.0321 416.0342 416.0362 416.0382 416.0402 416.0422 416.0442
 [89] 416.0462 416.0482 416.0503 417.0704 417.0725 417.0745 417.0765 417.0785
 [97] 417.0805 417.0826 417.0846 417.0866
 [ reached getOption("max.print") -- omitted 24875 entries ]

$F1.S00002
  [1] 396.0197 396.0215 396.0234 396.0253 401.2001 401.2020 401.2039 401.2058
  [9] 401.2077 401.2096 401.2115 401.2134 401.2153 401.2172 401.2191 401.2210
 [17] 401.2229 401.2248 401.2267 401.8067 401.8087 401.8106 401.8125 401.8144
 [25] 401.8163 401.8182 401.8201 401.8220 401.8237 401.8257 401.8276 401.8295
 [33] 402.1640 402.1659 402.1678 402.1697 402.1716 402.1735 402.1754 402.1773
 [41] 402.1793 402.1812 402.1831 402.1850 402.1869 402.1890 402.1909 402.1928
 [49] 402.1948 402.6638 402.6657 402.6676 402.6695 402.6715 402.6734 402.6753
 [57] 402.6772 402.6791 402.6810 402.6830 402.6849 402.6868 403.1663 403.1682
 [65] 403.1701 403.1720 403.1740 403.1759 403.1778 403.1797 403.1816 403.1836
 [73] 403.1855 403.1874 403.1893 403.1912 403.1932 404.1452 404.1471 404.1491
 [81] 404.1510 404.1529 404.1548 404.1568 404.1587 404.1606 404.1625 404.1645
 [89] 404.1664 404.1683 404.1702 404.1722 404.1741 404.1760 404.1780 404.1799
 [97] 404.1818 404.1837 404.1857 404.1876
 [ reached getOption("max.print") -- omitted 15946 entries ]

$F1.S00003
  [1] 396.0217 396.0235 396.0254 396.0273 401.2021 401.2040 401.2059 401.2078
  [9] 401.2097 401.2116 401.2135 401.2154 401.2173 401.2192 401.2211 401.2230
 [17] 401.2249 401.2268 402.1643 402.1662 402.1681 402.1700 402.1719 402.1738
 [25] 402.1757 402.1777 402.1796 402.1815 402.1834 402.1853 402.1872 402.1891
 [33] 402.1910 402.1930 402.9515 402.9534 402.9553 402.9572 402.9591 402.9611
 [41] 402.9630 402.9649 402.9668 402.9687 402.9707 402.9726 402.9745 403.1683
 [49] 403.1702 403.1722 403.1741 403.1760 403.1779 403.1798 403.1818 403.1837
 [57] 403.1856 403.1875 403.1894 403.1914 403.1933 404.1453 404.1472 404.1492
 [65] 404.1511 404.1530 404.1549 404.1569 404.1588 404.1607 404.1626 404.1646
 [73] 404.1665 404.1684 404.1703 404.1723 405.1490 405.1510 405.1529 405.1549
 [81] 405.1568 405.1587 405.1607 405.1626 405.1645 405.1667 405.1686 405.1706
 [89] 405.1725 413.2499 413.2519 413.2539 413.2559 413.2579 413.2598 413.2618
 [97] 413.2638 413.2658 413.2678 413.2698
 [ reached getOption("max.print") -- omitted 10702 entries ]

$F1.S00004
  [1] 396.0221 396.0239 396.0258 396.0277 401.2025 401.2044 401.2063 401.2082
  [9] 401.2101 401.2120 401.2139 401.2158 401.2177 401.2196 401.2215 401.2234
 [17] 401.2253 402.1647 402.1666 402.1685 402.1704 402.1723 402.1742 402.1761
 [25] 402.1780 402.1800 402.1819 402.1838 402.1857 402.1876 402.1895 402.1914
 [33] 402.1934 403.1687 403.1706 403.1726 403.1745 403.1764 403.1783 403.1802
 [41] 403.1822 403.1841 403.1860 403.1879 403.1898 403.1917 403.1937 413.1664
 [49] 413.1684 413.1704 413.1724 413.1744 413.1764 413.1784 413.1803 413.1823
 [57] 413.1841 413.1861 413.1881 413.1901 413.2499 413.2519 413.2538 413.2558
 [65] 413.2578 413.2598 413.2618 413.2638 413.2658 413.2678 413.2698 413.2718
 [73] 413.2738 413.2760 413.2780 413.2799 413.2819 414.2559 414.2579 414.2599
 [81] 414.2619 414.2639 414.2659 414.2679 414.2699 414.2719 414.2739 414.2759
 [89] 414.2779 414.2799 415.0208 415.0228 415.0248 415.0268 415.0288 415.0308
 [97] 415.0328 415.0348 415.0368 415.0388
 [ reached getOption("max.print") -- omitted 10504 entries ]

$F1.S00005
  [1] 396.0212 396.0231 396.0249 396.0268 401.2035 401.2054 401.2073 401.2092
  [9] 401.2111 401.2130 401.2149 401.2168 401.2187 401.2207 401.2226 401.2245
 [17] 401.2264 402.1638 402.1657 402.1677 402.1696 402.1715 402.1734 402.1753
 [25] 402.1772 402.1791 402.1810 402.1830 402.1849 402.1868 402.1886 402.1906
 [33] 402.1925 402.1944 403.1678 403.1697 403.1717 403.1736 403.1755 403.1774
 [41] 403.1793 403.1813 403.1832 403.1851 403.1870 403.1889 403.1909 404.1448
 [49] 404.1467 404.1487 404.1506 404.1525 404.1544 404.1564 404.1583 404.1602
 [57] 404.1622 404.1641 404.1660 404.1679 404.1699 404.1718 404.1737 404.1756
 [65] 404.1776 404.1795 404.1814 404.1833 404.1853 404.1872 404.1891 408.2977
 [73] 408.2997 408.3017 408.3036 408.3056 408.3075 408.3095 408.3114 408.3134
 [81] 408.3154 408.3173 408.3193 408.3212 408.3232 411.1594 411.1613 411.1633
 [89] 411.1653 411.1673 411.1692 411.1712 411.1732 411.1752 411.1770 411.1789
 [97] 411.1809 411.1829 413.2490 413.2510
 [ reached getOption("max.print") -- omitted 12053 entries ]

$F1.S00006
  [1] 396.0219 396.0237 396.0256 396.0275 401.2023 401.2042 401.2061 401.2080
  [9] 401.2099 401.2118 401.2137 401.2156 401.2175 401.2194 401.2213 401.2232
 [17] 401.2251 402.1683 402.1702 402.1721 402.1740 402.1759 402.1778 402.1798
 [25] 402.1817 402.1836 402.1855 402.1874 402.1893 402.1912 402.1931 403.1685
 [33] 403.1704 403.1723 403.1743 403.1762 403.1781 403.1800 403.1819 403.1839
 [41] 403.1858 403.1877 403.1896 403.1915 403.7700 403.7720 403.7739 403.7758
 [49] 403.7777 403.7797 403.7816 403.7835 403.7854 403.7874 403.7893 403.7912
 [57] 403.7931 404.1474 404.1493 404.1513 404.1532 404.1551 404.1571 404.1590
 [65] 404.1609 404.1628 404.1648 404.1667 404.1686 404.1705 405.2885 405.2904
 [73] 405.2924 405.2943 405.2963 405.2982 405.3001 405.3021 405.3040 405.3059
 [81] 405.3079 405.3098 405.3117 408.2984 408.3004 408.3023 408.3043 408.3063
 [89] 408.3082 408.3102 408.3121 408.3141 408.3160 408.3180 408.3200 408.3219
 [97] 408.3239 411.1601 411.1620 411.1640
 [ reached getOption("max.print") -- omitted 10751 entries ]

with the following setup

> sessionInfo()
R version 3.5.1 Patched (2018-08-16 r75161)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] MSnbase_2.7.4       ProtGenerics_1.12.0 BiocParallel_1.14.2
[4] mzR_2.15.2          Rcpp_0.12.18        Biobase_2.40.0     
[7] BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] msdata_0.20.0         BiocInstaller_1.30.0  pillar_1.3.0         
 [4] compiler_3.5.1        plyr_1.8.4            bindr_0.1.1          
 [7] iterators_1.0.10      zlibbioc_1.26.0       tools_3.5.1          
[10] digest_0.6.15         MALDIquant_1.18       tibble_1.4.2         
[13] preprocessCore_1.42.0 gtable_0.2.0          lattice_0.20-35      
[16] pkgconfig_2.0.2       rlang_0.2.2           foreach_1.4.4        
[19] bindrcpp_0.2.2        dplyr_0.7.6           IRanges_2.14.10      
[22] S4Vectors_0.18.3      stats4_3.5.1          grid_3.5.1           
[25] tidyselect_0.2.4      glue_1.3.0            impute_1.54.0        
[28] R6_2.2.2              XML_3.98-1.16         limma_3.36.2         
[31] ggplot2_3.0.0         purrr_0.2.5           magrittr_1.5         
[34] scales_1.0.0          pcaMethods_1.72.0     codetools_0.2-15     
[37] MASS_7.3-50           mzID_1.18.0           assertthat_0.2.0     
[40] colorspace_1.3-2      affy_1.58.0           doParallel_1.0.11    
[43] lazyeval_0.2.1        munsell_0.5.0         vsn_3.48.1           
[46] crayon_1.3.4          affyio_1.50.0        

Indeed, accessing mz values is considerably slower, because the data needs to be retrieved from disk, while the retention time is readily available in the feature data. The timing in my example is, however, much shorter than what you report.
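The difference between the two accessors can be illustrated with a minimal sketch. It assumes that the example files shipped in the "sciex" folder of the msdata package are installed (they only stand in for real data, and that folder name is an assumption, not something taken from this thread) and it registers a serial backend so that the timing is not affected by the parallel setup.

```r
library(MSnbase)

## Assumption: example mzML files from the msdata package stand in for real data.
fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
stopifnot(length(fls) > 0)

## Serial evaluation, so no worker processes are involved in this comparison.
register(SerialParam())

x <- readMSData(fls, mode = "onDisk", msLevel = 1)

## Retention times are stored in the feature data, so rtime() needs no file access.
stopifnot(isTRUE(all.equal(unname(rtime(x)), fData(x)$retentionTime)))

## m/z values are read from the mzML files on demand. Subsetting the object
## first limits the read to the selected spectra.
x_small <- x[1:10]
system.time(mz_small <- mz(x_small))
lengths(mz_small)
```

Timing mz() on a small subset first gives a rough idea of how long the full call should take before committing to it.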

The files I used are 1.2G each. The timings above are considerably shorter for smaller files. What sizes are your files? Do you access your data on a remote disk?

Also tagging @jotsetung, who regularly analyses tens or hundreds of files (for RT alignment and feature grouping). Jo, what's the size of the files you analyse?

Author

shefalilathwal commented Aug 30, 2018

@lgatto Thank you for responding so quickly to my message. My files are approx. 500 MB each and they are saved on my local disk. The size of the mz list that I get for 2 files is around 670 MB (see attached screenshot of the R environment pane). Does that sound reasonable to you, or is something off here? The files are from polarity-switching DDA runs on a Thermo Fisher Q Exactive. I used msconvert to convert them to .mzML format and filtered them by MS level 1 and a single polarity before importing the data with MSnbase.
[Screenshot, 2018-08-30: R environment pane]

@jorainer
Collaborator

This may not be related to the files at all, but to the parallel processing setup. On Windows (and sometimes also on macOS), R uses socket-based parallel processing: the main process has to start new R instances and connect to them via sockets. Sometimes this connection cannot be established and the processes get stuck. To test this, you can simply disable parallel processing by calling register(SerialParam()) before calling mz.
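As a minimal sketch of that diagnostic step, the snippet below switches BiocParallel to serial evaluation and restores the previous default afterwards. The variable name `previous` is illustrative, and the mz() call is left as a comment because it needs the `raw_data` object from the original post.

```r
library(BiocParallel)

## Remember the currently registered default backend so it can be restored.
previous <- bpparam()

## Switch to serial evaluation: no worker processes, hence no socket connections.
register(SerialParam())
stopifnot(is(bpparam(), "SerialParam"))

## With MSnbase loaded and `raw_data` created as in the original post,
## the slow call can now be retried and timed:
## system.time(mzs <- mz(raw_data))

## Restore the previous default backend afterwards.
register(previous)
```

If mz() finishes in serial mode but hangs with the default backend, the parallel setup rather than the data is the likely cause.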

To avoid these deadlocks I usually initiate the parallel processing setup at the very beginning (after loading the libraries):

library(MSnbase)
library(doParallel)
registerDoParallel(3) # define number of parallel processes to be used
register(DoparParam(), default = TRUE)

## some code
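A related check, sketched here under the assumption that a socket-based SnowParam backend with two workers is acceptable on the machine, is to push a trivial job through the registered backend right after the setup. This is not part of the setup shown above, only an illustration: if the trivial job already hangs, the worker connections are the problem rather than the mzML files.

```r
library(BiocParallel)

## Assumption: two socket-based workers; SnowParam is the socket backend type.
param <- SnowParam(workers = 2)
register(param, default = TRUE)

## Trivial job sent through the registered default backend.
res <- bplapply(1:4, function(i) i^2)
stopifnot(identical(unlist(res), c(1, 4, 9, 16)))
```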

Owner

lgatto commented Sep 1, 2018

The files aren't that big (I fixed a typo in my earlier post: mine were 1.2 G, not 12 G). @shefalilathwal, could you try to disable parallel processing, as suggested by @jotsetung? Alternatively, you could share 2 files and I will try them on my computer.

@shefalilathwal
Author

@lgatto I tried what @jotsetung suggested, added the parallel processing setup at the beginning of my R script, and was able to process the files! So it must have been the connection, as @jotsetung suggested. Thank you both so much for your help! You can close the issue :)

Owner

lgatto commented Sep 3, 2018

Excellent, thank you for reporting back.

lgatto closed this as completed Sep 3, 2018

Collaborator

jorainer commented Sep 3, 2018

Perhaps this is something that should be added to the vignette?

Owner

lgatto commented Sep 3, 2018

Indeed. I won't have time today, but I am happy to (try to) do it tomorrow. We could add a note in the Speed and memory requirements sub-section at the very beginning, in the Introduction.

Owner

lgatto commented Sep 4, 2018

Done.
