
Slow to process QC data #6

Closed
sean-ocall opened this issue Nov 23, 2018 · 12 comments

@sean-ocall

Hi guys,

I've been successfully using RawTools for QC for a few weeks now, since the last release sorted an issue I previously had. Browsing the preprint, it seems the QC processing should run quite quickly, but my files take at least two hours per .raw file to produce an output.

Details:
Raw file size: ~2.5 GB
MS run length: ~240 min
Run using Mono on Ubuntu
16 GB RAM, i7-8550U processor.

Command used: `mono RawTools.exe qc -d path_to_folder_with_rawfiles -q output_folder`

Does this seem right? Or is there an issue here somewhere?
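
For anyone reproducing this, a quick way to put numbers on "two hours per file" is to wrap the same command in the shell's `time` builtin (a minimal sketch; the paths are the same placeholders used in the command above):

```bash
# Wall-clock timing for a single QC pass over the folder.
# path_to_folder_with_rawfiles and output_folder are placeholders,
# exactly as in the command above.
time mono RawTools.exe qc -d path_to_folder_with_rawfiles -q output_folder
```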

@sean-ocall sean-ocall changed the title Time to process QC data Slow to process QC data Nov 23, 2018
@kevinkovalchik
Owner

kevinkovalchik commented Nov 23, 2018 via email

@sean-ocall
Author

Hi Kevin, nothing on PRIDE - I work for a biotech company so no public data. I'll take a look and find something I can send your way.

@chrishuges
Collaborator

chrishuges commented Nov 24, 2018

Hi Sean,

Are you able to tell us anything about the datafile itself? For example, what was the acquisition mode that was used? MS2 in the Orbitrap? Standard DDA run?

We might have some long runs similar to your analysis kicking around our lab that we can test on (although 4 hours does seem very long).

Cheers,
Chris

@kevinkovalchik
Copy link
Owner

Hi Sean,

When processing your files, does the RAM usage approach the limits of the system? We noticed on our Linux server that RAM usage was in excess of 13 GB for a 3 GB file (as opposed to my laptop, which used ~3 GB for the same file). This is probably because the server has a huge amount of RAM and Mono just didn't bother cleaning things up at that point, but I'd like to make sure the same thing isn't happening on systems with more typical amounts of RAM.

Kevin
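
One way to check this on Linux is GNU time's verbose mode, which reports peak resident memory for the whole run; and if the SGen collector really is being lazy, Mono's heap can be capped via `MONO_GC_PARAMS` (a sketch under those assumptions; the 4g cap is an arbitrary example value, not a RawTools recommendation, and the paths are the same placeholders as above):

```bash
# Report peak RAM ("Maximum resident set size") for a QC run.
/usr/bin/time -v mono RawTools.exe qc -d path_to_folder_with_rawfiles -q output_folder

# Cap Mono's SGen heap so the GC cleans up more aggressively.
# 4g is an arbitrary example value.
MONO_GC_PARAMS=max-heap-size=4g mono RawTools.exe qc -d path_to_folder_with_rawfiles -q output_folder
```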

@kevinkovalchik
Owner

Hi Sean,

I've done some optimization that speeds up the processing of big files quite a bit! Previously we had just been running on a single thread because, with the files we typically process, things ran so fast we never bothered to multi-thread. I've now implemented multi-threading in the most costly steps, and we're seeing speed improvements of 1.5- to 3-fold depending on the computer and the file.

I'll be releasing the multi-threaded version as a pre-release sometime in the next several days. I'll let you know here when it's available.

Kevin

@sean-ocall
Author

sean-ocall commented Nov 27, 2018

Hi Guys,

> Are you able to tell us anything about the datafile itself? For example, what was the acquisition mode that was used? MS2 in the Orbitrap? Standard DDA run?

Standard DDA, MS2 in the Orbitrap, full scan.

> When processing your files, does the RAM usage approach the limits of the system? We noticed on our Linux server that RAM usage was in excess of 13 GB for a 3 GB file

This is something I have been watching out for, and I haven't seen RAM usage go over 4 GB, so I don't seem to have an issue there.

> I've done some optimization that speeds up the processing of big files quite a bit! Previously we had just been running on a single thread

Great, I'll check this out and test it as soon as it appears.

Also, I ran quite a few RawTools QC runs over the weekend, and with nothing else running on the computer I did about 54 runs in 60 or so hours (roughly 65-70 minutes per file), so things aren't quite as slow as I thought.

@chrishuges
Collaborator

Hi Sean,

Thanks for the info. Just to add to Kevin's comments above.

We ended up testing the speed of QC with some raw files we downloaded from this study. It includes some single-shot runs that are huge files: a 120-minute run with 282,799 MS2 scans (3.0 GB file) and a 180-minute run with 437,090 MS2 scans (4.7 GB file), both standard DDA with IT-MS2. These are big files with a ton of MS2 events, and RawTools is definitely slower processing them. RawTools 1.3.3 requires about 30 minutes for the 3.0 GB file and 1.5 hours for the 4.7 GB file. The new version Kevin mentions above requires about 10 minutes for the 3.0 GB file and 60 minutes for the 4.7 GB one on standard hardware (roughly 3-fold and 1.5-fold speedups, respectively). So good improvements.

Cheers,
Chris

@kevinkovalchik
Owner

kevinkovalchik commented Dec 4, 2018

We just released v1.4.0-beta, which should hopefully run faster for you! It's a pre-release, so please let us know if you run into any issues and we'll try to sort them out.

@kevinkovalchik
Owner

I noticed the output files from parse have different filename endings in the beta version. They are `_parse.txt` (or something similar) instead of `_Matrix.txt`. Just wanted you to be aware in case you are using those files in a workflow.
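
For anyone whose workflow still expects the old names, a shell rename along these lines should bridge the gap (a sketch; the exact `_parse.txt` suffix is approximate per the note above, so verify the real suffix in the output folder first):

```bash
# Rename beta-style parse outputs back to the old _Matrix.txt convention.
# The _parse.txt suffix is approximate ("or something similar" above);
# check the actual filenames in output_folder before running.
for f in output_folder/*_parse.txt; do
    mv "$f" "${f%_parse.txt}_Matrix.txt"
done
```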

@kevinkovalchik
Owner

Hi Sean,

Was the beta release able to process your files any quicker?

Thanks,
Kevin

@sean-ocall
Author

Sorry, I forgot to have a go at this last week.

I've just tried a few files, and they seem to run in about 45 minutes as opposed to a couple of hours with the previous release. Seems like quite an improvement.

Thanks, Sean

@kevinkovalchik
Owner

Good to know, thanks!

Kevin
