handling data from different batches and experiments #344

Open
LeZhengThu opened this Issue Dec 20, 2018 · 1 comment


LeZhengThu commented Dec 20, 2018

Hi community, I'm not sure how to handle data from different batches and experiments using xcms.
For example, I have 10,000 samples from a large cohort A, collected to study disease A, and my instruments can only run 200 samples a day. It will therefore take 50 days to run all the samples through the mass spectrometers. How can I do alignment and correspondence across the batches?
A year later, I want to study disease B, for which I can't collect enough controls. So I want to use the cohort A samples without disease B as controls. How can I do alignment and correspondence in this scenario?
Any suggestions, tutorials, or example code would be appreciated. Thanks in advance.

Collaborator

jotsetung commented Dec 20, 2018

That's a good question. I would make sure to run several QC samples per measurement run, e.g. a pooled sample injected after every 8 or so injections. Just make sure that you use the same QC sample for the whole experiment. You could then even use these QC injections to estimate the retention time shifts when aligning the full experiment. Correspondence should then work out of the box after proper alignment; you will only have to decide what to define as a feature, i.e. in how many samples a peak has to be present to be grouped into a feature.
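To make the "in how many samples" decision concrete, here is a minimal sketch in plain Python. It is not xcms code: the function name `keep_feature`, the `min_fraction` parameter, and the toy intensities are all made up for illustration. It assumes a feature is kept when a peak was detected in at least a given fraction of the samples of at least one sample group (xcms offers a comparable `minFraction` setting in its peak density correspondence method, as far as I know).

```python
from typing import Dict, List, Optional


def keep_feature(
    intensities: List[Optional[float]],
    groups: List[str],
    min_fraction: float = 0.5,
) -> bool:
    """Return True if a peak is present in at least `min_fraction` of the
    samples of at least one group. `None` marks a sample with no peak."""
    counts: Dict[str, List[int]] = {}
    for value, group in zip(intensities, groups):
        found, total = counts.get(group, [0, 0])
        counts[group] = [found + (value is not None), total + 1]
    return any(found / total >= min_fraction for found, total in counts.values())


# Hypothetical toy data: three cases and three controls.
groups = ["case", "case", "case", "control", "control", "control"]
feature_a = [1200.0, None, 980.0, None, None, None]  # 2 of 3 cases
feature_b = [None, None, 450.0, None, 300.0, None]   # 1 of 3 in each group

print(keep_feature(feature_a, groups))
print(keep_feature(feature_b, groups))
```

Evaluating the fraction per group rather than over all samples keeps features that are specific to one condition, which matters when cases are rare in a large cohort.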

It will be tricky to compare a data set measured this year with any future data set. However, if you also use the same QC samples for the second data set, it should be possible to do an alignment.
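The idea of using a shared QC sample to estimate retention time shifts could be sketched roughly like this. This is a simplified, generic illustration in plain Python, not the alignment algorithm xcms uses. It assumes you have picked a handful of landmark peaks in the QC sample, know their retention times in a reference run and in the batch to be corrected, and are happy with linear interpolation between landmarks. All names and numbers are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def interpolate_shift(
    rt: float, observed: Sequence[float], shifts: Sequence[float]
) -> float:
    """Linearly interpolate the shift at `rt` between QC landmarks.
    `observed` must be sorted ascending with distinct values; outside the
    landmark range the nearest landmark's shift is used."""
    if rt <= observed[0]:
        return shifts[0]
    if rt >= observed[-1]:
        return shifts[-1]
    i = bisect_left(observed, rt)
    x0, x1 = observed[i - 1], observed[i]
    weight = (rt - x0) / (x1 - x0)
    return shifts[i - 1] + weight * (shifts[i] - shifts[i - 1])


def correct_rts(
    sample_rts: Sequence[float],
    reference_qc: Sequence[float],
    batch_qc: Sequence[float],
) -> List[float]:
    """Map retention times from a batch onto the reference time axis."""
    # Shift of each landmark: observed in this batch minus reference.
    shifts = [obs - ref for obs, ref in zip(batch_qc, reference_qc)]
    return [rt - interpolate_shift(rt, batch_qc, shifts) for rt in sample_rts]


# Hypothetical landmark retention times (seconds) of the same QC sample.
reference_qc = [60.0, 180.0, 300.0]  # reference run
batch_qc = [62.0, 184.0, 306.0]      # later batch, drifting by 2 to 6 s

sample_rts = [62.0, 123.0, 306.0, 400.0]
corrected = correct_rts(sample_rts, reference_qc, batch_qc)
print(corrected)
```

Because the same QC material is injected in both data sets, the landmarks give a common time axis even when the study samples themselves differ, which is the situation with the second cohort.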

Apart from alignment and correspondence, I would be more concerned about systematic differences in the signal that you measure. You should make sure to apply a proper normalization to adjust for any systematic abundance differences between batches. All in all it is a tricky situation, and I don't think there is one simple golden solution to your problem.
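As one possible illustration of such a normalization, the sketch below scales each batch so that its QC median matches the overall QC median for a single feature. This is only one simple option among many, chosen here for brevity and not something the thread prescribes. It assumes the same QC sample was measured in every batch, and the function name `qc_median_scale` and the data are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

# One measurement of a single feature: (batch, is_qc, intensity).
Measurement = Tuple[str, bool, float]


def qc_median_scale(measurements: List[Measurement]) -> List[float]:
    """Scale every batch so its QC median equals the overall QC median.
    Assumes each batch contains at least one QC injection."""
    target = median(v for _, is_qc, v in measurements if is_qc)
    qc_by_batch: Dict[str, List[float]] = {}
    for batch, is_qc, value in measurements:
        if is_qc:
            qc_by_batch.setdefault(batch, []).append(value)
    factors = {b: target / median(vs) for b, vs in qc_by_batch.items()}
    return [value * factors[batch] for batch, _, value in measurements]


# Hypothetical data: day 2 reads about twice as high as day 1.
measurements = [
    ("day1", True, 100.0), ("day1", True, 100.0),
    ("day1", False, 80.0), ("day1", False, 120.0),
    ("day2", True, 200.0), ("day2", True, 200.0),
    ("day2", False, 160.0), ("day2", False, 240.0),
]
normalized = qc_median_scale(measurements)
print(normalized)
```

A per-batch scaling factor like this only removes a constant offset between batches; drift within a batch would need something more elaborate, such as a fit over injection order.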
