
FlashLFQ's Settings

Robert Millikin edited this page Oct 25, 2019 · 13 revisions

General Settings

PPM Tolerance - The mass tolerance, in parts per million, for finding mass spectral peaks that correspond to the input identifications. This tolerance applies to both MS/MS-identified species and match-between-runs species. A mass tolerance of 5-10 ppm is typical for Orbitrap instruments.
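As an illustration of what a ppm tolerance means in m/z units, here is a minimal Python sketch. The function names are hypothetical and this is not FlashLFQ's own code; it only shows the arithmetic behind a ppm window.

```python
from typing import Tuple


def ppm_window(theoretical_mz: float, ppm_tolerance: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds implied by a ppm tolerance."""
    delta = theoretical_mz * ppm_tolerance / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def within_ppm(observed_mz: float, theoretical_mz: float, ppm_tolerance: float) -> bool:
    """True if the observed peak lies within the ppm tolerance of the theoretical m/z."""
    error_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return abs(error_ppm) <= ppm_tolerance


# At m/z 500, a 10 ppm tolerance corresponds to a window of +/- 0.005 m/z.
low, high = ppm_window(500.0, 10.0)
print(low, high)
print(within_ppm(500.004, 500.0, 10.0))  # about 8 ppm away
print(within_ppm(500.006, 500.0, 10.0))  # about 12 ppm away
```

Because the tolerance is relative, the same ppm value gives a wider absolute window at higher m/z.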

Normalize Intensities - This option corrects systematic intensity errors introduced by instrument drift, sample preparation, etc. There are many types of normalization (see this excellent review). FlashLFQ uses a median-center normalization, which sets the median peptide log fold-change between any two samples to zero (that is, a median ratio of 1). The central assumption of this type of normalization (and most types of normalization) is that most proteins do not change in abundance between samples. If you expect most of your proteins to change in abundance, such as in certain types of pulldown experiments, then FlashLFQ's normalization may not be appropriate for your experiment. In that case, you may need to quantify peptides with FlashLFQ, normalize with separate software, and then re-import your quantitative results into FlashLFQ for statistical analysis. See the normalization page for more info. Enabling this option is recommended for most experiments.
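The idea of median centering can be sketched in a few lines of Python. This is a simplified illustration and not FlashLFQ's actual implementation: it assumes the first sample acts as the reference, and the function name and data layout are hypothetical.

```python
import math
from statistics import median


def median_center(intensities):
    """Scale each sample so its median log2 ratio to the reference sample is zero.

    intensities: dict of sample name -> dict of peptide -> intensity.
    The first sample is used as the reference (an assumption of this sketch).
    """
    samples = list(intensities)
    reference = intensities[samples[0]]
    normalized = {}
    for sample in samples:
        current = intensities[sample]
        # Only peptides quantified in both runs contribute to the median.
        log_ratios = [
            math.log2(current[p] / reference[p])
            for p in current
            if p in reference and current[p] > 0 and reference[p] > 0
        ]
        shift = median(log_ratios) if log_ratios else 0.0
        factor = 2.0 ** (-shift)
        normalized[sample] = {p: v * factor for p, v in current.items()}
    return normalized


raw = {
    "run_a": {"pep1": 100.0, "pep2": 200.0, "pep3": 400.0},
    # run_b is systematically 2x brighter; pep3 also changes on top of that.
    "run_b": {"pep1": 200.0, "pep2": 400.0, "pep3": 1600.0},
}
print(median_center(raw)["run_b"])
```

In this toy data the systematic two-fold offset is removed while the additional change in pep3 survives, which only works because most peptides are assumed not to change.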

Match Between Runs - In LC-MS/MS experiments, the mass spectrometer often cannot fragment all species eluting from the column simply because of time constraints. This leaves some number of species unfragmented and unidentified. Match-between-runs is a strategy which identifies these eluting species from their retention time and isotopic envelopes. If a species was fragmented and identified in one run, that identification can be mapped onto other runs in which that species was not identified. Generally, these identifications are less reliable than their MS/MS-identified counterparts, but are fairly accurate. You may want to leave this option off because of runtime constraints, or if you prefer precise quantification with a higher number of missing values. See the match between runs page for more info. This setting is generally recommended to be enabled.

Use shared peptides for protein quantification - Some peptide sequences can be shared between proteins. In this case, if a peptide is changing in abundance by a large amount, it is not clear which protein is changing. Most analyses do not use shared peptides for quantification for this reason. One caveat to this setting is that FlashLFQ only knows about the proteins that are input along with the identification. Some search programs (e.g., Morpheus) only report a single protein (or a subset of their parent proteins) per peptide. In these cases, peptides are sometimes inappropriately marked as "not shared" when they should technically be marked "shared".
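The caveat above can be made concrete with a small Python sketch. The function name and the example accessions are hypothetical; the point is only that "shared" is decided from whatever protein list accompanies each identification.

```python
def unique_peptides(peptide_to_proteins):
    """Return the peptides that map to exactly one protein in the input."""
    return {
        peptide
        for peptide, proteins in peptide_to_proteins.items()
        if len(set(proteins)) == 1
    }


# Full protein lists: PEPTIDEA is shared between two proteins.
full_report = {"PEPTIDEA": ["P1", "P2"], "PEPTIDEB": ["P1"]}

# A search program that reports only one parent protein per peptide.
truncated_report = {"PEPTIDEA": ["P1"], "PEPTIDEB": ["P1"]}

print(sorted(unique_peptides(full_report)))
print(sorted(unique_peptides(truncated_report)))
```

With the truncated report, PEPTIDEA looks unique even though it is shared, which is the situation described above.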

Bayesian Protein Fold-Change Analysis - This estimates each protein's fold-change relative to some control using Bayesian statistics. It also provides a measure of statistical confidence, the "posterior error probability" (PEP). The PEP is the probability that a protein's fold change is below the specified value (the fold-change cutoff). See the Bayesian statistical analysis page for more info.
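One way to picture the PEP is as the fraction of posterior samples that fall below the fold-change cutoff. The Python sketch below assumes the samples are log fold-changes and that the comparison uses their absolute value; both are assumptions of this illustration, and the function name is hypothetical. It is not FlashLFQ's actual computation.

```python
def posterior_error_probability(fold_change_samples, cutoff):
    """Fraction of posterior samples whose magnitude is below the cutoff."""
    below = sum(1 for fc in fold_change_samples if abs(fc) < cutoff)
    return below / len(fold_change_samples)


# Made-up posterior samples of a protein's log fold-change.
samples = [0.1, 0.9, 1.2, 1.5, -0.2, 1.1, 0.8, 1.3, 1.4, 1.0]
print(posterior_error_probability(samples, cutoff=0.5))
```

A small PEP means that few plausible fold-change values are smaller than the cutoff, so the change is unlikely to be noise.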

Control Condition - This is used for the Bayesian statistical analysis only. It is the condition against which protein fold-changes are calculated. For example, if you want to determine which proteins are changing in abundance in tumor samples compared to normal tissue, you would set "normal" as your control condition; "tumor" would be the treatment condition.

Fold-change cutoff - This is used for the Bayesian statistical analysis only. This is the protein fold-change that is considered to be not important, or "noise". FlashLFQ can estimate a fold-change cutoff for you from the data, or you can input your own.

Advanced Settings

Integrate Peak Areas - By default, FlashLFQ reports the peak height, not the peak area. Enabling integration reports the peak area instead. This generally results in noisier quantification, so leaving it disabled is recommended.
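The difference between the two measures can be sketched as follows. This Python example uses trapezoidal integration as a stand-in; how FlashLFQ integrates peaks internally is not stated here, and the function names and data are hypothetical.

```python
def peak_height(points):
    """Maximum intensity over the chromatographic peak."""
    return max(intensity for _, intensity in points)


def peak_area(points):
    """Trapezoidal area under (retention time, intensity) points sorted by time."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(points, points[1:]):
        area += (t1 - t0) * (i0 + i1) / 2.0
    return area


# (retention time in minutes, intensity) for one made-up chromatographic peak.
trace = [(10.0, 0.0), (10.1, 50.0), (10.2, 100.0), (10.3, 50.0), (10.4, 0.0)]
print(peak_height(trace))
print(peak_area(trace))
```

The height depends on a single point at the apex, while the area depends on every point in the trace, including low-intensity points at the edges of the peak.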

Only quantify identified charge - By default, FlashLFQ quantifies all charge states for a given identification, even those in which it was not specifically identified. For example, if a peptide is fragmented in the +2 charge state, FlashLFQ will also look for charge +3, etc. Enabling this setting disables that behavior, so that only the identified charge state is quantified.
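To see where the other charge states of a peptide would appear, the m/z of each charge can be derived from the identified one. This is a minimal Python sketch with hypothetical function names; it shows only the standard mass-to-charge arithmetic, not FlashLFQ's code.

```python
PROTON_MASS = 1.007276  # Da, rounded


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass from an observed m/z and charge."""
    return (mz - PROTON_MASS) * charge


def mz_for_charge(mass, charge):
    """Expected m/z of a neutral mass at a given positive charge."""
    return mass / charge + PROTON_MASS


# A peptide identified at m/z 500.0 in the +2 charge state.
mass = neutral_mass(500.0, 2)
for z in (1, 2, 3, 4):
    print(z, round(mz_for_charge(mass, z), 4))
```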

Require MS/MS identification in condition - This is used for match-between-runs only. Enabling this setting will prevent FlashLFQ from matching peptides from proteins that were not observed in a particular condition. For example, in the "normal vs. tumor" example above, if a protein is not observed in the "normal" samples, then peptides from that protein would not be matched from the "tumor" runs onto the "normal" runs. This generally results in lower sensitivity and is not recommended. This setting is for users who prefer more missing values instead of "noise" or "background"-level match-between-runs intensities.

Isotope PPM tolerance - The mass tolerance, in parts per million, for isotope peaks (e.g., M+1, M+2 peaks). Usually 5 ppm is sufficient.

Number of isotopes required - The number of isotopic peaks required to detect an isotopic envelope. This is set to 2 by default, but can be increased.
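The two isotope settings can be pictured together with a short Python sketch: isotope peaks are expected at regular m/z spacing above the monoisotopic peak, and an envelope is accepted once enough consecutive peaks are found. Whether FlashLFQ counts the monoisotopic peak toward the required number is an assumption here, and the function name and defaults are hypothetical.

```python
C13_C12_DIFF = 1.003355  # Da, mass difference between 13C and 12C (rounded)


def count_isotope_peaks(mono_mz, charge, observed_mzs, ppm_tolerance=5.0, max_isotopes=5):
    """Count consecutive isotope peaks (M, M+1, ...) found within the ppm tolerance."""
    found = 0
    for k in range(max_isotopes):
        expected = mono_mz + k * C13_C12_DIFF / charge
        matched = any(
            abs(mz - expected) / expected * 1e6 <= ppm_tolerance
            for mz in observed_mzs
        )
        if not matched:
            break  # stop at the first missing isotope
        found += 1
    return found


observed = [500.0005, 500.5017, 501.2]
n_found = count_isotope_peaks(500.0, 2, observed)
print(n_found, n_found >= 2)  # compare against "number of isotopes required"
```

Raising the required number makes detection stricter, at the cost of missing low-abundance species whose higher isotope peaks fall below the noise.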

Maximum MBR window (minutes) - The retention-time error allowed in match-between-runs. This is not the systematic time shift between runs, but the "variance" or "error" in retention time that is tolerated around the expected position.
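A heavily simplified picture of how a retention-time window and a ppm tolerance combine in match-between-runs is sketched below in Python. It assumes the expected retention time has already been corrected for systematic shift, treats the window as a maximum absolute deviation, and skips isotopic envelope checks. The function name and default values are placeholders, not FlashLFQ's.

```python
def find_mbr_candidate(donor_mz, expected_rt, acceptor_peaks,
                       ppm_tolerance=10.0, max_rt_window=1.0):
    """Return the most intense acceptor peak within both tolerances, or None.

    acceptor_peaks: iterable of (mz, retention_time_minutes, intensity).
    """
    candidates = []
    for mz, rt, intensity in acceptor_peaks:
        ppm_error = abs(mz - donor_mz) / donor_mz * 1e6
        if ppm_error <= ppm_tolerance and abs(rt - expected_rt) <= max_rt_window:
            candidates.append((mz, rt, intensity))
    if not candidates:
        return None
    return max(candidates, key=lambda peak: peak[2])


peaks = [
    (500.001, 30.5, 1e5),  # within both tolerances
    (500.002, 40.0, 5e5),  # right mass, wrong retention time
    (500.020, 30.2, 9e5),  # right retention time, wrong mass
]
print(find_mbr_candidate(500.0, 30.0, peaks))
```

A wider window tolerates less reproducible chromatography but admits more candidate peaks, which raises the chance of a wrong match.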

MCMC Iterations - This is used for the Bayesian statistical analysis only. This is the number of iterations to perform for the Markov-Chain Monte Carlo sampler. A larger number of iterations will result in a slower but more precise analysis of the protein fold-change, the PEP, and the FDR. Typically, there is more variance in the estimate of these values for proteins that have few (<5) fold-change measurements. The default number of MCMC iterations is set to 3000.

MCMC Random Seed - This is used for the Bayesian statistical analysis only. The Markov-Chain Monte Carlo sampler uses a random number generator, so each time the statistical analysis is performed, a slightly different answer will result. This makes it difficult to reproduce someone else's work. The random number generator is initialized using a "random seed", which determines the random numbers generated. You can enter a particular random seed to reproduce another analysis's output exactly. In most cases, you do not need to set this yourself unless you are reproducing an analysis.
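The effect of a seed can be demonstrated with a toy example in Python. This sketch draws independent random numbers rather than running a real Markov-Chain Monte Carlo sampler, and the function name and distribution are made up; it only illustrates that a fixed seed makes a randomized estimate repeatable.

```python
import random


def monte_carlo_mean(n_iterations, seed=None):
    """Estimate a mean from random draws; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)
    draws = [rng.gauss(1.0, 0.5) for _ in range(n_iterations)]
    return sum(draws) / n_iterations


# Same seed: identical estimates.
print(monte_carlo_mean(3000, seed=42) == monte_carlo_mean(3000, seed=42))

# No seed: estimates differ slightly from run to run.
print(monte_carlo_mean(3000), monte_carlo_mean(3000))
```

Increasing the number of iterations shrinks the run-to-run differences, while fixing the seed removes them entirely for a given iteration count.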