MSP files management

Sadjad F Baygi edited this page Jan 22, 2023 · 16 revisions

IDSL.FSA was designed to manage .msp mass spectrometry files of various structures without any pre-processing. It provides multiple easy-to-use modules for managing .msp files, a number of which are summarized below:

msp2FSdb

The msp2FSdb module can generate organized Fragmentation Spectra DataBase (FSDB) libraries for data parsing from one or multiple .msp files for comprehensive screening. Additionally, this module can deconvolute MSP blocks that contain multiple PrecursorMZ values in a single .msp line (e.g. PrecursorMZ: 208.0615, 146.0611 for N-Benzoylserine in negative mode). The msp2FSdb module was designed to be consistent with various .msp file structures, particularly those from the NIST, GNPS, MoNA, and IDSL.CSA libraries. In general, the msp2FSdb module works for any .msp file as long as Num Peaks rows are available in the file.

msp2FSdb(path, MSPfile_vector = "", massIntegrationWindow = 0, allowedNominalMass = FALSE,
	 allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)

path: address of .msp file

MSPfile_vector: a vector of .msp file names

massIntegrationWindow: Mass accuracy in Da

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
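
The two ideas described for msp2FSdb, splitting a PrecursorMZ line that carries several values and removing noise relative to the base peak, can be illustrated with a minimal base-R sketch. This is not the package's implementation: the block content, the peak values, and the choice of a "greater than or equal to" threshold are assumptions made only for illustration.

```r
# One MSP block as a character vector. The PrecursorMZ line follows the
# example in the text; the peak list is made up for illustration.
msp_block <- c(
  "Name: example block",
  "PrecursorMZ: 208.0615, 146.0611",
  "Num Peaks: 3",
  "74.0248 2",
  "102.0350 350",
  "146.0611 999"
)

# Split the comma-separated PrecursorMZ values into a numeric vector.
precursor_line <- grep("^PrecursorMZ:", msp_block, value = TRUE, ignore.case = TRUE)
precursor_mz <- as.numeric(
  strsplit(sub("^[^:]*:\\s*", "", precursor_line), "\\s*,\\s*")[[1]]
)

# Use the Num Peaks row to locate and read the peak list.
num_peaks_index <- grep("^Num Peaks:", msp_block, ignore.case = TRUE)
n_peaks <- as.integer(sub("^[^:]*:\\s*", "", msp_block[num_peaks_index]))
peak_lines <- msp_block[num_peaks_index + seq_len(n_peaks)]
peaks <- do.call(rbind, lapply(strsplit(peak_lines, "\\s+"), as.numeric))
colnames(peaks) <- c("mz", "intensity")

# Keep peaks at or above a fraction of the base peak (assumed rule).
noise_removal_ratio <- 0.01
base_peak <- max(peaks[, "intensity"])
kept_peaks <- peaks[peaks[, "intensity"] >= noise_removal_ratio * base_peak, , drop = FALSE]

print(precursor_mz)
print(kept_peaks)
```

The sketch relies on the Num Peaks row to find the peak list, which mirrors the requirement stated above that Num Peaks rows must be present.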

msp2TrainingMatrix

The msp2TrainingMatrix module can generate an aligned match table using ions from individual MSP blocks.

msp2TrainingMatrix(path, MSPfile = "", minDetectionFreq = 100, selectedFSdbIDs = NULL, dimension = "wide",
		   massAccuracy = 0.01, allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE,
		   noiseRemovalRatio = 0.01, number_processing_threads = 1)

path: Address of .msp file

MSPfile: A .msp file name or FSDB in .Rdata format

minDetectionFreq: The minimum detection frequency of an ion across all MSP blocks

selectedFSdbIDs: selected MSP block/FSDB IDs to limit the screening to specific ion blocks

dimension: c("wide", "long"). wide or long alignment matrix output

massAccuracy: Mass accuracy (Da)

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
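
To make the difference between the "long" and "wide" outputs concrete, the following base-R sketch filters ions by detection frequency and reshapes a long table into a wide one. It is only one plausible reading of the parameters above: the data are made up, the variable names are hypothetical, and the threshold is treated as a count of blocks, which may not match how minDetectionFreq is defined in the package.

```r
# Long format: one row per (MSP block, ion) pair. All values are made up.
long <- data.frame(
  block_id  = c(1, 1, 2, 2, 2, 3),
  ion_mz    = c(85.03, 127.04, 85.03, 127.04, 145.05, 85.03),
  intensity = c(100, 40, 80, 100, 20, 60)
)

# Hypothetical threshold, treated here as a number of blocks.
min_detection_freq <- 2

# Number of blocks in which each ion was detected.
detection_freq <- ave(rep(1, nrow(long)), long$ion_mz, FUN = sum)
long_kept <- long[detection_freq >= min_detection_freq, ]

# Wide format: blocks in rows, retained ions in columns, 0 where absent.
wide <- xtabs(intensity ~ block_id + ion_mz, data = long_kept)

print(long_kept)
print(wide)
```

The long table is compact when most ions are absent from most blocks, whereas the wide table is the shape usually expected by modelling functions.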

mgf2msp

The mgf2msp module can convert Mascot generic format files (.mgf) into the NIST mass spectra format (.msp). The module is fast and requires <2 sec for .mgf files with ~5,000 fragmentation blocks on a single thread. The converted files are stored in the same directory with the .msp extension.

mgf2msp(path, MGFfile = "")

path: Location of the original .mgf file

MGFfile: Name of the mgf file with its extension
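
As a rough illustration of what such a conversion involves, the sketch below rewrites a single MGF ion block as an MSP-style block in base R. The field mapping (TITLE to Name, PEPMASS to PrecursorMZ) and the example values are assumptions for illustration; mgf2msp itself may map more fields or handle them differently.

```r
# A single MGF block; the values are illustrative only.
mgf_block <- c(
  "BEGIN IONS",
  "TITLE=example_spectrum",
  "PEPMASS=208.0615",
  "CHARGE=1-",
  "146.0611 100",
  "102.0350 35",
  "END IONS"
)

# Drop the block delimiters, then separate KEY=VALUE headers from peak rows.
body <- mgf_block[!mgf_block %in% c("BEGIN IONS", "END IONS")]
is_header <- grepl("=", body, fixed = TRUE)
header <- body[is_header]
peaks <- body[!is_header]

# Named vector of header values, e.g. header_values[["TITLE"]].
header_values <- sub("^[^=]*=", "", header)
names(header_values) <- sub("=.*$", "", header)

# Assemble an MSP-style block; a blank line terminates the block.
msp_block <- c(
  paste0("Name: ", header_values[["TITLE"]]),
  paste0("PrecursorMZ: ", header_values[["PEPMASS"]]),
  paste0("Num Peaks: ", length(peaks)),
  peaks,
  ""
)

writeLines(msp_block)
```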

mspPosNegSplitter

In many instances, public .msp libraries include both positive and negative fragmentation data in one .msp file. IDSL.FSA therefore provides a module, mspPosNegSplitter, to separate positive and negative MSP blocks for rapid and efficient annotation. This module is easy to use:

mspPosNegSplitter(path, MSPfile = "", number_processing_threads = 1)

path: Location of the original .msp file

MSPfile: Name of the .msp file with its extension

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments

The separated MSP blocks are stored in the same directory with "_Pos" and "_Neg" suffixes.
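
The splitting step can be sketched in base R as follows. The sketch assumes that MSP blocks are separated by blank lines and that each block carries an "Ion_mode:" row starting with P or N; real libraries name and encode this field in different ways, so this is an illustration rather than the logic used by mspPosNegSplitter.

```r
# Two MSP blocks separated by blank lines; the content is made up.
msp_lines <- c(
  "Name: compound A", "Ion_mode: P", "Num Peaks: 1", "85.03 100", "",
  "Name: compound B", "Ion_mode: N", "Num Peaks: 1", "146.06 100", ""
)

# A new block starts on the first line and after every blank line.
block_id <- cumsum(c(TRUE, head(msp_lines, -1) == ""))
blocks <- split(msp_lines, block_id)

# Return "P", "N", or NA for a block, based on the assumed Ion_mode row.
get_ion_mode <- function(block) {
  mode_line <- grep("^Ion_mode:", block, value = TRUE, ignore.case = TRUE)
  if (length(mode_line) == 0) {
    return(NA_character_)
  }
  toupper(substr(trimws(sub("^[^:]*:", "", mode_line[1])), 1, 1))
}
ion_modes <- vapply(blocks, get_ion_mode, character(1))

# Collect the lines of each polarity; these could be written with writeLines().
pos_lines <- unlist(blocks[ion_modes %in% "P"], use.names = FALSE)
neg_lines <- unlist(blocks[ion_modes %in% "N"], use.names = FALSE)

print(pos_lines)
print(neg_lines)
```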

FSdb2precursorType

The FSdb2precursorType module can detect potential ionization pathways for molecular formulas using a vector of InChIKey values from an FSDB. This module only searches for the first 14 InChIKey letters and may therefore return multiple potential precursor types. It returns a matrix of frequencies for each InChIKey in the FSDB. The headers of the matrix columns represent precursor types.

FSdb2precursorType(InChIKeyVector, libFSdb, tableIndicator = "Frequency", number_processing_threads = 1)

InChIKeyVector: A vector of InChIKey values. This value may contain whole InChIKey strings or first 14 InChIKey letters.

libFSdb: An FSDB produced by the IDSL.FSA package, i.e. an MSP library reference file converted using the msp2FSdb module.

tableIndicator: c("Frequency", "PrecursorMZ"). Whether to show the frequency or the median of PrecursorMZ values in the output dataframe for each precursor type.

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
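
The matching on the first 14 InChIKey letters and the two table types can be pictured with a small base-R sketch. The library entries below are hypothetical (the InChIKeys are placeholders and the m/z values are made up), and the data frame layout is an assumption rather than the internal structure of an FSDB.

```r
# Placeholder first blocks (14 letters each); not real compounds.
key_a <- strrep("A", 14)
key_c <- strrep("C", 14)

# Hypothetical library entries with made-up precursor m/z values.
library_entries <- data.frame(
  InChIKey = c(paste0(key_a, "-UHFFFAOYSA-N"), paste0(key_a, "-BBBBBBBBSA-N"),
               paste0(key_a, "-UHFFFAOYSA-N"), paste0(key_c, "-UHFFFAOYSA-N")),
  PrecursorType = c("[M+H]+", "[M+H]+", "[M+Na]+", "[M-H]-"),
  PrecursorMZ = c(181.0707, 181.0709, 203.0526, 179.0561),
  stringsAsFactors = FALSE
)

# Only the first 14 letters are compared, so entries that differ in the
# second InChIKey block are counted together.
first_block <- substr(library_entries$InChIKey, 1, 14)

# Frequency of each precursor type per first block.
frequency <- table(first_block, library_entries$PrecursorType)

# Median PrecursorMZ per first block and precursor type (NA where absent).
median_mz <- tapply(
  library_entries$PrecursorMZ,
  list(first_block, library_entries$PrecursorType),
  median
)

print(frequency)
print(median_mz)
```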

FSA_msp2Cytoscape

The FSA_msp2Cytoscape module performs pairwise MSP block analysis to create Cytoscape network files. This module is especially useful for finding related peaks in an analysis.

FSA_msp2Cytoscape(path, MSPfile = "", mspVariableVector = NULL, mspNodeID = NULL,
		  massError = 0.01, RTtolerance = NA, minEntropySimilarity = 0.75, allowedNominalMass = FALSE,
		  allowedWeightedSpectralEntropy = TRUE, noiseRemovalRatio = 0.01, number_processing_threads = 1)

path: address of .msp file

MSPfile: A .msp file name or FSDB in .Rdata format

mspVariableVector: a vector of MSP variables

mspNodeID: MSP Node ID which is the ID that is required for the specsim ID generation

massError: Mass accuracy in Da

RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time match. This option is especially beneficial to find co-occurring compounds.

minEntropySimilarity: Minimum entropy similarity score

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments
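
The pairwise step can be sketched as scoring every pair of spectra and keeping the pairs that reach minEntropySimilarity as network edges. The base-R sketch below assumes the unweighted spectral entropy similarity, 1 - (2 S_AB - S_A - S_B) / ln(4), where S is the Shannon entropy of a normalized spectrum and S_AB is the entropy of the averaged spectrum. It also assumes the spectra are already aligned on a shared m/z grid. Neither assumption is taken from the IDSL.FSA source, and the weighted variant selected by allowedWeightedSpectralEntropy is not covered.

```r
# Shannon entropy of a normalized intensity vector (zero bins are ignored).
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Unweighted entropy similarity of two spectra on the same m/z grid.
entropy_similarity <- function(intensity_a, intensity_b) {
  a <- intensity_a / sum(intensity_a)
  b <- intensity_b / sum(intensity_b)
  merged <- (a + b) / 2
  1 - (2 * shannon_entropy(merged) - shannon_entropy(a) - shannon_entropy(b)) / log(4)
}

# Three made-up spectra, already binned on four shared m/z positions.
spectra <- list(
  A = c(10, 50, 40, 0),
  B = c(12, 48, 40, 0),
  C = c(0, 0, 5, 95)
)

# Score all pairs and keep those at or above the similarity threshold.
min_entropy_similarity <- 0.75
pairs <- t(combn(names(spectra), 2))
scores <- apply(pairs, 1, function(p) entropy_similarity(spectra[[p[1]]], spectra[[p[2]]]))
edges <- data.frame(source = pairs[, 1], target = pairs[, 2], similarity = scores,
                    stringsAsFactors = FALSE)
edges <- edges[edges$similarity >= min_entropy_similarity, ]

print(edges)
```

An edge table with source, target, and score columns of this kind is a common way to import a network into Cytoscape.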

FSA_uniqueMSPblockTagger

The FSA_uniqueMSPblockTagger module performs pairwise MSP block analysis to remove similar MSP blocks in an .msp file.

FSA_uniqueMSPblockTagger(path, MSPfile = "", aggregateBy = "Name", massError = 0.01,
			 RTtolerance = NA, minEntropySimilarity = 0.75, noiseRemovalRatio = 0.01,
			 allowedNominalMass = FALSE, allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE,
			 number_processing_threads = 1)

path: address of .msp file

MSPfile: A .msp file name or FSDB in .Rdata format

aggregateBy: the MSP variable by which the MSP blocks are aggregated

massError: Mass accuracy in Da

RTtolerance: Retention time tolerance (min) to match MSP blocks. Select NA to ignore retention time match. This option is especially beneficial to find co-occurring compounds.

minEntropySimilarity: Minimum entropy similarity score

noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

plotSpectra: c(TRUE, FALSE). Select TRUE to plot similar spectra in individual folders

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
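
One way to picture the tagging is a greedy pass in which a block is marked as redundant when it is sufficiently similar to an earlier retained block in the same aggregateBy group. The base-R sketch below uses a made-up similarity matrix and hypothetical variable names; the greedy strategy is an assumption for illustration and not necessarily the procedure used by FSA_uniqueMSPblockTagger.

```r
# Four hypothetical MSP blocks aggregated by their Name variable.
blocks <- data.frame(
  id = 1:4,
  Name = c("compound A", "compound A", "compound A", "compound B"),
  stringsAsFactors = FALSE
)

# Made-up pairwise entropy similarity scores (symmetric, 1 on the diagonal).
similarity <- diag(4)
similarity[1, 2] <- similarity[2, 1] <- 0.90
similarity[1, 3] <- similarity[3, 1] <- 0.30
similarity[2, 3] <- similarity[3, 2] <- 0.40
similarity[3, 4] <- similarity[4, 3] <- 0.95  # different Name, never compared

min_entropy_similarity <- 0.75

# Greedy pass: compare each block with earlier retained blocks of the same Name.
unique_tag <- rep(TRUE, nrow(blocks))
for (i in seq_len(nrow(blocks))) {
  previous <- seq_len(i - 1)
  candidates <- previous[unique_tag[previous] & blocks$Name[previous] == blocks$Name[i]]
  if (any(similarity[i, candidates] >= min_entropy_similarity)) {
    unique_tag[i] <- FALSE
  }
}

blocks$unique <- unique_tag
print(blocks)
```

In this example, block 2 is tagged as redundant because it resembles block 1, while block 4 is retained despite its high score against block 3 because the two belong to different groups.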

FSA_uniqueMSPblockTaggerUntargeted

The FSA_uniqueMSPblockTaggerUntargeted module performs pairwise MSP block analysis to remove similar MSP blocks in .msp files using only a retention time window.

FSA_uniqueMSPblockTaggerUntargeted(path, MSPfile_vector, minCSAdetectionFrequency = 20,
				   minEntropySimilarity = 0.75, massError = 0.01, massErrorPrecursor = 0.01,
				   RTtolerance = 0.1, noiseRemovalRatio = 0.01, allowedNominalMass = FALSE,
				   allowedWeightedSpectralEntropy = TRUE, plotSpectra = FALSE,
				   number_processing_threads = 1)

path: address of .msp file

MSPfile_vector: a vector of .msp file names

minCSAdetectionFrequency: minimum CSA detection frequency

minEntropySimilarity: Minimum entropy similarity score

massError: Mass accuracy in Da

massErrorPrecursor: Mass accuracy of precursor in Da

RTtolerance: Retention time tolerance (min) to match MSP blocks.

noiseRemovalRatio: noise removal ratio relative to the basepeak to measure entropy similarity score (0-1)

allowedNominalMass: c(TRUE, FALSE). Select TRUE only for nominal mass analysis.

allowedWeightedSpectralEntropy: c(TRUE, FALSE). Weighted entropy to measure entropy similarity score.

plotSpectra: c(TRUE, FALSE). Select TRUE to plot similar spectra in individual folders

number_processing_threads: number of parallel processing threads compatible with the Windows and Linux environments.
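
The role of massErrorPrecursor and RTtolerance can be sketched as a pre-filter that selects which pairs of MSP blocks are close enough to be compared at all. The base-R example below uses made-up features and hypothetical column names, and it assumes that both tolerances are applied as absolute differences; the selected pairs would then still need to reach minEntropySimilarity to be treated as similar.

```r
# Hypothetical MSP block summaries; the values are made up.
features <- data.frame(
  id = c("f1", "f2", "f3"),
  precursor_mz = c(208.0615, 208.0650, 208.0618),
  rt_min = c(5.20, 5.25, 6.10),
  stringsAsFactors = FALSE
)

mass_error_precursor <- 0.01  # Da
rt_tolerance <- 0.1           # min

# Pairwise absolute differences in precursor m/z and retention time.
mz_match <- abs(outer(features$precursor_mz, features$precursor_mz, "-")) <= mass_error_precursor
rt_match <- abs(outer(features$rt_min, features$rt_min, "-")) <= rt_tolerance

# Keep each unordered pair once by using only the upper triangle.
candidate <- mz_match & rt_match & upper.tri(mz_match)
pair_index <- which(candidate, arr.ind = TRUE)
candidate_pairs <- data.frame(
  block_1 = features$id[pair_index[, "row"]],
  block_2 = features$id[pair_index[, "col"]],
  stringsAsFactors = FALSE
)

print(candidate_pairs)
```

Here f1 and f3 have nearly identical precursor masses but elute 0.9 min apart, so only the f1 and f2 pair passes the pre-filter.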