Custom scripts and data for the targeted metaproteomics study on AD microbiomes.

pchirania/targeted_mp

Targeted Metaproteomics strategy for screening of enzymes in microbiomes

Custom scripts and output data for an in silico targeted metaproteomics study on anaerobic digester (AD) microbiomes.

__Project:

The work is part of a study titled “In-silico evaluation of a targeted metaproteomics strategy for broad screening of cellulolytic enzyme capacities in anaerobic microbiome bioreactors,” by Manuel I. Villalobos Solis, Payal Chirania, and Robert L. Hettich.

We conducted a detailed in silico examination of mass spectrometry-based targeted metaproteomics as a means of fast, sensitive, and extensive measurement of cellulolytic enzymes in anaerobic digestion microbiomes. As a critical first step for such measurements, we performed an in silico selection and evaluation of groups of tryptic peptides from five important glycoside hydrolase (GH) families derived from a dataset of 1401 metagenome-assembled genomes (MAGs) from anaerobic digesters. We selected groups of peptides that are shared among proteins within a GH family and, at the same time, unique compared to all other background proteins. In particular, we identified a tractable set of unique peptides that was sufficient to monitor the range of GH families. The unique peptides selected for groups of GHs were also found to be sufficient for distinguishing enzyme specificity or microbial taxonomy. In total, these in silico results suggest that targeted metaproteomics could be a valuable approach for estimating the molecular-level enzymatic capabilities of microbial communities and their responses to different substrates or conditions, which is a critical need when building or using constructed communities or defined cultures for bio-production.

__Details about the scripts within the "code" folder:

__1. Script - "get-target-background-sequences.py"

This script takes a protein FASTA file and a list of protein IDs as inputs and outputs two FASTA files: one containing the protein sequences whose IDs are in the input list, and the other containing the protein sequences whose IDs are not in the input list.

In the study, this script was used to generate target and background protein sequences for each target GH family. The inputs were the protein sequences from all the MAGs and a list of the protein IDs belonging to a target GH family across all the MAGs.
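The split described above can be sketched as follows. This is a minimal illustration, not the repository script itself: the function names `read_fasta` and `split_target_background`, the example records, and the assumption that the protein ID is the first whitespace-delimited token of the FASTA header are all hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_target_background(records, target_ids):
    """Partition records into (target, background) lists.

    Assumption: the protein ID is the first token of the header line.
    """
    target_ids = set(target_ids)
    target, background = [], []
    for header, seq in records:
        protein_id = header.split()[0] if header else ""
        if protein_id in target_ids:
            target.append((header, seq))
        else:
            background.append((header, seq))
    return target, background


# Made-up example input; real input would be read from a FASTA file.
example = """>prot1 GH5 candidate
MKKLLA
GTR
>prot2 hypothetical protein
MSTNPK
>prot3 GH5 candidate
MALWKR
""".splitlines()

target, background = split_target_background(read_fasta(example), ["prot1", "prot3"])
```

Here `target` holds the records for `prot1` and `prot3`, and `background` holds everything else; each list could then be written to its own FASTA file.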

__2. Script - "clip_N-terminal-sequence.py"

This script takes a protein FASTA file as input and removes the first X amino acids (X is provided by the user) from the N-terminus of every protein sequence in the file. The script then outputs a FASTA file containing the trimmed protein sequences.

In the study, this script was used to clip the first 24 N-terminal amino acids from all the protein sequences of a target GH family before the sequences were digested in silico with trypsin. This was done to prevent potential signal peptides (which are cleaved off during protein maturation) from being included in the final list of peptides for monitoring a GH family.
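The clipping step amounts to slicing each sequence, as in the minimal sketch below. The function name `clip_n_terminus` and the example records are hypothetical, and dropping sequences that become empty after clipping is an assumption of this sketch, not documented behavior of the repository script.

```python
def clip_n_terminus(records, n_residues):
    """Yield (header, sequence) pairs with the first n_residues removed.

    Assumption: sequences left empty after clipping are skipped.
    """
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    for header, seq in records:
        clipped = seq[n_residues:]
        if clipped:
            yield header, clipped


# Made-up records: one long enough to survive a 24-residue clip, one too short.
records = [
    ("p1", "M" * 24 + "ACDEFGHIK"),
    ("p2", "MKT"),
]
clipped_records = list(clip_n_terminus(records, 24))
```

With `n_residues=24`, as used in the study, only `p1` remains and its sequence is reduced to the residues after position 24.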

__3. Script - "get-Unique-Peptides-Against-Bkgd.py"

This script takes two FASTA files of peptidomes as input: one for the target protein family and the other for the background (non-target proteins). Before the comparison, the script filters the target peptidome to retain only peptides of 6-25 amino acids, as this length range is recommended for targeted proteomic measurements. The script then compares the target peptidome with the background peptidome and saves the peptides found only in the target peptidome (the unique target peptidome) as a .csv file, with the peptides in column 1 and their corresponding proteins in column 2.

In this study, this script was used to obtain, for each target GH family tested, a peptidome that was unique to that GH family when compared against all the non-GH-family peptides or proteins (i.e., the respective background). The script maintained the link between the peptides and the proteins from which they were derived.
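The filtering and comparison can be sketched as below. This is only an illustration under stated assumptions: the input is represented as in-memory `(protein_id, peptide)` pairs rather than peptidome FASTA files, peptides are compared by exact string match, and the name `unique_target_peptides` and the example sequences are hypothetical.

```python
import csv
import io

MIN_LEN, MAX_LEN = 6, 25  # length window stated in the text above


def unique_target_peptides(target_peptides, background_peptides):
    """Return (peptide, protein_id) rows for target peptides absent from the background.

    target_peptides: iterable of (protein_id, peptide) pairs.
    background_peptides: iterable of peptide strings.
    """
    background = set(background_peptides)
    rows, seen = [], set()
    for protein_id, peptide in target_peptides:
        if not (MIN_LEN <= len(peptide) <= MAX_LEN):
            continue  # outside the 6-25 residue window
        if peptide in background:
            continue  # not unique to the target family
        if (peptide, protein_id) in seen:
            continue  # avoid duplicate rows
        seen.add((peptide, protein_id))
        rows.append((peptide, protein_id))
    return rows


# Made-up peptides for illustration only.
target = [
    ("ghA", "LSDYVTK"),
    ("ghB", "LSDYVTK"),
    ("ghA", "AGK"),
    ("ghB", "NNWETFGR"),
    ("ghA", "VVDLAEQHGIR"),
]
background = ["NNWETFGR", "MSTNPK"]

rows = unique_target_peptides(target, background)

# Write peptide (column 1) and protein (column 2) in CSV form.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
```

A peptide shared by several target proteins appears once per protein, so the peptide-to-protein link is kept for the next step.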

__4. Script - "find-minimum-peptides-for-enzyme-family.py"

This script takes a comma-separated (.csv) file with two columns as input: the first column contains peptide sequences and the second column contains the ID of the protein that the peptide in the same row matches (this file is generated by the "get-Unique-Peptides-Against-Bkgd.py" script). The script then parses this file to identify the minimum number of peptides (column 1) that can represent all the proteins in column 2.

In this study, this script was used to identify the final list of peptides to monitor all the proteins from a target GH family within the constructed microbiome of 1401 MAGs. The input was the list of peptides unique to specific GH family proteins compared to the background of all other non-target protein sequences.
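Choosing the fewest peptides that together cover every protein is an instance of the set cover problem. The sketch below uses the standard greedy heuristic (repeatedly pick the peptide that covers the most still-uncovered proteins). This is an assumption made for illustration: the repository script may use a different strategy, and the greedy heuristic does not guarantee a true minimum. The name `greedy_peptide_cover` and the example rows are hypothetical.

```python
from collections import defaultdict


def greedy_peptide_cover(rows):
    """Greedily choose peptides until every protein is covered.

    rows: iterable of (peptide, protein_id) pairs, as in the two-column .csv.
    Returns the chosen peptides in the order they were selected.
    """
    covers = defaultdict(set)
    for peptide, protein_id in rows:
        covers[peptide].add(protein_id)

    uncovered = set().union(*covers.values())
    chosen = []
    while uncovered:
        # Pick the peptide covering the most uncovered proteins;
        # sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(covers), key=lambda p: len(covers[p] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Made-up peptide/protein pairs for illustration only.
example_rows = [
    ("PEPA", "p1"), ("PEPA", "p2"), ("PEPA", "p3"),
    ("PEPB", "p2"), ("PEPB", "p3"),
    ("PEPC", "p3"), ("PEPC", "p4"),
]
selected = greedy_peptide_cover(example_rows)
```

In this example `PEPA` is picked first because it covers three proteins, and `PEPC` is then needed to cover `p4`, so two peptides represent all four proteins.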

__Details about the files within the "output-and-data" folder:

The folder contains .xlsx files with the data used to generate the figures in the article by Villalobos Solis, Chirania & Hettich, 2022. It also contains .txt files named “Unique2_GHFamily_AnDig_Clustered-Peptidome-findCommon-Description”, generated with the “find-minimum-peptides-for-enzyme-family.py” script described here. Files detailing the number of proteins covered by 10, 50, and 100 selected tryptic peptides per GH family are also included as .xlsx files with the naming format “GHFamilyName-findCommon-peptProtList-ALL-Coveragesby10-50-100.xlsx”.

Please contact us if you need more details.
