
ep142/FoodMicrobionet


FoodMicrobionet

This repository will eventually contain the different versions of FoodMicrobionet (as Shiny apps and/or R lists) and of the R scripts created to access the data or to carry out statistical and graphical analyses.

  • file FoodMicrobionet_tablespecs.md describes the specifications of the FoodMicrobionet tables

  • folder dada2_pipeline contains a modified version of the DADA2 pipeline suitable for (reasonably) small datasets. I have tested it on oldish Macs (a late 2013 iMac and a late 2015 MacBook Pro, both with 8 GB RAM) running macOS 10.14.6, and it works reasonably well with V3-V4 datasets of no more than 50 samples (although you might have to run a few steps overnight). The script has handy options for paired-end and single-end files, and for Illumina, Ion Torrent and old 454 files. You do need to set a few options and at least check the results of a few steps (quality control, trimming and filtering, ASV inference), but otherwise you don't have to stay in front of the computer all the time: a beep will alert you when a time-consuming step is finished. Please pay attention to the structure of folders and files required by the pipeline, or adapt the corresponding instructions.

  • folder dada2_pipeline_big_data will contain a modified version of the previous pipeline, based on the DADA2 pipeline for big data. I have tested it on the same machines with more than 800 samples (usually V4). It is divided into four parts:

    • first, the sequences are divided into groups based on the machine and lane information in the headers of the fastq files

    • second, you process the sequences group by group in interactive mode (although you do not have to sit there staring at the screen all the time: a beep will tell you when time-consuming steps are finished); the results are sequence tables. Most steps are essentially the same as in the bioconductor_pip_v6 script, which you can find in the dada2_pipeline folder

    • third, a script merges the sequence tables, performs taxonomic assignment (you can split the process into groups to avoid running into memory problems) and, optionally, infers a phylogenetic tree

    • fourth, everything is assembled into the objects I need to populate FoodMicrobionet

  • folder import_in_FMBN contains a script which uses files generated by the two previous scripts (x_study.Rdata, x_taxa.Rdata, x_edges.Rdata, where x is the accession number of the study being imported) to generate study, taxa and edges files ready to be copied into the .xlsx version of FoodMicrobionet. The script also performs a number of formal checks (for duplicates, NAs, etc.). For taxa, two extra files are needed: a lookup table (species_lookup.txt, included in folder Support) and an Excel file named taxa_in_fmbn.xlsx containing all the taxa already in FoodMicrobionet (three columns: taxon name, taxon id and lineage; a toy example is included). This is necessary to keep the taxonomy in FoodMicrobionet consistent as taxonomic databases change, and to keep links to external databases working (epithets like Clostridium sensu stricto 1 botulinum would return no match in Florilege, LPSN and NCBI Taxonomy).

  • folder assemble_FMBN contains the script used to assemble the FMBN tables into two R lists: one can be used with v2.3 of the ShinyFMBN app, the other contains all fields (a new version of the app compatible with the latter will hopefully be made available in February 2022). The Excel files are not provided; contact me if you are interested.

  • folder shiny_apps contains the most recent public version of the ShinyFMBN app. The app makes access to the database easier (although working with custom R scripts is much faster).

  • folder FMBNanalyzer contains a script designed to carry out descriptive analyses (bar plots, diversity indices and rarefaction analysis, box plots, heat maps, MDS, bipartite analysis) on *agg.RDS files extracted from ShinyFMBN, and to save the results. As of January 2022 I have not had the time to test this script extensively. An R Markdown document with the script (designed to generate an .html report) is also provided, together with a small dataset on seafood. To use this document you need to:

    • perform a search in ShinyFMBN and export the agg file (see the ShinyFMBN manual on Mendeley Data for further details)

    • put the file (you will find it in the output -> aggdata folder, located in the app folder) in a new folder containing this template

    • create an RStudio project for that folder

    • set the options for filtering, saving, etc. in the dedicated chunk of the template

    • knit the document: the default output is .html, but you can change the YAML header (or choose the appropriate option in the Knit menu) to obtain .pdf (requires LaTeX) or MS Word documents

  • folder merge_phyloseq_objs contains a small proof-of-concept script which allows you to merge phyloseq objects extracted from FoodMicrobionet with your own phyloseq objects (provided they were obtained using the DADA2 pipeline). In principle, this makes it simpler to compare your own data with data from the literature.

  • folder the_real_thing contains the database in two formats, R lists and Excel files (each in its own folder). Both are safer to use than text files, with which occasional problems with accented letters and special characters occur in some locales. As a bonus, the folder with the R lists also includes a .Rmd document designed to produce a statistical report on the current version of FMBN, and a script (ide_depth.R) which can be used to produce graphs of the depth of taxonomic assignment in FMBN studies. Both version 4.1 and version 4.2 are provided.

  • folder WIMB (Where is my bug?) contains example scripts and data used in our preprint on the ecological distribution of Lactobacillaceae.

  • folder miscellaneous contains miscellaneous scripts used to generate figures and tables in papers related to FoodMicrobionet. While I do my best to document what I am doing, I don't always have the time to make the scripts foolproof. Read the comments carefully and, if you are reasonably good at programming in R, you should be OK.
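The first step of dada2_pipeline_big_data, dividing the sequences into groups by machine and lane, can be illustrated with the minimal Python sketch below. It is not the repository's R code: it assumes Illumina headers in the Casava 1.8+ format (`@instrument:run:flowcell:lane:tile:x:y read:filter:control:index`), it looks only at the first read of each file, and the function names `lane_key`, `first_header` and `group_by_lane` are hypothetical.

```python
import gzip
from collections import defaultdict
from pathlib import Path


def lane_key(header: str) -> tuple:
    """Return (instrument, run, flowcell, lane) from an Illumina Casava 1.8+ header."""
    # Drop the leading '@' and keep only the read id (the part before the first space)
    read_id = header.lstrip("@").split()[0]
    fields = read_id.split(":")
    if len(fields) < 4:
        raise ValueError(f"unexpected header format: {header!r}")
    instrument, run, flowcell, lane = fields[:4]
    return instrument, run, flowcell, lane


def first_header(path: Path) -> str:
    """Read the first header line of a plain or gzipped fastq file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


def group_by_lane(paths) -> dict:
    """Map each (instrument, run, flowcell, lane) key to the files whose first read has it."""
    groups = defaultdict(list)
    for p in paths:
        path = Path(p)
        groups[lane_key(first_header(path))].append(path)
    return dict(groups)
```

Checking only the first read is enough when each file comes from a single lane; files mixing reads from several lanes would have to be split read by read. Ion Torrent and 454 headers follow different conventions and are not handled by this sketch.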
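The formal checks performed by the import_in_FMBN script (duplicates, NAs, taxa already in taxa_in_fmbn.xlsx) can be sketched as follows. This is a hypothetical Python illustration, not the R script: it assumes that the taxon names of the study being imported are available as a list and that taxa_in_fmbn.xlsx has already been read into a dictionary mapping taxon name to taxon id; `is_missing` and `check_taxa` are made-up names, and the lookup step based on species_lookup.txt is not reproduced.

```python
import math


def is_missing(name) -> bool:
    """Treat None, NaN (as often read from spreadsheets) and blank strings as NAs."""
    if name is None:
        return True
    if isinstance(name, float) and math.isnan(name):
        return True
    return not str(name).strip()


def check_taxa(new_taxa, known_taxa) -> dict:
    """Run simple formal checks on the taxon names of a study before import.

    new_taxa: list of taxon names for the study (hypothetical input format)
    known_taxa: dict mapping taxon name -> taxon id (e.g. built from taxa_in_fmbn.xlsx)
    """
    missing = [i for i, name in enumerate(new_taxa) if is_missing(name)]
    seen, duplicates = set(), []
    for name in new_taxa:
        if is_missing(name):
            continue
        # A name already seen is a duplicate; report each duplicate only once
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return {
        "missing": missing,  # positions of NAs in new_taxa
        "duplicates": duplicates,
        "already_in": sorted(n for n in seen if n in known_taxa),
        "new": sorted(n for n in seen if n not in known_taxa),  # would need new ids and lineage
    }
```

The names returned under `new` are the ones for which a taxon id and lineage would have to be added before the taxa table is copied into the database.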
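Two of the diversity indices and the rarefaction step mentioned for FMBNanalyzer can be illustrated with the minimal Python sketch below (the script itself is written in R, and this is not its code). For the read counts of each taxon in one sample, it computes the Shannon index H' = -Σ p_i ln p_i, the Gini-Simpson index 1 - Σ p_i², and the number of taxa observed after one random subsample of reads without replacement; the function names are hypothetical.

```python
import math
import random
from collections import Counter


def shannon(counts) -> float:
    """Shannon index H' = -sum(p_i * ln p_i), skipping taxa with zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts) -> float:
    """Gini-Simpson index 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def rarefy(counts, depth: int, seed: int = 0) -> int:
    """Draw `depth` reads without replacement and return the number of taxa observed."""
    # Expand counts into one entry per read, labelled with the taxon index
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    subsample = random.Random(seed).sample(reads, depth)
    return len(Counter(subsample))
```

A rarefaction curve repeats the subsampling at increasing depths (usually averaging several draws per depth); the fixed seed keeps a single draw reproducible.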

Known issues

As of April 2022, Bioconductor does not support the arm64 build of R. Therefore, if you have a Mac with an M1 or M2 processor, you are better off using the standard (Intel) build of R.

About

apps and data from the FoodMicrobionet project
