
MG-RAST pipeline #2

Open
rx32940 opened this issue Sep 20, 2019 · 5 comments

Comments

@rx32940
Owner

rx32940 commented Sep 20, 2019

This is a complete pipeline for metagenomic analysis. I am trying it out because of discrepancies between my Kraken2/CLARK results and the company's results. Because I skipped the QC and host-cleaning steps in my Kraken2/CLARK analyses (I believe the company used KneadData for this specific task), I want to use an established metagenomics pipeline to confirm the accuracy of my results.
MG-RAST

@rx32940
Owner Author

rx32940 commented Sep 20, 2019

Upload all raw data to the web interface inbox through the API.
Upload code (from the local machine to the web interface)

@rx32940
Owner Author

rx32940 commented Sep 23, 2019

I joined the paired-end forward and reverse FASTQ files for each sample before feeding them into the pipeline.
project name: pair-end-joined-metagenomes
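Paired-end joining merges each forward read with its mate when the two overlap, producing one longer read per fragment. The sketch below shows the basic idea with an exact-match overlap search. It is an illustration under simplifying assumptions (no mismatches tolerated, quality scores ignored, a hypothetical `min_overlap` default of 10), not the joining tool that was actually used here; real joiners use mismatch-tolerant, quality-aware scoring.

```python
from typing import Optional

# Complement table for uppercase DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Join a read pair when the 3' end of the forward read exactly matches
    the 5' end of the reverse-complemented reverse read.

    Returns the merged fragment, or None if no overlap of at least
    `min_overlap` bases exists (hypothetical default).
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first so the merge is maximal.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None
```

Pairs whose fragment is longer than the two reads combined cannot overlap and return `None`; whether such pairs are dropped or kept unjoined is a design choice of each joining tool.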

pipeline options:

  • dereplication: yes
  • screening (host clean): M. musculus, NCBI v37
  • dynamic trimming: yes
  • minimum quality: 10
  • maximum low-quality base pairs: 5
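One plausible reading of the quality options above is that a read is discarded when more than 5 of its bases have a Phred score below 10, and that dereplication drops reads whose leading bases duplicate an earlier read. The sketch below implements that reading only. The Phred+33 encoding, the 50-base prefix used as the duplicate key, and the filter-then-dereplicate order are assumptions, the trimming of low-quality read ends is not modeled, and this is not MG-RAST's own code.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (read_id, sequence, quality string)


def low_quality_count(quality: str, min_quality: int = 10, offset: int = 33) -> int:
    """Count bases whose Phred score (assumed Phred+33) is below min_quality."""
    return sum(1 for ch in quality if ord(ch) - offset < min_quality)


def passes_quality(quality: str, min_quality: int = 10, max_low_quality: int = 5) -> bool:
    """Keep a read only if at most `max_low_quality` bases are low quality."""
    return low_quality_count(quality, min_quality) <= max_low_quality


def dereplicate(reads: Iterable[Read], prefix_len: int = 50) -> Iterator[Read]:
    """Drop reads whose first `prefix_len` bases repeat an earlier read.

    The prefix length is a hypothetical choice for this sketch.
    """
    seen = set()
    for read in reads:
        key = read[1][:prefix_len]
        if key not in seen:
            seen.add(key)
            yield read


def qc_filter(reads: Iterable[Read]) -> List[Read]:
    """Apply the quality filter first, then dereplication (order assumed)."""
    return list(dereplicate(r for r in reads if passes_quality(r[2])))
```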

steps performed in the pipeline:

[Screenshot: MG-RAST pipeline steps]

An example analysis for R22.L can be found at this link (login required):
https://www.mg-rast.org/mgmain.html?mgpage=overview&metagenome=d65d65c7036d676d343836303339302e33

R22.L genus-level taxonomic distribution:
[Screenshot]

  • Krona result:
    [Screenshot: Krona chart]

  • The unclassified portion of the CLARK and Kraken2 results consists of eukaryotic sequences (unclassified because eukaryotic sequences were not in their databases).

  • Because the Mus genome was screened out in one of the pipeline steps, it should already have been removed. Why do we still get Mus hits in the taxonomic profile?
    • What is the host? Rattus or Mus?
    • If Rattus, I need to use bowtie to screen the Rattus genome out of the samples (the Rattus reference is not available as an option in MG-RAST).
      • Use the script from the pipeline (it can be found in the download section after the analysis).
    • But why are Mus sequences also identified?
  • Next steps:

    • download the post-screening sequences from MG-RAST and resubmit them to the pipeline to confirm that they are free of host sequences
    • screen out the Rattus genome to obtain cleaner sequences (ask Sree what the host of the samples is)
  • With representative hit, the absolute abundance of each sample:
    [Screenshot]

  • With representative hit, the relative abundance of each sample:
    [Screenshot]

  • In the liver samples from subjects R22 and R27, bacterial abundance is noticeably higher, which agrees with the results provided by the company. However, R26.L does not show high abundance in the Bacteria domain, which is inconsistent with the company's results.
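Absolute counts depend on how many reads each sample has, so comparing samples such as the liver samples above is easier on relative abundance, where each taxon's count is divided by the sample's total. A minimal sketch; the sample names and counts below are hypothetical and are not values from these results.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # sample -> taxon -> hit count


def relative_abundance(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Divide each taxon's hit count by its sample's total hits."""
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        # An empty sample yields zeros rather than a division by zero.
        result[sample] = {t: (n / total if total else 0.0) for t, n in taxa.items()}
    return result


# Hypothetical counts: sample_B was sequenced 10x deeper than sample_A.
example = {
    "sample_A": {"Bacteria": 25, "Eukaryota": 75},
    "sample_B": {"Bacteria": 250, "Eukaryota": 750},
}
print(relative_abundance(example))
# {'sample_A': {'Bacteria': 0.25, 'Eukaryota': 0.75}, 'sample_B': {'Bacteria': 0.25, 'Eukaryota': 0.75}}
```

The two hypothetical samples differ tenfold in absolute counts but have identical relative profiles, which is why relative abundance is the fairer basis for cross-sample comparison.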

@rx32940
Owner Author

rx32940 commented Sep 23, 2019

project name: test-hostcleaned

  • The Mus-screened data from MG-RAST still show a small amount of Mus sequence and a large portion of rodent sequence. I will use bowtie to remove the Rattus genome from the sequences.
    [Screenshot]

This step was tested with only one sample, R27.K.

  • Note that the sequence file was not only Mus-screened but also preprocessed with the other cleaning steps in the pipeline.

To do:

  • email Sree about the exact host of the samples
  • find the Rattus reference genome
  • screen with the Rattus genome
  • feed into the pipeline again

Conclusion: the unclassified portion of the Kraken2 (LCA) and CLARK results belongs to eukaryotes that were not present in the databases.
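The conclusion rests on how k-mer classifiers treat reads from organisms that are missing from the database. The sketch below is a simplified, Kraken-style classifier on a hypothetical toy taxonomy and k-mer table: each database k-mer maps to the LCA of the genomes containing it, hit taxa are scored by the hits along their path to the root, and ties go to the LCA. It follows the original Kraken path-scoring idea and is not Kraken2's actual code (Kraken2 uses minimizers and an optional confidence threshold, which can also leave reads unclassified). The key point it shows: a read with no k-mer in the database gets no taxon at all.

```python
from collections import Counter
from typing import Dict, List, Optional

Parents = Dict[str, str]  # child taxon -> parent taxon


def lineage(taxon: str, parents: Parents) -> List[str]:
    """Path from `taxon` up to the root, inclusive."""
    path = [taxon]
    while taxon in parents:
        taxon = parents[taxon]
        path.append(taxon)
    return path


def lca(a: str, b: str, parents: Parents) -> str:
    """Lowest common ancestor of two taxa in the same tree."""
    ancestors = set(lineage(a, parents))
    for node in lineage(b, parents):
        if node in ancestors:
            return node
    raise ValueError(f"{a} and {b} share no ancestor")


def classify(read: str, db: Dict[str, str], parents: Parents, k: int) -> Optional[str]:
    """Assign a read to a taxon, or return None (unclassified)."""
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = Counter(db[km] for km in kmers if km in db)
    if not hits:
        # No k-mer is in the database: the organism is absent from it.
        return None
    # Score each hit taxon by all hits on its path to the root.
    scores = {t: sum(hits[a] for a in lineage(t, parents)) for t in hits}
    best = max(scores.values())
    winners = [t for t, s in scores.items() if s == best]
    # Tied taxa are resolved to their lowest common ancestor.
    result = winners[0]
    for other in winners[1:]:
        result = lca(result, other, parents)
    return result


# Hypothetical toy taxonomy and k-mer table (k = 4).
PARENTS = {
    "E_coli": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacteria": "root",
}
DB = {"AAAA": "E_coli", "AAAC": "E_coli", "ACCC": "Enterobacteriaceae", "CCCC": "Salmonella"}

print(classify("AAAACCCC", DB, PARENTS, k=4))  # E_coli
print(classify("GGGGGGGG", DB, PARENTS, k=4))  # None: no k-mer in DB
```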

@rx32940
Owner Author

rx32940 commented Sep 24, 2019

The host is rat; the rat reference genome is not available in the MG-RAST pipeline.
I found the reference through the UCSC Genome Browser:
Jul. 2014 (RGSC 6.0/rn6) assembly of the rat genome (rn6, RGSC Rnor_6.0)
and downloaded it through FTP.

  • I will use bowtie to screen the QC-checked FASTQ files from the MG-RAST pipeline against the rat reference genome, then feed them back into the pipeline for taxonomic profiling again.

bowtie2 screening with Rattus reference genome code
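The screening code referenced above is not reproduced in this thread. The principle is to align the reads to the rat reference with bowtie2 and keep only the reads that fail to align. As a minimal sketch, assuming the alignments are written as SAM (bowtie2's default output format) and the input reads were FASTQ, the unaligned reads can be extracted by testing SAM FLAG bit 0x4 ("segment unmapped"). This is an illustration, not the author's script.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ records for reads that did NOT align to the host reference.

    Assumes FASTQ input to the aligner, so QUAL is present rather than '*'.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED:
            # Unmapped reads are stored as given, so SEQ/QUAL need no reversal.
            yield f"@{qname}\n{seq}\n+\n{qual}\n"
```

With hypothetical file handles `sam` (bowtie2 SAM output) and `out` (the host-cleaned FASTQ), `out.writelines(unmapped_fastq(sam))` would write the reads to resubmit to MG-RAST. Because the reads here were already paired-end joined, one flag test per read is enough; unjoined pairs would need both mates checked.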

Because this task takes a very long time, I decided to test first with the two samples that have passed screening:

  • project name in MG-RAST: Rattus_hostcleaned_test
  • samples:
    • R22.L_hostcleaned (mgm4860391)
    • R22.S_hostcleaned (mgm4860390)

@rx32940
Owner Author

rx32940 commented Sep 26, 2019

R22.S host-cleaned data from the company, for comparison with the data host-cleaned with bowtie2 in the pipeline.
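One way to compare the company's host-cleaned R22.S data with the bowtie2-cleaned data is to check which read IDs survive each cleaning. A minimal sketch, assuming both FASTQ files keep the original read IDs; joining or renaming steps may change IDs, in which case only the read counts are comparable. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def fastq_ids(lines: Iterable[str]) -> Set[str]:
    """Collect read IDs from FASTQ text (4 lines per record, header first)."""
    ids = set()
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            # Keep the first token after '@'; descriptions are ignored.
            ids.add(line[1:].split()[0])
    return ids


def compare_ids(company: Set[str], pipeline: Set[str]) -> Dict[str, float]:
    """Summarize how two host-cleaned read sets overlap."""
    shared = len(company & pipeline)
    union = len(company | pipeline)
    return {
        "only_company": len(company - pipeline),
        "only_pipeline": len(pipeline - company),
        "shared": shared,
        "jaccard": shared / union if union else 1.0,
    }
```

For example, `compare_ids(fastq_ids(open(company_path)), fastq_ids(open(pipeline_path)))`, with hypothetical paths, reports how many reads each method kept that the other removed.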
