# Description

* Simulating a CsCl gradient to reproduce the results of:
> Lueders T, Manefield M, Friedrich MW. (2004). Enhanced sensitivity of DNA- and rRNA-based stable isotope probing by fractionation and quantitative analysis of isopycnic centrifugation gradients. Environmental Microbiology 6:73–78.

* rotor:
  * TV865
* rotor speed:
  * 45000 rpm
* spin time:
  * \>36 hr
* gradient average density:
  * 1.725 g/ml
* fraction size:
  * 400 µl
* gDNA used:
  * 13C-labeled
  * 13C-methanol
  * 5 µg
* strains:
  * Methylobacterium extorquens AM1 DSM 1338
  * Methanosarcina barkeri DSM 800

# Setting variables

In [3]:
workDir = "~/notebook/SIPSim/t/data/"

# Init

In [1]:
import os
import sys
%load_ext rpy2.ipython

In [2]:
%%R
library(ggplot2)
library(dplyr)
library(tidyr)


Attaching package: ‘dplyr’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



# Downloading genomes

* Methanosarcina barkeri DSM 800
  * RefSeq = NC_007355.1, NC_007349.1
* Methylobacterium extorquens AM1 DSM 1338
  * RefSeq = NC_012808.1, NC_012811.1, NC_012807.1, NC_012809.1, NC_012810.1
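
The download cells read their accessions from `M.barkeri_refseq.txt` and `M.extorquens_AM1_refseq.txt`, but the notebook does not show how those files were created. The following is a minimal sketch of one way to generate them from the accessions listed above. It assumes that `seqDB_tools accession-GI2fasta` accepts one accession per line, which this notebook does not confirm; the helper name `write_accession_lists` is hypothetical.

```python
import os

# Accessions copied from the list above.
accessions = {
    "M.barkeri_refseq.txt": ["NC_007355.1", "NC_007349.1"],
    "M.extorquens_AM1_refseq.txt": [
        "NC_012808.1", "NC_012811.1", "NC_012807.1",
        "NC_012809.1", "NC_012810.1",
    ],
}


def write_accession_lists(out_dir, accessions):
    """Write each accession list to out_dir, one accession per line.

    The one-per-line layout is an assumption about the input format
    expected by the download tool, not something taken from its docs.
    """
    out_dir = os.path.expanduser(out_dir)  # allow paths such as "~/notebook/..."
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for fname, accs in accessions.items():
        path = os.path.join(out_dir, fname)
        with open(path, "w") as fh:
            fh.write("\n".join(accs) + "\n")
        paths.append(path)
    return paths
```

In this notebook the call would presumably be `write_accession_lists(workDir, accessions)`, run before the download cells.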

In [4]:
!cd $workDir; \
    seqDB_tools accession-GI2fasta < M.barkeri_refseq.txt > M.barkeri.fna

Starting batch: 1
Starting trial: 1

MSG: No whitespace allowed in FASTA ID [NC_007355|Methanosarcina barkeri str. Fusaro, complete genome.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_007355|Methanosarcina barkeri str. Fusaro, complete genome.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_007349|Methanosarcina barkeri str. fusaro plasmid 1, complete sequence.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_007349|Methanosarcina barkeri str. fusaro plasmid 1, complete sequence.]
---------------------------------------------------


In [5]:
!cd $workDir; \
    seqDB_tools accession-GI2fasta < M.extorquens_AM1_refseq.txt > M.extorquens_AM1.fna

Starting batch: 1
Starting trial: 1

MSG: No whitespace allowed in FASTA ID [NC_012808|Methylobacterium extorquens AM1, complete genome.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_012808|Methylobacterium extorquens AM1, complete genome.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_012811|Methylobacterium extorquens AM1 megaplasmid, complete sequence.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_012811|Methylobacterium extorquens AM1 megaplasmid, complete sequence.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_012807|Methylobacterium extorquens AM1 plasmid p1META1, complete sequence.]
---------------------------------------------------

MSG: No whitespace allowed in FASTA ID [NC_012807|Methylobacterium extorquens AM1 plasmid p1META1, complete sequence.]
----------------------------
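
The `No whitespace allowed in FASTA ID` messages above are triggered by headers that contain spaces after the accession. Whether these warnings matter for the downstream steps is not stated in this notebook. If whitespace-free headers are wanted, one option is to rewrite the header lines, as in the sketch below. The function name `sanitize_fasta_headers` and the toy sequence `ACGT` are illustrative assumptions, not part of the original workflow.

```python
import re


def sanitize_fasta_headers(lines):
    """Yield FASTA lines with whitespace in header lines replaced by '_'."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Collapse each run of whitespace in the header into one underscore.
            line = ">" + re.sub(r"\s+", "_", line[1:].strip())
        yield line


# Header text taken from the warning above; the sequence is a placeholder.
example = [
    ">NC_007355|Methanosarcina barkeri str. Fusaro, complete genome.\n",
    "ACGT\n",
]
cleaned = list(sanitize_fasta_headers(example))
```

Sequence lines pass through unchanged, so the same generator can be applied to an open file handle and the result written to a new file.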

# Indexing genomes

In [None]:
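%%bash -s "$genomeLocalDir" "$bacGenomeDir"
# $1 = directory that receives the renamed genomes; $2 = directory with the input genomes
cd "$1" || exit 1

# renaming sequences so that each sequence is unique;
# one renamed copy of every input FASTA file is written to the current directory
find "$2" -name "*fasta" |\
    perl -pe 's/.+\///' | \
    xargs -P 24 -I % bash -c \
    "/var/seq_data/ncbi_db/genome/genome_rename.pl < $2/% > %"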
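
The renaming step is meant to make every sequence unique, but the behaviour of `genome_rename.pl` is not shown here. As a rough, independent check, the sketch below reports sequence IDs that occur more than once in a set of FASTA lines. It assumes the ID is the first whitespace-delimited token of each header; the function name `duplicate_fasta_ids` and the toy records are hypothetical.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return the sorted sequence IDs that appear in more than one header.

    The ID is taken to be the first whitespace-delimited token after '>'.
    """
    counts = Counter(
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Toy records for illustration only.
records = [">seqA desc\n", "ACGT\n", ">seqB\n", "GGCC\n", ">seqA other\n", "TTAA\n"]
dups = duplicate_fasta_ids(records)
```

An empty list means no ID was repeated; to check several renamed files at once, their lines can be chained together (for example with `itertools.chain`) before being passed in.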