## Reference Genome

**Purpose:** 
To build a fully filtered reference database of loci from the Korean Pacific cod data (Lanes 1 and 2, unique samples only).

**Pipeline:** 

`process_radtags` --> `ustacks` --> `cstacks` --> `sstacks` --> `populations` -->

make a FASTA file from the catalog files from `cstacks` and the list of loci from `populations` -->

BOWTIE filtering --> `parseBowtie_DD.py` --> BLAST filtering --> `checkBlastResults_DD.py` --> build BOWTIE index with filtered data --> align individual fastq files to the BOWTIE index



### process_radtags


### ustacks
I ran `ustacks` with the following parameters on Jan. 17th ([Notebook](https://github.com/mfisher5/mf-fish546-PCod/blob/master/notebooks/Understanding%20Genotype%20Error.ipynb)):

-t gzfastq
<br>
-f L1L2samplesT142
<br>
-r -d
<br>
-o L1L2stacks_m10_boundSNP
<br>
-i 001 to 132
<br>
-m 10
<br>
-M 3
<br>
-p 6
<br>
--model_type bounded
<br>
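
The `-i 001 to 132` entry implies one `ustacks` call per sample. Below is a minimal Python 3 sketch of how the flags listed above could be assembled for each sample. The sample file names are hypothetical (the real ones are not given here), and the sketch only prints the commands rather than running them.

```python
def ustacks_command(sample_file, sample_id, out_dir="L1L2stacks_m10_boundSNP"):
    """Return the ustacks argument list for one sample, using the flags listed above."""
    return [
        "ustacks",
        "-t", "gzfastq",
        "-f", sample_file,             # hypothetical path to one sample's reads
        "-r", "-d",
        "-o", out_dir,
        "-i", f"{sample_id:03d}",      # zero-padded ID, e.g. 001 ... 132
        "-m", "10",
        "-M", "3",
        "-p", "6",
        "--model_type", "bounded",
    ]


# Hypothetical sample files, for illustration only.
samples = ["sampleA.fq.gz", "sampleB.fq.gz"]
for sample_id, sample_file in enumerate(samples, start=1):
    print(" ".join(ustacks_command(sample_file, sample_id)))
```

Passing the returned list to `subprocess.run` would execute each call; printing first makes it easy to check the IDs and paths.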


<img src = picture.png >


### cstacks
I ran `cstacks` with the following parameters on Jan. 17th ([Notebook](https://github.com/mfisher5/mf-fish546-PCod/blob/master/notebooks/Understanding%20Genotype%20Error.ipynb)):

-b 3 
<br>
-s L1L2stacks_m10_boundSNP/ 
<br>
-o L1L2stacks_m10_boundSNP 
<br>
-n 3 
<br>
-p 6
<br>


### sstacks
I ran `sstacks` with the following parameters on Jan. 17th ([Notebook](https://github.com/mfisher5/mf-fish546-PCod/blob/master/notebooks/Understanding%20Genotype%20Error.ipynb)):

-b 3 
<br>
-c L1L2stacks_m10_boundSNP/batch_3 
<br>
-s L1L2stacks_m10_boundSNP/
<br>
-o L1L2stacks_m10_boundSNP 
<br>
-p 6 
<br>
2>> sstacks_out_b3
<br>


### populations
I ran `populations` with the following parameters on Jan. 17th ([Notebook](https://github.com/mfisher5/mf-fish546-PCod/blob/master/notebooks/Understanding%20Genotype%20Error.ipynb)):

-b 3 
<br>
-P L1L2stacks_m10_boundSNP 
<br>
-M scripts/PopMap_L1L2stacks.txt
<br>
-t 36 -r 0.75 -p 2 -m 10
<br>
--genepop 
<br>
--fasta 
<br>
2>> populations_out_batch3
<br>


### Create a FASTA file from stacks output

In [1]:
pwd

u'/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Korea-repo/notebooks'

In [2]:
cd ../../scripts

/mnt/hgfs/Pacific cod/DataAnalysis/scripts


In [3]:
!head genBOWTIEfasta.py

### This script generates a FASTA file for the cstacks reference database, that can then be filtered in BOWTIE to make a cleaned reference database of loci for cstacks ###

## MF 

## ARGUMENTS: 
#ARG 1 - a manually created text file of the "locus_SNP" heading in the .genepop file put out from populations. 
#ARG 2 - the .catalog file output from cstacks

###################################################################################################################



In [4]:
cd ../L1L2stacks_m10_boundSNP/

/mnt/hgfs/Pacific cod/DataAnalysis/L1L2stacks_m10_boundSNP


In [5]:
!gzip -d batch_3.catalog.tags.tsv.gz

In [6]:
cd ../scripts

/mnt/hgfs/Pacific cod/DataAnalysis/scripts


In [7]:
!python genBOWTIEfasta.py \
../L1L2stacks_m10_boundSNP/batch_3_loci.txt \
../L1L2stacks_m10_boundSNP/batch_3.catalog.tags.tsv
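
The body of `genBOWTIEfasta.py` is not shown here, so the following is only a sketch of the idea in its header: take the "locus_SNP" names from the genepop file, reduce them to locus IDs, and write the matching catalog sequences as FASTA. It assumes the Stacks v1 `catalog.tags.tsv` layout, with the locus ID in column 3 and the consensus sequence in column 10. Both column positions are parameters in case the layout differs, and the function names are hypothetical. Written for Python 3.

```python
import re


def locus_ids(locus_snp_text):
    """Reduce 'locus_SNP' names such as '12_5' to the set of locus IDs ('12')."""
    names = re.split(r"[,\s]+", locus_snp_text.strip())
    return {name.split("_")[0] for name in names if name}


def catalog_to_fasta(catalog_lines, keep_ids, id_col=2, seq_col=9):
    """Yield FASTA lines for catalog loci whose ID is in keep_ids.

    id_col and seq_col are 0-based column indices (assumed Stacks v1 layout).
    """
    for line in catalog_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_col, seq_col):
            continue  # skip short or malformed rows
        if fields[id_col] in keep_ids:
            yield ">" + fields[id_col]
            yield fields[seq_col]


# Made-up example data, not taken from batch_3.
ids = locus_ids("12_5,12_40,37_8")
rows = [
    "0\t3\t12\t\t0\t+\tconsensus\t0\t\tACGT\t0\t0\t0\n",
    "0\t3\t99\t\t0\t+\tconsensus\t0\t\tTTTT\t0\t0\t0\n",
]
for fasta_line in catalog_to_fasta(rows, ids):
    print(fasta_line)
```

Using a set for the locus IDs collapses loci that carry several SNPs into a single FASTA record each.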

### Run BOWTIE filtering

In [8]:
pwd

u'/mnt/hgfs/Pacific cod/DataAnalysis/scripts'

In [10]:
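# note: `!cd` runs in a throwaway subshell, so the notebook's working directory does not change (the next `pwd` still shows scripts); use `cd` without the `!`
!cd ../L1L2stacks_m10_boundSNP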

In [13]:
!mkdir bowtie_refgenome

In [14]:
!mv ../Stacks_Param_Testing/bowtie/bowtie-1.1.2 bowtie_refgenome/bowtie-1.1.2

In [16]:
pwd

u'/mnt/hgfs/Pacific cod/DataAnalysis/scripts'

In [18]:
cd ../scripts

/mnt/hgfs/Pacific cod/DataAnalysis/scripts


In [19]:
!mv seqsforBOWTIE.fa ../L1L2stacks_m10_boundSNP/unfilteredBOWTIE.fa

In [20]:
cd ../L1L2stacks_m10_boundSNP/

/mnt/hgfs/Pacific cod/DataAnalysis/L1L2stacks_m10_boundSNP


In [21]:
!mv unfilteredBOWTIE.fa bowtie_refgenome/unfilteredBOWTIE.fa

In [22]:
cd bowtie_refgenome/

/mnt/hgfs/Pacific cod/DataAnalysis/L1L2stacks_m10_boundSNP/bowtie_refgenome


In [23]:
!./bowtie-build unfilteredBOWTIE.fa ../batch_3

/bin/sh: 1: ./bowtie-build: not found


**2/10/2017**

In [1]:
pwd

u'/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Korea-repo/notebooks'

In [2]:
cd ../../L1L2stacks_m10_boundSNP/

/mnt/hgfs/Pacific cod/DataAnalysis/L1L2stacks_m10_boundSNP


In [3]:
cd bowtie_refgenome/

/mnt/hgfs/Pacific cod/DataAnalysis/L1L2stacks_m10_boundSNP/bowtie_refgenome


In [4]:
!./bowtie-build unfilteredBOWTIE.fa ../batch_3

/bin/sh: 1: ./bowtie-build: not found


In [None]:
# don't use the ./ prefix: bowtie-build was not found in the current directory
bowtie-build unfilteredBOWTIE.fa ../batch_3

In [None]:
!bowtie -f -v 3 --sam --sam-nohead \
batch_3 \
unfilteredBOWTIE.fa \
batch_3_BOWTIEout.sam

In [None]:
cd ../../scripts

In [None]:
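# the leading ">" on the continuation lines was the shell prompt, not a redirect; typed literally it would overwrite the SAM file
python parseBowtie_DD.py \
../L1L2stacks_m10_boundSNP/batch_3_BOWTIEout.sam \
../L1L2stacks_m10_boundSNP/batch_3_BOWTIEout_filtered.fa
# Number of Bowtie output lines read: 21724
# Number of sequences written to output: 21724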
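
`parseBowtie_DD.py` is not reproduced here, so its exact rule is unknown. The sketch below shows one plausible filter for this step, assuming the goal is to keep loci whose reported alignment is to themselves: a SAM record is kept when it is mapped (flag bit 4 unset) and its query name equals its reference name. The function name is hypothetical, and the example records are made up. Written for Python 3.

```python
def self_aligned(sam_lines):
    """Yield (name, sequence) for SAM records that align to their own locus."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (absent when --sam-nohead is used)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # not a complete SAM record
        qname, flag, rname, seq = fields[0], int(fields[1]), fields[2], fields[9]
        unmapped = flag & 4  # SAM flag bit 0x4 marks an unaligned read
        if not unmapped and qname == rname:
            yield qname, seq


# Made-up SAM records: one self hit, one hit to another locus, one unaligned.
example = [
    "12\t0\t12\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "37\t0\t12\t1\t255\t4M\t*\t0\t0\tACGA\tIIII\n",
    "40\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
]
for name, seq in self_aligned(example):
    print(">" + name)
    print(seq)
```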

### BLAST filtering

In [None]:
cd ../L1L2stacks_m10_boundSNP

In [None]:
makeblastdb -in batch_3_BOWTIEout_filtered.fa \
-parse_seqids \
-dbtype nucl \
-out batch_3_BOWTIEfilteredDB


In [None]:
blastn -query batch_3_BOWTIEout_filtered.fa \
-db batch_3_BOWTIEfilteredDB \
-out batch_3_BOWTIEout_BLASTout


In [None]:
python ../scripts/checkBlastResults_DD.py \
batch_3_BOWTIEout_BLASTout \
batch_3_BOWTIEout_filtered.fa \
batch_3_BOWTIEout_BLASTout_filtered.fa \
batch_3_BOWTIEout_BLASTout_bad.fa

In [None]:
grep -c "^>" batch_3_BOWTIEout_BLASTout_filtered.fa
# 21397
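
Comparing the two counts above: 21724 sequences came out of the BOWTIE step and 21397 remain after BLAST filtering, so 327 were dropped. A minimal Python 3 equivalent of the `grep -c "^>"` check, with the subtraction written out (the function name is hypothetical):

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Counts reported in this notebook.
bowtie_kept = 21724   # sequences written by parseBowtie_DD.py
blast_kept = 21397    # records in batch_3_BOWTIEout_BLASTout_filtered.fa
removed = bowtie_kept - blast_kept
print("loci removed by BLAST filtering:", removed)
```

To count a real file, pass an open file handle, for example `count_fasta_records(open("batch_3_BOWTIEout_BLASTout_filtered.fa"))`.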

### Build final BOWTIE index: the reference database of loci

In [None]:
bowtie-build batch_3_BOWTIEout_BLASTout_filtered.fa \
batch_3_ref_genome

### Align process_radtags output to new reference "genome"

*this was done overnight*