### Step 1: Review Notes/Folder

Overall Goal: Examine whether genomic signatures of climate adaptation can be detected across a variety of species

Method: Align sequence data from populations of Acropora digitifera, Acropora millepora, Acropora tenuis, and Acropora cervicornis to the Acropora millepora genome to identify SNPs associated with 38 years of temperature anomaly data

Bay Lab Goal: Use ANGSD on the BAMs from the alignment to calculate per-site thetas. **What is the expectation across populations and across species?** 
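
For reading the theta output, here is a reminder of the two estimators ANGSD reports as `tW` (Watterson) and `tP` (pairwise). These are the standard textbook definitions, not something taken from these notes, and ANGSD estimates them from genotype likelihoods rather than from called genotypes. With $n$ sampled chromosomes (2 × the number of diploid individuals), $S$ segregating sites, $L$ sites analysed, and $d_{ij}$ the number of differences between sequences $i$ and $j$:

$$
\hat\theta_W = \frac{S}{a_n L}, \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i}
$$

$$
\hat\theta_\pi = \frac{1}{L}\binom{n}{2}^{-1} \sum_{i<j} d_{ij}, \qquad
D = \frac{\hat\theta_\pi - \hat\theta_W}{\sqrt{\widehat{\operatorname{Var}}\left(\hat\theta_\pi - \hat\theta_W\right)}}
$$

Under the standard neutral model with constant population size, both estimate $\theta = 4N_e\mu$ per site, so Tajima's $D$ is expected to be near 0; negative values indicate an excess of rare variants and positive values an excess of intermediate-frequency variants.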


### Step 2: Figure out parameters for the sliding-window analysis

##### Day 1 Goal: Run the analysis with default parameters to get oriented.
Reference: popgen.dk/angsd/index.php/Thetas,Tajima,Neutrality_tests
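
Picking `-win`/`-step` is largely a trade-off between how many windows you get and how many sites fall in each. As a quick sanity check, the generic arithmetic below counts the windows a setting produces on a sequence of a given length. This is a sketch: `n_windows` is a hypothetical helper, and it counts only full-length windows starting at position 0, which may not match exactly how thetaStat places its windows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of full-length sliding windows of width WIN,
# moved STEP bp at a time, that fit in a sequence of LEN bp.
# Generic arithmetic only; thetaStat's own window placement may differ.
n_windows() {
  local len="$1" win="$2" step="$3"
  if (( len < win )); then
    echo 0            # sequence shorter than a single window
  else
    # window starts are 0, step, 2*step, ... up to len - win
    echo $(( (len - win) / step + 1 ))
  fi
}
```

For example, `n_windows 1000000 1000 500` prints 1999 for a 1 Mb sequence with `-win 1000 -step 500`. Because the step is half the window width, each base falls in up to two windows, so neighbouring windows are not independent.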
    
- Download and configure ngs-tools
   ```
   $ cd Programs
   Programs$ interact --egress
   Programs$ git clone https://github.com/ncbi/ngs-tools.git
   Programs$ cd ngs-tools
   ngs-tools$ ./configure --help  # configure in this directory.
   ngs-tools$ ./configure --prefix=/$SCRATCH/Programs/ngs-tools
   .
   .
   configure: error: required ngs-sdk package not found.
   
   ```
- Based on the error, I went here (https://github.com/ncbi/sra-tools/wiki/Building-and-Installing-from-Source) and found that I need to download ncbi-vdb.

    ```
    Programs$ git clone https://github.com/ncbi/ncbi-vdb.git
    Programs$ cd ncbi-vdb/
    ncbi-vdb$ ./configure --prefix=/$SCRATCH/Programs/ncbi-vdb/
    .
    .
    error: required ngs-sdk package not found.
    ```
- More troubleshooting. Found instructions here: https://github.com/ncbi/ngs-tools/issues/3. The fix was to build ngs-sdk first, then ncbi-vdb, then ngs-tools.
    
    ```
    Programs$ git clone https://github.com/ncbi/ngs.git
    Programs$ cd ngs/ngs-sdk
    ngs-sdk$ ./configure --prefix=/$SCRATCH/Programs/ngs/ngs-sdk && make
    ngs-sdk$ cd ../../ncbi-vdb
    ncbi-vdb$ ./configure -p=/$SCRATCH/Programs/ncbi-vdb/ && make
    ncbi-vdb$ cd ../ngs-tools
    ngs-tools$ ./configure --prefix=/$SCRATCH/Programs/ngs-tools
    ngs-tools$ exit
    ```
- Download and build ANGSD (htslib is built first and passed to ANGSD with `HTSSRC`)

    ```
    $ cd Programs
    Programs$ interact --egress
    Programs$ git clone https://github.com/samtools/htslib.git
    Programs$ git clone https://github.com/ANGSD/angsd.git
    Programs$ cd htslib;make;cd ../angsd ;make HTSSRC=../htslib
    ```

Yay! Programs have been downloaded and set up.

Versions:
`angsd version: 0.933-106-gb0d8011 (htslib: 1.11-35-g85240ba) build(Jan 11 2021 18:17:14)`
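
The ngs-tools build order that finally worked (ngs-sdk, then ncbi-vdb, then ngs-tools) can be captured in one script so it doesn't have to be rediscovered. A sketch, assuming the three repositories are already cloned side by side; `PREFIX_ROOT`, `DRY_RUN`, `run` and `build_ngs_stack` are hypothetical names, and the only configure flags used are ones recorded in the notes above.

```bash
#!/usr/bin/env bash
# Sketch: build the NCBI packages in the order that worked above:
# ngs-sdk -> ncbi-vdb -> ngs-tools.
# Assumptions (hypothetical names, not from the original notes):
#   PREFIX_ROOT - folder holding the cloned repos (default: $SCRATCH/Programs)
#   DRY_RUN=1   - print each command instead of running it
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"      # preview mode: show the command only
  else
    "$@"           # real mode: execute it
  fi
}

build_ngs_stack() {
  local root="${PREFIX_ROOT:-${SCRATCH:-$HOME}/Programs}"
  # 1. ngs-sdk first: configure for both other packages failed without it
  ( run cd "$root/ngs/ngs-sdk" && run ./configure --prefix="$root/ngs/ngs-sdk" && run make )
  # 2. ncbi-vdb next; the notes used -p= here, this sketch uses --prefix= as in the first attempt
  ( run cd "$root/ncbi-vdb" && run ./configure --prefix="$root/ncbi-vdb" && run make )
  # 3. ngs-tools last; the notes stop after configure for this package
  ( run cd "$root/ngs-tools" && run ./configure --prefix="$root/ngs-tools" )
}
```

Source the file, preview the commands with `DRY_RUN=1 build_ngs_stack`, then run `build_ngs_stack` to build for real. Each package is built in its own subshell so a failed `cd` or `configure` stops the whole chain instead of building the next package against a missing dependency.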


- Set up symbolic links to all the BAM files in `/pylon5/ebz3a6p/rachbay/CoralCollab/mil`

    ```
    Nt_div$ mkdir millepora
    Nt_div$ cd millepora
    millepora$ ln -s /pylon5/ebz3a6p/rachbay/CoralCollab/mil/*.bam .
    ```
    
- Break down Rachael's code so I can run my own version.
    - My comments are marked with `#[lg]`.

```
#!/bin/bash 
#
#all commands that start with SBATCH contain commands that are just used by SLURM for scheduling  
#################
#set a job name  
#SBATCH --job-name=THETA
#################  
#a file for job output, you can check job progress
#SBATCH --output=THETA.%j.out
#################
# a file for errors from the job
#SBATCH --error=THETA.%j.err
#################
#time you think you need; default is one hour
#format is HH:MM:SS, so this requests 48 hours
#SBATCH -t 48:00:00
#################
#partition (queue) to submit to; RM is the regular-memory partition
#SBATCH -p RM
#################
#number of nodes and number of tasks (cores)
#SBATCH --nodes=1
#SBATCH --ntasks=28
#################
#SBATCH --mem=120G
#################
#################
#get emailed about job BEGIN, END, and FAIL
#SBATCH --mail-type=ALL
#################
#who to send email to; please change to your email
#SBATCH  --mail-user=rachaelbay@gmail.com
#################
#now run normal batch commands
##################
#echo commands to stdout

set -x

#[lg] paths to the reference genome and to the ngsTools install (NGSTOOLS is not used below)
REFERENCE=$SCRATCH/CoralMeta/Amil.v2.01.chrs.fasta
NGSTOOLS=/home/rachbay/programs/ngsTools

#[lg] variable for the species being worked on
pref='mil'

#[lg] -p also creates thetas/ if needed and doesn't fail if the folder already exists
mkdir -p thetas/$pref



    #[lg] this step is commented out here; it writes thetas/$pref/$pref.saf.idx, which realSFS reads below
    #~/programs/angsd/angsd -bam milbam.txt -P 2 -ref $REFERENCE -anc $REFERENCE \
    #       -remove_bads 1 -uniqueOnly 1 -only_proper_pairs 1 -minInd 20 -doMaf 1 -doMajorMinor 4 \
    #       -doCounts 1 -GL 1 -doSaf 1 -out thetas/$pref/$pref

#[lg] This step estimates allele frequencies from genotype likelihoods computed from the BAM files
#[lg] and writes the site allele frequency likelihoods (.saf) that realSFS uses to estimate the site frequency spectrum.
#[lg] milbam.txt is a list of the BAM files, one path per line
#[lg] -P............................number of threads allocated to the program
#[lg] -ref..........................path to reference genome
#[lg] -anc..........................path to fasta file with the ancestral alleles (here the reference is reused)
#[lg] -remove_bads..................discard 'bad' reads (flag >= 256)
#[lg] -uniqueOnly...................discard reads that don't map uniquely
#[lg] -only_proper_pairs............only use reads where the mate could be mapped (1 = on)
#[lg] -minInd.......................discard a site if the effective sample size (individuals with data) is below this value
#[lg] -doMaf........................estimate allele frequencies; 1: frequency with fixed major and minor
#[lg] -doMajorMinor.................infer the major/minor alleles; 4: use the reference allele as major (requires -ref)
#[lg] -doCounts.....................count A, C, G, T at all sites for all samples
#[lg] -GL...........................genotype likelihood model; 1: SAMtools
#[lg] -doSaf........................estimate site allele frequency likelihoods for the SFS; 1: multisample GL estimation
#[lg] -out..........................prefix for output files (thetas/$pref/$pref)
#[lg] -doSaf is the option that requires -anc. Here the reference stands in for the ancestral
#[lg] state, which is presumably why the SFS is folded (-fold 1) in the next steps.



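#[lg] estimate the folded SFS (-fold 1) from the saf files
~/programs/angsd/misc/realSFS thetas/$pref/$pref.saf.idx -P 1 -fold 1 -nSites 50000000 > thetas/$pref/$pref.sfs
#[lg] use that SFS as a prior to compute per-site thetas (writes thetas/$pref/$pref.thetas.idx)
~/programs/angsd/misc/realSFS saf2theta thetas/$pref/$pref.saf.idx -outname thetas/$pref/$pref -sfs thetas/$pref/$pref.sfs -fold 1

#[lg] per-chromosome estimates
~/programs/angsd/misc/thetaStat do_stat thetas/$pref/$pref.thetas.idx
#[lg] per-site thetas as text (values are on a log scale)
~/programs/angsd/misc/thetaStat print thetas/$pref/$pref.thetas.idx > thetas/$pref/$pref.thetas
#[lg] sliding windows: 1000 bp wide, moved 500 bp at a time; -outnames is a prefix,
#[lg] so the output file is thetas/$pref/$pref.thetasWindow.gz.pestPG
~/programs/angsd/misc/thetaStat do_stat thetas/$pref/$pref.thetas.idx -win 1000 -step 500 -outnames thetas/$pref/$pref.thetasWindow.gz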


    
```
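
The script reads `milbam.txt`, a list of BAM files, but these notes don't record how that list was made. A minimal sketch, assuming the list is simply one absolute path per line for every `.bam` symlink in the species folder (a text file of BAM paths is what ANGSD's `-bam` option reads); `make_bam_list` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a sorted list of absolute BAM paths, one per
# line, for every *.bam file or symlink directly inside DIR.
set -euo pipefail

make_bam_list() {
  local dir out
  dir="$(cd "$1" && pwd)"   # absolute path, so the list works from any directory
  out="$2"
  # -maxdepth 1: only this folder; -name '*.bam' skips .bai index files
  find "$dir" -maxdepth 1 -name '*.bam' | sort > "$out"
}
```

For example, `make_bam_list millepora milbam.txt` run from the `Nt_div` folder. Absolute paths keep the list valid even if the job is submitted from a different working directory.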
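
To compare windows across populations and species, the windowed output probably needs one more step. As I understand the ANGSD thetas documentation, `tW` and `tP` in a `.pestPG` file are sums over each window, so they should be divided by `nSites` to get per-site values. A sketch, assuming the column order given in that documentation (Chr in column 2, WinCenter 3, tW 4, tP 5, Tajima 9, nSites 14) and tab-separated fields; `per_site_thetas` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: per-window, per-site Watterson's (tW) and pairwise (tP) theta from a
# thetaStat do_stat .pestPG file. Assumed columns (ANGSD thetas docs):
#   2 Chr, 3 WinCenter, 4 tW, 5 tP, 9 Tajima, 14 nSites
per_site_thetas() {
  awk -F'\t' '
    BEGIN   { OFS = "\t"; print "Chr", "WinCenter", "tW_site", "tP_site", "Tajima", "nSites" }
    /^#/    { next }                                        # skip the header line
    $14 > 0 { print $2, $3, $4 / $14, $5 / $14, $9, $14 }   # drop windows with no sites
  ' "$1"
}
```

For example, `per_site_thetas thetas/mil/mil.thetasWindow.gz.pestPG > mil.perSite.tsv`. Windows with `nSites` of 0 are dropped because their per-site values are undefined.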
    
    