## Central question:

How is flowering in Brachypodium controlled?

## Data: 

 1. RNASeq with a long-day/short-day/mutant contrast, with a circadian coordinate
 1. ChIPSeq of ELF3, the transcription factor relaying the photoperiod signal from phytochrome C (phyC) to downstream targets.
 
## References: 

1. [Phytochrome C mutant has delayed flowering (Brachypodium)](http://www.genetics.org/content/198/1/397.long)

1. [Evening complex in A. thaliana on sensing thermal and photoperiodic signals](https://www.ncbi.nlm.nih.gov/pubmed/27594171)

Flowchart (based on Fuxiang's slides):

1. Mapping reads and QC

  * HISAT + StringTie  for RNA-Seq
  * bowtie2 for ChIP-Seq
  * Metrics: (Coverage and mapped percentage, Assembly version)
  * QC: correlation plots (deeptools), PCA (R?)

1. Differential expression 
  * Contrast level: LD/SD/mutant
  
1. Epigenetics
  * MACS2 for peak calling
  * look for motif enrichment HOMER, MEME?

1. Contrast RNASeq and ChIPSeq result
  * Gene ontology and pathway enrichment of the derived gene set.
  
1. Clustering to obtain coexpression clades.
  * Exploit the ZT coordinate
  * CLUST/HC
  
1. Further analysis
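
The RNASeq/ChIPSeq contrast step could start as a plain gene-set overlap. A minimal sketch, assuming the derived gene set is the intersection of differentially expressed genes and genes near ELF3 peaks, scored with an upper-tail hypergeometric test. The gene IDs and the universe size of 10 are made up for illustration.

```python
from math import comb


def overlap_pvalue(universe_size, set_a, set_b):
    """Upper-tail hypergeometric P(X >= observed overlap) for two gene sets."""
    k = len(set_a & set_b)            # observed overlap
    K, n = len(set_a), len(set_b)     # sizes of the two sets
    total = comb(universe_size, n)
    tail = sum(comb(K, i) * comb(universe_size - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical gene IDs, for illustration only.
de_genes = {"geneA", "geneB", "geneC", "geneD"}   # differentially expressed
chip_targets = {"geneB", "geneC", "geneX"}        # genes near ELF3 peaks

shared = de_genes & chip_targets
p = overlap_pvalue(10, de_genes, chip_targets)
print(sorted(shared), round(p, 4))
```

The shared set is what would go into the GO and pathway enrichment.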

#### May 14th

Aims:

1. [ ]  Setup production environment

1. [ ] Gather and label raw data (.fq)

1. [ ] Build up and test the pipeline for HISAT+StringTie

1. [ ] Build up QC sets.

CCBI symposium: http://talks.cam.ac.uk/talk/index/104587

### May 14th biology talk with Fuxiang

EveningComplex = ELF3 + ELF4 + LUX

EC represses flowering genes, and is released on flowering

ELF3([uniprot_O82804](http://www.uniprot.org/uniprot/O82804#family_and_domains)): mysterious protein with little structural information

phyC --| ELF3  (mutant?)

Temporal dynamics of phyC over the circadian period. (Look at RNASeq)

Under SD, phyC reaches a low level at the night-day transition, permitting a buildup of ELF3, which supposedly represses its targets during the day.

Preliminary observation: genes regulated by ELF3 are upregulated at the day-night transition.

Time points at transitions.

other players: PRR37 LUX


### Cluster organisation

username: feng
hostname: sl-pw-srv01.slcu.private.cam.ac.uk

Clusters:
  1. Home cluster: 
    * HPC, normal usage. Shared between Jerzy's group and Phil's group (permission groups). 
    * Storage: 
       1. Backup:      TeamPW (several TB, to be mounted on Friday), TeamJP.
       1. Not backup:  synology2,  synology3 
  1. Bigger cluster shared by SLCU: 
  1. CSCS

Storage on Home

CSCS Clinical School Computing Service

### Data source

1. RNASeq and ChIP-Seq
  * Ming Jun
  * Katja
  
### MISC

Regular meeting:
  1. Thursday preferably.  

### May 15th

#### RNASeq pipeline

1. Data structure:
  * RunID/SampleID/file
  
#### Structure Pipeline


1. Automator script that scans through the folder, then detects and merges the fastq files. (`/media/pw_synology3/Software/mapRNAseq/map-RNA-seq.py`)

1. Take the raw .fastq files it outputs, run the analyses, and return the output to the automator script.
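
A rough sketch of the scan-and-merge step, not the actual `map-RNA-seq.py`. It assumes Illumina-style names (`<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`) anywhere under the RunID/SampleID tree and writes one `<sample>_R<read>_raw.fastq` per read direction. The function names are hypothetical.

```python
import gzip
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+_S\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_fastq(root):
    """Map (sample, read) -> lane files sorted by lane, found anywhere under root."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*.fastq.gz"):
        m = FASTQ_RE.match(path.name)
        if m:
            groups[(m["sample"], m["read"])].append((m["lane"], path))
    return {key: [p for _, p in sorted(items)] for key, items in groups.items()}


def merge_lanes(groups, out_dir):
    """Decompress and concatenate the lanes of each group into one .fastq file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), paths in groups.items():
        target = out_dir / f"{sample}_R{read}_raw.fastq"
        with open(target, "wb") as out:
            for path in paths:
                with gzip.open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[(sample, read)] = target
    return merged
```

Sorting by lane keeps R1 and R2 in the same read order, which paired-end mapping relies on.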

fuxiang/script_nextgen

### Flowchart

[x] STEP 1: remove adaptors using Trimmomatic
Package location: /home/Program_NGS_sl-pw-srv01/Trimmomatic-0.32/trimmomatic-0.32.jar

Manual: [Trimmomatic](http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf)
script: pipeline_rnaseq.sh

[x] STEP 2: FASTQC quality control (you need to do this for the two paired-end read files separately)
Package location: fastqc
See: util.sh::routine_fastqc()

[x] STEP 3: Map the reads to genome using HISAT

Package location: /home/Program_NGS_sl-pw-srv01/hisat2-2.1.0/

Location of indexed genome: /home/ref_genew/Brachypodium_Bd21_v3.1/HISAT2Index/Bdistachyon314_Bd

format: indexed .fasta

Use the following options: `--no-mixed`, `--rna-strandness RF`, `--dta`

Also append `&> align-summary.txt` to the command to redirect the HISAT output, which includes the mapping rate, into a text file.
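
A small sketch of assembling the HISAT2 call with the options above. It only builds and prints the command, nothing is executed. The helper name and the sample file names are hypothetical; the index path is the one noted above.

```python
import shlex


def hisat2_command(index, fq1, fq2, sam, threads=10, strandness="RF"):
    """Return the hisat2 argv for one paired-end sample (nothing is executed)."""
    return [
        "hisat2", "-x", index, "-1", fq1, "-2", fq2, "-S", sam,
        "--threads", str(threads),
        "--no-mixed", "--rna-strandness", strandness, "--dta",
    ]


cmd = hisat2_command(
    "/home/ref_genew/Brachypodium_Bd21_v3.1/HISAT2Index/Bdistachyon314_Bd",
    "sample_R1_raw.fastq", "sample_R2_raw.fastq", "sample.sam",  # hypothetical names
)
# Shell form, with the summary redirected into a text file.
print(shlex.join(cmd) + " &> align-summary.txt")
```

Keeping the command as a list makes it easy to hand to `subprocess.run` later and to change the strandness per library.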

[x] STEP 4: Convert the SAM alignment file to BAM
Package location: samtools

[x] STEP 5: sort the BAM file
Package location: samtools
	
[x] STEP 6: remove duplicate reads
Package location: /home/Program_NGS_sl-pw-srv01/picard-tools-1.103/MarkDuplicates.jar

\* [x] STEP 7: index the deduplicated BAM files
Package location: samtools

\* [x] STEP 8: flagstat to get quality metrics
Package location: samtools

See: util.sh::bamqc()
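
Steps 4 to 8 as a list of commands, as a sketch only (what `util.sh::bamqc()` actually does is not shown here). Assumptions: samtools 1.3 or newer for `sort -o`, the Picard 1.x `KEY=value` argument style, and made-up intermediate file names.

```python
def bam_steps(prefix, picard_jar):
    """Return argv lists for steps 4-8; nothing is executed here."""
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    sorted_bam, dedup_bam = f"{prefix}.sorted.bam", f"{prefix}.dedup.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam],            # STEP 4: SAM -> BAM
        ["samtools", "sort", "-o", sorted_bam, bam],           # STEP 5: sort
        ["java", "-jar", picard_jar,                           # STEP 6: deduplicate
         f"INPUT={sorted_bam}", f"OUTPUT={dedup_bam}",
         f"METRICS_FILE={prefix}.dedup_metrics.txt", "REMOVE_DUPLICATES=true"],
        ["samtools", "index", dedup_bam],                      # STEP 7: index
        ["samtools", "flagstat", dedup_bam],                   # STEP 8: QC metrics
    ]


for step in bam_steps("sample", "MarkDuplicates.jar"):
    print(" ".join(step))
```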

[x] STEP 9: estimate genome average and normalise -- create a bedgraph file 
Package location: genomeCoverageBed

[x] STEP 10: convert bedgraph to bigwig -- to be visualised in IGV
Package location: bedGraphToBigWig

See: util.sh::bam2bigwig()

[ ] STEP 11: transcript assembly with StringTie
Package location: /home/Program_NGS_sl-pw-srv01/stringtie-1.3.3b.Linux_x86_64/stringtie
GTF file: /home/ref_genew/Brachypodium_Bd21_v3.1/annotation/Bdistachyon_314_v3.1.gene_exons.gtf (GFF file is in the same location too)

(WWW phytozome)

[ ] STEP 12: call raw counts using HTSEQ-COUNT (use the `>` operator to redirect the output to a text file)
Package location: htseq-count

Final output should encompass:
	1. BAM file (after duplicate removal, from STEP 6)
	2. *Its BAI file (from STEP 7) 
	3. HISAT align-summary.txt (STEP 3)
	4. StringTie output: a list of TPM values by gene (STEP 11)
	5. HTSEQ-COUNT output: a list of raw counts by gene (STEP 12)
	6. BedGraph file 
	7. BigWig file

Remaining files should be deleted to save space (especially the SAM file)
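
A dry-run sketch for the cleanup: list everything in a sample folder that is not a final output, so the list can be checked before deleting. The suffixes are assumptions about how the outputs would be named, not the pipeline's real file names.

```python
from pathlib import Path

# Assumed suffixes of the final outputs listed above (hypothetical naming).
KEEP_SUFFIXES = (".dedup.bam", ".dedup.bam.bai", "align-summary.txt",
                 ".tpm.txt", ".counts.txt", ".bedgraph", ".bw")


def removable_files(sample_dir, keep=KEEP_SUFFIXES):
    """List files in sample_dir that are not final outputs (dry run, no deletion)."""
    return sorted(p for p in Path(sample_dir).iterdir()
                  if p.is_file() and not p.name.endswith(keep))
```

Once the list looks right, `for p in removable_files(d): p.unlink()` does the actual deletion.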

#### Strandedness (https://github.com/griffithlab/rnaseq_tutorial/blob/master/manuscript/supplementary_tables/supplementary_table_5.md)

| Library Kit | Stranded | 5p to 3p IGV | TopHat (--library-type parameter) | HISAT2 (--rna-strandness) | HTSeq (--stranded/-s) | Picard (STRAND_SPECIFICITY option of CollectRnaSeqMetrics) | Kallisto quant |
| ----------- | -------- | ------------ | --------------------------------- | ------ | --------------------- | ---------------------------------------------------------- |--------------|
| TruSeq Strand Specific Total RNA | Yes | F2R1 | fr-firststrand | R/RF | reverse | SECOND_READ_TRANSCRIPTION_STRAND | --fr-stranded |
| NuGEN Encore | Yes | F1R2 | fr-secondstrand | F/FR | yes | FIRST_READ_TRANSCRIPTION_STRAND | --rf-stranded |
| NuGEN OvationV2 | No | F2R1 or F1R2 | fr-unstranded | NONE | no | NONE | NONE |
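
To keep the mapper and the counter consistent, the table can be turned into a lookup. A sketch covering only the HISAT2 and HTSeq columns for paired-end libraries; the values are copied from the table, the function name is hypothetical.

```python
# Flag values copied from the table above.
STRANDEDNESS = {
    "TruSeq Strand Specific Total RNA": {"hisat2": "RF", "htseq": "reverse"},
    "NuGEN Encore": {"hisat2": "FR", "htseq": "yes"},
    "NuGEN OvationV2": {"hisat2": None, "htseq": "no"},
}


def strand_flags(kit):
    """Return (hisat2 args, htseq-count args) for a paired-end library kit."""
    entry = STRANDEDNESS[kit]
    # Unstranded libraries get no --rna-strandness option at all.
    hisat2 = ["--rna-strandness", entry["hisat2"]] if entry["hisat2"] else []
    htseq = [f"--stranded={entry['htseq']}"]
    return hisat2, htseq


print(strand_flags("TruSeq Strand Specific Total RNA"))
```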



    ===== Aligning RNASeq =====
    /home/feng/repos/BrachyPhoton/out/Exp0024-ZT8-Bdphyc_S2_R1_raw.fastq
    Exp0024-ZT8-Bdphyc_S2_R1_raw
    Exp0024-ZT8-Bdphyc_S2_R2_raw
    Using 10 threads
    Phred quality version: phred33
    Exp0024-ZT8-Bdphyc_S2
    hisat2 -x /home/ref_genew/Brachypodium_Bd21_v3.1/HISAT2Index/Bdistachyon314_Bd -1 /home/feng/repos/BrachyPhoton/out/Exp0024-ZT8-Bdphyc_S2_R1_raw.fastq -2 /home/feng/repos/BrachyPhoton/out/Exp0024-ZT8-Bdphyc_S2_R2_raw.fastq -S Exp0024-ZT8-Bdphyc_S2.sam --threads 10 --no-mixed --rna-strandness FR --dta

    real    5m19.193s
    user    48m50.764s
    sys     4m6.920s

    26278640 reads; of these:
      26278640 (100.00%) were paired; of these:
        3785173 (14.40%) aligned concordantly 0 times
        18329761 (69.75%) aligned concordantly exactly 1 time
        4163706 (15.84%) aligned concordantly >1 times
        ----
        3785173 pairs aligned concordantly 0 times; of these:
          59060 (1.56%) aligned discordantly 1 time
    85.82% overall alignment rate
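
For the mapping-rate metric, the summary text can be parsed rather than read by eye. A minimal sketch, assuming the summary keeps the layout shown in the log; the numbers in the example are the ones from the run above and the function name is hypothetical.

```python
import re


def parse_hisat2_summary(text):
    """Extract read-pair counts and the overall alignment rate from a summary."""
    patterns = {
        "total_pairs": r"^\s*(\d+) reads; of these:",
        "concordant_0": r"^\s*(\d+) \([\d.]+%\) aligned concordantly 0 times",
        "concordant_1": r"^\s*(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
        "concordant_multi": r"^\s*(\d+) \([\d.]+%\) aligned concordantly >1 times",
    }
    stats = {}
    for key, pattern in patterns.items():
        m = re.search(pattern, text, flags=re.MULTILINE)
        if m:
            stats[key] = int(m.group(1))
    m = re.search(r"^\s*([\d.]+)% overall alignment rate", text, flags=re.MULTILINE)
    if m:
        stats["overall_rate"] = float(m.group(1))
    return stats


summary = """\
26278640 reads; of these:
  26278640 (100.00%) were paired; of these:
    3785173 (14.40%) aligned concordantly 0 times
    18329761 (69.75%) aligned concordantly exactly 1 time
    4163706 (15.84%) aligned concordantly >1 times
85.82% overall alignment rate
"""
stats = parse_hisat2_summary(summary)
unique_fraction = stats["concordant_1"] / stats["total_pairs"]
print(stats["overall_rate"], round(100 * unique_fraction, 2))
```

Collecting these per sample gives the mapped-percentage column for the QC table.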


### DATA integrity

cp: error reading 'raw/150R/Doro_150R_Doro_1-43239982/Doro-Bd21-3-ZT0-LD_S1_L002_R2_001.fastq.gz': Bad file descriptor
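
Given the read error above, it may be worth checking every `.fastq.gz` before mapping. A minimal sketch that simply tries to decompress the whole file; it assumes a damaged or unreadable file shows up as an exception when read from Python. The function name is hypothetical.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):   # read to the end, discarding the data
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers I/O failures and bad gzip headers or checksums,
        # EOFError covers truncated files.
        return False
```

Comparing checksums against the sequencing provider's list, if one exists, would be a stronger check than this.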

## PCA results

#### Stringtie Output

Read counts are aggregated on a per-gene basis
 * Coverage
 * TPM
 * FPKM
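
For reference, how TPM and FPKM relate. This uses the textbook definitions on toy counts; StringTie derives its values from assembled transcript coverage, so its numbers will not match a naive count-based calculation exactly.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    scale = 1e6 / sum(rates)
    return [r * scale for r in rates]


# Toy fragment counts and gene lengths (bp); not real data.
counts = [100, 300, 600]
lengths = [1000, 1500, 3000]
print([round(x, 1) for x in tpm(counts, lengths)])
print([round(x, 1) for x in fpkm(counts, lengths)])
```

TPM is FPKM rescaled so that each sample sums to one million, which makes TPM the easier unit to compare across samples.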


### Citation

If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
  DOI https://doi.org/10.5281/zenodo.1146014
