Protocol used by Albertas to generate this data: https://benchling.com/anavickas/f/lib_VNejBHt6-protocols/prt_2Bkfh557-meripseq/edit

# Preprocessing

We started the analysis from raw `fastq` files. First, we trimmed the `NNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCA` sequence from each read using `cutadapt`. The trimmed `fastq` files were then aligned to the human genome (annotation: `gencode.v28.annotation.gtf`) using the STAR aligner. In addition, the unmapped reads of each `fastq` file were saved to a new `fastq` file so that they can be aligned separately to the HIV genome.
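The two steps above can be sketched as command builders. This is a minimal sketch, not the exact commands that were run: the file names, STAR index directory, and thread count are hypothetical, the adapter is assumed to be a 3' adapter (`-a`), and the `fastq` files are assumed to be uncompressed single-end reads.

```python
import shlex

ADAPTER = "NNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # sequence given in the text


def cutadapt_cmd(fastq_in, fastq_out, adapter=ADAPTER):
    """Build a cutadapt call that trims a 3' adapter (assumption: -a)."""
    return ["cutadapt", "-a", adapter, "-o", fastq_out, fastq_in]


def star_cmd(fastq_in, genome_dir, out_prefix, threads=8):
    """Build a STAR call that also writes unmapped reads to a separate file."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # hypothetical pre-built index
        "--readFilesIn", fastq_in,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outReadsUnmapped", "Fastx",        # keeps unmapped reads for the HIV alignment
    ]


# Hypothetical file names, for illustration only.
trim = cutadapt_cmd("sample1.fastq", "sample1.trim.fastq")
align = star_cmd("sample1.trim.fastq", "star_index_human", "sample1.")
print(shlex.join(trim))
print(shlex.join(align))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.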

> `<sample>.<species>.<treatment>.<input/m6A>.bam`
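The naming pattern above can be parsed back into its fields when building a sample sheet. The sketch below assumes that none of the fields contains a dot; the example file name and its field values are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class BamInfo(NamedTuple):
    sample: str
    species: str
    treatment: str
    assay: str  # "input" or "m6A"


def parse_bam_name(path):
    """Split <sample>.<species>.<treatment>.<input/m6A>.bam into its fields."""
    parts = Path(path).name.split(".")
    if len(parts) != 5 or parts[-1] != "bam":
        raise ValueError(f"unexpected BAM name: {path}")
    sample, species, treatment, assay = parts[:4]
    if assay not in ("input", "m6A"):
        raise ValueError(f"unexpected assay field: {assay}")
    return BamInfo(sample, species, treatment, assay)


# Hypothetical file name, for illustration only.
print(parse_bam_name("bam/S1.human.METTL3.m6A.bam"))
```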

# Technical vs. biological variance 
Principal component analysis (PCA) of the samples shows an obvious batch effect. We account for it in the `DESeq2` design in all statistical tests throughout the downstream analysis.

<img src="plots/all_PCAs.png" style="height:500px">

# Internal controls
Here, we check whether the knockdown of each subunit of the m6A methyltransferase complex is statistically significant. First, we use `featureCounts` to count reads per gene in the `input` samples. Then, we compare treated vs. non-treated samples with `DESeq2` to test the knockdown in each treatment. In conclusion, METTL14 and METTL3 do not pass the test in their respective treatments, although Virma (listed as KIAA1429 in the tables) and WTAP do (threshold: **pvalue < 0.05**). Also, WTAP is significantly overrepresented in the METTL3 treatment.

### METTL14
| ID | name | baseMean | log2FoldChange | pvalue | padj |
|------|------|------|------|------|------|
| ENSG00000145388.14 | **METTL14** | 231.6902 | -0.34141970 | 0.3432071 | 0.9998984
| ENSG00000165819.11 | METTL3 | 682.4459 | -0.03486790 | 0.8705401 | 0.9998984
| ENSG00000164944.11 | KIAA1429 | 663.2608 | 0.03264867 | 0.8802465 | 0.9998984
| ENSG00000146457.14 | WTAP | 532.6499 | 0.27842521 | 0.2549869 | 0.9998984

### METTL3
| ID | name | baseMean | log2FoldChange | pvalue | padj |
|------|------|------|------|------|------|
| ENSG00000145388.14 | METTL14 | 281.3092 | 0.2729448 | 0.406580557 | NA
| ENSG00000165819.11 | **METTL3** | 591.5949 | -0.4419943 | 0.057697989 | 0.73893916
| ENSG00000164944.11 | KIAA1429 | 693.4508 | 0.1893738 | 0.379007282 | 0.99856616
| ENSG00000146457.14 | WTAP | 625.1083 | 0.7027947 | **0.001935877** | 0.08696554

### Virma
| ID | name | baseMean | log2FoldChange | pvalue | padj |
|------|------|------|------|------|------|
| ENSG00000145388.14 | METTL14 | 257.2336 | 0.1623479 | 0.629775307 | 0.9996953
| ENSG00000165819.11 | METTL3 | 626.2647 | -0.1052284 | 0.639313230 | 0.9996953
| ENSG00000164944.11 | **KIAA1429** | 507.3444 | -0.6341922 | **0.009837015** | 0.9996953
| ENSG00000146457.14 | WTAP | 538.0204 | 0.4565725 | 0.067968639 | 0.9996953

### WTAP
| ID | name | baseMean | log2FoldChange | pvalue | padj |
|------|------|------|------|------|------|
| ENSG00000145388.14 | METTL14 | 301.6155 | 0.3928952 | 0.2200466386 | 0.9999349
| ENSG00000165819.11 | METTL3 | 742.0639 | 0.1733562 | 0.4284646260 | 0.9999349
| ENSG00000164944.11 | KIAA1429 | 727.2650 | 0.2549271 | 0.2431260688 | 0.9999349
| ENSG00000146457.14 | **WTAP** | 690.2359 | 0.8689117 | **0.0003008184** | 0.5474645

<img src="plots/internal_control_Volcanos.png" style="height:500px">
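The pass/fail calls in this section can be re-derived from the tables with a few lines of Python. This sketch only applies the stated threshold (pvalue < 0.05) to the row of the targeted gene in each treatment; the values are copied from the tables above, and the log2FoldChange is printed alongside so the direction of the change can be inspected.

```python
ALPHA = 0.05  # threshold stated in the text

# (treatment, targeted gene, log2FoldChange, pvalue), copied from the tables above
TARGET_ROWS = [
    ("METTL14", "METTL14", -0.34141970, 0.3432071),
    ("METTL3", "METTL3", -0.4419943, 0.057697989),
    ("Virma", "KIAA1429", -0.6341922, 0.009837015),
    ("WTAP", "WTAP", 0.8689117, 0.0003008184),
]


def passes(pvalue, alpha=ALPHA):
    """Return True when the unadjusted p-value is below the threshold."""
    return pvalue < alpha


results = {treatment: passes(p) for treatment, _, _, p in TARGET_ROWS}

for treatment, gene, lfc, p in TARGET_ROWS:
    status = "pass" if results[treatment] else "fail"
    print(f"{treatment}: target={gene} log2FC={lfc:+.2f} pvalue={p:.3g} -> {status}")
```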

# Peakcalling 
We initially planned to use the `RADAR` or `exomePeak` packages for the downstream differential methylation analysis. Instead, we developed a custom pipeline that combines `exomePeak`, `featureCounts`, and `DESeq2`. First, we call m6A peaks in the non-treated samples using `exomePeak`, which computes coverage for only the non-treated `input` and `m6A` samples across all replicates and returns the peak coordinates as a `bed12` file. Then, we convert this `bed12` file to `GTF` format and use it as the reference for `featureCounts`, so that reads are counted over the called peaks as features instead of genes. Finally, `DESeq2` evaluates treated vs. non-treated differential methylation with the contrast design (treat_m6A vs. treat_input) vs. (NT_m6A vs. NT_input).
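The `bed12`-to-`GTF` conversion step can be sketched as follows. This is an illustration rather than the script used here: it assumes each BED12 block becomes one `exon` record, that the BED name column is reused as `gene_id`, and the example line is made up. BED coordinates are 0-based and half-open, while GTF is 1-based and inclusive, so only the start shifts by one.

```python
def bed12_to_gtf(bed_line, source="exomePeak", feature="exon"):
    """Convert one BED12 line into one GTF line per block."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated BED fields")
    chrom, start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
    # blockSizes and blockStarts are comma-separated, often with a trailing comma
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    gtf_lines = []
    for size, offset in zip(sizes, offsets):
        gtf_start = start + offset + 1   # 0-based -> 1-based
        gtf_end = start + offset + size  # half-open end equals inclusive end
        gtf_lines.append("\t".join([
            chrom, source, feature, str(gtf_start), str(gtf_end),
            ".", strand, ".", attributes,
        ]))
    return gtf_lines


# Made-up peak with two blocks, for illustration only.
example = "chr1\t1000\t1500\tpeak_1\t0\t+\t1000\t1500\t0\t2\t100,200,\t0,300,"
for line in bed12_to_gtf(example):
    print(line)
```

Using `exon` as the feature type and `gene_id` as the grouping attribute matches the defaults that `featureCounts` looks for in a GTF file, so blocks of the same peak are summed into one feature.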

## Evaluate peaks
### Metagene plot
As expected, there is a high density of methylation peaks at the end of the CDS (see [Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′ UTRs and near Stop Codons](https://doi.org/10.1016/j.cell.2012.05.003)). The following metagene plot was generated using the `Guitar` package, which takes the `bed12` output file of `exomePeak` as input.
<img src="plots/control_mrna_test.png" style="height:400px">        

### Motif analysis 

We analyzed only the non-treated samples here. We used [FIRE](https://tavazoielab.c2b2.columbia.edu/FIRE/) to compare m6A vs. input peaks for the `[AG]GAC` and `[AGT][AG]AC[ACT]` motifs. The results show a significant presence of these motifs, as expected.
<img src="plots/fire.motifs_of_interest.summary.png" title="summary" style="width:1000px">
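To make the two motif patterns concrete, the sketch below counts their occurrences in a sequence with regular expressions. It is not a replacement for FIRE, which scores motif enrichment statistically; it only shows what the patterns match. The example sequences are made up.

```python
import re

MOTIFS = ["[AG]GAC", "[AGT][AG]AC[ACT]"]  # patterns named in the text


def count_motif(sequence, pattern):
    """Count occurrences of a motif, including overlapping ones."""
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA letters
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = re.compile(f"(?=({pattern}))")
    return sum(1 for _ in regex.finditer(seq))


# Made-up sequences, for illustration only.
for seq in ["TTGGACTAA", "GGACGGAC", "CCCCCC"]:
    counts = {motif: count_motif(seq, motif) for motif in MOTIFS}
    print(seq, counts)
```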

## Differential methylation
#### Contrast design: $\frac{m6A^{treated}}{input^{treated}}$ vs. $\frac{m6A^{nontreated}}{input^{nontreated}}$
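On the log2 scale, the contrast above is a difference of two log ratios. The sketch below shows only this arithmetic on made-up counts; `DESeq2` estimates the same quantity from a statistical model with normalization and dispersion estimates, which this example does not attempt to reproduce. The pseudocount is an assumption added to avoid division by zero.

```python
import math


def log2_enrichment(m6a, inp, pseudocount=1.0):
    """log2 of the m6A-over-input ratio for one condition."""
    return math.log2((m6a + pseudocount) / (inp + pseudocount))


def differential_methylation(m6a_treated, input_treated,
                             m6a_nontreated, input_nontreated,
                             pseudocount=1.0):
    """log2 of (m6A/input in treated) over (m6A/input in non-treated)."""
    treated = log2_enrichment(m6a_treated, input_treated, pseudocount)
    nontreated = log2_enrichment(m6a_nontreated, input_nontreated, pseudocount)
    return treated - nontreated


# Made-up counts: the m6A signal halves after treatment while input is unchanged.
print(differential_methylation(50, 100, 100, 100, pseudocount=0))
```

A negative value means the peak is less enriched in the treated samples than in the non-treated ones, i.e. hypo-methylated.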


### Plots
As expected, the Wilcoxon signed-rank test and the t-test show overall hypo-methylation in each treatment. Note that the gene names on the volcano plots and the heatmap refer to the peaks, as described above. The heatmap shows the 30 most variable peaks across the four conditions.

<table>
  <tr>
    <td><img src="plots/peak_Histograms.png" style="width:400px"></td>
    <td><img src="plots/peak_Volcanos.png" style="height:400px"></td>
  </tr>
</table>

<table>
  <tr>
    <td><img src="plots/Corrplot.png" style="height:400px"></td>
  </tr>
</table>


<table>
  <tr>
    <td><img src="plots/peak_mostVar_Heatmap.png" style="height:500px"></td>
  </tr>
</table>


# Enrichment analysis 

## Hypo methylation 
https://github.com/goodarzilab/TEISER

We map the trimmed `fastq` files of the `input` samples with `Salmon` to the same human reference. Then, we run a differential expression analysis with `DESeq2`, comparing treated vs. non-treated samples, to produce the RNA expression table used as input for the TEISER algorithm. The hypo-methylated gene list is selected from the differential methylation analysis above (log2FoldChange < -2).
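The gene-list selection can be sketched as a simple filter on the differential methylation results. The cutoff is the one stated above; the gene names and values are placeholders, and collapsing several peaks of one gene into a single entry is an assumption of this sketch.

```python
THRESHOLD = -2.0  # log2FoldChange cutoff stated in the text


def hypo_methylated_genes(peaks, threshold=THRESHOLD):
    """Return sorted gene names with at least one peak below the cutoff.

    `peaks` is an iterable of (gene, log2FoldChange) pairs.
    """
    return sorted({gene for gene, lfc in peaks if lfc < threshold})


# Placeholder peaks, for illustration only.
peaks = [
    ("GENE_A", -2.5),
    ("GENE_B", -1.0),
    ("GENE_A", -3.1),  # second peak of the same gene
    ("GENE_C", -2.0),  # exactly at the cutoff, so not selected
]
print(hypo_methylated_genes(peaks))
```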

### METTL14
<img src="plots/teiser.METTL14_hypo_methylation_GENESET.png" style="height:300px">

### METTL3
<img src="plots/teiser.METTL3_hypo_methylation_GENESET.png" style="height:300px">

### Virma
<img src="plots/teiser.Virma_hypo_methylation_GENESET.png" style="height:300px">

### WTAP
<img src="plots/teiser.WTAP_hypo_methylation_GENESET.png" style="height:300px">

# iPAGE
https://tavazoielab.c2b2.columbia.edu/iPAGE/

Here, we use the differential methylation log2FoldChange of each treatment to run iPAGE against different gene set collections, such as GO, MSigDB, and RBPs.
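Preparing the per-treatment input can be sketched as writing a two-column table of gene identifiers and log2FoldChange values. The layout (tab-delimited, one header line, gene in the first column) is an assumption about what iPAGE expects and should be checked against its documentation; the values below are placeholders.

```python
import csv
import io


def to_ipage_table(rows, value_name="log2FoldChange"):
    """Render (gene, value) pairs as a tab-delimited table with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", value_name])
    for gene, value in rows:
        writer.writerow([gene, f"{value:.4f}"])
    return buffer.getvalue()


# Placeholder values, for illustration only.
rows = [("ENSG00000145388", -1.25), ("ENSG00000165819", 0.5)]
print(to_ipage_table(rows), end="")
```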

## METTL14 vs NT
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl.png" style="width:600px">
	human_ensembl</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_c1.png" style="width:600px">
	human_ensembl_msigdb_c1</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_c2.png" style="width:600px">
	human_ensembl_msigdb_c2</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_c3.png" style="width:600px">
	human_ensembl_msigdb_c3</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_c4.png" style="width:600px">
	human_ensembl_msigdb_c4</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_c5.png" style="width:600px">
	human_ensembl_msigdb_c5</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_c7.png" style="width:600px">
	human_ensembl_msigdb_c7</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_msigdb_full.png" style="width:600px">
	human_ensembl_msigdb_full</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_5UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_5UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL14_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_coding_exons.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_coding_exons</td>
  </tr>
</table>

## METTL3 vs NT
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl.png" style="width:600px">
	human_ensembl</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c1.png" style="width:600px">
	human_ensembl_msigdb_c1</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c2.png" style="width:600px">
	human_ensembl_msigdb_c2</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c3.png" style="width:600px">
	human_ensembl_msigdb_c3</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c4.png" style="width:600px">
	human_ensembl_msigdb_c4</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c5.png" style="width:600px">
	human_ensembl_msigdb_c5</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c6.png" style="width:600px">
	human_ensembl_msigdb_c6</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_c7.png" style="width:600px">
	human_ensembl_msigdb_c7</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_full.png" style="width:600px">
	human_ensembl_msigdb_full</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_msigdb_h.png" style="width:600px">
	human_ensembl_msigdb_h</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_RBPs_coding_gene_ids.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_3UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_3UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_5UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_5UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_METTL3_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_coding_exons.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_coding_exons</td>
  </tr>
</table>

## Virma vs NT
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl.png" style="width:600px">
	human_ensembl</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c1.png" style="width:600px">
	human_ensembl_msigdb_c1</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c2.png" style="width:600px">
	human_ensembl_msigdb_c2</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c3.png" style="width:600px">
	human_ensembl_msigdb_c3</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c4.png" style="width:600px">
	human_ensembl_msigdb_c4</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c5.png" style="width:600px">
	human_ensembl_msigdb_c5</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c6.png" style="width:600px">
	human_ensembl_msigdb_c6</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_c7.png" style="width:600px">
	human_ensembl_msigdb_c7</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_full.png" style="width:600px">
	human_ensembl_msigdb_full</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_msigdb_h.png" style="width:600px">
	human_ensembl_msigdb_h</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_RBPs_all_gene_ids.png" style="width:600px">
	human_ensembl_RBPs_all_gene_ids</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_RBPs_coding_gene_ids.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_3UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_3UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_5UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_5UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_Virma_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_coding_exons.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_coding_exons</td>
  </tr>
</table>

## WTAP vs NT
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl.png" style="width:600px">
	human_ensembl</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c1.png" style="width:600px">
	human_ensembl_msigdb_c1</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c2.png" style="width:600px">
	human_ensembl_msigdb_c2</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c3.png" style="width:600px">
	human_ensembl_msigdb_c3</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c4.png" style="width:600px">
	human_ensembl_msigdb_c4</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c5.png" style="width:600px">
	human_ensembl_msigdb_c5</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c6.png" style="width:600px">
	human_ensembl_msigdb_c6</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_c7.png" style="width:600px">
	human_ensembl_msigdb_c7</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_msigdb_full.png" style="width:600px">
	human_ensembl_msigdb_full</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_3UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_3UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_5UTR.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_5UTR</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_coding_exons.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_coding_exons</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="plots/ipage.d_mtyl_WTAP_vs_NT_human_ensembl_RBPs_coding_gene_ids_by_introns.png" style="width:600px">
	human_ensembl_RBPs_coding_gene_ids_by_introns</td>
  </tr>
</table>

## Similar paper 
> [Dynamics of the human and viral m6A RNA methylomes during HIV-1 infection of T cells](https://www.nature.com/articles/nmicrobiol201611.pdf?proof=t)
[GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74016)

It might be informative to analyze the data from this paper with the pipeline above. However, they use the [pLAI.2 HIV](https://aidsreagent.org/reagentdetail.cfm?t=molecular_clones&id=47) clone.