This repository contains the scripts and notebooks used to process and analyse the data in the Scitron project.
The data consist of WES samples from tumor, cfDNA, PDX and gDNA. The samples come from CRC patients treated with BRAF inhibitors.
The samples were processed on Amazon using Nextflow (sarek nf-core 2.7.0) and Tower.nf. The configuration for sarek is found in the file nf_sarek.conf.
The genome of choice was GRCh38. The PON used was obtained from GATK (1000g_pon.hg38.vcf.gz) and is available upon request.
A BED file with intervals was generated by intersecting and collapsing all the intervals from the 3 kits used to sequence the different samples. This file is also available upon request.
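The tool used to build the interval file is not stated here. As an illustration only, the following pure-Python sketch shows one way to intersect and collapse intervals from several kits. It assumes that "intersecting" means keeping only the regions covered by every kit and that coordinates are half-open, as in BED. The function names and the toy coordinates are hypothetical and are not taken from this repository.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_two(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_kits(kits):
    """kits: list of {chrom: [(start, end), ...]}; returns regions shared by all kits."""
    common_chroms = set(kits[0])
    for kit in kits[1:]:
        common_chroms &= set(kit)
    result = {}
    for chrom in sorted(common_chroms):
        current = merge_intervals(kits[0][chrom])
        for kit in kits[1:]:
            current = intersect_two(current, merge_intervals(kit[chrom]))
        if current:
            result[chrom] = current
    return result


# Toy coordinates, made up for illustration.
kit_a = {"chr1": [(100, 200), (150, 300)], "chr2": [(0, 50)]}
kit_b = {"chr1": [(180, 250), (400, 500)]}
kit_c = {"chr1": [(0, 220), (240, 600)]}
shared = intersect_kits([kit_a, kit_b, kit_c])
print(shared)
```

If the intended operation was instead the union of all kits, `merge_intervals` applied to the concatenated intervals of each chromosome would be the relevant step.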
The results obtained are composed of:
- Recalibrated BAM files
- SNPs and indels from Mutect2 and Strelka (paired and tumor-only)
- CN alterations from Control-FREEC (paired and tumor-only)
- Structural variants from Manta (paired and tumor-only)
- MSI scores from msi-score (paired and tumor-only)
- Multiple QC stats and info
The SNPs and indels were post-processed and merged (one single mutation table) using the script found in sarek_merge. This script performs some filtering and extra annotation.
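To give a rough idea of what merging calls into a single mutation table can look like, here is a minimal sketch. It is not the sarek_merge script: the record fields (`sample`, `chrom`, `pos`, `ref`, `alt`), the `callers` column and the function name are assumptions made for illustration, and the filtering and annotation steps are not reproduced.

```python
def merge_calls(calls_by_caller):
    """Merge per-caller variant records into one table.

    calls_by_caller: {caller_name: iterable of dicts with the hypothetical
    keys sample, chrom, pos, ref, alt}. Returns one row per unique variant
    per sample, listing which callers reported it.
    """
    table = {}
    for caller, records in calls_by_caller.items():
        for rec in records:
            key = (rec["sample"], rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
            row = table.setdefault(
                key,
                {
                    "sample": rec["sample"],
                    "chrom": rec["chrom"],
                    "pos": rec["pos"],
                    "ref": rec["ref"],
                    "alt": rec["alt"],
                    "callers": [],
                },
            )
            if caller not in row["callers"]:
                row["callers"].append(caller)
    # Sort by the variant key so the output order is stable.
    return [table[key] for key in sorted(table)]


# Toy records, made up for illustration.
calls = {
    "Mutect2": [
        {"sample": "S1", "chrom": "chr7", "pos": 100, "ref": "A", "alt": "T"},
        {"sample": "S1", "chrom": "chr7", "pos": 250, "ref": "G", "alt": "C"},
    ],
    "Strelka": [
        {"sample": "S1", "chrom": "chr7", "pos": 100, "ref": "A", "alt": "T"},
    ],
}
merged = merge_calls(calls)
for row in merged:
    print(row)
```

Keeping the list of supporting callers on each row makes it possible to filter later on, for example by requiring a variant to be reported by more than one caller.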
The CNVs were post-processed and merged with the notebook parse_cnvs.ipynb.
The MSI scores were post-processed and added to the metadata file with the notebook parse_msi.ipynb.
There are multiple notebooks that were used to perform the analysis and visualizations.