We assume conda is installed properly and the user has write access to create and run an environment.

We also assume enough resources are available for the run. The main requirement is 64 GB of memory.

The rest of the parameter files assume a 12-thread CPU (the one we had for the project).
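
As a rough pre-flight check, the sketch below compares the machine against these assumptions. The helper names `resource_warnings` and `total_memory_bytes` are illustrative and not part of the pipeline, and reading the memory size through `os.sysconf` is an assumption that holds on Linux; on other systems the memory check is simply skipped.

```python
import os

def resource_warnings(mem_bytes, threads, min_mem_gb=64, min_threads=12):
    """Return a list of warnings (empty if the stated assumptions hold).

    mem_bytes or threads may be None when the value could not be determined.
    """
    warnings = []
    if mem_bytes is not None and mem_bytes < min_mem_gb * 10**9:
        warnings.append(
            f"only {mem_bytes / 10**9:.1f} GB of memory, expected {min_mem_gb} GB"
        )
    if threads is not None and threads < min_threads:
        warnings.append(
            f"only {threads} CPU threads, parameter files assume {min_threads}"
        )
    return warnings

def total_memory_bytes():
    """Total physical memory in bytes, or None if it cannot be queried."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None

# Report any mismatch between this machine and the assumptions above.
for message in resource_warnings(total_memory_bytes(), os.cpu_count()):
    print("WARNING:", message)
```

A warning here does not necessarily mean the run will fail; with fewer threads, the `THREADS` value and the parameter files would need to be adjusted accordingly.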

We set up an environment and apply some much-needed fixes to the applicable software:

In [None]:
%env THREADS 12


# !conda create -n l1em python=3.11 openjdk=21 -y
# !conda activate l1em
%conda config --add channels defaults
%conda config --prepend channels bioconda
%conda config --prepend channels conda-forge
#!conda config --set channel_priority strict
#!conda install bedtools bwa bzip2 fastqc htslib numpy pysam samtools scipy sra-tools trim-galore zstd
# %conda install -y python=3.11 openjdk=21 bwa samtools bioconda/label/cf201901::flux-simulator numpy scipy sra-tools trim-galore pysam bedtools pigz wget
%conda install -y python=3.11 openjdk=21 bwa samtools flux-simulator numpy scipy sra-tools trim-galore pysam bedtools pigz wget ipython=8.14

# Edit the flux-simulator launcher script to work around the module-access restrictions of newer Java versions (the old code relies on Java reflection)
!if ! grep -q -- '--add-opens java.base/java.util=ALL-UNNAMED' $CONDA_PREFIX/share/flux-simulator-1.2.1-3/bin/flux-simulator; then sed -i '/^java -Xmx\$FLUX_MEM.*/a --add-opens java.base/java.util=ALL-UNNAMED \\\n--add-opens java.desktop/java.awt.font=ALL-UNNAMED \\\n--add-opens java.base/java.text=ALL-UNNAMED' $CONDA_PREFIX/share/flux-simulator-1.2.1-3/bin/flux-simulator; fi

%pip install git+https://github.com/itmat/BEERS2 #git+https://github.com/itmat/CAMPAREE
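
To confirm that the launcher patch took effect, one can look for the three `--add-opens` flags in the script. This is a minimal sketch: `missing_add_opens` is a hypothetical helper, and the launcher path simply mirrors the one used in the cell above, so it depends on the installed flux-simulator build.

```python
import os
from pathlib import Path

# The three flags that the sed command above adds to the launcher.
ADD_OPENS_FLAGS = [
    "--add-opens java.base/java.util=ALL-UNNAMED",
    "--add-opens java.desktop/java.awt.font=ALL-UNNAMED",
    "--add-opens java.base/java.text=ALL-UNNAMED",
]

def missing_add_opens(script_text):
    """Return the flags from ADD_OPENS_FLAGS that do not occur in script_text."""
    return [flag for flag in ADD_OPENS_FLAGS if flag not in script_text]

# Path copied from the cell above; adjust it if the package version differs.
launcher = (
    Path(os.environ.get("CONDA_PREFIX", ""))
    / "share/flux-simulator-1.2.1-3/bin/flux-simulator"
)
if launcher.is_file():
    missing = missing_add_opens(launcher.read_text())
    print("launcher patched" if not missing else f"missing flags: {missing}")
else:
    print(f"launcher not found at {launcher}")
```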


We then download the experiment data and the reference genome, and produce the derived files from each download (trimmed reads and a bwa index):

In [None]:
%%bash
mkdir -p ./data/experiment
if [[ ! -f ./data/experiment/SRR3997504_1.fastq || ! -f ./data/experiment/SRR3997504_2.fastq ]]; then
    echo "One or both FASTQ files do not exist in the data folder. Running fasterq-dump..."
    fasterq-dump -O ./data/experiment SRR3997504
else
    echo "Both FASTQ files already exist in the data folder. Skipping fasterq-dump."
fi
if [[ ! -f ./data/experiment/SRR3997504_2_val_2_fastqc.html ]]; then
    echo "No trimmed files by trim_galore found. Running trim_galore..."
    trim_galore -j $((THREADS/4)) --paired --fastqc -o ./data/experiment/ ./data/experiment/SRR3997504_1.fastq ./data/experiment/SRR3997504_2.fastq
else
    echo "Trimmed files by trim_galore already exist. Skipping trim_galore."
fi



mkdir -p ./data/genome
if [[ ! -f ./data/genome/hg38.fa ]]; then
    echo "Human genome 38 does not exist in the data folder. Downloading, unzipping and indexing it..."
    wget -P ./data/genome/ http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
    pigz -cd -p $THREADS ./data/genome/hg38.fa.gz > ./data/genome/hg38.fa
    bwa index ./data/genome/hg38.fa
    rm -rf ./data/genome/hg38.fa.gz
else
    echo "Human genome 38 found. Assuming it is bwa indexed. Skipping downloading, unzipping and indexing it."
fi
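
The cell above only assumes that an existing `hg38.fa` is already indexed. A small check like the following could verify it instead. This is a sketch under the assumption that the index was built with plain `bwa index`, which writes five companion files next to the FASTA; `missing_bwa_index_files` is a hypothetical helper name.

```python
from pathlib import Path

# File suffixes that `bwa index` writes next to the FASTA file.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")

def missing_bwa_index_files(fasta_path):
    """Return the expected bwa index files that are absent for fasta_path."""
    fasta = Path(fasta_path)
    expected = [Path(str(fasta) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.is_file()]

missing = missing_bwa_index_files("./data/genome/hg38.fa")
if missing:
    print("bwa index incomplete, missing:", ", ".join(p.name for p in missing))
else:
    print("bwa index looks complete")
```

If any file is reported missing, rerunning `bwa index ./data/genome/hg38.fa` regenerates the index.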


