## This notebook runs the setup for JBrowse2 using Bash commands.

### Instructions for Running the Notebook:

1. Ensure you have created the APACHE_ROOT environment variable pointing to your Apache root directory.
   Example: export APACHE_ROOT='/path/to/apache/root'
   If you're unsure of the correct path, you can find it by running:
   sudo find / -name "www" 2>/dev/null

2. Ensure the required dependencies are installed: wget, samtools, bowtie2, jq, htslib (bgzip and tabix), and the JBrowse CLI (jbrowse).

3. Modify the APACHE_ROOT and WORKDIR values, and the hard-coded cd paths, in each cell of the notebook to reflect your setup.

4. Execute the notebook cells in order.

Notes:
- The notebook assumes you have a JBrowse2 instance installed in the $APACHE_ROOT/jbrowse2 directory.
- If any errors occur, the notebook will output the error message and stop execution.
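
The dependency check from step 2 can be scripted before running any cell. The following is a minimal sketch, assuming the tool list matches the commands called in this notebook; `missing_tools` is a hypothetical helper name and is not part of JBrowse or any other tool used here.

```shell
# Hypothetical helper (not part of any tool used here): print the name of
# every argument that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands invoked by the cells in this notebook (assumed to be complete).
missing="$(missing_tools wget gunzip samtools bgzip tabix jq bowtie2 bowtie2-build jbrowse)"

if [ -n "$missing" ]; then
    echo "Missing tools:"
    echo "$missing"
else
    echo "All required tools were found."
fi
```

The sketch only reports what is missing and does not exit, so it can be run safely in a terminal before starting the notebook.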


If you have to restart due to errors, make sure to:
1. Locate the www folder (cd /usr/local/var/www) and remove the jbrowse2 folder, plus any other leftover contents, using rm -rf jbrowse2.
2. Locate the tmp folder and clear it as well.
3. Run the following commands in your terminal, not in the notebook. Run the second set from inside the tmp folder.

In [None]:
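# DO NOT RUN IN THE NOTEBOOK
# Terminal commands for clearing data
cd /usr/local/var/www && rm -rf jbrowse2
# Using && means rm only runs if cd succeeded, so a missing tmp folder
# cannot cause the home directory to be emptied by mistake.
cd ~/tmp && rm -rf ./*  # deletes everything in the tmp folder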

In [None]:
# DO NOT RUN IN THE NOTEBOOK
# Recreate the jbrowse2 folder; run while in the tmp folder
jbrowse create output_folder
sudo mv output_folder "$APACHE_ROOT/jbrowse2"
sudo chown -R "$(whoami)" "$APACHE_ROOT/jbrowse2"
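
Because the cleanup relies on rm -rf, it is worth guarding against an empty or wrong path. The following is a minimal sketch of a guarded cleanup, not part of the original workflow; `clear_dir` is a hypothetical helper name, and the demonstration uses a throwaway directory rather than the real tmp folder. Unlike rm -rf ./*, it also removes hidden files.

```shell
# Hypothetical helper: delete the contents of a directory, but refuse when the
# path is empty, is not a directory, or resolves to / or $HOME.
clear_dir() {
    target="$1"
    if [ -z "$target" ] || [ ! -d "$target" ]; then
        echo "Refusing to clear '$target': not an existing directory" >&2
        return 1
    fi
    resolved="$(cd "$target" && pwd -P)" || return 1
    home_resolved="$(cd "${HOME:-/}" 2>/dev/null && pwd -P)"
    if [ "$resolved" = "/" ] || [ "$resolved" = "$home_resolved" ]; then
        echo "Refusing to clear '$resolved'" >&2
        return 1
    fi
    # -mindepth 1 keeps the directory itself; hidden files are removed too.
    find "$resolved" -mindepth 1 -delete
}

# Demonstration on a throwaway directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/genome.fa" "$demo_dir/.hidden"
mkdir "$demo_dir/subdir"
clear_dir "$demo_dir" && echo "Cleared $demo_dir"
```

In a terminal session this could be called as `clear_dir ~/tmp` in place of the cd and rm -rf pair.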

### Error Handling and Setting up Directories

In [41]:
%%bash
set -e  # Stop execution on errors

# Define WORKDIR and APACHE_ROOT 
WORKDIR="/Users/smrithisurender/tmp"  # Writable directory
mkdir -p "$WORKDIR"
chmod u+w "$WORKDIR"
cd "$WORKDIR"

APACHE_ROOT="/usr/local/var/www"

echo "Working directory: $WORKDIR"
echo "Apache root directory: $APACHE_ROOT"

# Ensure the JBrowse2 directory exists under the Apache root
if [ ! -d "$APACHE_ROOT/jbrowse2" ]; then
    echo "Error: JBrowse2 not found at $APACHE_ROOT/jbrowse2. Please check your APACHE_ROOT variable."
    exit 1
fi

Working directory: /Users/smrithisurender/tmp
Apache root directory: /usr/local/var/www
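
The cells in this notebook rely on set -e and on `|| exit 1` after individual commands. A small wrapper can make it clearer which step failed. The following is a minimal sketch, not the notebook's own method; `run_step` is a hypothetical helper name, and the demonstration uses harmless commands.

```shell
# Hypothetical helper: announce a step, run the command that follows the
# description, and report which step failed if the command exits non-zero.
run_step() {
    description="$1"
    shift
    echo "$description..."
    if ! "$@"; then
        echo "Error: step failed: $description" >&2
        return 1
    fi
}

# Demonstration with harmless commands.
scratch="$(mktemp -d)"
run_step "Creating a scratch subdirectory" mkdir -p "$scratch/work"
run_step "Removing the scratch directory" rm -r "$scratch"
```

In a notebook cell, a call could look like `run_step "Indexing reference genome file with samtools" samtools faidx viral_genome.fa || exit 1`, which replaces the separate echo line.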


### Download and Process Dengue Reference Genome

In [25]:
%%bash
set -e  # Stop execution on errors

cd "/Users/smrithisurender/tmp"  # Ensure you're in the working directory

echo "Downloading Dengue genome (reference genome)..."
wget -q "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/862/125/GCF_000862125.1_ViralProj15306/GCF_000862125.1_ViralProj15306_genomic.fna.gz" -O genome.fna.gz

echo "Decompressing reference genome file..."
gunzip -f genome.fna.gz || exit 1

echo "Renaming reference genome file..."
mv genome.fna viral_genome.fa || exit 1

echo "Indexing reference genome file with samtools..."
samtools faidx viral_genome.fa || exit 1

Downloading Dengue genome (reference genome)...
Decompressing reference genome file...
Renaming reference genome file...
Indexing reference genome file with samtools...
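
samtools faidx writes an index file next to the FASTA (here viral_genome.fa.fai) whose first two columns are the sequence name and its length. As a sanity check that does not need samtools, the same two values can be computed with awk. This is a minimal sketch; `fasta_lengths` is a hypothetical helper name and the demonstration file is made up.

```shell
# Hypothetical helper: print "name<TAB>length" for each record in a FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            if (seen) print name "\t" len
            name = substr($1, 2)   # header up to the first whitespace, without ">"
            len = 0
            seen = 1
            next
        }
        { len += length($0) }
        END { if (seen) print name "\t" len }
    ' "$1"
}

# Demonstration on a small made-up FASTA file.
demo_fasta="$(mktemp)"
printf '>seqA example record\nACGTAC\nGT\n>seqB\nGGCC\n' > "$demo_fasta"
fasta_lengths "$demo_fasta"
```

Running `fasta_lengths viral_genome.fa` and comparing it with `cut -f1,2 viral_genome.fa.fai` should give the same lines if the download and indexing worked.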


### Add Reference Genome to JBrowse

In [26]:
%%bash

set -e

cd "/Users/smrithisurender/tmp"  

APACHE_ROOT="/usr/local/var/www"

echo "Adding reference genome assembly to JBrowse..."
jbrowse add-assembly viral_genome.fa --out "$APACHE_ROOT/jbrowse2" --load copy || exit 1


Adding reference genome assembly to JBrowse...
Added assembly "viral_genome" to /usr/local/var/www/jbrowse2/config.json


In [27]:
%%bash

set -e

cd "/Users/smrithisurender/tmp"  

APACHE_ROOT="/usr/local/var/www"

# Path to the JBrowse2 configuration file
CONFIG_FILE="$APACHE_ROOT/jbrowse2/config.json"

echo "Modifying JBrowse2 configuration to add sequenceConfig and hide reverse strand..."

# Add or update the sequenceConfig property
jq '. + {sequenceConfig: {showReverseStrand: false}}' "$CONFIG_FILE" > "$CONFIG_FILE.tmp" && mv "$CONFIG_FILE.tmp" "$CONFIG_FILE"

echo "Configuration updated successfully."


Modifying JBrowse2 configuration to add sequenceConfig and hide reverse strand...


Configuration updated successfully.
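
The cell above writes the jq output to a temporary file and only then moves it over config.json, so the config is never truncated while jq is still reading it. The following is a minimal sketch that generalizes the pattern and also removes the temporary file on failure; `rewrite_with` is a hypothetical helper name, and tr stands in for jq in the demonstration.

```shell
# Hypothetical helper: pass FILE through a filter command on stdin and replace
# FILE only if the filter succeeds and produces non-empty output.
rewrite_with() {
    file="$1"
    shift
    tmp="$file.tmp"
    if "$@" < "$file" > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        echo "Error: could not rewrite $file; original left unchanged" >&2
        return 1
    fi
}

# Demonstration with tr standing in for jq, on a throwaway file.
demo_file="$(mktemp)"
echo "hello" > "$demo_file"
rewrite_with "$demo_file" tr 'a-z' 'A-Z'
cat "$demo_file"
```

With jq, which reads standard input when no file argument is given, the call would be `rewrite_with "$CONFIG_FILE" jq '. + {sequenceConfig: {showReverseStrand: false}}'`.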


### Download and Process Annotations

In [None]:
%%bash

set -e

cd "/Users/smrithisurender/tmp"  


APACHE_ROOT="/usr/local/var/www"

echo "Downloading genome annotations..."
wget -q "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/862/125/GCF_000862125.1_ViralProj15306/GCF_000862125.1_ViralProj15306_genomic.gff.gz" -O annotations.gff.gz || exit 1

echo "Decompressing annotations file..."
gunzip -f annotations.gff.gz || exit 1

echo "Sorting GFF3 annotations..."
jbrowse sort-gff annotations.gff > genes.gff || exit 1

echo "Compressing sorted GFF3 file..."
bgzip -f genes.gff || exit 1

echo "Indexing compressed GFF3 file with tabix..."
tabix genes.gff.gz || exit 1

echo "Adding annotations track to JBrowse..."
jbrowse add-track genes.gff.gz --out "$APACHE_ROOT/jbrowse2" --load copy || exit 1



Downloading genome annotations...


Decompressing annotations file...
Sorting GFF3 annotations...
Compressing sorted GFF3 file...
Indexing compressed GFF3 file with tabix...
Adding annotations track to JBrowse...
Added track with name "genes.gff" and trackId "genes.gff" to /usr/local/var/www/jbrowse2/config.json
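
tabix expects the annotations to be sorted by sequence name and start position before they are compressed with bgzip. The ordering can be illustrated with standard tools: header lines first, then features sorted by column 1 and numerically by column 4. This is a minimal sketch of that ordering, assumed to approximate the output of jbrowse sort-gff rather than reproduce its implementation; `sort_gff_sketch` is a hypothetical name, and embedded ##FASTA sections are not handled.

```shell
# Hypothetical helper: print header lines first, then features sorted by
# sequence name (column 1) and start position (column 4, numeric).
sort_gff_sketch() {
    tab="$(printf '\t')"
    grep '^#' "$1" || true
    grep -v '^#' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k4,4n
}

# Demonstration on a small made-up GFF3 file with features out of order.
demo_gff="$(mktemp)"
printf '##gff-version 3\nchr1\tdemo\tgene\t200\t300\t.\t+\t.\tID=b\nchr1\tdemo\tgene\t50\t100\t.\t+\t.\tID=a\n' > "$demo_gff"
sort_gff_sketch "$demo_gff"
```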


In [None]:

%%bash
set -e  # Stop execution on errors

# Download all required files for Dengue virus

cd "/Users/smrithisurender/tmp"  # Ensure you're in the working directory
APACHE_ROOT="/usr/local/var/www"

# Get the reference genome files
echo "Downloading Dengue genome (reference genome)..."
wget -q "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/862/125/GCF_000862125.1_ViralProj15306/GCF_000862125.1_ViralProj15306_genomic.fna.gz" -O dvgenome.fna.gz

echo "Decompressing reference genome file..."
gunzip -f dvgenome.fna.gz || exit 1

echo "Renaming reference genome file..."
mv dvgenome.fna dv_genome.fa || exit 1

echo "Indexing reference genome file with samtools..."
samtools faidx dv_genome.fa || exit 1

echo "Adding reference genome assembly to JBrowse..."
jbrowse add-assembly dv_genome.fa --out "$APACHE_ROOT/jbrowse2" --load copy || exit 1


# Get the reference genome annotations
echo "Downloading genome annotations..."
wget -q "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/862/125/GCF_000862125.1_ViralProj15306/GCF_000862125.1_ViralProj15306_genomic.gff.gz" -O dvannotations.gff.gz || exit 1

echo "Decompressing annotations file..."
gunzip -f dvannotations.gff.gz || exit 1

echo "Sorting GFF3 annotations..."
jbrowse sort-gff dvannotations.gff > dvgenes.gff || exit 1

echo "Compressing sorted GFF3 file..."
bgzip -f dvgenes.gff || exit 1

echo "Indexing compressed GFF3 file with tabix..."
tabix dvgenes.gff.gz || exit 1

echo "Adding annotations track to JBrowse..."
jbrowse add-track dvgenes.gff.gz --out "$APACHE_ROOT/jbrowse2" --load copy || exit 1


# Get the FASTA files for the alignment
# dengue virus 1
wget "https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=OR486055.1&report=fasta&format=text" -O dengue_virus1.fa
# dengue virus 2
wget "https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=OR771147.1&report=fasta&format=text" -O dengue_virus2.fa
# dengue virus 3
wget "https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=OQ821525&report=fasta&format=text" -O dengue_virus3.fa



Downloading Dengue genome (reference genome)...
Decompressing reference genome file...
Renaming reference genome file...
Indexing reference genome file with samtools...
Adding reference genome assembly to JBrowse...
Added assembly "dv_genome" to /usr/local/var/www/jbrowse2/config.json
Downloading genome annotations...
Decompressing annotations file...
Sorting GFF3 annotations...
Compressing sorted GFF3 file...
Indexing compressed GFF3 file with tabix...
Adding annotations track to JBrowse...
Added track with name "dvgenes.gff" and trackId "dvgenes.gff" to /usr/local/var/www/jbrowse2/config.json


--2024-11-25 15:38:18--  https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=OR486055.1&report=fasta&format=text
Resolving www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)... 2607:f220:41e:4290::110, 130.14.29.110
Connecting to www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)|2607:f220:41e:4290::110|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘dengue_virus1.fa’

     0K ..........                                             31.8M=0s

2024-11-25 15:38:19 (31.8 MB/s) - ‘dengue_virus1.fa’ saved [10954]

--2024-11-25 15:38:19--  https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=OR771147.1&report=fasta&format=text
Resolving www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)... 2607:f220:41e:4290::110, 130.14.29.110
Connecting to www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)|2607:f220:41e:4290::110|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘dengue_virus2.fa’

     0K .........
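
wget saves whatever the server returns under the .fa name, so it can help to confirm that each download really is FASTA before aligning it. The following is a minimal sketch of such a check; `looks_like_fasta` is a hypothetical helper name and the demonstration files are made up.

```shell
# Hypothetical helper: succeed only if the file is non-empty, its first line is
# a FASTA header, and it has at least one non-blank sequence line.
looks_like_fasta() {
    [ -s "$1" ] || return 1
    first_line="$(sed -n '1p' "$1")"
    case "$first_line" in
        '>'*) ;;
        *) return 1 ;;
    esac
    grep -v '^>' "$1" | grep -q '[^[:space:]]'
}

# Demonstration on made-up files: one FASTA record and one error page.
demo_dir="$(mktemp -d)"
good="$demo_dir/good.fa"
bad="$demo_dir/bad.fa"
printf '>demo_record\nACGTACGT\n' > "$good"
printf '<html><body>Error</body></html>\n' > "$bad"

for f in "$good" "$bad"; do
    if looks_like_fasta "$f"; then
        echo "$f: looks like FASTA"
    else
        echo "$f: does not look like FASTA"
    fi
done
```

In the notebook, the same check could be applied to dengue_virus1.fa, dengue_virus2.fa, and dengue_virus3.fa right after the downloads.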

In [55]:
%%bash
which bowtie2
which samtools

/Users/smrithisurender/bioe231fp/bowtie2-2.4.2-sra-linux-x86_64/bowtie2
/usr/local/bin/samtools


In [None]:
%%bash
set -e  # Stop execution on errors

# Process the reference genome, then align each comparison genome against it
export PATH=$PATH:/usr/local/bin

cd "/Users/smrithisurender/tmp"  # Ensure you're in the working directory
APACHE_ROOT="/usr/local/var/www"
WORKDIR="/Users/smrithisurender/tmp"
# creating the dengue virus alignments 
echo "Preparing reference genome..."
bowtie2-build $WORKDIR/dv_genome.fa $WORKDIR/dv_genome


echo "Aligning comparison genome to dengue virus 1 genome using Bowtie2..."
bowtie2 -x $WORKDIR/dv_genome -f $WORKDIR/dengue_virus1.fa -S $WORKDIR/dengue_virus1.sam --very-sensitive
samtools view -bS $WORKDIR/dengue_virus1.sam > $WORKDIR/dengue_virus1.bam
samtools sort $WORKDIR/dengue_virus1.bam -o $WORKDIR/dengue_virus1.sorted.bam
samtools index $WORKDIR/dengue_virus1.sorted.bam

jbrowse add-track $WORKDIR/dengue_virus1.sorted.bam --out $APACHE_ROOT/jbrowse2 --load copy --force

bowtie2-build $WORKDIR/dv_genome.fa $WORKDIR/dv_genome
echo "Aligning comparison genome to dengue virus 2 genome using Bowtie2..."
bowtie2 -x $WORKDIR/dv_genome -f $WORKDIR/dengue_virus2.fa -S $WORKDIR/dengue_virus2.sam --very-sensitive
samtools view -bS $WORKDIR/dengue_virus2.sam > $WORKDIR/dengue_virus2.bam
samtools sort $WORKDIR/dengue_virus2.bam -o $WORKDIR/dengue_virus2.sorted.bam
samtools index $WORKDIR/dengue_virus2.sorted.bam

jbrowse add-track $WORKDIR/dengue_virus2.sorted.bam --out $APACHE_ROOT/jbrowse2 --load copy --force

bowtie2-build $WORKDIR/dv_genome.fa $WORKDIR/dv_genome
echo "Aligning comparison genome to dengue virus 3 genome using Bowtie2..."
bowtie2 -x $WORKDIR/dv_genome -f $WORKDIR/dengue_virus3.fa -S $WORKDIR/dengue_virus3.sam --very-sensitive
samtools view -bS $WORKDIR/dengue_virus3.sam > $WORKDIR/dengue_virus3.bam
samtools sort $WORKDIR/dengue_virus3.bam -o $WORKDIR/dengue_virus3.sorted.bam
samtools index $WORKDIR/dengue_virus3.sorted.bam

jbrowse add-track $WORKDIR/dengue_virus3.sorted.bam --out $APACHE_ROOT/jbrowse2 --load copy --force


echo "Comparison of dengue virus alignments successfully added to JBrowse."

Preparing reference genome...
Settings:
  Output files: "/Users/smrithisurender/tmp/dv_genome.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /Users/smrithisurender/tmp/dv_genome.fa


Building a SMALL index


Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 2683
Using parameters --bmax 2013 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 2013 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12

Renaming /Users/smrithisurender/tmp/dv_genome.3.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.3.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.4.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.4.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.1.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.1.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.2.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.2.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.rev.1.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.rev.1.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.rev.2.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.rev.2.bt2


Aligning comparison genome to dengue virus 1 genome using Bowtie2...


1 reads; of these:
  1 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    1 (100.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
100.00% overall alignment rate


Overwrote track with name "dengue_virus1.sorted" and trackId "dengue_virus1.sorted" in /usr/local/var/www/jbrowse2/config.json
Settings:
  Output files: "/Users/smrithisurender/tmp/dv_genome.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /Users/smrithisurender/tmp/dv_genome.fa


Building a SMALL index


Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 2683
Using parameters --bmax 2013 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 2013 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12

Renaming /Users/smrithisurender/tmp/dv_genome.3.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.3.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.4.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.4.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.1.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.1.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.2.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.2.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.rev.1.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.rev.1.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.rev.2.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.rev.2.bt2
1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate


Overwrote track with name "dengue_virus2.sorted" and trackId "dengue_virus2.sorted" in /usr/local/var/www/jbrowse2/config.json
Settings:
  Output files: "/Users/smrithisurender/tmp/dv_genome.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /Users/smrithisurender/tmp/dv_genome.fa
Reading reference sizes


Building a SMALL index


  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 2683
Using parameters --bmax 2013 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 2013 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using differ

Renaming /Users/smrithisurender/tmp/dv_genome.3.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.3.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.4.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.4.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.1.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.1.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.2.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.2.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.rev.1.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.rev.1.bt2
Renaming /Users/smrithisurender/tmp/dv_genome.rev.2.bt2.tmp to /Users/smrithisurender/tmp/dv_genome.rev.2.bt2
1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate


Overwrote track with name "dengue_virus3.sorted" and trackId "dengue_virus3.sorted" in /usr/local/var/www/jbrowse2/config.json
Comparison of dengue virus alignments successfully added to JBrowse.
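
In the output above, dengue virus 1 reports a 100.00% overall alignment rate while dengue virus 2 and 3 report 0.00%, yet all three tracks are added. Bowtie2 writes this summary to standard error, so it can be captured to a file and checked before adding a track. The following is a minimal sketch, assuming the summary was saved with a redirect such as `2> dengue_virus2.bowtie2.log` (a hypothetical file name); `alignment_rate` and `check_alignment` are hypothetical helper names, and the demonstration logs reuse the summaries shown above.

```shell
# Hypothetical helper: extract the overall alignment rate (without the % sign)
# from a saved Bowtie2 summary.
alignment_rate() {
    sed -n 's/^\([0-9.][0-9.]*\)% overall alignment rate$/\1/p' "$1"
}

# Hypothetical helper: fail when the summary is missing or the rate is zero.
check_alignment() {
    rate="$(alignment_rate "$1")"
    if [ -z "$rate" ]; then
        echo "No alignment summary found in $1" >&2
        return 1
    fi
    if ! awk -v r="$rate" 'BEGIN { exit !(r > 0) }'; then
        echo "Warning: nothing aligned (overall alignment rate $rate%)" >&2
        return 1
    fi
    echo "Overall alignment rate: $rate%"
}

# Demonstration using the two summaries recorded in the output above.
log_ok="$(mktemp)"
log_zero="$(mktemp)"
cat > "$log_ok" <<'EOF'
1 reads; of these:
  1 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    1 (100.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
100.00% overall alignment rate
EOF
cat > "$log_zero" <<'EOF'
1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
EOF

check_alignment "$log_ok"
check_alignment "$log_zero" || true
```

A cell could call `check_alignment` after each bowtie2 run and skip the matching jbrowse add-track call when it fails.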
