# Notes

### Converting .bam to .fastq 11/14/2024

First, one of the BAM files was loaded into IGV with hg38 as the reference genome to determine strandedness. Because sense- and antisense-strand reads were interspersed, the mapped reads come from a paired-end, unstranded library. The files were then converted from BAM to FASTQ format using samtools v1.21:

In [None]:
# Checks if the bam file is intact and not truncated
samtools quickcheck SJAML003320_D3.bam
# No output indicates that there are no issues

# Sort paired-end read alignment .bam file by name
samtools sort -n SJAML003320_D3.bam -o SJAML003320_D3_sorted.bam

# Convert the sorted file into two compressed fastq files, each containing one of the paired-end reads
samtools fastq -@ 8 bamInput/SJAML003320_D3_sorted.bam \
    -1 fastqOutput/SJAML003320_D3_R1.fastq.gz \
    -2 fastqOutput/SJAML003320_D3_R2.fastq.gz \
    -0 /dev/null -s /dev/null -n


According to the output, 0 singletons were discarded (singletons are sent to /dev/null by the `-s /dev/null` option), while 185050754 reads were processed for this first SJAML003320 file. The resulting FASTQ files can be found in Dropbox under Ryanne Mulligan/Ryanne.2024/RNAseqFastqTestset and on the shared drive.
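
As a sanity check on the conversion, the two mate files should contain the same number of reads. The following is a minimal sketch, assuming standard four-line FASTQ records (which is what `samtools fastq` writes); the function names `count_fastq_reads` and `mates_match` are hypothetical helpers, not part of samtools.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzip-compressed FASTQ file (assumes 4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


def mates_match(r1_path, r2_path):
    """Return True when the R1 and R2 files hold the same number of reads."""
    return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)
```

For the files above, this could be called as `mates_match("fastqOutput/SJAML003320_D3_R1.fastq.gz", "fastqOutput/SJAML003320_D3_R2.fastq.gz")`. Reading files of this size line by line in Python is slow, so this is better suited to a one-off check than to routine use.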

### Using Jupyter Notebook 11/15/2024

Whenever you want to update the coding notebook, rebuild the book with Jupyter Book:


In [None]:
#$ jb build .

Then commit the changes to GitHub and publish the built HTML with:

In [None]:
#$ ghp-import -n -p -f _build/html

### Using Poetry to Create a Package 11/18/2024

According to the [Python naming conventions](https://peps.python.org/pep-0008/#package-and-module-names), Python packages should have short, all-lowercase names, and the use of underscores is discouraged. Creating a package requires a build backend, which produces the source distribution (sdist) and built distribution (wheel) that you need. The build backend I have decided to use is Poetry, which also handles dependency management. Poetry requires the following file layout:

In [None]:
poetry-demo
├── pyproject.toml
├── README.md
└── poetry_demo
    ├── __init__.py
    └── demo.py

For more information, visit the [Poetry website](https://python-poetry.org/docs/). Install Poetry with `pipx install poetry`.

First, do a test run of your package upload using TestPyPI. Create a TestPyPI account and verify it so that you can upload packages. Then go to the account settings and create an API token. In the terminal, add the TestPyPI repository to your Poetry configuration:

In [None]:
#$ poetry config repositories.testpypi https://test.pypi.org/legacy/

Then, add your credentials to Poetry:

In [None]:
#$ poetry config pypi-token.testpypi your-token-here

From now on, you do not need to repeat the steps above; only the steps below must be repeated for each upload to TestPyPI. The following command creates a dist folder containing your source and built distributions:

In [None]:
#$ poetry build

Finally, use the following command to upload the package (the `--build` flag builds the distributions before publishing, so it also covers the build step):

In [None]:
#$ poetry publish -r testpypi --build


The package should now appear under your projects on TestPyPI. At the top of the project page, there is a command you can copy to pip install the package on a system. Note that this pip install command includes a website link because the package is hosted on TestPyPI, not the real PyPI. Also note that every time you want to re-upload the package, you must change the version number in pyproject.toml, or the upload will fail.
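
Because each re-upload needs a new version number, the bump can be scripted. Poetry has a built-in command for this (`poetry version patch`). The sketch below shows the same idea in plain Python, assuming a simple `X.Y.Z` version string on its own `version = "..."` line; `bump_patch` and `bump_pyproject_text` are hypothetical helper names.

```python
import re


def bump_patch(version):
    """Return an X.Y.Z version string with the patch number incremented."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected an X.Y.Z version, got {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"{major}.{minor}.{patch + 1}"


def bump_pyproject_text(text):
    """Bump the first `version = "X.Y.Z"` line found in pyproject.toml text."""
    pattern = re.compile(r'^(version\s*=\s*")(\d+\.\d+\.\d+)(")', re.MULTILINE)

    def replace(match):
        return match.group(1) + bump_patch(match.group(2)) + match.group(3)

    new_text, substitutions = pattern.subn(replace, text, count=1)
    if substitutions == 0:
        raise ValueError("no version line found")
    return new_text
```

In practice, `poetry version patch` is the simpler choice; the sketch only illustrates what changes in pyproject.toml between uploads.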

### Uploading Files from a Third-Party Site to HiPerGator 11/18/2024

Log in to [UF onDemand](https://ondemand.rc.ufl.edu) using your UF credentials and create a console using the default settings. Once the interactive session starts, open the terminal and navigate to the HiPerGator folder you want to upload to. In my case, this was /blue/jatinderklamba/mulligan/data/TestBamFiles/. Then type in the following commands:

In [None]:
#$ module load ubuntu
#$ chrome

This will open Chrome within the console. In the Chrome settings, change the default download location to the HiPerGator folder you are in. Then navigate to the third-party site, such as Dropbox. Anything you download will be saved to that HiPerGator folder. Note: downloading all the files from Dropbox at once creates a zipped file that fails to open, so it is best to download files one at a time until a better solution is found. For more information about uploading files to HiPerGator, visit [this website](https://help.rc.ufl.edu/doc/Transfer_Data).

### List of Cryptic Proteins in AML

Acute myeloid leukemia-specific alternatively spliced protein isoforms were obtained from the [ASCancer Atlas](https://ngdc.cncb.ac.cn/ascancer/home).

In [2]:
import pandas as pd
pd.read_csv('datasets/Experimentally_supported_AS_events (1).csv')

Unnamed: 0,_id,has_oncoprint,event_id,as_model_id,cancer_name,tcga_project_id,gene_name,hgnc_id,ensembl_id,chr,...,external_intervention,regulatory_mechanism,regulatory_gene,biological_function,functional_description,year,pubmed_id,journal,title,cancer_names
0,"=""62f8e2a5a2bd957939763c39""","=""no""","=""TMEM14C_chr6_+_A3SS_10723148:10723474:107247...","=""TMEM14C_chr6_+_A3SS_10723474:10724789:107248...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""TMEM14C""","=""20952""","=""ENSG00000111843""","=""chr6""",...,"=""-""","=""splicing factor, mutation""","=""SF3B1(K700E)""","=""Cell Growth, Cell Survival""","=""Delivery of synthetic intron-containing HSV–...","=""2022""","=""35241838""","=""Nat Biotechnol""","=""Synthetic introns enable splicing factor mut...","=""Acute Myeloid Leukemia"""
1,"=""62f8e2a5a2bd957939763c41""","=""yes""","=""INTS3_chr1_+_IR_153719433:153719546:15371975...","=""INTS3_chr1_+_IR_153719546:153719755""","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""INTS3""","=""26153""","=""ENSG00000143624""","=""chr1""",...,"=""-""","=""splicing factor, mutation""","=""SRSF2(P95H), IDH2(R140Q)""","=""DNA Repair, Cell Differentiation""","=""INTS3 depletion in these cells significantly...","=""2019""","=""31578525""","=""Nature""","=""Coordinated alterations in RNA splicing and ...","=""Acute Myeloid Leukemia"""
2,"=""62f8e2a5a2bd957939763c44""","=""no""","=""CSF3R_chr1_-_A3SS_36931644:36932428:36931644...","=""CSF3R_chr1_-_A3SS_36931644:36932428:36931644...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""CSF3R""","=""2439""","=""ENSG00000119535""","=""chr1""",...,"=""-""","=""splicing factor, mutation, expression""","=""SRSF2""","=""Cell Proliferation""","=""CSF3R is a target of SRSF2 mutations, which ...","=""2020""","=""31462738""","=""Leukemia""","=""Altered expression of CSF3R splice variants ...","=""Acute Myeloid Leukemia"""
3,"=""62f8e2a5a2bd957939763c66""","=""no""","=""KAT5_chr11_+_ES_65480393:65480529:65480819:6...","=""KAT5_chr11_+_ES_65480529:65480819:65480974:6...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""KAT5""","=""5275""","=""ENSG00000172977""","=""chr11""",...,"=""-""","=""expression""","=""PRMT5""","=""DNA Damage, DNA Repair, Cell Cycle""","=""PRMT5 depletion or inhibition induces aberra...","=""2018""","=""30184499""","=""Cell Rep""","=""PRMT5 regulates DNA repair by controlling th...","=""Acute Myeloid Leukemia"""
4,"=""62f8e2a5a2bd957939763cbe""","=""yes""","=""MECOM_chr3_-_A3SS_168810746:168810872:168810...","=""MECOM_chr3_-_A3SS_168810746:168810872:168810...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""MECOM""","=""3498""","=""ENSG00000085276""","=""chr3""",...,"=""-""","=""splicing factor, mutation""","=""SF3B1(K700E/ K666N/G740E)""","=""Cell Differentiation, Self-renewal""","=""A novel EVI1 splice isoform is frequently ex...","=""2022""","=""35709354""","=""Blood""","=""Aberrant EVI1 splicing contributes to EVI1-r...","=""Acute Myeloid Leukemia"""
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,"=""62f8e2a6a2bd9579397643e1""","=""no""","=""BCL2L1_chr20_-_A5SS_30252255:30253889:303094...","=""BCL2L1_chr20_-_A5SS_30253889:30309458:303101...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""BCL2L1""","=""992""","=""ENSG00000171552""","=""chr20""",...,"=""-""","=""splicing factor, expression""","=""RBM25""","=""Tumor Growth, Apoptosis""","=""Reduction in apoptosis of RBM25 KD U937 cell...","=""2019""","=""30635567""","=""Nat Commun""","=""The splicing factor RBM25 controls MYC activ...","=""Acute Myeloid Leukemia"""
64,"=""62f8e2a6a2bd9579397643f9""","=""no""","=""CD33_chr19_+_ES_51728354:51728411:51728474:5...","=""CD33_chr19_+_ES_51728411:51728474:51728854:5...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""CD33""","=""1659""","=""ENSG00000105383""","=""chr19""",...,"=""-""","=""splicing factor, expression""","=""SRSF2""","=""Cell Differentiation, Cell Growth""","=""The skipping of exon2, resulting in the shor...","=""2017""","=""28644774""","=""J Clin Oncol""","=""CD33 splicing polymorphism determines gEMTuz...","=""Acute Myeloid Leukemia"""
65,"=""62f8e2a6a2bd9579397643fa""","=""yes""","=""MAP3K7_chr6_-_A3SS_91269795:91269933:9126979...","=""MAP3K7_chr6_-_A3SS_91269795:91269933:9126979...","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""MAP3K7""","=""6859""","=""ENSG00000135341""","=""chr6""",...,"=""-""","=""splicing factor, mutation""","=""SF3B1(K700E)""","=""Cell Growth, Cell Survival""","=""Delivery of synthetic intron-containing HSV–...","=""2022""","=""35241838""","=""Nat Biotechnol""","=""Synthetic introns enable splicing factor mut...","=""Acute Myeloid Leukemia"""
66,"=""62f8e2a6a2bd957939764407""","=""no""","=""MTERFD3_chr12_-_IR_107378893:107379003:10738...","=""MTERFD3_chr12_-_IR_107379003:107380747""","=""Acute Myeloid Leukemia""","=""TCGA-LAML""","=""MTERFD3""","=""30779""","=""ENSG00000120832""","=""chr12""",...,"=""-""","=""splicing factor, mutation""","=""SF3B1(K700E)""","=""Cell Growth, Cell Survival""","=""Delivery of synthetic intron-containing HSV–...","=""2022""","=""35241838""","=""Nat Biotechnol""","=""Synthetic introns enable splicing factor mut...","=""Acute Myeloid Leukemia"""
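
Every value in the table above is wrapped as `="..."`, a spreadsheet-style text marker carried over from the export, and the `Unnamed: 0` column is a saved row index. The following is a minimal sketch of how the wrappers could be stripped after loading. The helper names and the tiny inline sample are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd


def strip_excel_wrapper(value):
    """Turn the string '="text"' into 'text'; leave anything else unchanged."""
    if (
        isinstance(value, str)
        and len(value) >= 3
        and value.startswith('="')
        and value.endswith('"')
    ):
        return value[2:-1]
    return value


def clean_wrapped_table(df):
    """Apply strip_excel_wrapper to every cell, column by column."""
    return df.apply(lambda column: column.map(strip_excel_wrapper))


# Hypothetical two-column sample in the same wrapped style as the real file.
sample_csv = ',_id,year\n0,"=""abc""","=""2022"""\n'
raw = pd.read_csv(io.StringIO(sample_csv), index_col=0)  # index_col=0 avoids "Unnamed: 0"
cleaned = clean_wrapped_table(raw)
print(cleaned)
```

The cleaned values remain strings, so numeric columns such as `year` would still need an explicit conversion (for example with `pd.to_numeric`) before any numeric analysis.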
