Introduction

TRIFID is the method described in the manuscript "Assessing the functional relevance of splice isoforms", published in NAR Genomics and Bioinformatics on 22 May 2021.

Citation

@article{10.1093/nargab/lqab044,
    author = {Pozo, Fernando and Martinez-Gomez, Laura and Walsh, Thomas A and Rodriguez, José Manuel and Di Domenico, Tomas and Abascal, Federico and Vazquez, Jesús and Tress, Michael L},
    title = "{Assessing the functional relevance of splice isoforms}",
    journal = {NAR Genomics and Bioinformatics},
    volume = {3},
    number = {2},
    year = {2021},
    month = {05},
    abstract = "{Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.}",
    issn = {2631-9268},
    doi = {10.1093/nargab/lqab044},
    url = {https://doi.org/10.1093/nargab/lqab044},
    note = {lqab044},
    eprint = {https://academic.oup.com/nargab/article-pdf/3/2/lqab044/38108084/lqab044.pdf},
}

Table of contents

Introduction
Installation instructions
- Package installation
- Package development
  - Update the dependencies
Model reproducibility
Availability of data
- TRIFID predictions and predictive features
- Other useful links
Example: Fibroblast growth factor receptor 1 (FGFR1)
- Loading the model
- Loading the SHAP predictions for a single isoform
TRIFID modules
Directory structure
Author information
Release History
Contributing
- Branching (internal collaboration)
  - Quickstart tips
- Forking (external collaboration)
License

Introduction

TRIFID is a machine learning-based model that aims to predict the functional relevance of every splice isoform in the genome. The model has been designed to be accurate, interpretable and reproducible.

This repository gives bioinformaticians the complete recipe for how the method was built. Users who are not interested in installing and running TRIFID in full can jump directly to section 4 (Availability of data), where the TRIFID predictions are described. Users interested only in the TRIFID side modules that generate some of the predictive features can go to section 6 (TRIFID modules).

To learn more about any other part of the project, go back to the table of contents above, open an issue in this repository, or contact the main TRIFID developer directly.

Installation instructions

Package installation

pip install git+https://github.com/fpozoc/trifid.git

Package development

If Miniconda/Anaconda is not already available in your environment, run the silent Miniconda installation:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3

Remember to install mamba!

conda install -c conda-forge mamba

Once you have installed Miniconda/Anaconda with mamba, create a Python environment from environment.yml.

git clone git@github.com:fpozoc/trifid.git
cd trifid
mamba env create -f environment.yml
conda activate trifid
pre-commit install

Run the pre-commit/tests:

pre-commit run --all-files
pytest -v

Update the dependencies

Re-install the project in editable mode:

pip install -e .[dev]
# optional
pip install .[extra] # to install the visualization dependencies
pip install .[interactive] # to install the interactive dependencies

Model reproducibility

Data sources

The TRIFID model was initially trained with 45 predictive features for a subset of the protein isoforms annotated in GENCODE Release 27 (GRCh38.p10). The features are described here. Two extra features have been added in the second release.

To define and build these features, we parsed several existing databases and wrote some task-specific modules:

GENCODE: genome annotation statistics for protein-coding transcripts. Data sets are available on the GENCODE FTP server.
APPRIS: methods that quantify protein structural information, functionally important residues, conservation of functional domains and evidence of cross-species conservation. Data sets are available on the APPRIS HTTP server.
PhyloCSF: scores used as a complementary measure of evolutionary conservation. Pre-computed scores for some genome annotation versions are available in this repository.
ALT-Corsair (APPRIS module): quantifies the age of the last common ancestor of the most distant orthologue that fulfills the search criteria, and reports a score representing the age of the oldest species with an orthologue that maps to the whole protein sequence. It is based on the Corsair module in APPRIS. Pre-computed scores for some genome versions are available on the APPRIS web server.
QSplice (TRIFID module): quantifies splice-junction coverage, using our RNA-seq Snakemake pipeline to perform a comprehensive RNA-seq analysis. Pre-computed scores for GENCODE 27 are available here. More details about this module are given in the TRIFID modules section below.
Pfam effects (TRIFID module): quantifies the effect of alternative splicing on the Pfam domains of every protein-coding gene in the genome. Pre-computed scores for some genome annotation versions are available here. More details about this module are given in the TRIFID modules section below.
Fragment labelling (TRIFID module): labels isoforms that are duplications or fragments for a later score correction step. More details about this module are given in the TRIFID modules section below.

The data sources needed to reproduce our analysis are available for some genome versions through this shared point. The source folder contains the files a user needs to run TRIFID on GENCODE 27, and they are freely available. The config file lists the source file paths used to build the data set on which TRIFID is trained. Users can modify these paths, but we recommend running everything inside the previously downloaded TRIFID folder.

Both predictions and features will be available with the second release of TRIFID here.

Preprocessing

Below is an example of how to reproduce the method from scratch for GENCODE 27. The steps are as follows.

Download the annotation files from the GENCODE and APPRIS websites:

cd trifid

# GENCODE data
mkdir -p data/external/genome_annotation/GRCh38/g27
curl ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz -o data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gtf.gz
curl ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz -o data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gff3.gz

# APPRIS data
mkdir -p data/external/appris/GRCh38/g27
curl http://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/e90v35/appris_data.principal.txt -o data/external/appris/GRCh38/g27/appris_data.principal.txt
curl http://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/e90v35/appris_data.appris.txt -o data/external/appris/GRCh38/g27/appris_data.appris.txt
curl http://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/e90v35/appris_data.transl.fa.gz -o data/external/appris/GRCh38/g27/appris_data.transl.fa.gz
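
The same downloads can also be scripted. The sketch below is only an illustration: it reuses the URLs, file names and destination folders from the commands above, while the helper names build_download_plan and download_all are hypothetical and not part of TRIFID. Each destination folder is created before the file is written, because curl -o does not create missing directories on its own.

```python
from pathlib import Path
from urllib.request import urlretrieve

GENCODE_URL = "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27"
APPRIS_URL = "http://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/e90v35"

# Destination folders and file names copied from the shell commands above.
SOURCES = {
    "data/external/genome_annotation/GRCh38/g27": (GENCODE_URL, [
        "gencode.v27.annotation.gtf.gz",
        "gencode.v27.annotation.gff3.gz",
    ]),
    "data/external/appris/GRCh38/g27": (APPRIS_URL, [
        "appris_data.principal.txt",
        "appris_data.appris.txt",
        "appris_data.transl.fa.gz",
    ]),
}


def build_download_plan(root="."):
    """Return (url, destination) pairs without touching the network."""
    plan = []
    for folder, (base_url, names) in SOURCES.items():
        for name in names:
            plan.append((f"{base_url}/{name}", Path(root) / folder / name))
    return plan


def download_all(plan):
    """Create each destination folder, then fetch every file in the plan."""
    for url, dest in plan:
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
```

Calling download_all(build_download_plan()) from the repository root would perform the five downloads.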

Compute the splice-junction coverage scores from a complete set of RNA-seq samples covering a wide variety of tissues. Note that these samples have been processed through an extensive computational pipeline. We provide pre-computed scores for some genome annotation versions here.

python -m trifid.preprocessing.qsplice \
    --gff   data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gff3.gz \
    --outdir data/external/qsplice/GRCh38/g27 \
    --samples out/E-MTAB-2836/GRCh38/STAR/g27 \
    --version g

Compute the Pfam effects of alternative splicing relative to the reference isoform of every protein-coding gene in the genome. We provide pre-computed scores for some genome annotation versions here.

python -m trifid.preprocessing.pfam_effects \
    --appris data/external/appris/GRCh38/g27/appris_data.appris.txt \
    --jobs 10 \
    --seqs data/external/appris/GRCh38/g27/appris_data.transl.fa.gz \
    --spade data/external/appris/GRCh38/g27/appris_method.spade.gtf.gz \
    --outdir data/external/pfam_effects/GRCh38/g27

Generate a non-redundant set of isoforms by labelling fragments and duplications. We provide the pre-computed output for some genome annotation versions here.

python -m trifid.preprocessing.label_fragments \
    --gtf data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gtf.gz \
    --seqs data/external/appris/GRCh38/g27/appris_data.transl.fa.gz \
    --principals data/external/appris/GRCh38/g27/appris_data.principal.txt \
    --outdir data/external/label_fragments/GRCh38/g27

Download the ALT-Corsair and PhyloCSF data sets available in this repository.
To create the complete set of isoforms with their corresponding predictive features, run:

python -m trifid.data.make_dataset

Model training

Once the data set with predictive features for GENCODE 27 has been created, the machine learning model is trained on a training set derived from proteomics experimental evidence (Kim et al., 2014). To train the model, run:

python -m trifid.model.train

Finally, to apply the trained model and predict the functional probability of each isoform, run:

python -m trifid.model.predict

Availability of data

For now, predictions and predictive features are available for the genome annotation versions listed in the table below. If you need this data for another genome version or species, please open an issue in this repository.

TRIFID predictions and predictive features

Genome assembly	Species	Name	Model	Version	Database	Release - Date	Features - predictions
GRCh38	Homo sapiens	Human	Human	v1	GENCODE	27 - 08.2017	sharepoint
GRCh38	Homo sapiens	Human	Human	v2	GENCODE	42 - 04.2022	sharepoint
GRCh38	Homo sapiens	Human	Human	v2	GENCODE	37 - 02.2021	sharepoint
GRCh38	Homo sapiens	Human	RefSeq	v2	RefSeq - NCBI	110 - 02.2020	sharepoint
GRCh37	Homo sapiens	Human	RefSeq	v2	RefSeq - NCBI	105 - 02.2020	sharepoint
GRCh37	Homo sapiens	Human	Human	v2	GENCODE	19 - 12.2013	sharepoint
GRCm39	Mus musculus	Mouse	Mouse	v2	GENCODE	31 - 04.2022	sharepoint
GRCm38	Mus musculus	Mouse	Mouse	v2	GENCODE	25 - 11.2019	sharepoint
mRatBN7.2	Rattus norvegicus	Rat	Vertebrates	v2	Ensembl	105 - 12.2021	sharepoint
GRCz11	Danio rerio	Zebrafish	Vertebrates	v2	Ensembl	104 - 05.2021	sharepoint
GRCg7b	Gallus gallus	Chicken	Vertebrates	v2	Ensembl	108 - 10.2022	sharepoint
Pan_tro_3.0	Pan troglodytes	Chimpanzee	Vertebrates	v2	Ensembl	104 - 05.2021	sharepoint
Sscrofa11.1	Sus scrofa	Pig	Vertebrates	v2	Ensembl	108 - 10.2022	sharepoint
ARS-UCD1.2	Bos taurus	Cow	Vertebrates	v2	Ensembl	104 - 05.2021	sharepoint
Mmul_10	Macaca mulatta	Macaque	Vertebrates	v2	Ensembl	105 - 12.2021	sharepoint
BDGP6	Drosophila melanogaster	Fruitfly	Invertebrates	v2	Ensembl - Flybase	107 - 07.2022	sharepoint
WBcel235	Caenorhabditis elegans	Worm	Invertebrates	v2	Ensembl - Wormbase	108 - 10.2022	sharepoint

Example: Fibroblast growth factor receptor 1 (FGFR1)

ENSG00000077782 (Ensembl) - P11362 (FGFR1_HUMAN) (UniProt)

Loading the model

import pandas as pd
predictions = pd.read_csv('data/genomes/GRCh38/g27/trifid_predictions.tsv.gz', compression='gzip', sep='\t')
gene_name = 'FGFR1' # select gene name to explore
predictions.loc[predictions['gene_name'] == gene_name][['transcript_id', 'gene_name', 'trifid_score', 'appris', 'sequence']]

Gene name	Transcript id	APPRIS label	Length	TRIFID Score	TRIFID Score (n)
FGFR1	ENST00000447712	PRINCIPAL:3	822	0.87	0.99
FGFR1	ENST00000356207	MINOR	733	0.60	0.69
FGFR1	ENST00000397103	MINOR	733	0.01	0.08
FGFR1	ENST00000619564	MINOR	228	0.00	0.01
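
As a minimal, dependency-free sketch of how such a table can be explored, the snippet below ranks the FGFR1 isoforms by TRIFID score. The rows are copied from the table above. The header names used here (appris, length, trifid_score, norm_trifid_score) and the helper rank_isoforms are illustrative assumptions, not necessarily the headers of the real predictions file.

```python
import csv
import io

# Rows copied from the FGFR1 table above; the header names are illustrative.
FGFR1_TSV = """gene_name\ttranscript_id\tappris\tlength\ttrifid_score\tnorm_trifid_score
FGFR1\tENST00000447712\tPRINCIPAL:3\t822\t0.87\t0.99
FGFR1\tENST00000356207\tMINOR\t733\t0.60\t0.69
FGFR1\tENST00000397103\tMINOR\t733\t0.01\t0.08
FGFR1\tENST00000619564\tMINOR\t228\t0.00\t0.01
"""


def rank_isoforms(tsv_text, gene_name):
    """Return the isoforms of one gene sorted by decreasing TRIFID score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row["gene_name"] == gene_name]
    return sorted(rows, key=lambda row: float(row["trifid_score"]), reverse=True)


ranked = rank_isoforms(FGFR1_TSV, "FGFR1")
for row in ranked:
    print(row["transcript_id"], row["appris"], row["trifid_score"])
```

The APPRIS principal isoform ENST00000447712 comes first, followed by the three alternative isoforms in decreasing score order.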

Loading the SHAP predictions for a single isoform

Our tutorial Jupyter notebook explains in more detail how to load the SHAP local predictions for an isoform of FGFR1:

explain_prediction(df_shap, model, features, 'ENST00000356207')

TRIFID modules

To build a complete set of predictive features and provide precise predictions, we created some extra predictive scores that characterize every isoform.

QSplice

This TRIFID module quantifies splice-junction coverage from STAR SJ.out.tab files. It maps the unique reads to genome positions and uses the collapsed coding splice junctions to calculate a score per transcript.
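
For orientation, a STAR SJ.out.tab file is tab-separated with nine columns per junction: chromosome, intron start, intron end, strand (0 undefined, 1 plus, 2 minus), intron motif, annotation flag, uniquely mapping reads, multi-mapping reads and maximum spliced alignment overhang. The sketch below reads the unique-read count per junction. It is not the QSplice implementation: the function name read_unique_reads is hypothetical and the sample line is illustrative (only its coordinates and its 57 unique reads echo the C1orf112 example shown later in this section).

```python
import csv
import io

# STAR encodes the strand as 0 (undefined), 1 (+) or 2 (-).
STRAND = {"0": ".", "1": "+", "2": "-"}


def read_unique_reads(handle):
    """Map (chrom, start, end, strand) to the number of uniquely mapping reads."""
    coverage = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        key = (fields[0], int(fields[1]), int(fields[2]), STRAND[fields[3]])
        coverage[key] = int(fields[6])  # column 7: uniquely mapping reads
    return coverage


# Illustrative line: motif, annotation flag, multi-mapping reads and overhang are made up.
sample = io.StringIO("chr1\t169803310\t169804074\t1\t1\t1\t57\t0\t38\n")
print(read_unique_reads(sample))
```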

To generate the initial splice-junctions coverage file, we mapped the RNA-seq expression samples of 32 tissues from 122 human individuals stored here, using our RNA-seq Snakemake pipeline.

As mentioned above, this module takes the GENCODE GFF3 annotation and a set of SJ.out.tab files generated by a STAR RNA-seq alignment. In our case, these samples are stored in different folders inside the outdir directory, but a customized SJ.out.tab file can also be used: replace the --samples flag with --custom SJ.out.customized.tab. To generate the TRIFID RNA-seq predictive features with the E-MTAB-2836 samples, we used this command:

python -m trifid.preprocessing.qsplice \
    --gff   data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gff3.gz \
    --outdir data/external/qsplice/GRCh38/g27 \
    --samples out/E-MTAB-2836/GRCh38/STAR/g27 \
    --version g

The program produces two files:

sj_maxp.emtab2836.mapped.tsv.gz representing one row and one score per splice-junction.
- RNA2sj is the number of unique reads divided by the gene average unique reads of all splice-junctions.
- RNA2sj_cds is the number of unique reads divided by the gene average unique reads of splice-junctions that are spanning CDS exons.
qsplice.emtab2836.g27.tsv.gz (TRIFID input) representing one row and one score per protein-coding transcript.

Let's see an example that represents this more clearly.

Example: Chromosome 1 open reading frame 112 (C1orf112)

ENSG00000000460 (Ensembl) - Q9NSG2 (CA112_HUMAN) (UniProt)

sj_maxp.emtab2836.mapped.tsv.gz sample output for the isoform ENST00000472795.

seqname	type	start	end	strand	gene_id	gene_name	gene_type	transcript_id	cds_coverage	intron_number	nexons	ncds	unique_reads	tissue	gene_mean	gene_mean_cds	RNA2sj	RNA2sj_cds
chr1	intron	169794906	169798856	+	ENSG00000000460	C1orf112	protein_coding	ENST00000472795	none	1	6	4	2	tonsil	67.3732	73.7826	0.0297	0.0271
chr1	intron	169798959	169800882	+	ENSG00000000460	C1orf112	protein_coding	ENST00000472795	none	2	6	4	69	testis	67.3732	73.7826	1.0241	0.9352
chr1	intron	169800972	169802620	+	ENSG00000000460	C1orf112	protein_coding	ENST00000472795	full	3	6	4	74	testis	67.3732	73.7826	1.0984	1.0029
chr1	intron	169802726	169803168	+	ENSG00000000460	C1orf112	protein_coding	ENST00000472795	full	4	6	4	77	testis	67.3732	73.7826	1.1429	1.0436
chr1	intron	169803310	169804074	+	ENSG00000000460	C1orf112	protein_coding	ENST00000472795	full	5	6	4	57	testis	67.3732	73.7826	0.846	0.7725

In the case of C1orf112 in GENCODE 27, QSplice selects splice junction number 5 of ENST00000472795, located between positions 169803310 and 169804074. The maximum coverage of this junction across tissues is found in testis, with 57 unique reads spanning the junction. This is also the lowest coverage among the junctions of the isoform, as the table shows (note that only introns between coding exons are taken into account). The final RNA2sj and RNA2sj_cds scores are obtained by dividing this read count by the respective gene means.
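
The selection described above can be checked with a few lines of arithmetic. The sketch below uses the read counts and gene means from the table for ENST00000472795. It assumes that the junctions taken into account are the rows whose cds_coverage is "full", which is an interpretation of the table rather than a documented rule, and qsplice_scores is a hypothetical helper, not the QSplice code.

```python
# (intron_number, cds_coverage, unique_reads), copied from the
# sj_maxp.emtab2836.mapped.tsv.gz rows shown above for ENST00000472795.
JUNCTIONS = [
    (1, "none", 2),
    (2, "none", 69),
    (3, "full", 74),
    (4, "full", 77),
    (5, "full", 57),
]
GENE_MEAN = 67.3732      # gene_mean column
GENE_MEAN_CDS = 73.7826  # gene_mean_cds column


def qsplice_scores(junctions, gene_mean, gene_mean_cds):
    """Pick the least covered coding junction and normalize it by the gene means."""
    # Assumption: "coding" junctions are the rows with cds_coverage == "full".
    coding = [j for j in junctions if j[1] == "full"]
    if not coding:
        return None, 0.0, 0.0
    intron, _, reads = min(coding, key=lambda j: j[2])
    return intron, round(reads / gene_mean, 4), round(reads / gene_mean_cds, 4)


print(qsplice_scores(JUNCTIONS, GENE_MEAN, GENE_MEAN_CDS))  # (5, 0.846, 0.7725)
```

The result matches the RNA2sj and RNA2sj_cds values reported for intron 5 in the table.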

qsplice.emtab2836.tsv.gz sample output for some isoforms of C1orf112. The isoform ENST00000472795 shown above gets the same RNA2sj (0.846) and RNA2sj_cds (0.7725) scores as before.

seqname	gene_id	gene_name	gene_type	transcript_id	intron_number	nexons	ncds	unique_reads	tissue	gene_mean	gene_mean_cds	RNA2sj	RNA2sj_cds
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000286031	6	24	22	53	testis	67.3732	73.7826	0.7867	0.7183
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000359326	7	25	22	53	testis	67.3732	73.7826	0.7867	0.7183
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000413811	20	23	14	62	testis	67.3732	73.7826	0.9202	0.8403
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000459772	2	23	3	7	fallopiantube	67.3732	73.7826	0.1039	0.0949
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000466580	2	8	3	7	fallopiantube	67.3732	73.7826	0.1039	0.0949
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000472795	5	6	4	57	testis	67.3732	73.7826	0.846	0.7725
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000481744	2	7	3	7	fallopiantube	67.3732	73.7826	0.1039	0.0949
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000496973	5	6	6	8	tonsil	67.3732	73.7826	0.1187	0.1084
chr1	ENSG00000000460	C1orf112	protein_coding	ENST00000498289	3	29	0	0	-	67.3732	73.7826	0	0

Figure: ENST00000472795 (C1orf112-206) exon distribution scheme to represent how QSplice scores are generated.

Pfam effects

This TRIFID module quantifies the effects of alternative splicing on Pfam domains relative to the reference isoform of every protein-coding gene in the genome. The scores measure the quantitative impact of an alternative splicing event on Pfam domains, and whether a domain would be damaged, lost or intact. To generate the TRIFID Pfam effects predictive features, we need the APPRIS scores file, the protein sequences file and the SPADE scores file. To generate the set of predictive features for GENCODE 27, we used this command:

python -m trifid.preprocessing.pfam_effects \
    --appris data/external/appris/GRCh38/g27/appris_data.appris.txt \
    --jobs 10 \
    --seqs data/external/appris/GRCh38/g27/appris_data.transl.fa.gz \
    --spade data/external/appris/GRCh38/g27/appris_method.spade.gtf.gz \
    --outdir data/external/pfam_effects/GRCh38/g27

The program generates:

qpfam.tsv.gz representing one row and several scores per transcript. The final scores are:
- pfam_score shows the direct effect of alternative splicing on Pfam domains, based on the number of residues conserved after an event.
- pfam_domains_impact_score represents the percentage of Pfam domains that are intact after an event.
- perc_Damaged_State represents the percentage of Pfam domains that are damaged after an event.
- perc_Lost_State represents the percentage of Pfam domains that are lost after an event.
- Lost_residues_pfam counts the number of residues from Pfam domains lost.
- Gain_residues_pfam counts the number of residues from Pfam domains added.

Again, let's see an example to understand these scores.

Example: NIPA like domain containing 3 (NIPAL3)

ENSG00000001461 (Ensembl) - Q6P499 (NPAL3_HUMAN) (UniProt).

The following table represents the qpfam.tsv.gz sample output for isoforms of ENSG00000001461.

transcript_id	pfam_score	pfam_domains_impact_score	perc_Damaged_State	Lost_residues_pfam	pfam_effects_msa
ENST00000374399	1	1	0	0	Reference
ENST00000339255	1	1	0	0	Transcript
ENST00000003912	0.83	0	1	50	Transcript
ENST00000358028	0.62	0	1	112	Transcript
ENST00000432012	0.35	0	1	255	Transcript

This gene has one Pfam domain (Mg_trans_NIPA - PF05653), which is shown in green in the figure below.

Figure: MUSCLE alignment of a fraction of the isoform sequences of NIPAL3.
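
To show how these columns can be read together, the sketch below labels each NIPAL3 isoform from the table above. The rule it applies (a domain counts as intact when pfam_domains_impact_score is 1, and as damaged when perc_Damaged_State is greater than 0) is an illustrative reading of the column descriptions, and domain_state is a hypothetical helper, not a TRIFID function.

```python
import csv
import io

# Rows copied from the qpfam.tsv.gz sample output for NIPAL3 shown above.
QPFAM_TSV = """transcript_id\tpfam_score\tpfam_domains_impact_score\tperc_Damaged_State\tLost_residues_pfam
ENST00000374399\t1\t1\t0\t0
ENST00000339255\t1\t1\t0\t0
ENST00000003912\t0.83\t0\t1\t50
ENST00000358028\t0.62\t0\t1\t112
ENST00000432012\t0.35\t0\t1\t255
"""


def domain_state(row):
    """Summarize one row as 'intact', 'damaged' or 'other' (illustrative rule)."""
    if float(row["pfam_domains_impact_score"]) == 1:
        return "intact"
    if float(row["perc_Damaged_State"]) > 0:
        return "damaged"
    return "other"


reader = csv.DictReader(io.StringIO(QPFAM_TSV), delimiter="\t")
states = {row["transcript_id"]: domain_state(row) for row in reader}
for transcript_id, state in states.items():
    print(transcript_id, state)
```

Under this reading, the reference isoform and ENST00000339255 keep the domain intact, while the other three isoforms damage it and lose between 50 and 255 domain residues.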

Fragment labelling

The fragment labelling module defines which fraction of the set of genome isoforms is redundant. The GENCODE genome annotation contains some incomplete sequences, tagged cds_end_NF or cds_start_NF, that must be identified so that their scores can be corrected. The program also identifies protein sequences that are duplicated across the genome. It needs the GENCODE GTF annotation, the protein sequences and the APPRIS labels. With the command below, we tagged the whole set of isoforms as Principal, Alternative, Redundant [Principal|Alternative] or [Principal|Alternative] Duplication:

python -m trifid.preprocessing.label_fragments  \
    --gtf data/external/genome_annotation/GRCh38/g27/gencode.v27.annotation.gtf.gz \
    --seqs data/external/appris/GRCh38/g27/appris_data.transl.fa.gz \
    --principals data/external/appris/GRCh38/g27/appris_data.principal.txt \
    --outdir data/external/label_fragments/GRCh38/g27
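
The two checks described above, incomplete CDS tags and duplicated protein sequences, can be sketched as follows. This is a simplified illustration under stated assumptions, not the label_fragments implementation: the function flag_isoforms and the toy records are hypothetical, and the sketch stops at two boolean flags instead of assigning the final Principal, Alternative, Redundant or Duplication labels.

```python
from collections import Counter

# GENCODE tags that mark a transcript with an incomplete CDS.
FRAGMENT_TAGS = {"cds_start_NF", "cds_end_NF"}


def flag_isoforms(isoforms):
    """Flag each isoform as a fragment and/or as sharing its protein sequence."""
    counts = Counter(iso["sequence"] for iso in isoforms)
    flags = {}
    for iso in isoforms:
        flags[iso["transcript_id"]] = {
            "fragment": bool(FRAGMENT_TAGS & set(iso["tags"])),
            "duplicated": counts[iso["sequence"]] > 1,
        }
    return flags


# Toy records, invented for illustration only.
toy = [
    {"transcript_id": "T1", "sequence": "MKTAYIAK", "tags": []},
    {"transcript_id": "T2", "sequence": "MKTAYIAK", "tags": []},
    {"transcript_id": "T3", "sequence": "TAYIAK", "tags": ["cds_start_NF"]},
]
print(flag_isoforms(toy))
```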

Directory structure

Project structure from Cookiecutter Data Science.

+-- .gitignore
+-- LICENSE
+-- README.md                       <- The top-level README for developers using this project
+-- config                          <- YAML files to customize the pipelines
¦   +-- features.yaml               <- Features name, category, description and species support
¦   +-- config.yaml                 <- Customized to create the database
¦
+-- img                             <- Repository image logos
¦
+-- models                          <- Trained model, model selection log and results
¦
+-- notebooks                       <- Jupyter notebooks to reproduce interactively the methods
¦   +-- 01.tutorial.ipynb           <- Tutorial to run an end-to-end TRIFID simulation
¦   +-- 02.figures                  <- Useful figures generated
¦
+-- .editorconfig                   <- Editor configuration file
+-- setup.py                        <- Make this project pip installable 
+-- setup.cfg                       <- Setup configuration file
+-- environment.yml                 <- The requirements file for reproducing the analysis environment
+-- pyproject.toml                  <- Project configuration file
+-- trifid                          <- Source code for use in this project.
¦   +-- __init__.py                 <- Makes trifid a Python module
¦   ¦
¦   +-- preprocessing               <- Scripts to run the TRIFID modules
¦   ¦   +-- __init__.py
¦   ¦   +-- fragment_labeling.py
¦   ¦   +-- pfam_effects.py
¦   ¦   +-- qsplice.py
¦   ¦
¦   +-- data                        <- Scripts to download or generate data and turn raw data into features for modeling
¦   ¦   +-- __init__.py
¦   ¦   +-- loaders.py
¦   ¦   +-- feature_engineering.py
¦   ¦   +-- make_dataset.py
¦   ¦
¦   +-- models                      <- Scripts to train models and then use trained models to make predictions
¦   ¦   +-- __init__.py
¦   ¦   +-- interpret.py
¦   ¦   +-- predict.py
¦   ¦   +-- select.py
¦   ¦   +-- train.py
¦   ¦
¦   +-- utils                       <- Useful functions used in several modules of the package
¦   ¦   +-- __init__.py
¦   ¦   +-- utils.py
¦   ¦   +-- analyse_appris_spade_transcripts_nf.pl
¦   ¦   +-- get_NR_list.pl
¦   ¦   +-- get_seqlen.pl
¦   ¦
¦   +-- visualization               <- Scripts to create exploratory and results-oriented visualizations
¦   ¦   +-- __init__.py
¦   ¦   +-- figures.py

Author information

Fernando Pozo (@fpozoca – fpozoc@gmx.com)

Contributors: Daniel Cerdán, Laura Martinez-Gomez, Thomas A. Walsh, Tomas Di Domenico, Jose Manuel Rodriguez, Jesus Vazquez, Federico Abascal, Michael L Tress

Release History

TRIFID initial release (March 10, 2021).
TRIFID v2.0.0 release (Sep, 2022).

Contributing

Instructions to contribute to this project:

Branching (internal collaboration)

Read CONTRIBUTING.md

Quickstart tips

Follow a development workflow structure:
1. Open an Issue describing your implementation.
2. Create a Branch called issue-number_developer-name_problem (e.g. 12_fernando_modify-docs) and commit from there (please don't commit directly to the master branch).
3. Create a Pull Request to the main or develop branch when the code is ready.
Don't upload big files here (only tests/examples if needed). Use Azure instead.
The Makefile contains some useful commands (e.g. make check). Check it.
Read CONTRIBUTING.md if you are interested in contributing to this repository.

Forking (external collaboration)

Fork it (https://github.com/fpozoc/trifid)
Create your feature branch (git checkout -b feature/fooBar)
Commit your changes (git commit -am 'Add some fooBar')
Push to the branch (git push origin feature/fooBar)
Create a new Pull Request

NOTE: Several functions and classes in this repository may be useful for bioinformatics or machine learning developers. However, for the moment, TRIFID is not primarily intended to be a Python package. It has been structured this way only to facilitate reproducibility.

License

Distributed under the GNU General Public License.

See LICENSE file.
