Output Files

bj8th edited this page Jul 27, 2021 · 16 revisions

Output file names are customized using the "name" parameter provided as input. In the listings below, "name" stands for the value of that parameter.
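The output directories on this page follow the pattern outdir/name/step. As a rough illustration of that convention, the sketch below builds the expected location of one output file. The helper `expected_output`, the `{name}` template syntax, and the sample values are hypothetical and not part of the pipeline.

```python
from pathlib import PurePosixPath


def expected_output(outdir: str, name: str, step: str, template: str) -> PurePosixPath:
    """Return outdir/name/step/<file>, substituting the run name into the template.

    `template` marks the position of the run name with "{name}",
    e.g. "{name}_classification.txt". Files without a run name in
    their filename (e.g. "gene_lens.tsv") pass through unchanged.
    """
    return PurePosixPath(outdir) / name / step / template.format(name=name)


print(expected_output("results", "sample1", "sqanti3", "{name}_classification.txt"))
# results/sample1/sqanti3/sample1_classification.txt
```
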

reference_tables

Output directory: outdir/name/reference_tables

| filename | description |
| --- | --- |
| ensg_gene.tsv | GENCODE gene_id to gene_name mapping |
| enst_isoname.tsv | GENCODE transcript_id to transcript_name mapping |
| gene_ensp.tsv | gene_name to GENCODE protein_id mapping |
| gene_isoname.tsv | gene_name and transcript_name mapping |
| gene_lens.tsv | Gene nucleotide length statistics |
| isoname_lens.tsv | Gene isoform length information |
| protein_coding_genes.txt | list of protein coding genes determined by GENCODE |

gencode_db

Output directory: outdir/name/gencode_db

| filename | description |
| --- | --- |
| gencode_isoname_cluster.tsv | Listing of GENCODE transcript_names (i.e., isonames) that create the same proteins. Reference transcript_name (arbitrarily selected) and clustered transcript_names provided |
| gencode_protein.fasta | protein sequence of GENCODE, reference isonames only |

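To make the clustering idea concrete, here is a minimal sketch of grouping transcript names by identical protein sequence. It is an illustration only, not the pipeline's code: the function name `cluster_by_protein`, the input shape, and the choice of the first-seen name as the reference are assumptions.

```python
from collections import defaultdict


def cluster_by_protein(proteins):
    """Group transcript names whose protein sequences are identical.

    `proteins` maps transcript_name -> protein sequence.
    Returns a list of (reference_name, all_names_in_cluster) tuples.
    The reference is simply the first name seen for each sequence,
    which is one possible reading of "arbitrarily selected".
    """
    groups = defaultdict(list)
    for isoname, sequence in proteins.items():
        groups[sequence].append(isoname)
    return [(names[0], names) for names in groups.values()]


example = {"GENE1-201": "MKTAY", "GENE1-202": "MKTAY", "GENE1-203": "MKTAV"}
for reference, members in cluster_by_protein(example):
    print(reference, members)
```
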
isoseq3

Output directory: outdir/name/isoseq3

| filename | description |
| --- | --- |
| name.collapsed.abundance.txt | Collapsed isoform abundances |
| name.collapsed.fasta | Representative transcript sequences of the collapsed isoforms |
| name.collapsed.gff | Collapsed transcript alignment |
| name.collapsed.report.json | Collapsed isoform statistics |
| name.demult.lima.summary | Statistics after lima command |
| name.flnc.bam | Full-length non-concatemer reads |
| name.flnc.bam.pbi | PacBio index for the full-length non-concatemer reads BAM |
| name.flnc.filter_summary.json | Full-length non-concatemer reads summary |

star_index

Output directory: outdir/name/star_index

star

Output directory: outdir/name/star

| filename | description |
| --- | --- |
| nameSJ.out.tab | STAR splice junction results in tab-delimited format |
| nameLog.final.out | Log file and summary statistics |

sqanti3

Output directory: outdir/name/sqanti3

| filename | description |
| --- | --- |
| name_classification.txt | SQANTI transcript classification of isoforms |
| name_corrected.fasta | Transcript sequences after correction using genome sequence |
| name_corrected.gtf | Alignment of corrected sequences |
| name_junctions.txt | File with attribute information at splice junction level (table explaining feature meaning inside output_info). |
| name_sqanti_report.pdf | PDF file showing different quality control and descriptive plots. An example can be found here |
| name.params.txt | SQANTI parameters used |

sqanti3-filtered

Output directory: outdir/name/sqanti3-filtered

| filename | description |
| --- | --- |
| filtered_name_classification.tsv | SQANTI classification filtered based on protein coding, percent polyA downstream, RTS stage |
| filtered_name_corrected.fasta | SQANTI fasta filtered based on protein coding, percent polyA downstream, RTS stage |
| filtered_name_corrected.gtf | SQANTI gtf filtered based on protein coding, percent polyA downstream, RTS stage |
| name_classification.5degfilter.txt | SQANTI classification filtered by the filtered_ criteria and additionally for 5' degradation |
| name_corrected.5degfilter.fasta | SQANTI fasta filtered by the filtered_ criteria and additionally for 5' degradation |
| name_corrected.5degfilter.gtf | SQANTI gtf filtered by the filtered_ criteria and additionally for 5' degradation |

pacbio_6frm_gene_grouped

Output directory: outdir/name/pacbio_6frm_gene_grouped

| filename | description |
| --- | --- |
| name.6frame.fasta | PacBio transcripts translated in all six possible frames (3 forward, 3 reverse) |

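The idea behind a six-frame translation can be sketched as follows. This is a minimal, self-contained illustration using the standard genetic code, not the pipeline's own script; the function names `translate` and `six_frame` and the frame labels (`+1` to `-3`) are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return translations of the three forward and three reverse-complement frames."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for label, protein in six_frame("ATGGCCTAA").items():
    print(label, protein)
```

Stop codons are written as `*` here; how the pipeline splits or filters translated frames is not covered by this sketch.
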
transcriptome_summary

Output directory: outdir/name/transcriptome_summary

| filename | description |
| --- | --- |
| gene_level_tab.tsv | CPM (long-read) and TPM (short-read) info provided on gene level |
| pb_gene.tsv | PacBio to gene mapping |
| sqanti_isoform_info.tsv | simplified SQANTI classification info |

cpat

Output directory: outdir/name/cpat

| filename | description |
| --- | --- |
| CPAT_run_info.log | CPAT logging file |
| name_cpat.error | CPAT error / logging info |
| name_cpat.output | CPAT output info |
| name.no_ORF.txt | list of PacBio isoforms that did not produce a valid ORF |
| name.ORF_prob.best.tsv | ORF file, best defined by CPAT |
| name.ORF_prob.tsv | All ORFs found and scored by CPAT |
| name.ORF_seqs.fa | All ORF nucleotide sequences found by CPAT |
| name.r | code run by CPAT to produce ORFs |

orf_calling

Output directory: outdir/name/orf_calling

| filename | description |
| --- | --- |
| name_best_orf.tsv | Best ORF for each PacBio accession, as determined by algorithm |

refined_database

Output directory: outdir/name/refined_database

| filename | description |
| --- | --- |
| name_orf_refined.fasta | protein sequence of ORFs after collapsing based on transcripts producing same protein sequence |
| name_orf_refined.tsv | ORF info, transcripts producing same protein collapsed |

pacbio_cds

Output directory: outdir/name/pacbio_cds

| filename | description |
| --- | --- |
| name_no_transcript_with_cds.gtf | PacBio gtf with CDS info added, transcript line not included |
| name_with_cds.gtf | PacBio gtf with CDS info added |
| make_pacbio_cds.log | logging file |

rename_cds

Output directory: outdir/name/rename_cds

| filename | description |
| --- | --- |
| gencode.cds_renamed_exon.gtf | GENCODE gtf file, exons removed and CDS renamed to exon |
| gencode.transcript_exons_only.gtf | GENCODE gtf file, exons and transcript only |
| name.cds_renamed_exon.gtf | PacBio gtf file, exons removed and CDS renamed to exon |
| name.transcript_exons_only.gtf | PacBio gtf file, exons and transcript only |

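The "exons removed and CDS renamed to exon" transformation can be pictured with the sketch below. It assumes standard tab-delimited, 9-column GTF records and passes all other feature types through unchanged; that pass-through behavior and the function name `cds_as_exon` are assumptions, not a description of the pipeline's script.

```python
def cds_as_exon(gtf_lines):
    """Drop exon records and relabel CDS records as exon.

    Comment lines and lines that do not look like 9-column GTF
    records are kept as they are; blank lines are skipped.
    """
    result = []
    for raw in gtf_lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9:
            result.append(line)
            continue
        if fields[2] == "exon":
            continue  # original exon records are removed
        if fields[2] == "CDS":
            fields[2] = "exon"  # the CDS interval now plays the role of an exon
        result.append("\t".join(fields))
    return result


records = [
    'chr1\tPacBio\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "PB.1.1";',
    'chr1\tPacBio\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "PB.1.1";',
    'chr1\tPacBio\tCDS\t10\t90\t.\t+\t0\tgene_id "G1"; transcript_id "PB.1.1";',
]
for record in cds_as_exon(records):
    print(record)
```
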
sqanti_protein

Output directory: outdir/name/sqanti_protein

| filename | description |
| --- | --- |
| name_sqanti_protein_classification.tsv | splice classification data for proteins generated by PacBio |

protein_classification

Output directory: outdir/name/protein_classification

| filename | description |
| --- | --- |
| name_genes.tsv | Mapping of PacBio accession to transcript gene and protein gene. These can differ if the transcript read spans multiple genes |
| name_unfiltered.protein_classification.tsv | protein classification of all PacBio proteins |

protein_gene_rename

Output directory: outdir/name/protein_gene_rename

| filename | description |
| --- | --- |
| name_orf_refined_gene_update.tsv | refined database with gene name updated to reflect protein gene |
| name_with_cds_refined.gtf | PacBio gtf file that includes CDS information with gene name updated to reflect protein gene |
| name_protein_refined.fasta | protein fasta with gene name updated to reflect protein gene |

protein_filter

Output directory: outdir/name/protein_filter

| filename | description |
| --- | --- |
| name_with_cds_filtered.gtf | GTF, filtered to remove intergenic and truncations |
| name_classification_filtered.tsv | Protein classification, filtered to remove intergenic and truncations |
| name.filtered_protein.fasta | protein sequences, filtered to remove intergenic and truncations |

hybrid_protein_database

Output directory: outdir/name/hybrid_protein_database

High confidence: at least 3 CPM per gene and an average gene nucleotide length of 1-4 kb

| filename | description |
| --- | --- |
| name_cds_high_confidence.gtf | GTF of high confidence genes |
| name_high_confidence_genes.tsv | list of high confidence genes |
| name_hybrid.fasta | sequence information of high confidence PacBio and GENCODE genes |
| name_refined_high_confidence.tsv | high confidence ORF metadata |

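The high-confidence criteria stated above can be expressed as a simple predicate. In this sketch the function name, the argument names, the sample records, and the inclusive treatment of the 1 kb and 4 kb bounds are all assumptions; the pipeline's exact thresholds and column names should be checked against its code.

```python
def is_high_confidence(cpm, avg_length_nt, min_cpm=3.0, min_length=1000, max_length=4000):
    """Return True if a gene meets the stated CPM and average-length criteria.

    Bounds are treated as inclusive here, which is an assumption.
    """
    return cpm >= min_cpm and min_length <= avg_length_nt <= max_length


# Hypothetical gene-level records (gene name, CPM, average nucleotide length).
genes = [
    ("GENE_A", 12.5, 2300),   # passes both criteria
    ("GENE_B", 1.2, 2500),    # CPM too low
    ("GENE_C", 40.0, 6200),   # average length above 4 kb
]
high_confidence = [gene for gene, cpm, length in genes if is_high_confidence(cpm, length)]
print(high_confidence)
```
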
metamorpheus

Database Information

| database | directory | database name in files |
| --- | --- | --- |
| GENCODE | gencode | Gencode |
| UniProt | uniprot | UniProt |
| PacBio Filtered | pacbio/filtered | filtered |
| PacBio Refined | pacbio/refined | refined |
| PacBio Hybrid | pacbio/hybrid | hybrid |
| PacBio Rescue & Resolve | pacbio/rescue_resolve | rescue_resolve |

toml files

Directory: database/toml

| filename | description |
| --- | --- |
| CalibrationTask.toml | not used |
| GlycoSearchTask.toml | not used |
| GptmdTask.toml | not used |
| SearchTask.toml | Metamorpheus run parameters |
| XLSearchTask.toml | not used |

Search results files in search_results/Task1SearchTask

| filename | description |
| --- | --- |
| AllPSMs.psmtsv | PSMs found |
| AllQuantifiedPeaks.tsv | quantified peaks found |
| prose.txt | Run information |
| AllPSMs_FormattedForPercolator.tab | PSMs found, in Percolator format |
| AllQuantifiedPeptides.tsv | quantified peptides |
| results.txt | summary statistics |
| AllPeptides.database.psmtsv | peptides found |
| AllQuantifiedProteinGroups.database.tsv | protein groups found |

peptide_analysis

Output directory: outdir/name/peptide_analysis

| filename | description |
| --- | --- |
| gc_pb_overlap_peptides.tsv | overlap of GENCODE peptides with theoretical peptides that could be found in PacBio databases |

track_visualization

reference

Output directory: outdir/name/track_visualization/reference

| filename | description |
| --- | --- |
| gencode_shaded.bed12 | GENCODE bed alignment colored |
| gencode.filtered.gtf | GENCODE alignment |

pacbio databases

Output directory: outdir/name/track_visualization/database

| database | database name |
| --- | --- |
| PacBio Refined | refined |
| PacBio Filtered | filtered |
| PacBio Hybrid | hybrid |

peptide

| filename | description |
| --- | --- |
| name_database_peptides.bed12 | peptide bed alignment |
| name_database_peptides.gtf | peptide gtf alignment |
| name_database_shaded_peptides.bed12 | peptide bed alignment, shaded green |

protein

| filename | description |
| --- | --- |
| name_hybrid_shaded_cpm.bed12 | protein alignment, shaded by transcript abundance (CPM) |
| name_hybrid_shaded_protein_class.bed12 | protein alignment, shaded by protein classification |

accession_mapping

Output directory: outdir/name/accession_mapping

| filename | description |
| --- | --- |
| accession_map_gencode_uniprot_pacbio.tsv | accession mapping between GENCODE, UniProt and PacBio |
| accession_map_stats.tsv | frequency of overlap between databases |

protein_group_compare

Output directory: outdir/name/protein_group_compare

| filename | description |
| --- | --- |
| ProteinInference_GENCODE_PacBio_comparisons.xlsx | protein inference overlap between GENCODE and PacBio Hybrid |
| ProteinInference_UniProt_PacBio_comparisons.xlsx | protein inference overlap between UniProt and PacBio Hybrid |
| ProteinInference_GENCODE_UniProt_comparisons.xlsx | protein inference overlap between GENCODE and UniProt |

novel_peptides

Output directory: outdir/name/novel_peptides

| filename | description |
| --- | --- |
| name_database.pacbio_novel_peptides_to_gencode.tsv | novel peptides found in PacBio compared to GENCODE database |
| name_database.pacbio_novel_peptides_to_uniprot.tsv | novel peptides found in PacBio compared to UniProt database |
| name_database.pacbio_novel_peptides.tsv | novel peptides found in PacBio compared to GENCODE and UniProt databases |

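Conceptually, the three novel-peptide files correspond to set differences between the peptides identified with a PacBio database and those available from the reference databases. The sketch below shows that idea only; the function name, the dictionary keys, and the plain string comparison are assumptions, and the pipeline may apply additional rules when deciding whether a peptide is novel.

```python
def novel_peptides(pacbio, gencode, uniprot):
    """Return PacBio peptides absent from GENCODE, from UniProt, and from both."""
    pacbio, gencode, uniprot = set(pacbio), set(gencode), set(uniprot)
    return {
        "to_gencode": pacbio - gencode,
        "to_uniprot": pacbio - uniprot,
        "to_gencode_and_uniprot": pacbio - gencode - uniprot,
    }


# Hypothetical peptide sequences, for illustration only.
result = novel_peptides(
    pacbio=["PEPTIDEA", "PEPTIDEB", "PEPTIDEC"],
    gencode=["PEPTIDEA"],
    uniprot=["PEPTIDEA", "PEPTIDEB"],
)
for key, peptides in result.items():
    print(key, sorted(peptides))
```
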