Output Files

Output file names and directories are customized using the "name" parameter provided as input. Throughout this page, "name" stands for the value of that parameter.
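As a sketch of this naming convention, the helper below builds the expected location of one output file from the output directory and the "name" parameter. The function names and the example values "results" and "jurkat" are illustrative assumptions, not part of the pipeline.

```python
from pathlib import Path


def output_dir(outdir: str, name: str, step: str) -> Path:
    """Directory for one pipeline step, following the outdir/name/step layout."""
    return Path(outdir) / name / step


def sqanti_classification(outdir: str, name: str) -> Path:
    """Expected path of the SQANTI classification table (sqanti3 section)."""
    return output_dir(outdir, name, "sqanti3") / f"{name}_classification.txt"


# "results" and "jurkat" are hypothetical values for outdir and name.
print(sqanti_classification("results", "jurkat").as_posix())
```

The same pattern applies to every section on this page: replace "sqanti3" with the section's directory and adjust the file name template.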

reference_tables

Output directory: outdir/name/reference_tables

filename	description
ensg_gene.tsv	GENCODE gene_id to gene_name mapping
enst_isoname.tsv	GENCODE transcript_id to transcript_name mapping
gene_ensp.tsv	gene_name to GENCODE protein_id mapping
gene_isoname.tsv	gene_name and transcript_name mapping
gene_lens.tsv	Gene nucleotide length statistics
isoname_lens.tsv	Gene isoform length information
protein_coding_genes.txt	list of protein coding genes determined by GENCODE

gencode_db

Output directory: outdir/name/gencode_db

filename	description
gencode_isoname_cluster.tsv	Listing of GENCODE transcript_names (i.e., isonames) that create the same proteins. Reference transcript_name (arbitrarily selected) and clustered transcript_names provided
gencode_protein.fasta	GENCODE protein sequences, reference isonames only

isoseq3

Output directory: outdir/name/isoseq3

filename	description
name.collapsed.abundance.txt	Collapsed isoform abundances
name.collapsed.fasta	Representative transcript sequences of collapsed isoforms
name.collapsed.gff	Collapsed transcript alignment
name.collapsed.report.json	Collapsed isoform statistics
name.demult.lima.summary	Statistics after lima command
name.flnc.bam	Full-length non-concatemer reads
name.flnc.bam.pbi	PacBio index for the full-length non-concatemer reads BAM file
name.flnc.filter_summary.json	Full-length non-concatemer reads summary

star_index

Output directory: outdir/name/star_index

star

Output directory: outdir/name/star

filename	description
nameSJ.out.tab	STAR splice junctions in tab-delimited format
nameLog.final.out	Log file and summary statistics
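For readers who want to inspect nameSJ.out.tab, the following is a minimal reading sketch. The nine-column layout follows the STAR manual's description of SJ.out.tab; the column labels, function name, and the example row are illustrative assumptions.

```python
import csv
import io

# Short labels for the nine SJ.out.tab columns (labels are our own choice).
SJ_COLUMNS = [
    "chrom", "intron_start", "intron_end", "strand", "motif",
    "annotated", "unique_reads", "multi_reads", "max_overhang",
]
STRAND = {"0": ".", "1": "+", "2": "-"}  # STAR encodes strand as 0/1/2
INT_FIELDS = ["intron_start", "intron_end", "motif", "annotated",
              "unique_reads", "multi_reads", "max_overhang"]


def read_splice_junctions(handle):
    """Yield one dict per junction from an open SJ.out.tab file handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(SJ_COLUMNS, row))
        for key in INT_FIELDS:
            record[key] = int(record[key])
        record["strand"] = STRAND[record["strand"]]
        yield record


# Made-up example row, not taken from a real run.
example = io.StringIO("chr1\t14830\t14969\t2\t2\t1\t5\t3\t38\n")
junctions = list(read_splice_junctions(example))
print(junctions[0]["chrom"], junctions[0]["strand"], junctions[0]["unique_reads"])
```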

sqanti3

Output directory: outdir/name/sqanti3

filename	description
name_classification.txt	SQANTI transcript classification of isoforms
name_corrected.fasta	Transcript sequences after correction using genome sequence
name_corrected.gtf	Alignment of corrected sequences
name_junctions.txt	File with attribute information at splice junction level (table explaining feature meaning inside output_info).
name_sqanti_report.pdf	PDF file showing quality control and descriptive plots
name.params.txt	SQANTI parameters used

sqanti3-filtered

Output directory: outdir/name/sqanti3-filtered

filename	description
filtered_name_classification.tsv	SQANTI classification filtered based on protein coding, percent polyA downstream, RTS stage
filtered_name_corrected.fasta	SQANTI fasta filtered based on protein coding, percent polyA downstream, RTS stage
filtered_name_corrected.gtf	SQANTI gtf filtered based on protein coding, percent polyA downstream, RTS stage
name_classification.5degfilter.txt	SQANTI classification filtered by the filtered_ criteria and additionally for 5' degradation
name_corrected.5degfilter.fasta	SQANTI fasta filtered by the filtered_ criteria and additionally for 5' degradation
name_corrected.5degfilter.gtf	SQANTI gtf filtered by the filtered_ criteria and additionally for 5' degradation

pacbio_6frm_gene_grouped

Output directory: outdir/name/pacbio_6frm_gene_grouped

filename	description
name.6frame.fasta	PacBio transcripts translated in all six reading frames (3 forward, 3 reverse)
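To illustrate what a six-frame translation contains, here is a minimal sketch using the standard genetic code. It is not the pipeline's own script, which may differ in detail (for example in how it handles stop codons or groups sequences by gene); the function names and the short example sequence are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(seq: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


frames = six_frame_translate("ATGGCCTAA")  # toy sequence for illustration
print(frames)
```

Stop codons are shown as "*" here so that each frame's full translation is visible.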

transcriptome_summary

Output directory: outdir/name/transcriptome_summary

filename	description
gene_level_tab.tsv	CPM (long-read) and TPM (short-read) info provided on gene level
pb_gene.tsv	PacBio to gene mapping
sqanti_isoform_info.tsv	simplified SQANTI classification info
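For reference, CPM (counts per million) scales each count by the total number of counts and multiplies by one million. The sketch below shows that arithmetic on made-up gene-level read counts; the function name and the example counts are assumptions, and the pipeline's own normalization may include additional steps.

```python
def counts_per_million(counts: dict) -> dict:
    """Convert raw counts to CPM: count * 1,000,000 / total counts."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value * 1_000_000 / total for key, value in counts.items()}


# Hypothetical long-read counts per gene.
cpm = counts_per_million({"GENE_A": 30, "GENE_B": 70})
print(cpm)
```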

cpat

Output directory: outdir/name/cpat

filename	description
CPAT_run_info.log	CPAT logging file
name_cpat.error	CPAT error / logging info
name_cpat.output	CPAT output info
name.no_ORF.txt	list of PacBio isoforms that did not produce a valid ORF
name.ORF_prob.best.tsv	ORF file, best defined by CPAT
name.ORF_prob.tsv	All ORFs found and scored by CPAT
name.ORF_seqs.fa	All ORF nucleotide sequences found by CPAT
name.r	code run by CPAT to produce ORFs

orf_calling

Output directory: outdir/name/orf_calling

filename	description
name_best_orf.tsv	Best ORF for each PacBio accession, as determined by algorithm

refined_database

Output directory: outdir/name/refined_database

filename	description
name_orf_refined.fasta	protein sequence of ORFs after collapsing based on transcripts producing same protein sequence
name_orf_refined.tsv	ORF info, transcripts producing same protein collapsed
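The collapsing step described above can be pictured as grouping accessions by identical protein sequence. This is a minimal sketch under stated assumptions: the function name, the PB.x.y accessions, and the rule of taking the alphabetically first accession as the representative are all illustrative, and the pipeline may choose the representative differently.

```python
from collections import defaultdict


def collapse_by_protein(orfs: dict) -> list:
    """Group accessions that share a protein sequence.

    orfs maps accession -> protein sequence. Returns a list of
    (representative, members, sequence) tuples.
    """
    groups = defaultdict(list)
    for accession, sequence in orfs.items():
        groups[sequence].append(accession)

    collapsed = []
    for sequence, members in groups.items():
        members.sort()
        # Assumption: the first accession in sorted order is the representative.
        collapsed.append((members[0], members, sequence))
    return collapsed


# Hypothetical accessions and sequences.
collapsed = collapse_by_protein({
    "PB.1.2": "MKT",
    "PB.1.1": "MKT",
    "PB.2.1": "MAA",
})
print(collapsed)
```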

pacbio_cds

Output directory: outdir/name/pacbio_cds

filename	description
name_no_transcript_with_cds.gtf	PacBio gtf with CDS info added, transcript line not included
name_with_cds.gtf	PacBio gtf with CDS info added
make_pacbio_cds.log	logging file

rename_cds

Output directory: outdir/name/rename_cds

filename	description
gencode.cds_renamed_exon.gtf	GENCODE gtf file, exons removed and CDS renamed to exon
gencode.transcript_exons_only.gtf	GENCODE gtf file, exons and transcript only
name.cds_renamed_exon.gtf	PacBio gtf file, exons removed and CDS renamed to exon
name.transcript_exons_only.gtf	PacBio gtf file, exons and transcript only

sqanti_protein

Output directory: outdir/name/sqanti_protein

filename	description
name_sqanti_protein_classification.tsv	splice classification data for proteins generated by PacBio

protein_classification

Output directory: outdir/name/protein_classification

filename	description
name_genes.tsv	Mapping of PacBio accession to transcript gene and protein gene. These can differ if the transcript read spans multiple genes
name_unfiltered.protien_classification.tsv	protein classification of all PacBio proteins

protein_gene_rename

Output directory: outdir/name/protein_gene_rename

filename	description
name_orf_refined_gene_update.tsv	refined database with gene name updated to reflect protein gene
name_with_cds_refined.gtf	PacBio gtf file that includes CDS information with gene name updated to reflect protein gene
name_protein_refined.fasta	protein fasta with gene name updated to reflect protein gene

protein_filter

Output directory: outdir/name/protein_filter

filename	description
name_with_cds_filtered.gtf	GTF, filtered to remove intergenic and truncations
name_classification_filtered.tsv	Protein classification, filtered to remove intergenic and truncations
name.filtered_protein.fasta	protein sequences, filtered to remove intergenic and truncations

hybrid_protein_database

Output directory: outdir/name/hybrid_protein_database

High-confidence genes: at least 3 CPM per gene and an average gene nucleotide length of 1-4 kb

filename	description
name_cds_high_confidence.gtf	GTF of high confidence genes
name_high_confidence_genes.tsv	list of high confidence genes
name_hybrid.fasta	sequence information of high confidence PacBio and GENCODE genes
name_refined_high_confidence.tsv	high confidence ORF metadata
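The high-confidence criteria stated above (3+ CPM per gene, 1-4 kb average gene length) can be expressed as a simple predicate. This is a sketch only: the function and parameter names are assumptions, and treating both length bounds as inclusive is a guess, since the text does not say how boundary values are handled.

```python
def is_high_confidence(cpm: float, avg_length_nt: float,
                       min_cpm: float = 3.0,
                       min_length_nt: int = 1000,
                       max_length_nt: int = 4000) -> bool:
    """True if a gene meets the CPM and average-length thresholds.

    Bounds are treated as inclusive here (an assumption).
    """
    return cpm >= min_cpm and min_length_nt <= avg_length_nt <= max_length_nt


# Hypothetical genes: (CPM, average nucleotide length).
genes = {"GENE_A": (12.5, 2300), "GENE_B": (1.0, 2300), "GENE_C": (50.0, 6500)}
high_confidence = [gene for gene, (cpm, length) in genes.items()
                   if is_high_confidence(cpm, length)]
print(high_confidence)
```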

metamorpheus

Database Information

database	directory	database name in files
GENCODE	gencode	Gencode
UniProt	uniprot	UniProt
PacBio Filtered	pacbio/filtered	filtered
PacBio Refined	pacbio/refined	refined
PacBio Hybrid	pacbio/hybrid	hybrid
PacBio Rescue & Resolve	pacbio/resue_resolve	rescue_resolve

toml files

Directory: database/toml

filename	description
CalibrationTask.toml	not used
GlycoSearchTask.toml	not used
GptmdTask.toml	not used
SearchTask.toml	Metamorpheus run parameters
XLSearchTask.toml	not used

Search Results Files In search_results/Task1SearchTask

filename	description
AllPSMs.psmtsv	PSMs found
AllQuantifiedPeaks.tsv	quantified peaks found
prose.txt	Run information
AllPSMs_FormattedForPercolator.tab	PSMs found, in Percolator format
AllQuantifiedPeptides.tsv	quantified peptides
results.txt	summary statistics
AllPeptides.database.psmtsv	peptides found
AllQuantifiedProteinGroups.database.tsv	protein groups found

peptide_analysis

Output directory: outdir/name/peptide_analysis

filename	description
gc_pb_overlap_peptides.tsv	overlap of GENCODE peptides with theoretical peptides that could be found in PacBio databases

track_visualization

reference

Output directory: outdir/name/track_visualization/reference

filename	description
gencode_shaded.bed12	GENCODE bed alignment colored
gencode.filtered.gtf	GENCODE alignment

pacbio databases

Output directory: outdir/name/track_visualization/database

database	database name
PacBio Refined	refined
PacBio Filtered	filtered
PacBio Hybrid	hybrid

peptide

filename	description
name_database_peptides.bed12	peptide bed alignment
name_database_peptides.gtf	peptide gtf alignment
name_database_shaded_peptides.bed12	peptide bed alignment, shaded green

protein

filename	description
name_hybrid_shaded_cpm.bed12	protein alignment, shaded by transcript abundance (CPM)
name_hybrid_shaded_protein_class.bed12	protein alignment, shaded by protein classification

accession_mapping

Output directory: outdir/name/accession_mapping

filename	description
accession_map_gencode_uniprot_pacbio.tsv	accession mapping between GENCODE, UniProt and PacBio
accession_map_stats.tsv	frequency of overlap between databases

protein_group_compare

Output directory: outdir/name/protein_group_compare

filename	description
ProteinInference_GENCODE_PacBio_comparisons.xlsx	protein inference overlap between GENCODE and PacBio Hybrid
ProteinInference_UniProt_PacBio_comparisons.xlsx	protein inference overlap between UniProt and PacBio Hybrid
ProteinInference_GENCODE_UniProt_comparisons.xlsx	protein inference overlap between GENCODE and UniProt

novel_peptides

Output directory: outdir/name/novel_peptides

filename	description
name_database.pacbio_novel_peptides_to_gencode.tsv	novel peptides found in PacBio compared to GENCODE database
name_database.pacbio_novel_peptides_to_uniprot.tsv	novel peptides found in PacBio compared to UniProt database
name_database.pacbio_novel_peptides.tsv	novel peptides found in PacBio compared to GENCODE and UniProt databases
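Conceptually, the three files above correspond to set differences between the PacBio peptides and each reference database. The sketch below shows that idea with plain Python sets; the function name, dictionary keys, and example peptides are assumptions, and the pipeline's actual comparison may apply further rules (for example around ambiguous residues).

```python
def novel_peptides(pacbio, gencode, uniprot) -> dict:
    """Peptides present in PacBio results but absent from the references."""
    pacbio, gencode, uniprot = set(pacbio), set(gencode), set(uniprot)
    return {
        "to_gencode": pacbio - gencode,
        "to_uniprot": pacbio - uniprot,
        "to_both": pacbio - gencode - uniprot,
    }


# Made-up peptide sequences for illustration.
novel = novel_peptides(
    pacbio={"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    gencode={"PEPTIDEA"},
    uniprot={"PEPTIDEA", "PEPTIDEB"},
)
print(novel)
```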
