swan v2.5

fairliereese released this 25 Jan 23:45

Swan 2.5

SwanGraph structure changes

Counts and other expression layers (e.g. TPM, PI) are now stored as sparse matrices, which substantially reduces both on-disk and in-memory storage
Added the ability to store gene-level abundance information (SwanGraph.gene_adata), calculated separately from transcript-level abundance
Added AnnData to store intron chain level abundance information (SwanGraph.ic_adata)
Added tracking of the stable gene ID for cases where reference annotation versions don't match (e.g. ENSG000000014.5 --> ENSG000000014)
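
The stable gene ID idea can be illustrated with a minimal sketch. This is not Swan's internal code; the helper name stable_gene_id is hypothetical, and it assumes the version is a numeric suffix after the final dot, as in the example above.

```python
def stable_gene_id(gene_id: str) -> str:
    """Return the gene ID without its trailing numeric version suffix, if any.

    Hypothetical helper for illustration; not part of the Swan API.
    """
    base, sep, version = gene_id.rpartition(".")
    # Strip only when the text after the last dot is purely numeric;
    # otherwise return the ID unchanged.
    if sep and version.isdigit():
        return base
    return gene_id


print(stable_gene_id("ENSG000000014.5"))  # versioned ID
print(stable_gene_id("ENSG000000014"))    # already stable
```

Matching on the stable ID lets the same gene be recognized across annotation releases that differ only in the version suffix.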

Native compatibility with cerberus transcriptomes

Swan now tracks the TSSs, ICs, and TESs called by cerberus, based on the transcript names provided in the GTF
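
As a rough sketch of what name-based tracking could look like, the snippet below pulls TSS, IC, and TES numbers out of a transcript name. It assumes names follow a gene[tss,ic,tes] triplet pattern; that pattern, the helper parse_triplet, and the example name are assumptions made for illustration, not Swan's or cerberus's actual parsing code.

```python
import re

# Assumed naming pattern: <gene>[<tss>,<ic>,<tes>], e.g. "GENE1[1,2,3]".
TRIPLET_RE = re.compile(r"^(?P<gene>.+)\[(?P<tss>\d+),(?P<ic>\d+),(?P<tes>\d+)\]$")


def parse_triplet(transcript_name):
    """Split a triplet-style transcript name into its parts.

    Hypothetical helper; returns None when the name does not match
    the assumed pattern.
    """
    match = TRIPLET_RE.match(transcript_name)
    if match is None:
        return None
    return {
        "gene": match["gene"],
        "tss": int(match["tss"]),
        "ic": int(match["ic"]),
        "tes": int(match["tes"]),
    }


print(parse_triplet("GENE1[1,2,3]"))
print(parse_triplet("plain_transcript_name"))
```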

Other changes

DIE test now reports the top 2 DPI isoforms
Faster counts and TPM calculations using Scanpy tools
Added option to sort isoforms by their cumulative PI value in the gene report
Added option to plot browser models directly onto a preexisting Matplotlib axis (SwanGraph.plot_browser())
Added option to plot BED regions (SwanGraph.pg.plot_regions())
Added options to calculate TPM across multiple datasets as either the minimum or the maximum of the per-dataset values
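
The minimum/maximum TPM option can be sketched with NumPy. This is an illustration only, not Swan's implementation: the function names are hypothetical, the counts are made up, and TPM is assumed here to mean counts scaled to one million per dataset.

```python
import numpy as np


def counts_to_tpm(counts):
    """Scale each dataset (row) so that its values sum to one million.

    Assumption: TPM is taken as counts / library size * 1e6.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Leave datasets with zero total counts as all zeros instead of dividing by zero.
    scaled = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return scaled * 1e6


def tpm_across_datasets(tpm, how="max"):
    """Collapse a datasets x transcripts TPM matrix to one value per transcript."""
    if how == "max":
        return tpm.max(axis=0)
    if how == "min":
        return tpm.min(axis=0)
    raise ValueError("how must be 'min' or 'max'")


# Toy counts: 2 datasets (rows) x 3 transcripts (columns).
counts = np.array([[10, 0, 30],
                   [5, 5, 0]])
tpm = counts_to_tpm(counts)
print(tpm_across_datasets(tpm, how="max"))
print(tpm_across_datasets(tpm, how="min"))
```

Taking the minimum requires a transcript to be expressed in every dataset, while taking the maximum only requires expression in at least one.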

Minor bug fixes

Fixed a DIE test bug that occurred when a gene has more than 11 isoforms
Fixed bugs in SwanGraph.gen_report()