[Bug Fix] IsoSCM identification output should be from compare step #429

Closed
faricazjj opened this issue Sep 19, 2022 · 1 comment · Fixed by #431

@faricazjj
Collaborator

faricazjj commented Sep 19, 2022

Currently, we take the results for the identification challenge from one of the output files of the assemble step. At first glance, the content of this file appears to contain the identified changepoints.
Screen Shot 2022-07-26 at 11 32 38 AM

But as I read beyond the assemble step, the second step they describe is the compare command. The compare command "reports the differential usage of each identified change-point", so I expected it to show site usage for the same sites as in the assemble-step output in the screenshot above. Instead, I see fewer, and also different, sites in the compare-step output file:
Screen Shot 2022-07-26 at 11 35 09 AM

This led me to look into whether we should be extracting sites for the identification challenge from the assemble step or the compare step.
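One way to quantify the mismatch described above is to load the sites from both files and diff them as sets. The following is only a rough sketch: the GTF parsing relies on the standard nine-column GTF layout, but the column names passed to `table_sites` are hypothetical placeholders, since the header of the compare output is not documented in this issue, and treating a full (chrom, strand, start, end) tuple as "the site" is also an assumption.

```python
import csv


def gtf_sites(handle):
    """Collect (chrom, strand, start, end) tuples from tab-separated GTF lines."""
    sites = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        # Standard GTF: seqname, source, feature, start, end, score, strand, ...
        sites.add((fields[0], fields[6], int(fields[3]), int(fields[4])))
    return sites


def table_sites(handle, chrom_col, strand_col, start_col, end_col):
    """Collect the same tuples from a tab-delimited table with a header row.

    The column names are supplied by the caller because the real header of
    the compare output is not known here (hypothetical names only).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row[chrom_col], row[strand_col], int(row[start_col]), int(row[end_col]))
        for row in reader
    }


def diff_sites(assemble, compare):
    """Split two site sets into shared and step-specific subsets."""
    return {
        "shared": assemble & compare,
        "assemble_only": assemble - compare,
        "compare_only": compare - assemble,
    }
```

If `compare_only` comes back non-empty, the compare output is not simply a filtered subset of the assemble file, which matches the "different sites" observation above.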

As additional context, the steps to run isoscm are:

  1. Run the assemble step, which creates a tmp folder containing isoscm/tmp/{sample}.cp.filtered.gtf
  2. Run the compare step. This requires the XML output file from the assemble step for two samples, but since we want site usage per sample, I passed the same sample twice as input to obtain the following output:
    Screen Shot 2022-07-26 at 12 18 54 PM

Even though the first file, isoscm/tmp/{sample}.cp.filtered.gtf, contains changepoint locations, I'm not sure we should take the identification output sites from it. It sits in a tmp folder, and the GitHub README doesn't explain what the files in the tmp folder are; it only explains the assemble-step files outside of the tmp folder, i.e. it explains the files from step 2 above but not the files from step 1. I think the sites for identification might have to be obtained from the compare step output (i.e. isoscm/compare/{sample}.txt).

After reading the README, I checked the paper and saw that it doesn't describe the assemble step as the place to get the identified changepoints. It says: "...Using the “assemble” keyword IsoSCM will assemble the mapped reads in a BAM file into a splice graph, identify nested terminal exons boundaries using the constrained segmentation procedure, and report the resulting models in GTF format....Pairwise comparison of tandem isoform usage can be performed using the “compare” keyword, which reports the relative usage of change points in each sample in a tabular format."

Hence, it sounds like the compare step outputs the identified change points (or PAS) that we want to report.
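If the compare output is used, the extraction could look roughly like the sketch below, which turns each changepoint row of the tabular compare output into one single-nucleotide BED line. Everything specific here is an assumption: the column names are hypothetical and passed in by the caller, the changepoint coordinate is assumed to be 1-based, and BED as the target format is a guess, since this issue does not state the required output format.

```python
import csv


def compare_table_to_bed(in_handle, out_handle, chrom_col, pos_col, strand_col):
    """Write one BED6 line per unique changepoint; return the number written.

    Assumptions (not confirmed by the IsoSCM README): the input is
    tab-delimited with a header row, and the changepoint position is 1-based.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    seen = set()
    written = 0
    for row in reader:
        chrom, pos, strand = row[chrom_col], int(row[pos_col]), row[strand_col]
        key = (chrom, pos, strand)
        if key in seen:
            continue  # skip sites that appear on more than one row
        seen.add(key)
        # BED is 0-based, half-open: a 1-based position p becomes [p-1, p).
        name = f"{chrom}:{pos}:{strand}"
        out_handle.write(f"{chrom}\t{pos - 1}\t{pos}\t{name}\t0\t{strand}\n")
        written += 1
    return written
```

The de-duplication step is there in case the same site shows up on several rows of the table; if that never happens, it is a no-op.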

@faricazjj faricazjj self-assigned this Sep 27, 2022
@faricazjj faricazjj changed the title from "[Bug Fix] IsoSCM identification output should be from assemble step" to "[Bug Fix] IsoSCM identification output should be from compare step" Sep 27, 2022
@faricazjj faricazjj linked a pull request Sep 27, 2022 that will close this issue
8 tasks
@mrgazzara
Collaborator

See the full comment on PR 431 (#431 (comment)), where I describe why the compare step output is the correct source for grabbing PAS and suggest how to also get the dPAS coordinates (the approach above only grabs the pPAS coordinates, which are the changepoints).
