The output of the new MAJIQ version doesn't match the input format #3

Closed
Lyuzu opened this issue Dec 17, 2021 · 4 comments
Labels: help wanted, wontfix

Comments

Lyuzu commented Dec 17, 2021

Dear developers,
Thanks a lot for your creative work on AS analysis. I have been looking for such a tool, and I have already finished detecting AS events with MAJIQ and rMATS. While studying the tutorial for MAJIQ input, I noticed that the column names of the MAJIQ output it expects look like this:
Gene ID LSV ID LSV Type E(dPSI) per LSV junction P(|dPSI|>=0.20) per LSV junction P(|dPSI|<=0.05) per LSV junction RP E(PSI) MP E(PSI) A5SS A3SS ES Num. Junctions Num. Exons Junctions coords IR
And the column names in the newest version of MAJIQ are:
gene_id lsv_id lsv_type mean_dpsi_per_lsv_junction probability_changing probability_non_changing A_mean_psi C_mean_psi num_junctions num_exons junctions_coords ir_coords
This difference made it hard to proceed. The main distinction seems to be that the new version removed the "A5SS", "A3SS" and "ES" columns. In process.py of NEASE, the lines
    if data['ES'].dtype == 'bool':
        data = data[data['ES'] == True]
    else:
        data = data[data['ES'] == 'True']
appear to extract only the exon skipping events for downstream analysis, and with the new output it is hard to determine the type of a complex LSV. Could I rename the columns of my files and create a new "ES" column with all true values?
Thanks for any advice.
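
A quick way to tell which of the two header styles a file uses is to compare its header against the column names quoted above. This is only a minimal sketch: `detect_majiq_format` is a hypothetical helper, not part of MAJIQ or NEASE, it checks just a handful of columns taken from this thread, and it assumes a tab-separated file whose first non-empty row is the header.

```python
import csv
import io

# Column names quoted in this thread; not an exhaustive list for either format.
LEGACY_MARKERS = {"Gene ID", "LSV ID", "LSV Type", "ES"}
NEW_MARKERS = {"gene_id", "lsv_id", "lsv_type", "probability_changing"}


def detect_majiq_format(tsv_text):
    """Return 'legacy', 'new' or 'unknown' based on the header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # The first non-empty row is treated as the header (assumption).
    header = next((row for row in reader if row), [])
    # Tolerate surrounding whitespace and a leading '#' on a column name.
    columns = {name.strip().lstrip("#") for name in header}
    if LEGACY_MARKERS <= columns:
        return "legacy"
    if NEW_MARKERS <= columns:
        return "new"
    return "unknown"
```

Running a check like this before renaming anything can help avoid applying the column mapping to a file that is already in the older layout.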

louadi (Owner) commented Dec 17, 2021

Dear Lyuzu,

Thanks for using our tool!

We were not aware that the new MAJIQ version has a different output format. It might take us some time to figure out how to map it correctly.

Your suggestion would work as a quick fix: filter out all events except ES and add a dummy column with all true values. That should work if the junctions (LSVs) are represented in the same way as in the old version.

We are trying to add support for multiple tools, but since there is no standard format for reporting AS events, it is more work than we expected. In the meantime, let us know if your solution works.

Best regards,
Zakaria

Lyuzu (Author) commented Dec 18, 2021

Dear louadi,
Thank you so much for the prompt reply. I extracted the ES events (~500 rows) from the MAJIQ voila HTML output and made a quick fix as follows:
# Map the new MAJIQ column names onto the older names
data1 = data.rename(columns={
    'gene_id': 'Gene ID',
    'lsv_id': 'LSV ID',
    'lsv_type': 'LSV Type',
    'mean_dpsi_per_lsv_junction': 'E(dPSI) per LSV junction',
    'probability_changing': 'P(|dPSI|>=0.20) per LSV junction',
    'probability_non_changing': 'P(|dPSI|<=0.05) per LSV junction',
    'A_mean_psi': 'RP E(PSI)',
    'C_mean_psi': 'MP E(PSI)',
    'num_junctions': 'Num. Junctions',
    'num_exons': 'Num. Exons',
    'junctions_coords': 'Junctions coords',
    'ir_coords': 'coords IR',
})
# Prefix the gene ID with 'gene:' and keep only the part before the first '.'
data1['Gene ID'] = data1['Gene ID'].apply(lambda x: 'gene:' + x.split('.')[0])
# Replace the part of the LSV ID before the first ':' with the new gene ID
data1['LSV ID'] = data1['LSV ID'].apply(lambda x: x.split(':', 1)[1])
data1['LSV ID'] = data1['Gene ID'] + ':' + data1['LSV ID']
# Dummy column: every remaining row is treated as an exon skipping event
data1['ES'] = "True"
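
To make the two ID transformations concrete, here is a plain-Python sketch of what they do to a single row. The sample IDs are invented for illustration and `to_legacy_ids` is a hypothetical helper; the real values in a MAJIQ file may look different.

```python
def to_legacy_ids(gene_id, lsv_id):
    """Mirror the string edits applied to 'Gene ID' and 'LSV ID' above."""
    # 'gene:' prefix plus everything before the first '.'
    legacy_gene = "gene:" + gene_id.split(".")[0]
    # Drop the text before the first ':' in the LSV ID,
    # then prepend the reformatted gene ID.
    legacy_lsv = legacy_gene + ":" + lsv_id.split(":", 1)[1]
    return legacy_gene, legacy_lsv


# Hypothetical IDs, made up for this example.
gene, lsv = to_legacy_ids("ENSG00000000001.5", "ENSG00000000001.5:s:100-200")
print(gene)  # gene:ENSG00000000001
print(lsv)   # gene:ENSG00000000001:s:100-200
```

As in the snippet above, an LSV ID that contains no ':' would raise an IndexError, so it may be worth checking the IDs before applying the transformation to a whole file.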

Then I ran
events = nease.run(data1, organism='Human', input_type='MAJIQ', remove_non_in_frame=True, only_divisible_by_3=False)
and it ran normally, with the following output:
Processing MAJIQ format...
MAJIQ output converted successfully to NEASE format.
Data Summary
4 protein domains are affected by AS.

0 linear motifs are affected by AS.
0 interacting resiude are affected by AS.

1 of the affected domains/motifs have known interactions.
2 protein interactions/binding affected.

Running enrichment analysis...
NEASE enrichment done.

The number of affected domains seems small, whereas the rMATS output from the same comparison group resulted in ~50 domains. I guess this may be caused by the stringent criteria of MAJIQ. I also have a small question: can only ES output files from rMATS be used in NEASE?

Best regards,
Lyuzu

louadi (Owner) commented Dec 19, 2021

Hi Lyuzu,

Your fix should be fine provided that 'probability_changing' corresponds to the old 'P(|dPSI|>=0.20) per LSV junction' and 'mean_dpsi_per_lsv_junction' to 'E(dPSI) per LSV junction'. I think that is true, but to be on the safe side you can check the MAJIQ documentation or ask the developers.

About the number of events, I think that is normal given that these tools identify events differently and apply different tests. One thing not to forget is that MAJIQ also considers novel events, and NEASE does not consider these events for downstream analysis.

You can also adjust the dPSI threshold for rMATS input if you want to reduce the number of events; the default is 0.05:

events = nease.run(data, organism='Human', input_type='rmats', min_delta=0.05)

or for MAJIQ (the two parameters that we mentioned before):

events = nease.run(data1, organism='Human', input_type='MAJIQ', min_delta=0.05, Majiq_confidence=0.95)
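
As a rough illustration of what two cut-offs like these mean, the sketch below keeps an LSV when at least one junction has |dPSI| at or above `min_delta` and a changing probability at or above `confidence`. This is an assumption about the general idea, not NEASE's actual filtering code; `passes_thresholds` is a hypothetical helper, and the ';'-separated per-junction strings are assumed here rather than taken from the MAJIQ documentation.

```python
def passes_thresholds(dpsi_field, prob_field, min_delta=0.05, confidence=0.95):
    """Return True if any junction passes both cut-offs.

    Both fields are assumed to hold one value per junction, separated by ';'.
    """
    dpsi = [float(v) for v in dpsi_field.split(";")]
    prob = [float(v) for v in prob_field.split(";")]
    return any(abs(d) >= min_delta and p >= confidence for d, p in zip(dpsi, prob))


# Made-up values for two LSVs with two junctions each.
print(passes_thresholds("0.30;-0.30", "0.99;0.97"))  # True
print(passes_thresholds("0.02;-0.02", "0.10;0.08"))  # False
```

Raising either value makes the filter stricter, which is one way to see why the number of reported events can differ so much between tools and settings.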

About the question of whether only ES output files from rMATS can be used in NEASE:
For now, yes. We limit rMATS mapping to ES events because mapping them to domains/motifs is straightforward, but you can also try to input the coordinates of other events (the spliced part) using the standard input format for NEASE and proceed carefully. If you find some interesting events, please double-check their origin or visualize them.

I hope this answers your questions and good luck with your analysis.

Best regards,
Zakaria L

Lyuzu (Author) commented Dec 19, 2021

Dear louadi,
Thank you so much for the prompt and careful answers, which helped me a lot with my urgent task. I hope everything goes well for you.
Best regards,
Lyuzu

Lyuzu closed this as completed Dec 19, 2021
louadi added the wontfix and help wanted labels Apr 13, 2022