Issue with JUM_3.sh script #10
Hi, Would you send me a few lines of the AS_differential.txt file you got from running the R_script_JUM.R script? Thanks! I will help you debug. Qingqing |
Excellent, thanks for your help. First few lines of our AS_differential.txt: groupID featureID exonBaseMean dispersion stat pvalue padj X13m3_14m2 N2 log2fold_N2_13m3_14m2 genomicData.seqnames genomicData.start genomicData.end genomicData.width genomicData.strand countData.IndexA countData.IndexB countData.IndexC countData.IndexD transcripts |
I see. I think it is probably caused by the chromosome naming system. I
have tested JUM on organisms that generally use "chr1", "chr2", etc., while
in your case the chromosomes are named "IV", "V", and so on.
Would you send me a few lines of the following files in your JUM_diff
folder:
1) UNION_junc_coor_with_junction_ID_more_than_*
2)
more_than_X_profiled_total_AS_event_junction_first_processing_for_JUM_reference_building.txt
In the meantime I will go through every perl script called by JUM_3.sh to
confirm whether the intron retention processing assumes chromosome names
with the "chr" prefix. I will keep you updated.
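To illustrate the suspected issue (this is a hypothetical pattern for demonstration, not JUM's actual code), a script that matches only "chr"-prefixed names silently misses chromosomes named "I", "IV", "X", etc.:

```shell
# Two demo coordinate lines: one "chr"-style name, one Roman-numeral name.
printf 'chr1 100 200\nIV 300 400\n' > demo_coords.txt

# A pattern anchored on the "chr" prefix sees only the first line:
grep -c '^chr' demo_coords.txt            # prints 1

# A more permissive pattern accepts both naming styles:
grep -cE '^(chr)?[0-9IVXY]+ ' demo_coords.txt   # prints 2
```

Any parsing step in the pipeline that relies on a pattern like the first would drop every line of a genome with Roman-numeral chromosome names.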
On Thu, Dec 28, 2017, MDBrokaw quoted the first few lines of AS_differential.txt:
groupID featureID exonBaseMean dispersion stat pvalue padj X13m3_14m2 N2 log2fold_N2_13m3_14m2 genomicData.seqnames genomicData.start genomicData.end genomicData.width genomicData.strand countData.IndexA countData.IndexB countData.IndexC countData.IndexD transcripts
5_Junction_70652_Junction_70653:E001 5_Junction_70652_Junction_70653 E001 220.071429128683 0.0012714572240681 5.62693062493901 0.0176865742458979 0.055308582157126 2.30659362437753 2.38247202833251 0.253209445884905 IV 4389750 4389804 55 - 171 171 284 284 exonic_part_number 001 gene_id 5_Junction_70652_Junction_70653
5_Junction_70652_Junction_70653:E002 5_Junction_70652_Junction_70653 E002 4510.78340115722 0.000776507110775998 6.47168092666713 0.0109606814530061 0.0374330880787359 3.65835086903501 3.6452533692903 -0.0435186550123472 IV 4389750 4389889 140 - 3144 3144 6414 6414 exonic_part_number 002 gene_id 5_Junction_70652_Junction_70653
5_Junction_69673_Junction_69674:E001 5_Junction_69673_Junction_69674 E001 77.3547483148137 0.00737832058902636 6.22512000683747 0.0125950371902186 0.0420323408605057 1.86361802390202 1.92451620812156 0.204916821318095 IV 363879 363974 96 - 60 60 100 100 exonic_part_number 001 gene_id 5_Junction_69673_Junction_69674
5_Junction_69673_Junction_69674:E002 5_Junction_69673_Junction_69674 E002 9.5724903310595 0.00367549702846617 6.56934219635649 0.0103750290524118 0.0356617696026629 1.14516301760211 0.815348486625332 -1.22800023701497 IV 363879 364207 329 - 4 4 18 18 exonic_part_number 002 gene_id 5_Junction_69673_Junction_69674
|
Ahh, interesting. Thanks for the help. Here are the first few lines of UNION_junc_coor_with_junction_ID_more_than_* I 10432829 10432855 0 Junction_685 And here is more_than_...... 5_Junction_70652_Junction_70653:001 IV - 4389750 4389804 Thanks again. |
Hi there,
I am terribly sorry for my really late response... I hope you are still
there.
It looks like the input files for the perl script
count_intron_read_long_intron_retention_step2.pl have a formatting issue:
fields that are supposed to contain numbers (the genomic coordinates of the
junctions) instead contain the strand ("+" or "-"). It may have something
to do with the specific organism you work with and the way its genome
annotation is formatted.
Judging from the large size of the file that triggered the error, I think
it is the total long-intron calculation that ran into problems. Could you
send me a few lines from the following files in your JUM_diff folder:
1) a file ending with "coverage_temp_long_intron_overlap_total.txt"
2) a file ending with
"temp_long_intron_retention_junction_coordinate_with_read_num_total.txt"
3) a file called "temp_long_intron_retention_junction_coordinate_total.txt"
I also realized that this perl script is not very good at memory usage. I
will fix it in the next big update, which will come in a week or two.
Thanks, and I promise that from now on I will be much more responsive about
issues reported by users.
Thank you so much for running JUM and providing feedback!
Qingqing
On Mon, Jan 1, 2018, MDBrokaw quoted the first few lines of UNION_junc_coor_with_junction_ID_more_than_*:
I 10432829 10432855 0 Junction_685
I 13653263 13653368 0 Junction_4534
I 14178098 14178142 0 Junction_5097
I 4117400 4117448 0 Junction_9531
I 4186712 4186768 0 Junction_9674
I 4556015 4556061 0 Junction_10152
I 536306 536348 0 Junction_11629
I 5429623 5429825 0 Junction_11762
I 6029749 6029947 0 Junction_12959
I 6100718 6100763 0 Junction_13098
I 10014582 10014730 + Junction_2
I 10015142 10015194 + Junction_3
And the first few lines of more_than_......:
5_Junction_70652_Junction_70653:001 IV - 4389750 4389804
5_Junction_70652_Junction_70653:002 IV - 4389750 4389889
5_Junction_69673_Junction_69674:001 IV - 363879 363974
5_Junction_69673_Junction_69674:002 IV - 363879 364207
5_Junction_67594_Junction_67595:001 IV - 17446610 17447286
5_Junction_67594_Junction_67595:002 IV - 17446610 17449561
5_Junction_71869_Junction_71870:001 IV - 5351583 5351797
5_Junction_71869_Junction_71870:002 IV - 5351583 5351805
|
|
Greetings, still here! Below are a few lines from the three files you suggested. Thanks for your assistance! IndexC_coverage_temp_long_intron_overlap_total.txt IndexC_temp_long_intron_retention_junction_coordinate_with_read_num_total.txt temp_long_intron_retention_junction_coordinate_total.txt |
OK. The format of these files looks good. I am wondering if there are some
weird lines in these input files that are causing trouble. Do you mind
sharing the following files with me (this time, the complete files):
Set 1:
1) Any file ending with
coverage_temp_long_intron_overlap_"$pvalue_padj"_"$cutoff".txt
2) A file called
temp_long_intron_retention_junction_coordinate_"$pvalue_padj"_"$cutoff".txt
Set 2:
1) Any file ending with coverage_temp_long_intron_overlap_total.txt
2) A file called temp_long_intron_retention_junction_coordinate_total.txt
These are the only two sets of files that the script
count_intron_read_long_intron_retention_step2.pl reads. I am going to test
them on my end and see if they produce similar errors, and I will also
check whether there are any weird lines in these input files.
Feel free to share them via Dropbox or Google Drive, or let me know if you
prefer another way to send these files.
Qingqing
On Thu, Feb 15, 2018, MDBrokaw quoted a few lines from the three files:
IndexC_coverage_temp_long_intron_overlap_total.txt
X 60037 60038 32
X 60038 60040 30
X 60040 60041 29
IndexC_temp_long_intron_retention_junction_coordinate_with_read_num_total.txt
I 100625 100626 - Junction_72 3
I 100626 100627 - Junction_72 3
I 100627 100628 - Junction_72 3
temp_long_intron_retention_junction_coordinate_total.txt
I 100625 101503 - Junction_72
I 10188896 10189261 - Junction_282
I 10189166 10189990 - Junction_283
|
|
OK, I have attached one of the files you mentioned. The other three requests you made are for files that are created but left empty. (Disclosure: since the time of the initial message I deleted all files/scripts, re-installed the newest version of JUM [1.3.11] and started over. I am now getting errors at the same step, but of a slightly different nature. Giant error file created and the new error message is as follows…) |
I see. Now it seems the previous error is skipped and the error message
comes from a later step in JUM_3.sh. I will need another set of input
files. To save us time, would it be possible for me to access your JUM_diff
folder as it was before you ran JUM_3.sh? A share through Dropbox or Google
Drive will suffice; the folder will likely be several GB in size.
The files I need to run a thorough debug are:
UNION_junc_coor_with_junction_ID*
more_than_X_profiled_total_AS_event*
*combined_count.txt
*Aligned.out_coverage.bed
AS_differential.txt
combined_AS_JUM.gff
Basically, all the input files in the JUM_diff folder after your Rscript
run. If you do an "ls -l -t", the files I need are those created before and
upon the generation of AS_differential.txt.
Is it possible to do this? The error sounds like some weird file-format
issue, which should be easy to fix once I spot the weird lines.
Thank you!
Qingqing
On Mon, Feb 19, 2018, MDBrokaw quoted the new error messages:
Use of uninitialized value $array[3] in hash element at
../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25,
line 1.
Use of uninitialized value $array[1] in hash element at
../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25,
line 1.
Use of uninitialized value $array[2] in hash element at
../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25,
line 1.
Use of uninitialized value $array[3] in hash element at
../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25,
line 2.
…etc.
Attached: temp_long_intron_retention_AS_differential_pvalue_0.05.txt
<https://github.com/qqwang-berkeley/JUM/files/1738322/temp_long_intron_retention_AS_differential_pvalue_0.05.txt>
|
|
Excellent. Here is a link to a Google Drive that should have all files you requested. https://drive.google.com/open?id=1ARU9VJwuV-259q6q-cFg2j0RKG_KCB7f Thanks again! (Sorry for the hassle.) |
The reason you got that error message is that the file:
UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples_formatted_junction_list.txt
was put in the JUM_diff folder for downstream analysis, but the correct file
should be:
UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt
These are two completely different files with different formats; that is why
the script profiling_splicing_patterns_from_AS_events_1.pl in JUM_3.sh could
not recognize it and generated empty files.
Is there any chance that the wrong file was copied to the JUM_diff folder
by mistake? At the end of JUM_2-3.sh the right file should be copied to the
JUM_diff folder, as the last line of JUM_2-3.sh is:
cp
UNION_junc_coor_with_junction_ID_more_than_"$threshold"_read_in_at_least_"$file_num"_samples.txt
JUM_diff/
I understand that the file names are similar to each other and cause
confusion. I will keep a note about that, and in the upcoming upgrade I
will have JUM delete the intermediate files that are no longer needed in
the downstream analysis, to reduce confusion for users. Thank you for the
feedback!
Let me know if it runs fine once the correct file is put in the JUM_diff
folder AND the wrong file is deleted.
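For concreteness, the fix described above could be applied like this (a sketch using the file names from this thread; the touch/mkdir lines only mock the files so the sketch is self-contained — in a real run they already exist):

```shell
# Mock setup for demonstration only; skip these three lines in a real run.
mkdir -p JUM_diff
touch UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt
touch JUM_diff/UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples_formatted_junction_list.txt

# Delete the wrong (formatted_junction_list) file from JUM_diff...
rm JUM_diff/UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples_formatted_junction_list.txt

# ...and copy in the correct file, as the last line of JUM_2-3.sh would do.
cp UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt JUM_diff/
```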
P.S. I notice that the replicate count files are identical for each
condition. Is that because you originally didn't have biological
replicates? The coverage.bed files do look different between the
replicates, though. If you don't have replicates, let me know and I will
send you detailed instructions for a workaround; you can still use JUM in
that scenario.
|
Oops, that was my mistake in uploading to the Google Drive. I accidentally uploaded the wrong file (it came from the JUMwork directory). The correct file, UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt, is correctly copied into the JUM_diff directory. I have added it to the Google Drive as well. |
I just ran JUM_3.sh in the JUM_diff folder and it went smoothly without any
error report, finishing within minutes:
qingqing@compute1:qingqing/JUM_troubleshoot/JUM_diff$ bash
~/JUM_1.3.11/JUM_3.sh ~/JUM_1.3.11 pvalue 0.05 4 2
Smartmatch is experimental at
/mnt//riolab/qingqing/JUM_1.3.11/profiling_splicing_patterns_from_AS_events_3_updated.pl
line 116.
qingqing@compute1:qingqing/JUM_troubleshoot/JUM_diff$
I did first delete the wrong file
"UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples_formatted_junction_list.txt".
I am sharing with you through Google Drive the JUM_diff folder after
running JUM_3.sh, with the resulting FINAL_JUM_OUTPUT inside:
https://drive.google.com/drive/folders/1x8rpg0I6InjRUkZsmCsn90NgqrE8SaQ6?usp=sharing
Also, since your files don't have "chr" in the chromosome names, I am
attaching an updated gene-name mapping script with this email for running
JUM_4.sh. I will include this change in the upcoming update, and I will
definitely add some cleanup and renaming steps in the update too, so that
file names are less confusing to users.
|
Ooooooooo, you just gave me the hint as to what I was doing wrong.... I didn't realize I needed to enter the JUM_diff directory and run JUM_3 from there. I was still in JUMwork where I had run previous JUM_2-3.... OOPS! When I run JUM_3 in the proper directory, everything looks great! Thanks for your help/patience! Regarding the updated JUM_4 script, where can I find it / where has it been attached? THANKS AGAIN. |
Glad to hear that it works! Thank you for running JUM and for the great
feedback. The manual will definitely be updated to better guide users.
For running JUM_4.sh, simply do the following:
1) Copy the attached script into your JUM script folder (the folder you
downloaded from the JUM GitHub page, named JUM_1.3.11 — the one containing
all the bash and perl scripts, with file names ending in ".sh" or ".pl").
The copy will replace the original perl script in the folder.
2) Proceed to JUM_4.sh as instructed in the manual, steps 17 and 18. Do
note that you need to run JUM_4.sh in the FINAL_JUM_OUTPUT folder :)
JUM_4.sh will call a few scripts, including the newly edited gene-name
mapping one.
Let me know if you have any questions.
Qingqing
|
Happy to help! (And be helped :) Hmmmm, I don't see any script attachment associated with your message on GitHub. |
OK. I put the file called:
identify_gene_name_for_JUM_output_1.pl
in the Google Drive folder (JUM_diff) I shared with you previously. Check
that you have it, then follow the same two steps from my previous message:
copy the script into your JUM script folder (replacing the original perl
script), and proceed to JUM_4.sh as instructed in the manual, steps 17 and
18, running it from inside the FINAL_JUM_OUTPUT folder.
Let me know if you have any questions :)
|
Ah, beautiful! I have completed the pipeline (with my test data) successfully, and everything looks great. Ready to plug in my full experiment in the future. Thanks! As an aside, is there any way to handle more than two conditions simultaneously? e.g., wild type, mutant1, mutant2? THANKS AGAIN. |
Absolutely!
So for more than two conditions, there are two scenarios:
1) Time-course experiments
For this scenario, you just need to run all samples together from the very
start, as instructed in the manual. Then, in the JUM_diff folder, right
before the Rscript step, supply an experiment_design.txt file with the
time-course information, for example:
condition
sample1_1 0h
sample1_2 0h
sample2_1 2h
sample2_2 2h
sample3_1 4h
sample3_2 4h
etc.
Then follow the instructions again until the end. What you get in
FINAL_JUM_OUTPUT are files recording AS events that changed in at least one
of the time points compared to the first time point.
2) multiple conditions
For this scenario, you have two options:
2.1. You can run everything together from the beginning as instructed, and
then, in the JUM_diff folder, right before the Rscript step, separate the
files into subdirectories such as mut1_vs_WT, mut2_vs_WT, etc. Into each
subdirectory, copy the corresponding input files from the current JUM_diff
folder; for example, in the mut1_vs_WT directory the input files
should include:
combined_AS_JUM.gff
more_than_5_profiled_total_AS_event_junction_first_processing_for_JUM_reference_building.txt
UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt
WT1Aligned.out_coverage.bed
WT2Aligned.out_coverage.bed
WT3Aligned.out_coverage.bed
mut1_1Aligned.out_coverage.bed
mut1_2Aligned.out_coverage.bed
mut1_3Aligned.out_coverage.bed
WT1_combined_count.txt
WT2_combined_count.txt
WT3_combined_count.txt
mut1_1_combined_count.txt
mut1_2_combined_count.txt
mut1_3_combined_count.txt
Note: common comparison samples, like the WT samples here, need to be
copied twice, once into each subdirectory.
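The copying for option 2.1 can be scripted. Below is a sketch using mock file names patterned on the example list above (two WT replicates and one replicate per mutant, purely for illustration; the touch lines stand in for your real files):

```shell
# Mock the shared and per-condition inputs so the sketch runs standalone.
touch combined_AS_JUM.gff \
      WT1_combined_count.txt WT2_combined_count.txt \
      WT1Aligned.out_coverage.bed WT2Aligned.out_coverage.bed \
      mut1_1_combined_count.txt mut1_1Aligned.out_coverage.bed \
      mut2_1_combined_count.txt mut2_1Aligned.out_coverage.bed

for cond in mut1 mut2; do
  mkdir -p "${cond}_vs_WT"
  # Shared reference and WT files are copied into every comparison folder...
  cp combined_AS_JUM.gff WT*_combined_count.txt WT*Aligned.out_coverage.bed "${cond}_vs_WT/"
  # ...while each mutant's own files go only into its own folder.
  cp "${cond}"_*_combined_count.txt "${cond}"_*Aligned.out_coverage.bed "${cond}_vs_WT/"
done
```

The same pattern extends to the UNION_junc_coor and more_than_5 reference files listed above; just add them to the shared `cp` line.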
Then, in each of the subdirectories, run the Rscript, JUM_3.sh, and
JUM_4.sh as instructed in the manual.
The advantage of running this way is that all junctions detected across all
samples are named under the same ID system, so after the JUM pipeline
finishes it is extremely easy to compare the two mutant conditions, for
example to find the AS events detected in both mutant conditions, or in
each one alone.
2.2. You can also choose to run everything separately from the beginning.
Basically, construct the subdirectories for each comparison before the
JUM_2-1.sh run, and then each mutant condition runs its own independent
JUM pipeline. The advantage of this way is that it is straightforward and
compares each mutant condition specifically to WT. (For example, suppose
mutant condition 1 is vastly different from mutant condition 2 relative to
WT; under option 2.1, junctions specific to mutant condition 1 would still
be taken into account in the mutant condition 2 vs WT comparison, which may
not be ideal.) The disadvantage is that you can't compare the results from
the two mutant conditions directly, because each comparison has its own
naming system; you would need to match the results from condition 1 vs WT
and condition 2 vs WT by checking the junction coordinates. It is not too
bad, just more work.
I would say that if your conditions are biologically similar (for example,
similar clones of the same edited cell line) and your sequencing samples
are rigorously prepared, with good depth and not much variation, go for
2.1. If your conditions are quite different, go for 2.2.
I will include these instructions in the upcoming update of JUM. Thank you
for all the feedback.
|
Excellent! Thanks again! |
I've encountered an issue with the JUM_3.sh script.
It runs OK until reaching the "long_intron_retention" processing steps. At that point it generates enormous temp files (e.g. IndexA_temp_long_intron_retention_junction_coordinate_with_read_num_pvalue_0.05) of 20-200 GB.
It also reports millions of identical errors:
Use of uninitialized value in print at count_intron_read_long_intron_retention_step2.pl line 44, <IN2> line 1
These millions of errors are preceded by this single error:
ERROR: illegal character '+' found in integer conversion of string "+". Exiting...
Argument "+" isn't numeric in subtraction (-) at count_intron_read_long_intron_retention_step2.pl line 40, <IN2> line 1.
Any ideas? All the other classes of alternative splicing (e.g. 5'SS, etc.) seem to have been computed just fine. Thanks!
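A quick way to hunt for the offending rows (a sketch, assuming the tab-separated chrom/start/end/strand/junction_id layout shown elsewhere in this thread; the demo file below is fabricated to mimic the failure) is to flag any line whose coordinate columns are not plain integers:

```shell
# Demo file: line 2 has the strand character where an integer coordinate
# should be, mimicking the "+" that broke the integer conversion.
printf 'I\t100625\t101503\t-\tJunction_72\nI\t+\t10189261\t-\tJunction_283\n' > demo_junctions.txt

# Flag any line whose start/end fields are not plain integers.
awk -F'\t' '$2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {print "suspect line " NR ": " $0}' demo_junctions.txt
```

Running the same check over the real temp_long_intron_retention files would pinpoint which input rows carry a strand symbol in a coordinate column.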