Main command causing issue with getA2I.py #36

Closed
haydenshinn opened this issue Dec 6, 2023 · 8 comments

@haydenshinn

I am trying to look at RNA editing sites for multiple samples. I successfully ran sprint prepare, converted my RepeatMasker annotation to a BED file, etc., but sprint main runs, hits an error, and does not terminate. I am using bwa 0.7.17 and samtools 1.18 (I also attempted to make an environment with the specific older versions, even though they are out of date). Here is the error I am getting from the main command:

[E::hts_open_format] Failed to open file "/path/to/RNA_Editing/SPRINT/Output/Ctrl-03//tmp//genome_mskAG/all.bam" : No such file or directory
samtools view: failed to open "/path/to/RNA_Editing/SPRINT/Output/Ctrl-03//tmp//genome_mskAG/all.bam" for reading: No such file or directory
cp: cannot stat '/path/to/RNA_Editing/SPRINT/Output/Ctrl-03//tmp//genome_mskAG/./all.sam': No such file or directory
[E::hts_open_format] Failed to open file "/path/to/RNA_Editing/SPRINT/Output/Ctrl-03//tmp//genome_mskAG/all.bam" : No such file or directory
samtools view: failed to open "/path/to/RNA_Editing/SPRINT/Output/Ctrl-03//tmp//genome_mskAG/all.bam" for reading: No such file or directory

And here is my script:

# Find all R1 files in the fastq directory
find "$fastq_dir" -name "*_trimmed_R1.fq" | while read -r fq1; do
    # Extract the sample name by stripping the "_trimmed_R1.fq" suffix
    sample_name=$(basename "$fq1" | sed -E 's/_trimmed_R1\.fq$//')

    # Construct the matching R2 file path
    fq2="${fastq_dir}/${sample_name}_trimmed_R2.fq"

    # Create a per-sample output directory
    sample_output_dir="${output_dir%/}/${sample_name}"
    mkdir -p "$sample_output_dir"

    # Print information for debugging
    echo "Sample Name: $sample_name"
    echo "Read 1: $fq1"
    echo "Read 2: $fq2"
    echo "Output Directory: $sample_output_dir"

    # Run the sprint command
    python2 "${sprint_path}/run.py" main -1 "$fq1" -2 "$fq2" -rp "${reference_dir}/ref.bed" "${reference_dir}/ref.fa" "${sample_output_dir}" "${bwa_installation}" "${samtools_installation}"

done
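For reference, the pairing logic of the loop above can be sketched in Python 3 (standard library only). This is not part of SPRINT: pair_fastqs and sprint_main_cmd are hypothetical helper names, the _trimmed_R1.fq/_trimmed_R2.fq suffixes are taken from the script, and unlike find this only scans the top level of the directory. It also skips samples whose R2 file is missing, which the bash loop does not check.

```python
import os
import tempfile

# Naming convention taken from the bash script above (an assumption for other data sets).
R1_SUFFIX = "_trimmed_R1.fq"
R2_SUFFIX = "_trimmed_R2.fq"


def pair_fastqs(fastq_dir):
    """Return sorted (sample_name, fq1, fq2) tuples for R1 files that have an R2 mate.

    Unlike `find`, this only looks at the top level of fastq_dir.
    """
    pairs = []
    for name in sorted(os.listdir(fastq_dir)):
        if not name.endswith(R1_SUFFIX):
            continue
        sample = name[: -len(R1_SUFFIX)]
        fq1 = os.path.join(fastq_dir, name)
        fq2 = os.path.join(fastq_dir, sample + R2_SUFFIX)
        if not os.path.isfile(fq2):
            # Report and skip instead of handing SPRINT a nonexistent mate file.
            print("WARNING: no R2 file for sample " + sample)
            continue
        pairs.append((sample, fq1, fq2))
    return pairs


def sprint_main_cmd(sprint_path, fq1, fq2, reference_dir, output_dir, bwa, samtools):
    """Argument list mirroring the 'python2 run.py main' call in the bash loop above."""
    return [
        "python2", os.path.join(sprint_path, "run.py"), "main",
        "-1", fq1, "-2", fq2,
        "-rp", os.path.join(reference_dir, "ref.bed"),
        os.path.join(reference_dir, "ref.fa"),
        output_dir, bwa, samtools,
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical file names: NO-01 has no R2 file and is skipped with a warning.
        for n in ("Ctrl-03_trimmed_R1.fq", "Ctrl-03_trimmed_R2.fq", "NO-01_trimmed_R1.fq"):
            open(os.path.join(d, n), "w").close()
        for sample, fq1, fq2 in pair_fastqs(d):
            print(sprint_main_cmd("/opt/SPRINT", fq1, fq2, "/ref",
                                  os.path.join(d, sample), "bwa", "samtools"))
```

Each returned list could be passed to subprocess.run(cmd, check=True), which raises CalledProcessError on a non-zero exit status. Since the error in this issue did not stop sprint main, a clean exit status alone may not mean every output was written.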

I don't understand why there is a // in all of these paths; I even tried removing them manually in my script when defining the output directory for each sample. Either way, all.bam is not being created. Since main still finishes running with no errors other than these, and creates the BAM files for each read, I decided to see what results I would get from getA2I.py. When I ran it, this was the error I ran into:

IOError: [Errno 2] No such file or directory: 'path/to/RNA_Editing/SPRINT/Output/NO-01/tmp/SPRINT_identified_regular.res'

Can you please explain in detail what the inputs and outputs of getA2I.py are, and how I can get around this error in main? I assume the two are linked, because there is no .res file in any of my main output folders, and that seems to be what getA2I.py takes as input. I also do not understand whether I should literally pass in A_to_I_OUT, or whether that is a path to a directory or the name of the file I want to write to. Again, I'm just a bit confused, as the manual is not very descriptive. Thanks in advance for the help.

@jumphone
Owner

jumphone commented Dec 7, 2023

  1. '//' and '/' are equivalent in Linux paths (repeated slashes are treated as a single one);
  2. You can directly download the executables of SPRINT, BWA (0.7.12) and SAMTOOLS (1.2) from "https://github.com/jumphone/SPRINT/tree/master/bin/" and "https://github.com/jumphone/SPRINT/tree/master/samtools_and_bwa". After clicking a file, there is a "Download raw file" button. Please remember to make them executable, e.g. "chmod 777 sprint".
  3. python getA2I.py 0 path_to_directory_of_sprint_output path_to_A_to_I_result
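The first point can be checked with a short Python 3 sketch (standard library only; the path is copied from the error messages above, and the temporary directory exists only for the demonstration). os.path.normpath makes the collapsed form explicit, and os.path.samefile confirms that both spellings refer to the same directory, so a missing all.bam more likely means the file was never written than that the path was wrong.

```python
import os
import tempfile

# Path copied from the error messages in this issue.
raw = "/path/to/RNA_Editing/SPRINT/Output/Ctrl-03//tmp//genome_mskAG/all.bam"

# normpath collapses the redundant separators; the kernel already treats them as one.
print(os.path.normpath(raw))
# /path/to/RNA_Editing/SPRINT/Output/Ctrl-03/tmp/genome_mskAG/all.bam

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "tmp", "genome_mskAG"))
    single = d + "/tmp/genome_mskAG"
    double = d + "//tmp//genome_mskAG"
    # Both spellings resolve to the same directory on disk.
    print(os.path.samefile(single, double))  # True
```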

@haydenshinn
Author

Hi there, I reran SPRINT using the correct versions of BWA and samtools, and I definitely noticed more output files being created, so that fixed something, but I'm still not getting any .res files.

Here is my .out file:

$1: INT, 1 for strand-specific sequencing data. 
$2: OUTPUT_DIR of SPRINT
$3: OUTPUT_PATH of A-to-G RESs

Traceback (most recent call last):
  File "/home/user/github/SPRINT/utilities/getA2I.py", line 15, in <module>
    fi=open(regular)
IOError: [Errno 2] No such file or directory: 'path/SPRINT/Output/NO-01/tmp/SPRINT_identified_regular.res'

I have all sorts of output files in my designated output directory from SPRINT main, and I have run multiple different samples; none of them has produced any .res file. I thought that might be an output of getA2I.py, but I now see it is part of what SPRINT main outputs. I looked into sprint_main.py to try to figure out why the .res files are missing, and noticed that around line 625 a good number of lines involving the .res files are commented out, so I thought the problem might have something to do with that. I could really use a bit more detail on what I might be doing wrong, or why I am not getting the expected output.

Here is the content of each output folder from the runs that "successfully" finished with no error (copied from my SPRINT output directory):

all_combined.zz                hyper_mskAG.snv                              transcript_mskAG_all.zz.dedup.genome.zz
all_combined.zz.sorted         hyper_mskTC.snv                              transcript_mskAG_all.zz.dedup.snv
baseq.cutoff                   PARAMETER.txt                                transcript_mskAG_all.zz.dedup.snv.genome.snv
genome                         regular.snv                                  transcript_mskAG_all.zz.dedup.snv.genome.snv.sort
genome_all.zz.dedup            transcript                                   transcript_mskTC
genome_all.zz.dedup.snv        transcript_all.zz.dedup                      transcript_mskTC_all.zz.dedup
genome_mskAG                   transcript_all.zz.dedup.genome.zz            transcript_mskTC_all.zz.dedup.genome.zz
genome_mskAG_all.zz.dedup      transcript_all.zz.dedup.snv                  transcript_mskTC_all.zz.dedup.snv
genome_mskAG_all.zz.dedup.snv  transcript_all.zz.dedup.snv.genome.snv       transcript_mskTC_all.zz.dedup.snv.genome.snv
genome_mskTC                   transcript_all.zz.dedup.snv.genome.snv.sort  transcript_mskTC_all.zz.dedup.snv.genome.snv.sort
genome_mskTC_all.zz.dedup      transcript_mskAG
genome_mskTC_all.zz.dedup.snv  transcript_mskAG_all.zz.dedup

jumphone reopened this Dec 19, 2023
@jumphone
Owner

The input path you passed to getA2I.py is the "tmp" folder. Please use the output folder itself, "path/SPRINT/Output/NO-01/".

@haydenshinn
Author

Appreciate the quick response. I reran my script with your suggested path modification and got the same error:

Traceback (most recent call last):
  File "/path/github/SPRINT/utilities/getA2I.py", line 15, in <module>
    fi=open(regular)
IOError: [Errno 2] No such file or directory: '/path/SPRINT/Output/NO-01/SPRINT_identified_regular.res'

A .res file is supposed to be included in the output of sprint_main, correct? I was not seeing it in the tmp directory, and I am not seeing it in the parent directory (the specified output directory) either.

@jumphone
Owner

Here are my scripts:

sprint main -p 10 -1 $FQ1 -2 $FQ2 $REF $SPRINT_OUTPUT_FOLDER $BWA $SAMTOOLS
python getA2I.py 0 $SPRINT_OUTPUT_FOLDER $A_TO_I_PATH

"SPRINT_identified_all.res", "SPRINT_identified_hyper.res", and "SPRINT_identified_regular.res" should be in the "$SPRINT_OUTPUT_FOLDER"

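Based on the owner's last comment, a preflight check before running getA2I.py could look like this Python 3 sketch (standard library only). The three file names come from that comment and the argument order from the getA2I.py usage message earlier in the thread ($1: 1 for strand-specific data, 0 otherwise as in the owner's example; $2: the SPRINT output folder; $3: the output path for the A-to-I result). missing_res_files, geta2i_cmd, the tmp/ heuristic and the demo file name A_to_I.res are hypothetical, not part of SPRINT.

```python
import os
import tempfile

# File names listed by the SPRINT author in this thread.
EXPECTED_RES = (
    "SPRINT_identified_all.res",
    "SPRINT_identified_hyper.res",
    "SPRINT_identified_regular.res",
)


def missing_res_files(sprint_output_folder):
    """Return the expected .res files that are absent from the SPRINT output folder."""
    folder = os.path.normpath(sprint_output_folder)
    if os.path.basename(folder) == "tmp":
        # Heuristic: getA2I.py wants the output folder itself, not its tmp/ subfolder.
        raise ValueError("use the parent folder instead: " + os.path.dirname(folder))
    return [name for name in EXPECTED_RES
            if not os.path.isfile(os.path.join(folder, name))]


def geta2i_cmd(geta2i_py, sprint_output_folder, a_to_i_path, strand_specific=False):
    """python getA2I.py <1 if strand-specific else 0> <output folder> <A-to-I result path>."""
    return ["python", geta2i_py, "1" if strand_specific else "0",
            sprint_output_folder, a_to_i_path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        print(missing_res_files(out))  # empty folder: all three names are reported
        for name in EXPECTED_RES:
            open(os.path.join(out, name), "w").close()
        print(missing_res_files(out))  # []
        print(geta2i_cmd("getA2I.py", out, os.path.join(out, "A_to_I.res")))
```

If the list is non-empty, sprint main most likely did not complete for that sample, and running getA2I.py would fail with the IOError seen above.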
@haydenshinn
Author

I got it working - thanks for your help :)

@qimiaonnegliguo

May I ask which ref.fa you used for analysis? In my case it always reports:

Traceback (most recent call last):
  File "run.py", line 7, in <module>
  File "sprint/pipeline.py", line 42, in pipeline
  File "sprint/sprint_main.py", line 567, in main
  File "sprint/tools_bed/annotate.py", line 16, in annotate
ValueError: invalid literal for int() with base 10: 'chr1'

May I ask whether I should remove the 'chr' prefix from the chromosome names?

@jumphone
Owner


Hi,

The error occurs in "sprint/tools_bed/annotate.py".

Please check the "repeat annotation file". It should be in "BED" format (without a header).

Reason: line 16 is "anno[seq[0]]=[ [int(seq[1])+1,int(seq[2]),seq[3],seq[4],seq[5]] ]". The first column (seq[0]) is used as-is, so the error is not caused by the "chr" in the first column; it means a value passed to int() was not a number. Please check columns 2, 3, 5 and 6 of your "repeat annotation file". They should be integers.
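To locate the offending lines, here is a minimal Python 3 checker that applies only the requirements visible in the quoted line 16: at least six fields, and integer values in columns 2 and 3. Splitting on any whitespace is an assumption (annotate.py may split on tabs only), check_repeat_bed is a hypothetical name, the demo file is made up, and columns 5 and 6 are not checked.

```python
import os
import tempfile


def check_repeat_bed(path):
    """Report lines that the parsing on annotate.py line 16 (quoted above) would reject.

    Assumptions: fields are whitespace-separated, and the only requirements
    checked are the ones visible in line 16: at least 6 fields (seq[0]..seq[5])
    and integer values in columns 2 and 3 (seq[1], seq[2]).
    """
    problems = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            seq = line.split()
            if len(seq) < 6:
                problems.append((lineno, "expected at least 6 columns, got %d" % len(seq)))
                continue
            for col in (1, 2):
                try:
                    int(seq[col])
                except ValueError:
                    problems.append((lineno, "column %d is not an integer: %r" % (col + 1, seq[col])))
    return problems


if __name__ == "__main__":
    # Hypothetical file: line 1 is valid BED6; line 2 has an extra leading column,
    # one possible way to end up with "chr1" in column 2.
    with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as fh:
        fh.write("chr1\t100\t200\tAluY\t0\t+\n")
        fh.write("1\tchr1\t100\t200\tAluY\t+\n")
    try:
        for lineno, msg in check_repeat_bed(fh.name):
            print("line %d: %s" % (lineno, msg))  # line 2: column 2 is not an integer: 'chr1'
    finally:
        os.remove(fh.name)
```

Running it on the file passed with -rp should point to the problem lines before sprint main reaches annotate.py.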

Best,
Feng
