Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error in HERV #10

Closed
SIJAE-WOO opened this issue Jul 27, 2021 · 11 comments
Closed

error in HERV #10

SIJAE-WOO opened this issue Jul 27, 2021 · 11 comments

Comments

@SIJAE-WOO
Copy link

Hi,
xTea works well for L1/Alu/SVA, but I got some errors in HERV.
First error is like below.
Traceback (most recent call last): File "/home/sijaewoo/tools/xTea/xtea/x_TEA_main.py", line 1118, in <module> xmutation.call_mutations_from_reads_algnmt(sf_sites, sf_cns, n_len_cutoff, n_jobs, sf_merged_vcf) File "/home/sijaewoo/tools/xTea/xtea/x_mutation.py", line 39, in call_mutations_from_reads_algnmt m_sites=xsites.load_in_qualified_sites_from_xTEA_output(i_len_cutoff) File "/home/sijaewoo/tools/xTea/xtea/x_sites.py", line 49, in load_in_qualified_sites_from_xTEA_output is_lth=int(fields[-1]) ValueError: invalid literal for int() with base 10: '867.0'
I change ' is_lth=int(fields[-1])' to ' is_lth=int(float(fields[-1]))'.
Second error was in "x_mutation.py".
"pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
So, I changed the code like this.
165 bam_in = pysam.AlignmentFile(sf_algnmt, 'rb', reference_filename=sf_ref, check_sq=False)
After that, I ran the xtea HERV again. But it seems it's not work because the final file is internal_snp.vcf.gz.
If you know the solution, please let me know.
Thank you,
Sijae

@simoncchu
Copy link
Collaborator

Hi, for HERV, could you try to run with "-f 1555" instead of "-f 5907".

@SIJAE-WOO
Copy link
Author

Hi, I tried "-f 1555" with other default options, but it didn't work too. The job stopped in the middle and the final files are like this.
image

Last standard out is below.
`Running command: bwa mem -t 8 -T 9 -k 9 -o my_path/tmp/cns/temp_clip.sam.non_polyA.sam /home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa my_path/tmp/cns/candidate_sites_all_clip.fq.non_polyA.fq

Running command: bwa mem -t 8 -T 7 -k 7 -o my_path/tmp/cns/temp_clip.sam.polyA.sam -c 70 /home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa my_path/tmp/cns/candidate_sites_all_clip.fq.polyA.fq

Running command: bwa mem -t 8 -o my_path/tmp/cns/temp_disc.sam /home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa my_path/tmp/cns/candidate_sites_all_disc.fa

/home/sijaewoo/.conda/envs/xtea/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see t
he module's documentation for alternative uses
import imp`

@simoncchu
Copy link
Collaborator

Could you post the content of the generated .sh file? Also any error reported?

@SIJAE-WOO
Copy link
Author

This is the content of the generated .sh file.
image
And total standard output and standard error are in this file. Final standard output and final generated files are same as above comment.
test1_xtea_HERV_sdout.txt

@simoncchu
Copy link
Collaborator

Could you post the content of run_xTEA_pipeline.sh?

@SIJAE-WOO
Copy link
Author

SIJAE-WOO commented Aug 1, 2021

This is the content of run_xTEA_pipeline.sh.
PREFIX=/home/sijaewoo/test_xtea/test1/HERV/ ############ ############ ANNOTATION=/home/sijaewoo/tools/xTea/rep_lib_annotation/HERV/hg38/hg38_HERV.out REF=/commons/Reference/1000Genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa ANNOTATION1=/home/sijaewoo/tools/xTea/rep_lib_annotation/HERV/hg38/hg38_HERV.out GENE=/home/sijaewoo/gencode/gencode.v38.chr_patch_hapl_scaff.annotation.gff3 BLACK_LIST=null L1_COPY_WITH_FLANK=/home/sijaewoo/tools/xTea/rep_lib_annotation/HERV/hg38/hg38_HERV_copies_with_flank.fa SF_FLANK=null L1_CNS=/home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa XTEA_PATH=/home/sijaewoo/tools/xTea/xtea/ BAM_LIST=${PREFIX}"bam_list.txt" BAM1=${PREFIX}"10X_phased_possorted_bam.bam" BARCODE_BAM=${PREFIX}"10X_barcode_indexed.sorted.bam" TMP=${PREFIX}"tmp/" TMP_CLIP=${PREFIX}"tmp/clip/" TMP_CNS=${PREFIX}"tmp/cns/" TMP_TNSD=${PREFIX}"tmp/transduction/" ############ ############ python ${XTEA_PATH}"x_TEA_main.py" -C --dna -i ${BAM_LIST} --lc 3 --rc 3 --cr 1 -r ${L1_COPY_WITH_FLANK} -a ${ANNOTATION} --cns ${L1_CNS} --ref ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt" -n 8 --cp /home/sijaewoo/test_xtea/test1/pub_clip/ python ${XTEA_PATH}"x_TEA_main.py" -D --dna -i ${PREFIX}"candidate_list_from_clip.txt" --nd 5 --ref ${REF} -a ${ANNOTATION} -b ${BAM_LIST} -p ${TMP} -o ${PREFIX}"candidate_list_from_disc.txt" -n 8 python ${XTEA_PATH}"x_TEA_main.py" -N --dna --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_CNS} --fflank ${SF_FLANK} --flklen 3000 -n 8 -i ${PREFIX}"candidate_list_from_disc.txt" -r ${L1_CNS} --ref ${REF} -a ${ANNOTATION} -o ${PREFIX}"candidate_disc_filtered_cns.txt" python ${XTEA_PATH}"x_TEA_main.py" --gene -a ${GENE} -i ${PREFIX}"candidate_disc_filtered_cns.txt" -n 8 -o ${PREFIX}"candidate_disc_filtered_cns_with_gene.txt"

@simoncchu
Copy link
Collaborator

candidate_disc_filtered_cns_with_gene.txt this one is the final output.

@dr-ashu-geno
Copy link

dr-ashu-geno commented Dec 20, 2021

Dear @simoncchu

Thanks a lot for helping with issues and questions, the program works perfectly for LINE1, ALU, and SVA now. But I have the same error mentioned above for HERV. However, job is completed (not failed) and the final output (candidate_disc_filtered_cns_with_gene.txt) is there. I was wondering why VCF file is not made for HERV like other MEIs (such as L1, ALU, and SVA). And how to use the information of candidate_disc_filtered_cns_with_gene.txt file along with those VCF files of other MEIs for downstream analysis?

Thank you for your kind help,

Best regards,

@simoncchu
Copy link
Collaborator

@dr-ashu-geno right now I didn't output the vcf format. As initially, we do not think there are many herv insertions. The current output tells you the positions, maybe you can covert to a simpler version of vcf for now. I'll add the standard output in the next release.

@wxu005
Copy link

wxu005 commented Mar 4, 2024

Hello,
Upon running the HERV module, I encountered an issue within the function:
xfilter.merge_clip_disc(HERV/tmp/disc_tmp.list, HERV/candidate_list_from_clip.txt, sf_out) of x_TEA_main.py.
The file HERV/tmp/disc_tmp.list contains only about 120 rows, which is too small and leads to the generation of thousands of subsequent error lines, such as:
"Error happen at merge clip and disc feature step: 16 not exist
…….."
According to literature, Human Endogenous Retroviruses (HERV) constitute a significant portion of the human genome, comprising approximately 98,000 ERV elements and fragments, accounting for 5–8%.
I am curious as to why the disc_tmp.list file is so small. Are the parameters settings too stringent, or are there any parameters that can be adjusted to increase its size?
thanks

@simoncchu
Copy link
Collaborator

@wxu005, not that many HERV are polymorphic, most of the copies are fixed and no "SV level difference" compared to the reference genome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants