error in HERV #10

SIJAE-WOO · 2021-07-27T07:52:52Z

Hi,
xTea works well for L1/Alu/SVA, but I got some errors in HERV.
First error is like below.
Traceback (most recent call last): File "/home/sijaewoo/tools/xTea/xtea/x_TEA_main.py", line 1118, in <module> xmutation.call_mutations_from_reads_algnmt(sf_sites, sf_cns, n_len_cutoff, n_jobs, sf_merged_vcf) File "/home/sijaewoo/tools/xTea/xtea/x_mutation.py", line 39, in call_mutations_from_reads_algnmt m_sites=xsites.load_in_qualified_sites_from_xTEA_output(i_len_cutoff) File "/home/sijaewoo/tools/xTea/xtea/x_sites.py", line 49, in load_in_qualified_sites_from_xTEA_output is_lth=int(fields[-1]) ValueError: invalid literal for int() with base 10: '867.0'
I change ' is_lth=int(fields[-1])' to ' is_lth=int(float(fields[-1]))'.
Second error was in "x_mutation.py".
"pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
So, I changed the code like this.
165 bam_in = pysam.AlignmentFile(sf_algnmt, 'rb', reference_filename=sf_ref, check_sq=False)
After that, I ran the xtea HERV again. But it seems it's not work because the final file is internal_snp.vcf.gz.
If you know the solution, please let me know.
Thank you,
Sijae

The text was updated successfully, but these errors were encountered:

simoncchu · 2021-07-27T21:11:57Z

Hi, for HERV, could you try to run with "-f 1555" instead of "-f 5907".

SIJAE-WOO · 2021-07-28T04:22:24Z

Hi, I tried "-f 1555" with other default options, but it didn't work too. The job stopped in the middle and the final files are like this.

Last standard out is below.
`Running command: bwa mem -t 8 -T 9 -k 9 -o my_path/tmp/cns/temp_clip.sam.non_polyA.sam /home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa my_path/tmp/cns/candidate_sites_all_clip.fq.non_polyA.fq

Running command: bwa mem -t 8 -T 7 -k 7 -o my_path/tmp/cns/temp_clip.sam.polyA.sam -c 70 /home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa my_path/tmp/cns/candidate_sites_all_clip.fq.polyA.fq

Running command: bwa mem -t 8 -o my_path/tmp/cns/temp_disc.sam /home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa my_path/tmp/cns/candidate_sites_all_disc.fa

/home/sijaewoo/.conda/envs/xtea/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see t
he module's documentation for alternative uses
import imp`

simoncchu · 2021-07-29T20:49:43Z

Could you post the content of the generated .sh file? Also any error reported?

SIJAE-WOO · 2021-07-30T02:10:59Z

This is the content of the generated .sh file.

And total standard output and standard error are in this file. Final standard output and final generated files are same as above comment.
test1_xtea_HERV_sdout.txt

simoncchu · 2021-08-01T17:24:46Z

Could you post the content of run_xTEA_pipeline.sh?

SIJAE-WOO · 2021-08-01T23:40:13Z

This is the content of run_xTEA_pipeline.sh.
PREFIX=/home/sijaewoo/test_xtea/test1/HERV/ ############ ############ ANNOTATION=/home/sijaewoo/tools/xTea/rep_lib_annotation/HERV/hg38/hg38_HERV.out REF=/commons/Reference/1000Genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa ANNOTATION1=/home/sijaewoo/tools/xTea/rep_lib_annotation/HERV/hg38/hg38_HERV.out GENE=/home/sijaewoo/gencode/gencode.v38.chr_patch_hapl_scaff.annotation.gff3 BLACK_LIST=null L1_COPY_WITH_FLANK=/home/sijaewoo/tools/xTea/rep_lib_annotation/HERV/hg38/hg38_HERV_copies_with_flank.fa SF_FLANK=null L1_CNS=/home/sijaewoo/tools/xTea/rep_lib_annotation/consensus/HERV.fa XTEA_PATH=/home/sijaewoo/tools/xTea/xtea/ BAM_LIST=${PREFIX}"bam_list.txt" BAM1=${PREFIX}"10X_phased_possorted_bam.bam" BARCODE_BAM=${PREFIX}"10X_barcode_indexed.sorted.bam" TMP=${PREFIX}"tmp/" TMP_CLIP=${PREFIX}"tmp/clip/" TMP_CNS=${PREFIX}"tmp/cns/" TMP_TNSD=${PREFIX}"tmp/transduction/" ############ ############ python ${XTEA_PATH}"x_TEA_main.py" -C --dna -i ${BAM_LIST} --lc 3 --rc 3 --cr 1 -r ${L1_COPY_WITH_FLANK} -a ${ANNOTATION} --cns ${L1_CNS} --ref ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt" -n 8 --cp /home/sijaewoo/test_xtea/test1/pub_clip/ python ${XTEA_PATH}"x_TEA_main.py" -D --dna -i ${PREFIX}"candidate_list_from_clip.txt" --nd 5 --ref ${REF} -a ${ANNOTATION} -b ${BAM_LIST} -p ${TMP} -o ${PREFIX}"candidate_list_from_disc.txt" -n 8 python ${XTEA_PATH}"x_TEA_main.py" -N --dna --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_CNS} --fflank ${SF_FLANK} --flklen 3000 -n 8 -i ${PREFIX}"candidate_list_from_disc.txt" -r ${L1_CNS} --ref ${REF} -a ${ANNOTATION} -o ${PREFIX}"candidate_disc_filtered_cns.txt" python ${XTEA_PATH}"x_TEA_main.py" --gene -a ${GENE} -i ${PREFIX}"candidate_disc_filtered_cns.txt" -n 8 -o ${PREFIX}"candidate_disc_filtered_cns_with_gene.txt"

simoncchu · 2021-08-02T11:45:48Z

candidate_disc_filtered_cns_with_gene.txt this one is the final output.

dr-ashu-geno · 2021-12-20T09:23:13Z

Dear @simoncchu

Thanks a lot for helping with issues and questions, the program works perfectly for LINE1, ALU, and SVA now. But I have the same error mentioned above for HERV. However, job is completed (not failed) and the final output (candidate_disc_filtered_cns_with_gene.txt) is there. I was wondering why VCF file is not made for HERV like other MEIs (such as L1, ALU, and SVA). And how to use the information of candidate_disc_filtered_cns_with_gene.txt file along with those VCF files of other MEIs for downstream analysis?

Thank you for your kind help,

Best regards,

simoncchu · 2021-12-20T16:44:50Z

@dr-ashu-geno right now I didn't output the vcf format. As initially, we do not think there are many herv insertions. The current output tells you the positions, maybe you can covert to a simpler version of vcf for now. I'll add the standard output in the next release.

wxu005 · 2024-03-04T17:11:59Z

Hello,
Upon running the HERV module, I encountered an issue within the function:
xfilter.merge_clip_disc(HERV/tmp/disc_tmp.list, HERV/candidate_list_from_clip.txt, sf_out) of x_TEA_main.py.
The file HERV/tmp/disc_tmp.list contains only about 120 rows, which is too small and leads to the generation of thousands of subsequent error lines, such as:
"Error happen at merge clip and disc feature step: 16 not exist
…….."
According to literature, Human Endogenous Retroviruses (HERV) constitute a significant portion of the human genome, comprising approximately 98,000 ERV elements and fragments, accounting for 5–8%.
I am curious as to why the disc_tmp.list file is so small. Are the parameters settings too stringent, or are there any parameters that can be adjusted to increase its size?
thanks

simoncchu · 2024-03-08T14:56:57Z

@wxu005, not that many HERV are polymorphic, most of the copies are fixed and no "SV level difference" compared to the reference genome.

simoncchu closed this as completed Aug 2, 2021

simoncchu mentioned this issue Nov 9, 2021

TypeError: 'str' object does not support item assignment #25

Closed

eduymaz mentioned this issue May 5, 2023

HERV doesn't generate .vcf and gets error while reading reads_fa folder #82

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

error in HERV #10

error in HERV #10

SIJAE-WOO commented Jul 27, 2021

simoncchu commented Jul 27, 2021

SIJAE-WOO commented Jul 28, 2021

simoncchu commented Jul 29, 2021

SIJAE-WOO commented Jul 30, 2021

simoncchu commented Aug 1, 2021

SIJAE-WOO commented Aug 1, 2021 •

edited

Loading

simoncchu commented Aug 2, 2021

dr-ashu-geno commented Dec 20, 2021 •

edited

Loading

simoncchu commented Dec 20, 2021

wxu005 commented Mar 4, 2024

simoncchu commented Mar 8, 2024

error in HERV #10

error in HERV #10

Comments

SIJAE-WOO commented Jul 27, 2021

simoncchu commented Jul 27, 2021

SIJAE-WOO commented Jul 28, 2021

simoncchu commented Jul 29, 2021

SIJAE-WOO commented Jul 30, 2021

simoncchu commented Aug 1, 2021

SIJAE-WOO commented Aug 1, 2021 • edited Loading

simoncchu commented Aug 2, 2021

dr-ashu-geno commented Dec 20, 2021 • edited Loading

simoncchu commented Dec 20, 2021

wxu005 commented Mar 4, 2024

simoncchu commented Mar 8, 2024

SIJAE-WOO commented Aug 1, 2021 •

edited

Loading

dr-ashu-geno commented Dec 20, 2021 •

edited

Loading