xTea Long_read SyntaxWarning #98

Closed
vaksmaz opened this issue Mar 11, 2024 · 6 comments

@vaksmaz

vaksmaz commented Mar 11, 2024

I am running long-read (Nanopore) CRAMs with the following commands:

```
BAMS=long_read_bam_list.txt
WFOLDER=/gpfs/commons/groups/compbio/vaksman/Long_read/test_COLO829-T/COLO829-T/
OUT_SCRTP=submit_jobs.sh
TIME=60:00
REF=/path to/GRCh38_full_analysis_set_plus_decoy_hla.fa
XTEA=/path to/xTea/xtea_long/
RMSK=/path to/xTea/rep_lib_annotation/LINE/hg38/hg38_L1_larger500_with_all_L1HS.out
CNS_L1=/path to/xTea/rep_lib_annotation/consensus/LINE1.fa
REP_LIB=/path to/xTea/rep_lib_annotation/

python ${XTEA}"gnrt_pipeline_local_long_read_v38.py" \
  -i ${SAMPLE_ID} -b ${BAMS} -p ${WFOLDER} -o ${OUT_SCRTP} \
  --xtea ${XTEA} \
  -n 16 -m 48 -t ${TIME} \
  -r ${REF} --rmsk ${RMSK} \
  --cns ${CNS_L1} --rep ${REP_LIB} \
  --min 4000 -f 31 -y 15 --clean --fast \
  --mei_no_asm --complex --slurm
```

and I get the messages below. Furthermore, the job does not run to completion: after 26 hours of running it just goes to sleep, with no errors reported.

```
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:26: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(chrm)>3 and ((chrm[:3] is "HLA") or (chrm[:3] is "HPV") or (chrm[:3] is "HIV") or (chrm[:3] is "CMV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:26: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(chrm)>3 and ((chrm[:3] is "HLA") or (chrm[:3] is "HPV") or (chrm[:3] is "HIV") or (chrm[:3] is "CMV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:26: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(chrm)>3 and ((chrm[:3] is "HLA") or (chrm[:3] is "HPV") or (chrm[:3] is "HIV") or (chrm[:3] is "CMV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:26: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(chrm)>3 and ((chrm[:3] is "HLA") or (chrm[:3] is "HPV") or (chrm[:3] is "HIV") or (chrm[:3] is "CMV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:27: SyntaxWarning: "is" with a literal. Did you mean "=="?
  or (chrm[:3] is "CMV") or (chrm[:3] is "MCV") or (chrm[:2] is "SV") or (chrm[:4] is "KSHV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:27: SyntaxWarning: "is" with a literal. Did you mean "=="?
  or (chrm[:3] is "CMV") or (chrm[:3] is "MCV") or (chrm[:2] is "SV") or (chrm[:4] is "KSHV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:27: SyntaxWarning: "is" with a literal. Did you mean "=="?
  or (chrm[:3] is "CMV") or (chrm[:3] is "MCV") or (chrm[:2] is "SV") or (chrm[:4] is "KSHV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:27: SyntaxWarning: "is" with a literal. Did you mean "=="?
  or (chrm[:3] is "CMV") or (chrm[:3] is "MCV") or (chrm[:2] is "SV") or (chrm[:4] is "KSHV")
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:28: SyntaxWarning: "is" with a literal. Did you mean "=="?
  or (chrm[:5] is "decoy") or (chrm[:6] is "random")):#this is some special fields in the bam file
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/x_reference.py:28: SyntaxWarning: "is" with a literal. Did you mean "=="?
  or (chrm[:5] is "decoy") or (chrm[:6] is "random")):#this is some special fields in the bam file
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/l_rep_masker.py:489: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if (s_polyA is self._s_no_polyA) and (s_TSD is "None"):
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/l_rep_masker.py:796: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if s_rg is "":
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/l_output_fmt_parser.py:35: SyntaxWarning: "is" with a literal. Did you mean "=="?
  return s_chk is "None"
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/l_transduction.py:575: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if s_region1 is not "None":
/gpfs/commons/groups/compbio/vaksman/bin/xTea_long/xTea/xtea_long/l_transduction.py:577: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if s_region2 is not "None":
```
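
A note on the warnings themselves: `is` tests object identity rather than string equality, and in CPython a slice such as `chrm[:3]` typically yields a new string object, so a check like `chrm[:3] is "HLA"` will usually evaluate to False even when the text matches. The sketch below shows one way the contig check quoted in the warning could be written with value comparison. The function name `is_special_contig` and the constant `SPECIAL_PREFIXES` are hypothetical names chosen for illustration; this is not the maintainers' patch, only a minimal example under the assumption that the intent is a prefix match.

```python
# Prefixes taken from the condition shown in the warning output
# (x_reference.py lines 26-28). The names below are illustrative only.
SPECIAL_PREFIXES = (
    "HLA", "HPV", "HIV", "CMV", "MCV", "SV", "KSHV", "decoy", "random",
)


def is_special_contig(chrm: str) -> bool:
    """Return True if the contig name starts with one of the special prefixes.

    str.startswith accepts a tuple and compares by value, which avoids the
    identity comparison ("is") that triggers the SyntaxWarning.
    """
    return len(chrm) > 3 and chrm.startswith(SPECIAL_PREFIXES)


if __name__ == "__main__":
    for name in ("HLA-A*01:01:01:01", "chr4", "KSHV_contig"):
        print(name, is_special_contig(name))
```

The same idea applies to the other flagged lines: comparing against `"None"` or `""` with `==` / `!=` instead of `is` / `is not`.
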
@simoncchu
Collaborator

I am not sure what the sequencing depth of the file is. If it is a high-depth sample, I would suggest trying again with more memory.

@vaksmaz
Author

vaksmaz commented Mar 19, 2024

An increase in memory worked. It is using a lot of RAM; is that OK? Also, can you give a bit of information on the long-read output (for example, what does chr4~15841547~15847572~L1HS~0R~51~1~116 mean, see below)? Most things are self-evident, but some are a bit hard to interpret, such as which column holds the gene annotation.

Also, does long-read support somatic analysis?

6	66703922	LINE1	internal_inversion	4242:5324:+	5328:6070:-	CTATATCTTCTGGTAGAAACCTCAAGGCAAAATCAATTAGATTTCCAACT	chr4~15841547~15847572~L1HS~0R~51~1~116	None	TTTTTGGGGTAAGACCAATTTTTTATTTTCCCCGTCTTTGTCTAGGTTTCTACTATTCTATTTTCTCCATGGAACATATCTTTGAACAGCTTAGACTGGTCAGATATCTGCAAGATTGCTTTTTTTTTTTTTTTTTTCTTTGAGAGTTGTTTTTTATTTATTTTTTTTTTTTTTATACTCTAAGTTTTAGGGTACATGTGCACATTGTGCAGGTTAGTTACATATGTATACATGTGCCATGCTGGTGCGCTGCACCCACTAATGTGTCATCTAGCATTAGGTATATCTCCCAATGCTATCCCTCCCCCCCCCCCCCCACCACAGTCCCCAGAGTGTGATATTCCCCTTCCTGTGTCCATGTGATCTCATTGTTCAATTCCCACCTATGAGTGAGAATATGCGGTGTTTGGTTTTTTGTTCTTGCGATAGTTTACTGAGAATGATGGTTTCCAATTTCATCCATGTCCCTACAAAGGATATGAACTCATCATTTTTTATGGCTGCATAGTATTCCATGGTGTATATGTGCCACATTTTCTTAATCCAGTCATCATTGTTGGACATTTGGGTTGGTTCCAAGCTTTGCTATTGTGAATAGTGCCGCAATAAACATACGTGTGCAGTGTCTTTATAGAAATGTTTATATCATTTGGGAAAAATTTCCCCAAAAGGGATGGCTGGGGCAAAAAGGGAAATTTTTAATTTAATCCCCGAGGAATCGCCACACTGACTTCCACAATGGTTGGAAATAGTTTACAGTCCCACCAACAGTGTAAAAGTGTTCCCCATTTCTCCACATCCTCTCCAGCACCTGTTGTTTCCTGACTTTTTAATGATTGCCATTCTAACTGGTGTGAGAACAAATTATGGGGGAACTCCCATTCACAATTGCTTCAAAGAGAATAAAATACCTAGGAATCCAACTTACAAGGGATGTGAAGGACCTCTTCAAGGAGAACTACAAACCACTGCTCAAGGAAATAAAAGAGGACACAAACAAATGGAAGAACATTCCATGCTCATGGGTAGGAAGAATCAATATCGTGAAAATGGCCATACTGCCCAAGGTAATTTACAGATTCAATGCCATCCCCATCAAGCTACCAATGACTTTCTTCACAGAATTGGAAAAAACTACTTTAAAGTTCATATGGAACCAAAAAAGAGCCCGCATTGCCAAGTCAATCCTAAGCCAAAAGAACAAAGCTGGAGGCATCACACTACCTGACTTCAAACTATACTACAAGGCTACAGTAACCAAAACAGCATGGTACTGGTACCAAAACAGAGATATAGACCCAAAGGAACAGAACAGAGCCCTCAGAAATAATGCCGCATATCTACAACTATCTGATCTTTACAAACCTGGAAAAAAAAAGCAATGGGGAAAGGATTCCCTATTTAATAAATGGTGCTGGGAAAACTGGCTAGCCATATGTAGAAAGCTGAAACTGGATCCCTTCCTTACACCTTATACAAAAATCAATTCAAGATGGATTAAAGATTTAAACGTTAAACCTAAAACCATAAAAACCCTAGAAGAAAACCTAGGCATTACCATTCAGGACATAGGCGTGGGCAAGGACTTCATGTCCAAAACACCAAAAGCAATGGCAACAAAAGACAAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGAACAGGCAACCTACAACATGGGAGAAAATTTTTGCAACCTACTCATCTGACAAAGGGCTAATATCCAGAATCTACAATGAACTCAAACAAATTTACAAGAAAAAAACAAACAACCCCATCAAAAAGTGGGCGAAGGTCATGAACAGACAC
TTCTCAAAAGAAGACATTTATGCAGCCAAAAAACACATGAAGAAATGCTCATCATCACTGGCCATCAGAGAAATGCAAATCAAAACCACTATGAGATACTATATCTTCTGGTAGAAACCTCAAGGCAAAATCAATTAGATTTCCAACT	GCAATCTTGCAGATATCTGACCAGTCTAAGCTGTTCAAAGATATGTTCCATGGAGAAAATAGAATAGTAGAAACCTAGACAAAGACGGGGAAAATAAAAAATTGGTCTTACCCCAA	None	clip_seq_with_polyA

@simoncchu
Collaborator

I have some notes on the columns here: https://github.com/parklab/xTea_paper/tree/main/run_tools/xTea/HG002

The long-read module doesn't have a somatic mode, which means it outputs germline and potential somatic insertions combined. Also, it uses a local assembly step, so it will miss low-VAF insertions if the assembly step fails because of low depth.

@vaksmaz
Author

vaksmaz commented Mar 19, 2024

And how do I interpret this chr4~15841547~15847572~L1HS~0R~51~1~116 in the example above?

@simoncchu
Collaborator

simoncchu commented Mar 19, 2024

It indicates that this insertion contains a transduction region from the reference L1HS copy at chr4:15841547-15847572, and that the transduction sequence is 116 bp long.
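
For downstream scripting, the field can be split on `~`. The sketch below is a minimal example that relies only on what is stated above: the first four parts are the source chromosome, start, end, and subfamily, and the last part is the transduction length. The meaning of the middle parts (`0R`, `51`, `1`) is not explained in this thread, so they are kept as opaque strings. The names `TransductionSource` and `parse_transduction_field` are hypothetical and not part of xTea.

```python
from typing import NamedTuple, Tuple


class TransductionSource(NamedTuple):
    chrom: str
    start: int
    end: int
    subfamily: str
    extra: Tuple[str, ...]  # middle fields, not interpreted here
    transduction_len: int


def parse_transduction_field(field: str) -> TransductionSource:
    """Split a '~'-delimited transduction field into named parts."""
    parts = field.split("~")
    if len(parts) < 5:
        raise ValueError(f"unexpected transduction field: {field!r}")
    chrom, start, end, subfamily = parts[:4]
    return TransductionSource(
        chrom=chrom,
        start=int(start),
        end=int(end),
        subfamily=subfamily,
        extra=tuple(parts[4:-1]),
        transduction_len=int(parts[-1]),
    )


if __name__ == "__main__":
    src = parse_transduction_field("chr4~15841547~15847572~L1HS~0R~51~1~116")
    print(f"{src.chrom}:{src.start}-{src.end} {src.subfamily}, "
          f"transduction length {src.transduction_len} bp")
    # prints: chr4:15841547-15847572 L1HS, transduction length 116 bp
```
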

@vaksmaz
Author

vaksmaz commented Mar 19, 2024

Thank you very much for all the help.
