GISAID XT recombinant not detected by sc2rf #31

Open
BenjaminDelisle opened this issue May 16, 2022 · 4 comments


@BenjaminDelisle

Hi, I've noticed that sc2rf.py (version sc2rf-7427d2f94b69c965362034c2597b643c5dfaa1cf) could not find any recombination for the XT samples available on GISAID when run as: python sc2rf.py nextclade.aligned_XT_Gisaid.fasta. Here are the aligned sequences:
nextclade.aligned_XT_Gisaid.txt

Nextclade:
image
sc2rf:
image

Thanks for looking into this and other lineages that might be in the same situation.

@ktmeaton

ktmeaton commented May 16, 2022

Hi @BenjaminDelisle,

I think for XT, the flag --unique 1 is required, because only 1 SNP is contributed by the BA.1 parent (A26530G). For added confidence, the flag --enable-deletions is helpful. This way, we can confirm that the 3' end of the genome is from BA.1, as it lacks the S2M deletion (29734:29759) which defines BA.2 (and its descendants BA.3, BA.4, BA.5).

With these parameters, I get a breakpoint interval of 26061:26529 for XT, which is very close to cov-lineages/pango-designation#478 (26062:26528).

git clone https://github.com/lenaschimmel/sc2rf.git
cd sc2rf
git checkout 7427d2f94b69c96536
python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt --unique 1 --enable-deletions

image
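The two pieces of evidence mentioned above (the BA.1 allele at 26530 and the absence of the 29734:29759 deletion) can also be spot-checked by hand on an aligned sequence. This is only a minimal sketch, assuming the alignment is in 1-based reference coordinates with '-' marking deleted bases; the helper names and the toy sequence are hypothetical and not part of sc2rf.

```python
def base_at(aligned_seq, pos):
    """Return the base at a 1-based reference position of an aligned sequence."""
    return aligned_seq[pos - 1].upper()


def has_deletion(aligned_seq, start, end):
    """True if every position in the 1-based inclusive range start..end is a gap."""
    region = aligned_seq[start - 1:end]
    # Guard against sequences that are too short to cover the whole range.
    return len(region) == end - start + 1 and all(base == "-" for base in region)


# Toy aligned sequence (hypothetical): all 'N' except the position of interest.
bases = ["N"] * 29903
bases[26530 - 1] = "G"  # BA.1-like allele from A26530G
toy = "".join(bases)

print(base_at(toy, 26530))              # allele at the single BA.1-unique SNP
print(has_deletion(toy, 29734, 29759))  # False here: no BA.2-like deletion at the 3' end
```

In practice the sequence would come from the aligned FASTA rather than a toy string, and a real check would also need to treat 'N' (missing data) separately from a confirmed base.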

@BenjaminDelisle
Author

Thanks for finding this, @ktmeaton! This precludes a straightforward implementation of sc2rf in our pipeline. We will have to reflect on this.

@ktmeaton

> This precludes a straightforward implementation of sc2rf in our pipeline.

Recombinants with very few alleles contributed by a donor (e.g. XP, XT) are extremely difficult to detect systematically without introducing a large number of false positives :(
Not that I recommend this... but I've got some gnarly post-processing in my fork of sc2rf. It's an example of one way of tackling this problem, but not a rigorously tested solution.

  1. Clone the fork.

    git clone https://github.com/ktmeaton/sc2rf.git sc2rf-ktmeaton
    cd sc2rf-ktmeaton

  2. Install post-processing dependencies.

    pip install pandas click

  3. Run sc2rf with highly sensitive parameters.

    python3 sc2rf.py nextclade.aligned_XT_Gisaid.txt \
      --parents 2-4 \
      --breakpoints 0-4 \
      --unique 1 \
      --max-ambiguous 20 \
      --max-intermission-length 3 \
      --max-intermission-count 3 \
      --csvfile XT.csv \
      --ignore-shared

    image

  4. Post-process the CSV.

    python3 postprocess.py --csv XT.csv --prefix XT

  5. The post-processed table is XT.tsv:

strain sc2rf_parents sc2rf_regions sc2rf_breakpoints sc2rf_num_breakpoints sc2rf_regions_length
hCoV-19/South_Africa/NICD-N33091/2021 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N33825/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N33849/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N35577/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N36231/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N37349/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NCV1024/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N37519/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N37608/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-N37626/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
hCoV-19/South_Africa/NICD-CRDM03060/2022 Omicron/BA.2,Omicron/BA.1 670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1 26061:26529 1 25390,2980
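
For anyone reading the table above, the sc2rf_breakpoints and sc2rf_regions_length columns can be re-derived from sc2rf_regions. The following is a rough sketch based only on the column format shown here, not on the actual code in postprocess.py; the function names are hypothetical.

```python
def parse_regions(regions):
    """Parse 'start:end|parent,start:end|parent' into (start, end, parent) tuples."""
    parsed = []
    for region in regions.split(","):
        coords, parent = region.split("|")
        start, end = coords.split(":")
        parsed.append((int(start), int(end), parent))
    return parsed


def breakpoints(parsed):
    """Intervals between consecutive regions, as 'start:end' strings."""
    return [
        f"{prev_end + 1}:{next_start - 1}"
        for (_, prev_end, _), (next_start, _, _) in zip(parsed, parsed[1:])
    ]


def region_lengths(parsed):
    """Length of each region, computed as end - start to match the table."""
    return [end - start for start, end, _ in parsed]


# Value copied from the sc2rf_regions column of the XT rows.
regions = "670:26060|Omicron/BA.2,26530:29510|Omicron/BA.1"
parsed = parse_regions(regions)
print(breakpoints(parsed))     # ['26061:26529']
print(region_lengths(parsed))  # [25390, 2980]
```

The short second region (2980 bases, supported by a single unique SNP) is what makes XT hard to separate from false positives, so a pipeline might reasonably flag such rows for manual review instead of accepting them automatically.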

@BenjaminDelisle
Author

Thanks, @ktmeaton! I will have a look into this shortly.
