Rolling results #2

mbhall88 · 2022-09-27T02:52:19Z

This issue will document the rolling results.

The first sneak peek is all 437 Nanopore isolates and 400 Illumina isolates (selected at random).

Nanopore

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	Drprg	1(15)	3(157)	93.3% (70.2-98.8%)	98.1% (94.5-99.3%)	0.8642875644329534
Amikacin	Mykrobe	1(15)	3(157)	93.3% (70.2-98.8%)	98.1% (94.5-99.3%)	0.8642875644329534
Amikacin	Tbprofiler	1(15)	3(157)	93.3% (70.2-98.8%)	98.1% (94.5-99.3%)	0.8642875644329534
Capreomycin	Drprg	1(3)	1(64)	66.7% (20.8-93.9%)	98.4% (91.7-99.7%)	0.6510416666666666
Capreomycin	Mykrobe	1(3)	1(64)	66.7% (20.8-93.9%)	98.4% (91.7-99.7%)	0.6510416666666666
Capreomycin	Tbprofiler	1(3)	1(64)	66.7% (20.8-93.9%)	98.4% (91.7-99.7%)	0.6510416666666666
Ethambutol	Drprg	7(29)	21(328)	75.9% (57.9-87.8%)	93.6% (90.4-95.8%)	0.5830010462966467
Ethambutol	Mykrobe	6(29)	22(328)	79.3% (61.6-90.2%)	93.3% (90.1-95.5%)	0.5975951983859127
Ethambutol	Tbprofiler	7(29)	22(328)	75.9% (57.9-87.8%)	93.3% (90.1-95.5%)	0.5747241429661474
Ethionamide	Drprg	26(30)	1(86)	13.3% (5.3-29.7%)	98.8% (93.7-99.8%)	0.2624057235284411
Ethionamide	Mykrobe	5(30)	57(86)	83.3% (66.4-92.7%)	33.7% (24.6-44.2%)	0.16405763204424978
Ethionamide	Tbprofiler	14(30)	4(86)	53.3% (36.1-69.8%)	95.3% (88.6-98.2%)	0.5643248464313276
Isoniazid	Drprg	35(117)	3(265)	70.1% (61.3-77.6%)	98.9% (96.7-99.6%)	0.7641591750285314
Isoniazid	Mykrobe	18(117)	45(265)	84.6% (77.0-90.0%)	83.0% (78.0-87.1%)	0.6432989433455073
Isoniazid	Tbprofiler	21(117)	4(265)	82.1% (74.1-88.0%)	98.5% (96.2-99.4%)	0.8445257661394283
Kanamycin	Drprg	0(3)	2(118)	100.0% (43.9-100.0%)	98.3% (94.0-99.5%)	0.7680042372764464
Kanamycin	Mykrobe	0(3)	2(118)	100.0% (43.9-100.0%)	98.3% (94.0-99.5%)	0.7680042372764464
Kanamycin	Tbprofiler	0(3)	2(118)	100.0% (43.9-100.0%)	98.3% (94.0-99.5%)	0.7680042372764464
Moxifloxacin	Drprg	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	Mykrobe	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	Tbprofiler	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Ofloxacin	Drprg	13(15)	1(158)	13.3% (3.7-37.9%)	99.4% (96.5-99.9%)	0.27378347692948213
Ofloxacin	Mykrobe	3(15)	4(158)	80.0% (54.8-93.0%)	97.5% (93.7-99.0%)	0.7524691275756947
Ofloxacin	Tbprofiler	3(15)	3(158)	80.0% (54.8-93.0%)	98.1% (94.6-99.4%)	0.7810126582278482
Pyrazinamide	Drprg	16(28)	3(243)	42.9% (26.5-60.9%)	98.8% (96.4-99.6%)	0.5540455669886942
Pyrazinamide	Mykrobe	10(28)	5(243)	64.3% (45.8-79.3%)	97.9% (95.3-99.1%)	0.6796400181154288
Pyrazinamide	Tbprofiler	12(28)	6(243)	57.1% (39.1-73.5%)	97.5% (94.7-98.9%)	0.6093260891549527
Rifampicin	Drprg	8(77)	6(287)	89.6% (80.8-94.6%)	97.9% (95.5-99.0%)	0.8837166974977075
Rifampicin	Mykrobe	6(77)	6(287)	92.2% (84.0-96.4%)	97.9% (95.5-99.0%)	0.9011719987329744
Rifampicin	Tbprofiler	75(77)	1(287)	2.6% (0.7-9.0%)	99.7% (98.1-99.9%)	0.10159114367294636
Streptomycin	Drprg	9(55)	16(126)	83.6% (71.7-91.1%)	87.3% (80.4-92.0%)	0.6875051115519748
Streptomycin	Mykrobe	7(55)	38(126)	87.3% (76.0-93.7%)	69.8% (61.3-77.2%)	0.526015018770466
Streptomycin	Tbprofiler	15(55)	19(126)	72.7% (59.8-82.7%)	84.9% (77.6-90.1%)	0.5656453811464112
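Every metric in these tables can be derived from the FN(R) and FP(S) columns. The thread doesn't state which confidence-interval method is used. The minimal sketch below assumes a Wilson score interval with z = 1.96, which reproduces the Amikacin intervals above (14/15 gives 70.2-98.8%, 154/157 gives 94.5-99.3%). `summarise` and `wilson_ci` are hypothetical names, not drprg's evaluation code.

```python
import math
from typing import Optional, Tuple

Z = 1.96  # ~97.5th percentile of the standard normal, i.e. a 95% interval


def wilson_ci(successes: int, n: int, z: float = Z) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def summarise(fn: int, n_r: int, fp: int, n_s: int) -> dict:
    """Metrics from 'FN(R)' = fn out of n_r resistant and 'FP(S)' = fp out of n_s susceptible."""
    tp, tn = n_r - fn, n_s - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc: Optional[float] = (tp * tn - fp * fn) / denom if denom else None
    return {
        "sensitivity": tp / n_r if n_r else None,
        "sensitivity_ci": wilson_ci(tp, n_r) if n_r else None,
        "specificity": tn / n_s if n_s else None,
        "specificity_ci": wilson_ci(tn, n_s) if n_s else None,
        "mcc": mcc,  # None when any marginal is zero (shown as '-' in the tables)
    }
```

For example, `summarise(fn=1, n_r=15, fp=3, n_s=157)` gives an MCC of about 0.864, as in the Amikacin rows. Returning `None` for an undefined MCC mirrors the `-` entries, such as the Illumina Delamanid rows where no isolate is called resistant.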

Next avenues of investigation:

Figure out what has caused the crazy drop in tbprofiler RIF sensitivity. There must be a problem with the panel I created for it.
Something weird is going on with the PZA, FLQ and ETO sensitivities for drprg. I will also look into INH sensitivity.

Positives:

STM

Illumina

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	Drprg	0(15)	1(166)	100.0% (79.6-100.0%)	99.4% (96.7-99.9%)	0.9653250279768749
Amikacin	Mykrobe	1(15)	1(166)	93.3% (70.2-98.8%)	99.4% (96.7-99.9%)	0.9273092369477912
Amikacin	Tbprofiler	0(15)	1(166)	100.0% (79.6-100.0%)	99.4% (96.7-99.9%)	0.9653250279768749
Capreomycin	Drprg	2(13)	4(93)	84.6% (57.8-95.7%)	95.7% (89.5-98.3%)	0.7558571990231637
Capreomycin	Mykrobe	3(13)	4(93)	76.9% (49.7-91.8%)	95.7% (89.5-98.3%)	0.70359611692274
Capreomycin	Tbprofiler	2(13)	4(93)	84.6% (57.8-95.7%)	95.7% (89.5-98.3%)	0.7558571990231637
Delamanid	Drprg	1(1)	0(93)	0.0% (0.0-79.3%)	100.0% (96.0-100.0%)	-
Delamanid	Mykrobe	1(1)	0(93)	0.0% (0.0-79.3%)	100.0% (96.0-100.0%)	-
Delamanid	Tbprofiler	1(1)	0(93)	0.0% (0.0-79.3%)	100.0% (96.0-100.0%)	-
Ethambutol	Drprg	5(53)	16(229)	90.6% (79.7-95.9%)	93.0% (89.0-95.7%)	0.7795344823612268
Ethambutol	Mykrobe	5(53)	17(229)	90.6% (79.7-95.9%)	92.6% (88.4-95.3%)	0.7712443312992736
Ethambutol	Tbprofiler	5(53)	18(229)	90.6% (79.7-95.9%)	92.1% (87.9-95.0%)	0.7631197122668696
Ethionamide	Drprg	20(37)	11(127)	45.9% (31.0-61.6%)	91.3% (85.2-95.1%)	0.4141740735523773
Ethionamide	Mykrobe	11(37)	15(127)	70.3% (54.2-82.5%)	88.2% (81.4-92.7%)	0.5643018224168724
Ethionamide	Tbprofiler	10(37)	16(127)	73.0% (57.0-84.6%)	87.4% (80.5-92.1%)	0.5737592501981046
Isoniazid	Drprg	17(135)	5(218)	87.4% (80.8-92.0%)	97.7% (94.7-99.0%)	0.868118053524601
Isoniazid	Mykrobe	14(135)	7(218)	89.6% (83.3-93.7%)	96.8% (93.5-98.4%)	0.8735871080954598
Isoniazid	Tbprofiler	10(135)	7(218)	92.6% (86.9-95.9%)	96.8% (93.5-98.4%)	0.8977596305525896
Kanamycin	Drprg	4(25)	2(168)	84.0% (65.3-93.6%)	98.8% (95.8-99.7%)	0.8582554180919595
Kanamycin	Mykrobe	6(25)	2(168)	76.0% (56.6-88.5%)	98.8% (95.8-99.7%)	0.8066918414409607
Kanamycin	Tbprofiler	5(25)	2(168)	80.0% (60.9-91.1%)	98.8% (95.8-99.7%)	0.8327103314106743
Levofloxacin	Drprg	18(28)	5(145)	35.7% (20.7-54.2%)	96.6% (92.2-98.5%)	0.42231266491410374
Levofloxacin	Mykrobe	2(28)	10(145)	92.9% (77.4-98.0%)	93.1% (87.8-96.2%)	0.7799214704762708
Levofloxacin	Tbprofiler	1(28)	11(145)	96.4% (82.3-99.4%)	92.4% (86.9-95.7%)	0.7903590726235468
Linezolid	Drprg	0(0)	0(128)	-	100.0% (97.1-100.0%)	-
Linezolid	Mykrobe	0(0)	0(128)	-	100.0% (97.1-100.0%)	-
Linezolid	Tbprofiler	0(0)	0(128)	-	100.0% (97.1-100.0%)	-
Moxifloxacin	Drprg	11(15)	8(126)	26.7% (10.9-52.0%)	93.7% (88.0-96.7%)	0.2244992239690201
Moxifloxacin	Mykrobe	2(15)	18(126)	86.7% (62.1-96.3%)	85.7% (78.5-90.8%)	0.5388625547887866
Moxifloxacin	Tbprofiler	2(15)	20(126)	86.7% (62.1-96.3%)	84.1% (76.8-89.5%)	0.5155328733959855
Ofloxacin	Drprg	3(3)	0(5)	0.0% (0.0-56.1%)	100.0% (56.6-100.0%)	-
Ofloxacin	Mykrobe	0(3)	0(5)	100.0% (43.9-100.0%)	100.0% (56.6-100.0%)	1.0
Ofloxacin	Tbprofiler	0(3)	0(5)	100.0% (43.9-100.0%)	100.0% (56.6-100.0%)	1.0
Pyrazinamide	Drprg	9(23)	0(132)	60.9% (40.8-77.8%)	100.0% (97.2-100.0%)	0.7548792871746883
Pyrazinamide	Mykrobe	2(23)	2(132)	91.3% (73.2-97.6%)	98.5% (94.6-99.6%)	0.8978919631093544
Pyrazinamide	Tbprofiler	2(23)	2(132)	91.3% (73.2-97.6%)	98.5% (94.6-99.6%)	0.8978919631093544
Rifampicin	Drprg	8(118)	4(240)	93.2% (87.2-96.5%)	98.3% (95.8-99.4%)	0.9237938244724586
Rifampicin	Mykrobe	6(118)	4(240)	94.9% (89.3-97.6%)	98.3% (95.8-99.4%)	0.936595807066951
Rifampicin	Tbprofiler	117(118)	1(240)	0.8% (0.1-4.6%)	99.6% (97.7-99.9%)	0.0271689721964718
Streptomycin	Drprg	12(42)	1(64)	71.4% (56.4-82.8%)	98.4% (91.7-99.7%)	0.7512240395539047
Streptomycin	Mykrobe	8(42)	5(64)	81.0% (66.7-90.0%)	92.2% (83.0-96.6%)	0.7418210904541338
Streptomycin	Tbprofiler	8(42)	2(64)	81.0% (66.7-90.0%)	96.9% (89.3-99.1%)	0.8037977341533646

Next avenues of investigation:

I'll probably wait until the Nanopore stuff is debugged and then run on a larger sample of data before debugging Illumina

mbhall88 · 2022-10-04T05:50:28Z

I took a look at the INH FNs today. For those where mykrobe made a TP call (so I can see which mutation we missed), 13/16 were us missing the fabG1 C-15T promoter mutation. drprg called this variant for all 13 of those, but the calls were removed by the fraction of read support (FRS) filter, which is set to 0.70. Nearly all of those mutations had an FRS of 0.58-0.64.

The reason for this is the alleles are quite similar, and I suspect maybe some shared minimizers are wreaking havoc here.

An example VCF record showing the alleles and coverage:

fabG1   81      555cbd3d        CGAGACGATAGGT   CGAGACGATAGGC,CGAGATGATAGGT,TGAGACGATAGGT       .       frs     VC=PH_SNPs;GRAPHTYPE=SIMPLE;VARID=fabG1_G-17T,fabG1_A-16X,fabG1_C-15X,fabG1_T-8X;PREDICT=S,S,R,S      GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      2:13,16,22,3:11,13,27,2:7,14,19,0:4,8,30,1:82,82,134,14:71,68,164,10:0.5,0.4,0,0.75:-441.21,-406.243,-273.794,-577.451:132.448

These don't get collapsed by make_prg because the match lengths between the three variants described by this allele are only 4 and 6 (we use a minimum match length of 7).
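As a simplified illustration of the minimum-match-length idea (not make_prg's actual algorithm, which builds the PRG recursively from an MSA), the sketch below merges two differing columns into one site whenever the run of identical columns between them is shorter than the minimum match length. `variable_intervals` is a hypothetical helper. With the four alleles from the record above, the differing columns are 0, 5 and 12, separated by match runs of 4 and 6.

```python
from typing import List, Tuple


def variable_intervals(alleles: List[str], min_match_len: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) variable sites across equal-length aligned alleles.

    A column where every allele agrees is a 'match' column. If the run of match
    columns between two differing columns is shorter than min_match_len, the two
    differing columns are merged into a single site.
    """
    length = len(alleles[0])
    if any(len(a) != length for a in alleles):
        raise ValueError("alleles must be aligned to the same length")
    intervals: List[Tuple[int, int]] = []
    for col in range(length):
        if len({a[col] for a in alleles}) == 1:
            continue  # all alleles agree at this column
        if intervals and col - intervals[-1][1] < min_match_len:
            intervals[-1] = (intervals[-1][0], col + 1)  # match run too short: merge
        else:
            intervals.append((col, col + 1))  # start a new variable site
    return intervals


alleles = ["CGAGACGATAGGT", "CGAGACGATAGGC", "CGAGATGATAGGT", "TGAGACGATAGGT"]
print(variable_intervals(alleles, 7))  # one 13 bp site
print(variable_intervals(alleles, 5))  # a 6 bp site and a 1 bp site
```

With `min_match_len=5` the same alleles split into a 6 bp site and a 1 bp site, the same shape as the two records drprg reports later in this thread (positions 81 and 93).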

The other interesting thing is that each time, the allele with the next best coverage is allele 1 which differs in two positions from allele 2 (middle and end), so I reckon there's a minimizer that covers the start of this allele before the two alleles differ.

I'm not sure whether the "hacky" approach of decreasing the FRS threshold is the best way to go, or whether we should change some parameters in make_prg or pandora.

iqbal-lab · 2022-10-04T06:07:10Z

Could you also drop the min match length?
In our covid work using pileups, we find an FRS of 0.7 is too high just because of noise in the reads.

mbhall88 · 2022-10-04T22:24:42Z

Interesting. What FRS have you been using in your covid work?

iqbal-lab · 2022-10-04T23:04:05Z

Well, we've just shifted to 0.6 for nanopore, but now we're distracted fixing bugs before going back to carefully choose FRS thresholds

mbhall88 · 2022-10-06T04:08:43Z

Okay, so after changing the minimum match length to 5 and the minimum FRS to 0.60, there are only two FNs that mykrobe calls and we don't. One of those is an indel that fails the FRS filter at 0.59, and the other is a dodgy-looking indel call from mykrobe that isn't called by tbprofiler or drprg, so I'm not fazed about that. Interestingly, that sample has a synonymous SNP in the first codon.

The allele at that fabG1 variant now looks slightly better and has been split in two:

fabG1   81      8ca378d9        CGAGAC  CGAGAT,TGAGAC   .       PASS    VC=PH_SNPs;GRAPHTYPE=SIMPLE;VARID=fabG1_G-17T,fabG1_A-16X,fabG1_C-15X;PREDICT=S,S,R     GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      1:11,16,6:12,25,2:9,15,0:5,28,1:71,101,25:75,155,11:0.5,0,0.75:-281.052,-151.402,-388.948:129.65
fabG1   93      fa7956b3        T       C       .       ld;sb   VC=SNP;GRAPHTYPE=SIMPLE;VARID=fabG1_T-8X;PREDICT=S      GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:0,0:1,0:0,0:1,0:0,0:2,0:1,1:-129.795,-138.605:8.80986
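The thread doesn't spell out how FRS is computed. As an assumption, the sketch below takes the called allele's median forward+reverse coverage (`MED_FWD_COVG`, `MED_REV_COVG`) divided by the sum of those medians over all alleles. Under that assumed definition, the earlier fabG1 record gives 49/83 ≈ 0.59, inside the 0.58-0.64 range reported, and the split record above gives 43/58 ≈ 0.74. The helpers `frs` and `passes_frs` are hypothetical and are not drprg's code.

```python
from typing import Dict, List, Optional

FORMAT = "GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF"


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair each FORMAT key with the matching colon-separated sample value."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def _ints(field: str) -> List[int]:
    return [int(x) for x in field.split(",")]


def frs(format_field: str, sample_field: str) -> Optional[float]:
    """ASSUMED FRS: called allele's median fwd+rev coverage over the total across alleles.

    Returns None for a null genotype ('.') or when there is no coverage at all.
    """
    sample = parse_sample(format_field, sample_field)
    if sample["GT"] == ".":
        return None
    called = int(sample["GT"])
    per_allele = [
        f + r for f, r in zip(_ints(sample["MED_FWD_COVG"]), _ints(sample["MED_REV_COVG"]))
    ]
    total = sum(per_allele)
    return per_allele[called] / total if total else None


def passes_frs(value: Optional[float], threshold: float) -> bool:
    """Keep a call only if its FRS reaches the threshold (assumed inclusive)."""
    return value is not None and value >= threshold
```

Null genotypes (`GT` of `.`, as in some records later in the thread) return `None` and therefore never pass the filter.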

Next task is to dig into the poor ofloxacin sensitivity.

mbhall88 · 2022-10-06T05:46:45Z

The OFX FNs are a fairly straightforward fix. It turns out that all of the FNs are gyrA D94X. What is happening is that there is a neutral mutation (one that does not confer resistance) at codon 95 (S95T) that occurs in the same allele as the D94 variant in the VCF/PRG. Long story short, I end up combining these two variants and making an unknown prediction for OFX with the novel variant gyrA_DS94GT. So I just need to break these up and check whether any of them are in the panel and associated with resistance. I should hopefully finish the implementation tomorrow.
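A minimal sketch of the "break these up" step, assuming a simplified prediction rule (R if any component is a known resistance variant, U if any component is absent from the panel, otherwise S) and a panel where `X` means any change at that residue. The panel entries and function names below are hypothetical, for illustration only.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL panel entries for illustration; not the real drprg panel.
PANEL: Dict[str, str] = {"gyrA_D94X": "R", "gyrA_S95T": "S"}


def split_substitution(gene: str, ref_aa: str, start: int, alt_aa: str) -> List[str]:
    """Split e.g. DS94GT into per-residue changes, skipping unchanged residues."""
    if len(ref_aa) != len(alt_aa):
        raise ValueError("this sketch only handles equal-length substitutions")
    return [
        f"{gene}_{ref}{start + offset}{alt}"
        for offset, (ref, alt) in enumerate(zip(ref_aa, alt_aa))
        if ref != alt
    ]


def lookup(mutation: str, panel: Dict[str, str]) -> Optional[str]:
    """Exact panel match first, then the 'X' wildcard (any change at that residue)."""
    if mutation in panel:
        return panel[mutation]
    return panel.get(mutation[:-1] + "X")


def predict(mutations: List[str], panel: Dict[str, str]) -> str:
    """R if any mutation confers resistance, U if any is unknown, else S."""
    calls = [lookup(m, panel) for m in mutations]
    if "R" in calls:
        return "R"
    if None in calls:
        return "U"
    return "S"


print(predict(["gyrA_DS94GT"], PANEL))  # combined allele is novel -> U
parts = split_substitution("gyrA", "DS", 94, "GT")
print(parts, predict(parts, PANEL))  # ['gyrA_D94G', 'gyrA_S95T'] R
```

Treating the combined allele as a single novel variant hides the D94 change from the panel; splitting it first lets the wildcard entry match.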

iqbal-lab · 2022-10-06T06:05:17Z

Great!

mbhall88 · 2022-10-10T05:37:19Z

Here are the updated plots after the INH and OFX fixes listed above

Nanopore

Illumina

Some good improvements for nanopore. I'm going to have a look at the drprg STM, ETO, PZA and RIF sensitivity as mykrobe seems to be better than drprg. But pretty happy with the specificity of drprg at the moment.

lachlancoin · 2022-10-10T05:51:57Z

Looks good. Not sure what's going on with TB profiler with RIF


mbhall88 · 2022-10-10T05:55:30Z

> Looks good. Not sure what going on with TB profiler with RIF

It's most likely an issue with the custom panel I built. I'll fix that up once I've finished debugging drprg. I'm assuming tb profiler is on par with mykrobe for RIF though

iqbal-lab · 2022-10-10T06:12:18Z

Looks great. I am actually surprised how good the specificity is for PZA nanopore, given it is dominated by indels. I had thought we had issues with indels

mbhall88 · 2022-10-11T03:42:26Z

I've now gone through the remainder of the FNs and nearly all of the FPs for nanopore.

RIF

2 FNs where drprg didn't discover the variant

4 FPs where all three tools call rpoB L430X
1 FP where mykrobe and drprg call rpoB L452X
1 FP that just scraped through the low depth cutoff of 3 in drprg - it had a depth of 3.

STM

1 FN where drprg calls 2 non-synonymous mutations (not in the panel) - these are also called by tb-profiler
1 FN where drprg calls the correct variant rpsL K43R but fails FRS (0.52)

14 FPs are called by all three callers: 9 were mutations in gid, 3 were in rrs, and 2 in rpsL. I'm not sure what we want to do here, because it is fairly reasonable to think this is a phenotyping problem. After a quick search, I found two references to support this [1, 2]. From [1]:

low-level streptomycin resistance mediated by gidB were frequently misclassified with respect to streptomycin resistance when using the WHO-recommended critical concentration of 2 μg/ml.

2 FPs were rpsL K88R which is very strongly associated with STM resistance. These were also called by mykrobe but not tb-profiler.

2 FPs were confident deletions in gid called only by drprg. mykrobe called one of them, but it was filtered out due to a low proportion of expected depth.

ETO

In total, there were 9 FNs which were all called by mykrobe, but not tb-profiler or drprg. They're all indels in ethA and de novo variant discovery was not triggered in drprg for any of them. In one of those FNs, there was a promoter mutation as well, which drprg did call, but it was filtered out for low FRS (0.55).

The other thing to note here is that while mykrobe's sensitivity is much better than drprg's and tbprofiler's, its specificity is terrible.

PZA

1 FN is pncA R154G which is called by mykrobe and tb-profiler. drprg calls the correct allele, but it is filtered out for low FRS (0.54)
2 FNs are indels called by mykrobe only. Both indels were null genotype (drprg calls this F for failed) as the depth was split evenly across two alleles.

INH

4 FPs were called confidently by all three tools (fabG1 C-15T)
3 FPs are deletions called by drprg but not by either of the other tools. They had below 10x depth in drprg, so they are probably not super confident.

I don't think there is much more I can do to improve drprg's nanopore performance here. FRS could possibly be lowered, but it would only save a small number of FNs/FPs.

I'll fix up the tb-profiler RIF sensitivity and then get stuck into the Illumina results

(This text file is my notepad while I was investigating these FNs and FPs)

drprg_nanopore_fn_fp_investigation.txt

mbhall88 · 2022-10-13T06:03:10Z

As our specificity is better or the same compared to mykrobe and tbprofiler for all drugs on Illumina, I only investigated the FNs.

tl;dr: there are three things we may want to try to improve sensitivity, as I suspect some of these problems will get bigger once we scale this analysis up to thousands of samples:

There are a noticeable number of variants where de novo discovery in pandora cannot find a path between the start and end kmer of a candidate region. All of the FNs that we miss but mykrobe and/or tbprofiler get are due to this, along with some RIF and ethionamide (ETO) FNs. The solution seems to be switching to Leandro's racon version of variant discovery in pandora. However, this is not very easy, as that fork and the current tip of pandora have diverged quite a bit. @iqbal-lab do you think it's realistic that Leandro could try to get this onto master in the next few weeks? Or will I have to try and do it myself?
There are 3 FNs in total that fail FRS, with values 0.54, 0.53, and 0.58. Maybe we should lower it even further? Does 0.50 or 0.51 sound fair?
tbprofiler has an unfair advantage over mykrobe (and drprg): there were quite a few minor-allele resistance calls. I have put mykrobe in haploid mode so it won't call minor resistance, and drprg can't call minor resistance either way. Should I let mykrobe call minor variants on Illumina, or set the minor allele frequency to 0.5 for tbprofiler?

Aside from those overarching points, there were also some other FNs caused by two variants right next to each other. For example, ERR2510154 has rpoB_S450F, which is actually caused by a 2 bp MNP. One of these positions exists in the reference PRG that drprg uses. The other variant gets discovered, but it is added as a separate allele (bubble) in the PRG. I guess this is how make_prg update works, though? Here it is:

rpoB    1449    fa44b92a        C       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_GTC1347G,rpoB_TC1348T,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_C1349CA,rpoB_C1349CAA,rpoB_C1349CAC,rpoB_C1349CAG,rpoB_C1349CAT,rpoB_C1349CC,rpoB_C1349CCA,rpoB_C1349CCC,rpoB_C1349CCG,rpoB_C1349CCT,rpoB_C1349CG,rpoB_C1349CGA,rpoB_C1349CGC,rpoB_C1349CGG,rpoB_C1349CGT,rpoB_C1349CT,rpoB_C1349CTA,rpoB_C1349CTC,rpoB_C1349CTG,rpoB_C1349CTT,rpoB_CG1349C,rpoB_CGG1349C;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,1:1,1:-278,-278:0
rpoB    1450    9fbca785        G       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_CG1349C,rpoB_CGG1349C,rpoB_G1350GA,rpoB_G1350GAA,rpoB_G1350GAC,rpoB_G1350GAG,rpoB_G1350GAT,rpoB_G1350GC,rpoB_G1350GCA,rpoB_G1350GCC,rpoB_G1350GCG,rpoB_G1350GCT,rpoB_G1350GG,rpoB_G1350GGA,rpoB_G1350GGC,rpoB_G1350GGG,rpoB_G1350GGT,rpoB_G1350GT,rpoB_G1350GTA,rpoB_G1350GTC,rpoB_G1350GTG,rpoB_G1350GTT,rpoB_GG1350G,rpoB_GGC1350G;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-278,-278:0

The allele of this variant is TCG>TTT, so pandora should be able to thread reads through both of these variants, but it doesn't seem to be able to.
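To see why the two SNPs need to be phased into one allele, here is a small sketch that applies them to codon 450, assuming (as TCG>TTT implies) that they hit the second and third bases of the reference codon TCG. It uses the standard genetic code; `translate` and `apply_snps` are hypothetical helpers written for this illustration.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon: str) -> str:
    """Translate a single DNA codon to its one-letter amino acid ('*' = stop)."""
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def apply_snps(codon: str, snps: Dict[int, str]) -> str:
    """Apply SNPs given as {0-based position within the codon: alt base}."""
    bases = list(codon)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)


ref = "TCG"  # codon 450, serine
print(translate(apply_snps(ref, {1: "T"})))          # L: second-position SNP alone
print(translate(apply_snps(ref, {2: "T"})))          # S: third-position SNP alone is synonymous
print(translate(apply_snps(ref, {1: "T", 2: "T"})))  # F: only the phased pair gives S450F
```

Called separately, the two SNPs would look like S450L plus a synonymous change; only when both are on the same path through the graph does the call become S450F.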

A similar thing happened for a few other FNs. I'm wondering if I should run on the full dataset and then manually add to the reference PRG some of the common variants that cause this problem? I was thinking the "correct" way to do this @iqbal-lab would be to look through the cryptic metadata sheets and add some samples that contain those variants that cause some of these problems?

drprg_illumina_fn_investigation.txt

iqbal-lab · 2022-10-13T09:32:15Z

Hi there, I'm a bit wiped out so will be brief

Leandro is buried in 2 projects (plasmid stuff using Pandora, and Karel's mof project) and is trying to extricate himself from the latter, so I think it would be hard for him to merge the racon branch soon. I must admit I had forgotten it wasn't merged, very sorry Michael. I think it would be great if you could do it. I think this could help a lot, and it might make the issue about adjacent variants redundant, as racon just does the whole gene.
I agree use 0.51, we've moved to that with covid.
How about we return to minor alleles after addressing the above?

In terms of adding more variants to the graph; we can do this, but racon might mean you don't need to

mbhall88 · 2022-10-19T04:33:08Z

I've just realised, we should probably put some kind of minimum depth filter on these results too. i.e. samples with less than d depth are excluded from the sensitivity/specificity plots.

Does everyone agree? If so, does anyone have a preference for what d should be? I arbitrarily thought of 15x? (This is separate from the depth analysis in #3)

Here is the depth distribution for the 400 Illumina test set

and the full nanopore

Additionally, it might be wise to have a contamination proportion filter. For instance, when I align the reads to the decontamination database, I calculate the fraction of reads that we keep (i.e. MTB), the fraction of reads that align to a contaminant, and the fraction of reads that are unmapped. Again, I was arbitrarily thinking of excluding samples with more than 5% contamination. Too harsh?

This is the fraction of contamination for Illumina

and nanopore
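A minimal sketch of the two proposed sample filters, using the 15x depth and 5% contamination cut-offs floated above as defaults. The `SampleQC` record and its read-count fields are hypothetical stand-ins for whatever the decontamination step actually reports; contamination is taken as contaminant reads over all reads (kept + contaminant + unmapped).

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record."""
    sample_id: str
    depth: float      # read depth for the sample (e.g. mean depth)
    n_keep: int       # reads kept as MTB
    n_contam: int     # reads aligning to a contaminant
    n_unmapped: int   # reads aligning to neither

    @property
    def contamination(self) -> float:
        total = self.n_keep + self.n_contam + self.n_unmapped
        return self.n_contam / total if total else 0.0


def passes_qc(s: SampleQC, min_depth: float = 15.0, max_contam: float = 0.05) -> bool:
    """Keep samples with depth >= min_depth and contamination <= max_contam."""
    return s.depth >= min_depth and s.contamination <= max_contam
```

Keeping both thresholds as parameters makes it easy to rerun the sensitivity/specificity tables at different cut-offs.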

iqbal-lab · 2022-10-19T06:14:15Z

Both seem perfectly reasonable to me

mbhall88 · 2022-10-28T01:18:43Z

So I have a working drprg branch adapted to use pandora with the racon denovo method (iqbal-lab-org/pandora#299).

I've tested it out on two Illumina runs listed in #2.

The first, ERR2510154, was the example VCF above. So, with the old pandora de novo process, at the allele for rpoB_S450F we had:

rpoB    1449    fa44b92a        C       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_GTC1347G,rpoB_TC1348T,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_C1349CA,rpoB_C1349CAA,rpoB_C1349CAC,rpoB_C1349CAG,rpoB_C1349CAT,rpoB_C1349CC,rpoB_C1349CCA,rpoB_C1349CCC,rpoB_C1349CCG,rpoB_C1349CCT,rpoB_C1349CG,rpoB_C1349CGA,rpoB_C1349CGC,rpoB_C1349CGG,rpoB_C1349CGT,rpoB_C1349CT,rpoB_C1349CTA,rpoB_C1349CTC,rpoB_C1349CTG,rpoB_C1349CTT,rpoB_CG1349C,rpoB_CGG1349C;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,1:1,1:-278,-278:0
rpoB    1450    9fbca785        G       T       .       ld;lgc  VC=SNP;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_CG1349C,rpoB_CGG1349C,rpoB_G1350GA,rpoB_G1350GAA,rpoB_G1350GAC,rpoB_G1350GAG,rpoB_G1350GAT,rpoB_G1350GC,rpoB_G1350GCA,rpoB_G1350GCC,rpoB_G1350GCG,rpoB_G1350GCT,rpoB_G1350GG,rpoB_G1350GGA,rpoB_G1350GGC,rpoB_G1350GGG,rpoB_G1350GGT,rpoB_G1350GT,rpoB_G1350GTA,rpoB_G1350GTC,rpoB_G1350GTG,rpoB_G1350GTT,rpoB_GG1350G,rpoB_GGC1350G;PREDICT=F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F        GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-278,-278:0

With the new de novo process, we get:

rpoB    1449    ba1603d4        CG      TG,TT   .       PASS    VC=PH_SNPs;GRAPHTYPE=SIMPLE;VARID=rpoB_ACTGTCGGCG1344A,rpoB_GTC1347G,rpoB_TC1348T,rpoB_S450X,rpoB_TCG1348T,rpoB_S450*,rpoB_C1349CA,rpoB_C1349CAA,rpoB_C1349CAC,rpoB_C1349CAG,rpoB_C1349CAT,rpoB_C1349CC,rpoB_C1349CCA,rpoB_C1349CCC,rpoB_C1349CCG,rpoB_C1349CCT,rpoB_C1349CG,rpoB_C1349CGA,rpoB_C1349CGC,rpoB_C1349CGG,rpoB_C1349CGT,rpoB_C1349CT,rpoB_C1349CTA,rpoB_C1349CTC,rpoB_C1349CTG,rpoB_C1349CTT,rpoB_CG1349C,rpoB_CGG1349C,rpoB_G1350GA,rpoB_G1350GAA,rpoB_G1350GAC,rpoB_G1350GAG,rpoB_G1350GAT,rpoB_G1350GC,rpoB_G1350GCA,rpoB_G1350GCC,rpoB_G1350GCG,rpoB_G1350GCT,rpoB_G1350GG,rpoB_G1350GGA,rpoB_G1350GGC,rpoB_G1350GGG,rpoB_G1350GGT,rpoB_G1350GT,rpoB_G1350GTA,rpoB_G1350GTC,rpoB_G1350GTG,rpoB_G1350GTT,rpoB_GG1350G,rpoB_GGC1350G;PREDICT=S,S,S,R,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S,S     GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF   2:0,0,51:0,0,41:0,0,61:0,0,48:0,0,308:0,1,251:1,1,0.166667:-703.676,-703.676,-35.8875:667.788

I guess another reason this might have been fixed is that, instead of using make_prg update to add the de novo sequences into the PRG, we now recreate the MSAs and rebuild the PRGs for those genes with novel variants. So this particular case could be a weakness of make_prg update, as it just added the novel variant (rpoB 1450 G>T) without combining it with the previous position into a single allele.

The second run, ERR4828599, had both a RIF FN and an INH FN. The RIF FN was an interesting case where the isolate has both L449M and S450F. drprg/pandora previously failed to find a novel variant. With the new pandora denovo process, we found (and called) both of these variants. The INH FN was katG S315N, which is a rarer mutation at that locus - normally S315T. Previously both mykrobe and drprg had no depth in this area and drprg did not find a novel variant. With the new pandora, we do find and call this mutation.

I'm going to run a few more of the Illumina FNs, but this is very promising!

mbhall88 · 2022-11-24T02:06:24Z

Okay, since the last update of results we have switched to using racon for de novo discovery and dropped the old nanopore data. I have also increased the number of Illumina samples to 8,587.

Illumina

Note: I am going to change the markers so you can see the error bars, now that they are so small.

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	77(485)	50(6958)	84.1% (80.6-87.1%)	99.3% (99.1-99.5%)	0.857
Amikacin	mykrobe	101(485)	46(6958)	79.2% (75.3-82.6%)	99.3% (99.1-99.5%)	0.831
Amikacin	tbprofiler	62(485)	59(6958)	87.2% (83.9-89.9%)	99.2% (98.9-99.3%)	0.866
Capreomycin	drprg	62(235)	92(2449)	73.6% (67.6-78.8%)	96.2% (95.4-96.9%)	0.662
Capreomycin	mykrobe	78(235)	85(2449)	66.8% (60.6-72.5%)	96.5% (95.7-97.2%)	0.625
Capreomycin	tbprofiler	54(235)	96(2449)	77.0% (71.2-81.9%)	96.1% (95.2-96.8%)	0.679
Delamanid	drprg	111(116)	1(8152)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.188
Delamanid	mykrobe	111(116)	1(8152)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.188
Delamanid	tbprofiler	111(116)	2(8152)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	146(1538)	736(4936)	90.5% (88.9-91.9%)	85.1% (84.1-86.1%)	0.685
Ethambutol	mykrobe	149(1538)	728(4936)	90.3% (88.7-91.7%)	85.3% (84.2-86.2%)	0.686
Ethambutol	tbprofiler	118(1538)	765(4936)	92.3% (90.9-93.6%)	84.5% (83.5-85.5%)	0.691
Ethionamide	drprg	341(1104)	372(6105)	69.1% (66.3-71.8%)	93.9% (93.3-94.5%)	0.623
Ethionamide	mykrobe	276(1104)	395(6105)	75.0% (72.4-77.5%)	93.5% (92.9-94.1%)	0.658
Ethionamide	tbprofiler	272(1104)	414(6105)	75.4% (72.7-77.8%)	93.2% (92.6-93.8%)	0.653
Isoniazid	drprg	362(3900)	164(4194)	90.7% (89.8-91.6%)	96.1% (95.5-96.6%)	0.871
Isoniazid	mykrobe	366(3900)	163(4194)	90.6% (89.7-91.5%)	96.1% (95.5-96.7%)	0.87
Isoniazid	tbprofiler	297(3900)	181(4194)	92.4% (91.5-93.2%)	95.7% (95.0-96.3%)	0.882
Kanamycin	drprg	142(670)	101(6975)	78.8% (75.6-81.7%)	98.6% (98.2-98.8%)	0.796
Kanamycin	mykrobe	166(670)	96(6975)	75.2% (71.8-78.3%)	98.6% (98.3-98.9%)	0.776
Kanamycin	tbprofiler	122(670)	107(6975)	81.8% (78.7-84.5%)	98.5% (98.1-98.7%)	0.811
Levofloxacin	drprg	105(1040)	97(5454)	89.9% (87.9-91.6%)	98.2% (97.8-98.5%)	0.884
Levofloxacin	mykrobe	108(1040)	97(5454)	89.6% (87.6-91.3%)	98.2% (97.8-98.5%)	0.882
Levofloxacin	tbprofiler	85(1040)	109(5454)	91.8% (90.0-93.3%)	98.0% (97.6-98.3%)	0.89
Linezolid	drprg	49(65)	4(6110)	24.6% (15.8-36.3%)	99.9% (99.8-100.0%)	0.441
Linezolid	mykrobe	49(65)	4(6110)	24.6% (15.8-36.3%)	99.9% (99.8-100.0%)	0.441
Linezolid	tbprofiler	48(65)	5(6110)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.447
Moxifloxacin	drprg	60(603)	464(5431)	90.0% (87.4-92.2%)	91.5% (90.7-92.2%)	0.656
Moxifloxacin	mykrobe	59(603)	460(5431)	90.2% (87.6-92.3%)	91.5% (90.8-92.2%)	0.658
Moxifloxacin	tbprofiler	42(603)	482(5431)	93.0% (90.7-94.8%)	91.1% (90.3-91.9%)	0.668
Ofloxacin	drprg	31(105)	4(424)	70.5% (61.2-78.4%)	99.1% (97.6-99.6%)	0.782
Ofloxacin	mykrobe	32(105)	4(424)	69.5% (60.2-77.5%)	99.1% (97.6-99.6%)	0.776
Ofloxacin	tbprofiler	26(105)	6(424)	75.2% (66.2-82.5%)	98.6% (96.9-99.3%)	0.802
Pyrazinamide	drprg	75(341)	47(822)	78.0% (73.3-82.1%)	94.3% (92.5-95.7%)	0.742
Pyrazinamide	mykrobe	73(341)	45(822)	78.6% (73.9-82.6%)	94.5% (92.8-95.9%)	0.751
Pyrazinamide	tbprofiler	45(341)	62(822)	86.8% (82.8-90.0%)	92.5% (90.4-94.1%)	0.782
Rifampicin	drprg	142(3222)	166(4586)	95.6% (94.8-96.2%)	96.4% (95.8-96.9%)	0.919
Rifampicin	mykrobe	187(3222)	165(4586)	94.2% (93.3-95.0%)	96.4% (95.8-96.9%)	0.907
Rifampicin	tbprofiler	102(3222)	177(4586)	96.8% (96.2-97.4%)	96.1% (95.5-96.7%)	0.927
Streptomycin	drprg	278(1042)	130(1205)	73.3% (70.6-75.9%)	89.2% (87.3-90.8%)	0.637
Streptomycin	mykrobe	295(1042)	132(1205)	71.7% (68.9-74.3%)	89.0% (87.2-90.7%)	0.621
Streptomycin	tbprofiler	257(1042)	136(1205)	75.3% (72.6-77.9%)	88.7% (86.8-90.4%)	0.649

Nanopore

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Amikacin	mykrobe	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Amikacin	tbprofiler	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Capreomycin	drprg	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Capreomycin	mykrobe	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Capreomycin	tbprofiler	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Ethambutol	drprg	4(14)	15(77)	71.4% (45.4-88.3%)	80.5% (70.3-87.8%)	0.42
Ethambutol	mykrobe	4(14)	15(77)	71.4% (45.4-88.3%)	80.5% (70.3-87.8%)	0.42
Ethambutol	tbprofiler	5(14)	15(77)	64.3% (38.8-83.7%)	80.5% (70.3-87.8%)	0.367
Ethionamide	drprg	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Ethionamide	mykrobe	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Ethionamide	tbprofiler	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Isoniazid	drprg	9(51)	4(48)	82.4% (69.7-90.4%)	91.7% (80.4-96.7%)	0.742
Isoniazid	mykrobe	9(51)	4(48)	82.4% (69.7-90.4%)	91.7% (80.4-96.7%)	0.742
Isoniazid	tbprofiler	9(51)	3(48)	82.4% (69.7-90.4%)	93.8% (83.2-97.9%)	0.764
Kanamycin	drprg	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Kanamycin	mykrobe	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Kanamycin	tbprofiler	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Moxifloxacin	drprg	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	mykrobe	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	tbprofiler	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Ofloxacin	drprg	0(10)	4(77)	100.0% (72.2-100.0%)	94.8% (87.4-98.0%)	0.823
Ofloxacin	mykrobe	0(10)	4(77)	100.0% (72.2-100.0%)	94.8% (87.4-98.0%)	0.823
Ofloxacin	tbprofiler	0(10)	3(77)	100.0% (72.2-100.0%)	96.1% (89.2-98.7%)	0.86
Pyrazinamide	drprg	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Pyrazinamide	mykrobe	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Pyrazinamide	tbprofiler	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Rifampicin	drprg	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Rifampicin	mykrobe	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Rifampicin	tbprofiler	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Streptomycin	drprg	2(8)	14(83)	75.0% (40.9-92.9%)	83.1% (73.7-89.7%)	0.398
Streptomycin	mykrobe	2(8)	27(83)	75.0% (40.9-92.9%)	67.5% (56.8-76.6%)	0.25
Streptomycin	tbprofiler	2(8)	12(83)	75.0% (40.9-92.9%)	85.5% (76.4-91.5%)	0.43

lachlancoin · 2022-11-24T02:14:38Z

Looks pretty good I think

…

On Thu, 24 Nov 2022, 1:06 pm Michael Hall wrote:

Okay, since the last update of results we have switched to using racon for de novo discovery and dropped the old nanopore data. I have also increased the number of Illumina samples to 8,587.

Illumina

Note: I am going to change the markers so you can see the error bars now that they are so small.

[image: Illumina results plot] <https://user-images.githubusercontent.com/20403931/203677683-39591450-c68e-47a8-9b2c-d77f511c5b0e.png>

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	77(485)	50(6958)	84.1% (80.6-87.1%)	99.3% (99.1-99.5%)	0.857
Amikacin	mykrobe	101(485)	46(6958)	79.2% (75.3-82.6%)	99.3% (99.1-99.5%)	0.831
Amikacin	tbprofiler	62(485)	59(6958)	87.2% (83.9-89.9%)	99.2% (98.9-99.3%)	0.866
Capreomycin	drprg	62(235)	92(2449)	73.6% (67.6-78.8%)	96.2% (95.4-96.9%)	0.662
Capreomycin	mykrobe	78(235)	85(2449)	66.8% (60.6-72.5%)	96.5% (95.7-97.2%)	0.625
Capreomycin	tbprofiler	54(235)	96(2449)	77.0% (71.2-81.9%)	96.1% (95.2-96.8%)	0.679
Delamanid	drprg	111(116)	1(8152)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.188
Delamanid	mykrobe	111(116)	1(8152)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.188
Delamanid	tbprofiler	111(116)	2(8152)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	146(1538)	736(4936)	90.5% (88.9-91.9%)	85.1% (84.1-86.1%)	0.685
Ethambutol	mykrobe	149(1538)	728(4936)	90.3% (88.7-91.7%)	85.3% (84.2-86.2%)	0.686
Ethambutol	tbprofiler	118(1538)	765(4936)	92.3% (90.9-93.6%)	84.5% (83.5-85.5%)	0.691
Ethionamide	drprg	341(1104)	372(6105)	69.1% (66.3-71.8%)	93.9% (93.3-94.5%)	0.623
Ethionamide	mykrobe	276(1104)	395(6105)	75.0% (72.4-77.5%)	93.5% (92.9-94.1%)	0.658
Ethionamide	tbprofiler	272(1104)	414(6105)	75.4% (72.7-77.8%)	93.2% (92.6-93.8%)	0.653
Isoniazid	drprg	362(3900)	164(4194)	90.7% (89.8-91.6%)	96.1% (95.5-96.6%)	0.871
Isoniazid	mykrobe	366(3900)	163(4194)	90.6% (89.7-91.5%)	96.1% (95.5-96.7%)	0.87
Isoniazid	tbprofiler	297(3900)	181(4194)	92.4% (91.5-93.2%)	95.7% (95.0-96.3%)	0.882
Kanamycin	drprg	142(670)	101(6975)	78.8% (75.6-81.7%)	98.6% (98.2-98.8%)	0.796
Kanamycin	mykrobe	166(670)	96(6975)	75.2% (71.8-78.3%)	98.6% (98.3-98.9%)	0.776
Kanamycin	tbprofiler	122(670)	107(6975)	81.8% (78.7-84.5%)	98.5% (98.1-98.7%)	0.811
Levofloxacin	drprg	105(1040)	97(5454)	89.9% (87.9-91.6%)	98.2% (97.8-98.5%)	0.884
Levofloxacin	mykrobe	108(1040)	97(5454)	89.6% (87.6-91.3%)	98.2% (97.8-98.5%)	0.882
Levofloxacin	tbprofiler	85(1040)	109(5454)	91.8% (90.0-93.3%)	98.0% (97.6-98.3%)	0.89
Linezolid	drprg	49(65)	4(6110)	24.6% (15.8-36.3%)	99.9% (99.8-100.0%)	0.441
Linezolid	mykrobe	49(65)	4(6110)	24.6% (15.8-36.3%)	99.9% (99.8-100.0%)	0.441
Linezolid	tbprofiler	48(65)	5(6110)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.447
Moxifloxacin	drprg	60(603)	464(5431)	90.0% (87.4-92.2%)	91.5% (90.7-92.2%)	0.656
Moxifloxacin	mykrobe	59(603)	460(5431)	90.2% (87.6-92.3%)	91.5% (90.8-92.2%)	0.658
Moxifloxacin	tbprofiler	42(603)	482(5431)	93.0% (90.7-94.8%)	91.1% (90.3-91.9%)	0.668
Ofloxacin	drprg	31(105)	4(424)	70.5% (61.2-78.4%)	99.1% (97.6-99.6%)	0.782
Ofloxacin	mykrobe	32(105)	4(424)	69.5% (60.2-77.5%)	99.1% (97.6-99.6%)	0.776
Ofloxacin	tbprofiler	26(105)	6(424)	75.2% (66.2-82.5%)	98.6% (96.9-99.3%)	0.802
Pyrazinamide	drprg	75(341)	47(822)	78.0% (73.3-82.1%)	94.3% (92.5-95.7%)	0.742
Pyrazinamide	mykrobe	73(341)	45(822)	78.6% (73.9-82.6%)	94.5% (92.8-95.9%)	0.751
Pyrazinamide	tbprofiler	45(341)	62(822)	86.8% (82.8-90.0%)	92.5% (90.4-94.1%)	0.782
Rifampicin	drprg	142(3222)	166(4586)	95.6% (94.8-96.2%)	96.4% (95.8-96.9%)	0.919
Rifampicin	mykrobe	187(3222)	165(4586)	94.2% (93.3-95.0%)	96.4% (95.8-96.9%)	0.907
Rifampicin	tbprofiler	102(3222)	177(4586)	96.8% (96.2-97.4%)	96.1% (95.5-96.7%)	0.927
Streptomycin	drprg	278(1042)	130(1205)	73.3% (70.6-75.9%)	89.2% (87.3-90.8%)	0.637
Streptomycin	mykrobe	295(1042)	132(1205)	71.7% (68.9-74.3%)	89.0% (87.2-90.7%)	0.621
Streptomycin	tbprofiler	257(1042)	136(1205)	75.3% (72.6-77.9%)	88.7% (86.8-90.4%)	0.649

Nanopore

[image: Nanopore results plot] <https://user-images.githubusercontent.com/20403931/203677879-e90cc0ce-d034-4cfb-a49f-85b72afca86b.png>

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Amikacin	mykrobe	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Amikacin	tbprofiler	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Capreomycin	drprg	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Capreomycin	mykrobe	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Capreomycin	tbprofiler	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Ethambutol	drprg	4(14)	15(77)	71.4% (45.4-88.3%)	80.5% (70.3-87.8%)	0.42
Ethambutol	mykrobe	4(14)	15(77)	71.4% (45.4-88.3%)	80.5% (70.3-87.8%)	0.42
Ethambutol	tbprofiler	5(14)	15(77)	64.3% (38.8-83.7%)	80.5% (70.3-87.8%)	0.367
Ethionamide	drprg	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Ethionamide	mykrobe	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Ethionamide	tbprofiler	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Isoniazid	drprg	9(51)	4(48)	82.4% (69.7-90.4%)	91.7% (80.4-96.7%)	0.742
Isoniazid	mykrobe	9(51)	4(48)	82.4% (69.7-90.4%)	91.7% (80.4-96.7%)	0.742
Isoniazid	tbprofiler	9(51)	3(48)	82.4% (69.7-90.4%)	93.8% (83.2-97.9%)	0.764
Kanamycin	drprg	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Kanamycin	mykrobe	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Kanamycin	tbprofiler	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Moxifloxacin	drprg	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	mykrobe	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	tbprofiler	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Ofloxacin	drprg	0(10)	4(77)	100.0% (72.2-100.0%)	94.8% (87.4-98.0%)	0.823
Ofloxacin	mykrobe	0(10)	4(77)	100.0% (72.2-100.0%)	94.8% (87.4-98.0%)	0.823
Ofloxacin	tbprofiler	0(10)	3(77)	100.0% (72.2-100.0%)	96.1% (89.2-98.7%)	0.86
Pyrazinamide	drprg	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Pyrazinamide	mykrobe	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Pyrazinamide	tbprofiler	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Rifampicin	drprg	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Rifampicin	mykrobe	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Rifampicin	tbprofiler	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Streptomycin	drprg	2(8)	14(83)	75.0% (40.9-92.9%)	83.1% (73.7-89.7%)	0.398
Streptomycin	mykrobe	2(8)	27(83)	75.0% (40.9-92.9%)	67.5% (56.8-76.6%)	0.25
Streptomycin	tbprofiler	2(8)	12(83)	75.0% (40.9-92.9%)	85.5% (76.4-91.5%)	0.43

iqbal-lab · 2022-11-24T02:18:00Z

Yeah!

mbhall88 · 2022-11-24T03:26:32Z

I disagree sadly haha. TBProfiler is beating us on a lot of drugs. Going to dig into why that is now.

iqbal-lab · 2022-11-24T06:50:32Z

Looking again (now on my laptop), if I were to summarise those results:

On Illumina, tb-profiler often has the highest sensitivity. It does pay a very small price in specificity, but that is much less noticeable than the sensitivity increase. So I agree, it's good to look into that.

On Nanopore, the sensitivity of all tools is essentially identical (except that tb-profiler has a problem on EMB). Specificity is also essentially identical, although for two drugs (streptomycin and ofloxacin) tb-profiler has slightly higher specificity.
I'm quite impressed/surprised by how well all 3 do on the 4 drugs where any frameshift in a gene causes a resistant call. It matches what you found for Mykrobe in your Lancet Microbe paper @mbhall88, but I'm delighted it's also true for DrPrg. I'm also a bit surprised that tb-profiler does that well too, given that it uses bcftools; we didn't find we could call indels with this level of specificity. (I guess just refusing to make indel calls with nanopore would give very high specificity?)
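The frameshift rule mentioned here is simple to state in code: an indel shifts the reading frame when its net length change is not a multiple of three. A minimal sketch, assuming VCF-style REF/ALT strings; it checks each indel on its own, so it does not notice compensating indels that restore the frame, and the helper `gene_has_frameshift` is hypothetical rather than part of drprg, Mykrobe or TBProfiler.

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """True if replacing REF with ALT shifts the reading frame.

    Only the net length change matters, so a shared VCF anchor base cancels out.
    """
    return (len(alt) - len(ref)) % 3 != 0


def gene_has_frameshift(variants: list[tuple[str, str]]) -> bool:
    """Hypothetical helper: flag a gene if any single called variant is a frameshift."""
    return any(is_frameshift(ref, alt) for ref, alt in variants)


print(is_frameshift("AC", "A"))                          # True  (1 bp deletion)
print(is_frameshift("A", "ATTG"))                        # False (3 bp in-frame insertion)
print(gene_has_frameshift([("G", "T"), ("CA", "C")]))    # True
```

SNPs have a length change of zero, so they never count as frameshifts under this rule.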

mbhall88 · 2022-11-25T05:48:03Z

I've been looking through the variants where drprg is FN but either of the other tools is TP (on Illumina) to see what we have missed. I'm not finished yet, but a lot of the tbprofiler TPs where we are FN are to do with minor alleles. By default, TBProfiler will call anything with a fraction of 0.1 or more. This brings up point 3 from #2 again. We tell mykrobe to run in haploid mode, and drprg only runs in haploid mode. The options I see going forward are:

1. Run mykrobe in diploid mode (call minor alleles) and take the sensitivity hit in drprg, as we can't call minor alleles
2. Force fraction 0.5 in tbprofiler so that all tools are effectively in haploid mode
3. Implement a diploid model in pandora (not sure how much work this would be? will alert Leandro to get his input too)

Option 2 is obviously the easiest and the most likely to make us look better, but it sits somewhat uncomfortably with me, as we are kind of skewing the results in our favour, right?

I will keep working through these results next week for other drugs, as there are also a few cases of weird indels, which I will document when I have a better understanding of what's going on.

iqbal-lab · 2022-11-25T07:26:14Z

I think detecting minors could easily be done directly in drprg, no need to implement it in Pandora. You get coverage info on the S and R alleles, right? Just ask if the coverage on any R allele is >0.1 of the total.
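The suggestion in this comment can be sketched directly: ignore the called genotype and check each resistance allele's share of the total coverage. A minimal sketch, assuming per-allele coverage is already available as a list (how drprg obtains it is not shown) and using the strict `>` from the comment, whereas TBProfiler is described in the thread as calling fractions of 0.1 or more. `minor_resistance_alleles` is a hypothetical name.

```python
def minor_resistance_alleles(
    allele_covs: list[float],
    resistant: set[int],
    min_frac: float = 0.1,
) -> list[int]:
    """Indices of resistance alleles supported by more than `min_frac` of reads.

    allele_covs[i] is the read coverage on allele i (0 = REF). The called
    (major) genotype is deliberately ignored, so a resistance allele can be
    reported even when the susceptible allele is the one called.
    """
    total = sum(allele_covs)
    if total == 0:
        return []  # no coverage at this site, so no evidence either way
    return [i for i in sorted(resistant) if allele_covs[i] / total > min_frac]


# 85 reads support the susceptible REF, 15 the resistance ALT:
print(minor_resistance_alleles([85, 15], {1}))                # [1]
print(minor_resistance_alleles([85, 15], {1}, min_frac=0.5))  # [] (haploid-like)
```

Raising `min_frac` to 0.5 reproduces the effect of option 2 above: only majority alleles are ever reported.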

leoisl · 2022-11-25T13:00:44Z

3. implement a diploid model in pandora (not sure how much work this would be? will alert Leandro to get his input too)

I don't know how much work this would be either, because my only experience with genotyping models is in pandora, which has a haploid model. If implementing a diploid model simply means calling the two most likely alleles, then a simple implementation (get the most likely allele, as currently implemented, then get the second most likely by removing/ignoring the most likely one and rerunning the genotyping algorithm) might not be hard. This could easily be generalised to n-ploid... but I don't think it is as simple as this...
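The "call the best allele, drop it, re-genotype" idea generalises to n alleles with a simple loop. A minimal sketch, assuming a hypothetical `genotype` callable standing in for pandora's haploid genotyper; it always returns up to `n` alleles and has no stopping rule or genotype confidence, which is where the real difficulty lies.

```python
from typing import Callable, Sequence

# Hypothetical interface: given candidate alleles, return the most likely one.
Genotyper = Callable[[Sequence[str]], str]


def call_top_alleles(alleles: Sequence[str], genotype: Genotyper, n: int = 2) -> list[str]:
    """Call up to n alleles by repeatedly running a haploid genotyper and
    removing each winner from the candidate set before the next run."""
    remaining = list(alleles)
    called: list[str] = []
    while remaining and len(called) < n:
        best = genotype(remaining)
        called.append(best)
        remaining.remove(best)
    return called


# Toy stand-in genotyper: picks the allele with the most supporting reads.
coverage = {"ref": 70, "alt1": 25, "alt2": 5}


def most_covered(cands: Sequence[str]) -> str:
    return max(cands, key=lambda a: coverage[a])


print(call_top_alleles(list(coverage), most_covered, n=2))  # ['ref', 'alt1']
```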

iqbal-lab · 2022-11-25T22:29:45Z

The problem is it's not really diploid. Minor alleles in bacteria sometimes occur at a few (or many) places across the genome, but at different frequencies. Diploid assumes 50/50. Mykrobe uses a kind of diploid model, but it's a hack, and the genotype confidence is not well calibrated.
In Pandora, there are a bunch of things you could do, but you're describing whole genome variation. Maybe you'd say "looks like there is a mixture of 2 genomes at 70:30 ratio". Or "lots of mixed positions in this data, looks dodgy".
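For the whole-genome case, a crude version of "looks like a 70:30 mixture" or "lots of mixed positions" can be computed from per-site minor-allele fractions. This is a naive sketch, not something pandora does: sites with an intermediate minor fraction are counted as mixed, and their median fraction is used as a rough estimate of the minor genome's share. The `low`/`high` thresholds are arbitrary assumptions; a real approach would model sequencing error and coverage.

```python
from statistics import median
from typing import Optional


def summarise_mixture(
    minor_fracs: list[float], low: float = 0.05, high: float = 0.5
) -> tuple[int, Optional[float]]:
    """Count 'mixed' sites and estimate the minor component's share.

    minor_fracs holds, for each variant site, the fraction of reads that
    support the less common allele. Sites with a fraction in [low, high]
    count as mixed; the median over those sites is a crude estimate of the
    minor genome's proportion (None if no site is mixed).
    """
    mixed = [f for f in minor_fracs if low <= f <= high]
    return len(mixed), (median(mixed) if mixed else None)


n_mixed, share = summarise_mixture([0.0, 0.31, 0.28, 0.02, 0.30, 0.33])
print(n_mixed, round(share, 3))  # 4 0.305  (roughly a 70:30 mixture)
```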
In Mykrobe and drprg, you have positions you care about, and knowledge that low-frequency resistance alleles cause drug resistance. So you just need to spot them, independently, at each SNP. Pandora makes a VCF, which I believe drprg parses here (https://github.com/mbhall88/drprg/blob/265c25c9e027a26f8c671931a736e19da399142e/src/predict.rs#L402). The VCF has coverage on both alleles, or all alleles, so you can just parse that to spot minors.
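Reading per-allele coverage from a VCF sample column only needs the FORMAT keys and a split on `:` and `,`. A minimal sketch, assuming pandora writes comma-separated, per-allele forward and reverse coverage under keys named `MEAN_FWD_COVG` and `MEAN_REV_COVG`; those key names are an assumption to check against an actual pandora VCF header, and the record below is made up. drprg itself is written in Rust, so this only illustrates the logic.

```python
def allele_coverages(format_field: str, sample_field: str,
                     fwd_key: str = "MEAN_FWD_COVG",
                     rev_key: str = "MEAN_REV_COVG") -> list[float]:
    """Per-allele coverage (forward + reverse) from one VCF sample column.

    Assumes fwd_key and rev_key hold one comma-separated value per allele,
    REF first; the default key names are assumptions about pandora's VCF.
    """
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    fwd = [float(x) for x in values[fwd_key].split(",")]
    rev = [float(x) for x in values[rev_key].split(",")]
    return [f + r for f, r in zip(fwd, rev)]


# Made-up record: REF called, but 15 of 100 reads support the ALT allele.
fmt = "GT:MEAN_FWD_COVG:MEAN_REV_COVG:GT_CONF"
sample = "0:40,6:45,9:120.5"
print(allele_coverages(fmt, sample))  # [85.0, 15.0]
```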

mbhall88 · 2022-11-27T21:45:36Z

I think detecting minors could easily be done directly in drprg, no need to implement in Pandora. You get coverage info on the S and R alleles right? Just ask if the coverage on any R allele is >0.1 of the total

True. I'll have to do some reimplementing though as I currently only pay attention to the called alleles. But it shouldn't take too long to get this working 🤞

iqbal-lab · 2022-11-27T21:56:31Z

Hurrah! I would reread the section on minor alleles here
https://wellcomeopenresearch.org/articles/4-191
I just reread it and it was informative, reminded me of differences between drugs

iqbal-lab · 2022-12-15T06:37:07Z

Two solutions

1. These minor alleles must by definition be catalogue SNPs. So they are in our graph, but the problem is that racon has found something new in our consensus that is close to the minor allele, and the combination of both is not in the graph. Right? So all we need to do is rebuild the PRG including the catalogue and also the racon call?
2. There is a clean solution that requires a fair bit of new code. At the end of drprg, take the consensus and align it to the reference so we know which positions in the consensus correspond to catalogue SNP coordinates. Then minimap the reads to the consensus (I see a Rust port of minimap exists, but it looks maybe immature? https://crates.io/crates/minimap2). Then just count minor allele bases in the pileup at the catalogue SNP coordinates.
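The second solution hinges on two small steps: translating a catalogue (reference) coordinate into consensus coordinates via the consensus-to-reference alignment, then counting bases at that position. A minimal sketch, assuming the alignment is given as a SAM-style CIGAR string with the consensus as the query, and that reads are hypothetical gapless `(start, sequence)` pairs on the consensus; a real implementation would take both from minimap2 output.

```python
import re
from collections import Counter
from typing import Optional

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_to_query(cigar: str, ref_start: int, ref_pos: int) -> Optional[int]:
    """Translate a 0-based reference coordinate into the query (consensus)
    coordinate of one alignment. Returns None if the base is deleted in
    the query or lies outside the aligned region."""
    r, q = ref_start, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":  # consumes reference and query
            if r <= ref_pos < r + n:
                return q + (ref_pos - r)
            r += n
            q += n
        elif op in "IS":  # consumes query only
            q += n
        elif op in "DN":  # consumes reference only
            if r <= ref_pos < r + n:
                return None
            r += n
        # H and P consume neither sequence
    return None


def base_counts(reads: list[tuple[int, str]], pos: int) -> Counter:
    """Pile up bases at consensus position `pos` from gapless (start, seq) reads."""
    counts: Counter = Counter()
    for start, seq in reads:
        if start <= pos < start + len(seq):
            counts[seq[pos - start]] += 1
    return counts


# Catalogue SNP at reference position 5; the consensus has a 2 bp insertion
# after reference position 2, so the SNP sits at consensus position 7.
pos = ref_to_query("3M2I10M", ref_start=0, ref_pos=5)
reads = [(5, "ACGTA"), (6, "CGTT"), (7, "TTT"), (0, "AAAA")]
print(pos, base_counts(reads, pos))  # 7 Counter({'G': 2, 'T': 1})
```

The minor-allele fraction at the SNP is then just the count of the resistance base divided by the total in the `Counter`.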

iqbal-lab · 2022-12-15T07:06:31Z

This is very exciting and good news really; there is a lot of sensitivity gain to be had from the minor alleles and gene deletions.

mbhall88 · 2022-12-15T20:50:20Z

For point 1, no, that is not right. We don't have these variants in the graph, which is the problem. (Remember our graph is not the panel, but the sparse population PRG built from randomly sampled CRyPTIC samples.) And racon can't find them in these samples because they're only minor alleles. Racon will find the major allele: the reference.
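Why a consensus step cannot surface these variants can be illustrated with a per-column majority vote. This is a deliberate simplification: racon polishes via partial-order alignment rather than simple column voting, but, as the comment says, a consensus ends up carrying the major allele, and a 20% minor allele disappears in the same way here.

```python
from collections import Counter


def majority_consensus(columns: list[str]) -> str:
    """Build a consensus by taking the most common base in each column.

    Each string in `columns` holds the bases observed at one position.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


# Position 1 carries a 20% minor allele (T) that never reaches the consensus.
cols = ["AAAAA", "CCCCT", "GGGGG"]
print(majority_consensus(cols))  # ACG
```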

Point 2 seems like it effectively does away with the need for pandora though; isn't that basically what tbprofiler does? It would also dramatically increase our runtime and memory usage, which at the moment are really our biggest selling point.

iqbal-lab · 2022-12-15T22:45:37Z

Ah, I did forget
No, it doesn't do away with the need for Pandora at all; mapping to the consensus will be dominated by exact matches (for Illumina), so it is much easier to spot minors.

I need to think!

iqbal-lab · 2022-12-15T22:59:20Z

Follow-up to 2: I'm not pushing for this solution, but just to say, we do this for COVID (30 kb genome) and use <500 MB RAM and 45 seconds for the whole process. I think performance is not a barrier, but there are other arguments not to do it.

mbhall88 · 2023-01-03T03:46:07Z

After closing mbhall88/drprg#23 the current (Illumina) results are

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	68(484)	57(6958)	86.0% (82.6-88.8%)	99.2% (98.9-99.4%)	0.86
Amikacin	mykrobe	93(484)	51(6958)	80.8% (77.0-84.0%)	99.3% (99.0-99.4%)	0.835
Amikacin	tbprofiler	62(484)	59(6958)	87.2% (83.9-89.9%)	99.2% (98.9-99.3%)	0.866
Capreomycin	drprg	57(235)	94(2448)	75.7% (69.9-80.8%)	96.2% (95.3-96.9%)	0.673
Capreomycin	mykrobe	72(235)	87(2448)	69.4% (63.2-74.9%)	96.4% (95.6-97.1%)	0.64
Capreomycin	tbprofiler	54(235)	95(2448)	77.0% (71.2-81.9%)	96.1% (95.3-96.8%)	0.681
Delamanid	drprg	111(116)	5(8151)	4.3% (1.9-9.7%)	99.9% (99.9-100.0%)	0.144
Delamanid	mykrobe	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Delamanid	tbprofiler	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	121(1537)	752(4935)	92.1% (90.7-93.4%)	84.8% (83.7-85.7%)	0.693
Ethambutol	mykrobe	133(1537)	747(4935)	91.3% (89.8-92.7%)	84.9% (83.8-85.8%)	0.688
Ethambutol	tbprofiler	118(1537)	765(4935)	92.3% (90.9-93.6%)	84.5% (83.5-85.5%)	0.691
Ethionamide	drprg	273(1103)	417(6105)	75.2% (72.6-77.7%)	93.2% (92.5-93.8%)	0.651
Ethionamide	mykrobe	265(1103)	413(6105)	76.0% (73.4-78.4%)	93.2% (92.6-93.8%)	0.658
Ethionamide	tbprofiler	272(1103)	414(6105)	75.3% (72.7-77.8%)	93.2% (92.6-93.8%)	0.653
Isoniazid	drprg	307(3899)	173(4193)	92.1% (91.2-92.9%)	95.9% (95.2-96.4%)	0.882
Isoniazid	mykrobe	333(3899)	170(4193)	91.5% (90.5-92.3%)	95.9% (95.3-96.5%)	0.876
Isoniazid	tbprofiler	297(3899)	181(4193)	92.4% (91.5-93.2%)	95.7% (95.0-96.3%)	0.882
Kanamycin	drprg	128(669)	107(6975)	80.9% (77.7-83.7%)	98.5% (98.1-98.7%)	0.805
Kanamycin	mykrobe	152(669)	98(6975)	77.3% (74.0-80.3%)	98.6% (98.3-98.8%)	0.788
Kanamycin	tbprofiler	122(669)	107(6975)	81.8% (78.7-84.5%)	98.5% (98.1-98.7%)	0.811
Levofloxacin	drprg	81(1040)	102(5454)	92.2% (90.4-93.7%)	98.1% (97.7-98.5%)	0.896
Levofloxacin	mykrobe	88(1040)	102(5454)	91.5% (89.7-93.1%)	98.1% (97.7-98.5%)	0.892
Levofloxacin	tbprofiler	85(1040)	109(5454)	91.8% (90.0-93.3%)	98.0% (97.6-98.3%)	0.89
Linezolid	drprg	48(65)	4(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.457
Linezolid	mykrobe	48(65)	4(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.457
Linezolid	tbprofiler	48(65)	5(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.447
Moxifloxacin	drprg	41(603)	478(5430)	93.2% (90.9-94.9%)	91.2% (90.4-91.9%)	0.67
Moxifloxacin	mykrobe	44(603)	472(5430)	92.7% (90.3-94.5%)	91.3% (90.5-92.0%)	0.669
Moxifloxacin	tbprofiler	42(603)	481(5430)	93.0% (90.7-94.8%)	91.1% (90.4-91.9%)	0.668
Ofloxacin	drprg	24(104)	5(424)	76.9% (68.0-84.0%)	98.8% (97.3-99.5%)	0.82
Ofloxacin	mykrobe	26(104)	5(424)	75.0% (65.9-82.3%)	98.8% (97.3-99.5%)	0.807
Ofloxacin	tbprofiler	26(104)	6(424)	75.0% (65.9-82.3%)	98.6% (96.9-99.3%)	0.8
Pyrazinamide	drprg	68(341)	53(820)	80.1% (75.5-84.0%)	93.5% (91.6-95.0%)	0.746
Pyrazinamide	mykrobe	55(341)	56(820)	83.9% (79.6-87.4%)	93.2% (91.2-94.7%)	0.77
Pyrazinamide	tbprofiler	45(341)	62(820)	86.8% (82.8-90.0%)	92.4% (90.4-94.1%)	0.782
Rifampicin	drprg	133(3221)	166(4585)	95.9% (95.1-96.5%)	96.4% (95.8-96.9%)	0.921
Rifampicin	mykrobe	164(3221)	169(4585)	94.9% (94.1-95.6%)	96.3% (95.7-96.8%)	0.912
Rifampicin	tbprofiler	102(3221)	177(4585)	96.8% (96.2-97.4%)	96.1% (95.5-96.7%)	0.927
Streptomycin	drprg	266(1041)	133(1205)	74.4% (71.7-77.0%)	89.0% (87.1-90.6%)	0.644
Streptomycin	mykrobe	282(1041)	135(1205)	72.9% (70.1-75.5%)	88.8% (86.9-90.5%)	0.629
Streptomycin	tbprofiler	257(1041)	136(1205)	75.3% (72.6-77.8%)	88.7% (86.8-90.4%)	0.649

The nanopore results remain unchanged.
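Each table row can be recomputed from its first two cells: FN(R) gives the false negatives out of R resistant samples and FP(S) the false positives out of S susceptible samples, so TP = R - FN and TN = S - FP. A minimal sketch (function names are hypothetical) that reproduces the Amikacin/drprg row of the table in this comment; it returns `None` where the tables print "-".

```python
import math
import re
from typing import Optional

CELL = re.compile(r"(\d+)\((\d+)\)")


def parse_cell(cell: str) -> tuple[int, int]:
    """Split an 'errors(total)' cell such as '68(484)' into (68, 484)."""
    m = CELL.fullmatch(cell)
    if m is None:
        raise ValueError(f"unexpected cell: {cell!r}")
    return int(m.group(1)), int(m.group(2))


def metrics(fn_r: str, fp_s: str) -> tuple[Optional[float], Optional[float], Optional[float]]:
    """Sensitivity, specificity and MCC from the FN(R) and FP(S) columns."""
    fn, r = parse_cell(fn_r)
    fp, s = parse_cell(fp_s)
    tp, tn = r - fn, s - fp
    sens = tp / r if r else None
    spec = tn / s if s else None
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return sens, spec, mcc


sens, spec, mcc = metrics("68(484)", "57(6958)")  # Amikacin / drprg
print(f"{sens:.1%} {spec:.1%} {mcc:.3f}")  # 86.0% 99.2% 0.860
```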

mbhall88 · 2023-01-04T22:12:45Z

After the updates in minor allele calling in mbhall88/drprg#19 (comment)

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	68(484)	57(6958)	86.0% (82.6-88.8%)	99.2% (98.9-99.4%)	0.86
Amikacin	mykrobe	93(484)	51(6958)	80.8% (77.0-84.0%)	99.3% (99.0-99.4%)	0.835
Amikacin	tbprofiler	62(484)	59(6958)	87.2% (83.9-89.9%)	99.2% (98.9-99.3%)	0.866
Capreomycin	drprg	57(235)	94(2448)	75.7% (69.9-80.8%)	96.2% (95.3-96.9%)	0.673
Capreomycin	mykrobe	72(235)	87(2448)	69.4% (63.2-74.9%)	96.4% (95.6-97.1%)	0.64
Capreomycin	tbprofiler	54(235)	95(2448)	77.0% (71.2-81.9%)	96.1% (95.3-96.8%)	0.681
Delamanid	drprg	111(116)	5(8151)	4.3% (1.9-9.7%)	99.9% (99.9-100.0%)	0.144
Delamanid	mykrobe	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Delamanid	tbprofiler	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	121(1537)	752(4935)	92.1% (90.7-93.4%)	84.8% (83.7-85.7%)	0.693
Ethambutol	mykrobe	133(1537)	747(4935)	91.3% (89.8-92.7%)	84.9% (83.8-85.8%)	0.688
Ethambutol	tbprofiler	118(1537)	765(4935)	92.3% (90.9-93.6%)	84.5% (83.5-85.5%)	0.691
Ethionamide	drprg	272(1103)	420(6105)	75.3% (72.7-77.8%)	93.1% (92.5-93.7%)	0.651
Ethionamide	mykrobe	265(1103)	413(6105)	76.0% (73.4-78.4%)	93.2% (92.6-93.8%)	0.658
Ethionamide	tbprofiler	272(1103)	414(6105)	75.3% (72.7-77.8%)	93.2% (92.6-93.8%)	0.653
Isoniazid	drprg	307(3899)	173(4193)	92.1% (91.2-92.9%)	95.9% (95.2-96.4%)	0.882
Isoniazid	mykrobe	333(3899)	170(4193)	91.5% (90.5-92.3%)	95.9% (95.3-96.5%)	0.876
Isoniazid	tbprofiler	297(3899)	181(4193)	92.4% (91.5-93.2%)	95.7% (95.0-96.3%)	0.882
Kanamycin	drprg	128(669)	107(6975)	80.9% (77.7-83.7%)	98.5% (98.1-98.7%)	0.805
Kanamycin	mykrobe	152(669)	98(6975)	77.3% (74.0-80.3%)	98.6% (98.3-98.8%)	0.788
Kanamycin	tbprofiler	122(669)	107(6975)	81.8% (78.7-84.5%)	98.5% (98.1-98.7%)	0.811
Levofloxacin	drprg	79(1040)	104(5454)	92.4% (90.6-93.9%)	98.1% (97.7-98.4%)	0.896
Levofloxacin	mykrobe	88(1040)	102(5454)	91.5% (89.7-93.1%)	98.1% (97.7-98.5%)	0.892
Levofloxacin	tbprofiler	85(1040)	109(5454)	91.8% (90.0-93.3%)	98.0% (97.6-98.3%)	0.89
Linezolid	drprg	48(65)	4(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.457
Linezolid	mykrobe	48(65)	4(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.457
Linezolid	tbprofiler	48(65)	5(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.447
Moxifloxacin	drprg	40(603)	478(5430)	93.4% (91.1-95.1%)	91.2% (90.4-91.9%)	0.671
Moxifloxacin	mykrobe	44(603)	472(5430)	92.7% (90.3-94.5%)	91.3% (90.5-92.0%)	0.669
Moxifloxacin	tbprofiler	42(603)	481(5430)	93.0% (90.7-94.8%)	91.1% (90.4-91.9%)	0.668
Ofloxacin	drprg	24(104)	5(424)	76.9% (68.0-84.0%)	98.8% (97.3-99.5%)	0.82
Ofloxacin	mykrobe	26(104)	5(424)	75.0% (65.9-82.3%)	98.8% (97.3-99.5%)	0.807
Ofloxacin	tbprofiler	26(104)	6(424)	75.0% (65.9-82.3%)	98.6% (96.9-99.3%)	0.8
Pyrazinamide	drprg	67(341)	54(820)	80.4% (75.8-84.2%)	93.4% (91.5-94.9%)	0.746
Pyrazinamide	mykrobe	55(341)	56(820)	83.9% (79.6-87.4%)	93.2% (91.2-94.7%)	0.77
Pyrazinamide	tbprofiler	45(341)	62(820)	86.8% (82.8-90.0%)	92.4% (90.4-94.1%)	0.782
Rifampicin	drprg	114(3221)	168(4585)	96.5% (95.8-97.0%)	96.3% (95.8-96.8%)	0.926
Rifampicin	mykrobe	164(3221)	169(4585)	94.9% (94.1-95.6%)	96.3% (95.7-96.8%)	0.912
Rifampicin	tbprofiler	102(3221)	177(4585)	96.8% (96.2-97.4%)	96.1% (95.5-96.7%)	0.927
Streptomycin	drprg	267(1041)	134(1205)	74.4% (71.6-76.9%)	88.9% (87.0-90.5%)	0.643
Streptomycin	mykrobe	282(1041)	135(1205)	72.9% (70.1-75.5%)	88.8% (86.9-90.5%)	0.629
Streptomycin	tbprofiler	257(1041)	136(1205)	75.3% (72.6-77.8%)	88.7% (86.8-90.4%)	0.649

iqbal-lab · 2023-01-06T06:51:22Z

OK, so looking at those results now, we can definitely see a sensitivity improvement over Mykrobe with no precision loss. Compared with tbprofiler we are broadly the same: tbprofiler mostly has slightly better recall and slightly worse precision (except for fluoroquinolones). The biggest difference is 7% higher recall for tbprofiler for pyrazinamide. Fair summary?

mbhall88 · 2023-01-08T23:01:54Z

Yep, fair summary. The work in mbhall88/drprg#24 should improve the PZA recall slightly too.

mbhall88 · 2023-01-12T22:10:21Z

After the work in mbhall88/drprg#26, we get the following Illumina results (nanopore is unchanged). Note: the largest changes from the last results are for ETO and PZA.

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	66(484)	57(6958)	86.4% (83.0-89.1%)	99.2% (98.9-99.4%)	0.863
Amikacin	mykrobe	93(484)	51(6958)	80.8% (77.0-84.0%)	99.3% (99.0-99.4%)	0.835
Amikacin	tbprofiler	62(484)	59(6958)	87.2% (83.9-89.9%)	99.2% (98.9-99.3%)	0.866
Capreomycin	drprg	56(235)	94(2448)	76.2% (70.3-81.2%)	96.2% (95.3-96.9%)	0.676
Capreomycin	mykrobe	72(235)	87(2448)	69.4% (63.2-74.9%)	96.4% (95.6-97.1%)	0.64
Capreomycin	tbprofiler	54(235)	95(2448)	77.0% (71.2-81.9%)	96.1% (95.3-96.8%)	0.681
Delamanid	drprg	111(116)	4(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.152
Delamanid	mykrobe	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Delamanid	tbprofiler	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	120(1537)	754(4935)	92.2% (90.7-93.4%)	84.7% (83.7-85.7%)	0.693
Ethambutol	mykrobe	133(1537)	747(4935)	91.3% (89.8-92.7%)	84.9% (83.8-85.8%)	0.688
Ethambutol	tbprofiler	118(1537)	765(4935)	92.3% (90.9-93.6%)	84.5% (83.5-85.5%)	0.691
Ethionamide	drprg	245(1103)	418(6105)	77.8% (75.2-80.1%)	93.2% (92.5-93.8%)	0.669
Ethionamide	mykrobe	265(1103)	413(6105)	76.0% (73.4-78.4%)	93.2% (92.6-93.8%)	0.658
Ethionamide	tbprofiler	272(1103)	414(6105)	75.3% (72.7-77.8%)	93.2% (92.6-93.8%)	0.653
Isoniazid	drprg	305(3899)	173(4193)	92.2% (91.3-93.0%)	95.9% (95.2-96.4%)	0.882
Isoniazid	mykrobe	333(3899)	170(4193)	91.5% (90.5-92.3%)	95.9% (95.3-96.5%)	0.876
Isoniazid	tbprofiler	297(3899)	181(4193)	92.4% (91.5-93.2%)	95.7% (95.0-96.3%)	0.882
Kanamycin	drprg	126(669)	107(6975)	81.2% (78.0-83.9%)	98.5% (98.1-98.7%)	0.807
Kanamycin	mykrobe	152(669)	98(6975)	77.3% (74.0-80.3%)	98.6% (98.3-98.8%)	0.788
Kanamycin	tbprofiler	122(669)	107(6975)	81.8% (78.7-84.5%)	98.5% (98.1-98.7%)	0.811
Levofloxacin	drprg	80(1040)	106(5454)	92.3% (90.5-93.8%)	98.1% (97.7-98.4%)	0.895
Levofloxacin	mykrobe	88(1040)	102(5454)	91.5% (89.7-93.1%)	98.1% (97.7-98.5%)	0.892
Levofloxacin	tbprofiler	85(1040)	109(5454)	91.8% (90.0-93.3%)	98.0% (97.6-98.3%)	0.89
Linezolid	drprg	48(65)	4(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.457
Linezolid	mykrobe	48(65)	4(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.457
Linezolid	tbprofiler	48(65)	5(6109)	26.2% (17.0-38.0%)	99.9% (99.8-100.0%)	0.447
Moxifloxacin	drprg	39(603)	477(5430)	93.5% (91.3-95.2%)	91.2% (90.4-91.9%)	0.673
Moxifloxacin	mykrobe	44(603)	472(5430)	92.7% (90.3-94.5%)	91.3% (90.5-92.0%)	0.669
Moxifloxacin	tbprofiler	42(603)	481(5430)	93.0% (90.7-94.8%)	91.1% (90.4-91.9%)	0.668
Ofloxacin	drprg	25(104)	5(424)	76.0% (66.9-83.2%)	98.8% (97.3-99.5%)	0.813
Ofloxacin	mykrobe	26(104)	5(424)	75.0% (65.9-82.3%)	98.8% (97.3-99.5%)	0.807
Ofloxacin	tbprofiler	26(104)	6(424)	75.0% (65.9-82.3%)	98.6% (96.9-99.3%)	0.8
Pyrazinamide	drprg	57(341)	55(820)	83.3% (79.0-86.9%)	93.3% (91.4-94.8%)	0.767
Pyrazinamide	mykrobe	55(341)	56(820)	83.9% (79.6-87.4%)	93.2% (91.2-94.7%)	0.77
Pyrazinamide	tbprofiler	45(341)	62(820)	86.8% (82.8-90.0%)	92.4% (90.4-94.1%)	0.782
Rifampicin	drprg	112(3221)	168(4585)	96.5% (95.8-97.1%)	96.3% (95.8-96.8%)	0.926
Rifampicin	mykrobe	164(3221)	169(4585)	94.9% (94.1-95.6%)	96.3% (95.7-96.8%)	0.912
Rifampicin	tbprofiler	102(3221)	177(4585)	96.8% (96.2-97.4%)	96.1% (95.5-96.7%)	0.927
Streptomycin	drprg	259(1041)	135(1205)	75.1% (72.4-77.7%)	88.8% (86.9-90.5%)	0.648
Streptomycin	mykrobe	282(1041)	135(1205)	72.9% (70.1-75.5%)	88.8% (86.9-90.5%)	0.629
Streptomycin	tbprofiler	257(1041)	136(1205)	75.3% (72.6-77.8%)	88.7% (86.8-90.4%)	0.649

PZA still isn't great, but there are just so many different mutations with minor alleles that we don't have in the graph, and hand-picking them all could lead to a complicated graph. I can try adding them, though, if we really want to boost PZA sensitivity...

iqbal-lab · 2023-01-12T22:43:07Z

I think those results are much improved, but I am wondering what the pitch is for drprg. On Illumina it is better than Mykrobe and about the same as tbprofiler. Are the nanopore results really unchanged from before? Leandro's mapping fixes will help too.

mbhall88 · 2023-01-12T23:11:27Z

am wondering what the pitch is for drprg though

Yeah, this has been troubling me too... I mean, we can detect gene deletions... we use a lot fewer resources...

Are the nanopore results really unchanged from before?

Here are the current nanopore results

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Amikacin	mykrobe	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Amikacin	tbprofiler	0(11)	3(78)	100.0% (74.1-100.0%)	96.2% (89.3-98.7%)	0.869
Capreomycin	drprg	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Capreomycin	mykrobe	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Capreomycin	tbprofiler	1(1)	1(51)	0.0% (0.0-79.3%)	98.0% (89.7-99.7%)	-0.02
Ethambutol	drprg	4(14)	15(77)	71.4% (45.4-88.3%)	80.5% (70.3-87.8%)	0.42
Ethambutol	mykrobe	4(14)	15(77)	71.4% (45.4-88.3%)	80.5% (70.3-87.8%)	0.42
Ethambutol	tbprofiler	5(14)	15(77)	64.3% (38.8-83.7%)	80.5% (70.3-87.8%)	0.367
Ethionamide	drprg	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Ethionamide	mykrobe	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Ethionamide	tbprofiler	0(4)	1(9)	100.0% (51.0-100.0%)	88.9% (56.5-98.0%)	0.843
Isoniazid	drprg	9(51)	5(48)	82.4% (69.7-90.4%)	89.6% (77.8-95.5%)	0.72
Isoniazid	mykrobe	9(51)	4(48)	82.4% (69.7-90.4%)	91.7% (80.4-96.7%)	0.742
Isoniazid	tbprofiler	9(51)	3(48)	82.4% (69.7-90.4%)	93.8% (83.2-97.9%)	0.764
Kanamycin	drprg	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Kanamycin	mykrobe	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Kanamycin	tbprofiler	0(0)	1(52)	-	98.1% (89.9-99.7%)	-
Moxifloxacin	drprg	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	mykrobe	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Moxifloxacin	tbprofiler	0(0)	1(1)	-	0.0% (0.0-79.3%)	-
Ofloxacin	drprg	0(10)	4(77)	100.0% (72.2-100.0%)	94.8% (87.4-98.0%)	0.823
Ofloxacin	mykrobe	0(10)	4(77)	100.0% (72.2-100.0%)	94.8% (87.4-98.0%)	0.823
Ofloxacin	tbprofiler	0(10)	3(77)	100.0% (72.2-100.0%)	96.1% (89.2-98.7%)	0.86
Pyrazinamide	drprg	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Pyrazinamide	mykrobe	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Pyrazinamide	tbprofiler	0(0)	0(1)	-	100.0% (20.7-100.0%)	-
Rifampicin	drprg	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Rifampicin	mykrobe	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Rifampicin	tbprofiler	5(48)	1(44)	89.6% (77.8-95.5%)	97.7% (88.2-99.6%)	0.873
Streptomycin	drprg	2(8)	14(83)	75.0% (40.9-92.9%)	83.1% (73.7-89.7%)	0.398
Streptomycin	mykrobe	2(8)	27(83)	75.0% (40.9-92.9%)	67.5% (56.8-76.6%)	0.25
Streptomycin	tbprofiler	2(8)	12(83)	75.0% (40.9-92.9%)	85.5% (76.4-91.5%)	0.43

Sample sizes are so small it makes it hard to get a clear picture for a lot of drugs.
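The effect of these small Nanopore counts is visible in the interval widths. The reported 95% intervals appear consistent with the Wilson score interval (0/1 gives 0.0-79.3%, 11/11 gives 74.1-100.0%); taking that as an assumption, a minimal sketch that recomputes them:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n == 0:
        raise ValueError("interval undefined for n == 0")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


for k, n in [(0, 1), (11, 11), (416, 484)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n}: {lo:.1%}-{hi:.1%}")
# 0/1: 0.0%-79.3%
# 11/11: 74.1%-100.0%
# 416/484: 82.6%-88.8%
```

Even a perfect 11/11 (Amikacin) leaves a lower bound of about 74%, whereas the 484 resistant Illumina samples narrow the interval to about 6 percentage points.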

mbhall88 · 2023-02-16T01:47:27Z

Here are the Illumina results on the full dataset (45,193 samples)

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	270(1864)	225(18732)	85.5% (83.8-87.0%)	98.8% (98.6-98.9%)	0.852
Amikacin	mykrobe	358(1864)	195(18732)	80.8% (78.9-82.5%)	99.0% (98.8-99.1%)	0.831
Amikacin	tbprofiler	269(1864)	227(18732)	85.6% (83.9-87.1%)	98.8% (98.6-98.9%)	0.852
Capreomycin	drprg	293(1298)	300(13034)	77.4% (75.1-79.6%)	97.7% (97.4-97.9%)	0.749
Capreomycin	mykrobe	367(1298)	265(13034)	71.7% (69.2-74.1%)	98.0% (97.7-98.2%)	0.723
Capreomycin	tbprofiler	292(1298)	305(13034)	77.5% (75.2-79.7%)	97.7% (97.4-97.9%)	0.748
Delamanid	drprg	111(116)	4(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.152
Delamanid	mykrobe	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Delamanid	tbprofiler	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	484(5706)	2287(26863)	91.5% (90.8-92.2%)	91.5% (91.1-91.8%)	0.749
Ethambutol	mykrobe	499(5706)	2265(26863)	91.3% (90.5-92.0%)	91.6% (91.2-91.9%)	0.749
Ethambutol	tbprofiler	471(5706)	2290(26863)	91.7% (91.0-92.4%)	91.5% (91.1-91.8%)	0.751
Ethionamide	drprg	672(2853)	992(11016)	76.4% (74.9-78.0%)	91.0% (90.4-91.5%)	0.649
Ethionamide	mykrobe	772(2853)	960(11016)	72.9% (71.3-74.5%)	91.3% (90.7-91.8%)	0.627
Ethionamide	tbprofiler	787(2853)	964(11016)	72.4% (70.7-74.0%)	91.2% (90.7-91.8%)	0.623
Isoniazid	drprg	1016(14531)	593(25764)	93.0% (92.6-93.4%)	97.7% (97.5-97.9%)	0.913
Isoniazid	mykrobe	1054(14531)	560(25764)	92.7% (92.3-93.2%)	97.8% (97.6-98.0%)	0.913
Isoniazid	tbprofiler	987(14531)	648(25764)	93.2% (92.8-93.6%)	97.5% (97.3-97.7%)	0.912
Kanamycin	drprg	359(2205)	316(17934)	83.7% (82.1-85.2%)	98.2% (98.0-98.4%)	0.827
Kanamycin	mykrobe	437(2205)	300(17934)	80.2% (78.5-81.8%)	98.3% (98.1-98.5%)	0.808
Kanamycin	tbprofiler	349(2205)	322(17934)	84.2% (82.6-85.6%)	98.2% (98.0-98.4%)	0.828
Levofloxacin	drprg	272(3102)	355(14867)	91.2% (90.2-92.2%)	97.6% (97.4-97.8%)	0.879
Levofloxacin	mykrobe	299(3102)	330(14867)	90.4% (89.3-91.4%)	97.8% (97.5-98.0%)	0.878
Levofloxacin	tbprofiler	276(3102)	356(14867)	91.1% (90.0-92.1%)	97.6% (97.3-97.8%)	0.878
Linezolid	drprg	104(152)	30(10911)	31.6% (24.7-39.3%)	99.7% (99.6-99.8%)	0.436
Linezolid	mykrobe	105(152)	29(10911)	30.9% (24.1-38.7%)	99.7% (99.6-99.8%)	0.432
Linezolid	tbprofiler	104(152)	31(10911)	31.6% (24.7-39.3%)	99.7% (99.6-99.8%)	0.433
Moxifloxacin	drprg	178(2255)	1133(14696)	92.1% (90.9-93.1%)	92.3% (91.8-92.7%)	0.732
Moxifloxacin	mykrobe	207(2255)	1113(14696)	90.8% (89.6-91.9%)	92.4% (92.0-92.8%)	0.726
Moxifloxacin	tbprofiler	182(2255)	1141(14696)	91.9% (90.7-93.0%)	92.2% (91.8-92.7%)	0.729
Ofloxacin	drprg	166(778)	68(6007)	78.7% (75.6-81.4%)	98.9% (98.6-99.1%)	0.823
Ofloxacin	mykrobe	147(778)	62(6007)	81.1% (78.2-83.7%)	99.0% (98.7-99.2%)	0.842
Ofloxacin	tbprofiler	138(778)	65(6007)	82.3% (79.4-84.8%)	98.9% (98.6-99.2%)	0.848
Pyrazinamide	drprg	786(3682)	500(17748)	78.7% (77.3-79.9%)	97.2% (96.9-97.4%)	0.783
Pyrazinamide	mykrobe	776(3682)	444(17748)	78.9% (77.6-80.2%)	97.5% (97.3-97.7%)	0.794
Pyrazinamide	tbprofiler	715(3682)	502(17748)	80.6% (79.3-81.8%)	97.2% (96.9-97.4%)	0.796
Rifampicin	drprg	576(11766)	593(28292)	95.1% (94.7-95.5%)	97.9% (97.7-98.1%)	0.93
Rifampicin	mykrobe	523(11766)	604(28292)	95.6% (95.2-95.9%)	97.9% (97.7-98.0%)	0.932
Rifampicin	tbprofiler	370(11766)	788(28292)	96.9% (96.5-97.2%)	97.2% (97.0-97.4%)	0.931
Streptomycin	drprg	784(5362)	760(10179)	85.4% (84.4-86.3%)	92.5% (92.0-93.0%)	0.78
Streptomycin	mykrobe	903(5362)	677(10179)	83.2% (82.1-84.1%)	93.3% (92.8-93.8%)	0.773
Streptomycin	tbprofiler	778(5362)	662(10179)	85.5% (84.5-86.4%)	93.5% (93.0-94.0%)	0.794

I am currently working through the INH FNs and have learned a lot and fixed some bugs. The most important result to understand here, though, will be the RIF sensitivity, which is significantly lower than tb-profiler's.

mbhall88 · 2023-02-21T06:40:54Z

I think I might have gotten to the bottom of the RIF sensitivity issue (also impacts a decent amount of INH FNs).

tl;dr we need a smaller minimum cluster size for (some) Illumina reads in pandora.

Cluster size dictates whether we recognise a read as "hitting" a locus. The default is 10. But I was finding a lot of FNs where we just have these big random stretches of zero depth, generally in and around the RRDR. When I mapped these reads to H37Rv with minimap2, it showed that we should definitely have depth over the RRDR and its surrounding regions. It turns out most of them are unmapped in the pandora SAM file: most of these reads were getting ~4-6 hits, so they were marked as unmapped because they're below the default of 10. I have also noticed that a lot of the samples with this issue are Illumina HiSeq 2000 75bp reads. This relates back to mbhall88/drprg#12 (comment).

I've run on a few samples with the minimum cluster size set to 4, and it seems to have resolved the issue for those samples. So I'm going to rerun all samples and reassess the results after that 🤞
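A back-of-the-envelope check of why 75bp reads fall short of a minimum cluster size of 10. Assuming pandora's default minimizer parameters are w=14 and k=15 (an assumption worth confirming in pandora's CLI help) and treating the k-mer ordering as random, a read of length L has L - k - w + 2 windows; the first contributes one minimizer and each later window adds a new one with probability 2/(w+1). For L=75 that gives about 7 minimizers, so even a read matching the PRG perfectly would usually produce fewer than 10 hits. A minimal sketch that computes this and checks it by simulation (BLAKE2b stands in for pandora's hash, and only the forward strand is considered):

```python
import hashlib
import random


def expected_minimizers(read_len: int, k: int = 15, w: int = 14) -> float:
    """Expected number of distinct (w,k)-minimizers in a read under a random
    k-mer ordering with no repeated k-mers: the first window contributes one,
    and each further window adds a new one with probability 2/(w+1)."""
    n_windows = read_len - k - w + 2
    if n_windows < 1:
        return 0.0
    return 1 + (n_windows - 1) * 2 / (w + 1)


def minimizer_positions(seq: str, k: int = 15, w: int = 14) -> set[int]:
    """Start positions of (w,k)-minimizers (forward strand only), ordering
    k-mers by a BLAKE2b digest as a stand-in for a random hash."""
    keys = [hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
            for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(keys) - w + 1):
        window = keys[start:start + w]
        picked.add(start + window.index(min(window)))
    return picked


random.seed(0)
means = {}
for read_len in (75, 150):
    counts = [len(minimizer_positions("".join(random.choices("ACGT", k=read_len))))
              for _ in range(500)]
    means[read_len] = sum(counts) / len(counts)
    print(read_len, round(expected_minimizers(read_len), 1), round(means[read_len], 1))
```

Sequencing errors and divergence from the PRG paths only lower the hit count further, which fits the ~4-6 hits observed.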

iqbal-lab · 2023-02-21T07:31:50Z

Also relates to long reads that overlap a PRG only at the end.

mbhall88 · 2023-02-22T23:32:58Z

Changing the minimum cluster size to 4 we get the following diff for Illumina

Tool	Drug	ΔFN	ΔFP
drprg	Amikacin	0	0
drprg	Capreomycin	-1	1
drprg	Delamanid	0	-1
drprg	Ethambutol	-5	-8
drprg	Ethionamide	-1	6
drprg	Isoniazid	-31	-6
drprg	Kanamycin	-5	4
drprg	Levofloxacin	-4	0
drprg	Linezolid	0	0
drprg	Moxifloxacin	0	2
drprg	Ofloxacin	-28	1
drprg	Pyrazinamide	-9	-29
drprg	Rifampicin	-137	54
drprg	Streptomycin	-1	-80

This is great, and the only real concern is the 54 extra RIF FPs. I'll take a look at those and see if I can figure out whether they're fixable.

The overall results now are

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	270(1864)	225(18732)	85.5% (83.8-87.0%)	98.8% (98.6-98.9%)	0.852
Amikacin	mykrobe	358(1864)	195(18732)	80.8% (78.9-82.5%)	99.0% (98.8-99.1%)	0.831
Amikacin	tbprofiler	269(1864)	227(18732)	85.6% (83.9-87.1%)	98.8% (98.6-98.9%)	0.852
Capreomycin	drprg	292(1298)	301(13034)	77.5% (75.2-79.7%)	97.7% (97.4-97.9%)	0.75
Capreomycin	mykrobe	367(1298)	265(13034)	71.7% (69.2-74.1%)	98.0% (97.7-98.2%)	0.723
Capreomycin	tbprofiler	292(1298)	305(13034)	77.5% (75.2-79.7%)	97.7% (97.4-97.9%)	0.748
Delamanid	drprg	111(116)	3(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.162
Delamanid	mykrobe	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Delamanid	tbprofiler	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	479(5706)	2279(26863)	91.6% (90.9-92.3%)	91.5% (91.2-91.8%)	0.75
Ethambutol	mykrobe	499(5706)	2265(26863)	91.3% (90.5-92.0%)	91.6% (91.2-91.9%)	0.749
Ethambutol	tbprofiler	471(5706)	2290(26863)	91.7% (91.0-92.4%)	91.5% (91.1-91.8%)	0.751
Ethionamide	drprg	671(2853)	998(11016)	76.5% (74.9-78.0%)	90.9% (90.4-91.5%)	0.648
Ethionamide	mykrobe	772(2853)	960(11016)	72.9% (71.3-74.5%)	91.3% (90.7-91.8%)	0.627
Ethionamide	tbprofiler	787(2853)	964(11016)	72.4% (70.7-74.0%)	91.2% (90.7-91.8%)	0.623
Isoniazid	drprg	985(14531)	587(25764)	93.2% (92.8-93.6%)	97.7% (97.5-97.9%)	0.915
Isoniazid	mykrobe	1054(14531)	560(25764)	92.7% (92.3-93.2%)	97.8% (97.6-98.0%)	0.913
Isoniazid	tbprofiler	987(14531)	648(25764)	93.2% (92.8-93.6%)	97.5% (97.3-97.7%)	0.912
Kanamycin	drprg	354(2205)	320(17934)	83.9% (82.4-85.4%)	98.2% (98.0-98.4%)	0.827
Kanamycin	mykrobe	437(2205)	300(17934)	80.2% (78.5-81.8%)	98.3% (98.1-98.5%)	0.808
Kanamycin	tbprofiler	349(2205)	322(17934)	84.2% (82.6-85.6%)	98.2% (98.0-98.4%)	0.828
Levofloxacin	drprg	268(3102)	355(14867)	91.4% (90.3-92.3%)	97.6% (97.4-97.8%)	0.88
Levofloxacin	mykrobe	299(3102)	330(14867)	90.4% (89.3-91.4%)	97.8% (97.5-98.0%)	0.878
Levofloxacin	tbprofiler	276(3102)	356(14867)	91.1% (90.0-92.1%)	97.6% (97.3-97.8%)	0.878
Linezolid	drprg	104(152)	30(10911)	31.6% (24.7-39.3%)	99.7% (99.6-99.8%)	0.436
Linezolid	mykrobe	105(152)	29(10911)	30.9% (24.1-38.7%)	99.7% (99.6-99.8%)	0.432
Linezolid	tbprofiler	104(152)	31(10911)	31.6% (24.7-39.3%)	99.7% (99.6-99.8%)	0.433
Moxifloxacin	drprg	178(2255)	1135(14696)	92.1% (90.9-93.1%)	92.3% (91.8-92.7%)	0.731
Moxifloxacin	mykrobe	207(2255)	1113(14696)	90.8% (89.6-91.9%)	92.4% (92.0-92.8%)	0.726
Moxifloxacin	tbprofiler	182(2255)	1141(14696)	91.9% (90.7-93.0%)	92.2% (91.8-92.7%)	0.729
Ofloxacin	drprg	138(778)	69(6007)	82.3% (79.4-84.8%)	98.9% (98.5-99.1%)	0.845
Ofloxacin	mykrobe	147(778)	62(6007)	81.1% (78.2-83.7%)	99.0% (98.7-99.2%)	0.842
Ofloxacin	tbprofiler	138(778)	65(6007)	82.3% (79.4-84.8%)	98.9% (98.6-99.2%)	0.848
Pyrazinamide	drprg	777(3682)	471(17748)	78.9% (77.5-80.2%)	97.3% (97.1-97.6%)	0.789
Pyrazinamide	mykrobe	776(3682)	444(17748)	78.9% (77.6-80.2%)	97.5% (97.3-97.7%)	0.794
Pyrazinamide	tbprofiler	715(3682)	502(17748)	80.6% (79.3-81.8%)	97.2% (96.9-97.4%)	0.796
Rifampicin	drprg	439(11766)	647(28292)	96.3% (95.9-96.6%)	97.7% (97.5-97.9%)	0.935
Rifampicin	mykrobe	523(11766)	604(28292)	95.6% (95.2-95.9%)	97.9% (97.7-98.0%)	0.932
Rifampicin	tbprofiler	370(11766)	788(28292)	96.9% (96.5-97.2%)	97.2% (97.0-97.4%)	0.931
Streptomycin	drprg	783(5362)	680(10179)	85.4% (84.4-86.3%)	93.3% (92.8-93.8%)	0.791
Streptomycin	mykrobe	903(5362)	677(10179)	83.2% (82.1-84.1%)	93.3% (92.8-93.8%)	0.773
Streptomycin	tbprofiler	778(5362)	662(10179)	85.5% (84.5-86.4%)	93.5% (93.0-94.0%)	0.794
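The FN(R) and FP(S) columns carry everything needed to rebuild the other columns. Below is a minimal sketch, assuming FN(R) means FN misses out of R phenotypically resistant samples and FP(S) means FP false calls out of S susceptible samples, and assuming the 95% CIs are Wilson score intervals (they appear to reproduce the table's bounds, e.g. for the Amikacin/drprg row). `wilson_ci` and `summarise` are hypothetical helpers, not code from the drprg repo.

```python
import math
from typing import Dict, Tuple

Z = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(successes: int, n: int, z: float = Z) -> Tuple[float, float]:
    """Wilson score interval for the binomial proportion successes/n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def summarise(fn: int, n_res: int, fp: int, n_sus: int) -> Dict[str, object]:
    """Rebuild Sn, Sp (with 95% CIs) and MCC from one 'FN(R)  FP(S)' row.

    Assumes fn of n_res resistant samples were called susceptible and
    fp of n_sus susceptible samples were called resistant.
    """
    tp = n_res - fn
    tn = n_sus - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": (tp / n_res, wilson_ci(tp, n_res)),
        "specificity": (tn / n_sus, wilson_ci(tn, n_sus)),
        # MCC is undefined when any marginal total is zero
        "mcc": (tp * tn - fp * fn) / denom if denom else float("nan"),
    }


# Amikacin, drprg row: 270(1864) 225(18732)
metrics = summarise(fn=270, n_res=1864, fp=225, n_sus=18732)
sn, (sn_lo, sn_hi) = metrics["sensitivity"]
print(f"Sn {sn:.1%} ({sn_lo:.1%}-{sn_hi:.1%}), MCC {metrics['mcc']:.3f}")
# prints: Sn 85.5% (83.8%-87.0%), MCC 0.852
```

Rows with zero resistant (or susceptible) samples, shown as "-" in the tables, would need a guard before the division.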

Here is a table of the drug/tool combinations where the 95% CIs don't overlap:

Metric	Drug	Tool1	Tool1 CI	Tool2	Tool2 CI
sensitivity	Capreomycin	drprg	(75.2, 79.7)	mykrobe	(69.2, 74.1)
sensitivity	Capreomycin	mykrobe	(69.2, 74.1)	tbprofiler	(75.2, 79.7)
sensitivity	Amikacin	drprg	(83.8, 87.0)	mykrobe	(78.9, 82.5)
sensitivity	Amikacin	mykrobe	(78.9, 82.5)	tbprofiler	(83.9, 87.1)
sensitivity	Ethionamide	drprg	(74.9, 78.0)	mykrobe	(71.3, 74.5)
sensitivity	Ethionamide	drprg	(74.9, 78.0)	tbprofiler	(70.7, 74.0)
sensitivity	Streptomycin	drprg	(84.4, 86.3)	mykrobe	(82.1, 84.1)
sensitivity	Streptomycin	mykrobe	(82.1, 84.1)	tbprofiler	(84.5, 86.4)
sensitivity	Kanamycin	drprg	(82.4, 85.4)	mykrobe	(78.5, 81.8)
sensitivity	Kanamycin	mykrobe	(78.5, 81.8)	tbprofiler	(82.6, 85.6)
specificity	Rifampicin	drprg	(97.5, 97.9)	tbprofiler	(97.0, 97.4)
sensitivity	Rifampicin	mykrobe	(95.2, 95.9)	tbprofiler	(96.5, 97.2)
specificity	Rifampicin	mykrobe	(97.7, 98.0)	tbprofiler	(97.0, 97.4)
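A sketch of how the non-overlap check in the table above can be done, assuming intervals are compared as rounded in the tables with a strict inequality (so touching intervals such as 95.9 vs 95.9 count as overlapping). `non_overlapping` is a hypothetical helper. Non-overlapping 95% CIs is a conservative test: overlapping intervals do not by themselves rule out a significant difference.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def non_overlapping(cis: Dict[str, Interval]) -> List[Tuple[str, str]]:
    """Return pairs of tools whose confidence intervals do not overlap."""
    pairs = []
    # sorted() gives a stable, alphabetical tool order for the pairs
    for (tool1, (lo1, hi1)), (tool2, (lo2, hi2)) in combinations(sorted(cis.items()), 2):
        if hi1 < lo2 or hi2 < lo1:
            pairs.append((tool1, tool2))
    return pairs


# Rifampicin specificity CIs from the overall results table above
rif_specificity = {"drprg": (97.5, 97.9), "mykrobe": (97.7, 98.0), "tbprofiler": (97.0, 97.4)}
print(non_overlapping(rif_specificity))
```

For these intervals it flags drprg vs tbprofiler and mykrobe vs tbprofiler, matching the two Rifampicin specificity rows in the table.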

I think we're very close to done. I just want to do a last check of discrepancies and see if I can salvage some more FNs and FPs.

iqbal-lab · 2023-02-23T05:06:07Z

Looking good! What does that change of cluster size do to nanopore, though?

mbhall88 · 2023-02-23T06:01:55Z

What does that change of cluster size do to nanopore, though?

No change in results for nanopore

iqbal-lab · 2023-02-23T06:30:23Z

Woah

mbhall88 · 2023-02-28T06:20:47Z

I reran the pipeline after fixing a couple of bugs and adding/removing some more mutations from the graph.

Here is the diff between this latest run and the run above

Tool	Drug	ΔFN	ΔFP
drprg	Amikacin	0	0
drprg	Capreomycin	0	0
drprg	Delamanid	0	0
drprg	Ethambutol	-2	6
drprg	Ethionamide	-1	0
drprg	Isoniazid	0	-1
drprg	Kanamycin	0	0
drprg	Levofloxacin	0	0
drprg	Linezolid	0	0
drprg	Moxifloxacin	0	0
drprg	Ofloxacin	0	0
drprg	Pyrazinamide	-16	13
drprg	Rifampicin	-13	-27
drprg	Streptomycin	0	0
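The Δ columns can be reproduced from the drprg rows of the two overall-results tables in this thread (the one posted 2023-02-22 and the one posted 2023-02-28). A minimal sketch with a hypothetical `diff_runs` helper, using only the three drugs whose counts changed:

```python
from typing import Dict, Tuple

Counts = Tuple[int, int]  # (FN, FP)


def diff_runs(old: Dict[str, Counts], new: Dict[str, Counts]) -> Dict[str, Counts]:
    """ΔFN and ΔFP per drug (new minus old); negative values mean fewer errors."""
    return {drug: (new[drug][0] - old[drug][0], new[drug][1] - old[drug][1]) for drug in old}


# drprg Illumina (FN, FP) counts copied from the two overall-results tables
feb22 = {"Ethambutol": (479, 2279), "Pyrazinamide": (777, 471), "Rifampicin": (439, 647)}
feb28 = {"Ethambutol": (477, 2285), "Pyrazinamide": (761, 484), "Rifampicin": (426, 620)}

for drug, (d_fn, d_fp) in diff_runs(feb22, feb28).items():
    print(f"drprg\t{drug}\t{d_fn}\t{d_fp}")
```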

And the overall results

Drug	Tool	FN(R)	FP(S)	Sensitivity (95% CI)	Specificity (95% CI)	MCC
Amikacin	drprg	270(1864)	225(18732)	85.5% (83.8-87.0%)	98.8% (98.6-98.9%)	0.852
Amikacin	mykrobe	358(1864)	195(18732)	80.8% (78.9-82.5%)	99.0% (98.8-99.1%)	0.831
Amikacin	tbprofiler	269(1864)	227(18732)	85.6% (83.9-87.1%)	98.8% (98.6-98.9%)	0.852
Capreomycin	drprg	292(1298)	301(13034)	77.5% (75.2-79.7%)	97.7% (97.4-97.9%)	0.75
Capreomycin	mykrobe	367(1298)	265(13034)	71.7% (69.2-74.1%)	98.0% (97.7-98.2%)	0.723
Capreomycin	tbprofiler	292(1298)	305(13034)	77.5% (75.2-79.7%)	97.7% (97.4-97.9%)	0.748
Delamanid	drprg	111(116)	3(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.162
Delamanid	mykrobe	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Delamanid	tbprofiler	111(116)	2(8151)	4.3% (1.9-9.7%)	100.0% (99.9-100.0%)	0.173
Ethambutol	drprg	477(5706)	2285(26863)	91.6% (90.9-92.3%)	91.5% (91.2-91.8%)	0.75
Ethambutol	mykrobe	499(5706)	2265(26863)	91.3% (90.5-92.0%)	91.6% (91.2-91.9%)	0.749
Ethambutol	tbprofiler	471(5706)	2290(26863)	91.7% (91.0-92.4%)	91.5% (91.1-91.8%)	0.751
Ethionamide	drprg	670(2853)	998(11016)	76.5% (74.9-78.0%)	90.9% (90.4-91.5%)	0.649
Ethionamide	mykrobe	772(2853)	960(11016)	72.9% (71.3-74.5%)	91.3% (90.7-91.8%)	0.627
Ethionamide	tbprofiler	787(2853)	964(11016)	72.4% (70.7-74.0%)	91.2% (90.7-91.8%)	0.623
Isoniazid	drprg	985(14531)	586(25764)	93.2% (92.8-93.6%)	97.7% (97.5-97.9%)	0.915
Isoniazid	mykrobe	1054(14531)	560(25764)	92.7% (92.3-93.2%)	97.8% (97.6-98.0%)	0.913
Isoniazid	tbprofiler	987(14531)	648(25764)	93.2% (92.8-93.6%)	97.5% (97.3-97.7%)	0.912
Kanamycin	drprg	354(2205)	320(17934)	83.9% (82.4-85.4%)	98.2% (98.0-98.4%)	0.827
Kanamycin	mykrobe	437(2205)	300(17934)	80.2% (78.5-81.8%)	98.3% (98.1-98.5%)	0.808
Kanamycin	tbprofiler	349(2205)	322(17934)	84.2% (82.6-85.6%)	98.2% (98.0-98.4%)	0.828
Levofloxacin	drprg	268(3102)	355(14867)	91.4% (90.3-92.3%)	97.6% (97.4-97.8%)	0.88
Levofloxacin	mykrobe	299(3102)	330(14867)	90.4% (89.3-91.4%)	97.8% (97.5-98.0%)	0.878
Levofloxacin	tbprofiler	276(3102)	356(14867)	91.1% (90.0-92.1%)	97.6% (97.3-97.8%)	0.878
Linezolid	drprg	104(152)	30(10911)	31.6% (24.7-39.3%)	99.7% (99.6-99.8%)	0.436
Linezolid	mykrobe	105(152)	29(10911)	30.9% (24.1-38.7%)	99.7% (99.6-99.8%)	0.432
Linezolid	tbprofiler	104(152)	31(10911)	31.6% (24.7-39.3%)	99.7% (99.6-99.8%)	0.433
Moxifloxacin	drprg	178(2255)	1135(14696)	92.1% (90.9-93.1%)	92.3% (91.8-92.7%)	0.731
Moxifloxacin	mykrobe	207(2255)	1113(14696)	90.8% (89.6-91.9%)	92.4% (92.0-92.8%)	0.726
Moxifloxacin	tbprofiler	182(2255)	1141(14696)	91.9% (90.7-93.0%)	92.2% (91.8-92.7%)	0.729
Ofloxacin	drprg	138(778)	69(6007)	82.3% (79.4-84.8%)	98.9% (98.5-99.1%)	0.845
Ofloxacin	mykrobe	147(778)	62(6007)	81.1% (78.2-83.7%)	99.0% (98.7-99.2%)	0.842
Ofloxacin	tbprofiler	138(778)	65(6007)	82.3% (79.4-84.8%)	98.9% (98.6-99.2%)	0.848
Pyrazinamide	drprg	761(3682)	484(17748)	79.3% (78.0-80.6%)	97.3% (97.0-97.5%)	0.79
Pyrazinamide	mykrobe	776(3682)	444(17748)	78.9% (77.6-80.2%)	97.5% (97.3-97.7%)	0.794
Pyrazinamide	tbprofiler	715(3682)	502(17748)	80.6% (79.3-81.8%)	97.2% (96.9-97.4%)	0.796
Rifampicin	drprg	426(11766)	620(28292)	96.4% (96.0-96.7%)	97.8% (97.6-98.0%)	0.937
Rifampicin	mykrobe	523(11766)	604(28292)	95.6% (95.2-95.9%)	97.9% (97.7-98.0%)	0.932
Rifampicin	tbprofiler	370(11766)	788(28292)	96.9% (96.5-97.2%)	97.2% (97.0-97.4%)	0.931
Streptomycin	drprg	783(5362)	680(10179)	85.4% (84.4-86.3%)	93.3% (92.8-93.8%)	0.791
Streptomycin	mykrobe	903(5362)	677(10179)	83.2% (82.1-84.1%)	93.3% (92.8-93.8%)	0.773
Streptomycin	tbprofiler	778(5362)	662(10179)	85.5% (84.5-86.4%)	93.5% (93.0-94.0%)	0.794

And the table of the drug/tool combinations where the 95% CIs don't overlap:

Metric	Drug	Tool1	Tool1 CI	Tool2	Tool2 CI
sensitivity	Amikacin	tbprofiler	(83.9, 87.1)	mykrobe	(78.9, 82.5)
sensitivity	Amikacin	mykrobe	(78.9, 82.5)	drprg	(83.8, 87.0)
sensitivity	Streptomycin	tbprofiler	(84.5, 86.4)	mykrobe	(82.1, 84.1)
sensitivity	Streptomycin	mykrobe	(82.1, 84.1)	drprg	(84.4, 86.3)
sensitivity	Kanamycin	tbprofiler	(82.6, 85.6)	mykrobe	(78.5, 81.8)
sensitivity	Kanamycin	mykrobe	(78.5, 81.8)	drprg	(82.4, 85.4)
sensitivity	Ethionamide	tbprofiler	(70.7, 74.0)	drprg	(74.9, 78.0)
sensitivity	Ethionamide	mykrobe	(71.3, 74.5)	drprg	(74.9, 78.0)
sensitivity	Capreomycin	tbprofiler	(75.2, 79.7)	mykrobe	(69.2, 74.1)
sensitivity	Capreomycin	mykrobe	(69.2, 74.1)	drprg	(75.2, 79.7)
sensitivity	Rifampicin	tbprofiler	(96.5, 97.2)	mykrobe	(95.2, 95.9)
specificity	Rifampicin	tbprofiler	(97.0, 97.4)	mykrobe	(97.7, 98.0)
specificity	Rifampicin	tbprofiler	(97.0, 97.4)	drprg	(97.6, 98.0)
sensitivity	Rifampicin	mykrobe	(95.2, 95.9)	drprg	(96.0, 96.7)

I've been through and made a couple more changes to the pipeline and the mutations added, and have rerun the pipeline. Fingers crossed this might be the last run, although there is also iqbal-lab-org/make_prg#55, which would improve results further if we can get a fix in place.

mbhall88 · 2023-03-09T04:30:13Z

I've been reworking the sensitivity/specificity plots a little, as I don't love them in their current form. Now that the CIs are so narrow, some of them are hard to see. This is mainly because Sn and Sp share the same plot, and their scales probably aren't well matched.

As such, I have split them into separate plots and used a white background to make the colours easier to tell apart. I've also added a red, dashed line for the WHO target product profiles for both sensitivity and specificity (see here).

[Figure: Sensitivity]

[Figure: Specificity]

Feedback is very welcome

iqbal-lab · 2023-03-09T23:13:16Z

These are great 👍

iqbal-lab · 2023-03-10T08:43:13Z

That said, the y axis on the sensitivity plot is weird, right?

iqbal-lab · 2023-03-10T08:45:08Z

So, mykrobe is ~always the most specific, but at a loss in sensitivity compared to the other two tools. Drprg retains almost as high specificity, but improves recall to the extent that it has the best recall of all tools (except only for Rif).

mbhall88 · 2023-03-10T23:38:49Z

That said, the y axis on the sensitivity plot is weird, right?

How so? It's a logit (log-odds) scale. This scale is similar to a log scale close to zero and to one, and almost linear around 0.5. It seemed the best fit given we have values near 100% that we want to zoom in on, but also values well below. Without it, the CIs are basically invisible for a lot of the drugs.
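To illustrate why the logit axis helps: the logit is log(p / (1 - p)), so intervals near 0 or 1 get stretched far more than intervals of the same width around 0.5. This stdlib-only sketch measures that stretch for a CI taken from the tables above (Rifampicin, tbprofiler sensitivity, 96.5-97.2%) versus an equally wide interval centred on 50%. If the plots are drawn with matplotlib (an assumption; the thread doesn't say), the corresponding axis setting is `ax.set_yscale("logit")`. The helper names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds transform that a logit axis applies to values in (0, 1)."""
    return math.log(p / (1 - p))


def stretch(lo: float, hi: float) -> float:
    """Ratio of an interval's width on a logit axis to its width on a linear axis."""
    return (logit(hi) - logit(lo)) / (hi - lo)


# Rifampicin tbprofiler sensitivity CI (96.5-97.2%) vs an equally wide CI around 50%
near_one = stretch(0.965, 0.972)
near_half = stretch(0.4965, 0.5035)
print(f"near 100%: x{near_one:.1f}, near 50%: x{near_half:.1f}")
```

The near-100% interval is magnified roughly 8 times more than the one around 50%, which is why those CIs only become visible on the logit axis.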

So, mykrobe is ~always the most specific, but at a loss in sensitivity compared to the other two tools. Drprg retains almost as high specificity, but improves recall to the extent that it has the best recall of all tools (except only for Rif).

Fair summary for the most part. Although mykrobe's specificity is never significantly better than drprg or tbprofiler

mbhall88 mentioned this issue Oct 28, 2022: Use racon for de novo variant discovery iqbal-lab-org/pandora#299 (Merged)

mbhall88 mentioned this issue Nov 27, 2022: Parse pandora VCF to detect minor alleles mbhall88/drprg#19 (Closed)

mbhall88 mentioned this issue Dec 21, 2022: Add some common resistance-conferring mutations that do no exist in population graph mbhall88/drprg#23 (Closed)

mbhall88 mentioned this issue Jan 6, 2023: Notice partial gene deletion that spans start codon mbhall88/drprg#24 (Closed)

leoisl mentioned this issue Feb 23, 2023: Improving cluster size threshold choice iqbal-lab-org/pandora#328 (Open)

mbhall88 transferred this issue from mbhall88/drprg Mar 13, 2023

mbhall88 closed this as completed Mar 29, 2023
