Rolling results #2
Comments
I took a look at the INH FNs today. For those where mykrobe made a TP call (so I can see which mutation we missed), 13/16 were missing the fabG1 C-15T promoter mutation. drprg called this variant for all 13, but the calls were removed by the fraction of read support (FRS) filter, which is set to 0.70. Nearly all of those mutations had an FRS of 0.58-0.64. The reason for this is that the alleles are quite similar, and I suspect some shared minimizers are wreaking havoc here. An example VCF record showing the alleles and coverage
These don't get collapsed by make_prg because the minimum match lengths between the three variants described by this allele are 4 and 6 (we use a min match len of 7). The other interesting thing is that each time, the allele with the next best coverage is allele 1, which differs from allele 2 in two positions (middle and end), so I reckon there's a minimizer that covers the start of this allele before the two alleles differ. Not sure whether the "hacky" way of decreasing the FRS threshold is the best way to go? Or changing some parameters in make_prg or pandora... |
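To make the FRS filter concrete, here is a minimal sketch. It assumes FRS is the depth on the called allele divided by the total depth across all alleles at the site; the function names and the example depths are hypothetical and are not taken from drprg.

```python
def fraction_of_read_support(allele_depths, called_allele):
    """Depth on the called allele divided by total depth at the site.

    Assumed definition of FRS for illustration only.
    """
    total = sum(allele_depths)
    if total == 0:
        return 0.0
    return allele_depths[called_allele] / total


def passes_frs_filter(allele_depths, called_allele, min_frs):
    """True if the call has at least `min_frs` fraction of read support."""
    return fraction_of_read_support(allele_depths, called_allele) >= min_frs


# Hypothetical per-allele depths for a three-allele site (not from the thread).
example_depths = [2, 21, 37]
called = 2  # index of the called allele
frs = fraction_of_read_support(example_depths, called)
print(f"FRS = {frs:.2f}")
print("passes at 0.70:", passes_frs_filter(example_depths, called, 0.70))
print("passes at 0.60:", passes_frs_filter(example_depths, called, 0.60))
```

With a similar allele soaking up about a third of the reads, the call lands in the 0.58-0.64 range described above and only survives the lower threshold.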
Could drop min match length also? |
Interesting. What FRS have you been using in your covid work? |
Well, we've just shifted to 0.6 for nanopore, but now we're distracted fixing bugs before going back to carefully choose FRS thresholds |
Okay, so after changing the minimum match length to 5 and the minimum FRS to 0.60, there are only two FNs that mykrobe calls that we don't. One of those is an indel which fails the FRS filter at 0.59 and the other is a dodgy-looking indel call from mykrobe that isn't called by tbprofiler or drprg, so I'm not fazed about that. Interestingly, that sample has a synonymous SNP in the first codon. The allele at that fabG1 variant now looks slightly better and has been split in two
Next task is to dig into the poor ofloxacin sensitivity. |
The OFX FNs are a fairly straightforward fix. It turns out that all of the FNs are gyrA D94X. What is happening here is that there is a phenotypically silent mutation at codon 95 (S95T; it changes the amino acid but does not confer resistance) that occurs in the same allele as that variant in the VCF/PRG. Long story short, I end up combining these two variants and calling an unknown prediction for OFX with a novel variant |
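One way to picture the fix is to split a multi-codon allele into per-codon amino acid changes before looking them up in the panel. The sketch below is only an illustration of that idea, not drprg's implementation: the function name, the tiny codon table, the example sequences, and the `NEUTRAL` set are all assumptions made for the example.

```python
# Only the codons needed for this example; a real tool would use a full table.
CODON_TO_AA = {"GAC": "D", "GGC": "G", "AGC": "S", "ACC": "T"}

# Hypothetical set of changes known not to confer resistance.
NEUTRAL = {"S95T"}


def per_codon_changes(ref, alt, first_codon):
    """Split an in-frame, equal-length allele into per-codon amino acid changes."""
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("expected in-frame alleles of equal length")
    changes = []
    for i in range(0, len(ref), 3):
        ref_aa = CODON_TO_AA[ref[i:i + 3]]
        alt_aa = CODON_TO_AA[alt[i:i + 3]]
        if ref_aa != alt_aa:  # skip unchanged and synonymous codons
            changes.append(f"{ref_aa}{first_codon + i // 3}{alt_aa}")
    return changes


# Illustrative allele spanning codons 94 and 95.
changes = per_codon_changes("GACAGC", "GGCACC", first_codon=94)
to_classify = [c for c in changes if c not in NEUTRAL]
print(changes)
print(to_classify)
```

Treating the two codons separately means the neutral codon 95 change no longer hides the codon 94 change inside a single "novel" combined variant.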
Great! |
Looks good. Not sure what's going on with TB profiler with RIF
…On Mon, 10 Oct 2022 at 16:37, Michael Hall ***@***.***> wrote:
Here's the updated plots after the INH and OFX fixes listed above
Nanopore
[image: image]
<https://user-images.githubusercontent.com/20403931/194803916-6afb90ec-f32d-463d-98e3-a6e342a4ec48.png>
Illumina
[image: image]
<https://user-images.githubusercontent.com/20403931/194803927-c55a2ad4-5264-4a70-913e-6da61b54a33e.png>
------------------------------
Some good improvements for nanopore. I'm going to have a look at the drprg
STM, ETO, PZA and RIF sensitivity as mykrobe seems to be better than drprg.
But pretty happy with the specificity of drprg at the moment.
|
It's most likely an issue with the custom panel I built. I'll fix that up once I've finished debugging drprg. I'm assuming tb profiler is on par with mykrobe for RIF though |
Looks great. I am actually surprised how good the specificity is for PZA nanopore, given it is dominated by indels. I had thought we had issues with indels |
I've now gone through the remainder of the FNs and nearly all of the FPs for nanopore.
RIF
- 2 FNs where drprg didn't discover the variant
- 4 FPs where all three tools call rpoB L430X
STM
- 1 FN where drprg calls 2 non-synonymous mutations (not in the panel); these are also called by tb-profiler
- 14 FPs are called by all three callers. 9 were mutations in gid, 3 were in rrs, and 2 in rpsL. I'm not sure what we want to do here, because it is fairly reasonable to think this is a phenotyping problem. After a quick search, I found two references to support this [1, 2]. From 1
- 2 FPs were rpsL K88R, which is very strongly associated with STM resistance. These were also called by mykrobe but not tb-profiler.
- 2 FPs were confident deletions in gid only called by drprg. mykrobe called one of them, but it was filtered due to a low proportion of expected depth.
ETO
In total, there were 9 FNs, which were all called by mykrobe but not by tb-profiler or drprg. They're all indels in ethA, and de novo variant discovery was not triggered in drprg for any of them. In one of those FNs there was a promoter mutation as well, which drprg did call, but it was filtered out for low FRS (0.55). The other thing to note here is that while mykrobe's sensitivity is much better than drprg and tbprofiler, its specificity is terrible.
PZA
- 1 FN is pncA R154G, which is called by mykrobe and tb-profiler. drprg calls the correct allele, but it is filtered out for low FRS (0.54)
INH
- 4 FPs were called confidently by all three tools (fabG1 C-15T)
I don't think there is much more I can do to improve drprg's nanopore performance here. FRS could possibly be lowered, but it would only save a small number of FNs/FPs. I'll fix up the tb-profiler RIF sensitivity and then get stuck into the Illumina results. (This text file is my notepad while I was investigating these FNs and FPs) |
As our specificity is better or the same compared to mykrobe and tbprofiler for all drugs on Illumina, I only investigated the FNs. tl;dr there are three things we may want to try to improve sensitivity as I suspect once we scale this analysis up to thousands of samples some of these problems will get bigger
Aside from those overarching points, there were also some other FNs which were due to two variants right next to each other. For example, ERR2510154 has
The allele of this variant is TCG>TTT so pandora should be able to thread reads through both of these variants, but doesn't seem to be able to....?? A similar thing happened for a few other FNs. I'm wondering if I should run on the full dataset and then manually add to the reference PRG some of the common variants that cause this problem? I was thinking the "correct" way to do this @iqbal-lab would be to look through the cryptic metadata sheets and add some samples that contain those variants that cause some of these problems? |
Hi there, I'm a bit wiped out so will be brief
In terms of adding more variants to the graph; we can do this, but racon might mean you don't need to |
I've just realised we should probably put some kind of minimum depth filter on these results too, i.e. samples with less than d depth are excluded from the sensitivity/specificity plots. Does everyone agree? If so, does anyone have a preference for what d should be? I arbitrarily thought of 15x. (This is separate from the depth analysis in #3.)
Here is the depth distribution for the 400 Illumina test set and the full nanopore
Additionally, it might be wise to have a contamination proportion filter. For instance, when I align the reads to the decontamination database, I calculate the fraction of reads that we keep (i.e. MTB), the fraction of reads that align to a contaminant, and the fraction of reads that are unmapped. Again arbitrarily, I was thinking of excluding samples with more than 5% contamination. Too harsh?
This is the fraction of contamination for Illumina and nanopore |
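The two proposed QC filters could be applied along these lines. This is a sketch under stated assumptions: the `SampleQC` fields, the function name, and the example samples are hypothetical; only the 15x and 5% thresholds come from the comment above.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    run: str                 # run accession (made-up values below)
    depth: float             # mean read depth
    frac_contaminant: float  # fraction of reads aligning to a contaminant


def keep_sample(sample, min_depth=15.0, max_contamination=0.05):
    """Keep samples with at least `min_depth` depth and at most
    `max_contamination` of their reads assigned to a contaminant."""
    return (sample.depth >= min_depth
            and sample.frac_contaminant <= max_contamination)


# Hypothetical samples, purely for illustration.
samples = [
    SampleQC("run_A", depth=62.0, frac_contaminant=0.01),
    SampleQC("run_B", depth=9.5, frac_contaminant=0.00),   # too shallow
    SampleQC("run_C", depth=40.0, frac_contaminant=0.12),  # too contaminated
]
kept = [s.run for s in samples if keep_sample(s)]
print(kept)
```

Keeping both thresholds as parameters makes it easy to check how sensitive the headline results are to the arbitrary choices of d and the contamination cutoff.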
Both seem perfectly reasonable to me |
So I have a working drprg branch adapted to use pandora with the racon denovo method (iqbal-lab-org/pandora#299). I've tested it out on two Illumina runs listed in #2. The first, ERR2510154, was the example VCF above. So, with the old pandora denovo process, at the allele for
with the new denovo process, we get
I guess another reason this might have been fixed is that instead of using
The second run, ERR4828599, had both a RIF FN and an INH FN. The RIF FN was an interesting case where the isolate has both L449M and S450F. drprg/pandora previously failed to find a novel variant. With the new pandora denovo process, we found (and called) both of these variants. The INH FN was katG S315N, which is a rarer mutation at that locus (normally S315T). Previously both mykrobe and drprg had no depth in this area and drprg did not find a novel variant. With the new pandora, we do find and call this mutation. I'm going to run a few more of the Illumina FNs, but this is very promising! |
Okay, since the last update of results we have switched to using racon for denovo discovery and dropped the old nanopore data. I have also increased the number of Illumina samples to 8,587.
Illumina
Nanopore
|
Looks pretty good I think
…On Thu, 24 Nov 2022, 1:06 pm Michael Hall, ***@***.***> wrote:
Okay, since the last update of results we have switched to using racon
for denovo discovery and dropped the old nanopore data. I have also
increased the number of Illumina samples to 8,587
Illumina
Note I am going to change the markers so you can see the error bars now
that they are so small
[image: image]
<https://user-images.githubusercontent.com/20403931/203677683-39591450-c68e-47a8-9b2c-d77f511c5b0e.png>
| Drug | Tool | FN(R) | FP(S) | Sensitivity (95% CI) | Specificity (95% CI) | MCC |
|---|---|---|---|---|---|---|
| Amikacin | drprg | 77(485) | 50(6958) | 84.1% (80.6-87.1%) | 99.3% (99.1-99.5%) | 0.857 |
| Amikacin | mykrobe | 101(485) | 46(6958) | 79.2% (75.3-82.6%) | 99.3% (99.1-99.5%) | 0.831 |
| Amikacin | tbprofiler | 62(485) | 59(6958) | 87.2% (83.9-89.9%) | 99.2% (98.9-99.3%) | 0.866 |
| Capreomycin | drprg | 62(235) | 92(2449) | 73.6% (67.6-78.8%) | 96.2% (95.4-96.9%) | 0.662 |
| Capreomycin | mykrobe | 78(235) | 85(2449) | 66.8% (60.6-72.5%) | 96.5% (95.7-97.2%) | 0.625 |
| Capreomycin | tbprofiler | 54(235) | 96(2449) | 77.0% (71.2-81.9%) | 96.1% (95.2-96.8%) | 0.679 |
| Delamanid | drprg | 111(116) | 1(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) | 0.188 |
| Delamanid | mykrobe | 111(116) | 1(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) | 0.188 |
| Delamanid | tbprofiler | 111(116) | 2(8152) | 4.3% (1.9-9.7%) | 100.0% (99.9-100.0%) | 0.173 |
| Ethambutol | drprg | 146(1538) | 736(4936) | 90.5% (88.9-91.9%) | 85.1% (84.1-86.1%) | 0.685 |
| Ethambutol | mykrobe | 149(1538) | 728(4936) | 90.3% (88.7-91.7%) | 85.3% (84.2-86.2%) | 0.686 |
| Ethambutol | tbprofiler | 118(1538) | 765(4936) | 92.3% (90.9-93.6%) | 84.5% (83.5-85.5%) | 0.691 |
| Ethionamide | drprg | 341(1104) | 372(6105) | 69.1% (66.3-71.8%) | 93.9% (93.3-94.5%) | 0.623 |
| Ethionamide | mykrobe | 276(1104) | 395(6105) | 75.0% (72.4-77.5%) | 93.5% (92.9-94.1%) | 0.658 |
| Ethionamide | tbprofiler | 272(1104) | 414(6105) | 75.4% (72.7-77.8%) | 93.2% (92.6-93.8%) | 0.653 |
| Isoniazid | drprg | 362(3900) | 164(4194) | 90.7% (89.8-91.6%) | 96.1% (95.5-96.6%) | 0.871 |
| Isoniazid | mykrobe | 366(3900) | 163(4194) | 90.6% (89.7-91.5%) | 96.1% (95.5-96.7%) | 0.87 |
| Isoniazid | tbprofiler | 297(3900) | 181(4194) | 92.4% (91.5-93.2%) | 95.7% (95.0-96.3%) | 0.882 |
| Kanamycin | drprg | 142(670) | 101(6975) | 78.8% (75.6-81.7%) | 98.6% (98.2-98.8%) | 0.796 |
| Kanamycin | mykrobe | 166(670) | 96(6975) | 75.2% (71.8-78.3%) | 98.6% (98.3-98.9%) | 0.776 |
| Kanamycin | tbprofiler | 122(670) | 107(6975) | 81.8% (78.7-84.5%) | 98.5% (98.1-98.7%) | 0.811 |
| Levofloxacin | drprg | 105(1040) | 97(5454) | 89.9% (87.9-91.6%) | 98.2% (97.8-98.5%) | 0.884 |
| Levofloxacin | mykrobe | 108(1040) | 97(5454) | 89.6% (87.6-91.3%) | 98.2% (97.8-98.5%) | 0.882 |
| Levofloxacin | tbprofiler | 85(1040) | 109(5454) | 91.8% (90.0-93.3%) | 98.0% (97.6-98.3%) | 0.89 |
| Linezolid | drprg | 49(65) | 4(6110) | 24.6% (15.8-36.3%) | 99.9% (99.8-100.0%) | 0.441 |
| Linezolid | mykrobe | 49(65) | 4(6110) | 24.6% (15.8-36.3%) | 99.9% (99.8-100.0%) | 0.441 |
| Linezolid | tbprofiler | 48(65) | 5(6110) | 26.2% (17.0-38.0%) | 99.9% (99.8-100.0%) | 0.447 |
| Moxifloxacin | drprg | 60(603) | 464(5431) | 90.0% (87.4-92.2%) | 91.5% (90.7-92.2%) | 0.656 |
| Moxifloxacin | mykrobe | 59(603) | 460(5431) | 90.2% (87.6-92.3%) | 91.5% (90.8-92.2%) | 0.658 |
| Moxifloxacin | tbprofiler | 42(603) | 482(5431) | 93.0% (90.7-94.8%) | 91.1% (90.3-91.9%) | 0.668 |
| Ofloxacin | drprg | 31(105) | 4(424) | 70.5% (61.2-78.4%) | 99.1% (97.6-99.6%) | 0.782 |
| Ofloxacin | mykrobe | 32(105) | 4(424) | 69.5% (60.2-77.5%) | 99.1% (97.6-99.6%) | 0.776 |
| Ofloxacin | tbprofiler | 26(105) | 6(424) | 75.2% (66.2-82.5%) | 98.6% (96.9-99.3%) | 0.802 |
| Pyrazinamide | drprg | 75(341) | 47(822) | 78.0% (73.3-82.1%) | 94.3% (92.5-95.7%) | 0.742 |
| Pyrazinamide | mykrobe | 73(341) | 45(822) | 78.6% (73.9-82.6%) | 94.5% (92.8-95.9%) | 0.751 |
| Pyrazinamide | tbprofiler | 45(341) | 62(822) | 86.8% (82.8-90.0%) | 92.5% (90.4-94.1%) | 0.782 |
| Rifampicin | drprg | 142(3222) | 166(4586) | 95.6% (94.8-96.2%) | 96.4% (95.8-96.9%) | 0.919 |
| Rifampicin | mykrobe | 187(3222) | 165(4586) | 94.2% (93.3-95.0%) | 96.4% (95.8-96.9%) | 0.907 |
| Rifampicin | tbprofiler | 102(3222) | 177(4586) | 96.8% (96.2-97.4%) | 96.1% (95.5-96.7%) | 0.927 |
| Streptomycin | drprg | 278(1042) | 130(1205) | 73.3% (70.6-75.9%) | 89.2% (87.3-90.8%) | 0.637 |
| Streptomycin | mykrobe | 295(1042) | 132(1205) | 71.7% (68.9-74.3%) | 89.0% (87.2-90.7%) | 0.621 |
| Streptomycin | tbprofiler | 257(1042) | 136(1205) | 75.3% (72.6-77.9%) | 88.7% (86.8-90.4%) | 0.649 |

Nanopore
[image: image]
<https://user-images.githubusercontent.com/20403931/203677879-e90cc0ce-d034-4cfb-a49f-85b72afca86b.png>
| Drug | Tool | FN(R) | FP(S) | Sensitivity (95% CI) | Specificity (95% CI) | MCC |
|---|---|---|---|---|---|---|
| Amikacin | drprg | 0(11) | 3(78) | 100.0% (74.1-100.0%) | 96.2% (89.3-98.7%) | 0.869 |
| Amikacin | mykrobe | 0(11) | 3(78) | 100.0% (74.1-100.0%) | 96.2% (89.3-98.7%) | 0.869 |
| Amikacin | tbprofiler | 0(11) | 3(78) | 100.0% (74.1-100.0%) | 96.2% (89.3-98.7%) | 0.869 |
| Capreomycin | drprg | 1(1) | 1(51) | 0.0% (0.0-79.3%) | 98.0% (89.7-99.7%) | -0.02 |
| Capreomycin | mykrobe | 1(1) | 1(51) | 0.0% (0.0-79.3%) | 98.0% (89.7-99.7%) | -0.02 |
| Capreomycin | tbprofiler | 1(1) | 1(51) | 0.0% (0.0-79.3%) | 98.0% (89.7-99.7%) | -0.02 |
| Ethambutol | drprg | 4(14) | 15(77) | 71.4% (45.4-88.3%) | 80.5% (70.3-87.8%) | 0.42 |
| Ethambutol | mykrobe | 4(14) | 15(77) | 71.4% (45.4-88.3%) | 80.5% (70.3-87.8%) | 0.42 |
| Ethambutol | tbprofiler | 5(14) | 15(77) | 64.3% (38.8-83.7%) | 80.5% (70.3-87.8%) | 0.367 |
| Ethionamide | drprg | 0(4) | 1(9) | 100.0% (51.0-100.0%) | 88.9% (56.5-98.0%) | 0.843 |
| Ethionamide | mykrobe | 0(4) | 1(9) | 100.0% (51.0-100.0%) | 88.9% (56.5-98.0%) | 0.843 |
| Ethionamide | tbprofiler | 0(4) | 1(9) | 100.0% (51.0-100.0%) | 88.9% (56.5-98.0%) | 0.843 |
| Isoniazid | drprg | 9(51) | 4(48) | 82.4% (69.7-90.4%) | 91.7% (80.4-96.7%) | 0.742 |
| Isoniazid | mykrobe | 9(51) | 4(48) | 82.4% (69.7-90.4%) | 91.7% (80.4-96.7%) | 0.742 |
| Isoniazid | tbprofiler | 9(51) | 3(48) | 82.4% (69.7-90.4%) | 93.8% (83.2-97.9%) | 0.764 |
| Kanamycin | drprg | 0(0) | 1(52) | - | 98.1% (89.9-99.7%) | - |
| Kanamycin | mykrobe | 0(0) | 1(52) | - | 98.1% (89.9-99.7%) | - |
| Kanamycin | tbprofiler | 0(0) | 1(52) | - | 98.1% (89.9-99.7%) | - |
| Moxifloxacin | drprg | 0(0) | 1(1) | - | 0.0% (0.0-79.3%) | - |
| Moxifloxacin | mykrobe | 0(0) | 1(1) | - | 0.0% (0.0-79.3%) | - |
| Moxifloxacin | tbprofiler | 0(0) | 1(1) | - | 0.0% (0.0-79.3%) | - |
| Ofloxacin | drprg | 0(10) | 4(77) | 100.0% (72.2-100.0%) | 94.8% (87.4-98.0%) | 0.823 |
| Ofloxacin | mykrobe | 0(10) | 4(77) | 100.0% (72.2-100.0%) | 94.8% (87.4-98.0%) | 0.823 |
| Ofloxacin | tbprofiler | 0(10) | 3(77) | 100.0% (72.2-100.0%) | 96.1% (89.2-98.7%) | 0.86 |
| Pyrazinamide | drprg | 0(0) | 0(1) | - | 100.0% (20.7-100.0%) | - |
| Pyrazinamide | mykrobe | 0(0) | 0(1) | - | 100.0% (20.7-100.0%) | - |
| Pyrazinamide | tbprofiler | 0(0) | 0(1) | - | 100.0% (20.7-100.0%) | - |
| Rifampicin | drprg | 5(48) | 1(44) | 89.6% (77.8-95.5%) | 97.7% (88.2-99.6%) | 0.873 |
| Rifampicin | mykrobe | 5(48) | 1(44) | 89.6% (77.8-95.5%) | 97.7% (88.2-99.6%) | 0.873 |
| Rifampicin | tbprofiler | 5(48) | 1(44) | 89.6% (77.8-95.5%) | 97.7% (88.2-99.6%) | 0.873 |
| Streptomycin | drprg | 2(8) | 14(83) | 75.0% (40.9-92.9%) | 83.1% (73.7-89.7%) | 0.398 |
| Streptomycin | mykrobe | 2(8) | 27(83) | 75.0% (40.9-92.9%) | 67.5% (56.8-76.6%) | 0.25 |
| Streptomycin | tbprofiler | 2(8) | 12(83) | 75.0% (40.9-92.9%) | 85.5% (76.4-91.5%) | 0.43 |
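For anyone wanting to reproduce a row of these tables from the raw counts, here is a minimal sketch. It reads FN(R) as "false negatives (number of resistant samples)" and FP(S) as "false positives (number of susceptible samples)". The thread does not say how the confidence intervals were computed; the Wilson score interval is used here as an assumption because it appears consistent with the reported values (for example, 11/11 gives a lower bound of about 74.1%). Function names are made up for this example.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n == 0:
        return None
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient; None when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else None


def summarise(fn, n_resistant, fp, n_susceptible):
    """Turn the FN(R) and FP(S) columns into sensitivity, specificity and MCC."""
    tp, tn = n_resistant - fn, n_susceptible - fp
    return {
        "sensitivity": tp / n_resistant if n_resistant else None,
        "sensitivity_ci": wilson_interval(tp, n_resistant),
        "specificity": tn / n_susceptible if n_susceptible else None,
        "specificity_ci": wilson_interval(tn, n_susceptible),
        "mcc": mcc(tp, fn, fp, tn),
    }


# Illumina amikacin row for drprg: FN(R) = 77(485), FP(S) = 50(6958).
row = summarise(77, 485, 50, 6958)
print(row)
```

Rows with no resistant samples (such as nanopore kanamycin) come out with an undefined sensitivity and MCC, which matches the dashes in the table.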
|
Yeah! |
I disagree sadly haha. TBProfiler is beating us on a lot of drugs. Going to dig into why that is now |
Looking again (now on laptop), if I were to summarise those results:
On Illumina, tb-profiler often has the highest sensitivity. It does pay a very small price in specificity, but that is much less noticeable than the sensitivity increase. So I agree, good to look into that.
On nanopore, sensitivity of all tools is essentially identical (except tb-profiler has a problem on EMB). Specificity is also essentially identical, although for two drugs (streptomycin and ofloxacin) tb-profiler has slightly higher specificity. |
I've been looking through the variants where drprg is FN but either of the other tools is TP (on Illumina) to see what variants we have missed. (I'm not finished yet) but a lot of the tbprofiler TPs where we are FN are to do with minor alleles. By default, TBProfiler will call anything with a fraction of 0.1 or more. This brings up point 3 from #2 again. We tell mykrobe to run in haploid mode and drprg only runs in haploid mode. The options forward I see are:
Option 2 is obviously the easiest and most likely to make us look better, but it sits somewhat uncomfortably with me as we are kind of skewing the results in our favour, right? I will keep working through these results next week for other drugs, as there are also a few cases of weird indels which I will document when I have a better understanding of what's going on. |
I think detecting minors could easily be done directly in drprg, no need to implement in Pandora. You get coverage info on the S and R alleles right? Just ask if the coverage on any R allele is >0.1 of the total |
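The suggestion above amounts to a simple coverage check per site. A minimal sketch, assuming per-allele coverage is available as a mapping and that we know which alleles are resistance-associated; the function name and the example numbers are hypothetical, and the 0.1 cutoff is the one mentioned in the thread.

```python
def minor_resistant_alleles(coverage, resistant_alleles, min_fraction=0.1):
    """Return resistance alleles whose share of total coverage exceeds
    `min_fraction`, regardless of which allele was called as the major one."""
    total = sum(coverage.values())
    if total == 0:
        return []
    return [
        allele
        for allele in resistant_alleles
        if coverage.get(allele, 0) / total > min_fraction
    ]


# Hypothetical site: the susceptible allele dominates, one R allele is minor.
site_coverage = {"S": 85, "R1": 14, "R2": 1}
flagged = minor_resistant_alleles(site_coverage, ["R1", "R2"])
print(flagged)
```

Note that this only works for resistance alleles that already have coverage information, i.e. alleles present in the graph; it cannot surface a minor allele that is absent from it.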
I don't know how much work this would be either, because the only experience I have with genotyping models is in pandora, which has a haploid model. If implementing a diploid model is simply calling the two most likely alleles, then maybe a simple implementation of getting the most likely allele (what is currently implemented) and the second most likely allele (remove/ignore the most likely and rerun the genotyping algorithm) is not hard. This can be easily generalised to n-ploid... but I don't think it is as simple as this... |
|
True. I'll have to do some reimplementing though as I currently only pay attention to the called alleles. But it shouldn't take too long to get this working 🤞 |
Hurrah! I would reread the section on minor alleles here |
Two solutions
|
This is v exciting and good news really, there a lot of sensitivity gain to be had from the minor alleles and gene deletions |
For point 1, no, that is not right. We don't have these variants in the graph, which is the problem. (Remember our graph is not the panel, but the sparse population PRG from randomly sampled cryptic samples.) And racon can't find them in these samples because they're only minor alleles. Racon will find the major allele, which is the reference. Point 2 seems like it effectively does away with the need for pandora though. Is it basically what tbprofiler does? It will also dramatically increase our runtime and memory usage, which at the moment is really our biggest selling point. |
I need to think! |
Follow up to 2. I'm not pushing for this solution, but just to say, we do this for covid, 30kb long, and use <500 MB RAM and 45 seconds for the whole process. I think performance is not a barrier. But there are other arguments not to do it |
After closing mbhall88/drprg#23 the current (Illumina) results are
The nanopore results remain unchanged |
After the updates in minor allele calling in mbhall88/drprg#19 (comment)
|
OK, so looking at those results now, we can definitely see a sensitivity improvement over Mykrobe with no precision loss. Compared with tbprofiler we are broadly the same: tbprofiler mostly has slightly better recall and slightly worse precision (except for fluoroquinolones). The biggest difference is 7% higher recall for tbprofiler for pyrazinamide. Fair summary? |
Yep, fair summary. The work in mbhall88/drprg#24 should improve the PZA recall slightly too. |
After the work in mbhall88/drprg#26 , we get the following Illumina results (nanopore is unchanged). Note: only ETO and PZA change from last results
PZA still isn't great, but there are just so many different mutations with minor alleles that we don't have in the graph and hand-picking them all could lead to a complicated graph. Although I can try adding them if we really want to try boosting PZA sensitivity... |
I think those results are much improved, but I am wondering what the pitch is for drprg though. Illumina is better than Mykrobe and ~same as tbprofiler. Are the nanopore results really unchanged from before? Leandro's mapping fixes will help too |
Yeah, this has been troubling me too...I mean we can notice gene deletions...We use a lot less resources....
Here are the current nanopore results
Sample sizes are so small it makes it hard to get a clear picture for a lot of drugs. |
Here are the Illumina results on the full dataset (45,193 samples)
I am currently working through the INH FNs and have learned a lot and fixed some bugs. Most important result to understand here though will be the RIF sensitivity which is significantly lower than tb-profiler |
I think I might have gotten to the bottom of the RIF sensitivity issue (it also impacts a decent number of INH FNs). tl;dr we need a smaller minimum cluster size for (some) Illumina reads in pandora. Cluster size dictates whether we recognise a read as "hitting" a locus. The default is 10. But I was finding a lot of FNs where we just have these big random stretches of zero depth, generally in and around the RRDR. When I map these reads to H37Rv with minimap2, it shows that we should definitely have depth over the RRDR and its surrounding regions. It turns out most of them are unmapped in the pandora SAM file. In the end, most of these reads were getting ~4-6 hits, so they were being marked as unmapped because they're below the default of 10. I have also noticed that a lot of the samples with this issue are Illumina HiSeq 2000 75bp reads. This relates back to mbhall88/drprg#12 (comment). I've run on a few samples with the minimum cluster size set to 4 and it seems to have resolved the issue for those samples. So I'm going to rerun all samples and reassess the results after that 🤞 |
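The effect of the threshold can be illustrated with a deliberately simplified model in which a read counts as mapped to a locus only if it has at least `min_cluster_size` minimizer hits. This is an assumption made for illustration and not pandora's actual clustering logic; the function name, the inclusive comparison, and the hit counts are all hypothetical.

```python
def split_by_cluster_size(hits_per_read, min_cluster_size):
    """Partition reads into (mapped, unmapped) by their number of hits.

    Simplified model: a read is treated as hitting the locus when its hit
    count is at least `min_cluster_size`.
    """
    mapped = [r for r, h in hits_per_read.items() if h >= min_cluster_size]
    unmapped = [r for r, h in hits_per_read.items() if h < min_cluster_size]
    return mapped, unmapped


# Hypothetical hit counts; short reads tend to carry few minimizers.
hits = {"read_a": 4, "read_b": 5, "read_c": 6, "read_d": 12}

mapped_default, unmapped_default = split_by_cluster_size(hits, 10)
mapped_relaxed, unmapped_relaxed = split_by_cluster_size(hits, 4)
print(len(mapped_default), "mapped at 10;", len(mapped_relaxed), "mapped at 4")
```

Under this model, reads with 4-6 hits vanish at the default of 10 and reappear at 4, which is the pattern of zero-depth stretches described above.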
Also relates to long reads that overlap a PRG only at the end. |
Changing the minimum cluster size to 4 we get the following diff for Illumina
This is great, and the only real concern is 57 extra RIF FPs. I'll take a look at those and see if I can figure out if they're fixable or not. The overall results now are
Here is a table of the drug, tool combinations where the CIs don't overlap
I think we're very close to done. I just want to do a last check of discrepancies and see if I can salvage some more FNs and FPs. |
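A table of "CIs don't overlap" combinations can be derived with a simple interval check. The sketch below is an assumption about how one might do it, not the code used for the thread's table; the function names are made up, and the example intervals are the sensitivity CIs from the earlier 8,587-sample Illumina table, not the final results.

```python
from itertools import combinations


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping_pairs(ci_by_tool):
    """Return tool pairs whose confidence intervals do not overlap."""
    return [
        (t1, t2)
        for (t1, ci1), (t2, ci2) in combinations(ci_by_tool.items(), 2)
        if not intervals_overlap(ci1, ci2)
    ]


# Pyrazinamide sensitivity CIs (%) from the earlier Illumina table.
pza_sensitivity = {
    "drprg": (73.3, 82.1),
    "mykrobe": (73.9, 82.6),
    "tbprofiler": (82.8, 90.0),
}
pairs = non_overlapping_pairs(pza_sensitivity)
print(pairs)
```

Non-overlapping intervals are a conservative signal of a real difference; overlapping intervals do not by themselves show that two tools perform the same.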
Looking good! What does that change of cluster size do to nanopore though? |
No change in results for nanopore |
Woah |
I reran the pipeline after fixing a couple of bugs and adding/removing some more mutations from the graph. Here is the diff between this latest run and the run above
And the overall results
And the table of the drug, tool combinations where the CIs don't overlap
I've been through and made a couple more changes to the pipeline and the mutations added, and have rerun the pipeline. Fingers crossed this might be the last run. Although there is iqbal-lab-org/make_prg#55, which would also improve results if we can get a fix in place. |
I've been reworking the sensitivity/specificity plots a little as I don't love them in their current form. Now that the CIs are so small, some of them are hard to see. In particular this is because we have both Sn and Sp in the same plot and their scales probably aren't matched well. As such, I have split them into separate plots and used a white background to make the colours easier to tell apart. I've also added a red, dashed line for the WHO target product profiles for both sensitivity and specificity (see here).
Sensitivity
Specificity
Feedback is very welcome |
These are great 👍 |
That said, the y axis on the sensitivity plot is weird, right? |
So, mykrobe is ~always the most specific but at a loss in sensitivity compared to the other two tools. Drprg retains almost as high specificity, but improves recall to the extent that it has the best recall of all tools (except for RIF) |
How so? It's a logit (logistic regression) scale. This scale is similar to a log scale close to zero and to one, and almost linear around 0.5. It seemed the best fit given we have stuff near 100% that we want to zoom in on, but we also have stuff well below. Without it the CIs are basically invisible for a lot of the drugs
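To see why a logit axis helps here, this small sketch applies the transform logit(p) = ln(p / (1 - p)) and compares how far apart two values four percentage points apart end up near 100% versus near 50%. The specific values are arbitrary examples.

```python
import math


def logit(p):
    """Log-odds transform; defined for 0 < p < 1."""
    return math.log(p / (1 - p))


# The same 4-point gap, near the top of the scale and near the middle.
gap_near_one = logit(0.99) - logit(0.95)
gap_near_half = logit(0.55) - logit(0.51)
print(f"gap near 1:   {gap_near_one:.3f}")
print(f"gap near 0.5: {gap_near_half:.3f}")
```

The gap near 1 is stretched roughly tenfold relative to the gap near 0.5, which is what makes narrow CIs around 95-99% visible. Matplotlib provides a scale of this name via `ax.set_yscale("logit")`, though the thread does not say which plotting code was used.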
Fair summary for the most part. Although mykrobe's specificity is never significantly better than drprg or tbprofiler |
This issue will document the rolling results.
The first sneak peek is all 437 Nanopore isolates and 400 Illumina isolates (selected at random).
Nanopore
Next avenues of investigation:
Positives:
Illumina
Next avenues of investigation:
I'll probably wait until the Nanopore stuff is debugged and then run on a larger sample of data before debugging Illumina