Validation with shotgun data #129

EricRaes · 2020-07-22T05:06:02Z

Hi Gavin,

Hope you are having a good day.

I was hoping I could bug you with a question. My aim is to validate the KO outputs from PICRUSt2 against my shotgun data, as you nicely outlined in your Nature Biotechnology paper and in your GitHub repo: https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/analyses/validations/16S_vs_MGS/16S_vs_MGS_KO_validations.R.

I have 12 samples for which I have both 16S rRNA and shotgun data. I have been going through your script on GitHub, but unfortunately I am stuck.

a. In R, I load the functions you defined in https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/picrust2_ms_functions.R
b. I then run the function ‘read_in_ko_predictions’ on my two KO files (pointing to my local paths).
c. I then load ‘compute_ko_validation_metrics’, defined in https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/analyses/validations/16S_vs_MGS/16S_vs_MGS_KO_validations.R

Towards the end, unfortunately, I can no longer follow your script, where you wrote:

```r
# Loop over all dataset names: read in predictions (restrict to overlapping samples only,
# and get subsets with all possible KOs that overlap across tools filled in) and compute performance metrics.
datasets <- c("hmp", "mammal", "ocean", "blueberry", "indian", "cameroon", "primate")
```

At which stage did you define the dataset names (e.g., c("hmp", "mammal", "ocean", "blueberry", "indian", "cameroon", "primate"))? I only have one project (two files: one for 16S rRNA and one for my shotgun data).

I am a bit confused about how to get to the validation (Spearman correlation) outputs.

Many thanks in advance for your help and suggestions!!

Best regards,
Eric

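MetaG_ALL_Stations_November_KO.txt (https://github.com/picrust/picrust2/files/4957881/MetaG_ALL_Stations_November_KO.txt)
PICRUSt2_ALL_Stations_November_KO.txt (https://github.com/picrust/picrust2/files/4957882/PICRUSt2_ALL_Stations_November_KO.txt)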

gavinmdouglas · 2020-07-22T13:52:06Z

Hi there,

That code may take a little work to adapt to other data. However, to compute the correlations for just those two files, you only need a little custom R code. First, restrict both tables to KOs that could have been predicted as present by both approaches. Then fill in 0s for any of these KOs that could have been predicted but weren't (the function you cited shows which files I used for this purpose for PICRUSt2 and HUMAnN2). Finally, sort and subset the tables so they have the same samples and KO ordering, and loop over every sample name (i.e. column name) to calculate the Spearman correlation for each sample with cor.test.

Hopefully that helps!

Gavin
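The steps described above can be sketched in base R roughly as follows. This is a minimal sketch, not the code from the manuscript repository. It assumes both tables have KOs as row names and samples as column names. The helper name `ko_spearman_per_sample` and its arguments `possible_picrust2` and `possible_mgs` (character vectors of every KO each approach could have predicted, e.g. taken from the reference files mentioned above) are hypothetical, and the toy numbers are made up.

```r
# Hypothetical helper (not from the manuscript repo): per-sample Spearman
# correlation between two KO tables with KOs as rows and samples as columns.
ko_spearman_per_sample <- function(picrust2_tab, mgs_tab,
                                   possible_picrust2, possible_mgs) {
  # 1. Restrict to KOs that both approaches could have predicted as present
  shared_kos <- intersect(possible_picrust2, possible_mgs)

  # 2. Fill in 0s for shared KOs that could have been predicted but weren't,
  #    then keep only the shared KOs, in the same order for both tables
  fill_zeros <- function(tab) {
    missing <- setdiff(shared_kos, rownames(tab))
    zeros <- matrix(0, nrow = length(missing), ncol = ncol(tab),
                    dimnames = list(missing, colnames(tab)))
    rbind(as.matrix(tab), zeros)[shared_kos, , drop = FALSE]
  }
  picrust2_tab <- fill_zeros(picrust2_tab)
  mgs_tab <- fill_zeros(mgs_tab)

  # 3. Use only samples present in both tables, in a fixed (sorted) order
  samples <- sort(intersect(colnames(picrust2_tab), colnames(mgs_tab)))

  # 4. Spearman correlation for each sample (column) with cor.test;
  #    warnings about inexact p-values caused by ties are suppressed
  vapply(samples, function(s) {
    test <- suppressWarnings(
      cor.test(picrust2_tab[, s], mgs_tab[, s], method = "spearman"))
    unname(test$estimate)
  }, numeric(1))
}

# Toy input with made-up values; K00003 is possible in both tools but absent
# from the metagenomic table, so it is filled in with 0 there.
pic <- data.frame(S1 = c(5, 3, 1), S2 = c(2, 4, 6),
                  row.names = c("K00001", "K00002", "K00003"))
mgs <- data.frame(S2 = c(1, 2), S1 = c(10, 6),
                  row.names = c("K00001", "K00002"))
all_kos <- c("K00001", "K00002", "K00003")
res <- ko_spearman_per_sample(pic, mgs, all_kos, all_kos)
res  # S1 = 1, S2 = -0.5 for this toy input
```

The result is a named vector with one Spearman rho per shared sample, which can then be summarised (e.g. with a boxplot) across all samples.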

EricRaes · 2020-07-23T00:59:22Z

Hi Gavin,

Many thanks for your swift reply!

I quickly wanted to do a sanity check and ask whether I am doing the right thing here.

As you suggested, I restricted the tables to KOs that could have been predicted as present by both approaches and then filled in 0s for any of these KOs that could have been predicted but weren't. This results in my final file, "KOs_which_overlap_with_MetaG_and_Picrust2":

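```r
library(ggpubr)  # provides ggscatter()

KOs_which_overlap_with_MetaG_and_Picrust2 <- read.csv("KOs_which_overlap_with_MetaG_and_Picrust2.csv",
                                                     header = TRUE, row.names = 1)

# Keep the metagenomic and PICRUSt2 columns for sample 1
sample_cols <- c("Sample_1_MetaG_KO", "Sample_1_Picrust2_KO")
newdata <- KOs_which_overlap_with_MetaG_and_Picrust2[, sample_cols]

# Scatter plot with regression line and Spearman correlation coefficient
# (axis labels now match the x = MetaG, y = PICRUSt2 mapping)
ggscatter(newdata, x = "Sample_1_MetaG_KO", y = "Sample_1_Picrust2_KO",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "spearman",
          xlab = "Sample_1_KO_MetaG", ylab = "Sample_1_KO_PICRUSt2")
```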

Then I ran a Spearman correlation and got a correlation coefficient (Spearman's rho) back. Is that all there is to it? :-)
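Rather than plotting one sample at a time, the same Spearman test can be run over all samples of the combined table in one go. A minimal base-R sketch, assuming every sample follows the `Sample_N_MetaG_KO` / `Sample_N_Picrust2_KO` column-naming pattern used for sample 1 above (an assumption, since only sample 1's columns are shown) and that every MetaG column has a matching PICRUSt2 column. The helper name `spearman_by_sample` is hypothetical and the toy values are made up.

```r
# Hypothetical helper: Spearman rho and p-value for every sample in a combined
# table whose columns follow the "<sample>_MetaG_KO" / "<sample>_Picrust2_KO" pattern.
spearman_by_sample <- function(combined) {
  metag_cols <- grep("_MetaG_KO$", colnames(combined), value = TRUE)
  samples <- sub("_MetaG_KO$", "", metag_cols)

  rows <- lapply(samples, function(s) {
    x <- combined[[paste0(s, "_MetaG_KO")]]
    y <- combined[[paste0(s, "_Picrust2_KO")]]
    # Ties make exact p-values impossible; suppress that warning
    test <- suppressWarnings(cor.test(x, y, method = "spearman"))
    data.frame(sample = s, rho = unname(test$estimate), p_value = test$p.value,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy table with made-up values, standing in for the CSV read in above
combined <- data.frame(
  Sample_1_MetaG_KO    = c(10, 6, 0, 3),
  Sample_1_Picrust2_KO = c(5, 3, 1, 2),
  Sample_2_MetaG_KO    = c(1, 2, 0, 4),
  Sample_2_Picrust2_KO = c(2, 4, 6, 8),
  row.names = c("K00001", "K00002", "K00003", "K00004")
)
results <- spearman_by_sample(combined)
print(results)  # rho is 1 for Sample_1 and 0.4 for Sample_2 with these toy values
```

With the real file, `spearman_by_sample(KOs_which_overlap_with_MetaG_and_Picrust2)` would return one row per sample, i.e. 12 rows for 12 samples if the naming assumption holds.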

gavinmdouglas · 2020-07-24T22:09:03Z

Hi @EricRaes, yes that looks like the approach!

EricRaes · 2020-07-25T04:29:40Z

Thanks heaps!

EricRaes closed this as completed Jul 25, 2020
