Validation with shotgun data #129
Hi there,
That code may take a little work to alter for use with other data. However, you can perform the correlations on just those two files with a little custom R code. You'll want to restrict to KOs that could have been predicted as present by both approaches. You should then fill in 0s for any of these KOs that could have been predicted but weren't (you can look at the function you cited to see the files I used for this purpose for PICRUSt2 and HUMAnN2). Finally, you could sort and subset the tables to the same samples and KO orderings, and then loop over every sample name (i.e., column name) to calculate the Spearman correlation for each with cor.test.
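A minimal sketch of that workflow (assuming the two KO tables are tab-separated with KO IDs as row names and matching sample names as columns; the file names are taken from the attachments in this thread, and the union of observed KOs is used here as a stand-in for the full lists of KOs each tool could possibly output) might look like:

```r
# Hypothetical file names taken from the attachments in this thread.
metag_ko <- read.table("MetaG_ALL_Stations_November_KO.txt",
                       header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)
picrust2_ko <- read.table("PICRUSt2_ALL_Stations_November_KO.txt",
                          header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)

# Add 0-filled rows for KOs present in one table but not the other, then put
# both tables in the same KO order. (In the manuscript this was done against
# the full lists of KOs each tool could output, not just the observed KOs.)
fill_missing_kos <- function(tab, ko_ids) {
  missing <- setdiff(ko_ids, rownames(tab))
  if (length(missing) > 0) {
    zeros <- matrix(0, nrow = length(missing), ncol = ncol(tab),
                    dimnames = list(missing, colnames(tab)))
    tab <- rbind(tab, zeros)
  }
  tab[ko_ids, , drop = FALSE]
}

all_kos <- sort(union(rownames(metag_ko), rownames(picrust2_ko)))
metag_filled <- fill_missing_kos(metag_ko, all_kos)
picrust2_filled <- fill_missing_kos(picrust2_ko, all_kos)

# Restrict to samples present in both tables, in the same order.
shared_samples <- intersect(colnames(metag_filled), colnames(picrust2_filled))

# Spearman correlation per sample (column); exact = FALSE avoids the
# exact p-value warning when there are ties.
spearman_rho <- sapply(shared_samples, function(samp) {
  unname(cor.test(metag_filled[, samp], picrust2_filled[, samp],
                  method = "spearman", exact = FALSE)$estimate)
})
spearman_rho
```

Each entry of spearman_rho is then the per-sample Spearman rho between the metagenome-derived and PICRUSt2-predicted KO abundances.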
Hopefully that helps!
Gavin
On Wed, Jul 22, 2020, 2:06 AM, EricRaes wrote:
Hi Gavin,
Hope you are having a good day.
I was hoping I could bug you with a question. My aim is to do a validation with the KO outputs from both PICRUSt2 and my shotgun data, as you outlined nicely in the Nature paper and in the GitHub repo: https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/analyses/validations/16S_vs_MGS/16S_vs_MGS_KO_validations.R
I have 12 samples for which I have 16S rRNA and shotgun data. I have been going through your script on GitHub but unfortunately I am stuck.
a. In R I load the functions you listed in https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/picrust2_ms_functions.R
b. I then execute the function ‘read_in_ko_predictions’ for my two KO files (pointing to my local paths)
c. and load the ‘compute_ko_validation_metrics’ function you listed in https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/analyses/validations/16S_vs_MGS/16S_vs_MGS_KO_validations.R
At the end, unfortunately, I don’t follow your script anymore. When you wrote:
Loop over all dataset names: read in predictions (restrict to overlapping samples only, and get subsets with all possible KOs that overlap across tools filled in) and compute performance metrics.
datasets <- c("hmp", "mammal", "ocean", "blueberry", "indian", "cameroon", "primate")
At which stage did you define your dataset names, e.g. c("hmp", "mammal", "ocean", "blueberry", "indian", "cameroon", "primate")? I just have one project (two files: one for 16S rRNA and one for my shotgun data).
I am a bit confused as to how I can get to the validation (Spearman correlation) outputs.
Many thanks in advance for your help and suggestions!!
Best regards,
Eric
MetaG_ALL_Stations_November_KO.txt <https://github.com/picrust/picrust2/files/4957881/MetaG_ALL_Stations_November_KO.txt>
PICRUSt2_ALL_Stations_November_KO.txt <https://github.com/picrust/picrust2/files/4957882/PICRUSt2_ALL_Stations_November_KO.txt>
Hi Gavin,
Many thanks for your swift reply! I quickly wanted to do a sanity check and ask whether I am doing the right thing here. As you said, I restricted to KOs that could have been predicted as present by both approaches and then filled in 0s for any of these KOs that could have been predicted but weren't. This results in my final file "KOs_which_overlap_with_MetaG_and_Picrust2":

    library(ggpubr)  # ggscatter() comes from the ggpubr package
    KOs_which_overlap_with_MetaG_and_Picrust2 <- read.csv(
      "KOs_which_overlap_with_MetaG_and_Picrust2.csv", header = TRUE, row.names = 1)
    subset_cols <- c("Sample_1_MetaG_KO", "Sample_1_Picrust2_KO")
    newdata <- KOs_which_overlap_with_MetaG_and_Picrust2[, subset_cols]
    ggscatter(newdata, x = "Sample_1_MetaG_KO", y = "Sample_1_Picrust2_KO",
              cor.coef = TRUE, cor.method = "spearman")

and then did a Spearman correlation; I get a correlation coefficient back and that's it? :-)
Hi @EricRaes, yes that looks like the approach!
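Extended to all 12 samples at once, a short loop over the column pairs could look like the following sketch (it assumes the columns of the merged table follow the Sample_<i>_MetaG_KO / Sample_<i>_Picrust2_KO naming pattern shown above; this is not the manuscript code):

```r
overlap_tab <- read.csv("KOs_which_overlap_with_MetaG_and_Picrust2.csv",
                        header = TRUE, row.names = 1)

# Spearman's rho between the MetaG and PICRUSt2 columns of each sample.
sample_rho <- sapply(1:12, function(i) {
  metag_col <- paste0("Sample_", i, "_MetaG_KO")
  picrust2_col <- paste0("Sample_", i, "_Picrust2_KO")
  unname(cor.test(overlap_tab[[metag_col]], overlap_tab[[picrust2_col]],
                  method = "spearman", exact = FALSE)$estimate)
})
names(sample_rho) <- paste0("Sample_", 1:12)
sample_rho
```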
Thanks heaps!