Validation with shotgun data #129

Closed
EricRaes opened this issue Jul 22, 2020 · 4 comments
Comments

@EricRaes

Hi Gavin,

Hope you are having a good day.

I was hoping I could bug you with a question. My aim is to validate the KO outputs from PICRUSt2 against my shotgun data, as you outlined nicely in your Nature paper and in your GitHub repo: https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/analyses/validations/16S_vs_MGS/16S_vs_MGS_KO_validations.R.

I have 12 samples for which I have both 16S rRNA and shotgun data. I have been going through your script on GitHub, but unfortunately I am stuck.

a. In R I am loading the functions you listed in - https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/picrust2_ms_functions.R
b. I then execute the function ‘read_in_ko_predictions’ for my two KO files (pointing to my local paths)
c. and load the ‘compute_ko_validation_metrics’ you listed in - https://github.com/gavinmdouglas/picrust2_manuscript/blob/master/scripts/analyses/validations/16S_vs_MGS/16S_vs_MGS_KO_validations.R

At the end, unfortunately, I don’t follow your script anymore. When you wrote:

# Loop over all dataset names: read in predictions (restrict to overlapping samples only,
# and get subsets with all possible KOs that overlap across tools filled in) and compute performance metrics.
datasets <- c("hmp", "mammal", "ocean", "blueberry", "indian", "cameroon", "primate")

At which stage did you define your dataset names, e.g., c("hmp", "mammal", "ocean", "blueberry", "indian", "cameroon", "primate")? I just have one project (two files: one for 16S rRNA and one for my shotgun data).

I am a bit confused as to how I can get to the validation (Spearman correlation) outputs.

Many thanks in advance for your help and suggestions!!

Best regards,
Eric

MetaG_ALL_Stations_November_KO.txt
PICRUSt2_ALL_Stations_November_KO.txt

@gavinmdouglas
Member

gavinmdouglas commented Jul 22, 2020 via email

@EricRaes
Author

Hi Gavin,

Many thanks for your swift reply!

I quickly wanted to do a sanity check and ask whether I am doing the right thing here.

As you said I restricted to KOs that could have been predicted as present by both approaches and then filled in 0s for any of these KOs that could have been predicted but weren't => this results in my final file "KOs_which_overlap_with_MetaG_and_Picrust2"
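The restrict-and-fill step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy tables, the `possible_kos` vector, and the `fill_missing_kos` helper are all hypothetical and are not taken from the PICRUSt2 manuscript scripts. In practice `possible_kos` would be whatever set of KOs both approaches could have output.

```r
# Toy KO tables: rows are KOs, columns are samples (values are made up).
metag <- data.frame(Sample_1 = c(10, 0, 5), Sample_2 = c(3, 2, 0),
                    row.names = c("K00001", "K00002", "K00003"))
picrust2 <- data.frame(Sample_1 = c(8, 4), Sample_2 = c(1, 6),
                       row.names = c("K00001", "K00004"))

# Hypothetical set of KOs that both approaches could have predicted.
possible_kos <- c("K00001", "K00002", "K00004")

# Restrict to samples present in both tables.
shared_samples <- intersect(colnames(metag), colnames(picrust2))

# Hypothetical helper: keep only the KOs in `kos`, in that order,
# and fill in 0 for any KO that is absent from `tab`.
fill_missing_kos <- function(tab, kos) {
  out <- matrix(0, nrow = length(kos), ncol = ncol(tab),
                dimnames = list(kos, colnames(tab)))
  shared <- intersect(kos, rownames(tab))
  out[shared, ] <- as.matrix(tab[shared, , drop = FALSE])
  as.data.frame(out)
}

metag_filled <- fill_missing_kos(metag[, shared_samples, drop = FALSE], possible_kos)
picrust2_filled <- fill_missing_kos(picrust2[, shared_samples, drop = FALSE], possible_kos)

print(metag_filled)
print(picrust2_filled)
```

After this step both tables have identical row and column names, so their columns can be compared sample by sample.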

library(ggpubr)  # provides ggscatter()

KOs_which_overlap_with_MetaG_and_Picrust2 <- read.csv("KOs_which_overlap_with_MetaG_and_Picrust2.csv", header = TRUE, row.names = 1)

# Keep the paired MetaG and PICRUSt2 columns for one sample.
sample_cols <- c("Sample_1_MetaG_KO", "Sample_1_Picrust2_KO")
newdata <- KOs_which_overlap_with_MetaG_and_Picrust2[sample_cols]

# Scatter plot with a Spearman correlation: MetaG on x, PICRUSt2 on y.
ggscatter(newdata, x = "Sample_1_MetaG_KO", y = "Sample_1_Picrust2_KO",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "spearman",
          xlab = "Sample_1_KO_MetaG", ylab = "Sample_1_KO_PICRUST2")

and then did a Spearman correlation; I get an R2 value back and that's it? :-)

Spearman_1_sample_PICRUSt2 and MetaG KOs
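For all 12 samples rather than one, the same per-sample comparison could be looped with base R's `cor()`. This is a minimal sketch, assuming the combined table follows the `<sample>_MetaG_KO` / `<sample>_Picrust2_KO` column naming used above. The toy `ko_table` and its values are made up for illustration. Note that a Spearman correlation yields a rank correlation coefficient (rho, which `ggscatter` labels as R), not an R2.

```r
# Toy stand-in for the combined KO table (values are hypothetical).
ko_table <- data.frame(
  Sample_1_MetaG_KO    = c(10, 0, 5, 2),
  Sample_1_Picrust2_KO = c(8, 1, 6, 0),
  Sample_2_MetaG_KO    = c(1, 2, 3, 4),
  Sample_2_Picrust2_KO = c(4, 3, 2, 1),
  row.names = c("K00001", "K00002", "K00003", "K00004")
)

# Derive the sample names from the MetaG column names.
metag_cols <- grep("_MetaG_KO$", colnames(ko_table), value = TRUE)
samples <- sub("_MetaG_KO$", "", metag_cols)

# One Spearman rho per sample, comparing MetaG and PICRUSt2 KO abundances.
spearman_rho <- sapply(samples, function(s) {
  cor(ko_table[[paste0(s, "_MetaG_KO")]],
      ko_table[[paste0(s, "_Picrust2_KO")]],
      method = "spearman")
})

print(spearman_rho)
```

The result is a named numeric vector with one rho per sample, which can then be summarised or plotted across the whole project.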

@gavinmdouglas
Member

Hi @EricRaes, yes that looks like the approach!

@EricRaes
Author

Thanks heaps!
