Genes and samples switched in goodSamplesGenes #88
I tried looking into this issue, hoping I could fix it if it was just a matter of swapping two variable names, and found this in the documentation of the WGCNA class:
However, the PyWGCNA_object tutorial states that the data is stored "in AnnData format which rows/obs are samples/sample information and cols/var are genes/gene information", and it does show that the .obs field contains samples and the .var field contains genes. I'm a bit confused and no longer know whether I should transpose my data. This is easy to work around by using DataFrames instead of AnnData objects, but since an AnnData option is available, maybe the expected orientation should be stated more clearly?
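For reference, the orientation described in the tutorial (rows/obs = samples, cols/var = genes) means the expression matrix should be samples × genes. Here is a minimal, stdlib-only sketch of checking and fixing that, assuming the data starts out as a genes × samples table; `transpose` and `check_orientation` are hypothetical helpers for illustration, not part of PyWGCNA or anndata:

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(matrix: Matrix) -> Matrix:
    """Swap rows and columns (e.g. genes x samples -> samples x genes)."""
    return [list(col) for col in zip(*matrix)]


def check_orientation(matrix: Matrix, samples: Sequence[str], genes: Sequence[str]) -> None:
    """Raise if the matrix is not samples x genes (rows = obs, cols = var)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if (n_rows, n_cols) != (len(samples), len(genes)):
        raise ValueError(
            f"expected {len(samples)} x {len(genes)} (samples x genes), "
            f"got {n_rows} x {n_cols}"
        )


# Toy data stored genes x samples: 3 genes (rows), 2 samples (columns).
genes = ["geneA", "geneB", "geneC"]
samples = ["s1", "s2"]
genes_by_samples = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

expr = transpose(genes_by_samples)  # now 2 x 3: rows = samples, cols = genes
check_orientation(expr, samples, genes)  # passes silently
print(len(expr), len(expr[0]))  # 2 3
```

Calling `check_orientation` on the untransposed `genes_by_samples` raises a `ValueError`, which is a cheap way to catch a swapped orientation before any downstream analysis runs.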
Hi @lorenzoamir, thank you for mentioning this! I'll look into it in the next few days and update the related documents :) Meanwhile, as you mentioned, you can pass the data in separate TSV/CSV files.
Hi @lorenzoamir, sorry for the confusion! I was mostly trying to keep the functions and input format similar to the original WGCNA in R. You were right about the API documentation. I fixed it so that X is the expression matrix, var holds gene information, and obs holds sample information. For goodSamplesGenes, if you look at the wrapper function (
I just updated the documentation! The change should appear on the website within the next few minutes. Sorry again for the confusion, and thanks for pointing it out.
Hi, thanks a lot for the package,
I just noticed that good samples and good genes seem to be switched in goodSamplesGenes. This could be an issue with the documentation rather than with the function itself.
I am working with an anndata object with shape n_obs × n_vars = 79 × 19013 (79 samples, 19013 genes). Following the documentation, goodSamplesGenes takes "A data frame in which columns are genes and rows are samples" and returns "A triple containing (goodGenes, goodSamples, allOK)". But when I run
good_genes has shape 79 and good_samples has 19013. So the documentation should probably be changed to "A triple containing (goodSamples, goodGenes, allOK)". Or maybe it's just me mixing up rows and columns, but my dataframe looked like the ones that were shown in the tutorials.
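Comparing each mask's length against the matrix shape is a reliable way to tell them apart. Below is a minimal, hypothetical `good_samples_genes` sketch, loosely modeled on the idea behind WGCNA's goodSamplesGenes (flag samples and genes with too many missing values, and genes with zero variance). It is not PyWGCNA's implementation, and the `max_missing` threshold is an assumption. It returns one flag per row (sample) and one per column (gene), in the (goodSamples, goodGenes, allOK) order observed above:

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def good_samples_genes(
    matrix: Sequence[Row], max_missing: float = 0.5
) -> Tuple[List[bool], List[bool], bool]:
    """Hypothetical, simplified quality filter for a samples x genes matrix.

    None marks a missing value. Returns (good_samples, good_genes, all_ok):
    one flag per row (sample) and one flag per column (gene).
    """
    if not matrix or not matrix[0]:
        raise ValueError("matrix must have at least one sample and one gene")
    n_samples = len(matrix)
    n_genes = len(matrix[0])

    good_genes = []
    for j in range(n_genes):
        present = [row[j] for row in matrix if row[j] is not None]
        missing_frac = 1 - len(present) / n_samples
        # A gene needs enough observed values and non-zero variance;
        # `and` short-circuits so pvariance only sees >= 2 values.
        ok = (
            missing_frac <= max_missing
            and len(present) >= 2
            and statistics.pvariance(present) > 0
        )
        good_genes.append(ok)

    good_samples = []
    for row in matrix:
        # A sample is kept if it is not missing too many genes.
        missing_frac = sum(v is None for v in row) / n_genes
        good_samples.append(missing_frac <= max_missing)

    all_ok = all(good_samples) and all(good_genes)
    return good_samples, good_genes, all_ok


# 4 samples (rows) x 3 genes (columns): the second gene is constant,
# and the fourth sample is missing 2 of its 3 values.
expr = [
    [1.0, 5.0, 2.0],
    [2.0, 5.0, None],
    [3.0, 5.0, 4.0],
    [None, 5.0, None],
]
good_samples, good_genes, all_ok = good_samples_genes(expr)

# Lengths reveal which mask is which: one flag per row vs. one per column.
assert len(good_samples) == len(expr)
assert len(good_genes) == len(expr[0])
print(good_samples, good_genes, all_ok)  # [True, True, True, False] [True, False, True] False
```

With real data, the same length comparison (79 samples vs. 19013 genes here) identifies which returned element is which, whatever order the docstring states.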
EDIT: made the anndata object name clearer and fixed typos