Genes and samples switched in goodSamplesGenes #88

Closed
lorenzoamir opened this issue Jan 22, 2024 · 3 comments

Comments

lorenzoamir commented Jan 22, 2024

Hi, thanks a lot for the package,

I just noticed that good samples and good genes seem to be switched in goodSamplesGenes. This could be an issue with the documentation and not the function itself.

I am working with an anndata object with shape n_obs × n_vars = 79 × 19013 (79 samples, 19013 genes). Following the documentation, goodSamplesGenes takes "A data frame in which columns are genes and rows are samples" and returns "A triple containing (goodGenes, goodSamples, allOK)". But when I run

good_genes, good_samples, all_ok = WGCNA.goodSamplesGenes(datExpr=pd.DataFrame(adata.X))

good_genes has shape 79 and good_samples has 19013. So the documentation should probably be changed to "A triple containing (goodSamples, goodGenes, allOK)". Or maybe it's just me mixing up rows and columns, but my dataframe looked like the ones that were shown in the tutorials.
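The shape comparison described above can be made explicit with a small check. This is only a sketch: `label_masks_by_axis` is a hypothetical helper, not part of PyWGCNA, and it assumes the two returned masks have different lengths (the input is not square). It labels each mask by the axis of the input whose length it matches, regardless of the order in which the masks were returned.

```python
import numpy as np
import pandas as pd


def label_masks_by_axis(dat_expr, first, second):
    """Match two boolean masks to the rows/columns of dat_expr by length.

    Hypothetical helper (not part of PyWGCNA). It only compares lengths,
    so it cannot decide anything for a square input.
    """
    n_rows, n_cols = dat_expr.shape
    if n_rows == n_cols:
        raise ValueError("square input: mask lengths cannot identify the axes")
    lengths = (len(first), len(second))
    if lengths == (n_rows, n_cols):
        return {"rows": first, "columns": second}
    if lengths == (n_cols, n_rows):
        return {"rows": second, "columns": first}
    raise ValueError(f"mask lengths {lengths} do not match shape {dat_expr.shape}")


# Toy stand-in for the 79 x 19013 matrix in the report: 4 samples x 6 genes.
dat_expr = pd.DataFrame(np.ones((4, 6)))
masks = label_masks_by_axis(
    dat_expr, np.ones(4, dtype=bool), np.ones(6, dtype=bool)
)
print({axis: len(mask) for axis, mask in masks.items()})
# {'rows': 4, 'columns': 6}
```

With a samples × genes data frame, the mask labelled "rows" is the per-sample one and the mask labelled "columns" is the per-gene one.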

EDIT: made anndata object name more clear and fixed typos

Author

lorenzoamir commented Jan 23, 2024

I tried looking into this issue, hoping that I could fix it if it was just a matter of switching two variable names, and found this in the documentation of the WGCNA class:

class WGCNA(GeneExp):
    """
    A class used to do weighted gene co-expression network analysis.
    [ . . . ]
    :param anndata: if the expression data is in anndata format you should pass it through this parameter. X should be expression matrix. var is a sample information and obs is a gene information.
    :param anndata: anndata

So :param anndata: appears twice (the second occurrence should probably be :type anndata:). More importantly, the documentation states that genes should be stored in the obs field and samples in the var field. That is a bit counterintuitive because, as far as I know, it is usually the other way round (obs = samples and var = genes), but it could explain why they were switched in my previous example.

However in the PyWGCNA_object tutorial it is stated that the data is stored "in AnnData format which rows/obs are samples/sample information and cols/var are genes/gene information" and indeed it is shown that the .obs field contains samples and the .var field contains genes.

I'm a bit confused and no longer know whether I should transpose my data. This can easily be worked around by using dataframes instead of anndata objects, but since an anndata option is available, maybe the expected orientation should be documented more clearly?
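One way to keep the orientation unambiguous when going through data frames is to carry the sample and gene labels along instead of relying on positional indices. The sketch below is an illustration only: `expression_frame` is a hypothetical helper, it assumes a dense matrix whose rows are samples and whose columns are genes, and for an AnnData object the inputs would be `adata.X`, `adata.obs_names` and `adata.var_names`.

```python
import numpy as np
import pandas as pd


def expression_frame(X, obs_names, var_names):
    """Build a labelled samples x genes DataFrame from a dense matrix.

    Hypothetical helper. Assumes rows of X follow obs_names (samples)
    and columns follow var_names (genes); X must be dense.
    """
    X = np.asarray(X)
    if X.shape != (len(obs_names), len(var_names)):
        raise ValueError(
            f"X has shape {X.shape}, expected "
            f"({len(obs_names)}, {len(var_names)})"
        )
    return pd.DataFrame(
        X,
        index=pd.Index(obs_names, name="sample"),
        columns=pd.Index(var_names, name="gene"),
    )


X = np.arange(6, dtype=float).reshape(2, 3)
dat_expr = expression_frame(X, ["s1", "s2"], ["g1", "g2", "g3"])
print(dat_expr.shape)    # (2, 3): samples x genes
print(dat_expr.T.shape)  # (3, 2): genes x samples after transposing
```

Because the axes are named, it stays visible after a transpose which axis holds samples and which holds genes.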

Member

nargesr commented Jan 23, 2024

Hi @lorenzoamir ,

Thank you for mentioning this!

I'll look into this matter in the next few days and update the related documents :)

Meanwhile, as you mentioned, you can pass the data as separate tsv/csv files.

Member

nargesr commented Jan 24, 2024

Hi @lorenzoamir,

Sorry for the confusion! I was trying to keep the functions and input formats mostly similar to those used in the original WGCNA in R.

You were right about the API documentation. I fixed it so that it now says X should be an expression matrix, var is gene information, and obs is sample information.

For goodSamplesGenes, if you look at the wrapper function for the preprocessing steps (preprocess()), you can see that I transpose the expression matrix before passing it to the goodSamplesGenes() function:

goodGenes, goodSamples, allOK = WGCNA.goodSamplesGenes(self.datExpr.to_df().T)
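The effect of that transpose on the order of the returned values can be illustrated with a toy function. This is not PyWGCNA's goodSamplesGenes: `toy_row_col_filter` is a hypothetical stand-in with a much simpler criterion (at least one non-missing value), and it assumes, consistent with the shapes reported earlier in this thread, that the first returned mask follows the rows of the input.

```python
import numpy as np
import pandas as pd


def toy_row_col_filter(df):
    """Toy stand-in, NOT PyWGCNA's goodSamplesGenes.

    Returns (row_mask, col_mask, all_ok). A row or column counts as
    'good' here if it has at least one non-missing value.
    """
    row_mask = df.notna().any(axis=1).to_numpy()
    col_mask = df.notna().any(axis=0).to_numpy()
    return row_mask, col_mask, bool(row_mask.all() and col_mask.all())


# 2 samples x 3 genes; gene g2 is missing in every sample.
samples_by_genes = pd.DataFrame(
    [[1.0, np.nan, 3.0], [4.0, np.nan, 6.0]],
    index=["s1", "s2"],
    columns=["g1", "g2", "g3"],
)

# Samples as rows: the first mask is per sample, the second per gene.
first, second, ok = toy_row_col_filter(samples_by_genes)
print(len(first), len(second))      # 2 3

# Transposed (genes as rows): the first mask is now per gene.
first_t, second_t, ok_t = toy_row_col_filter(samples_by_genes.T)
print(len(first_t), len(second_t))  # 3 2
```

Under that assumption, unpacking as (goodGenes, goodSamples, allOK) fits a genes × samples input, while a samples × genes input yields the per-sample mask first.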

I just updated the documentation! The change should appear on the website in the next few minutes.

Sorry again for the confusion, and thanks for pointing it out.
Please feel free to reopen this issue if there is still a problem.

nargesr closed this as completed Jan 24, 2024