Genes and samples switched in goodSamplesGenes #88

lorenzoamir · 2024-01-22T18:26:38Z

Hi, thanks a lot for the package,

I just noticed that good samples and good genes seem to be switched in goodSamplesGenes. This could be an issue with the documentation rather than with the function itself.

I am working with an anndata object with shape n_obs × n_vars = 79 × 19013 (79 samples, 19013 genes). Following the documentation, goodSamplesGenes takes "A data frame in which columns are genes and rows are samples" and returns "A triple containing (goodGenes, goodSamples, allOK)". But when I run

good_genes, good_samples, all_ok = WGCNA.goodSamplesGenes(datExpr=pd.DataFrame(adata.X))

good_genes has length 79 and good_samples has length 19013, so the documentation should probably be changed to "A triple containing (goodSamples, goodGenes, allOK)". Or maybe I'm just mixing up rows and columns, but my DataFrame looked like the ones shown in the tutorials.
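To see why the lengths come out this way, here is a simplified stand-in that only mimics the orientation observed above: the first mask has one entry per row of the input and the second one per column. `good_rows_cols` and its filtering rule (at most half missing values and non-zero variance) are assumptions for illustration, not PyWGCNA's implementation.

```python
import numpy as np
import pandas as pd


def good_rows_cols(df: pd.DataFrame, max_missing: float = 0.5):
    """Hypothetical stand-in for a goodSamplesGenes-style check (NOT PyWGCNA's code).

    Returns (row_mask, col_mask, all_ok): the first mask has one entry per ROW,
    the second one per COLUMN, mirroring the orientation seen in this issue.
    """
    values = df.to_numpy(dtype=float)
    # Assumed rule: keep rows/columns with at most `max_missing` NaNs
    # and non-zero variance over their non-missing entries.
    row_ok = (np.isnan(values).mean(axis=1) <= max_missing) & (np.nanvar(values, axis=1) > 0)
    col_ok = (np.isnan(values).mean(axis=0) <= max_missing) & (np.nanvar(values, axis=0) > 0)
    return row_ok, col_ok, bool(row_ok.all() and col_ok.all())


# 4 samples x 6 genes, laid out like pd.DataFrame(adata.X): rows = samples.
rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.normal(size=(4, 6)))

first, second, _ = good_rows_cols(expr)
print(len(first), len(second))  # 4 6: the first mask indexes samples here

first_t, second_t, _ = good_rows_cols(expr.T)
print(len(first_t), len(second_t))  # 6 4: after .T the first mask indexes genes
```

If the real function follows the same orientation, passing the samples × genes DataFrame directly gives a first mask of length 79, which matches the lengths reported above.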

EDIT: made anndata object name more clear and fixed typos


lorenzoamir · 2024-01-23T18:31:59Z

I looked into this issue, hoping I could fix it if it was just a matter of swapping two variable names, and found this in the docstring of the WGCNA class:

class WGCNA(GeneExp):
    """
    A class used to do weighted gene co-expression network analysis.
    [ . . . ]
    :param anndata: if the expression data is in anndata format you should pass it through this parameter. X should be expression matrix. var is a sample information and obs is a gene information.
    :param anndata: anndata

Here :param anndata: appears twice (the second one should probably be :type anndata:), and, more importantly, the docstring says that genes are stored in the obs field and samples in the var field. That is a bit counterintuitive, because as far as I know the convention is usually the other way round (obs = samples, var = genes), but it could explain why the outputs were switched in my previous example.
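For illustration, the docstring with the second field changed to `:type anndata:` and the obs/var roles matching the tutorial might look like this. `WGCNADocSketch` is a hypothetical stand-in class; the real class inherits from `GeneExp`, which is omitted here.

```python
class WGCNADocSketch:
    """
    A class used to do weighted gene co-expression network analysis.

    :param anndata: if the expression data is in anndata format you should pass it
        through this parameter. X should be the expression matrix, obs holds the
        sample information and var holds the gene information.
    :type anndata: anndata.AnnData
    """


# Hypothetical stub: only the docstring matters here.
print(WGCNADocSketch.__doc__.count(":param anndata:"))  # 1
```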

However, the PyWGCNA_object tutorial states that the data is stored "in AnnData format which rows/obs are samples/sample information and cols/var are genes/gene information", and it does show the .obs field containing samples and the .var field containing genes.

I'm a bit confused and no longer know whether I should transpose my data. This is easy to work around by using DataFrames instead of AnnData objects, but since an AnnData option is available, maybe this should be documented more clearly?
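If you are unsure which way a matrix is oriented, one way to check is to look for known gene IDs on each axis. `to_samples_by_genes` below is a hypothetical helper, not part of PyWGCNA. It needs labelled axes, so with AnnData you would start from `adata.to_df()` (which uses `obs_names` as the index and `var_names` as the columns) rather than `pd.DataFrame(adata.X)`, whose labels are plain integers.

```python
import pandas as pd


def to_samples_by_genes(df: pd.DataFrame, gene_ids) -> pd.DataFrame:
    """Hypothetical helper: return `df` oriented as rows = samples, columns = genes.

    The orientation is decided by counting how many known gene IDs appear
    in the column labels versus the row labels.
    """
    genes = set(gene_ids)
    in_cols = len(genes.intersection(df.columns))
    in_rows = len(genes.intersection(df.index))
    if in_cols == 0 and in_rows == 0:
        raise ValueError("none of the gene IDs match the row or column labels")
    # Transpose only when the genes are found on the rows.
    return df if in_cols >= in_rows else df.T


# Genes on the rows here, so the helper transposes.
expr = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["geneA", "geneB"],
    columns=["s1", "s2", "s3"],
)
fixed = to_samples_by_genes(expr, ["geneA", "geneB"])
print(fixed.shape)  # (3, 2): 3 samples x 2 genes
```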

nargesr · 2024-01-23T19:12:28Z

Hi @lorenzoamir,

Thank you for mentioning this!

I'll look into this matter in the next few days and update the related documents :)

Meanwhile, as you mentioned, you can pass the data as separate TSV/CSV files.
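A minimal sketch of exporting the data to separate CSV files with pandas. `export_for_wgcna`, the file names and the layout (rows = samples, columns = genes, sample IDs in the first column) are assumptions; check the PyWGCNA documentation for the exact format its file readers expect. For TSV, pass `sep="\t"` to `to_csv`.

```python
import os
import tempfile

import pandas as pd


def export_for_wgcna(expr: pd.DataFrame, sample_info: pd.DataFrame, out_dir: str):
    """Hypothetical helper: write expression and sample metadata to separate CSVs.

    Assumed layout (verify against the PyWGCNA docs): rows = samples,
    columns = genes, first column = sample ID.
    """
    if not expr.index.equals(sample_info.index):
        raise ValueError("expression rows and sample_info rows must share the same sample IDs")
    expr_path = os.path.join(out_dir, "expression.csv")
    info_path = os.path.join(out_dir, "sample_info.csv")
    expr.to_csv(expr_path, index_label="sample_id")
    sample_info.to_csv(info_path, index_label="sample_id")
    return expr_path, info_path


expr = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["s1", "s2"], columns=["geneA", "geneB"])
info = pd.DataFrame({"condition": ["ctrl", "treated"]}, index=["s1", "s2"])
with tempfile.TemporaryDirectory() as tmp:
    expr_path, _ = export_for_wgcna(expr, info, tmp)
    print(pd.read_csv(expr_path, index_col=0).shape)  # (2, 2)
```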


nargesr · 2024-01-24T20:02:47Z

Hi @lorenzoamir,

Sorry for the confusion! I was mostly trying to keep the same functions and input format as the original WGCNA package in R.

You were right about the API documentation. I fixed it so that it now says: X should be the expression matrix, var holds the gene information and obs holds the sample information.
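With the corrected wording, the layout matches AnnData's own convention that `X` has shape `(n_obs, n_vars)`. A hypothetical `check_layout` helper (not part of PyWGCNA or anndata) can verify this before building the object:

```python
import numpy as np
import pandas as pd


def check_layout(X, obs: pd.DataFrame, var: pd.DataFrame) -> None:
    """Hypothetical helper: X is samples x genes, obs describes the samples
    (rows of X) and var describes the genes (columns of X)."""
    n_rows, n_cols = np.shape(X)
    if n_rows != len(obs):
        raise ValueError(f"X has {n_rows} rows but obs has {len(obs)} samples")
    if n_cols != len(var):
        raise ValueError(f"X has {n_cols} columns but var has {len(var)} genes")


X = np.zeros((79, 19013))  # 79 samples x 19013 genes, as in the issue
obs = pd.DataFrame(index=[f"sample_{i}" for i in range(79)])
var = pd.DataFrame(index=[f"gene_{j}" for j in range(19013)])
check_layout(X, obs, var)  # passes
# check_layout(X.T, obs, var) would raise:
# ValueError: X has 19013 rows but obs has 79 samples
```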

As for goodSamplesGenes: if you look at preprocess(), the wrapper function for the preprocessing steps, you can see that I transpose the expression matrix before passing it to goodSamplesGenes():

goodGenes, goodSamples, allOK = WGCNA.goodSamplesGenes(self.datExpr.to_df().T)
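From the user side, the same transpose can be wrapped so that a samples × genes matrix is always flipped before the call and the returned mask lengths are checked. `qc_on_samples_by_genes` and `toy_qc` are hypothetical; `toy_qc` only imitates the orientation described above (genes × samples in, `(goodGenes, goodSamples, allOK)` out).

```python
import numpy as np
import pandas as pd


def qc_on_samples_by_genes(expr: pd.DataFrame, qc_func):
    """Hypothetical wrapper: transpose a samples x genes matrix before QC.

    `qc_func` is assumed to take genes x samples and return
    (goodGenes, goodSamples, allOK), like the call in preprocess().
    """
    n_samples, n_genes = expr.shape
    good_genes, good_samples, all_ok = qc_func(expr.T)
    # Fail loudly if the masks come back with the wrong lengths.
    if len(good_genes) != n_genes or len(good_samples) != n_samples:
        raise ValueError("unexpected mask lengths; check the matrix orientation")
    return good_genes, good_samples, all_ok


def toy_qc(df: pd.DataFrame):
    # Hypothetical stand-in: first mask over rows (genes), second over columns (samples).
    values = df.to_numpy(dtype=float)
    rows = np.nanvar(values, axis=1) > 0
    cols = np.nanvar(values, axis=0) > 0
    return rows, cols, bool(rows.all() and cols.all())


expr = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4))  # 3 samples x 4 genes
genes_ok, samples_ok, ok = qc_on_samples_by_genes(expr, toy_qc)
print(len(genes_ok), len(samples_ok))  # 4 3
```

In the setting of this issue, one would presumably pass `adata.to_df()` and `WGCNA.goodSamplesGenes` instead of the toy pieces, assuming the default parameters are acceptable.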

I just updated the documentation! The changes should show up on the website in the next few minutes.

Sorry again for the confusion, and thanks for pointing it out!
Please feel free to reopen this issue if there is still a problem.

lorenzoamir closed this as completed Jan 23, 2024

lorenzoamir reopened this Jan 23, 2024

lorenzoamir closed this as completed Jan 23, 2024

lorenzoamir reopened this Jan 23, 2024

nargesr added a commit that referenced this issue Jan 24, 2024

updated documentation + fix module_trait_relationships_heatmap() + issue #88

a48daa6

nargesr closed this as completed Jan 24, 2024
