Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select samples for PCoA #49

Closed
rupesh-sinha opened this issue Nov 29, 2022 · 3 comments
Closed

Select samples for PCoA #49

rupesh-sinha opened this issue Nov 29, 2022 · 3 comments

Comments

@rupesh-sinha
Copy link

rupesh-sinha commented Nov 29, 2022

Hi sbslee,
I was using dokdo for plotting PCoA as below:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

qza_file = '/home/rupesh/rupesh/qiime2/new_qiime/uchime-dn-out_wdout_clustering/table-nc-wobl-nmcl.qza'
metadata_file = '/home/rupesh/rupesh/qiime2/new_qiime/sample_metada.tsv'


pcoa_results = dokdo.ordinate(qza_file)

dokdo.beta_2d_plot(
    pcoa_results,
    hue='matrix',
    metadata=metadata_file,
    figsize=(8, 8)
)

plt.tight_layout()

which is working fine.
But,
I was wondering if we can have function like 'where' or 'include_samples' or 'exclude_samples' or something like that to control the samples to be presented on the plot.

Thanks

@rupesh-sinha rupesh-sinha changed the title Hi sbslee, Select samples for PCoA Nov 29, 2022
@sbslee
Copy link
Owner

sbslee commented Dec 1, 2022

@rupesh-sinha,

Thank you for the feature request!

The reason I didn't include options like where to the beta_2d_plot function is because I was afraid the option might be abused by users, both intentionally and unintentionally. As you may already know, when presenting a PCoA plot you are supposed to show all samples that were used to calculate the original distance matrix. If a given PCoA plot is showing only a subset of samples, then it should not be used to make any definitive conclusions. For example, when you try to subset some samples in an emperor QZV file using QIIME 2 View, it will still do it for you but with a warning: "WARNING: hiding samples in an ordination can be misleading". Below you will see that I removed "left palm" samples (blue). The warning is displayed on the top left corner.

Screen Shot 2022-12-01 at 3 45 37 PM

Therefore, if the goal is to see how PCoA looks in the absence of certain samples, then you should calculate a new distance matrix and plot it again. This can be easily done within Dokdo and an example of this is illustrated in the ordinate function's documentation (compare the first and second figures). I will show the code example below:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'

pcoa_results = dokdo.ordinate(qza_file)

dokdo.beta_2d_plot(
    pcoa_results,
    metadata=metadata_file,
    hue='body-site',
    figsize=(8, 8)
)

plt.tight_layout()

ordinate-1

from qiime2 import Metadata

mf = dokdo.get_mf(metadata_file)
mf = mf[mf['body-site'].isin(['gut', 'left palm'])]

pcoa_results = dokdo.ordinate(qza_file, metadata=Metadata(mf))

dokdo.beta_2d_plot(
    pcoa_results,
    metadata=metadata_file,
    hue='body-site',
    figsize=(8, 8)
)

plt.tight_layout()

ordinate-2

I think there could still be an argument to be made in the favor of adding options like where to the beta_2d_plot function. For example, the user may want to explore the PCoA space in greater depth. However, if that's the case the user should use QIIME 2 View because that's why the platform exists in the first place.

Hope this helps!

@sbslee sbslee closed this as completed Dec 1, 2022
@rupesh-sinha
Copy link
Author

Thanks @sbslee

This works perfectly.
I understand the concern for hiding some samples and was looking for something like this to recompute the distance matrix for the selected samples. Thanks for your excellent support with dokdo!

When I recompute using the function you suggested in the second part of the script I get a plot with warning message like this:
/home/rupesh/anaconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/sklearn/metrics/pairwise.py:1776: DataConversionWarning: Data was converted to boolean for metric jaccard
warnings.warn(msg, DataConversionWarning)

What does this actually mean?
Again thanks for your support.

@sbslee
Copy link
Owner

sbslee commented Dec 2, 2022

@rupesh-sinha,

Glad to know that Dokdo is helpful to you!

As for the warning, you don't have to worry about it. It simply means that numeric data was converted to boolean (e.g. 442 --> True, and 0 --> False). This conversion was needed because you are using the Jaccard distance as distance metrics, which computes distance between samples based on the presence/absence of microbiome instead of using their abundance. Hope this helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants