Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to make heatmap with text file #22

Closed
khemlalnirmalkar opened this issue Jul 15, 2021 · 16 comments
Closed

how to make heatmap with text file #22

khemlalnirmalkar opened this issue Jul 15, 2021 · 16 comments
Labels
enhancement New feature or request

Comments

@khemlalnirmalkar
Copy link

Hi @sbslee ,
Is it possible to make a heatmap using a panda dataframe (not qza)?
i am trying to replicate the same style like bar plot, splitted categories with and without dendrogram.
Please can you help?

like this image
image

Thanks,
Khem

@sbslee sbslee added the enhancement New feature or request label Jul 18, 2021
@sbslee
Copy link
Owner

sbslee commented Jul 18, 2021

@khemlalnirmalkar, thanks for the question.

First of all, I just updated the dokdo.heatmap method in the 1.11.0-dev branch to accept pandas.DataFrame (click here to see the documentation):

import dokdo
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
csv_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table.csv'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
df = pd.read_csv(csv_file, index_col=0)
dokdo.heatmap(df,
              metadata=metadata_file,
              normalize='clr',
              hue1='body-site')
plt.savefig('heatmap-1.png')

Example CSV file for creating input pandas.DataFrame: table.csv

heatmap-1

Next, if you want to group the samples but do not want to cluster them, you can set row_cluster=False:

dokdo.heatmap(df,
              metadata=metadata_file,
              normalize='clr',
              hue1='body-site',
              row_cluster=False)
plt.savefig('heatmap-2.png')

heatmap-2

I understand these are not exactly what you wanted, but I hope this is a good starting point. What do you think?

@khemlalnirmalkar
Copy link
Author

@sbslee ,
Yes, this is great. Thanks for updating the code.
I hope we can rotate the plot too, like keeping the taxa on the right top and samples on the bottom.
Heatmap, ComplexHeatmap and some other R & py packages allow annotation and splitting of the heatmap. I like dokdo and It would be great if Dokdo can allow similar changes,
Thanks again,
Khem

@sbslee
Copy link
Owner

sbslee commented Jul 18, 2021

@khemlalnirmalkar,

If it's something that's not too difficult to implement, I'd be more than happy to make updates do Dokdo. So please keep letting me know :) That being said, I updated the dokdo.heatmap method to allow flipping of the x and y axes:

import dokdo
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
csv_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table.csv'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'
df = pd.read_csv(csv_file, index_col=0)
dokdo.heatmap(df,
              metadata=metadata_file,
              flip=True,
              normalize='clr')
plt.savefig('heatmap-3.png')

heatmap-3

Next up is splitting a heatmap by the sample group. This one is trickier because each sample group will have a different clustering pattern because their microbiome composition will be different. In the example you provided, there are only 30 taxa labeled, which indicates to me that the clustering was performed earlier and you are just using the same order of those 30 taxa (I'm not sure how you got the exact order) for all four sample groups. Can you give me a link to the method's documentation so I can read about it? Or explain to me how you generated the example?

@khemlalnirmalkar
Copy link
Author

Hi @sbslee
My apologies totally distracted by some other work. The abundance file I sent you, was generated through Kraken2 https://github.com/DerrickWood/kraken2/wiki/Manual
As I know, there was no clustering method was applied. I just wanted to make a heatmap with my own order.
Thanks

@sbslee
Copy link
Owner

sbslee commented Jul 26, 2021

@khemlalnirmalkar,

Good news! Once I realized that some users like yourself don't need (or want) clustering of their samples and/or taxa, I decided to rename the heatmap method to clustermap (documentation) and devise a new method called heatmap (documentation). This is implemented in the 1.11.0-dev branch.

Below is a simple example:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
dokdo.heatmap(qza_file,
              normalize='log10',
              flip=True,
              figsize=(10, 7))
plt.tight_layout()

heatmap-1

We can display a heatmap for each sample group:

fig, [ax1, ax2, ax3, ax4, ax5] = plt.subplots(1, 5,
                                              figsize=(20, 8),
                                              gridspec_kw={'width_ratios': [1, 1, 1, 1, 0.1]})

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'

gut_samples = ['L1S8', 'L1S57', 'L1S76', 'L1S105', 'L1S140', 'L1S208', 'L1S257', 'L1S281'] 
left_palm_samples = ['L2S155', 'L2S175', 'L2S204', 'L2S222', 'L2S240', 'L2S309', 'L2S357', 'L2S382']
right_palm_samples = ['L3S242', 'L3S294', 'L3S313', 'L3S341', 'L3S360', 'L3S378', 'L4S63', 'L4S112', 'L4S137']
tongue_sampels = ['L5S104', 'L5S155', 'L5S174', 'L5S203', 'L5S222', 'L5S240', 'L6S20', 'L6S68', 'L6S93']

kwargs = dict(normalize='log10',
              flip=True,
              linewidths=0.5,
              xticklabels=True)

dokdo.heatmap(qza_file, ax=ax1, samples=gut_samples, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax2, samples=left_palm_samples, yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax3, samples=right_palm_samples, yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax4, samples=tongue_sampels, yticklabels=False, cbar_ax=ax5, **kwargs)

ax1.set_title('Gut')
ax2.set_title('Left palm')
ax3.set_title('Right palm')
ax4.set_title('Toungue')

plt.tight_layout()

heatmap-2

Let me know what you think. I will probably make some additional updates to this method before releasing the official 1.11.0 version.

@khemlalnirmalkar
Copy link
Author

@sbslee,
This is great, thank you so much.
Quick question, can we group the samples based on category (like bar plot has an option "group order") instead of using each sample ID? can we normalize the abundance by z-score too?
Once again, I appreciate your constant help,

Khem

@sbslee
Copy link
Owner

sbslee commented Jul 26, 2021

@khemlalnirmalkar,

Those are great questions. I updated the heatmap method to 1) support the Z score normalization and 2) accept the metadata and where arguments:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'

fig, [ax1, ax2, ax3, ax4, ax5] = plt.subplots(1, 5,
                                              figsize=(20, 8),
                                              gridspec_kw={'width_ratios': [1, 1, 1, 1, 0.1]})

kwargs = dict(normalize='zscore',
              flip=True,
              linewidths=0.5,
              metadata=metadata_file,
              xticklabels=True)

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'

dokdo.heatmap(qza_file, ax=ax1, where="[body-site] IN ('gut')", cbar=False, yticklabels=True, **kwargs)
dokdo.heatmap(qza_file, ax=ax2, where="[body-site] IN ('left palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax3, where="[body-site] IN ('right palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax4, where="[body-site] IN ('tongue')", yticklabels=False, cbar_ax=ax5, **kwargs)

ax1.set_title('Gut')
ax2.set_title('Left palm')
ax3.set_title('Right palm')
ax4.set_title('Toungue')

plt.tight_layout()
plt.savefig('heatmap-updated.png')

heatmap-updated

P.S. AFAIK, microbiome relative abundance data will give best results when normalized with log-based transformation such as log10 and CLR. This is because those data often contain lots of zeros and also extremely large values; therefore, the Z score transformation, which is linear and not log, may not be suitable for this kind of data. Please use this option carefully.

@khemlalnirmalkar
Copy link
Author

@sbslee Thanks for the quick action.
Agree, log10 or CLR would be better for microbial abundance. I was thinking to use Z score for metabolites data :).
Thank you so much for everything,
Khem

@khemlalnirmalkar
Copy link
Author

@sbslee,
I have a question on heatmap code
When i am splitting the heatmap with categories, i am losing the sample order, see the attached image. first, start with MS117 but when split, it changes. for paired samples, its important to keep the order. how can i fix it?
image

image

Dokdo needs "s__" for taxa, but is it possible to avoid it? i am thinking to use for metabolites too.

Thanks,

@sbslee
Copy link
Owner

sbslee commented Jul 27, 2021

@khemlalnirmalkar,

  1. As for the s__ prefix, you don't have to worry about that for the heatmap and clustermap methods. That concern is only for the taxa_abundance_bar_plot method. For example, if you look at the very first plot I posted in this thread, you will see that I provided a CSV file with taxa names that do not have any prefix.

  2. If it's truly important to keep the exact order of samples within each group level, you should definitely use the samples argument instead of the where argument. The latter uses QIIME 2 API which returns filtered sample IDs in a set object which erases the original order of samples. Assuming your input data is pandas.DataFrame and your samples (rows) are already perfectly ordered, you can simply do something like:

dokdo.heatmap(df, ax=ax1, ..., samples=df.index[0:5])
dokdo.heatmap(df, ax=ax2 ..., samples=df.index[0:2])
dokdo.heatmap(df, ax=ax3 ..., samples=df.index[0:18])
...

Hope this helps. Let me know if you need more help.

@sbslee
Copy link
Owner

sbslee commented Jul 27, 2021

@khemlalnirmalkar,

I thought about this more, and I think the following might help in your case. Basically, I added the sort_samples argument in the heatmap method:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'
dokdo.heatmap(qza_file,
              normalize='log10',
              flip=True,
              figsize=(10, 7))
plt.tight_layout()
plt.savefig('heatmap-1.png')

heatmap-1

metadata_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/sample-metadata.tsv'

fig, [ax1, ax2, ax3, ax4, ax5] = plt.subplots(1, 5,
                                              figsize=(20, 8),
                                              gridspec_kw={'width_ratios': [1, 1, 1, 1, 0.1]})

kwargs = dict(normalize='log10',
              flip=True,
              linewidths=0.5,
              sort_samples=True,
              metadata=metadata_file,
              xticklabels=True)

qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'

dokdo.heatmap(qza_file, ax=ax1, where="[body-site] IN ('gut')", cbar=False, yticklabels=True, **kwargs)
dokdo.heatmap(qza_file, ax=ax2, where="[body-site] IN ('left palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax3, where="[body-site] IN ('right palm')", yticklabels=False, cbar=False, **kwargs)
dokdo.heatmap(qza_file, ax=ax4, where="[body-site] IN ('tongue')", yticklabels=False, cbar_ax=ax5, **kwargs)

ax1.set_title('Gut')
ax2.set_title('Left palm')
ax3.set_title('Right palm')
ax4.set_title('Toungue')

plt.tight_layout()
plt.savefig('heatmap-2.png')

heatmap-2

Let me know what you think.

@khemlalnirmalkar
Copy link
Author

@sbslee Awesome,
Thank you so much for the update :)
I am sorry for keep asking, now i need one more suggestion,
If you see this image, for Prevotella, the abundance level is very low. if i want to avoid making a separate heatmap for Prevotella, Is it possible to stretch more the scale from 0-0.1? or increase the color range for scale 0-0.1.

image

Thanks again,

@sbslee
Copy link
Owner

sbslee commented Jul 28, 2021

@khemlalnirmalkar,

No worries, you are more than welcome to ask as many questions as you want!

Generally speaking, the dokdo.heatmap method uses the seaborn.heatmap method, and you can provide vmin and vmax arguments to both methods to play with scaling of the colormap. Those arguments weren't explicitly exposed to the dokdo.heatmap method before (you were still able to provide them as **kwargs), but I just updated so that they are now.

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
qza_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/table-l3.qza'

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(15, 7))

dokdo.heatmap(qza_file,
              ax=ax1,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              figsize=(10, 7))

dokdo.heatmap(qza_file,
              ax=ax2,
              normalize='log10',
              flip=True,
              pretty_taxa=True,
              vmin=-2,
              figsize=(10, 7))

plt.tight_layout()
plt.savefig('heatmap-1.png')

heatmap-1

That being said, I'm getting the impression that you want to adjust the coloring scale just for selected species (e.g. Prevotella species), and if that's the case, then it's not possible with the dokdo.heatmap method. In the heatmap, once you apply the normalization (e.g. log10 or CLR) then everything becomes relative to one another.

You could selectively perform some sort of scaling in your input pandas.DataFrame and then use the normalize=None setting for the heatmap, but I don't recommend that either because I think it could be misleading to the readers.

I think the best bet is drawing another heatmap for Prevotella species, as you mentioned, but I personally find your plot above perfectly acceptable, so I wouldn't even bother drawing another plot. But that's just my take! Feel free to ask any follow-up questions.

@khemlalnirmalkar
Copy link
Author

@sbslee earlier, I did try with vmin and vmix but was not helpful,
I will give it a try with a separate plot and will see, how it looks. Otherwise just one plot.
Thanks for your help,

@sbslee
Copy link
Owner

sbslee commented Jul 28, 2021

@khemlalnirmalkar,

Sounds good. One thing to note is that if you are going to plot a separate figure just for Prevotella species, you may have to create a pandas.DataFrame containing only those species. Otherwise, your normalization will still be affected by the presence of other extremely high or low species. Hope it works.

@khemlalnirmalkar
Copy link
Author

@sbslee Agreed..yes, the plan is use panda.dataframe only for Prevotella sps. Thanks for the suggestion :)

@sbslee sbslee closed this as completed Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants