ENH: Mosaic plot and DataArray #779

Closed
s-celles opened this issue Mar 1, 2016 · 4 comments


@s-celles
Contributor

s-celles commented Mar 1, 2016

Hello,

I'd like to use Python to draw a mosaic plot (also called a Marimekko chart) of the R dataset HairEyeColor, similar to the one R produces.

https://github.com/wch/r-source/blob/trunk/src/library/datasets/data/HairEyeColor.R

Here is the R code:

> library(datasets)
> HairEyeColor
, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8

I can display a mosaic plot in R with the following example, taken from ?HairEyeColor:

require(graphics)
## Full mosaic
mosaicplot(HairEyeColor)
## Aggregate over sex (as in Snee's original data)
x <- apply(HairEyeColor, c(1, 2), sum)
x
mosaicplot(x, main = "Relation between hair and eye color")

I get:

r_mosaic_plot_haireyecolor

I tried to produce a similar plot using Python.

I created an xarray.DataArray using:

import numpy as np
import xarray


data = np.array([32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8,
                 36, 66, 16, 4,  9, 34,  7, 64,  5, 29, 7, 5, 2, 14, 7, 8])

_dim = (4, 4, 2)
data = data.reshape(_dim[::-1])

_dims = ['Hair', 'Eye', 'Sex']
_coords = [['Black', 'Brown', 'Red', 'Blond'], 
           ['Brown', 'Blue', 'Hazel', 'Green'],
           ['Male', 'Female']]

data = xarray.DataArray(
    data, dims=_dims[::-1],
    coords=_coords[::-1], name='Number'
)

assert int(data.loc['Female', 'Green', 'Black']) == 2

and tried to plot the mosaic using:

import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
mosaic(data)
plt.show()

I get:

python_mosaic_plot

I think there is some room for improvement.

Providing a da.plot.mosaic method could be a first feature.
Using labels would be another interesting feature.

Kind regards

PS: a similar mosaic plot can be drawn using the Titanic dataset

https://github.com/wch/r-source/blob/trunk/src/library/datasets/data/Titanic.R

with R
r_mosaic_plot_titanic

with Python

import numpy as np
import xarray

data = np.array([  0,   0,  35,   0,
                   0,   0,  17,   0,
                 118, 154, 387, 670,
                   4,  13,  89,   3,
                   5,  11,  13,   0,
                   1,  13,  14,   0,
                  57,  14,  75, 192,
                 140,  80,  76,  20])

_dim = (4, 2, 2, 2)
data = data.reshape(_dim[::-1])

_dims = ['Class', 'Sex', 'Age', 'Survived']
_coords = [['1st', '2nd', '3rd', 'Crew'],
           ['Male', 'Female'],
           ['Child', 'Adult'],
           ['No', 'Yes']]

data = xarray.DataArray(
    data, dims=_dims[::-1],
    coords=_coords[::-1], name='Number'
)

assert int(data.loc['Yes', 'Adult', 'Male', '3rd']) == 75

python_mosaic_plot_titanic

PS2: Python/xarray datasets are available at https://github.com/Rdatasets/python

PS3: mosaic code can be found here:
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/mosaicplot.py
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/tests/test_mosaicplot.py
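
The R aggregation over sex shown earlier (`apply(HairEyeColor, c(1, 2), sum)`) has a fairly direct xarray counterpart. The following is only a sketch: it rebuilds the same counts, and the names `counts`, `hair_eye` and `by_hair_eye` are chosen here for illustration.

```python
import numpy as np
import xarray

# Same counts as above, in R's column-major order
counts = np.array([32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8,
                   36, 66, 16, 4, 9, 34, 7, 64, 5, 29, 7, 5, 2, 14, 7, 8])

hair_eye = xarray.DataArray(
    counts.reshape(2, 4, 4),
    dims=['Sex', 'Eye', 'Hair'],
    coords=[['Male', 'Female'],
            ['Brown', 'Blue', 'Hazel', 'Green'],
            ['Black', 'Brown', 'Red', 'Blond']],
    name='Number',
)

# Sum over 'Sex', then order the dimensions as Hair x Eye like the R table
by_hair_eye = hair_eye.sum(dim='Sex').transpose('Hair', 'Eye')

# A Series with a two-level index, ready to pass to a plotting function
series = by_hair_eye.to_series()
```

The resulting `series` could then be handed to `mosaic` in place of the full three-dimensional data.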

@s-celles
Contributor Author

s-celles commented Mar 1, 2016

Converting to a pandas Series with a hierarchical index seems to help:

mosaic(hair_eye_color.data.to_series())

python_mosaic_haireyecolor_with_names

mosaic(titanic.data.to_series())

python_mosaic_titanic_with_names

It's not as clear as the R mosaic plot... but that's not so bad.

What is your opinion about adding a mosaic plot method?
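
A minimal sketch of what the conversion does, for readers following along: `to_series()` returns a pandas Series whose MultiIndex levels carry the dimension names and coordinate labels, which is presumably where the tile labels in the plots above come from. The array is rebuilt here from the counts in the first comment; `hair_eye` and `draw_mosaic` are names assumed for this example, and the plotting imports are deferred so the conversion can be inspected on its own.

```python
import numpy as np
import xarray

counts = np.array([32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8,
                   36, 66, 16, 4, 9, 34, 7, 64, 5, 29, 7, 5, 2, 14, 7, 8])

hair_eye = xarray.DataArray(
    counts.reshape(2, 4, 4),
    dims=['Sex', 'Eye', 'Hair'],
    coords=[['Male', 'Female'],
            ['Brown', 'Blue', 'Hazel', 'Green'],
            ['Black', 'Brown', 'Red', 'Blond']],
    name='Number',
)

# One row per (Sex, Eye, Hair) combination, indexed by a MultiIndex
series = hair_eye.to_series()
level_names = list(series.index.names)  # ['Sex', 'Eye', 'Hair']


def draw_mosaic(series):
    """Hypothetical helper: plot a MultiIndex Series with statsmodels."""
    # Imported lazily so the conversion above works without statsmodels
    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic
    mosaic(series)
    plt.show()
```

Calling `draw_mosaic(series)` would then be equivalent to the `mosaic(...to_series())` calls above.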

@shoyer
Member

shoyer commented Mar 2, 2016

Interesting -- I haven't encountered mosaic plots before. If it's as simple as writing mosaic(data.to_series()) or data.to_series().pipe(mosaic), then I would lean against including this in xarray's API, if only because the wrapper wouldn't do very much, and mosaic plotting is (I suspect) a relatively rare operation for most analyses.
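
To make the size of such a wrapper concrete, here is a hedged sketch of what it might look like. `mosaic_plot` and its `plot_func` parameter are hypothetical and not part of xarray; `plot_func` exists only so the helper can be exercised without statsmodels installed.

```python
import xarray


def mosaic_plot(dataarray, plot_func=None, **kwargs):
    """Hypothetical thin wrapper: convert to a Series, then plot it."""
    if plot_func is None:
        # Default to the statsmodels implementation discussed in this thread
        from statsmodels.graphics.mosaicplot import mosaic
        plot_func = mosaic
    # Series.pipe(f, **kwargs) simply calls f(series, **kwargs)
    return dataarray.to_series().pipe(plot_func, **kwargs)


# Small toy array, invented for this example
toy = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords=[['a', 'b'], ['c', 'd']],
    name='Number',
)

# A stand-in plot function that only reports the index level names
names = mosaic_plot(toy, plot_func=lambda s: list(s.index.names))
```

The whole body is a single `to_series()` call followed by `pipe`, which illustrates how little the wrapper would add.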

@s-celles
Contributor Author

s-celles commented Mar 2, 2016

@fmaussion
Member

I tend to agree with @shoyer that mosaic plots are probably quite an unusual way to represent DataArrays (but I can only speak for myself).

Since the actual mosaic plot is drawn by statsmodels, you should probably propose your improvements there?

s-celles closed this as completed Mar 2, 2016