Can plotnine make a matrix of scatterplots? #16

Closed
araichev opened this issue Jun 16, 2017 · 9 comments

@araichev

Something similar to R's pairs or plotmatrix functions; see e.g. here?

@has2k1
Owner

has2k1 commented Jun 17, 2017

You cannot do that with plotnine, though it is a goal of the project that a third party library should be able to implement such compound plots using plotnine. However, you can do it with seaborn.

@marcio-pg

marcio-pg commented May 26, 2021

Sorry for reviving this post. Is this answer still valid in 2021?

@TyberiusPrime
Contributor

TyberiusPrime commented Jun 1, 2021

Still nothing 'out of the box' I believe.

Here's a start though:

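import itertools

import pandas as pd
import plotnine as p9
import plotnine.data


def plot_matrix(df, columns):
    # Build one long frame: each ordered pair of columns becomes a block of
    # rows with the values renamed to x/y and the column names kept in a/b.
    pdf = []
    for a1, b1 in itertools.combinations(columns, 2):
        for (a, b) in ((a1, b1), (b1, a1)):
            sub = df[[a, b]].rename(columns={a: "x", b: "y"}).assign(a=a, b=b)
            pdf.append(sub)

    g = p9.ggplot(pd.concat(pdf))
    g += p9.geom_point(p9.aes('x', 'y'))
    # One panel per (b, a) pair of column names
    g += p9.facet_grid('b~a', scales='free')
    return g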


plot_matrix(p9.data.economics, p9.data.economics.columns[1:])
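
To make the reshaping step above easier to follow, here is a minimal sketch that runs the same idea on a tiny made-up frame, using pandas only. The helper name `to_pairs_long` and the toy columns `u`, `v`, `w` are hypothetical and not part of plotnine or the snippet above.

```python
import itertools

import pandas as pd


def to_pairs_long(df, columns):
    """Stack every ordered pair of columns into one long frame (hypothetical helper)."""
    parts = []
    for a, b in itertools.permutations(columns, 2):
        # Values go into x/y; the original column names are kept in a/b
        part = df[[a, b]].rename(columns={a: "x", b: "y"}).assign(a=a, b=b)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# Toy data, invented for illustration only
toy = pd.DataFrame({"u": [1, 2], "v": [3, 4], "w": [5, 6]})
long = to_pairs_long(toy, ["u", "v", "w"])
print(long)
```

Each distinct (a, b) combination in the result becomes one panel once the frame is faceted on those two columns. The diagonal (a column against itself) is left out, as in the snippet above.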


@PaulHiemstra

PaulHiemstra commented Feb 21, 2023

I was annoyed that the seaborn implementation of a pairplot did not allow me to change the shape of the point, only the color. This implementation in plotnine can do both shape and color.

Note that I use facet_wrap here and not facet_grid. This is to allow the use of scales='free'. It would be great if facet_grid allowed the scales to be free.

def get_data_for_column_combo(col1, col2, source, color_column, shape_column):
    '''
    Get the required data for a combination of two columns. Each call to this function generates
    the data for one of the subplots in the pairplot. Color and shape data are appended as needed.
    '''
    col_data = (source[[col1, col2]]
            .rename(columns={col1: 'values1', col2: 'values2'})
            .assign(col1=col1, col2=col2))
    if color_column is not None:
        col_data['color_column'] = source[color_column]
    if shape_column is not None:
        col_data['shape_column'] = source[shape_column]
    return col_data

def get_point_args(color_column, shape_column):
    '''
    Generate the appropriate input arguments to our geom_point. The names of the
    columns are fixed as these are generated in a standard way by `get_data_for_column_combo`.
    Which ones are included depends on whether or not color and shape are passed.
    '''
    point_args = dict(x='values1', y='values2')

    if color_column is not None:
        point_args['color'] = 'color_column'
    if shape_column is not None:
        point_args['shape'] = 'factor(shape_column)'

    return point_args

def pairplot(source, columns, color_column=None, shape_column=None, use_facet_grid=False):
    '''
    Create a pairplot from the data in `source` based on the columns listed in `columns`.
    Optional arguments include a color and a shape variable, which then determine the color and
    shape of the points in the resulting pairplot.

    By default we use `facet_wrap` as this allows us to use `scales='free'`. This is not how a pairplot
    usually works, so there is an option to force the use of `facet_grid` to get a more traditional plot.
    '''
    # One block of rows per ordered pair of columns
    plot_data = pd.concat([
        get_data_for_column_combo(col1, col2, source, color_column, shape_column)
        for col1, col2 in itertools.permutations(columns, 2)
    ])
    gg = ggplot(plot_data) + geom_point(aes(**get_point_args(color_column, shape_column)), alpha=0.4, size=2)
    if use_facet_grid:
        return gg + facet_grid('col1 ~ col2')
    else:
        return gg + facet_wrap('~ col1 + col2', scales='free')

from plotnine import *
import pandas as pd
import itertools

# `iris` is assumed to be a DataFrame holding the columns below, plus a `cluster` column
pairplot(iris, ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'], color_column='Species', shape_column='cluster')


@PaulHiemstra

@has2k1 Is there a third-party plotnine package where this function could be sent?

@b-lac

b-lac commented Feb 21, 2023

@PaulHiemstra

> I was annoyed that the seaborn implementation of a pairplot did not allow me to change the shape of the point, only the color. This implementation in plotnine can do both shape and color.

It looks like with seaborn you can change the markers in pairplot:

sns.pairplot(penguins, hue="species", markers=["o", "s", "D"])


@has2k1
Owner

has2k1 commented Feb 21, 2023

@PaulHiemstra, I don't think there is. Please open a separate issue to track this.

@PaulHiemstra

PaulHiemstra commented Feb 22, 2023

> @PaulHiemstra
>
> > I was annoyed that the seaborn implementation of a pairplot did not allow me to change the shape of the point, only the color. This implementation in plotnine can do both shape and color.
>
> It looks like with seaborn you can change the markers in pairplot:
>
> sns.pairplot(penguins, hue="species", markers=["o", "s", "D"])

The issue here is that the colors and the markers are based on the same variable, which is not what I was looking for. I want to base the color on one variable and the shape on another. In the case of my visualisation, this enables me to inspect the difference between the fitted clusters of a KMeans and the underlying Species of Iris.

@PaulHiemstra

> @PaulHiemstra, I don't think there is. Please open a separate issue to track this.

I'll do that.
