ENH: Let DataFrame.plot(kind="scatter") scatter plot each column against index if x/y not given. #51972

Open
2 of 3 tasks
randolf-scholz opened this issue Mar 14, 2023 · 3 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@randolf-scholz
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

To my surprise, df.plot(kind="scatter") simply raises a ValueError complaining about missing "x" and "y".
In many cases, particularly time series data, the sensible thing to do is to scatter plot each channel against the index.

Feature Description

When x/y are not given, scatter plot each column against the index.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.arange(0, 10, 0.1)
df = pd.DataFrame({"sin": np.sin(x), "cos": np.cos(x)}, index=x)

try:  # without subplots
    df.plot(kind="scatter")
except ValueError as e:
    print(e)
    fig, ax = plt.subplots()
    for col in df:
        ax.scatter(df.index, df[col], label=col)
    ax.legend()
    plt.show()

try:  # with subplots
    df.plot(subplots=True, kind="scatter")
except ValueError as e:
    print(e)
    fig, axes = plt.subplots(len(df.columns), 1)
    for ax, col in zip(axes, df):
        ax.scatter(df.index, df[col], label=col)
        ax.legend()
    plt.show()

Alternative Solutions

Of course, one could just do it manually.

Additional Context

No response

@randolf-scholz randolf-scholz added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 14, 2023
@luke396
Contributor

luke396 commented Mar 15, 2023

The bug does exist in the current main branch.

Should we add default values for x and y (e.g., the index for x) when the user does not pass them? Currently the code makes no attempt to supply defaults, so both come back as None, which raises this error.

if backend_name == "pandas.plotting._matplotlib":
    kwargs = dict(arg_def, **pos_args, **kwargs)
else:
    kwargs = dict(pos_args, **kwargs)
x = kwargs.pop("x", None)
y = kwargs.pop("y", None)
kind = kwargs.pop("kind", "line")
return x, y, kind, kwargs
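To make the "default values" idea concrete, here is a minimal sketch of how missing x/y could be resolved. The helper name `resolve_scatter_defaults` and its fallback rules (index for x, every numeric column for y) are assumptions for illustration only; nothing like it exists in pandas today.

```python
import numpy as np
import pandas as pd


def resolve_scatter_defaults(df, x=None, y=None):
    """Hypothetical helper (not pandas API): choose x values and y columns.

    x=None -> fall back to the index.
    y=None -> fall back to every numeric column other than x.
    """
    x_values = df.index if x is None else df[x]
    if y is None:
        y_cols = [c for c in df.select_dtypes("number").columns if c != x]
    elif isinstance(y, (list, tuple)):
        y_cols = list(y)
    else:
        y_cols = [y]
    return x_values, y_cols


t = np.arange(0, 10, 0.1)
df = pd.DataFrame({"sin": np.sin(t), "cos": np.cos(t)}, index=t)

x_values, y_cols = resolve_scatter_defaults(df)
print(y_cols)  # ['sin', 'cos']
```

Whether non-numeric columns should be skipped silently or raise is an open design question; the sketch skips them.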

Also, perhaps unrelated: kind='scatter' works fine when the input is a DataFrame. Should we extend it to Series by using s.index as x?

_dataframe_kinds = ("scatter", "hexbin")

if kind in self._dataframe_kinds:
    if isinstance(data, ABCDataFrame):
        return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
    else:
        raise ValueError(f"plot kind {kind} can only be used for data frames")
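Until something like that lands, a Series can be scattered against its index by wrapping it in a two-column DataFrame first. The sketch below assumes a hypothetical helper `series_scatter` and the fallback column names "index" and "value"; it relies only on the existing `DataFrame.plot(kind="scatter", x=..., y=...)` call.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import pandas as pd


def series_scatter(s, **kwargs):
    """Hypothetical helper (not pandas API): scatter a Series against its index."""
    # Fallback names are assumptions; a name clash between index and values
    # is not handled in this sketch.
    x_name = s.index.name if s.index.name is not None else "index"
    y_name = s.name if s.name is not None else "value"
    frame = pd.DataFrame({x_name: s.index, y_name: s.to_numpy()})
    return frame.plot(kind="scatter", x=x_name, y=y_name, **kwargs)


t = np.arange(0, 10, 0.1)
s = pd.Series(np.sin(t), index=t, name="sin")
ax = series_scatter(s)
```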

@luke396
Contributor

luke396 commented Mar 16, 2023

Sorry for my rushed reply. After a closer look, handling this in the ScatterPlot class may be the better fix for this issue.

class ScatterPlot(PlanePlot):

@joshdunnlime

joshdunnlime commented Jan 22, 2024

Allowing scatter to follow the df.plot() notation and functionality seems sensible to me. That is to say, use the index as the default x-axis and plot all columns as y values, with the first column taking the first colour in the colour palette, the second column taking the second colour, and so on.

This would let users switch intuitively from df.plot() to df.plot.scatter(), without changing any kwargs, when they realise their data is not ordered or not sequential.
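As a rough illustration of those semantics, the sketch below scatters every column against the index and assigns colours in palette order. The helper name `scatter_like_plot` is hypothetical, and it assumes the active matplotlib property cycle defines a "color" key (true for the default style).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_like_plot(df, ax=None, **kwargs):
    """Hypothetical helper (not pandas API): scatter each column against the index."""
    if ax is None:
        _, ax = plt.subplots()
    # Colours of the active property cycle, in order.
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    for i, col in enumerate(df.columns):
        # The i-th column takes the i-th palette colour, wrapping around if needed.
        ax.scatter(
            df.index,
            df[col],
            label=str(col),
            color=palette[i % len(palette)],
            **kwargs,
        )
    ax.legend()
    return ax


t = np.arange(0, 10, 0.1)
df = pd.DataFrame({"sin": np.sin(t), "cos": np.cos(t)}, index=t)
ax = scatter_like_plot(df)
```

With a helper like this, moving from a line plot to a scatter plot would not require naming x or y.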
