Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
When the x-axis values are all dates and a user draws a bar plot, pandas treats the dates as categorical values.
Take a simple pandas DataFrame with datetime values and plot it as a bar plot:
import pandas as pd
df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    ),
)
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig = mpl.figure.Figure(constrained_layout=True)
axs = fig.subplot_mosaic("a")
ax = axs["a"]
df.plot.bar(x="date", y="data", ax=ax, legend=False) # incorrect year -> 1970 instead of 2020
formatter = mdates.DateFormatter("%Y - %b")
ax.xaxis.set_major_formatter(formatter)
fig
You'll get this:
There's unfortunately no way to bypass this.
Using x_compat doesn't do anything:
with pd.plotting.plot_params.use("x_compat", True):
    df.plot.bar()
And throws an error if you try to use it directly:
df.plot.bar(x_compat=True)
If I change the df to this (i.e. more data points):
import pandas as pd
import numpy as np
date = pd.date_range(start="2020-01-01", end="2050-12-31", freq="MS")
df = pd.DataFrame(
    dict(
        date=date,
        data=list(range(len(date))),
    ),
)
I get this:
imho, this is a bad user experience.
- It takes significantly longer to plot because pandas generates a text label for every data point.
- The plot labels are not useful to a user.
- Users have no way to modify this plot to "fix" it because the x-axis data interval is categorical, i.e. 0 to N, where N is the integer position of the last time period.
- Users cannot easily annotate this plot because the x positions are on a categorical axis instead of datetime values.
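To make the categorical-axis point concrete, here is a minimal sketch that inspects where pandas actually places the bars. It reuses the 12-month frame from above; the `Agg` backend and the `bar_centres` / `tick_positions` names are my own additions for illustration, and the exact positions may depend on the pandas version.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=list(range(1, 13)),
    )
)

fig, ax = plt.subplots()
df.plot.bar(x="date", y="data", ax=ax, legend=False)

# Centre of each bar in data coordinates (illustrative helper names)
bar_centres = [p.get_x() + p.get_width() / 2 for p in ax.patches]
tick_positions = list(ax.get_xticks())

print(bar_centres)     # small integer positions, not matplotlib date numbers
print(tick_positions)
```

On the versions I have tried, the centres are plain positions starting at 0, whereas a date in 2020 would be a number around 18000 in matplotlib's default date units. That mismatch is why a `DateFormatter` applied to this axis reports 1970.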
fwiw, matplotlib does the right thing when the x-axis values are all dates:
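A minimal sketch of the plain-matplotlib equivalent, assuming the same monthly data as above. The `width=20` value (in days) and the `Agg` backend are illustrative choices of mine, not anything pandas or matplotlib requires.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS")
values = list(range(1, 13))

fig, ax = plt.subplots()
# matplotlib converts datetimes to day-based floats, so width is given in days
ax.bar(dates.to_pydatetime(), values, width=20)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y - %b"))

# The bar positions are real date numbers, so they round-trip to 2020 dates
first_bar = ax.patches[0]
first_bar_centre = first_bar.get_x() + first_bar.get_width() / 2
first_bar_date = mdates.num2date(first_bar_centre)
print(first_bar_date)
```

Because the x-axis holds real date values here, date formatters, locators, and datetime-based annotations all work as expected.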

Feature Description
Add a new option to df.plot.bar(...) that skips treating datetime values as categorical data. df.plot(...) already has use_index=False and x_compat=True. The former option is not useful imo, but adding the latter option for bar plots would be great.
Alternative Solutions
Alternatively, consider passing datetime values to matplotlib always without considering them as categorical data.
This may be slightly breaking though?
Additional Context
This is currently the source of quite a bit of confusion when plotting time-series bar plots and line plots on the same ax.
e.g.: https://stackoverflow.com/q/39560099
Suggestions include:
- using ax.twinx() and setting the bar plot's ax to invisible:
  This is currently the best solution to this problem but imo is a little bit of a hack.
- using use_index=False for the line plot:
  This makes the line plot difficult to further annotate (x-axis values are still 0 - N, and a user cannot use the datetime to place annotation text), and the user will still run into issues with a large number of categorical datetime labels.
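For reference, the use_index=False workaround can be sketched roughly as follows. This is my reading of the suggestion, not code from the linked answers; the frame is the 12-month example with the date column moved to the index, and `line_x` is an illustrative name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=list(range(1, 13)),
    )
).set_index("date")

fig, ax = plt.subplots()
df.plot.bar(y="data", ax=ax, legend=False)
# use_index=False makes the line use integer positions instead of the
# datetime index, so it lands on the same categorical axis as the bars
df.plot(y="data", ax=ax, use_index=False, color="black", legend=False)

# The line was added last; its x data are positions, not dates
line_x = list(ax.lines[-1].get_xdata())
print(line_x)
```

The line and bars now share an axis, but every x coordinate is a position, so annotations still have to be placed by integer offset instead of by date.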
For context, this enhancement proposal came about because I didn't understand that bar plots always use categorical values in pandas and posted this question on stackoverflow: https://stackoverflow.com/q/78882352/5451769