BUG: plot contains too many labels in the axis #47519

Closed
3 tasks done
K20shores opened this issue Jun 27, 2022 · 5 comments
Labels
Bug Needs Info Clarification about behavior needed to assess issue Visualization plotting

Comments

@K20shores

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

times=pd.date_range(start='2021-01-01', end='2021-12-31', freq='1h')
adf = pd.DataFrame({
    'time': times,
    'A': np.random.rand(times.shape[0]),
    'B': np.random.rand(times.shape[0]) * 100
})

# Uncomment to produce the second image (drops 70% of each column's values)
# for col in adf.columns:
#     adf.loc[adf.sample(frac=0.7).index, col] = np.nan
adf.set_index('time', inplace=True)

fig, ax = plt.subplots(dpi=300)

ax2 = ax.twinx()

a = adf.plot(y='A', ax=ax, color='blue', alpha=0.3, legend=False)
b = adf.plot(y='B', ax=ax2, color='red', alpha=0.3, legend=False)

ax.legend([
    a.get_lines()[0],
    b.get_lines()[0]
], ['A', 'B'])

ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d-%y'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.tick_params(axis='x', rotation=30)

Issue Description

When a date-time axis holds many hourly data points, formatting the tick labels to show only one at the beginning of each month results in far too many tick labels, and the labels show the wrong year. This appears to be a bug within pandas: I opened an issue for matplotlib and they redirected me here.

image

Expected Behavior

An image like below should be possible.

image

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.4.0
Version : Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 62.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : 2022.3.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : 0.55.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : 2022.3.0
xlrd : None
xlwt : None
zstandard : None

@K20shores K20shores added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 27, 2022
@datapythonista
Member

You are plotting one year of hourly samples (365 * 24 = 8760) on the x axis of the plot, so the first image seems to be what you're telling pandas to plot, isn't it? If you want one data point per month, you should use the resample method and change the granularity of your data before plotting. We don't have "intelligence" in pandas to plot what's reasonable; we plot the data as is. Am I understanding your problem correctly, or am I missing something?
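As a rough illustration of the resampling suggestion above, here is a minimal sketch. It assumes the hourly frame from the original example (reduced to column `A`) and uses the `"MS"` (month start) frequency with a mean aggregation; both choices are assumptions for illustration, not something stated in this thread.

```python
import numpy as np
import pandas as pd

# Hourly data over 2021, as in the original example
times = pd.date_range(start="2021-01-01", end="2021-12-31", freq="1h")
adf = pd.DataFrame({"A": np.random.rand(times.shape[0])}, index=times)

# Reduce the granularity to one value per month, labelled at the month start.
# The mean is an arbitrary choice here; any aggregation could be used.
monthly = adf.resample("MS").mean()

print(len(adf), "hourly rows ->", len(monthly), "monthly rows")
```

Note that this changes the data being plotted (one point per month), which is different from keeping all points and only changing where the ticks are placed.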

@datapythonista datapythonista changed the title BUG: BUG: plot contains too many data points Jun 28, 2022
@K20shores
Author

I wanted to plot all of the data, but I only wanted ticks at the beginning of every month. Does that answer your question?

@datapythonista
Member

I see. I'm using GitHub dark theme, and the axis labels aren't shown, since the background is transparent.

I think I'm still missing something. If the problem is only with the ticks, why are the two plots different? Also, does this also happen when you plot one Series instead of two?

Can you please simplify your example and problem as much as possible, and find the minimal example that shows the bug? I think it's too complex to understand right now.

@datapythonista datapythonista changed the title BUG: plot contains too many data points BUG: plot contains too many labels in the axis Jun 28, 2022
@K20shores
Author

The two plots are different because the first contains all of the data. When I remove a large portion of the data, I am able to set the proper number of ticks. So the only difference is the number of data points.

Still too many ticks

The same thing happens when only one column is plotted:

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

times=pd.date_range(start='2021-01-01', end='2021-12-31', freq='1h')
adf = pd.DataFrame({
    'time': times,
    'A': np.random.rand(times.shape[0]),
})

adf.set_index('time', inplace=True)

fig, ax = plt.subplots(dpi=300)

adf.plot(y='A', ax=ax, color='blue', alpha=0.3, legend=False)

ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d-%y'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.tick_params(axis='x', rotation=30)

This produces this image:

image

The proper number of ticks

Now, removing a large portion of the data produces a plot with ticks and labels only at the beginning of each month.

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

times=pd.date_range(start='2021-01-01', end='2021-12-31', freq='1h')
adf = pd.DataFrame({
    'time': times,
    'A': np.random.rand(times.shape[0]),
})

#
# removing the data
#
for col in adf.columns:
    adf.loc[adf.sample(frac=0.7).index, col] = np.nan

adf.set_index('time', inplace=True)

fig, ax = plt.subplots(dpi=300)

adf.plot(y='A', ax=ax, color='blue', alpha=0.3, legend=False)

ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d-%y'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.tick_params(axis='x', rotation=30)

image


I don't know what the issue is exactly, but in the matplotlib bug that I created, they indicated that this happens when pandas installs its own locator and formatter when a date-time axis is detected.
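If the cause is indeed pandas installing its own locator and formatter, one possible workaround is the `x_compat=True` plotting option, which pandas documents as a way to suppress its time-series tick handling so that matplotlib date locators can be applied. The sketch below is an assumption about what might help, not a confirmed fix from this thread; the `Agg` backend is used only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only needed for non-interactive runs

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

times = pd.date_range(start="2021-01-01", end="2021-12-31", freq="1h")
adf = pd.DataFrame({"A": np.random.rand(times.shape[0])}, index=times)

fig, ax = plt.subplots()

# x_compat=True asks pandas to plot against regular matplotlib dates
# instead of its own period-based time-series axis.
adf.plot(y="A", ax=ax, color="blue", alpha=0.3, legend=False, x_compat=True)

# With a plain matplotlib date axis, the mdates locator and formatter apply.
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d-%y"))
ax.tick_params(axis="x", rotation=30)

fig.canvas.draw()
tick_dates = mdates.num2date(ax.get_xticks())
print(len(tick_dates), "major ticks")
```

An alternative under the same assumption would be to bypass `DataFrame.plot` and call `ax.plot(adf.index, adf["A"])` directly, which also leaves the axis under matplotlib's control.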

@mroeschke mroeschke added Visualization plotting Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 11, 2022
@WillAyd
Member

WillAyd commented Apr 12, 2024

FWIW I cannot reproduce the OP. Maybe this can be closed?

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

times=pd.date_range(start='2021-01-01', end='2021-12-31', freq='1h')
adf = pd.DataFrame({
    'time': times,
    'A': np.random.rand(times.shape[0]),
})

adf.set_index('time', inplace=True)

fig, ax = plt.subplots(dpi=300)

adf.plot(y='A', ax=ax, color='blue', alpha=0.3, legend=False)

This generates the following for me:

image
