Skip to content

autocorr() and autocorrelation_plot() output mismatch #24608

@guocheng

Description

@guocheng
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt

dr = pd.date_range(start='1984-01-01', end='1984-12-31')

df = pd.DataFrame(np.arange(len(dr)), index=dr, columns=["Values"])
autocorrelation_plot(df)
plt.show()

image

The autocorr() produces a different result:

for i in range(0,366):
    print(df['Values'].autocorr(lag=i)) # output is 1 for all lags

Problem description

Pandas's autocorr() method uses the mean and std of each Series (modified original series and lagged series) for calculation. The autocorrelation_plot method used the mean and std of the unmodified original series for calculation.

Expected Output

autocorr() and autocorrelation_plot() should output the same result.

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 35.0.2
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.7
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions