get_baseline_data does not partition data (using daily data set). #363

sc0ttyg · 2019-08-12T23:05:01Z

Report installed package versions

eemeter==2.7.2
pandas==0.23.4
scipy==1.3.0
numpy==1.16.4

Describe the bug
Calling get_baseline_data with max_days=365 returns the input DataFrame unchanged, not a subset limited to 365 days.

Include a short, self-contained Python snippet reproducing the problem. You can
format the code nicely by using GitHub Flavored Markdown:

In [1]: import eemeter
In [2]: import pandas as pd
In [3]: meter_data, temperature_data, metadata = \
   ...:     eemeter.load_sample('il-electricity-cdd-hdd-daily')
In [5]: data = eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)
In [6]: baseline_data, warnings = eemeter.get_baseline_data(data, max_days=365)
In [7]: baseline_data.equals(data)
Out[7]: True
In [8]: eemeter.get_version()
Out[8]: '2.7.2'
In [9]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.3.6
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

In [10]: import scipy
In [12]: scipy.__version__
Out[12]: '1.3.0'
In [13]: import numpy
In [15]: numpy.__version__
Out[15]: '1.16.4'
In [16]: len(baseline_data)
Out[16]: 810
In [17]: len(data)
Out[17]: 810

Expected behavior

Expected a DataFrame of 365 rows covering only the first 365 days of data.


philngo · 2019-08-13T04:34:22Z

@sc0ttyg The get_baseline_data function needs an end date to be set for max_days to take effect (this function assumes you want to go some number of days back before a project or intervention). Correct usage is the following:

eemeter.get_baseline_data(meter_data, max_days=365, end=datetime(...))

There's a bit more detail on this behavior in the docs for get_baseline_data: http://eemeter.openee.io/api.html#eemeter.get_baseline_data, though it's easy to miss. This behavior might warrant a warning - if so, I'd be happy to take a look at a pull request.

max_days (int) – The maximum length of the period. Ignored if end is not set. The stricter of this or start is used to determine the earliest allowable baseline period date.
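As a rough illustration of the rule quoted above, here is a minimal sketch in plain Python. The helper name `earliest_baseline_date` is hypothetical and is not part of eemeter; it only mirrors the documented wording (max_days is ignored without end, otherwise the stricter of start and end minus max_days wins) and makes no claim about how the library implements it internally.

```python
from datetime import datetime, timedelta, timezone


def earliest_baseline_date(start, end, max_days):
    """Hypothetical helper (not an eemeter function) mirroring the documented rule."""
    if end is None or max_days is None:
        # Per the docs, max_days is ignored when end is not set,
        # so only start (possibly None) constrains the period.
        return start
    max_days_start = end - timedelta(days=max_days)
    if start is None:
        return max_days_start
    # "Stricter" is read here as the later of the two candidate dates.
    return max(start, max_days_start)


end = datetime(2017, 1, 1, tzinfo=timezone.utc)

# No end date: max_days has no effect, which matches the behavior reported above.
no_end = earliest_baseline_date(None, None, 365)

# With an end date, the period may reach back at most 365 days.
with_end = earliest_baseline_date(None, end, 365)

# An explicit start later than end - max_days is the stricter constraint.
late_start = datetime(2016, 6, 1, tzinfo=timezone.utc)
with_start = earliest_baseline_date(late_start, end, 365)
```

Under this reading, the call in the original report (max_days=365 with no end) leaves the data unconstrained, which is why the full 810 rows come back.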

Reading between the lines here, if you want to select data forward from a date (e.g., the start of your data), you can use the get_reporting_data function, which allows that operation, although that is admittedly a bit unintuitive if you're selecting down to baseline data.

import eemeter
import pandas as pd
meter_data = pd.DataFrame({'value': 1}, index=pd.date_range(start='2013-01-01', end='2019-01-01', freq='D', tz='utc'))
eemeter.get_reporting_data(meter_data, start=meter_data.index[0], max_days=365)

If neither of these methods quite matches your use case, you can always get full flexibility by selecting on the pandas DatetimeIndex directly.
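For the DatetimeIndex route, a minimal sketch using only pandas might look like the following. The variable names and the half-open window convention (start inclusive, end exclusive) are assumptions made for illustration, and they may not match how eemeter treats period boundaries.

```python
import pandas as pd

# Same toy daily series as in the get_reporting_data example above.
meter_data = pd.DataFrame(
    {"value": 1.0},
    index=pd.date_range(start="2013-01-01", end="2019-01-01", freq="D", tz="UTC"),
)

# First 365 days, counting forward from the start of the data.
first = meter_data.index[0]
forward_mask = (meter_data.index >= first) & (
    meter_data.index < first + pd.Timedelta(days=365)
)
first_year = meter_data[forward_mask]

# Last 365 days before a chosen end date (an assumed intervention date).
end = pd.Timestamp("2018-01-01", tz="UTC")
backward_mask = (meter_data.index >= end - pd.Timedelta(days=365)) & (
    meter_data.index < end
)
last_year = meter_data[backward_mask]
```

Boolean masks on the index keep the boundary handling explicit, which is useful when the exact inclusion of the first or last day matters.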

sc0ttyg · 2019-08-13T22:51:41Z

@philngo Great, thanks for the clarification. I'll consider a pull request after I get to know the code a little better.

philngo · 2019-08-13T23:07:01Z

@sc0ttyg Thanks for reaching out! I'm going to go ahead and close this issue. Please consider helping our developer community by filling out our first-time issue/PR contributor survey.

philngo closed this as completed Aug 13, 2019
