pandas .resample() "how" deprecation as of its 0.19 version. Fix our daily(), monthly(), quarterly() #6

rsvp · 2016-11-05T18:56:27Z

Description of specific issue

When resampling a time-series the following warning(s) will appear:

FutureWarning: how in .resample() is deprecated
the new syntax is .resample(...).median()
    fill_method=None)

FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)

The first warning is somewhat cryptic until one realizes that how='median'
was being passed as an argument to the .resample() function.
So the how argument is the problem for the yi_fred module,
specifically for our functions
daily(), monthly(), and quarterly() in fecon235.

(Sidenote: how='median' since it is more robust than 'mean'.)
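
To illustrate the sidenote, here is a minimal sketch with made-up numbers (the quotes and dates are hypothetical, not fecon235 data). A single bad tick drags the weekly mean far from the typical level, while the median barely moves:

```python
import pandas as pd

# Hypothetical week of business-daily quotes with one bad tick (illustration only).
idx = pd.date_range("2016-11-07", periods=5, freq="B")
quotes = pd.Series([100.0, 101.0, 99.0, 1000.0, 100.0], index=idx)

weekly = quotes.resample("W")
weekly_mean = weekly.mean()      # pulled upward by the single outlier
weekly_median = weekly.median()  # stays at the typical level

print(weekly_mean.iloc[0])    # 280.0
print(weekly_median.iloc[0])  # 100.0
```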

The second cryptic warning can be traced to our use of
fill_method=None when upsampling. The new API
urges us to instead use methods:

.backfill() : use NEXT valid observation to fill
.ffill() : propagate last valid observation forward to next valid
.fillna() : fill using nulls
.asfreq() : convert TimeSeries to specified frequency
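
A minimal sketch of the old versus new calling style, assuming a made-up daily series (the data and variable names are hypothetical). Downsampling needs an aggregator such as .median(), while upsampling needs a fill strategy such as .asfreq() or .ffill():

```python
import numpy as np
import pandas as pd

# Hypothetical calendar-daily series covering Jan and Feb 2016 (illustration only).
idx = pd.date_range("2016-01-01", periods=60, freq="D")
daily_df = pd.DataFrame({"value": np.arange(60, dtype=float)}, index=idx)

# DOWNSAMPLING: many observations fall in each bin, so aggregate them.
# Old style:  daily_df.resample("MS", how="median")
monthly_df = daily_df.resample("MS", closed="left", label="left").median()

# UPSAMPLING: most new rows have no observation, so choose how to fill.
# Old style:  monthly_df.resample("D", fill_method=None)
upsampled = monthly_df.resample("D").asfreq()   # new rows are left as NaN
filled = monthly_df.resample("D").ffill()       # carry last value forward

# The NaN rows can then be interpolated, as our upsampling path does.
interpolated = upsampled.interpolate(method="linear")
```

Note that .asfreq() only inserts the missing rows; it is the later interpolate() or fill call that supplies values.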

Bug as of pandas 0.19
Enhancement

Expected behavior

No such warnings, and no risk of fatal termination.

Observed behavior

Warnings started as of pandas 0.18

Why would the improvement be useful to most users?

Because daily(), monthly(), and quarterly() in fecon235
should just work without the casual user needing to learn
obscure flags and methods (subject to future API changes).

Additional helpful details for bugs

Problem started recently, but not in older versions
Problem happens with all files, not only some files
Problem can be reliably reproduced
Problem happens randomly
fecon235 version: v4.16.1030
pandas version: 0.18
Python version: both 2.7 and 3
Operating system: cross-platform


rsvp · 2016-11-05T19:34:59Z

An immediate remedy is to downgrade to pandas 0.18.0 or 0.18.1
if you fatally encounter this issue.

The problem summarized: for pandas API > 0.18, you can either
downsample OR upsample, but not both.

The prior API would let you pass an aggregator function
(e.g. mean) even when you were upsampling, which caused some confusion.

Thus the fecon235 resampling functions, which have worked in
both upsampling and downsampling situations, will break;
e.g. see the yi_fred code.

So is there a pandas way to detect which type of sampling is being requested
given the data argument? Otherwise, the fix may have to involve an additional
mandatory flag, and tedious edits across many fecon235 notebooks.
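
One possible way to detect the direction, sketched under assumptions: compare the smallest gap in the index, in seconds, against the target frequency. The helper min_delta_secs and the sample dates are hypothetical. pd.infer_freq() is a real pandas function, but it returns None as soon as the index is irregular, which is why a numeric comparison looks more practical:

```python
import numpy as np
import pandas as pd

def min_delta_secs(index):
    """Smallest gap between consecutive index values, in seconds."""
    gaps = np.diff(index.values)                 # numpy timedelta64 array
    return float(gaps.min() / np.timedelta64(1, "s"))

# Hypothetical business-daily index, then the same with one day removed.
bdays = pd.date_range("2016-11-01", periods=10, freq="B")
irregular = bdays.delete(3)                      # simulate a holiday gap

freq_regular = pd.infer_freq(bdays)              # 'B'
freq_irregular = pd.infer_freq(irregular)        # None, no single rule fits
secs = min_delta_secs(irregular)                 # 86400.0, still clearly daily

# Deciding the direction for a monthly target (31 days as the upper bound):
SECS_31_DAYS = 31 * 24 * 3600.0
direction = "downsample" if secs < SECS_31_DAYS else "upsample"
print(freq_regular, freq_irregular, secs, direction)
```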


rsvp · 2016-11-07T23:13:21Z

Key points in resolving this issue

Reliably infer the frequency of a DataFrame's index
Write a function to compare index frequencies and handle resampling
Let the machine decide whether downsampling or upsampling is appropriate
Hide the messy details from the casual user

pandas breaks previous API for resampling

The code fix will now require pandas 0.18 or higher
Accordingly, we increment our project from v4 to v5

Code which solves current issue

See https://git.io/fecon235-fred for the latest revision
For reference, we include the relevant portion from v5.16.1107 below:

def index_delta_secs( dataframe ):
    '''Find minimum in seconds between index values.'''
    nanosecs_timedelta64 = np.diff(dataframe.index.values).min()
    #  Picked min() over median() to conserve memory;      ^^^^^!
    #  also avoids missing values issue, 
    #  e.g. weekend or holidays gaps for daily data.
    secs_timedelta64 = tools.div( nanosecs_timedelta64, 1e9 )
    #  To avoid numerical error, we divide before converting type: 
    secs = secs_timedelta64.astype( np.float32 )
    if secs == 0.0:
        system.warn('Index contains duplicate, min delta was 0.')
        return secs
    else:
        return secs

    #  There are OTHER METHODS to get the FREQUENCY of a dataframe:
    #       e.g.  df.index.freq  OR  df.index.freqstr , 
    #  however, these work only if the frequency was attributed:
    #       e.g.  '1 Hour'       OR  'H'  respectively. 
    #  The fecon235 derived dataframes will usually return None.
    #  
    #  Two timedelta64 units, 'Y' years and 'M' months, are 
    #  specially treated because the time they represent depends upon
    #  their context. While a timedelta64 day unit is equivalent to 
    #  24 hours, there is difficulty converting a month unit into days 
    #  because months have varying number of days. 
    #       Other numpy timedelta64 units can be found here: 
    #  http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html
    #  
    #  For pandas we could do:  pd.infer_freq( df.index )
    #  which, for example, might output 'B' for business daily series.
    #  
    #  But the STRING representation of index frequency is IMPRACTICAL
    #  since we may want to compare two unevenly timed indexes. 
    #  That comparison is BEST DONE NUMERICALLY in some common unit 
    #  (we use seconds since that is the Unix epoch convention).
    #
    #  Such comparison will be crucial for the machine 
    #  to choose whether downsampling or upsampling is appropriate.
    #  The casual user should not be expected to know the functions
    #  within index_delta_secs() to smoothly work with a notebook.


#  For details on frequency conversion, see McKinney 2013, 
#       Chp. 10 RESAMPLING, esp. Table 10-5 on downsampling.
#       pandas defaults are:  how='mean', closed='right', label='right'
#
#  2014-08-10  closed and label to the 'left' conform to FRED practices.
#              how='median' since it is more robust than 'mean'. 
#  2014-08-14  If upsampling, interpolate() does linear evenly, 
#              disregarding uneven time intervals.
#  2016-11-06  McKinney 2013 on resampling is outdated as of pandas 0.18


def resample_main( dataframe, rule, secs ):
    '''Generalized resample routine for downsampling or upsampling.'''
    #  rule is the offset string or object representing target conversion,
    #       e.g. 'B', 'MS', or 'QS-OCT' to be compatible with FRED.
    #  secs should be the maximum seconds expected for rule frequency.
    if index_delta_secs(dataframe) < secs:
        df = dataframe.resample(rule, closed='left', label='left').median()
        #    how='median' for DOWNSAMPLING deprecated as of pandas 0.18
        return df
    else:
        df = dataframe.resample(rule, closed='left', label='left').fillna(None)
        #    fill_method=None for UPSAMPLING deprecated as of pandas 0.18
        #    note that None almost acts like np.nan which fails as argument.
        #    interpolate() applies to those filled nulls when upsampling:
        #    'linear' ignores index values treating it as equally spaced.
        return df.interpolate(method='linear')


def daily( dataframe ):
    '''Resample data to daily using only business days.'''
    #                         'D' is used calendar daily
    #                         'B' for business daily
    secs1day2hours = 93600.0
    return resample_main( dataframe, 'B', secs1day2hours )


def monthly( dataframe ):
    '''Resample data to FRED's month start frequency.'''
    #  FRED uses the start of the month to index its monthly data.
    #                         'M'  is used for end of month.
    #                         'MS' for start of month.
    secs31days = 2678400.0
    return resample_main( dataframe, 'MS', secs31days )


def quarterly( dataframe ):
    '''Resample data to FRED's quarterly start frequency.'''
    #  FRED uses the start of the month to index its monthly data.
    #  Then for quarterly data: 1-01, 4-01, 7-01, 10-01.
    #                            Q1    Q2    Q3     Q4
    #  ________ Start at first of months,
    #  ________ for year ending in indicated month.
    #  'QS-OCT'
    secs93days = 8035200.0
    return resample_main( dataframe, 'QS-OCT', secs93days )
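
For readers without the fecon235 tools and system modules, the following is a self-contained sketch of the same idea, not the project's code. Two substitutions are assumptions made here: division by np.timedelta64(1, "s") replaces tools.div() and the float32 cast, and .asfreq() stands in for .fillna(None), since later pandas releases appear to deprecate the Resampler fillna method. The test data is made up.

```python
import numpy as np
import pandas as pd

def index_delta_secs(df):
    """Minimum gap between consecutive index values, in seconds."""
    return float(np.diff(df.index.values).min() / np.timedelta64(1, "s"))

def resample_main(df, rule, secs):
    """Downsample by median if df is finer than rule, else upsample."""
    resampler = df.resample(rule, closed="left", label="left")
    if index_delta_secs(df) < secs:
        return resampler.median()
    # Assumption: asfreq() inserts NaN rows much as fillna(None) did.
    return resampler.asfreq().interpolate(method="linear")

# The thresholds are upper bounds for each target frequency, in seconds.
SECS_1DAY_2HOURS = 26 * 3600.0     # 93600.0
SECS_31_DAYS = 31 * 24 * 3600.0    # 2678400.0
SECS_93_DAYS = 93 * 24 * 3600.0    # 8035200.0

def daily(df):
    return resample_main(df, "B", SECS_1DAY_2HOURS)

def monthly(df):
    return resample_main(df, "MS", SECS_31_DAYS)

def quarterly(df):
    return resample_main(df, "QS-OCT", SECS_93_DAYS)

# Hypothetical daily data for 2016 Q1: monthly() should DOWNSAMPLE by median.
d_idx = pd.date_range("2016-01-01", periods=91, freq="D")
daily_df = pd.DataFrame({"value": np.arange(91, dtype=float)}, index=d_idx)
down = monthly(daily_df)

# Hypothetical quarterly data: monthly() should UPSAMPLE and interpolate.
q_idx = pd.date_range("2016-01-01", periods=3, freq="QS-OCT")
quarterly_df = pd.DataFrame({"value": [1.0, 4.0, 7.0]}, index=q_idx)
up = monthly(quarterly_df)

print(down)
print(up)
```

The same monthly() call handles both inputs, so the caller never states the direction explicitly.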

rsvp · 2018-06-25T07:34:21Z

2018 Addendum

The fecon235 source code was refactored in https://git.io/fecon236

Here's the specific module which fixes the issue:
https://github.com/MathSci/fecon236/blob/master/fecon236/host/fred.py

rsvp added bug warning labels Nov 5, 2016

rsvp added this to the pandas API break milestone Nov 5, 2016

rsvp changed the title ~~pandas .resample() "how" deprecation as of its 0.19 version. Need to fix our daily(), weekly(), monthly()~~ pandas .resample() "how" deprecation as of its 0.19 version. Fix our daily(), monthly(), quarterly() Nov 6, 2016

rsvp closed this as completed in 86fb993 Nov 7, 2016

rsvp added a commit that referenced this issue Nov 7, 2016

Add tests/test_fred.py to test lib/yi_fred.py module

3a6e8d0

Esp. for index_delta_secs() and resample_main() to fix #6 daily(), monthly(), and quarterly().
