Create a generic version of convert_to_annual function in scikits.timeseries #736

Closed
wesm opened this issue Feb 2, 2012 · 44 comments

Comments

@wesm
Member

wesm commented Feb 2, 2012

The name is a misnomer. It's really a pivot operation

@timmie
Contributor

timmie commented Feb 2, 2012

Link to scikits.timeseries meta issue:
https://github.com/wesm/pandas/issues/630

wesm added a commit that referenced this issue Apr 14, 2012
* timeseries: (200 commits)
  TST: don't use deprecated DateRange
  BUG: fix buglets surfacing from merge
  RLS: set released to false, bump dev version to 0.8.0
  BUG: fix major performance issue in DatetimeIndex.union affecting join performance on irregular indexes, remedying #1046
  ENH: add to_datetime method to Index, close #208
  ENH: legacy time rule support and refactoring, better alias handling. misc tests, #1041
  ENH: to_datetime will convert array of strings and NAs to datetime64 with NaT, close #999
  ENH: more datetime64 integration in core data algorithms per #996, close #1035
  ENH: handle datetime64 in block formation from dict of arrays in DataFrame constructor, close #1037
  BUG: fix broken time_rule usage in legacy DateRange, close #1036
  BUG: name inline method something different
  ENH: initial version of convert_to_annual for pandas, #736
  BUG: convert datetime64 -> datetime.datetime for matplotlib, close #1003
  ENH: integrate cython ohlc in groupby and test, close #152
  ENH: implement Cython OHLC function for groupby #152
  ENH: use cython bin groupers, fix bug in DatetimeIndex.__getitem causing slowness, some timeseries vbenches
  ENH: enable to_datetime to be vectorized, handle NAs, close #858
  TST: interactions between array of datetime objects and DatetimeIndex, bug fixes
  TST: remove errant foo and test_datetime64.py
  TST: moved test_datetime64.py tests to test_timeseries
  ...
@wesm
Member Author

wesm commented May 18, 2012

This kind of stuff would be nice to be able to do too: http://stackoverflow.com/questions/10458493/pandas-how-to-plot-yearly-data-on-top-of-each-other

@timmie
Contributor

timmie commented May 29, 2012

Is it planned to integrate this into 0.8 final?

@changhiskhan
Contributor

No, we pushed this off to 0.9.0 for now

@wesm
Member Author

wesm commented May 29, 2012

@timmie: there is a version of it already in pandas/tseries/util.py, but I need people like you to give me some more guidance on what features of it are actually required (we're talking about a 15 line function here). I suspect that pandas's groupby capability makes convert_to_annual less useful than it was in scikits.timeseries

@timmie
Contributor

timmie commented May 29, 2012

OK, very nice.
I will give it a try by the end of the week.

Is it also possible to do such operations on DataFrames?
Case:
I have a DataFrame with a datetime index holding temperature, precipitation and wind speed for 10 years. Now I would like to get the annual average for all 3 parameters at once.

@wesm
Member Author

wesm commented May 29, 2012

Wouldn't df.resample('A') be what you want?

@wesm
Member Author

wesm commented May 29, 2012

Or otherwise df.groupby(lambda x: x.year).mean()
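
For readers following along, the annual-average case described above (three parameters, ten years) might look like the sketch below. The column names and the random data are assumptions made for illustration; grouping on `df.index.year` is used here because it does not depend on a frequency alias, whose spelling differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical weather data: ten years of daily values for three parameters.
dates = pd.date_range("2000-01-01", "2009-12-31", freq="D")
gen = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "temperature": gen.uniform(-10.0, 35.0, len(dates)),
        "precipitation": gen.uniform(0.0, 20.0, len(dates)),
        "wind_speed": gen.uniform(0.0, 25.0, len(dates)),
    },
    index=dates,
)

# One row per calendar year, one column per parameter.
annual_mean = df.groupby(df.index.year).mean()
print(annual_mean)
```

The result has one row per year and keeps all three columns, so the parameters stay aligned on the same yearly index.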

@timmie
Contributor

timmie commented May 30, 2012

Please help, I am getting the following errors:


>>> pivot_annual(ser, freq='A')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/tseries/util.py", line 58, in pivot_annual
    raise NotImplementedError(freq)
NotImplementedError: A

>>> pivot_annual(ser)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/tseries/util.py", line 58, in pivot_annual
    raise NotImplementedError(freq)
NotImplementedError: None
>>> 

@timmie
Contributor

timmie commented May 30, 2012

Please add util to the namespace.
I had to do

from pandas.tseries.util import pivot_annual

@timmie
Contributor

timmie commented May 30, 2012

I suspect that

df.groupby(lambda x: x.year).mean()

does not achieve an equal alignment over years for all parameters, as commented in #736 (comment)

@wesm
Member Author

wesm commented May 30, 2012

Why wouldn't it?

@timmie
Contributor

timmie commented May 30, 2012

Hmm. I feel I need to set up an IPython notebook somewhere to have a test playground.

@timmie
Contributor

timmie commented May 30, 2012

And what does the NotImplementedError(freq) mean?

@wesm
Member Author

wesm commented May 30, 2012

If you look at the source for pivot_annual, only D and M/BM frequencies are implemented.

@wesm
Member Author

wesm commented May 30, 2012

You should also try df.resample('A', how='mean', kind='period'). It seems unlikely that convert_to_annual / pivot_annual is strictly necessary for what you're doing if the goal is to aggregate data.

@timmie
Contributor

timmie commented May 30, 2012

OK, I see. Then there must be a misunderstanding here.

Did you try ts.convert_to_annual with a time series of hourly frequency covering more than one year?

@wesm
Member Author

wesm commented May 30, 2012

You're computing means over years or over hours? If I understand correctly now you want the mean value for each hour of the year across the data set? I agree that you can do that with convert_to_annual but not with groupby/resample. It would be very helpful to be looking at some real (or fake) data =P
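
To make the question concrete: "the mean value for each hour of the year across the data set" can be sketched as below. This is an illustration that is not taken from the thread, using made-up hourly data. It groups on a composite key of month, day and hour rather than on the year alone, which is one way to line up calendar slots across leap and non-leap years.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten years of hourly temperatures, 2000 through 2009.
idx = pd.date_range(
    "2000-01-01 00:00", "2009-12-31 23:00", freq=pd.Timedelta(hours=1)
)
temps = pd.Series(
    np.random.default_rng(0).uniform(-10.0, 35.0, len(idx)), index=idx
)

# Group on (month, day, hour) so that 29 February forms its own group
# and every other calendar slot lines up across all years.
keys = [temps.index.month, temps.index.day, temps.index.hour]
hourly_climatology = temps.groupby(keys).mean()
hourly_climatology.index.names = ["month", "day", "hour"]

print(len(hourly_climatology))
```

Each entry is the mean over the years in which that calendar hour exists, so the 29 February entries average only the leap years in the sample.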

@timmie
Contributor

timmie commented May 30, 2012

> If I understand correctly now you want the mean value for each hour of the year across the data set?

Yes. It could also apply to daily data, etc.

> I agree that you can do that with convert_to_annual but not with groupby/resample. It would be very helpful to be looking at some real (or fake) data

Agreed. I will come up with some example data soon.

@timmie
Contributor

timmie commented May 31, 2012

> It would be very helpful to be looking at some real (or fake) data =P

Here's the code:


import numpy as np
import scikits.timeseries as ts

# generate a time series with 10 years of random temperature data

data = np.random.uniform(low=-10.0, high=35.0, size=87671)
start_date = ts.Date(freq='H',year=2000,month=1,day=1, hour=1)

pytseries = ts.time_series(data, dtype=np.int, freq='H', 
                           start_date=start_date)



# alignment of years in respect of the leap days

pytseries_annual = ts.extras.convert_to_annual(pytseries)

# 10 years average of hourly temperatures
lt_ave = pytseries_annual.mean(0).size

I also have this as an HTML notebook file and a printout, but I don't know how to attach them here.

@wesm
Member Author

wesm commented May 31, 2012

Here is the pandas equivalent for daily data (one of the implemented frequencies) using pivot_annual:

In [21]: rng = date_range('1/1/2000', '1/1/2010', freq='D')

In [22]: ts = Series(np.random.randn(len(rng)), rng)

In [23]: util.pivot_annual(ts).mean(0)
Out[23]: 
1    -0.424473
2    -0.221603
3     0.388556
4     0.383484
5    -0.196986
6     0.152771
7    -0.430865
8    -0.278414
9     0.272135
10   -0.216587
11   -0.468601
12    0.010926
13    0.101746
14    0.112212
15   -0.462306
...
352   -0.417838
353    0.459750
354   -0.041899
355   -0.011174
356    0.220537
357    0.242754
358    0.313742
359    0.300249
360    0.040765
361    0.046944
362    0.040023
363   -0.217559
364    0.367547
365    0.119563
366    0.149244
Length: 366

I'll see if I can find the time in the next 10 days to finish the other frequencies--- the hardest part is writing test cases, honestly.
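
For pandas installations where `pandas.tseries.util.pivot_annual` is not available, a similar years-by-day-of-year table can be sketched with a MultiIndex and `unstack`. The helper name `pivot_by_year` is hypothetical and not a pandas API, and the leap-day handling shown is one possible alignment choice rather than a description of what `pivot_annual` does internally.

```python
import numpy as np
import pandas as pd


def pivot_by_year(series):
    """Pivot a daily series into a years x day-of-year table (366 columns).

    Hypothetical helper, not part of pandas. In non-leap years the days
    from 1 March onward are shifted by one, so each column holds the same
    calendar date in every year; column 60 (29 February) is NaN there.
    """
    idx = series.index
    day = idx.dayofyear.to_numpy().copy()
    shift = (~idx.is_leap_year) & (idx.month >= 3)
    day[shift] += 1
    keys = pd.MultiIndex.from_arrays(
        [idx.year, day], names=["year", "day_of_year"]
    )
    return pd.Series(series.to_numpy(), index=keys).unstack("day_of_year")


dates = pd.date_range("2000-01-01", "2009-12-31", freq="D")
ts = pd.Series(np.random.default_rng(0).standard_normal(len(dates)), index=dates)

table = pivot_by_year(ts)        # rows: years, columns: day of year 1..366
daily_mean = table.mean(axis=0)  # mean per calendar day across years
print(table.shape, len(daily_mean))
```

Taking the mean along axis 0 skips the NaN cells, so the 29 February column is averaged over the leap years only.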

@timmie
Contributor

timmie commented May 31, 2012

The htmlnotebook is now at:
http://www.sendspace.com/file/5ukamo

> I'll see if I can find the time in the next 10 days to finish the other frequencies--- the hardest part is writing test cases, honestly.

Happy to read that we were talking about the same idea and purpose of the function.

For 0.9 I could see some more improvements: although not associated with a particular year, this data is still datetime data (months, days, hours, etc.). This is linked with my comment on the plots:
http://permalink.gmane.org/gmane.comp.python.pystatsmodels/8439

But one after the other.

@timmie
Contributor

timmie commented Sep 23, 2012

Could you give some guidance for adding higher frequencies?
I would appreciate having at least hourly and minutely.

@timmie
Contributor

timmie commented Oct 14, 2012

While I am trying to get moving on the hourly freq., I still would like to get an opinion from @wesm

Why did we not copy the method at:
https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeseries/extras.py#L135

Thanks in advance & sorry if I am too pushy on this.

@wesm
Member Author

wesm commented Oct 14, 2012

That method needs quite a bit of work to be adapted to work with pandas-- I already started doing it but ran out of time. It's just a matter of time and resources--note that you are the only person to ever bring this up with us.

@timmie
Contributor

timmie commented Oct 14, 2012

> It's just a matter of time and resources--

Noted. I have started to expand it myself.

> note that you are the only person to ever bring this up with us.

Allow one more question:

So would you suppose that others do not need this functionality, or that they just have their own function based on what already exists within pandas?
In the latter case, I will ask for help on the mailing list or Stack Exchange.

@timmie
Contributor

timmie commented Oct 22, 2012

@wesm:

I finally got a working solution for the hourly frequency. But it diverges a bit from the existing function.

I found that using years as rows and hours (of the year) as columns is difficult in e.g. excel.

So I transposed the thing.

How would it be best to proceed?

Shall I share a working example via email?

If you are fine, I would then try to make a test and prepare a qualified PR.

BTW: let's rename this issue to "Create a generic version of convert_to_annual function in scikits.timeseriesfinalise in pivot_annual". This way it would be found more easily under the right topic.

@timmie
Contributor

timmie commented Nov 1, 2012

I have code up for the hourly frequency.

Please look at #2153

@ispmarin

Sorry for digging up something as old as this, but is there a way to do year-over-year aggregation directly?

@jreback
Contributor

jreback commented May 23, 2015

Maybe show an example of what you are looking for.

@sinhrks
Member

sinhrks commented May 23, 2015

@ispmarin Answered on SO. Can you check?

@ispmarin

Sure. Gonna test and will report back.

@ispmarin

Works. Thanks!

This should be in the manual somewhere. It seems to be a very common question on Stack Overflow and very useful in financial circles.

@jreback
Contributor

jreback commented May 24, 2015

@sinhrks let's add this recipe to the cookbook and close this issue then

@ispmarin

I can help writing the docs, if needed.

@jreback
Contributor

jreback commented May 28, 2015

@ispmarin that would be great!

@jorisvandenbossche
Member

BTW, we do have a pivot_annual (https://github.com/pydata/pandas/blob/master/pandas/tseries/util.py), although I don't think we should put that in the picture ..

@sinhrks
Member

sinhrks commented May 28, 2015

Thanks for the help, @ispmarin

pivot_annual looks to be superseded by the dt accessor and TimeGrouper. Should we enhance the time grouping docs and deprecate it?
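
As an illustration of the accessor-based approach, a year-over-year table can be built with `.dt` and `groupby` alone. This is a sketch under assumed data (the `date` and `value` column names are made up); it is not the Stack Overflow answer referred to in this thread, which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data with a datetime column rather than an index.
dates = pd.date_range("2013-01-01", "2015-12-31", freq="D")
df = pd.DataFrame({"date": dates, "value": np.arange(len(dates), dtype=float)})

# Derive the grouping keys from the datetime column via the .dt accessor.
year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")

# Rows are months, columns are years, cells are monthly means.
year_over_year = df.groupby([month, year])["value"].mean().unstack("year")
print(year_over_year)
```

With one column per year, the years can be compared side by side or plotted on top of each other with `year_over_year.plot()`.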

@ispmarin

So where does this issue stand? Is the answer on Stack Overflow the right way to do this, or is there another way?

@jorisvandenbossche
Member

Yes, @sinhrks's answer on Stack Overflow is the way to go

@ispmarin

Ok, got it. What is the best way to create the doc? Create an annotated IPython notebook? Are there any guidelines for it?

@wesm
Member Author

wesm commented Jan 14, 2016

Can this issue be closed?

@jreback
Contributor

jreback commented Jan 14, 2016

idea was to add a recipe in the cookbook
but not sure it's that big of a deal

@sinhrks sinhrks mentioned this issue Jul 19, 2016
4 tasks
@jreback jreback modified the milestones: 0.19.0, Someday Jul 20, 2016