monthly data with seasonality issue #1110
These approaches will be almost the same thing (if months were evenly spaced and there were no leap years, they'd be totally identical). Seasonalities are specified in a unit of "days", but internally there is nothing special about "days": time is converted to a float between 0 and 1, and all of the model fitting and predicting happens in this transformed unit. This means that scaling the data frequency by any constant will not change the forecast in any way. Here it does change the forecast slightly because the frequency was not changed by a constant: some months have different numbers of days, whereas after your transformation they are all evenly spaced.

Is there a month missing from the data? I'm trying to figure out why there would be such a large spike, even when you've converted it to daily seasonality.

Perhaps a better option for learning yearly seasonality with monthly data would be to just add a binary extra regressor for each month. That is, disable all seasonality and add a binary regressor "is_jan" that is 1 if the month is Jan and 0 otherwise, and so on for each month. No need to rescale the dates. What does that look like?
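The month-dummy setup described above can be sketched as follows. Column names like `is_jan` and the date range are just illustrative choices, and the Prophet wiring is shown only as comments (an untested sketch, not confirmed API usage from this thread):

```python
import pandas as pd

# One binary regressor per calendar month, built from the 'ds' column.
# Date range mirrors the data described in this thread (2016-06 to 2019-05).
df = pd.DataFrame({"ds": pd.date_range("2016-06-01", "2019-05-01", freq="MS")})

month_names = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
for i, name in enumerate(month_names, start=1):
    df[f"is_{name}"] = (df["ds"].dt.month == i).astype(int)

# With Prophet this would then be wired up roughly as (sketch only):
#   m = Prophet(yearly_seasonality=False, weekly_seasonality=False,
#               daily_seasonality=False)
#   for name in month_names:
#       m.add_regressor(f"is_{name}")
#   m.fit(df_with_y)

# Exactly one dummy is active per row.
print(df[[f"is_{n}" for n in month_names]].sum(axis=1).unique())  # [1]
```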
Thanks for the thorough explanations @bletham. Let me elaborate on some of the points you mentioned and share more experiment results.
A question I have: from a daily-data perspective, yearly seasonality captures patterns at a yearly frequency, so 365.25 observations per cycle makes sense there.
No missing data. The training data is at a monthly level, from 2016-06 to 2019-05. Not sure if this matters, but for 'ds' I used 01 for the day, so the data goes [2016-06-01, 2016-07-01, ..., 2019-05-01]. (I understand there is a leap-year effect: 2016-06-01 != 2017-06-01.)

To add on: on the contrary, we are expecting a large spike around Dec. The default yearly seasonality does not return a consistent fall in units at Dec, especially for 2019, which prompted this investigation and made me realize that the yearly seasonality fluctuates in the Dec part. The intuition we want is that Dec should always have a sharp decline in units, but for 2019 the decline is not as large as expected.

Also, it might be a typo on your end, but for the daily method I actually used a custom seasonality of period 12, so it shouldn't be 'daily seasonality'.
So here is the interesting finding. Disabling the yearly seasonality and instead adding dummy regressors for the months results in a forecast similar to the 'daily method'. Both methods give a consistent seasonal pattern (i.e. the dip in Dec is the same every year), as opposed to default Prophet with yearly seasonality (where the dip in Dec differs on a per-year basis but repeats every 4 years). For illustration, you can see the forecast given by:
For comparison, default Prophet with yearly seasonality (period = 365.25):
365.25 isn't the number of observations per cycle; it's the length in real time of a cycle. So yearly seasonality fits a periodic function with a period of 365.25 days. That function is modeled using a Fourier series (https://en.wikipedia.org/wiki/Fourier_series) and so is continuous-time. That continuous, 365.25-day-period function can be fit with any number of observations, whether daily or monthly (though of course more data will allow for more reliable fitting, and monthly observations will leave many parts of the function not pinned down, as described in https://facebook.github.io/prophet/docs/non-daily_data.html#monthly-data).

I think what is happening here is that it is clearly a very sharp spike, and since the location of Dec 1 in the 365.25-day cycle varies from year to year (it falls back by 0.25 days every year until a leap year), that 0.25-day difference from year to year is actually producing noticeable changes. I suspect this is a combination of a very sharp seasonal effect and monthly data that makes it hard for the model to identify the continuous-time yearly seasonality. This is the first time I've seen a noticeable effect from leap years in the yearly seasonality.

I would definitely recommend the extra regressor approach here, and perhaps we should add it to the documentation, since it may typically be a better approach for monthly data.
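The 0.25-day drift described above is easy to verify directly. A quick check (the 2016-01-01 origin is an arbitrary choice for illustration, not Prophet's actual internal time scaling; the per-year drift is the same regardless of origin):

```python
from datetime import date

# Position of Dec 1 within a 365.25-day cycle, measured from an
# arbitrary origin. It slips back 0.25 days per year, then snaps
# back into place after a leap year.
origin = date(2016, 1, 1)
offsets = []
for year in range(2016, 2021):
    days = (date(year, 12, 1) - origin).days
    offsets.append(round(days % 365.25, 2))

print(offsets)  # [335.0, 334.75, 334.5, 334.25, 335.0]
```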
I'd think that keeping
Interesting. Not really, though; I think the above two solutions are much more intuitive: either model directly as daily data or just add seasonal dummies. In fact, theoretically, seasonal dummies achieve the same thing as Fourier regressors, with Fourier regressors being more beneficial when you are working with higher-frequency data.
I agree that the methods you suggested are more intuitive to you, because you thought of them in the first place :)
I apologize, but I am not able to understand the purpose of the 1461 period. My guess is that by setting the Fourier period to 1461 we are trying to add an additional seasonal effect every 4 years? Is the purpose to compensate for the leap-year difference?
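For reference, the arithmetic behind the 1461 figure (my own check, not from the thread): 1461 days is exactly four years including one leap day, so any two calendar dates four years apart are exactly 1461 days apart, and a Fourier term with that period could in principle absorb the 0.25-day-per-year drift:

```python
from datetime import date

# 1461 = 4 * 365.25, a full leap-year cycle in days.
assert 4 * 365.25 == 1461.0

# Dec 1's that are four years apart are always exactly 1461 days apart.
gaps = [(date(y + 4, 12, 1) - date(y, 12, 1)).days for y in range(2016, 2020)]
print(gaps)  # [1461, 1461, 1461, 1461]
```

(As the next comment shows, this did not eliminate the leap-year effect in practice for this dataset.)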
Here are the results anyway. I did what was suggested: yearly seasonality plus an additional Fourier seasonality with period 1461. In this case, the leap-year effect is not eliminated (which is what I wanted to do in the first place, since I want a consistent seasonal effect throughout my forecast instead of a leap-year seasonality).
Hmm, I see. Thank you a bunch for sharing the results. I was wondering if adding a 4-year seasonality component would handle the difference, but it clearly doesn't. I went over #825, which you referenced in your initial comment, dug into the code, and now I get why it doesn't.
This means the 2016 - 2017 difference is what lets the yearly effect catch up in the
Referencing #823
I am currently facing the same issue. Referencing the above, it makes sense why, for monthly forecasting with yearly seasonality:
In order to try to combat this, the approach I went with is to relabel the monthly dates as consecutive daily dates (so each observation is one 'day' apart) and fit a custom seasonality with period 12 in place of the yearly one. Then continue to forecast as usual and change the ds column back to monthly dates in the forecast dataframe.
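A minimal sketch of that relabeling, assuming pandas and illustrative variable names of my own (the Prophet calls are shown only as comments, as an untested sketch):

```python
import pandas as pd

# Monthly data as described in the thread (2016-06 to 2019-05).
df = pd.DataFrame({"ds": pd.date_range("2016-06-01", "2019-05-01", freq="MS")})

# Keep the real dates so they can be restored on the forecast frame later.
real_dates = df["ds"].copy()

# Relabel: each monthly observation becomes one consecutive "day".
df["ds"] = pd.date_range("2016-06-01", periods=len(df), freq="D")

# With Prophet, roughly (sketch only):
#   m = Prophet(yearly_seasonality=False, weekly_seasonality=False,
#               daily_seasonality=False)
#   m.add_seasonality(name="monthly", period=12, fourier_order=10)
#   m.fit(df)  # then swap real_dates back into the forecast's ds column

# Every observation is now exactly one "day" apart, so a period of 12
# repeats every 12 observations, i.e. every 12 months.
print(df["ds"].diff().dropna().unique())
```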
So this approach results in the custom seasonal component having a cycle of 12, i.e. it repeats every 12 months. But comparing with walk-forward cross-validation, this approach seems to be slightly worse off than the default forecast (with yearly seasonality of period 365.25).
So it is apparent that the two methods are slightly different (even though we are only considering the trend and trying to capture the yearly seasonal component). The first (default) approach results in a 4-year seasonal pattern, while the second (monthly-as-daily) captures a recurring yearly pattern. The issue I had with the first, default approach (due to the leap year) results in this:
This is the seasonal component from the yearly column. Notice that at the end of this year, the fall in units is not as large as in the previous year (which is due to the leap-year issue).
And the second method (as described above) results in this: a seasonality with period = 12 and Fourier order 10 on my 'daily' (really monthly) data. The x-axis is daily, but it is the same monthly data.
Cross-validation-wise, though, the first (default) method seems slightly better. But I would prefer the second method, because we are expecting Dec 2019 to have a fall in units, whereas with the yearly Fourier the fall is 'damped'.
Is there any advice you can give regarding this issue? Is the daily method correct, and will it introduce any issues?
edit: Tested this method on various data we have. For some, the daily method is better, but in terms of validation error they are not too far apart.
edit 2: Can I ask whether the yearly seasonality is working as intended for monthly data, since the 365.25 parameter is meant to capture annual seasonality from a daily-data perspective?