TopDown returns NaN #255

Closed
jmoralez opened this issue Nov 29, 2023 · 10 comments

Comments

@jmoralez
Member

@jmoralez thanks for the response. In my case the code was correct: I provided the in-sample predictions in Y_df, but the TopDown results are still "NaN". It works for BottomUp and the other methods I tested, like "OptimalCombination" and "MinTrace", so it is strange that it returns NaN for the TopDown method.

Any recommendations, please? Below is a snippet of my code:

image

Originally posted by @mjsandoval04 in #253 (comment)

@jmoralez
Member Author

Hey @mjsandoval04. Do you have zeros in your insample predictions?

@mjsandoval04

As a matter of fact, yes, I do have some "0" values in some of my in-sample predictions, corresponding to "0" sales for that period. Here is a snippet of my data.
I tried putting in a small value like "1" and a large value like "1000", but I still got "NaN" for the TopDown method.
What should I do?

image

@jmoralez
Member Author

What about for CES? I think there's a division by zero going on. Can you try adding some small values to both columns (y and CES)?
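
To illustrate the kind of division by zero suspected here, below is a minimal NumPy sketch. It assumes a simplified version of the average-proportions idea (the mean over time of each bottom series divided by the total) and is not the library's actual implementation; the function name `average_proportions` and the toy data are made up for the example.

```python
import numpy as np

def average_proportions(bottom: np.ndarray, top: np.ndarray) -> np.ndarray:
    """Mean over time of bottom/top ratios (simplified illustration only).

    bottom has shape (n_bottom, n_time); top has shape (n_time,).
    """
    # Silence the runtime warnings so the NaN result can be inspected directly.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = bottom / top
    return ratios.mean(axis=1)

# Toy data: every series is zero in the second period, so the total is zero too.
bottom = np.array([[2.0, 0.0, 3.0],
                   [2.0, 0.0, 1.0]])
top = bottom.sum(axis=0)

# 0 / 0 yields NaN in that period, and the mean carries the NaN to every proportion.
props = average_proportions(bottom, top)

# Adding a small value to the zeros avoids the 0 / 0 and gives finite proportions.
eps = 1e-6
safe_bottom = bottom + eps
safe_props = average_proportions(safe_bottom, safe_bottom.sum(axis=0))

print(props)
print(safe_props)
```

A single period where the total is zero is enough to turn every proportion into NaN in this sketch, which is why adding a small value to the columns is a reasonable first test.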

@mjsandoval04

I have checked my data and the forecast results (Y_hat_df and Y_df) several times. The "CES" forecast has no zeros (as it should be), and in the in-sample df only "y" has "0" values, meaning the forecast is greater than 0 for that period although the actual sales were "0".

I have tried adding values to the zeros in column "y" and I'm still getting NaN. For example, for the zeros I set "y" equal to "CES".
I also changed the forecasting method, for example to "SES", and I'm getting the same results.

To my understanding, if CES (the forecast) is greater than 0, then TopDown should return a value. Am I missing something?
image

@jmoralez
Member Author

Can you provide a reproducible example? The following works fine:

import numpy as np
import pandas as pd
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import TopDown
from hierarchicalforecast.utils import aggregate

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
df = df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
df.insert(0, 'Country', 'Australia')
spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'State', 'Region'], 
]
y_df, s_df, tags = aggregate(df, spec)
y_df = y_df.reset_index()
y_df.loc[y_df['unique_id'] == 'Australia', 'y'] = 0.1
y_df['model'] = np.random.rand(y_df.shape[0])
valid = y_df.groupby('unique_id').tail(12)
train = y_df.drop(valid.index)
hrec = HierarchicalReconciliation(reconcilers=[TopDown(method='average_proportions')])
hrec.reconcile(Y_hat_df=valid, Y_df=train, S=s_df, tags=tags)

@mjsandoval04

Yes, the example described in the library documentation worked for me as well.
Here is a Jupyter notebook, and I've uploaded the data for your reference (here is the link to the Excel files):

data: https://drive.google.com/drive/folders/1Ix_noPRb70KUaMtMy9LYu-4xxcdHwq5O?usp=sharing
Jupyter NB:
TopDown returns NaN_test.zip

PS: apologies, I'm a newbie when it comes to GitHub and I don't know how to paste the code as you did, so I just uploaded the files.

@mjsandoval04

Hello @jmoralez, were you able to reproduce my example?

@jmoralez
Member Author

Yes, you have a series that is shorter than the others, which produces null values in data2. You can use the following to add the missing dates and fill them with zero:

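# %pip install utilsforecast if necessary
import pandas as pd
from utilsforecast.preprocessing import fill_gaps

# data1, data2, rec_model, S_train and tags come from the notebook attached above
# parse ds as datetimes so the monthly frequency can be applied
data2['ds'] = pd.to_datetime(data2['ds'])
# add the missing dates so every series spans the global start and end
data2_filled = fill_gaps(data2.reset_index(), start='global', end='global', freq='M')
# the added rows have null values; replace them with zero
data2_filled = data2_filled.fillna(0)
p_rec = rec_model.reconcile(Y_hat_df=data1, Y_df=data2_filled, S=S_train, tags=tags)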
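
For anyone hitting the same NaN, a rough way to check for this situation before reconciling is sketched below in plain pandas. The toy frame and the names `short_series`, `wide` and `n_missing` are hypothetical; the sketch assumes long-format data with `unique_id`, `ds` and `y` columns and only illustrates how a shorter series leaves nulls once all series are aligned on a common set of dates.

```python
import pandas as pd

# Hypothetical long-format data: series "B" is missing its first month.
df = pd.DataFrame({
    "unique_id": ["A", "A", "A", "B", "B"],
    "ds": pd.to_datetime(
        ["2023-01-31", "2023-02-28", "2023-03-31", "2023-02-28", "2023-03-31"]
    ),
    "y": [1.0, 2.0, 3.0, 5.0, 6.0],
})

# Series with fewer observations than the longest one are the suspects.
counts = df.groupby("unique_id")["ds"].count()
short_series = counts[counts < counts.max()].index.tolist()

# Aligning all series on the same dates exposes the gap as a null value.
wide = df.pivot(index="unique_id", columns="ds", values="y")
n_missing = int(wide.isna().sum().sum())

# Filling the gap with zero mirrors the fillna(0) step above.
wide_filled = wide.fillna(0)

print(short_series, n_missing)
```

If `short_series` is non-empty, filling the gaps as shown above (or with another value that suits the data) should remove the nulls before the data reaches the reconciler.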

@mjsandoval04

@jmoralez thank you for the feedback, brother! It works flawlessly :)

Contributor

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.
