DelayedSaturatedMMM.load() error matching model ID #348

Closed
lewi0332 opened this issue Aug 6, 2023 · 5 comments · Fixed by #351

Comments


lewi0332 commented Aug 6, 2023

Hello everyone,

Perhaps I am doing something wrong here, but I can't seem to save and then load an MMM model. I am testing the MMM functionality in the latest release, 0.2.0, which came out last week. Super excited for all the new pieces. Thanks everyone.

Tried:

mmm = DelayedSaturatedMMM(
    model_config=model_config,
    date_column=DATE_COL,
    channel_columns=decay_channels,
    control_columns=control_variables,
    adstock_max_lag=12,
    yearly_seasonality=2,
)

mmm.fit(
    X=data.drop(target, axis=1),
    y=data[target],
    draws=SAMPLING_STEPS,
    tune=TUNNING_STEPS,
    target_accept=0.95,
    chains=4,
    random_seed=rng
    )

mmm.save(f'{path}/model.nc')

new_mmm = DelayedSaturatedMMM.load(f'{path}/model.nc')

Getting error

ValueError: The file 'training_results/2023-07-21_15:07_test_data_meta/model.nc' does not contain an inference data of the same model or configuration as 'DelayedSaturatedMMM'

I took a look at the check that raises the error, model.id != idata.attrs["id"]. The IDs are indeed different, even though I haven't made any changes to the model setup. When I create a new DelayedSaturatedMMM, the ID is consistently e33f4593f13b302a. When I try to load a saved DelayedSaturatedMMM, this intermediate step in the load function creates a model with a different ID, dc3bfba5e3342691:

model = cls(
    date_column=json.loads(idata.attrs["date_column"]),
    control_columns=json.loads(idata.attrs["control_columns"]),
    channel_columns=json.loads(idata.attrs["channel_columns"]),
    adstock_max_lag=json.loads(idata.attrs["adstock_max_lag"]),
    validate_data=json.loads(idata.attrs["validate_data"]),
    yearly_seasonality=json.loads(idata.attrs["yearly_seasonality"]),
    model_config=model_config,
    sampler_config=json.loads(idata.attrs["sampler_config"]),
)

print(model.id)  # dc3bfba5e3342691

Thanks again everyone. Let me know if I am missing a step.

@ricardoV94
Contributor

If you have any parameters defined as lists you should wrap them in numpy arrays.

Otherwise we may need a reproducible example to figure it out :)
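To illustrate why a list versus an array could matter for the ID check, here is a minimal sketch. It assumes the model ID is derived by hashing the string form of the config; `config_id` is a hypothetical stand-in written for this example, not the library's actual implementation.

```python
import hashlib

import numpy as np


def config_id(model_config: dict) -> str:
    """Hypothetical stand-in for a model ID: hash the config's string form."""
    text = str(model_config.values())
    return hashlib.sha256(text.encode()).hexdigest()[:16]


# Same numbers, once as a NumPy array and once as a plain list
as_array = {"beta_channel": {"sigma": np.array([0.5, 0.25]), "dims": ("channel",)}}
as_list = {"beta_channel": {"sigma": [0.5, 0.25], "dims": ("channel",)}}

# The string forms differ ("array([...])" vs "[...]"), so the hashes differ
print(config_id(as_array) == config_id(as_list))  # False
```

Under that assumption, any value that changes type between saving and loading would be enough to produce a different ID, even when the numbers are identical.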

@lewi0332
Author

lewi0332 commented Aug 7, 2023

Thanks @ricardoV94.

Do you mean the parameters in the model_config? Good catch. If I run the model without any changes to model_config, I can save and then load without issue.

However, I just tested using np.array() in the model_config and got an error when calling the fit() method, saying that NumPy arrays are not JSON serializable.

dummy_model = DelayedSaturatedMMM(date_column = '', channel_columns= '', adstock_max_lag = 12)
model_config = dummy_model.default_model_config

#Model config default from .default_model_config
"""
model_config = {'intercept': {'mu': 0, 'sigma': 2},
 'beta_channel': {'sigma': 2, 'dims': ('channel',)},
 'alpha': {'alpha': 1, 'beta': 3, 'dims': ('channel',)},
 'lam': {'alpha': 3, 'beta': 1, 'dims': ('channel',)},
 'sigma': {'sigma': 2},
 'gamma_control': {'mu': 0, 'sigma': 2, 'dims': ('control',)},
 'mu': {'dims': ('date',)},
 'likelihood': {'dims': ('date',)},
 'gamma_fourier': {'mu': 0, 'b': 1, 'dims': 'fourier_mode'}}
"""

# Set Priors from params dataframe (bad params i know... just testing something)
model_config['beta_channel']['sigma'] = hyperparams[hyperparams['primary_variable'].isin(decay_channels)]['beta'].values
model_config['alpha']['alpha'] = np.array([3 for i in decay_channels])
model_config['alpha']['beta'] = (((1 / hyperparams[hyperparams['primary_variable'].isin(decay_channels)]['ads_alpha'].values) * 3) - 3)
model_config['lam']['alpha'] = np.array([3 for i in decay_channels])
model_config['lam']['beta'] = (((1 / hyperparams[hyperparams['primary_variable'].isin(decay_channels)]['sat_gamma'].values) * 3) - 3)

# model_config after set priors:
"""
{'intercept': {'mu': 0, 'sigma': 2},
 'beta_channel': {'sigma': array([0.4533017 , 0.25488063, 0.14992924, 0.14492646, 0.07828438]),
  'dims': ('channel',)},
 'alpha': {'alpha': array([3, 3, 3, 3, 3]),
  'beta': array([3.55001301, 2.87092431, 2.83535104, 2.76894977, 2.91873807]),
  'dims': ('channel',)},
 'lam': {'alpha': array([3, 3, 3, 3, 3]),
  'beta': array([4.12231653, 5.02896872, 5.42851249, 5.54761511, 5.7041018 ]),
  'dims': ('channel',)},
 'sigma': {'sigma': 2},
 'gamma_control': {'mu': 0, 'sigma': 2, 'dims': ('control',)},
 'mu': {'dims': ('date',)},
 'likelihood': {'dims': ('date',)},
 'gamma_fourier': {'mu': 0, 'b': 1, 'dims': 'fourier_mode'}}
"""

mmm = DelayedSaturatedMMM(
    model_config=model_config,
    date_column='date',
    channel_columns=decay_channels,
    control_columns=control_variables,
    adstock_max_lag=12,
    yearly_seasonality=2,
)

# ---------------------------------------------------------------------
# Fit Model
# ---------------------------------------------------------------------

mmm.fit(
    X=data.drop(target, axis=1),
    y=data[target],
    draws=SAMPLING_STEPS,
    tune=TUNNING_STEPS,
    target_accept=0.95,
    chains=4,
    random_seed=rng
    )

Here's the error I see when I have np.arrays in the model_config dict:

File ~/miniconda3/envs/pymc/lib/python3.11/site-packages/pymc_experimental/model_builder.py:341, in ModelBuilder.set_idata_attrs(self, idata)
    339 idata.attrs["version"] = self.version
    340 idata.attrs["sampler_config"] = json.dumps(self.sampler_config)
--> 341 idata.attrs["model_config"] = json.dumps(self._serializable_model_config)
    342 # Only classes with non-dataset parameters will implement save_input_params
    343 if hasattr(self, "_save_input_params"):
...
--> 180     raise TypeError(f'Object of type {o.__class__.__name__} '
    181                     f'is not JSON serializable')

TypeError: Object of type ndarray is not JSON serializable
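The TypeError above can be reproduced without the library. The sketch below uses a made-up two-key config (not the real model_config) to show that the standard json module rejects an ndarray, and that converting it with .tolist() first lets the dump succeed.

```python
import json

import numpy as np

# Illustrative config fragment; the values are made up for this example
config = {"alpha": {"alpha": np.array([3, 3, 3]), "dims": ("channel",)}}

try:
    json.dumps(config)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # Object of type ndarray is not JSON serializable

# Converting the array to a plain list makes the config serializable
config["alpha"]["alpha"] = config["alpha"]["alpha"].tolist()
encoded = json.dumps(config)
print(encoded)  # {"alpha": {"alpha": [3, 3, 3], "dims": ["channel"]}}
```

Note that the tuple under "dims" is written as a JSON array, so it comes back as a list after json.loads.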

Strangely, I can use an np.array in the 'beta_channel' : {'sigma': array([])} value and mmm.fit() runs without an error, but if I use an array instead of a list in any of the other model_config values, I get the JSON error.

Even more strange... after I use the fit() method, the model_config dict has had the beta_channel['sigma'] converted to a list.
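One plausible explanation for the conversion, assuming the serialization step works on a shallow copy made with dict.copy(): the outer dict is copied, but the inner dicts are still shared with the original, so writing a list into the copy also changes model_config. The sketch below uses a made-up config to show the effect.

```python
import numpy as np

model_config = {"beta_channel": {"sigma": np.array([0.5, 0.25]), "dims": ("channel",)}}

# dict.copy() is shallow: the inner dicts are shared with the original
serializable = model_config.copy()
serializable["beta_channel"]["sigma"] = serializable["beta_channel"]["sigma"].tolist()

# The original config now holds a list as well
print(type(model_config["beta_channel"]["sigma"]))  # <class 'list'>
```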

Thanks for the response. I'll keep trying a few things.

@ricardoV94
Contributor

Yes, I think you can only use arrays for parameters? @michaelraczycki will perhaps spot the issue more quickly.

@lewi0332
Author

lewi0332 commented Aug 7, 2023

Hmm.

Sorry I didn't notice this before, but there is this in the delayed_saturated_mmm.py file on line 363:

    def _serializable_model_config(self) -> Dict[str, Any]:
        serializable_config = self.model_config.copy()
        if type(serializable_config["beta_channel"]["sigma"]) == np.ndarray:
            serializable_config["beta_channel"]["sigma"] = serializable_config[
                "beta_channel"
            ]["sigma"].tolist()
        return serializable_config

Looks like there is a process to convert the arrays that is only working on the beta_channel: sigma at the moment.

I added the others just now and it is working for me. I can save and load, and the model IDs match.

I'm a newb, but this is what I did that is working.

    @property
    def _serializable_model_config(self) -> Dict[str, Any]:
        serializable_config = self.model_config.copy()
        for key, value in serializable_config.items():
            if not isinstance(value, dict):
                continue
            for sub_key, sub_value in value.items():
                # Leave "dims" untouched; convert every other NumPy array to a list
                if sub_key != "dims" and isinstance(sub_value, np.ndarray):
                    value[sub_key] = sub_value.tolist()
        return serializable_config
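For comparison, here is a standalone sketch of the same conversion that works on a deep copy, so the caller's config keeps its arrays. The function name `to_serializable` and the sample config are hypothetical, and this is not the patch that was merged into the library.

```python
import copy
import json
from typing import Any, Dict

import numpy as np


def to_serializable(model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a JSON-friendly deep copy; the input config is left untouched."""
    result = copy.deepcopy(model_config)
    for params in result.values():
        if not isinstance(params, dict):
            continue
        for name, value in params.items():
            # Replace each NumPy array with an equivalent plain list
            if isinstance(value, np.ndarray):
                params[name] = value.tolist()
    return result


# Made-up config for illustration
config = {
    "intercept": {"mu": 0, "sigma": 2},
    "lam": {"alpha": np.array([3, 3]), "beta": np.array([4.5, 5.0]), "dims": ("channel",)},
}
payload = json.dumps(to_serializable(config))

# The original still holds arrays after serialization
print(type(config["lam"]["alpha"]))  # <class 'numpy.ndarray'>
```

The deep copy costs a little extra memory, but it avoids the side effect of the config changing type after fit().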

@michaelraczycki
Contributor

This needs to be patched. I didn't know whether, in the current model definition, priors with lists of arguments would be applicable to all variables, so I only added conversions for the ones that already used list values. It needs to be addressed in the next minor patch; I can probably get the PR in this week.
