
ValueError: Normal distribution got invalid loc parameter. #124

Closed
steven-struglia opened this issue Dec 19, 2022 · 24 comments

@steven-struglia

steven-struglia commented Dec 19, 2022

Hi! I'm attempting to recreate the example presented at PyData Global 2022 with some of my own MMM data: https://github.com/takechanman1228/mmm_pydata_global_2022/blob/main/simple_end_to_end_demo_pydataglobal.ipynb

import jax.numpy as jnp
from lightweight_mmm import lightweight_mmm, preprocessing

data = data.tail(150)
data_size = len(data)

n_media_channels = len(mdsp_cols)
n_extra_features = len(control_vars)
media_data = data[mdsp_cols].to_numpy()
extra_features = data[control_vars].to_numpy()
target = data['y'].to_numpy()
costs = data[mdsp_cols].sum().to_numpy()
media_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
extra_features_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
target_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
cost_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean, multiply_by=0.15)

# (the train/test split producing media_data_train etc. was omitted in the original post)
media_data_train = media_scaler.fit_transform(media_data_train)
extra_features_train = extra_features_scaler.fit_transform(extra_features_train)
target_train = target_scaler.fit_transform(target_train)
costs = cost_scaler.fit_transform(costs)
mmm = lightweight_mmm.LightweightMMM(model_name="hill_adstock")

number_warmup=1000
number_samples=1000

mmm.fit(
    media=media_data_train,
    media_prior=costs,
    target=target_train,
    extra_features=extra_features_train,
    number_warmup=number_warmup,
    number_samples=number_samples,
    media_names = mdsp_cols,
    seed=105)

This produces the following error:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3869/3074029020.py in <module>
     12     number_samples=number_samples,
     13     media_names = mdsp_cols,
---> 14     seed=105)

/opt/conda/lib/python3.7/site-packages/lightweight_mmm/lightweight_mmm.py in fit(self, media, media_prior, target, extra_features, degrees_seasonality, seasonality_frequency, weekday_seasonality, media_names, number_warmup, number_samples, number_chains, target_accept_prob, init_strategy, custom_priors, seed)
    370         transform_function=self._model_transform_function,
    371         weekday_seasonality=weekday_seasonality,
--> 372         custom_priors=custom_priors)
    373 
    374     self.custom_priors = custom_priors

/opt/conda/lib/python3.7/site-packages/numpyro/infer/mcmc.py in run(self, rng_key, extra_fields, init_params, *args, **kwargs)
    595         else:
    596             if self.chain_method == "sequential":
--> 597                 states, last_state = _laxmap(partial_map_fn, map_args)
    598             elif self.chain_method == "parallel":
    599                 states, last_state = pmap(partial_map_fn)(map_args)

/opt/conda/lib/python3.7/site-packages/numpyro/infer/mcmc.py in _laxmap(f, xs)
    158     for i in range(n):
    159         x = jit(_get_value_from_index)(xs, i)
--> 160         ys.append(f(x))
    161 
    162     return tree_map(lambda *args: jnp.stack(args), *ys)

/opt/conda/lib/python3.7/site-packages/numpyro/infer/mcmc.py in _single_chain_mcmc(self, init, args, kwargs, collect_fields)
    384                 init_params,
    385                 model_args=args,
--> 386                 model_kwargs=kwargs,
    387             )
    388         sample_fn, postprocess_fn = self._get_cached_fns()

/opt/conda/lib/python3.7/site-packages/numpyro/infer/hmc.py in init(self, rng_key, num_warmup, init_params, model_args, model_kwargs)
    705             )
    706         init_params = self._init_state(
--> 707             rng_key_init_model, model_args, model_kwargs, init_params
    708         )
    709         if self._potential_fn and init_params is None:

/opt/conda/lib/python3.7/site-packages/numpyro/infer/hmc.py in _init_state(self, rng_key, model_args, model_kwargs, init_params)
    657                 model_args=model_args,
    658                 model_kwargs=model_kwargs,
--> 659                 forward_mode_differentiation=self._forward_mode_differentiation,
    660             )
    661             if self._init_fn is None:

/opt/conda/lib/python3.7/site-packages/numpyro/infer/util.py in initialize_model(rng_key, model, init_strategy, dynamic_args, model_args, model_kwargs, forward_mode_differentiation, validate_grad)
    674             with numpyro.validation_enabled(), trace() as tr:
    675                 # validate parameters
--> 676                 substituted_model(*model_args, **model_kwargs)
    677                 # validate values
    678                 for site in tr.values():

/opt/conda/lib/python3.7/site-packages/numpyro/primitives.py in __call__(self, *args, **kwargs)
    103             return self
    104         with self:
--> 105             return self.fn(*args, **kwargs)
    106 
    107 

/opt/conda/lib/python3.7/site-packages/numpyro/primitives.py in __call__(self, *args, **kwargs)
    103             return self
    104         with self:
--> 105             return self.fn(*args, **kwargs)
    106 
    107 

/opt/conda/lib/python3.7/site-packages/lightweight_mmm/models.py in media_mix_model(media_data, target_data, media_prior, degrees_seasonality, frequency, transform_function, custom_priors, transform_kwargs, weekday_seasonality, extra_features)
    433 
    434   numpyro.sample(
--> 435       name="target", fn=dist.Normal(loc=mu, scale=sigma), obs=target_data)

/opt/conda/lib/python3.7/site-packages/numpyro/distributions/distribution.py in __call__(cls, *args, **kwargs)
     97             if result is not None:
     98                 return result
---> 99         return super().__call__(*args, **kwargs)
    100 
    101 

/opt/conda/lib/python3.7/site-packages/numpyro/distributions/continuous.py in __init__(self, loc, scale, validate_args)
   1700         batch_shape = lax.broadcast_shapes(jnp.shape(loc), jnp.shape(scale))
   1701         super(Normal, self).__init__(
-> 1702             batch_shape=batch_shape, validate_args=validate_args
   1703         )
   1704 

/opt/conda/lib/python3.7/site-packages/numpyro/distributions/distribution.py in __init__(self, batch_shape, event_shape, validate_args)
    177                         raise ValueError(
    178                             "{} distribution got invalid {} parameter.".format(
--> 179                                 self.__class__.__name__, param
    180                             )
    181                         )

ValueError: Normal distribution got invalid loc parameter.

I've checked for null values in my observation data, and none are present. I also removed zero-cost channels from the model after noticing that a few had zero cost after scaling, following the answer in #115. I also tried reducing the number of rows and columns fed into the model, but none of this has helped get past the error. Please let me know how I can diagnose this. Thanks in advance.

@michevan
Collaborator

Thanks for the info here! Can you share the values you're passing to the media_prior argument when you run mmm.fit()?

@steven-struglia
Author

steven-struglia commented Dec 19, 2022

DeviceArray([1.3611459e+00, 5.7213748e-01, 0.0000000e+00, 2.6670545e-02,
             0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
             1.6257154e-02, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
             0.0000000e+00, 0.0000000e+00, 1.4914073e-04, 0.0000000e+00,
             2.1849802e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
             4.6400268e-02, 0.0000000e+00, 8.5936986e-02, 2.8591064e-01,
             0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
             1.4479518e-01, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
             0.0000000e+00, 9.9668846e-02, 1.0145502e+00, 0.0000000e+00,
             4.1909981e-01, 0.0000000e+00, 6.0074413e-03, 0.0000000e+00,
             6.2639105e-05, 0.0000000e+00, 1.9774857e-01, 3.1633526e-01,
             1.0129293e-01, 0.0000000e+00, 3.0524179e-01, 0.0000000e+00,
             0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 4.7154009e-02,
             0.0000000e+00, 2.7839604e-03, 1.4890094e+00, 1.1559477e-01,
             0.0000000e+00, 1.2118348e-02, 2.1755135e-01, 0.0000000e+00,
             1.0961844e-05, 0.0000000e+00, 3.8138619e-01], dtype=float32)

This is the costs data being put into media_prior on fit() @michevan

@steven-struglia
Author

Update: I thought costs wouldn't have any zero values, but it turns out they do, even after I run a function to remove those features. Let me get back to you on this.

@michevan
Collaborator

perfect, thank you! I see there are still a bunch of zeros here and that's probably what is causing the issue. The default prior for each channel is a half-normal distribution with mean zero and standard deviation equal to the values you're passing here, so when that value is zero the prior gets difficult to define. Changing to a small non-zero value should fix the issue. More generally though, media channels in an MMM should usually have non-zero costs, especially if you want to compute ROIs later in the process and perform channel optimization.
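The suggested fix can be sketched as follows; `floor_media_prior` is a hypothetical helper name and the 1e-4 floor is an illustrative choice, not a value from the thread:

```python
import numpy as np

def floor_media_prior(costs, eps=1e-4):
    """Replace zero channel costs with a small positive floor so the
    half-normal prior's scale parameter is always well-defined."""
    costs = np.asarray(costs, dtype=np.float32)
    return np.where(costs < eps, eps, costs)

costs = np.array([1.36, 0.57, 0.0, 0.027, 0.0])
safe_costs = floor_media_prior(costs)
# safe_costs has no zeros; pass it as media_prior to mmm.fit()
```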

@steven-struglia
Author

DeviceArray([5.8334827e-01, 2.4520180e-01, 1.1430235e-02, 6.9673518e-03,
             6.3917461e-05, 9.3642014e-01, 1.9885831e-02, 3.6830138e-02,
             1.2253314e-01, 6.2055077e-02, 4.2715222e-02, 4.3480727e-01,
             1.7961422e-01, 2.5746180e-03, 2.6845333e-05, 8.4749393e-02,
             1.3557225e-01, 4.3411259e-02, 1.3081792e-01, 2.0208862e-02,
             1.1931260e-03, 6.3814694e-01, 4.9540617e-02, 5.1935781e-03,
             9.3236297e-02, 4.6979335e-06, 1.6345124e-01], dtype=float32)

With this data as the media_prior, fit() still runs into the invalid loc parameter error.

@michevan
Collaborator

thanks for the update! This one is trickier; I think those values should be okay for the solver. Is there any chance that your reduced set of costs has a different number of channels now than your media_data_train, target_train, and extra_features_train?

@steven-struglia
Author

media_data_train.shape: (126, 27)
target_train.shape: (126, )
costs.shape: (27, )
extra_features_train.shape: (126, 67)

@steven-struglia
Author

steven-struglia commented Dec 19, 2022

These look good to me. As far as I know, the extra_features shouldn't have a cost assigned to them (if I'm not mistaken?)

@michevan
Collaborator

Thank you! And yes, those dimensions look logically sound to me, so that doesn't seem to be the issue!

One thing that I notice (not exactly what's causing your issue, but it may help) is that you have too many features in your model. Very roughly, you have 3x27 parameters for your media channels plus at least 67 more parameters for your extra features, and this is already 148 features (there are a few more internally like the seasonality components), for which you only have 126 target data points. You probably should reduce your number of features by a factor of like 5 or 10 in order to get good model convergence, and this might also (hopefully) help surface whatever issue is causing the invalid loc parameter here too.

@steven-struglia
Author

I was subsampling the rows down to that number -- I have around 1100 rows of data but figured that was too much to throw at this. If I include all ~1100 rows of data, should I still try to scale down the number of columns as well? We have a lot of media channels that we optimize our ads spend for, so ideally I'd be able to include all of those channels in this model.

@michevan
Collaborator

it's worth trying with the full dataset but you're probably right that it's too large. I'd try different combinations and see what works; it's usually best to start with just a few channels, get a working model, and then add more iteratively.

@steven-struglia
Author

I tried it with 126 rows of data, 4 media channel columns (ones we spend on often so no zero-cost), and 14 extra features (dummy-encoded [0,1] holiday features) and still got the same error. I'm not certain it's a data-sizing issue at this point :/

@michevan
Collaborator

Yeah it sounds like something else is going wrong! I just pushed an update to the example Colabs that adds some data quality checks, can you run your dataset through those and see if anything comes up?

@steven-struglia
Author

The data quality check returned: MissingDataError: exog contains inf or nans

@steven-struglia
Author

np.isnan(media_scaler.transform(media_data)).any() returns True
np.isnan(media_data).any() returns False

Something is off with the scaler that I'm using....

media_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
extra_features_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
target_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
cost_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean, multiply_by=0.15)

media_data_train = media_scaler.fit_transform(media_data_train)
extra_features_train = extra_features_scaler.fit_transform(extra_features_train)
target_train = target_scaler.fit_transform(target_train)
costs = cost_scaler.fit_transform(costs)

Does this workflow look correct to you?

@steven-struglia
Author

If I remove the scaler from everything (media_data, target, costs, and extra_features) I'm able to run through the rest of the Colab notebook successfully. It's only when incorporating the scaler that I run into issues.

@steven-struglia
Author

Interestingly, when I increase the number of rows or features fed into the model, it hits another error when I don't scale the data prior to fit():

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_18316/1058065513.py in <module>
     10     extra_features=extra_features_train,
     11     media_names = mdsp_cols,
---> 12     seed=105)

/opt/conda/lib/python3.7/site-packages/lightweight_mmm/lightweight_mmm.py in fit(self, media, media_prior, target, extra_features, degrees_seasonality, seasonality_frequency, weekday_seasonality, media_names, number_warmup, number_samples, number_chains, target_accept_prob, init_strategy, custom_priors, seed)
    370         transform_function=self._model_transform_function,
    371         weekday_seasonality=weekday_seasonality,
--> 372         custom_priors=custom_priors)
    373 
    374     self.custom_priors = custom_priors

/opt/conda/lib/python3.7/site-packages/numpyro/infer/mcmc.py in run(self, rng_key, extra_fields, init_params, *args, **kwargs)
    595         else:
    596             if self.chain_method == "sequential":
--> 597                 states, last_state = _laxmap(partial_map_fn, map_args)
    598             elif self.chain_method == "parallel":
    599                 states, last_state = pmap(partial_map_fn)(map_args)

/opt/conda/lib/python3.7/site-packages/numpyro/infer/mcmc.py in _laxmap(f, xs)
    158     for i in range(n):
    159         x = jit(_get_value_from_index)(xs, i)
--> 160         ys.append(f(x))
    161 
    162     return tree_map(lambda *args: jnp.stack(args), *ys)

/opt/conda/lib/python3.7/site-packages/numpyro/infer/mcmc.py in _single_chain_mcmc(self, init, args, kwargs, collect_fields)
    384                 init_params,
    385                 model_args=args,
--> 386                 model_kwargs=kwargs,
    387             )
    388         sample_fn, postprocess_fn = self._get_cached_fns()

/opt/conda/lib/python3.7/site-packages/numpyro/infer/hmc.py in init(self, rng_key, num_warmup, init_params, model_args, model_kwargs)
    705             )
    706         init_params = self._init_state(
--> 707             rng_key_init_model, model_args, model_kwargs, init_params
    708         )
    709         if self._potential_fn and init_params is None:

/opt/conda/lib/python3.7/site-packages/numpyro/infer/hmc.py in _init_state(self, rng_key, model_args, model_kwargs, init_params)
    657                 model_args=model_args,
    658                 model_kwargs=model_kwargs,
--> 659                 forward_mode_differentiation=self._forward_mode_differentiation,
    660             )
    661             if self._init_fn is None:

/opt/conda/lib/python3.7/site-packages/numpyro/infer/util.py in initialize_model(rng_key, model, init_strategy, dynamic_args, model_args, model_kwargs, forward_mode_differentiation, validate_grad)
    697                                 )
    698             raise RuntimeError(
--> 699                 "Cannot find valid initial parameters. Please check your model again."
    700             )
    701     return ModelInfo(

RuntimeError: Cannot find valid initial parameters. Please check your model again.

@michevan
Collaborator

For the scalers, since they're dividing by the mean, it sounds like you may have some channels with zero impressions? That would produce the NaNs when applying the scaler to the media data.
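That failure mode can be checked directly. A minimal diagnostic sketch (not from the thread) that flags the columns a mean-dividing scaler will turn into NaN, since an all-zero column gives 0/0:

```python
import numpy as np

def nan_columns_after_scaling(raw):
    """Return indices of columns whose mean is zero, i.e. the columns
    a mean-dividing scaler will turn into NaN (0/0)."""
    col_means = np.asarray(raw).mean(axis=0)
    return np.where(col_means == 0)[0]

raw = np.array([[0.0, 1.0, 2.0],
                [0.0, 3.0, 4.0]])
bad_cols = nan_columns_after_scaling(raw)   # -> array([0])
with np.errstate(invalid="ignore"):
    scaled = raw / raw.mean(axis=0)         # column 0 becomes NaN
```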

@michevan michevan self-assigned this Dec 22, 2022
@steven-struglia
Author

steven-struglia commented Dec 23, 2022

Ok, that makes sense mathematically. I will triple-check the data I'm feeding into the model to make sure there are no zero channels. A quick side note: I'm using data that represents spend across channels instead of impressions, and my costs vector is simply the total spend in each channel, with shape 1 x dim(channels). I assume this is still OK to use with LightweightMMM? If not, let me know. I'll get back to you on the data quality; I'm hoping that's the answer to this one. Thanks!

@steven-struglia
Author

steven-struglia commented Dec 23, 2022

Ok, I now have this working on data representing spend across channels.

  1. Use sum(channel_spend) as cost for channel in costs
  2. Ensure that there are no zero-sum features in any of the data (media_data AND extra_features)
  3. Don't use an enormous amount of data (shrink rows and columns to a reasonable amount)
    a. If you use an enormous dataset, chances are you are going to run into a RuntimeError: Cannot find valid initial parameters. Please check your model again.
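The checklist above could be sketched as a pre-fit validation. `validate_mmm_inputs` is an illustrative helper, not part of LightweightMMM, and the 0.5 parameters-per-row threshold is an arbitrary rule of thumb:

```python
import numpy as np

def validate_mmm_inputs(media_data, extra_features, costs):
    """Sanity-check MMM inputs per the checklist above."""
    # 1. costs: one strictly positive total-spend value per channel
    assert (np.asarray(costs) > 0).all(), "found a non-positive channel cost"
    # 2. no zero-sum columns in media_data or extra_features
    assert not (media_data.sum(axis=0) == 0).any(), "zero-sum media channel"
    assert not (extra_features.sum(axis=0) == 0).any(), "zero-sum extra feature"
    # 3. rough size check: ~3 parameters per media channel plus one per
    #    extra feature should stay well below the number of rows
    n_params = 3 * media_data.shape[1] + extra_features.shape[1]
    if n_params > 0.5 * media_data.shape[0]:
        print(f"warning: {n_params} parameters for {media_data.shape[0]} rows")
```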

@michevan Somewhat related to this bug, I noticed that switching from a dummy-encoded seasonality feature space (one column per week number and holiday, with 0-1 indicator values) to the Prophet-generated seasonality features (trend, holiday, season) helped get the model to run. Do you have any recommendations on what format the extra features (notably for seasonality and holiday capture) should be in?

@michevan
Collaborator

I think those three points sound correct, yes! And for the extra features, I think it depends on the details of your data, but it's okay to pass either binary or continuous values to the extra features, so if you're finding more success with continuous features that sounds fine to me!

@michevan michevan closed this as completed Jan 3, 2023
@YohanMedalsy

Hi, I am getting the same error, and I've determined that the problem is the scaler, specifically on the media_data. I have no extra_features for now and have succeeded in training a model when scaling the costs and the target, but not the media_data, and I've made sure that none of my media_data columns sum to 0 (although every column does have several 0s). Do you know what else could be the problem?

@YohanMedalsy

I figured it all out. Basically, it came down to zero division in the scalers, but my problem was not columns of zeros: I had 3-D arrays (with sub-geo data), and the division was occurring over another axis in which all the values were 0 (a certain country and media-source combination was always 0), which led to 0/0. The simple solution is to fill the NaNs with 0 before feeding the scaled data to training.
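A sketch of that fix for geo-level (3-D) media data, assuming the scaler divides by the mean over the time axis: the all-zero channel/geo slice yields 0/0 = NaN, which is then filled with 0 before fitting.

```python
import numpy as np

# shape: (time, channels, geos); one channel/geo combination is all zeros
media_data = np.ones((4, 2, 3))
media_data[:, 1, 2] = 0.0

with np.errstate(invalid="ignore"):
    scaled = media_data / media_data.mean(axis=0)  # 0/0 -> NaN in that slice
scaled = np.nan_to_num(scaled, nan=0.0)            # fill NaNs before fitting
```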

@jsatani-tonal

What if you want to keep the zeroes? We launched some channels later than others; for example, we have spend for Facebook and TV in most weeks, but there are a few weeks where we went fully dark. I do not want to remove these from the data source. What other ways are there around this? @google-admin
