Improve copula parameters sampling #58

Closed
ManuelAlvarezC opened this issue Sep 24, 2018 · 1 comment

ManuelAlvarezC commented Sep 24, 2018

While modeling the database, sdv.Modeler creates extensions for each row of the parent tables. These extensions contain the parameters needed to model the child tables.

At sampling time, these extensions are sampled too. The parameters are then extracted from them and used to create the models that sample the child rows.

When creating new models from the sampled parameters, sometimes the models are created with inconsistent values. So far the following have been found:

  1. The sampled covariance matrix may not be positive-semidefinite, which the copulas.multivariate.GaussianMultivariate copula requires. When it is not, this warning is raised:

    sdv_mit/lib/python3.6/site-packages/copulas/multivariate/gaussian.py:199: RuntimeWarning: covariance is not positive-semidefinite.
       samples = np.random.multivariate_normal(means, clean_cov, size=size)
    
  2. If the sampled value for the std of the copulas.univariate.GaussianUnivariate distribution happens to be negative or zero, the generated samples will be np.nan.
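
Both conditions can be checked on the sampled parameters before a model is rebuilt from them. The following is only a minimal sketch using plain NumPy: the helper names `is_positive_semidefinite` and `is_valid_std` and the tolerance value are hypothetical and are not part of sdv or copulas.

```python
import numpy as np


def is_positive_semidefinite(cov, tol=1e-8):
    """Return True if `cov` is symmetric and all eigenvalues are >= -tol."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(cov).min() >= -tol)


def is_valid_std(std):
    """A standard deviation must be finite and strictly positive."""
    return bool(np.isfinite(std) and std > 0)


good_cov = np.array([[1.0, 0.5], [0.5, 1.0]])  # eigenvalues 1.5 and 0.5
bad_cov = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1

print(is_positive_semidefinite(good_cov), is_positive_semidefinite(bad_cov))
print(is_valid_std(0.3), is_valid_std(0.0), is_valid_std(-1.2))
```

A check like this only detects the inconsistent values; it does not repair them.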

@ManuelAlvarezC ManuelAlvarezC added internal The issue doesn't change the API or functionality question General question about the software labels Sep 24, 2018
@ManuelAlvarezC ManuelAlvarezC added this to the 0.1.2 milestone Nov 19, 2018
@ManuelAlvarezC
Contributor Author

To solve this issue I propose the following:

  • On point 1, instead of modelling and sampling the whole covariance matrix, do it with only the lower or upper triangular half. When creating the model from the sampled parameters, complete the other half by symmetry across the diagonal, that is:

    matrix[i][j] = matrix[j][i]
  • On point 2, I will transform the standard deviation using the positive transformer mentioned here before modelling, and reverse-transform it when recreating the model.
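
The two proposals could look roughly like the sketch below. It rests on several assumptions: the function names are hypothetical, the lower triangle is taken to include the diagonal, and a log/exp pair stands in for the positive transformer, whose actual implementation is not shown in this issue.

```python
import numpy as np


def flatten_lower_triangle(cov):
    """Keep only the lower triangle (diagonal included) as a flat vector."""
    cov = np.asarray(cov, dtype=float)
    rows, cols = np.tril_indices(cov.shape[0])
    return cov[rows, cols]


def rebuild_symmetric(values, size):
    """Rebuild the full matrix from the flattened lower triangle."""
    rows, cols = np.tril_indices(size)
    matrix = np.zeros((size, size))
    matrix[rows, cols] = values
    matrix[cols, rows] = values  # matrix[i][j] = matrix[j][i]
    return matrix


def std_to_unbounded(std):
    """Assumed stand-in transform: map a positive std to the real line."""
    return np.log(std)


def std_from_unbounded(value):
    """Reverse transform: any real value maps back to a positive std."""
    return np.exp(value)


cov = np.array([[2.0, 0.3, 0.1],
                [0.3, 1.0, 0.4],
                [0.1, 0.4, 3.0]])
flat = flatten_lower_triangle(cov)       # 6 values instead of 9
restored = rebuild_symmetric(flat, 3)
print(np.allclose(restored, cov))
print(std_from_unbounded(-5.0) > 0)      # a negative sampled value still yields std > 0
```

With this layout only n(n+1)/2 values are modelled per matrix, and the rebuilt matrix is symmetric by construction. Symmetry alone does not make a matrix positive-semidefinite, so an eigenvalue check on the rebuilt matrix may still be worthwhile.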
