
Add matrix factorization example back to pymc3 #3709

Merged: 2 commits merged into pymc-devs:master on Dec 6, 2019

@zaxtax (Contributor) commented Dec 5, 2019

Reintroducing the probabilistic matrix factorization (PMF) example originally written by @macks22, now using the MovieLens dataset instead of Jester. This could still use some polish, but the notebook works, albeit slowly.

@fonnesbeck (Member):

This is great, and is already pretty polished, actually. Runs in about 30 min on my MBP.

@codecov bot commented Dec 5, 2019

Codecov Report

Merging #3709 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3709   +/-   ##
=======================================
  Coverage   89.94%   89.94%           
=======================================
  Files         134      134           
  Lines       20430    20430           
=======================================
  Hits        18375    18375           
  Misses       2055     2055

@@ -0,0 +1,1489 @@

Member:

If you run this again, can you bump to PyMC3 v3.8?


Member:

Could this just be a single bar plot? The density plot is a little weird.
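
A minimal sketch of that suggestion, assuming the ratings are available as a flat array of integer scores from 1 to 5. The names `ratings`, `rating_counts`, and `plot_rating_counts` are hypothetical, not taken from the notebook. It counts each rating level and draws one bar per level.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def rating_counts(ratings, levels=(1, 2, 3, 4, 5)):
    """Return how many times each rating level occurs."""
    ratings = np.asarray(ratings)
    return np.array([(ratings == level).sum() for level in levels])


def plot_rating_counts(ratings, levels=(1, 2, 3, 4, 5)):
    """Draw a single bar chart of rating frequencies."""
    counts = rating_counts(ratings, levels)
    fig, ax = plt.subplots()
    ax.bar(levels, counts)
    ax.set_xlabel("Rating")
    ax.set_ylabel("Count")
    ax.set_title("Distribution of ratings")
    return fig, ax


ratings = np.array([5, 3, 4, 4, 1, 5, 2, 4])
print(rating_counts(ratings))  # → [1 1 1 3 2]
fig, ax = plot_rating_counts(ratings)
```

Because ratings take only a handful of discrete values, a bar per level shows the distribution exactly, whereas a kernel density estimate smooths across values that cannot occur.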


Member:

I'd use

print(f"Users: {num_users}\nMovies: {num_items}\nSparsity: {sparsity}")
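
The variables in that print statement come from the notebook. As a hedged sketch, assuming `train` is a dense users × movies NumPy array with `np.nan` marking missing ratings, they could be computed as below. Sparsity is taken here to mean the fraction of unobserved cells; that definition, and the helper name `rating_matrix_stats`, are assumptions.

```python
import numpy as np


def rating_matrix_stats(train):
    """Return (num_users, num_items, sparsity) for a ratings matrix with NaN gaps."""
    num_users, num_items = train.shape
    # Sparsity here = fraction of user/movie cells that have no rating.
    sparsity = np.isnan(train).mean()
    return num_users, num_items, sparsity


train = np.array([
    [5.0, np.nan, 3.0],
    [np.nan, np.nan, 4.0],
])
num_users, num_items, sparsity = rating_matrix_stats(train)
print(f"Users: {num_users}\nMovies: {num_items}\nSparsity: {sparsity}")
```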


Member:

I would replace this with a function:

import logging

import numpy as np
import pymc3 as pm


def make_pmf_model(train, dim, alpha, std):
    bounds = (1, 5)  # rating scale; not used when building the model itself
    data = train.copy()
    n, m = data.shape

    # Perform mean value imputation
    nan_mask = np.isnan(data)
    data[nan_mask] = data[~nan_mask].mean()

    # Low precision reflects uncertainty; prevents overfitting.
    # Set to the mean variance across users and items.
    alpha_u = 1 / data.var(axis=1).mean()
    alpha_v = 1 / data.var(axis=0).mean()

    # Specify the model; std only scales the random starting values (testval).
    logging.info('building the PMF model')
    with pm.Model() as pmf:
        U = pm.MvNormal(
            'U', mu=0, tau=alpha_u * np.eye(dim),
            shape=(n, dim), testval=np.random.randn(n, dim) * std)
        V = pm.MvNormal(
            'V', mu=0, tau=alpha_v * np.eye(dim),
            shape=(m, dim), testval=np.random.randn(m, dim) * std)
        # Likelihood over the observed (non-missing) ratings only.
        R = pm.Normal(
            'R', mu=(U @ V.T)[~nan_mask], tau=alpha, observed=data[~nan_mask])

    return pmf
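
For reference, a sketch of the generative model this function specifies, writing $N$ for `n` users, $M$ for `m` movies, $D$ for `dim`, and $\tilde R$ for the mean-imputed `data`. Each precision `tau` becomes an inverse variance, and `std` does not appear because it only scales the starting values.

$$
\begin{aligned}
U_i &\sim \mathcal{N}(0,\ \alpha_U^{-1} I_D), \quad i = 1, \dots, N \\
V_j &\sim \mathcal{N}(0,\ \alpha_V^{-1} I_D), \quad j = 1, \dots, M \\
R_{ij} &\sim \mathcal{N}(U_i^\top V_j,\ \alpha^{-1}) \quad \text{for each observed } (i, j) \\
\alpha_U &= \Big( \frac{1}{N} \sum_{i} \operatorname{Var}_j \tilde R_{ij} \Big)^{-1}, \qquad
\alpha_V = \Big( \frac{1}{M} \sum_{j} \operatorname{Var}_i \tilde R_{ij} \Big)^{-1}
\end{aligned}
$$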


Member:

I'd replace self in these methods with a pmf model instance.


Member:

I'd update this to just use pm.sample(draws=500, tune=500) (or similar!).

This should automatically run one chain per core and run convergence diagnostics.
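
As a rough illustration of what "automatically" means here, the sketch below mirrors the defaults documented for `pm.sample` in PyMC3 3.8, as an assumption: if `cores` is not given, it is the CPU count (capped at 4), and if `chains` is not given, it is the larger of `cores` and 2. This is plain Python, not PyMC3's own code, and `default_cores_and_chains` is a hypothetical name.

```python
import multiprocessing


def default_cores_and_chains(cores=None, chains=None):
    """Mirror the documented pm.sample defaults (assumed: PyMC3 3.8)."""
    if cores is None:
        try:
            cpus = multiprocessing.cpu_count()
        except NotImplementedError:
            cpus = 1
        # Parallel chains default to the CPU count, capped at 4.
        cores = min(4, cpus)
    if chains is None:
        # At least 2 chains, so between-chain diagnostics have something to compare.
        chains = max(2, cores)
    return cores, chains


print(default_cores_and_chains())                   # depends on the machine
print(default_cores_and_chains(cores=1))            # → (1, 2)
print(default_cores_and_chains(cores=8, chains=3))  # → (8, 3)
```

Note that `multiprocessing.cpu_count()` reports logical CPUs, so on machines with hyperthreading "one chain per core" means one per logical CPU, up to the cap.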


Member:

This self.std is defined as 1 / alpha in the constructor, and may only be used here.
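
In the make_pmf_model function suggested earlier in this review, std only scales the random starting values (testval) for U and V; it does not enter the prior or the likelihood, so it can be chosen independently of alpha. A minimal NumPy sketch of those starting values follows. The helper name `initial_factors` and its `seed` argument are hypothetical additions for reproducibility.

```python
import numpy as np


def initial_factors(n, m, dim, std, seed=None):
    """Random starting values for U (n x dim) and V (m x dim), scaled by std."""
    rng = np.random.RandomState(seed)
    U0 = rng.randn(n, dim) * std
    V0 = rng.randn(m, dim) * std
    return U0, V0


# Decoupled from alpha: alpha=2 would give 1 / alpha = 0.5, but here std=0.05.
U0, V0 = initial_factors(n=4, m=6, dim=10, std=0.05, seed=0)
print(U0.shape, V0.shape)  # → (4, 10) (6, 10)
```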


Member:

pmf = make_pmf_model(train, dim=10, alpha=2, std=0.05)
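
Here `train` is not defined in this thread. The model-building function treats it as a dense users × movies float array with `np.nan` for unrated entries. Assuming the ratings arrive as (user, movie, rating) triples with 0-based integer ids, a hedged sketch of building such a matrix follows. `to_dense_ratings` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def to_dense_ratings(users, items, ratings, num_users, num_items):
    """Scatter (user, item, rating) triples into a NaN-filled matrix."""
    dense = np.full((num_users, num_items), np.nan)
    dense[np.asarray(users), np.asarray(items)] = ratings
    return dense


train = to_dense_ratings(
    users=[0, 0, 1, 2], items=[1, 2, 0, 2], ratings=[4.0, 3.0, 5.0, 1.0],
    num_users=3, num_items=3,
)
print(train)
```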


Member:

with pmf:
    map_estimate = pm.find_MAP()
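
`pm.find_MAP()` returns a dict that maps variable names to point estimates, so the MAP factors can be read as `map_estimate['U']` and `map_estimate['V']`. The sketch below turns them into predicted ratings, written as a plain function of the estimate rather than a method on self. The name `predict_ratings` and the clipping to the 1 to 5 bounds are assumptions for illustration, and a small hand-written dict stands in for a real estimate.

```python
import numpy as np


def predict_ratings(point, bounds=(1, 5)):
    """Predicted rating matrix U @ V.T from a point estimate, clipped to bounds."""
    U, V = point["U"], point["V"]
    R = U @ V.T
    low, high = bounds
    return np.clip(R, low, high)


# Stand-in for a real map_estimate: 2 users, 3 movies, dim=2.
fake_map = {
    "U": np.array([[1.0, 2.0], [0.5, 0.0]]),
    "V": np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]),
}
print(predict_ratings(fake_map))
```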


@zaxtax changed the title from "WIP: Add initial draft of matrix factorization example" to "Add matrix factorization example back to pymc3" on Dec 6, 2019
@ColCarroll merged commit 9e5177c into pymc-devs:master on Dec 6, 2019
@ColCarroll (Member):

This is great @zaxtax -- thanks for getting this example back up and running! I can regen the docs so that it is up on the examples site, too.

@zaxtax deleted the pmf_notebook branch on December 6, 2019 at 17:46