Add matrix factorization example back to pymc3 #3709
Check out this pull request on ReviewNB. You'll be able to see Jupyter notebook diffs and discuss changes. Powered by ReviewNB.
This is great, and is already pretty polished, actually. Runs in about 30 min on my MBP.
Codecov Report
@@           Coverage Diff           @@
##           master    #3709   +/-   ##
=======================================
  Coverage   89.94%   89.94%
=======================================
  Files         134      134
  Lines       20430    20430
=======================================
  Hits        18375    18375
  Misses       2055     2055
could be just a single bar plot for this, right? the density plot is a little weird
Reply via ReviewNB
I'd use
print(f"Users: {num_users}\nMovies: {num_items}\nSparsity: {sparsity}")
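As a minimal sketch of where those three values might come from, the snippet below computes them from a small dense rating matrix. The toy `ratings` array and the definition of sparsity as the fraction of missing user-movie pairs are assumptions for illustration; the notebook may compute them differently.

```python
import numpy as np

# Hypothetical 3-user x 4-movie rating matrix; NaN marks a missing rating.
ratings = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, np.nan, 4.0, 1.0],
    [2.0, np.nan, np.nan, np.nan],
])

num_users, num_items = ratings.shape

# Assumed definition: fraction of user-movie pairs that have no rating.
sparsity = np.isnan(ratings).mean()

# One f-string with embedded newlines, as suggested in the comment above.
summary = f"Users: {num_users}\nMovies: {num_items}\nSparsity: {sparsity:.2%}"
print(summary)
```

Formatting the sparsity with `:.2%` is optional; it just keeps the printed summary short.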
Reply via ReviewNB
I would replace this with a function:
import logging

import numpy as np
import pymc3 as pm


def make_pmf_model(train, dim, alpha, std):
    bounds = (1, 5)
    data = train.copy()
    n, m = data.shape

    # Perform mean value imputation.
    nan_mask = np.isnan(data)
    data[nan_mask] = data[~nan_mask].mean()

    # Low precision reflects uncertainty; prevents overfitting.
    # Set to the mean variance across users and items.
    alpha_u = 1 / data.var(axis=1).mean()
    alpha_v = 1 / data.var(axis=0).mean()

    # Specify the model.
    logging.info('building the PMF model')
    with pm.Model() as pmf:
        U = pm.MvNormal(
            'U', mu=0, tau=alpha_u * np.eye(dim),
            shape=(n, dim), testval=np.random.randn(n, dim) * std)
        V = pm.MvNormal(
            'V', mu=0, tau=alpha_v * np.eye(dim),
            shape=(m, dim), testval=np.random.randn(m, dim) * std)
        # Only the observed (non-NaN) ratings enter the likelihood.
        R = pm.Normal(
            'R', mu=(U @ V.T)[~nan_mask], tau=alpha, observed=data[~nan_mask])
    return pmf
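The preprocessing steps in this function can be checked in isolation, without building the PyMC3 model. The following is a minimal NumPy-only sketch on a hypothetical 3x3 `train` matrix (the toy values are an assumption, not data from the notebook):

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are movies, NaN = unrated.
train = np.array([
    [5.0, np.nan, 3.0],
    [4.0, 2.0, np.nan],
    [np.nan, 1.0, 3.0],
])

data = train.copy()  # Leave the caller's array untouched.
nan_mask = np.isnan(data)

# Mean-value imputation: fill missing cells with the mean of observed ratings.
data[nan_mask] = data[~nan_mask].mean()

# Prior precisions: inverse of the mean per-user and per-movie variance.
alpha_u = 1 / data.var(axis=1).mean()
alpha_v = 1 / data.var(axis=0).mean()

print(data)
print(alpha_u, alpha_v)
```

Note that the imputed values only influence `alpha_u` and `alpha_v`; because the likelihood is indexed with `~nan_mask`, the model is still fit to the observed ratings alone.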
Reply via ReviewNB
I'd update this to use just pm.sample(draws=500, tune=500) (or similar!). This should automatically use 1 chain per physical core and run diagnostics.
Reply via ReviewNB
This self.std is defined as 1 / alpha in the constructor, and is maybe only used here.
Reply via ReviewNB
This is great, @zaxtax. Thanks for getting this example back up and running! I can regen the docs so that it is up on the examples site, too.
Reintroduces the probabilistic matrix factorization example, originally by @macks22, using the MovieLens dataset instead of Jester. This could still use some polish, but the notebook works, albeit slowly.
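For reference, the model specified by the `make_pmf_model` function suggested in the review can be written out as below. This is a reading of that code rather than a statement from the notebook itself; $D$ stands for `dim`, and $\Omega$ (the set of observed user-movie pairs, i.e. the entries selected by `~nan_mask`) is notation introduced here.

```latex
\begin{aligned}
U_i &\sim \mathcal{N}\!\left(0,\ \alpha_U^{-1} I_D\right), & i &= 1, \dots, n, \\
V_j &\sim \mathcal{N}\!\left(0,\ \alpha_V^{-1} I_D\right), & j &= 1, \dots, m, \\
R_{ij} \mid U, V &\sim \mathcal{N}\!\left(U_i^\top V_j,\ \alpha^{-1}\right), & (i, j) &\in \Omega.
\end{aligned}
```

Here $\alpha_U$, $\alpha_V$, and $\alpha$ are precisions, matching the `tau` arguments in the code, so each variance is the inverse of the corresponding precision.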