SAR: added pySpark notebook #90
Conversation
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. In order to use SAR, we need to hash users and items and make sure there are no cold users"
This looks more like a comment than a title
OK
}
],
"source": [
"print(\"Obtaining all users and items \")\n",
Instead of printing the text, I would add the text as markdown
OK
}
],
"source": [
"print(\"Model:\\t\" + model.model_str,\n",
Small detail: the names of the metrics in the other notebook are slightly different:
Model: sar_ref
Top K:
MAP@k:
NDCG@k:
Precision@k:
Recall@k:
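To make the naming concrete, here is a minimal plain-Python sketch of printing the evaluation summary with the metric names listed above. The metric values are placeholders, not real results, and `format_results` is a hypothetical helper, not a function from the repo.

```python
# Placeholder results using the metric names from the other notebook.
# The numeric values are invented for illustration only.
results = {
    "Model": "sar_ref",
    "Top K": 10,
    "MAP@k": 0.105,
    "NDCG@k": 0.373,
    "Precision@k": 0.326,
    "Recall@k": 0.176,
}

def format_results(results):
    """Render each metric as 'Name:<tab>value', one per line."""
    return "\n".join(f"{name}:\t{value}" for name, value in results.items())

print(format_results(results))
```

Keeping the names in one dict like this makes it easy to reuse the exact same labels across notebooks.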
Yeah, I opened an issue about this already.
@maxkazmsft I was trying to find the issue, but I couldn't. Could you please point me to it?
This is great, added more reviewers
"cell_type": "markdown",
"metadata": {},
"source": [
"### 0. Set up Spark context"
Minor point about the style of markdown text:
- Use two #s for level-1 headings.
- No "dot" after number.
- For one level down, add "dot" to connect numbers in headings. E.g., "### 1.2 Model training".
I agree. I copied Nikhil's notebook, and splitting it into subsections makes the notebook more readable. We should really have a way to keep all notebooks in sync - please see my earlier comment.
}
],
"source": [
"schema = StructType((StructField(\"UserId\", StringType()),\n",
Minor points - coding style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
}
],
"source": [
"train, test = spark_random_split(data)\n",
Would it be better to make the split ratio explicit?
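To illustrate the suggestion, here is a plain-Python stand-in for a random split with the ratio passed explicitly. This is not the repo's `spark_random_split` (whose signature isn't shown here); it is a hypothetical helper sketching the same idea.

```python
import random

def random_split(rows, ratio=0.75, seed=42):
    """Shuffle rows and split into (train, test) with an explicit ratio.

    Hypothetical stand-in for spark_random_split; the real helper
    operates on Spark DataFrames and may take different arguments.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

# Ratio stated at the call site, so readers don't have to look up a default.
train, test = random_split(range(100), ratio=0.75)
```

Stating `ratio=0.75` at the call site documents the experimental setup without the reader having to chase the helper's default.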
],
"source": [
"model.fit(train_indexed)\n",
"top_k = model.recommend_k_items(test_indexed)"
Better to have k explicit here?
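As a sketch of what "k explicit" looks like, here is a plain-Python top-k selection with `k` named at the call site. This is a hypothetical illustration, not the model's actual `recommend_k_items`, which works on Spark DataFrames.

```python
import heapq

def recommend_k_items(scores, k=10):
    """Return the k highest-scoring (item, score) pairs.

    Hypothetical stand-in for the model's recommend_k_items; shown
    only to illustrate passing k explicitly rather than relying on
    a default.
    """
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

scores = {"item_a": 0.9, "item_b": 0.4, "item_c": 0.7}
top_2 = recommend_k_items(scores, k=2)
```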
The Spark tests are failing due to issue #75.
The problem is that the notebook metadata, where the kernel spec is given, defines the kernel name. To resolve the issue we can run:
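A hedged sketch of the likely fix, since the exact kernel name from the notebook metadata is not shown above: list the kernels Jupyter knows about, then register the current environment under the expected name with `ipykernel`.

```shell
# See which kernel names are currently registered.
jupyter kernelspec list

# Register the current Python environment under the name the notebook
# metadata expects (placeholder below - the real name is not shown here).
python -m ipykernel install --user --name <kernel-name-from-notebook-metadata>
```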