SAR: added pySpark notebook #90

Merged: maxkazmsft merged 4 commits into staging from maxkaz/pyspark_notebook on Oct 23, 2018.
Conversation

maxkazmsft (Collaborator, Author):

No description provided.

notebooks/00_quick_start/sar_pyspark_movielens.ipynb (outdated):
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. In order to use SAR, we need to hash users and items and make sure there are no cold users"
Collaborator:

This looks more like a comment than a title.

maxkazmsft (Author):

OK

}
],
"source": [
"print(\"Obtaining all users and items \")\n",
Collaborator:

Instead of printing the text, I would add the text as markdown.
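For instance (illustrative), drop the print call and put the text in a markdown cell above the code:

    ### Obtain all users and items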

maxkazmsft (Author):

OK

}
],
"source": [
"print(\"Model:\\t\" + model.model_str,\n",
Collaborator:

Small detail: the names of the metrics in the other notebook are slightly different:

    Model:        sar_ref
    Top K:
    MAP@k:
    NDCG@k:
    Precision@k:
    Recall@k:
maxkazmsft (Author):

Yeah, I opened an issue about this already.

Collaborator:

@maxkazmsft I was trying to find the issue, but I couldn't. Could you please point me to it?

miguelgfierro (Collaborator):

This is great, added more reviewers.

"cell_type": "markdown",
"metadata": {},
"source": [
"### 0. Set up Spark context"
Collaborator:

Minor point about the style of the markdown text:

  • Use two #s for level-1 headings.
  • No dot after the number.
  • For one level down, connect the numbers with a dot in the heading, e.g. "### 1.2 Model training" (see the sketch below).
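Applied to a notebook like this one, that scheme would look as follows (the section titles here are illustrative, not taken from the notebook):

    ## 1 Prepare data
    ### 1.1 Load MovieLens
    ### 1.2 Model training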

maxkazmsft (Author):

I agree - I copied Nikhil's notebook, and adding subsections makes the notebook more readable. We should really have a way to keep all notebooks in sync - please see my earlier comment.

}
],
"source": [
"schema = StructType((StructField(\"UserId\", StringType()),\n",
Collaborator:

Minor points on coding style (see the sketch below).
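For illustration only, one conventional way to lay out such a schema; every column except UserId is an assumption here, following the usual MovieLens fields:

    from pyspark.sql.types import (FloatType, StringType, StructField,
                                   StructType)

    # One StructField per line, passed as a list rather than a tuple.
    # Column names other than UserId are assumed for the example.
    schema = StructType([
        StructField("UserId", StringType()),
        StructField("MovieId", StringType()),
        StructField("Rating", FloatType()),
        StructField("Timestamp", StringType()),
    ])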

maxkazmsft (Author):

Fixed.

}
],
"source": [
"train, test = spark_random_split(data)\n",
Collaborator:

Would it be better to make the split ratio explicit?
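Something like the following, as a sketch: this assumes spark_random_split accepts ratio and seed keyword arguments, and 0.75 is an arbitrary stand-in:

    # Make the train/test proportions visible at the call site.
    train, test = spark_random_split(data, ratio=0.75, seed=42)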

],
"source": [
"model.fit(train_indexed)\n",
"top_k = model.recommend_k_items(test_indexed)"
Collaborator:

Better to have k explicit here?
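For instance, as a sketch (assuming recommend_k_items exposes a top_k keyword; 10 is an arbitrary choice):

    # Spell out how many items are recommended per user.
    top_k = model.recommend_k_items(test_indexed, top_k=10)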

miguelgfierro added this to the Initial MVP milestone on Oct 22, 2018.
miguelgfierro (Collaborator):

The Spark tests are failing due to issue #75.

yueguoguo (Collaborator) commented Oct 23, 2018:

The problem is that the notebook metadata, where the kernel spec is given, defines the kernel name as reco instead of the one used on our CICD machine. See lines 557 and 559 of sar_pyspark_movielens.ipynb, where the metadata specifies the kernel info. This metadata is created when the notebook is created, and it matches the conda env kernel on the user's machine - in our SETUP instructions, the last step is to install the created conda env kernel into ipython.
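For reference, a sketch of what that kernelspec metadata typically looks like inside the .ipynb JSON (the display_name is a guess; the name field is what has to match an installed kernel):

    "metadata": {
      "kernelspec": {
        "display_name": "Python (reco)",
        "language": "python",
        "name": "reco"
      }
    }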

To resolve the issue, we can run nbconvert and execute the notebook with the kernel name specified via the ExecutePreprocessor traitlets. The kernel name in the traitlet should match the one on the CICD machine; it overrides the one specified in the notebook metadata. E.g., to run a notebook with a specific kernel:

    jupyter nbconvert --ExecutePreprocessor.kernel_name="recommender" --execute sar_python_cpu_movielens.ipynb

maxkazmsft merged commit 862684f into staging on Oct 23, 2018, and deleted the maxkaz/pyspark_notebook branch.
WessZumino pushed a commit referencing this pull request on Nov 28, 2018.