Kaggle blog post #180

pdmack · 2018-08-26T22:08:23Z

/assign @jlewi
/assign @sarahmaddox

/cc @abhi-g
/cc @ewilderj
/cc @aronchick

Related to: #79
Please suggest specific and literal changes via in-line review comments. :-)

sarahmaddox

Nice! Exciting too. I have a few suggestions to make the content flow more easily.

sarahmaddox · 2018-08-26T22:39:43Z

content/blog/kaggle_on_kubeflow.md

+++
+
+## Kaggle
+[Kaggle](http://kaggle.com/) is home to the world's largest community of data scientists and AI/ML researchers. It's a diverse community ranging from newcomers to accredited research scientists, where participants collaborate and compete on-line to refine algorithms and techniques that are judged to produce the "best" model. The competitions can be organized by anyone but many companies and institutions award significant cash prizes to the winners. Beyond the academic benefit of the competitions, it also provides a means to identify top candidates for data science careers with these corporations as they increase their investments in AI and ML.


Recommendation: Replace "on-line" with "online" here and in 2 other places below. Current best practice is to avoid hyphens where possible, and to use one word for "online".

sarahmaddox · 2018-08-26T22:45:55Z

content/blog/kaggle_on_kubeflow.md

+++
+
+## Kaggle
+[Kaggle](http://kaggle.com/) is home to the world's largest community of data scientists and AI/ML researchers. It's a diverse community ranging from newcomers to accredited research scientists, where participants collaborate and compete on-line to refine algorithms and techniques that are judged to produce the "best" model. The competitions can be organized by anyone but many companies and institutions award significant cash prizes to the winners. Beyond the academic benefit of the competitions, it also provides a means to identify top candidates for data science careers with these corporations as they increase their investments in AI and ML.


The second sentence is quite long, and the meaning of the last bit is ambiguous. Recommendation: Replace this:
"where participants collaborate and compete on-line to refine algorithms and techniques that are judged to produce the “best” model."

with this:
"where participants collaborate online to refine algorithms and techniques. In organized competitions, judges decide on the entries that produce the best model."

sarahmaddox · 2018-08-26T22:53:06Z

content/blog/kaggle_on_kubeflow.md

+
+For new data scientists, there are competitions that are interesting thought experiments. The most notable of these is predicting which persons would survive the Titanic disaster, which in fact we will use for our example below.  
+
+The Kaggle development platform itself is organized into competitions, scripts called "kernels", and datasets which are used to derive the models submitted to the competitions. There are also short form on-line classes for introductions to Python and machine learning.


Where it says 'scripts called "kernels"', I'd suggest a little more info. Rather than being just a script, a kernel is a combination of environment, input, code, and output.
http://blog.kaggle.com/2016/07/08/kaggle-kernel-a-new-name-for-scripts/

sarahmaddox · 2018-08-26T22:59:30Z

content/blog/kaggle_on_kubeflow.md

+
+![Run Notebook](../nb-run.svg)
+
+This [particular public notebook](https://www.kaggle.com/arthurtok/introduction-to-ensembling-stacking-in-python) was chosen based on the high number of votes it has as a kernel for the Titanic competition, the richness of some of the visualizations, and also because it uses XGBoost, a library that Kubeflow does not currently include in its supported TensorFlow notebooks.


Recommendation: Change this:

"This particular public notebook was chosen based on..."

to this:

"I chose this particular public notebook based on..."

Reason: when I first read it, I assumed the "was chosen" meant that the notebook had been awarded a prize in some Kaggle competition, and I expected the next sentence to tell me more about that. Changing the sentence to indicate that you chose it makes things much clearer.

sarahmaddox · 2018-08-26T23:03:33Z

content/blog/kaggle_on_kubeflow.md

+- the Kaggle image includes TensorFlow 1.9 or greater built with [AVX2 support](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2), so the image may not run on some older CPU
+- unlike the Kubeflow curated notebooks, the default notebook user (`jovyan`) does not have the permissions to install new packages to global locations (but more on that below)
+
+### What's this all about?


I'd move this section ("What's this all about?") up above the "Images" section, and change it to a level 2 heading. It's super important that people know the goal of this blog post, and this section tells it nicely. When I first read the "Images" section, I wondered why you were telling me all that. Then I read the "What's this all about?" section, and things became much clearer.

pdmack · 2018-08-27T19:06:58Z

@sarahmaddox thanks for the feedback. PTAL latest

jlewi · 2018-08-27T21:19:22Z

Woo Hoo!

abhi-g · 2018-08-27T21:33:02Z

looking great!

sarahmaddox · 2018-08-27T21:39:08Z

Nice!
/lgtm

sarahmaddox · 2018-08-27T23:12:21Z

/lgtm

jlewi · 2018-08-27T23:22:49Z

This looks awesome but let me circulate it with the Kaggle folks.
/hold

pdmack · 2018-08-28T20:50:11Z

Kaggle/kaggle-api#84 was just closed. Backend issue apparently.

Thus, I need to rewrite a portion of this.

jlewi · 2018-08-28T22:03:52Z

Ack.

pdmack · 2018-08-29T13:07:25Z

Ready for (final?) review.

abhi-g · 2018-08-29T20:25:32Z

Any outstanding reasons for the hold on this PR?

sarahmaddox · 2018-08-29T21:35:23Z

/lgtm

jlewi · 2018-08-29T23:29:59Z

/lgtm
/approve

The hold was to give the Kaggle folks a chance to review it.

k8s-ci-robot · 2018-08-29T23:30:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jlewi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jlewi · 2018-08-30T22:29:55Z

Got the signoff from Kaggle.

/hold cancel

jlewi

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @sarahmaddox, @pdmack, @abhi-g, @aronchick, and @ewilderj)

content/blog/kaggle_on_kubeflow.md, line 39 at r2 (raw file):

- it's a very large notebook, over 21 GB in size. Docker pulls and notebook launches can take a lengthy period of time
- the versions of TensorFlow, PyTorch, XGBoost, and the other libraries included may change at any time

nit: change over time

jlewi · 2018-08-30T22:38:11Z

Merging manually because Reviewable is blocking the merge; it looks like Reviewable wasn't configured to skip adding its status to the PR.

pdmack · 2018-08-30T22:43:55Z

I'll follow up on the nit soon

Kaggle blog post

Merge pull request #180 from pdmack/kaggle-blog

k8s-ci-robot assigned jlewi and sarahmaddox Aug 26, 2018

k8s-ci-robot requested review from abhi-g, aronchick and ewilderj August 26, 2018 22:08

k8s-ci-robot added the size/XXL label Aug 26, 2018

sarahmaddox suggested changes Aug 26, 2018

pdmack force-pushed the kaggle-blog branch from dc2a843 to 47ba810 Compare August 27, 2018 19:06

k8s-ci-robot added the lgtm label Aug 27, 2018

pdmack force-pushed the kaggle-blog branch from 47ba810 to 9c3c0c4 Compare August 27, 2018 22:16

k8s-ci-robot removed the lgtm label Aug 27, 2018

k8s-ci-robot added the lgtm label Aug 27, 2018

k8s-ci-robot added the do-not-merge/hold label Aug 27, 2018

rosbo added a commit to Kaggle/docker-python that referenced this pull request Aug 27, 2018

Add Kaggle CLI

4b852cb

This enables use cases such as this one: kubeflow/website#180

rosbo mentioned this pull request Aug 27, 2018

Add Kaggle CLI Kaggle/docker-python#283

Closed

pdmack force-pushed the kaggle-blog branch from 9c3c0c4 to 1ac8d80 Compare August 28, 2018 00:52

k8s-ci-robot removed the lgtm label Aug 28, 2018

Kaggle blog post

c3b9c1b

pdmack force-pushed the kaggle-blog branch from 1ac8d80 to c3b9c1b Compare August 29, 2018 12:40

k8s-ci-robot added the lgtm label Aug 29, 2018

k8s-ci-robot added the approved label Aug 29, 2018

k8s-ci-robot removed the do-not-merge/hold label Aug 30, 2018

jlewi approved these changes Aug 30, 2018

jlewi merged commit bdb24ed into kubeflow:master Aug 30, 2018

abhi-g pushed a commit to abhi-g/website that referenced this pull request Aug 30, 2018

Merge pull request kubeflow#180 from pdmack/kaggle-blog

2d4119e

Kaggle blog post

abhi-g added a commit to abhi-g/website that referenced this pull request Aug 30, 2018

Merge pull request kubeflow#180 from pdmack/kaggle-blog

321bc74

Kaggle blog post

abhi-g added a commit that referenced this pull request Aug 30, 2018

Merge pull request #197 from abhi-g/kaggle_blog

5740653

Merge pull request #180 from pdmack/kaggle-blog

pdmack deleted the kaggle-blog branch April 4, 2019 16:39
