Kaggle blog post #180
Nice! Exciting too. I have a few suggestions to make the content flow more easily.
content/blog/kaggle_on_kubeflow.md
Outdated
+++

## Kaggle
[Kaggle](http://kaggle.com/) is home to the world's largest community of data scientists and AI/ML researchers. It's a diverse community ranging from newcomers to accredited research scientists, where participants collaborate and compete on-line to refine algorithms and techniques that are judged to produce the "best" model. The competitions can be organized by anyone but many companies and institutions award significant cash prizes to the winners. Beyond the academic benefit of the competitions, it also provides a means to identify top candidates for data science careers with these corporations as they increase their investments in AI and ML.
Recommendation: Replace "on-line" with "online" here and in 2 other places below. Current best practice is to avoid hyphens where possible, and to use one word for "online".
content/blog/kaggle_on_kubeflow.md
Outdated
+++

## Kaggle
[Kaggle](http://kaggle.com/) is home to the world's largest community of data scientists and AI/ML researchers. It's a diverse community ranging from newcomers to accredited research scientists, where participants collaborate and compete on-line to refine algorithms and techniques that are judged to produce the "best" model. The competitions can be organized by anyone but many companies and institutions award significant cash prizes to the winners. Beyond the academic benefit of the competitions, it also provides a means to identify top candidates for data science careers with these corporations as they increase their investments in AI and ML.
The second sentence is quite long, and the meaning of the last bit is ambiguous. Recommendation: Replace this:
"where participants collaborate and compete on-line to refine algorithms and techniques that are judged to produce the “best” model."
with this:
"where participants collaborate online to refine algorithms and techniques. In organized competitions, judges decide on the entries that produce the best model."
content/blog/kaggle_on_kubeflow.md
Outdated
For new data scientists, there are competitions that are interesting thought experiments. The most notable of these is predicting which persons would survive the Titanic disaster, which in fact we will use for our example below.

The Kaggle development platform itself is organized into competitions, scripts called "kernels", and datasets which are used to derive the models submitted to the competitions. There are also short form on-line classes for introductions to Python and machine learning.
Where it says 'scripts called "kernels"', I'd suggest a little more info. Rather than being just a script, a kernel is a combination of environment, input, code, and output.
http://blog.kaggle.com/2016/07/08/kaggle-kernel-a-new-name-for-scripts/
content/blog/kaggle_on_kubeflow.md
Outdated
![Run Notebook](../nb-run.svg)

This [particular public notebook](https://www.kaggle.com/arthurtok/introduction-to-ensembling-stacking-in-python) was chosen based on the high number of votes it has as a kernel for the Titanic competition, the richness of some of the visualizations, and also because it uses XGBoost, a library that Kubeflow does not currently include in its supported TensorFlow notebooks.
Recommendation: Change this:
"This particular public notebook was chosen based on..."
to this:
"I chose this particular public notebook based on..."
Reason: when I first read it, I assumed the "was chosen" meant that the notebook had been awarded a prize in some Kaggle competition, and I expected the next sentence to tell me more about that. Changing the sentence to indicate that you chose it makes things much clearer.
content/blog/kaggle_on_kubeflow.md
Outdated
- the Kaggle image includes TensorFlow 1.9 or greater built with [AVX2 support](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2), so the image may not run on some older CPU
- unlike the Kubeflow curated notebooks, the default notebook user (`jovyan`) does not have the permissions to install new packages to global locations (but more on that below)

### What's this all about?
I'd move this section ("What's this all about?") up above the "Images" section, and change it to a level 2 heading. It's super important that people know the goal of this blog post, and this section tells it nicely. When I first read the "Images" section, I wondered why you were telling me all that. Then I read the "What's this all about?" section, and things became much clearer.
@sarahmaddox thanks for the feedback. PTAL latest
Woo Hoo!
looking great!
Nice!
/lgtm
This looks awesome but let me circulate it with the Kaggle folks.
This enables use cases such as this one: kubeflow/website#180
Kaggle/kaggle-api#84 was just closed. Backend issue apparently. Thus, I need to rewrite a portion of this.
Ack.
Ready for (final?) review.
Any outstanding reasons for the hold on this PR?
/lgtm
/lgtm The hold was to give the Kaggle folks a chance to review it.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jlewi
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Got the signoff from Kaggle. /hold cancel
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @sarahmaddox, @pdmack, @abhi-g, @aronchick, and @ewilderj)
content/blog/kaggle_on_kubeflow.md, line 39 at r2 (raw file):
- it's a very large notebook, over 21 GB in size. Docker pulls and notebook launches can take a lengthy period of time
- the versions of TensorFlow, PyTorch, XGBoost, and the other libraries included may change at any time
nit: change over time
Merging manually because reviewable is blocking merge; looks like reviewable wasn't configured to not add reviewable status to PR.
I'll follow up on the nit soon
Kaggle blog post
Merge pull request #180 from pdmack/kaggle-blog
/assign @jlewi
/assign @sarahmaddox
/cc @abhi-g
/cc @ewilderj
/cc @aronchick
Related to: #79
Please suggest specific and literal changes via in-line review comments. :-)