KDD Hands-On Tutorial (2018)

A Hands-On Tutorial with Applications of Sequence-to-Sequence Learning Using Keras

A hands-on tutorial for KDD 2018.

Abstract

Chatbots, machine translation, and agents that summarize text coherently may seem like science fiction or marketing hype even to experienced machine learning practitioners. In this hands-on tutorial, you will be guided through a real industry example of how sequence-to-sequence models are used to create data products. Machine learning engineers from GitHub will guide you through collecting and preparing the data, building the model architecture, and analyzing the results. We will go into greater depth on how the architecture was built, how it is used in practice, and what to consider when using this model in a production environment. This will be an end-to-end example, including a publicly hosted dataset, with all code made publicly available.

A preview of this tutorial can be viewed by reading these blog posts:

Requirements

The target audience of this tutorial is moderately skilled users who have some familiarity with neural networks and are comfortable writing code. Attendees must bring a laptop and will use free GPU-enabled notebooks from Google (with a 12-hour runtime limit); the Google Chrome browser is required. Only open source tools will be used, so no software licenses are required.

Tutorial Outline

See the notebook in this repository (Feature Extraction and Summarization with Sequence to Sequence Learning.ipynb). We will go in depth into (1) how these models are being used for product research at GitHub, (2) practical considerations for using this model in a customer-facing product and how to evaluate results both qualitatively and quantitatively, (3) related research we have conducted, (4) other tools and techniques that can be used to accomplish the same goals, and (5) a live Q&A session.

Other information

The tutorial's duration is 3 hours. The latest version of Google Chrome and access to the internet are required. We will cover setup during the tutorial, which will only require installing 1-2 Python packages.

Tutors

Ho-Hsiang Wu

Ho-Hsiang Wu is a Data Scientist at GitHub building data products using machine learning models, including recommendation systems and graph analysis. Currently, he is focusing on understanding code by building various representations with natural language processing techniques and deep learning models. Prior to GitHub, he worked at various music streaming companies developing features that help users discover music. He graduated from the University of California, Los Angeles, where he received his Master of Science in Electrical Engineering. Ho-Hsiang can also be reached on Twitter and LinkedIn.

Hamel Husain

Hamel Husain is a machine learning engineer at GitHub, where he focuses on applications of natural language processing and representation learning, particularly with deep learning. Prior to GitHub, Hamel worked at Airbnb, where he built machine learning systems to optimize growth marketing. Prior to Airbnb, Hamel was a machine learning engineer at an AutoML startup, DataRobot. Before that, Hamel worked as a consultant for 8 years. Hamel has a master's degree in Computer Science from Georgia Tech. Hamel's current research interests are representation learning of code and meta-learning. Hamel can also be reached on Twitter and LinkedIn.