GitHub - groupejopa/JHU-Data-Science-Capstone-

JHU Data Science Capstone Project

Describing the Project

The goal of this project is to take a provided dataset and build an NLP (natural language processing) model that predicts the next word. The model was trained on text from blogs, Twitter, and news sources.

SwiftKey is the company that worked in cooperation with professors at Johns Hopkins University to prepare this project. Its objective is to build a predictive model that makes it easier for people to type on their mobile devices.

Besides cleaning and subsetting the data, n-gram tokenization was used to build the word combinations that feed the predictive algorithm.
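As a rough illustration of n-gram tokenization, the sketch below lowercases a sentence, keeps alphabetic tokens, and slides a window of n words over them. It is written in Python for brevity, while the project itself is in R. The function names `tokenize` and `ngrams`, the cleaning rule, and the sample sentence are assumptions for illustration, not the project's actual code.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    # Assumed cleaning rule: digits and punctuation are simply dropped.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("Thanks for the follow, see you soon!")
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words
```

A sentence with fewer than n tokens yields no n-grams, so short lines drop out of the higher-order tables on their own.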

The project concluded with a Shiny application and a pitch built with R Presentation.

The Project

This project involves Natural Language Processing. The critical task is to take a user's input phrase (group of words) and to output a predicted next word.

Project deliverables:

Next Word Prediction Model, as basis for an app
Next Word Prediction App hosted at shinyapps.io
A presentation hosted at RPubs

Next Word Prediction Model

The next word prediction model uses the principles of "tidy data" applied to text mining in R. Key model steps:

Input: raw text files for model training
Clean training data; separate into 2-word, 3-word, and 4-word n-grams; save as tibbles
Sort n-gram tibbles by frequency; save as repos
N-grams function: uses a "back-off" type prediction model

the user supplies an input phrase
the model uses the last 3, 2, or 1 words to find the best-matching 4th, 3rd, or 2nd word in the repos

Output: next word prediction
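The steps above can be sketched as a small back-off lookup. This is a minimal sketch in Python rather than the project's R code, and it makes several assumptions: "best match" is taken to mean the most frequent follower, the repos are held in a dictionary instead of tibbles, and the names `build_repos` and `predict_next` and the tiny training set are hypothetical.

```python
from collections import Counter, defaultdict

def build_repos(sentences, max_n=4):
    """Count n-grams (n = 2..max_n) and map each prefix to its follower counts."""
    counts = defaultdict(Counter)  # prefix tuple -> Counter of next words
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                counts[tuple(gram[:-1])][gram[-1]] += 1
    return counts

def predict_next(repos, phrase, max_n=4):
    """Try the last 3 words, then 2, then 1; return the most frequent follower."""
    tokens = phrase.lower().split()
    for k in range(max_n - 1, 0, -1):
        if len(tokens) < k:
            continue  # phrase too short for this order, back off
        followers = repos.get(tuple(tokens[-k:]))
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no match at any order

# Illustrative training data, not taken from the project corpus.
training = [
    "thanks for the follow",
    "thanks for the follow back",
    "thanks for the help",
    "see you at the party",
]
repos = build_repos(training)
```

With this toy data, `predict_next(repos, "thanks for the")` finds a match at the 4-gram level, while `predict_next(repos, "look at the")` has no 4-gram match and backs off to the 3-gram table.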

Benefits: easy-to-read code; uses "pipes"; fast processing of training data; able to sample up to 25% of the original corpus; relatively small output repos
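Sampling a fraction of the corpus before counting n-grams might look like the sketch below. It is in Python rather than the project's R, and the function name `sample_lines`, the per-line random draw, and the fixed seed are assumptions; the source only states that up to 25% of the corpus can be sampled.

```python
import random

def sample_lines(lines, fraction=0.25, seed=42):
    """Keep each line independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]

# Stand-in corpus; the real input would be the blog, Twitter, and news text.
corpus = [f"line {i}" for i in range(1000)]
subset = sample_lines(corpus, fraction=0.25)
```

Drawing per line keeps memory use low because the corpus can be streamed, at the cost of a sample size that is only approximately the requested fraction.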

Next Word Prediction App

The next word prediction app provides a simple user interface to the next word prediction model.

Key Features:

Text box for user input
Predicted next word outputs dynamically below user input
Tabs with plots of the most frequent n-grams in the dataset
Side panel with user instructions

Key Benefits:

Fast response
Method allows for large training sets, leading to better next word predictions

Shiny App Link

Documentation and Source Code

Repository files (20 commits):

ngram_match
03_Milestone_Report.html
03_Milestone_Report.md
DataProduct_Task.Rmd
DataProduct_Task.md
Final_Project_Instructions.Rmd
Final_Project_Instructions.md
JHU-Data-Science-Capstone.Rproj
Milestone_Instructions.Rmd
Milestone_Instructions.md
Milestone_Report.Rmd
Milestone_Report.html
Milestone_Report.md
README.md
SlideDeck_Task.Rpres
SlideDeck_Task.md
_config.yml
bigrams.png
creative_Task.Rmd
creative_Task.md
creative_Task_Script.R
exploratory_Task.Rmd
exploratory_Task.html
exploratory_Task.md
exploratory_Task_Script.R
fast_fourgrams.R
getting_Task.Rmd
getting_Task.md
getting_Task_Script.R
predictiveA_Task_Script.R
predictive_Task.Rmd
predictive_Task.md
predictive_Task_Script.R
project_overview.Rmd
project_overview.md
slideDeck_Task.Rmd
slideDeck_Task.md
syllabus.Rmd
syllabus.md

Shiny App

Shiny App Source Code repository on Github

Data Specialization Capstone repository on Github
