Ready-to-use implementations of Natural Language Processing models in Keras/TensorFlow (transformer)


Natural-Language-Processing End-to-End Implementation Examples


In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.

This repository contains full implementation examples of several natural language processing methods in Python, ready to be applied to virtually any industry dataset. Everything is provided as Jupyter notebooks so that it is easy to read and follow along.

The goal of this repo is to provide a comprehensive set of tools and examples that leverage recent advances in NLP algorithms.

When the notebooks are followed in order, they guide you through basic NLP concepts and skills using several different libraries (Keras, TensorFlow), and eventually help you build production-level systems such as a chatbot or a recommendation system on top of language data.



Table of contents

  • Requirements
  • Usage
  • Data
  • Implementation
  • Extra Learning
  • Reference

Requirements

  • Python 3
  • TensorFlow 2.x
  • NumPy
  • pandas
  • scikit-learn
  • Transformers (Hugging Face)

Usage

  • Install required packages.
  • Follow along with each Jupyter notebook.

Data

The dataset needed for each notebook can be downloaded with 'wget'. The full datasets can also be found in the 'data' folder of this repo.


Implementation

We classify news articles by category using a pre-trained embedding model. Keras is used to build the model from scratch and train it.
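The notebook itself has the full details; as a rough, self-contained sketch of the idea (vocabulary size, sequence length, and the randomly initialised stand-in for the pre-trained embedding matrix are all placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len, num_classes = 20000, 300, 128, 4

# A real run would load pre-trained vectors (e.g. GloVe/word2vec) here;
# random values stand in so the snippet is self-contained.
embedding_matrix = np.random.normal(size=(vocab_size, embed_dim)).astype("float32")

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embed_dim, input_length=max_len,
                     weights=[embedding_matrix], trainable=False),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=3)
```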

We perform sentiment analysis using Google's BERT model on movie-review data with the TF Keras API. We use the Keras API this time because PyTorch examples are already plentiful. We also use a Korean movie-review dataset, since analyses of English movie reviews (IMDB) are easy to find online. You should come away with a firm grasp of how Google's language model BERT works and how to fine-tune it for your own business problems.
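As a hedged illustration of the fine-tuning flow (the multilingual checkpoint and the two toy Korean sentences are stand-ins, not necessarily the notebook's choices):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# Multilingual checkpoint as a stand-in for whatever the notebook fine-tunes.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

# Two toy Korean reviews ("really fun" / "waste of time") in place of the real data.
texts = ["이 영화 정말 재미있어요", "시간 낭비였다"]
labels = tf.constant([1, 0])
enc = tokenizer(texts, padding=True, truncation=True, max_length=64,
                return_tensors="tf")

model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(dict(enc), labels, epochs=1, batch_size=2)
```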

The same sentiment analysis using Google's BERT model on the movie data, this time with TensorFlow 2.0. Here we use the transformers classes 'BertTokenizer' and 'BertModel' to easily load everything BERT needs and use it in training.
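A minimal sketch of that lower-level route, again assuming a multilingual checkpoint and a small binary sentiment head rather than the notebook's exact setup:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = TFBertModel.from_pretrained("bert-base-multilingual-cased")

max_len = 64
input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

# Pooled [CLS] representation feeds a small binary-sentiment head.
cls = bert(input_ids, attention_mask=attention_mask).pooler_output
logits = tf.keras.layers.Dense(2)(cls)

model = tf.keras.Model([input_ids, attention_mask], logits)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Toy Korean inputs ("it was great" / "it was not good").
enc = tokenizer(["아주 좋았다", "별로였다"], padding="max_length",
                truncation=True, max_length=max_len, return_tensors="tf")
# model.fit([enc["input_ids"], enc["attention_mask"]], tf.constant([1, 0]), epochs=1)
```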

Build a SQuAD model using Keras and BERT. The Stanford Question Answering Dataset (SQuAD) is a reading-comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable.
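The core span-prediction idea, sketched with assumed layer names and sizes: BERT encodes the question/passage pair and two dense heads score every token as a possible answer start or end.

```python
import tensorflow as tf
from transformers import TFBertModel

bert = TFBertModel.from_pretrained("bert-base-uncased")

max_len = 384
input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")
token_type_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="token_type_ids")

# BERT encodes the concatenated question + passage pair.
sequence_output = bert(input_ids, attention_mask=attention_mask,
                       token_type_ids=token_type_ids).last_hidden_state

# One logit per token for the answer start, one per token for the answer end.
start_logits = tf.keras.layers.Flatten()(tf.keras.layers.Dense(1)(sequence_output))
end_logits = tf.keras.layers.Flatten()(tf.keras.layers.Dense(1)(sequence_output))

model = tf.keras.Model([input_ids, attention_mask, token_type_ids],
                       [start_logits, end_logits])
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# Training targets are the answer's start and end token indices within the passage.
```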

This time, we solve the same SQuAD problem with TensorFlow by fine-tuning a pretrained BERT-Large model.
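For a quick sanity check, transformers also ships a ready-made question-answering head; here is a minimal inference sketch using a publicly available SQuAD-fine-tuned BERT-Large checkpoint (not necessarily the exact model produced by the notebook):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForQuestionAnswering

name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(name)
model = TFBertForQuestionAnswering.from_pretrained(name)

question = "Where is the Eiffel Tower located?"
context = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."
inputs = tokenizer(question, context, return_tensors="tf")
outputs = model(inputs)

# Pick the most likely start/end tokens and decode the span between them.
start = int(tf.argmax(outputs.start_logits, axis=1)[0])
end = int(tf.argmax(outputs.end_logits, axis=1)[0])
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```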

Solve another application of natural language processing: named entity recognition (NER). Named entity recognition ‒ also called entity identification or entity extraction ‒ is an information-extraction technique that automatically identifies named entities in a text and classifies them into predefined categories.
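To get a feel for what NER output looks like, here is an illustrative snippet using a transformers pipeline with a public English NER checkpoint (an assumption for demonstration; the notebook trains its own model):

```python
from transformers import pipeline

# Public English NER checkpoint used purely for illustration.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Hugging Face was founded in New York City."))
# e.g. [{'entity_group': 'ORG', 'word': 'Hugging Face', ...},
#       {'entity_group': 'LOC', 'word': 'New York City', ...}]
```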

Build a simple end-to-end speech-to-text model using the librosa library. The model takes recordings of 10 different classes (words) from the Kaggle Speech Recognition Challenge, trains a 1D convolutional network, and predicts the spoken word as text.
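A rough sketch of that pipeline, with a hypothetical file path and illustrative layer sizes:

```python
import librosa
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

sr = 8000  # resample everything to 8 kHz, 1-second clips

# Hypothetical path into the Kaggle speech-commands data.
wave, _ = librosa.load("data/train/audio/yes/example.wav", sr=sr)
wave = np.pad(wave, (0, max(0, sr - len(wave))))[:sr]  # pad/trim to exactly 1 s

model = tf.keras.Sequential([
    layers.Input(shape=(sr, 1)),
    layers.Conv1D(16, 13, strides=4, activation="relu"),
    layers.MaxPooling1D(3),
    layers.Conv1D(32, 11, strides=2, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10 keyword classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train[..., np.newaxis], y_train, epochs=10)
```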

A simple example looking at the different decoding methods provided by the Transformers library. We use GPT-2 specifically to see which decoding method generates the most human-like language from a given prompt.
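A condensed version of that comparison, with an arbitrary prompt, might look like this (greedy search, beam search, and top-k / nucleus sampling through generate()):

```python
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")
ids = tokenizer.encode("I enjoy walking with my cute dog", return_tensors="tf")

greedy = model.generate(ids, max_length=40)                       # greedy search
beam = model.generate(ids, max_length=40, num_beams=5,
                      early_stopping=True)                        # beam search
sample = model.generate(ids, max_length=40, do_sample=True,
                        top_k=50, top_p=0.95)                     # top-k / nucleus sampling

for out in (greedy, beam, sample):
    print(tokenizer.decode(out[0], skip_special_tokens=True), "\n")
```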


Extra Learning

  1. Novel generator using KoGPT2 and PyTorch Link
  2. Text Generation / Lyric Generation / SQuAD fine-tuning Link
  3. Build a simple chatbot in Korean using Korean language data and a pre-trained KoGPT2 model Link

Reference

https://github.com/kimwoonggon/publicservant_AI
https://github.com/microsoft/nlp-recipes
https://github.com/NirantK/nlp-python-deep-learning
https://github.com/monologg/KoBERT-NER