generalization-classification

Deep learning models for text classification using AllenNLP

This project tackles a binary text classification problem: 3,456 sentences are annotated for whether or not they encode a generalization. The repository contains the annotated data and the code for several deep learning models, each of which can use either pre-trained GloVe embeddings or ELMo embeddings (see /src/compare-models/).

The best-performing model is a CNN with ELMo embeddings. It is then trained on the entire training set of 3,456 instances (see the .py files in /src/) and used to predict labels for a test set of 16,816 sentences. The test set is obtained by extracting sentences from the beginning and end of a set of 230 .txt files (see /src/Predict-TestSet.ipynb).
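The test-set construction described above can be sketched roughly as follows. This is a minimal illustration rather than the code in /src/Predict-TestSet.ipynb: the number of sentences taken from each end (`SENTENCES_PER_END`), the regex-based sentence splitter, and the function names are all assumptions made for the example.

```python
import re
from pathlib import Path
from typing import List

# Assumption: the README does not say how many sentences are taken
# from each end of a file, so this value is purely illustrative.
SENTENCES_PER_END = 3


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def head_and_tail(sentences: List[str], n: int = SENTENCES_PER_END) -> List[str]:
    """Return the first n and last n sentences without duplicating any."""
    if n <= 0:
        return []
    if len(sentences) <= 2 * n:
        # Short document: head and tail would overlap, so keep everything once.
        return list(sentences)
    return sentences[:n] + sentences[-n:]


def build_test_set(folder: str, n: int = SENTENCES_PER_END) -> List[str]:
    """Collect head and tail sentences from every .txt file in a folder."""
    test_sentences: List[str] = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        test_sentences.extend(head_and_tail(split_sentences(text), n))
    return test_sentences
```

A proper sentence tokenizer would likely handle abbreviations and quotations better than the regular expression used here; the splitter is kept simple so the sketch stays dependency-free.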

If the notebooks don't load on GitHub, you can view them with Jupyter's nbviewer. All implementations are in Python 3.7 and use AllenNLP (0.8.4) and PyTorch (1.1.0). Thanks to Keita for this lovely tutorial on AllenNLP.
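As a small convenience, an nbviewer link for a notebook hosted on GitHub can be assembled from the repository owner, name, branch, and file path. The helper below is a hypothetical sketch; in particular, the default branch name `master` is an assumption and may need to be changed.

```python
def nbviewer_url(owner: str, repo: str, path: str, branch: str = "master") -> str:
    """Build an nbviewer URL for a notebook stored in a GitHub repository.

    The branch default is an assumption; pass the actual branch if it differs.
    """
    clean_path = path.lstrip("/")
    return f"https://nbviewer.org/github/{owner}/{repo}/blob/{branch}/{clean_path}"


# Example: the prediction notebook referenced in this README.
print(nbviewer_url("sunyam", "generalization-classification",
                   "/src/Predict-TestSet.ipynb"))
```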

sunyam/generalization-classification
