
🎮 Demo: HMTL (Hierarchical Multi-Task Learning model)

Introduction

This is a demonstration of our NLP system: HMTL is a neural model that uses multi-task learning to resolve four fundamental NLP tasks, namely Named Entity Recognition, Entity Mention Detection, Relation Extraction, and Coreference Resolution.

For a brief introduction to multi-task learning, you can refer to our blog post. Each of the four tasks is described in the section "Description of the tasks".

The web interface for the demo can be found here if you want to try it out. If you prefer to run it on your local machine, HMTL comes with the web visualization client. The demo and the released weights are for English only.

HMTL Demo

Setup

The web demo is based on Python 3.6 and AllenNLP.

The easiest way to set up a clean, working environment with the necessary dependencies is to follow the setup section in the parent folder. A few supplementary dependencies, listed in requirements.txt, are also required to run the demo.
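Before launching the demo, a quick sanity check of the interpreter and of AllenNLP can save some debugging. The following is a minimal sketch, not part of the repository. It treats Python 3.6 as a minimum version (an assumption: the demo was built on 3.6, and newer interpreters may not work with the AllenNLP release it depends on) and only checks that the allennlp package is importable, not which version is installed. The supplementary packages from requirements.txt are not named in this README, so add them to `required` yourself.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6), required=("allennlp",)):
    """Return a list of problems; an empty list means the basic checks passed.

    Assumption: 3.6 is treated as a minimum, and only importability is checked.
    """
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            "Python %d.%d or newer is expected, found %d.%d"
            % (min_python + sys.version_info[:2])
        )
    for package in required:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(package) is None:
            problems.append("missing package: %s" % package)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks OK")
```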

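We also release three pre-trained HMTL models trained on English corpora. The three models differ mainly in the size of the ELMo embeddings they use, and therefore in the size of the model. Overall, the bigger the model, the higher the performance, although this does not hold for every task (for example, the small model obtains the best CR F1):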

| Model Name | NER (F1) | EMD (F1) | RE (F1) | CR (F1) | Description |
|---|---|---|---|---|---|
| conll_small_elmo | 85.73 | 83.51 | 58.40 | 62.85 | Small version of ELMo |
| conll_medium_elmo | 86.41 | 84.02 | 58.78 | 61.62 | Medium version of ELMo |
| conll_full_elmo (default model) | 86.40 | 85.59 | 61.37 | 62.26 | Original version of ELMo |
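To compare the three models with a single number, one option is the unweighted (macro) average of the four F1 scores in the table. This is only a quick aggregate computed here, not a metric reported for HMTL. It gives about 72.6 for the small model, 72.7 for the medium model and 73.9 for the full model, which is consistent with larger models doing better overall even though the per-task ordering is not uniform.

```python
# F1 scores copied from the table: (NER, EMD, RE, CR).
SCORES = {
    "conll_small_elmo": (85.73, 83.51, 58.40, 62.85),
    "conll_medium_elmo": (86.41, 84.02, 58.78, 61.62),
    "conll_full_elmo": (86.40, 85.59, 61.37, 62.26),
}


def macro_average(scores):
    """Unweighted mean of the per-task F1 scores."""
    return sum(scores) / len(scores)


for name, scores in SCORES.items():
    print(f"{name}: {macro_average(scores):.1f}")
```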

To download the pre-trained models, install Git LFS and run git lfs pull. The model weights will be saved in the model_dumps folder.
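If git lfs pull was skipped or failed, the files in model_dumps are small text pointers rather than actual weights, and loading a model then fails with a confusing error. The following sketch (a hypothetical helper, not included in the repository) flags such files by looking for the first line that every Git LFS pointer file starts with. It scans everything under model_dumps and makes no assumption about the weight file names.

```python
from pathlib import Path

# First line of every Git LFS pointer file (pointer specification v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_unpulled_lfs_files(folder="model_dumps"):
    """Return files under `folder` that are still Git LFS pointers instead of real content."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError("folder not found: %s" % folder)
    unpulled = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Only the first bytes are needed to recognise a pointer file.
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            unpulled.append(path)
    return unpulled


if __name__ == "__main__":
    try:
        missing = find_unpulled_lfs_files()
    except FileNotFoundError as error:
        print(error)
    else:
        if missing:
            print("These files are still LFS pointers, run git lfs pull:")
            for path in missing:
                print("  ", path)
        else:
            print("No Git LFS pointer files found.")
```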

Description of the tasks

Named Entity Recognition (NER)

Named Entity Recognition aims at identifying and classifying named entities (real-world objects, such as persons, locations, etc., that can be denoted with a proper name).

[Homer Simpson]PERS lives in [Springfield]LOC with his wife and kids.

HMTL is trained on OntoNotes 5.0 and can recognize 18 types of named entities: PERSON, NORP, FAC, ORG, GPE, LOC, etc.
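Bracketed predictions like the example sentence are typically derived from per-token tags. The sketch below assumes a BIO scheme (B- opens an entity, I- continues it, O is outside any entity) and converts a tag sequence into labelled token spans. The tagging scheme HMTL uses internally is not described in this README, and the labels here simply mirror the example sentence.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (label, start, end) spans; `end` is exclusive."""
    spans = []
    label, start = None, None
    # A trailing "O" acts as a sentinel so an entity at the end of the sentence is closed.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the open span simply grows by one token
        if label is not None:
            spans.append((label, start, i))
        if tag == "O":
            label, start = None, None
        else:
            # "B-X" (or a stray "I-X" with a new label) opens a new span.
            label, start = tag[2:], i
    return spans


tokens = ["Homer", "Simpson", "lives", "in", "Springfield", "with", "his", "wife", "and", "kids", "."]
tags = ["B-PERS", "I-PERS", "O", "O", "B-LOC", "O", "O", "O", "O", "O", "O"]
for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# PERS Homer Simpson
# LOC Springfield
```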

Entity Mention Detection (EMD)

Entity Mention Detection aims at identifying and classifying entity mentions (real-world objects, such as persons, locations, etc., that are not necessarily denoted with a proper name).

[The men]PERS held on [the sinking vessel]VEH until [the ship]VEH was able to reach them from [Corsica]LOC.

HMTL can recognize seven types of mentions: PER, GPE, ORG, FAC, LOC, WEA and VEH.

Relation Extraction (RE)

Relation Extraction aims at extracting the semantic relations between mentions.

The types of relations detected by HMTL are the following:

| Shortname | Full Name | Description | Example | ARG1 | ARG2 |
|---|---|---|---|---|---|
| ART | Artifact | User-Owner-Inventor-Manufacturer | Leonardo da Vinci painted the Mona Lisa. | Leonardo da Vinci | Mona Lisa |
| GEN-AFF | Gen-Affiliation | Citizen-Resident-Religion-Ethnicity, Org-Location | The people of Iraq. | The people | Iraq |
| ORG-AFF | Org-Affiliation | Employment, Founder, Ownership, Student-Alum, Sports-Affiliation, Investor-Shareholder, Membership | Martin Geisler, ITV News, Safwan southern Iraq. | Martin Geisler | ITV News |
| PART-WHOLE | Part-Whole | Artifact, Geographical, Subsidiary | They could safeguard the fields in Iraq. | the fields | Iraq |
| PER-SOC | Person-Social | Business, Family, Lasting-Personal | Sean Flynn, son of the famous actor Errol Flynn | son | Errol Flynn |
| PHYS | Physical | Located, Near | The two journalists worked from the hotel. | the two journalists | the hotel |
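A predicted relation can be viewed as a directed, typed pair of mentions: the order of ARG1 and ARG2 matters. The following minimal sketch represents relations that way, restricted to the six types in the table. Relation and make_relation are hypothetical names, not part of HMTL, and plain strings stand in for mentions, whereas a real system would more likely use token offsets.

```python
from typing import NamedTuple

# The six relation types listed in the table.
RELATION_TYPES = {"ART", "GEN-AFF", "ORG-AFF", "PART-WHOLE", "PER-SOC", "PHYS"}


class Relation(NamedTuple):
    """A directed relation: `label` holds from `arg1` to `arg2`."""

    label: str
    arg1: str
    arg2: str

    def __str__(self):
        return f"{self.label}({self.arg1}, {self.arg2})"


def make_relation(label, arg1, arg2):
    """Build a Relation, rejecting labels outside the known set."""
    if label not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {label}")
    return Relation(label, arg1, arg2)


print(make_relation("PER-SOC", "son", "Errol Flynn"))  # PER-SOC(son, Errol Flynn)
```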

For more details, please refer to the dataset release notes.

Coreference Resolution (CR)

In a text, two or more expressions can refer to the same person or thing in the world. Coreference Resolution aims at finding these coreferent spans and clustering them.

[My mom]1 tasted [the cake]2. [She]1 liked [it]2.
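Coreference systems often first link each mention to an antecedent and then take each connected group of linked mentions as a cluster. The sketch below shows only that final grouping step, using a small union-find structure. It is a generic illustration, not HMTL's implementation, and mentions are plain strings for readability; a real system would use token spans so that repeated words (for example two occurrences of "it") stay distinct.

```python
def cluster_mentions(mentions, links):
    """Group mentions into clusters from pairwise (mention, antecedent) links.

    Two mentions end up in the same cluster if any chain of links connects them.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the root, halving the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for mention, antecedent in links:
        parent[find(mention)] = find(antecedent)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


mentions = ["My mom", "the cake", "She", "it"]
links = [("She", "My mom"), ("it", "the cake")]
print(cluster_mentions(mentions, links))
# [['My mom', 'She'], ['the cake', 'it']]
```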

Using HMTL as a server

HMTL can be used as a REST API. A simple example server script is provided in server.py. To launch a specific model, first make sure you are in an environment with all the dependencies installed (e.g. source .env/bin/activate), then run:

gunicorn -b:8000 'server:build_app(model_name="<model_name>")'

or, to launch the default (full) model:

gunicorn -b:8000 'server:build_app()'
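Both commands ask gunicorn to import server.py and call build_app(...) to obtain the WSGI application. The contents of server.py are not reproduced in this README, so the following is a hypothetical, standard-library-only sketch of such an app factory. The /jmd/ route and the text query parameter are taken from the curl example in this section, the default model name mirrors the default model in the table, and predict is a placeholder that does not run HMTL.

```python
import json
from urllib.parse import parse_qs


def build_app(model_name="conll_full_elmo"):
    """Hypothetical app factory returning a WSGI callable that gunicorn can serve."""

    def predict(text):
        # Placeholder: a real server would run the loaded HMTL model here.
        return {"model": model_name, "tokens": text.split()}

    def app(environ, start_response):
        if environ.get("PATH_INFO", "") != "/jmd/":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        # parse_qs also decodes percent-escapes such as %20.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        text = query.get("text", [""])[0]
        body = json.dumps(predict(text)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

Because build_app only returns a WSGI callable, the same factory can also be exercised without gunicorn, for example with the standard library's wsgiref server during development.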

You can then query the model with the following command:

curl "http://localhost:8000/jmd/?text=Barack%20Obama%20is%20the%20former%20president."
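The same request can be sent from Python with the standard library. This is a minimal client sketch; build_query_url and query_hmtl are hypothetical helper names. Since the response format is not documented here, the client tries to decode JSON and falls back to returning the raw text.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(text, host="localhost", port=8000):
    """Build a URL like the curl example, percent-encoding the input text."""
    return f"http://{host}:{port}/jmd/?text={quote(text)}"


def query_hmtl(text, host="localhost", port=8000, timeout=60):
    """Send `text` to a running HMTL server and return the decoded response."""
    with urlopen(build_query_url(text, host, port), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)  # assumption: the server answers with JSON
    except ValueError:
        return body  # fall back to the raw response text
```

With the server running on port 8000, query_hmtl("Barack Obama is the former president.") sends the same request as the curl command.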