🤗 Dockerized BERT-Multi-Label-Classifier Inferer 🤗

JulesBelveze/BERT-sequence-classifier

🤗 BERT-Multi-Label-Classifier / Dockerized Inferer 🤗

Repository to fine-tune a BERT-base multi-label or multi-class classifier, built on the HuggingFace Transformers library. It also includes a Flask API wrapper for inference.

Installation

To get the repository, run the following command:

git clone https://github.com/JulesBelveze/BERT-multi-label-classifier.git

The repository uses Poetry as its package manager (see the Poetry documentation for details). To install the required packages, run the following commands:

python3 -m venv .venv/bert-mlc
source .venv/bert-mlc/bin/activate
poetry install

This repository uses neptune.ai to track experiments; refer to their documentation if needed.

Organisation of files

  • models/: folder containing the custom models
  • utils/: folder containing utility functions
  • main.py: entry point to run
  • train.py: file containing the training procedure
  • eval.py: file containing the evaluation procedure
  • app.py: file containing the Flask app
  • inferer.py: file containing the model inferer
  • poetry.lock: Poetry lock file (pinned dependency versions)
  • pyproject.toml: Poetry project file (project metadata and dependencies)
  • requirements_inference.txt: packages required for inference
  • Dockerfile: builds a Docker image that serves the API

Datasets

Models

We provide customised versions of four models: BERT, RoBERTa, XLM-RoBERTa and DistilBERT.

1. Multi-label-classifier

The model adapts HuggingFace's BertForSequenceClassification to multi-label classification. The key change is the loss function: in the multi-label setting each label is an independent yes/no decision, so the loss cannot assume exactly one correct class per input.
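
A common multi-label loss is binary cross-entropy applied to a per-label sigmoid (the idea behind PyTorch's `BCEWithLogitsLoss`). This README does not state which loss the repository uses, so the stdlib-only sketch below illustrates that assumed choice rather than the repository's code. Because each label gets its own sigmoid, several labels can be predicted as present for the same input.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multi_label_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels (assumed loss, for illustration).

    Each label is scored independently with a sigmoid, so any subset of
    labels can be positive for the same input.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for logit, target in zip(logits, targets):
        p = sigmoid(logit)
        # Clamp to avoid log(0) for extreme logits.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += -(target * math.log(p) + (1 - target) * math.log(1.0 - p))
    return total / len(logits)


# Example: three labels, of which the first and third are present.
loss = multi_label_bce([2.0, -1.0, 0.5], [1, 0, 1])
print(round(loss, 4))
```

A softmax-based loss would instead force the label probabilities to sum to one, which only fits the single-label case.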

2. Multi-class-classifier

The model is essentially an MLP on top of a BERT encoder. Here too, the custom model extends HuggingFace's BertForSequenceClassification, this time to include class weights in the loss function.
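
Class weights are typically fed to a weighted softmax cross-entropy (for example, the `weight` argument of PyTorch's `CrossEntropyLoss`). The plain-Python sketch below shows the idea; the inverse-frequency weighting in `inverse_frequency_weights` is a hypothetical scheme, since the README does not say how the repository computes its weights.

```python
import math
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Hypothetical weighting: rarer classes get larger weights.

    weight_c = N / (num_classes * count_c), a common "balanced" heuristic.
    Classes absent from `labels` get weight 0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Log-softmax with the max subtracted for numerical stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    class_weights: Sequence[float],
) -> float:
    """Weighted mean of per-example cross-entropy.

    Like the 'mean' reduction of PyTorch's CrossEntropyLoss with `weight`,
    the sum of weighted losses is divided by the sum of the weights used.
    """
    weighted_sum = 0.0
    weight_sum = 0.0
    for logits, target in zip(batch_logits, targets):
        w = class_weights[target]
        weighted_sum += -w * log_softmax(logits)[target]
        weight_sum += w
    return weighted_sum / weight_sum


# Imbalanced toy labels: class 1 is rare, so it receives a larger weight.
weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
loss = weighted_cross_entropy([[2.0, 0.0], [2.0, 0.0]], [0, 1], weights)
```

Up-weighting the rare class makes its misclassifications cost more, which pushes the classifier away from always predicting the majority class.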

Inference

The inferer only supports single-input inference. It handles all the processing steps required to feed the text into the classification model, and can be used as follows:

# `config` and `checkpoint_path` must be defined beforehand
model_infer = ModelInferer(config=config, checkpoint_path=checkpoint_path, quantize=True)
model_infer.predict("I hate you more than you can imagine")

We also provide a Flask API that wraps the inferer, as well as a Dockerfile to containerize the app for production use.
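
The README does not document what `predict` returns. Assuming the model produces one logit per class, the hypothetical helpers below show how such scores are usually turned into labels: an independent sigmoid threshold per label for the multi-label model, and a single argmax for the multi-class model. The label names and the 0.5 threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def decode_multi_label(
    logits: Sequence[float], label_names: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Return every label whose sigmoid probability reaches `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    # sigmoid(x) >= t  <=>  x >= log(t / (1 - t)); comparing in logit space
    # avoids calling exp() on very large or very small logits.
    cutoff = math.log(threshold / (1.0 - threshold))
    return [name for logit, name in zip(logits, label_names) if logit >= cutoff]


def decode_multi_class(logits: Sequence[float], label_names: Sequence[str]) -> str:
    """Return the single highest-scoring label.

    Softmax is monotonic, so the argmax of the logits equals the argmax
    of the probabilities.
    """
    best = max(range(len(logits)), key=lambda i: logits[i])
    return label_names[best]


# Hypothetical label set and model outputs, for illustration only.
labels = ["anger", "sadness", "joy"]
print(decode_multi_label([1.2, 0.3, -2.0], labels))  # prints ['anger', 'sadness']
print(decode_multi_class([1.2, 0.3, -2.0], labels))  # prints anger
```

In the multi-label case, the threshold can be tuned per label on validation data if some labels are systematically under- or over-predicted.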