Octopus

Standardized framework to train and evaluate NLP models quickly and easily. No advanced knowledge of implementing or training NLP models is required.

Prerequisites

To run this framework, install the dependencies:

$ pip install -r requirements.txt

Some dependencies include:

transformers>=4.24.0
datasets>=2.7.0
torch>=1.12.1
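If you are unsure whether your environment meets the minimum versions listed above, you can check it quickly. The sketch below is not part of Octopus. `check_requirements`, `meets_minimum`, and `numeric_version` are hypothetical helpers that use only the standard library. The version comparison is deliberately simple: it keeps only the leading digits of each dotted component, so `1.12.1+cu113` is treated as `1.12.1`.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "transformers": "4.24.0",
    "datasets": "2.7.0",
    "torch": "1.12.1",
}


def numeric_version(version: str) -> tuple:
    """Turn '1.12.1+cu113' into (1, 12, 1) by keeping leading digits per component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Compare numerically, padding with zeros so '4.24' equals '4.24.0'."""
    a, b = numeric_version(installed), numeric_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS) -> dict:
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for package, required in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None
            continue
        report[package] = meets_minimum(installed, required)
    return report


if __name__ == "__main__":
    for package, ok in check_requirements().items():
        status = "missing" if ok is None else ("ok" if ok else "too old")
        print(f"{package}: {status}")
```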

If you want to use a GPU (CUDA), you must install the PyTorch build that matches your CUDA version. See PyTorch - Get Started for details on how to install PyTorch.
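After installing PyTorch, a minimal sketch like the one below can confirm that the build actually sees your GPU. `cuda_status` is a hypothetical helper. `torch.cuda.is_available()` and `torch.version.cuda` are standard PyTorch attributes, and `torch.version.cuda` is `None` on CPU-only builds.

```python
def cuda_status() -> dict:
    """Report whether PyTorch is installed and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all.
        return {"torch_installed": False, "cuda_available": False, "cuda_version": None}
    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        # None for CPU-only builds, otherwise the CUDA version PyTorch was built with.
        "cuda_version": torch.version.cuda,
    }


if __name__ == "__main__":
    print(cuda_status())
```

If `torch_installed` is true but `cuda_available` is false, the installed build most likely does not match your CUDA setup.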

Instructions

There are two ways to use this framework:

via the command line (CLI)
by importing the Python module

Getting started with the Python module

This example shows you how to train, evaluate, and run inference with a text classification model.

from classification.sentence import SentenceClassifier

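# Load a pretrained model for sentence classification
model = SentenceClassifier('sentence-classification',
                           <model_name>)

# Fine-tune the model, running validation at the end of every epoch
model.train(output_dir="./",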
            train_file=<path_to_train_file>,
            validation_file=<path_to_validation_file>,
            num_train_epochs=3,
            max_seq_length=128,
            learning_rate=3e-5,
            weight_decay=0.0,
            warmup_steps=0,
            gradient_accumulation_steps=1,
            per_device_train_batch_size=16,
            do_eval=True,
            fp16=True,
            evaluation_strategy="epoch")

# Evaluate a file
print("Result:", model.evaluate(<path_to_eval_file>))

# Predict an input sentence
print(model.predict("Good night 😊"))

# Output
# {"positive": 0.8466}
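The training arguments above share their names with Hugging Face `TrainingArguments`. Assuming Octopus forwards them to the `transformers` Trainer, as the names suggest, each optimizer update sees `per_device_train_batch_size × gradient_accumulation_steps × number of devices` examples. The hypothetical helpers below make that arithmetic explicit; the step count is an approximation for a single-process run, and the 10,000-example file is purely illustrative.

```python
import math


def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples consumed per optimizer update."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def approx_optimizer_steps(num_examples: int,
                           num_train_epochs: int,
                           per_device_train_batch_size: int,
                           gradient_accumulation_steps: int,
                           num_devices: int = 1) -> int:
    """Approximate total optimizer updates over training.

    Batches per epoch are rounded up (the last batch may be partial), and
    every `gradient_accumulation_steps` batches produce one update.
    """
    batches_per_epoch = math.ceil(
        num_examples / (per_device_train_batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return updates_per_epoch * num_train_epochs


# Settings from the example above, with a hypothetical 10,000-example training file.
print(effective_batch_size(16, 1))               # 16
print(approx_optimizer_steps(10_000, 3, 16, 1))  # 1875
```

Raising `gradient_accumulation_steps` to 2 keeps per-step memory unchanged while doubling the effective batch to 32, a common workaround when a larger `per_device_train_batch_size` does not fit on the GPU.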

Getting started with CLI

🔮 The guide below shows you how to train a text classifier via the CLI.

python3 run_cli.py train --task_name=sentence-classification \
                          --model_name=<model_name> \
                          --train_file=<path_to_train_file> \
                          --validation_file=<path_to_validation_file> \
                          --output_dir=./ \
                          --lr=3e-5 \
                          --epochs=3 \
                          --max_length=128 \
                          --warmup_steps=0 \
                          --weight_decay=0.0 \
                          --batch_size=16 \
                          --gradient_accumulation_steps=1 \
                          --fp16

For training, if you don't know which pretrained model to use, omit the --model_name argument and the CLI will show you a list of suggested models.
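If you launch training from a script (for example, to sweep learning rates), building the argument list in Python avoids shell-quoting mistakes. `build_train_command` is a hypothetical wrapper, not part of Octopus, and it only emits the flags shown in the CLI example above. Leaving `model_name` as `None` omits `--model_name` so the CLI suggests models. The file names and the `bert-base-uncased` model below are illustrative choices only.

```python
import shlex
import sys


def build_train_command(task_name, train_file, validation_file,
                        output_dir="./", model_name=None, fp16=False,
                        **options):
    """Return an argv list for `run_cli.py train`.

    `options` holds the remaining value flags from the CLI example, e.g.
    lr, epochs, max_length, warmup_steps, weight_decay, batch_size,
    gradient_accumulation_steps.
    """
    cmd = [sys.executable, "run_cli.py", "train", f"--task_name={task_name}"]
    if model_name is not None:
        cmd.append(f"--model_name={model_name}")
    cmd += [
        f"--train_file={train_file}",
        f"--validation_file={validation_file}",
        f"--output_dir={output_dir}",
    ]
    for flag, value in options.items():
        cmd.append(f"--{flag}={value}")
    if fp16:
        cmd.append("--fp16")  # boolean switch, takes no value
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("sentence-classification",
                              "train.csv", "dev.csv",
                              model_name="bert-base-uncased",
                              lr=3e-5, epochs=3, batch_size=16, fp16=True)
    print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` from the repository root.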

🔥 To evaluate the model:

python3 run_cli.py evaluate --task_name=sentence-classification \
                             --model_name=<model_name> \
                             --eval_file=<path_to_eval_file> \
                             --max_length=128 \
                             --batch_size=32 \
                             --fp16

Sample Data Format

For sentence classification, the default column names are input,label for single sentences and input1,input2,label for sentence pairs. You can use arbitrary names for the one or two sentence columns instead, but the label column must be named label.
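The column rules above can be checked before training. This is a hypothetical validator, not Octopus code. It assumes that an `id` column, like the one in the JSON example, is ignored rather than treated as a sentence column.

```python
def detect_input_type(columns, ignored=("id",)):
    """Return "single" or "pair" for a valid header, else raise ValueError.

    Assumption: columns listed in `ignored` (e.g. an "id" field) are not
    sentence columns.
    """
    if "label" not in columns:
        raise ValueError("the label column must be named 'label'")
    text_columns = [c for c in columns if c != "label" and c not in ignored]
    if len(text_columns) == 1:
        return "single"
    if len(text_columns) == 2:
        return "pair"
    raise ValueError(f"expected 1 or 2 sentence columns, got {text_columns}")


print(detect_input_type(["input", "label"]))                # single
print(detect_input_type(["question", "passage", "label"]))  # pair
```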

CSV format

input,label
sentence1,label1
sentence2,label2
...
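A minimal sketch for producing a file in the CSV layout above with Python's `csv` module, which quotes any sentence containing a comma so it is not split into extra columns. `to_csv_text` and `write_csv` are hypothetical helper names. Pass `pair=True` for the `input1,input2,label` layout.

```python
import csv
import io


def to_csv_text(rows, pair=False):
    """Serialize (sentence, label) or (sentence1, sentence2, label) rows with a header."""
    header = ["input1", "input2", "label"] if pair else ["input", "label"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # QUOTE_MINIMAL: quotes only fields that need it
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def write_csv(path, rows, pair=False):
    # newline="" keeps the csv module's own line endings from being translated again
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(to_csv_text(rows, pair))


print(to_csv_text([("Good night 😊", "positive"), ("Hello, world", "negative")]))
```

For example, `write_csv("train.csv", rows)` writes a file you can pass as the training file.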

JSON format

{"id": 1, "input": "<sentence_1>", "label": "<label_1>"}
{"id": 2, "input": "<sentence_2>", "label": "<label_2>"}
...
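The JSON example above has one object per line, i.e. the JSON Lines layout. A minimal sketch for writing and reading such data with the standard `json` module; `to_jsonl` and `read_jsonl` are hypothetical helpers. `ensure_ascii=False` keeps emoji and non-Latin text readable instead of escaping it.

```python
import json


def to_jsonl(records):
    """One JSON object per line, each line terminated by a newline."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def read_jsonl(text):
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = [
    {"id": 1, "input": "Good night 😊", "label": "positive"},
    {"id": 2, "input": "Terrible service", "label": "negative"},
]
print(to_jsonl(records), end="")
```

If you load such files yourself, Hugging Face `datasets.load_dataset("json", data_files=...)` also reads JSON Lines. Whether Octopus loads data this way internally is an assumption based on its `datasets` dependency.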

Application Examples

Currently, we support the following applications:

Please refer to the specific application for more details.

Contact

If you want to get help or have any questions, don't hesitate to send an email to heraclex12@gmail.com
