Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020

Models and predictions for our submission to TRAC 2020, the Second Workshop on Trolling, Aggression and Cyberbullying.

Our trained models, as well as the evaluation metrics recorded during training, are available at https://databank.illinois.edu/datasets/IDB-8882752. We also make a few of our models available in Hugging Face's model repository at https://huggingface.co/socialmediaie/; these models can be further fine-tuned on your dataset of choice, as sketched below.
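As a concrete example, here is a minimal fine-tuning sketch using the transformers Trainer API. The checkpoint name follows the naming pattern of our released models; the toy texts, labels, and training arguments are illustrative placeholders, not our actual training setup.

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# One of the released checkpoints (Sub-task B has two labels: GEN, NGEN)
model_name = "socialmediaie/TRAC2020_ALL_B_bert-base-multilingual-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Toy data for illustration only; replace with your own texts and label ids
texts = ["example comment one", "example comment two"]
labels = [0, 1]

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned_model", num_train_epochs=1),
    train_dataset=ToyDataset(texts, labels),
)
trainer.train()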

Our approach is described in our paper titled:

Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. "Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020." In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020).

The source code for training this model and more details can be found on our code repository: https://github.com/socialmediaie/TRAC2020

NOTE: These models were retrained for upload after our submission, so the evaluation measures may differ slightly from those reported in the paper.

Video presentation and slides

[Model overview figure]

Slides are available as PDF and PPTX (links in the repository).

Citation

If you plan to use the dataset, please cite the following resources:

  • Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. "Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020." In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020).
  • Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. "Trained Models for Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020." University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8882752_V1.
@inproceedings{Mishra2020TRAC,
  author = {Mishra, Sudhanshu and Prasad, Shivangi and Mishra, Shubhanshu},
  booktitle = {Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020)},
  title = {{Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020}},
  year = {2020}
}

@data{illinoisdatabankIDB-8882752,
  author = {Mishra, Sudhanshu and Prasad, Shivangi and Mishra, Shubhanshu},
  doi = {10.13012/B2IDB-8882752_V1},
  publisher = {University of Illinois at Urbana-Champaign},
  title = {{Trained models for Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020}},
  url = {https://doi.org/10.13012/B2IDB-8882752{\_}V1},
  year = {2020}
}

Usage

The models can be used via the code below.

A notebook with this code is included in the repository as an .ipynb file and can be opened directly in Google Colab.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from pathlib import Path
from scipy.special import softmax
import numpy as np
import pandas as pd

# Label sets for each TRAC 2020 sub-task:
# OAG = Overtly Aggressive, NAG = Non-Aggressive, CAG = Covertly Aggressive
# GEN = Gendered, NGEN = Non-Gendered
TASK_LABEL_IDS = {
    "Sub-task A": ["OAG", "NAG", "CAG"],
    "Sub-task B": ["GEN", "NGEN"],
    "Sub-task C": ["OAG-GEN", "OAG-NGEN", "NAG-GEN", "NAG-NGEN", "CAG-GEN", "CAG-NGEN"]
}

model_version = "databank"  # the other option is "huggingface", which loads from the Hugging Face model hub
if model_version == "databank":
    # Make sure you have downloaded the required model file from https://databank.illinois.edu/datasets/IDB-8882752
    # and unzipped it at some model_path (we are using: "databank_model").
    # This assumes the following type of structure inside "databank_model":
    # 'databank_model/ALL/Sub-task C/output/bert-base-multilingual-uncased/model'
    model_path = next(Path("databank_model").glob("*/*/output/*/model"))
    _, lang, task, _, base_model, _ = model_path.parts
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
else:
    lang, task, base_model = "ALL", "Sub-task C", "bert-base-multilingual-uncased"
    base_model = f"socialmediaie/TRAC2020_{lang}_{task.split()[-1]}_{base_model}"
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(base_model)

# For inference, set the model to eval mode
model.eval()
# If you want to further fine-tune the model, switch back with model.train()

task_labels = TASK_LABEL_IDS[task]

sentence = "This is a good cat and this is a bad dog."
# Prepend the CLS token manually, since convert_tokens_to_ids does not add special tokens
processed_sentence = f"{tokenizer.cls_token} {sentence}"
tokens = tokenizer.tokenize(processed_sentence)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokens)
tokens_tensor = torch.tensor([indexed_tokens])

with torch.no_grad():
    outputs = model(tokens_tensor)
# Indexing with [0] returns the logits for both older tuple outputs
# and newer ModelOutput objects across transformers versions
logits = outputs[0]


preds = logits.detach().cpu().numpy()
preds_probs = softmax(preds, axis=1)
preds = np.argmax(preds_probs, axis=1)
preds_labels = np.array(task_labels)[preds]
print(dict(zip(task_labels, preds_probs[0])), preds_labels)
"""You should get an output as follows:

({'CAG-GEN': 0.06762535,
  'CAG-NGEN': 0.03244293,
  'NAG-GEN': 0.6897794,
  'NAG-NGEN': 0.15498641,
  'OAG-GEN': 0.034373745,
  'OAG-NGEN': 0.020792078},
 array(['NAG-GEN'], dtype='<U8'))

"""

Visualize models in exbert

You can find the exbert link on each model's Hugging Face page. E.g., if you want to visualize socialmediaie/TRAC2020_ALL_C_bert-base-multilingual-uncased, you can use the exbert link shown on that model's page.
