kargaranamir/Hengam

Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging

Code (Software)

HengamTagger

HengamTagger is distributed as the Parstdex package (which will be continuously updated) and can be installed via pip.

HengamTransformer

You can use this model directly by downloading the utils and requirements files and installing the requirements:

! wget https://huggingface.co/spaces/kargaranamir/Hengam/raw/main/utils.py
! wget https://huggingface.co/spaces/kargaranamir/Hengam/raw/main/requirements.txt
! pip install -r requirements.txt

Then download one of the models, HengamTransA.pth or HengamTransW.pth, and build the NER pipeline:

import torch
from huggingface_hub import hf_hub_download
from utils import *  # provides the NER class (utils.py downloaded above)

# Download one of the two checkpoints; this example uses HengamTransA.
# HengamTransW = hf_hub_download(repo_id="kargaranamir/Hengam", filename="HengamTransW.pth")
HengamTransA = hf_hub_download(repo_id="kargaranamir/Hengam", filename="HengamTransA.pth")

# Build the NER pipeline with BIO tags for the TIM and DAT classes.
# ner = NER(model_path=HengamTransW, tags=['B-TIM', 'I-TIM', 'B-DAT', 'I-DAT', 'O'])
ner = NER(model_path=HengamTransA, tags=['B-TIM', 'I-TIM', 'B-DAT', 'I-DAT', 'O'])

# "Hello, my friend and I went to Doshanbe Bazaar (lit. 'Monday Bazaar') at 8 a.m. on Tuesday."
ner('.سلام من و دوستم ساعت ۸ صبح روز سه شنبه رفتیم دوشنبه بازار ')
# Output:
# [{'Text': 'ساعت', 'Tag': 'B-TIM', 'Start': 17, 'End': 21},
#  {'Text': '۸', 'Tag': 'I-TIM', 'Start': 22, 'End': 23},
#  {'Text': 'صبح', 'Tag': 'I-TIM', 'Start': 24, 'End': 27},
#  {'Text': 'روز', 'Tag': 'I-TIM', 'Start': 28, 'End': 31},
#  {'Text': 'سه', 'Tag': 'B-DAT', 'Start': 32, 'End': 34},
#  {'Text': 'شنبه', 'Tag': 'I-DAT', 'Start': 35, 'End': 39}]
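The pipeline returns one dictionary per tagged token, using BIO tags (`B-` opens an expression, `I-` continues it) and character offsets; in the example above, `sentence[Start:End]` equals `Text` for every token. To recover whole expressions such as `ساعت ۸ صبح روز`, consecutive tokens can be merged. The helper below is a hypothetical sketch, not part of `utils.py`; it assumes the output format shown above and treats an `O` token, if one is returned, as ending the current span.

```python
from typing import Dict, List


def merge_bio_spans(sentence: str, tokens: List[Dict]) -> List[Dict]:
    """Hypothetical helper: merge BIO-tagged tokens into entity spans.

    Assumes each token dict has 'Text', 'Tag', 'Start', 'End' keys, where
    Start/End are character offsets into `sentence` (End exclusive).
    """
    spans: List[Dict] = []
    current = None  # span currently being extended, or None
    for tok in tokens:
        # Sanity check: the offsets should slice out exactly the token text.
        if sentence[tok["Start"]:tok["End"]] != tok["Text"]:
            raise ValueError(f"offsets do not match token text: {tok}")
        if tok["Tag"] == "O":
            current = None  # an 'O' token closes any open span
            continue
        prefix, _, label = tok["Tag"].partition("-")
        if prefix == "I" and current is not None and current["Type"] == label:
            current["End"] = tok["End"]  # extend the open span
        else:
            # A 'B-' tag, or a stray 'I-' tag with no matching open span.
            current = {"Type": label, "Start": tok["Start"], "End": tok["End"]}
            spans.append(current)
    # Fill in the surface text of each merged span from the sentence.
    for span in spans:
        span["Text"] = sentence[span["Start"]:span["End"]]
    return spans


# Usage with the example sentence and output from this README.
sentence = '.سلام من و دوستم ساعت ۸ صبح روز سه شنبه رفتیم دوشنبه بازار '
tokens = [
    {'Text': 'ساعت', 'Tag': 'B-TIM', 'Start': 17, 'End': 21},
    {'Text': '۸', 'Tag': 'I-TIM', 'Start': 22, 'End': 23},
    {'Text': 'صبح', 'Tag': 'I-TIM', 'Start': 24, 'End': 27},
    {'Text': 'روز', 'Tag': 'I-TIM', 'Start': 28, 'End': 31},
    {'Text': 'سه', 'Tag': 'B-DAT', 'Start': 32, 'End': 34},
    {'Text': 'شنبه', 'Tag': 'I-DAT', 'Start': 35, 'End': 39},
]
spans = merge_bio_spans(sentence, tokens)
for span in spans:
    print(span["Type"], span["Start"], span["End"], span["Text"])
# TIM 17 31 ساعت ۸ صبح روز
# DAT 32 39 سه شنبه
```

Because the example output omits `O` tokens, this sketch relies on the tags alone to decide where a span ends; if the real pipeline can skip tokens between two `I-` tags, an extra adjacency check on the offsets would be needed.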

This repository also provides four notebooks for training and loading the model. Click on the hyperlinks to open each one in Google Colab.

Data

Train Data

The HengamCorpus data, along with the strongly labeled data, is available on Hugging Face. Click on the hyperlinks to download them.

Evaluation Data

HengamGold, the challenge set, and the prediction results of different models on HengamGold are available in the evaluation folder.

Citation

If you use any part of this repository in your research, please cite it using the following BibTeX entry.

@inproceedings{mirzababaei-etal-2022-hengam,
	title        = {Hengam: An Adversarially Trained Transformer for {P}ersian Temporal Tagging},
	author       = {Mirzababaei, Sajad  and Kargaran, Amir Hossein  and Sch{\"u}tze, Hinrich  and Asgari, Ehsaneddin},
	year         = 2022,
	booktitle    = {Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing},
	publisher    = {Association for Computational Linguistics},
	address      = {Online only},
	pages        = {1013--1024},
	url          = {https://aclanthology.org/2022.aacl-main.74}
}