RIGA at SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition)

This repository contains the RIGA team's submission to MultiCoNER II.

Getting started

Create a new virtual environment and activate it
```
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
```
Install dependencies
```
pip install -r requirements.txt
```
Now your environment is ready. The next step is to get the data from the MultiCoNER download page. Put the data in the data directory.

Convert the data using the parse_conll.py script

python parse_conll.py --source_path {specify a path to the dataset in CoNLL format}
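
For orientation, here is a minimal sketch of how CoNLL-style data can be read. It is not the code in parse_conll.py. It assumes one token per line with whitespace-separated columns (token first, tag last), a blank line between sentences, and optional comment lines starting with `# id`. The function name `read_conll` and the sample sentences are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-style lines (assumed layout, see above)."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        # A blank line or an id comment closes the sentence collected so far.
        if not line or line.startswith("# id"):
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        sentence.append((columns[0], columns[-1]))
    if sentence:  # the last sentence may not be followed by a blank line
        yield sentence


# Illustrative sample, not taken from the dataset.
sample = """# id 0001\tdomain=en
eddie _ _ B-Artist
vedder _ _ I-Artist
sings _ _ O

riga _ _ B-HumanSettlement
"""
sentences = list(read_conll(sample.splitlines()))
```

Here `sentences` holds two lists of (token, tag) pairs, one per sentence.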

Start gathering context using the get_context.py script. You'll need to specify your own API key and the dataset split to use. The TODO comments in the file will guide you.
In the previous step, each context is saved separately, which makes the results easier to navigate and avoids querying the same sentences again if an error occurs.
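
The idea of saving each context separately can be sketched as follows. This is an illustration only, not the logic of get_context.py: the function `collect_contexts`, the one-JSON-file-per-sentence layout, and the `fetch` callable that stands in for the real API call are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def collect_contexts(sentences: Dict[str, str], out_dir: Path,
                     fetch: Callable[[str], str]) -> int:
    """Save one file per sentence id and return the number of fetch calls made."""
    out_dir.mkdir(parents=True, exist_ok=True)
    calls = 0
    for sent_id, text in sentences.items():
        target = out_dir / f"{sent_id}.json"
        if target.exists():
            continue  # already collected in an earlier run, do not query again
        context = fetch(text)  # if this raises, files written so far are kept
        calls += 1
        target.write_text(
            json.dumps({"id": sent_id, "context": context}), encoding="utf-8"
        )
    return calls


def fake_fetch(text: str) -> str:
    """Stand-in for the real API request (hypothetical)."""
    return f"context for: {text}"


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "context" / "train"
    data = {"s1": "eddie vedder sings", "s2": "riga is a city"}
    first_run = collect_contexts(data, out, fake_fetch)   # queries both sentences
    second_run = collect_contexts(data, out, fake_fetch)  # finds both on disk
```

Rerunning after a failure then only queries the sentences that have no file yet.
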
Next, merge all of the collected contexts into a single file using the merge_context.py script. You'll also need to change the dataset split and rerun the script to merge the contexts for each of the train, dev, and test datasets.
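
A merge of this kind might look like the sketch below. It is not the code in merge_context.py and rests on assumptions: each context is stored as a JSON file with `id` and `context` fields, and the names `merge_contexts`, `context_dir`, and `merged_path` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def merge_contexts(context_dir: Path, merged_path: Path) -> int:
    """Combine all per-sentence JSON files into one id -> context mapping."""
    merged = {}
    for path in sorted(context_dir.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        merged[record["id"]] = record["context"]
    merged_path.write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(merged)


with tempfile.TemporaryDirectory() as tmp:
    split_dir = Path(tmp) / "context" / "dev"  # one directory per dataset split
    split_dir.mkdir(parents=True)
    for sent_id, context in [("s1", "a singer"), ("s2", "a city")]:
        (split_dir / f"{sent_id}.json").write_text(
            json.dumps({"id": sent_id, "context": context}), encoding="utf-8"
        )
    merged_file = Path(tmp) / "context_dev.json"
    count = merge_contexts(split_dir, merged_file)
    merged = json.loads(merged_file.read_text(encoding="utf-8"))
```

Calling the function once per split directory yields one merged file per split.
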
The last step is fine-tuning the NER model. Run python train.py --help to list all available arguments. During the competition we mainly used either the distilbert-base-uncased (66M parameters) or the xlm-roberta-large (558M parameters) model.
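
Both models split words into subword tokens, so word-level NER tags usually have to be aligned to tokens before fine-tuning. The sketch below shows one common way to do this. It is an assumption about the general technique, not a description of train.py. The `word_ids` list has the shape returned by `word_ids()` on a Hugging Face fast tokenizer encoding, and `align_labels` is a hypothetical helper.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Label the first subword of each word and mask everything else."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special token such as [CLS] or <s>
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)  # continuation subword
        previous = word_id
    return aligned


# Three words; the second one is split into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 2, 0]  # illustrative label ids, one per word
aligned = align_labels(word_ids, word_labels)
```

Masked positions do not contribute to the loss, so each word is scored once regardless of how many subwords it has.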

emukans/multiconer2-riga
