efficient_training

code for the paper efficient training for historical texts

After cloning the repository change into the subdirectory scripts. There you can

run the NER script from Huggingface on the commandline:

python run_ner.py config-bert.json

The Bert configuration should run out of the box (and the distilbert vanilla) because the pretrained model is downloaded. The others have to be installed by hand.

run the crossvalidation script, you have to modify crossval.json to your wishes. The most important aspect is the model, you want to train. Then just run:

python crossvalidation.py crossval.json

models

electra-small

Trained from scratch

Data: 8GB raw text, 19.200.200 training examples (DOC-SENTENCE style, see Roberta Paper)
Batch size: 64
Device: RTX 2080ti
Training steps: 1.600.000
Training time: 14d

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.ipynb_checkpoints		.ipynb_checkpoints
data		data
models		models
scripts		scripts
.gitignore		.gitignore
Ensemble.ipynb		Ensemble.ipynb
README.md		README.md
dodo.py		dodo.py
test.csv		test.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

efficient_training

models

electra-small

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

efficient_training

models

electra-small

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages