🔤 👀 Seeing Language Through Character Level Taggers
datasets/ud-2.3
images
models
.gitignore
AsymBiLSTM.py
LICENSE
PDI.ipynb
README.md
analysis.py
analysis_utils.py
evaluate_morphotags.py
infer.py
model.py
test.py
utils.py

README.md

Character Eyes

Code for our project analyzing character level taggers. This repository is a work in progress but contains some of our code and analysis. More will be added soon!

Figure: example activations

Contents

  • model.py - A fully character-level tagger model, implemented in DyNet. It supports asymmetric bi-directional RNNs, whose effect on performance we found to depend on linguistic properties of the language.
  • Pretrained models for 5 of our 24 languages
  • Ready-to-train datasets (from Universal Dependencies 2.3) for all 24 languages
  • A notebook (PDI.ipynb is the only notebook in this repository) that reproduces some of the figures and charts in our paper.
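
To make the idea of an asymmetric bi-directional RNN concrete, here is a minimal sketch in plain Python. It is not the code in model.py: it swaps the LSTM for a simple tanh RNN, uses no DyNet, and every name in it (`make_rnn`, `run_rnn`, `encode_word`, the dimensions) is hypothetical. It assumes a word is represented by concatenating the final forward state with the final backward state, which is a common choice but is not confirmed by this README. The only point it illustrates is that the two directions can have different hidden sizes.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, rng):
    """Random weights for a simple tanh RNN (a stand-in for an LSTM cell)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_x": mat(hidden_dim, input_dim),   # input-to-hidden weights
        "W_h": mat(hidden_dim, hidden_dim),  # hidden-to-hidden weights
        "hidden_dim": hidden_dim,
    }


def run_rnn(rnn, inputs):
    """Run the RNN over a sequence and return the last hidden state."""
    h = [0.0] * rnn["hidden_dim"]
    for x in inputs:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(wx_row, x))
                + sum(w * hi for w, hi in zip(wh_row, h))
            )
            for wx_row, wh_row in zip(rnn["W_x"], rnn["W_h"])
        ]
    return h


def encode_word(word, fwd_rnn, bwd_rnn, char_embeddings):
    """Concatenate the final forward and final backward states for one word."""
    vectors = [char_embeddings[c] for c in word]
    fwd_final = run_rnn(fwd_rnn, vectors)        # reads left to right
    bwd_final = run_rnn(bwd_rnn, vectors[::-1])  # reads right to left
    return fwd_final + bwd_final


# Hypothetical sizes: 6 forward units and 2 backward units (asymmetric).
rng = random.Random(0)
EMB_DIM = 4
char_embeddings = {
    c: [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
    for c in "abcdefghijklmnopqrstuvwxyz"
}
fwd = make_rnn(EMB_DIM, 6, rng)
bwd = make_rnn(EMB_DIM, 2, rng)
vec = encode_word("tagger", fwd, bwd, char_embeddings)
print(len(vec))
```

Here the word vector has 6 + 2 = 8 dimensions. Shifting units between the two directions while holding the total fixed is one way to probe whether a language's tagging cues sit mostly at the start or the end of words.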

Coming Soon

  • Interactive Notebooks - play with character level representations on the fly!
  • Better dependency management (requirements.txt)
  • Storage size permitting, more pretrained models including asymmetric configurations

Much of the code is modified from Mimick, a character level system that can replace OOVs or UNKs with learned representations approximating a closed vocabulary set of word embeddings.

Citation format

When using our work, please use the following .bib entry:

@article{charactereyes,
  title={Character Eyes: Seeing Language through Character-Level Taggers},
  author={Pinter, Yuval and Marone, Marc and Eisenstein, Jacob},
  journal={arXiv preprint arXiv:1903.05041},
  year={2019}
}