Visualizer for genres based on neural network language models
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
templates
LICENSE
README.md
dsm_genres.cfg
dsm_genres.py
favicon.ico
images_cache.csv
models.csv
plots.py
run_explorer.py
sparql.py
tagger.py
tau.py
word2vec_server_genres.py

README.md

Register Explorer

Language registers explorer powered by word embeddings.

This is the framework behind the web service http://ltr.uio.no/embeddings/registers/ accompanying the following paper:

A. Kutuzov, A.Marakasova, and E. Kuzmenko. Exploration of register-dependent lexical semantics using word embeddings. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pp. 26-34. COLING 2016, Osaka, Japan.

https://www.clarin-d.net/images/lt4dh/pdf/LT4DH05.pdf

This source code can be easily adapted to any set of distributional models.

Installation

  1. Clone the repository
  2. Tune the config file dsm_genres.cfg according to your setup (especially root and url parameters)
  3. Put your models into the models subdirectory (models can be either in gzipped binary word2vec format or in Gensim format)
  4. Describe your models in the models.csv file; the model with all identifier will serve as the reference one
  5. NB: we presuppose that your models use words augmented with PoS tags ('boot_VERB'); if they don't, you'll have to tune the code a bit
  6. Download Stanford Core NLP suite (https://stanfordnlp.github.io/CoreNLP/); we use it for linguistic processing of the queries
  7. Run Core NLP in the background (something like java -mx2g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer --port 9999)
  8. Run the script word2vec_server_genres.py in the background; it loads the models, stays in memory and answers queries from the web interface
  9. Install Gunicorn (http://gunicorn.org/)
  10. Run the service with something like gunicorn run_explorer:app_explorer -b 127.0.0.1:10000
  11. If your url parameter was set to "/mymodels/", your service should now be available at http://127.0.0.1:10000/mymodels/

In case you have any questions about the code, feel free to ask us.

Our contacts can be found at http://ltr.uio.no/embeddings/registers/about