
Lexonomy development plan (November 2017)

egon w. stemle edited this page Jul 8, 2019 · 3 revisions

Michal Boleslav Měchura, valselob@gmail.com
November 2017

0. Introduction

Lexonomy is a light-weight, web-based system for writing and publishing dictionaries. Following its successful (re-)launch at the eLex conference in September 2017, this document outlines in broad terms how Lexonomy will develop in the next year (2018) and beyond.

This document is divided into two halves. The first half lists the top-priority developments most frequently requested by current and prospective users; the second half lists other developments that are in the pipeline. Developments in the first half will likely be implemented soon (certainly by the end of 2018, and many of them sooner). Items in the second half are under consideration but are not the immediate focus of attention.

Lexonomy's development in the next couple of years will continue to be sponsored by Lexical Computing (the company that makes Sketch Engine, a popular corpus query system) and by funding from ELEXIS, a project funded by the European Union.

During this time, Lexonomy will continue to be open-source software, with source code available from a GitHub repository and licensed under the MIT License, which allows unrestricted re-use, even for commercial purposes. This means that anybody in the world can download and set up their own local installation of Lexonomy and customize it to their requirements.

Lexonomy will continue to have a special relationship with Sketch Engine, a corpus query system developed by Lexical Computing. Both Lexonomy and Sketch Engine will have features for "talking" to each other. Together, Sketch Engine and Lexonomy will support lexicographers along the entire pipeline of producing a dictionary, from corpus to screen. In particular, Sketch Engine and Lexonomy will work together to establish a new kind of workflow in lexicography: the paradigm of "One-Click Dictionary", where dictionaries are pre-generated automatically from a corpus (using Sketch Engine) and then post-edited in a dictionary writing system (using Lexonomy).

1. Top-priority developments

Hosting

Lexonomy's "home" installation at www.lexonomy.eu will continue to be free: anyone in the world can create an account and start creating and publishing dictionaries. The only restrictions are:

  1. Dictionaries published on www.lexonomy.eu must be under one of the following open-source licenses: CC0, CC-BY, CC-BY-SA or ODbL.

  2. The service is provided on an as-is basis. While every effort will be made to keep this free service available at all times and forever, we offer no guarantees of server uptime or long-term availability.

The website www.lexonomy.eu will be hosted by Lexical Computing (the company that makes Sketch Engine). People and organizations who do not want to be subject to the restrictions above will be asked to purchase a commercial dictionary hosting service from Lexical Computing. Lexical Computing will provide a dictionary hosting service (using Lexonomy) as a companion to their corpus hosting service (using Sketch Engine). The details of this, including the prices, will be worked out soon, in the course of 2018.

HTTPS

The "home" installation www.lexonomy.eu will be accessible with the HTTPS protocol (with automatic redirection from HTTP).

Automated new user registration

New users will no longer have to ask politely by e-mail to set up an account.

Help and manual

Lexonomy will come with a proper manual. Various relevant sections of the manual will be accessible through a "help" button everywhere in the user interface.

Push and pull

The features for "pushing" and "pulling" lexicographic data between a Sketch Engine corpus and a Lexonomy dictionary will be developed further and will eventually cover:

  • example sentences (optionally sorted by GDEX)
  • collocations (with or without example sentences)
  • definitions and/or descriptions
  • automatically clustered word senses
  • translation candidates
  • thesaurus items
  • images

Post-editing

To support the new paradigm of "One-Click Dictionary", Lexonomy will develop some dedicated features for post-editing an automatically pre-generated dictionary: features for quickly splitting and lumping senses, for distributing example sentences into senses, and so on.

Media files

Users will be able to upload images and other media files (sound files, videos) into a dictionary and link to them from XML elements and attributes in entries. When formatting entries for display, Lexonomy will make sure the media files are presented appropriately (images are displayed, sound files and videos are playable).

Working with XML fragments

In Lexonomy's XML editor, you will be able to duplicate sections of the XML and copy/paste them around the XML document.

In addition, there will be a "lay-by area" beside the entry where you can temporarily store entry fragments, such as example sentences that you have not yet moved into a sense.

External links

Users will be able to include external links (= internet URLs with an optional caption) in entries. Lexonomy will make sure they are clickable when the entry is formatted for display.

Cross-references

The current version of Lexonomy already has a primitive mechanism for handling cross-references: some XML elements can be formatted as "clickable". More sophistication is needed in this area, however.

In the future, users will be able to include cross-references from one entry to another entry or to a location in another entry (such as a specific sense inside a dictionary entry). Lexonomy will make sure the cross-references are clickable when the entry is formatted for display. Also, Lexonomy will keep track of which entries cross-reference which, will make sure that cross-references are never broken, and (if the dictionary is so configured) will make sure that cross-references are reciprocated: if X cross-references Y, then Y must also cross-reference X.
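
As a rough illustration of the bookkeeping this requires, the following Python sketch checks a dictionary's cross-references for the two problems mentioned above: links to entries that no longer exist, and links that are not reciprocated. It assumes a hypothetical data model in which each entry ID maps to the set of entry IDs it cross-references; it ignores links to locations inside entries and is not how Lexonomy actually stores cross-references.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def check_cross_references(xrefs: Dict[str, Set[str]]) -> Tuple[List[Pair], List[Pair]]:
    """Return (broken, unreciprocated) cross-reference pairs.

    Hypothetical model: `xrefs` maps each entry ID to the set of entry IDs
    that the entry cross-references.
    """
    broken: List[Pair] = []
    unreciprocated: List[Pair] = []
    for source, targets in sorted(xrefs.items()):
        for target in sorted(targets):
            if target not in xrefs:
                # The target entry does not exist (any more): the link is broken.
                broken.append((source, target))
            elif source not in xrefs[target]:
                # X points to Y, but Y does not point back to X.
                unreciprocated.append((source, target))
    return broken, unreciprocated
```

For example, with `{"colour": {"color", "hue"}, "color": {"colour"}, "hue": set(), "shade": {"tint"}}` the link from "shade" to the missing entry "tint" is reported as broken, and the link from "colour" to "hue" as unreciprocated. A check like this could run whenever an entry is saved or deleted.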

Customized access rights

It will be possible for a dictionary administrator to limit a user's access to only a subset of entries, or to only some locations in an entry. For example, some users may be allowed only to add translations to entries, without being able to change anything else.

Workflow features

It will be possible to assign entries to users, to mark the status of entries (new, in progress, finished etc.), to group entries into batches, and so on.

High-level view of a dictionary

There will be a dashboard with statistics on the dictionary, both static (how many entries there are, their relative sizes, their alphabetical distribution, their workflow status etc.) and dynamic (what has recently happened to the dictionary, such as recent changes).

Entry history

Internally, Lexonomy keeps track of who saved what and when, and has a complete history for each entry. Based on this, Lexonomy will offer features for viewing an entry's history and for restoring previous versions.

XML upload

Lexonomy's current XML upload feature has some problems. It fails on large files and seems to have problems with files that were not produced by Lexonomy itself. We will rethink and fix this.

"Hackability"

Even though part of Lexonomy's mission is not to expect its users to have any knowledge of coding, Lexonomy will be enriched with a few "some coding required" features which advanced "coding-aware" users will be able to use to customize Lexonomy and to integrate Lexonomy into their own internal toolsets. In particular, the following will be developed:

  • A feature to upload your own XML schema and/or to induce a schema from an example XML file.
  • A feature to upload your own stylesheet in XSL/CSS.
  • An API for integrating Lexonomy into a workflow, e.g. for reading entries that have been changed recently, doing something with them and then saving them back in.
  • Customized entry editors: dictionary administrators will be able to hand-code and upload a customized entry editor to replace Lexonomy's default XML editor.

2. Developments in the pipeline

Undo

While editing an entry, Lexonomy will have an undo button, like you'd find in a typical word processor.

Search and filtering

Lexonomy needs richer features for finding entries based on different criteria, including:

  • Finding entries that have or do not have a specific XML element or attribute, a specific number of a given element or attribute, a specific value inside a given element or attribute, and so on.
  • Finding entries that have validation errors (according to the XML schema).
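
The structural criteria in the first bullet can be expressed as predicates over an entry's XML tree. The following sketch uses Python's standard `xml.etree.ElementTree` module and its limited XPath support; the entry structure (`entry`, `pos`, `sense`, `example`, a `status` attribute) and all helper names are hypothetical. Checking entries against an XML schema (the second bullet) would need a validating parser and is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A predicate receives the root element of one entry and says whether it matches.
Predicate = Callable[[ET.Element], bool]


def filter_entries(entries: Dict[str, str], predicate: Predicate) -> List[str]:
    """Return the IDs of entries whose XML satisfies the predicate."""
    return [entry_id for entry_id, xml in entries.items()
            if predicate(ET.fromstring(xml))]


def has_element(path: str) -> Predicate:
    """Entry contains at least one element matching the path."""
    return lambda root: root.find(path) is not None


def lacks_element(path: str) -> Predicate:
    """Entry contains no element matching the path."""
    return lambda root: root.find(path) is None


def count_at_least(path: str, n: int) -> Predicate:
    """Entry contains n or more elements matching the path."""
    return lambda root: len(root.findall(path)) >= n


def has_value(path: str, value: str) -> Predicate:
    """Some element matching the path has exactly this text content."""
    return lambda root: any(el.text == value for el in root.findall(path))


def has_attribute(path: str, attribute: str, value: str) -> Predicate:
    """Some element matching the path carries attribute=value ("." is the root)."""
    return lambda root: any(el.get(attribute) == value for el in root.findall(path))
```

For example, `filter_entries(entries, count_at_least("sense", 2))` would list every entry with two or more top-level senses, and `has_element(".//example")` matches entries with an example sentence anywhere inside them.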

Entry templates

It will be possible to set up a number of entry templates in a dictionary. When the user starts a new entry, they will have the option of basing it on a template, as well as starting from a completely blank entry as they do now.

Motivation: Templates are often used in dictionary writing systems to encourage structural consistency across multiple entries that belong together, such as colour terms or country names.

Project templates

When creating a new dictionary, Lexonomy gives you a choice of several templates (as well as a completely blank one). All these templates are very simple, which makes them suitable for learning Lexonomy but not for real work. I will develop a few more realistic templates suitable for industrial-strength applications, replicating (subsets of) schemas from well-known standards such as TEI and LMF.

Entry locking

Lexonomy will not use any form of entry locking (= blocking other users from opening/saving an entry if someone else currently has it open). Instead, Lexonomy will go for a light-weight approach: it will inform the user if it looks like the entry he or she is editing is currently open by another user, or if the entry has been saved by another user in the meantime.
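
One common way to implement this kind of lock-free warning is optimistic concurrency control: each entry carries a version number, the editor remembers the version it started from, and a save is flagged if the stored version has moved on in the meantime. The Python sketch below illustrates this with a hypothetical in-memory store (`EntryStore`, `Entry`); this plan does not say how Lexonomy will actually detect concurrent edits.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Entry:
    """Hypothetical stored entry; not Lexonomy's actual data model."""
    xml: str
    version: int = 0                  # incremented on every successful save
    saved_by: Optional[str] = None    # user who made the latest save
    open_by: Set[str] = field(default_factory=set)  # users who have it open


class EntryStore:
    """In-memory store that warns about concurrent edits but never locks."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def open(self, entry_id: str, user: str) -> int:
        """Note that `user` is editing; return the version they start from."""
        entry = self.entries[entry_id]
        entry.open_by.add(user)
        return entry.version

    def close(self, entry_id: str, user: str) -> None:
        self.entries[entry_id].open_by.discard(user)

    def others_editing(self, entry_id: str, user: str) -> Set[str]:
        """Other users who currently have this entry open."""
        return self.entries[entry_id].open_by - {user}

    def save(self, entry_id: str, user: str, xml: str, base_version: int,
             force: bool = False) -> Optional[str]:
        """Save the entry and return None, or return a warning without saving.

        The warning is returned when someone saved the entry after
        `base_version` was read; passing force=True overwrites anyway.
        """
        entry = self.entries[entry_id]
        if entry.version != base_version and not force:
            return f"'{entry_id}' was saved by {entry.saved_by} in the meantime"
        entry.xml = xml
        entry.version += 1
        entry.saved_by = user
        return None
```

Nobody is ever blocked from opening or saving: if Anna and Ben both open an entry and Ben saves first, Anna's save returns a warning, and she can then decide to overwrite with `force=True`.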

Subentry sharing

A dictionary administrator will be able to configure a dictionary such that some parts of an entry will be 'shareable' between several entries. This will ensure that, for example, phraseological subentries are able to appear under more than one headword. Note: This will implement a suggestion from my Raslan paper http://www.lexiconista.com/raslan2016.pdf

Dictionary pairing

A dictionary administrator will be able to set up a mapping between two dictionaries. Based on this mapping, Lexonomy will guide users to make sure the dictionaries are synchronized. Note: This will implement a suggestion from my Raslan paper http://www.lexiconista.com/raslan2016.pdf

Autosave in some configs

Every screen in the configuration section (as well as the One-Click Dictionary API key screen) requires you to click the Save button to save your changes. In some cases this is unintuitive and people keep forgetting to do it. Some of these screens will therefore save changes automatically.

Optionally publish only a subset of entries

In the current version of Lexonomy, when you decide to publish your dictionary, you have no option but to publish all entries in the dictionary. In the next version you will optionally be able to select only a subset of entries for publishing, for example only entries that have a specific status.

Rich-text formatting in blurb

When publishing a dictionary in Lexonomy, the current version allows you to supply a 'blurb': a short description which appears on the dictionary's homepage. The blurb is in plain text. Lexonomy will allow rich-text formatting in the blurb: bold, italics, paragraph breaks, clickable links etc (in other words: Markdown).

Optional 'about' pages

In addition to a blurb, the next version of Lexonomy will allow you to add one or more publicly visible 'about' pages to your published dictionary (in other words: a wiki with Markdown).

Upload your own logo

You will be able to upload your own logo and/or a background graphic when publishing a dictionary.

Public interface to be viewable if logged in

Currently, a dictionary's public interface is only viewable if the dictionary has been declared "public". This applies to both anonymous web users and logged-in dictionary editors.

It might be a good idea to make it possible for logged-in dictionary editors to view the public interface even before the dictionary has been declared "public".

User interface localization

It will be possible to translate Lexonomy's user interface into languages besides English.

Delete many dictionaries at the same time

For users who have very many dictionaries (which can happen when you experiment a lot with the push feature from Sketch Engine), a feature is needed to select and quickly destroy many of them at the same time.

QA rules

Make it possible for dictionary administrators to specify QA rules that go beyond the entry schema, for example: If an entry has more than one sense then each sense must have at least one example sentence.
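
The example rule above is easy to express as a small function over the entry's XML. The sketch below assumes a hypothetical entry structure in which `sense` elements are direct children of the root and example sentences are `example` elements inside each sense; the rule registry (`QA_RULES`, `run_qa`) is likewise only an illustration of how administrator-defined rules might be collected and run.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A QA rule takes the root of an entry and returns a list of problem descriptions.
Rule = Callable[[ET.Element], List[str]]


def examples_in_every_sense(root: ET.Element) -> List[str]:
    """If an entry has more than one sense, each sense needs an example sentence."""
    senses = root.findall("sense")
    if len(senses) <= 1:
        return []  # the rule only applies to entries with several senses
    return [f"sense {i} has no example sentence"
            for i, sense in enumerate(senses, start=1)
            if sense.find("example") is None]


# Hypothetical registry of rules configured by a dictionary administrator.
QA_RULES: Dict[str, Rule] = {
    "examples in every sense": examples_in_every_sense,
}


def run_qa(entry_xml: str) -> Dict[str, List[str]]:
    """Run every configured rule; return only the rules that report problems."""
    root = ET.fromstring(entry_xml)
    report: Dict[str, List[str]] = {}
    for name, rule in QA_RULES.items():
        problems = rule(root)
        if problems:
            report[name] = problems
    return report
```

Unlike a schema, which only constrains which elements may appear where, a rule like this can express conditions that depend on counts and combinations of elements.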

Defining vocabularies

Features to support the use of defining vocabularies in definitions etc.: something like a spellchecker, but with the vocabulary coming from a custom dataset and/or the dictionary itself.
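
At its simplest, such a check works like a spellchecker with a custom word list: every word in a definition is looked up in the defining vocabulary, and words that are not found are flagged. The Python sketch below assumes the vocabulary is a plain set of lowercase words (which might come from a custom dataset or from the dictionary's own headwords). Unlike a realistic implementation it does no lemmatization, so an inflected form such as "cats" would be flagged even if "cat" is in the vocabulary.

```python
import re
from typing import Iterable, List, Set

# Letters only (Unicode-aware), optionally with an internal apostrophe: "dog", "don't".
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def build_vocabulary(words: Iterable[str]) -> Set[str]:
    """Normalize a word list (e.g. a custom dataset or headword list) to lowercase."""
    return {w.lower() for w in words}


def outside_vocabulary(definition: str, vocabulary: Set[str]) -> List[str]:
    """Return the words of a definition that the defining vocabulary lacks."""
    return [w for w in WORD.findall(definition) if w.lower() not in vocabulary]
```

For example, with a vocabulary of "a", "small", "animal" and "that", the definition "A small domesticated animal that purrs." would have "domesticated" and "purrs" flagged.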

Make the push/pull APIs public

This will allow other people to build other things to push to and pull from, e.g. to pull example sentences from your own database of citations (rather than from a corpus).