Object structure example
verbly has a rather complicated object model. Understanding the object model is key to being able to query for data effectively. To aid in understanding why the object model is set up the way it is, this article steps through an example of some objects in the model, how they're related, and why those relationships are useful.
The point of verbly is to be able to manipulate concepts that can be expressed using words. For instance, "a course that is traveled". We know that this can be expressed with the word "route". So, we create the "word" object to represent this instantiation of a concept.
One issue that arises is that we know there is another word for this concept, i.e. "path". This relationship is called synonymy. We could have a many-to-many relationship between words in order to represent synonymy, but it is easier to create a new object called "notion", which represents something that can be expressed with words. So, both "route" and "path" are words belonging to the notion "a course that is traveled".
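To make the relationship concrete, here is a minimal sketch of a notion that owns its words. The class names `Notion` and `Word` and the `synonyms` method are hypothetical illustrations for this article, not verbly's actual API, and the word carries its own text only because the later objects have not been introduced yet.

```python
class Notion:
    """Something that can be expressed with words (hypothetical class)."""

    def __init__(self, gloss):
        self.gloss = gloss
        self.words = []


class Word:
    """One way of expressing a notion (hypothetical class)."""

    def __init__(self, text, notion):
        self.text = text
        self.notion = notion
        notion.words.append(self)

    def synonyms(self):
        # Synonyms are simply the other words attached to the same notion,
        # so no word-to-word relationship has to be stored.
        return [w for w in self.notion.words if w is not self]


course = Notion("a course that is traveled")
route = Word("route", course)
path = Word("path", course)

print([w.text for w in route.synonyms()])  # ['path']
```

Because synonymy falls out of shared membership in a notion, adding a third synonym only means attaching one more word, rather than linking it to every existing one.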
The next issue is that words need to be able to be inflected. The noun "route" also has a plural form, "routes". We could put a field on the word object for each possible type of inflection and store the text of that inflection there, but there are a couple of problems with that. First, it makes querying for words harder. One of the major functions verbly provides is the ability to query words, so it is important that this is easy to do. If you wanted to find a word that is spelled "routes", you would have to look for words that have a base form of "routes" OR a plural form of "routes" OR a match in any of the other inflection fields.
We could solve this by creating an "inflection" object that belongs to the "word" object and contains fields for the text of the inflection and the type of inflection. This makes querying for words by text easier, but doesn't completely solve the problem. Consider the notion of "a regular itinerary". This notion contains a word with the singular form "route" and the plural form "routes", as in a bus route. This word belongs to a different notion than the first word we described, but it is inflected in exactly the same way. This is a form of homography, and it would require duplicate "inflection" objects that share the same text.
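The duplication can be seen in a small sketch of this rejected design. The `Inflection` record and the string labels used to identify the owning words are hypothetical, chosen only to illustrate the problem.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Inflection:
    """An inflection owned by exactly one word (hypothetical record)."""
    word: str  # label of the owning word, for illustration only
    kind: str  # type of inflection, e.g. "base" or "plural"
    text: str  # spelling of the inflection


inflections = [
    Inflection("route (a course that is traveled)", "base", "route"),
    Inflection("route (a course that is traveled)", "plural", "routes"),
    Inflection("route (a regular itinerary)", "base", "route"),
    Inflection("route (a regular itinerary)", "plural", "routes"),
]


def find_by_text(text):
    # Querying by spelling is now a single comparison per record.
    return [i for i in inflections if i.text == text]


# The same (kind, text) pair is stored once per word that uses it.
copies = Counter((i.kind, i.text) for i in inflections)
print(copies[("plural", "routes")])  # 2
```

The query is simple, but every homograph adds another copy of the same spelling.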
The way verbly approaches this is by forgetting the "inflection" object, and creating two different objects: "lemma" and "form". Let's start with "form". A "form" is a literal collection of characters, such as "route" or "routes". There is exactly one "form" for every collection of characters that is part of a word. Thus, it doesn't matter that "route" can mean both "a course that is traveled" and "a regular itinerary". They both use the same "form".
How do they use the "form"? Via the "lemma" object. A "lemma" describes how to inflect a word. Every "word" has a "lemma", and multiple "word"s can have the same "lemma", as in the two "route"s we described earlier. A "lemma" also has a many-to-many relationship with "form", with an inflection type attached to each link. For instance, the two "route" nouns share a "lemma" that has a base form relationship with the "route" form and a plural form relationship with the "routes" form.
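The following sketch shows one way these three objects could fit together. All names here (`Form`, `Lemma`, `Word`, `get_form`, `words_spelled`) are hypothetical and are not taken from verbly's API; for brevity, each word stores the description of its notion directly instead of pointing at a separate notion object.

```python
from collections import defaultdict


class Form:
    """A literal sequence of characters, e.g. "route" or "routes"."""

    def __init__(self, text):
        self.text = text
        self.lemmas = []  # (lemma, inflection type) pairs using this form


class Lemma:
    """Describes how a word is inflected."""

    def __init__(self):
        self.forms = defaultdict(list)  # inflection type -> list of forms
        self.words = []

    def add_form(self, inflection, form):
        # Record the link on both sides of the many-to-many relationship.
        self.forms[inflection].append(form)
        form.lemmas.append((self, inflection))


class Word:
    def __init__(self, gloss, lemma):
        self.gloss = gloss
        self.lemma = lemma
        lemma.words.append(self)


forms = {}


def get_form(text):
    """Return the single shared Form for this text, creating it if needed."""
    if text not in forms:
        forms[text] = Form(text)
    return forms[text]


def words_spelled(text):
    """Find every (word, inflection type) that can be spelled as `text`."""
    form = forms.get(text)
    if form is None:
        return []
    return [(word, inflection)
            for lemma, inflection in form.lemmas
            for word in lemma.words]


noun_route = Lemma()
noun_route.add_form("base", get_form("route"))
noun_route.add_form("plural", get_form("routes"))
course = Word("a course that is traveled", noun_route)
itinerary = Word("a regular itinerary", noun_route)

for word, inflection in words_spelled("routes"):
    print(inflection, "of the word meaning:", word.gloss)
```

Looking up "routes" touches a single form and reaches both words through their shared lemma, with no duplicated spelling anywhere.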
This object model provides many advantages in addition to those described already:
- There is a second form of homography where different words share a form but are inflected differently. Consider the notion "to plan a course that is traveled". There is a word for that notion with the base form "route". However, the rest of the lemma is different, because this is a verb and does not have a plural inflection, but instead has a simple present, a present participle, a simple past, and a past participle. This is a different "lemma" from the first one we described, but it is related to the same "route" form from earlier.
- It is possible for two different inflections of a lemma to have the same form. Consider the verb lemma from the previous item. The simple past inflection and the past participle inflection of that lemma are both spelled "routed". They are joined to the same form, but are distinct inflections.
- It is possible for there to be two different ways to inflect a lemma. There isn't a "route"-related example for this, so consider the adjective word with the base form "little" meaning "inferior in size". The comparative inflection of this word can be spelled "less" or "lesser". These are two different forms, but they are joined to the lemma with the same inflection.
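All three cases can be read off the lemma-to-form relationship when it is laid out as rows of (lemma, inflection type, form). The sketch below uses made-up lemma labels and plain tuples rather than verbly's real storage, and it lists only the inflections that are spelled out in the text above.

```python
# Each row joins one lemma to one form under one inflection type.
lemma_forms = [
    ("noun lemma", "base", "route"),
    ("noun lemma", "plural", "routes"),
    ("verb lemma", "base", "route"),
    ("verb lemma", "simple past", "routed"),
    ("verb lemma", "past participle", "routed"),
    ("adjective lemma", "comparative", "less"),
    ("adjective lemma", "comparative", "lesser"),
]


def lemmas_for_form(text):
    """Lemmas that use this form under any inflection type."""
    return sorted({lemma for lemma, _, form in lemma_forms if form == text})


def inflections_of(lemma, text):
    """Inflection types under which this lemma uses this form."""
    return [kind for l, kind, form in lemma_forms
            if l == lemma and form == text]


def forms_for(lemma, kind):
    """Forms joined to this lemma under one inflection type."""
    return [form for l, k, form in lemma_forms if l == lemma and k == kind]


# Different lemmas sharing one form:
print(lemmas_for_form("route"))               # ['noun lemma', 'verb lemma']
# Two inflections of one lemma sharing one form:
print(inflections_of("verb lemma", "routed")) # ['simple past', 'past participle']
# One inflection of one lemma with two forms:
print(forms_for("adjective lemma", "comparative"))  # ['less', 'lesser']
```

None of these cases needs special handling, because each is just a different pattern of rows in the same relationship.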
The next issue concerns pronunciations. If there were only one way to pronounce a form, it would be simple to put the pronunciation information into the "form" object. However, speaker variation prevents this from being so. For example, the form "route" can be pronounced as both "root" and "r-ow-t". To handle this, we create a "pronunciation" object that has a many-to-many relationship with "form". The reason that it is many-to-many, as opposed to "form" simply having many "pronunciation"s, is that homophony exists. Consider the form "rout", which is the base form of a word for the notion "to defeat completely". It is pronounced "r-ow-t", which is also a pronunciation of the form "route".
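A minimal sketch of this many-to-many relationship follows. The pronunciation strings are the informal spellings used in this article rather than a real phonetic encoding, and the helper names are hypothetical.

```python
# Each pair links one form to one pronunciation.
form_pronunciations = {
    ("route", "root"),
    ("route", "r-ow-t"),
    ("rout", "r-ow-t"),
}


def pronunciations_of(form):
    """All pronunciations linked to a form."""
    return sorted(p for f, p in form_pronunciations if f == form)


def homophones_of(form):
    """Other forms that share at least one pronunciation with `form`."""
    sounds = set(pronunciations_of(form))
    return sorted({f for f, p in form_pronunciations
                   if p in sounds and f != form})


print(pronunciations_of("route"))  # ['r-ow-t', 'root']
print(homophones_of("rout"))       # ['route']
```

If each form owned its pronunciations outright, "r-ow-t" would have to be stored twice and the homophone lookup would need to compare pronunciation text instead of following a shared link.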
This is not an exhaustive list of the relationships that verbly objects can have. For more detailed information, check out the object structure document.
Here is a diagram showing the relationships between the objects discussed in this example: