Welcome to Documentation for MLL:
Meta Language Learning
Meta Language Learning (MLL) is a data-driven reading and instant messaging experience that uses memory assistance and analytics to track which words you know and don't know as you read through stories that you provide. You create the content, and we provide sophisticated, interactive assistance to help you navigate your stories as you learn a new language.
It currently supports English/Spanish, English/Chinese, Chinese/English, and Spanish/English, and the basic infrastructure is already in place for more languages in the future.
Motivation: (All languages)
Think back to when you were in grade school learning your native language. The teacher would assign you stories or novels to read to teach you the grammar and structure of the language. When you reached a new word that you didn't understand, did you whip out your smartphone? No, probably not. You made a "guess" about the word in context by looking at the words to the left and to the right. If that didn't work, you talked to a human (probably your parents, sibling or teacher). Only then if that didn't work would you go find a dictionary.
That's precisely how MLL is designed to work: you learn by reading, but the software tracks your vocabulary FOR you. Occasionally you use that same body of vocabulary to chat with real people in the same user interface. The real magic of the system, though, is that it emulates the reading experience you had when you were younger while allowing YOU to import content that you care about, at your reading level, instead of depending on someone else to choose that content for you.
The basic observation behind this learning system is very simple: the most well-spoken foreigners that we know all have one trait in common that really stands out:
They are all voracious readers - sometimes they even have better test scores than native speakers. It's spooky how well avid reading correlates to fluent speaking.
We start with character-based languages (like Chinese) in this introduction, and later show how the software applies equally to romanized languages like English. All languages benefit from our approach to learning by reading.
Meet the Enemy:
These are the 'characters' that represent Chinese Characters.
The phonetic problem:
汉字 (Hàn zì) != Phonetic Sounds
Now, to be fair, billions of people use character systems. In fact, if you ask a native speaker what their frustration was when they first learned English, they will tell you the complete opposite of the phonetic problem: romanized words don't have pictures associated with them!
But they are taking for granted that the romanization bootstraps their language learning by giving them (nearly) immediate access to the ability to pronounce what they see. =)
Ok, so, why is this a problem at all? Speakers of character-based languages don't seem to think it's a problem, right? Well, it's a problem of frequency:
What would a foreigner in China, for example, do on a daily basis?
- Reading philosophy texts? No.
- Studying for your law degree? Ummmm. No.
- Driving around the city reading street signs? (Maybe).
- Having casual dinners with VIPs and business men? Yeah, right.
- Arguing with the wife about what's for dinner? Yes! Yes!
So, in my daily language learning classes, I told the teacher, "I don't want to battle with the phonetic problem", and immediately her reaction was, "What? You want to be illiterate?"
Well, of course not. Actually, I want to be VERY literate, because as I foreshadowed above, the best foreign speakers are the ones that read the most literature (and the ones that watch the most TV shows, too =).
So, we need a tool - something to bridge the gap between the ability to vocalize language and the ability to read the vocalizations. Something that would immediately give us the same mental stimulus for Character-based languages that a well-spoken foreign English speaker would experience.
First, begin by copying and pasting a story using the 'Upload New Story' button in the "Account" menu at the top:
You have two choices:
- Upload a text file in native Characters (like a short story or even an e-book) in UTF-8 encoded format.
- Copy and paste a news article or email in native characters.
As you can see in the above example, I have copied and pasted the following characters and selected "Chinese to English". If you are learning English instead, you could similarly choose "English to Spanish":
Here is a similar example if you are learning English:
Since we assume the user is starting from scratch, MLL has never seen these characters before and will create a new story in your database from the uploaded text.
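The file upload option expects UTF-8 text, so it can save time to check a file before uploading it. The following is a minimal sketch of such a check; the helper name `is_utf8` is hypothetical and is not part of MLL.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same two characters, saved in two different encodings.
utf8_bytes = "汉字".encode("utf-8")
gb2312_bytes = "汉字".encode("gb2312")

print(is_utf8(utf8_bytes))    # True: safe to upload
print(is_utf8(gb2312_bytes))  # False: re-save the file as UTF-8 first
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper.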
Click on the Top Left corner of the page to expand the navigation section:
There are 4 sections in the left-hand pane:
- "Untranslated": These are stories that you have just uploaded but which have not yet been prepared.
- "Not reviewed": These are prepared stories which have not yet been reviewed by a native speaker to correct invalid conjugations or, in the case of character-based languages, polyphomes, which we will explain later.
- "Reading": These are reviewed stories which have passed up through the lower two sections and are ready for regular language learning and review.
- "Finished": Stories go here when you're finished with them.
You can begin by first clicking 'Translate' and waiting for your first translation.
Once the translation has completed, the story will appear in the "Not reviewed" section. Some large stories, particularly ones that are very dense or have many pages, can take a very long time to translate as MLL is doing a lot of work to properly group together characters into words and prepare the story for you. Periodically MLL has to go online to get a translation, or even reverse-engineer Pinyin-to-character pronunciations, so be patient. (There's a lot going on behind the scenes).
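To give a feel for what "grouping characters into words" involves, here is a minimal sketch of one common approach, greedy longest-match segmentation against a word list. This is an illustration only: the function name, the tiny lexicon, and the choice of algorithm are assumptions, and MLL's actual grouping logic may work differently.

```python
def segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Split text into words by always taking the longest lexicon match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical three-word lexicon, for illustration only.
lexicon = {"汉字", "时候", "穿着"}
print(segment("学汉字的时候", lexicon))
```

A greedy matcher like this can group characters that should stay apart, or leave apart characters that belong together, which is why a later manual correction step is useful.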
After you refresh the page, the story will move into the 'Not Reviewed' section like this, and will have a number of icons next to it to indicate the next steps you have to take:
First, you should go to 'Account' and then click on 'Review' to switch the current story to Review mode.
Reviewing before Learning (Review Mode)
Now we get to the good stuff: Review mode is the first of the three modes and is designed to give the language learner an opportunity to read a "sanitized" translated story.
This mode is used to build "clean" material for the learner to read. We want MLL to be able to present large amounts of meanings (and pinyin, if this is a character-based language) to the learner as fast as possible, as if the learner were reading a book of literature with the same experience as an English novel. To do that, we first need to review the text for any errors in meaning or tone.
With Character-based languages, not all words have a single meaning, and in particular with Chinese, not all characters have the same tone, and even words that have the same tone don't have the same meaning!
In "Review" mode, a new learner will want to get a "buddy" who is a native speaker to "review" their newly uploaded story in this mode:
The way review mode works is that non-blue words may or may not have more than one definition, sound, or tone. In a character-based language, these kinds of words are also called "Polyphomes". These are words that have been identified as not having a single meaning and/or sound per word: they are fuzzy, or they have the same sound but multiple definitions (for Chinese only).
As you can see from the "Legend" on the side, MLL classifies and analyzes words in different ways:
- Grey words only have a single sound and a single meaning. By clicking on them you will see the translation of the word produced when the story was processed. The reviewer need not spend too much time on these words, as they are NOT Polyphomes and their translations are already accurate. (If a translation is not correct, it is because the database used by MLL can sometimes be wrong as well; it is not impossible.)
- Green words only have a single sound but have different meanings. For character-based languages they are technically not polyphomes, but they are treated as fuzzy because they have different meanings. The language learner will not inadvertently learn the wrong sound (safe), but may inadvertently learn the wrong meaning. However, even if the wrong meaning is memorized without a proper review by a native speaker, the mistake will quickly become obvious in context, as the meaning will likely not make much sense.
- Red words may have several different sounds as well as meanings. These words need special attention from the reviewer so that the language learner does not memorize the wrong meaning or tone. Eventually, more advanced learners can perform these reviews themselves, because they will be able to select the correct choice by reading in context.
- Black words are words that have a "history" of being wrong, because they were previously reviewed and corrected somewhere across the entire story database. MLL performs sophisticated analytics to automatically recognize these words in new translations and assigns percentages to figure out which polyphomes have the highest chance of being correct. The new story does not have any black words yet because we have not performed any corrections; those come later.
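The legend's colours can be summarised as a small decision rule over a word's candidate readings. The sketch below is only an illustration of that rule: the `Reading` type, the `classify` function, and the sample meanings are hypothetical and are not taken from MLL's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sound: str    # pronunciation, e.g. pinyin with tone marks
    meaning: str  # translation into the learner's native language


def classify(readings: list[Reading], has_history: bool = False) -> str:
    """Map a word's candidate readings to a legend colour."""
    if has_history:
        return "black"  # corrected at least once in a past review
    sounds = {r.sound for r in readings}
    meanings = {r.meaning for r in readings}
    if len(sounds) > 1:
        return "red"    # several sounds: check both tone and meaning
    if len(meanings) > 1:
        return "green"  # one sound, several meanings
    return "grey"       # one sound, one meaning


li = [Reading("lǐ", "inside"), Reading("lǐ", "500 meters")]
zhi = [Reading("zhī", "measure word"), Reading("zhǐ", "only")]
print(classify([Reading("yī", "one")]))      # grey
print(classify(li))                          # green
print(classify(zhi))                         # red
print(classify(li, has_history=True))        # black
```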
In the below example, we have clicked on the word and pinyin for 'one': "yī" (一). This first exposes a small blue 'play' button. By clicking this button, the learner is challenged to first remember what the meaning is before revealing it. Clicking expands all occurrences of the character and shows the translation for that word.
Red, green, and black words, however, need to be reviewed for accuracy. By clicking on the word, the different possibilities for that character will appear. Let's start with a "green" word:
In the above example, we have clicked on the sound 'lǐ' (里), which usually means 'inside' of something. As you can see in the pop-up, the 'Default' selected polyphome is incorrect: it translates to 500 meters (half a kilometer; half of a unit of measure is very common in Chinese). Also notice that because it's green, all the sounds are exactly the same and differ only in their English meaning.
Checking this is a very important step in MLL, because later in 'Read' mode, MLL will keep track of exactly which words the learner has memorized and which words the learner has not memorized. So, if you don't correct the polyphomes, MLL will think you have memorized the wrong word, when in fact you have not.
So, by choosing the 'Select' button, we can correct MLL so that the correct word is 'inside':
Also pay attention to the "Percentages" on the left-hand side of the correct choice: This indicates that MLL has recorded how 'popular' this selection is and will re-use this information the next time a story is translated, so that the reviewer does not need to select it again in the future.
Upon refreshing the page, this character is now listed as 'black', indicating that it now has a history in MLL:
It also includes a "(1)" next to it as well as being listed on the right-hand side of the page indicating that this particular word has undergone that number of changes in the history of the database of stories for this user.
The next time a story containing this word is translated, MLL will change the default choice to the most popular one so that the reviewer can focus their time on harder-to-identify words.
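One way to picture the percentages and the "most popular" default is as a running tally of past selections per word. The following is a minimal sketch under that assumption; the `SelectionHistory` class and its method names are hypothetical, and MLL's real analytics may weigh selections differently.

```python
from collections import Counter


class SelectionHistory:
    """Tally of which meaning a reviewer picked for each word."""

    def __init__(self) -> None:
        self._counts: dict[str, Counter] = {}

    def record(self, word: str, meaning: str) -> None:
        """Remember one reviewer selection."""
        self._counts.setdefault(word, Counter())[meaning] += 1

    def percentages(self, word: str) -> dict[str, float]:
        """Share of past selections per meaning, as percentages."""
        counts = self._counts.get(word, Counter())
        total = sum(counts.values())
        if total == 0:
            return {}
        return {m: 100.0 * n / total for m, n in counts.items()}

    def default(self, word: str, fallback: str) -> str:
        """Most popular past selection, or the fallback if none exists."""
        counts = self._counts.get(word)
        if not counts:
            return fallback
        return counts.most_common(1)[0][0]


history = SelectionHistory()
for _ in range(3):
    history.record("里", "inside")
history.record("里", "500 meters")

print(history.percentages("里"))
print(history.default("里", fallback="500 meters"))
```

With three selections of 'inside' and one of '500 meters', 'inside' becomes the default for the next translated story.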
Finally, let's take a look at "red" words, as in the word 'zhī' (只):
This time, notice that MLL is telling us that there are multiple completely different sounds for the same word. In this case, there are both 1st-tone and 3rd-tone listings for the same character. MLL chose the correct choice by default, so there is no need to spend any more time on this word.
Unfortunately, if you haven't learned at least a little of your new language before starting the software, you'll have to ask your best friend, your significant other, or your co-worker who is a native speaker to help you out on your first few stories until you get fluent enough at context-matching to know when a choice is wrong or not by yourself.
When learning a romanized language, like English, the most difficult thing for MLL to get right the first time around is verb conjugation and pronouns. While we do our best to make initial guesses, we rely on you and your initial review of the content before using the story for reading-based learning. In the future, we'll use statistics and additional analytics to bear the load, but until then, here is a similar example in Review mode of an English story from before:
Let's first upload a new story to learn English as a native Spanish Speaker:
We then initialize:
And then click 'Review':
And we arrive at this result:
If you look off to the right again, you'll see how MLL classifies individual words:
- Grey words only have a single conjugation, depending on the correctness of the database MLL has at hand.
- Green words have different definitions or conjugations.
Let's focus on the word 'you'. It's clear that MLL has guessed wrong, as our Spanish database is essentially empty:
Above, we have the option to select the "correct" conjugation. This results in a '(1)' shown next to the word after we've chosen the correct one. Also notice the '100%' next to our choice. MLL will automatically analyze new stories that you import and use this selection as the default in the future.
Here's another example, while learning English instead as a native Chinese speaker:
The way review mode works in this case is that you would click on the word "Red", revealing both a noun and an adjective:
Similarly, for English, the word "Red" is not a noun in this case, but instead is an adjective. So by choosing the right meaning, we have the result:
The result is that we would have:
This choice for Red will automatically be chosen (based on percentages) in future imported content.
Here's where the real fun begins and the full benefit of MLL comes in to play:
Once you (or your friend) have completed an initial review of the story, you can go back and "promote" the story to the Reading section.
Let's see an example in a character based language, such as Chinese:
By clicking "Read" on that story, you will now see a complete view of the reviewed story:
In this mode, you will see a slightly different structure: All the original words and corrected meanings are black, but their translations are not automatically expanded by default. This is to challenge the learner to try to remember words in context before indicating to MLL which ones you have memorized. MLL explicitly does not translate entire sentences. That would defeat the purpose of getting you to learn in context, even if the individual word translations are not completely accurate.
By clicking the 'reveal' button icon at the top:
We can automatically reveal all the meanings. (Or you can just click the individual blue icons to reveal individual meanings, which is more common while using the software over time.)
What does this mean? If a word is automatically expanded upon first loading the page, it means that you have not yet memorized the word.
Let's start by interacting with MLL: Let's click on some of the translations to indicate that we have memorized a couple of the most commonly early-learned words:
We have clicked on the translations of the words "yī" (一), and 'zhī' (只):
MLL will process the request and then hide the translations from you, indicating that you have memorized those words. In both cases, MLL will memorize and record memorizations for all occurrences of those words.
The resulting page will look like this with the memorized words hidden:
These words will stay hidden by default the next time you come back to read this story in the future.
Furthermore, if you click the 'refresh' icon after reading for a while, you will notice that the right-hand side of the page has more useful tracking information:
This is kind of self-explanatory, but gives you some rewarding information about exactly what percentage of the words you have memorized.
And let's say that a week later you have forgotten a particular word: you can simply find the word in the list, click the 'X', and MLL will record that the word has been forgotten (or un-memorized).
Even if you forget the word, but don't want to remove it from the memorization list, you can just click on the 'black' pinyin readings of the words to expose the original translation without telling MLL to track anything additionally.
Furthermore: Just as in the 'Review' and 'Edit' modes, newly translated stories will automatically track which words you have memorized and which ones you have not memorized without you having to re-select them in the future. As the database grows with more stories, there's no need for you to make additional selections unless your memory changes.
This is where the real power comes from: you can now spend much finer granularities of time using the 'Reading' mode of the software, as well as having conversations (via Chat mode) with native speakers to help you navigate the stories you are reading while you read more and more complicated texts. The more you read, the more you are forced to speak in your mind, and the better you become!
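The bookkeeping behind Read mode (mark a word as memorized, forget it again, report a percentage) can be pictured with a small sketch. The `VocabularyTracker` class is hypothetical, and the percentage here is computed over the unique words of a story, which is an assumption; MLL may count differently.

```python
class VocabularyTracker:
    """Tracks which words a learner has marked as memorized."""

    def __init__(self) -> None:
        self.memorized: set[str] = set()

    def memorize(self, word: str) -> None:
        """Applies to every occurrence of the word, in every story."""
        self.memorized.add(word)

    def forget(self, word: str) -> None:
        """The 'X' action: the word's translation is shown again."""
        self.memorized.discard(word)

    def visible_translations(self, story_words: list[str]) -> list[str]:
        """Words whose translations stay expanded (not yet memorized)."""
        return [w for w in story_words if w not in self.memorized]

    def percent_memorized(self, story_words: list[str]) -> float:
        """Share of the story's unique words that are memorized."""
        unique = set(story_words)
        if not unique:
            return 0.0
        return 100.0 * len(unique & self.memorized) / len(unique)


story = ["一", "只", "猫", "一", "只", "狗"]
tracker = VocabularyTracker()
tracker.memorize("一")
tracker.memorize("只")
print(tracker.visible_translations(story))  # ['猫', '狗']
print(tracker.percent_memorized(story))     # 50.0
```

Because the tracker is keyed by word rather than by story, a newly imported story that reuses the same words starts with those translations already hidden.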
Let's look at an example in English as a native Spanish speaker would see it, using the same example "Roses are Red" above that we visited in Review mode:
So far, we haven't memorized anything yet. Let's start by expanding the word 'is':
Clicking the translation 'es' in Spanish makes it disappear, indicating that we've memorized it.
Let's expand and memorize a few more words at the end:
Which results in the following statistics in the side panel:
Let's again look at a similar example in English, but instead for a Chinese-native speaker:
Let's expand all of the words at the same time:
Let's click on the word "Violet":
Now, it is memorized:
Edit Mode: (Character-based only)
This mode is just as important as 'Review' mode: sometimes MLL improperly groups the wrong words together, for example in a different story:
This mode is not supported when learning a language like English, because words are single units of meaning. But in a character-based language, a single "word" can be composed of multiple picture-based characters. This mode will be disabled if your story is originally written in English.
In the above example, the characters (穿着) are incorrectly grouped together by MLL, and since they are grouped together they have a single, fixed meaning: 'Chuān zhuó', which translates as "attire" a [beautiful flower dress]. This is not correct, and instead the characters should be separated, as in: (穿 着), which instead means: "Chuān zhe", which translates as "wear a" [beautiful flower dress].
Without a native speaker reviewing the story, the language learner might clue in to the fact that 'attire' doesn't make much sense in the story. They could confirm this fact by first clicking on the characters themselves:
After clicking the characters, then you would click 'Instant' at the top of the page and wait for the online translation to complete:
MLL will then do an instant translation online by translating the individual characters in pieces and return an instant result:
As you can see, the individual characters for (穿着) actually translate roughly to: "wear a" [beautiful flower dress] as opposed to "attire" [beautiful flower dress].
Solution: To solve this problem, you would select only the individual characters for "attire" as they are currently listed and then click the "Split" button at the top of the page.
A dialog will appear confirming that you indeed want to split this word apart into its constituent characters:
Which then results in reloading the story with the correct elements:
A similar problem happens in MLL when an already-separated group of characters was not correctly merged into a single word, as in the following example:
In the above examples, the two characters at the end of the sentence "shí hòu" (时候) should in fact be merged into a single word which instantly translates as "when" (an extension of the word 'time'):
This can easily be fixed in edit mode by again selecting the individual words and then clicking the "Merge" icon button:
After confirming that you really want to merge them together:
The characters in the story will then read as:
(NOTE: A merge will not occur if MLL cannot find a definition for a merged group of words - the result is that the characters will continue to remain separated).
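The split and merge operations described above can be sketched as simple list edits over a segmented story. This is a minimal illustration that assumes a story is stored as a list of words; the function names and the `lexicon` set are hypothetical and do not reflect MLL's internals.

```python
def split_word(words: list[str], index: int) -> list[str]:
    """Replace the word at `index` with its individual characters."""
    return words[:index] + list(words[index]) + words[index + 1:]


def merge_words(words: list[str], start: int, count: int,
                lexicon: set[str]) -> list[str]:
    """Merge `count` adjacent words from `start` if the result is defined."""
    merged = "".join(words[start:start + count])
    if merged not in lexicon:
        # No definition for the merged group: leave the words separated.
        return list(words)
    return words[:start] + [merged] + words[start + count:]


# Splitting the wrongly grouped word from the example above.
print(split_word(["她", "穿着", "裙子"], 1))

# Merging succeeds only when the lexicon defines the merged word.
print(merge_words(["的", "时", "候"], 1, 2, lexicon={"时候"}))
print(merge_words(["的", "时", "候"], 1, 2, lexicon=set()))
```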
Finally, MLL is also smart enough to know the history of various merged and split words. By clicking the 'forget' button on the left hand side of the story and re-translating the story (or by translating another story that shares some common characters with the above example), we can see how MLL is learning our editing behavior as we use the software:
As you can see in the story and the legend, we now have the story presented as it was originally translated without the proper edits, but instead, MLL is showing us that the two previous words and characters are "suggested" to be merged and split.
All of these tools together are designed to make the reading experience as accurate as possible and as the language learner gets better and better, they will need to rely less and less on others to correct the story for them and more on themselves.
Notice there is a "Try Recommendations" in both Review Mode and Edit modes. This option will "Bulk" edit or "Bulk" review any words recommended by MLL that were found to need review or editing based on your previous usage history.
This is very useful in large books, like PDFs, where lots of content has a similar, highly repetitive review/edit history. MLL's analytics can automatically recognize when these kinds of changes need to be repeated.
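As a rough picture of how past edits could turn into recommendations, the sketch below scans a newly segmented story for words that were split before and for adjacent pairs that were merged before. The `recommend` function and the two history sets are hypothetical, and only two-word merges are considered to keep the example short.

```python
def recommend(words: list[str], split_history: set[str],
              merge_history: set[str]) -> list[tuple[str, int, str]]:
    """Suggest edits as (action, position, text) from past edit history."""
    suggestions = []
    for i, word in enumerate(words):
        if word in split_history:
            suggestions.append(("split", i, word))
    for i in range(len(words) - 1):
        pair = words[i] + words[i + 1]
        if pair in merge_history:
            suggestions.append(("merge", i, pair))
    return suggestions


story = ["她", "穿着", "花", "裙子", "的", "时", "候"]
for suggestion in recommend(story, split_history={"穿着"},
                            merge_history={"时候"}):
    print(suggestion)
```

A bulk action would then apply each suggestion in turn, working from the end of the story backwards so that earlier positions stay valid.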
Chat mode is one of the most exciting features to come out of the MLL system:
Chat in our environment is especially powerful, because now that we are maintaining essentially a model of your brain's vocabulary, we can take that same model and interactively present it back to you in real time simultaneously while you are chatting with peers. This is nothing less than a secret weapon aimed directly at the mundane nature of traditional ways of learning a language. Not only are you able to import content, but that content then forms the basis of assisting your chat experience.
Furthermore, all traditional chat histories become stories in themselves, which allows you to revisit them like prose or literature and continue learning without end.
Let's look at an example of how this works for a native English speaker trying to learn Chinese:
We've developed a custom IME that looks up your message on the fly and shows you not only the assisted input selections but also your own language-learning metadata. It's perhaps one of the most amazing realizations that we had when developing the system:
After making the selection and sending the message, we can continue to interact with it just like we were using reading mode of a story:
Similarly, we've just received a message from a buddy showing as a word that we've never learned before:
And we can expand and learn from that word interactively without ever leaving the comfort of the chat window.
Let's look at a chat message session for a native Spanish speaker learning English:
The IME (input management engine) is not just for character based languages, but assists us in realtime with meta-data from our own database without ever having to leave the comfort of the chat window.
In this message, we don't actually know the Spanish translation of the word 'me', so MLL shows it to us on the fly:
After sending the message, we can interact with individual words in the chat window just as we would with any story in the database.
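The idea of showing translations only for words you have not yet memorized can be sketched as a lookup over each word of a chat message. The `annotate` function, the tiny dictionary, and the memorized set are all hypothetical stand-ins for MLL's own database.

```python
from typing import Optional


def annotate(message_words: list[str], dictionary: dict[str, str],
             memorized: set[str]) -> list[tuple[str, Optional[str]]]:
    """Pair each word with a translation unless it is already memorized."""
    annotated = []
    for word in message_words:
        key = word.lower()
        if key in memorized or key not in dictionary:
            annotated.append((word, None))  # nothing to show
        else:
            annotated.append((word, dictionary[key]))
    return annotated


dictionary = {"roses": "rosas", "are": "son", "red": "rojo"}
print(annotate(["Roses", "are", "red"], dictionary, memorized={"are"}))
```

Here 'are' has been memorized, so only 'Roses' and 'red' carry a translation.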
The future of the MLL system is really limitless. This software could allow for users on the main website to be able to share content with one another, share edit/review history and publish stories they have learned back to the public.
We would eventually like to have a "marketplace", possibly with agreements with publishers, where new learners need not necessarily perform any edits or reviews whatsoever and can benefit from a social community that grows an ever-increasing pool of learning material provided by the users themselves, all driven by statistics and analytics, where the days of rote-memory education are all but dead.
One user might spend hours, weeks, or months getting very good at a language. We could then convert the analytics data tracked by MLL and apply it to a new learner.
Similarly, a book publishing company might be able to provide MLL-compatible versions of an e-book for an educational institution, giving both the institution and the public another tool to achieve self-paced language study.
The power of this interactive reading format as a way to learn a language can only grow...