JapaneseGraph

JapaneseGraph is a tool for language learners. It represents the Japanese language as a graph in which individual characters are the nodes and words are the edges. As a concrete example, 反 and 応 would each be a node, and 反応 would be the edge that connects them.
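
As a rough illustration of that structure, here is a minimal Python sketch that builds such a graph from a word list. The function name `build_graph` and the sample words are assumptions for illustration only; the app itself runs in the browser and may store its graph differently.

```python
from collections import defaultdict


def build_graph(words):
    """Connect adjacent characters of each word; label each edge with the words that form it."""
    graph = defaultdict(lambda: defaultdict(set))
    for word in words:
        for left, right in zip(word, word[1:]):
            if left != right:  # skip self-loops from doubled characters
                graph[left][right].add(word)
                graph[right][left].add(word)
    return graph


# Illustrative word list, not the tool's real data.
graph = build_graph(["反応", "反対", "対応"])
print(graph["反"]["応"])  # {'反応'}
```

Storing the words on the edge (rather than only a flag) keeps the node-to-word lookup cheap, which is what a click on an edge would need.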

This branch is for Japanese. The main branch, and the repo name (HanziGraph), come from the Chinese version. The two codebases are being consolidated, as the functionality is nearly identical, with the data being the main differentiator.

Demo

japanese-graph-demo.mov

Features

The nodes and edges have data associated with them. Specifically, each node and edge has:

  • Usage frequency data, shown as color coding (red: very frequent; blue: less frequent). Color coding by JLPT level can be used instead of color coding by frequency.
  • Definitions, from JMDict.
  • Human-generated example sentences from Tatoeba, sorted by average word frequency.
  • For words not present in Tatoeba's corpus, AI-generated examples may be used in the future (note that AI examples are already present on the Chinese version).
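
One way to read "sorted by average word frequency" is sketched below in Python. It assumes sentences are already tokenized and that frequency is given as a rank (1 = most common), so sentences made of common words sort first. The names `average_rank` and `sort_examples` and the rank values are hypothetical; the tool's actual ordering logic may differ.

```python
def average_rank(tokens, rank, unknown_rank):
    """Mean frequency rank of a sentence's tokens; unseen tokens get the worst rank."""
    if not tokens:
        return unknown_rank
    return sum(rank.get(token, unknown_rank) for token in tokens) / len(tokens)


def sort_examples(examples, rank):
    """Order tokenized sentences so those built from common words come first."""
    unknown_rank = len(rank) + 1
    return sorted(examples, key=lambda tokens: average_rank(tokens, rank, unknown_rank))


# Illustrative ranks and sentences, not real frequency data.
rank = {"私": 1, "は": 2, "です": 3, "学生": 4, "遅い": 5, "反応": 6}
examples = [["反応", "は", "遅い"], ["私", "は", "学生", "です"]]
for tokens in sort_examples(examples, rank):
    print("".join(tokens))
```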

Word Relationships

In addition to character relationships expressed through a graph structure, the tool uses collocation data to show how words relate to one another. It expresses those relationships with Sankey diagrams. These diagrams can also be thought of as a graph, where each node is a word and each edge is a collocation, with the edge weight representing frequency of use. One example would be 舞踏 commonly being followed by 会. In this case, 舞踏 and 会 are nodes, and the weight of their connecting edge represents the frequency of the collocation 舞踏会.
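
A minimal Python sketch of how such weighted edges could be derived is shown here. It assumes pre-tokenized sentences and counts adjacent word pairs; the `source`/`target`/`value` link shape is a convention many Sankey charting libraries accept, not something this README specifies. The function name and sample sentences are hypothetical.

```python
from collections import Counter


def collocation_links(sentences):
    """Count adjacent word pairs and return them as weighted links, heaviest first."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return [
        {"source": first, "target": second, "value": count}
        for (first, second), count in counts.most_common()
    ]


# Illustrative tokenized sentences, not the tool's collocation data.
sentences = [
    ["舞踏", "会", "に", "行く"],
    ["舞踏", "会", "を", "開く"],
    ["舞踏", "家"],
]
links = collocation_links(sentences)
print(links[0])  # {'source': '舞踏', 'target': '会', 'value': 2}
```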

Recommending Characters

Based on which nodes and edges a user visits, the tool can recommend related characters. Recommendations favor characters connected to those the user has already seen, with more common characters prioritized. For example, if a user has viewed 変, 化, and 学, 大 might be recommended, since it is connected to what the learner has already seen (via 大変, 変化, and 大学) and is very common.
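
The idea can be sketched in Python as follows, using the words from the example above. This is an assumption about one reasonable scoring rule (count connections to visited characters, break ties by frequency rank), not the tool's actual algorithm; `recommend`, the extra words, and the rank values are hypothetical.

```python
from collections import defaultdict


def recommend(words, visited, rank, limit=3):
    """Suggest unvisited characters adjacent to visited ones, most connected and most common first."""
    neighbors = defaultdict(set)
    for word in words:
        for left, right in zip(word, word[1:]):
            if left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)

    scores = {}
    for character in visited:
        for candidate in neighbors[character]:
            if candidate not in visited:
                scores[candidate] = scores.get(candidate, 0) + 1

    # More connections first; ties go to the lower (more common) frequency rank.
    ordered = sorted(scores, key=lambda c: (-scores[c], rank.get(c, float("inf"))))
    return ordered[:limit]


# Illustrative data: ranks are made up, with 1 meaning most common.
words = ["大変", "変化", "大学", "化学", "学校"]
rank = {"大": 1, "学": 2, "化": 3, "変": 4, "校": 5}
print(recommend(words, {"変", "化", "学"}, rank))  # ['大', '校']
```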

Flashcards

Flashcards can be created from the definitions and example sentences, and either studied in the tool or exported to Anki. The flashcards test both recognition (translating from Japanese to English) and recall (translating from English to Japanese); cloze (fill-in-the-blank) cards are also made. Because a new word is best studied in several contexts, up to 10 cards are made for a single word or character.
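
As a hedged sketch of the three card types, the Python below turns example sentence pairs into recognition, recall, and cloze cards and caps the total at 10. The card fields, the `make_cards` name, and the way the cap is applied are assumptions for illustration; the tool's own card format and Anki export are not described here.

```python
def make_cards(word, examples, limit=10):
    """Build recognition, recall, and cloze cards from (Japanese, English) sentence pairs."""
    cards = []
    for japanese, english in examples:
        # Recognition: read Japanese, produce English.
        cards.append({"type": "recognition", "front": japanese, "back": english})
        # Recall: read English, produce Japanese.
        cards.append({"type": "recall", "front": english, "back": japanese})
        # Cloze: blank out the studied word inside its sentence.
        if word in japanese:
            cards.append({"type": "cloze", "front": japanese.replace(word, "____"), "back": word})
    return cards[:limit]


# Illustrative sentence pair, not taken from the tool's data.
cards = make_cards("反応", [("彼の反応は早かった。", "His reaction was quick.")])
for card in cards:
    print(card["type"], "|", card["front"], "|", card["back"])
```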

As seen on...

You can also see the Reddit discussion or the discussion on Tofugu (it also made it onto their curated list for summer 2022). It was also recommended by The Japan Foundation, Sydney.

The Chinese version was recommended on the You Can Learn Chinese podcast and on HackingChinese.

Running the code

Running the main branch code is intended to be extremely simple. There is no backend; the entire app runs in-browser. Setup is therefore as simple as:

git clone https://github.com/mreichhoff/HanziGraph.git
cd HanziGraph
# Assuming python is installed, though any basic web server would do; it's just viewing files.
python3 -m http.server

after which you can use the app in your browser, e.g. at localhost:8000.

Note that some of the larger data files are partitioned to avoid excessive memory use or network bandwidth (while also avoiding huge numbers of files).
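
To illustrate the trade-off, here is a minimal Python sketch of one common partitioning approach: hashing each word into a fixed number of buckets, so a lookup loads one modest file instead of the whole dataset. This is an assumption about how partitioning could work, not a description of this repo's scheme; the function names, the bucket count, and the use of CRC32 are all hypothetical.

```python
import zlib


def partition_for(word, partitions=100):
    """Map a word to a stable bucket number, so the same word always lands in the same file."""
    return zlib.crc32(word.encode("utf-8")) % partitions


def partition_entries(entries, partitions=100):
    """Group a word -> data mapping into buckets keyed by partition number."""
    buckets = {}
    for word, data in entries.items():
        buckets.setdefault(partition_for(word, partitions), {})[word] = data
    return buckets


# Illustrative entries; each bucket could be written out as one data file.
entries = {"反応": ["example 1"], "大学": ["example 2"], "変化": ["example 3"]}
buckets = partition_entries(entries, partitions=4)
print(sorted(buckets))
```

A fixed bucket count bounds the number of files, while the per-bucket size shrinks as the count grows.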

To run the code in this branch, you can modify the firebase-init.js file with your own firebase project config, run rollup -c rollup.config.ts, and then start the firebase emulators.

Project Status

The webapp is still a prototype, but it is functional and can be installed as a PWA or used on the web.

Upcoming work includes paying down the (rather substantial) technical debt in the code, moving to Lit, and further consolidating the supported datasets (Japanese, Mandarin (simplified and traditional), and Cantonese).

Acknowledgements

The examples came from Tatoeba, which releases data under CC-BY 2.0 FR, and from OpenSubtitles, pulled from opus.nlpl.eu.

Definitions were pulled from JMDict; links to their license terms are available on that page.

See the main branch README for more details.