garfieldnate/syuwa-nihongo-daijiten-extractor

Data extractor for 手話・日本語大辞典 (JSL Dictionary)

This repo contains scripts related to extracting the treasure trove of data from this Japanese Sign Language dictionary. The data I would like to extract are: Japanese glosses, hand shapes, locations and movements, a prose description of the sign, and the illustration.

What does the data look like?

Obviously I can't upload this information or the PDFs I'm working with, but you'll find sample pages under sample_data. The dictionary has three types of entries:

  • one-handed signs
  • two-handed signs with the same shape for both hands
  • two-handed signs with different shapes for each hand.
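
As a rough illustration of how an extracted entry might be represented, here is a minimal Python sketch. It is not the format used by the scripts in this repo: the `SignEntry` class, its field names, and the `EntryType` enum are all hypothetical, chosen only to mirror the fields listed above and the three entry types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EntryType(Enum):
    """The three entry types described above (names are hypothetical)."""
    ONE_HANDED = "one-handed"
    TWO_HANDED_SAME = "two-handed, same shape"
    TWO_HANDED_DIFFERENT = "two-handed, different shapes"


@dataclass
class SignEntry:
    """One dictionary entry; every field name here is an assumption."""
    glosses: List[str]                   # Japanese glosses for the sign
    dominant_shape: str                  # hand shape label for the first hand
    location: str                        # where the sign is made
    movement: str                        # how the hand(s) move
    description: str                     # prose description of the sign
    other_shape: Optional[str] = None    # second hand's shape; None if one-handed
    illustration_path: Optional[str] = None  # path to the extracted image, if any

    @property
    def entry_type(self) -> EntryType:
        # Derive the entry type from the hand shapes rather than storing it,
        # so the two can never disagree.
        if self.other_shape is None:
            return EntryType.ONE_HANDED
        if self.other_shape == self.dominant_shape:
            return EntryType.TWO_HANDED_SAME
        return EntryType.TWO_HANDED_DIFFERENT
```

Deriving `entry_type` from the two shape fields is just one possible design; storing the type explicitly would work equally well if the dictionary marks it directly on the page.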

What do you plan to do with the data?

I want to play around with newer NLP technologies and multi-modal learning. Some random ideas I have:

  • (simplest) generate illustrations from movement descriptions or vice versa
  • collect hand/face/body position data and train a model to convert the dictionary data into the control info for an avatar
  • figure out a writing system and automatically generate spellings
