This repo contains scripts related to extracting the treasure trove of data in this Japanese Sign Language dictionary. The data I would like to extract: Japanese glosses, hand shapes, locations, movements, a prose description of the sign, and the illustration.
Obviously I can't upload this information or the PDFs I'm working with, but you'll find sample pages under sample_data. The dictionary has three types of entries:
- one-handed signs
- two-handed signs with the same shape for both hands
- two-handed signs with different shapes for each hand.
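The three entry types above could be captured in a single schema. Here's a minimal sketch in Python; all class and field names (`HandSpec`, `Entry`, `dominant`, `non_dominant`, etc.) are my own placeholders, not anything from the dictionary itself:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HandSpec:
    """Shape, location, and movement for a single hand (placeholder fields)."""
    shape: str
    location: str
    movement: str

@dataclass
class Entry:
    """One dictionary entry. `non_dominant` is None for one-handed signs,
    and has the same `shape` as `dominant` for symmetric two-handed signs."""
    glosses: list[str]
    dominant: HandSpec
    non_dominant: Optional[HandSpec]
    description: str
    illustration_path: Optional[str] = None

    @property
    def entry_type(self) -> str:
        # Derive the entry type from which hand specs are present.
        if self.non_dominant is None:
            return "one-handed"
        if self.non_dominant.shape == self.dominant.shape:
            return "two-handed-same-shape"
        return "two-handed-different-shape"
```

Deriving the entry type from the hand specs (rather than storing it) keeps the two facts from drifting apart; e.g. `Entry(["会う"], HandSpec("index", "chest", "forward"), None, "...").entry_type` gives `"one-handed"`.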
I want to play around with newer NLP technologies and multi-modal learning. Some random ideas I have:
- (simplest) generate illustrations from movement descriptions, or vice versa
- collect hand/face/body position data and train a model to convert the dictionary data into control signals for an avatar
- figure out a writing system and automatically generate spellings