This repository contains a Unicode version of the current state of the Tibetan manuscripts from Dunhuang. It draws on two main sources:
- Old Tibetan Documents Online
- work done by Brandon Dotson for his research on Tibetan Kingship
The tags in the text have been removed and the safest option has been taken, but the text still contains mistakes and unknown blocks.
The aim is to be able to treat the Dunhuang texts as a digital corpus that can be analyzed. The project was initiated (and the material provided) by Nathan Hill for this purpose.
This work is released under the CC0 license, roughly equivalent to the public domain.
The text still contains some archaic forms that make it difficult to analyze automatically, but are straightforward to "update":
- འི འོ འང འམ are often (not always) separated from the syllable by a tsheg (་)
- some reversed gigus appear instead of normal gigus
- some anusvaras appear instead of the མ suffix
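
The three "updates" listed above could be sketched in Python roughly as follows. This is only an illustration and not code shipped with this repository: the function name `normalize` and the exact rules are assumptions. In particular, it treats every reversed gigu (U+0F80) and every anusvara (U+0F7E) as an archaic spelling, and it only attaches འི འོ འང འམ when the particle is a complete syllable on its own (followed by a tsheg, a shad, whitespace, or the end of the text), so that syllables such as འོད are left alone.

```python
import re

TSHEG = "\u0f0b"          # ་ syllable delimiter
SHAD = "\u0f0d"           # ། clause delimiter
GIGU = "\u0f72"           # ི vowel sign i
REVERSED_GIGU = "\u0f80"  # ྀ vowel sign reversed i
ANUSVARA = "\u0f7e"       # ཾ rjes su nga ro
MA = "\u0f58"             # མ
ACHUNG = "\u0f60"         # འ

# A tsheg followed by one of འི འོ འང འམ, where the particle is a whole
# syllable: the lookahead requires a tsheg, shad, whitespace, or end of text.
DETACHED_PARTICLE = re.compile(
    TSHEG
    + "(" + ACHUNG + "[" + GIGU + "\u0f7c\u0f44" + MA + "])"
    + "(?=[" + TSHEG + SHAD + r"\s]|$)"
)


def normalize(text: str) -> str:
    """Apply the three hypothetical 'update' rules to a string of Tibetan text."""
    # Assumption: every reversed gigu stands for a normal gigu.
    text = text.replace(REVERSED_GIGU, GIGU)
    # Assumption: every anusvara stands for a མ suffix.
    text = text.replace(ANUSVARA, MA)
    # Drop the tsheg that separates the particle from the preceding syllable.
    return DETACHED_PARTICLE.sub(r"\1", text)
```

Calling `normalize` on པ་འི་ would give པའི་, while a string containing ་འོད་ is returned unchanged. Any such rewriting is best checked against the source readings before it is applied to the whole corpus.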