A multilingual parallel corpus created from translations of the Bible.

panyang/bible-corpus

 
 


bible-corpus

A multilingual parallel corpus created from translations of the Bible.

This repository contains a multilingual parallel corpus created from translations of the Bible. It is an effort to build a parallel corpus covering as many languages as possible, for use in a range of NLP tasks. The corpus is aligned (almost) at the sentence level using the Book, Chapter and Verse indices. The alignment is not exact because two verses in one language are sometimes translated as a single verse in another.

Following a similar effort by Philip Resnik and Mari Broman Olsen at the University of Maryland, I have encoded the text of each language in XML files using the Corpus Encoding Standard. Refer to the following paper for more details about the creation of the corpus:

Armin Hoenen from the Text Technology Lab at the Goethe Universität has created tokenised versions of the texts in four languages (Chinese, Japanese, Thai, Vietnamese). They are included in this collection, but they can also be found here.

Follow this link for a collection of tools for reading/processing the corpus.
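As a rough illustration of how the verse indices can be used for alignment, here is a minimal Python sketch. It assumes each verse is stored in a `seg` element with `type="verse"` and an `id` combining book, chapter and verse (for example `b.GEN.1.1`). These element and attribute names, the sample XML and the helper functions `read_verses` and `align` are assumptions made for the example, so check them against the actual files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature documents; the real files may use a different layout.
ENGLISH_XML = """<cesDoc>
  <text><body>
    <div id="b.GEN" type="book">
      <div id="b.GEN.1" type="chapter">
        <seg id="b.GEN.1.1" type="verse">In the beginning God created the heaven and the earth.</seg>
        <seg id="b.GEN.1.2" type="verse">And the earth was without form, and void.</seg>
      </div>
    </div>
  </body></text>
</cesDoc>"""

# In this sample, verse 1.2 is absent, as if it had been merged into another verse.
FRENCH_XML = """<cesDoc>
  <text><body>
    <div id="b.GEN" type="book">
      <div id="b.GEN.1" type="chapter">
        <seg id="b.GEN.1.1" type="verse">Au commencement, Dieu créa les cieux et la terre.</seg>
      </div>
    </div>
  </body></text>
</cesDoc>"""


def read_verses(xml_text):
    """Return a dict mapping verse id to verse text, in document order."""
    root = ET.fromstring(xml_text)
    verses = {}
    for seg in root.iter("seg"):
        if seg.get("type") != "verse":
            continue
        text = "".join(seg.itertext()).strip()
        if text:  # skip empty verses
            verses[seg.get("id")] = text
    return verses


def align(source, target):
    """Pair up verses whose ids appear in both languages."""
    return [(vid, source[vid], target[vid]) for vid in source if vid in target]


pairs = align(read_verses(ENGLISH_XML), read_verses(FRENCH_XML))
for verse_id, english, french in pairs:
    print(verse_id, english, french, sep="\t")
```

Verses whose id is missing from either language are dropped here, which is one simple way to handle the cases where two verses are translated as one. For real files, `ET.parse(path).getroot()` can replace `ET.fromstring`.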
