The SARIT library

Welcome to SARIT’s library of Indic e-texts. SARIT is short for “Search and Retrieval of Indic Texts” (cf. http://sarit.indology.info/).

This collection of files is special in that any changes you make to your files can, if you choose so, be shared with other people. So if you correct, annotate, or otherwise improve any file in this directory or one of the directories below it, you might consider sharing your work with others (and hope that they will share their work with you).

We, people administering SARIT and working with it, hope that this will lead to incremental but continuous improvement of these files, and of any files you wish to add. We plan to periodically update SARIT’s search interface with the texts changed or added in this repository. At the moment, this library is an experiment in online collaboration on this sort of material.

What to do next?

Installing git

In order to make full use of the SARIT library, you should install git. You can find all you need to get started here: http://git-scm.com/downloads/. Documentation is available from http://git-scm.com/doc/, and tutorials are listed at the top of http://git-scm.com/doc/ext/.

Getting a Graphical User Interface (GUI)

If you do not like working on a command line, you can use a Graphical User Interface (GUI) for git. The standard GUI that is installed automatically by the steps above is “gitk” on Linux, and “git gui” on mac and windows.

These are stable but rather basic programs, and there are quite a few other options you might want to explore if you’re not happy with them. In the following, we will give examples for Smart Git, a Java program that is free of charge for non-commercial use, and that will run any platform that Java runs on (i.e., Linux, Windows, Mac).

Other options are listed here: http://git-scm.com/downloads/guis/.

Get a copy of the collection you want

Getting the first copy is called “cloning.” Depending on your platform, you can either

Run git clone https://github.com/sarit/SARIT-corpus.git if you are using standard git, or
Consult the documentation of your git interface on how to point it at https://github.com/sarit/SARIT-corpus.git

For Smart Git, do this:

Project –> Clone
Under “Remote Git or SVN repository”, enter the name of the collection, e.g., “https://github.com/sarit/SARIT-corpus.git”, click Next.
Choose a directory where to put your files. This should be somewhere where you usually store your files, i.e., not in “C:\Programs” or so. Click Next.
Accept or choose the project name (SARIT should be the default).

Learn about TEI P5

The full guidelines can be found here: http://www.tei-c.org/Guidelines/P5/, and a simplified version, TEI Lite: http://www.tei-c.org/Guidelines/Customization/Lite/.

The specific guidelines that are used for the SARIT corpus are here: https://sarit.indology.info/sarit-pm/docs/encoding-guidelines-simple.html.

The most recent version is file:./schemas/odd/sarit-guidelines.xml (online at https://github.com/sarit/SARIT-corpus/blob/devel/schemas/odd/sarit-guidelines.xml). Please use that document as the basis for suggesting improvements to the guidelines.

Learn about git

We have found the following introductions very helpful:

Comparing git to a computer game: http://www-cs-students.stanford.edu/~blynn/gitmagic/ (note also the various translations)
Basic git commands, short and to the point: https://www.kernel.org/pub/software/scm/git/docs/giteveryday.html.
More links can be found here: http://git-scm.com/doc/

Where can I get help?

That depends on the area you need help with:

Editing files

You probably do need an XML aware editor. There are many, and with many different features, just as there are many tastes. Some popular choices include:

Jedit, with appropriate plug-ins (runs on all computers where Java is installed)
http://www.oxygenxml.com/ (on all platforms, not free)
Emacs + nxml mode (all platforms, free)

For a list to choose from, cf. http://en.wikipedia.org/wiki/List_of_XML_editors or http://www.xml.com/pub/rg/XML_Editors.

Sharing/organizing files

This is done via the program git: see Learn about git above for links to documentation, and see How does sharing work? below for a general description.

How does sharing work?

General idea

Three steps are involved in sharing these files:

Getting <<what other people changed>>.
Letting other people <<get what you changed>>.
<<Merging the changes>> together.

To do this in an organised fashion, we are using a program called git. It keeps track of changes to the files in this directory, and can `pull’ (point 1 above) and `push’ (point 2 above) from or to another instance of these files likewise controlled by git. What it pushes are the changes you have made to these files, and what it pulls are the changes another person (or a group of other persons) has made to these files.

When it does this, two things can happen:

You changed different parts of a file

When, say, Jane corrects paragraph 1, and Jack corrects paragraph 2 of the same file, git will be able to `merge’ (point 3 above) . So if Jack `pulls’ Jane’s changes, paragraph 1 of his file will automatically be changed to paragraph 1 of Jane’s file. Likewise, if Jane `pulls’ Jack’s changes, her file will automatically be changed in paragraph 2 according to Jack’s changes. So they each edited only one paragraph, but both have the same version of the file now, with both paragraphs corrected.

You changed the same part of a file

In case both Jane and Jack change the same part of a file, git will refuse to `merge’ the files (since it doesn’t know which change is the correct one). In this situation, either Jack or Jane will have to review the other person’s changes, and decide which version to keep (or make a third version that contains the changes of both). After making these changes, git will understand that either Jack or Jane have resolved the conflict, and they can continue to work in the normal fashion.

github specific information

On github, there are two ways in which you can get your changes back to SARIT:

by being a collaborator, or
by creating your own copy of the SARIT library (forking) and telling us about your changes (pull request)

In both cases, you need to sign up on http://github.com.

Collaboration

Please send an email to pma@rdorte.org with your github account name and mentioning that you’d like to collaborate.

Forking and Sending Pull Requests

This is the preferred way to go if you want to be a little more independent from the main SARIT library, e.g., for working on your own set of files, or for general experimentation. Basically, you copy the whole project at a particular moment in its history, and then work independently on that copy. If you are happy with your changes, you can send us a pull request, and we will try to merge your changes back into the main repository.

These two pieces of information will get you started:

forking http://help.github.com/fork-a-repo/
Sending pull requests: http://help.github.com/pull-requests/

What are these XML files?

The files in this directory try to adhere to the Text Encoding Initiatives standards in version P5 (TEI P5). These standards define a vocabulary for describing things about a text: who is its author, which other texts is it referring to, which page of a printed edition is this paragraph on, who is “asya” referring to, etc.

saritcorpus.xml

This file is an exception in that it aggregates all the individual text files into a corpus. Correspondingly, it has teiCorpus as its root element, instead of TEI like all the others.

You can use this to easily operate on the whole corpus, e.g.:

xmllint --encode UTF-8  --xinclude saritcorpus.xml  > /tmp/saritcorpus.xml \
    && \
    jing -c schemas/sarit.rnc /tmp/saritcorpus.xml

Git and XML

Git treats XML files as text files; it does not know anything specific about the `logic’ of the XML files. For example, git would think that the two strings <p xml:id="firstpar" n="1"/> and <p n="1" xml:id="firstpar" /> are pretty different from each other, whereas they probably should be considered the same.

Due to this, there might be some problems when trying to see what (important) changes there are between files. The proper solution for this problem is to compare the normalized versions of the two files that have differences, and base your merge decisions on this.

In order to see what git thinks changed, it’s useful to change git-diff’s understanding of what constitutes a word by changing the word-diff to work on xml tags (and spaces). Taking file:./pramanavarttikavrtti.xml as an example, we could look at various things that changed in the last month in the following ways:

view all changes to xml tags

git diff --word-diff-regex="<[^>]+>|[^[:space:]]" \
--word-diff=porcelain "master@{1 month ago}" HEAD pramanavarttikavrtti.xml | \
egrep "^[-+]<"

The --word-diff=porcelain option makes it really easy to grep through the results. If you leave it out, it’s easier to see the changes in context.

Running this through further grep expressions or sort and uniq filters, you can get a quick overview of what happened.

inspect content changes

To see only what content changed, you could also convert your files into a line based xml representation, like the PyX format.

Assume you’ve encountered a merge conflict for the file:./tattvasangrahapanjika.xml. You could convert the files to pyx formats like this (with the help of http://xmlstar.sourceforge.net/):

git show :2:tattvasangrahapanjika.xml | \
    xmlstarlet c14n | \
    xmlstarlet pyx | \
    grep "^-" | \
    sed 's/^-//g' > /tmp/tsp_ours.pyx

git show :3:tattvasangrahapanjika.xml | \
    xmlstarlet c14n | \
    xmlstarlet pyx | \
    grep "^-" | \
    sed 's/^-//g' > /tmp/tsp_theirs.pyx

You can then compare /tmp/tsp_theirs.pyx and /tmp/tsp_theirs.pyx to find out about the differences in the text portions of the file.

Alternatively, line-based XML representations can also be created with tools found here: http://www.ofb.net/~egnor/xml2/ref. It might look like this:

git show :2:tattvasangrahapanjika.xml | \
    # standardize, sort attributes, and format 
    xmllint --exc-c14n --format | \
    xml2 | \
    # remove uninteresting stuff 
    egrep "^/[^@\\!]+=" | \
    cut -d= -f 2- > /tmp/tsp_ours.xml2

git show :3:tattvasangrahapanjika.xml | \
    xmllint --exc-c14n --format | \
    xml2 | \
    egrep "^/[^@\\!]+=" | \
    cut -d= -f 2- > /tmp/tsp_theirs.xml2

Name		Name	Last commit message	Last commit date
Latest commit History 1,169 Commits
.github/workflows		.github/workflows
bib		bib
out		out
schemas		schemas
tools		tools
transliterated		transliterated
.Dockerfile		.Dockerfile
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
.travis.yml		.travis.yml
00-sarit-tei-header-template.xml		00-sarit-tei-header-template.xml
README.html		README.html
README.org		README.org
arunadatta-sarvangasundara.xml		arunadatta-sarvangasundara.xml
astangahrdayasamhita.xml		astangahrdayasamhita.xml
astangasangraha.xml		astangasangraha.xml
astavakragita.xml		astavakragita.xml
asvaghosa-buddhacarita.xml		asvaghosa-buddhacarita.xml
avayavinirakarana.xml		avayavinirakarana.xml
ayurvedasutram.xml		ayurvedasutram.xml
bana-kadambari.xml		bana-kadambari.xml
bandimocana.xml		bandimocana.xml
bhartrhari-vakyapadiya.xml		bhartrhari-vakyapadiya.xml
bhattojidiksita-siddhantakaumudi.xml		bhattojidiksita-siddhantakaumudi.xml
bhelasamhita.xml		bhelasamhita.xml
bhoja-rajamartanda.xml		bhoja-rajamartanda.xml
brahmapurana.xml		brahmapurana.xml
cakradatta.xml		cakradatta.xml
cakrapanidatta-ayurvedadipika.xml		cakrapanidatta-ayurvedadipika.xml
carakasamhita.xml		carakasamhita.xml
caryamelapakapradipa.xml		caryamelapakapradipa.xml
dasarupaka.xml		dasarupaka.xml
dharmottarapradipa.xml		dharmottarapradipa.xml
dharmottaratippanaka.xml		dharmottaratippanaka.xml
diksita-jivanacaritra.xml		diksita-jivanacaritra.xml
dudhadasa-ramasvamedhabhasa.xml		dudhadasa-ramasvamedhabhasa.xml
dvadasara_nayacakra.xml		dvadasara_nayacakra.xml
giridharadasa-bharatibhusana.xml		giridharadasa-bharatibhusana.xml
govindabhagavatpada-rasahrdayatantra.xml		govindabhagavatpada-rasahrdayatantra.xml
hajarilala-srisivavivahakavitavali.xml		hajarilala-srisivavivahakavitavali.xml
jitari-isvaravadimatapariksa.xml		jitari-isvaravadimatapariksa.xml
jitari-jatinirakrti.xml		jitari-jatinirakrti.xml
jitari-nairatmyasiddhi.xml		jitari-nairatmyasiddhi.xml
jitari-sarvajnasiddhi.xml		jitari-sarvajnasiddhi.xml
jitari-vedapramanyasiddhi.xml		jitari-vedapramanyasiddhi.xml
jnanadasa-aparoksanubhava-tika.xml		jnanadasa-aparoksanubhava-tika.xml
jnanasrimitra-nibandhavali.xml		jnanasrimitra-nibandhavali.xml
kautalyarthasastra.xml		kautalyarthasastra.xml
kumarila-tantravarttika.xml		kumarila-tantravarttika.xml
madhusudanasarasvati-atmabodhaprakarana.xml		madhusudanasarasvati-atmabodhaprakarana.xml
mahabharata-devanagari.xml		mahabharata-devanagari.xml
manusmrti.xml		manusmrti.xml
naradasmrti.xml		naradasmrti.xml
nityanatha-rasaratnakara-rasayanakhanda.xml		nityanatha-rasaratnakara-rasayanakhanda.xml
nivajakavi-sakuntala-upakhyana.xml		nivajakavi-sakuntala-upakhyana.xml
nyayakumudacandra.xml		nyayakumudacandra.xml
nyayamanjari.xml		nyayamanjari.xml
nyayasudha.xml		nyayasudha.xml
nyayavarttikatatparyatika.xml		nyayavarttikatatparyatika.xml
patanjalayogasastra.xml		patanjalayogasastra.xml
pramanamimamsa-and-vrtti.xml		pramanamimamsa-and-vrtti.xml
pramanantarbhava.xml		pramanantarbhava.xml
pramanavarttikalankarabhasya.xml		pramanavarttikalankarabhasya.xml
pramanavarttikaparisista-1.xml		pramanavarttikaparisista-1.xml
pramanavarttikasvavrttitika.xml		pramanavarttikasvavrttitika.xml
pramanavarttikavrtti.xml		pramanavarttikavrtti.xml
prasastapada-padarthadharmasangraha.xml		prasastapada-padarthadharmasangraha.xml
pravarasena-setubandha.xml		pravarasena-setubandha.xml
pujyapada-sarvarthasiddhi.xml		pujyapada-sarvarthasiddhi.xml
pyarelala-ragabinoda.xml		pyarelala-ragabinoda.xml
rasarnava.xml		rasarnava.xml
ratnakirti-nibandhavali.xml		ratnakirti-nibandhavali.xml
ratnasritika-dn.xml		ratnasritika-dn.xml
samanyadusana.xml		samanyadusana.xml
sarasvatikanthabharana-dn.xml		sarasvatikanthabharana-dn.xml
saritcorpus.xml		saritcorpus.xml
sarvadarsanasangraha.xml		sarvadarsanasangraha.xml
siddhayoga.xml		siddhayoga.xml
siddhiviniscayatika.xml		siddhiviniscayatika.xml
skandapurana.xml		skandapurana.xml
sridharadasa-saduktikarnamrta.xml		sridharadasa-saduktikarnamrta.xml
susrutasamhita.xml		susrutasamhita.xml
tarkabhasa.xml		tarkabhasa.xml
tarkarahasya.xml		tarkarahasya.xml
tattvasangrahapanjika.xml		tattvasangrahapanjika.xml
trisuli-piyusalahari.xml		trisuli-piyusalahari.xml
trivikrama-nalacampu.xml		trivikrama-nalacampu.xml
tulasidasa-ramayanasatika.xml		tulasidasa-ramayanasatika.xml
ugraditya-kalyanakaraka.xml		ugraditya-kalyanakaraka.xml
vacaspati-tattvavaisaradi.xml		vacaspati-tattvavaisaradi.xml
vadanyayatika.xml		vadanyayatika.xml
vadasthana.xml		vadasthana.xml
vagbhata-rasaratnasamuccaya-comms.xml		vagbhata-rasaratnasamuccaya-comms.xml
vatsyayana-nyayabhasya.xml		vatsyayana-nyayabhasya.xml
vidhiviveka-and-nyayakanika.xml		vidhiviveka-and-nyayakanika.xml
vijnanesvara-mitaksara.xml		vijnanesvara-mitaksara.xml
yogapradipa.xml		yogapradipa.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The SARIT library

What to do next?

Installing git

Getting a Graphical User Interface (GUI)

Get a copy of the collection you want

Learn about TEI P5

Learn about git

Where can I get help?

Editing files

Sharing/organizing files

How does sharing work?

General idea

You changed different parts of a file

You changed the same part of a file

github specific information

Collaboration

Forking and Sending Pull Requests

What are these XML files?

saritcorpus.xml

Git and XML

view all changes to xml tags

inspect content changes

About

Releases

Packages

Contributors 9

Languages

sarit/SARIT-corpus

Folders and files

Latest commit

History

Repository files navigation

The SARIT library

What to do next?

Installing git

Getting a Graphical User Interface (GUI)

Get a copy of the collection you want

Learn about TEI P5

Learn about git

Where can I get help?

Editing files

Sharing/organizing files

How does sharing work?

General idea

You changed different parts of a file

You changed the same part of a file

github specific information

Collaboration

Forking and Sending Pull Requests

What are these XML files?

saritcorpus.xml

Git and XML

view all changes to xml tags

inspect content changes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 9

Languages

Packages