Tool(s) to help read Sanskrit (and other) metrical verse

NOTE (July 2018): The description below is out-of-date at the moment. It can only be used to get an approximate idea.


What

Code to identify the metre of a Sanskrit verse.

The web version is currently served at http://sanskritmetres.appspot.com/

Can also be used as a Python library.

Examples

In the web version, try the following inputs.

kāṣṭhād agnir jāyate mathya-mānād-
bhūmis toyaṃ khanya-mānā dadāti|
sotsāhānāṃ nāsty asādhyaṃ narāṇāṃ
mārgārabdhāḥ sarva-yatnāḥ phalanti||

or (note that this one intentionally has many typos):

काष्ठाद् अग्नि जायते
मथ्यमानाद्भूमिस्तोय खन्यमाना ददाति।
सोत्साहानां नास्त्यसाध्यं
नराणां मार्गारब्धाः सवयत्नाः फलन्ति॥

If using as a library (TODO: document this better):

# identifier_pipeline.py is at the top level of this repository.
import identifier_pipeline

verse = r'''kāṣṭhād agnir jāyate mathya-mānād-
bhūmis toyaṃ khanya-mānā dadāti|
sotsāhānāṃ nāsty asādhyaṃ narāṇāṃ
mārgārabdhāḥ sarva-yatnāḥ phalanti||'''

# Run the Read, Scan and Identify steps described under "How" below.
identifier = identifier_pipeline.IdentifierPipeline()
match_results = identifier.IdentifyFromText(verse)

How

The design of the program is as follows.

Transform the input (Read, Scan)

The input passes through the following representations.

The raw input

This is whatever is typed into the textarea (in the web interface) or passed to `IdentifierPipeline.IdentifyFromText` (when using the library). Consider the examples above.

The input in slp1

Whatever input script (transliteration scheme) is used, the input is cleaned up and “read” into a limited Sanskrit alphabet, SLP1. For instance, the examples above are read as follows:

kAzWAdagnirjAyatemaTyamAnAd
BUmistoyaMKanyamAnAdadAti
sotsAhAnAMnAstyasADyaMnarARAM
mArgArabDAHsarvayatnAHPalanti

and

kAzWAdagnijAyate
maTyamAnAdBUmistoyaKanyamAnAdadAti
sotsAhAnAMnAstyasADyaM
narARAMmArgArabDAHsavayatnAHPalanti

respectively.
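To make the “reading” step concrete, here is a minimal sketch that handles IAST input only; the repository's read code supports more input schemes and does more clean-up. The function name `iast_to_slp1_lines` and its two tables are hypothetical, written for this illustration. The tables follow the standard IAST-to-SLP1 correspondences, and any character not in them (spaces, hyphens, dandas) is dropped as junk.

```python
import unicodedata

# Hypothetical tables for this sketch (standard IAST -> SLP1 correspondences).
# Two-character IAST sequences are tried before single characters.
IAST_DIGRAPHS = {
    'kh': 'K', 'gh': 'G', 'ch': 'C', 'jh': 'J', 'ṭh': 'W', 'ḍh': 'Q',
    'th': 'T', 'dh': 'D', 'ph': 'P', 'bh': 'B', 'ai': 'E', 'au': 'O',
}
IAST_SINGLE = {
    'a': 'a', 'ā': 'A', 'i': 'i', 'ī': 'I', 'u': 'u', 'ū': 'U',
    'ṛ': 'f', 'ṝ': 'F', 'ḷ': 'x', 'e': 'e', 'o': 'o', 'ṃ': 'M', 'ḥ': 'H',
    'k': 'k', 'g': 'g', 'ṅ': 'N', 'c': 'c', 'j': 'j', 'ñ': 'Y',
    'ṭ': 'w', 'ḍ': 'q', 'ṇ': 'R', 't': 't', 'd': 'd', 'n': 'n',
    'p': 'p', 'b': 'b', 'm': 'm', 'y': 'y', 'r': 'r', 'l': 'l',
    'v': 'v', 'ś': 'S', 'ṣ': 'z', 's': 's', 'h': 'h',
}


def iast_to_slp1_lines(text):
    """Read IAST text into one SLP1 string per non-empty input line."""
    # NFC so that e.g. 'ā' is one code point even if typed as 'a' + macron.
    text = unicodedata.normalize('NFC', text.lower())
    result = []
    for line in text.splitlines():
        out = []
        i = 0
        while i < len(line):
            pair = line[i:i + 2]
            if pair in IAST_DIGRAPHS:
                out.append(IAST_DIGRAPHS[pair])
                i += 2
            else:
                # Characters outside the tables (spaces, '-', '|') are junk.
                out.append(IAST_SINGLE.get(line[i], ''))
                i += 1
        slp1 = ''.join(out)
        if slp1:
            result.append(slp1)
    return result


EXAMPLE_VERSE = '''kāṣṭhād agnir jāyate mathya-mānād-
bhūmis toyaṃ khanya-mānā dadāti|
sotsāhānāṃ nāsty asādhyaṃ narāṇāṃ
mārgārabdhāḥ sarva-yatnāḥ phalanti||'''

for slp1_line in iast_to_slp1_lines(EXAMPLE_VERSE):
    print(slp1_line)  # the four SLP1 lines of the first example, as above
```

Like IAST itself, this greedy reading always takes `ai`, `au` and consonant + `h` as single letters, which is almost always what is intended in Sanskrit text.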

The metrical signature of the input

We next scan the input to reduce it to a pattern of laghu (denoted L) and guru (denoted G) syllables.

Our two examples above are scanned into the lists:

['GGGGGLGGLGG',
 'GGGGGLGGLGL',
 'GGGGGLGGLGG',
 'GGGGGLGGLGL']

and

['GGGLGLG',
 'GLGGGGGLGLGGLGL',
 'GGGGGLGG',
 'LGGGGGGLLGGLGL']

respectively.
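The scanning step can be sketched with the standard rules of Sanskrit prosody: a syllable is guru if its vowel is long, if it is followed by anusvāra or visarga, or if it is followed by two or more consonants (a conjunct); otherwise it is laghu. This is not the code in scan.py: `scan_line` is a hypothetical name, each line is scanned on its own (so a consonant cluster spanning two lines is ignored), and the last syllable of a line gets no special treatment. With those assumptions it reproduces the eight patterns above.

```python
# SLP1 vowel letters; e, o, E (ai) and O (au) are always long.
VOWELS = set('aAiIuUfFxXeEoO')
LONG_VOWELS = set('AIUFXeEoO')


def scan_line(slp1):
    """Return the laghu/guru pattern ('L'/'G') of one line of SLP1 text."""
    pattern = []
    for pos, ch in enumerate(slp1):
        if ch not in VOWELS:
            continue
        # Collect everything up to the next vowel: consonants, M and H.
        following = ''
        nxt = pos + 1
        while nxt < len(slp1) and slp1[nxt] not in VOWELS:
            following += slp1[nxt]
            nxt += 1
        if ch in LONG_VOWELS:
            pattern.append('G')  # long vowel
        elif following[:1] in ('M', 'H'):
            pattern.append('G')  # anusvāra or visarga
        elif len(following) >= 2:
            pattern.append('G')  # short vowel followed by a conjunct
        else:
            pattern.append('L')
    return ''.join(pattern)


print(scan_line('kAzWAdagnijAyate'))  # GGGLGLG
```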

Identify

Finally, we compare this metrical signature (or “pattern lines”) against a database of known patterns.

For example, in our database we have the information that Śālinī is a sama-vṛtta metre consisting of 4 lines (pāda-s / quarters) each having the pattern

GGGG—GLGGLGG

Thus Śālinī is recognized as the (probable, best-guess) metre of the input verse.

Note that in the second example, even though no line matches a line of Śālinī, the program is still clever enough to detect a match.

Look at the README inside the identify directory for more details on the matching heuristics used.

In this way the code can detect partial matches: if a verse has metrical errors but some parts of it are in some metre, that metre still has a chance of being recognized.

There may also be multiple results when more than one metre is guessed, for example when different lines are in different metres.
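One simple way to get partial matches and multiple guesses is to compare each pattern line with the pāda of every known sama-vṛtta metre and rank the metres by how many lines they fit. This is a hypothetical sketch, not the repository's heuristics: the two-entry `KNOWN_SAMA_VRTTA` table (Śālinī and Indravajrā) and the function names are made up for illustration. The last syllable of each pāda is treated as free (either L or G), as is conventional in Sanskrit prosody; this is why lines 2 and 4 of the first example, which end in L, still count.

```python
# Hypothetical toy database: metre name -> pattern of one pāda.
KNOWN_SAMA_VRTTA = {
    'Śālinī': 'GGGGGLGGLGG',
    'Indravajrā': 'GGLGGLLGLGG',
}


def line_fits(line, pada):
    """True if `line` fits `pada`, treating the final syllable as free."""
    return len(line) == len(pada) and line[:-1] == pada[:-1]


def rank_metres(pattern_lines):
    """Return (metre, number of fitting lines) pairs, best first."""
    scores = []
    for name, pada in KNOWN_SAMA_VRTTA.items():
        count = sum(line_fits(line, pada) for line in pattern_lines)
        if count:
            scores.append((name, count))
    # Python's sort is stable, so ties keep the table's order.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


print(rank_metres(['GGGGGLGGLGG', 'GGGGGLGGLGL', 'GGGGGLGGLGG', 'GGGGGLGGLGL']))
# [('Śālinī', 4)]
```

On the second example this naive ranking finds nothing, since none of its lines even has the length of a pāda; the repository's matching goes further than this.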

Display

The detected metre is displayed, along with how the verse fits the metre, and information about the metre.

TODO: Describe this.


(Everything below this line needs even more rewriting.)

Code organization

See deps.png for the dependency graph.

Read

Handled by the files in the read directory and their dependencies.

This stage detects the transliteration format of the input, removes junk characters that are not part of the verse, and transliterates the input to SLP1 (the encoding we use internally).
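Detecting the input format can start from a very rough heuristic: Devanagari is recognized by its Unicode block (U+0900 to U+097F), and IAST by its diacritic letters. The sketch below is hypothetical (`detect_scheme` and the labels it returns are made up) and much cruder than what the read code does; in particular, it cannot tell apart plain-ASCII schemes such as SLP1, Harvard-Kyoto or ITRANS.

```python
import unicodedata

# Lowercase IAST letters that do not occur in plain ASCII.
IAST_DIACRITIC_LETTERS = set('āīūṛṝḷḹṃḥṅñṭḍṇśṣ')


def detect_scheme(text):
    """Guess the script or transliteration of `text` (rough heuristic)."""
    text = unicodedata.normalize('NFC', text.lower())
    if any('\u0900' <= ch <= '\u097f' for ch in text):
        return 'devanagari'
    if any(ch in IAST_DIACRITIC_LETTERS for ch in text):
        return 'iast'
    # Plain ASCII: could be SLP1, Harvard-Kyoto, ITRANS, ...
    return 'ascii'


print(detect_scheme('काष्ठाद् अग्नि जायते'))  # devanagari
```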

Scan

Determining the pattern of gurus and laghus.

The functions in scan.py take this cleaned-up verse and convert it to a pattern of laghus and gurus. A “pattern” means a sequence over the alphabet {‘L’, ‘G’}.

Identify

Identification algorithm: Given a verse,

  1. Look for the full verse’s pattern in known_metre_patterns.
  2. Loop through known_metre_regexes and see if any of them matches the full verse’s pattern.
  3. Look in known_partial_patterns (then known_partial_regexes) for the pattern of the whole verse, of each line, of each half, and of each quarter.
  4. [TODO/Maybe] Look for substrings, find the closest match, etc.? This might have to be restricted to the popular metres for efficiency.
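The lookup order above can be sketched as follows. Everything here is an illustrative assumption: the four tables hold only Śālinī, the regexes treat the last syllable of each pāda as free, the results are plain strings rather than MatchResult objects, and a verse is split into halves and quarters simply by cutting its full pattern into equal parts. Step 4 is left out.

```python
import re

# Hypothetical toy tables holding only Śālinī.
SALINI = 'GGGGGLGGLGG'
SALINI_RE = 'GGGGGLGGLG[LG]'  # last syllable of the pāda is free

known_metre_patterns = {SALINI * 4: 'Śālinī'}
known_metre_regexes = [(re.compile('(?:%s){4}' % SALINI_RE), 'Śālinī')]
known_partial_patterns = {SALINI: 'Śālinī (pāda)', SALINI * 2: 'Śālinī (ardha)'}
known_partial_regexes = [
    (re.compile(SALINI_RE), 'Śālinī (pāda)'),
    (re.compile('(?:%s){2}' % SALINI_RE), 'Śālinī (ardha)'),
]


def split_equally(pattern, parts):
    """Cut `pattern` into `parts` equal pieces, or [] if that is impossible."""
    if not pattern or len(pattern) % parts:
        return []
    size = len(pattern) // parts
    return [pattern[i:i + size] for i in range(0, len(pattern), size)]


def lookup_partial(piece):
    if piece in known_partial_patterns:
        return known_partial_patterns[piece]
    for regex, result in known_partial_regexes:
        if regex.fullmatch(piece):
            return result
    return None


def identify_verse(pattern_lines):
    full = ''.join(pattern_lines)
    # Step 1: exact pattern of the full verse.
    if full in known_metre_patterns:
        return [known_metre_patterns[full]]
    # Step 2: regexes for the full verse.
    for regex, result in known_metre_regexes:
        if regex.fullmatch(full):
            return [result]
    # Step 3: whole verse, each line, each half, each quarter.
    pieces = [full] + list(pattern_lines)
    pieces += split_equally(full, 2) + split_equally(full, 4)
    results = []
    for piece in pieces:
        result = lookup_partial(piece)
        if result is not None and result not in results:
            results.append(result)
    return results


print(identify_verse(['GGGGGLGGLGG', 'GGGGGLGGLGL', 'GGGGGLGGLGG', 'GGGGGLGGLGL']))
# ['Śālinī']
```

With these toy tables the first example is found at step 2. The second example is found only at step 3: once its 44 syllables are cut into four equal quarters, the third quarter is exactly GGGGGLGGLGG, a Śālinī pāda.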

Metrical data

  • A “pattern” means a sequence over the alphabet {‘L’, ‘G’}.
  • A “regex” (for us) is a regular expression that matches some patterns.

(TODO: This is obsolete.) We use the following data structures:

  • known_metre_patterns, a dict mapping a pattern to a MatchResult.
  • known_metre_regexes, a list of pairs (regex, MatchResult).
  • known_partial_patterns, a dict mapping a pattern to MatchResult-s.
  • known_partial_regexes, a list of pairs (regex, MatchResult).

    A MatchResult is usually arrived at by looking at a pattern (or list of patterns), and can be seen as a tuple (metre_name, match_type), where:

      • metre_name is the name of the metre;
      • match_type distinguishes between matching one pāda (quarter) and matching one ardha (half) of a metre, or, in ardha-sama metres, between odd and even pādas.
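A MatchResult along these lines could be a namedtuple, with known_partial_patterns filled in by registering each pāda and ardha of every metre. This is a hypothetical sketch of the data structure described above (which this section notes is obsolete), not the repository's code; the helper names and match_type labels are made up. Puṣpitāgrā serves as the ardha-sama example: its odd pādas are na-na-ra-ya and its even pādas na-ja-ja-ra-ga.

```python
from collections import namedtuple

MatchResult = namedtuple('MatchResult', ['metre_name', 'match_type'])

# pattern -> list of MatchResult-s (several metres can share a pattern).
known_partial_patterns = {}


def _register(pattern, result):
    known_partial_patterns.setdefault(pattern, []).append(result)


def add_sama_vrtta(name, pada):
    """All four pādas have the same pattern."""
    _register(pada, MatchResult(name, 'pāda'))
    _register(pada * 2, MatchResult(name, 'ardha'))


def add_ardha_sama_vrtta(name, odd_pada, even_pada):
    """Odd (1st, 3rd) and even (2nd, 4th) pādas differ."""
    _register(odd_pada, MatchResult(name, 'odd pāda'))
    _register(even_pada, MatchResult(name, 'even pāda'))
    _register(odd_pada + even_pada, MatchResult(name, 'ardha'))


add_sama_vrtta('Śālinī', 'GGGGGLGGLGG')
# Puṣpitāgrā: na-na-ra-ya (odd), na-ja-ja-ra-ga (even).
add_ardha_sama_vrtta('Puṣpitāgrā', 'LLLLLLGLGLGG', 'LLLLGLLGLGLGG')

print(known_partial_patterns['GGGGGLGGLGG'])
# [MatchResult(metre_name='Śālinī', match_type='pāda')]
```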

Display

Display the list of metres found as possible guesses. For vṛtta metres, we also try to “align” the input verse to the metre, so that it is clearer where to break it, etc. (And when the input verse has metrical errors, it is clear what they are.)
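A minimal version of this alignment, for sama-vṛtta metres only, cuts the verse's flat pattern into pāda-length pieces (regardless of where the input lines were broken) and marks each syllable that disagrees with the metre. `align_to_metre` is a hypothetical name, and the last syllable of each pāda is again treated as free.

```python
def align_to_metre(full_pattern, pada, num_padas=4):
    """Split a flat L/G pattern into pādas and mark mismatched syllables.

    Returns one (chunk, marks) pair per pāda, where `marks` has a '^' under
    each syllable that differs from the metre and a space elsewhere.
    """
    size = len(pada)
    rows = []
    for start in range(0, size * num_padas, size):
        chunk = full_pattern[start:start + size]
        marks = ''.join(
            ' ' if got == want or pos == size - 1 else '^'
            for pos, (got, want) in enumerate(zip(chunk, pada)))
        rows.append((chunk, marks))
    return rows


# The second example's pattern lines, joined into one flat pattern.
second = ''.join(['GGGLGLG', 'GLGGGGGLGLGGLGL', 'GGGGGLGG', 'LGGGGGGLLGGLGL'])
for chunk, marks in align_to_metre(second, 'GGGGGLGGLGG'):
    print(chunk)
    print(marks)
```

On the second example this puts the pāda breaks after syllables 11, 22 and 33, and flags one syllable in each of the first, second and fourth pādas: gni, ya and sa, which come from the typos agni (for agnir), toya (for toyaṃ) and sava (for sarva).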
