Click the button to launch this notebook in Binder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/kmadathil/sanskrit_parser/HEAD?filepath=examples%252Fbasic_example.ipynb)


# Sanskrit Parser Examples

The `sanskrit_parser` module supports three use cases, in order of increasing complexity:
1. tags - Morphological analysis of a word
2. sandhi - Sandhi split of a phrase
3. vakya - Morpho-syntactic analysis of a sentence (after Sandhi split)

In this notebook, we will see how to use the API to perform the latter two tasks, sandhi splitting and vakya analysis, in Python code.

Command-line usage of the scripts is very similar and is documented [here](https://kmadathil.github.io/sanskrit_parser/build/html/).

## Installation

Sanskrit Parser can be easily installed using `pip`. 

If we are running on Binder, we can skip this step. If not, please uncomment and run one of the cells below.

To install the latest version of the package directly from the GitHub repo:


In [1]:
# !pip install git+https://github.com/kmadathil/sanskrit_parser

Alternatively, to install the latest release from PyPI, uncomment and run the cell below.

In [2]:
# !pip install sanskrit_parser

## Sandhi Splitting

Splitting sandhis in a long phrase/sentence to obtain the constituent words can be done in just a few lines of code. 

First, let's import the `Parser` class that is used for most of the tasks.



In [3]:
from sanskrit_parser import Parser

The `Parser` object supports various options for controlling the parsing, as well as the input and output formats. Here, let us specify that we want output in Devanagari (the default is SLP1). The other available options are listed [here](https://kmadathil.github.io/sanskrit_parser/build/html/sanskrit_parser_api.html#sanskrit_parser.api.Parser).

In [4]:
parser = Parser(output_encoding='Devanagari')

As an example, let us try a long phrase from the चम्पूरामायणम् of भोजः । We will ask the parser to find at most 10 splits.

In [5]:
text = 'तस्मात्समस्तक्षत्रवर्गगर्वपाटनवरिष्ठधारापरश्वधभरणभीषणवेषभार्गवभङ्गादपरिच्छिन्नतरशौर्यशालिनि'
splits = parser.split(text, limit=10)
for split in splits:
    print(f'{split}')

['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परशु', 'अध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'ध', 'आरा', 'परश्वध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परशु', 'अध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालिनि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'ध', 'आरा', 'परश्वध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालिनि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परश्वध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाट', 'न', 'वरिष्ठ', 'धारा', 'परशु', 'अध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्'

As we can see, the parser did a decent job of splitting this long phrase, though it over-splits in places: some candidates break `परश्वध` into `परशु` and `अध`, for example. Even so, the candidates should point a student in the right direction.
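
To see exactly where two candidate splits disagree, they can be compared segment by segment. The following is a minimal sketch using Python's standard `difflib`. It assumes the segments are available as plain lists of strings (here they are copied from the tails of the first and third splits printed above, not read from the split objects), and `diff_splits` is a hypothetical helper, not part of `sanskrit_parser`.

```python
from difflib import SequenceMatcher


def diff_splits(a, b):
    """Return (segments_in_a, segments_in_b) pairs where two splits disagree."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'  # keep only the stretches that differ
    ]


# Trailing segments of the first and third splits printed above.
first_tail = ['छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
third_tail = ['छिन्न', 'तर', 'शौर्य', 'शालिनि']

for left, right in diff_splits(first_tail, third_tail):
    print(left, 'vs', right)
```

For these two candidates, the only disagreement is whether the final word is kept whole as `शालिनि` or broken into `शालि` and `नि`.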

## Vakya Analysis

Next, let us use the parser for analyzing a sentence and understanding the relationships among the words. We will use a simple sentence to illustrate the parser's capabilities.

In [6]:
sentence = 'देवदत्तः ग्रामं गच्छति'

We can now split the sentence to convert it to the parser's internal representation. Since we know that there is no sandhi in this sentence, we can pass `pre_segmented=True` to indicate this to the parser, and retain just the first split.

In [7]:
split = parser.split(sentence, pre_segmented=True)[0]
print(f'{split}')

['देवदत्तः', 'ग्रामम्', 'गच्छति']

Next, we ask for at most two morpho-syntactic parses of this split.


In [8]:
parses = list(split.parse(limit=2))
for i, parse in enumerate(parses):
    print(f'Parse {i}')
    print(f'{parse}')

Parse 0
देवदत्तः => (देवदत्त, ['पुंल्लिङ्गम्', 'एकवचनम्', 'प्रथमाविभक्तिः']) : कर्ता of गच्छति
ग्रामम् => (ग्राम, ['पुंल्लिङ्गम्', 'एकवचनम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति
गच्छति => (गम्, ['लट्', 'कर्तरि', 'परस्मैपदम्', 'प्रथमपुरुषः', 'प्राथमिकः', 'एकवचनम्'])
Parse 1
देवदत्तः => (देवदत्त, ['पुंल्लिङ्गम्', 'एकवचनम्', 'प्रथमाविभक्तिः']) : कर्ता of गच्छति
ग्रामम् => (ग्राम, ['नपुंसकलिङ्गम्', 'एकवचनम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति
गच्छति => (गम्, ['लट्', 'कर्तरि', 'परस्मैपदम्', 'प्रथमपुरुषः', 'प्राथमिकः', 'एकवचनम्'])
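
The two parses above differ only in the gender assigned to `ग्रामम्`. One rough way to check this programmatically is to read the printed lines back into records. This sketch relies only on the text layout shown above (`word => (root, [tags]) : role of head`), which is inferred from the output and is an assumption about the format. `parse_line` is a hypothetical helper, and the parse objects may well expose the same information directly.

```python
import ast
import re

# One printed line: word => (root, [tags]), optionally followed by " : role of head".
LINE = re.compile(
    r'^(?P<word>.+?) => \((?P<root>.+?), (?P<tags>\[.*?\])\)'
    r'(?: : (?P<role>.+?) of (?P<head>.+))?$'
)


def parse_line(line):
    """Turn one printed parse line into a dict; role and head are None for the root word."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f'unrecognised parse line: {line!r}')
    record = match.groupdict()
    record['tags'] = ast.literal_eval(record['tags'])  # "['a', 'b']" -> ['a', 'b']
    return record


# The ग्रामम् lines from Parse 0 and Parse 1 above.
masc = parse_line("ग्रामम् => (ग्राम, ['पुंल्लिङ्गम्', 'एकवचनम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति")
neut = parse_line("ग्रामम् => (ग्राम, ['नपुंसकलिङ्गम्', 'एकवचनम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति")

# Tags present in one analysis but not the other.
print(sorted(set(masc['tags']) ^ set(neut['tags'])))
```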


For visualization, the parses can be converted to the GraphViz DOT format.

In [9]:
print(parses[0].to_dot())

digraph  {
"grAmam [grAma, {puMlliNgam, ekavacanam, dvitIyAviBaktiH}] 1";
"gacCati [gam, {law, kartari, parasmEpadam, praTamapuruzaH, prATamikaH, ekavacanam}] 2";
"devadattas [devadatta, {puMlliNgam, ekavacanam, praTamAviBaktiH}] 0";
"gacCati [gam, {law, kartari, parasmEpadam, praTamapuruzaH, prATamikaH, ekavacanam}] 2" -> "devadattas [devadatta, {puMlliNgam, ekavacanam, praTamAviBaktiH}] 0"  [key=0, label=kartA];
"gacCati [gam, {law, kartari, parasmEpadam, praTamapuruzaH, prATamikaH, ekavacanam}] 2" -> "grAmam [grAma, {puMlliNgam, ekavacanam, dvitIyAviBaktiH}] 1"  [key=0, label=karma];
}



We can convert this representation to a picture using any tool that supports the DOT format. Let's use [image-charts.com](https://image-charts.com), which exposes a REST API for generating charts.

In [10]:
from urllib.parse import urlencode
from IPython.display import Image

q = urlencode({'cht': 'gv:dot', 'chl': parses[0].to_dot()})
url = f"https://image-charts.com/chart?{q}"
Image(url=url)
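
The same DOT text can also be read without rendering it, for instance to list the relations as plain tuples. The sketch below uses a regular expression and assumes edges are written exactly as in the output above (`"head" -> "dependent"  [key=0, label=relation];`). `dot_edges` is a hypothetical helper, and a real DOT parser would be more robust.

```python
import re

# Matches: "head label" -> "dependent label"  [key=0, label=relation]
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"\s*\[key=\d+,\s*label=([^\]]+)\]')


def dot_edges(dot_text):
    """Return (head, relation, dependent) triples, keeping only the word at the start of each node label."""
    return [
        (head.split()[0], relation, dependent.split()[0])
        for head, dependent, relation in EDGE.findall(dot_text)
    ]


# The two edge lines from the DOT output above.
dot_text = '''digraph  {
"gacCati [gam, {law, kartari, parasmEpadam, praTamapuruzaH, prATamikaH, ekavacanam}] 2" -> "devadattas [devadatta, {puMlliNgam, ekavacanam, praTamAviBaktiH}] 0"  [key=0, label=kartA];
"gacCati [gam, {law, kartari, parasmEpadam, praTamapuruzaH, prATamikaH, ekavacanam}] 2" -> "grAmam [grAma, {puMlliNgam, ekavacanam, dvitIyAviBaktiH}] 1"  [key=0, label=karma];
}'''

for head, relation, dependent in dot_edges(dot_text):
    print(f'{dependent} is {relation} of {head}')
```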

This shows the basic capabilities of `sanskrit_parser`. For advanced usage, please consult the documentation.