# Sanskrit Parser Examples

The `sanskrit_parser` module supports three kinds of usage, in order of increasing complexity:
1. tags - Morphological analysis of a word
2. sandhi - Sandhi split of a phrase
3. vakya - Morpho-syntactic analysis of a sentence (after Sandhi split)

In this notebook, we will see how to use the API from Python code to perform the latter two tasks: sandhi splitting and vakya analysis.

Command-line usage of the scripts is very similar and is documented [here](https://kmadathil.github.io/sanskrit_parser/build/html/).

## Installation

Sanskrit Parser can be installed with pip. In this notebook, we install directly from the GitHub repo to get the latest version of the package.

If you have already installed the sanskrit_parser package, you can skip this step.


In [None]:
!pip install git+https://github.com/kmadathil/sanskrit_parser

Alternatively, to install the latest release from PyPI, uncomment and run the cell below.

In [None]:
# !pip install sanskrit_parser

## Sandhi Splitting

Splitting sandhis in a long phrase/sentence to obtain the constituent words can be done in just a few lines of code. 

First, let's import the `Parser` class that is used for most of the tasks.



In [1]:
from sanskrit_parser import Parser

The `Parser` object supports various options for controlling the parsing, as well as the input and output formats. Here, let us specify that we want output in Devanagari (the default is SLP1). The other available options are listed [here](https://kmadathil.github.io/sanskrit_parser/build/html/sanskrit_parser_api.html#sanskrit_parser.api.Parser).

In [2]:
parser = Parser(output_encoding='Devanagari')

As an example, let us try a long phrase from the चम्पूरामायणम् of भोजः । We will ask the parser to find at most 10 splits.

In [3]:
text = 'तस्मात्समस्तक्षत्रवर्गगर्वपाटनवरिष्ठधारापरश्वधभरणभीषणवेषभार्गवभङ्गादपरिच्छिन्नतरशौर्यशालिनि'
splits = parser.split(text, limit=10)
for split in splits:
    print(f'{split}')

['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परशु', 'अध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'ध', 'आरा', 'परश्वध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परशु', 'अध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालिनि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'ध', 'आरा', 'परश्वध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालिनि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परश्वध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि']
['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाट', 'न', 'वरिष्ठ', 'धारा', 'परशु', 'अध', 'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्'

As we can see, the parser did a decent job of splitting this long phrase, though it over-splits in places (for example, `परश्वध` into `परशु` and `अध`). Even so, the candidate splits should point a student in the right direction.
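Since the candidates differ in how finely they segment the phrase, one simple way to sift through them is to prefer candidates with fewer segments. This is only a heuristic sketch, not something the library prescribes. It works on plain lists of strings copied from the first and fifth lines of the printed output above; the helper name `rank_by_segment_count` is hypothetical, and converting the parser's own split objects to such lists is left as an assumption.

```python
def rank_by_segment_count(candidates):
    """Sort candidate splits so that those with the fewest segments come first."""
    # sorted() is stable, so candidates of equal length keep their original order.
    return sorted(candidates, key=len)


# Two candidate splits copied verbatim from the printed output above.
candidates = [
    ['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परशु', 'अध',
     'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि'],
    ['तस्मात्', 'समस्त', 'क्षत्र', 'वर्ग', 'गर्व', 'पाटन', 'वरिष्ठ', 'धारा', 'परश्वध',
     'भरण', 'भीषण', 'वेष', 'भार्गव', 'भङ्गात्', 'अपरि', 'छिन्न', 'तर', 'शौर्य', 'शालि', 'नि'],
]

ranked = rank_by_segment_count(candidates)
for segments in ranked:
    print(len(segments), ' '.join(segments))
```

Here the candidate that keeps `परश्वध` intact is listed first. A shorter split is not always the better one, so the ranking is best treated as a reading aid rather than a verdict.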

## Vakya Analysis

Next, let us use the parser for analyzing a sentence and understanding the relationships among the words. We will use a simple sentence to illustrate the parser's capabilities.

In [4]:
sentence = 'देवदत्तः ग्रामं गच्छति'

We can now split the sentence to convert it to the parser's internal representation. Since we know that there is no sandhi in this sentence, we can pass `pre_segmented=True` to indicate this to the parser, and retain just the first split.

In [5]:
split = parser.split(sentence, pre_segmented=True)[0]
print(f'{split}')

['देवदत्तः', 'ग्रामम्', 'गच्छति']


In [6]:
parses = list(split.parse(limit=2))
for i, parse in enumerate(parses):
    print(f'Parse {i}')
    print(f'{parse}')

Parse 0
देवदत्तः => (देवदत्त, ['एकवचनम्', 'पुंल्लिङ्गम्', 'प्रथमाविभक्तिः']) : कर्ता of गच्छति
ग्रामम् => (ग्राम, ['एकवचनम्', 'पुंल्लिङ्गम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति
गच्छति => (गम्, ['प्राथमिकः', 'एकवचनम्', 'कर्तरि', 'लट्', 'प्रथमपुरुषः', 'परस्मैपदम्'])
Parse 1
देवदत्तः => (देवदत्त, ['एकवचनम्', 'पुंल्लिङ्गम्', 'प्रथमाविभक्तिः']) : कर्ता of गच्छति
ग्रामम् => (ग्राम, ['नपुंसकलिङ्गम्', 'एकवचनम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति
गच्छति => (गम्, ['प्राथमिकः', 'एकवचनम्', 'कर्तरि', 'लट्', 'प्रथमपुरुषः', 'परस्मैपदम्'])
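The two parses above differ only in the analysis of `ग्रामम्`, which is tagged as masculine in one and neuter in the other. For longer sentences such differences are harder to spot by eye, so a small comparison can help. The sketch below works on the printed text of the two `ग्रामम्` lines copied from the output above rather than on the parse objects themselves, and the helper name `tag_set` is hypothetical.

```python
import re


def tag_set(line):
    """Collect the single-quoted morphological tags from one printed parse line."""
    return set(re.findall(r"'([^']+)'", line))


# The ग्रामम् lines from Parse 0 and Parse 1, copied from the output above.
line_0 = "ग्रामम् => (ग्राम, ['एकवचनम्', 'पुंल्लिङ्गम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति"
line_1 = "ग्रामम् => (ग्राम, ['नपुंसकलिङ्गम्', 'एकवचनम्', 'द्वितीयाविभक्तिः']) : कर्म of गच्छति"

# Sets ignore the order of the tags, so only genuine differences remain.
only_in_0 = tag_set(line_0) - tag_set(line_1)
only_in_1 = tag_set(line_1) - tag_set(line_0)

print('Only in parse 0:', only_in_0)
print('Only in parse 1:', only_in_1)
```

Comparing sets rather than raw strings matters here because the two lines list their tags in a different order.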


For visualization, the parses can be converted to the GraphViz dot format.

In [7]:
print(parses[0].to_dot())

digraph  {
"devadattas [devadatta, {ekavacanam, puMlliNgam, praTamAviBaktiH}] 0";
"gacCati [gam, {prATamikaH, ekavacanam, kartari, law, praTamapuruzaH, parasmEpadam}] 2";
"grAmam [grAma, {ekavacanam, puMlliNgam, dvitIyAviBaktiH}] 1";
"gacCati [gam, {prATamikaH, ekavacanam, kartari, law, praTamapuruzaH, parasmEpadam}] 2" -> "devadattas [devadatta, {ekavacanam, puMlliNgam, praTamAviBaktiH}] 0"  [key=0, label=kartA];
"gacCati [gam, {prATamikaH, ekavacanam, kartari, law, praTamapuruzaH, parasmEpadam}] 2" -> "grAmam [grAma, {ekavacanam, puMlliNgam, dvitIyAviBaktiH}] 1"  [key=0, label=karma];
}
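The edges in the DOT output above carry the relations found by the parser: each edge runs from the verb to a dependent word and is labelled with the relation. As a rough sketch, they can be pulled out with a regular expression from the standard library. The DOT text below is copied from the output above; the helper `dot_edges` and the pattern `EDGE_RE` are hypothetical names, and the pattern assumes the edge layout shown here rather than the full DOT grammar.

```python
import re

# DOT text copied from the output of parses[0].to_dot() shown above.
dot_text = '''digraph  {
"devadattas [devadatta, {ekavacanam, puMlliNgam, praTamAviBaktiH}] 0";
"gacCati [gam, {prATamikaH, ekavacanam, kartari, law, praTamapuruzaH, parasmEpadam}] 2";
"grAmam [grAma, {ekavacanam, puMlliNgam, dvitIyAviBaktiH}] 1";
"gacCati [gam, {prATamikaH, ekavacanam, kartari, law, praTamapuruzaH, parasmEpadam}] 2" -> "devadattas [devadatta, {ekavacanam, puMlliNgam, praTamAviBaktiH}] 0"  [key=0, label=kartA];
"gacCati [gam, {prATamikaH, ekavacanam, kartari, law, praTamapuruzaH, parasmEpadam}] 2" -> "grAmam [grAma, {ekavacanam, puMlliNgam, dvitIyAviBaktiH}] 1"  [key=0, label=karma];
}'''

# Matches lines of the form: "source" -> "target"  [..., label=NAME];
EDGE_RE = re.compile(
    r'^"([^"]+)"\s*->\s*"([^"]+)"\s*\[[^\]]*label=(\w+)[^\]]*\]',
    re.MULTILINE,
)


def dot_edges(dot):
    """Return (head word, relation, dependent word) triples for each edge."""
    edges = []
    for head, dependent, label in EDGE_RE.findall(dot):
        # Each node name starts with the word form, followed by its analysis.
        edges.append((head.split(' ', 1)[0], label, dependent.split(' ', 1)[0]))
    return edges


edges = dot_edges(dot_text)
for head, label, dependent in edges:
    print(f'{head} --{label}--> {dependent}')
```

Note that the node names in this output are in SLP1, even though the parser was created with Devanagari output.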



We can convert the DOT representation to a graph. Below is an online DOT visualizer. Copy and paste the DOT output into the left-hand window to see the graph in the right-hand window.

In [8]:
%%html
<iframe src="https://dreampuf.github.io/GraphvizOnline/" width="1200" height="500"></iframe>

This shows the basic capabilities of `sanskrit_parser`. For more advanced usage, please consult the documentation.