## Semantic Error Analysis of Bangla Simple Sentences

The project is small, so only two libraries are required: **pandas** to import the CSV data and **NumPy** for its random choice function.

In [155]:
import pandas as pd
import numpy as np

First, a Python dictionary stores the nouns, grouped into lists and keyed by noun type. The same approach is used to store the verbs, keyed by verb type.

In [156]:
nouns = {
     'মানুষ': [ 'মানুষ', 'রিতা','মিতা', 'সজিব', 'সোহান', 'জামাল', 'কামাল', 'সে', 'তুমি', 'আমি', 'আমরা', 'তোমরা'],
     'পক্ষি' :['পাখি', 'বক', 'ময়না', 'টিয়া', 'ঘুঘু', 'দোয়েল', 'কোকিল']
}

In [157]:
verbs = {
    'পড়' : ['পড়েছিল', 'পড়া', 'পড়বে', 'পড়ল', 'পড়ি', 'পড়ে'],
    'কর' : ['করেছিল', 'করা', 'করবে', 'করল', 'করি', 'করে'],
    'খায়' : ['খাবো', 'খেল', 'খাবে', 'খেয়েছিল', 'খাই', 'খায়']
}
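
The lookup from a word to its type can also be precomputed. The following is a minimal sketch, not part of the original notebook: `build_index` is a hypothetical helper, and the dictionaries are shortened excerpts of the ones above so that the block runs on its own.

```python
# Shortened excerpts of the dictionaries above, repeated so the sketch is self-contained.
nouns = {
    'মানুষ': ['মানুষ', 'জামাল', 'সে'],
    'পক্ষি': ['পাখি', 'বক', 'কোকিল'],
}
verbs = {
    'পড়': ['পড়েছিল', 'পড়ে'],
    'কর': ['করেছিল', 'করে'],
}

def build_index(categories):
    """Hypothetical helper: invert {type: [words]} into {word: type}."""
    return {word: kind for kind, words in categories.items() for word in words}

noun_type = build_index(nouns)
verb_type = build_index(verbs)

print(noun_type.get('জামাল'))  # মানুষ
print(verb_type.get('করে'))    # কর
print(noun_type.get('বই'))     # None, because 'বই' is not a listed noun
```

With such an index, finding the type of a word is a single dictionary lookup instead of a scan over every list.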

A Noun x Verb matrix is created with the noun types as columns and the verb types as rows. Each cell holds an id into the objects table. Given a noun type and a verb type, this id is used to look up the objects table and retrieve all suitable objects.

**Note:** If a noun type and a verb type are never used together, the corresponding entry in the matrix is ' ' (*a single white space*).

The matrix is stored in the n_v.csv file, which is read with the read_csv method of pandas. index_col = 0 specifies that the first column is the index of the table, so that pandas does not create a separate auto-incrementing index column.

In [160]:
n_v_mat = pd.read_csv('n_v.csv', encoding = 'utf8', index_col = 0)
n_v_mat

Unnamed: 0,মানুষ,পক্ষি
পড়,Y21,
কর,Y51,Y53
খায়,Y11,Y13
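
As an illustration of the lookup, here is a small sketch that rebuilds the matrix from an inline string instead of n_v.csv. The helper `object_id` is hypothetical, and treating a NaN cell the same as a whitespace cell is an added assumption, in case a file is saved with truly empty cells.

```python
import io
import pandas as pd

# Inline copy of the matrix; the unused pair is written as a single space.
csv_text = (
    ",মানুষ,পক্ষি\n"
    "পড়,Y21, \n"
    "কর,Y51,Y53\n"
    "খায়,Y11,Y13\n"
)
n_v = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Take the labels from the table itself to avoid retyping them.
human, bird = n_v.columns
read, do, eat = n_v.index

def object_id(noun_type, verb_type):
    """Hypothetical helper: return the objects-table id, or None for an unused pair."""
    cell = n_v.loc[verb_type, noun_type]  # row = verb type, column = noun type
    if pd.isna(cell) or not str(cell).strip():
        return None
    return cell

print(object_id(human, read))  # Y21
print(object_id(bird, read))   # None: this pair is not used together
print(object_id(bird, eat))    # Y13
```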


Similarly, the object.csv file holds all valid objects for each noun type and verb type pair. Its column names are the ids, and each column contains all the valid objects for that id. **Here also, all the empty entries are ' ' (one whitespace).**

*Because index_col is not specified here, pandas creates a default index column, which is not a problem in this case.*

In [161]:
objects= pd.read_csv("object.csv", encoding = 'utf8')
objects

Unnamed: 0,Y11,Y13,Y21,Y51,Y53
0,মাংস,পোকামাকড়,বই,পড়াশোনা,কিচির-মিচির
1,সবজি,কেঁচো,কোরআন,খেলা,ডাকা-ডাকি
2,ফল,ফল,বাইবেল,কাজ,শব্দ
3,ভাত,,পত্রিকা,চুরি,
4,মুরগী,,,মারামারি,
5,মাছ,,,ডাকাতি,
6,পিঠা,,,ঝগড়া,
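
Because the columns have different lengths, the shorter ones are padded with blank entries. The sketch below shows one way to drop that padding before using a column; `clean_objects` is a hypothetical helper that is not in the notebook, and the sample list only imitates what a padded column might look like after loading.

```python
import math

def clean_objects(column):
    """Hypothetical helper: keep only real object words from a padded column."""
    cleaned = []
    for item in column:
        # Skip NaN, which pandas uses for truly empty cells.
        if isinstance(item, float) and math.isnan(item):
            continue
        # Skip empty and whitespace-only padding such as ' '.
        if not str(item).strip():
            continue
        cleaned.append(item)
    return cleaned

# Imitation of the Y13 column with three kinds of padding.
y13 = ['পোকামাকড়', 'কেঁচো', 'ফল', ' ', '', float('nan')]
print(clean_objects(y13))
```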


### Semantic Error Function
**Input:** a Bangla simple sentence

**Output:** two values

- a list holding the subject, the verb and the list of all valid objects for that subject and verb
- error, a boolean: True if the sentence has an error, False if it is correct

First we go through all the words of the sentence to find the noun (*this_noun variable*) and the type of the noun (*subject variable*). In the same way we find the verb (*this_verb*) and the type of the verb (*verb*). We then look up n_v_mat to find the id into the objects table.

- *If the id is ' ' (none), we print a message saying that the noun and verb are incompatible and stop further processing. The function then returns None for both values.*
- *Otherwise we use the id to get the list of valid objects from the objects table, and traverse the words once more to see whether the object in the sentence is in that list.* If the object is valid there is no error; if not, there is an error.

In [165]:
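def semantic_error(sentence):
    """Return ([noun, verb, valid_objects], err), or (None, None) if no check is possible."""
    subject = ''    # noun type (a key of nouns)
    verb = ''       # verb type (a key of verbs)
    this_noun = ''  # the noun as it appears in the sentence
    this_verb = ''  # the verb as it appears in the sentence
    words = sentence.split()
    for word in words:
        for i, c in nouns.items():
            if word in c:
                subject = i
                this_noun = word
        for i, c in verbs.items():
            if word in c:
                verb = i
                this_verb = word
    # Without a known noun and verb the matrix lookup below would raise a KeyError.
    if not subject or not verb:
        print("Error: no known noun or verb found in the sentence")
        return None, None
    # Row = verb type, column = noun type; the cell holds a column id of the objects table.
    valid_object_key = n_v_mat.loc[verb, subject]
    if valid_object_key == ' ':
        print("Error: '" + this_noun + "' and '" + this_verb + "' can not be used together")
        return None, None
    valid_objects = objects[valid_object_key].to_list()
    # The sentence is correct as soon as one of its words is a valid object.
    err = not any(word in valid_objects for word in words)
    output = [this_noun, this_verb, valid_objects]
    return output, err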

In [183]:
sentence = 'জামাল ডাকাতি করে'

If there is an error, a sentence is suggested using the list returned by the semantic_error function. The noun and verb are kept fixed and an object is picked at random from the valid objects list using NumPy's random.choice method, which returns a random item from an array.

Otherwise, a message is printed saying that there is no error.

In [184]:
output, error = semantic_error(sentence)
if error is not None:  # None means the check was aborted with an error message
    if error:
        # Redraw until the pick is not one of the ' ' padding entries.
        suggested_object = ' '
        while suggested_object == ' ':
            suggested_object = np.random.choice(output[2])
        suggestion = output[0] + ' ' + suggested_object + ' ' + output[1]
        print('The sentence is not correct. Here is a suggestion: ' + suggestion)
    else:
        print('The sentence is correct')

The sentence is correct
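
The retry loop in the cell above keeps drawing until a non-blank entry comes up. One alternative, sketched here as an assumption rather than the notebook's own method, is to filter the blanks out first and draw once. The `suggest` function, its `rng` parameter and the use of `np.random.default_rng` are illustrative choices that do not appear in the original.

```python
import numpy as np

def suggest(noun, verb, valid_objects, rng=None):
    """Hypothetical helper: build 'noun object verb' with a randomly chosen valid object."""
    rng = rng or np.random.default_rng()
    # Drop NaN and whitespace-only padding before drawing.
    candidates = [o for o in valid_objects if isinstance(o, str) and o.strip()]
    if not candidates:
        return None  # nothing to suggest; this avoids an endless retry loop
    return f"{noun} {rng.choice(candidates)} {verb}"

# A seeded generator makes the draw repeatable across runs.
print(suggest('জামাল', 'করে', ['খেলা', 'কাজ', ' ', ' '], np.random.default_rng(0)))
```

Filtering first also covers the case where a column contains only padding, in which the retry loop would never terminate.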
