# IPL CHATBOT

This is a Q&A ChatBot based on NLP statistics which responds to statistical based Questions on Indian Premier League Season 1 (2008). This model is based on supervised learning algorithms with limited approach to user questions. This Bot implements:
- NLP (Cleaning, lemmatizing, Chunking, Basic POS-tagging)
- Brill Tagger for Custom POS tagging based on learnings
- Feature extraction based on training/learning
- Naive Bayes Classifer for classifying the content(subject) of the question
- Statistical Methods to generate stats from dataset
- Answer formulation for reply-formatting

---

## Imports

In [2]:
# Importing Built-in libs
import sys
import os
import math
import nltk
import re
import random
import json

# Importing Standard Packages
import numpy as np
import pandas as pd

# NLTK
from nltk.tag import SequentialBackoffTagger
from nltk.tokenize import word_tokenize, sent_tokenize, PunktSentenceTokenizer, RegexpTokenizer
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords, state_union
from nltk.wsd import lesk
from nltk.tag import UnigramTagger, BigramTagger, BrillTagger, brill, BrillTaggerTrainer
from nltk.chunk import ne_chunk
from nltk.data import load

----

## Data sets

- Data set = matches.csv, deliveries.csv from UCI

In [3]:
matches = pd.read_csv('Data/matches.csv')
deliveries = pd.read_csv('Data/deliveries.csv')

In [4]:
matches.head(1)

Unnamed: 0,id,season,city,date,team1,team2,toss_winner,toss_decision,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2
0,1,2008,Bangalore,2008-04-18,Kolkata Knight Riders,Royal Challengers Bangalore,Royal Challengers Bangalore,field,normal,0,Kolkata Knight Riders,140,0,BB McCullum,M Chinnaswamy Stadium,Asad Rauf,RE Koertzen


In [5]:
deliveries.head(1)

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,...,bye_runs,legbye_runs,noball_runs,penalty_runs,batsman_runs,extra_runs,total_runs,player_dismissed,dismissal_kind,fielder
0,1,1,Kolkata Knight Riders,Royal Challengers Bangalore,1,1,SC Ganguly,BB McCullum,P Kumar,0,...,0,1,0,0,0,1,1,,,


----

# Processes 

    - The overall process contains of 6 steps. 
    - The main method is at the bottom.

## 1. Brill-Tagging (Custom trained POS-Tagging)

This is used to form rules of POS tagging after training the Brill Tagger on a data-set consisting of probable questions.

- Data set = List of Questions (with POS-Tagged words)
- Trains on a data-set containing a list of questions related to IPL ChatBot.
- These Questions are tokenized into POS-tagged words.
- These POS-tagged words are converted into tuples and fetched in brill-tagger

In [15]:
def Brill_Tagging():
    
    # Loading POS-Tagged training words (Contains a list of POS-tagged words: "Hello/VBD world/NN how/VB ...")
    with open('Data/tagged_training_sentences.txt') as tagged_sentence:
        Tagged_training_list = tagged_sentence.read()
        
    #############################################################################
    # Methods for Brill Tagging
    #############################################################################
    
    # Custom Pos Tagging
    class Custom_POS_Tagger(SequentialBackoffTagger):
        def __init__(self, *args, **kwargs):
            SequentialBackoffTagger.__init__(self, *args, **kwargs)
        def choose_tag(self, tokens, index, history):
            word = tokens[index]
            return nltk.pos_tag([word])[0][1] if word != "" else None
    custom_pos_tagger = Custom_POS_Tagger()
    
    # Converts the list(POS-tagged training words) into tuples
    def transform_str2tuple(tagged_sentence):
        tagged_sentence_tuple = []
        sentences = tagged_sentence.split("\n")
        for sentence in sentences:
            tagged_question = []
            for word in word_tokenize(sentence):
                tagged_question.append(nltk.str2tuple(word))
            tagged_sentence_tuple.append(tagged_question)
        return tagged_sentence_tuple
    
    # Brill-Taggs, takes POS-tagged training tuples as Inputs, Returns rules for tagging
    def get_brill_tagger(tagged_sentences):
    
        templates = [brill.Template(brill.Pos([1,1])), brill.Template(brill.Pos([2,2])), brill.Template(brill.Pos([1,2])),
                     brill.Template(brill.Pos([1,3])), brill.Template(brill.Pos([1,1])), brill.Template(brill.Pos([2,2])),
                     brill.Template(brill.Pos([1,2])), brill.Template(brill.Pos([1,3])), brill.Template(brill.Word([-1, -1])),
                     brill.Template(brill.Word([-1, -1]))]

        trainer_initial_pos = BrillTaggerTrainer(initial_tagger=custom_pos_tagger, templates=templates, trace=3, deterministic=True)
        brill_tagger = trainer_initial_pos.train(tagged_sentences, max_rules=10)

        return brill_tagger
    
    # Calling Transform list to tuples
    Tagged_training_tuples = transform_str2tuple(Tagged_training_list)
    
    # Calling Brill tagger
    brill_tagg_rules = get_brill_tagger(Tagged_training_tuples)
    
    return brill_tagg_rules

#### Brill-Tagger Rules

In [16]:
brill_tagger = Brill_Tagging()

TBL train (fast) (seqs: 116; tokens: 1177; tpls: 10; min score: 2; min acc: None)
Finding initial useful rules...
    Found 624 useful rules.

           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  92  92   0   0  | .->None if Pos:None@[1]
  92  92   0   0  | None-> if Pos:.@[1]
  92  92   0   0  | .->None if Word:@[-1]
  30  38   8   9  | NN->NNP if Pos:NN@[1]
  14  14   0   0  | NN->NNP if Pos:POS@[1,2]
  14  14   0   0  | VBN->VBD if Word:who@[-1]
   8  11   3   0  | NN->NNP if Word:by@[-1]
   5   6   1   0  | RBR->JJR if Pos:NN@[1]
   5   6   1   0  | VBD->VBZ if Word:much@[-1]
   4   4   0   2  | NN->NNP if Pos:VB@[1]


---

## 2. Name Autofill/Extension

This is used to convert a user "word" into a known word (autofill/complete a word) that can be used to run the Bot.

- Uses a stored list of complete words to extend the user "word"
- To identify the user "word" by the Bot for correct classification and processing.

In [17]:
# Returns full/complete Player, Batsman, Bowler, Team name -

def function_return_fullName(chunked_words):
    batsman_list = list(deliveries.batsman.unique())
    bowler_list = list(deliveries.bowler.unique())
    fielder_list = list(deliveries.fielder.unique())
    temp = {'player':[],'team':[]}
    teams_abbr = [ ('Kolkata Knight Riders', 'kolkata knight riders', 'kolkata', 'kolkata riders', 'kolkata rider', 'kolkata knights', 'kolkata knight', 'knight riders', 'knight rider', 'riders', 'k k riders', 'k knight riders', 'kkr'), 
               ('Chennai Super Kings', 'chennai super kings', 'chennai', 'chennai kings', 'chennai super', 'super kings', 'chennai kings', 'csk'),  
               ('Rajasthan Royals', 'rajasthan royals', 'rajasthan', 'rajasthan royal', 'rr'),
               ('Mumbai Indians', 'mumbai indians', 'mumbai', 'mumbai indian', 'indians', 'indian', 'mi'), 
               ('Deccan Chargers', 'deccan chargers', 'deccan', 'deccan charger', 'chargers', 'charger', 'dc'), 
               ('Kings XI Punjab', 'kings xi punjab', 'kings', 'punjab', 'kings XI', 'kings punjab',  'kxip','kp', 'kxp'), 
               ('Royal Challengers Bangalore', 'royal challengers bangalore', 'bangalore', 'royal challengers', 'royal challenger', 'royal bangalore', 'challengers bangalore', 'challenger bangalore', 'rcb', 'rb'),
               ('Delhi Daredevils', 'delhi daredevils', 'delhi', 'daredevils', 'delhi daredevil', 'dd') ]
               
    for data in chunked_words:
        for w in (batsman_list or bowler_list or fielder_list):
            if w.lower() == data.lower():
                temp['player'].append(w)
            elif str(w.split()[len(w.split())-1]).lower() == data.lower():
                temp['player'].append(w)
        for w in teams_abbr:
            if data.lower() in w:
                temp['team'].append(w[0])         

    return temp

---

## 3. Text Classifier (Naive Bayes)

This is used to classify the "subject" of the Question asked into desired classes.

- Data set = List of Questions
- Classifies the content of the subject of the Question.
- Classifier used is Naive Bayes Classifer based on Bayes Theorem.
- Classifier Classes : 'runs', 'max_runs', 'min_runs', 'total_runs', 'fours,sixes', 'bowler_wickets', 'bowler_balls', 'bowler_runs'

#### Loading "Questions" Dataset for training the classifier

In [23]:
# Contains a Dataframe with Questions

train_sentences = pd.read_csv("Data/training_sentences_classifier.csv")
train_sentences.head()

Unnamed: 0.1,Unnamed: 0,sentence,label
0,0,Total runs scored by SC Ganguly in match 5?,runs
1,1,SC Ganguly score in match 1?,runs
2,2,how many runs did Ganguly score in match 2?,runs
3,3,Sachin's score in 4th match?,runs
4,4,how much did McCullum scored in match 3?,runs


#### Feature Extraction

In [24]:
#  features.json  - This file contains a list of keywords (like 'runs', 'matches', 'teams') which can be probable features. 
#                   It contains these keywords along with their intial count set to '0'. This count will be incremented 
#                   based on frequency of these keywords found in the training set. 
#
#  Feature Extraction Process - If the keyword 'runs' is found 50 times in the training set, 
#                               the feature_set will look like {'runs':50, 'matches': 0, 'teams':0}. Thus this method will 
#                               return this features list along with their found frequency of occurance in the training set.
#

def feature_extractor(words):
    
    # Loading 'features.json' file
    with open('Data/features.json') as features:
        features = json.load(features)
    
    # Frequency Distribution of words in Training Set
    word_counts = nltk.Counter(words)
    
    # Updating 'features' with found counts
    for word in word_counts:
        if word in features:
            features[word] = word_counts[word]
            
    return features

#### Classifier

In [25]:
#   Naive Bayes - This theorem is based on Naive Bayes Theorem. It observes the frequency distribution of words 
#                 occuring in the Training set and classifies on the basis of it into major classes.

naive_bayes_classifier = nltk.classify.naivebayes.NaiveBayesClassifier.train([(feature_extractor(nltk.word_tokenize(sentence)), label) for index, sentence, label in train_sentences.values])

---

## 4. Methods for Replying to  User Question/Query

This class contains all the methods for processing the questions.

- Data set = Main Datasets (matches.csv , deliveries.csv)
- Once the User Question is Processed, Cleaned, Tagged and Classified, it is passed to one of the methods in this class.
- Returns the output from that method and displays to the user

In [26]:
#############################################################################
#
#  Class defining various functions
#  Function Definiton ::
#           Attributes - Input(Batsman, Bowler, Match, Team, Postion)
#           Returns    - Output(Batsman, Bowler, Runs, Match, Team, Postion)
#           Key        - B = Batsman;  M = Match;  R = Runs;  Team = T;  Wickets = W; Position=i   
#############################################################################

class Executors:
    
    # Reading Data-sets
    def __init__(self):
        self.matches = pd.read_csv("Data/matches.csv")
        self.deliveries = pd.read_csv("Data/deliveries.csv")
        
    #############################################################################
    # -- BATSMAN FUNCTIONS --
    #############################################################################

    # Total Runs scored by 'B' Batsman in 'M' Match
    def runs_batsman_match(self, batsman_name, match_id):
        runs = self.deliveries.groupby(['match_id', 'batsman'])['batsman_runs'].sum()[match_id][batsman_name]
        return {'batsman':batsman_name, 'runs':runs, 'match':match_id}

    # Total Runs scored in 'M' Match by 'T' Team
    def total_runs_team_match(self, team, match_id):
        runs = self.deliveries.groupby(['match_id','batting_team'])['total_runs'].sum()[match_id][team]  
        return {'team':team, 'runs':runs, 'match':match_id}

    # Max scorer in a 'M' Match
    def max_score_batsman_match(self, match_id):
        x = self.deliveries.groupby(['match_id', 'batsman'])['batsman_runs'].sum()
        name, runs = x[match_id].idxmax(), x[match_id].max()
        return {'batsman':name, 'runs':runs, 'match':match_id}

    # Min scorer in a 'M' Match
    def min_score_batsman_match(self, match_id):
        x = self.deliveries.groupby(['match_id', 'batsman'])['batsman_runs'].sum()
        name, runs = x[match_id].idxmin(), x[match_id].min()
        return {'batsman':name, 'runs':runs, 'match':match_id}

    # Max scorer in a 'M' Match by 'T' Team
    def max_score_batsman_match_inTeam(self, match_id, team):
        x = self.deliveries.groupby(['match_id','batting_team', 'batsman'])['batsman_runs'].sum()
        name, runs = x[match_id][team].idxmax(), x[match_id][team].max()    
        return {'batsman':name, 'team':team, 'runs':runs, 'match':match_id}

    # Min scorer in a 'M' Match by 'T' Team
    def min_score_batsman_match_inTeam(self, match_id, team):
        x = self.deliveries.groupby(['match_id','batting_team', 'batsman'])['batsman_runs'].sum()
        name, runs = x[match_id][team].idxmin(), x[match_id][team].min()
        return {'batsman':name, 'team':team, 'runs':runs, 'match':match_id}

    # Max Scorer in all matches (ORANGE CAP)
    def highest_scorer(self):
        x = self.deliveries.groupby('batsman')['batsman_runs'].sum().sort_values(ascending =False).iloc[0:1]
        name, runs = x.index[0], x.values[0]
        return {'batsman':name, 'runs':runs}

    # Total Runs by a 'B' Batsman
    def total_runs_batsman_IPL(self, batsman):
        runs = self.deliveries.groupby(['batsman'])['batsman_runs'].sum()[batsman]
        return {'batsman':batsman, 'runs':runs}

    # Total Runs by a 'T' Team in all matches
    def total_runs_team_IPL(self, team):
        runs = self.deliveries.groupby(['batting_team'])['total_runs'].sum()[team]
        return {'team':team, 'runs':runs}

    # Dot balls faced by a 'B' Batsman in a 'M' Match
    def dot_balls_batsman_match(self, batsman, match_id):
        balls = self.deliveries[(self.deliveries['batsman'] == batsman) & (self.deliveries['match_id'] == match_id) & (self.deliveries['total_runs'] == 0)].shape[0]
        return {'batsman':batsman, 'dot_balls':balls, 'match':match_id}

    # No of 4's by a 'B' Batsman in a 'M' Match
    def b_4_batsman_match(self, batsman, match_id):
        runs_4 = self.deliveries[(self.deliveries['batsman'] == batsman) & (self.deliveries['match_id'] == match_id) & (self.deliveries['total_runs'] == 4)].shape[0]
        return {'batsman':batsman, 'fours':runs_4, 'match':match_id}

    # No of 6's by a 'B' Batsman in a 'M' Match
    def b_6_batsman_match(self, batsman, match_id):
        runs_6 = self.deliveries[(self.deliveries['batsman'] == batsman) & (self.deliveries['match_id'] == match_id) & (self.deliveries['total_runs'] == 6)].shape[0]
        return {'batsman':batsman, 'sixes':runs_6, 'match':match_id}
    
    # No of 4's by a 'T' Team in a 'M' Match
    def team_fours(self, match_id, batting_team):
        team_fours_count = self.deliveries[self.deliveries.batsman_runs == 4].groupby(['match_id','batting_team']).count()['inning']
        return {'fours':team_fours_count, 'team':batting_team, 'match':match_id}

    # No of 6's by a 'T' Team in a 'M' Match
    def team_sixes(self, match_id, batting_team):
        team_sixes_count = self.deliveries[self.deliveries.batsman_runs == 6].groupby(['match_id','batting_team']).count()['inning']
        return {'sixes':team_sixes_count, 'team':batting_team, 'match':match_id}

    # Total 4's by 'B' Batsman 
    def overall_fours_count(self, batsman):
        fours_count = self.deliveries[self.deliveries.batsman_runs == 4].groupby('batsman').count()['inning']
        return {'batsman':batsman, 'fours':fours_count}
    
    # Total 6's by 'B' Batsman
    def overall_sixes_count(self, batsman):
        sixes_count = self.deliveries[self.deliveries.batsman_runs == 6].groupby('batsman').count()['inning']
        return {'batsman':batsman, 'sixes':sixes_count}
   
    # Max 4's by a Batsman
    def most_fours_count(self):
        fours_b = self.deliveries[self.deliveries.batsman_runs == 4].groupby('batsman').count()['inning'].sort_values(ascending = False).iloc[0:1]
        batsman, fours = fours_b.index[0], fours_b.values[0]
        return {'batsman':batsman, 'fours':fours}

    # Max 6's by a Batsman
    def most_sixes_count(self):
        sixes_b = self.deliveries[self.deliveries.batsman_runs == 6].groupby('batsman').count()['inning'].sort_values(ascending = False).iloc[0:1]
        batsman, sixes = sixes_b.index[0], sixes_b.values[0]
        return {'batsman':batsman, 'sixes':sixes}
           
    # Strike Rate of a 'B' Batsman in a 'M' Match
    def strikeRate_batsman_match(self, batsman, match_id):
        runs = runs_batsman_match(batsman, match_id)
        balls = self.deliveries[(self.deliveries['batsman'] == batsman) & (self.deliveries['match_id'] == match_id) & (self.deliveries['wide_runs'] == 0)].shape[0]
        return runs/balls * 100.0

    # Max Strike Rate 'nth' position in all matches
    def total_strike_rate_IPL(self, i):
        return ((self.deliveries.groupby('batsman')['batsman_runs'].sum()/self.deliveries[(self.deliveries.wide_runs == 0)].groupby('batsman')['inning'].count())*100.0).sort_values(ascending = False).iloc[i:i+1]
    
    
    #############################################################################
    # -- BOWLING FUNCTIONS --
    #############################################################################
    
    def overall_economy_rate_by_bowler(self, match_id=0, team=None, bowler=None):
        bowler_eco = []
        if(match_id==0):
            if(team is not None):
                runs_conceded = self.deliveries[self.deliveries['bowling_team'] == team].total_runs.sum()
                balls = (self.deliveries[(self.deliveries['bowling_team'] == team) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)]).ball.count()
                dot_balls = (self.deliveries[(self.deliveries['bowling_team'] == team) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)& (self.deliveries['total_runs']==0)]).ball.count()
                overs = float(int(balls/6) + float(balls%6)/10)
                frac, whole = math.modf(overs)
                total = whole + frac*10/6
                economy_rate = runs_conceded/total
                bowler_eco.append((team, economy_rate, balls, overs, dot_balls))
            elif (bowler == None): 
                bowlers = self.deliveries.bowler.unique()
                for bowler in bowlers:
                    runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].total_runs.sum()-deliveries[self.deliveries['bowler'] == bowler].bye_runs.sum()-deliveries[self.deliveries['bowler'] == bowler].legbye_runs.sum() 
                    balls = (self.deliveries[(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)]).ball.count()
                    dot_balls = (self.deliveries[(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)& (self.deliveries['total_runs']==0)]).ball.count()
                    overs = float(int(balls/6) + float(balls%6)/10)
                    frac, whole = math.modf(overs)
                    total = whole + frac*10/6
                    economy_rate = runs_conceded/total
                    bowler_eco.append((bowler, economy_rate, balls, overs, dot_balls))
            else:
                runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].total_runs.sum()-deliveries[self.deliveries['bowler'] == bowler].bye_runs.sum()-deliveries[self.deliveries['bowler'] == bowler].legbye_runs.sum()
                balls = (self.deliveries[(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)]).ball.count()
                dot_balls = (self.deliveries[(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)& (self.deliveries['total_runs']==0)]).ball.count()
                overs = float(int(balls/6) + float(balls%6)/10)
                frac, whole = math.modf(overs)
                total = whole + frac*10/6
                economy_rate = runs_conceded/total
                bowler_eco.append((bowler, economy_rate, balls, overs, dot_balls))
        else:
            if (team is not None):
                runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team)].total_runs.sum()
                balls = (self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)]).ball.count()
                dot_balls = (self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)& (self.deliveries['total_runs']==0)]).ball.count()
                overs = float(int(balls/6) + float(balls%6)/10)
                frac, whole = math.modf(overs)
                total = whole + frac*10/6
                economy_rate = runs_conceded/total
                bowler_eco.append((team, economy_rate, balls, overs, dot_balls))
            elif (bowler == None): 
                bowlers = self.deliveries[self.deliveries['match_id'] == match_id].bowler.unique()
                for bowler in bowlers:
                    runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].total_runs.sum()-deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].bye_runs.sum()-deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].legbye_runs.sum()
                    balls = (self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)]).ball.count()
                    dot_balls = (self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)& (self.deliveries['total_runs']==0)]).ball.count()
                    overs = float(int(balls/6) + float(balls%6)/10)
                    frac, whole = math.modf(overs)
                    total = whole + frac*10/6
                    economy_rate = runs_conceded/total
                    bowler_eco.append((bowler, economy_rate, balls, overs, dot_balls))
            else:
                runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].total_runs.sum()-deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].bye_runs.sum()-deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].legbye_runs.sum()
                balls = (self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)]).ball.count()
                dot_balls = (self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler']== bowler) & (self.deliveries['wide_runs'] == 0) & (self.deliveries['is_super_over'] == 0) & (self.deliveries['noball_runs']==0)& (self.deliveries['total_runs']==0)]).ball.count()
                overs = float(int(balls/6) + float(balls%6)/10)
                frac, whole = math.modf(overs)
                total = whole + frac*10/6
                economy_rate = runs_conceded/total
                bowler_eco.append((bowler, economy_rate, balls, overs, dot_balls))

        return  bowler_eco
    
    def bowler_balls(self, match_id=0, team = None, bowler=None, economy_rate=None, balls=None, overs=None,dot_balls=None, rank=1):
        eco_balls_over = self.overall_economy_rate_by_bowler(match_id=match_id,team=team, bowler=bowler)
        if(economy_rate is not None):
            eco_balls_over = sorted(eco_balls_over, key=lambda x: x[1])
            return eco_balls_over[rank-1][0],eco_balls_over[rank-1][1] 
        elif(balls is not None):
            eco_balls_over = sorted(eco_balls_over, key=lambda x: x[2], reverse=True)
            return eco_balls_over[rank-1][0],eco_balls_over[rank-1][2]
        elif(overs is not None):
            eco_balls_over = sorted(eco_balls_over, key=lambda x: x[3], reverse=True)
            return eco_balls_over[rank-1][0],eco_balls_over[rank-1][3]
        elif(dot_balls is not None):
            eco_balls_over = sorted(eco_balls_over, key=lambda x: x[4], reverse=True)
            return eco_balls_over[rank-1][0],eco_balls_over[rank-1][4]
        else:
            eco_balls_over = sorted(eco_balls_over, key=lambda x: x[1])
            return eco_balls_over[rank-1]
        
    def overall_runs_conceded(self, match_id=0,team=None, bowler=None):
        over_all_runs = []
        if(match_id==0):
            if(team is not None):
                total_runs_conceded = self.deliveries[self.deliveries['bowling_team'] == team].total_runs.sum()
                bye_runs_conceded = self.deliveries[self.deliveries['bowling_team'] == team].bye_runs.sum()
                legbye_runs_conceded = self.deliveries[self.deliveries['bowling_team'] == team].legbye_runs.sum()
                wide = self.deliveries[self.deliveries['bowling_team'] == team].wide_runs.sum()
                noball = self.deliveries[self.deliveries['bowling_team'] == team].noball_runs.sum()
                four_boundary_conceded = self.deliveries[(self.deliveries['bowling_team'] == team)&(self.deliveries['batsman_runs']==4)].batsman_runs.count()
                six_boundary_conceded = self.deliveries[(self.deliveries['bowling_team'] == team)&(self.deliveries['batsman_runs']==6)].batsman_runs.count()
                runs = total_runs_conceded
                total_boundaries = four_boundary_conceded + six_boundary_conceded
                extras = wide + noball + bye_runs_conceded + legbye_runs_conceded
                over_all_runs.append((team, runs ,wide, noball,extras, four_boundary_conceded, six_boundary_conceded, total_boundaries))          
            elif (bowler == None):
                bowlers = self.deliveries.bowler.unique()
                for bowler in bowlers:
                    total_runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].total_runs.sum()
                    bye_runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].bye_runs.sum()
                    legbye_runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].legbye_runs.sum()
                    wide = self.deliveries[self.deliveries['bowler']== bowler].wide_runs.sum()
                    noball = self.deliveries[self.deliveries['bowler']== bowler].noball_runs.sum()
                    four_boundary_conceded = self.deliveries[(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==4)].batsman_runs.count()
                    six_boundary_conceded = self.deliveries[(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==6)].batsman_runs.count()
                    runs = total_runs_conceded - bye_runs_conceded - legbye_runs_conceded
                    total_boundaries = four_boundary_conceded + six_boundary_conceded
                    extras = wide + noball + bye_runs_conceded + legbye_runs_conceded
                    over_all_runs.append((bowler, runs ,wide, noball,extras, four_boundary_conceded, six_boundary_conceded, total_boundaries))


            else:
                total_runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].total_runs.sum()
                wide = self.deliveries[self.deliveries['bowler']== bowler].wide_runs.sum()
                noball = self.deliveries[self.deliveries['bowler']== bowler].noball_runs.sum()    
                bye_runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].bye_runs.sum()
                legbye_runs_conceded = self.deliveries[self.deliveries['bowler'] == bowler].legbye_runs.sum()
                four_boundary_conceded = self.deliveries[(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==4)].batsman_runs.count()
                six_boundary_conceded = self.deliveries[(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==6)].batsman_runs.count()
                runs = total_runs_conceded - bye_runs_conceded - legbye_runs_conceded
                total_boundaries = four_boundary_conceded + six_boundary_conceded
                extras = wide + noball + bye_runs_conceded + legbye_runs_conceded
                over_all_runs.append((bowler, runs ,wide, noball,extras, four_boundary_conceded, six_boundary_conceded, total_boundaries))

        else:
            if(team is not None):
                total_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team)].total_runs.sum()
                wide = self.deliveries[(self.deliveries['match_id']==match_id)& (self.deliveries['bowling_team'] == team)].wide_runs.sum()
                noball = self.deliveries[(self.deliveries['match_id']==match_id)& (self.deliveries['bowling_team'] == team)].noball_runs.sum()    
                bye_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team)].bye_runs.sum()
                legbye_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team)].legbye_runs.sum()
                four_boundary_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team)&(self.deliveries['batsman_runs']==4)].batsman_runs.count()
                six_boundary_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowling_team'] == team)&(self.deliveries['batsman_runs']==6)].batsman_runs.count()
                runs = total_runs_conceded
                total_boundaries = four_boundary_conceded + six_boundary_conceded
                extras = wide + noball + bye_runs_conceded + legbye_runs_conceded
                over_all_runs.append((team, runs ,wide, noball,extras, four_boundary_conceded, six_boundary_conceded, total_boundaries))

            elif(bowler == None):
                bowlers = self.deliveries[self.deliveries['match_id'] == match_id].bowler.unique()
                for bowler in bowlers:
                    total_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].total_runs.sum()
                    bye_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].bye_runs.sum()
                    legbye_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].legbye_runs.sum()
                    wide = self.deliveries[(self.deliveries['match_id']==match_id)& (self.deliveries['bowler']== bowler)].wide_runs.sum()
                    noball = self.deliveries[(self.deliveries['match_id']==match_id)& (self.deliveries['bowler']== bowler)].noball_runs.sum()
                    four_boundary_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==4)].batsman_runs.count()
                    six_boundary_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==6)].batsman_runs.count()
                    runs = total_runs_conceded - bye_runs_conceded - legbye_runs_conceded
                    total_boundaries = four_boundary_conceded + six_boundary_conceded
                    extras = wide + noball + bye_runs_conceded + legbye_runs_conceded
                    over_all_runs.append((bowler, runs ,wide, noball,extras, four_boundary_conceded, six_boundary_conceded, total_boundaries))
            else:
                total_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].total_runs.sum()
                wide = self.deliveries[(self.deliveries['match_id']==match_id)& (self.deliveries['bowler']== bowler)].wide_runs.sum()
                noball = self.deliveries[(self.deliveries['match_id']==match_id)& (self.deliveries['bowler']== bowler)].noball_runs.sum()    
                bye_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].bye_runs.sum()
                legbye_runs_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].legbye_runs.sum()
                four_boundary_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==4)].batsman_runs.count()
                six_boundary_conceded = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)&(self.deliveries['batsman_runs']==6)].batsman_runs.count()
                runs = total_runs_conceded - bye_runs_conceded - legbye_runs_conceded
                total_boundaries = four_boundary_conceded + six_boundary_conceded
                extras = wide + noball + bye_runs_conceded + legbye_runs_conceded
                over_all_runs.append((bowler, runs ,wide, noball,extras, four_boundary_conceded, six_boundary_conceded, total_boundaries))
        return over_all_runs
    
    def bowler_runs(self, match_id=0, team=None, bowler=None, runs=None, wide=None, noball=None,extras=None, fours=None, sixes=None, boundary=None, rank=1):
        bowler_stats_data = self.overall_runs_conceded(match_id=match_id,team=team, bowler=bowler)
        if (runs is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[1], reverse=True)
            return bowler_stats_data[rank-1][0:2]
        elif (wide is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[2], reverse=True)
            return bowler_stats_data[rank-1][0],bowler_stats_data[rank-1][2]
        elif (noball is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[3], reverse=True)
            return bowler_stats_data[rank-1][0],bowler_stats_data[rank-1][3]
        elif (extras is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[4], reverse=True)
            return bowler_stats_data[rank-1][0],bowler_stats_data[rank-1][4]
        elif (fours is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[5], reverse=True)
            return bowler_stats_data[rank-1][0],bowler_stats_data[rank-1][5]
        elif (sixes is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[6], reverse=True)
            return bowler_stats_data[rank-1][0],bowler_stats_data[rank-1][6]
        elif (boundary is not None):
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[7], reverse=True)
            return bowler_stats_data[rank-1][0],bowler_stats_data[rank-1][7]
        else:
            bowler_stats_data = sorted(bowler_stats_data, key=lambda x: x[1], reverse=True)
            return bowler_stats_data[rank-1][0:2]
        
    def bowler_wickets(self, match_id=0, team=None, bowler=None,rank =1):
        bowler_wicket =[]
        if(match_id==0):
            if (team is not None):
                total_wickets = self.deliveries[self.deliveries['bowling_team'] == team].dismissal_kind.count()
                wickets_total = total_wickets
                bowlers = self.deliveries[self.deliveries['bowling_team'] == team].bowler.unique()
                for bowler in bowlers:
                    total_wickets = self.deliveries[self.deliveries['bowler'] == bowler].dismissal_kind.count()
                    run_outs = self.deliveries[(self.deliveries['bowler'] == bowler) & (self.deliveries['dismissal_kind']== 'run out')].dismissal_kind.count()
                    wickets = total_wickets - run_outs
                    bowler_wicket.append((bowler, wickets, team, wickets_total))

            elif (bowler == None): 
                bowlers = self.deliveries.bowler.unique()
                for bowler in bowlers:
                    total_wickets = self.deliveries[self.deliveries['bowler'] == bowler].dismissal_kind.count()
                    run_outs = self.deliveries[(self.deliveries['bowler'] == bowler) & (self.deliveries['dismissal_kind']== 'run out')].dismissal_kind.count()
                    wickets = total_wickets - run_outs
                    bowler_wicket.append((bowler, wickets))
            else:
                total_wickets = self.deliveries[self.deliveries['bowler'] == bowler].dismissal_kind.count()
                run_outs = self.deliveries[(self.deliveries['bowler'] == bowler) & (self.deliveries['dismissal_kind']== 'run out')].dismissal_kind.count()
                wickets = total_wickets - run_outs
                bowler_wicket.append((bowler, wickets))
        else:
            if (team is not None):
                total_wickets = self.deliveries[(self.deliveries['match_id'] == match_id) & (self.deliveries['bowling_team'] == team)].dismissal_kind.count()
                wickets_total = total_wickets 
                bowlers = self.deliveries[(self.deliveries['match_id'] == match_id)& (self.deliveries['bowling_team'] == team)].bowler.unique()
                for bowler in bowlers:
                    total_wickets = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].dismissal_kind.count()
                    run_outs = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler) & (self.deliveries['dismissal_kind']== 'run out')].dismissal_kind.count()
                    wickets = total_wickets - run_outs
                    bowler_wicket.append((bowler, wickets, team, wickets_total))
            elif (bowler == None):
                bowlers = self.deliveries[(self.deliveries['match_id'] == match_id)].bowler.unique()
                for bowler in bowlers:
                    total_wickets = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].dismissal_kind.count()
                    run_outs = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler) & (self.deliveries['dismissal_kind']== 'run out')].dismissal_kind.count()
                    wickets = total_wickets - run_outs
                    bowler_wicket.append((bowler, wickets))
            else:
                total_wickets = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler)].dismissal_kind.count()
                run_outs = self.deliveries[(self.deliveries['match_id'] == match_id)&(self.deliveries['bowler'] == bowler) & (self.deliveries['dismissal_kind']== 'run out')].dismissal_kind.count()
                wickets = total_wickets - run_outs
                bowler_wicket.append((bowler, wickets))
        bowler_wicket = sorted(bowler_wicket, key=lambda x: x[1], reverse=True)
        return  bowler_wicket[rank-1]

----

## 5. Answer Formulation

This is used to form the "reply" format that will be displayed as output by the Bot.

- Uses a stored list of pre-defined formats of how the Bot can answer the questions.
- Multiple answers are added to same question to give the Bot a feel of versatility

In [27]:
answers = { 'total_runs_team_match' : ['{team} scored {runs} runs in match {match}',
                                       'In match {match} {team} scored {runs} runs',
                                       '{runs} runs were scored by {team} in match {match}',
                                       'Match {match} saw {team} scoring {runs} runs',
                                       'A total of {runs} was scored by {team} in match {match}'],
            'runs_batsman_match' : ['{batsman} scored {runs} runs in match {match}',
                                    'In match {match} {batsman} scored {runs} runs',
                                    '{runs} runs were scored by {batsman} in match {match}'],
            'max_score_batsman_match_inTeam' : ['{batsman} of {team} scored {runs} runs in match {match}',
                                                'In match {team} {batsman} of {team} scored {runs} runs',
                                                '{runs} runs were scored by {batsman} of {team} in match {match}'],
            'max_score_batsman_match' : ['{batsman} of scored {runs} runs in match {match}',
                                         'In match {match} {batsman} scored {runs} runs',
                                          '{batsman} in match {match} scored {runs} runs',
                                          '{runs} runs were scored by {batsman} in {match}',
                                          'In match {match} {batsman} scored {runs} runs'],
            'min_score_batsman_match' : ['{batsman} of scored {runs} runs in match {match}',
                                         'In match {match} {batsman} scored {runs} runs',
                                          '{batsman} in match {match} scored {runs} runs',
                                          '{runs} runs were scored by {batsman} in {match}',
                                          'In match {match} {batsman} scored {runs} runs'],
            'max_score_batsman_match_inTeam':['{batsman} of {team} scored {runs} runs in match {match}',
                                               'In match {match} {batsman} of {team} scored {runs} runs',
                                               '{runs} runs were scored by {batsman} of {team} in match {match}'],
            'total_runs_batsman_IPL':['{batsman} scored {runs} runs in ipl',
                                      'In this season {batsman} scored {runs} runs',
                                      '{runs} runs were scored by {batsman} in the whole ipl'],
            'total_runs_team_IPL':['{team} scored {runs} runs in ipl',
                                  'In this season {team} scored {runs} runs',
                                  '{runs} runs were scored by {team} in the whole ipl'],
            'dot_balls_batsman_match':['{batsman} faced {dot_balls} dot balls in match {match}',
                                       'In match {match} {batsman} faced {dot_balls} dot balls',
                                       '{dot_balls} dot balls were faced by {batsman} in match {match}'],
            'b_4_batsman_match':['{batsman} hit {fours} fours in match {match}',
                                 'In match {match} {batsman} hit {fours} fours',
                                 '{fours} fours were hit by {batsman} in match {match}'],
            'b_6_batsman_match':['{batsman} hit {sixes} sixes in match {match}',
                                 'In match {match} {batsman} scored {sixes} runs',
                                 '{sixes} were hit by {batsman} in match {match}'],
            'team_fours':['{team} hit {fours} fours in match {match}',
                          'In match {match} {team} hit {fours} fours',
                          '{fours} fours were hit by {team} in match {match}'],
            'team_sixes':['{team} hit {sixes} sixes in match {match}',
                          'In match {match} {team} scored {sixes} runs',
                          '{sixes} were hit by {batsman} in match {match}'],
            'overall_fours_count':['{batsman} hit {fours} fours in ipl',
                                   'In this season {batsman} hit {fours} fours',
                                   '{fours} fours were hit by {batsman} in ipl'],
            'overall_sixes_count':['{batsman} hit {sixes} sixes in ipl',
                                   'In this season {batsman} hit {sixes} sixes',
                                   '{sixes} sixes were hit by {batsman} in ipl'],
            'highest_scorer':['{batsman} scored maximum runs,he scored {runs} runs in ipl',
                              'In this season {batsman} scored maximum runs,he scored {runs} runs',
                              'maximum runs {runs} runs were scored by {batsman} in the whole ipl'],
            'most_fours_count':['{batsman} hit most fours,he hit {fours} fours in ipl',
                                'In this season {batsman} hit most fours,he {fours} fours',
                                'most fours {fours} fours were hit by {batsman} in ipl'],
            'most_sixes':['{batsman} hit most sixes,he hit {sixes} sixes in ipl',
                          'In this season {batsman} hit most sixes,he hit {sixes} sixes',
                          'most sixes {sixes} sixes were hit by {batsman} in ipl']}

---

## 6. Output (Answering the User Question/Query)

This is the method where the Processed, Cleaned and Classified Question is fitted with a reply by calling the "Executors" class.

- The Classified Question is passed to a fitting method in "Executors" class.
- The reply is displayed to the user.

#### Calling "Executors" class (Step 5)

In [28]:
exe = Executors()

#### Output

In [29]:
def classify(classifier, chunked, chunked_dict):
    boundary_list =['4s','6s','sixes','fours']
    
    
    # CLASSIFIER CLASSES #
    
    #############################################################################
    #   Classifer Class 1 = 'runs'
    #############################################################################
    if classifier == 'runs':
        try:
            if chunked_dict['team'] != []:
                team_name = chunked_dict['team'][0]
                for i in range(len(chunked)):
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            match_id = chunked[i][0]           
                result = exe.total_runs_team_match(team_name, int(match_id))
                reply = random.choice(answers['total_runs_team_match'])
                print(reply.format(**result))
            else :
                for i in range(len(chunked)):
                    person_name = chunked_dict['player'][0]
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            match_id = chunked[i][0]
                result = exe.runs_batsman_match(person_name, int(match_id))
                reply = random.choice(answers['runs_batsman_match'])
                print(reply.format(**result))
        except:
            print("Sorry, I can not understand.")

    #############################################################################
    #   Classifer Class 2 = 'max_runs'
    #############################################################################
    elif classifier =='max_runs':
        try:
            if chunked_dict['team'] != []:
                team_name = chunked_dict['team'][0]
                for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            match_id = chunked[i][0]
                result = exe.max_score_batsman_match_inTeam(int(match_id), team_name)
                reply = random.choice(answers['max_score_batsman_match_inTeam'])
                print(reply.format(**result))
            else:
                for i in range(len(chunked)):
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            match_id = chunked[i][0]
                print(max_score_batsman_match(int(match_id)))
        except:
            print("Sorry, I can not understand.")

    #############################################################################
    #   Classifer Class 3 = 'min_runs'
    #############################################################################
    elif classifier =='min_runs':
        try:
            if chunked_dict['team'] != []:
                team_name = chunked_dict['team'][0]
                for i in range(len(chunked)):
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            match_id = chunked[i][0]
                print(min_score_batsman_match_inTeam(int(match_id), team_name))
            else:
                for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            match_id = chunked[i][0]
                print(min_score_batsman_match(int(match_id)))
        except:
            print("min_runs")

    #############################################################################
    #   Classifer Class 4 = 'total_runs'
    #############################################################################
    elif classifier == 'total_runs':
        try:
            if chunked_dict['team'] != []:
                team_name = chunked_dict['team'][0]
                print(total_runs_team_IPL(team_name))
            else :
                for i in range(len(chunked)): 
                    person_name = chunked_dict['player'][0]
                print(total_runs_batsman_IPL(person_name))
        except:
            print("total_runs")

    #############################################################################
    #   Classifer Class 5 = 'fours'
    #############################################################################
    elif classifier == 'fours':
        try:
            if chunked_dict['team'] != []:
                team_name = chunked_dict['team'][0]
                for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]
                print(team_fours(int(match_id), team_name))
            else:
                person_name = chunked_dict['player'][0]
                for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]    
                print(b_4_batsman_match(person_name ,int(match_id)))
        except:
            print("Fours")

    #############################################################################
    #   Classifer Class 6 = 'sixes'
    #############################################################################
    elif classifier == 'sixes':
        try:
            if chunked_dict['team'] != []:
                team_name = chunked_dict['team'][0]
                for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]
                print(team_sixes(int(match_id), team_name))
            else:
                person_name = chunked_dict['player'][0]
                for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]    
                print(b_6_batsman_match(person_name ,int(match_id)))
        except:
            print("SIXES")
    
    #############################################################################
    #   Classifer Class 7 = 'bowler_wickets'
    #############################################################################
    elif classifier == 'bowler_wickets':
        team = None
        bowler = None
        match_id = 0
        if chunked_dict['team'] != []:
            team = chunked_dict['team'][0]
        elif chunked_dict['player'] != []:
            bowler = chunked_dict['player'][0]
        for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]
        print(exe.bowler_wickets(match_id = int(match_id), team = team, bowler=bowler,rank = 1))

    #############################################################################
    #   Classifer Class 8 = 'bowler_balls'
    #############################################################################
    elif (classifier == 'bowler_balls'):
        if chunked_dict['team'] != []:
            team = chunked_dict['team'][0]
        elif chunked_dict['player'] != []:
            bowler = chunked_dict['player'][0]
        for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]
        features_check = feature_extractor(test_sentences)
        economy_rate_list = ['economy','economy rate','expensive','economical','economy-rate']
        overs_list = ['over','overs']
        dot_ball_list = ['dot','dots','dotball','dotballs']
        flag = 0
        for i in economy_rate_list:
            if(features_check[i]==1):
                flag = 1
                economy_rate = 'economy_rate'
                break
        if(economy_rate is not 'economy_rate'):
            for i in overs_list:
                if(features_check[i] == 1):
                    flag = 1
                    overs = 'overs'
                    break
        if(overs is not 'overs'):
            for i in dot_ball_list:
                if(features_check[i] == 1):
                    dot_balls = 'dot_balls'
                    flag = 1
                    break
        if (flag is not 1):
            balls = 'balls'

        print(exe.bowler_balls(match_id=int(match_id), bowler=bowler,economy_rate=economy_rate, balls=balls, overs=overs,dot_balls=dot_balls, rank=rank))
    
    #############################################################################
    #   Classifer Class 9 = 'bowler_runs'
    #############################################################################
    elif (classifier == 'bowler_runs'):
        if chunked_dict['team'] != []:
            team = chunked_dict['team'][0]
        elif chunked_dict['player'] != []:
            bowler = chunked_dict['player'][0]
        for i in range(len(chunked)): 
                    if type(chunked[i]) is tuple:
                        if chunked[i][1] == 'CD':
                            flag = 0
                            for p in boundary_list:
                                if chunked[i][1] == p:
                                    flag = 1
                            if flag == 0:
                                match_id = chunked[i][0]
        features_check = feature_extractor(query)
        wide_list = ['wide','wides']
        noball_list = ['no','noball','noballs']
        extra_list = ['extra','extras']
        fours_list = ['fours','4s','four']
        sixes_list = ['sixes','6s','six']
        boundaries_list = ['boundary','boundaries']
        flag = 0
        for i in wide_list:
            if(features_check[i]==1):
                flag = 1
                wide = 'wide'
                break
        if(wide is not 'wide'):
            for i in noball_list:
                if(features_check[i]==1):
                    flag = 1
                    noball='noball'
                    break
        if(noball is not 'noball'):
            for i in extra_list:
                if(features_check[i]==1):
                    flag = 1
                    extras='extras'
                    break
        if(extras is not 'extras'):
            for i in fours_list:
                if(features_check[i]==1):
                    flag = 1
                    fours = 'fours'
                    break
        if(fours is not 'fours'):
            for i in sixes_list:
                if(features_check[i]==1):
                    flag = 1
                    sixes = 'sixes'
                    break
        if(sixes is not 'sixes'):
            for i in boundaries_list:
                if(features_check[i]==1):
                    flag = 1
                    boundary='boundary'
                    break
        if(flag == 0):
            runs = 'runs'
        print(exe.bowler_runs(match_id=int(match_id),bowler=bowler, runs = runs, wide=wide, noball=noball,extras=extras, fours=fours, sixes=sixes,boundary=boundary, rank=rank))

---

# Main Method (Compiles all 6 step process)

In [30]:
def User_Input(question):
    
    try:
        # Pre-processing (Cleaning)
        boundary_list =['4s','6s','sixes','fours']
        filtered_words = []
        words = word_tokenize(question)
        lm = WordNetLemmatizer()
        stop_words = stopwords.words('english')
        for word in words:
             if word.lower() not in stop_words:
                    if word in boundary_list:
                        filtered_words.append(word)
                    else:
                        filtered_words.append(lm.lemmatize(word))

        # 1. Brill-Tagging (Custom trained POS-Tagging)
        tagged = brill_tagger.tag(filtered_words)
        chunkGram = r"""Chunk:{<NN.?>*<NNP.?>*}"""
        chunkParser = nltk.RegexpParser(chunkGram)
        chunked = chunkParser.parse(tagged)
        chunked_words = []
        for node in chunked:
            if hasattr(node, "label"):
                words = [word for word, tag in node.leaves()]
                chunked_words += words

        # 2. Extending/Completing user "word" to known word
        chunked_dict = function_return_fullName(chunked_words)

        # 3. Classifier 
        classifier = naive_bayes_classifier.classify(feature_extractor(filtered_words))

        # 4,5,6. Output (Methods for Replying to User Question/Query) + Classifier + Chuker
        output = classify(classifier, chunked, chunked_dict)
        
        return output
    
    except:
        print("Sorry I didn't understand.")

=======================================================================================================================

# User asks a Question

In [31]:
query = "score by mccullum in match 1?"
User_Input(query)

In match 1 BB McCullum scored 158 runs


In [32]:
query = "How much did mccullum score in match 1?"
User_Input(query)

BB McCullum scored 158 runs in match 1


In [33]:
query = "ABCD EFGH.."
User_Input(query)

Sorry I didn't understand.
