#### 25-02-2021

### Task Details and Objective:
* Returns are the number one issue for any startup in Pakistan; they can kill a startup in no time.
* In this notebook I predict the likelihood that a given order will be returned, based on its details (e.g. city name, book name, payment method).

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import math  
import re

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Import data

In [None]:
data = pd.read_csv('/kaggle/input/gufhtugu-publications-dataset-challenge/GP Orders - 5.csv')

data.head()

#### Data information

In [None]:
data.info()

### Checking Null values

In [None]:
data.isnull().sum()

In [None]:
# Rows with at least one missing value
data[data.isnull().any(axis=1)]
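Counting nulls per row with `apply(..., axis=1)` and the vectorized `isnull().any(axis=1)` select the same rows; the second avoids a Python-level call per row. A small check on a made-up frame (not the Gufhtugu data):

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; not taken from the orders dataset
df = pd.DataFrame({
    "Book Name": ["python programming", None, "molo masali"],
    "City": ["karachi", "lahore", np.nan],
})

# Row-wise apply: count the nulls in each row, keep rows with at least one
via_apply = df[df.apply(lambda row: sum(row.isnull().values), axis=1) > 0]

# Vectorized: True for every row where any column is null
via_any = df[df.isnull().any(axis=1)]

print(via_any)
```

Both keep rows 1 and 2, the rows with a missing book name and a missing city.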

### Data Dimension

In [None]:
data.shape

In [None]:
data[data['Book Name'].isnull()]

In [None]:
# Drop the rows shown above (missing Book Name) without hard-coding their index positions
data.dropna(subset=['Book Name'], inplace=True)

# Data Cleaning

## Book Names Cleaning

### Renaming and Splitting
* Removing special characters
* Renaming a few books manually

In [None]:
# Remove parentheses and square brackets (matched literally, not as regex)
for ch in "()[]":
    data["Book Name"] = data["Book Name"].str.replace(ch, "", regex=False)

# Shorten a few long book titles manually
book_renames = {
    "python programming- release date: august 14, 2020": "python programming",
    "molo masali - مولو مصلی": "molo masali",
    "r ka taaruf  آر کا تعارف": "r ka taaruf",
    "linux - an introduction release data - october 3, 2020": "linux - an introduction",
}
for old, new in book_renames.items():
    data["Book Name"] = data["Book Name"].str.replace(old, new, regex=False)

#### Many rows contain multiple books, so we split them to give each book its own row.

In [None]:
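# Split "/"-separated titles into lists, then give each book its own row
new_df = data[['Order Number', 'Book Name']].copy()
new_df['Book Name'] = new_df['Book Name'].str.split('/')
new_df = new_df.explode('Book Name').reset_index(drop=True)

# Trim spaces left around the "/" separator
new_df['Book Name'] = new_df['Book Name'].str.strip()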

In [None]:
new_df

### Merging

In [None]:
data = pd.merge(data, new_df, on=["Order Number"])

In [None]:
data.head()

#### New dimension

In [None]:
data.shape

### Cleaning extra columns

In [None]:
# Keep the per-book name from new_df (suffix _y) under the original column name
data = data.drop(columns=['Book Name_x']).rename(columns={'Book Name_y': 'Book Name'})
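Because both frames carry a `Book Name` column, `pd.merge` keeps both copies as `Book Name_x` and `Book Name_y`. The merge also assumes each `Order Number` appears once in the original data; if it can repeat (not checked here), rows would be duplicated silently. `merge` accepts `validate="one_to_many"` to check that assumption. A sketch on made-up frames:

```python
import pandas as pd

# Toy data (made up): one row per order on the left, one row per book on the right
orders = pd.DataFrame({"Order Number": [1, 2],
                       "Book Name": ["a", "b/c"],
                       "City": ["karachi", "lahore"]})
books = pd.DataFrame({"Order Number": [1, 2, 2],
                      "Book Name": ["a", "b", "c"]})

# Both frames have "Book Name", so merge suffixes them as Book Name_x / Book Name_y.
# validate="one_to_many" raises MergeError if an order number repeats on the left.
merged = pd.merge(orders, books, on="Order Number", validate="one_to_many")
suffixed_columns = merged.columns.tolist()

# Keep the per-book name under the original column name
merged = merged.drop(columns="Book Name_x").rename(columns={"Book Name_y": "Book Name"})
print(merged)

# A duplicated order on the left trips the check instead of silently duplicating rows
duplicated = pd.concat([orders, orders.iloc[[0]]], ignore_index=True)
try:
    pd.merge(duplicated, books, on="Order Number", validate="one_to_many")
    caught = False
except pd.errors.MergeError:
    caught = True
print("duplicate order numbers detected:", caught)
```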

In [None]:
data.dropna(inplace=True)

### Renaming

In [None]:
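# Assign the result back: replace(..., inplace=True) on a selected column is chained
# assignment and may not update `data` under pandas Copy-on-Write
data['Payment Method'] = data['Payment Method'].replace({'Cash on delivery': 'COD', 'Cash on Delivery (COD)': 'COD'})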

In [None]:
data.isnull().sum()

In [None]:
data.head()

In [None]:
data.drop(columns = ['Order Date & Time', 'Order Number'], inplace=True)

## City Names Cleaning

In [None]:
pd.options.display.max_rows = None
data['City'].value_counts().head(10)

In [None]:
data.head()

In [None]:
data['Order Status'] = data['Order Status'].str.lower()
data['City'] = data['City'].str.lower()
data['Payment Method'] = data['Payment Method'].str.lower()
data['Book Name'] = data['Book Name'].str.lower()
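Lowercasing alone still leaves values such as " karachi " or "karachi  sindh" (extra spaces) as separate categories. A minimal sketch that also trims and collapses whitespace, shown on made-up values; whether the real columns contain such spacing is an assumption not checked here:

```python
import pandas as pd

raw = pd.Series(["Karachi", " karachi ", "KARACHI", "karachi  sindh"])

# Lowercase, trim, and collapse runs of whitespace so equal names compare equal
clean = (raw.str.lower()
            .str.strip()
            .str.replace(r"\s+", " ", regex=True))

print(clean.value_counts())
```

The same chain could be applied to each text column in place of the plain `.str.lower()` calls.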

In [None]:
data.head()

In [None]:
pd.options.display.max_rows = None
data['City'].value_counts().head(10)

### Converting Urdu City Names into English

In [None]:
urdu_city = data['City'][~(data['City'].str.contains("[a-zA-Z]"))]

In [None]:
urdu_city.unique()
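The filter above treats any value without a Latin letter as Urdu, so digit-only or empty values are also sent for translation. A stricter sketch, assuming Urdu names are written in the Arabic-script Unicode block (U+0600 to U+06FF), also requires at least one character from that block; the sample values are made up:

```python
import re

# Arabic-script block (U+0600 to U+06FF), which covers the Urdu letters used here
ARABIC_SCRIPT = re.compile(r"[\u0600-\u06FF]")
LATIN = re.compile(r"[a-zA-Z]")


def needs_translation(city: str) -> bool:
    """True when the value has Arabic-script characters and no Latin letters."""
    return bool(ARABIC_SCRIPT.search(city)) and not LATIN.search(city)


samples = ["کراچی", "karachi", "لاہور lahore", "123"]
flags = [needs_translation(s) for s in samples]
print(flags)
```

With pandas, the equivalent mask would be `data['City'].str.contains('[\u0600-\u06FF]') & ~data['City'].str.contains('[a-zA-Z]')`.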

In [None]:
from textblob import TextBlob
from time import sleep

In [None]:
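to_en = {}
# Note: TextBlob.translate() relied on the Google Translate web API; it is deprecated
# and removed in recent TextBlob releases, so this cell needs an older TextBlob version.
for x in urdu_city.unique():  # translate each distinct name only once
    # Pause between requests to stay under the translation API's rate limit
    sleep(0.5)
    try:
        # Lowercase to match the already-lowercased City column
        to_en[x] = TextBlob(x).translate().string.lower()
    except Exception:
        # Leave names that fail to translate unchanged
        pass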
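Translating each distinct value once and mapping the results back keeps the number of API requests equal to the number of unique names rather than the number of rows. The sketch below makes the translator a parameter: `translate_unique` and the `translate` callable are hypothetical names, standing in for whichever translation service is available, and the usage example plugs in a fixed lookup table instead of a real API:

```python
import time
from typing import Callable, Dict, Iterable


def translate_unique(values: Iterable[str],
                     translate: Callable[[str], str],
                     pause: float = 0.5) -> Dict[str, str]:
    """Translate each distinct value once; skip values whose translation fails.

    `translate` is a hypothetical stand-in for a translation service: it should
    return the English text or raise an exception on failure.
    """
    mapping: Dict[str, str] = {}
    for value in dict.fromkeys(values):  # distinct values, first-seen order
        try:
            # Lowercase to match an already-lowercased City column
            mapping[value] = translate(value).lower()
        except Exception:
            continue  # untranslated values are left out of the mapping
        time.sleep(pause)  # stay under the service's rate limit
    return mapping


# Usage with a fake translator (a fixed lookup table), not a real API
fake = {"کراچی": "Karachi", "لاہور": "Lahore"}
result = translate_unique(["کراچی", "لاہور", "کراچی", "ملتان"], fake.__getitem__, pause=0)
print(result)
```

The returned dict can be passed straight to `Series.replace`, as the next cells do with `to_en`.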

In [None]:
len(to_en)

In [None]:
data['City'] = data.City.replace(to_en)

In [None]:
data.head(10)

#### City names need further cleaning: they contain many spelling mistakes and abbreviations, and some entries hold a complete address or the name of a nearby area instead of the city. Cleaning them is important, otherwise they would degrade our ML predictions.
#### So we find the patterns manually and replace them with the correct names.

In [None]:
# No capture group: str.contains warns when a pattern has match groups
pattern = r'karach'

In [None]:
khi = data['City'][data['City'].str.contains(pattern)].value_counts().index

In [None]:
khi

In [None]:
data['City'] = data['City'].replace(khi, 'karachi')

In [None]:
data['City'].unique().size

#### Using this list of city names for replacement

In [None]:
cities = ['islamabad', 'ahmed nager chatha', 'ahmadpur east', 'ali khan abad', 'alipur', 'arifwala', 'attock', 'bhera',
              'bhalwal', 'bahawalnagar','bahawalpur', 'bhakkar', 'burewala', 'chillianwala', 'chakwal', 'chichawatni',
              'chiniot', 'chishtian',
              'daska', 'darya khan', 'dera ghazi khan', 'dhaular', 'dina', 'dinga', 'dipalpur', 'faisalabad', 'ferozewala',
              'fateh jhang','ghakhar mandi', 'gojra', 'gujranwala', 'gujrat', 'gujar khan', 'hafizabad', 'haroonabad', 'hasilpur',
              'haveli lakha', 'jatoi',
              'jalalpur', 'jattan', 'jampur', 'jaranwala', 'jhang', 'jhelum', 'kalabagh', 'karor lal esan', 'kasur', 'kamalia', 'kamoke',
              'khanewal',
              'khanpur', 'kharian', 'khushab', 'kot addu', 'jauharabad', 'lahore', 'lalamusa', 'layyah', 'liaquat pur',
              'lodhran', 'malakwal', 'mamoori', 'mailsi', 'mandi bahauddin', 'mian channu', 'mianwali', 'multan', 'murree', 
              'muridke', 'mianwali bangla', 'muzaffargarh', 'narowal', 'nankana sahib', 'okara', 'renala khurd', 'pakpattan', 
              'pattoki', 'pir mahal', 'qaimpur', 'qila didar singh', 'rabwah', 'raiwind', 'rajanpur', 'rahim yar khan',
              'rawalpindi',
              'sadiqabad', 'safdarabad', 'sahiwal', 'sangla hill', 'sarai alamgir', 'sargodha', 'shakargarh', 'sheikhupura',
              'sialkot',
              'sohawa', 'soianwala', 'siranwali', 'talagang', 'taxila', 'toba tek singh', 'vehari', 'wah cantonment', 
              'wazirabad',
              'badin', 'bhirkan', 'rajo khanani', 'chak', 'dadu', 'digri', 'diplo', 'dokri', 'ghotki', 'haala', 'hyderabad',
              'islamkot', 'jacobabad', 'jamshoro', 'jungshahi', 'kandhkot', 'kandiaro', 'karachi', 'kashmore', 'keti bandar',
              'khairpur', 'kotri', 'larkana', 'matiari', 'mehar', 'mirpur khas', 'mithani', 'mithi', 'mehrabpur', 'moro',
              'nagarparkar', 'naudero', 'naushahro feroze', 'naushara', 'nawabshah', 'nazimabad', 'qambar', 'qasimabad', 
              'ranipur', 'ratodero', 'rohri', 'sakrand', 'sanghar', 'shahbandar', 'shahdadkot', 'shahdadpur',
              'shahpur chakar', 'shikarpur', 'sukkur', 'tangwani', 'tando adam khan', 'tando allahyar',
              'tando muhammad khan', 'thatta', 'umerkot', 'warah', 'abbottabad', 'adezai', 'alpuri', 'akora khattak',
              'ayubia', 'banda daud shah', 'bannu', 'batkhela', 'battagram', 'birote', 'chakdara', 'charsadda', 'chitral',
              'daggar', 'dargai', 'dera ismail khan', 'doaba', 'dir', 'drosh', 'hangu', 'haripur', 'karak',
              'kohat', 'kulachi', 'lakki marwat', 'latamber', 'madyan', 'mansehra', 'mardan', 'mastuj', 'mingora', 'nowshera',
              'paharpur', 'pabbi', 'peshawar', 'saidu sharif', 'shorkot', 'shewa adda', 'swabi', 'swat', 'tangi', 'tank',
              'thall', 'timergara', 'tordher', 'awaran', 'barkhan', 'chagai', 'dera bugti', 'gwadar', 'harnai', 'jafarabad',
              'jhal magsi', 'kacchi', 'kalat', 'kech', 'kharan', 'khuzdar', 'killa abdullah', 'killa saifullah', 'kohlu',
              'lasbela', 'lehri', 'loralai', 'mastung', 'musakhel', 'nasirabad', 'nushki', 'panjgur', 'pishin valley', 
              'quetta', 'sherani', 'sibi', 'sohbatpur', 'washuk', 'zhob', 'ziarat']

In [None]:
def clean_city(city):
    for i in cities:
        if i in str(city):
            return i
    return city
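`clean_city` returns the first list entry found anywhere in the string, so list order decides overlaps: 'chak' comes before 'chakdara' in `cities`, so a value containing "chakdara" becomes "chak", and a short name can also match inside an unrelated word. A sketch of an alternative, not the method used above: try names longest first and require whole-word matches. `make_city_matcher` is a hypothetical helper, and swapping it in would change some of the resulting city counts:

```python
import re


def make_city_matcher(cities):
    """Return a function mapping a messy city string to a canonical city name.

    Names are tried longest first, so 'chakdara' wins over 'chak', and each name
    must match on word boundaries, so 'dir' does not match inside a longer word.
    """
    ordered = sorted(set(cities), key=len, reverse=True)
    patterns = [(name, re.compile(r"\b" + re.escape(name) + r"\b")) for name in ordered]

    def match(value):
        text = str(value)
        for name, pattern in patterns:
            if pattern.search(text):
                return name
        return value  # no known city found: keep the original value

    return match


clean = make_city_matcher(["chak", "chakdara", "dir", "mianwali", "mianwali bangla"])
print(clean("chakdara malakand"))      # longest matching name wins
print(clean("mianwali bangla road"))
print(clean("dirkot"))                 # made-up value: no whole-word match, left as-is
```

Applied to the notebook, this would be `data["City"] = data["City"].apply(make_city_matcher(cities))`.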

In [None]:
data["City"] = data["City"].apply(clean_city)

In [None]:
data['City'].unique().size

In [None]:
data['City'].value_counts().head(10)

In [None]:
data.head(10)

In [None]:
match = {
    'fsd':'faisalabad', 
    'umer kot':'umerkot', 
    'khi':'karachi', 
    "alkhidmat raazi hospital cbr":"islamabad",
    "khirpur" : "khairpur", 
    "lakhi shikarpur":"shikarpur", 
    "sakhi sarwar":"dera ghazi khan", 
    "valencia / lhr":"lahore",
    'unjab lhr':"lahore",
    'ferozpur road lhr': "lahore", 
    'lhr':'lahore',
    'lahor':'lahore', 
    "rwp":"rawalpindi",
    "hyd":"hyderabad", 
    "isb":"islamabad", 
    "kot ghulam muhammad mirpurkhas":"mirpurkhas", 
    "mirpurkhas sattlelite":"mirpurkhas", 
    "rawalpindi/tarnol":"rawalpindi",
    "milad chowk satellite rawalpindi":"rawalpindi", 
    "sadiq abad":"sadiqabad", 
    "sadiqbad":"sadiqabad",
    "mandi sadiq gunj":"bahawalnagar", 
    "ghulam muhammad abad muhallah sadiqabad street#":"sadiqabad",
    "gb mureed wala samundri":"faisalabad", 
    "d type sohailabad":"faisalabad",
    "tehseel tandlianwala distric":"faisalabad",
    "hyderabaderabad":"hyderabad",
    "hyderabaderbad":"hyderabad",
    "rahimyarkhan":"rahim yar khan", 
    "jahangiria ship malir ext near sokarachio goath":"karachi",
    "central district":"karachi", 
    "dg":"dera ghazi khan", 
    "lakarachi shikarpur":"shikarpur",
    "shah faisal maleer halt":"karachi", 
    "gulshan ravi":"lahore", 
    "mangrotha east thesil taunsa district dera ghazi khan khan":"dera ghazi khan",
    "newport":"karachi",
    "hyderabadetabad":"hyderabad",
    "abottabad":"abbottabad", 
    "golden residency housing scheme opposite the  school near bus terminal shikarpur road skkur":"sukkur",
    "rangpur baghoor tehsil noorpur thal":"noorpur thal", 
    "near noory wali railway crossing":"rahim yar khan",
    "nooriabad":"jamshoro", 
    "baluchistan":"", 
    "ouch":"dir", 
    "rangpur":"noorpur thal", 
    "sharorah":"",
    "sharjah":"", 
    "pindi said pur":"jhelum", 
    "muzaffargarhghar":"muzaffargarh", 
    "shar sultan":"muzaffargarh",
    "chachran sharif": "khanpur", 
    "qaboola shareef":"pakpattan", 
    "lowari sharif":"badin", 
    "bhara kahu":"islamabad",
    "kamar mushani":"mianwali", 
    "shakar garh":"shakargarh", 
    "shankargarh":"shakargarh",
    "sukehki":"pindi bhattian", 
    "karoondi":"khairpur", 
    "shehwan shareef":"sehwan", 
    "pindigheb":"attock",
    "pindi gheb":"attock", 
    "attcka":"attock", 
    "jaurah karnana":"gujrat", 
    "haveli bahadur shah":"jhang", 
    "khalid street mohala attari jandiala sher khan p":"sheikhupura", 
    "mc zahir peer tehsil khanpur":"khanpur",
    "mubarik pur district bahwalpur tehesil ahmedpur":"ahmedpur east", 
    "qazi international tavels nimra cantre faqir abad chok pashawer":"peshawar",
    "qazi ahmed":"shaheed benazirabad", 
    "kot qazi":"chakwal"
}

In [None]:
data['City'] = data.City.replace(match)
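`Series.replace` with a dict only changes values that equal a key exactly, while `.str.replace` rewrites substrings inside longer values. That is why a value such as "lhr cantt" survives the `match` mapping and needs the substring-based cleaning that follows. A comparison on made-up values:

```python
import pandas as pd

city = pd.Series(["lhr", "lhr cantt", "fsd"])
mapping = {"lhr": "lahore", "fsd": "faisalabad"}

# Series.replace with a dict matches whole values only
whole = city.replace(mapping)

# str.replace rewrites substrings, so "lhr cantt" becomes "lahore cantt"
partial = city.str.replace("lhr", "lahore", regex=False)

print(whole.tolist())
print(partial.tolist())
```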

In [None]:
data['City'].unique().size

In [None]:
# Substring rules for the remaining messy city values, checked top to bottom:
# the first rule with a key contained in a value decides that value's city
# ("" marks values that are not a usable city). This replaces a row-by-row loop
# that rescanned the whole column with a regex str.replace for every row.
city_rules = [
    (("fsd",), "faisalabad"),
    (("dg",), "dera ghazi khan"),
    (("kotli",), "kotli"),
    (("muzaffarabad",), "muzaffarabad"),
    (("dera ghazi khan", "madni photo state adda retra tehsil taunsa sharif district dgkhan"), "dera ghazi khan"),
    (("punjab",), ""),
    (("mandi baha", "mandi bhuddin punjab", "tehsil phalia", "mandibahawaldin",
      "village bhinder kalan p"), "mandi bahauddin"),
    (("bun",), "buner"),
    (("gag",), "gaggoo mandi"),
    (("mirpur a", "dadyal"), "mirpur"),
    (("tando al", "tandoallahyar", "nasarpur district tandoallayhar"), "tando allahyar"),
    (("tandoa",), "tando adam"),
    (("marghu",), "swabi"),
    (("tarb",), "haripur"),
    (("hub",), "karachi"),
    (("hayat", "univ", "canal bank road university"), "peshawar"),
    (("chun",), "chunian"),
    (("gulshan-e-ravi", "nishtar"), "lahore"),
    # The old conditions `"gulshan" and "aziz" in i` and `"gulzar" and "gulistan e johar" in i`
    # only ever tested the second substring, so only those substrings are kept here
    (("orangi", "malir", "aziz", "north n", "karachi district lasbeka", "karachi choki",
      "karachir", "karachi chow", "karachichuki", "gulistan e johar", "col", "soldier",
      "district c"), "karachi"),
    (("islamabd", "islamababd", "islam bad", "islambad"), "islamabad"),
    (("nows",), "nowshera"),
    (("abu", "mis", "north y"), ""),
    (("hunza",), "hunza"),
    (("gilgit",), "gilgit"),
    (("khaplu",), "khaplu"),
    (("mustafabad",), "depalpur"),
    (("usta",), "usta muhammad"),
    (("dera allah yar",), "dera allah yar"),
    (("district muza", "shahja"), "muzaffargarh"),
    (("skardu",), "skardu"),
    (("district shi",), "shikarpur"),
    (("gambat",), "gambat"),
    (("thana",), "malakand"),
    (("fatehp",), "fateh pur"),
    (("fatehj", "branch fateh jang", "fateh chowk"), "fateh jang"),
    (("dera murad", "nasira", "naseerabad temple dera"), "dera murad jamali"),
    (("daraban kalan", "south waziristan tehsil sarwakai village chagmalai"), "dera ismail khan"),
    (("mianw", "kalabagh", "kala bagh"), "mianwali"),
    (("nowhere kalan",), "nowshera"),
    (("jamber kalan",), "kasur"),
    (("kalabat",), "swabi"),
    (("samar bagh",), "dir"),
    (("momound post of inayat killi district bajaur",), "malakand"),
    (("charbagh",), "swat"),
    (("malaka",), "malakand"),
    (("malakw",), "malakwal"),
    (("islamab",), "islamabad"),
    (("abbo",), "abbottabad"),
    (("sheik",), "sheikhupura"),
    (("noor ",), "noorpur thal"),
    (("taunsa",), "taunsa sharif"),
    (("neu",), ""),
    (("uch ",), "uch sharif"),
    (("sehwa",), "sehwan"),
    (("sharaq",), "sheikhupura"),
    (("choa",), "chakwal"),
    (("phullan",), "muzaffargarh"),
    (("kahr", "kehr", "karor pakka"), "lodhran"),
    (("karor",), "karor lal esan"),
    (("d i", "dikh", "di kh"), "dera ismail khan"),
    (("haveli l",), "haveli lakha"),
    (("khan p",), "khanpur"),
    (("shika",), "shikarpur"),
    (("east",), "ahmedpur east"),
]


def normalize_city(city):
    """Return the target of the first rule whose key occurs in `city`."""
    if not isinstance(city, str):
        return city
    for keys, target in city_rules:
        if any(key in city for key in keys):
            return target
    return city


data["City"] = data["City"].apply(normalize_city)

## Machine Learning

### Importing Libraries

In [None]:
import sklearn
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import preprocessing

### Label Encoding
#### Encoding labels with sklearn preprocessing

In [None]:
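# One encoder per column, so each keeps its own code -> label mapping
# (a single reused LabelEncoder only remembers the last column it was fit on)
encoders = {col: preprocessing.LabelEncoder()
            for col in ["City", "Payment Method", "Book Name", "Order Status"]}

city = encoders["City"].fit_transform(data["City"])
payment = encoders["Payment Method"].fit_transform(data["Payment Method"])
book = encoders["Book Name"].fit_transform(data["Book Name"])
status = encoders["Order Status"].fit_transform(data["Order Status"])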
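`LabelEncoder` assigns codes in sorted order of the distinct values, which is why a list such as `["Cancelled", "Completed", "Returned"]` lines up with status codes 0, 1, 2, assuming the lowercased Order Status column holds exactly those three values. Refitting the same encoder on another column replaces its `classes_`, so earlier codes can no longer be decoded with it. A small demonstration on made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

status = ["completed", "returned", "cancelled", "completed"]

enc = LabelEncoder()
codes = enc.fit_transform(status)
status_classes = list(enc.classes_)          # distinct values, sorted
decoded = list(enc.inverse_transform([2]))   # code 2 back to its label

# Refitting on another column overwrites classes_
enc.fit(["karachi", "lahore"])
after_refit = list(enc.classes_)

print(status_classes, codes.tolist(), decoded, after_refit)
```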

#### Adding the encoded feature columns to the DataFrame

In [None]:
data['City_Code'] , data['PM_Code'], data['Book_Code'], data['Status_Code'] = city, payment, book, status

#### Assigning features (X) and target (Y), and splitting into training and test sets

In [None]:
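from sklearn.model_selection import train_test_split

X = list(zip(city, payment, book, data['Total items']))
Y = list(status)
# Fixed random_state for reproducible results; stratify keeps the share of
# cancelled, completed and returned orders the same in both splits
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1, stratify=Y)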
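If returned orders are a small minority, a purely random split can leave the test set with very few of them. Passing `stratify=` to `train_test_split` keeps each class's share the same in both parts. A check on made-up labels (80 completed, 10 cancelled, 10 returned), not the notebook's data:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 completed (1), 10 cancelled (0), 10 returned (2)
y = [1] * 80 + [0] * 10 + [2] * 10
X = [[i] for i in range(len(y))]

x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

# Class counts in the 20-row test set mirror the 80/10/10 split
print(sorted(Counter(y_te).items()))
```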

### Model Comparison
* Determining each model's accuracy on the test set with the model's score method
* Selecting the model for prediction based on that accuracy

In [None]:
model_KNN = KNeighborsClassifier(n_neighbors= int(math.sqrt(len(x_train))))
model_RF = RandomForestClassifier(n_estimators=900, min_samples_split=10,
                                  oob_score=True,random_state=1,n_jobs=-1)
model_LR = LogisticRegression(random_state=0, max_iter=1000)
model_GB = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,max_depth=2, random_state=1)


model_RF.fit(x_train, y_train)
acc_RF = model_RF.score(x_test, y_test)

model_KNN.fit(x_train, y_train)
acc_KNN = model_KNN.score(x_test, y_test)

model_LR.fit(x_train, y_train)
acc_LR= model_LR.score(x_test, y_test)

model_GB.fit(x_train, y_train)
acc_GB = model_GB.score(x_test, y_test)

ind = ['Random Forest', 'Gradient Boost', 'Logistic Regression', 'K Nearest Neighbors']
pd.DataFrame({'Models': ind, 'Accuracy':[round(acc_RF,2), 
                                         round(acc_GB,2), 
                                         round(acc_LR,2), 
                                         round(acc_KNN,2)]}).set_index('Models')
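Accuracy is hard to read when one order status dominates: a model that always predicts the most common status already scores that status's share of the test set. Comparing against `DummyClassifier(strategy="most_frequent")` shows how much of a score comes from learned signal. The sketch uses synthetic, pure-noise features with roughly 90% of labels in one class; its numbers say nothing about this notebook's results:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in data: 4 integer-coded features (pure noise), ~90% of labels are 1
X = rng.integers(0, 20, size=(1000, 4))
y = rng.choice([0, 1, 2], size=1000, p=[0.05, 0.9, 0.05])

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

# Baseline: always predict the most frequent training label
baseline = DummyClassifier(strategy="most_frequent").fit(x_train, y_train)
model = RandomForestClassifier(n_estimators=200, random_state=1).fit(x_train, y_train)

baseline_acc = baseline.score(x_test, y_test)
rf_acc = model.score(x_test, y_test)
print("majority-class baseline:", round(baseline_acc, 2))
print("random forest:", round(rf_acc, 2))
```

Even with features that carry no information, both numbers come out high here, so a high accuracy alone does not show the model has learned to spot returns.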

#### From the comparison above, Random Forest performs best, so we use it for our predictions

### Predicting on the test data and comparing predicted with actual results

In [None]:
predicted = model_RF.predict(x_test)
names = ["Cancelled", "Completed", "Returned"]  # status codes 0, 1, 2 (sorted by LabelEncoder)

# Code -> original value lookups, built once instead of filtering `data` for every test row
city_of = dict(zip(data['City_Code'], data['City']))
pm_of = dict(zip(data['PM_Code'], data['Payment Method']))
book_of = dict(zip(data['Book_Code'], data['Book Name']))

results = pd.DataFrame({
    'City': [city_of[row[0]] for row in x_test],
    'Payment Method': [pm_of[row[1]] for row in x_test],
    'Book Name': [book_of[row[2]] for row in x_test],
    'Total items': [row[3] for row in x_test],
    'Predicted': [names[p] for p in predicted],
    'Actual': [names[a] for a in y_test],
})
results.head(10)

In [None]:
data['City'] = data['City'].str.lower()
data['Payment Method'] = data['Payment Method'].str.lower()
data['Book Name'] = data['Book Name'].str.lower()
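A confusion matrix summarizes the predicted-versus-actual comparison above, and `classification_report` adds per-status precision and recall, which matters most for the rarer "Returned" class. For this notebook the call would be `confusion_matrix(y_test, predicted)`; the labels below are made up only to show the format:

```python
from sklearn.metrics import classification_report, confusion_matrix

names = ["Cancelled", "Completed", "Returned"]
# Made-up actual and predicted status codes, only to show the output format
y_true = [1, 1, 1, 1, 2, 2, 0, 1]
y_pred = [1, 1, 1, 1, 1, 2, 0, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)  # rows = actual status, columns = predicted status
print(classification_report(y_true, y_pred, labels=[0, 1, 2],
                            target_names=names, zero_division=0))
```

Here one of the two returned orders is predicted as completed, so recall for "Returned" is 0.5 even though overall accuracy is 7 out of 8.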

## Predicting the Probability

### Predicting the probability that a given order will be Returned, Completed or Cancelled

In [None]:
# Check the model on a sample order; each value must exist in the cleaned data
cty = 'karachi'.lower()
paym = 'jazzcash'.lower()  # Payment Method - (COD, BankTransfer, EasyPaisa, JazzCash)
bk = 'kaggle for begginers'.lower()
itm = 1  # number of items

# Look up the label-encoded code of each input value
ans_city = data.loc[data['City'] == cty, 'City_Code'].values[0]
ans_pm = data.loc[data['Payment Method'] == paym, 'PM_Code'].values[0]
ans_book = data.loc[data['Book Name'] == bk, 'Book_Code'].values[0]

fin = [(ans_city, ans_pm, ans_book, itm)]

# predict_proba columns follow model_RF.classes_: 0 = cancelled, 1 = completed, 2 = returned
proba = model_RF.predict_proba(fin)[0]

print(f'Based on given variables:\nCity: {cty.title()} , Payment Method: {paym.title()} , Book Name: {bk.title()} , Number of items: {itm}\n')

print(f'PROBABILITY OF ORDER RETURNED: {proba[2]:.0%}')
print(f'PROBABILITY OF ORDER COMPLETED: {proba[1]:.0%}')
print(f'PROBABILITY OF ORDER CANCELLED: {proba[0]:.0%}')
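Column j of `predict_proba` corresponds to `model.classes_[j]`, so the index used for each status must follow that order. Formatting with `:.0%` also avoids float artifacts: `round(0.58, 2) * 100` evaluates to 57.99999999999999. A toy model on made-up data illustrates both points:

```python
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training set: one feature, status codes 0/1/2
X = [[0], [0], [1], [1], [2], [2]]
y = [0, 0, 1, 1, 2, 2]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

proba = model.predict_proba([[1]])[0]
# Column j of predict_proba belongs to model.classes_[j]
for label, p in zip(["Cancelled", "Completed", "Returned"], proba):
    print(f"{label}: {p:.0%}")

print(model.classes_)
```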




### The task above was performed with a Random Forest model, which reached an accuracy of around 96%.
### The probabilities of an order being returned, completed or cancelled were also estimated.

### Feel free to suggest changes that could make the model more accurate.