# Introduction
This notebook is the starting point of our Foundation 2 project for ISB AMPBA Term 5. Here we clean the dataset and remove the columns that the problem statement does not require.

The original dataset was downloaded from [here](https://www.kaggle.com/himanshupoddar/zomato-bangalore-restaurants) and saved to a local drive. To reproduce this, change the path below to point to your downloaded copy. The dataset is not included in the repo because of its large size.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
import numpy as np
import re

In [2]:
zomato_data_path = '../datasets/Zomato Bangalore Restaurants Data/zomato_original.csv'
df = pd.read_csv(zomato_data_path)

In [3]:
df.sample(n=10)

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city)
23370,https://www.zomato.com/bangalore/vinaya-cafe-j...,"30, 33rd Main, 16th Cross, 6th Phase, JP Nagar...",Vinaya Cafe,Yes,No,3.7/5,55,080 41233313\r\n+91 7022281100,JP Nagar,Quick Bites,"Coffee, Masala Dosa","South Indian, Street Food, Juices",200,"[('Rated 5.0', 'RATED\n Very kind staff and o...","['Plain Dosa', 'Masala Dosa', 'Idli Vada', 'Pa...",Dine-out,JP Nagar
50670,https://www.zomato.com/bangalore/sai-charan-ti...,"32/4, Near D Mart,Varthur Main Road, Whitefiel...",Sai Charan Tiffin House,No,No,,0,+91 9535154001,Whitefield,Quick Bites,,South Indian,150,"[('Rated 5.0', 'RATED\n Hi this restaurant st...",[],Delivery,Whitefield
9243,https://www.zomato.com/bangalore/kolkata-famou...,"8, 24th Main, Opposite Royal School, Phase 5, ...",Kolkata Famous Kati Roll,Yes,No,3.5/5,11,+91 9620076440\r\n+91 7892737394,JP Nagar,Quick Bites,,"Rolls, Chinese",200,"[('Rated 4.0', ""RATED\n They make delicious r...",[],Delivery,BTM
23308,https://www.zomato.com/bangalore/cheesiano-piz...,"16, 17, 18/2 Sarakki Lake, 24th Main, 6th Phas...",Cheesiano Pizza,Yes,Yes,4.0/5,205,+91 8043761455\r\n+91 7678077768,JP Nagar,Quick Bites,"Pasta, Fries, Cheesiano Pizza, Veggie Pizza, C...","Pizza, Italian",700,"[('Rated 5.0', ""RATED\n The pizza's were amaz...","['French Fries', 'Cheese Garlic Bread', 'Chees...",Dine-out,JP Nagar
27333,https://www.zomato.com/bangalore/eatery-have-u...,"767-768, 16th Main Road, MCHS Colony, Stage 2,...",Eatery Have U Been,Yes,No,3.4/5,35,+91 9743524748,BTM,"Cafe, Quick Bites",,"Cafe, Chinese, Fast Food, Burger, North Indian",500,"[('Rated 5.0', 'RATED\n Good ambience. Very g...",[],Delivery,Koramangala 4th Block
16689,https://www.zomato.com/bangalore/wings-mama-hs...,"1086/A, Twin Tulips, Near BDA Complex, 18th Cr...",Wings Mama,Yes,No,,0,+91 8095066833,HSR,"Takeaway, Delivery",,"American, Fast Food",300,[],[],Delivery,HSR
18225,https://www.zomato.com/bangalore/kings-cafe-an...,"18, 17 F Cross, 2nd Stage, Indiranagar, Bangalore",King's Cafe And Restro,No,No,,0,+91 9916533331\r\n+91 9916533332,Indiranagar,Delivery,,"North Indian, Biryani",550,[],[],Delivery,Indiranagar
19802,https://www.zomato.com/bangalore/smoke-de-burg...,"Shop 50, Koramangala 5th Block, Bangalore",Smoke de Burger,Yes,No,3.6/5,36,+91 9591066373,Koramangala 5th Block,Quick Bites,,Fast Food,300,"[('Rated 4.0', ""RATED\n Visited this place on...","['Veg White Sauce Pasta', 'Chicken White Sauce...",Delivery,Jayanagar
28422,https://www.zomato.com/bangalore/the-old-fashi...,"470, 80 Feet Road, Koramangala 6th Block, Bang...",The Old Fashioned Bar,No,Yes,4.5 /5,569,080 49652711,Koramangala 6th Block,Bar,"Cocktails, Long Island Iced Tea, Draught Beer,...","Finger Food, North Indian, Continental",1000,"[('Rated 5.0', 'RATED\n Never been to a bette...",[],Dine-out,Koramangala 4th Block
50182,https://www.zomato.com/bangalore/cakezone-vart...,"710, Thubarahalli, Varthur Main Road, Whitefie...",CakeZone,Yes,No,3.9 /5,33,080 43334321,"Varthur Main Road, Whitefield","Takeaway, Delivery",,"Bakery, Desserts",200,"[('Rated 5.0', ""RATED\n They have very nice v...","['Chocolate Truffle Cake (500 Grams)', 'Black ...",Delivery,Whitefield


# Check distinct values of output variable

In [4]:
df.rate.value_counts().to_dict()

{'NEW': 2208,
 '3.9/5': 2098,
 '3.8/5': 2022,
 '3.7/5': 2011,
 '3.9 /5': 1874,
 '3.8 /5': 1851,
 '3.7 /5': 1810,
 '3.6/5': 1773,
 '4.0/5': 1609,
 '4.0 /5': 1574,
 '3.6 /5': 1543,
 '4.1 /5': 1474,
 '4.1/5': 1474,
 '3.5/5': 1431,
 '3.5 /5': 1353,
 '3.4/5': 1259,
 '3.4 /5': 1217,
 '3.3/5': 1168,
 '4.2 /5': 1165,
 '3.3 /5': 1142,
 '4.2/5': 1019,
 '3.2/5': 1006,
 '4.3 /5': 917,
 '3.2 /5': 867,
 '3.1/5': 862,
 '4.3/5': 776,
 '3.1 /5': 699,
 '4.4 /5': 628,
 '3.0/5': 558,
 '4.4/5': 519,
 '3.0 /5': 465,
 '2.9/5': 427,
 '4.5 /5': 409,
 '2.9 /5': 375,
 '2.8/5': 313,
 '2.8 /5': 287,
 '4.5/5': 247,
 '4.6 /5': 175,
 '2.7/5': 170,
 '2.6/5': 143,
 '2.7 /5': 137,
 '4.6/5': 125,
 '2.6 /5': 117,
 '4.7 /5': 86,
 '4.7/5': 81,
 '-': 69,
 '2.5 /5': 56,
 '2.5/5': 45,
 '4.8 /5': 43,
 '2.4/5': 40,
 '4.9 /5': 30,
 '2.4 /5': 30,
 '2.3/5': 28,
 '4.9/5': 25,
 '2.3 /5': 23,
 '4.8/5': 23,
 '2.2/5': 19,
 '2.1 /5': 13,
 '2.1/5': 11,
 '2.2 /5': 7,
 '2.0 /5': 7,
 '2.0/5': 4,
 '1.8 /5': 3,
 '1.8/5': 2}

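**Observation:** The ratings are stored as uncleaned text: the same score appears in two formats ('3.9/5' and '3.9 /5'), and non-numeric values such as 'NEW' and '-' are present. We have to clean them before modelling.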
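As a quick illustration of the clean-up this observation calls for, the sketch below normalizes a few rating strings with vectorized pandas operations. The sample values are hypothetical and only mimic the formats listed above; it is one possible approach, not necessarily the one used for the final dataset.

```python
import pandas as pd

# Hypothetical sample mirroring the formats seen in the value counts above
raw = pd.Series(['3.9/5', '3.9 /5', 'NEW', '-', None, '4.5 /5'])

# Keep the part before '/', trim whitespace, and coerce non-numeric text to NaN
numeric = pd.to_numeric(raw.str.split('/').str[0].str.strip(), errors='coerce')

print(numeric.tolist())
```

Both spellings of a score collapse to the same float, while 'NEW', '-' and missing values all become NaN.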

# Feature selection by problem statement
Based on our problem statement, we select the following features for modelling:

| Feature | Comments |
| :- | :- |
| url | Not used. The URL has nothing to do with ratings. |
| address | Not used. The raw address is not required, as we focus on location information. |
| name | Not used. The name has nothing to do with the rating. |
| online_order | **Required**. Online ordering is expected to play a significant role in ratings. |
| book_table | **Required**. The table booking feature may play a significant role in ratings. |
| rate | **Required**. Target variable. |
| votes | Not selected. Newly onboarded restaurants will have few or no votes (cold start problem). |
| phone | Not required. |
| location | May be required. The same information is also captured in **listed_in(city)**. |
| rest_type | **Required**. Restaurant type might have some effect. |
| dish_liked | Not required. Newly onboarded restaurants will not have this information. |
| cuisines | **Required**. |
| approx_cost(for two people) | **Required**. |
| reviews_list | Not required as of now (cold start problem). |
| menu_item | **Required**. Requires data cleaning. |
| listed_in(type) | **Required**. |
| listed_in(city) | **Required**. |

## Dropping the reviews list as it is large and not needed for our application

In [5]:
df = df.drop(columns=['reviews_list'])
df.sample(10)

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),menu_item,listed_in(type),listed_in(city)
14159,https://www.zomato.com/bangalore/amaravati-1-e...,"9, Near Infosys Gate 3, Electronic City, Banga...",Amaravati,No,No,,0,+91 9663059999\r\n080 41217666,Electronic City,Casual Dining,,"Andhra, North Indian, Chinese, Seafood",600,[],Dine-out,Electronic City
43885,https://www.zomato.com/bangalore/the-hub-ibis-...,"Ibis Hotel, Bengaluru City Centre, Plot 30, Ra...",The Hub - Ibis Hotel,No,No,3.5 /5,29,080 42548005\n+91 7022032592,Richmond Road,Bar,,"Finger Food, Fast Food, Continental",2200,[],Drinks & nightlife,MG Road
15031,https://www.zomato.com/bangalore/aqni-rt-nagar...,"1, 15th Cross, 2nd Block, Next to Patel's Inn,...",AQNI,Yes,No,3.0/5,9,+91 8095929928\r\n+91 9900918842,RT Nagar,Quick Bites,,"North Indian, Mughlai",500,"['Chicken Biryani', 'Memon Mutton Biryani', 'K...",Delivery,Frazer Town
16717,https://www.zomato.com/bangalore/masale-daan-h...,"14th Main Road Sector 4, HSR Layout, HSR, Bang...",Masale Daan,Yes,No,,0,+91 8043334333,HSR,"Takeaway, Delivery",,North Indian,350,[],Delivery,HSR
26489,https://www.zomato.com/bangalore/onesta-korama...,"562, 8th Main, Koramangala 4th Block, Bangalore",Onesta,Yes,Yes,4.4/5,9064,080 43723443\r\n080 43705665,Koramangala 4th Block,"Casual Dining, Cafe","Berryblast, Gourmet Pizza, Mocktails, Ravioli,...","Pizza, Cafe, Italian",600,"['Nutella Chocolate Mousse', 'French Fries', '...",Buffet,Koramangala 4th Block
17580,https://www.zomato.com/bangalore/the-ants-cafe...,"2286/B, 1st Cross, 14th A Main, HAL 2nd Stage,...",The Ants Cafe & Store,Yes,Yes,3.9/5,1380,+91 7204701157\r\n080 41715639,Indiranagar,Cafe,"English Breakfast, Coffee, Watermelon Feta Sal...","Cafe, Italian",900,[],Cafes,Indiranagar
9392,https://www.zomato.com/bangalore/dr-shawarma-h...,"Shop 1087/A, 14th Main, 18th Cross, 3rd Sector...",Dr. Shawarma,Yes,No,,0,+91 9632680729,HSR,Quick Bites,,"North Indian, Rolls, Street Food",200,[],Delivery,BTM
6999,https://www.zomato.com/bangalore/basaveshwara-...,"Nanjudeshwari complex, Oppsite Brookefield Mal...",Basaveshwara Khanavali,No,No,,0,+91 9740912864\r\n+91 7353747430,Brookefield,Quick Bites,,South Indian,200,[],Delivery,Brookefield
126,https://www.zomato.com/bangalore/banashankari-...,"17/1, Ramalh Garden, Kadieranahall Circle, Nea...",Banashankari Nati Style,No,No,,0,+91 9035141678\r\n+91 9742174293,Banashankari,Quick Bites,,"Biryani, Chinese, South Indian, North Indian",350,[],Delivery,Banashankari
42540,https://www.zomato.com/bangalore/taipan-1-fraz...,"57, Coles Road, Near ICICI Bank, Frazer Town, ...",Taipan,Yes,No,3.5 /5,129,+91 8041251279\n080 25467784,Frazer Town,Casual Dining,"Noodles, Dragon Chicken, Chop Suey","Chinese, Momos",700,[],Delivery,MG Road


# Column Renames

In [6]:
rename_dict = {'approx_cost(for two people)':'cost'
               ,'listed_in(type)':'listed_in_type'
               ,'listed_in(city)':'listed_in_city'
              }
df = df.rename(columns=rename_dict)
df.head()

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,menu_item,listed_in_type,listed_in_city
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1/5,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,[],Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1/5,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,[],Buffet,Banashankari
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8/5,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,[],Buffet,Banashankari
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7/5,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,[],Buffet,Banashankari
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8/5,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,[],Buffet,Banashankari


**Observation:**
1. cuisines, menu_item and rest_type hold ', '-separated values and require count vectorization
1. online_order, book_table, listed_in_city and listed_in_type require one-hot encoding

# Simple EDA

## Categorical columns

In [7]:
categorical_cols = ['online_order',
                    'book_table',
                    'location',
                    'rest_type',
                    'listed_in_type',
                    'listed_in_city'
                   ]
for col in categorical_cols:
    print('****{}****'.format(col))
    print(df[col].value_counts().to_dict())
    print('-'*5,'NaNs present','-'*5)
    print(df[col].isna().sum())

****online_order****
{'Yes': 30444, 'No': 21273}
----- NaNs present -----
0
****book_table****
{'No': 45268, 'Yes': 6449}
----- NaNs present -----
0
****location****
{'BTM': 5124, 'HSR': 2523, 'Koramangala 5th Block': 2504, 'JP Nagar': 2235, 'Whitefield': 2144, 'Indiranagar': 2083, 'Jayanagar': 1926, 'Marathahalli': 1846, 'Bannerghatta Road': 1630, 'Bellandur': 1286, 'Electronic City': 1258, 'Koramangala 1st Block': 1238, 'Brigade Road': 1218, 'Koramangala 7th Block': 1181, 'Koramangala 6th Block': 1156, 'Sarjapur Road': 1065, 'Ulsoor': 1023, 'Koramangala 4th Block': 1017, 'MG Road': 918, 'Banashankari': 906, 'Kalyan Nagar': 853, 'Richmond Road': 812, 'Frazer Town': 727, 'Malleshwaram': 725, 'Basavanagudi': 684, 'Residency Road': 675, 'Banaswadi': 664, 'Brookefield': 658, 'New BEL Road': 649, 'Kammanahalli': 648, 'Rajajinagar': 591, 'Church Street': 569, 'Lavelle Road': 529, 'Shanti Nagar': 511, 'Shivajinagar': 499, 'Domlur': 496, 'Cunningham Road': 491, 'Old Airport Road': 446, 'Ejipu

**Observation:**
1. Locations with very few restaurants can be grouped under "Others".
1. **rest_type** is not a one-to-one mapping: a restaurant can have several comma-separated types, which produces too many distinct categories. It needs to be encoded through count vectorization.
1. **listed_in_city** and **location** may be directly correlated. As per our understanding, **listed_in_city** seems to be a drill-down of **location**, so for an overall view we can go with **listed_in_city**. This needs to be confirmed with people who know Bangalore. In addition, the **location** field contains NaNs, so we will use **listed_in_city**, which has complete information.
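The first observation can be sketched as below. The toy values and the MIN_COUNT threshold are assumptions chosen for illustration; a suitable cut-off for the real data would have to be picked after inspecting the counts.

```python
import pandas as pd

# Hypothetical toy column; MIN_COUNT is an assumed threshold, not taken from the notebook
locations = pd.Series(['BTM', 'BTM', 'BTM', 'HSR', 'HSR', 'Peenya', 'Yelahanka', None])
MIN_COUNT = 2

counts = locations.value_counts()
frequent = counts[counts >= MIN_COUNT].index

# Keep frequent locations, relabel rare ones as "Others", and leave NaNs untouched
grouped = locations.where(locations.isin(frequent) | locations.isna(), 'Others')

print(grouped.value_counts().to_dict())
```

Leaving the NaNs as they are keeps the decision about missing locations separate from the decision about rare ones.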

## Preprocess the columns
The rating column is stored as strings, so we need to convert it to float. The cost and menu_item columns are cleaned in the same step.

In [8]:
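import ast

def preprocess_rating(rating):
    """Convert a rating such as '3.9 /5' to float; return None for 'NEW', '-' or NaN."""
    try:
        return float(rating.split('/')[0].strip())
    except (AttributeError, ValueError):
        return None

def preprocess_cost(cost):
    """Convert a cost string to float after removing thousands separators."""
    try:
        return float(re.sub(',', '', cost))
    except (TypeError, ValueError):
        return None

def preprocess_menu_item(menu_item: str):
    """Turn the stringified list of menu items into one ', '-separated string."""
    try:
        # ast.literal_eval parses the list literal without executing arbitrary code
        return ', '.join(ast.literal_eval(menu_item))
    except (ValueError, SyntaxError, TypeError):
        return None

df['rate'] = df['rate'].apply(preprocess_rating)
df['cost'] = df['cost'].apply(preprocess_cost)
df['menu_item'] = df['menu_item'].apply(preprocess_menu_item)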

**Observation:** Our custom column cleaners work as intended. The cleaned values replace the original string columns in place.

In [9]:
df.dtypes

url                object
address            object
name               object
online_order       object
book_table         object
rate              float64
votes               int64
phone              object
location           object
rest_type          object
dish_liked         object
cuisines           object
cost              float64
menu_item          object
listed_in_type     object
listed_in_city     object
dtype: object

# Segregate the unrated restaurants

In [10]:
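# .copy() gives independent frames, so adding columns later does not touch a view of df
unrated_rest = df[df.rate.isna()].copy()
df = df[df.rate.notna()].copy()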

In [11]:
display('Unrated restaurants:::',unrated_rest.sample(10))
display('Rated restaurants:::',df.sample(10))

'Unrated restaurants:::'

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,menu_item,listed_in_type,listed_in_city
11868,https://www.zomato.com/bangalore/momo-time-uls...,"31, Sree Yellamma Temple Street, near BIB, Uls...",Momo Time,Yes,No,,0,+91 9036011861,Ulsoor,"Takeaway, Delivery",,"Momos, Chinese",300.0,,Delivery,Church Street
40856,https://www.zomato.com/bangalore/chawlas-2-onl...,"150, 10th cross, Green Garden Layout, Sai Baba...",Chawla's 2 online.com,No,No,,0,+91 7065123000\n+91 9910009994,Marathahalli,Delivery,,"North Indian, Chinese",900.0,,Delivery,Marathahalli
20697,https://www.zomato.com/bangalore/ss-hot-dhum-b...,"No 24, 4th Main Balaji Nagar, DRCPost, Krishna...",SS Hot Dhum Briyani Point,No,No,,0,+91 7892293634\r\n+91 8553137892,BTM,Quick Bites,,"North Indian, Chinese, South Indian",200.0,,Delivery,Jayanagar
48515,https://www.zomato.com/bangalore/sri-sai-sagar...,"14/15, Manipal Center 47, Dickenson Road, MG R...",Sri Sai Sagar Food,No,No,,0,+91 9845815667,MG Road,Quick Bites,,"Street Food, Fast Food, North Indian",300.0,,Dine-out,Residency Road
4372,https://www.zomato.com/bangalore/ghar-ka-chulh...,"Number 1204, D block, 7th cross, AECS layout, ...",Ghar Ka Chulha,No,No,,0,+91 9716775607,Marathahalli,Delivery,,North Indian,150.0,,Delivery,Bellandur
22296,https://www.zomato.com/bangalore/nourich-jp-na...,"67, Ground Floor, 3rd E Main Road, 15th Cross ...",Nourich,Yes,Yes,,0,+91 9945675038\r\r\n+91 8970112299,JP Nagar,Casual Dining,"Soba Noodles, Pad Thai Noodle, Pickled Beetroo...","Continental, Chinese, North Indian, Asian, Hea...",800.0,"Murg Malai Kebab, Goan Fish Curry, Vegetarian ...",Delivery,JP Nagar
51382,https://www.zomato.com/bangalore/dhadoom-white...,"FB 10, 3rd Floor, VR Bangalore, Mahadevapura, ...",Dhadoom,No,No,,0,+91 7899255941,Whitefield,Food Court,,"Italian, Mexican, Burger, Fast Food",450.0,,Dine-out,Whitefield
31771,https://www.zomato.com/bangalore/momo-time-btm...,"39, 18th Main, 7th Cross, BTM 1st Stage, Oppos...",Momo Time,Yes,No,,0,+91 9036011861,BTM,Quick Bites,,"Momos, American",600.0,,Dine-out,Koramangala 5th Block
38180,https://www.zomato.com/bangalore/bib-breakfast...,"31, Sree Yellamma Temple Street, Ulsoor, Banga...",BIB-Breakfast In The Box,Yes,No,,0,+91 9886217725\n+91 9538948306,Ulsoor,Quick Bites,,"American, Continental",600.0,,Delivery,Lavelle Road
43647,https://www.zomato.com/bangalore/shalimar-chas...,"Bazar Street, Neelasandra",Shalimar Chaska,No,No,,0,+91 9535050784,Richmond Road,Quick Bites,,"Fast Food, Chinese",200.0,,Dine-out,MG Road


'Rated restaurants:::'

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,menu_item,listed_in_type,listed_in_city
34023,https://www.zomato.com/bangalore/eagles-pizza-...,"120, 80 Feet Road, Koramangala 7th Block, Bang...",Eagles Pizza,Yes,No,3.8,19,+91 8296052909,Koramangala 7th Block,Casual Dining,,"Pizza, Italian",400.0,,Dine-out,Koramangala 6th Block
25253,https://www.zomato.com/bangalore/d-zaprino-nag...,"2/1, Next To Maruti Medicals, Jayamma Complex,...",D-Zaprino,Yes,No,3.8,25,+91 9916957618,Nagawara,"Takeaway, Delivery",,"Fast Food, Burger, Pizza",350.0,"Zap Veggie Burger, Zap Chilli Cheese Melt Burg...",Delivery,Kammanahalli
4448,https://www.zomato.com/bangalore/smokin-bites-...,"First Floor, Shree Complex, Opposite Salarpuri...",Smokin Bites,Yes,No,3.4,20,+91 9538909155,Sarjapur Road,Quick Bites,,"North Indian, Chinese",200.0,,Delivery,Bellandur
45281,https://www.zomato.com/bangalore/linoui-the-le...,"The Collonnade, The Leela Palace, Old Airport ...",L'inoui - The Leela Palace,No,No,3.5,9,080 41284128,Old Airport Road,Dessert Parlor,,Desserts,400.0,,Delivery,Old Airport Road
43314,https://www.zomato.com/bangalore/the-rice-bowl...,"40/2, Lavelle Road, Bangalore",The Rice Bowl,Yes,Yes,4.3,509,080 49653104,Lavelle Road,"Casual Dining, Bar","Rice Bowl, Lamb, Chocolava, Rolls, Paneer Manc...","Chinese, Momos",1300.0,,Dine-out,MG Road
19737,https://www.zomato.com/bangalore/the-belgian-w...,"Shop 62, The High Street, 11th Main, 4th Block...",The Belgian Waffle Co.,Yes,No,3.8,151,00 08041010202,Jayanagar,Dessert Parlor,Waffles,"Desserts, Beverages",350.0,"Blueberry Cream Cheese Waffle, Butterscotch Cr...",Delivery,Jayanagar
36825,https://www.zomato.com/bangalore/wtf-koramanga...,"63, 5th Block, Jyoti Nivas College Road, Koram...",WTF,Yes,No,4.1,96,080 43758178,Koramangala 5th Block,Cafe,"Cheesy Pizza, Cheese Pasta, Brownie Milkshake,...","Cafe, Continental",600.0,"Margherita Pizza, BBQ Chicken Pizza, Chicken P...",Dine-out,Koramangala 7th Block
47622,https://www.zomato.com/bangalore/fat-buddha-cu...,"70, Near Accenture Building, Cunningham Road, ...",Fat Buddha,Yes,Yes,4.0,457,080 41235139\n080 22343771,Cunningham Road,Casual Dining,"Crab Soup, Noodles, Wine, Spicy Crabmeat Soup,...","Chinese, Thai",1000.0,,Delivery,Residency Road
5298,https://www.zomato.com/bangalore/ciclo-cafe-in...,"12th Main Road, Indiranagar, Bangalore",Ciclo Cafe,Yes,No,4.3,1283,00 08048664512\r\r\n+91 7829278585,Indiranagar,"Cafe, Casual Dining","Pizza, Cheesy Fries, Tiramisu, Ravioli Pasta, ...","Cafe, Italian, American",1000.0,"Home Made Fettucini Aglio Olio Pasta, Basil Ri...",Delivery,Brigade Road
2497,https://www.zomato.com/bangalore/onesta-basava...,"90/66, 1st Floor, Gandhi Bazar Main Road, Basa...",Onesta,Yes,Yes,4.6,1755,+91 8049525686,Basavanagudi,"Casual Dining, Cafe","Barbeque Chicken Pizza, Mushroom Ravioli, Past...","Pizza, Cafe, Italian",600.0,,Cafes,Basavanagudi


# Save the DF

## Save the training dataframe
Note: If you are saving the DataFrame(s) to GCS, use the command below to point to the service account key downloaded to your local machine. Otherwise, skip it and save the files to a local directory.

In [13]:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'C:\Users\sandipto.sanyal\OneDrive - Accenture\Documents\Study materials\self study\app_deployment\gcloud\sandipto-project-1763acaafc53.json'

In [14]:
df.head()

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,menu_item,listed_in_type,listed_in_city
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800.0,,Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800.0,,Buffet,Banashankari
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800.0,,Buffet,Banashankari
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300.0,,Buffet,Banashankari
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600.0,,Buffet,Banashankari


## Saving the training file.
This file will be used for model training

In [15]:
df.to_csv('gs://foundation_project2/training_folder/zomato.csv', index=False)
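For readers without GCS access, the sketch below shows one way to switch between the bucket and a local folder. The helper output_root and the toy DataFrame are hypothetical; writing to a gs:// path with pandas additionally assumes the optional gcsfs package is installed.

```python
import os
import tempfile

import pandas as pd

def output_root(gcs_root, local_root, env=os.environ):
    """Return the GCS root if a service account key is configured, else the local root."""
    if env.get('GOOGLE_APPLICATION_CREDENTIALS'):
        return gcs_root
    return local_root

# Hypothetical one-row frame standing in for the cleaned training data
toy = pd.DataFrame({'name': ['Jalsa'], 'rate': [4.1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # env={} forces the local branch here, so nothing is uploaded
    root = output_root('gs://foundation_project2/training_folder', tmp_dir, env={})
    path = '{}/{}'.format(root, 'zomato.csv')
    toy.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored)
```

Passing the environment as an argument keeps the choice explicit and makes the local branch easy to exercise.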

## Prediction files
Because prediction needs to be fast, we save the unrated restaurants as separate chunks, one per **listed_in_city** area.

### Group restaurants in Koramangala
Since Koramangala is divided into several blocks, we group them into a single area.

In [21]:
def get_area(listed_in_city:str):
    if 'Koramangala' in listed_in_city:
        return 'Koramangala'
    else:
        return listed_in_city
unrated_rest['area'] = unrated_rest.listed_in_city.apply(get_area)
unrated_rest.area.value_counts().to_dict()

{'Koramangala': 2085,
 'BTM': 669,
 'JP Nagar': 447,
 'Jayanagar': 443,
 'Whitefield': 406,
 'Electronic City': 401,
 'HSR': 401,
 'Bannerghatta Road': 400,
 'Brookefield': 368,
 'Marathahalli': 345,
 'Indiranagar': 305,
 'Kalyan Nagar': 303,
 'Church Street': 302,
 'Kammanahalli': 295,
 'Lavelle Road': 286,
 'Brigade Road': 285,
 'MG Road': 279,
 'Residency Road': 262,
 'Bellandur': 256,
 'Frazer Town': 228,
 'Sarjapur Road': 222,
 'Old Airport Road': 217,
 'Rajajinagar': 201,
 'Basavanagudi': 194,
 'New BEL Road': 168,
 'Malleshwaram': 149,
 'Banashankari': 135}

## Save the chunks

In [23]:
save_folder = 'gs://foundation_project2/prediction_folder'
for area in unrated_rest.area.unique():
    # filter the df
    chunk = unrated_rest[unrated_rest.area==area].drop(columns=['area'])
    filename = area.lower()+'.csv'
    filepath = '{}/{}'.format(save_folder,filename)
    chunk.to_csv(filepath, index=False)
    print('Saved to {}'.format(filepath))

Saved to gs://foundation_project2/prediction_folder/banashankari.csv
Saved to gs://foundation_project2/prediction_folder/bannerghatta road.csv
Saved to gs://foundation_project2/prediction_folder/basavanagudi.csv
Saved to gs://foundation_project2/prediction_folder/bellandur.csv
Saved to gs://foundation_project2/prediction_folder/brigade road.csv
Saved to gs://foundation_project2/prediction_folder/brookefield.csv
Saved to gs://foundation_project2/prediction_folder/btm.csv
Saved to gs://foundation_project2/prediction_folder/church street.csv
Saved to gs://foundation_project2/prediction_folder/electronic city.csv
Saved to gs://foundation_project2/prediction_folder/frazer town.csv
Saved to gs://foundation_project2/prediction_folder/hsr.csv
Saved to gs://foundation_project2/prediction_folder/indiranagar.csv
Saved to gs://foundation_project2/prediction_folder/jayanagar.csv
Saved to gs://foundation_project2/prediction_folder/jp nagar.csv
Saved to gs://foundation_project2/prediction_folder/kaly