# **Installing necessary packages for Web Scraping and Data Collection**

# **Business Recommender and Neighborhood Analyzer**

# **Objective**

The objective of this project is to analyse the neighbourhoods of the city of
Ahmedabad in India and to assess the quality of hospitals and schools in any particular
area. Using data science methodology and machine learning techniques such as
clustering, the project aims to answer the following business question:

**In the city of Ahmedabad, if a property developer is looking to open a new hospital or
school, where would you recommend that they open it?**

In this project, we implement an algorithm to find the best place to start a business: one where
demand is high and supply is low or absent. We measure the quality of a
recommendation in terms of average service rating and customer-to-business ratio.

In the end, the user should be able to enter a coordinate of their choice within the city of Ahmedabad, and we will predict a quality-of-service score for medical facilities at that location. We will also report which tier (label) the coordinate belongs to.

# **Steps**


*   Data collection: web scraping with Beautiful Soup, followed by the Google Places API
*   Data visualisation to check that we understand the data correctly and to surface interesting patterns in it
*   Unsupervised learning, specifically K-Means, to cluster our data points
*   The Haversine formula to calculate the distance between two coordinates
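
As a minimal sketch of the Haversine step listed above, the function below computes the great-circle distance between two latitude/longitude pairs. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for illustration; the notebook does not specify either.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Distance between two coordinates of the kind collected later in this notebook
distance = haversine_km(23.01885, 72.55441, 23.03717, 72.53085)
print(round(distance, 2), "km")
```

A function like this could be used to find the neighbourhood nearest to a user-supplied coordinate.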

# **Metrics**
We collect a number of features for medical care, which include:
1.	Number of hospitals & their mean rating
2.	Number of dentists & their mean rating
3.	Number of doctors & their mean rating
4.	Number of pharmacies & their mean rating
5.	Number of physiotherapists & their mean rating

We will use these features to formulate a scoring function that yields a final medical score for each neighbourhood. We will then apply K-means clustering to see how the data is grouped and labelled, which divides the neighbourhoods into tiers based on the quality of medical service.
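
To make the tiering idea concrete, here is a small plain-Python sketch of K-means (Lloyd's algorithm) on one-dimensional scores. It assumes, purely for illustration, that clustering is applied to the single medical score and that three tiers are wanted; the helper name `kmeans_1d` and the quantile-style initialisation are hypothetical choices, not the notebook's method, and a library implementation would normally be used instead.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Medical scores of the nine neighbourhoods visible in the output tables of this notebook
scores = [0.835933, 0.835933, 10.559109, 12.013735, 2.379737,
          6.637544, 4.312579, 10.676829, 10.928872]
labels, centroids = kmeans_1d(scores, k=3)
print(labels)
print(centroids)
```

Because the initial centroids are taken in sorted order, a higher label corresponds to a higher-scoring tier in this sample.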










In [None]:
!pip install python-google-places
!pip install lxml
!pip install beautifulsoup4
!pip install geocoder

Collecting python-google-places
  Downloading https://files.pythonhosted.org/packages/9e/b0/59646874502b356a163b4a772376409f203eff172898f181d5d07a825ad5/python-google-places-1.4.2.tar.gz
Building wheels for collected packages: python-google-places
  Building wheel for python-google-places (setup.py) ... done
  Created wheel for python-google-places: filename=python_google_places-1.4.2-cp36-none-any.whl size=13602 sha256=00c6f7e9f30a82d863c04939bb00fed2dae8a7b1b5f7603812e45064390b533e
  Stored in directory: /root/.cache/pip/wheels/bd/2b/1f/344a728fff2647c9658d9358dfb297064f4b1ca974a61fd30f
Successfully built python-google-places
Installing collected packages: python-google-places
Successfully installed python-google-places-1.4.2
Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)

In [None]:
from googleplaces import GooglePlaces, types, lang
import json
import math

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Finding neighborhoods in Ahmedabad using Web Scraping

We collect data on the city's neighbourhoods from the Wikipedia category page. `BeautifulSoup` is used to parse HTML and XML files. For further details, refer to the [documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
We find all the list-item (`<li>`) tags inside the category block of the `BeautifulSoup` object.

In [None]:
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad").text
# Parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')
# Create a list to store neighbourhood data
neighborhoodList = []
# Append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
  neighborhoodList.append(row.text)
# Create a new DataFrame from the list
kl_df = pd.DataFrame({"Neighborhood": neighborhoodList})
kl_df

Unnamed: 0,Neighborhood
0,Agol
1,Ahmedabad Cantonment
2,Alam Roza
3,Ambawadi
4,Amraiwadi
...,...
76,Usmanpura
77,Vastral
78,Vastrapur
79,Vejalpur


# Extract coordinates
The `geocoder` library (with its ArcGIS provider) is used to look up the latitude and longitude of each neighbourhood found above.

In [None]:
# Defining a function to get coordinates
import geocoder
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Ahmedabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in kl_df["Neighborhood"].tolist()]
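
The `while` loop above retries forever if the geocoding service never returns a result for a name. A bounded variant is sketched below under stated assumptions: the function name `get_latlng_bounded`, its `max_attempts` and `pause_seconds` parameters, and the idea of passing the lookup in as a callable are all hypothetical additions, chosen so that the retry logic can be exercised without network access.

```python
import time


def get_latlng_bounded(neighborhood, geocode, max_attempts=3, pause_seconds=1.0):
    """Return [lat, lng] for a neighbourhood, or None after max_attempts failures.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None.
    """
    query = '{}, Ahmedabad, India'.format(neighborhood)
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        # Pause before retrying, but not after the final attempt
        if attempt < max_attempts - 1:
            time.sleep(pause_seconds)
    return None


# Offline demonstration with a stand-in lookup that always fails
print(get_latlng_bounded("Vastrapur", lambda query: None, max_attempts=2, pause_seconds=0))
```

With the real library, the callable could be `lambda q: geocoder.arcgis(q).latlng`, which matches the call used in the cell above. Names that come back as `None` can then be inspected by hand instead of stalling the notebook.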

In [None]:
coords

[[23.027760000000058, 72.60027000000008],
 [23.027760000000058, 72.60027000000008],
 [23.002120000000048, 72.54979000000003],
 [23.018850000000043, 72.55441000000008],
 [23.00735000000003, 72.62268000000006],
 [23.011390000000063, 72.51712000000003],
 [23.04708000000005, 72.60481000000004],
 [23.04225742945364, 72.60456625728018],
 [22.84128000000004, 72.45453000000003],
 [23.027760000000058, 72.60027000000008],
 [23.034760000000063, 72.63024000000007],
 [22.85570000000007, 72.59490000000005],
 [23.00278000000003, 72.57706000000007],
 [22.315900000000056, 72.10697000000005],
 [23.002575410797863, 72.59815911107509],
 [23.159320000000037, 72.01855000000006],
 [23.030320000000074, 72.47247000000004],
 [23.000980000000027, 72.57459000000006],
 [22.806890000000067, 72.42511000000007],
 [23.112140000000068, 72.57989000000003],
 [23.087290000000053, 72.54899000000006],
 [23.027760000000058, 72.60027000000008],
 [23.036070000000052, 72.59213000000005],
 [23.32218000000006, 72.18817000000007],

In [None]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Merge the coordinates into the original dataframe
kl_df['Latitude'] = df_coords['Latitude']
kl_df['Longitude'] = df_coords['Longitude']
print(kl_df.shape)
kl_df.head(60)

(81, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agol,23.02776,72.60027
1,Ahmedabad Cantonment,23.02776,72.60027
2,Alam Roza,23.00212,72.54979
3,Ambawadi,23.01885,72.55441
4,Amraiwadi,23.00735,72.62268
5,Anand Nagar (Ahmedabad),23.01139,72.51712
6,Asarwa,23.04708,72.60481
7,Asarwa Chakla,23.042257,72.604566
8,Badarkha,22.84128,72.45453
9,Bahiyal,23.02776,72.60027


In [None]:
kl_df.to_csv('Neighbour_names.csv', index=False)

In [None]:
kl_df.dtypes

Neighborhood     object
Latitude        float64
Longitude       float64
dtype: object

# Places API setup
We will use the Google Places API to search for healthcare-related places within a radius of 1 km of each neighbourhood.

In [None]:
TYPE_HOSPITAL = 'hospital'
TYPE_DOCTOR = 'doctor'
TYPE_DENTIST = 'dentist'
TYPE_PHARMACY = 'pharmacy'
TYPE_PHYSIOTHERAPIST = 'physiotherapist'


# Similarly for food
# TYPE_BAKERY = 'bakery'
# TYPE_CAFE = 'cafe'
# TYPE_CONVENIENCE_STORE = 'convenience_store'
# TYPE_FOOD = 'food'
# TYPE_GROCERY_OR_SUPERMARKET = 'grocery_or_supermarket'
# TYPE_MEAL_DELIVERY = 'meal_delivery'
# TYPE_RESTAURANT = 'restaurant'

#For education
# TYPE_SCHOOL = 'school'
# TYPE_UNIVERSITY = 'university'
# TYPE_BOOK_STORE = 'book_store'
# TYPE_MUSEUM = 'museum'
# TYPE_LIBRARY = 'library'

Make a list of all the businesses/facilities related to healthcare.

In [None]:
types_list=[TYPE_HOSPITAL,TYPE_DOCTOR,TYPE_DENTIST,TYPE_PHARMACY,TYPE_PHYSIOTHERAPIST]

# types_list=[TYPE_BAKERY,TYPE_CAFE,TYPE_CONVENIENCE_STORE,TYPE_FOOD,TYPE_GROCERY_OR_SUPERMARKET,TYPE_MEAL_DELIVERY,TYPE_RESTAURANT]    (for food)
# types_list=[TYPE_SCHOOL,TYPE_UNIVERSITY,TYPE_BOOK_STORE,TYPE_MUSEUM,TYPE_LIBRARY]    (for education)

In [None]:
APIKEY = "Put your API KEY"

# Data Collection and Data Exploration

In [None]:
# Accumulates one entry per venue returned by the API; findPlaces appends to it
places_dict = {'Neighbourhood': [], 'Latitude': [], 'Longitude': [],
               'Venue_type': [], 'Venue_Name': [], 'Venue_Rating': []}

def findPlaces(loc, n, types, radius=1000, pagetoken=None):  # radius is in metres (1 km)
    """Fetch one page of Nearby Search results and return the next page token, if any."""
    lat, lng = loc
    url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={lat},{lng}&radius={radius}&type={type}&key={APIKEY}{pagetoken}".format(
        lat=lat, lng=lng, radius=radius, type=types, APIKEY=APIKEY,
        pagetoken="&pagetoken=" + pagetoken if pagetoken else "")
    print(url)  # note: this URL contains the API key; clear the output before sharing
    response = requests.get(url)
    res = response.json()
    print(res)
    # "results" can be missing from error responses, so default to an empty list
    for result in res.get("results", []):
        places_dict['Latitude'].append(lat)
        places_dict['Longitude'].append(lng)
        places_dict['Neighbourhood'].append(n)
        places_dict['Venue_type'].append(types)
        places_dict['Venue_Name'].append(result["name"])
        # Venues without a "rating" field are recorded as NaN
        places_dict['Venue_Rating'].append(result.get("rating", math.nan))

    pagetoken = res.get("next_page_token", None)
    print("here -->> ", pagetoken)
    return pagetoken
   


In the next cell, we explore the surroundings of every neighbourhood within a radius of 1 km for each venue type that contributes to the medical facility features, that is,
[TYPE_HOSPITAL, TYPE_DOCTOR, TYPE_DENTIST, TYPE_PHARMACY, TYPE_PHYSIOTHERAPIST]

In [None]:
import time

# This run covers rows 30-80; rows 0-29 are assumed to have been collected in an earlier run
for ind in range(30, 81):
    # The request URL is built from strings, so convert the coordinates up front
    location = (str(kl_df['Latitude'][ind]), str(kl_df['Longitude'][ind]))
    neighbourhood1 = kl_df['Neighborhood'][ind]
    for types_name in types_list:
        pagetoken = None
        while True:
            pagetoken = findPlaces(loc=location, n=neighbourhood1, types=types_name, pagetoken=pagetoken)
            # A next_page_token only becomes valid after a short delay, so pause between requests
            time.sleep(5)
            if not pagetoken:
                break

https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=23.01597000000004,72.61082000000005&radius=1000&type=hospital&key=<REDACTED>
{'html_attributions': [], 'next_page_token': 'CqQCHQEAAJNXeKxtkBIc1H-cg7mqXRV2Km4aw1nLfsk9gogspIHMZMB29pFhs5niy4l_I5kZeu3UdNbPUSjJ1DfCSkI02FvGvcoja2XtrDWKJwHsP9O8RZzJsz5Fab-cUA3T7_uzOkwglHbC-peiNmyn7uYF0FNErsQ-645ZsisrNP3m5Ooj4hhzPtYYtSL0bdedAA6_DPNcBTT2DdCVrMk_vWlna8TdHmqoCwun4r5L_41gCTYk9kwiB1btoLID4fTnc18jRxVCYzdkyVDe0x_YV4TeW5XroH1x6sEVb8lp_kZvss9Iz62oiaH1BAyXtMVLGPDy2_76AZPdPQiFuUpsr--ZxH0Uo3CTtRx-fZEK6e_YT0m7svxZT6r3i31nexI8CHWQ2RIQQ5eosa2yHZevhqxR2F5QTRoUyPQE4EpsN22rjU2atMnxezOxtlY', 'results': [{'business_status': 'OPERATIONAL', 'geometry': {'location': {'lat': 23.0194021, 'lng': 72.6106701}, 'viewport': {'northeast': {'lat': 23.02076548029151, 'lng': 72.6120577302915}, 'southwest': {'lat': 23.0180675197085, 'lng': 72.60935976970848}}}, 'icon': 'https://maps.gstatic.com/mapfiles/place_api/icons/v1/png_71/
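
The response printed above carries a `next_page_token`, which is how Nearby Search results are paged. The sketch below isolates that paging logic from the HTTP call. The helper name `collect_all_pages` and the three fake pages are made up for illustration, so the example runs offline; only the `results`, `name`, `rating`, and `next_page_token` keys come from the response shown above.

```python
import math


def collect_all_pages(fetch_page):
    """Follow next_page_token links until the API stops returning one."""
    results, token = [], None
    while True:
        page = fetch_page(token)
        results.extend(page.get("results", []))
        token = page.get("next_page_token")
        if not token:
            return results


# Stand-in for the real HTTP call: three fake pages keyed by page token
fake_pages = {
    None: {"results": [{"name": "A", "rating": 4.0}], "next_page_token": "p2"},
    "p2": {"results": [{"name": "B"}], "next_page_token": "p3"},
    "p3": {"results": [{"name": "C", "rating": 4.5}]},
}
venues = collect_all_pages(lambda token: fake_pages[token])
# Venues without a rating are recorded as NaN, as in findPlaces
ratings = [venue.get("rating", math.nan) for venue in venues]
print([venue["name"] for venue in venues], ratings)
```

Nearby Search returns at most 20 results per page and at most 60 in total, which is consistent with the per-type counts in the feature table later in this notebook never exceeding 60.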

In [None]:
df=pd.DataFrame.from_dict(places_dict)
df

Unnamed: 0,Neighbourhood,Latitude,Longitude,Venue_type,Venue_Name,Venue_Rating
0,Agol,23.027760000000058,72.60027000000008,hospital,Victoria Jubilee Hospital,4.0
1,Agol,23.027760000000058,72.60027000000008,hospital,Al Ameen Hospital|,
2,Agol,23.027760000000058,72.60027000000008,hospital,Dr.Tanumati Shah Hospital,
3,Agol,23.027760000000058,72.60027000000008,hospital,Lokhandwala General Hospital,4.2
4,Agol,23.027760000000058,72.60027000000008,hospital,shreeShreeji Pathology Laboratory,
...,...,...,...,...,...,...
11088,Vejalpur,23.007820000000038,72.51818000000003,physiotherapist,Dr Binal Shah Desai/Happy Healing Physio Clinic,5.0
11089,Vejalpur,23.007820000000038,72.51818000000003,physiotherapist,Dr.Nirali`s PhysioRehab,
11090,Virochannagar,23.093770000000063,72.22700000000003,hospital,SC Virochannagar,
11091,Virochannagar,23.093770000000063,72.22700000000003,hospital,phc virochannagar,


In [None]:
df.to_csv('medical_neighbourhood_csv.csv',index=False)

In [None]:
df.dtypes

Neighbourhood     object
Latitude          object
Longitude         object
Venue_type        object
Venue_Name        object
Venue_Rating     float64
dtype: object

In [None]:
feature_dict={'Neighbourhood':[],'Latitude':[],'Longitude':[],'Hospital_Count':[],'Mean_hospital_rating':[],'Doctor_Count':[],
               'Mean_doctor_rating':[],'Dentist_Count':[],'Mean_dentist_rating':[],'Pharmacy_Count':[],'Mean_pharmacy_rating':[],'Physiotherapist_Count':[],'Mean_physiotherapist_rating':[]}

# Data Preprocessing
We group the DataFrame by neighbourhood name and venue type to calculate the number of hospitals, doctors, and so on, along with their mean ratings. Venues without a rating are left out of the mean, which gives the same result as filling their NaN values with the group's mean rating.

In [None]:
gk = df.groupby(['Neighbourhood', 'Venue_type'])
types_list = [TYPE_HOSPITAL, TYPE_DOCTOR, TYPE_DENTIST, TYPE_PHARMACY, TYPE_PHYSIOTHERAPIST]

# Maps each venue type to its (count, mean rating) columns in feature_dict
feature_columns = {
    TYPE_HOSPITAL: ('Hospital_Count', 'Mean_hospital_rating'),
    TYPE_DOCTOR: ('Doctor_Count', 'Mean_doctor_rating'),
    TYPE_DENTIST: ('Dentist_Count', 'Mean_dentist_rating'),
    TYPE_PHARMACY: ('Pharmacy_Count', 'Mean_pharmacy_rating'),
    TYPE_PHYSIOTHERAPIST: ('Physiotherapist_Count', 'Mean_physiotherapist_rating'),
}

for ind in range(len(kl_df)):
    neighbour_name = kl_df['Neighborhood'][ind]
    feature_dict['Neighbourhood'].append(neighbour_name)
    feature_dict['Latitude'].append(kl_df['Latitude'][ind])
    feature_dict['Longitude'].append(kl_df['Longitude'][ind])

    for type_name in types_list:
        # Defaults used when no venue of this type was found in the neighbourhood
        count, mean = 0, 0.0
        try:
            grouped_df = gk.get_group((neighbour_name, type_name))
            count = len(grouped_df.index)
            # Series.mean() skips NaN, so unrated venues do not affect the mean;
            # the result is NaN only when no venue in the group has a rating
            mean = grouped_df['Venue_Rating'].mean()
        except KeyError:
            print("Not found")
        count_col, mean_col = feature_columns[type_name]
        feature_dict[count_col].append(count)
        feature_dict[mean_col].append(mean)



Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found
Not found


In [None]:
feature_df=pd.DataFrame.from_dict(feature_dict)
feature_df


Unnamed: 0,Neighbourhood,Latitude,Longitude,Hospital_Count,Mean_hospital_rating,Doctor_Count,Mean_doctor_rating,Dentist_Count,Mean_dentist_rating,Pharmacy_Count,Mean_pharmacy_rating,Physiotherapist_Count,Mean_physiotherapist_rating
0,Agol,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000
1,Ahmedabad Cantonment,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000
2,Alam Roza,23.00212,72.54979,60,4.489189,60,4.169444,32,4.942857,60,4.351613,10,4.760000
3,Ambawadi,23.01885,72.55441,60,4.578571,60,4.369767,56,4.493548,60,4.405714,10,4.542857
4,Amraiwadi,23.00735,72.62268,47,4.144444,27,4.607692,16,4.655556,40,3.968182,3,4.950000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,Usmanpura,23.04981,72.57120,60,4.407500,60,4.532353,27,4.905000,33,4.218750,2,4.850000
77,Vastral,23.00238,72.65865,60,4.153488,17,4.516667,26,4.633333,35,4.303704,8,4.883333
78,Vastrapur,23.03717,72.53085,60,4.426316,60,4.639394,45,4.744118,54,4.142105,9,4.966667
79,Vejalpur,23.00782,72.51818,60,4.274419,60,4.502703,54,4.723529,55,4.255263,8,4.940000


In [None]:
feature_df.to_csv('Medical_final.csv',index=False)

In [None]:
feature_df=pd.read_csv('Medical_final.csv')

# Data Preprocessing
This step replaces the remaining NaN values. A NaN mean rating here means that none of the hospitals (or doctors, and so on) in that neighbourhood had a rating; had any of them been rated, the mean would have been computed from those ratings.
We therefore replace NaN with 0, on the assumption that an area is poorly served if not a single one of its hospitals has a rating.

In [None]:
feature_df['Mean_hospital_rating']=feature_df['Mean_hospital_rating'].fillna(0)
feature_df

Unnamed: 0,Neighbourhood,Latitude,Longitude,Hospital_Count,Mean_hospital_rating,Doctor_Count,Mean_doctor_rating,Dentist_Count,Mean_dentist_rating,Pharmacy_Count,Mean_pharmacy_rating,Physiotherapist_Count,Mean_physiotherapist_rating
0,Agol,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000
1,Ahmedabad Cantonment,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000
2,Alam Roza,23.00212,72.54979,60,4.489189,60,4.169444,32,4.942857,60,4.351613,10,4.760000
3,Ambawadi,23.01885,72.55441,60,4.578571,60,4.369767,56,4.493548,60,4.405714,10,4.542857
4,Amraiwadi,23.00735,72.62268,47,4.144444,27,4.607692,16,4.655556,40,3.968182,3,4.950000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,Usmanpura,23.04981,72.57120,60,4.407500,60,4.532353,27,4.905000,33,4.218750,2,4.850000
77,Vastral,23.00238,72.65865,60,4.153488,17,4.516667,26,4.633333,35,4.303704,8,4.883333
78,Vastrapur,23.03717,72.53085,60,4.426316,60,4.639394,45,4.744118,54,4.142105,9,4.966667
79,Vejalpur,23.00782,72.51818,60,4.274419,60,4.502703,54,4.723529,55,4.255263,8,4.940000


In [None]:
for col in feature_df.columns:
  feature_df[col]=feature_df[col].fillna(0)
feature_df

Unnamed: 0,Neighbourhood,Latitude,Longitude,Hospital_Count,Mean_hospital_rating,Doctor_Count,Mean_doctor_rating,Dentist_Count,Mean_dentist_rating,Pharmacy_Count,Mean_pharmacy_rating,Physiotherapist_Count,Mean_physiotherapist_rating
0,Agol,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000
1,Ahmedabad Cantonment,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000
2,Alam Roza,23.00212,72.54979,60,4.489189,60,4.169444,32,4.942857,60,4.351613,10,4.760000
3,Ambawadi,23.01885,72.55441,60,4.578571,60,4.369767,56,4.493548,60,4.405714,10,4.542857
4,Amraiwadi,23.00735,72.62268,47,4.144444,27,4.607692,16,4.655556,40,3.968182,3,4.950000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,Usmanpura,23.04981,72.57120,60,4.407500,60,4.532353,27,4.905000,33,4.218750,2,4.850000
77,Vastral,23.00238,72.65865,60,4.153488,17,4.516667,26,4.633333,35,4.303704,8,4.883333
78,Vastrapur,23.03717,72.53085,60,4.426316,60,4.639394,45,4.744118,54,4.142105,9,4.966667
79,Vejalpur,23.00782,72.51818,60,4.274419,60,4.502703,54,4.723529,55,4.255263,8,4.940000


In [None]:
feature_df.isnull().sum().sum()

0

In [None]:
colnames_numerics_only = feature_df.select_dtypes(include=np.number).columns.tolist()
colnames_numerics_only=colnames_numerics_only[2:]
colnames_numerics_only

['Hospital_Count',
 'Mean_hospital_rating',
 'Doctor_Count',
 'Mean_doctor_rating',
 'Dentist_Count',
 'Mean_dentist_rating',
 'Pharmacy_Count',
 'Mean_pharmacy_rating',
 'Physiotherapist_Count',
 'Mean_physiotherapist_rating']

# Calculating Z-Score for every column
Calculating z-scores is important if we want to rank the neighbourhoods later on. Standardising each column puts counts and ratings on a common scale so that they can be combined in a scoring function.
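
As a small standalone illustration of the standardisation, the sketch below computes z-scores for a handful of values using the population standard deviation, which corresponds to `std(ddof=0)` in pandas. The helper name `zscores` is hypothetical, and the four sample values are the hospital counts of Agol, Alam Roza, Ambawadi, and Amraiwadi; the results differ from the notebook's z-scores because those are computed over all neighbourhoods.

```python
import statistics


def zscores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std, like pandas std(ddof=0)
    if std == 0:
        # A constant column carries no ranking information
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


sample_counts = [33, 60, 60, 47]
sample_z = zscores(sample_counts)
print([round(z, 3) for z in sample_z])
```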

In [None]:
for col in colnames_numerics_only:
  col_zscore= col+'_zscore'
  feature_df[col_zscore]=(feature_df[col] - feature_df[col].mean())/feature_df[col].std(ddof=0)
feature_df

Unnamed: 0,Neighbourhood,Latitude,Longitude,Hospital_Count,Mean_hospital_rating,Doctor_Count,Mean_doctor_rating,Dentist_Count,Mean_dentist_rating,Pharmacy_Count,Mean_pharmacy_rating,Physiotherapist_Count,Mean_physiotherapist_rating,Hospital_Count_zscore,Mean_hospital_rating_zscore,Doctor_Count_zscore,Mean_doctor_rating_zscore,Dentist_Count_zscore,Mean_dentist_rating_zscore,Pharmacy_Count_zscore,Mean_pharmacy_rating_zscore,Physiotherapist_Count_zscore,Mean_physiotherapist_rating_zscore
0,Agol,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000,-0.465214,0.465571,0.386518,0.479923,-0.385951,0.514842,0.150329,0.431355,-0.865566,-1.167511
1,Ahmedabad Cantonment,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000,-0.465214,0.465571,0.386518,0.479923,-0.385951,0.514842,0.150329,0.431355,-0.865566,-1.167511
2,Alam Roza,23.00212,72.54979,60,4.489189,60,4.169444,32,4.942857,60,4.351613,10,4.760000,0.793973,0.475391,1.055491,0.113520,0.780543,0.592700,1.169094,0.375053,1.905613,0.857417
3,Ambawadi,23.01885,72.55441,60,4.578571,60,4.369767,56,4.493548,60,4.405714,10,4.542857,0.793973,0.537254,1.055491,0.248679,2.180335,0.324341,1.169094,0.419058,1.905613,0.765043
4,Amraiwadi,23.00735,72.62268,47,4.144444,27,4.607692,16,4.655556,40,3.968182,3,4.950000,0.187698,0.236788,-0.416250,0.409208,-0.152652,0.421103,0.198842,0.063179,-0.034212,0.938244
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,Usmanpura,23.04981,72.57120,60,4.407500,60,4.532353,27,4.905000,33,4.218750,2,4.850000,0.793973,0.418853,1.055491,0.358376,0.488919,0.570089,-0.140746,0.266985,-0.311330,0.895704
77,Vastral,23.00238,72.65865,60,4.153488,17,4.516667,26,4.633333,35,4.303704,8,4.883333,0.793973,0.243047,-0.862232,0.347793,0.430595,0.407831,-0.043721,0.336085,1.351377,0.909884
78,Vastrapur,23.03717,72.53085,60,4.426316,60,4.639394,45,4.744118,54,4.142105,9,4.966667,0.793973,0.431875,1.055491,0.430598,1.538764,0.473999,0.878018,0.204644,1.628495,0.945334
79,Vejalpur,23.00782,72.51818,60,4.274419,60,4.502703,54,4.723529,55,4.255263,8,4.940000,0.793973,0.326745,1.055491,0.338371,2.063686,0.461702,0.926531,0.296684,1.351377,0.933990


We declare weights for the healthcare features in column order, starting with the number of hospitals, the mean hospital rating, and so on.

In [None]:
weights_list=[2.5,2.0,1.5,1.5,1.0,1.0,2.0,1.5,0.5,0.75]
medical_score_list=[]

Now, let's find the overall healthcare (medical) score for each neighbourhood.


In [None]:
for ind in feature_df.index:
  score=0.0
  for i in range(len(weights_list)):
    weight=weights_list[i]
    col=colnames_numerics_only[i]
    col_zscore= col+'_zscore'
    # print(col_zscore)
    # print(weight)
    col_score=feature_df[col_zscore][ind]*weight
    score=score+col_score
  medical_score_list.append(score)
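
As a worked check of the scoring loop, the sketch below recomputes the score for a single neighbourhood as a weighted sum of its z-scores. The z-score values are copied from the Agol row of the z-score table above, and the weights are those declared in `weights_list`; the variable names `agol_zscores` and `agol_score` are introduced here only for illustration.

```python
weights = [2.5, 2.0, 1.5, 1.5, 1.0, 1.0, 2.0, 1.5, 0.5, 0.75]

# Z-scores for Agol, in the same column order as the weights
agol_zscores = [-0.465214, 0.465571, 0.386518, 0.479923, -0.385951,
                0.514842, 0.150329, 0.431355, -0.865566, -1.167511]

# Medical score = sum over features of (weight * z-score)
agol_score = sum(w * z for w, z in zip(weights, agol_zscores))
print(round(agol_score, 4))
```

The result agrees with the `Medical_score` reported for Agol (0.835933) up to the rounding of the displayed z-scores.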

In [None]:
feature_df['Medical_score']=medical_score_list
feature_df

Unnamed: 0,Neighbourhood,Latitude,Longitude,Hospital_Count,Mean_hospital_rating,Doctor_Count,Mean_doctor_rating,Dentist_Count,Mean_dentist_rating,Pharmacy_Count,Mean_pharmacy_rating,Physiotherapist_Count,Mean_physiotherapist_rating,Hospital_Count_zscore,Mean_hospital_rating_zscore,Doctor_Count_zscore,Mean_doctor_rating_zscore,Dentist_Count_zscore,Mean_dentist_rating_zscore,Pharmacy_Count_zscore,Mean_pharmacy_rating_zscore,Physiotherapist_Count_zscore,Mean_physiotherapist_rating_zscore,Medical_score
0,Agol,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000,-0.465214,0.465571,0.386518,0.479923,-0.385951,0.514842,0.150329,0.431355,-0.865566,-1.167511,0.835933
1,Ahmedabad Cantonment,23.02776,72.60027,33,4.475000,45,4.712500,12,4.812500,39,4.420833,0,0.000000,-0.465214,0.465571,0.386518,0.479923,-0.385951,0.514842,0.150329,0.431355,-0.865566,-1.167511,0.835933
2,Alam Roza,23.00212,72.54979,60,4.489189,60,4.169444,32,4.942857,60,4.351613,10,4.760000,0.793973,0.475391,1.055491,0.113520,0.780543,0.592700,1.169094,0.375053,1.905613,0.857417,10.559109
3,Ambawadi,23.01885,72.55441,60,4.578571,60,4.369767,56,4.493548,60,4.405714,10,4.542857,0.793973,0.537254,1.055491,0.248679,2.180335,0.324341,1.169094,0.419058,1.905613,0.765043,12.013735
4,Amraiwadi,23.00735,72.62268,47,4.144444,27,4.607692,16,4.655556,40,3.968182,3,4.950000,0.187698,0.236788,-0.416250,0.409208,-0.152652,0.421103,0.198842,0.063179,-0.034212,0.938244,2.379737
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,Usmanpura,23.04981,72.57120,60,4.407500,60,4.532353,27,4.905000,33,4.218750,2,4.850000,0.793973,0.418853,1.055491,0.358376,0.488919,0.570089,-0.140746,0.266985,-0.311330,0.895704,6.637544
77,Vastral,23.00238,72.65865,60,4.153488,17,4.516667,26,4.633333,35,4.303704,8,4.883333,0.793973,0.243047,-0.862232,0.347793,0.430595,0.407831,-0.043721,0.336085,1.351377,0.909884,4.312579
78,Vastrapur,23.03717,72.53085,60,4.426316,60,4.639394,45,4.744118,54,4.142105,9,4.966667,0.793973,0.431875,1.055491,0.430598,1.538764,0.473999,0.878018,0.204644,1.628495,0.945334,10.676829
79,Vejalpur,23.00782,72.51818,60,4.274419,60,4.502703,54,4.723529,55,4.255263,8,4.940000,0.793973,0.326745,1.055491,0.338371,2.063686,0.461702,0.926531,0.296684,1.351377,0.933990,10.928872


In [None]:
feature_df.to_csv('scored_medical_final.csv', index=False)

The cell below writes a requirements.txt file so that readers can reproduce the environment.

In [None]:
!pip freeze > requirements.txt

This notebook covered data collection, data exploration, and data pre-processing. The next notebook will cover the visualisations.