# Capstone Project
### Applied Data Science Capstone by IBM Coursera

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction

An example of fetching Canadian postal code information, similar to the course assignments.


The purpose of this project is to help people explore the facilities around their neighbourhood.
It will help them make smart and efficient decisions when choosing a neighbourhood from among the many neighbourhoods in Quebec.

Many people are migrating to the various provinces of Canada and need a great deal of research to find good housing prices and reputable schools for their children. This project is for people who are looking for better neighbourhoods with easy access to cafes, schools, supermarkets, medical shops, grocery shops, malls, theatres, hospitals, like-minded people, and so on.

This project aims to create a comparative analysis of neighbourhood features so that people migrating to Quebec can search for the best neighbourhood. The features include median housing price, school ratings, crime rates in the area, road connectivity, weather conditions, emergency management, water resources (both fresh water and waste water, including sewage), and recreational facilities.

It will help people learn about an area and its neighbourhoods before moving to a new city, state, country or place for work or to start a new life.

# Data

# Problems to solve

The major purpose of this project is to suggest a better neighbourhood in a new city for people who are moving there. The criteria include social presence, in terms of like-minded people in the community, and connectivity to the airport, bus stand, city centre, markets and other daily necessities nearby.

# Location

Quebec is a popular destination for new immigrants to Canada. It is the largest province by area, and much of its population lives in urban areas along the Saint Lawrence River, between the most populous city, Montreal, and the capital, Quebec City. Quebec is also the home of the Québécois, recognized as a nation by both the provincial and federal governments. Although immigration has become a hot topic over the past few years, with more governments seeking restrictions on immigrants and refugees, immigration into Canada has generally been on the rise.

# Methodology

# Foursquare API

This project uses the Foursquare API as its primary data source, as it has a database of millions of places. In particular, its Places API provides the ability to perform location search, location sharing and retrieval of details about a business.

# Work Flow

Using Foursquare API credentials, features of nearby places in each neighbourhood would be mined. Because of HTTP request limitations, the number of places per neighbourhood is capped at 100 and the search radius is set to 500 metres.
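
As a minimal sketch of how such a request could be assembled, the snippet below builds a query URL with the limit of 100 and radius of 500 described above. It assumes the legacy Foursquare v2 `venues/explore` endpoint used in the course material; the credential placeholders, the version date and the helper name `build_explore_url` are hypothetical and not part of the original notebook. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute real Foursquare credentials.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # assumed API version date

LIMIT = 100   # places requested per neighbourhood
RADIUS = 500  # search radius around the neighbourhood centre

# Assumed endpoint: the legacy v2 "explore" route.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius=RADIUS, limit=LIMIT):
    """Build the request URL for places near one neighbourhood centre."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


# Coordinates printed for 'Quebec, QC' later in the notebook.
url = build_explore_url(46.8259601, -71.2352226)
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()`. Keeping URL construction separate from the network call makes the parameters easy to inspect before any of the request quota is spent.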

# Libraries Which Are Used to Develop the Project:

Pandas: For creating and manipulating data frames.

Folium: Python visualization library used to show the distribution of neighbourhood clusters on an interactive Leaflet map.

JSON: Library to handle JSON files.

Geocoder: To retrieve Location Data.

Beautiful Soup and Requests: To scrape web pages and to handle HTTP requests.

Matplotlib: Python Plotting Module.

In [1]:
pip install bs4 html5lib

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import numpy as np
import requests

from bs4 import BeautifulSoup


source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H').text

soup = BeautifulSoup(source, 'html5lib')

postal_codes_dict = {} # initialize an empty dictionary to save the data in
for table_cell in soup.find_all('td'):
    try:
        postal_code = table_cell.p.b.text # get the postal code
        neighborhoods_data = table_cell.span.text # get the rest of the data in the cell
    except AttributeError:
        # the cell has no <p><b> or <span> child, so it is not a postal code cell
        continue

    # if the cell is not assigned then ignore it
    if neighborhoods_data == 'Not assigned':
        continue

    borough = neighborhoods_data.split('(')[0].strip() # get the borough in the cell

    if '(' in neighborhoods_data:
        neighborhoods = neighborhoods_data.split('(')[1]

        # remove parentheses from neighborhoods string
        neighborhoods = neighborhoods.replace('(', ' ')
        neighborhoods = neighborhoods.replace(')', ' ')

        neighborhoods_names = neighborhoods.split('/')
        neighborhoods_clean = ', '.join([name.strip() for name in neighborhoods_names])
    else:
        # no neighborhoods listed, so reuse the borough name
        neighborhoods_clean = borough

    # add borough and neighborhood to dictionary
    postal_codes_dict[postal_code] = {'borough': borough, 'neighborhoods': neighborhoods_clean}
    
# build the dataframe from the dictionary in one step
# (DataFrame.append was removed in pandas 2.0)
columns = ['PostalCode', 'Borough', 'Neighborhood']
rows = [
    {"PostalCode": postal_code,
     "Borough": info['borough'],
     "Neighborhood": info['neighborhoods']}
    for postal_code, info in postal_codes_dict.items()
]
data = pd.DataFrame(rows, columns=columns)

# print number of rows of dataframe
data.shape

(124, 3)
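
The cell-parsing rule used in the scraping loop can be tried offline on plain strings. The sketch below is a simplified stand-in for that logic, not the notebook's exact code: the helper name `parse_cell` and the sample cell texts are assumptions made for illustration, and it keeps everything after the first `(` rather than only the first parenthesised segment.

```python
def parse_cell(cell_text):
    """Split one table cell's text into (borough, neighbourhoods).

    Returns None for cells marked 'Not assigned'.
    """
    text = cell_text.strip()
    if text == "Not assigned":
        return None

    borough, sep, rest = text.partition("(")
    borough = borough.strip()
    if not sep:
        # No parenthesised list: reuse the borough as the neighbourhood.
        return borough, borough

    # Drop any remaining parentheses, then split the names on '/'.
    rest = rest.replace("(", " ").replace(")", " ")
    names = [name.strip() for name in rest.split("/")]
    return borough, ", ".join(names)


# Assumed sample cell texts, modelled on the rows shown in the dataframe.
for sample in ["Downtown Montreal North(McGill University)",
               "Pointe-aux-Trembles\n",
               "Not assigned"]:
    print(repr(sample), "->", parse_cell(sample))
```

Testing the rule on strings first makes it easier to see how a cell without parentheses ends up with the same value in both the `Borough` and `Neighborhood` columns.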

In [3]:
data

Unnamed: 0,PostalCode,Borough,Neighborhood
0,H1A,Pointe-aux-Trembles,Pointe-aux-Trembles
1,H2A,"Saint-Michel,East","Saint-Michel,East"
2,H3A,Downtown Montreal North,McGill University
3,H4A,Notre-Dame-de-GrâceNortheast,Notre-Dame-de-GrâceNortheast
4,H5A,Place Bonaventure,Place Bonaventure
...,...,...,...
119,H1Z,Saint-MichelWest,Saint-MichelWest
120,H2Z,Downtown MontrealNortheast,Downtown MontrealNortheast
121,H3Z,WestmountSouth,WestmountSouth
122,H4Z,Tour de la Bourse,Tour de la Bourse


In [4]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(data['Borough'].unique()),
        data.shape[0]
    )
)

The dataframe has 124 boroughs and 124 neighborhoods.


In [5]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# function for transforming a json file into a pandas dataframe
from pandas import json_normalize


! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [6]:
address = 'Quebec, QC'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Quebec are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Quebec are 46.8259601, -71.2352226.


In [7]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [None]:
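import geocoder # import geocoder

def get_latlng(postal_code, max_tries=5):
    # geocode a single postal code; stop after max_tries instead of looping forever
    # (the Google provider may return nothing when no API key is configured)
    for _ in range(max_tries):
        g = geocoder.google('{}, Quebec'.format(postal_code))
        if g.latlng is not None:
            return g.latlng
    return [None, None]

# geocode each postal code on its own and keep the result per row
coords = [get_latlng(code) for code in data['PostalCode']]
data['Latitude'] = [c[0] for c in coords]
data['Longitude'] = [c[1] for c in coords]

In [9]:
# create map of quebec using latitude and longitude values
map_qc = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map, skipping postal codes that could not be geocoded
located = data.dropna(subset=['Latitude', 'Longitude'])
for lat, lng, borough, neighborhood in zip(located['Latitude'], located['Longitude'],
                                           located['Borough'], located['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_qc)  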
    
map_qc
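
The map is centred on the coordinates returned for 'Quebec, QC' (46.8259601, -71.2352226, which is Quebec City), while the scraped postal codes all begin with H, the prefix for the Montreal area. One possible way to keep the markers in view is to centre the map on the average of the marker coordinates instead. The sketch below shows that calculation in plain Python; the helper name `mean_centre` and the sample coordinates are hypothetical.

```python
def mean_centre(points):
    """Average the (lat, lng) pairs, ignoring entries with a missing value."""
    valid = [(lat, lng) for lat, lng in points
             if lat is not None and lng is not None]
    if not valid:
        return None  # nothing was geocoded, so there is no centre to report
    mean_lat = sum(lat for lat, _ in valid) / len(valid)
    mean_lng = sum(lng for _, lng in valid) / len(valid)
    return mean_lat, mean_lng


# Hypothetical geocoding results; the last postal code failed to resolve.
sample_points = [(45.5, -73.6), (45.7, -73.4), (None, None)]
print(mean_centre(sample_points))
```

The returned pair could be passed as `location=` to `folium.Map` in place of the single geocoded address.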

## Results and Discussion <a name="results"></a>

Canada has a great number of neighbourhoods; our analysis looks at those in Quebec.

Quebec is one of the thirteen provinces and territories of Canada. Located in Central Canada, the province shares land borders with Ontario to the southwest, Newfoundland and Labrador to the northeast, and New Brunswick to the southeast; a coastal border with Nunavut; and land borders with the states of Maine, New Hampshire, Vermont and New York to the south.
Quebec has one of the world's largest reserves of fresh water, occupying 12% of its surface.
In general, the climate of Quebec is cold and humid.
Quebec has an advanced, market-based, and open economy.

All universities in Quebec exist by virtue of laws adopted by the National Assembly of Quebec in 1967 during the Quiet Revolution. Their financing mostly comes from public taxes, but the laws under which they operate grant them more autonomy than other levels of education.
Teachers are represented by province-wide unions that negotiate province-wide working conditions with local school service centres and the government of Quebec.
School work and tests are normally graded using one of two methods (or both simultaneously): a percentage-based system from 0 to 100% correct (60% is usually the minimum passing grade), or a letter grade system going from A (best) down through B, C and D to F (failure).



## Conclusion <a name="conclusion"></a>

The purpose of this project was to suggest a better neighbourhood in a new city for people who are moving there, taking into account social presence in terms of like-minded people and connectivity to the airport, bus stand, city centre, markets and other daily necessities nearby.
Relevant information was gathered and reported.