# Employee Relocation Analysis: Capstone Project

**Company:** WXYZ, Brooklyn, NY, USA <br>
**Presented by:** <a href="https://www.linkedin.com/in/samuelnnitiwetheophilus/">Samuel Nnitiwe Theophilus</a>


## Background 

A company's best resource, and its largest cost, is its employees: the people who bring creativity, productivity and ultimately profitability to the business. A good talent management program can improve an employer's competitiveness, but it does not ensure that the talent is located where it is most needed.

When an organization expands to a new location, it needs to deploy staff there. While the company could hire new talent, using capable current employees who are already familiar with its structure and operations is the better option in terms of cost and of the time staff need to adapt and handle operations.
The company will need to find a way to relocate some key employees, whether for continued career development or to bring their knowledge to different subsidiaries or locations. These moves can be a daunting task for the company and a high-stress situation for the employee. If a relocation is not handled successfully, the employer risks losing someone it has devoted time and money to develop and move.


## Problem Definition

The **WXYZ Company** has been operating in **Brooklyn, New York** for the past 5 years. This year, the board decided to open an office in **Coventry, England** and would like to select some of its existing employees to fill managerial roles at the new branch.
This Data Science project aims to compare the neighbourhoods in Brooklyn, New York (current company location) with the neighbourhoods in Coventry (new branch) and **create clusters of similar neighbourhoods**. This will help the company to:
1. Identify employees who would have a smoother transition to the new branch (by checking whether the neighbourhood of their current residential address falls in the same cluster as a neighbourhood in the new location).
2. Identify locations to recommend to employees who agree to relocate.

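Once every neighbourhood carries a cluster label, the first goal reduces to a lookup. The following is a minimal sketch of that lookup, not the project's implementation: the table layout (`City`, `Neighborhood`, `Cluster`), the helper name `matching_neighborhoods` and the cluster labels are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical cluster labels; the values are illustrative, not project results
labelled = pd.DataFrame({
    "City": ["Brooklyn", "Brooklyn", "Brooklyn", "Coventry", "Coventry"],
    "Neighborhood": ["Bay Ridge", "Greenpoint", "Gravesend", "Hillfields", "Spon End"],
    "Cluster": [0, 1, 2, 1, 0],
})

def matching_neighborhoods(table, home_neighborhood,
                           home_city="Brooklyn", target_city="Coventry"):
    """Return target-city neighbourhoods in the same cluster as the home neighbourhood."""
    home = table[(table["City"] == home_city)
                 & (table["Neighborhood"] == home_neighborhood)]
    if home.empty:
        # Unknown home neighbourhood: nothing to match against
        return []
    cluster = home["Cluster"].iloc[0]
    same = table[(table["City"] == target_city) & (table["Cluster"] == cluster)]
    return same["Neighborhood"].tolist()

print(matching_neighborhoods(labelled, "Greenpoint"))  # ['Hillfields']
print(matching_neighborhoods(labelled, "Gravesend"))   # [] (no Coventry match)
```

An employee whose home neighbourhood returns a non-empty list would be a candidate for a smoother transition, and the list itself doubles as the set of locations to recommend.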

## Data

I will use the **Foursquare API** to explore neighborhoods in Brooklyn and Coventry. I will then explore the most common venue categories in each neighborhood and use this feature to group the neighborhoods into clusters.
I will convert addresses into their equivalent latitude and longitude values.
Finally, I will use the Folium library to visualize the neighborhoods in Brooklyn and Coventry and their emerging clusters.

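To make the "most common venue categories" feature concrete, here is a small sketch of how a list of venues could be turned into per-neighbourhood category frequencies. The venue rows and column names are made up for illustration (they are not Foursquare responses), and one-hot encoding followed by a group mean is one common way to build this feature, assumed here rather than taken from the project.

```python
import pandas as pd

# Hypothetical venue list: one row per venue found near a neighbourhood
venues = pd.DataFrame({
    "Neighborhood": ["Bay Ridge", "Bay Ridge", "Bay Ridge", "Hillfields", "Hillfields"],
    "Category": ["Pizza Place", "Pizza Place", "Bar", "Pub", "Park"],
})

# One-hot encode the categories, then average per neighbourhood
# so each row holds the share of venues in each category
onehot = pd.get_dummies(venues["Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
frequencies = onehot.groupby("Neighborhood").mean()

# Most frequent category for each neighbourhood
top_category = frequencies.idxmax(axis=1)

print(frequencies.round(2))
print(top_category)
```

Rows of a table like `frequencies` are numeric vectors of equal length, which is the shape a clustering algorithm such as k-means expects as input.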

### Let's begin

##### 1. Import Libraries and dependencies

In [1]:
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Four
!pip install bs4
from bs4 import BeautifulSoup # this module helps with web scraping
import requests  # this module helps us to download a web page
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
import json # library to handle JSON files

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



## Let's Load Brooklyn, New York Neighbourhood Data
We will extract the boroughs and neighbourhoods in Brooklyn, New York City from this data.


In [2]:
#Download json file from link
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [3]:
#Load data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
# The relevant data is under the 'features' key: a list of the neighborhoods
neighborhoods_data = newyork_data['features']

# Transform the data into a pandas dataframe

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# loop through the data and collect one record per neighborhood
# (DataFrame.append was removed in pandas 2.0, so the frame is built in one step)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)



    
#Select & preview Brooklyn,NY Neighborhood 
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |


## Let's Also Load Coventry Neighbourhood Data
We will extract the boroughs and neighbourhoods in Coventry from <a href="https://en.wikipedia.org/wiki/CV_postcode_area">Wikipedia</a>.
We will scrape the Wikipedia page to extract this data.


In [5]:
url ='https://en.wikipedia.org/wiki/CV_postcode_area'
html=requests.get(url).text
soup= BeautifulSoup(html, 'html.parser')
#Get table data for transformation
table=soup.find_all('table')
table_rows=table[1].find("tbody").find_all('tr')

# Collect one record per neighbourhood, then build the DataFrame in one step
# (DataFrame.append was removed in pandas 2.0)
hood_rows = []

# Skip the header row of the table
for row in table_rows[1:]:
    # The postcode district sits in the row's header cell
    postal_code = ''
    for postal in row.find_all('th'):
        postal_code = postal.text.replace('\n', '')

    # Get the second data column, split into lines
    cells = row.find_all('td')
    value = cells[1].text.split('\n') if len(cells) > 1 else []

    # The neighbourhood names are the comma-separated text inside the parentheses
    ng_list = []
    if len(value) > 1:
        values_1 = value[0].split('(')
        if len(values_1) > 1:
            ng_list = values_1[1].replace(')', '').split(',')

    for neighbor in ng_list:
        # Ignore empty fragments left over from the split
        if len(neighbor) > 1:
            hood_rows.append({"PostalCode": postal_code,
                              "Borough": 'Coventry',
                              "Neighborhood": neighbor})

hood_data = pd.DataFrame(hood_rows, columns=["PostalCode", "Borough", "Neighborhood"])

# Select only rows with a non-empty Borough; copy so that adding columns later
# modifies an independent frame rather than a slice of hood_data
scraped_hood_data = hood_data[hood_data['Borough'] != ''].copy()
scraped_hood_data

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | CV1 | Coventry | Coventry City Centre |
| 1 | CV1 | Coventry | Gosford Green |
| 2 | CV1 | Coventry | Hillfields |
| 3 | CV1 | Coventry | Spon End |
| 4 | CV1 | Coventry | Coventry University |
| ... | ... | ... | ... |
| 85 | CV31 | Coventry | Whitnash |
| 86 | CV31 | Coventry | Radford Semele |
| 87 | CV32 | Coventry | north |
| 88 | CV32 | Coventry | Cubbington |

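The scraping loop above splits the text inside the parentheses on commas without trimming whitespace, so names after the first may keep a leading space. Below is a hedged sketch of a small helper that isolates this parsing step and strips each name. The helper name `split_coverage` is hypothetical, and the sample cell text is made up to match the "text (a, b, c)" shape that the loop assumes.

```python
def split_coverage(cell_text):
    """Return the comma-separated names inside the first pair of parentheses."""
    first_line = cell_text.split("\n")[0]
    start = first_line.find("(")
    if start == -1:
        # No parentheses: the loop above also yields nothing in this case
        return []
    end = first_line.find(")", start)
    inner = first_line[start + 1:end] if end != -1 else first_line[start + 1:]
    # Strip surrounding whitespace and drop empty fragments
    return [name.strip() for name in inner.split(",") if name.strip()]

# Made-up cell text in the assumed shape
print(split_coverage("Coventry (Coventry City Centre, Gosford Green, Hillfields)\n"))
# ['Coventry City Centre', 'Gosford Green', 'Hillfields']
```

Keeping the parsing in one function also makes it possible to check it against a few sample strings without downloading the page.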

In [6]:
#Define functions to get the latitude and longitude for a Coventry postcode district
def get_lat_( address):
    try:
        geolocator = Nominatim(user_agent="ny_explorer")
        location = geolocator.geocode(address)
        latitude = location.latitude
        return latitude
    except Exception:
        # geocode returns None when nothing is found, and the request itself can fail
        return np.nan

def get_long_( address):
    try:
        geolocator = Nominatim(user_agent="ny_explorer")
        location = geolocator.geocode(address)
        longitude = location.longitude
        return longitude
    except Exception:
        return np.nan
    
#Get Longitude (the bare postcode district is passed as the whole address)
scraped_hood_data['Longitude']=scraped_hood_data['PostalCode'].apply(get_long_)

#Get Latitude
scraped_hood_data['Latitude']=scraped_hood_data['PostalCode'].apply(get_lat_)
coventry_data=scraped_hood_data.copy()
coventry_data.head()

|   | PostalCode | Borough | Neighborhood | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | CV1 | Coventry | Coventry City Centre | 4.740227 | 36.111617 |
| 1 | CV1 | Coventry | Gosford Green | 4.740227 | 36.111617 |
| 2 | CV1 | Coventry | Hillfields | 4.740227 | 36.111617 |
| 3 | CV1 | Coventry | Spon End | 4.740227 | 36.111617 |
| 4 | CV1 | Coventry | Coventry University | 4.740227 | 36.111617 |

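The coordinates in the preview above (latitude 36.11, longitude 4.74) lie far from Coventry, which sits at roughly 52.4°N, 1.5°W, so the bare district code `CV1` appears to have been matched to a different place. The sketch below shows one possible remedy, under the assumption that appending the city and country to the query disambiguates it; this has not been verified against the live service. The helper names `build_query` and `get_lat_long` and the stand-in `fake_geocode` are hypothetical. The geocoder is passed in as a plain callable, so the logic can be exercised without network access, and each address needs only one lookup instead of two.

```python
import math
from collections import namedtuple

def build_query(postal_code, city="Coventry", country="United Kingdom"):
    """Qualify a postcode district with a city and country (assumed suffix)."""
    return f"{postal_code}, {city}, {country}"

def get_lat_long(postal_code, geocode):
    """Return (latitude, longitude) from a single lookup, or (nan, nan) on failure.

    `geocode` is any callable mapping an address string to an object with
    `.latitude` and `.longitude` attributes, or to None when nothing is found.
    """
    try:
        location = geocode(build_query(postal_code))
    except Exception:
        return (math.nan, math.nan)
    if location is None:
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)

# Stand-in geocoder with made-up coordinates, used only to exercise the helper
Location = namedtuple("Location", ["latitude", "longitude"])

def fake_geocode(address):
    return Location(52.41, -1.51) if address.startswith("CV1,") else None

print(build_query("CV1"))                 # CV1, Coventry, United Kingdom
print(get_lat_long("CV1", fake_geocode))  # (52.41, -1.51)
```

With geopy, the callable could be `Nominatim(user_agent="ny_explorer").geocode`, which returns `None` when no match is found. The returned coordinates would still need a sanity check against the expected region.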

## Drop NaN values and display the shape of both tables

In [7]:
coventry_data= coventry_data.dropna()
#drop the PostalCode column (since we now have Latitude and Longitude)
coventry_data.drop('PostalCode', inplace=True, axis=1)
print(coventry_data.shape)

(90, 4)


In [8]:
brooklyn_data= brooklyn_data.dropna()
print(brooklyn_data.shape)

(70, 4)
