-------------------------------------------------
-------------------------------------------------

# Coursera Capstone Project Notebook (Part 3)

[Link to Notebook (Part 1) of the project:](https://nbviewer.jupyter.org/gist/fy5std/1abce225f491d9471b80eca9edd8ae7c)

------------------------------------------------------

###     3. New Project - Where Do We Meet? WDWM

#### From Assignment

This capstone project will be graded by your peers. This capstone project is worth *70%* of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth *30%* whereas week 2 submissions will be worth 40% of your total grade.

##### Week 4
For this week, you will be required to submit the following:

1.   A description of the problem and a discussion of the background. **(15 marks)**

2.   A description of the data and how it will be used to solve the problem. **(15 marks)**

##### Week 5
This week, you will continue working on your capstone project. Please remember by the end of this week, you will need to submit the following:

1.  A full report consisting of all of the following components. **(15 marks)**:
    * Introduction where you discuss the business problem and who would be interested in this project.
    * Data where you describe the data that will be used to solve the problem and the source of the data.
    * Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.
    * Results section where you discuss the results.
    * Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
    * Conclusion section where you conclude the report.

2. A link to your Notebook, pushed to your GitHub repository, showing your code. **(15 marks)**

3. Your choice of a presentation or blogpost. **(10 marks)**

### Week 4

### Background

We all have family, friends and business contacts around the country. Over the last decade we have started to meet online, but meeting in person at a common place is often preferred and sometimes essential. In that case, choosing the place depends on a few things:
   * How far apart are the groups (3 km, 15 miles, etc.)?
   * What transportation is available (car, public transportation, etc.)?
   * Do the groups have special needs or preferences for a meeting place (children, restaurants, accessibility for disabled people, etc.)?
   * Is there an appropriate place in between that is reachable with the available transportation?

Generally, after a quick assessment, the meeting place is chosen from among known places. If the midpoint is in an unfamiliar area, or if the group wants to try a different meeting place, some searching on the internet helps to determine the place.

In this context, social interaction would be the main reason to use this tool. It could also be used for business purposes, such as finding a warehouse, store or maintenance location that serves stores in various places. So any individual may find themselves in a situation to use this tool. In this project, we narrow the problem down to a group of refugees (asylum seekers) in the Netherlands.

### Problem Description 

*The problem* we focus on is shortening the internet search time needed to filter appropriate places to meet.

To be more precise, if two or more groups around the country intend to find an appropriate meeting point (children's playground, hotel, restaurant, sports bar, etc.) in a city between their current locations, a tool that takes the current locations and returns suggestions would help a lot.

*The audience* for this project is AZC guests (refugees and asylum seekers). Since these people are new to the country, with no prior experience and a limited command of the language, they generally struggle to find an appropriate place. Most of the guests don't even know where they are located in the country (they just know the name of their AZC) and cannot estimate a midpoint to meet.

### Data

Within this frame, I would like to find a solution to the problem described above. In the Netherlands, there are community centers for refugees, called AZCs. These centers are located all over the Netherlands, and people may move from one to another. When someone wants to meet a friend from another AZC, since refugees generally do not have a car, she typically tries to find an in-between city with an appropriate place to meet and travels there by public transportation.

So in this project, I will use data from the COA website (the official AZC institution) to gather AZC locations. I will use Foursquare data to find an appropriate place around the intended meeting point.

### Methodology (Projection)

First, I will download the [webpage](https://www.coa.nl/en/search-location) and scrape it to obtain the exact latitudes and longitudes of the AZCs (reception centers). Then I will use some temporary information to set the current locations of the groups. Next, with a few equations, I intend to find a midpoint with a radius large enough to include a public transportation stop. Lastly, after finding the approximate meeting point, I will use the Foursquare data to list the categories and alternative places to meet.
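The "few equations" step could look something like the sketch below. It is only one possible approach and not necessarily the one used later in this project: each latitude/longitude pair is converted to a 3D unit vector, the vectors are averaged, and the mean is converted back to degrees. The function name `geographic_midpoint` is an assumption made for illustration, and the two sample points are the Middelburg and Rotterdam coordinates from the AZC table.

```python
import math

def geographic_midpoint(points):
    """Return the (lat, lon) midpoint, in degrees, of an iterable of (lat, lon) pairs."""
    x = y = z = 0.0
    n = 0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        # Convert each point to a unit vector on the sphere and accumulate
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
        n += 1
    if n == 0:
        raise ValueError("at least one point is required")
    x, y, z = x / n, y / n, z / n
    # Convert the averaged vector back to latitude and longitude
    lon_mid = math.atan2(y, x)
    lat_mid = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat_mid), math.degrees(lon_mid)

# Illustrative group locations: Middelburg and Rotterdam
groups = [(51.4948, 3.59212), (51.8850, 4.56808)]
mid_lat, mid_lon = geographic_midpoint(groups)
print(round(mid_lat, 4), round(mid_lon, 4))
```

For locations as close together as Dutch cities, a plain average of the latitudes and longitudes would give nearly the same point. The vector form is shown because it also works for more than two groups and for points far apart.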

So let's move on.

### Real World Data

##### Import libraries and gather location data 

In [1]:
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
from bs4 import BeautifulSoup
import csv
import requests

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe

print('Basic Database, WebScrape, JSON Libraries imported.')

Basic Database, WebScrape, JSON Libraries imported.


In [2]:
source = requests.get('https://www.coa.nl/en/search-location').text
soup = BeautifulSoup(source, 'html5lib')
print(soup.prettify())


<!DOCTYPE html>
<html dir="ltr" lang="en">
 <head>
  <link href="http://www.w3.org/1999/xhtml/vocab" rel="profile"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="https://www.coa.nl/sites/www.coa.nl/themes/coa_bs/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
  <meta content="Approximately one-sixth of the Dutch municipalities now have a COA asylum centre. In some municipalities there are several reception locations, for example an azc and a process reception centre. Most reception locations are regular asylum seekers' centres." name="description"/>
  <meta content="Drupal 7 (https://www.drupal.org)" name="generator"/>
  <link href="https://www.coa.nl/en/search-location" rel="canonical"/>
  <link href="https://www.coa.nl/en/node/278" rel="shortlink"/>
  <title>
   Search location | www.coa.nl
  </title>
  <link href="https://www.coa.nl/sites/www.coa.nl/fi

This is the full webpage, and the location information is inside it. However, it appears to be embedded in JavaScript rather than in plain HTML elements, so it is complicated to reach with an automated process. I think it would be easier to extract the information with regex in a text editor.

In [3]:
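# Dump the raw page into a single CSV cell so it can be cleaned up in a text editor
with open("COA_Web.csv", "w", newline="", encoding="utf-8") as fh:
    chars_written = csv.writer(fh).writerow([str(soup)])
chars_written
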

59851

I did some regex edits with Sublime Text on the csv file to speed up the process. The resulting file, loaded in the next cell as `coa_site_scrapped2.csv`, contains the latitude, longitude and name of each AZC.
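The same kind of clean-up could in principle be done in Python rather than in a text editor. The sketch below is only an illustration. The `sample` string is a hypothetical fragment, since the real markup of the COA page is not shown here and may look quite different, and the field names `latitude`, `longitude` and `title` are assumptions. Only the output column names follow the dataframe used in this notebook.

```python
import csv
import io
import re

# Hypothetical fragment: the real COA page markup may differ from this
sample = (
    '{"latitude":"51.4948","longitude":"3.59212","title":"Middelburg"},'
    '{"latitude":"51.4966","longitude":"3.87917","title":"Goes"}'
)

# One named group per field; the pattern must be adapted to the actual markup
pattern = re.compile(
    r'"latitude":"(?P<lat>-?\d+\.\d+)",'
    r'"longitude":"(?P<lon>-?\d+\.\d+)",'
    r'"title":"(?P<name>[^"]+)"'
)

rows = [(float(m["lat"]), float(m["lon"]), m["name"]) for m in pattern.finditer(sample)]

# Write the rows as CSV text (an in-memory buffer stands in for a real file)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Latitude", "Longitude", "Name"])
writer.writerows(rows)
print(buffer.getvalue())
```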

##### Convert the data to dataframe

In [4]:
df_coa = pd.read_csv('coa_site_scrapped2.csv')
print(df_coa)

    Latitude  Longitude                                   Name
0    51.4948    3.59212                             Middelburg
1    51.4966    3.87917                                   Goes
2    52.0337    4.32979                               Rijswijk
3    52.1460    4.38730                              Wassenaar
4    52.1774    4.41329                                Katwijk
5    51.8850    4.56808                              Rotterdam
6    51.7612    4.62215                           s-Gravendeel
7    52.9321    4.75435  Den Helder Burgemeester Ritmeesterweg
8    52.3716    4.80231                Amsterdam - Willinklaan
9    52.6775    4.84204                          Heerhugowaard
10   52.3935    4.86152           Amsterdam - Transformatorweg
11   51.5346    4.90282                         Gilze en Rijen
12   51.5594    5.08258               Tilburg - Stationsstraat
13   52.0830    5.08572             Utrecht - Joseph Haydnlaan
14   51.5790    5.22727                             Ois

Our database of current locations is ready to use. Now we can move on to plotting these locations.
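With coordinates available, the "how far apart are the groups" question from the Background can be estimated directly. The sketch below uses the haversine formula for the great-circle distance. The function name `haversine_km` and the Earth radius constant are assumptions made for illustration. The result is a straight-line distance, not a travel distance by public transportation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Middelburg and Goes, using the coordinates from the table above
distance = haversine_km(51.4948, 3.59212, 51.4966, 3.87917)
print(f"{distance:.1f} km")
```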

##### Plot the location information on a map

In [5]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

from folium import plugins
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster

print('Geolocation, Clustering, Plotting and Map Libraries imported.')

Geolocation, Clustering, Plotting and Map Libraries imported.


Let's try the geolocator:

In [6]:
address = 'Emmen'

geolocator = Nominatim(user_agent="AZC explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(location)
print(f'The geographical coordinates of {address} are {latitude}, {longitude}.')


Emmen, Drenthe, Nederland
The geographical coordinates of Emmen are 52.788937, 6.8939001.
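One detail worth guarding against: `geocode` returns `None` when no match is found, so reading `.latitude` on the result would fail for an unknown or misspelled place name. A minimal sketch of a wrapper follows. The names `lookup_coordinates`, `FakeLocation` and `fake_geocode` are hypothetical. The fake geocoder only stands in for the real service so that the example runs offline.

```python
from typing import Callable, Optional, Tuple

def lookup_coordinates(geocode: Callable[[str], object], address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if nothing is found."""
    location = geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude

class FakeLocation:
    """Offline stand-in for a geocoding result with latitude/longitude attributes."""
    def __init__(self, latitude: float, longitude: float):
        self.latitude = latitude
        self.longitude = longitude

def fake_geocode(address: str):
    # Only knows the one place used in this notebook
    known = {"Emmen": FakeLocation(52.788937, 6.8939001)}
    return known.get(address)

print(lookup_coordinates(fake_geocode, "Emmen"))
print(lookup_coordinates(fake_geocode, "Nowhere"))
```

With the real geolocator from the cell above, the call would be `lookup_coordinates(geolocator.geocode, address)`.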


In [7]:
df_coa_map = folium.Map(location=[df_coa["Latitude"].mean(), df_coa["Longitude"].mean()], zoom_start=7)
mc = MarkerCluster()

# Add one marker per AZC, with the centre name as popup text
for row in df_coa.itertuples(index=False):
    popup_info = folium.Popup(row.Name, parse_html=True)
    mc.add_child(folium.Marker(location=[row.Latitude, row.Longitude], popup=popup_info))

#print (df_coa["Latitude"].mean(), df_coa["Longitude"].mean())    
df_coa_map.add_child(mc)
df_coa_map
