# Preferred areas to start a tourism business

## Background: 
#### Most major cities are marketed based on their tourism potential. Some of these places are only known by the locals. If a person wants to start a tourism business (or an existing tourism company wants to expand), it would be wise to have a premisse in close proximity to tourism places, so that you can also play the role of guide. 

## Synopsis:
1.	Obtain a list of commercial properties for sale in a city: 
    1.	My city of choice is Cape Town, South Africa
    1.	I will use a local website for commercial properties: https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432
1.	Determine the number of and rating of “points of interest” near these commercial properties using Foursquare.
1.	Suggest the best properties to pursue for establishing a tourism business close to well-rated tourism attractions (points of interest).

## Data
### Data acquisition:
- The website data will be scraped to find the name of the property, selling price and are in Cape Town.
- If data scraping is unsuccessful, the relevant data will be copied to a csv file and imported.
- The areas from the website data will be used to obtain data from Foursquare, particularly the name, location and rating for the point of interest.

### Data classification:
- This data will be used to cluster the points of interest in the same vicinity as the commercial property to determine the better places to start a tourism business. 
- The clusters will be presented on a map to show density and proximity to other landmarks.

## Tourism - Real Estate Notebook

### Import libraries

In [83]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline

from sklearn.cluster import KMeans # import k-means from clustering stage
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library

from bs4 import BeautifulSoup # to scrape web pages 
import re
import urllib # url commands 
import urllib3

import csv # to read and write CSV files
from datetime import datetime

print('Libraries imported.')

Libraries imported.


##### Website Layout:
The website consists of classes taht describe different aspects of the porperties for sale. Onr such class contsins the adrress and the neigbourhood. 

The deayluist layout consist of 10 listings, but ther ecould be more. Each listing is containced in a seperate class. 

The website has multiple pages for the search criteria (already embedded in the URL). All the pages will be read to build the dataframe.

In [230]:
# specify the url
page1 = 1
property_page = 'https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=' + str(page1)
print (property_page)

# query the website
prop_page = urllib.request.urlopen(property_page)

# parse the html using beautiful soup 
property_soup = BeautifulSoup(prop_page, 'html.parser')

https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=1


In [231]:
# get the list count
list_box = property_soup.find('div', attrs={'class':'sc_pageText pull-right'})
list_box_val = (list_box.text.strip())
list_count = int(list_box_val.rpartition("of ")[2])
print (list_count)

195


In [232]:
# get the page count
first_page = 1
last_page = 0
for each_pl in property_soup.find_all('li', class_="pagelink"):
    last_page = each_pl.text.strip()
    
print (last_page)

10


In [233]:
# set up the address dataframe and values
prop_addr = ""
prop_addr_ar = ["","",""]
prop_address = ["","",""]
property_df = pd.DataFrame({'Street' : pd.Categorical(np.nan),'Neighbourhood' : pd.Categorical(np.nan),
                            'City' : pd.Categorical(np.nan)})

In [234]:
# populate the address fields into the dataframe, reading multiple pages
# first page is already read - populate values and iterate to next page
i = 1
j = int(last_page) + 1

for i in range(1, j):
    print (i, property_page)
    
    for each_div in property_soup.find_all('div', attrs={'class':'sc_listingTileAddress primaryColor'}):
        prop_addr = each_div.text.strip() 
        prop_addr_ar = prop_addr.split(',')
        prop_address[0] = prop_addr_ar[0].strip()
        prop_address[1] = prop_addr_ar[1].strip()
        prop_address[2] = prop_addr_ar[2].strip()
        if prop_address[1] == prop_address[2]:
            if len(prop_addr_ar) > 3:
                prop_address[2] = prop_addr_ar[3].strip()

        property_df = property_df.append({'Street' : prop_address[0],'Neighbourhood' : prop_address[1], 
                                          'City' : prop_address[2]}, ignore_index=True)
        
    i = i + 1
    property_page = 'https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=' + str(i)

1 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=1
2 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=2
3 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=3
4 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=4
5 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=5
6 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=6
7 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=7
8 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=8
9 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=9
10 https://property.mg.co.za/commercial-property-for-sale-in-cape-town-c432?Mapped=True&Page=10


In [235]:
property_df.dropna(inplace=True)
property_df.reindex()
print ("Dataframe size: ", property_df.size, " : ", property_df.columns)
print (property_df.head())
print ('\nl', property_df.describe)

Dataframe size:  630  :  Index(['Street', 'Neighbourhood', 'City'], dtype='object')
              Street Neighbourhood       City
1  3 Riverstone Road       Wynberg  Cape Town
2   F2 Bell Crescent      Westlake  Cape Town
3   F2 Bell Crescent      Westlake  Cape Town
4   F2 Bell Crescent      Westlake  Cape Town
5     7 Station Road    Rondebosch  Cape Town

l <bound method NDFrame.describe of                   Street          Neighbourhood       City
1      3 Riverstone Road                Wynberg  Cape Town
2       F2 Bell Crescent               Westlake  Cape Town
3       F2 Bell Crescent               Westlake  Cape Town
4       F2 Bell Crescent               Westlake  Cape Town
5         7 Station Road             Rondebosch  Cape Town
..                   ...                    ...        ...
206  A10 Westlake Square               Westlake  Cape Town
207    121 Castle Street  Cape Town City Centre  Cape Town
208  E6A Westlake Square               Westlake  Cape Town
209    32 Bar