# Landscaping Startups in Portland, Oregon, United States

### Business Problem

The neighborhoods in the city of Portland often have a variety of natural and artificial landscapes that need stewardship and maintenance. Venues may include parks, garden centers, country clubs, florists, hardware stores, private residences, and any business that caters to creating, maintaining, and managing vegetation and landscapes. 

A hypothetical group of investors is looking to start a landscaping business that maintains, designs, and installs vegetation in public and private spaces. It would benefit these stakeholders to determine which neighborhoods contain the largest number of these locations.

This information would help them decide where to place their new landscaping business so that travel times between clients and appointments are reduced and their ability to reach customers improves. In addition, venue categories may show which neighborhoods are closest to businesses that sell the inputs the startup will need occasionally: seed, plants, fertilizer, pesticides, or similar hardware. Likewise, the results may help them determine in which neighborhoods they should spend resources researching the market and possible competitors.

![This landscape could use some mowing](https://img.timeinc.net/time/2010/portland_tavel/portland_park.jpg)

<p style="text-align: center;">A park in the city of Portland that could use a mowing.</p>

### Data

We will use Foursquare location data to access venue categories in Portland. We will determine neighborhood locations in the city of Portland by scraping several lists from the [Neighborhoods of Portland, Oregon](https://www.portlandoregon.gov/civic/35281) site. The list will be sorted and cleaned so that we can pass it to a geocoder and determine the geographical location of each neighborhood. Following this, we will explore each neighborhood within a 1000 m radius for venues. We will categorize the venues and determine which categories are most related to landscaping and gardening businesses. We will then classify each neighborhood according to its rate of landscaping-related businesses. This will allow us to recommend which neighborhoods the investors should pay closer attention to.
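
The classification step described above comes down to a per-neighborhood ratio: landscaping-related venues divided by all venues found. The following is a minimal sketch in plain Python under stated assumptions. The sample records, the `LANDSCAPING_CATEGORIES` set, and the `landscaping_rate` helper are hypothetical names chosen for illustration; they are not Foursquare output or the notebook's actual code.

```python
from collections import Counter

# Hypothetical set of categories treated as landscaping-related (an assumption)
LANDSCAPING_CATEGORIES = {"Park", "Garden Center", "Flower Shop", "Hardware Store"}


def landscaping_rate(venues):
    """Return {neighborhood: share of its venues that are landscaping-related}.

    `venues` is an iterable of (neighborhood, category) pairs.
    """
    total = Counter()
    related = Counter()
    for neighborhood, category in venues:
        total[neighborhood] += 1
        if category in LANDSCAPING_CATEGORIES:
            related[neighborhood] += 1
    return {nh: related[nh] / total[nh] for nh in total}


# Made-up sample records, not real query results
sample_venues = [
    ("Alameda", "Park"),
    ("Alameda", "Coffee Shop"),
    ("Boise", "Garden Center"),
    ("Boise", "Hardware Store"),
    ("Boise", "Bar"),
    ("Boise", "Park"),
]

rates = landscaping_rate(sample_venues)
# Rank neighborhoods from the highest to the lowest rate
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

Using a rate rather than a raw count keeps neighborhoods with many venues of every kind from dominating the ranking. The same ratio could be computed with a pandas `groupby` once the venue table exists.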

## 1. Preparing Data


### 1.1 Required Packages

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library


from bs4 import BeautifulSoup # We use beautiful soup to scrape the website from Portland's local government site



print('Libraries imported.')

### 1.2 Fetching and cleaning Portland neighborhood information

In [110]:
site = 'https://www.portlandoregon.gov/civic/35281'
nhs = requests.get(site)
soup = BeautifulSoup(nhs.text, "html.parser")

nhs_list = soup.find_all("h2")


portland = []

for child in nhs_list: # Collect the string of every 'h2' tag in our soup
    portland.append(child.string)

show = pd.DataFrame(data = {'Neighborhood' : portland} ) # Build the DataFrame once, after the loop


Select only the rows that contain the string *Neighborhood*. Then strip the association suffixes and other annotations so that only the neighborhood name remains.

In [111]:
df = show[show['Neighborhood'].str.contains('Neighborhood')]

# Each entry is a regular expression; raw strings keep the escaped parentheses intact
string_cleaning = [' Neighborhood Association', ' Neighborhood District Association', r'\(formerly CTLH\)', 
                   ' Neighborhood Network', 'Association of Neighbors', ' Association of Neighborhoods', r'\(HAND \)']


In [112]:
x = df.Neighborhood

for i in string_cleaning:
    x = x.str.replace(i, '', regex=True) # Treat each entry as a regular expression

df = pd.DataFrame(data = {'Neighborhood':x} ) # Rebuild the DataFrame once, after all replacements
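
As a side note, the same cleanup can be expressed with the standard `re` module by joining the patterns into a single alternation. This is a minimal sketch, not the notebook's method: `CLEANER` and `clean_name` are hypothetical names, the sample headings are illustrative, and the final `.strip()` is an added assumption to remove leftover spaces.

```python
import re

# Suffixes and annotations to strip; parentheses are escaped because these are regex patterns
PATTERNS = [
    r" Neighborhood Association",
    r" Neighborhood District Association",
    r"\(formerly CTLH\)",
    r" Neighborhood Network",
    r"Association of Neighbors",
    r" Association of Neighborhoods",
    r"\(HAND \)",
]

# One compiled alternation replaces the separate passes over the list
CLEANER = re.compile("|".join(PATTERNS))


def clean_name(heading):
    """Remove association suffixes from a heading and trim leftover whitespace."""
    return CLEANER.sub("", heading).strip()


# Illustrative headings in the style of the scraped list
print(clean_name("Alameda Neighborhood Association"))
print(clean_name("Argay Terrace Neighborhood Association"))
```

With pandas, `df['Neighborhood'].map(clean_name)` would apply the helper to the whole column.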

In [115]:
df.reset_index(drop=True, inplace = True)
df = df.drop(73) # The geocoder misplaces University Park (row 73), so we remove that neighborhood.
df.reset_index(drop=True, inplace = True)

In [116]:
print(df.head(9))
print('There are ', df.shape[0], 'neighborhoods in Portland Oregon')

              Neighborhood
0                  Alameda
1              Arbor Lodge
2  Ardenwald/Johnson Creek
3            Argay Terrace
4        Arlington Heights
5             Arnold Creek
6                 Ashcreek
7        Beaumont-Wilshire
8                    Boise
There are  77 neighborhoods in Portland Oregon


## 2. Obtaining coordinates for each neighborhood

In [118]:
df.index

RangeIndex(start=0, stop=77, step=1)

In [119]:
loc = []
lat = []
long = []


In [120]:

geolocator = Nominatim(user_agent = "portland_explorer") # Create the geocoder once, outside the loop

for nh in df['Neighborhood']:
    
    address = '{}, Portland, OR'.format(nh)
    location = geolocator.geocode(address)
    
    # list.append modifies the list in place and returns None,
    # so its result must not be assigned back to the list
    loc.append(nh)
    lat.append(location.latitude)
    long.append(location.longitude)
    
    print(location)

# Build the coordinate table once, after every neighborhood has been geocoded
portland_data = pd.DataFrame({'Neighborhood':loc, 'Latitude':lat, 'Longitude':long})

Alameda, Portland, Multnomah County, Oregon, 97212, United States of America
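
geopy's `geocode` returns `None` when an address cannot be resolved, in which case reading `location.latitude` would fail. A more defensive version of the loop could look like the sketch below. It assumes a hypothetical helper, `geocode_neighborhoods`, that receives the geocoding function as an argument; `FakeLocation` and `FAKE_RESULTS` are stand-ins with made-up coordinates so that the example runs offline.

```python
from collections import namedtuple


def geocode_neighborhoods(names, geocode):
    """Geocode each name with `geocode` and return (rows, failures).

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    rows, failures = [], []
    for name in names:
        location = geocode('{}, Portland, OR'.format(name))
        if location is None:
            failures.append(name)  # record the miss instead of raising AttributeError
            continue
        rows.append({'Neighborhood': name,
                     'Latitude': location.latitude,
                     'Longitude': location.longitude})
    return rows, failures


# Stand-in geocoder with made-up coordinates (not real Nominatim results)
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
FAKE_RESULTS = {'Alameda, Portland, OR': FakeLocation(45.5, -122.6)}

rows, failures = geocode_neighborhoods(['Alameda', 'Nowhere'], FAKE_RESULTS.get)
print(rows)
print(failures)
```

In the notebook, `geocode` could be `Nominatim(user_agent="portland_explorer").geocode`, optionally wrapped in geopy's `RateLimiter` with `min_delay_seconds=1` to space out requests to the public Nominatim service, and `pd.DataFrame(rows)` would yield the coordinate table.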