Urban Data Science & Smart Cities <br>
URSP688Y <br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

[<img src="https://colab.research.google.com/assets/colab-badge.svg">](https://colab.research.google.com/github/ncsg/ursp688y_sp2024/blob/main/exercises/exercise07/exercise07.ipynb)

# Exercise 7

## Problem

In week 7, you learned how to extend tabular data with geospatial information: points, linestrings, and polygons.

For this next exercise, please ask a planning-related question with a spatial component, then find data and apply any data science methods you have learned so-far (or can Google!) to answer that question.

## Data

You are welcome to use any data you would like, including data used in previous demos and exercises.

## A Few Pointers
- Choose a straightforward question that requires a reasonable amount of data! Don't shoot for the moon. This exercise is intended to give you a chance to practice finding and analyzing spatial data, not to address the world's greatest challenges.
- Consider using this exercise to get a head start on your final project or explore options for it. Your project doesn't need to focus on spatial analysis for it to play a role. Are there datasets you might join together based on spatial locations?
- Don't go overboard. If you're hitting a wall with coding, write pseudocode and turn that in. Don't let the perfect be the enemy of the done. But if you're energized and having fun by chasing down a thorny solution to a coding problem, by all means feel free to keep at it!



In [1]:
# Where are schools in Delaware that are underenrolled
# link to save for future: Delaware Public School Funding Gap - Overview https://www.arcgis.com › home › item
# anothre one: https://education.delaware.gov/community/data/reports/nps/

In [2]:
import pandas as pd
import os

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
os.chdir('/content/drive/MyDrive/Colab Notebooks//woods_07/')

privateschools = pd.read_csv('listofprivateschoolsinDE.edit.csv')

In [5]:
privateschools

Unnamed: 0,PSS_SCHOOL_ID,PSS_INST,LoGrade,HiGrade,PSS_ADDRESS,PSS_CITY,PSS_COUNTY_NO,PSS_COUNTY_FIPS,PSS_STABB,PSS_FIPS,...,PSS_ASSOC_6,PSS_ASSOC_7,PSS_ASSOC_8,PSS_ASSOC_9,PSS_ASSOC_10,PSS_ASSOC_11,PSS_ASSOC_12,PSS_ASSOC_13,PSS_ASSOC_14,PSS_ASSOC_15
0,249817,ALBERT EINSTEIN ACADEMY,3,10,"101 GARDEN OF EDEN RD 104 WILMINGTON, DE",WILMINGTON,10003,3,DE,10,...,,,,,,,,,,
1,A0101559,APPLE GROVE SCHOOL,6,13,"2391 HAZLETTVILLE RD DOVER, DELAWARE",DOVER,10001,1,DE,10,...,,,,,,,,,,
2,A9700826,AQUINAS ACADEMY,2,17,"2370 RED LION RD BEAR, DELAWARE",BEAR,10003,3,DE,10,...,,,,,,,,,,
3,BB200329,ASTRA ZENECA CHILD DEVELOPMENT CENTER,2,3,"1920 ROCKLAND RD WILMINGTON, DELAWARE",WILMINGTON,10003,3,DE,10,...,,,,,,,,,,
4,BB200330,AUGUSTINE HILLS SCHOOL,9,16,"6 STONE HILL RD WILMINGTON, DELAWARE",WILMINGTON,10003,3,DE,10,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,A2100767,URBAN PROMISE ACADEMY,12,16,"2223 N MARKET ST WILMINGTON, DELAWARE",WILMINGTON,10003,3,DE,10,...,,,,,,,,,,
73,A0900734,URBAN PROMISE SCHOOL,2,11,"2401 THATCHER ST WILMINGTON , DELAWARE",WILMINGTON,10003,3,DE,10,...,,,,,,,,,,
74,249646,"URSULINE ACADEMY, LOWER",3,10,"1106 PENNSYLVANIA AVE WILMINGTON , DELAWARE",WILMINGTON,10003,3,DE,10,...,,,,,,,,,,
75,249748,WEST CENTER SCHOOL,6,13,"1418 YODER DR HARTLY, DELAWARE",HARTLY,10001,1,DE,10,...,,,,,,,,,,


In [6]:
privateschools.keys()

Index(['PSS_SCHOOL_ID', 'PSS_INST', 'LoGrade', 'HiGrade', 'PSS_ADDRESS',
       'PSS_CITY', 'PSS_COUNTY_NO', 'PSS_COUNTY_FIPS', 'PSS_STABB', 'PSS_FIPS',
       'PSS_ZIP5', 'PSS_PHONE', 'PSS_ENROLL_PK', 'PSS_ENROLL_K',
       'PSS_ENROLL_1', 'PSS_ENROLL_2', 'PSS_ENROLL_3', 'PSS_ENROLL_4',
       'PSS_ENROLL_5', 'PSS_ENROLL_6', 'PSS_ENROLL_7', 'PSS_ENROLL_8',
       'PSS_ENROLL_9', 'PSS_ENROLL_10', 'PSS_ENROLL_11', 'PSS_ENROLL_12',
       'PSS_ENROLL_T', 'PSS_ENROLL_TK12', 'PSS_RACE_AI', 'PSS_RACE_AS',
       'PSS_RACE_H', 'PSS_RACE_B', 'PSS_RACE_W', 'PSS_RACE_P', 'PSS_RACE_2',
       'PSS_FTE_TEACH', 'PSS_LOCALE', 'PSS_COED', 'PSS_TYPE', 'PSS_LEVEL',
       'PSS_RELIG', 'PSS_COMM_TYPE', 'PSS_INDIAN_PCT', 'PSS_ASIAN_PCT',
       'PSS_HISP_PCT', 'PSS_BLACK_PCT', 'PSS_WHITE_PCT', 'PSS_PACISL_PCT',
       'PSS_TWOMORE_PCT', 'PSS_STDTCH_RT', 'PSS_ORIENT', 'PSS_COUNTY_NAME',
       'PSS_ASSOC_1', 'PSS_ASSOC_2', 'PSS_ASSOC_3', 'PSS_ASSOC_4',
       'PSS_ASSOC_5', 'PSS_ASSOC_6', 'PSS_ASSOC_7', 

In [7]:
# make these keys match up w what csv says
columns_to_keep = ['PSS_INST' , 'PSS_ADDRESS' , 'PSS_ZIP5']
df = privateschools[columns_to_keep]



df.head()

Unnamed: 0,PSS_INST,PSS_ADDRESS,PSS_ZIP5
0,ALBERT EINSTEIN ACADEMY,"101 GARDEN OF EDEN RD 104 WILMINGTON, DE",19803
1,APPLE GROVE SCHOOL,"2391 HAZLETTVILLE RD DOVER, DELAWARE",19904
2,AQUINAS ACADEMY,"2370 RED LION RD BEAR, DELAWARE",19701
3,ASTRA ZENECA CHILD DEVELOPMENT CENTER,"1920 ROCKLAND RD WILMINGTON, DELAWARE",19803
4,AUGUSTINE HILLS SCHOOL,"6 STONE HILL RD WILMINGTON, DELAWARE",19803


In [8]:
from geopy.geocoders import Nominatim

In [9]:
'''
# Initialize the geocoder
geolocator = Nominatim(user_agent="PSS_ADDRESS")

# Function to get latitude and longitude for an address
def get_lat_long_with_retry(address):
    location = geolocator.geocode(address)
    if location:
        return location.latitude, location.longitude
    else:
        return None, None

# Add new columns for latitude and longitude
privateschools['Latitude'] = None
privateschools['Longitude'] = None

# Iterate through each row and geocode the address
for index, row in privateschools.iterrows():
    address = row['PSS_ADDRESS']  # Assuming 'Address' is the column name in your CSV
    latitude, longitude = get_lat_long_with_retry(address)
    privateschools.at[index, 'Latitude'] = latitude
    privateschools.at[index, 'Longitude'] = longitude

# Save the DataFrame back to a new CSV file
privateschools.to_csv('geocoded_dataset.csv', index=False)

print("Geocoding completed.")
'''

'\n# Initialize the geocoder\ngeolocator = Nominatim(user_agent="PSS_ADDRESS")\n\n# Function to get latitude and longitude for an address\ndef get_lat_long_with_retry(address):\n    location = geolocator.geocode(address)\n    if location:\n        return location.latitude, location.longitude\n    else:\n        return None, None\n\n# Add new columns for latitude and longitude\nprivateschools[\'Latitude\'] = None\nprivateschools[\'Longitude\'] = None\n\n# Iterate through each row and geocode the address\nfor index, row in privateschools.iterrows():\n    address = row[\'PSS_ADDRESS\']  # Assuming \'Address\' is the column name in your CSV\n    latitude, longitude = get_lat_long_with_retry(address)\n    privateschools.at[index, \'Latitude\'] = latitude\n    privateschools.at[index, \'Longitude\'] = longitude\n\n# Save the DataFrame back to a new CSV file\nprivateschools.to_csv(\'geocoded_dataset.csv\', index=False)\n\nprint("Geocoding completed.")\n'

In [10]:
import pandas as pd
from geopy.geocoders import Nominatim
import time

# Initialize the geocoder
geolocator = Nominatim(user_agent="PSS_ADDRESS")

# Function to get latitude and longitude for an address with retry logic
def get_lat_long_with_retry(address):
    retry_count = 0
    while retry_count < 3:  # Retry up to 3 times
        try:
            location = geolocator.geocode(address, timeout=10)  # Increase timeout to 10 seconds
            if location:
                return location.latitude, location.longitude
            else:
                return None, None
        except Exception as e:
            print(f"Geocoding attempt {retry_count + 1} failed: {str(e)}. Retrying...")
            retry_count += 1
            time.sleep(1)  # Wait for a second before retrying
    print("Geocoding failed after multiple attempts.")
    return None, None

# Add new columns for latitude and longitude
df['Latitude'] = None
df['Longitude'] = None

# Iterate through each row and geocode the address
for index, row in privateschools.iterrows():
    address = row['PSS_ADDRESS']
    latitude, longitude = get_lat_long_with_retry(address)
    df.at[index, 'Latitude'] = latitude
    df.at[index, 'Longitude'] = longitude

df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Latitude'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Longitude'] = None


Unnamed: 0,PSS_INST,PSS_ADDRESS,PSS_ZIP5,Latitude,Longitude
0,ALBERT EINSTEIN ACADEMY,"101 GARDEN OF EDEN RD 104 WILMINGTON, DE",19803,,
1,APPLE GROVE SCHOOL,"2391 HAZLETTVILLE RD DOVER, DELAWARE",19904,39.150148,-75.560097
2,AQUINAS ACADEMY,"2370 RED LION RD BEAR, DELAWARE",19701,39.606016,-75.666786
3,ASTRA ZENECA CHILD DEVELOPMENT CENTER,"1920 ROCKLAND RD WILMINGTON, DELAWARE",19803,39.780197,-75.550827
4,AUGUSTINE HILLS SCHOOL,"6 STONE HILL RD WILMINGTON, DELAWARE",19803,,
...,...,...,...,...,...
72,URBAN PROMISE ACADEMY,"2223 N MARKET ST WILMINGTON, DELAWARE",19802,39.753091,-75.539178
73,URBAN PROMISE SCHOOL,"2401 THATCHER ST WILMINGTON , DELAWARE",19802,39.749133,-75.530405
74,"URSULINE ACADEMY, LOWER","1106 PENNSYLVANIA AVE WILMINGTON , DELAWARE",19806,39.752715,-75.556676
75,WEST CENTER SCHOOL,"1418 YODER DR HARTLY, DELAWARE",19953,,


In [11]:
df.dropna(inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(inplace=True)


In [12]:
'''
#import matplotlib.pyplot as plt


# Assuming you have latitude and longitude columns in your DataFrame
# You can plot them on a scatter plot to visualize the locations
#plt.figure(figsize=(10, 6))
#plt.scatter(df['Longitude'], df['Latitude'], s=10, alpha=0.5)
##plt.title('Private Schools Locations')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()
'''

"\n#import matplotlib.pyplot as plt\n\n\n# Assuming you have latitude and longitude columns in your DataFrame\n# You can plot them on a scatter plot to visualize the locations\n#plt.figure(figsize=(10, 6))\n#plt.scatter(df['Longitude'], df['Latitude'], s=10, alpha=0.5)\n##plt.title('Private Schools Locations')\nplt.xlabel('Longitude')\nplt.ylabel('Latitude')\nplt.grid(True)\nplt.show()\n"

In [13]:
import matplotlib.pyplot as plt
import folium

# center at mean lat and long
m = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=10)

# markers for lat and long coordinate
for _, row in df.iterrows():
    folium.Marker([row['Latitude'], row['Longitude']]).add_to(m)


m

In [14]:
# get it on arcgis online
!pip install arcgis

Collecting arcgis
  Downloading arcgis-2.3.0.1.tar.gz (50.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.7/50.7 MB[0m [31m8.7 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting pylerc (from arcgis)
  Using cached pylerc-4.0-py3-none-any.whl
Collecting ujson>=3 (from arcgis)
  Using cached ujson-5.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)
Collecting jupyterlab (from arcgis)
  Using cached jupyterlab-4.2.0-py3-none-any.whl (11.6 MB)
Collecting geomet (from arcgis)
  Using cached geomet-1.1.0-py3-none-any.whl (31 kB)
Collecting requests_toolbelt (from arcgis)
  Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Collecting pyspnego>=0.8.0 (from arcgis)
  Using cached pyspnego-0.10.2-py3-none-any.whl (129 kB)
Collecting requests-kerberos (from arcgis)
  Using cached requests_kerberos-0.14.0-py2.py3-none-any.whl (11 kB)
Collecting requests-gssapi (from arcgis)
  Using cach

In [15]:
from arcgis.gis import GIS
from arcgis.mapping import MapImageLayer
from arcgis.mapping import WebMap

# Authenticate with ArcGIS Online
gis = GIS("https://www.arcgis.com", "kwoods15", "031500Kjw$")

# Create a map in the notebook
webmap = WebMap()

# Create a MapImageLayer (replace the URL with your own service URL)
layer = MapImageLayer('https://sampleserver6.arcgisonline.com/arcgis/rest/services/USA/MapServer')

# Add the layer to the map
webmap.add_layer(layer)

# Export the map to ArcGIS
item_properties = {
    "title": "Private School Locations",
    "snippet": "This is a web map created with the ArcGIS API for Python.",
    "tags": "python, notebook, map"
}
webmap_item = webmap.save(item_properties=item_properties)
print("Map successfully exported to ArcGIS Online:", webmap_item.url)

Map successfully exported to ArcGIS Online: None
