## Wrangling public school location data 

### Goals of the Task



There are two files in the dataset retrieved from the Seattle open data portal: a CSV file and a GeoJSON file. <br>
*Each row in both files is a school.* <br>
The CSV file is designed to be used in a geospatial tool that plots X,Y coordinates on a map image, which we don't have access to. The GeoJSON file is a dictionary that does contain location coordinates along with other information about each school, but the information is nested (dictionaries within dictionaries). This file will need to be unnested to make a readable dataframe of longitude and latitude data, which can be joined to the schools data. 

- We can potentially use this data to identify how many public schools lie within 1 km of each cycle hire station, and how far the nearest public school is from a given cycle hire station. This information could be useful for estimating school-related demand on the cycle hire network in term time versus school holidays. 
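As a rough illustration of the distance question above, the great-circle distance between a station and each school can be estimated with the haversine formula. The sketch below uses only the standard library. The station coordinates and the function name `haversine_km` are made up for illustration; the three school coordinates are rows from this dataset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical station location (longitude, latitude), not from the dataset
station = (-122.300000, 47.710000)

# (longitude, latitude) of three schools
schools = [
    (-122.293009, 47.709945),
    (-122.314658, 47.713430),
    (-122.263172, 47.498863),
]

distances = [haversine_km(*station, *school) for school in schools]
within_1km = sum(d <= 1.0 for d in distances)  # schools inside the 1 km radius
nearest_km = min(distances)                    # distance to the closest school
print(within_1km, round(nearest_km, 3))
```

For a full set of stations, the same function could be applied per station; a spatial library would likely be faster, but this keeps the idea visible.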

#### Step 1: use pandas to read the schools csv file as a data frame 
- import pandas as pd 
- use pandas read_csv to create a schools data frame
- ensure you are pointing at the correct file path for the data source (you may have to navigate in your notebook!) 
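A minimal sketch of Step 1. So that it runs anywhere, it reads a tiny in-memory sample rather than the real file. Apart from `schID`, the column names and all values in the sample are invented placeholders, and the commented-out path is an assumed location for your CSV.

```python
import io

import pandas as pd

# Stand-in for the real CSV; column names and values here are placeholders
sample_csv = io.StringIO(
    "X,Y,schID,school_name,map_label,status\n"
    "1270000.0,260000.0,106,Example Elementary,EX1,Open\n"
    "1265000.0,262000.0,292,Sample Middle,SM1,Open\n"
)

schools = pd.read_csv(sample_csv)

# With the real file, pass its path instead (hypothetical path):
# schools = pd.read_csv('data/your_schools_file.csv')

print(schools.shape)
```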


#### Step 2: drop unnecessary columns 

Remove the X, Y coordinates, map label and status columns from the dataframe using a slice or selection method. 

Use head() and info() to preview the remaining dataframe 
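Two common ways to do this are sketched below: dropping the unwanted columns by name, or selecting only the columns to keep. The frame is a small stand-in and its column names are assumptions, so substitute the headers that appear in your own file.

```python
import pandas as pd

# Small stand-in frame; the real column names may differ
schools = pd.DataFrame({
    'X': [1270000.0, 1265000.0],
    'Y': [260000.0, 262000.0],
    'schID': [106, 292],
    'school_name': ['Example Elementary', 'Sample Middle'],
    'map_label': ['EX1', 'SM1'],
    'status': ['Open', 'Open'],
})

# Option 1: drop the unwanted columns by name
trimmed = schools.drop(columns=['X', 'Y', 'map_label', 'status'])

# Option 2: select only the columns to keep
selected = schools[['schID', 'school_name']]

print(trimmed.head())
trimmed.info()
```

Both options return a new dataframe and leave `schools` unchanged, so assign the result to a variable if you want to keep it.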

#### Step 3 : read in the geojson file through the path
- import the json library 
- set the file path as a variable, for example: 
<blockquote>
    path = 'data/Seattle_Public_Schools_Sites_2022-2023.geojson'<br>
</blockquote>  

- open the file at that path and parse its contents with json.load 

<blockquote>
    with open(path) as f: <br>
        -> schoolsdict = json.load(f) <br>
        -> print(schoolsdict)
</blockquote>

- Review the schoolsdict variable by eye and look for the nested dictionary structure. You should see that the file contains 4 keys at the uppermost level ('type', 'name', 'crs' and 'features'), with further dictionaries and lists nested beneath some of them, but it is hard to read!
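Printing the whole dictionary can be overwhelming, so one gentler way to explore it is to list the top-level keys with the type stored under each, then pretty-print a single feature. The sketch below uses a tiny stand-in string with the same four top-level keys. Its values, and the `school_name` property, are placeholders rather than the real file's contents.

```python
import json

# Tiny stand-in with the same top-level keys as the real file
sample = '''{
  "type": "FeatureCollection",
  "name": "sample",
  "crs": {"type": "name", "properties": {"name": "placeholder"}},
  "features": [
    {"type": "Feature",
     "properties": {"schID": 106, "school_name": "Example Elementary"},
     "geometry": {"type": "Point", "coordinates": [-122.293009, 47.709945]}}
  ]
}'''
schoolsdict = json.loads(sample)

# Top-level keys and the type of value stored under each
for key, value in schoolsdict.items():
    print(key, type(value).__name__)

# Pretty-print just the first feature rather than the whole file
print(json.dumps(schoolsdict['features'][0], indent=2))
```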

#### Step 4: print the properties of each feature

- Drilling into the features list reveals a properties dictionary for each school, containing the school name and school ID, which could be used to join to the csv file 

<blockquote>
    for feature in schoolsdict['features']:<br>
      ->  print(feature['properties'])<br>
              </blockquote>

#### Step 5: print the coordinates of the first school in the file 

- Using list indexing we can focus on position 0, the first school in the source data. This reveals the longitude and latitude of the school, in that order 

<blockquote>
    coords = schoolsdict['features'][0]['geometry']['coordinates'] <br>
print(coords)</blockquote>


#### Extend this method using a for loop to print all the coordinates 

<blockquote>
for i in range(len(schoolsdict['features'])):<br>
      -> print(schoolsdict['features'][i]['geometry']['coordinates'])<br>
</blockquote>
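The same loop can also be written without index positions by iterating over the features directly and unpacking each coordinate pair into named variables. This is a sketch on a two-feature stand-in for schoolsdict, not the full file; `all_coords` is a name introduced here for illustration.

```python
# Stand-in for schoolsdict with two features
schoolsdict = {
    'features': [
        {'properties': {'schID': 106},
         'geometry': {'coordinates': [-122.293009, 47.709945]}},
        {'properties': {'schID': 292},
         'geometry': {'coordinates': [-122.314658, 47.713430]}},
    ]
}

all_coords = []
for feature in schoolsdict['features']:
    # A GeoJSON position starts with longitude, then latitude
    longitude, latitude = feature['geometry']['coordinates'][:2]
    all_coords.append((longitude, latitude))
    print(longitude, latitude)
```

Naming the two values makes it harder to swap longitude and latitude by accident in the next step.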

#### Step 6: collect the school IDs and geolocation coordinates from the geojson file into a dataframe

- use a for loop to collect the data 
- the final data frame should be 117 rows long and 3 columns wide
- column headers are schoolid, longitude and latitude 
- make sure you map the correct data to the correct column header
- preview your dataframe to ensure it shows the expected results

In [108]:
schid = []  # school IDs
long = []   # longitudes
lat = []    # latitudes
for feature in schoolsdict['features']:
    schid.append(feature['properties']['schID'])
    # each GeoJSON point is stored as [longitude, latitude]
    long.append(feature['geometry']['coordinates'][0])
    lat.append(feature['geometry']['coordinates'][1])
geodf = pd.DataFrame(
    {'schoolid': schid,
     'longitude': long,
     'latitude': lat
    }
)
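Once the coordinates are in a dataframe, the join mentioned in the goals could look something like the sketch below. It assumes the CSV's ID column is called `schID`, which is a guess based on the GeoJSON property name. The school names are placeholders, and `merged` and `missing` are names introduced here for illustration.

```python
import pandas as pd

# Stand-ins for the two frames; the schools column names are placeholders
schools = pd.DataFrame({
    'schID': [106, 292, 264],
    'school_name': ['Example Elementary', 'Sample Middle', 'Demo High'],
})
geodf = pd.DataFrame({
    'schoolid': [106, 292, 264],
    'longitude': [-122.293009, -122.314658, -122.263172],
    'latitude': [47.709945, 47.713430, 47.498863],
})

# Left join keeps every school; validate raises if an ID repeats on either side
merged = schools.merge(
    geodf,
    left_on='schID',
    right_on='schoolid',
    how='left',
    validate='one_to_one',
)

# Any school left without coordinates shows up as NaN here
missing = merged['longitude'].isna().sum()
print(merged.shape, missing)
```

A left join is used so that schools with no match in the GeoJSON stay visible instead of silently disappearing.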

In [109]:
geodf

Unnamed: 0,schoolid,longitude,latitude
0,106,-122.293009,47.709945
1,292,-122.314658,47.713430
2,264,-122.263172,47.498863
3,203,-122.377588,47.509700
4,221,-122.258636,47.514820
...,...,...,...
112,109,-122.305245,47.621603
113,971,-122.337289,47.694944
114,210,-122.287531,47.725814
115,365,-122.353265,47.632023
