# Section 5. Spatial Data Practice

#### Instructor: Pierre Biscaye

The content of this notebook draws on material from UC Berkeley's Spatial Data Analysis [course](https://docs.google.com/document/d/1oC10pjyeBQTenQazCpaB8Lx1b5PC1SR3WFiPgCtXqcs/edit?tab=t.0) notes and lab exercises by [Jaecheol Lee](https://sites.google.com/view/jaecheollee), and on the Python Fundamentals [course](https://github.com/dlab-berkeley/Python-Fundamentals).

In [None]:
import pickle
import numpy as np
from scipy import stats
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import rasterio
import rasterio.transform
import rasterio.mask
import rasterio.warp
import rasterio.windows
from matplotlib.colors import LinearSegmentedColormap
from shapely.geometry import (Point, LinearRing,
                              Polygon, MultiPolygon)

%matplotlib inline

## Challenge 1: Distances between points

The data below contain (fake!) information on students: the locations of their homes and the locations of their favorite restaurants.
 
```python
pd.DataFrame({'name': ['A', 'B', 'C'], 
              'dep_lon': [3.0512085, 3.052085, 3.093580], 
              'dep_lat': [45.767776, 47.781205, 45.7709351], 
              'rest_lon': [3.060935, 3.0503950, 3.085095], 
              'rest_lat': [46.799595, 45.093523, 47.30595]}) 
```
1. Calculate the matrix of distances between the students' homes and the restaurants, including the distances from each student to the other students' favorite restaurants. Assume the Earth is flat (!): treat longitude and latitude as planar coordinates and skip the conversion to km.
2. Identify the restaurant with the smallest sum of distances to the 3 students.
3. In a figure with two subplots, plot the home locations in one subplot and the restaurant locations in the other. Mark with a red star the restaurant with the smallest sum of distances.

Note: In step 1, you might want to use `np.zeros((3, 3))` to set up the accumulator, and a double loop (a loop within a loop) to fill it.
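
The accumulator-plus-double-loop pattern from the note can be sketched on made-up coordinates. The arrays `homes` and `places` below are hypothetical stand-ins, not the challenge data, and the distances are plain planar (flat-Earth) distances in the units of the inputs.

```python
import numpy as np

# Hypothetical planar coordinates (x, y); NOT the challenge data
homes = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
places = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Accumulator: rows index homes, columns index places
dist = np.zeros((len(homes), len(places)))

# Double loop: fill one home/place pair at a time
for i, (hx, hy) in enumerate(homes):
    for j, (px, py) in enumerate(places):
        dist[i, j] = np.sqrt((hx - px) ** 2 + (hy - py) ** 2)

# Column sums give the total distance from all homes to each place
totals = dist.sum(axis=0)
best = int(totals.argmin())  # column index of the smallest total
print(dist)
print(totals, best)
```

Note that `np.zeros` takes the shape as a single tuple, which is why the inner parentheses are needed.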

In [None]:
file=pd.DataFrame({'name': ['A', 'B', 'C'], 
              'dep_lon': [3.0512085, 3.052085, 3.093580], 
              'dep_lat': [45.767776, 47.781205, 45.7709351], 
              'rest_lon': [3.060935, 3.0503950, 3.085095], 
              'rest_lat': [46.799595, 45.093523, 47.30595]}) 

In [None]:
file

In [None]:
# Your code for step 1


In [None]:
# Your code for step 2


In [None]:
# Your code for step 3


## Challenge 2. Exercises on indexing and plotting

Let's practice indexing and plotting point data in a data frame using a dataset of crime events, `crime_locations.csv`.
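
The two kinds of indexing used in this challenge, position-based slicing and condition-based (boolean) masking, can be sketched on small arrays. The names `month`, `x`, and `y` are hypothetical stand-ins; the actual column names in `crime_locations.csv` are not shown here.

```python
import numpy as np

# Hypothetical stand-ins for three columns of an event table
month = np.array([5, 6, 5, 7, 6])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.0, 8.0, 7.0, 6.0, 5.0])

# Position-based: the first two events
first_two_x = x[:2]

# Condition-based: a boolean mask selects matching rows
in_may = month == 5
may_x, may_y = x[in_may], y[in_may]

# Combine conditions with | (or) and & (and); each needs parentheses
in_may_or_june = (month == 5) | (month == 6)
print(first_two_x, may_x, may_y, in_may_or_june.sum())
```

The same masks work on pandas columns, for example `df.loc[mask, 'some_column']`, where `some_column` is whatever the coordinate column is called in your data.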

In [None]:
# Load the data
df = 

In [None]:
# inspect the data
df

### 2-1. Plot the first 10 crimes

In [None]:
# Get the first 10 x coordinates

# Get the first 10 y coordinates

# Plot the x coordinates and the y coordinates


### 2-2. Plot the crimes that happened in May

In [None]:
# Get the x coordinates of crimes that happened in May

# Get the y coordinates of crimes that happened in May

# Plot the x coordinates and the y coordinates


### 2-3. Plot the crimes that happened in May and June

Plot these on the same plot, using a different color and shape for each month.

In [None]:
# Get the x coordinates of crimes that happened in May

# Get the y coordinates of crimes that happened in May


# Get the x coordinates of crimes that happened in June

# Get the y coordinates of crimes that happened in June

# Plot the x coordinates and the y coordinates
# With different markers for May and June


### 2-4. Make a figure with subplots showing violent and nonviolent crimes

In [None]:

# Get the x coordinates of violent crimes

# Get the y coordinates of violent crimes

# Get the x coordinates of non-violent crimes

# Get the y coordinates of non-violent crimes

# Set a figure environment with appropriate rows and columns

# Plot the x coordinates and the y coordinates
# with different markers


### 2-5. Make a heatmap of violent crimes!

Follow the steps below to:
1. Make a grid of points.
2. Define a function to calculate lambda (create a distance function first).
3. Use that function to calculate lambda for all grid points, using h=3.
4. Make a heat map.
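
A minimal sketch of steps 1 to 3 on made-up points follows. It assumes lambda is the naive intensity estimate, that is, the number of events within distance `h` of a grid point divided by the area of that disc; if your course notes define lambda differently, use that definition instead. The function names and the `events` array are hypothetical.

```python
import numpy as np

def distances(point, events):
    """Euclidean distance from one (x, y) point to every row of `events`."""
    return np.sqrt(((events - point) ** 2).sum(axis=1))

def lam(point, events, h):
    """Assumed estimator: events within distance h, per unit area of the disc."""
    return (distances(point, events) <= h).sum() / (np.pi * h ** 2)

# Hypothetical event locations; NOT the crime data
events = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

# Step 1: a coarse grid of points covering the events
xs = np.linspace(0, 6, 4)
ys = np.linspace(0, 6, 4)

# Step 3: lambda at every grid point (rows follow ys, columns follow xs)
grid = np.array([[lam(np.array([gx, gy]), events, h=3) for gx in xs]
                 for gy in ys])
print(grid)
```

An array shaped like `grid` can then be passed to an image-plotting function for step 4.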

In [None]:
# 1. Make a grid

In [None]:
# 2. Lambda function


In [None]:
# 3. Calculating lambda


In [None]:
# 4. Making the heatmap


## Challenge 3. Mapping polygons and buffers

Working with the Hawai'i islands data, let's do an analysis for coast guard planning. 

Suppose there is a single fast-response heliport in Oahu, and the farthest it can travel for a rescue is 300km. 

You want to determine whether it can reach all of the coastal areas (within 20km of land) in Hawai'i.

First, we'll load the Hawai'i data. The file contains several polygons for the state of Hawai'i, because the islands that make up the Hawaiian archipelago are not contiguous. It also contains the location of the heliport in Oahu. Because the coastlines are complex, the file uses many points to describe each island, which makes the maps accurate but makes the data difficult to examine by eye. 

**Begin by examining the data. How many separate polygons are there?**

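Hint: The polygons are stored in an object of the class `shapely.geometry.MultiPolygon`. For many objects you can use generic functions to learn about their properties: `dir()`, `len()` (for length), and `type()`. In Shapely 2.x, the individual polygons of a multipolygon are reached through its `.geoms` attribute, so apply `len()` to that attribute rather than to the multipolygon itself.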

In [None]:
with open('Lab4_hawaii.p', 'rb') as f:
    d = pickle.load(f)
hawaii=d['hawaii']
oahu=d['oahu']

# your code here to see how many polygons are in Hawaii


**Calculate the approximate number of degrees of longitude that would equal the flight range of a helicopter leaving Oahu** (Hint: this doesn't require Python, but if you use Python you may need the function `numpy.cos()`.). Also **calculate the number of degrees of longitude that would equal 20 km**, the off-coast distance that your helicopter team must monitor.
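
The conversion rests on the fact that a circle of constant latitude has radius R·cos(latitude), so a given east-west distance spans more degrees of longitude the farther you are from the equator. Below is a sketch of that reasoning with two sanity checks, assuming a spherical Earth of radius 6371 km; the function name `km_to_lon_degrees` is a hypothetical one chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption

def km_to_lon_degrees(latitude_deg, km):
    """Degrees of longitude spanned by `km` of east-west distance."""
    # Radius of the circle of constant latitude (the parallel)
    parallel_radius = EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))
    # Arc length / radius gives the angle in radians; convert to degrees
    return math.degrees(km / parallel_radius)

# Sanity checks: the length of one degree of longitude at the equator
one_degree_km = 2 * math.pi * EARTH_RADIUS_KM / 360
at_equator = km_to_lon_degrees(0.0, one_degree_km)   # about 1 degree
at_60_north = km_to_lon_degrees(60.0, one_degree_km)  # about 2 degrees
print(at_equator, at_60_north)
```

At 60 degrees latitude the parallel is half as long as the equator, so the same distance covers twice as many degrees.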

In [None]:
# Fill in the below code

def change_in_longitude(latitude, km):
    "Given a latitude and a distance in km, return the distance in degrees of longitude."
    # Find the radius of a circle around the earth at given latitude, given radius of earth 6371km
    
    return 

h_range = change_in_longitude(...)
print('The Oahu helicopter\'s range is {} degrees longitude'.format(h_range))
offcoast = change_in_longitude(...) 
print('The off-coast monitoring distance is {} degrees longitude'.format(offcoast))

Now we will **create a multipolygon of coastal areas in Hawaii**.

We start by creating a buffer around the Hawai'i multipolygon based on the offcoast distance, and then we difference it from Hawai'i to make a new multipolygon of just the coastal areas. This is easy in shapely using the `buffer` and `difference` methods.
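
The buffer-then-difference idea can be tried on a toy shape first. The square `island` and the `offshore` distance below are hypothetical and in arbitrary planar units; only `buffer`, `difference`, `contains`, and `area` from shapely are used.

```python
from shapely.geometry import Point, Polygon

# Hypothetical square "island" with corners at (0, 0) and (2, 2)
island = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
offshore = 0.5  # hypothetical buffer distance, same units as the coordinates

# Grow the island outward, then cut the island itself back out
coastal_ring = island.buffer(offshore).difference(island)

print(coastal_ring.area)                         # ring area, excluding the island
print(coastal_ring.contains(Point(1, 1)))        # False: inside the island
print(coastal_ring.contains(Point(-0.25, 1)))    # True: just off the west side
```

The buffer distance is expressed in the same units as the geometry's coordinates, which is why the challenge first converts km to degrees.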

In [None]:
# Fill in the below code
c_area = hawaii.buffer( ...  ).difference( ...  )

Finally, let's **plot the range of the helicopter** and any coastal areas that are out of range. 

We will subtract the helicopter range from the coastal areas to highlight which coastal areas are out of reach and therefore at risk.

In [None]:
# Fill in or edit the below code

# Make the helicopter location a shapely point
o_pt=Point(oahu['lat'],oahu['lon'])

# Define a polygon with the range of the helicopter
h_ellipse = o_pt.buffer( [object]  )

# Identify uncovered coastal areas
uncov = [object].difference([object])

# Make the figure
fig=plt.figure(figsize=(10,7))

# Plot Hawaii
for i in range(27):
    plt.plot(hawaii.geoms[i].exterior.xy[1],
             hawaii.geoms[i].exterior.xy[0],
             c='green')

# Plot the helipad with a red star, and label it "Helicopter pad"
plt.plot(
        
    )

# Plot the range of the helicopter with a purple dashed line and label it "Helicopter range"
plt.plot(
        
    )

# Plot the coastal monitoring areas with a blue dashed line
for i in range(len(c_area.geoms)):
    plt.plot(
        
    )

# Add a single "Coastal monitoring boundary" legend entry (an empty plot carries the label)
plt.plot([], [], c=, linestyle=, label='Coastal monitoring boundary')

# Plot the uncovered area and fill it in orange
plt.fill(
    , label='Uncovered area')
plt.legend()
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Range of Oahu fast-response heliport and uncovered monitoring area outside of range')
plt.show()

## Challenge 4. Overlay a raster and a shapefile

Let's work with the GPW population data. We will zoom in on France and overlay the population data on administrative boundaries of France.

The shapefile is `FRA_adm1.shp`. This file has the administrative boundaries of the regions of France.

1. Load the GPW data using rasterio.
2. Load the France admin boundary data using geopandas.
3. Define a color scheme for mapping population and plot it.
4. Add the boundaries of French regions over the population heat map.
5. Zoom into France in the figure.
6. Save the figure.

If you want to push it, mask out the area outside France using the `FRA_adm0.shp` file.

7. Do this again for a country of your choice. You can download administrative boundaries [here](https://gadm.org/download_country.html).
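
For the raster and the boundaries to line up in steps 4 and 5, the image has to be drawn in longitude/latitude coordinates rather than pixel indices. The sketch below shows the bookkeeping on a made-up grid: `Bounds` is a stand-in for rasterio's `dataset.bounds` (which exposes `left`, `bottom`, `right`, `top`), the helper names are hypothetical, and the raster is assumed to be north-up with square-edged pixels.

```python
from collections import namedtuple

# Stand-in for rasterio's dataset.bounds (same four attribute names)
Bounds = namedtuple("Bounds", ["left", "bottom", "right", "top"])

def imshow_extent(bounds):
    """Reorder raster bounds into matplotlib's (left, right, bottom, top)."""
    return (bounds.left, bounds.right, bounds.bottom, bounds.top)

def lonlat_to_rowcol(bounds, n_rows, n_cols, lon, lat):
    """Row and column of the pixel containing (lon, lat) in a north-up raster."""
    pixel_w = (bounds.right - bounds.left) / n_cols
    pixel_h = (bounds.top - bounds.bottom) / n_rows
    col = int((lon - bounds.left) // pixel_w)
    row = int((bounds.top - lat) // pixel_h)  # row 0 is the northern edge
    return row, col

# Hypothetical global grid with 1-degree pixels; NOT the GPW file's metadata
world = Bounds(left=-180.0, bottom=-90.0, right=180.0, top=90.0)
extent = imshow_extent(world)
row, col = lonlat_to_rowcol(world, n_rows=180, n_cols=360, lon=2.35, lat=48.86)
print(extent, row, col)
```

Passing a tuple like `extent` to `imshow(..., extent=...)` places the image in the same coordinate system as the boundary plot, after which `set_xlim` and `set_ylim` can zoom using longitude and latitude values.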

In [None]:
# Your code here, with some hints

# 1
dataset = rasterio.open('Data/GPW.tif')
band = dataset.read(1)

In [None]:
# 2
fra = gpd.read_file()

# explore the bounds to determine how to set the zoom parameters
fra.bounds

In [None]:
# 3 
nodes = [...]  # positions for each color from 0-1
color_scheme = [...]  # corresponds to nodes
custom_cmap = LinearSegmentedColormap.from_list(
    'custom_name', list(zip(nodes, color_scheme)))
custom_cmap.set_under(...)  # set values under vmin to ...
custom_cmap.set_over(...)  # set values over vmax to ...

In [None]:
# 4 and 5

# Plotting starts
fig, ax = plt.subplots(figsize=(10, 4))

# Heat map
im = ax.imshow(
    
)
fig.colorbar(im)

# France admin boundaries
fra.plot(ax=ax, color='none', edgecolor='k', alpha=0.3)

# Zoom in
ax.set_xlim()
ax.set_ylim()

# Label axes and title
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title('Population in France (Source: CIESIN 2018)')

# 6 save

In [None]:
# 7 Your code here

## Challenge 5: Mapping time series

Pick a location in Australia and examine the time series of January rainfall at that location using `AustraliaRainfall.nc`. 

In a new figure with two subplots, first plot the rainfall field in January of 2006 across space in one subplot. On top of this image, mark the location that you will examine with a white circle. 

In the second subplot, plot January rainfall in each year as a time series (so "years" is on the x-axis) at the specified location.
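
Extracting a time series at one location amounts to finding the grid cell nearest to your point and slicing along the time axis. Here is a sketch on a synthetic array; the variable and coordinate names in `AustraliaRainfall.nc` are not shown here, so `lats`, `lons`, `years`, and `rain` are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical coordinates and a synthetic field with axes (year, lat, lon)
lats = np.array([-10.0, -20.0, -30.0, -40.0])
lons = np.array([115.0, 125.0, 135.0, 145.0, 155.0])
years = np.array([2004, 2005, 2006])
rain = np.arange(years.size * lats.size * lons.size,
                 dtype=float).reshape(years.size, lats.size, lons.size)

# Location of interest (made up for illustration)
target_lat, target_lon = -33.9, 151.2

# Nearest grid cell: smallest absolute difference along each axis
i = int(np.abs(lats - target_lat).argmin())
j = int(np.abs(lons - target_lon).argmin())

# One value per year at that cell
series = rain[:, i, j]
print(lats[i], lons[j], series)
```

If you load the file with xarray, nearest-neighbour selection with `.sel(..., method="nearest")` plays the role of the two `argmin` lines.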

In [None]:
# code here

# import libraries

# load the data

# set the environment for multiple subplots
fig, (ax0, ax1) = plt.subplots( ... )

# Plot the data
# You might want to pass the ax argument to the plot method, e.g. data.plot(ax=ax0)


## Challenge 6: Map algebra

Use the same Australia rainfall data.

1. Calculate the historical mean rainfall for January, April and August. Plot on the same figure and use the same colorbar.
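
Sharing one colorbar across panels requires a common `vmin` and `vmax`, computed over all three mean fields rather than per panel. A sketch on synthetic data follows; the array layout (year, month, lat, lon) and all names are assumptions for illustration, not the structure of the rainfall file.

```python
import numpy as np

# Synthetic rainfall with axes (year, month, lat, lon); NOT the real data
rng = np.random.default_rng(0)
rain = rng.uniform(0, 300, size=(5, 12, 3, 4))

# Zero-based month positions along the month axis
months = {"January": 0, "April": 3, "August": 7}

# Historical mean for each month: average over the year axis
means = {name: rain[:, m].mean(axis=0) for name, m in months.items()}

# Common color limits across all three fields
vmin = min(field.min() for field in means.values())
vmax = max(field.max() for field in means.values())
print(vmin, vmax)
```

Passing the same `vmin` and `vmax` to each plotting call makes colors comparable across the three subplots.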

In [None]:
# Set an environment for a plot with three subplots
fig, (ax1, ax4, ax8) = plt.subplots(...)

# plot the 3 plots using .sel().mean().plot(ax=ax#)
# specify a common vmin and vmax to ensure the same colorbar
# choose any colormap you like.

plt.tight_layout()
plt.show()

2. Now use map algebra to plot the deviation of total 2006 rainfall from historical average annual rainfall, as a percentage of historical averages. Map that.
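
Map algebra here means cell-by-cell arithmetic on arrays that share the same grid. The sketch below computes a percentage deviation on a tiny made-up grid of annual totals; `annual` and `years` are hypothetical stand-ins for the rainfall data.

```python
import numpy as np

# Hypothetical annual totals with axes (year, lat, lon)
years = np.array([2004, 2005, 2006])
annual = np.array([
    [[100.0, 200.0], [300.0, 400.0]],
    [[120.0, 200.0], [300.0, 500.0]],
    [[80.0, 200.0], [330.0, 300.0]],
])

# Historical average per cell: mean over the year axis
mean_annual = annual.mean(axis=0)

# The 2006 field, picked out with a boolean mask on the year axis
total_2006 = annual[years == 2006][0]

# Cell-by-cell deviation, as a percentage of the historical average
pct_deviation = (total_2006 - mean_annual) / mean_annual * 100
print(pct_deviation)
```

Negative cells were drier than average in 2006 and positive cells were wetter, so a diverging colormap centered on zero suits the resulting map.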

In [None]:
# You need two bands to calculate the deviation 
# 1. The 2006 total rainfall data: A
rain_2006 = 
# 2. The average annual rainfall data: B
rain_mean = 
# Numerator: their difference, Denominator: the average; multiply by 100 for a percentage
deviation = (rain_2006 - rain_mean) / rain_mean * 100

# Plot it

