# Lake Effect

<img src = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Chicago_skyline_kz01.jpg/640px-Chicago_skyline_kz01.jpg' width = 600>

__How does the lake affect temperatures in the city?__ 

You have access to 3 data sets:
1. A list of all of the nodes: `nodes.csv`
2. A snapshot of temperature readings for July 18, 2019 at about 3pm: `July18_2019.csv`
3. A series of latitude and longitude reference points for the lakeshore: `Chicago_Lakefront.csv`


<img src = "../images/Doing_Science_with_AoT/lake_effect_map.jpg"  width = 800>

In [None]:
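import pandas as pd
pd.options.mode.chained_assignment = None  # silence SettingWithCopyWarning
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy.stats  # import the submodule explicitly so scipy.stats.linregress is available
sns.set_theme()  # seaborn styling; the 'seaborn' matplotlib style name is deprecated since Matplotlib 3.6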

# Solution

#### Explore the Data

In [None]:
nodes = pd.read_csv("../data/Nodes.csv")
nodes.head()

In [None]:
lakefront = pd.read_csv('../data/Chicago_Lakefront.csv')
lakefront.head()

In [None]:
temperatures = pd.read_csv("../data/July18_2019.csv")
temperatures.head()

## Calculate the **closest** distance from each node to the lakeshore:

<img src = "../images/Doing_Science_with_AoT/Slide8.jpeg"  width = 800>

In [None]:
lakefront['distance'] = np.nan       # Distance from a particular node to each lakefront point
nodes['lake_distance'] = np.nan      # Distance from the node to the nearest lakefront point

#### 1. Get the lat and long of the first node:

<img src = "../images/Doing_Science_with_AoT/Slide10.jpeg"  width = 800>

In [None]:
node_lon = abs(nodes['lon'].iloc[0])
print(node_lon)
node_lat = abs(nodes['lat'].iloc[0])
print(node_lat)

#### 2. Get the latitude and longitude distances from the node to the first lakeshore point:

<img src = "../images/Doing_Science_with_AoT/Slide14.jpeg"  width = 800>

In [None]:
lon_diff = node_lon - abs(lakefront['Longitude'].iloc[0])
print(lon_diff)
lat_diff = node_lat - abs(lakefront['Latitude'].iloc[0])
print(lat_diff)

#### 3. Calculate the straight-line distance from the node to the first lakeshore point:

<img src = "../images/Doing_Science_with_AoT/Slide16.jpeg"  width = 800>

In [None]:
distance_to_lake = np.sqrt(lon_diff ** 2 + lat_diff ** 2)
print(distance_to_lake)
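
The same straight-line calculation can be wrapped in a small helper. This is only a minimal sketch: the function name `degree_distance` and the sample coordinates are made up for illustration and are not part of the data sets. `np.hypot(a, b)` computes `sqrt(a**2 + b**2)`, so the result is still measured in degrees.

```python
import numpy as np

def degree_distance(lon1, lat1, lon2, lat2):
    """Straight-line distance between two points, measured in degrees."""
    return np.hypot(lon1 - lon2, lat1 - lat2)

# Made-up coordinates chosen so the differences form a 3-4-5 triangle
example = degree_distance(-87.03, 41.04, -87.00, 41.00)
print(example)  # approximately 0.05
```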

#### 4. Store this distance in the `lakefront` dataframe:

In [None]:
# Assign with .loc in a single step (row label 0 is the first row of the default index)
lakefront.loc[0, 'distance'] = distance_to_lake
lakefront.head()

#### 5. Repeat for the next lakeshore marker:

<img src = "../images/Doing_Science_with_AoT/Slide22.jpeg"  width = 800>

In [None]:
# Same node as before (the first one), next lakeshore point
node_lon = abs(nodes['lon'].iloc[0])
node_lat = abs(nodes['lat'].iloc[0])
lon_diff = node_lon - abs(lakefront['Longitude'].iloc[1])
lat_diff = node_lat - abs(lakefront['Latitude'].iloc[1])
distance_to_lake = np.sqrt(lon_diff ** 2 + lat_diff ** 2)
lakefront.loc[1, 'distance'] = distance_to_lake
lakefront.head()

#### 6. Repeat for all the lakeshore markers.

<img src = "../images/Doing_Science_with_AoT/Slide23.jpeg"  width = 800>

In [None]:
# Make sure node_lon and node_lat refer to the first node before looping:
node_lon = abs(nodes['lon'].iloc[0])
node_lat = abs(nodes['lat'].iloc[0])

In [None]:
for j in range(len(lakefront)):
    lon_diff = node_lon - abs(lakefront['Longitude'].iloc[j])
    lat_diff = node_lat - abs(lakefront['Latitude'].iloc[j])
    distance_to_lake = np.sqrt(lon_diff ** 2 + lat_diff ** 2)
    lakefront.loc[j, 'distance'] = distance_to_lake  # row labels match positions here

In [None]:
lakefront

#### 7. Find the distance from the node to the closest lakeshore marker:

<img src = "../images/Doing_Science_with_AoT/Slide25.jpeg"  width = 800>

In [None]:
distance = lakefront['distance'].min()
nodes.loc[0, 'lake_distance'] = distance

In [None]:
nodes.head()

#### 8. Convert degrees to miles (roughly 69 miles per degree of latitude):

In [None]:
nodes['lake_distance'] = nodes['lake_distance'] * 69

In [None]:
nodes.head()
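
Multiplying by 69 treats a degree of longitude as the same length as a degree of latitude. A degree of latitude is roughly 69 miles everywhere, but a degree of longitude shrinks by a factor of cos(latitude), so east-west distances near Chicago are overestimated. The sketch below shows one possible correction, assuming a spherical Earth; the function name `approx_miles` and the sample latitude are illustrative and not part of the notebook.

```python
import numpy as np

MILES_PER_DEGREE_LAT = 69  # same approximation used in this notebook

def approx_miles(lon_diff, lat_diff, latitude_deg):
    """Approximate distance in miles from degree differences near a given latitude."""
    # A degree of longitude is shorter than a degree of latitude by cos(latitude)
    lon_miles = lon_diff * MILES_PER_DEGREE_LAT * np.cos(np.radians(latitude_deg))
    lat_miles = lat_diff * MILES_PER_DEGREE_LAT
    return np.hypot(lon_miles, lat_miles)

# One hundredth of a degree due east, then due north, at latitude 41.9 (illustrative)
east_only = approx_miles(0.01, 0.0, 41.9)
north_only = approx_miles(0.0, 0.01, 41.9)
print(east_only, north_only)
```

With this scaling, the eastward step comes out about a quarter shorter than the northward step of the same size in degrees.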

#### 9. Repeat for all other nodes:

In [None]:
for i in range(len(nodes)):
    node_lon = abs(nodes['lon'].iloc[i])
    node_lat = abs(nodes['lat'].iloc[i])
    # Distance from node i to every lakeshore point
    for j in range(len(lakefront)):
        lon_diff = node_lon - abs(lakefront['Longitude'].iloc[j])
        lat_diff = node_lat - abs(lakefront['Latitude'].iloc[j])
        distance_to_lake = np.sqrt(lon_diff ** 2 + lat_diff ** 2)
        lakefront.loc[j, 'distance'] = distance_to_lake
    # Keep only the smallest distance for this node
    distance = lakefront['distance'].min()
    nodes.loc[i, 'lake_distance'] = distance
# Convert degrees to miles once, after every node has been filled in
nodes['lake_distance'] = nodes['lake_distance'] * 69

In [None]:
nodes

In [None]:
nodes['lake_distance'].describe()
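
The nested loop can also be written without explicit loops by using NumPy broadcasting. The sketch below uses two tiny made-up tables that only borrow the column names from this notebook (`lon`, `lat`, `Longitude`, `Latitude`); the coordinate values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-ins for the real nodes and lakefront tables
nodes_demo = pd.DataFrame({'lon': [-87.70, -87.62], 'lat': [41.90, 41.88]})
lakefront_demo = pd.DataFrame({'Longitude': [-87.60, -87.61, -87.62],
                               'Latitude': [41.85, 41.88, 41.91]})

# Shape (n_nodes, 1) minus shape (n_points,) broadcasts to (n_nodes, n_points)
lon_diff = nodes_demo['lon'].to_numpy()[:, None] - lakefront_demo['Longitude'].to_numpy()
lat_diff = nodes_demo['lat'].to_numpy()[:, None] - lakefront_demo['Latitude'].to_numpy()

# Distance from every node to every lakefront point, then the minimum per node
all_distances = np.sqrt(lon_diff ** 2 + lat_diff ** 2)
nodes_demo['lake_distance'] = all_distances.min(axis=1) * 69  # degrees to miles
print(nodes_demo)
```

Each row of `all_distances` plays the role of the `lakefront['distance']` column for one node, so taking the minimum along `axis=1` matches what the loop does one node at a time.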

## Is there a correlation between distance from the lake and temperature?

#### 1. Convert the temperatures to Fahrenheit:

In [None]:
temperatures.head()

In [None]:
temperatures['Temperature_F'] = temperatures['value_hrf'] * (9/5) + 32

In [None]:
temperatures.head()
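
A quick way to sanity-check the conversion is to try it on values whose Fahrenheit equivalents are well known. This sketch assumes, as the step above does, that the readings are in Celsius; the helper name `celsius_to_fahrenheit` is illustrative.

```python
import pandas as pd

def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit; works on scalars and pandas Series."""
    return celsius * 9 / 5 + 32

# Known reference points: freezing and boiling of water
check = celsius_to_fahrenheit(pd.Series([0.0, 100.0]))
print(check.tolist())  # [32.0, 212.0]
```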

In [None]:
temperatures.describe()

#### 2. Visualize

In [None]:
nodes.head()

In [None]:
temperatures.head()

In [None]:
distance_temps = pd.merge(nodes, temperatures, on = 'node_id')

In [None]:
distance_temps

In [None]:
distance_temps.dropna(subset=['Temperature_F'], inplace = True)

In [None]:
distance_temps
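
It helps to know which rows disappear in the merge and in the `dropna` step. The sketch below uses made-up rows (only the column names mirror this notebook) to show that `pd.merge` defaults to an inner join, which keeps only `node_id` values present in both tables, and that `dropna` then removes rows with no temperature reading.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names mirror the notebook
nodes_demo = pd.DataFrame({'node_id': ['A', 'B', 'C'],
                           'lake_distance': [0.5, 2.0, 4.5]})
temps_demo = pd.DataFrame({'node_id': ['A', 'B', 'D'],
                           'Temperature_F': [78.0, np.nan, 85.0]})

# Default inner join keeps only node_ids present in both tables (A and B)
merged = pd.merge(nodes_demo, temps_demo, on='node_id')

# Drop rows with no temperature reading (B)
cleaned = merged.dropna(subset=['Temperature_F'])
print(len(nodes_demo), len(merged), len(cleaned))  # 3 2 1
```

Comparing the row counts before and after each step on the real data shows how many nodes actually contribute to the plots that follow.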

#### 3. Scatter Plot

In [None]:
plt.scatter(distance_temps['lake_distance'], distance_temps['Temperature_F'])
plt.xlabel('Distance from Lake (miles)')
plt.ylabel('Temperature (F)')

In [None]:
sns.regplot(x = distance_temps['lake_distance'], y = distance_temps['Temperature_F'])

#### 4. Apply Statistics

In [None]:
scipy.stats.linregress(distance_temps['lake_distance'], distance_temps['Temperature_F'])
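
`linregress` returns a result object whose fields can be read by name, which makes the output easier to interpret. The sketch below uses a handful of made-up distance and temperature values purely to show the fields; it says nothing about the real data.

```python
import numpy as np
from scipy import stats

# Made-up values for illustration only
distance_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
temperature_demo = np.array([75.0, 77.5, 78.5, 81.0, 83.0])

result = stats.linregress(distance_demo, temperature_demo)
print(f"slope:     {result.slope:.2f} degrees F per mile")
print(f"intercept: {result.intercept:.2f} degrees F at the lakeshore")
print(f"r-squared: {result.rvalue ** 2:.2f}")
print(f"p-value:   {result.pvalue:.3g}")
```

The slope estimates how many degrees the temperature changes per mile from the lake, `rvalue ** 2` is the share of the variation in temperature that the line accounts for, and a small p-value suggests the slope is unlikely to be zero. None of these shows that the lake causes the difference.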