# Lecture 10 (Lecture 9 was Sneha Jha's Mercer+Hall Presentation)

## Topics we still need to cover
 - Using Pandas to Solve HW Problem from L8
 
For more detail on Python, see the Purdue Data Mine examples book: <a href="https://thedatamine.github.io/the-examples-book/python.html" target="_blank">Data Mine on Python</a>

Also see: <a href="https://docs.python.org/3/" target="_blank">Python 3.9.1 Documentation</a>

In [None]:
# Bring in the packages we have used before.

import math
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import requests
import pandas as pd

## A HW problem: Use the average wheat yields data from L5 and the FIPS dictionary from L6 with the mapping methods above to create maps of county average wheat yields for any specified year ...

### Could you make a movie (a series of maps) over time?

### Could you find a real map projection rather than our poor Smith Center scaling approach?

In [None]:
# The pandas read_csv function replaces a fair bit of the code we
# previously needed to load these files ...

WheatYields = pd.read_csv('Data/WheatYields--Wrangled.csv')
CitiesAndTowns = pd.read_csv('Data/Wrangled-cities-and-towns-of-the-united-states.csv')

### What are the contents of these DataFrames?

In [None]:
WheatYields

In [None]:
CitiesAndTowns

## What I'd like to do ...

### Merge these two tables to create a single table with the following columns ...

Year, State Ansi/Fips, County Ansi/Fips, Latitude, Longitude, Value (wheat yield in bu/acre)

This really amounts to deleting certain columns from the WheatYields DataFrame and adding columns for latitude and longitude.

### But first, let's use these as examples to explore operations in pandas ...

`CitiesAndTowns` and `WheatYields` are examples of DataFrames. Think of a DataFrame as either a generalization of a numpy array or a specialization of a Python dictionary.

The values in a DataFrame are typed in the sense that all elements in a single column share one type. In addition, a DataFrame carries an index (the row labels) and columns (the column labels) ...
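To make the typing concrete, here is a minimal sketch on a made-up three-row table (the county names and numbers are hypothetical stand-ins, not values from the course files). It shows that each column reports its own type and that the index and columns are separate attributes.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
toy = pd.DataFrame({
    'County': ['A', 'B', 'C'],
    'Year': [1999, 1999, 2000],
    'Value': [58.0, 61.5, 55.2],
})

print(toy.dtypes)    # one type per column
print(toy.index)     # row labels (a RangeIndex by default)
print(toy.columns)   # column labels
```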

### DataFrame as generalized numpy array ...

In [None]:
# The index attribute ...

CitiesAndTowns.index

In [None]:
print(WheatYields.index)
print(type(WheatYields.index))

In [None]:
# The columns attribute

CitiesAndTowns.columns

In [None]:
WheatYields.columns

In [None]:
type(CitiesAndTowns.columns)

### DataFrame as specialized dictionary

A dictionary maps a key to a value; a DataFrame maps a column name to a Series of column data ...
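A small sketch of the dictionary view, again on made-up numbers: the column names behave like keys, `in` tests for a column name, and dictionary-style assignment adds a column. The column name `Doubled` is just an illustration.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
toy = pd.DataFrame({'Year': [1999, 2000], 'Value': [58.0, 55.2]})

print(list(toy.keys()))    # the column names act as the keys
print('Value' in toy)      # membership tests the column names, not the data

# Iterating over items() yields (column name, Series) pairs.
for name, column in toy.items():
    print(name, type(column).__name__)

# Dictionary-style assignment creates a new column.
toy['Doubled'] = 2 * toy['Value']
```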

In [None]:
# In this sense each column is a pandas Series ...

print(type(WheatYields['County']))

WheatYields['County']

In [None]:
print(type(WheatYields['Value']))

WheatYields['Value']

### There are a number of ways to construct DataFrames. We already have two DataFrames, so we may be more interested in this later. The ways are:

- From a Series
- From a list of dictionaries
- From a dictionary of Series objects
- From a two-dimensional numpy array
- From a numpy structured array

Come back to these later ...
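For reference, each of the five routes listed above can be sketched in a line or two. All names and values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a Series (the Series name becomes the column name)
s = pd.Series([58.0, 61.5, 55.2], name='Value')
df_from_series = pd.DataFrame(s)

# From a list of dictionaries (one dictionary per row)
df_from_dicts = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])

# From a dictionary of Series objects (one Series per column)
df_from_series_dict = pd.DataFrame({'Value': s, 'Half': s / 2})

# From a two-dimensional numpy array
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

# From a numpy structured array (field names become column names)
structured = np.zeros(3, dtype=[('id', 'i8'), ('score', 'f8')])
df_from_structured = pd.DataFrame(structured)
```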

## Data Indexing and Selection with DataFrames ...

Recall some of the ways to access, set, and modify values in a numpy array. These include

- indexing ... e.g., array[3,7]
- slicing ... e.g., array[:,3:11]
- masking ... e.g., array[array > 0]
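As a quick refresher, the three access patterns look like this on a small made-up array:

```python
import numpy as np

array = np.arange(-6, 6).reshape(3, 4)
# [[-6 -5 -4 -3]
#  [-2 -1  0  1]
#  [ 2  3  4  5]]

print(array[1, 3])        # indexing: row 1, column 3
print(array[:, 1:3])      # slicing: all rows, columns 1 and 2
print(array[array > 0])   # masking: only the positive entries
```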

In [None]:
# Explore this for Series. Construct one from our DataFrames ...

LatSeries = CitiesAndTowns['LATITUDE']
print(LatSeries)

In [None]:
LatSeries.keys()

In [None]:
LatSeries[4]

In [None]:
print(list(LatSeries.items()))

In [None]:
# Slicing

LatSeries[1:7]

In [None]:
# Masking ...

LatSeries[(LatSeries > 40.0) & (LatSeries < 42.0)]

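There is some danger of confusion between the explicit index (the labels) and the implicit index (the integer positions). This comes up when the index is a range of integers, as it is in these two examples. To avoid such problems, it is recommended to use the following indexers ...

- loc --> selects by explicit index label; slices include the end label
- iloc --> selects by implicit integer position; slices exclude the end position
- ix --> an older hybrid of the two; it was deprecated and then removed in pandas 1.0, so use loc or iloc instead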
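The difference is easiest to see when the labels are integers that do not match the positions. A minimal sketch, using a made-up Series with an odd-integer index:

```python
import pandas as pd

# Labels 1, 3, 5, 7 sit at positions 0, 1, 2, 3.
s = pd.Series(['a', 'b', 'c', 'd'], index=[1, 3, 5, 7])

print(s.loc[1])      # label 1      -> 'a'
print(s.iloc[1])     # position 1   -> 'b'

print(s.loc[1:5])    # labels 1 through 5, end label included
print(s.iloc[1:3])   # positions 1 and 2, end position excluded
```

With the default range index used by `LatSeries`, labels and positions coincide, so `loc` and `iloc` differ only in how slices treat the end point.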

In [None]:
LatSeries

In [None]:
print(LatSeries.loc[1])
print(LatSeries.iloc[1])

In [None]:
print(LatSeries.loc[1:3])
print(LatSeries.iloc[1:3])

### Suppose you wanted to make a combined FIPS ....

Define the integer combined FIPS as 1000 times the state FIPS plus the county FIPS, where the county FIPS is a number between 1 and 999 ...
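A short worked example of the arithmetic, using illustrative state and county codes (the column names `state` and `county` are hypothetical). Because the county code is always below 1000, integer division and remainder recover the two parts.

```python
import pandas as pd

# Illustrative codes only; column names are hypothetical.
codes = pd.DataFrame({'state': [18, 20, 1], 'county': [157, 183, 1]})

# State code goes in the thousands place, county code fills the last three digits.
codes['combined'] = 1000 * codes['state'] + codes['county']
print(codes['combined'].tolist())   # [18157, 20183, 1001]

# The combination is reversible.
codes['state_back'] = codes['combined'] // 1000
codes['county_back'] = codes['combined'] % 1000
```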

In [None]:
CitiesAndTowns['CombinedFIPS'] = 1000*CitiesAndTowns['STATE FIPS'] + CitiesAndTowns['COUNTY FIPS']
WheatYields['CombinedFIPS'] = 1000*WheatYields['State ANSI'] + WheatYields['County ANSI']

In [None]:
CitiesAndTowns

In [None]:
WheatYields

### A few more examples ...

In [None]:
CitiesAndTowns.values

In [None]:
CitiesAndTowns.T

In [None]:
CitiesAndTowns

In [None]:
CitiesAndTowns.loc[CitiesAndTowns.LATITUDE > 42.0, ['LATITUDE', 'LONGITUDE']]

In [None]:
WheatYields.loc[WheatYields.Year == 2007, :]

### There is a lot of superfluous information ... Pare it down to simplify

In [None]:
NewWheatYields = WheatYields.loc[:, ['Year', 'Value', 'CombinedFIPS']]
NewWheatYields

In [None]:
CitiesAndTowns

In [None]:
CitiesAndTowns['FEATURE2'].unique()

In [None]:
CitiesAndTowns.loc[(CitiesAndTowns.FEATURE2 == 'County Seat') | (CitiesAndTowns.FEATURE2 == 'State Capital County Seat'), :]

In [None]:
NewCitiesAndTowns = CitiesAndTowns.loc[(CitiesAndTowns.FEATURE2 == 'County Seat') | (CitiesAndTowns.FEATURE2 == 'State Capital County Seat'), ['LATITUDE', 'LONGITUDE', 'CombinedFIPS']]
NewCitiesAndTowns

In [None]:
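# Join the two tables on their only shared column, CombinedFIPS.
# The default is an inner join: only counties present in both tables survive.
Blah = pd.merge(NewCitiesAndTowns, NewWheatYields, on='CombinedFIPS')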

In [None]:
Blah
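To see what the merge does, here is a minimal sketch on two tiny made-up tables (all values are hypothetical). One location row can match several yield rows, one per year, and keys that appear in only one table are dropped by the default inner join.

```python
import pandas as pd

# Hypothetical stand-ins for the two pared-down tables.
locations = pd.DataFrame({
    'CombinedFIPS': [18157, 20183, 1001],
    'LATITUDE': [40.4, 39.8, 32.5],
    'LONGITUDE': [-86.9, -98.8, -86.6],
})
yields = pd.DataFrame({
    'CombinedFIPS': [18157, 18157, 20183, 99999],
    'Year': [1999, 2000, 1999, 1999],
    'Value': [58.0, 61.5, 40.0, 30.0],
})

merged = pd.merge(locations, yields, on='CombinedFIPS')
print(merged)
# 18157 appears twice (two years), 20183 once;
# 1001 (no yields) and 99999 (no location) are dropped.
```

Passing `how='left'` or `how='outer'` instead would keep the unmatched rows and fill the missing values with NaN.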

In [None]:
Blah.loc[Blah.Year == 1999,:]

In [None]:
dv = Blah.loc[Blah.Year == 1999,'Value']
dlat = Blah.loc[Blah.Year == 1999,'LATITUDE']
dlon = Blah.loc[Blah.Year == 1999,'LONGITUDE']
type(dv)

In [None]:
plt.style.use('classic')  # set the style before creating the figure so it applies
fig = plt.figure()
plt.scatter(dlon, dlat, c=dv, cmap='cool')
plt.colorbar()
plt.title("U.S. County Average Wheat (bu/acre)")
plt.xlabel("Longitude Degrees")
plt.ylabel("Latitude Degrees")
plt.grid()
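Returning to the earlier question about a series of maps over time, one possible approach is to loop over the years and save one frame per year. This is only a sketch under stated assumptions: the table `merged` below is a made-up stand-in for the merged DataFrame, the file names are hypothetical, and the frames are written to a temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the merged table of locations and yields.
merged = pd.DataFrame({
    'Year': [1999, 1999, 2000, 2000],
    'LATITUDE': [40.4, 39.8, 40.4, 39.8],
    'LONGITUDE': [-86.9, -98.8, -86.9, -98.8],
    'Value': [58.0, 40.0, 61.5, 42.0],
})

outdir = tempfile.mkdtemp()
saved = []
for year, group in merged.groupby('Year'):
    fig, ax = plt.subplots()
    # Fix the color limits so that colors are comparable across frames.
    points = ax.scatter(group['LONGITUDE'], group['LATITUDE'], c=group['Value'],
                        cmap='cool',
                        vmin=merged['Value'].min(), vmax=merged['Value'].max())
    fig.colorbar(points, ax=ax)
    ax.set_title(f"County Average Wheat Yield (bu/acre), {year}")
    ax.set_xlabel("Longitude Degrees")
    ax.set_ylabel("Latitude Degrees")
    filename = os.path.join(outdir, f"wheat_{year}.png")
    fig.savefig(filename)
    plt.close(fig)  # free the figure before drawing the next frame
    saved.append(filename)

print(saved)
```

Holding `vmin` and `vmax` fixed across frames is a design choice: without it each year would rescale its own colors and the frames could not be compared.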