# Comparing rainfall across cities
##### Author: John Mays

In [1]:
import pandas as pd
from city_comparison_helpers import *

The question here is which cities to consider. The more cities I include, the more data I have to download. If I downloaded the **GSOY** CSV for every one of the roughly 83,000 listed weather stations, I'd be pulling down ~16 GB of CSVs (a rough and likely low estimate). I could bring in some big data tooling to handle that, but I'd rather not. So, should I...
- get a list of cities commonly considered rainy and find one weather station per city?
- get one weather station for every "big" city in the U.S.?
- get all of the weather stations in a few states?

Although it's not perfect, I will likely go with the first option.
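To put the download estimate above in perspective, here is a quick back-of-the-envelope sketch. It assumes every station file is about the same size (an assumption of this sketch, not something the GSOY listing guarantees), and the helper name `estimated_download_bytes` is made up for illustration.

```python
# Rough figures quoted in the text above.
STATION_COUNT = 83_000      # approximate number of listed stations
TOTAL_BYTES = 16 * 10**9    # ~16 GB for all station CSVs (likely a low estimate)


def estimated_download_bytes(n_stations, total_bytes=TOTAL_BYTES,
                             station_count=STATION_COUNT):
    """Scale the rough total linearly, assuming equally sized station files."""
    return n_stations * total_bytes / station_count


per_station_kb = estimated_download_bytes(1) / 1000
print(f"~{per_station_kb:.0f} KB per station")
print(f"~{estimated_download_bytes(25) / 10**6:.1f} MB for 25 stations")
```

Under that assumption, a hand-picked list of a couple dozen stations is only a few megabytes, which is why one station per city is so much more manageable than the full set.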

## List of U.S. cities and towns commonly considered rainy:
- Seattle, WA
- Miami, FL
- New Orleans, LA
- Birmingham, AL
- Tampa, FL
- NY, NY
- White Plains, NY
- Brookhaven, NY
- Syracuse, NY
- Buffalo, NY
- Erie, PA
- San Francisco, CA
- Rochester, NY
- Cleveland, OH
- Akron, OH
- Pittsburgh, PA
- Portland, OR
- Salem, OR
- Houston, TX
- Ft. Lauderdale, FL
- Tallahassee, FL
- Aberdeen, WA
- Detroit, MI
- Forks, WA

**ANYWHERE BUT AKRON, OH!!:**

To be honest, I am mostly interested in living north of the Mason-Dixon line, and not particularly interested in living in the "middle of nowhere," so I'll take the cities I'm not interested in off the list.

I'll also add a few cities I'm just curious about to come up with this list:

In [2]:
cities_stations = {
    "Seattle": "USW00094290",
    "Miami": "USW00012839", # USW00092811
    "New York City": "USW00094728",
    "White Plains": "USW00094745", # technically Westchester County
    "Brookhaven": "USW00054790",
    "Syracuse": "USW00014771",
    "Buffalo": "USW00014733",
    "Erie": "USW00014860",
    "Pittsburgh": "USW00014762",
    "San Francisco": "USW00023272",
    "West Lafayette": "USC00129430",
    # "Rochester": "USW0001476",
    "Cleveland": "USW00014820",
    "Portland": "USC00356750",
    "Detroit": "CA006139520", # USW00014822, # USC00202016
    "Forks": "USC00452914",
    "Boston": "USW00014739",
    "Madison": "USC00470273",
    "Woods Hole": "US1MABA0003", # US1MABA0013 technically Falmouth, MA
    "Monterey": "USW00023259",
    "Richland": "USW00024163" # USC00457015
}
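Since these IDs are typed in by hand, a quick sanity check can catch slips. GHCN-Daily station IDs are 11 characters long: a two-letter country code, a one-character network code, and an eight-character station identifier. The sketch below checks that shape and flags IDs outside the U.S.; the function name `audit_station_ids` is hypothetical and not part of `city_comparison_helpers`.

```python
import re

# 2-letter country code + 1-character network code + 8-character station ID.
STATION_ID_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}$")


def audit_station_ids(stations, expected_country="US"):
    """Return (malformed, foreign) lists of city names.

    malformed: IDs that do not look like 11-character GHCN-Daily IDs.
    foreign:   well-formed IDs whose country code differs from expected_country.
    """
    malformed, foreign = [], []
    for city, station_id in stations.items():
        if not STATION_ID_PATTERN.match(station_id):
            malformed.append(city)
        elif not station_id.startswith(expected_country):
            foreign.append(city)
    return malformed, foreign


# Three IDs copied from the cell above, including the commented-out one.
sample = {
    "Seattle": "USW00094290",
    "Detroit": "CA006139520",
    "Rochester": "USW0001476",
}
malformed, foreign = audit_station_ids(sample)
print("malformed:", malformed)  # ['Rochester']
print("foreign:", foreign)      # ['Detroit']
```

The commented-out Rochester ID is one character short, and the Detroit entry carries a `CA` country code, so its station sits on the Canadian side of the border.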

In [3]:
partial_schema = pd.DataFrame(
    {
        "Column" : [
            "DATE",
            "PRCP",
            "SNOW",
            "DYFG",
            "DYHF",
            "DYTS",
            "EMXP",
            "DP01",
            "DP10",
            "DP1X",
            "DSNW"
        ],
        "Description" : [
            "year AD",
            "total annual precipitation in millimeters",
            "total annual snowfall in millimeters",
            "number of days with fog",
            "number of days with 'heavy' fog",
            "number of days with thunderstorms",
            "highest daily total of precipitation in millimeters",
            "number of days with at least 0.01 inches of unspecified precipitation, probably rain w/o snow",
            "number of days with at least 0.10 inches of unspecified precipitation, probably rain w/o snow",
            "number of days with at least 1.00 inches of unspecified precipitation, probably rain w/o snow",
            "number of days with at least 1.00 inches of snowfall"
        ]
    }
)
columns_selection = list(partial_schema["Column"])

In [4]:
cities = total_concat(cities_stations, columns_selection)
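`total_concat` lives in `city_comparison_helpers`, so its body isn't shown here. As a rough idea of what a helper like this could do, the sketch below reads one CSV per city, keeps the selected columns, tags each row with its city, and stacks the results. The name `concat_city_tables`, the in-memory CSV sources, and the placeholder rows are all assumptions for illustration, not the actual helper or real station data.

```python
import io

import pandas as pd


def concat_city_tables(csv_sources, columns):
    """Hypothetical stand-in for total_concat.

    csv_sources maps a city name to anything pd.read_csv accepts
    (a path, URL, or file-like object).
    """
    frames = []
    for city, source in csv_sources.items():
        df = pd.read_csv(source)
        # reindex keeps the requested column order, drops everything else,
        # and fills columns a station does not report with NaN.
        df = df.reindex(columns=columns)
        df["CITY"] = city
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Placeholder rows, not real station data.
demo_sources = {
    "City A": io.StringIO("DATE,PRCP,SNOW,EXTRA\n2012,100.0,5.0,x\n2013,110.0,0.0,y\n"),
    "City B": io.StringIO("DATE,PRCP\n2012,50.0\n"),
}
demo = concat_city_tables(demo_sources, ["DATE", "PRCP", "SNOW"])
print(demo)
```

This version resets the index with `ignore_index=True`; the real helper evidently keeps each station's original row labels, since the output below starts at 21.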

In [6]:
cities.head()

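|    | DATE | PRCP   | SNOW  | DYFG | DYHF | DYTS | EMXP | DP01  | DP10  | DP1X | DSNW | CITY    |
|---:|-----:|-------:|------:|-----:|-----:|-----:|-----:|------:|------:|-----:|-----:|:--------|
| 21 | 2012 | 1191.1 | 196.0 | 42.0 | NaN  | 4.0  | 66.0 | 184.0 | 122.0 | 5.0  | 3.0  | Seattle |
| 22 | 2013 | 756.1  | 23.0  | 65.0 | NaN  | 5.0  | 38.9 | 159.0 | 80.0  | 3.0  | 0.0  | Seattle |
| 23 | 2014 | 1188.1 | 51.0  | 72.0 | 1.0  | 4.0  | 36.3 | 161.0 | 108.0 | 9.0  | 1.0  | Seattle |
| 24 | 2015 | 1000.1 | 0.0   | 78.0 | 10.0 | 9.0  | 55.4 | 150.0 | 85.0  | 4.0  | 0.0  | Seattle |
| 25 | 2016 | 1145.1 | 35.0  | 61.0 | 3.0  | 5.0  | 44.7 | 187.0 | 115.0 | 7.0  | 0.0  | Seattle |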
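With a `CITY` column on every row, comparing cities comes down to a `groupby`. The sketch below is one possible next step rather than anything the notebook has run yet. It rebuilds only the five Seattle rows shown above (a subset of the columns) and averages them per city.

```python
import pandas as pd

# The five Seattle rows from the output above (subset of columns).
sample = pd.DataFrame(
    {
        "DATE": [2012, 2013, 2014, 2015, 2016],
        "PRCP": [1191.1, 756.1, 1188.1, 1000.1, 1145.1],
        "DP01": [184.0, 159.0, 161.0, 150.0, 187.0],
        "CITY": ["Seattle"] * 5,
    }
)

# Mean annual precipitation and mean number of wet days per city,
# with the wettest city first.
summary = (
    sample.groupby("CITY")[["PRCP", "DP01"]]
    .mean()
    .sort_values("PRCP", ascending=False)
)
print(summary.round(1))
```

Run against the full `cities` frame instead of `sample`, the same expression would produce one row per city.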


# Extra Methods:

In [5]:
# check_years(cities_stations, rng=(2011, 2021))
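`check_years` is another helper from `city_comparison_helpers`, and its behavior isn't shown here. One plausible reading is that it confirms each station has a record for every year in the range. The sketch below does that kind of check on an already-combined frame rather than on the station dictionary. The name `missing_years`, the inclusive treatment of both ends of `rng`, and the placeholder data are all assumptions.

```python
import pandas as pd


def missing_years(df, rng=(2011, 2021)):
    """Return {city: [years with no row]} for years in rng (both ends inclusive)."""
    expected = set(range(rng[0], rng[1] + 1))
    gaps = {}
    for city, group in df.groupby("CITY"):
        absent = expected - set(group["DATE"].astype(int))
        if absent:
            gaps[city] = sorted(absent)
    return gaps


# Placeholder frame: "City A" has no 2014 row.
demo = pd.DataFrame(
    {
        "DATE": [2012, 2013, 2015, 2012, 2013, 2014, 2015],
        "CITY": ["City A"] * 3 + ["City B"] * 4,
    }
)
print(missing_years(demo, rng=(2012, 2015)))  # {'City A': [2014]}
```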