# Data for Creating a Plant Hardiness Map

This notebook contains code for gathering and processing the data needed to create an updated map of plant hardiness zones in the US. Just a warning: this notebook takes quite a while to run, because it downloads the whole archive and parses every US station file in it.

[Click here to see the data plotted on an interactive map.](https://fletchgraham.github.io/hardiness/)

[Or here to see an Observable notebook with a Voronoi zone map.](https://observablehq.com/@fletchgraham/us-plant-hardiness-zones-voronoi)



In [1]:
from io import BytesIO
import tarfile
from urllib.request import urlopen
import pandas as pd
import numpy as np

In [2]:
url = 'https://www.ncei.noaa.gov/data/gsoy/archive/gsoy-latest.tar.gz'
b = BytesIO(urlopen(url).read())

This URL gives us an archive filled to the brim with individual CSV files. Each CSV represents a weather station somewhere in the world, and each record in that CSV summarizes a particular year. Our task is to filter this data down and get it into one CSV file.

The code below iterates over the CSV files in the archive and joins them into one pandas dataframe, skipping any that aren't in the US or that have no `EMNT` (extreme minimum temperature) column. It also drops any records older than 2014 and any rows with missing values.

In [3]:

COLUMNS = ['STATION', 'NAME', 'DATE', 'LATITUDE', 'LONGITUDE', 'EMNT']

def extractor():
    """Yield a file object for each US station CSV in the archive."""
    with tarfile.open(mode='r', fileobj=b) as archive:
        for m in archive.getmembers():
            # Keep only members whose names start with 'US'.
            if not m.name.startswith('US'):
                continue

            yield archive.extractfile(m)


frames = []
for g in extractor():
    df = pd.read_csv(g)
    # Skip stations that have no EMNT column at all.
    if 'EMNT' not in df.columns:
        continue
    # Keep records from 2014 onward and drop rows with missing values.
    frames.append(df.loc[df['DATE'] >= 2014, COLUMNS].dropna())

# Concatenate once at the end; growing a dataframe inside the loop is slow.
df_main = pd.concat(frames)
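Since the full download is slow, the same filter-and-join pattern can be tried on a tiny archive built in memory. This is only a sketch: the `make_archive` helper, the station IDs, and the values are all made up for illustration, and the column list is trimmed to three columns to keep the sample short.

```python
from io import BytesIO
import tarfile

import pandas as pd

COLUMNS = ['STATION', 'DATE', 'EMNT']

def make_archive(files):
    """Hypothetical helper: build a gzipped tar in memory from {name: csv_text}."""
    buf = BytesIO()
    with tarfile.open(mode='w:gz', fileobj=buf) as archive:
        for name, text in files.items():
            data = text.encode('utf-8')
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, BytesIO(data))
    buf.seek(0)
    return buf

# Made-up files: a US station, a US station without EMNT, and a non-US station.
sample = make_archive({
    'USC00000001.csv': (
        'STATION,DATE,EMNT\n'
        'USC00000001,2013,-9.0\n'
        'USC00000001,2014,-10.5\n'
        'USC00000001,2015,\n'
    ),
    'USC00000002.csv': 'STATION,DATE,PRCP\nUSC00000002,2014,800.0\n',
    'CA000000001.csv': 'STATION,DATE,EMNT\nCA000000001,2014,-30.0\n',
})

frames = []
with tarfile.open(mode='r', fileobj=sample) as archive:
    for member in archive.getmembers():
        if not member.name.startswith('US'):
            continue  # not a US station
        df = pd.read_csv(archive.extractfile(member))
        if 'EMNT' not in df.columns:
            continue  # station has no EMNT column
        # Keep 2014 onward and drop rows with a missing value.
        frames.append(df.loc[df['DATE'] >= 2014, COLUMNS].dropna())

result = pd.concat(frames)
print(result)
```

Only the 2014 record of the first station should survive: the 2013 row is too old, the 2015 row has no `EMNT` value, the second station lacks the column, and the third is not in the US.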
    

Let's just inspect what we got out of that.

In [4]:
df_main.shape

(36254, 6)

In [5]:
df_main.head()

| | STATION | NAME | DATE | LATITUDE | LONGITUDE | EMNT |
|---|---|---|---|---|---|---|
| 41 | USC00010160 | ALEXANDER CITY, AL US | 2014 | 32.9452 | -85.948 | -15.6 |
| 42 | USC00010160 | ALEXANDER CITY, AL US | 2015 | 32.9452 | -85.948 | -12.8 |
| 43 | USC00010160 | ALEXANDER CITY, AL US | 2016 | 32.9452 | -85.948 | -6.7 |
| 44 | USC00010160 | ALEXANDER CITY, AL US | 2017 | 32.9452 | -85.948 | -10.0 |
| 45 | USC00010160 | ALEXANDER CITY, AL US | 2018 | 32.9452 | -85.948 | -12.2 |


Looks good! Now we write the CSV file.

In [6]:
df_main.to_csv('gsoy.csv')
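As a rough idea of how these records could feed a zone map, the sketch below averages `EMNT` per station and converts the result to a zone label. This notebook does not say how the zones are actually computed, so everything here is an assumption: that `EMNT` is in degrees Celsius, that zones follow the usual USDA bands (zone 1a starts at -60 °F and each half-zone spans 5 °F), and that a plain mean over the available years is good enough. The `hardiness_zone` function is a hypothetical helper, and the sample rows are the five shown in the `head()` output above.

```python
import math

import pandas as pd

def hardiness_zone(mean_emnt_c):
    """Hypothetical helper: map a mean annual extreme minimum (°C) to a zone label."""
    temp_f = mean_emnt_c * 9 / 5 + 32
    # Assumed USDA-style bands: zone 1a starts at -60 °F, 5 °F per half-zone.
    steps = math.floor((temp_f + 60) / 5)
    steps = max(0, min(steps, 25))  # clamp to the range 1a..13b
    return f"{steps // 2 + 1}{'ab'[steps % 2]}"

# The five records shown in the head() output.
rows = pd.DataFrame({
    'STATION': ['USC00010160'] * 5,
    'DATE': [2014, 2015, 2016, 2017, 2018],
    'EMNT': [-15.6, -12.8, -6.7, -10.0, -12.2],
})

# One zone label per station, from the mean of its yearly minimums.
zones = rows.groupby('STATION')['EMNT'].mean().map(hardiness_zone)
print(zones)
```

For these five years the mean is -11.46 °C (about 11.4 °F), which falls in the assumed 8a band. A station's full set of years in `gsoy.csv` may give a different answer.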