# Main script to clean wind data at the daily level

**Citations (data sources)**

``Wind data:`` 

download the "MERRA2_100.tavgM_2d_slv_Nx" product; this provides monthly averages of U and V components

1. https://search.earthdata.nasa.gov/search/granules?p=C1276812859-GES_DISC&pg[0][qt]=1991-01-01T00%3A00%3A00.000Z%2C2017-12-31T23%3A59%3A59.999Z&pg[0][gsk]=-start_date&q=MERRA-2%20tavgM&tl=1624239533!3!!&m=-0.0703125!0.0703125!2!1!0!0%2C2

and data dictionary here:

2. https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf
3. https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary


``Shapefiles for California ZIP codes (2010 census):``

4. https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2010&layergroup=ZIP+Code+Tabulation+Areas

``Installation errors with Geopandas:``

5. https://stackoverflow.com/questions/54734667/error-installing-geopandas-a-gdal-api-version-must-be-specified-in-anaconda

``How to compute wind speed and direction:``

6. https://stackoverflow.com/questions/21484558/how-to-calculate-wind-direction-from-u-and-v-wind-components-in-r
7. https://github.com/blaylockbk/Ute_WRF/blob/master/functions/wind_calcs.py

``Wind speed and direction intuition:``

8. http://colaweb.gmu.edu/dev/clim301/lectures/wind/wind-uv
9. https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/intro-to-coordinate-reference-systems-python/

``To create maps of this wind data:``

and also used to provide intuition for winddir and windspeed

10. https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20calculate%20and%20plot%20wind%20speed%20using%20MERRA-2%20wind%20component%20data%20using%20Python



**Citations (persons)**
1. N/A

**Preferred environment**
1. Code written in Jupyter Notebooks

In [1]:
# conda install -c conda-forge cartopy
!pip install cartopy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cartopy
  Downloading Cartopy-0.20.3.tar.gz (10.8 MB)
[K     |████████████████████████████████| 10.8 MB 7.9 MB/s 
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25herror
  Downloading Cartopy-0.20.2.tar.gz (10.8 MB)
[K     |████████████████████████████████| 10.8 MB 37.0 MB/s 
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25herror
  Downloading Cartopy-0.20.1.tar.gz (10.8 MB)
[K     |████████████████████████████████| 10.8 MB 41.2 MB/s 
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25herror
  Downloading Cartopy-0.20.0.tar.gz (10.8 MB)
[K     |████████████████████████████████| 10.8 MB 34.5 MB/s 
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build whe

In [2]:
!pip install geopandas
!pip install osmnx

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
[K     |████████████████████████████████| 1.0 MB 7.5 MB/s 
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
[K     |████████████████████████████████| 16.7 MB 36.9 MB/s 
[?25hCollecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
[K     |████████████████████████████████| 6.3 MB 38.6 MB/s 
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1
Loo

In [3]:
from google.colab import drive

drive.mount("/content/gdrive")

Mounted at /content/gdrive


In [4]:
# !ls "/content/gdrive/MyDrive"
# !mkdir "/content/gdrive/MyDrive/W210-data"
! ls "/content/gdrive/MyDrive/W210-data"
# !mkdir "/content/gdrive/MyDrive/W210-data/shapefiles_zcta/"
# !ls "/content/gdrive/MyDrive/W210-data/shapefiles_zcta/amazon-biome-states/"

MERRA2_400.tavg3_3d_asm_Nv.20211231.nc4  shapefiles_zcta
MERRA2_400.tavgM_2d_slv_Nx.202112.nc4


### Step 1: Import packages

In [5]:
import pandas as pd
import numpy as np
import netCDF4 as ncdf
import os
from datetime import date, timedelta
from math import pi

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import matplotlib.ticker as mticker

# geography
import geopandas as gpd
import osmnx as ox
import shapely
from shapely.geometry import Point
import sklearn.neighbors
dist = sklearn.neighbors.DistanceMetric.get_metric(
    'haversine'
)

# ignore warnings
import warnings
warnings.filterwarnings(
    'ignore'
)



### Step 2: Define working directories

In [7]:
# in_dir_zip_shapes = 'C:/Users/cilin/Research/CA_hospitals/Input/raw_data/census_geo/shapefiles_zcta/'
# in_dir = 'C:/Users/cilin/Research/CA_hospitals/Input/raw_data/winds/'
# in_health = 'C:/Users/cilin/Research/CA_hospitals/Input/final_data/health/'
# out_dir = 'C:/Users/cilin/Research/CA_hospitals/Input/final_data/winds/'

in_dir = "/content/gdrive/MyDrive/W210-data/"
in_dir_zip_shapes = "/content/gdrive/MyDrive/W210-data/shapefiles_zcta/amazon-biome-states/"

### Step 3: Define functions

``read_clean wind``

In [8]:
os.listdir(in_dir)

wind_1day = ncdf.Dataset('/content/gdrive/MyDrive/W210-data/MERRA2_400.tavg3_3d_asm_Nv.20211231.nc4', mode = 'r')
list(wind_1day.variables)

['lon',
 'lat',
 'lev',
 'time',
 'CLOUD',
 'DELP',
 'EPV',
 'H',
 'O3',
 'OMEGA',
 'PHIS',
 'PL',
 'PS',
 'QI',
 'QL',
 'QV',
 'RH',
 'SLP',
 'T',
 'U',
 'V']

In [9]:
# grab vars of interest ##
##########################
# timestamp
timestamp = wind_1day.variables['time']
# longitude and latitude
lons = wind_1day.variables['lon']
lats = wind_1day.variables['lat']
# eastward wind m/s
U = wind_1day.variables['U']
# northward wind m/s
V = wind_1day.variables['V']
# air temperature
T = wind_1day.variables['T']


In [None]:
timestamp[:]

masked_array(data=[   0,  180,  360,  540,  720,  900, 1080, 1260],
             mask=False,
       fill_value=999999,
            dtype=int32)

In [10]:
timestamp

<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    long_name: time
    units: minutes since 2021-12-31 01:30:00
    time_increment: 30000
    begin_date: 20211231
    begin_time: 13000
    vmax: 1000000000000000.0
    vmin: -1000000000000000.0
    valid_range: [-1.e+15  1.e+15]
unlimited dimensions: time
current shape = (8,)
filling on, default _FillValue of -2147483647 used

In [None]:
U.shape

(8, 72, 361, 576)

In [12]:
# see the ground level data
U[0][71][0][0]

2.1872091

In [13]:
V[0][71][0][0]

2.1277099

In [None]:
lons[0]

masked_array(data=-180.,
             mask=False,
       fill_value=1e+20)

In [None]:
U._FillValue

1000000000000000.0

In [None]:
V.shape

(8, 72, 361, 576)

In [None]:
V._FillValue

1000000000000000.0

In [17]:
# Replace vals #
################
#\_FillValues with NaNs:
U_nans = U[:]
V_nans = V[:]
_FillValueU = U._FillValue
_FillValueV = V._FillValue
U_nans[U_nans == _FillValueU] = np.nan
V_nans[V_nans == _FillValueV] = np.nan


In [18]:
# for timestamp
time_nans = timestamp[:]
# _FillValueTime = 999999.0
# time_nans[time_nans == _FillValueTime] = np.nan

In [24]:
(U_nans**2).shape

(8, 72, 361, 576)

In [25]:
# Add new vars #
################
# calculate wind speed
wspd = np.sqrt(U_nans**2 + V_nans**2)

# calculate wind direction in radians
wdir = np.arctan2(V_nans, U_nans)
            
# transform wind direction from radians to degrees
wdir_to_degrees = wdir * 180/pi  
            
#wdir_to_degrees = np.mod(180+np.rad2deg(np.arctan2(V2M_nans, U2M_nans)), 360)
wdir_to_degrees = np.mod(np.rad2deg(np.arctan2(V_nans, U_nans)), 360)

In [28]:
# print all shapes 
print("wspd: ", wspd.shape, "\n",
      "wdir: ", wdir.shape, "\n",
      "wdir_to_degrees: ", wdir_to_degrees.shape)

wspd:  (8, 72, 361, 576) 
 wdir:  (8, 72, 361, 576) 
 wdir_to_degrees:  (8, 72, 361, 576)


In [29]:
## transform to df ##
#####################
# create an empty df for wind speed and direction with size len(lats) x len(lons) 
df_wdir = pd.DataFrame(index=lats[:], columns=lons[:])   
df_wspd = pd.DataFrame(index=lats[:], columns=lons[:])
            
# create an empty df for u and v components with size len(lats) x len(lons) 
df_u = pd.DataFrame(index=lats[:], columns=lons[:])
df_v = pd.DataFrame(index=lats[:], columns=lons[:])

In [47]:
df_wdir.index
# df_wdir.shape

Float64Index([-90.0, -89.5, -89.0, -88.5, -88.0, -87.5, -87.0, -86.5, -86.0,
              -85.5,
              ...
               85.5,  86.0,  86.5,  87.0,  87.5,  88.0,  88.5,  89.0,  89.5,
               90.0],
             dtype='float64', length=361)

In [36]:
# create an empty df for timestamp
df_time = pd.DataFrame(index = timestamp[:])

In [57]:
df_time.index
# time_nans[]
time_nans[1]
# df_time.loc[1]

180

In [58]:
# populate each row in the empty df above with the wdir_meteo and wspd data and u and v components
# loop through time_index and only take the 71st level as the ground level data
for time_idx in range(len(df_time.index)):
  for idx, idx_val in enumerate(df_wdir.index):
    df_wdir.loc[idx_val, :] = wdir_to_degrees[time_idx][71][idx]
    df_wspd.loc[idx_val, :] = wspd[time_idx][71][idx]
    df_u.loc[idx_val, :] = U_nans[time_idx][71][idx]
    df_v.loc[idx_val, :] = V_nans[time_idx][71][idx]
    df_time.loc[idx_val, :] = time_nans[time_idx]

In [None]:
def read_clean_wind():
    ''''''
    # create empty df
    df = pd.DataFrame()

    for file in os.listdir(in_dir):
        if file.startswith('MERRA2'):
            print(file.split('.')[2])

            ## read .nc file ##
            ###################
            data = ncdf.Dataset(
                in_dir + file, mode='r'
            )
            # print metadata
            #print(data)

            # grab vars of interest ##
            ##########################
            # longitude and latitude
            lons = data.variables['lon']
            lats = data.variables['lat']
            # 2-meter eastward wind m/s
            U2M = data.variables['U2M']
            # 2-meter northward wind m/s
            V2M = data.variables['V2M']
            # timestamp
            timestamp = data.variables['time']

            # Replace vals #
            ################
            #\_FillValues with NaNs:
            U2M_nans = U2M[:]
            V2M_nans = V2M[:]
            _FillValueU2M = U2M._FillValue
            _FillValueV2M = V2M._FillValue
            U2M_nans[U2M_nans == _FillValueU2M] = np.nan
            V2M_nans[V2M_nans == _FillValueV2M] = np.nan

            # Add new vars #
            ################
            # calculate wind speed
            wspd = np.sqrt(U2M_nans**2+V2M_nans**2)

            # calculate wind direction in radians
            wdir = np.arctan2(V2M_nans, U2M_nans)
            
            # transform wind direction from radians to degrees
            wdir_to_degrees = wdir * 180/pi  
            
            #wdir_to_degrees = np.mod(180+np.rad2deg(np.arctan2(V2M_nans, U2M_nans)), 360)
            wdir_to_degrees = np.mod(np.rad2deg(np.arctan2(V2M_nans, U2M_nans)), 360)


            ## transform to df ##
            #####################
            # create an empty df for wind speed and direction with size len(lats) x len(lons) 
            df_wdir = pd.DataFrame(index=lats[:], columns=lons[:])   
            df_wspd = pd.DataFrame(index=lats[:], columns=lons[:])
            
            # create an empty df for u and v components with size len(lats) x len(lons) 
            df_u = pd.DataFrame(index=lats[:], columns=lons[:])
            df_v = pd.DataFrame(index=lats[:], columns=lons[:])

            # populate each row in the empty df above with the wdir_meteo and wspd data and u and v components
            for idx, idx_val in enumerate(df_wdir.index):
                df_wdir.loc[idx_val, :] = wdir_to_degrees[0][idx]
                df_wspd.loc[idx_val, :] = wspd[0][idx]
                df_u.loc[idx_val, :] = U2M_nans[0][idx]
                df_v.loc[idx_val, :] = V2M_nans[0][idx]

            # add index (latitude) as column
            df_wdir.reset_index(
                drop=False,
                inplace=True
            )
            
            df_wdir.rename(
                columns={'index':'lat'},
                inplace=True
            )
            
            
            df_wspd.reset_index(
                drop=False,
                inplace=True
            )
            
            df_wspd.rename(
                columns={'index':'lat'},
                inplace=True
            )
            
            df_u.reset_index(
                drop=False,
                inplace=True
            )
            
            df_u.rename(
                columns={'index':'lat'},
                inplace=True
            )
            
            df_v.reset_index(
                drop=False,
                inplace=True
            )
            
            df_v.rename(
                columns={'index':'lat'},
                inplace=True
            )

            # transform from wide to long
            df_wdir = pd.melt(
                df_wdir, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='wdir'
            )
            
            df_wspd = pd.melt(
                df_wspd,
                id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='wspd'
            )
            
            df_u = pd.melt(
                df_u, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='u'
            )
            
            df_v = pd.melt(
                df_v, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='v'
            )

            # concatenate df_wdir and df_wspd
            df_temp1 = df_wdir.merge(
                df_wspd,
                on=['lat', 'lon'],
                how='left'
            )
            
            # concatenate df_u and df_v
            df_temp2 = df_u.merge(
                df_v,
                on=['lat', 'lon'],
                how='left'
            )
            
            # concatenate df_temp1 and df_temp2
            df_temp = df_temp2.merge(
                df_temp1,
                on=['lat', 'lon'],
                how='left'
            )
            
            # add time stamp 
            df_temp['year_month'] = file.split('.')[2]

            df = pd.concat(
                [df_temp, df],
                axis=0
            )
   
    # keep values in min, max range of California geometry
    df = df[
        df.lon.ge(-125) & df.lon.le(-115) & df.lat.ge(32) & df.lat.le(42)
    ]
    
    # convert wind direction to the meteorological convention
    # see here: https://stackoverflow.com/questions/21484558/how-to-calculate-wind-direction-from-u-and-v-wind-components-in-r
    # see here too: https://www.e-education.psu.edu/meteo300/node/719
    '''
    df['wdir_meteo'] = 270 - df.wdir
    df['wdir_meteo'] = np.where(
        df.wdir_meteo.lt(0), df.wdir_meteo+360, df.wdir_meteo
    )
    
    # replace wdir to reflect wdir_meteo
    df['wdir'] = df.wdir_meteo 
    df.drop(
        columns=['wdir_meteo'],
        inplace=True
    )
    
    df['wdir'] = np.mod(180+np.rad2deg(np.arctan2(V2M_nans, U2M_nans)), 360)
    '''
    # transform vars
    df['lat'] = df.lat.astype(float)
    df['lon'] = df.lon.astype(float)
    
    return df

``read census geom``

In [None]:
gdf = gpd.read_file(in_dir_zip_shapes + "states_amazon_biome.shp")
gdf

Unnamed: 0,id,nome,sigla,geocodigo,geometry
0,2,Rondônia,RO,11,"MULTIPOLYGON (((-59.98861 -11.91000, -59.98811..."
1,3,Acre,AC,12,"POLYGON ((-66.62735 -9.89885, -66.62730 -9.899..."
2,4,Amazonas,AM,13,"POLYGON ((-73.79834 -7.11298, -73.79827 -7.112..."
3,5,Roraima,RR,14,"POLYGON ((-60.18882 5.23222, -60.15748 5.23044..."
4,6,Pará,PA,15,"MULTIPOLYGON (((-46.38414 -1.03655, -46.38429 ..."
5,7,Amapá,AP,16,"MULTIPOLYGON (((-49.98155 0.87511, -49.98229 0..."
6,8,Tocantins,TO,17,"POLYGON ((-47.89974 -5.25030, -47.89995 -5.250..."
7,14,Mato Grosso,MT,51,"MULTIPOLYGON (((-50.35017 -9.83338, -50.35059 ..."
8,26,Maranhão,MA,21,"MULTIPOLYGON (((-44.08515 -2.41669, -44.08560 ..."


In [None]:
def read_census_geom():
    """ Read Census (lat, lon) coordinates for California zip-codes
    parameters:
    -----------
    None
    
    return:
    -------
    Df with osmnx_geom
    """
    ### Step 1 ### 
    ##############
    # Read the shapefiles for California's ZIP codes
    for file in os.listdir(in_dir_zip_shapes):
        if file.endswith('.shp'):
            gdf = gpd.read_file(in_dir_zip_shapes + file)

    # keep only cols of interest 
    # ('ZCTA5CE10' = 2010 Census ZIP codes,	'GEOID10' = 2010 Census Tract codes)
    gdf = gdf[
        ['ZCTA5CE10',
         'GEOID10',
         'geometry']
    ]
    
    
    ### Step 2 ###
    ###############
    # For each zip cpde extract polygon with (lat, lon) info

    zip_poly = pd.DataFrame()

    for idx, multipoly in enumerate(gdf.geometry):
        if isinstance(multipoly, shapely.geometry.polygon.Polygon):
            temp_df = pd.DataFrame(
                {
                    'lat': multipoly.exterior.coords.xy[1], 
                    'lon': multipoly.exterior.coords.xy[0],
                    'ZCTA10': gdf.loc[idx, 'ZCTA5CE10'],
                    'GEOID10': gdf.loc[idx, 'GEOID10']
                }
            )
            zip_poly = pd.concat(
                [zip_poly, temp_df],
                axis=0
            )

        if isinstance(multipoly, shapely.geometry.multipolygon.MultiPolygon):
            for poly in multipoly:
                temp_df = pd.DataFrame(
                    {
                        'lat': poly.exterior.coords.xy[1], 
                        'lon': poly.exterior.coords.xy[0],
                        'ZCTA10': gdf.loc[idx, 'ZCTA5CE10'],
                        'GEOID10': gdf.loc[idx, 'GEOID10']
                    }
                )
                zip_poly = pd.concat(
                    [zip_poly, temp_df],
                    axis=0
                )   
    

    # round (lat, lon) to 2 decimal points and add 0.005 to match the UW (lat, lon) values
    zip_poly['lat'] = zip_poly.lat.round(3)
    zip_poly['lon'] = zip_poly.lon.round(3)
    
    zip_poly.sort_values(
        by=['ZCTA10', 'lat', 'lon'],
        inplace=True
    )
    
    zip_poly.drop_duplicates(
        subset=['ZCTA10', 'lat', 'lon'],
        inplace=True
    )

    zip_poly.reset_index(
        drop=True,
        inplace=True
    )
    
    return zip_poly

``find zip (zcta) code for wind data``

In [None]:
def add_zcta_to_wind(df1, df2):
    '''
    params:
    -------
    df1: wind data
    df2: census geometry data
    
    return:
    -------
    '''
    
    # create labels
    df1['wind_lat_lon'] = [str(xy) for xy in zip(df1.lat, df1.lon)]
    df2['census_lat_lon'] = [str(xy) for xy in zip(df2.lat, df2.lon)]

    ## for each point in wind data find the nearest point in the census data ##
    ###############
    # keep only unique points in wind data
    df1_unique = df1.drop_duplicates(
        ['wind_lat_lon']
    )
    
    df2_unique = df2.drop_duplicates(
        ['census_lat_lon']
    )
    
    df1_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    df2_unique.reset_index(
        drop=True,
        inplace=True
    )

    # transform to radians
    df1_unique['lat_r'] = np.radians(df1_unique.lat)
    df1_unique['lon_r'] = np.radians(df1_unique.lon)
    df2_unique['lat_r'] = np.radians(df2_unique.lat)
    df2_unique['lon_r'] = np.radians(df2_unique.lon)


    # compute pairwise distance (in miles)
    dist_matrix = (dist.pairwise(
        df2_unique[['lat_r', 'lon_r']],
        df1_unique[['lat_r', 'lon_r']]
    ))*3959

    # create a df from dist_matrix
    dist_matrix = pd.DataFrame(
        dist_matrix,
        index=df2_unique['census_lat_lon'],
        columns=df1_unique['wind_lat_lon']
    )
    
    # for each row (census_lat_lon point) extract the closest column (wind_lat_lon point) 
    closest_point = pd.DataFrame(
        dist_matrix.idxmin(axis=1),
        columns=['closest_wind_lat_lon']
    )
    
    closest_point.reset_index(
        drop=False,
        inplace=True
    )

    # merge with census data
    df2_unique = df2_unique.merge(
        closest_point,
        on='census_lat_lon',
        how='left'
    )
    
    # merge with census data 
    df2_unique = df2_unique.merge(
        df2[['census_lat_lon']],
        on=['census_lat_lon'],
        how='left'
    )
    
    # replicate df2_unique based on number of year_month entries in df1
    df2_unique = pd.concat(
        [df2_unique]*(df1.year_month.nunique()),
        axis=0
    )
    
    df2_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    # add year_month column to df2_unique
    df2_unique['year_month'] = 0
    indeces = [n for n in range(1, df2_unique.shape[0]) if n%956926==0]

    year_month = np.sort(df1.year_month.unique())
    for idx, index in enumerate(indeces):
        if idx==0:
            df2_unique.iloc[0:indeces[idx], 8] = year_month[idx]
        else:
            df2_unique.iloc[indeces[idx-1]:indeces[idx], 8] = year_month[idx]
            
            
    # from df1 keep only cols of interest
    df1 = df1[
        ['year_month',
         'u',
         'v',
         'wdir',
         'wspd',
         'wind_lat_lon']
    ]
    
    # merge df2_unique with df1
    df2_unique = df2_unique.merge(
        df1,
        left_on=['year_month', 'closest_wind_lat_lon'],
        right_on=['year_month', 'wind_lat_lon'],
        how='left'
    )
    # keep only cols of interest
    df2_unique = df2_unique[
        ['lat',
         'lon',
         'ZCTA10',
         'u',
         'v',
         'wdir',
         'wspd',
         'year_month']
    ]
    
    df2_unique.dropna(
        inplace=True
    )
    
    df2_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    df2_unique.drop_duplicates(
    ['year_month', 'ZCTA10'],
    inplace=True
    )

    df2_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    return df2_unique

### Step 4: Read data

``wind``

In [None]:
df = read_clean_wind()
df
# A positive U wind comes from the west, and a negative U wind comes from the east. 
# The V wind component is parallel to the y- axis (i.e. latitude). A positive V wind comes from the south, and a negative V wind comes from the north.

202112


Unnamed: 0,lat,lon,u,v,wdir,wspd,year_month
32012,32.0,-125.0,2.557108,-2.114869,320.407379,3.318354,202112
32013,32.5,-125.0,2.700869,-2.065641,322.591034,3.40023,202112
32014,33.0,-125.0,2.823102,-2.050217,324.01178,3.489025,202112
32015,33.5,-125.0,2.952745,-2.071095,324.953674,3.606679,202112
32016,34.0,-125.0,3.125051,-2.153463,325.429382,3.795174,202112
...,...,...,...,...,...,...,...
37804,40.0,-115.0,1.703349,1.286709,37.067387,2.134717,202112
37805,40.5,-115.0,1.9087,1.062686,29.107273,2.184591,202112
37806,41.0,-115.0,1.948897,0.953448,26.069,2.169622,202112
37807,41.5,-115.0,2.232198,0.977663,23.652546,2.436911,202112


``census geom``

In [None]:
zip_poly = read_census_geom()
zip_poly.head(2)

Unnamed: 0,lat,lon,ZCTA10,GEOID10
0,37.465,-117.936,89010,689010
1,37.465,-117.935,89010,689010


### Step 5: Find zip (zcta) code for wind data

In [None]:
df_final = add_zcta_to_wind(df, zip_poly)
df_final.head(2)

Unnamed: 0,lat,lon,ZCTA10,u,v,wdir,wspd,year_month
0,37.465,-117.936,89010,0.504258,-0.719008,125.042938,0.878208,199101
1,35.396,-116.322,89019,-0.172753,-0.94694,79.661095,0.962568,199101


### Step 6: Export data

In [None]:
df_final.to_csv(out_dir  + 'winds.csv')

In [None]:
df_final['wdir'] = df_final.wdir.astype('float')

In [None]:
df_final.wdir.describe()

count    599311.000000
mean        179.408960
std          72.259209
min           0.041595
25%         140.561295
50%         180.584778
75%         220.575256
max         359.992188
Name: wdir, dtype: float64