# Main script to clean wind data at the zip code, monthly level

Modules: N/A <br>
Author: Cornelia Ilin <br>
Email: cilin@wisc.edu <br>
Date created: May 14, 2021 <br>

**Citations (data sources)**

``Wind data:`` 

Download the "MERRA2_100.tavgM_2d_slv_Nx" product, which provides monthly averages of the U (eastward) and V (northward) wind components.

1. https://search.earthdata.nasa.gov/search/granules?p=C1276812859-GES_DISC&pg[0][qt]=1991-01-01T00%3A00%3A00.000Z%2C2017-12-31T23%3A59%3A59.999Z&pg[0][gsk]=-start_date&q=MERRA-2%20tavgM&tl=1624239533!3!!&m=-0.0703125!0.0703125!2!1!0!0%2C2

__Jordan steps for wind data__ 
  * Search for the product "M2TMNXSLV"
  * Narrow the spatial scope with a geographic boundary shapefile for CA (NOT ZCTA)  
      * Acquired here: https://data.ca.gov/dataset/ca-geographic-boundaries 
        * **This site does not seem to permit access over a VPN**
  * Query for 1991-01-01 00:00:00 through today  
      * Click "recurring"

The data dictionary is available here:

2. https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf
3. https://disc.gsfc.nasa.gov/datasets/M2T1NXSLV_5.12.4/summary


``Shapefiles for California ZIP codes (2010 census):``

4. https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2010&layergroup=ZIP+Code+Tabulation+Areas

__Jordan tweaks__
  * Updated 2020 and 2022 ZCTA files are available, but I think it makes sense to keep the 2010 file, since that is what the project started with and the boundaries shouldn't have changed much

``Installation errors with Geopandas:``

5. https://stackoverflow.com/questions/54734667/error-installing-geopandas-a-gdal-api-version-must-be-specified-in-anaconda

``How to compute wind speed and direction:``

6. https://stackoverflow.com/questions/21484558/how-to-calculate-wind-direction-from-u-and-v-wind-components-in-r
7. https://github.com/blaylockbk/Ute_WRF/blob/master/functions/wind_calcs.py
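
As a rough illustration of the formulas discussed in the two references above, the sketch below computes wind speed and two direction conventions from a single (u, v) pair using only the standard library. The function names `wind_speed`, `math_angle_deg`, and `met_from_deg` are illustrative and not part of this notebook; the sample values are taken from the first row of the test-join output further down (u = 1.561013, v = -0.93303).

```python
import math

def wind_speed(u, v):
    """Magnitude of the horizontal wind vector (same units as u and v)."""
    return math.hypot(u, v)

def math_angle_deg(u, v):
    """Direction the wind blows toward, counterclockwise from east, in [0, 360)."""
    return math.degrees(math.atan2(v, u)) % 360

def met_from_deg(u, v):
    """Meteorological direction: bearing the wind blows from, clockwise from north."""
    return (270 - math.degrees(math.atan2(v, u))) % 360

# Sample values from the test-join output (first row)
u, v = 1.561013, -0.93303
print(round(wind_speed(u, v), 3))
print(round(math_angle_deg(u, v), 2))
print(round(met_from_deg(u, v), 2))
```

The second function mirrors the `np.mod(np.rad2deg(np.arctan2(V, U)), 360)` expression used in this notebook, so its result should agree with the `wdir` column; the third is the meteorological alternative and is shown only for comparison.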

``Wind speed and direction intuition:``

8. http://colaweb.gmu.edu/dev/clim301/lectures/wind/wind-uv
9. https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/intro-to-coordinate-reference-systems-python/

``To create maps of this wind data:``

This reference was also used to build intuition for wind direction and wind speed.

10. https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20calculate%20and%20plot%20wind%20speed%20using%20MERRA-2%20wind%20component%20data%20using%20Python


``Error - MultiPolygon or multipoly is not iterable``  

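This error appears to come from the installed version of Shapely: starting with Shapely 2.0, multi-part geometries such as `MultiPolygon` are no longer directly iterable. Either force-install a version below 2.0 (I use `shapely==1.8.5` in my env) or iterate over the geometry's `.geoms` attribute instead.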
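
The version-independent pattern, iterating over a multi-part geometry through its `.geoms` attribute (available in both Shapely 1.8 and 2.x), can be sketched as follows. The helper `iter_parts` is hypothetical, and the example uses simple stand-in objects rather than real Shapely geometries so that it runs without the geospatial stack installed. It assumes that single-part geometries do not expose a `.geoms` attribute.

```python
from types import SimpleNamespace

def iter_parts(geom):
    """Yield the component geometries of geom.

    Multi-part geometries expose their parts via .geoms; single-part
    geometries (assumed to lack that attribute) are yielded as-is.
    """
    parts = getattr(geom, "geoms", None)
    if parts is None:
        yield geom
    else:
        yield from parts

# Stand-ins for a Polygon and a MultiPolygon (hypothetical, for illustration only)
single = SimpleNamespace(name="poly_a")
multi = SimpleNamespace(
    geoms=[SimpleNamespace(name="poly_b"), SimpleNamespace(name="poly_c")]
)

print([p.name for p in iter_parts(single)])
print([p.name for p in iter_parts(multi)])
```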


**Citations (persons)**
1. N/A

**Preferred environment**
1. Code written in Jupyter Notebooks

### Step 1: Import packages

In [1]:
import pandas as pd
import numpy as np
import netCDF4 as ncdf
import os
from datetime import date, timedelta
from math import pi
import fiona

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import matplotlib.ticker as mticker

# geography
import geopandas as gpd
import osmnx as ox
import shapely
from shapely.geometry import Point

#Moved from sklearn.neighbors to sklearn.metrics following their package change
import sklearn.metrics
dist = sklearn.metrics.DistanceMetric.get_metric(
    'haversine'
)

# ignore warnings
import warnings
warnings.filterwarnings(
    'ignore'
)

### Step 2: Define working directories

In [12]:
# in_dir_zip_shapes = 'C:/Users/cilin/Research/CA_hospitals/Input/raw_data/census_geo/shapefiles_zcta/'
# in_dir = 'C:/Users/cilin/Research/CA_hospitals/Input/raw_data/winds/'
# in_health = 'C:/Users/cilin/Research/CA_hospitals/Input/final_data/health/'
# out_dir = 'C:/Users/cilin/Research/CA_hospitals/Input/final_data/winds/'

#Local directories on my machine (not gdrive)
in_dir_zip_shapes = '../data/raw/wind/tl_2010_06_zcta510/'
in_dir_wind = '../data/raw/wind/ca only/'
in_dir_fire = '../data/interim/'
in_health = '../data/raw/health/'
out_dir = '../data/processed/'

### Step 3: Define functions

``read_clean wind``

In [14]:
fire_fix = pd.read_csv('../data/interim/fires_all_months.csv',index_col=0)
fire_fix["geometry"] = gpd.GeoSeries.from_wkt(fire_fix["fire_centroid"])
fire_fix['lat']=fire_fix['geometry'].apply(lambda temp: temp.y)
fire_fix['lon']=fire_fix['geometry'].apply(lambda temp: temp.x)
fire_fix

Unnamed: 0,fire_index,FIRE_NAME,ALARM_DATE,CONT_DATE,CAUSE,GIS_ACRES,Shape_Length,Shape_Area,index_right,fire_centroid,...,FIRE_AREA_KM2,year,month,year_month,end_year,end_month,duration_months,geometry,lat,lon
0,0,NELSON,2020-06-18,2020-06-23,11.0,109.602280,3252.523280,4.435447e+05,405.0,POINT (-121.3480590211847 38.88804091206984),...,0.443546,2020,6,202006,2020,6,1,POINT (-121.34806 38.88804),38.888041,-121.348059
1,1,AMORUSO,2020-06-01,2020-06-04,2.0,685.585022,9653.760308,2.774464e+06,1217.0,POINT (-121.3708983104108 38.82756661261951),...,2.774464,2020,6,202006,2020,6,1,POINT (-121.37090 38.82757),38.827567,-121.370898
2,3,FLEMING,2020-03-31,2020-04-01,9.0,12.931545,1577.155857,5.233211e+04,405.0,POINT (-121.2734135751918 38.9623284462546),...,0.052332,2020,3,202003,2020,4,2,POINT (-121.27341 38.96233),38.962328,-121.273414
3,4,MELANESE,2020-04-14,2020-04-19,18.0,10.315964,1035.787625,4.174722e+04,933.0,POINT (-121.3006534927401 39.48636412414794),...,0.041747,2020,4,202004,2020,4,1,POINT (-121.30065 39.48636),39.486364,-121.300653
4,5,PFE,2020-07-05,2020-07-05,14.0,36.701931,2348.114043,1.485274e+05,1217.0,POINT (-121.3810176618852 38.73133921489409),...,0.148527,2020,7,202007,2020,7,1,POINT (-121.38102 38.73134),38.731339,-121.381018
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9270,19698,TULLEY,2016-08-22,2016-09-04,7.0,607.445740,13095.080608,2.458246e+06,277.0,POINT (-123.8140318155857 41.28541623395056),...,2.458246,2016,9,201609,2016,9,1,POINT (-123.81403 41.28542),41.285416,-123.814032
9271,19720,DOCKERY,2016-07-30,2016-08-17,10.0,27.963249,2655.210701,1.131633e+05,51.0,POINT (-120.6873444598079 40.42539576204376),...,0.113163,2016,8,201608,2016,8,1,POINT (-120.68734 40.42540),40.425396,-120.687344
9272,19721,HOWARD,2016-09-11,2016-10-17,6.0,379.622528,7898.270122,1.536278e+06,1021.0,POINT (-120.9157313359543 41.47071744410114),...,1.536278,2016,10,201610,2016,10,1,POINT (-120.91573 41.47072),41.470717,-120.915731
9273,19722,WILLARD,2016-09-11,2016-11-28,14.0,2573.955322,27364.478495,1.041643e+07,51.0,POINT (-120.7510806570093 40.37593783404107),...,10.416427,2016,10,201610,2016,11,2,POINT (-120.75108 40.37594),40.375938,-120.751081


In [31]:
def read_clean_wind(year):
    """
    Read in wind data by year
    """
    # create empty df
    df = pd.DataFrame()

    for file in os.listdir(in_dir_wind):
        if file.startswith('MERRA2') and file[-10:-6] == str(year):

            ## read .nc file ##
            ###################
            data = ncdf.Dataset(
                in_dir_wind + file, mode='r'
            )
            # print metadata
            #print(data)

            # grab vars of interest ##
            ##########################
            # longitude and latitude
            lons = data.variables['lon']
            lats = data.variables['lat']
            # 2-meter eastward wind m/s
            U2M = data.variables['U2M']
            # 2-meter northward wind m/s
            V2M = data.variables['V2M']

            # Replace vals #
            ################
            # replace _FillValues with NaNs:
            U2M_nans = U2M[:]
            V2M_nans = V2M[:]
            _FillValueU2M = U2M._FillValue
            _FillValueV2M = V2M._FillValue
            U2M_nans[U2M_nans == _FillValueU2M] = np.nan
            V2M_nans[V2M_nans == _FillValueV2M] = np.nan

            # Add new vars #
            ################
            # calculate wind speed
            wspd = np.sqrt(U2M_nans**2+V2M_nans**2)

            # calculate wind direction in radians (math angle: counterclockwise from east)
            wdir = np.arctan2(V2M_nans, U2M_nans)
            # meteorological convention ("wind is blowing from", clockwise from north), not used here:
            #WDIR= (270-atan2(V,U)*180/pi)%360
            
            # transform wind direction from radians to degrees in [0, 360)
            # alternative, not used: direction the wind is blowing from (still measured counterclockwise from east)
            #dir_to_degrees = np.mod(180+np.rad2deg(np.arctan2(V2M_nans, U2M_nans)), 360)
            # this computes "wind is blowing towards" (as in the oceanographic convention), measured
            # counterclockwise from east rather than as a compass bearing, see here:
            # https://www.esri.com/arcgis-blog/products/product/analytics/displaying-speed-and-direction-symbology-from-u-and-v-vectors/
            wdir_to_degrees = np.mod(np.rad2deg(wdir), 360)
            
            
            ## transform to df ##
            #####################
            # create an empty df for wind speed and direction with size len(lats) x len(lons) 
            df_wdir = pd.DataFrame(index=lats[:], columns=lons[:])   
            df_wspd = pd.DataFrame(index=lats[:], columns=lons[:])
            
            # create an empty df for u and v components with size len(lats) x len(lons) 
            df_u = pd.DataFrame(index=lats[:], columns=lons[:])
            df_v = pd.DataFrame(index=lats[:], columns=lons[:])

            # populate each row in the empty dfs above with the wdir, wspd, u and v values
            for idx, idx_val in enumerate(df_wdir.index):
                df_wdir.loc[idx_val, :] = wdir_to_degrees[0][idx]
                df_wspd.loc[idx_val, :] = wspd[0][idx]
                df_u.loc[idx_val, :] = U2M_nans[0][idx]
                df_v.loc[idx_val, :] = V2M_nans[0][idx]

            # add index (latitude) as column
            df_wdir.reset_index(
                drop=False,
                inplace=True
            )
            
            df_wdir.rename(
                columns={'index':'lat'},
                inplace=True
            )
            
            
            df_wspd.reset_index(
                drop=False,
                inplace=True
            )
            
            df_wspd.rename(
                columns={'index':'lat'},
                inplace=True
            )
            
            df_u.reset_index(
                drop=False,
                inplace=True
            )
            
            df_u.rename(
                columns={'index':'lat'},
                inplace=True
            )
            
            df_v.reset_index(
                drop=False,
                inplace=True
            )
            
            df_v.rename(
                columns={'index':'lat'},
                inplace=True
            )

            # transform from wide to long
            df_wdir = pd.melt(
                df_wdir, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='wdir'
            )
            
            df_wspd = pd.melt(
                df_wspd,
                id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='wspd'
            )
            
            df_u = pd.melt(
                df_u, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='u'
            )
            
            df_v = pd.melt(
                df_v, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='v'
            )

            # merge df_wdir and df_wspd on (lat, lon)
            df_temp1 = df_wdir.merge(
                df_wspd,
                on=['lat', 'lon'],
                how='left'
            )
            
            # merge df_u and df_v on (lat, lon)
            df_temp2 = df_u.merge(
                df_v,
                on=['lat', 'lon'],
                how='left'
            )
            
            # merge df_temp1 and df_temp2 on (lat, lon)
            df_temp = df_temp2.merge(
                df_temp1,
                on=['lat', 'lon'],
                how='left'
            )
            
            # add time stamp 
            df_temp['year_month'] = file.split('.')[2]

            df = pd.concat(
                [df_temp, df],
                axis=0
            )
   
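    # transform vars (cast first so the range filter compares numbers)
    df['lat'] = df.lat.astype(float)
    df['lon'] = df.lon.astype(float)
    
    # keep values in min, max range of California geometry
    # (.copy() so the returned frame is independent of the unfiltered one)
    df = df[
        df.lon.ge(-125) & df.lon.le(-115) & df.lat.ge(32) & df.lat.le(42)
    ].copy()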
    
    return df
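
The file selection in `read_clean_wind` depends on the layout of the granule filename. The sketch below illustrates the two string operations involved, assuming a filename of the form `MERRA2_100.tavgM_2d_slv_Nx.199101.nc4`. This exact name is an assumption based on the product name cited at the top of this notebook; subsetted downloads may carry different suffixes, in which case the negative slice would need adjusting.

```python
# Hypothetical filename; the real files in in_dir_wind may differ.
file = "MERRA2_100.tavgM_2d_slv_Nx.199101.nc4"

# Year: four characters starting ten positions from the end of the name
year_from_slice = file[-10:-6]

# Year-month stamp: the third dot-separated field
year_month = file.split(".")[2]

print(year_from_slice, year_month)
```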

``read census geom``

In [32]:
def read_census_geom():
    """ Read Census (lat, lon) coordinates for California zip-codes
    parameters:
    -----------
    None
    
    return:
    -------
    Df of polygon boundary points with columns lat, lon, ZCTA10, GEOID10
    """
    ### Step 1 ### 
    ##############
    # Read the shapefiles for California's ZIP codes
    for file in os.listdir(in_dir_zip_shapes):
        if file.endswith('.shp'):
            gdf = gpd.read_file(in_dir_zip_shapes + file)

    # keep only cols of interest 
    # ('ZCTA5CE10' = 2010 Census ZIP Code Tabulation Area codes, 'GEOID10' = 2010 Census geographic identifier)
    gdf = gdf[
        ['ZCTA5CE10',
         'GEOID10',
         'geometry']
    ]
    
    
    ### Step 2 ###
    ###############
    # For each zip code extract polygon with (lat, lon) info

    zip_poly = pd.DataFrame()

    for idx, multipoly in enumerate(gdf.geometry):
        if isinstance(multipoly, shapely.geometry.polygon.Polygon):
            temp_df = pd.DataFrame(
                {
                    'lat': multipoly.exterior.coords.xy[1], 
                    'lon': multipoly.exterior.coords.xy[0],
                    'ZCTA10': gdf.loc[idx, 'ZCTA5CE10'],
                    'GEOID10': gdf.loc[idx, 'GEOID10']
                }
            )
            zip_poly = pd.concat(
                [zip_poly, temp_df],
                axis=0
            )

        if isinstance(multipoly, shapely.geometry.multipolygon.MultiPolygon):
            # .geoms works in shapely 1.8 and 2.x (a MultiPolygon is not iterable in 2.x)
            for poly in multipoly.geoms:
                temp_df = pd.DataFrame(
                    {
                        'lat': poly.exterior.coords.xy[1], 
                        'lon': poly.exterior.coords.xy[0],
                        'ZCTA10': gdf.loc[idx, 'ZCTA5CE10'],
                        'GEOID10': gdf.loc[idx, 'GEOID10']
                    }
                )
                zip_poly = pd.concat(
                    [zip_poly, temp_df],
                    axis=0
                )   
    

    # round (lat, lon) to 3 decimal places
    zip_poly['lat'] = zip_poly.lat.round(3)
    zip_poly['lon'] = zip_poly.lon.round(3)
    
    zip_poly.sort_values(
        by=['ZCTA10', 'lat', 'lon'],
        inplace=True
    )
    
    zip_poly.drop_duplicates(
        subset=['ZCTA10', 'lat', 'lon'],
        inplace=True
    )

    zip_poly.reset_index(
        drop=True,
        inplace=True
    )
    
    return zip_poly

``add wind data to fire data``

In [33]:
def add_wind_to_fire(df1, df2):
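    '''
    Match each fire centroid to its nearest wind grid point and attach the wind values.

    params:
    -------
    df1: wind data
    df2: fire geometry data
    
    return:
    -------
    Df with one row per (year_month, fire_lat_lon) and the u, v, wdir, wspd of the closest wind point
    '''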
    
    # create labels
    df1['wind_lat_lon'] = [str(xy) for xy in zip(df1.lat, df1.lon)]
    df2['fire_lat_lon'] = [str(xy) for xy in zip(df2.lat, df2.lon)]

    ## for each point in fire data find the nearest point in the wind data ##
    ###############
    # keep only unique points in wind data and in fire data
    df1_unique = df1.drop_duplicates(
        ['wind_lat_lon']
    )
    
    df2_unique = df2.drop_duplicates(
        ['fire_lat_lon']
    )
    
    df1_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    df2_unique.reset_index(
        drop=True,
        inplace=True
    )

    # transform to radians
    df1_unique['lat_r'] = np.radians(df1_unique.lat)
    df1_unique['lon_r'] = np.radians(df1_unique.lon)
    df2_unique['lat_r'] = np.radians(df2_unique.lat)
    df2_unique['lon_r'] = np.radians(df2_unique.lon)


    # compute pairwise distance (in miles)
    dist_matrix = (dist.pairwise(
        df2_unique[['lat_r', 'lon_r']],
        df1_unique[['lat_r', 'lon_r']]
    ))*3959

    # create a df from dist_matrix
    dist_matrix = pd.DataFrame(
        dist_matrix,
        index=df2_unique['fire_lat_lon'],
        columns=df1_unique['wind_lat_lon']
    )
    
    # for each row (fire_lat_lon point) extract the closest column (wind_lat_lon point) 
    closest_point = pd.DataFrame(
        dist_matrix.idxmin(axis=1),
        columns=['closest_wind_lat_lon']
    )
    
    closest_point.reset_index(
        drop=False,
        inplace=True
    )

    # merge the closest wind point back onto the full fire data (one row per fire-month)
    df2_unique = df2.merge(
        closest_point,
        on='fire_lat_lon',
        how='left'
    )

    # # replicate df2_unique based on number of year_month entries in df1
    # df2_unique = pd.concat(
    #     [df2_unique]*(df1.year_month.nunique()),
    #     axis=0
    # )
    
#     df2_unique.reset_index(
#         drop=True,
#         inplace=True
#     )

#     # add year_month column to df2_unique
#     df2_unique['year_month'] = 0
#     indeces = [n for n in range(1, df2_unique.shape[0]) if n%(df2_unique.shape[0]-1)==0]
    
#     year_month = np.sort(df1.year_month.unique())
#     print(indeces)
#     for idx, index in enumerate(indeces):
#         if idx==0:
#             df2_unique.iloc[0:indeces[idx], 17] = year_month[idx]
#         else:
#             df2_unique.iloc[indeces[idx-1]:indeces[idx], 17] = year_month[idx]
    
    # make sure the merge key has the same dtype on both sides of the merge below
    df2_unique['year_month'] = df2_unique['year_month'].astype(int)
    df1['year_month'] = df1['year_month'].astype(int)
    
    # from df1 keep only cols of interest
    df1 = df1[
        ['year_month',
         'u',
         'v',
         'wdir',
         'wspd',
         'wind_lat_lon']
    ]
    
    
    # merge df2_unique with df1
    df2_unique = df2_unique.merge(
        df1,
        left_on=['year_month', 'closest_wind_lat_lon'],
        right_on=['year_month', 'wind_lat_lon'],
        how='left'
    )
    
    # keep only cols of interest
    df2_unique = df2_unique[
        ['lat',
         'lon',
         'fire_lat_lon','wind_lat_lon',
         'fire_index','ALARM_DATE','CONT_DATE','fire_centroid','geometry','GIS_ACRES',
         'u',
         'v',
         'wdir',
         'wspd',
         'year_month']
    ]
    
    df2_unique.dropna(
        inplace=True
    )
    
    df2_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    df2_unique.drop_duplicates(
        ['year_month', 'fire_lat_lon'],
        inplace=True
    )

    df2_unique.reset_index(
        drop=True,
        inplace=True
    )
    
    return df2_unique
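
The nearest-grid-point matching in `add_wind_to_fire` rests on the haversine great-circle distance. As a minimal sketch of that idea, the standard-library version below finds the closest wind grid point to one fire centroid. The helper names `haversine_miles` and `nearest_point` are illustrative, and the Earth radius of 3959 miles matches the constant used in the function above. The candidate grid points are a small hand-picked subset for illustration, not the full MERRA-2 grid.

```python
import math

EARTH_RADIUS_MILES = 3959  # same constant as in add_wind_to_fire

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_point(target, candidates):
    """Return the candidate (lat, lon) tuple closest to target."""
    return min(candidates, key=lambda c: haversine_miles(*target, *c))

# Fire centroid from the test-join output; grid points are a hand-picked subset
fire = (35.3980703184682, -118.4609278999623)
grid = [(35.0, -118.75), (35.5, -118.75), (35.5, -118.125), (35.0, -118.125)]
print(nearest_point(fire, grid))
```

For the full data set the pairwise matrix from `DistanceMetric` is more practical, since it evaluates all fire and grid points at once; the loop-free `min` here is only meant to make the underlying calculation visible.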

In [34]:
fire_fix.columns

Index(['fire_index', 'FIRE_NAME', 'ALARM_DATE', 'CONT_DATE', 'CAUSE',
       'GIS_ACRES', 'Shape_Length', 'Shape_Area', 'index_right',
       'fire_centroid', 'DURATION', 'FIRE_AREA_KM2', 'year', 'month',
       'year_month', 'end_year', 'end_month', 'duration_months', 'geometry',
       'lat', 'lon'],
      dtype='object')

In [35]:
#Test the join

df = read_clean_wind(2012)
df_final = add_wind_to_fire(df, fire_fix)
df_final


Unnamed: 0,lat,lon,fire_lat_lon,wind_lat_lon,fire_index,ALARM_DATE,CONT_DATE,fire_centroid,geometry,GIS_ACRES,u,v,wdir,wspd,year_month
0,35.398070,-118.460928,"(35.3980703184682, -118.4609278999623)","(35.5, -118.75)",13113,2012-06-07,2012-06-07,POINT (-118.4609278999623 35.3980703184682),POINT (-118.46093 35.39807),0.781771,1.561013,-0.93303,329.132904,1.818601,201206
1,34.953104,-118.898129,"(34.95310420967192, -118.8981290564402)","(35.0, -118.75)",13114,2012-06-22,2012-06-22,POINT (-118.8981290564402 34.95310420967192),POINT (-118.89813 34.95310),13.108104,1.252996,-1.590386,308.233063,2.024679,201206
2,34.906430,-119.209412,"(34.90642952082512, -119.2094120604393)","(35.0, -119.375)",13115,2012-06-29,2012-06-29,POINT (-119.2094120604393 34.90642952082512),POINT (-119.20941 34.90643),173.271912,0.908553,-1.504646,301.124908,1.757677,201206
3,34.908043,-118.893601,"(34.90804254980864, -118.8936006460696)","(35.0, -118.75)",13116,2012-06-18,2012-06-18,POINT (-118.8936006460696 34.90804254980864),POINT (-118.89360 34.90804),206.098831,1.252996,-1.590386,308.233063,2.024679,201206
4,35.123759,-119.459332,"(35.12375866296951, -119.4593318706585)","(35.0, -119.375)",13117,2012-06-21,2012-06-21,POINT (-119.4593318706585 35.12375866296951),POINT (-119.45933 35.12376),14.685063,0.908553,-1.504646,301.124908,1.757677,201206
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
395,37.775985,-119.674782,"(37.7759853399845, -119.6747818056264)","(38.0, -119.375)",19088,2012-06-16,2012-11-09,POINT (-119.6747818056264 37.7759853399845),POINT (-119.67478 37.77599),1704.925903,0.711625,0.779979,47.623787,1.055831,201207
396,37.775985,-119.674782,"(37.7759853399845, -119.6747818056264)","(38.0, -119.375)",19088,2012-06-16,2012-11-09,POINT (-119.6747818056264 37.7759853399845),POINT (-119.67478 37.77599),1704.925903,0.487091,0.435951,41.82885,0.65369,201208
397,37.775985,-119.674782,"(37.7759853399845, -119.6747818056264)","(38.0, -119.375)",19088,2012-06-16,2012-11-09,POINT (-119.6747818056264 37.7759853399845),POINT (-119.67478 37.77599),1704.925903,0.213126,0.362542,59.550182,0.420546,201209
398,37.775985,-119.674782,"(37.7759853399845, -119.6747818056264)","(38.0, -119.375)",19088,2012-06-16,2012-11-09,POINT (-119.6747818056264 37.7759853399845),POINT (-119.67478 37.77599),1704.925903,0.435755,0.338613,37.849804,0.551853,201210


### Step 4: Read data

In [36]:
for year in range(1991, 2023):
    out_file = out_dir + str(year) + '_fire_winds.csv'
    if os.path.exists(out_file):
        print('- File ' + str(year) + '_fire_winds.csv already exists. -')
    else:
        print(f'----- Starting on year {year} -----')
        df = read_clean_wind(year)
        df_final = add_wind_to_fire(df, fire_fix)
        print('--- Now saving file ' + out_file + ' ---')
        df_final.to_csv(out_file)

- File 1991_fire_winds.csv already exists. -
- File 1992_fire_winds.csv already exists. -
- File 1993_fire_winds.csv already exists. -
- File 1994_fire_winds.csv already exists. -
- File 1995_fire_winds.csv already exists. -
- File 1996_fire_winds.csv already exists. -
- File 1997_fire_winds.csv already exists. -
- File 1998_fire_winds.csv already exists. -
- File 1999_fire_winds.csv already exists. -
- File 2000_fire_winds.csv already exists. -
- File 2001_fire_winds.csv already exists. -
- File 2002_fire_winds.csv already exists. -
- File 2003_fire_winds.csv already exists. -
- File 2004_fire_winds.csv already exists. -
- File 2005_fire_winds.csv already exists. -
- File 2006_fire_winds.csv already exists. -
- File 2007_fire_winds.csv already exists. -
- File 2008_fire_winds.csv already exists. -
- File 2009_fire_winds.csv already exists. -
- File 2010_fire_winds.csv already exists. -
- File 2011_fire_winds.csv already exists. -
- File 2012_fire_winds.csv already exists. -
- File 201

``combine yearly files``

In [53]:
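import glob
files = glob.glob(out_dir+'*.csv')

# combined outputs that must not be folded back into the yearly data
skip = {'all_year_fire_wind.csv', 'fire_wind_processed.csv'}
frames = []

for f in files:
    print(f)
    # compare on the file name only, since glob returns the full path
    if os.path.basename(f) not in skip:
        frames.append(pd.read_csv(f))

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_all_years = pd.concat(frames, axis=0) if frames else pd.DataFrame()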

../data/processed/1993_fire_winds.csv
../data/processed/all_year_fire_wind.csv
../data/processed/1994_fire_winds.csv
../data/processed/2021_fire_winds.csv
../data/processed/1995_fire_winds.csv
../data/processed/2020_fire_winds.csv
../data/processed/1992_fire_winds.csv
../data/processed/1998_fire_winds.csv
../data/processed/2022_fire_winds.csv
../data/processed/1997_fire_winds.csv
../data/processed/1996_fire_winds.csv
../data/processed/1999_fire_winds.csv
../data/processed/1991_fire_winds.csv
../data/processed/2018_fire_winds.csv
../data/processed/2010_fire_winds.csv
../data/processed/2005_fire_winds.csv
../data/processed/2002_fire_winds.csv
../data/processed/2017_fire_winds.csv
../data/processed/2016_fire_winds.csv
../data/processed/2003_fire_winds.csv
../data/processed/2004_fire_winds.csv
../data/processed/2011_fire_winds.csv
../data/processed/2019_fire_winds.csv
../data/processed/2006_fire_winds.csv
../data/processed/2013_fire_winds.csv
../data/processed/2014_fire_winds.csv
../data/p

In [54]:
df_all_years = pd.read_csv(out_dir + 'all_year_fire_wind.csv',index_col=0)
df_all_years['year_month']=df_all_years['year_month'].astype(str)
df_all_years['year']=df_all_years['year_month'].str.slice(start=0,stop=4)
df_all_years['month']=df_all_years['year_month'].str.slice(start=4,stop=6)
df_all_years = df_all_years.drop(['lon','fire_index','lat','wind_lat_lon','geometry','u','v','Unnamed: 0'],axis=1).reset_index(drop=True)
df_all_years = df_all_years.rename(columns={'wdir':'fire_wdir',
                             'wspd':'fire_wspd'})

df_all_years.to_csv(out_dir + 'fire_wind_processed.csv')

In [70]:
df_all_years[df_all_years['fire_lat_lon'] == '(37.74854288820926, -119.7035432531625)']

Unnamed: 0,fire_lat_lon,ALARM_DATE,CONT_DATE,fire_centroid,GIS_ACRES,fire_wdir,fire_wspd,year_month,year,month
7240,"(37.74854288820926, -119.7035432531625)",2007-10-29,2007-12-08,POINT (-119.7035432531625 37.74854288820926),239.260773,24.478956,0.169349,200710,2007,10
7325,"(37.74854288820926, -119.7035432531625)",2007-10-29,2007-12-08,POINT (-119.7035432531625 37.74854288820926),239.260773,215.334213,0.138538,200711,2007,11
7326,"(37.74854288820926, -119.7035432531625)",2007-10-29,2007-12-08,POINT (-119.7035432531625 37.74854288820926),239.260773,178.254211,0.474964,200712,2007,12


In [65]:
df_all_years['fire_lat_lon'][0]

'(35.25539508507167, -119.5812970773511)'
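
Because `fire_lat_lon` is stored as the string form of a tuple, it has to be parsed before the coordinates can be used numerically. One possible approach, shown here as a sketch rather than something this notebook itself does, is `ast.literal_eval` from the standard library.

```python
import ast

# Label exactly as it appears in the fire_lat_lon column
label = '(35.25539508507167, -119.5812970773511)'

# literal_eval safely parses Python literals such as a tuple of floats
lat, lon = ast.literal_eval(label)

print(lat, lon)
```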