# Main script to clean wind data at the zip code, monthly level

Modules: N/A <br>
Authors: Matt Lyons and Trevor Johnson <br>
Email: mattslyons@berkeley.edu, trevorj@berkeley.edu <br>
Date modified: November 5, 2022 <br>

**Citations (data sources)**

``Wind data:`` 

1. ERA5 reanalysis on single levels (switched from MERRA-2 for finer-grained data): https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form


``Shapefiles for California ZIP codes (2010 census):``

2. https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2010&layergroup=ZIP+Code+Tabulation+Areas


``How to compute wind speed and direction:``

3. https://stackoverflow.com/questions/21484558/how-to-calculate-wind-direction-from-u-and-v-wind-components-in-r
4. https://github.com/blaylockbk/Ute_WRF/blob/master/functions/wind_calcs.py

``Wind speed and direction intuition:``

5. http://colaweb.gmu.edu/dev/clim301/lectures/wind/wind-uv
6. https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/intro-to-coordinate-reference-systems-python/

``To create maps of this wind data (also used for intuition on wind direction and wind speed):``

7. https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20calculate%20and%20plot%20wind%20speed%20using%20MERRA-2%20wind%20component%20data%20using%20Python


**Citations (persons)**
1. Cornelia Ilin (cornelia.ilin@berkeley.edu)

**Preferred environment**
1. Code written in Jupyter Notebooks

### Step 1: Import packages

In [3]:
import pandas as pd
import numpy as np
import netCDF4 as ncdf
import os
from datetime import date, timedelta, datetime
from math import pi

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import matplotlib.ticker as mticker

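# geography
# DistanceMetric lives in sklearn.metrics from scikit-learn 1.0 on;
# fall back to the older sklearn.neighbors location otherwise
try:
    from sklearn.metrics import DistanceMetric
except ImportError:
    from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric(
    'haversine'
)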

# ignore warnings
import warnings
warnings.filterwarnings(
    'ignore'
)

### Step 2: Define functions

``read_clean_wind``

In [53]:
data = ncdf.Dataset(os.path.join(data_path,'wind_2001-2004.nc'), mode='r')

In [54]:
data

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_OFFSET data model, file format NETCDF3):
    Conventions: CF-1.6
    history: 2022-11-06 08:23:20 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data6/adaptor.mars.internal-1667721456.1313322-4056-11-b2296c25-db61-4e35-b9f5-09cc66f025b4.nc /cache/tmp/b2296c25-db61-4e35-b9f5-09cc66f025b4-adaptor.mars.internal-1667718816.523943-4056-16-tmp.grib
    dimensions(sizes): longitude(42), latitude(40), time(35064)
    variables(dimensions): float32 longitude(longitude), float32 latitude(latitude), int32 time(time), int16 u10(time, latitude, longitude), int16 v10(time, latitude, longitude), int16 t2m(time, latitude, longitude)
    groups: 

In [58]:
lons = data.variables['longitude']
lats = data.variables['latitude']
times = data.variables['time']
# 10-meter eastward wind m/s
U10 = data.variables['u10']
# 10-meter northward wind m/s
V10 = data.variables['v10']


# Replace vals #
################
# replace _FillValues with NaNs:
U10_nans = U10[:]
V10_nans = V10[:]
_FillValueU10 = U10._FillValue
_FillValueV10 = V10._FillValue
U10_nans[U10_nans == _FillValueU10] = np.nan
V10_nans[V10_nans == _FillValueV10] = np.nan

# Add new vars #
################
# calculate wind speed
wspd = np.sqrt(U10_nans**2+V10_nans**2)

# calculate wind direction in radians
wdir = np.arctan2(V10_nans, U10_nans)

# transform wind direction from radians to degrees
#dir_to_degrees = np.mod(180+np.rad2deg(np.arctan2(V10_nans, U10_nans)), 360) # direction the wind is blowing from (meteorological convention), as an angle counter-clockwise from east
wdir_to_degrees = np.mod(np.rad2deg(wdir), 360) # direction the wind is blowing towards (oceanographic convention), as an angle counter-clockwise from east (0 = east, 90 = north), not a compass bearing; see here: https://www.esri.com/arcgis-blog/products/product/analytics/displaying-speed-and-direction-symbology-from-u-and-v-vectors/


## transform to df ##
#####################
# create an empty df for wind speed and direction with size len(lats) x len(lons) 
#df_wdir2m = pd.DataFrame(index=lats[:], columns=lons[:])   
#df_wspd2m = pd.DataFrame(index=lats[:], columns=lons[:])

# same for 10m
df_wdir10m = pd.DataFrame(index=lats[:], columns=lons[:])   
df_wspd10m = pd.DataFrame(index=lats[:], columns=lons[:])

# create an empty df for u and v components with size len(lats) x len(lons) 
# df_u2m = pd.DataFrame(index=lats[:], columns=lons[:])
# df_v2m = pd.DataFrame(index=lats[:], columns=lons[:])

# likewise with 10m
df_u10m = pd.DataFrame(index=lats[:], columns=lons[:])
df_v10m = pd.DataFrame(index=lats[:], columns=lons[:])
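
As a cross-check on the direction conventions used in the cell above, the following sketch computes speed and three direction variants from a single (u, v) pair. The function name `wind_from_uv` and its dictionary keys are hypothetical and not part of this notebook. The formulas are the usual ones: the mathematical angle uses `arctan2(v, u)`, while compass bearings swap the arguments to `arctan2(u, v)`. The first test pair is taken from the first row of the `df.head(10)` output later in this notebook.

```python
import numpy as np

def wind_from_uv(u, v):
    """Return speed and three direction variants in degrees (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.sqrt(u**2 + v**2)
    # mathematical angle: counter-clockwise from east, direction of travel
    # (this is the quantity the notebook stores as wdir)
    math_deg = np.mod(np.rad2deg(np.arctan2(v, u)), 360)
    # compass bearing the wind blows towards: clockwise from north
    toward_deg = np.mod(np.rad2deg(np.arctan2(u, v)), 360)
    # compass bearing the wind blows from (meteorological): the opposite bearing
    from_deg = np.mod(toward_deg + 180, 360)
    return {
        'wspd': speed,
        'wdir_math': math_deg,
        'wdir_toward': toward_deg,
        'wdir_from': from_deg,
    }

# u and v from the first row of the notebook's output table
first = wind_from_uv(1.316132, -4.17089)
# a pure westerly: air moving due east
westerly = wind_from_uv(1.0, 0.0)
print(first['wspd'], first['wdir_math'])
print(westerly['wdir_math'], westerly['wdir_toward'], westerly['wdir_from'])
```

If compass bearings are needed downstream, `wdir_toward` or `wdir_from` would be the columns to keep; the notebook's `wdir` corresponds to `wdir_math`.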

In [61]:
print(len(times))

35064


In [68]:
period_length = len(data.variables['time']) + 1

years = pd.date_range(start='2001-01-01',end='2005-01-01',periods=period_length)
years = years[:-1]
years.values

array(['2001-01-01T00:00:00.000000000', '2001-01-01T01:00:00.000000000',
       '2001-01-01T02:00:00.000000000', ...,
       '2004-12-31T21:00:00.000000000', '2004-12-31T22:00:00.000000000',
       '2004-12-31T23:00:00.000000000'], dtype='datetime64[ns]')
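
The `periods=period_length` construction above works because `pd.date_range` spaces `n + 1` points evenly between the two endpoints, which leaves `n` equal intervals once the last point is dropped. An equivalent construction, sketched here on the assumption that the file holds exactly one record per hour with no gaps, states the hourly frequency directly and makes that assumption easy to check.

```python
import pandas as pd

n_records = 35064  # len(times) printed above

# approach used in the notebook: n + 1 evenly spaced points, drop the last
spaced = pd.date_range(start='2001-01-01', end='2005-01-01', periods=n_records + 1)[:-1]

# explicit hourly frequency with the same number of records
hourly = pd.date_range(start='2001-01-01', periods=n_records, freq='h')

# the two agree only if the file really covers every hour of 2001-2004
print(spaced.equals(hourly), hourly[-1])
```

If the comparison ever comes back false for a file, the time axis is probably not a complete hourly series and the timestamps should be decoded from the file's own `time` variable instead.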

In [41]:
df_u10m

Unnamed: 0,-124.410004,-124.160004,-123.910004,-123.660004,-123.410004,-123.160004,-122.910004,-122.660004,-122.410004,-122.160004,...,-116.410004,-116.160004,-115.910004,-115.660004,-115.410004,-115.160004,-114.910004,-114.660004,-114.410004,-114.160004
42.279999,,,,,,,,,,,...,,,,,,,,,,
42.029999,,,,,,,,,,,...,,,,,,,,,,
41.779999,,,,,,,,,,,...,,,,,,,,,,
41.529999,,,,,,,,,,,...,,,,,,,,,,
41.279999,,,,,,,,,,,...,,,,,,,,,,
41.029999,,,,,,,,,,,...,,,,,,,,,,
40.779999,,,,,,,,,,,...,,,,,,,,,,
40.529999,,,,,,,,,,,...,,,,,,,,,,
40.279999,,,,,,,,,,,...,,,,,,,,,,
40.029999,,,,,,,,,,,...,,,,,,,,,,


In [35]:
lons = data.variables['longitude']
lons

<class 'netCDF4._netCDF4.Variable'>
float32 longitude(longitude)
    units: degrees_east
    long_name: longitude
unlimited dimensions: 
current shape = (42,)
filling on, default _FillValue of 9.969209968386869e+36 used

In [36]:
U10 = data.variables['u10']
U10_nans = U10[:]
U10_nans

masked_array(
  data=[[[ 2.33720944,  0.61031802, -0.54355705, ...,  0.72115695,
           0.07023966, -0.26645973],
         [ 2.45118532,  0.89787185, -0.41598772, ...,  0.82049373,
           0.25165998, -0.33494982],
         [ 2.09670988,  0.84977193, -0.37782148, ...,  0.71854282,
           0.23911218, -0.42592139],
         ...,
         [ 1.20372455,  1.47977622,  1.6894291 , ...,  0.71749717,
           0.90623705,  0.76246014],
         [ 1.32972541,  1.54878914,  1.71191058, ...,  0.78180467,
           0.78598727,  0.64743861],
         [ 1.36370904,  1.52473918,  1.62459878, ...,  0.54862466,
           0.6369821 ,  0.47490631]],

        [[ 1.8060191 , -0.15666645, -1.01723662, ...,  0.79069269,
           0.13402432, -0.15509798],
         [ 2.174088  ,  0.40171079, -0.80340114, ...,  0.89630337,
           0.39648254, -0.1472556 ],
         [ 1.99841875,  0.51986927, -0.75007298, ...,  0.80794592,
           0.4351716 , -0.2429326 ],
         ...,
         [ 0.6730570

In [72]:
U10_nans[0].shape

(40, 42)
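
The `masked_array` output above shows that netCDF4 returns a masked array when a variable is sliced, so missing cells may already be masked before the `_FillValue` comparison runs. A minimal sketch of converting a masked array into a plain float array with NaNs, using a small hand-built array in place of the real `u10` data (the values are illustrative only):

```python
import numpy as np

# stand-in for U10[:]: a (time, lat, lon) masked array with one missing cell
toy = np.ma.masked_array(
    data=np.array([[[1.5, -2.0], [0.25, 9999.0]]]),
    mask=np.array([[[False, False], [False, True]]]),
)

# filled() returns an ordinary ndarray; masked cells become NaN
plain = np.ma.filled(toy.astype(float), np.nan)

print(type(plain).__name__, int(np.isnan(plain).sum()))
```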

In [9]:
in_dir = 'C:\\Users\\matts\\Downloads\\wind_data_download\\ERA5\\'

In [3]:
# testing file name manipulation to get years
for file in os.listdir(in_dir):
    temp_file = file[5:-3]
    start_year = temp_file[0:4]
    end_year = temp_file[-4:]
    years = int(end_year) - int(start_year) + 1
    print("File",file,"starts with",start_year,"and ends with",end_year,"spanning",years,"year(s).")

File wind_2001-2004.nc starts with 2001 and ends with 2004 spanning 4 year(s).
File wind_2005-2008.nc starts with 2005 and ends with 2008 spanning 4 year(s).
File wind_2009-2012.nc starts with 2009 and ends with 2012 spanning 4 year(s).
File wind_2013-2016.nc starts with 2013 and ends with 2016 spanning 4 year(s).
File wind_2017.nc starts with 2017 and ends with 2017 spanning 1 year(s).
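
The slicing above depends on every file name being exactly `wind_`, then a year or year range, then `.nc`. A regular expression makes that assumption explicit and fails loudly on anything else in the folder. This is only a sketch; `parse_years` is a hypothetical helper, not part of the notebook.

```python
import re

# wind_<start>[-<end>].nc, e.g. wind_2001-2004.nc or wind_2017.nc
_PATTERN = re.compile(r'^wind_(\d{4})(?:-(\d{4}))?\.nc$')

def parse_years(filename):
    """Return (start_year, end_year) as ints; end defaults to start."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f'unexpected file name: {filename}')
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return start, end

for name in ['wind_2001-2004.nc', 'wind_2017.nc']:
    start, end = parse_years(name)
    print(name, start, end, end - start + 1)
```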


In [4]:
for file in os.listdir(in_dir):

    data = ncdf.Dataset(in_dir + file, mode='r')

    temp_file = file[5:-3]
    start_year = temp_file[0:4]
    end_year = temp_file[-4:]
    years = int(end_year) - int(start_year) + 1
    print("File",file,"starts with",start_year,"and ends with",end_year,"spanning",years,"year(s).")

    period_length = len(data.variables['time']) + 1

    years = pd.date_range(start=start_year+'-01-01',end=str(int(end_year)+1)+'-01-01',periods=period_length)
    years = years[:-1]
    print(years.values)
    
    print("Time array and data array lengths match:",len(data.variables['time'])==len(years))

File wind_2001-2004.nc starts with 2001 and ends with 2004 spanning 4 year(s).
['2001-01-01T00:00:00.000000000' '2001-01-01T01:00:00.000000000'
 '2001-01-01T02:00:00.000000000' ... '2004-12-31T21:00:00.000000000'
 '2004-12-31T22:00:00.000000000' '2004-12-31T23:00:00.000000000']
Time array and data array lengths match: True
File wind_2005-2008.nc starts with 2005 and ends with 2008 spanning 4 year(s).
['2005-01-01T00:00:00.000000000' '2005-01-01T01:00:00.000000000'
 '2005-01-01T02:00:00.000000000' ... '2008-12-31T21:00:00.000000000'
 '2008-12-31T22:00:00.000000000' '2008-12-31T23:00:00.000000000']
Time array and data array lengths match: True
File wind_2009-2012.nc starts with 2009 and ends with 2012 spanning 4 year(s).
['2009-01-01T00:00:00.000000000' '2009-01-01T01:00:00.000000000'
 '2009-01-01T02:00:00.000000000' ... '2012-12-31T21:00:00.000000000'
 '2012-12-31T22:00:00.000000000' '2012-12-31T23:00:00.000000000']
Time array and data array lengths match: True
File wind_2013-2016.nc st

In [5]:
print(years[24].day)

2


In [43]:
# code to test over-time for function
# do not run unless testing!

# create empty df
df = pd.DataFrame()

# create list to store dfs
# previous version concatenated--this is super inefficient if repeating
df_list = []

for file in os.listdir(in_dir):

    print("Processing file",file)

    ## read .nc file ##
    ###################
    data = ncdf.Dataset(in_dir + file, mode='r')

    temp_file = file[5:-3]
    start_year = temp_file[0:4]
    end_year = temp_file[-4:]
    num_years = int(end_year) - int(start_year) + 1
    # print("File",file,"starts with",start_year,"and ends with",end_year,"spanning",years,"year(s).")

    period_length = len(data.variables['time']) + 1

    years = pd.date_range(start=start_year+'-01-01',end=str(int(end_year)+1)+'-01-01',periods=period_length)
    years = years[:-1]

    # grab vars of interest ##
    ##########################
    # longitude and latitude
    lons = data.variables['longitude']
    lats = data.variables['latitude']
    # 10-meter eastward wind m/s
    U10 = data.variables['u10']
    # 10-meter northward wind m/s
    V10 = data.variables['v10']
    # 2-meter temperature measure
#     T2M = data.variables['t2m']

    # Replace vals #
    ################
    # replace _FillValues with NaNs:
    U10_nans = U10[:]
    V10_nans = V10[:]
    _FillValueU10 = U10._FillValue
    _FillValueV10 = V10._FillValue
    U10_nans[U10_nans == _FillValueU10] = np.nan
    V10_nans[V10_nans == _FillValueV10] = np.nan

    # Add new vars #
    ################
    # calculate wind speed
    wspd = np.sqrt(U10_nans**2+V10_nans**2)

    # calculate wind direction in radians
    wdir = np.arctan2(V10_nans, U10_nans)
   
    # transform wind direction from radians to degrees
    #dir_to_degrees = np.mod(180+np.rad2deg(np.arctan2(V10_nans, U10_nans)), 360) # direction the wind is blowing from (meteorological convention), as an angle counter-clockwise from east
    wdir_to_degrees = np.mod(np.rad2deg(wdir), 360) # direction the wind is blowing towards (oceanographic convention), as an angle counter-clockwise from east (0 = east, 90 = north), not a compass bearing; see here: https://www.esri.com/arcgis-blog/products/product/analytics/displaying-speed-and-direction-symbology-from-u-and-v-vectors/

    for h_index in range(200):

        hour_timestamp = years[h_index]

        if hour_timestamp.day == 1 and hour_timestamp.hour == 0:
            print("Now processing",hour_timestamp.strftime("%m/%d/%Y"))

        ## transform to df ##
        #####################
        # create an empty df for wind speed and direction with size len(lats) x len(lons) 
        df_wdir = pd.DataFrame(index=lats[:], columns=lons[:])   
        df_wspd = pd.DataFrame(index=lats[:], columns=lons[:])

        # create an empty df for u and v components with size len(lats) x len(lons) 
        df_u = pd.DataFrame(index=lats[:], columns=lons[:])
        df_v = pd.DataFrame(index=lats[:], columns=lons[:])

        # populate each row in the empty df above with the wdir_meteo and wspd data and u and v components
        for idx, idx_val in enumerate(df_wdir.index):
            df_wdir.loc[idx_val, :] = wdir_to_degrees[h_index][idx]
            df_wspd.loc[idx_val, :] = wspd[h_index][idx]
            df_u.loc[idx_val, :] = U10_nans[h_index][idx]
            df_v.loc[idx_val, :] = V10_nans[h_index][idx]

        # add index (latitude) as column
        df_wdir.reset_index(
            drop=False,
            inplace=True
        )

        df_wdir.rename(
            columns={'index':'lat'},
            inplace=True
        )


        df_wspd.reset_index(
            drop=False,
            inplace=True
        )

        df_wspd.rename(
            columns={'index':'lat'},
            inplace=True
        )

        df_u.reset_index(
            drop=False,
            inplace=True
        )

        df_u.rename(
            columns={'index':'lat'},
            inplace=True
        )

        df_v.reset_index(
            drop=False,
            inplace=True
        )

        df_v.rename(
            columns={'index':'lat'},
            inplace=True
        )

        # transform from wide to long
        df_wdir = pd.melt(
            df_wdir, id_vars='lat',
            var_name='lon',
            value_vars=lons[:],
            value_name='wdir'
        )

        df_wspd = pd.melt(
            df_wspd,
            id_vars='lat',
            var_name='lon',
            value_vars=lons[:],
            value_name='wspd'
        )

        df_u = pd.melt(
            df_u, id_vars='lat',
            var_name='lon',
            value_vars=lons[:],
            value_name='u'
        )

        df_v = pd.melt(
            df_v, id_vars='lat',
            var_name='lon',
            value_vars=lons[:],
            value_name='v'
        )

        # merge df_wdir and df_wspd
        df_temp1 = df_wdir.merge(
            df_wspd,
            on=['lat', 'lon'],
            how='left'
        )

        # merge df_u and df_v
        df_temp2 = df_u.merge(
            df_v,
            on=['lat', 'lon'],
            how='left'
        )

        # merge df_temp1 and df_temp2
        df_temp = df_temp2.merge(
            df_temp1,
            on=['lat', 'lon'],
            how='left'
        )
        
        df_temp = df_temp.round(6)

        df_temp['Datetime'] = hour_timestamp

        df_list.append(df_temp)

# assemble final df
df = pd.concat(df_list)

Processing file wind_2001-2004.nc
Now processing 01/01/2001
Processing file wind_2005-2008.nc
Now processing 01/01/2005
Processing file wind_2009-2012.nc
Now processing 01/01/2009
Processing file wind_2013-2016.nc
Now processing 01/01/2013
Processing file wind_2017.nc
Now processing 01/01/2017
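
The test loop above builds four DataFrames, four melts, and three merges for every hour, which is likely to be slow across 17 years of hourly data. A vectorized alternative, swapped in here as a sketch and not the notebook's method, flattens the whole `(time, lat, lon)` array at once using NumPy index grids. `grid_to_long` is a hypothetical name, the toy inputs are made up, and the row order differs from the melt-based version (longitude varies fastest here).

```python
import numpy as np
import pandas as pd

def grid_to_long(u, v, lats, lons, times):
    """Flatten (time, lat, lon) arrays into one long DataFrame (hypothetical helper).

    Masked inputs should be filled with NaN first; np.asarray ignores the mask.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # indexing='ij' keeps the (time, lat, lon) axis order of the inputs
    t_idx, lat_idx, lon_idx = np.meshgrid(
        np.arange(len(times)), np.arange(len(lats)), np.arange(len(lons)),
        indexing='ij',
    )
    return pd.DataFrame({
        'lat': np.asarray(lats)[lat_idx.ravel()],
        'lon': np.asarray(lons)[lon_idx.ravel()],
        'u': u.ravel(),
        'v': v.ravel(),
        # same formulas as the notebook: angle counter-clockwise from east, and speed
        'wdir': np.mod(np.rad2deg(np.arctan2(v, u)), 360).ravel(),
        'wspd': np.sqrt(u**2 + v**2).ravel(),
        'Datetime': pd.DatetimeIndex(times)[t_idx.ravel()],
    })

# toy grid: 2 hours x 3 latitudes x 2 longitudes
toy_lats = [42.0, 41.75, 41.5]
toy_lons = [-124.0, -123.75]
toy_times = pd.date_range('2001-01-01', periods=2, freq='h')
toy_u = np.arange(12, dtype=float).reshape(2, 3, 2)
toy_v = np.zeros((2, 3, 2))

toy_df = grid_to_long(toy_u, toy_v, toy_lats, toy_lons, toy_times)
print(toy_df.head())
```

The design choice is to avoid Python-level loops entirely: each output column is a single flattened array, so the cost is dominated by memory rather than by per-hour DataFrame construction.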


In [29]:
lons = data.variables['longitude']
lonsna = lons[:]
print(lonsna)

[-124.41 -124.16 -123.91 -123.66 -123.41 -123.16 -122.91 -122.66 -122.41
 -122.16 -121.91 -121.66 -121.41 -121.16 -120.91 -120.66 -120.41 -120.16
 -119.91 -119.66 -119.41 -119.16 -118.91 -118.66 -118.41 -118.16 -117.91
 -117.66 -117.41 -117.16 -116.91 -116.66 -116.41 -116.16 -115.91 -115.66
 -115.41 -115.16 -114.91 -114.66 -114.41 -114.16]


In [44]:
# transform vars
df['lat'] = df.lat.astype(float)
df['lon'] = df.lon.astype(float)

In [45]:
df.head(10)

Unnamed: 0,lat,lon,u,v,wdir,wspd,Datetime
0,42.279999,-124.410004,1.316132,-4.17089,287.513185,4.373617,2001-01-01
1,42.029999,-124.410004,1.720276,-4.124691,292.639445,4.469052,2001-01-01
2,41.779999,-124.410004,2.337209,-4.626282,296.80302,5.183149,2001-01-01
3,41.529999,-124.410004,2.451185,-5.043875,295.918485,5.607939,2001-01-01
4,41.279999,-124.410004,2.09671,-5.050475,292.545832,5.468408,2001-01-01
5,41.029999,-124.410004,1.867712,-5.040875,290.33035,5.375758,2001-01-01
6,40.779999,-124.410004,1.367369,-4.978476,285.357931,5.16284,2001-01-01
7,40.529999,-124.410004,1.239799,-4.916677,284.152774,5.070584,2001-01-01
8,40.279999,-124.410004,1.855165,-5.928259,287.37677,6.211755,2001-01-01
9,40.029999,-124.410004,3.085372,-7.120438,293.427669,7.760165,2001-01-01


In [49]:
def read_clean_wind():
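    '''Read every ERA5 .nc file in in_dir and return one long DataFrame.

    Columns: lat, lon, u, v, wdir, wspd, Datetime (one row per grid cell per hour).
    '''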
    # create empty df
    df = pd.DataFrame()
    
    # create list to store dfs
    # previous version concatenated--this is super inefficient if repeating
    df_list = []

    for file in os.listdir(in_dir):
        
        print("Processing file",file)
        
        ## read .nc file ##
        ###################
        data = ncdf.Dataset(in_dir + file, mode='r')

        temp_file = file[5:-3]
        start_year = temp_file[0:4]
        end_year = temp_file[-4:]
        num_years = int(end_year) - int(start_year) + 1
        # print("File",file,"starts with",start_year,"and ends with",end_year,"spanning",years,"year(s).")
        
        period_length = len(data.variables['time']) + 1

        years = pd.date_range(start=start_year+'-01-01',end=str(int(end_year)+1)+'-01-01',periods=period_length)
        years = years[:-1]
        

        # grab vars of interest ##
        ##########################
        # longitude and latitude
        lons = data.variables['longitude']
        lats = data.variables['latitude']
        # 10-meter eastward wind m/s
        U10 = data.variables['u10']
        # 10-meter northward wind m/s
        V10 = data.variables['v10']
        # 2-meter temperature measure
        T2M = data.variables['t2m']

        # Replace vals #
        ################
        # replace _FillValues with NaNs:
        U10_nans = U10[:]
        V10_nans = V10[:]
        T2M_nans = T2M[:]
        _FillValueU10 = U10._FillValue
        _FillValueV10 = V10._FillValue
        _FillValueT2M = T2M._FillValue
        U10_nans[U10_nans == _FillValueU10] = np.nan
        V10_nans[V10_nans == _FillValueV10] = np.nan
        T2M_nans[T2M_nans == _FillValueT2M] = np.nan
        
        
        # Add new vars #
        ################
        # calculate wind speed
        wspd = np.sqrt(U10_nans**2+V10_nans**2)

        # calculate wind direction in radians
        wdir = np.arctan2(V10_nans, U10_nans)

        # transform wind direction from radians to degrees
        #dir_to_degrees = np.mod(180+np.rad2deg(np.arctan2(V10_nans, U10_nans)), 360) # direction the wind is blowing from (meteorological convention), as an angle counter-clockwise from east
        wdir_to_degrees = np.mod(np.rad2deg(wdir), 360) # direction the wind is blowing towards (oceanographic convention), as an angle counter-clockwise from east (0 = east, 90 = north), not a compass bearing; see here: https://www.esri.com/arcgis-blog/products/product/analytics/displaying-speed-and-direction-symbology-from-u-and-v-vectors/

        
        for h_index in range(len(years)):

            hour_timestamp = years[h_index]

            if hour_timestamp.day == 1 and hour_timestamp.hour == 0:
                print("Now processing",hour_timestamp.strftime("%m/%d/%Y"))

            ## transform to df ##
            #####################
            # create an empty df for wind speed and direction with size len(lats) x len(lons) 
            df_wdir = pd.DataFrame(index=lats[:], columns=lons[:])   
            df_wspd = pd.DataFrame(index=lats[:], columns=lons[:])

            # create an empty df for u and v components with size len(lats) x len(lons) 
            df_u = pd.DataFrame(index=lats[:], columns=lons[:])
            df_v = pd.DataFrame(index=lats[:], columns=lons[:])

            # populate each row in the empty df above with the wdir_meteo and wspd data and u and v components
            for idx, idx_val in enumerate(df_wdir.index):
                df_wdir.loc[idx_val, :] = wdir_to_degrees[h_index][idx]
                df_wspd.loc[idx_val, :] = wspd[h_index][idx]
                df_u.loc[idx_val, :] = U10_nans[h_index][idx]
                df_v.loc[idx_val, :] = V10_nans[h_index][idx]

            # add index (latitude) as column
            df_wdir.reset_index(
                drop=False,
                inplace=True
            )

            df_wdir.rename(
                columns={'index':'lat'},
                inplace=True
            )


            df_wspd.reset_index(
                drop=False,
                inplace=True
            )

            df_wspd.rename(
                columns={'index':'lat'},
                inplace=True
            )

            df_u.reset_index(
                drop=False,
                inplace=True
            )

            df_u.rename(
                columns={'index':'lat'},
                inplace=True
            )

            df_v.reset_index(
                drop=False,
                inplace=True
            )

            df_v.rename(
                columns={'index':'lat'},
                inplace=True
            )

            # transform from wide to long
            df_wdir = pd.melt(
                df_wdir, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='wdir'
            )

            df_wspd = pd.melt(
                df_wspd,
                id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='wspd'
            )

            df_u = pd.melt(
                df_u, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='u'
            )

            df_v = pd.melt(
                df_v, id_vars='lat',
                var_name='lon',
                value_vars=lons[:],
                value_name='v'
            )

            # merge df_wdir and df_wspd
            df_temp1 = df_wdir.merge(
                df_wspd,
                on=['lat', 'lon'],
                how='left'
            )

            # merge df_u and df_v
            df_temp2 = df_u.merge(
                df_v,
                on=['lat', 'lon'],
                how='left'
            )

            # merge df_temp1 and df_temp2
            df_temp = df_temp2.merge(
                df_temp1,
                on=['lat', 'lon'],
                how='left'
            )

            df_temp = df_temp.round(6)

            df_temp['Datetime'] = hour_timestamp

            df_list.append(df_temp)
            
    # assemble final df
    df = pd.concat(df_list)
    
    # transform vars
    df['lat'] = df.lat.astype(float)
    df['lon'] = df.lon.astype(float)
    
    return df

### Step 3: Read data

``wind``

In [50]:
df = read_clean_wind()
df.head(2)

Processing file wind_2001-2004.nc
Now processing 01/01/2001
Now processing 02/01/2001
Now processing 03/01/2001
Now processing 04/01/2001
Now processing 05/01/2001
Now processing 06/01/2001
Now processing 07/01/2001
Now processing 08/01/2001
Now processing 09/01/2001
Now processing 10/01/2001
Now processing 11/01/2001
Now processing 12/01/2001
Now processing 01/01/2002
Now processing 02/01/2002
Now processing 03/01/2002
Now processing 04/01/2002
Now processing 05/01/2002
Now processing 06/01/2002
Now processing 07/01/2002
Now processing 08/01/2002
Now processing 09/01/2002
Now processing 10/01/2002
Now processing 11/01/2002
Now processing 12/01/2002
Now processing 01/01/2003
Now processing 02/01/2003
Now processing 03/01/2003
Now processing 04/01/2003
Now processing 05/01/2003
Now processing 06/01/2003
Now processing 07/01/2003
Now processing 08/01/2003
Now processing 09/01/2003
Now processing 10/01/2003
Now processing 11/01/2003
Now processing 12/01/2003
Now processing 01/01/2004
Now 

Unnamed: 0,lat,lon,u,v,wdir,wspd,Datetime
0,42.279999,-124.410004,1.316132,-4.17089,287.513185,4.373617,2001-01-01
1,42.029999,-124.410004,1.720276,-4.124691,292.639445,4.469052,2001-01-01


In [51]:
len(df)

250346880
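
That row count is what the grid and time span predict: 40 latitudes by 42 longitudes, with one row per cell for every hour from 2001 through 2017 (the years covered by the five input files). A quick check of the arithmetic:

```python
import pandas as pd

n_lat, n_lon = 40, 42  # grid shape reported by U10_nans[0].shape

# hourly timestamps from 2001-01-01 00:00 up to 2017-12-31 23:00
hours = pd.date_range(start='2001-01-01', end='2017-12-31 23:00', freq='h')

expected_rows = n_lat * n_lon * len(hours)
print(len(hours), expected_rows)  # 149016 250346880
```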

With roughly 250 million rows, the table is too large to keep working on in pandas, so save it to disk and resume in Spark.

In [52]:
compression_opts = dict(method='zip',archive_name='out.csv')

In [54]:
df.to_csv('I:\\.shortcut-targets-by-id\\11wLy1WKwOTcthBs1rpfEzkqax2BZG-6E\\W210_Capstone\\Data\\wind\\era5_all_wind_data_rounded.csv')

OSError: [Errno 22] Invalid argument
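
The traceback does not say which argument was invalid, so the cause is not established here. One thing worth trying, offered as an assumption and not a diagnosis, is to write to a local directory first and to pass the `compression_opts` dictionary defined above, which the `to_csv` call never uses. The sketch below does this with a tiny stand-in frame and a temporary directory; the file names are placeholders.

```python
import os
import tempfile
import pandas as pd

# tiny stand-in for the 250-million-row frame
sample = pd.DataFrame({
    'lat': [42.279999, 42.029999],
    'lon': [-124.410004, -124.410004],
    'wspd': [4.373617, 4.469052],
})

compression_opts = dict(method='zip', archive_name='out.csv')

out_dir = tempfile.mkdtemp()                 # placeholder for a local folder
out_path = os.path.join(out_dir, 'out.zip')

# index=False leaves the unnamed row-number column out of the file
sample.to_csv(out_path, index=False, compression=compression_opts)

# pandas infers zip compression from the .zip extension when reading back
round_trip = pd.read_csv(out_path)
print(round_trip.shape)
```

Once the local write succeeds, the archive can be copied to the shared drive as a separate step, which also separates any path or drive problem from the pandas call itself.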