# Background
Wind power is one of the oldest and most recognizable types of renewable energy. Many readers may remember watching wind turbines through the window on flights or long rides in a car or train. That is worth noting: we rarely see them otherwise, because they sit in areas we tend to overlook. Wind turbines cannot be placed everywhere; they can stand up to 400 feet tall, and their vibrations can adversely affect humans and animals.

Despite wind power's reliability, the approximately 54,000 turbines in the United States provided just 7.3% of the country’s electricity in 2019. As shown in Figure 1, turbines are used in relatively few areas, especially considering that the United States has more than 1.75 million square miles of uninhabited land as of the last census.

Of course, not all of this uninhabited land is ideal for installations. National parks, military bases, geographic entities, and more make wind power untenable in certain areas. Additionally, there needs to be enough wind to rotate the large rotors. 

### Research Questions


*   Where are the viable locations for the construction of wind turbines?
*   How much energy will be generated by the implementation of wind turbines in specific locations?
*   How will replacing fossil fuels with wind power affect people in the future?

# Summary 
### Motivation
Our motivation for this project is to improve the environment and reduce reliance on fossil fuels by locating viable areas where wind turbines can be constructed and calculating their potential energy production. This matters because wind turbines are often disliked by nearby residents and disturb local habitats. We need to find locations that are outside large cities, away from protected areas, away from established wind turbines, and have a large amount of wind. Identifying land that meets all of these criteria can be difficult.

### What This Project Is
This project is the first step in an extended process that (hopefully) ends in the proliferation of wind turbines in the United States. It estimates optimal areas for wind turbine placement in the continental United States based on reasonable constraints. It also includes a short analysis of some of the benefits of placing one or more turbines in the highest-productivity location.

#### Identifying Turbine Placement Areas
To identify viable locations, we started with the continental United States and removed the areas where turbines could not be placed. The areas disqualified from turbine placement in this analysis are as follows:

*   Areas with existing turbines
*   Major cities (population of 10,000 or greater)
*   Designated protected areas

These disqualified areas are self-explanatory. It is not reasonable to place new turbines where turbines already exist, in or around large populations, or in national parks, military bases, protected wildlife areas, and so on. Our analysis determined potential turbine areas by systematically eliminating the disqualifying areas from contention, which reduced the entire area of the United States to only those areas where turbine placement is practical.





# Data Sources
### [USGS United States Wind Turbines Dataset](https://eerscmap.usgs.gov/uswtdb/data/)
The USGS is one of the most comprehensive hosts of information about geographical data in the United States. Each row contains information like turbine location, size, energy output, rotor size, turbine manufacturer, turbine model, year online, and more.

### [ArcGIS United States Major Cities Dataset](https://hub.arcgis.com/datasets/esri::usa-major-cities?geometry=-107.388%2C31.327%2C-105.081%2C31.737)
An inherent problem with wind turbines is that they are enormous. Size plays a role in our analysis because they are a form of disturbance to landscape (physically and visually). Because of their size, turbines can cause nausea in nearby humans due to the generation of low-frequency vibrations. This dataset provides us information on the location (latitude/longitude), city population, and general demographics. 

### [NOAA United States Annual Wind Speed Dataset](https://www.climate.gov/maps-data/dataset/average-wind-speeds-map-viewer)
Knowing areas with high wind speeds is vital for efficient wind turbines. We will be pulling from two datasets with information on the average north-south and west-east wind speeds throughout the world. Knowing the values, we can calculate median wind directions and speed at a resolution of 2.5 degrees. 

### [USGS United States Protected Areas Dataset](https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-download?qt-science_center_objects=0#qt-science_center_objects)
As mentioned previously, there are limits on where wind turbines can be located. The areas in this dataset are places in the United States where turbines cannot be constructed: national parks, wilderness areas, protected habitats, endangered species locations, etc. We remove areas with the GAP status '2 - managed for biodiversity - disturbance events suppressed' from the dataset.

### [United States Census Cartographic Boundary Dataset](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html)
The Cartographic Boundary Dataset is a collection of United States boundary files published by the Census Bureau. For this analysis, we use the state boundaries at 20m (1:20,000,000) resolution to cover the entire United States without overly detailed geometry. This dataset has a total of 52 rows and nine columns. The extra two rows come from the District of Columbia and Puerto Rico.

### lat_lon_windspeed.csv and lat_lon_windspeed_REDUCED_RESOLUTION.csv
These two files were created during this project and are imperative to the results and visualizations. They contain the latitude, longitude, and annual wind speed at geographic intervals across the U.S. The first file contains points approximately every 700 feet, and the second file is at ¼ the first file’s resolution.


# Installs

In [None]:
!pip install pyshp
!pip install netCDF4
!pip install pycrs
!pip install fiona
!pip install geopandas
!pip install shapely


# Imports

In [None]:
import os
import requests
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import geopandas as gpd
import shapefile
from descartes import PolygonPatch
import netCDF4
import pycrs

# Files

In [None]:
turbine_shp = 'source/Turbine_SHP/uswtdb_v3_3_20210114.shp'
cities_shp = 'source/USA_Major_Cities_SHP/0c5a2fa1-3463-4fc7-99e5-e206023a7e682020313-1-nmlntc.mln9c.shp'
boundaries_shp = 'source/US_State_Boundaries_SHP/cb_2018_us_state_20m.shp'
annual_wind_uwnd = 'source/Annual_Wind_speed/uwnd.sig995.2020.nc'
annual_wind_vwnd = 'source/Annual_Wind_speed/vwnd.sig995.2020.nc'
protected_shp = 'source/US_PADS_SHP/US_PADS_SHP_CON.shp'

assert os.path.exists(turbine_shp), "Turbine SHP file does not exist."
assert os.path.exists(cities_shp), "Cities over 10,000 SHP file does not exist."
assert os.path.exists(boundaries_shp), "US State Boundaries SHP file does not exist."
assert os.path.exists(annual_wind_uwnd), "Annual Wind Speed uwnd NetCDF file does not exist."
assert os.path.exists(annual_wind_vwnd), "Annual Wind Speed vwnd NetCDF file does not exist."
assert os.path.exists(protected_shp), "Protected Areas SHP file does not exist."

# Global Variables

In [None]:
PASS_STATES = set(['HI', 'AK', 'PR', 'GU'])
PADS_PASS_STATES = set(['Alaska', 'Puerto Rico', 'Unknown Federal', 'United States Virgin Islands', 'Hawaii', 'U.S. Minor Outlying Islands', 'Mariana Islands', 'American Samoa', 'Federated States of Micronesia', 'Palau'])
PADS_PASS_DES = set(['2 - managed for biodiversity - disturbance events suppressed'])
LAT_BTW_POINTS_MASTER = 0.0025
LON_BTW_POINTS_MASTER = 0.002975

# Turbine Data

The USWTDB is very well maintained and required little cleaning. At the time of our extraction, there were no missing values. However, the database replaces unknown rotor diameter (‘t_rd’) values with -9999.0. This value is essential because it determines the minimum distance at which other turbines can be placed. We therefore filtered out the rows without a valid ‘t_rd’ value, as they made up a small portion of the dataset (< 10%). We also filtered out turbines that are not in the continental U.S. For the remaining records, we created a ‘buffer_radius’ field, which contains a turbine’s minimum distance to other turbines. The quantity is equal to 7 times the rotor radius (half of ‘t_rd’), converted from meters to miles. Fields relevant to our analysis were ‘latitude,’ ‘longitude,’ ‘t_rd,’ and ‘buffer_radius.’ We collected, filtered, and extracted the shapefile data using PyShp and stored it in a pandas DataFrame for use in the project.

In [None]:
def get_USWTDB_data():
  turbine_sf = shapefile.Reader(turbine_shp)
  items = list()
  # keep only turbines located in the continental U.S.
  for record in turbine_sf.records():
      attributes = record.as_dict()
      if attributes['t_state'] not in PASS_STATES:
          items.append(attributes)
  turbine_df = pd.DataFrame(items)
  turbine_df.rename(columns={'xlong': 'longitude','ylat': 'latitude'}, inplace=True, errors='raise')
  # -9999.0 marks an unknown t_rd; drop those rows (.copy() avoids a SettingWithCopyWarning below)
  turbine_df = turbine_df[turbine_df['t_rd'] != -9999.0].copy()
  # buffer = 7 * (t_rd / 2), converted from meters to miles
  turbine_df['buffer_radius'] = (turbine_df['t_rd'] * 7 * 0.000621371) / 2
  return turbine_df

In [None]:
get_USWTDB_data().head()

Unnamed: 0,case_id,faa_ors,faa_asn,usgs_pr_id,eia_id,t_state,t_county,t_fips,p_name,p_year,p_tnum,p_cap,t_manu,t_model,t_cap,t_hh,t_rd,t_rsa,t_ttlh,retrofit,retrofit_y,t_conf_atr,t_conf_loc,t_img_date,t_img_srce,longitude,latitude,buffer_radius
14,3060912.0,19-028065,2014-WTE-4081-OE,-9999.0,-9999.0,IA,Story County,19169,30 MW Iowa DG Portfolio,2017.0,10.0,30.0,Nordex,AW125/3000,3000.0,87.5,125.0,12271.85,150.0,0.0,-9999.0,3.0,3.0,2017-05-13,Digital Globe,-93.51371,42.019119,0.27185
15,3063321.0,19-028135,2014-WTE-4087-OE,-9999.0,-9999.0,IA,Hardin County,19083,30 MW Iowa DG Portfolio,2017.0,10.0,30.0,Nordex,AW125/3000,3000.0,87.5,125.0,12271.85,150.0,0.0,-9999.0,3.0,3.0,2017-06-20,Digital Globe,-93.367798,42.49794,0.27185
16,3049500.0,19-028030,2014-WTE-4080-OE,-9999.0,-9999.0,IA,Story County,19169,30 MW Iowa DG Portfolio,2017.0,10.0,30.0,Nordex,AW125/3000,3000.0,87.5,125.0,12271.85,150.0,0.0,-9999.0,3.0,3.0,2017-05-13,Digital Globe,-93.515892,42.016373,0.27185
17,3063269.0,19-028130,2016-WTE-5934-OE,-9999.0,-9999.0,IA,Story County,19169,30 MW Iowa DG Portfolio,2017.0,10.0,30.0,Nordex,AW125/3000,3000.0,87.5,125.0,12271.85,150.0,0.0,-9999.0,3.0,3.0,2017-07-23,Digital Globe,-93.632835,41.882477,0.27185
18,3053390.0,19-028015,2015-WTE-6386-OE,-9999.0,-9999.0,IA,Boone County,19015,30 MW Iowa DG Portfolio,2017.0,10.0,30.0,Nordex,AW125/3000,3000.0,87.5,125.0,12271.85,150.0,0.0,-9999.0,3.0,3.0,2017-06-01,Digital Globe,-93.700424,41.977608,0.27185
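
As a quick check of the buffer arithmetic, the sketch below recomputes `buffer_radius` for the `t_rd` value of 125.0 that appears in the rows above. The names `METERS_TO_MILES` and `buffer_radius_miles` are illustrative only and are not part of the project code.

```python
METERS_TO_MILES = 0.000621371  # same conversion factor used in get_USWTDB_data()

def buffer_radius_miles(t_rd_m, multiplier=7):
    """Buffer = multiplier * (t_rd / 2), converted from meters to miles."""
    return t_rd_m * multiplier * METERS_TO_MILES / 2

radius = buffer_radius_miles(125.0)
print(round(radius, 5))  # 0.27185, matching the buffer_radius column above
```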


# City Data
The Major Cities Dataset is similarly well-maintained. There were no missing values, and we used the PyShp and pandas libraries to extract and transform the data from the shapefile. As above, we filtered the data to include only cities in the continental U.S. Important fields were ‘latitude,’ ‘longitude,’ and ‘POP_CLASS,’ which is a categorical designation for cities based on their populations. The data is stored in a pandas DataFrame for use in the project.

In [None]:
def get_cities_data():
  city_sf = shapefile.Reader(cities_shp)
  items = list()
  for record, shape in zip(city_sf.records(), city_sf.shapes()):
      attributes = record.as_dict()
      if attributes['ST'] not in PASS_STATES:
          attributes['latitude'] = shape.points[0][1]
          attributes['longitude'] = shape.points[0][0]
          items.append(attributes)
  cities_df = pd.DataFrame(items)
  return cities_df

In [None]:
get_cities_data().head()

Unnamed: 0,FID,NAME,CLASS,ST,STFIPS,PLACEFIPS,CAPITAL,POP_CLASS,POPULATION,POP2010,WHITE,BLACK,AMERI_ES,ASIAN,HAWN_PI,HISPANIC,OTHER,MULT_RACE,MALES,FEMALES,AGE_UNDER5,AGE_5_9,AGE_10_14,AGE_15_19,AGE_20_24,AGE_25_34,AGE_35_44,AGE_45_54,AGE_55_64,AGE_65_74,AGE_75_84,AGE_85_UP,MED_AGE,MED_AGE_M,MED_AGE_F,HOUSEHOLDS,AVE_HH_SZ,HSEHLD_1_M,HSEHLD_1_F,MARHH_CHD,MARHH_NO_C,MHH_CHILD,FHH_CHILD,FAMILIES,AVE_FAM_SZ,HSE_UNITS,VACANT,OWNER_OCC,RENTER_OCC,latitude,longitude
0,1,Ammon,city,ID,16,1601990,,6,15181,13816,13002,73,67,113,9,884,307,245,6750,7066,1468,1503,1313,1058,734,2031,1767,1446,1136,665,486,209,29.6,28.0,30.8,4476,3.05,457,648,1618,1131,106,335,3352,3.61,4747,271,3205,1271,43.475792,-111.954103
1,2,Blackfoot,city,ID,16,1607840,,6,11946,11899,9893,40,418,125,18,2192,1077,328,5907,5992,1239,1099,890,817,818,1799,1235,1330,1143,721,579,229,30.8,29.7,32.1,4229,2.74,563,690,1091,1081,174,381,2958,3.31,4547,318,2788,1441,43.193937,-112.345567
2,3,Boise City,city,ID,16,1608830,State,8,225405,205671,182991,3043,1404,6501,457,14606,5139,6136,101690,103981,13155,12933,12750,13959,16966,32135,27048,29595,24177,12176,7087,3690,35.3,34.4,36.5,85704,2.36,16605,18104,16708,21233,2414,5919,50647,2.97,92700,6996,52345,33359,43.599015,-116.23011
3,4,Burley,city,ID,16,1611260,,6,10727,10345,7984,45,103,74,5,3460,1795,339,5136,5209,1073,959,790,768,699,1445,1136,1134,935,679,464,263,30.9,29.6,32.3,3644,2.76,498,634,950,861,139,358,2499,3.37,3885,241,2183,1461,42.536741,-113.793293
4,5,Caldwell,city,ID,16,1612250,,7,53942,46237,35856,300,539,406,41,16347,7449,1646,22821,23416,4962,4397,3803,3779,3687,7571,5559,4744,3624,2296,1222,593,28.1,27.3,28.9,14895,3.0,1795,2250,4407,3113,686,1755,10776,3.51,16323,1428,9699,5196,43.661626,-116.685619


# PAD Data
PADS differs from the turbines and cities datasets because it contains areas (like national parks and military bases) rather than point locations (like wind turbines and cities). As such, PADS required a slightly modified approach. Like the others, the data are well-maintained and required little to no cleaning, and we were only concerned with the geographic quantities (not the attributes for each area). We used PyShp to extract the attributes, geographic polygons, and bounding boxes (rectangular geographic area approximations) into separate DataFrames. We also filtered these data to include only the continental U.S. and to exclude areas with the GAP status '2 - managed for biodiversity - disturbance events suppressed', which we treat as available for wind turbine construction.

The ‘bbox’ attribute is the most important in this dataset. We expanded the attribute by tabulating the centroid, east-west span, and north-south span of each bounding box. These quantities are necessary to rule out these areas from the areas of potential turbine placement.


In [None]:
def get_PADS_data():
  pad_sf = shapefile.Reader(protected_shp)

  pad_items = list()  # attribute records of the retained areas
  items = list()      # GeoJSON-style geometry dicts
  bbox_list = list()  # [min_lon, min_lat, max_lon, max_lat] per area
  states = []

  for record, shape in zip(pad_sf.records(), pad_sf.shapes()):
    attributes = record.as_dict()
    # skip the excluded GAP status and anything outside the continental U.S.
    if attributes['d_GAP_Sts'] in PADS_PASS_DES:
      continue
    if attributes['d_State_Nm'] in PADS_PASS_STATES:
      continue
    states.append(attributes['d_State_Nm'])
    items.append(shape.__geo_interface__)
    pad_items.append(attributes)
    bbox_list.append(shape.bbox)

  poly_df = pd.DataFrame(items)
  pad_df = pd.DataFrame(pad_items)
  bbox_df = pd.DataFrame(data={'name': states, 'bbox': bbox_list})
  poly_df['name'] = states
  return pad_df, poly_df, bbox_df

In [None]:
pad_df, poly_df, bbox_df = get_PADS_data()

In [None]:
pad_df.head(2)

Unnamed: 0,Category,d_Category,Own_Type,d_Own_Type,Own_Name,d_Own_Name,Loc_Own,Mang_Type,d_Mang_Typ,Mang_Name,d_Mang_Nam,Loc_Mang,Des_Tp,d_Des_Tp,Loc_Ds,Unit_Nm,Loc_Nm,State_Nm,d_State_Nm,Agg_Src,GIS_Src,Src_Date,GIS_Acres,Source_PAI,WDPA_Cd,Access,d_Access,Access_Src,Access_Dt,GAP_Sts,d_GAP_Sts,GAPCdSrc,GAPCdDt,IUCN_Cat,d_IUCN_Cat,IUCNCtSrc,IUCNCtDt,Date_Est,Comments,SHAPE_Leng,SHAPE_Area
0,Other,Other,FED,Federal,USBR,Bureau of Reclamation,United States Bureau of Reclamation,STAT,State,SPR,State Park and Recreation,Idaho Department of Parks and Recreation,SP,State Park,State Park,Lake Cascade State Park,Lake Cascade State Park - Blue Heron Campground,ID,Idaho,USGS_PADUS1_4Designation_USBR_PADUS1_3,IDPR_Managed_Land.shp,2006/03/15,9,,0,OA,Open Access,IDFG_PAD-US v.1.3.gdb,2016,3,3 - managed for multiple uses - subject to ext...,GAP,2011,Other Conservation Area,Other Conservation Area,GAP - Default,2018,1996,,943.478776,35110.679531
1,Other,Other,FED,Federal,USBR,Bureau of Reclamation,United States Bureau of Reclamation,STAT,State,SPR,State Park and Recreation,Idaho Department of Parks and Recreation,SP,State Park,State Park,Lake Cascade State Park,Lake Cascade State Park - Boulder Creek Campgr...,ID,Idaho,USGS_PADUS1_4Designation_USBR_PADUS1_3,IDPR_Managed_Land.shp,2006/03/15,18,,0,OA,Open Access,IDFG_PAD-US v.1.3.gdb,2016,3,3 - managed for multiple uses - subject to ext...,GAP,2011,Other Conservation Area,Other Conservation Area,GAP - Default,2018,1996,,1425.473005,73292.65313


In [None]:
poly_df.head()

Unnamed: 0,type,coordinates,name
0,Polygon,"[[(-116.05661287904096, 44.49500048372109), (-...",Idaho
1,Polygon,"[[(-116.09627998994497, 44.68758914187365), (-...",Idaho
2,Polygon,"[[(-116.11285006895555, 44.68253236619257), (-...",Idaho
3,Polygon,"[[(-116.05527337802039, 44.486643740358176), (...",Idaho
4,Polygon,"[[(-116.05674638184047, 44.526374244612384), (...",Idaho


In [None]:
bbox_df.head()

Unnamed: 0,name,bbox
0,Idaho,"[-116.05761007614535, 44.49301757309738, -116...."
1,Idaho,"[-116.10057304403412, 44.68397039672821, -116...."
2,Idaho,"[-116.11541553423595, 44.681168688617745, -116..."
3,Idaho,"[-116.05619337402062, 44.48639687833194, -116...."
4,Idaho,"[-116.0596684958905, 44.52476859867801, -116.0..."


In [None]:
def calc_bbox_centroid(bbox):
  """
  Returns:
    (longitude, latitude) of bbox centroid
  """
  return np.mean([bbox[0], bbox[2]]), np.mean([bbox[1], bbox[3]])

def north_south_span(bbox):
  """
  Returns:
    north-south span of bbox
  """
  return abs(bbox[1] - bbox[3])

def east_west_span(bbox):
  """
  Returns:
    east-west span of bbox
  """
  return abs(bbox[0] - bbox[2])

In [None]:
bbox_df['centroid'] = bbox_df.bbox.apply(calc_bbox_centroid)
bbox_df['ns_span'] = bbox_df.bbox.apply(north_south_span)
bbox_df['ew_span'] = bbox_df.bbox.apply(east_west_span)
bbox_df.head()

Unnamed: 0,name,bbox,centroid,ns_span,ew_span
0,Idaho,"[-116.05761007614535, 44.49301757309738, -116....","(-116.05574579513754, 44.49406382148888)",0.002092,0.003729
1,Idaho,"[-116.10057304403412, 44.68397039672821, -116....","(-116.09842651698955, 44.685783486279206)",0.003626,0.004293
2,Idaho,"[-116.11541553423595, 44.681168688617745, -116...","(-116.1138376304353, 44.683066523426575)",0.003796,0.003156
3,Idaho,"[-116.05619337402062, 44.48639687833194, -116....","(-116.05564615255929, 44.487266535949104)",0.001739,0.001094
4,Idaho,"[-116.0596684958905, 44.52476859867801, -116.0...","(-116.0572376830003, 44.52617750489104)",0.002818,0.004862


# Master Array
The master array is a two-dimensional NumPy array representing the United States. It is a matrix of latitude and longitude coordinates. There are 10,000 rows, with each row corresponding to an evenly incremented latitude between 25.0 and 50.0 degrees. There are 20,000 columns, with each column corresponding to an evenly incremented longitude between -126.0 and -66.5 degrees. The result is a matrix of 200,000,000 elements, where each element represents a geographic point in the U.S. This resolution results in approximately 700 feet between points, which is the minimum required distance between medium-to-large turbines.

The master array is created with every element initialized to 0. To determine the regions where turbine placement is feasible, the analysis iterates over the tabulated disqualifying areas (existing turbines, cities, and protected areas) together with their minimum buffers, finds the corresponding coordinates in the master array, and changes those elements from 0 to 1. As a result, each disqualifying area and its minimum buffer are set to 1 in the master array. Once all disqualifying areas have been labeled with 1s, the analysis tabulates the indices of all the remaining 0s. These indices are then converted to latitude-longitude coordinates, which represent the areas where wind turbines can be placed.

These valid locations from the master array are joined with the wind speed data. This join gives the average annual wind speed at each data point where wind turbine construction is feasible. The analysis sorts the dataset to find the yearly top-n wind speeds at valid locations to identify the optimal placements for future wind turbines.
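
The join and ranking described above can be sketched at toy scale. This is only an illustration under stated assumptions: it uses a nearest-grid-point lookup against a coarse 2.5-degree wind grid, which may differ from the join actually used in the project, and the wind values are randomly generated. The names `nearest_wind_speed` and `top_n_locations` are hypothetical.

```python
import numpy as np

# Toy coarse wind grid at 2.5-degree spacing (values are made up for illustration)
wind_lats = np.arange(25.0, 50.1, 2.5)
wind_lons = np.arange(-125.0, -64.9, 2.5)
rng = np.random.default_rng(0)
wind_speed = rng.uniform(0.0, 10.0, size=(wind_lats.size, wind_lons.size))

def nearest_wind_speed(lats, lons):
    """Look up the nearest coarse-grid wind speed for each fine-grid point."""
    lat_idx = np.rint((lats - wind_lats[0]) / 2.5).astype(int)
    lon_idx = np.rint((lons - wind_lons[0]) / 2.5).astype(int)
    # keep lookups inside the coarse grid
    lat_idx = np.clip(lat_idx, 0, wind_lats.size - 1)
    lon_idx = np.clip(lon_idx, 0, wind_lons.size - 1)
    return wind_speed[lat_idx, lon_idx]

def top_n_locations(lats, lons, n):
    """Return the n valid points with the highest looked-up wind speed."""
    speeds = nearest_wind_speed(lats, lons)
    order = np.argsort(speeds)[::-1][:n]
    return lats[order], lons[order], speeds[order]

# A handful of hypothetical valid (master == 0) coordinates
valid_lats = np.array([42.02, 35.10, 47.55, 30.25])
valid_lons = np.array([-93.51, -101.80, -110.00, -97.70])

best_lats, best_lons, best_speeds = top_n_locations(valid_lats, valid_lons, n=2)
print(best_lats, best_lons, best_speeds)
```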

This process uses exclusively numeric NumPy arrays. As a result, broadcasting and vectorization make this version of the analysis significantly faster, more accurate, and at a higher resolution than similar analyses using GeoPandas or other spatial data libraries.

The pseudocode to complete such an analysis is shown in Figure 2 below. While the actual analysis code addresses some particulars of the datasets, it is conceptually identical to the pseudocode.


In [None]:
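# Illustration only: a coarse master array with longitudes along the top
# and latitudes down the right-hand side. The 1s mark a disqualified block.
#
# -120 -110 -100 -90 -80
#       0 0 0 0 0 0     50
#       0 0 0 0 0 0     40
#       0 0 0 1 1 1     30
#       0 0 0 1 1 1     20
#       0 0 0 1 1 1     10
#       0 0 0 0 0 0     0
#
# master_latitudes[2] = 'the actual latitude for the 3rd row of master'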
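
The grid illustration above can be reproduced at toy scale with a few lines of NumPy. This is a minimal sketch rather than the project's code; `toy_master` is a hypothetical 6 x 6 stand-in for the full master array.

```python
import numpy as np

# 6 x 6 toy grid: 0 = still available, 1 = disqualified
toy_master = np.zeros((6, 6), dtype=np.uint8)

# Disqualify a rectangular block (rows 2-4, columns 3-5), as in the illustration
toy_master[2:5, 3:6] = 1

# Indices of the cells that remain available
free_rows, free_cols = np.where(toy_master == 0)

print(toy_master)
print(free_rows.size)  # 27 of the 36 cells remain available
```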

In [None]:
master_latitudes = np.linspace(25.0, 50.0, num=10000)
master_longitudes = np.linspace(-126, -66.5, num=20000)

master_shape = (10000, 20000)
master = np.zeros(master_shape)

# Reduce Master with turbines (with buffer), cities (with buffer), and PADs

## Step 1. Base Map and Master Array
The green areas in the base map correspond to 0s in the master array. These areas are where a turbine could be placed. The entire map is green, and all array entries are 0 initially because the analysis hasn't disqualified any areas yet. It is important to remember that the map is just a cartographic representation of the array, and, similarly, the array is a numerical representation of the map.

## Step 2. Identify the disqualified areas
The red areas are a subset of the protected areas dataset, and the highlighted segments in the array are their approximate corresponding array locations. In the real analysis, all existing turbines, cities, protected areas, and buffers will be identified and removed from the base map/master array.

## Step 3. Subtract the disqualified areas
The cartographic representation will remove these areas from the green part of the map by turning them white (the yellow-green comes into play later; please ignore that aspect in the graphic below). The numerical representation will change the value of the corresponding elements from 0 to 1.




## Helper functions

In [None]:
# latitude between points in master is 0.0025 degrees
def find_nearest_master_lat_index(latitude):
  upper_bound = 50.0
  lower_bound = 25.0
  num_steps = 10000
  step_size = abs(upper_bound - lower_bound) / num_steps
  nearest_master_index = int(np.rint((latitude - lower_bound) / step_size))
  return nearest_master_index

# longitude between points in master is 0.002975 degrees
def find_nearest_master_lon_index(longitude):
  upper_bound = -66.5
  lower_bound = -126
  num_steps = 20000
  step_size = abs(upper_bound - lower_bound) / num_steps
  nearest_master_index = int(np.rint((longitude - lower_bound) / step_size))
  return nearest_master_index

def miles_to_latitude(miles):
  """
  Returns:
    miles converted to degrees latitude

  Note:
    1 degree of latitude is approximately 69 miles
  """
  return miles / 69

def miles_to_longitude(miles):
  """
  Returns:
    miles converted to degrees longitude

  Note:
    1 degree of longitude is approximately 54.6 miles
  """
  return miles / 54.6

## Reducer Areas

In [None]:
turbine_df = get_USWTDB_data()
cities_df = get_cities_data()

### Turbines

In [None]:
# modifies master by turning all turbine indices into 1s
def turbine_indices():
  # column order: row index, column index, row half-width, column half-width
  turbine_placement_cols = ['master_lat_index', 'master_lon_index', 'lat_buffer_points', 'lon_buffer_points']
  turbine_indices_w_buffers = turbine_df[turbine_placement_cols].to_numpy()
  for tiwb in turbine_indices_w_buffers:
    lat_i, lon_i = tiwb[0], tiwb[1]
    lat_dist, lon_dist = tiwb[2], tiwb[3]
    master[lat_i - lat_dist:lat_i + lat_dist + 1, lon_i - lon_dist:lon_i + lon_dist + 1] = 1
  return

In [None]:
turbine_df['master_lat_index'] = turbine_df['latitude'].apply(find_nearest_master_lat_index)
turbine_df['master_lon_index'] = turbine_df['longitude'].apply(find_nearest_master_lon_index)

turbine_df['buffer_radius_lat'] = turbine_df.buffer_radius.apply(miles_to_latitude)
turbine_df['buffer_radius_lon'] = turbine_df.buffer_radius.apply(miles_to_longitude)

# buffer half-widths as a number of master rows (latitude) and columns (longitude)
turbine_df['lat_buffer_points'] = (turbine_df.buffer_radius_lat / LAT_BTW_POINTS_MASTER).apply(np.rint).astype(int)
turbine_df['lon_buffer_points'] = (turbine_df.buffer_radius_lon / LON_BTW_POINTS_MASTER).apply(np.rint).astype(int)

In [None]:
turbine_indices()
# MASTER IS NOW UPDATED WITH TURBINES

In [None]:
#check that sum > 0
master.sum()

353271.0
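
One thing to be aware of in the slice-assignment pattern above: NumPy treats negative slice bounds as offsets from the end of an axis, so a point that lies within one buffer width of the grid's southern or western edge would not be marked as intended. The sketch below shows a clipped variant on a tiny grid. `stamp_block` is a hypothetical helper and is not part of the project code.

```python
import numpy as np

def stamp_block(grid, row, col, half_rows, half_cols):
    """Set the block centered on (row, col) to 1, clipping at the grid edges
    instead of letting negative indices wrap around."""
    r0 = max(row - half_rows, 0)
    r1 = max(row + half_rows + 1, 0)
    c0 = max(col - half_cols, 0)
    c1 = max(col + half_cols + 1, 0)
    grid[r0:r1, c0:c1] = 1  # upper bounds past the edge are clipped by slicing
    return grid

# Clipped version: the corner point and its in-bounds neighbors are marked
clipped = stamp_block(np.zeros((5, 5), dtype=np.uint8), row=0, col=0, half_rows=1, half_cols=1)

# Unclipped version: the slice -1:2 is empty, so nothing is marked
naive = np.zeros((5, 5), dtype=np.uint8)
naive[0 - 1:0 + 1 + 1, 0 - 1:0 + 1 + 1] = 1

print(clipped.sum(), naive.sum())  # 4 0
```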

### Cities

In [None]:
# modifies master by turning all city indices into 1s
def cities_indices():
  # column order: row index, column index, row half-width, column half-width
  city_placement_cols = ['master_lat_index', 'master_lon_index', 'lat_buffer_points', 'lon_buffer_points']
  city_indices_w_buffers = cities_df[city_placement_cols].to_numpy()
  for ciwb in city_indices_w_buffers:
    lat_i, lon_i = ciwb[0], ciwb[1]
    lat_dist, lon_dist = ciwb[2], ciwb[3]
    master[lat_i - lat_dist:lat_i + lat_dist + 1, lon_i - lon_dist:lon_i + lon_dist + 1] = 1
  return

In [None]:
cities_df['master_lat_index'] = cities_df['latitude'].apply(find_nearest_master_lat_index)
cities_df['master_lon_index'] = cities_df['longitude'].apply(find_nearest_master_lon_index)

# POP_CLASS (the dataset's population category) is used directly as the buffer radius in miles
cities_df['buffer_radius_lat'] = cities_df.POP_CLASS.apply(miles_to_latitude)
cities_df['buffer_radius_lon'] = cities_df.POP_CLASS.apply(miles_to_longitude)

# buffer half-widths as a number of master rows (latitude) and columns (longitude)
cities_df['lat_buffer_points'] = (cities_df.buffer_radius_lat / LAT_BTW_POINTS_MASTER).apply(np.rint).astype(int)
cities_df['lon_buffer_points'] = (cities_df.buffer_radius_lon / LON_BTW_POINTS_MASTER).apply(np.rint).astype(int)

In [None]:
cities_indices()
# MASTER IS NOW UPDATED WITH CITIES

In [None]:
#check that sum > 353271.0
master.sum()

11401101.0

### PADS

In [None]:
# modifies master by turning all PADS indices into 1s
def pads_indices():
  # column order: row index, column index, row half-width, column half-width
  pads_placement_cols = ['master_lat_index', 'master_lon_index', 'lat_buffer_points', 'lon_buffer_points']
  pads_indices_w_buffers = bbox_df[pads_placement_cols].to_numpy()
  for piwb in pads_indices_w_buffers:
    lat_i, lon_i = piwb[0], piwb[1]
    lat_dist, lon_dist = piwb[2], piwb[3]
    master[lat_i - lat_dist:lat_i + lat_dist + 1, lon_i - lon_dist:lon_i + lon_dist + 1] = 1
  return

In [None]:
bbox_df['centroid_lat'] = bbox_df.centroid.apply(lambda x: x[1])
bbox_df['centroid_lon'] = bbox_df.centroid.apply(lambda x: x[0])

bbox_df['master_lat_index'] = bbox_df['centroid_lat'].apply(find_nearest_master_lat_index)
bbox_df['master_lon_index'] = bbox_df['centroid_lon'].apply(find_nearest_master_lon_index)

# half of the north-south span, in master rows (latitude steps)
bbox_df['lat_buffer_points'] = (0.5 * bbox_df.ns_span / LAT_BTW_POINTS_MASTER).apply(np.rint).astype(int)
# half of the east-west span, in master columns (longitude steps)
bbox_df['lon_buffer_points'] = (0.5 * bbox_df.ew_span / LON_BTW_POINTS_MASTER).apply(np.rint).astype(int)

In [None]:
pads_indices()

In [None]:
#check that sum > 11401101.0
master.sum()

55603910.0

## Indices where master == 0 (areas where turbines could be placed)

In [None]:
lat_indices, lon_indices = np.where(master == 0)
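
Converting these indices back to coordinates is the inverse of the nearest-index helpers. A minimal sketch follows, assuming the same bounds and step sizes as `find_nearest_master_lat_index` and `find_nearest_master_lon_index`; the function `indices_to_coordinates` and the constants are hypothetical names introduced here.

```python
import numpy as np

LAT_MIN, LAT_STEP = 25.0, 0.0025       # bounds/step assumed from the latitude helper
LON_MIN, LON_STEP = -126.0, 0.002975   # bounds/step assumed from the longitude helper

def indices_to_coordinates(lat_idx, lon_idx):
    """Vectorized conversion from master-array indices to (latitude, longitude)."""
    lat_idx = np.asarray(lat_idx)
    lon_idx = np.asarray(lon_idx)
    return LAT_MIN + lat_idx * LAT_STEP, LON_MIN + lon_idx * LON_STEP

# Index pair (6808, 10920) maps to roughly (42.02, -93.513)
lats, lons = indices_to_coordinates([0, 6808], [0, 10920])
print(lats, lons)
```

Because the conversion is plain array arithmetic, it can be applied to the full `lat_indices` and `lon_indices` arrays in one call.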

# Wind Data
Like the preceding data sources, the NOAA data is exceptionally well-maintained and required little cleaning and manipulation. There are no null or missing values. Unlike the others, the wind speed data are formatted as netCDF. NetCDF files are standard for multidimensional and geographic data and are optimized for many-layered array-type data. The wind speed file we used contains NOAA’s wind speed data from 2020.

NOAA takes many wind speed readings across the globe each day. The file we used reports the daily averages of these readings for all 366 days in 2020, a leap year. Readings are taken at latitudes from -90 to 90 and longitudes from 0 to 357.5, both in 2.5-degree increments, giving 73 latitude points and 144 longitude points, or 73 x 144 = 10,512 coordinate pairs per day. A year's worth of data therefore gives 10,512 * 366 = 3,847,392 wind speed readings. Additionally, wind speed is reported in U (east-west) and V (north-south) components, so the data files contain 3,847,392 * 2 = 7,694,784 wind speed readings in total.
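
The grid dimensions quoted above can be checked with a short calculation. This is a sketch only; `total_readings` is a hypothetical helper, and in practice the number of days is read from the file's `time` dimension.

```python
import numpy as np

lat_points = np.arange(-90.0, 90.0 + 2.5, 2.5)  # -90, -87.5, ..., 90
lon_points = np.arange(0.0, 360.0, 2.5)         # 0, 2.5, ..., 357.5

per_day = lat_points.size * lon_points.size
print(lat_points.size, lon_points.size, per_day)  # 73 144 10512

def total_readings(n_days, n_components=2):
    """Total readings across n_days for the given number of wind components."""
    return per_day * n_days * n_components
```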

To extract the data from the files, we made heavy use of the netCDF4 library. We then used pandas and NumPy and created many new functions to arrange the data in meaningful ways and formats.

These functions culminate in mean_wind_speed_directory_table(), which extracts the data, combines all of the daily readings to determine the year’s U- and V-wind averages at each latitude-longitude interval, and arranges each of these into user-friendly DataFrames that are indexed by the geographic coordinates. The function eventually returns a dictionary with two keys: ‘U-wind’ and ‘V-wind,’ representing the corresponding wind component DataFrames.

We later combine the U- and V-components into a single magnitude by calculating their 2-norm. They are perpendicular vectors, and it is the magnitude of their combination which tells the story of true wind speed at a given location.
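
As an illustration of the 2-norm step, `np.hypot` computes the element-wise magnitude of two perpendicular components. The small arrays below are made-up values, not NOAA data.

```python
import numpy as np

# Hypothetical mean U (east-west) and V (north-south) components on a 2 x 2 grid
u_mean = np.array([[3.0, -1.5], [0.0, 2.0]])
v_mean = np.array([[4.0, 2.0], [-2.5, 0.0]])

# Element-wise sqrt(u**2 + v**2)
speed = np.hypot(u_mean, v_mean)
print(speed)
# [[5.  2.5]
#  [2.5 2. ]]
```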



In [None]:
f_uwnd = netCDF4.Dataset(annual_wind_uwnd)
f_vwnd = netCDF4.Dataset(annual_wind_vwnd)

In [None]:
wind_lon = np.linspace(0, 357.5, num=144)
wind_lat = np.linspace(-90, 90, num=73)

### Wind Data Helper Functions

In [None]:
def mean_wind_speed(file='drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'):
    """
    Parameters:
        file - path to valid NetCDF4 wind speed file; designed for
               NOAA U-wind and V-wind files; default 'drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'

    Returns:
        mean_speed - a n_lat x n_lon numpy array representing the
                     average wind speed for each (n_lat, n_lon)
                     combination over the duration of the data
                     contained in the file
    """
    f_wind = netCDF4.Dataset(file)
    # assumes the wind component is the fourth variable in the file
    key = list(f_wind.variables)[3]

    # data has shape (n_days, n_lat, n_lon); average over the time axis
    daily = np.asarray(f_wind.variables[key][:], dtype=np.float64)
    mean_speed = daily.mean(axis=0)
    return mean_speed

def lat_array(file='drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'):
    """
    Parameters:
        file - path to valid NetCDF4 wind speed file; designed for
               NOAA U-wind and V-wind files; default 'drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'

    Returns:
        numpy array of latitude values from file
    """
    f_wind = netCDF4.Dataset(file)
    return f_wind.variables['lat'][:].data

def lon_array(file='drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'):
    """
    Parameters:
        file - path to valid NetCDF4 wind speed file; designed for
               NOAA U-wind and V-wind files; default 'drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'

    Returns:
        numpy array of longitude values from file
    """
    f_wind = netCDF4.Dataset(file)
    return f_wind.variables['lon'][:].data

"""
Parameters:
    file - path to valid NetCDF4 wind speed file; designed for
           NOAA U-wind and V-wind files; default 'drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'

Returns:
    table - a pandas dataframe pivot table where the row indices are
            the file's latitudes and the column indices are the file's
            longitudes; the values represent the mean wind speed for
            the corresponding latitude and longitude indices
"""
def mean_wind_speed_table(file='drive/MyDrive/data/Annual_Wind_speed/uwnd.sig995.2020.nc'):
    f_wind = netCDF4.Dataset(file)
    mws = mean_wind_speed(file)
    lat_lon_speed = list()

    lon, lat = f_wind.variables['lon'][:].data, f_wind.variables['lat'][:].data
    n_lat, n_lon = f_wind.dimensions['lat'].size, f_wind.dimensions['lon'].size

    for i in range(n_lat):
        for j in range(n_lon):
            d = dict()
            d['lat'] = lat[i]
            d['lon'] = lon[j]
            d['mean_speed'] = mws[i][j]
            lat_lon_speed.append(d)

    df = pd.DataFrame(lat_lon_speed)
    table = pd.pivot_table(df, index='lat', columns='lon')
    return table

"""
Parameters:
    dir_path - path to directory of netcdf4 wind speed files;
               default 'drive/MyDrive/data/Annual_Wind_speed/'

Returns:
    Python tuple containing two elements; the first element is
    a list of the U-wind component files in dir_path, and the
    second element is a list of the V-wind component files in
    dir_path
"""
def u_and_v_files(dir_path='drive/MyDrive/data/Annual_Wind_speed/'):
    files = os.listdir(dir_path)
    if not dir_path.endswith('/'):
        dir_path += '/'
    u_files = [f'{dir_path}{file}' for file in files if 'uwnd' in file]
    v_files = [f'{dir_path}{file}' for file in files if 'vwnd' in file]
    return u_files, v_files

"""
Parameters:
    dir_path - path to directory of netcdf4 wind speed files;
               default 'drive/MyDrive/data/Annual_Wind_speed/'

Returns:
    n_lat - number of latitude dimensions in files in dir_path
    n_lon - number of longitude dimensions in files in dir_path
"""
def lat_lon_dims_directory(dir_path='drive/MyDrive/data/Annual_Wind_speed/'):
    u_files, v_files = u_and_v_files(dir_path)
    file = u_files[0] # could take any file from the bunch
    f_wind = netCDF4.Dataset(file)
    n_lat, n_lon = f_wind.dimensions['lat'].size, f_wind.dimensions['lon'].size
    return n_lat, n_lon

"""
Parameters:
    dir_path - path to directory of netcdf4 wind speed files;
               default 'drive/MyDrive/data/Annual_Wind_speed/'

Returns:
    lat - np array of latitude dimensions in files in dir_path
    lon - np array of longitude dimensions in files in dir_path
"""
def lat_lon_vals_directory(dir_path='drive/MyDrive/data/Annual_Wind_speed/'):
    u_files, v_files = u_and_v_files(dir_path)
    file = u_files[0] # could take any file from the bunch
    return lat_array(file), lon_array(file)

"""
Parameters:
    dir_path - path to directory of netcdf4 wind speed files;
               default 'drive/MyDrive/data/Annual_Wind_speed/'

Returns:
    u_dir_mean - a n_lat x n_lon numpy array representing the
                 average U-wind speed for each (n_lat, n_lon)
                 combination over the duration of the data
                 contained in the directory 'dir_path'
    v_dir_mean - a n_lat x n_lon numpy array representing the
                 average V-wind speed for each (n_lat, n_lon)
                 combination over the duration of the data
                 contained in the directory 'dir_path'
"""
def mean_wind_speed_directory(dir_path='drive/MyDrive/data/Annual_Wind_speed/'):
    u_files, v_files = u_and_v_files(dir_path)
    u_means = [mean_wind_speed(u_file) for u_file in u_files]
    v_means = [mean_wind_speed(v_file) for v_file in v_files]
    u_dir_mean, v_dir_mean = np.mean(u_means, axis=0), np.mean(v_means, axis=0)
    return u_dir_mean, v_dir_mean

"""
Parameters:
    dir_path - path to directory of netcdf4 wind speed files;
               default 'drive/MyDrive/data/Annual_Wind_speed/'

Returns:
    mean_wind_speed_dict - Python dictionary consisting of two entries:
                           'U-wind' and 'V-wind';
                            Each value is a pandas dataframe pivot table
                            where row indices are latitudes and column
                            indices are longitudes; the values represent
                            the mean wind speed for the corresponding
                            latitude and longitude indices and key
                            direction
"""
def mean_wind_speed_directory_table(mws_directory=None, dir_path='drive/MyDrive/data/Annual_Wind_speed/'):
    mean_wind_speed_dict = dict()
    if mws_directory is not None:
        u_dir_mean, v_dir_mean = mws_directory
    else:
        u_dir_mean, v_dir_mean = mean_wind_speed_directory(dir_path)
    n_lat, n_lon = lat_lon_dims_directory(dir_path)

    u_lat_lon_speed, v_lat_lon_speed = list(), list()
    lat, lon = lat_lon_vals_directory(dir_path)
    for i in range(n_lat):
        for j in range(n_lon):
            u_dict, v_dict = dict(), dict()
            u_dict['lat'], v_dict['lat'] = lat[i], lat[i]
            u_dict['lon'], v_dict['lon'] = lon[j], lon[j]
            u_dict['mean_speed'], v_dict['mean_speed'] = u_dir_mean[i][j], v_dir_mean[i][j]
            u_lat_lon_speed.append(u_dict)
            v_lat_lon_speed.append(v_dict)

    u_df = pd.DataFrame(u_lat_lon_speed)
    v_df = pd.DataFrame(v_lat_lon_speed)
    u_table = pd.pivot_table(u_df, index='lat', columns='lon')
    v_table = pd.pivot_table(v_df, index='lat', columns='lon')
    mean_wind_speed_dict['U-wind'] = u_table
    mean_wind_speed_dict['V-wind'] = v_table

    return mean_wind_speed_dict

## Wind Data (cont.)

In [None]:
mws_dict = mean_wind_speed_directory_table()

In [None]:
mws_dict['U-wind'].head()

mean_speed
lon,0.0,2.5,5.0,7.5,10.0,12.5,15.0,17.5,20.0,22.5,25.0,27.5,30.0,32.5,35.0,37.5,40.0,42.5,45.0,47.5,50.0,52.5,55.0,57.5,60.0,62.5,65.0,67.5,70.0,72.5,75.0,77.5,80.0,82.5,85.0,87.5,90.0,92.5,95.0,97.5,...,260.0,262.5,265.0,267.5,270.0,272.5,275.0,277.5,280.0,282.5,285.0,287.5,290.0,292.5,295.0,297.5,300.0,302.5,305.0,307.5,310.0,312.5,315.0,317.5,320.0,322.5,325.0,327.5,330.0,332.5,335.0,337.5,340.0,342.5,345.0,347.5,350.0,352.5,355.0,357.5
lat
-90.0,-0.588238,-0.46014,-0.331055,-0.202534,-0.072112,0.058452,0.187536,0.317607,0.445564,0.574156,0.702466,0.826973,0.950987,1.072888,1.193663,1.310987,1.427818,1.539578,1.648452,1.754508,1.860212,1.959931,2.055001,2.148029,2.235494,2.320987,2.399719,2.473733,2.543733,2.608804,2.669438,2.725142,2.775212,2.817818,2.858381,2.893804,2.921128,2.943663,2.960776,2.974015,...,-2.77521,-2.817816,-2.858379,-2.893802,-2.921126,-2.943661,-2.960774,-2.974013,-2.978379,-2.978027,-2.973943,-2.962464,-2.944929,-2.923661,-2.897041,-2.862605,-2.82514,-2.778591,-2.731126,-2.674788,-2.615562,-2.551196,-2.480915,-2.407957,-2.327957,-2.244788,-2.158168,-2.065562,-1.968661,-1.870633,-1.769506,-1.661407,-1.55176,-1.439084,-1.324929,-1.205703,-1.08676,-0.965351,-0.840703,-0.715562
-87.5,-2.103591,-1.63845,-1.179788,-0.724506,-0.274084,0.169297,0.607959,1.042184,1.471973,1.896339,2.314156,2.727114,3.134156,3.533452,3.925776,4.307114,4.674578,5.027043,5.362114,5.676762,5.963804,6.223804,6.452114,6.647607,6.804297,6.918874,6.991128,7.017184,6.999015,6.932959,6.819297,6.661339,6.459578,6.213593,5.931691,5.612677,5.264578,4.889931,4.494226,4.08486,...,-5.338168,-5.323098,-5.337675,-5.3819,-5.458168,-5.562253,-5.700492,-5.865633,-6.057393,-6.269929,-6.497253,-6.734365,-6.979013,-7.219999,-7.453591,-7.670774,-7.867886,-8.039506,-8.175703,-8.278098,-8.342675,-8.361126,-8.335422,-8.263872,-8.146196,-7.984999,-7.775844,-7.527816,-7.239013,-6.916267,-6.558661,-6.179013,-5.769576,-5.341126,-4.899084,-4.445492,-3.980844,-3.51183,-3.04176,-2.57183
-85.0,-3.013309,-2.558098,-2.147253,-1.777112,-1.440562,-1.127816,-0.830915,-0.540774,-0.248731,0.052747,0.369719,0.707677,1.074367,1.46648,1.888663,2.340846,2.827466,3.341128,3.884508,4.4531,5.043945,5.652466,6.273663,6.89824,7.516973,8.12479,8.704649,9.248452,9.744297,10.17824,10.545001,10.826128,11.02317,11.124156,11.131973,11.042043,10.857747,10.587959,10.235846,9.811621,...,-3.627041,-2.805422,-2.112323,-1.574929,-1.210351,-1.029788,-1.037605,-1.222816,-1.574084,-2.068872,-2.683027,-3.387253,-4.144506,-4.93683,-5.7269,-6.49514,-7.222605,-7.888027,-8.486196,-9.004506,-9.438238,-9.786055,-10.043802,-10.210069,-10.285774,-10.269154,-10.16183,-9.96352,-9.677252,-9.307957,-8.860914,-8.349506,-7.782253,-7.174436,-6.543309,-5.901689,-5.263591,-4.646267,-4.061619,-3.515562
-82.5,-3.958238,-3.494576,-3.109788,-2.780281,-2.478379,-2.176267,-1.858872,-1.515562,-1.146337,-0.764224,-0.383379,-0.021337,0.299226,0.562959,0.757677,0.878311,0.927818,0.919156,0.864931,0.792466,0.724367,0.687043,0.714015,0.822043,1.034156,1.357114,1.799719,2.354156,3.006691,3.737888,4.517114,5.317747,6.112395,6.869719,7.56648,8.184578,8.710705,9.141057,9.468593,9.697959,...,-0.346267,1.070001,2.326198,3.345705,4.058452,4.430494,4.444367,4.120635,3.50486,2.658733,1.667184,0.607184,-0.43676,-1.40521,-2.249295,-2.949506,-3.50852,-3.951055,-4.313591,-4.64176,-4.989506,-5.39014,-5.873238,-6.445844,-7.094436,-7.784576,-8.468872,-9.088309,-9.587112,-9.910985,-10.023098,-9.908943,-9.570633,-9.03514,-8.348168,-7.561055,-6.734576,-5.925422,-5.175281,-4.517393
-80.0,-3.79521,-3.911126,-4.13683,-4.367041,-4.511126,-4.500774,-4.307816,-3.936548,-3.423661,-2.818238,-2.182393,-1.568802,-1.02852,-0.596267,-0.29183,-0.130422,-0.121055,-0.2619,-0.558027,-0.995914,-1.566478,-2.235633,-2.964436,-3.693872,-4.356619,-4.885069,-5.210774,-5.287745,-5.085351,-4.604929,-3.875562,-2.942534,-1.872957,-0.724717,0.441691,1.57986,2.659931,3.663452,4.581339,5.408029,...,-2.363591,-1.052886,0.309719,1.582114,2.616057,3.304015,3.574015,3.414719,2.867395,2.029438,1.027888,-0.005422,-0.941337,-1.673098,-2.138661,-2.312886,-2.218802,-1.911478,-1.471619,-1.009436,-0.636689,-0.464013,-0.582393,-1.049436,-1.876407,-3.020844,-4.389717,-5.844084,-7.224084,-8.367182,-9.137041,-9.454365,-9.299295,-8.726689,-7.848238,-6.811407,-5.778943,-4.890351,-4.244295,-3.885915


In [None]:
mws_dict['V-wind'].head()

mean_speed
lon,0.0,2.5,5.0,7.5,10.0,12.5,15.0,17.5,20.0,22.5,25.0,27.5,30.0,32.5,35.0,37.5,40.0,42.5,45.0,47.5,50.0,52.5,55.0,57.5,60.0,62.5,65.0,67.5,70.0,72.5,75.0,77.5,80.0,82.5,85.0,87.5,90.0,92.5,95.0,97.5,...,260.0,262.5,265.0,267.5,270.0,272.5,275.0,277.5,280.0,282.5,285.0,287.5,290.0,292.5,295.0,297.5,300.0,302.5,305.0,307.5,310.0,312.5,315.0,317.5,320.0,322.5,325.0,327.5,330.0,332.5,335.0,337.5,340.0,342.5,345.0,347.5,350.0,352.5,355.0,357.5
lat
-90.0,-2.921267,-2.943802,-2.960774,-2.974084,-2.97845,-2.978238,-2.974013,-2.962534,-2.944929,-2.923802,-2.89683,-2.862605,-2.82514,-2.77845,-2.731126,-2.674717,-2.615492,-2.551055,-2.480844,-2.407816,-2.327675,-2.244576,-2.158168,-2.065422,-1.968591,-1.870492,-1.769083,-1.661267,-1.551619,-1.438943,-1.324717,-1.205492,-1.086689,-0.965281,-0.840562,-0.715069,-0.588027,-0.459929,-0.330914,-0.202464,...,1.086691,0.965283,0.840564,0.715071,0.588029,0.459931,0.330917,0.202466,0.071691,-0.058661,-0.187886,-0.317745,-0.445844,-0.574295,-0.702534,-0.827112,-0.951196,-1.073238,-1.193661,-1.311055,-1.427957,-1.539717,-1.64852,-1.754788,-1.860281,-1.959999,-2.05521,-2.148309,-2.235774,-2.321126,-2.399788,-2.473872,-2.543943,-2.608872,-2.669436,-2.72521,-2.775281,-2.817957,-2.85845,-2.893802
-87.5,-5.485985,-5.534365,-5.560069,-5.566619,-5.552112,-5.522534,-5.472605,-5.406478,-5.324365,-5.223661,-5.104647,-4.967182,-4.809929,-4.633872,-4.435633,-4.214013,-3.969717,-3.702253,-3.411548,-3.099717,-2.762957,-2.408309,-2.035069,-1.643027,-1.237816,-0.823098,-0.399647,0.026198,0.448874,0.868945,1.278522,1.673663,2.048593,2.403945,2.73324,3.033029,3.300635,3.534931,3.73479,3.89979,...,4.896128,4.911409,4.932959,4.952466,4.961902,4.962747,4.942607,4.899085,4.826691,4.723663,4.582466,4.4031,4.185212,3.925846,3.626339,3.287536,2.914719,2.508945,2.074508,1.617607,1.142043,0.65155,0.155001,-0.343309,-0.835422,-1.322675,-1.79352,-2.246689,-2.675281,-3.082112,-3.456126,-3.8019,-4.114224,-4.394576,-4.642182,-4.8569,-5.040985,-5.192957,-5.318168,-5.413872
-85.0,-6.456337,-6.47852,-6.50183,-6.529858,-6.567182,-6.609858,-6.661689,-6.715281,-6.764154,-6.806407,-6.831548,-6.838661,-6.819647,-6.772957,-6.696337,-6.587957,-6.4469,-6.27176,-6.065844,-5.827816,-5.5619,-5.265844,-4.938731,-4.58683,-4.205351,-3.798872,-3.368591,-2.912675,-2.436619,-1.946337,-1.442112,-0.931196,-0.41683,0.091621,0.591057,1.075987,1.541762,1.982395,2.399649,2.786198,...,0.630494,1.181691,1.805424,2.469086,3.147818,3.800424,4.397818,4.913663,5.320071,5.602114,5.746973,5.753381,5.620776,5.360916,4.985212,4.511762,3.95824,3.34324,2.685987,2.00148,1.304226,0.605494,-0.086971,-0.764788,-1.423731,-2.055914,-2.661619,-3.232464,-3.765985,-4.257886,-4.698379,-5.090915,-5.428168,-5.708731,-5.935914,-6.112605,-6.240985,-6.331196,-6.392041,-6.430774
-82.5,-7.092605,-7.227957,-7.394295,-7.58514,-7.783943,-7.967534,-8.107323,-8.172182,-8.137393,-7.989224,-7.714365,-7.3169,-6.810774,-6.21514,-5.553098,-4.859999,-4.161689,-3.491689,-2.873238,-2.326619,-1.865633,-1.497182,-1.220914,-1.025914,-0.901055,-0.828309,-0.789365,-0.760914,-0.72352,-0.665069,-0.568802,-0.433168,-0.252112,-0.025633,0.240283,0.543945,0.881902,1.250916,1.647607,2.073733,...,-2.802745,-1.599506,-0.289295,1.06148,2.371409,3.566198,4.570494,5.329578,5.801339,5.977184,5.868381,5.507395,4.952747,4.271902,3.53324,2.810283,2.155142,1.608593,1.189226,0.89479,0.705494,0.580635,0.478311,0.348029,0.144086,-0.164084,-0.59352,-1.149084,-1.810703,-2.544436,-3.312393,-4.06676,-4.767041,-5.375844,-5.875069,-6.258731,-6.53683,-6.728661,-6.86514,-6.977041
-80.0,-5.109647,-5.079436,-5.193098,-5.389647,-5.603802,-5.757182,-5.781407,-5.631478,-5.282323,-4.742605,-4.034576,-3.194576,-2.272323,-1.309576,-0.352957,0.56655,1.415283,2.16986,2.805353,3.306198,3.64648,3.816269,3.810564,3.627466,3.284719,2.816198,2.258804,1.670353,1.106902,0.613804,0.236409,-0.003027,-0.105633,-0.079084,0.052959,0.265283,0.536902,0.850071,1.193522,1.569297,...,-1.913872,-0.720914,0.572747,1.888663,3.134156,4.197466,4.986198,5.426339,5.48486,5.179719,4.570987,3.761902,2.869508,2.020353,1.317184,0.832747,0.603945,0.622114,0.844931,1.202536,1.60824,1.96817,2.197677,2.21979,1.975283,1.441691,0.622959,-0.438098,-1.667957,-2.962393,-4.206337,-5.286689,-6.113661,-6.632182,-6.827745,-6.74021,-6.443802,-6.039154,-5.626619,-5.296267


In [None]:
uw_array = mws_dict['U-wind'].to_numpy()
vw_array = mws_dict['V-wind'].to_numpy()

In [None]:
# magnitude of wind speed at each coordinate
wind_magnitude = np.sqrt(uw_array**2 + vw_array**2)

In [None]:
wind_magnitude

array([[2.97990318, 2.97954657, 2.97922443, ..., 2.97872171, 2.97951617,
        2.98095932],
       [5.87546795, 5.77180348, 5.68386058, ..., 6.26895124, 6.12659875,
        5.99369021],
       [7.12490829, 6.96527716, 6.84722474, ..., 7.85314196, 7.5733043 ,
        7.32898552],
       ...,
       [2.07246366, 2.04497007, 2.01448057, ..., 2.14788562, 2.12697954,
        2.10177781],
       [1.02470706, 1.05248221, 1.08039268, ..., 0.93408604, 0.96627716,
        0.99442566],
       [1.26783621, 1.26887061, 1.26900993, ..., 1.26925763, 1.26828608,
        1.26833946]])

# Match Wind Data with Reduced Master

## Step 4. Join wind data for valid areas
The analysis maps annual wind speed onto the valid turbine locations to identify the windiest valid areas. The cartographic representation encodes annual wind speed onto color with the green-yellow scale shown. The numerical representation in the analysis is a series of associative NumPy arrays, but shown below is a simplified, hypothetical DataFrame that demonstrates the data's connections.
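A minimal sketch of such a DataFrame follows. It assumes the 2.5-degree wind grid described earlier; the location coordinates and the `toy_wind` values are made up so that each lookup is easy to trace, and none of the names below come from the analysis itself.

```python
import numpy as np
import pandas as pd

# Coarse wind grid: 2.5-degree spacing, latitudes -90..90, longitudes 0..357.5
grid_lat = np.linspace(-90, 90, num=73)
grid_lon = np.linspace(0, 357.5, num=144)
# Made-up wind magnitudes: cell (i, j) holds i + j / 1000 so lookups are traceable
toy_wind = np.add.outer(np.arange(73), np.arange(144) / 1000.0)

# Hypothetical valid turbine locations (longitudes in the -180..180 convention)
valid = pd.DataFrame({'latitude': [25.0, 46.2, 47.9],
                      'longitude': [-126.0, -110.4, -104.1]})

# Nearest grid row and column for each location; % 360 converts to 0..360 longitudes
lat_idx = np.rint((valid['latitude'].to_numpy() + 90.0) / 2.5).astype(int)
lon_idx = np.rint((valid['longitude'].to_numpy() % 360.0) / 2.5).astype(int) % 144

valid['grid_lat'] = grid_lat[lat_idx]
valid['grid_lon'] = grid_lon[lon_idx]
valid['wind_magnitude'] = toy_wind[lat_idx, lon_idx]
print(valid)
```

Each valid location is paired with the nearest wind grid cell, so many fine-grained locations share the wind speed of a single coarse cell.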


## *** Saving and Loading Checkpoint ***

In [None]:
# need to save master_latitudes, master_longitudes, lat_indices, lon_indices, wind_magnitude
names = ['master_latitudes', 'master_longitudes', 'lat_indices', 'lon_indices', 'wind_magnitude']
arrays = [master_latitudes, master_longitudes, lat_indices, lon_indices, wind_magnitude]
for name, arr in zip(names, arrays):
  with open(f'drive/MyDrive/data/{name}.npy', 'wb') as f:
    np.save(f, arr)

In [None]:
# RESTART/CLEAR RUNTIME IN COLAB

In [None]:
# open necessary saved files
with open(f'drive/MyDrive/data/master_latitudes.npy', 'rb') as f:
    master_latitudes = np.load(f)

with open(f'drive/MyDrive/data/master_longitudes.npy', 'rb') as f:
    master_longitudes = np.load(f)

with open(f'drive/MyDrive/data/lat_indices.npy', 'rb') as f:
    lat_indices = np.load(f)

with open(f'drive/MyDrive/data/lon_indices.npy', 'rb') as f:
    lon_indices = np.load(f)

with open(f'drive/MyDrive/data/wind_magnitude.npy', 'rb') as f:
    wind_magnitude = np.load(f)

## Run transformation

In [None]:
# look up the coordinate for every index with NumPy integer-array indexing
latitudes_for_new_turbines = master_latitudes[lat_indices]
longitudes_for_new_turbines = master_longitudes[lon_indices]

## *** Saving and Loading Checkpoint ***

In [None]:
# SAVE HERE IF RAM CONSTRAINTS PROBLEMATIC
# need to save latitudes_for_new_turbines, longitudes_for_new_turbines
names = ['latitudes_for_new_turbines', 'longitudes_for_new_turbines']
arrays = [latitudes_for_new_turbines, longitudes_for_new_turbines]
for name, arr in zip(names, arrays):
  with open(f'drive/MyDrive/data/{name}.npy', 'wb') as f:
    np.save(f, arr)

## *** Saving and Loading Checkpoint ***

In [None]:
# open necessary saved files
with open(f'drive/MyDrive/data/latitudes_for_new_turbines.npy', 'rb') as f:
    latitudes_for_new_turbines = np.load(f)

with open(f'drive/MyDrive/data/longitudes_for_new_turbines.npy', 'rb') as f:
    longitudes_for_new_turbines = np.load(f)

with open(f'drive/MyDrive/data/wind_magnitude.npy', 'rb') as f:
    wind_magnitude = np.load(f)

## Create final dataframe of [lat, lon, wind speed]

In [None]:
def find_nearest_wind_lat_index(latitude):
  upper_bound = 90.0
  lower_bound = -90.0
  num_steps = 73
  step_size = 2.5
  nearest_wind_index = int(np.rint((latitude - lower_bound) / step_size))
  return nearest_wind_index

def find_nearest_wind_lon_index(longitude):
  lower_bound = 0.0
  num_steps = 144
  step_size = 2.5  # grid spacing in degrees: 144 columns cover 0.0 to 357.5
  # wrap longitudes (including negative, western-hemisphere values) into [0, 360)
  wrapped = (longitude - lower_bound) % 360.0
  nearest_wind_index = int(np.rint(wrapped / step_size)) % num_steps
  return nearest_wind_index

def wind_speed_from_latlon_array(arr):
  lat, lon = arr[0], arr[1]
  return wind_magnitude[find_nearest_wind_lat_index(lat), find_nearest_wind_lon_index(lon)]

In [None]:
lat_lon_wind_speed_df = pd.DataFrame(data={'latitude': latitudes_for_new_turbines, 'longitude': longitudes_for_new_turbines})
lat_lon_wind_speed_df.head()

Unnamed: 0,latitude,longitude
0,25.0,-126.0
1,25.0,-125.997025
2,25.0,-125.99405
3,25.0,-125.991075
4,25.0,-125.988099


In [None]:
lat_lon_wind_speed_df['wind_magnitude'] = lat_lon_wind_speed_df.apply(wind_speed_from_latlon_array, axis=1, raw=True)

In [None]:
lat_lon_wind_speed_df.head()

# Save lat_lon_wind_speed_df to CSV
## Result
The real analysis followed the pattern shown in the example: it reduced the continental U.S. to areas where turbine construction is possible and then mapped annual wind speed data onto those remaining areas. The analysis then identified the areas with the highest annual wind speeds, which are the best candidates for new turbine placement.
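The last step, picking out the highest annual wind speeds, can be sketched with Pandas on a made-up table. The DataFrame `toy_df` and its values are hypothetical; only the column layout mirrors `lat_lon_wind_speed_df`.

```python
import pandas as pd

# Hypothetical rows in the same [latitude, longitude, wind_magnitude] layout
toy_df = pd.DataFrame({
    'latitude': [47.5, 31.0, 26.0, 40.0],
    'longitude': [-110.0, -100.5, -81.0, -90.0],
    'wind_magnitude': [3.8, 3.1, 2.9, 1.2],
})

# Keep the two rows with the highest annual wind speed
windiest = toy_df.nlargest(2, 'wind_magnitude')
print(windiest)
```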


In [None]:
lat_lon_wind_speed_df.to_csv('drive/MyDrive/data/lat_lon_wind_speed.csv', index=False)

# Visualizations
We use Pandas and Matplotlib to create the final visualizations. The Pandas `sample` function reduces our scatter plots' resolution, and a boundary filter limits the coordinates to the continental United States. The benefit is that we can process just a fraction of our total data and still get a visually equivalent result once we calibrate for image size and resolution.
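A minimal sketch of the sampling idea, on a hypothetical dense grid rather than the real coordinate table. The names `full_df` and `subset` are illustrative, and `random_state` is fixed here only so the draw is repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical dense grid of points standing in for the full coordinate table
lon, lat = np.meshgrid(np.linspace(-126, -66, 300), np.linspace(25, 50, 200))
full_df = pd.DataFrame({'latitude': lat.ravel(), 'longitude': lon.ravel()})

# Draw a random subset without replacement
subset = full_df.sample(n=5000, random_state=0)
print(len(full_df), len(subset))
```

A uniform random sample keeps the overall spatial coverage while cutting the number of points that have to be filtered and plotted.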

In [None]:
lat_lon = 'source/lat_lon_wind_speed/lat_lon_wind_speed_REDUCED_RESOLUTION.csv'

lat_lon_df = pd.read_csv(lat_lon)
boundaries_sf = gpd.read_file(boundaries_shp)

In [None]:
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

boundary_pass_states = set(['Puerto Rico', 'Alaska', 'Hawaii'])

# Load and create basemap.
def load_states(pass_states):
    states_sf = shapefile.Reader(boundaries_shp)
    fig = plt.figure()
    ax = fig.gca()
    items = list()
    states = []

    # Draw every state that is not in pass_states and record its geometry
    for idx, record in enumerate(states_sf.records()):
        attributes = record.as_dict()
        if attributes['NAME'] not in pass_states:
            poly = states_sf.shape(idx).__geo_interface__
            ax.add_patch(PolygonPatch(poly, fc='None', ec='#000000', alpha=1, zorder=2))
            ax.axis('scaled')
            states.append(attributes['NAME'])
            items.append(poly)
    
    states_df = pd.DataFrame(items)
    states_df['name'] = states
    return plt, states_df

# Filter points outside of the United States and remove undesired states.
def filter_points(sf, df, pass_states):
    sf = gpd.read_file(boundaries_shp)
    sf_cleaned = sf[~sf['NAME'].isin(pass_states)]
    poly = sf_cleaned['geometry'].geometry.unary_union
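    # The union is a MultiPolygon; the part at index 1 is assumed to be the mainland.
    # Iterate over .geoms, since newer Shapely releases no longer allow iterating a MultiPolygon directly.
    poly = list(poly.geoms)[1]
    # Rebuild the polygon from its exterior ring only, which drops interior holes
    poly_list = Polygon(poly.exterior.coords)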

    df['bounds'] = df.apply(lambda x: Point(x['longitude'], x['latitude']).within(poly_list), axis=1)
    df = df.loc[df.bounds, :]
    return df.drop(columns=['bounds'])

In [None]:
# Be VERY careful with this. I have reduced the sample size to provide better performance,
# but to get the full experience, sample sizes of 10 million through 50 million give a clear picture.
# If you choose to change this number, be aware that it could be a while before it is done.
values = filter_points(boundaries_shp, lat_lon_df.sample(10000), boundary_pass_states)
values.head()

In [None]:
# Create simple map
states_plt, states_df = load_states(boundary_pass_states)
states_plt.scatter(values['longitude'], values['latitude'], s=0.2)
states_plt.rcParams['figure.figsize'] = [30, 20]
states_plt.show()

## Potential Locations for Turbines
After processing over 100,000 rows of data and excluding protected areas and areas within a specified range of established wind turbines, we are left with 50,000,000 rows and three columns of data as potential wind turbine locations. When converted into an image, these rows equate to a resolution of 10,000 by 5,000.

### How did we generate this plot?
Creating a unary union of the 48 states dissolves the state boundaries, giving us a single outline of the 48 contiguous states. Taking only the exterior of this outline removes any artifacts left where states fail to connect accurately. We then build a list of coordinate tuples from the exterior and pass it into Shapely's Polygon function. Iterating through our coordinates list, we create a Point with the Shapely library for each row and check whether its latitude and longitude fall within the bounding borders. The points that pass are kept in a DataFrame. From this DataFrame, we can visualize the points by plotting a simple state outline and overlaying a scatter plot, which shows only points inside the United States boundaries. We can also set the color based on our wind magnitude, which gives us the gradient map.
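To make the point-in-boundary check concrete, here is a minimal sketch that swaps in a hand-written ray-casting test for Shapely's `within`, so it runs without geospatial dependencies. The rectangular `outline`, the sample `points`, and the helper `point_in_polygon` are all hypothetical and are not part of the notebook's code.

```python
import pandas as pd

def point_in_polygon(x, y, ring):
    """Ray casting: count the polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical outline as (longitude, latitude) pairs, standing in for the dissolved boundary
outline = [(-110.0, 40.0), (-100.0, 40.0), (-100.0, 45.0), (-110.0, 45.0)]

points = pd.DataFrame({'latitude': [42.0, 47.0, 41.0],
                       'longitude': [-105.0, -105.0, -95.0]})

# Flag each point, then keep only those inside the outline
in_bounds = points.apply(
    lambda row: point_in_polygon(row['longitude'], row['latitude'], outline), axis=1)
kept = points.loc[in_bounds]
print(kept)
```

An odd number of crossings means the point is inside. Only the first sample point survives the filter; the other two lie north and east of the outline.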




In [None]:
# Create gradient map
states_plt, states_df = load_states(boundary_pass_states)
states_plt.scatter(values['longitude'], values['latitude'], c=values['wind_magnitude'], cmap='summer', s=0.2)
states_plt.rcParams['figure.figsize'] = [30, 20]
states_plt.show()

## Productivity in Montana
### Highest Average Wind Speeds in the United States
Zooming into Montana, we can see that a large portion of the state is the brightest yellow, meaning this location has the highest average wind speeds in the United States, at about 8.5 MPH. Interestingly, this block of high wind speeds is, for the most part, cut out of the analysis because of currently established wind turbines. Luckily for us, at the northern end of these high wind values lie just over 3.5 million acres of mostly rural land, prime for the establishment of wind turbines. Looking at the Montana 30-Meter Residential-Scale Wind Resource Map from the Office of Energy Efficiency and Renewable Energy, the wind appears to come directly out of the Rocky Mountains' valleys in high gusts that push through rivers and riverbeds and over the tops of the remaining hills. At the lower elevation points, there is very little wind.


In [None]:
# Plot Montana
focus_pass_states = set(['Puerto Rico', 'Alaska', 'Hawaii', 'Oregon', 'Washington', 'Alabama', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'West Virginia', 'Wisconsin'])

states_plt, states_df = load_states(focus_pass_states)
states_plt.scatter(values['longitude'], values['latitude'], c=values['wind_magnitude'], cmap='summer', s=10)
states_plt.rcParams['figure.figsize'] = [30, 20]
states_plt.show()

## Additional High Performing Locations
### Texas and Florida
An enormous and sparsely populated swath of central and south Texas, spanning from Lubbock to the Rio Grande Valley, is among the windiest regions with turbine potential in the United States. Although some wind turbines and settlements exist in this area (shown as the small white squares in the yellowish region), the vast majority of the land is undeveloped and well suited to new wind turbines. The diverse landscape, which combines canyons and mountains with deserts, plains, and forests, has plentiful wind resources to offer. Similarly, the southern tip of Florida has great wind power potential in the area spanning from Miami across to the Gulf Coast. Winds from the Gulf and the Caribbean swirl across the Sunshine State and out into the Atlantic Ocean, endowing the swampy southern areas with great wind resources.


In [None]:
# Plot Florida
focus2_pass_states = set(['Puerto Rico', 'Alaska', 'Arizona', 'Arkansas', 'Mississippi', 'Alabama', 'Georgia', 'South Carolina', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'])

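# Load the base map, passing over every state listed in the set above
states_plt, states_df = load_states(focus2_pass_states)
# Overlay the candidate locations, colored by wind magnitude (brighter yellow = windier)
states_plt.scatter(values['longitude'], values['latitude'], c=values['wind_magnitude'], cmap='summer', s=10)
# Resize the current figure directly; changing rcParams here would only affect figures created afterwards
states_plt.gcf().set_size_inches(30, 20)
states_plt.show()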

In [None]:
# Plot Texas
focus4_pass_states = set(['Puerto Rico', 'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'Wisconsin', 'Michigan', 'Illinois', 'Indiana', 'Ohio', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'California', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wyoming'])

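# Load the base map, passing over every state listed in the set above
states_plt, states_df = load_states(focus4_pass_states)
# Overlay the candidate locations, colored by wind magnitude (brighter yellow = windier)
states_plt.scatter(values['longitude'], values['latitude'], c=values['wind_magnitude'], cmap='summer', s=10)
# Resize the current figure directly; changing rcParams here would only affect figures created afterwards
states_plt.gcf().set_size_inches(30, 20)
states_plt.show()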

In [None]:
# Plot California
focus3_pass_states = set(['Puerto Rico', 'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'Wisconsin', 'Michigan', 'Illinois', 'Indiana', 'Ohio', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wyoming'])

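# Load the base map, passing over every state listed in the set above
states_plt, states_df = load_states(focus3_pass_states)
# Overlay the candidate locations, colored by wind magnitude (brighter yellow = windier)
states_plt.scatter(values['longitude'], values['latitude'], c=values['wind_magnitude'], cmap='summer', s=10)
# Resize the current figure directly; changing rcParams here would only affect figures created afterwards
states_plt.gcf().set_size_inches(30, 20)
states_plt.show()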
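
The plotting cells above each spell out nearly the full list of states by hand, which makes slips such as a repeated `'Hawaii'` easy to miss. One possible alternative, sketched here, derives each set from a single list of names. `ALL_STATES` and `pass_states_except` are hypothetical names introduced for illustration and are not part of the original notebook. The sketch also assumes that `load_states` skips every state in the set it receives and expects the spellings used above.

```python
# Every name used in the plotting cells: 50 states plus DC and Puerto Rico.
ALL_STATES = frozenset([
    'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
    'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia',
    'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky',
    'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan',
    'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada',
    'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
    'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
    'Pennsylvania', 'Puerto Rico', 'Rhode Island', 'South Carolina',
    'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia',
    'Washington', 'West Virginia', 'Wisconsin', 'Wyoming',
])


def pass_states_except(*keep):
    """Return the set of states to skip so that only `keep` stay on the map."""
    unknown = set(keep) - ALL_STATES
    if unknown:
        # Catch misspellings early instead of silently plotting the wrong map
        raise ValueError(f"Unrecognized state name(s): {sorted(unknown)}")
    return set(ALL_STATES - set(keep))


# Same contents as the hand-written Montana set, which leaves out
# Montana, Idaho, and Wyoming
montana_pass = pass_states_except('Montana', 'Idaho', 'Wyoming')
florida_pass = pass_states_except('Florida')
```

Each result could then be passed along as before, for example `load_states(montana_pass)`.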

# Conclusion

### Summary
Wind is a reliable but underutilized renewable energy source with great potential in the United States. This project took the first steps toward realizing this potential by identifying the country’s best areas for new turbines. We found these areas through a systematic geographic analysis of the country that excluded cities, designated protected areas, and areas with existing turbines. The remaining regions are visualized with a color gradient indicating mean annual wind speed, so the best locations for turbine construction appear in the brightest yellow. The most notable are in Montana, central and south Texas, and southern Florida.

The data used in this analysis come from reputable sources in the scientific and geographic communities, such as ESRI, the U.S. National Oceanic and Atmospheric Administration (NOAA), and the United States Geological Survey (USGS). The datasets were combined across their geographic features using the aforementioned method that James dubbed the “master array.”
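
As a rough illustration of the idea summarized above, the sketch below lays several datasets onto one regular latitude/longitude grid, flags excluded cells, and ranks what remains by wind speed. It is not the project's actual implementation: the class `MasterArray`, its methods, the one-degree resolution, and the sample coordinates and wind values are all assumptions made up for this example.

```python
import numpy as np


class MasterArray:
    """Hypothetical stand-in for the master array: one grid shared by all layers."""

    def __init__(self, lat_range, lon_range, resolution_deg):
        self.lat_min, lat_max = lat_range
        self.lon_min, lon_max = lon_range
        self.res = resolution_deg  # cell size in degrees; smaller = finer analysis
        n_rows = int(np.ceil((lat_max - self.lat_min) / self.res))
        n_cols = int(np.ceil((lon_max - self.lon_min) / self.res))
        self.wind = np.full((n_rows, n_cols), np.nan)           # wind layer
        self.excluded = np.zeros((n_rows, n_cols), dtype=bool)  # exclusion layer

    def _cell(self, lat, lon):
        """Map a coordinate to its (row, col) grid cell."""
        row = int(np.floor((lat - self.lat_min) / self.res))
        col = int(np.floor((lon - self.lon_min) / self.res))
        n_rows, n_cols = self.wind.shape
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError("point lies outside the grid")
        return row, col

    def set_wind(self, lat, lon, speed):
        self.wind[self._cell(lat, lon)] = speed

    def exclude(self, lat, lon):
        """Flag the cell containing a city, protected area, or existing turbine."""
        self.excluded[self._cell(lat, lon)] = True

    def candidates(self):
        """Return (lat, lon, wind) at cell centers, windiest first."""
        usable = ~self.excluded & ~np.isnan(self.wind)
        rows, cols = np.nonzero(usable)
        order = np.argsort(-self.wind[rows, cols])
        return [
            (float(self.lat_min + (rows[i] + 0.5) * self.res),
             float(self.lon_min + (cols[i] + 0.5) * self.res),
             float(self.wind[rows[i], cols[i]]))
            for i in order
        ]


# Made-up sample values, not project data
grid = MasterArray(lat_range=(45.0, 49.0), lon_range=(-116.0, -104.0), resolution_deg=1.0)
grid.set_wind(47.5, -110.5, 8.5)
grid.set_wind(46.2, -112.3, 7.0)
grid.set_wind(48.1, -106.4, 8.0)
grid.exclude(46.9, -112.9)  # falls in the same cell as the 7.0 reading
ranked = grid.candidates()
```

Because every layer shares the same cell indices, changing `resolution_deg` is the only adjustment needed to make the analysis coarser or finer.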

### Additional Insights
Beyond the optimal areas our analysis identifies, we discovered that much of the United States’ wind power potential lies offshore. While our study only considered land turbines, these offshore locations will undoubtedly play an essential role in wind power growth.

Slightly removed from the analysis, we also discovered that GIS is superior to programming-based approaches for spatial data science projects like this one. Although we created some useful techniques to accomplish this analysis, geographic information systems could produce a more comprehensive investigation of this subject.

### What Worked and What Didn’t Work
The project took several conceptual iterations to run smoothly. Our original plan to rely mainly on geospatial Python libraries proved untenable because of the size of our datasets and awkward library conventions. The project also had graphical struggles, as our modest computing resources had difficulty rendering detailed visualizations of such large datasets. Beyond these issues, the project went well. We developed the “master array,” which turned out to be an effective workaround for the geospatial library troubles. Although the master array did not alleviate our infrastructure issues for rendering visualizations, it gave us much greater control over the resolution of our analysis and results. Additionally, we were fortunate to work with well-maintained datasets that required little to no cleaning or manipulation.

### Next Steps
Identifying the areas feasible for wind turbine construction is the first of many steps needed for mass adoption of wind power. There are lingering questions about infrastructure and engineering, and many discussions to be had about turbine machinery and energy transmission and storage. There are financial questions of cost and of who should put up the money for such ventures. There are environmental concerns: will turbines upset our wildlife? There are political questions: who gets turbines first? There are social questions: do we want wind turbines near our towns or in our counties? Most notably, there are layered, integrated questions that span several of these subject areas on top of additional geographical context.

There is much left to do before our society can comfortably scale wind power to its potential. This project has taken the first step in this journey towards a wind-powered sustainable future, and the authors hope for continued progress to that end.



