The global offshore wind turbine dataset was posted on October 4, 2021, and provides "geocoded information on global offshore wind turbines (OWTs) derived from Sentinel-1 synthetic aperture radar (SAR) time-series images from 2015 to 2019. It identified 6,924 wind turbines comprising of more than 10 nations." This notebook extracts the data from the dataset's shapefile and transforms it into a bar chart race. The script outputs an MP4 file; a similar graphic made with Flourish is linked below.
Dataset: https://figshare.com/articles/dataset/Global_offshore_wind_farm_dataset/13280252/5
Flourish Graphic: https://public.flourish.studio/visualisation/7385349/

In [1]:
# Requires ffmpeg to be installed to write the MP4 output
import shapefile  # the pyshp package
import pandas as pd
import bar_chart_race as bcr

# Raw strings (r"...") keep the Windows backslashes literal
path_to_shapefile = r"C:\Users\mikeb\OneDrive\Data Analyst\DataSetsProjects\OffshoreWind\Global offshore wind turbines dataset_v1.3\GOWF_V1.3.shp"
path_to_csv = r"C:\Users\mikeb\OneDrive\Data Analyst\DataSetsProjects\OffshoreWind\OffshoreWind.csv"
path_to_mp4 = r"C:\Users\mikeb\OneDrive\Data Analyst\DataSetsProjects\OffshoreWind\OffshoreWind.mp4"
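Hard-coded absolute paths tie the notebook to one machine. A minimal sketch, assuming all three files sit under a single project folder (`BASE_DIR` is a hypothetical placeholder, not part of the original), builds the same paths with `pathlib`; they are converted to plain strings in case a library expects a `str` rather than a `Path` object:

```python
from pathlib import Path

# Hypothetical project folder; point this at wherever the dataset was unzipped.
BASE_DIR = Path("OffshoreWind")

# The "/" operator joins segments using the separator of the current OS.
# str() is applied in case a downstream library does not accept Path objects.
path_to_shapefile = str(BASE_DIR / "Global offshore wind turbines dataset_v1.3" / "GOWF_V1.3.shp")
path_to_csv = str(BASE_DIR / "OffshoreWind.csv")
path_to_mp4 = str(BASE_DIR / "OffshoreWind.mp4")

print(Path(path_to_shapefile).name)  # GOWF_V1.3.shp
```

Only `BASE_DIR` has to change when the notebook moves to another machine.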

Download shapefiles from https://figshare.com/articles/dataset/Global_offshore_wind_farm_dataset/13280252/5

In [2]:
sf = shapefile.Reader(path_to_shapefile)

In [3]:
print(sf)

shapefile Reader
    6924 shapes (type 'POINT')
    6924 records (8 fields)


In [4]:
fields = sf.fields
print(fields)

[('DeletionFlag', 'C', 1, 0), ['centr_lat', 'F', 13, 11], ['centr_lon', 'F', 13, 11], ['continent', 'C', 50, 0], ['country', 'C', 50, 0], ['sea_area', 'C', 50, 0], ['occ_year', 'N', 10, 0], ['occ_month', 'N', 10, 0]]
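The first entry, `DeletionFlag`, is an internal dBASE field that pyshp lists in `sf.fields` but does not include in the records, which is why each record has 7 values rather than 8. A small sketch (the field list is copied from the output above so it runs without the shapefile; the `renames` mapping is an illustration, not part of the dataset) derives the column names from the field list instead of typing them by hand:

```python
# Field list as printed above, copied so the sketch runs without the shapefile.
fields = [('DeletionFlag', 'C', 1, 0), ['centr_lat', 'F', 13, 11], ['centr_lon', 'F', 13, 11],
          ['continent', 'C', 50, 0], ['country', 'C', 50, 0], ['sea_area', 'C', 50, 0],
          ['occ_year', 'N', 10, 0], ['occ_month', 'N', 10, 0]]

# Skip the deletion flag; the remaining entries line up with the record values.
field_names = [f[0] for f in fields[1:]]

# Hypothetical mapping from the raw field names to the shorter names used below.
renames = {'centr_lat': 'latitude', 'centr_lon': 'longitude',
           'occ_year': 'year', 'occ_month': 'month'}
header = [renames.get(name, name) for name in field_names]

print(header)
# ['latitude', 'longitude', 'continent', 'country', 'sea_area', 'year', 'month']
```

With the real reader, `pd.DataFrame(sf.records(), columns=header)` would then build the same table as the cells that follow.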


Read the raw attribute records stored in the shapefile

In [5]:
print(sf.records())

[Record #0: [9.20576, 105.782, 'Asia', 'Vietnam', 'South China Sea', 2016, 3], Record #1: [9.20389, 105.783, 'Asia', 'Vietnam', 'South China Sea', 2016, 3], Record #2: [9.2066, 105.789, 'Asia', 'Vietnam', 'South China Sea', 2016, 2], Record #3: [9.20865, 105.788, 'Asia', 'Vietnam', 'South China Sea', 2016, 2], Record #4: [9.20774, 105.781, 'Asia', 'Vietnam', 'South China Sea', 2016, 1], Record #5: [9.20935, 105.796, 'Asia', 'Vietnam', 'South China Sea', 2016, 1], Record #6: [9.21071, 105.788, 'Asia', 'Vietnam', 'South China Sea', 2016, 4], Record #7: [9.21174, 105.78, 'Asia', 'Vietnam', 'South China Sea', 2016, 2], Record #8: [9.20978, 105.78, 'Asia', 'Vietnam', 'South China Sea', 2016, 2], Record #9: [9.2114, 105.795, 'Asia', 'Vietnam', 'South China Sea', 2015, 9], Record #10: [9.21348, 105.795, 'Asia', 'Vietnam', 'South China Sea', 2015, 9], Record #11: [9.21461, 105.802, 'Asia', 'Vietnam', 'South China Sea', 2015, 8], Record #12: [9.21253, 105.803, 'Asia', 'Vietnam', 'South China Se

Begin transforming the data into the desired format

In [6]:
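# Names for the 7 values in each record (DeletionFlag is not part of the records)
header = ['latitude', 'longitude', 'continent', 'country', 'sea_area', 'year', 'month']
df = pd.DataFrame(sf.records(), columns=header)
# Zero-pad months to two digits, e.g. 3 -> '03'
df['month'] = df['month'].apply(lambda x: '{0:0>2}'.format(x))
print(df)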

      latitude  longitude continent  country         sea_area  year month
0      9.20576   105.7820      Asia  Vietnam  South China Sea  2016    03
1      9.20389   105.7830      Asia  Vietnam  South China Sea  2016    03
2      9.20660   105.7890      Asia  Vietnam  South China Sea  2016    02
3      9.20865   105.7880      Asia  Vietnam  South China Sea  2016    02
4      9.20774   105.7810      Asia  Vietnam  South China Sea  2016    01
...        ...        ...       ...      ...              ...   ...   ...
6919  65.65020    24.5631    Europe  Finland  Gulf of Bothnia  2015    01
6920  65.65390    24.5205    Europe  Finland  Gulf of Bothnia  2015    01
6921  65.65650    24.5044    Europe  Finland  Gulf of Bothnia  2015    01
6922  65.66130    24.5019    Europe  Finland  Gulf of Bothnia  2015    01
6923  65.66680    24.5044    Europe  Finland  Gulf of Bothnia  2015    01

[6924 rows x 7 columns]
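Zero-padding the month matters because `yearmonth` later becomes the pivot-table index, and pandas sorts string labels as text, not as dates. A sketch on a few hypothetical rows shows the difference:

```python
import pandas as pd

# Hypothetical rows standing in for the turbine table.
df = pd.DataFrame({'year': [2015, 2015, 2016], 'month': [10, 2, 1]})

# Unpadded labels sort as text, so "2015-10" lands before "2015-2".
unpadded = sorted(df['year'].astype(str) + '-' + df['month'].astype(str))
print(unpadded)  # ['2015-10', '2015-2', '2016-1']

# Padding to two digits makes text order match calendar order.
df['month'] = df['month'].apply(lambda x: '{0:0>2}'.format(x))
df['yearmonth'] = df['year'].astype(str) + '-' + df['month']
print(sorted(df['yearmonth']))  # ['2015-02', '2015-10', '2016-01']
```

`df['month'].astype(str).str.zfill(2)` is an equivalent way to pad the month.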


In [7]:
df['yearmonth'] = df.year.map(str) + "-" + df.month.map(str)

In [8]:
df.insert(0, 'ID', df.index)

In [9]:
df

        ID  latitude  longitude continent  country         sea_area  year month yearmonth
0        0   9.20576   105.7820      Asia  Vietnam  South China Sea  2016    03   2016-03
1        1   9.20389   105.7830      Asia  Vietnam  South China Sea  2016    03   2016-03
2        2   9.20660   105.7890      Asia  Vietnam  South China Sea  2016    02   2016-02
3        3   9.20865   105.7880      Asia  Vietnam  South China Sea  2016    02   2016-02
4        4   9.20774   105.7810      Asia  Vietnam  South China Sea  2016    01   2016-01
...    ...       ...        ...       ...      ...              ...   ...   ...       ...
6919  6919  65.65020    24.5631    Europe  Finland  Gulf of Bothnia  2015    01   2015-01
6920  6920  65.65390    24.5205    Europe  Finland  Gulf of Bothnia  2015    01   2015-01
6921  6921  65.65650    24.5044    Europe  Finland  Gulf of Bothnia  2015    01   2015-01
6922  6922  65.66130    24.5019    Europe  Finland  Gulf of Bothnia  2015    01   2015-01


In [10]:
df = df.drop(columns=['latitude', 'longitude', 'sea_area', 'year', 'month'])

In [11]:
df

        ID continent  country yearmonth
0        0      Asia  Vietnam   2016-03
1        1      Asia  Vietnam   2016-03
2        2      Asia  Vietnam   2016-02
3        3      Asia  Vietnam   2016-02
4        4      Asia  Vietnam   2016-01
...    ...       ...      ...       ...
6919  6919    Europe  Finland   2015-01
6920  6920    Europe  Finland   2015-01
6921  6921    Europe  Finland   2015-01
6922  6922    Europe  Finland   2015-01


Pivot the transformed table, counting the records to get the number of wind turbines first observed in each month for each country.

In [12]:
table = df.pivot_table(values='ID', index=['yearmonth'], columns=['country'], aggfunc='count')

In [13]:
table

yearmonth,Belgium,China,Denmark,Finland,Germany,Ireland,Japan,Netherlands,South Korea,Spain,Sweden,United Kingdom,United States,Vietnam
2015-01,105.0,188.0,439.0,9.0,605.0,5.0,2.0,31.0,,,63.0,1140.0,,9.0
2015-02,3.0,14.0,5.0,,11.0,,,11.0,,,,32.0,,
2015-03,2.0,2.0,1.0,,20.0,1.0,,8.0,,,3.0,33.0,,2.0
2015-04,2.0,2.0,2.0,,19.0,,,10.0,,,1.0,25.0,,
2015-05,4.0,4.0,2.0,,24.0,,,5.0,,,1.0,13.0,,1.0
2015-06,5.0,4.0,4.0,,19.0,,,5.0,,,,14.0,,1.0
2015-07,9.0,8.0,2.0,,23.0,,,5.0,,,,23.0,,4.0
2015-08,2.0,5.0,4.0,,16.0,,,7.0,,,,10.0,,4.0
2015-09,6.0,6.0,4.0,,21.0,,,11.0,,,,12.0,,6.0
2015-10,3.0,9.0,7.0,,34.0,,,15.0,,,1.0,14.0,,2.0
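Counting `ID` values in `pivot_table` tallies one row per turbine, so each cell is the number of turbines first seen in that month for that country, with NaN where there were none. As a sketch on hypothetical rows, `pd.crosstab` (a swapped-in alternative, not what the notebook uses) produces the same counts with zeros instead of NaN:

```python
import pandas as pd

# Hypothetical rows: one per turbine, tagged with the month it first appears.
df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4],
    'country': ['Vietnam', 'Vietnam', 'Finland', 'Vietnam', 'Finland'],
    'yearmonth': ['2015-01', '2015-01', '2015-01', '2015-02', '2015-03'],
})

# Count of turbine IDs per (month, country); missing combinations become NaN.
table = df.pivot_table(values='ID', index=['yearmonth'], columns=['country'], aggfunc='count')
print(table)

# crosstab tallies the same (month, country) pairs and fills gaps with 0.
counts = pd.crosstab(df['yearmonth'], df['country'])
print(counts)
```

The NaN cells from `pivot_table` are what the next step has to handle when building running totals.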


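The values in the table represent the number of NEW wind turbines found in each month. I need the running total, so I use the "cumsum" method to add the counts from top to bottom (month over month, `axis=0`). cumsum skips the empty cells but leaves them empty, so ffill carries each country's last total forward, and fillna(0) fills the cells that remain empty (months before a country's first turbine) with zeros.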

In [14]:
# Running totals down the rows; carry totals forward; 0 before a country's first turbine
table = table.cumsum(axis=0).ffill(axis=0).fillna(0)

In [15]:
table

yearmonth,Belgium,China,Denmark,Finland,Germany,Ireland,Japan,Netherlands,South Korea,Spain,Sweden,United Kingdom,United States,Vietnam
2015-01,105.0,188.0,439.0,9.0,605.0,5.0,2.0,31.0,0.0,0.0,63.0,1140.0,0.0,9.0
2015-02,108.0,202.0,444.0,9.0,616.0,5.0,2.0,42.0,0.0,0.0,63.0,1172.0,0.0,9.0
2015-03,110.0,204.0,445.0,9.0,636.0,6.0,2.0,50.0,0.0,0.0,66.0,1205.0,0.0,11.0
2015-04,112.0,206.0,447.0,9.0,655.0,6.0,2.0,60.0,0.0,0.0,67.0,1230.0,0.0,11.0
2015-05,116.0,210.0,449.0,9.0,679.0,6.0,2.0,65.0,0.0,0.0,68.0,1243.0,0.0,12.0
2015-06,121.0,214.0,453.0,9.0,698.0,6.0,2.0,70.0,0.0,0.0,68.0,1257.0,0.0,13.0
2015-07,130.0,222.0,455.0,9.0,721.0,6.0,2.0,75.0,0.0,0.0,68.0,1280.0,0.0,17.0
2015-08,132.0,227.0,459.0,9.0,737.0,6.0,2.0,82.0,0.0,0.0,68.0,1290.0,0.0,21.0
2015-09,138.0,233.0,463.0,9.0,758.0,6.0,2.0,93.0,0.0,0.0,68.0,1302.0,0.0,27.0
2015-10,141.0,242.0,470.0,9.0,792.0,6.0,2.0,108.0,0.0,0.0,69.0,1316.0,0.0,29.0
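On a small hypothetical table, the `cumsum`/`ffill`/`fillna` chain can be checked step by step. The sketch also shows that filling the gaps with zero before summing (an alternative, not the notebook's code) yields the same running totals:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly counts of NEW turbines; NaN means none were found that month.
new = pd.DataFrame(
    {'Finland': [9, np.nan, np.nan], 'Spain': [np.nan, 1, 2]},
    index=['2015-01', '2015-02', '2015-03'],
)

# cumsum skips NaN but keeps those cells NaN; ffill carries the last total
# forward, and fillna(0) covers months before a country's first turbine.
running = new.cumsum(axis=0).ffill(axis=0).fillna(0)

# Alternative: treat empty months as 0 first, then take the running sum.
alt = new.fillna(0).cumsum()

print(running)
```

Both forms give Finland a flat 9 and Spain 0, 1, 3; the second avoids the intermediate NaNs entirely.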


In [16]:
table.to_csv(path_to_csv)

In [17]:
bcr.bar_chart_race(df=table,
                   sort='desc',
                   title='Number of Global Offshore Wind Turbines 2015-2019',
                   dpi=600,
                   shared_fontdict={'family': 'Arial', 'color': '.1'},
                   filename=path_to_mp4)

