In [1]:
%load_ext autoreload
%autoreload 2

# North Sea Wind Farms

## Goal

The goal of this notebook is to collect (spatial) data and generate statistics on wind farms in the North Sea. We want information on:
1. Spatial status and changes
2. Temporal changes
3. Technical specifics on turbines (power, infrastructure incl. cables, materials, supply chain)
4. Ownership
5. End-users

## Data sources

### Location and ownership

1. MapStand: installed and planned windfarms (wfs)
2. Global Offshore Wind Farm Database And Intelligence, 4COffshore ([source](https://www.4coffshore.com/windfarms/)). Holds information on location, ownership and production of wind farms. 4COffshore is a commercial provider. Some data seem a bit messy (e.g. the categorisation into sea areas is not consistent).
3. Wind Europe Public database ([source](https://windeurope.org/intelligence-platform/product/european-offshore-wind-farms-map-public/)). Holds information on location and ownership. Wind Europe is a lobby organisation. The dataset is probably not complete.
4. EMODnet Wind Turbines ([source](https://emodnet.ec.europa.eu/geonetwork/emodnet/eng/catalog.search#/search?any=EMODnet%20Human%20Activities,%20Wind%20Farms)). Norway data is lacking. Data is current and has been extensively verified up until January 1st this year, so this is a very useful resource.
5. Crown Estate offshore wind farm ownership tables ([source](https://www.thecrownestate.co.uk/en-gb/what-we-do/on-the-seabed/energy/offshore-wind-farm-ownership/#OFTOownership)). More data is available at the Crown Estate open data portal ([source](https://opendata-thecrownestate.opendata.arcgis.com/search?groupIds=f0d0ec92da76434d9e91f2e4dcb3a99f)), and for Scotland at the Crown Estate Scotland spatial hub ([source](https://crown-estate-scotland-spatial-hub-coregis.hub.arcgis.com/search?tags=offshore%20wind)). Official UK registries. Data quality is presumed to be good.
6. NVE data from Norway ([source](https://nedlasting.nve.no/gis/)). Official Norwegian registry. Data quality is presumed to be good.
7. Netherlands wind turbines from RIVM ([source](https://data.rivm.nl/meta/srv/dut/catalog.search#/search?resultType=details&sortBy=relevance&fast=index&_content_type=json&from=1&to=20&any=Windturbines%20-%20vermogen)). Data quality is good. Is point data instead of polygons. Info on individual turbines (height, power, location).
8. Denmark
9. Germany
10. Belgium

### Tenders and supply chain
1. Bloomberg Terminal (at office of FTM): contains information on the supply chain of wind turbine manufacturers. Some data might be outdated. 
2. Aleph (FTM data repository) contains all European Tenders

In [2]:
import pandas as pd
import geopandas as gpd
from requests import Request
import ast
import yaml
from owslib.wfs import WebFeatureService 
from dotenv import load_dotenv
import os
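import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)
import missingno as msno  # needed by the msno.matrix cells under "Data quality"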
import fiona

%matplotlib inline
pd.options.display.max_columns = 50
load_dotenv('../../.env')

ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/share/proj failed


True

In [3]:
# Set path

PATH_DATA = '../../drive/energy/renewables/'

# Get API_KEY

API_KEY = os.environ.get('MAPSTAND_API_KEY')

# Import config

with open('../../config/ontology.yaml') as yaml_file:
    config = yaml.safe_load(yaml_file)

cols_clean = config.get('infra_columns').get('wind')
values_clean = config.get('infra_values').get('wind')

## Import and clean EMODnet

In [4]:
# Import EMODnet

emod = gpd.read_file(f'{PATH_DATA}EMODnet_HA_WindFarms_20221219/EMODnet_HA_WindFarms_pg_20221219.shp')

# Filter North Sea countries

emod = emod[emod.COUNTRY.isin(['Denmark', 'United Kingdom', 'Netherlands', 'Germany', 'Belgium'])].copy()

# Column names to lowercase

emod.columns = emod.columns.str.lower()

# Select relevant columns

emod = emod[['country', 'name', 'n_turbines', 'power_mw', 'status', 'year', 'geometry']].reset_index(drop=True)

# Get list of project names for comparison

emod_names = list(set(emod.name.str.lower()))
print(f'Found {len(emod_names)} unique windparks in the North Sea (excluding Norway)')

Found 322 unique windparks in the North Sea (excluding Norway)


In [22]:
emod.to_file(f'{PATH_DATA}emodnet_windfarms_northsea.geojson', driver='GeoJSON')

## Import and clean MapStand

In [8]:
# Set request parameters

url = f'https://hub.mapstand.com/gs/ows?VERSION=1.3.0&apikey={API_KEY}'
wfs = WebFeatureService(url)

In [9]:
# Import wfs

layers = ['mps:powerplant_windfarm_planned', 'mps:powerplant_windfarm_installed']
dfs = []

for layer in layers:
    
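    # Note: '1.3.0' is a WMS version number; WFS defines 1.0.0, 1.1.0 and 2.0.0.
    # The value is kept as-is because this request returned data with it.
    params = dict(service='WFS', version="1.3.0", request='GetFeature',
          typeName=layer, outputFormat='json')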
    
    wfs_request_url = Request('GET', url, params=params).prepare().url
    
    df = gpd.read_file(wfs_request_url)
    df['dataset'] = layer.replace('mps:powerplant_windfarm_', '')
    dfs.append(df)

gdf = pd.concat(dfs)

In [13]:
# filter dataset

countries = ['Netherlands', 'United Kingdom', 'Belgium', 'Norway', 'Denmark', 'Germany']

ns = gdf[gdf.admin_area_name.isin(countries)
         & (gdf.windfarm_type != 'ONSHORE')].copy()

# We can drop some columns, because they don't have any data, or the data is not relevant

cols = ['attribution', 'coast_distance', 'mps_created_time', 
        'mps_est_coast_distance_km', 'mps_est_elevation_max_m',
        'mps_est_elevation_min_m', 'mps_est_shore_status', 
        'mps_project_tag', 'mps_uuid', 'closure_year', 
        'decommissioning_year', 'round_name']

ns.drop(cols, axis=1, inplace=True)

# Rename columns

cols = {'admin_area_name': 'country',
        'mps_datasource_tags': 'source',
        'mps_est_area_sqkm': 'km2',
        'updt': 'updated_on'}

ns = ns.rename(columns=cols)

# Clean company names

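# regex=False so the parentheses are matched literally, regardless of the pandas version
ns.owner_group = ns.owner_group.str.replace(' (GROUP)', '', regex=False)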

# Convert year, updated to datetime

ns.installation_year = pd.to_datetime(ns.installation_year, format='%Y', errors='coerce')
ns.updated_on = pd.to_datetime(ns.updated_on.str[0:10])

print(f'There are {ns.shape[0]} records and {ns.shape[1]} columns left after filtering and cleaning') 

There are 349 records and 18 columns left after filtering and cleaning


In [14]:
ns.to_file('../../drive/energy/renewables/mapstand_windfarms_northsea.geojson', driver='GeoJSON')

### Compare EMODnet and MapStand

In [18]:
# perform spatial join

emod_ns = emod.sjoin(ns, predicate='intersects', how='left')
print(f"{emod_ns.id.isna().sum()} rows of emodnet might not be in mapstand")

67 rows of emodnet might not be in mapstand


In [21]:
# and try the other way around

ns_emod = ns.sjoin(emod, predicate='intersects', how='left')
print(f'{ns_emod.index_right.isna().sum()} rows of mapstand might not be in emodnet')

138 rows of mapstand might not be in emodnet


### Import wind development zones

Some of these zones are present in the EMODnet dataset but not in the MapStand planned or installed layers, even though the plans for some of them seem quite serious, for instance because the electricity grid for them is already planned.

In [15]:
wdz = gpd.read_file(PATH_DATA + 'wind_development_zones.geojson')
len(wdz)

244

In [16]:
wdz[['name', 'geometry']].explore()

## Import companies

In [None]:
com = gpd.read_file(PATH_DATA + 'companies.geojson')
len(com)

Columns needed
- name: owner1
- group_co_name: owner2
- equity_breakdown: owner3
- group_equity_breakdown: owner4
- expanded_equity_breakdown: owner5
- expanded_grp_equity_breakdown: owner6

In [None]:
selection = com[['id', 'name', 'group_co_name', 'equity_breakdown',
          'group_equity_breakdown', 'expanded_equity_breakdown', 'expanded_grp_equity_breakdown']].copy()

selection = selection[selection['name'].isin(list(set(ns.owner)))].copy()
selection.reset_index(inplace=True)

len(selection)

In [None]:
def clean_com(df, col):
    """Explode one ownership column into one row per company, with share and rank."""

    # Select the right columns and drop empty values
    df = df[['id', col]].copy()
    df = df[(df[col] != '') & (df[col].notna())]
    
    # Split companies over multiple lines
    df[col] = df[col].str.split(r'%\), ')
    df = df.explode(col)

    # Create share column if there are shares (allow more than one decimal digit)
    if col not in ['name', 'group_co_name']:    
        df['share'] = df[col].str.extract(r'\s\(([0-9]+\.[0-9]+)(%\))?')[0].astype('float')

    # Create rank column
    df['rank'] = df.groupby('id').cumcount()+1
    
    # Clean company column (raw strings avoid invalid escape sequence warnings)
    df[col] = df[col].str.replace(r'\s\([0-9]+\.[0-9]+', '', regex=True)
    df[col] = df[col].str.replace(r'%\)', '', regex=True)
    df[col] = df[col].str.replace(r'\s\(GROUP\)', '', regex=True)

    # Rename column
    df = df.rename(columns = {col: 'owner'})
    df.owner = df.owner.str.upper()

    #df = df[df.owner != 'NONE'].copy()

    return df
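
The parsing steps in `clean_com` can be tried out on a single string without pandas. This is a minimal sketch that assumes the equity columns look like `'COMPANY A (50.0%), COMPANY B (GROUP) (50.0%)'`; that format is inferred from the regular expressions above and has not been checked against the data. `parse_equity` and the company names are hypothetical.

```python
import re

def parse_equity(breakdown):
    """Split an equity string into (owner, share, rank) tuples."""
    rows = []
    # Split on the '%), ' separator, as clean_com does; only the last part keeps its '%)'
    for rank, part in enumerate(re.split(r'%\), ', breakdown), start=1):
        # The share is the trailing ' (12.3' or ' (12.3%)' fragment, if there is one
        match = re.search(r'\s\(([0-9]+\.[0-9]+)(%\))?$', part)
        share = float(match.group(1)) if match else None
        # Strip the share and the ' (GROUP)' suffix, then normalise to upper case
        owner = re.sub(r'\s\([0-9]+\.[0-9]+(%\))?$', '', part)
        owner = owner.replace(' (GROUP)', '').upper()
        rows.append((owner, share, rank))
    return rows

# Made-up company names, purely for illustration
example = 'Alpha Wind (GROUP) (50.0%), Beta Energy (37.55%), Gamma Fund (12.45%)'
for row in parse_equity(example):
    print(row)
```

Running this on a handful of real values from `equity_breakdown` is a quick way to see whether the assumed format holds before applying `clean_com` to the whole column.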


In [None]:
owner1 = clean_com(selection, 'name')
owner2 = clean_com(selection, 'group_co_name')
owner3 = clean_com(selection, 'equity_breakdown')
owner4 = clean_com(selection, 'group_equity_breakdown')
owner5 = clean_com(selection, 'expanded_equity_breakdown')
owner6 = clean_com(selection, 'expanded_grp_equity_breakdown')

In [None]:
set(owner2.owner)

## Data quality

### Check for completeness

Let's check against the most reliable dataset out there: EMODnet

In [None]:
# Alternative import

ns = gpd.read_file('../../drive/energy/renewables/mapstand_windfarms_northsea.geojson')

In [None]:
print(f'mapstand: {len(ns)}\nemodnet: {len(emod)}')

One way to see if there are wind farms in EMODnet that are not in MapStand is to perform a spatial join.

In [None]:
emod_ns = emod.sjoin(ns, predicate='intersects', how='left')
print(f"{emod_ns.id.isna().sum()} rows of emodnet might not be in mapstand")

In [None]:
emod_ns.columns

In [None]:
to_merge = emod_ns[['country_left', 'name_left', 'geometry', 'id']].copy()

In [None]:
test = wdz.sjoin(to_merge[to_merge.id.isna()], predicate='intersects', how='left')

In [None]:
test[test.index_right.notna()][['name', 'geometry']].explore()

TODO: select emodnet windfarms and add them to MapStand

In [None]:
# How complete is the whole dataset?

msno.matrix(ns)

In [None]:
# How complete is the data for installed windfarms?

msno.matrix(ns[ns.dataset=='installed'])

So a couple of things need to be cleared up:
1. Capacity is not known for at least one wind farm.
2. Installation year is not known for at least two wind farms. We should get more clarity on that.
3. One wind farm is missing much of its data. Which one is it?

In [None]:
ns[(ns.capacity_mw.isna()) & (ns.dataset=='installed')]

This entry seems to contain an error. Otary is active in Belgian waters, but the polygon is in the German part of the North Sea.

In [None]:
# Let's check it out on a map

ns[(ns.capacity_mw.isna()) & (ns.dataset=='installed')][['geometry']].explore()

Let's see if there are any other mismatches between the country in the dataset and the actual location of the wind farms. We can do that by overlaying the EEZs and comparing countries.

In [None]:
# Import EEZ

eez = gpd.read_file('../../drive/energy/renewables/EMODnet_HA_EEZ/EMODnet_HA_OtherManagementAreas_EEZ_v11_20210506.shp')

In [None]:
ns_test = ns.copy()
ns_test = ns_test.sjoin(eez[['Country', 'geometry']], predicate='intersects', how='left')
ns_test[(ns_test.country.str.strip() != ns_test.Country.str.strip()) & (ns_test.country != 'Norway')][['name', 'country', 'Country', 'geometry']].explore()

#### How much wind power has every country installed through the years?

In [None]:
# Select windfarms that have an installation year and have status 'installed'

installed_ts = ns[(ns.installation_year.notna()) & (ns.dataset=='installed')].sort_values(by=['country', 'installation_year'])

# Calculate cumulative capacity in mw per country

installed_ts = installed_ts.groupby(['country', 'installation_year']).agg({'capacity_mw': 'sum'})
installed_ts['cum_capacity_mw'] = installed_ts.groupby(['country']).capacity_mw.cumsum()
installed_ts.reset_index(inplace=True)

In [None]:
# Plot cumulative capacity in time series

installed_ts[installed_ts['installation_year'] > '2000-01-01']\
                        .hvplot.line(x='installation_year',
                                     y='cum_capacity_mw',
                                     by='country',
                                     width=1000,
                                     height=600,
                                     ylabel='Cumulative power MW',
                                     title='Known installed wind power capacity per country (MW)')
    

#### What share do companies have in the wind power generation on the North Sea?

In [None]:
# Plot bar chart of the 20 largest (owner group, dataset) combinations;
# a group with both installed and planned capacity can take up two of the 20 slots

ns.groupby(['owner_group', 'dataset']).agg({'capacity_mw': 'sum'})\
                         .sort_values(by=['capacity_mw'], ascending=False)\
                         .capacity_mw.nlargest(20)\
                         .hvplot.bar(x='owner_group',
                                     y='capacity_mw',
                                     by='dataset',
                                     stacked=True,
                                     width=1000,
                                     height=760,
                                     rot=45,
                                     xlabel='Company - owner group',
                                     ylabel='Capacity in MW',
                                     title='Capacity of windfarms per corporate group, installed and planned')
                                                                    
                                                                   

## Other datasets

#### 4COffshore

The latitude is given, but the longitude is not. We might therefore need to link the data by project name.

In [None]:
# Import 4COffshore

fourc = pd.read_csv(f'{PATH_DATA}4c_offshore_freemium_windfarms.csv')

# Keep only North Sea countries

fourc = fourc[fourc.country_filter.isin(['netherlands', 'norway', 'united-kingdom', 'germany', 'denmark', 'belgium'])].copy()

# Rename columns

fourc = fourc.rename(columns=cols_clean)

# Clean latitude column

fourc.latitude = fourc.latitude.str.replace('°', '', regex=False)

# Clean power_mw column

fourc.power_mw = fourc.power_mw.str.replace(' MW', '').astype(float)
            
# Select relevant columns

fourc = fourc[['name', 'other_names', 'country', 'owner', 'developers', 'status', 'power_mw', 'category_round', 'main_url', 'comments', 'turbine_model', 'foundation_type', 'latitude']].copy()

# Clean owners to proper list

fourc[['developers', 'owner']] = fourc[['developers', 'owner']].applymap(ast.literal_eval)

# Get list of project names for comparison

fourc_names = list(set(fourc.name.str.lower()))
len(fourc)
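
Because 4COffshore has no usable point location, linking by project name is the fallback mentioned above. The following is a rough sketch of fuzzy name matching with the standard library's `difflib`; `match_names`, the cutoff of 0.8 and the sample names are assumptions for illustration, and real project names will likely need extra normalisation (phase numbers, "offshore wind farm" suffixes).

```python
import difflib

def match_names(names, candidates, cutoff=0.8):
    """Map each name to its closest candidate, or None if nothing is similar enough."""
    # Compare on lower-cased, stripped names but return the original candidate spelling
    lookup = {c.lower().strip(): c for c in candidates}
    matches = {}
    for name in names:
        close = difflib.get_close_matches(name.lower().strip(), list(lookup), n=1, cutoff=cutoff)
        matches[name] = lookup[close[0]] if close else None
    return matches

# Made-up project names, purely for illustration
fourc_sample = ['Alpha Bank I', 'Beta Sands', 'Gamma Deep']
emod_sample = ['alpha bank 1', 'beta sand', 'Delta Ridge']
print(match_names(fourc_sample, emod_sample))
```

In the notebook the same function could be called with `fourc_names` and `emod_names`; names that map to `None` are the ones that still need manual checking.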

#### Wind Europe

The dataset contains coordinates, so we can perform a spatial join to link the data to the polygons of EMODnet. But let's also get the project names to see if we can match some of the data from 4COffshore. There might be some wind farms outside of the North Sea, but we can use a spatial mask to filter these out.

In [None]:
# Import Wind Europe

weurope = pd.read_csv(f'{PATH_DATA}windeurope.csv', sep=';')

# Clean columns

weurope.columns = weurope.columns.str.lower().str.replace(' ', '_', regex=False)

# Drop some columns

weurope = weurope.drop([col for col in weurope.columns if '(group)' in col], axis=1)

# Rename columns

weurope = weurope.rename(columns=cols_clean)

# Select relevant columns

weurope = weurope[['name', 'status', 'power_mw', 'year', 'year_awarded', 'foundation_type', 'owner', 'turbine_manufacturer', 'latitude', 'longitude']].copy()

# Get list of project names for comparison

weurope_names = list(set(weurope['name'].str.lower()))
len(weurope_names)
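
The spatial mask mentioned above can be approximated with a simple bounding box before doing a proper polygon-based filter. This is a dependency-free sketch: the box coordinates are a rough assumption and not an official North Sea boundary, and `in_bbox` and the sample records are hypothetical.

```python
# Rough bounding box in decimal degrees (an assumption for illustration,
# not an official boundary): (min_lon, min_lat, max_lon, max_lat)
NORTH_SEA_BBOX = (-4.5, 51.0, 9.0, 62.0)

def in_bbox(lon, lat, bbox=NORTH_SEA_BBOX):
    """Return True when the point falls inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Made-up records using the same keys as the cleaned Wind Europe columns
records = [
    {'name': 'north sea example', 'latitude': 54.0, 'longitude': 6.0},
    {'name': 'baltic example', 'latitude': 55.0, 'longitude': 14.0},
    {'name': 'irish sea example', 'latitude': 53.8, 'longitude': -5.2},
]
kept = [r['name'] for r in records if in_bbox(r['longitude'], r['latitude'])]
print(kept)
```

A box is crude: it can still let through points in neighbouring waters that happen to fall inside the rectangle, so a polygon mask (for example the EEZ or sea-area polygons) is the more precise option once the points are in a GeoDataFrame.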

In [None]:
emod.to_csv(PATH_DATA + 'emod.csv', index=True, sep=',')

#### NVE Norwegian data

There are two slightly different datasets available:
1. From the NVE map interface (norwegian_wind_farms.geojson). This is the most elaborate one and also contains ownership information. 
2. Also from the NVE, but from the open data service. This has fewer data fields, but contains 4 more records.

In [None]:
nve = gpd.read_file(f'{PATH_DATA}norwegian_wind_farms.geojson')

# Clean columns and values

nve = nve.rename(columns=cols_clean)
nve = nve[nve.name.str.contains('offshore', na=False)]
nve.status = nve.status.replace(values_clean, regex=False)
nve.phase = nve.phase.replace(values_clean, regex=False)

# Select relevant columns

nve = nve[[col for col in nve.columns if col.islower()]].drop(['datafangstdato', 'malemetode', 'noyaktighet'], axis=1).copy()
len(nve)
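
To find out which records differ between the two NVE exports described above, the names from both can be compared as sets. This is a minimal sketch; `compare_records` and the site names are hypothetical, and it assumes both exports carry a comparable name field.

```python
def compare_records(a_names, b_names):
    """Return names only in a, only in b, and in both (case-insensitive)."""
    a = {n.strip().lower() for n in a_names}
    b = {n.strip().lower() for n in b_names}
    return sorted(a - b), sorted(b - a), sorted(a & b)

# Made-up names standing in for the map-interface and open-data exports
map_interface = ['Site A offshore', 'Site B offshore']
open_data = ['site a offshore', 'Site B offshore ', 'Site C offshore']
only_map, only_open, shared = compare_records(map_interface, open_data)
print(only_open)
```

The `only_open` list would then point at the extra records in the open data service that are worth adding to the more detailed dataset.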

In [None]:
nve.to_csv(PATH_DATA + 'nve.csv', index=True, sep=',')

In [None]:
fourc.to_csv(PATH_DATA + '4coffshore.csv', sep=',')
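# Write to a new file name so the raw 'windeurope.csv' (read above with sep=';') is not overwritten
weurope.to_csv(PATH_DATA + 'windeurope_clean.csv', sep=',')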

In [None]:
country = 'United Kingdom'
fourc[fourc.country==country][['name', 'other_names', 'latitude']].sort_values(by='name')[40:55]

In [None]:
emod[emod.country==country][['name', 'geometry']].sort_values(by='name')[25:40]