In [76]:
import geopandas as gpd
import pandas as pd
import numpy as np

import src.infrastructure as infra
from src.companies import get_licence_and_company
from src.geo_utils import write_to_geojson

# Turn on autoload for debugging
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


The goal of this notebook is to create a cleaned dataset of oil and gas platforms and their infrastructure. The output is a set of GeoJSON files for use in Flourish or other visualisation tools, although you could also visualise the data in this notebook itself with geopandas, fiona, holoviews or another module. These modules are included in this Docker image.

For platforms there are multiple datasets:
1. EMODnet: contains all platforms, including decommissioned ones, but is not fully up to date.
2. National datasets: these have very different structures, but I normalised them earlier, except for wellbores. This is the most current source.

Data should be current up to 2023-03-01.

Some warnings:
- Although the data is from official sources, it's quite messy and the datasets contain different kinds of data. Some normalization was necessary, so I had to make some choices. These choices are made explicit in the code.
- The company data (operators, licence holders) has been normalized as well, using several sources: the national oil and gas agencies, plus manual linking of subsidiaries and parent companies based on company registries and news articles. That process may have introduced errors, so if you use this data, cross-check it, for instance against [Mapstand](https://app.mapstand.com/). So far I haven't found any mismatches with the data from Mapstand (we largely use the same sources), but be careful.

## Overviews

We probably need some overviews of infrastructure: all platforms, pipelines, structures, cables, etc. If you want to import just one or more countries, replace 'all' with a list of two-letter country codes, for example `['no', 'nl']`.

In [37]:
platforms = infra.get_platforms(['all'], eez=True, only_platforms=False)

Imported 746 rows from no: no_facility
Imported 252 rows from nl: nl_facility
Imported 23337 rows from uk: uk_infrastructure_points
Imported 0 rows from be: int_platforms
Imported 67 rows from dk: int_platforms
Imported 2 rows from de: int_platforms


In [3]:
infrastructure = platforms[platforms.type_normalised != 'Platform'].copy()
len(infrastructure)

23499

In [4]:
platforms = platforms[platforms.type_normalised == 'Platform'].drop_duplicates(subset='infra_name').copy()
len(platforms)

701

In [6]:
pipes = infra.get_pipelines(['all'], eez=True)

Imported 64 rows from no: no_pipeline_thin
Imported 492 rows from nl: nl_pipelines
Imported 3368 rows from uk: uk_pipelines_linear
Imported 0 rows from be: int_pipelines
Imported 1 rows from dk: int_pipelines
Imported 0 rows from de: int_pipelines


In [7]:
wellbores = infra.get_wellbores(['all'], eez=True)

Imported 8840 rows from no: no_wellbore
Imported 2381 rows from nl: nl_wellbores
Imported 16188 rows from uk: uk_wells
Imported 0 rows from be: int_wellbores
Imported 706 rows from dk: int_wellbores
Imported 248 rows from de: int_wellbores


In [8]:
# GeoPandas can have trouble writing datetime columns to file, so convert them to strings.

wellbores[['start_date', 'end_date']] = wellbores[['start_date', 'end_date']].astype(str)
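Note that `astype(str)` turns missing dates into the literal string `'NaT'`. As a possible alternative, here is a minimal sketch on made-up data (the `toy` frame is not part of the real dataset) that formats the dates explicitly and keeps missing values as missing:

```python
import pandas as pd

# Toy frame standing in for the wellbores table (values are made up).
toy = pd.DataFrame({
    "start_date": pd.to_datetime(["2001-05-01", None]),
    "end_date": pd.to_datetime(["2010-12-31", "2015-01-01"]),
})

date_cols = ["start_date", "end_date"]

# Format as ISO date strings; missing values (NaT) become NaN
# instead of the string "NaT".
for col in date_cols:
    toy[col] = toy[col].dt.strftime("%Y-%m-%d")
```

Whether this is preferable depends on how the visualisation tool treats empty values versus the text `NaT`.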

### Add ownership data from licences

All infrastructure lies within a licence area. It might be useful to see who holds the licence there. There are some caveats though:
1. We don't know to what extent licence holders are responsible for all infrastructure in their areas. There may be (defunct) infrastructure for which other parties are responsible.
2. This data doesn't contain the share sizes of the different companies holding a licence together. That data is available for the UK and Norway on their national portals, but not for the Netherlands.
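Because a licence can have several holders, attaching companies to infrastructure can produce more rows than there are infrastructure items. The sketch below illustrates that effect with plain pandas on made-up data. It is not the implementation of `get_licence_and_company`; the frames and the column names `licence` and `company` are assumptions for illustration only.

```python
import pandas as pd

# Made-up infrastructure items, each tagged with a licence id.
infra_toy = pd.DataFrame({"infra_name": ["P1", "P2"], "licence": ["L1", "L2"]})

# Made-up licence holders: licence L1 is held by two companies,
# licence L2 has no known holder.
holders_toy = pd.DataFrame({"licence": ["L1", "L1"], "company": ["Alpha", "Beta"]})

# A left merge keeps every infrastructure item and repeats it once per holder.
merged = infra_toy.merge(holders_toy, on="licence", how="left", indicator=True)

# Items without a matching holder are flagged as "left_only".
unmatched = merged[merged["_merge"] == "left_only"]
```

Here `P1` appears twice (once per holder) and `P2` ends up without a company, so row counts after a merge like this should not be read as counts of physical objects.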

In [40]:
platforms_com = get_licence_and_company(platforms)

Merged 24153, but could not merge 3855 because of missing company names


In [41]:
platform_infra_com = get_licence_and_company(infrastructure)

Merged 23448, but could not merge 3700 because of missing company names


In [42]:
pipes_infra_com = get_licence_and_company(pipes)

Merged 3604, but could not merge 398 because of missing company names


In [43]:
wellbores_infra_com = get_licence_and_company(wellbores)

Merged 24609, but could not merge 5716 because of missing company names


In [18]:
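# Write the datasets with company information to csv files.
# The *_com frames are used here because they hold the merged company data
# and are already defined at this point in the notebook.

platforms_com.to_csv('../../data/infrastructure/csv/platforms_companies.csv', index=False)
platform_infra_com.to_csv('../../data/infrastructure/csv/infra_companies.csv', index=False)
pipes_infra_com.to_csv('../../data/infrastructure/csv/pipes_companies.csv', index=False)
wellbores_infra_com.to_csv('../../data/infrastructure/csv/wellbores_companies.csv', index=False)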

### Add radius to platforms

For visualisation purposes we want to focus on what is directly under or near platforms. So we create a radius (a buffer) around each platform and then perform a spatial clip against it.

In [44]:
# Create radius geometry (the buffer distance is in CRS units, so 500 means 500 m only in a metre-based CRS)

platforms['radius'] = platforms.geometry.buffer(500)

# Write radius to file

radius = platforms.drop('geometry', axis=1)
radius = radius.set_geometry('radius')
radius = radius.to_crs(4326)
#radius[['owner_name_normalised', 'owner_name', 'owner_country']] = radius[['owner_name_normalised', 'owner_name', 'owner_country']].astype(str)
#radius.to_file('../data/visuals/radius.geojson', driver='GeoJSON')

### Clip and write to file

In [45]:
# Clip infrastructure and pipeline datasets

platform_infra = gpd.clip(infrastructure, platforms['radius'])
pipes_infra = gpd.clip(pipes, platforms['radius'])
wellbores_infra = gpd.clip(wellbores, platforms['radius'])
#infra_total_infra = gpd.clip(infra_total, platforms['radius'])
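To make the buffer-and-clip step concrete, here is a minimal, self-contained sketch on made-up points. The CRS (`EPSG:25831`, a metre-based UTM zone) and all coordinates are assumptions for illustration; the real data may use a different projected CRS.

```python
import geopandas as gpd
from shapely.geometry import Point

# One toy platform in a metre-based CRS (EPSG:25831 is an assumption).
toy_platforms = gpd.GeoDataFrame(
    {"infra_name": ["A"]}, geometry=[Point(500000, 6000000)], crs="EPSG:25831"
)

# Two toy wellbores: one 100 m from the platform, one 2 km away.
toy_wells = gpd.GeoDataFrame(
    {"well": ["near", "far"]},
    geometry=[Point(500100, 6000000), Point(502000, 6000000)],
    crs="EPSG:25831",
)

# buffer() works in CRS units, so this is a 500 m circle around the platform.
toy_radius = toy_platforms.geometry.buffer(500)

# Keep only the features that fall inside a platform radius.
near_wells = gpd.clip(toy_wells, toy_radius)
```

Only the `near` wellbore survives the clip. For line geometries such as pipelines, `gpd.clip` cuts the lines at the edge of the radius rather than dropping them whole.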

In [113]:
# Write to geojson

dfs = [platforms_com, platform_infra_com, pipes_infra_com, wellbores_infra_com]
names = ['platforms', 'infrastructure', 'pipes', 'wellbores']

for df, name in zip(dfs, names):
    write_to_geojson(df, name)
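The helper `write_to_geojson` lives in `src.geo_utils` and its code is not shown here. As a rough idea of what such a helper could do, the sketch below reprojects to WGS84 (which GeoJSON expects) and writes one file per dataset. The function name `write_to_geojson_sketch`, its `out_dir` parameter and the toy data are hypothetical.

```python
import json
import tempfile
from pathlib import Path

import geopandas as gpd
from shapely.geometry import Point


def write_to_geojson_sketch(gdf, name, out_dir):
    """Hypothetical stand-in for src.geo_utils.write_to_geojson."""
    out_path = Path(out_dir) / f"{name}.geojson"
    # GeoJSON expects WGS84 longitude/latitude coordinates.
    out_path.write_text(gdf.to_crs(4326).to_json())
    return out_path


# Toy platform in a metre-based CRS (EPSG:25831 is an assumption).
toy = gpd.GeoDataFrame(
    {"infra_name": ["A"]}, geometry=[Point(500000, 6000000)], crs="EPSG:25831"
)

# Write to a temporary directory and read the result back as plain JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = write_to_geojson_sketch(toy, "platforms", tmp)
    collection = json.loads(path.read_text())
```

The result is a standard GeoJSON `FeatureCollection` with the attribute columns stored as feature properties.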
