In [205]:
import pandas as pd
from pandas import DataFrame
import numpy as np
import requests

# Data Extraction and Cleaning

First, we load the necessary datasets: the network supply point information (saved as `nsp_info`) and generation by fuel type (saved as `fuel_gen`).

## NSP Data

In [206]:
nsp_info = pd.read_csv('https://www.emi.ea.govt.nz/Wholesale/Datasets/MappingsAndGeospatial/NetworkSupplyPointsTable/20221217_NetworkSupplyPointsTable.csv')

In [207]:
nsp_info

,Current flag,NSP,NSP replaced by,POC code,Network participant,Embedded under POC code,Embedded under network participant,Reconciliation type,X flow,I flow,...,Start date,Start TP,End date,End TP,SB ICP,Balancing code,MEP,Responsible participant,Certification expiry,Metering information exemption expiry date
0,1,ABY0111ALPEGN,,ABY0111,ALPE,,,GN,1,1,...,2012-01-01,1,,,,CENTRALALPEG,TPNZ,TPNZ,2023-02-12,
1,1,AKK0011SMRTEN,,AKK0011,SMRT,KOE1101,TOPE,EN,1,0,...,2021-08-01,1,,,,AKK0011SMRTE,AMCI,SMRT,2026-07-29,
2,1,AKL0331AIALEN,,AKL0331,AIAL,MNG0331,VECT,EN,1,1,...,2022-10-01,1,,,1001136290AA143,AKL0331AIALE,AMCI,AIAL,2022-12-24,
3,1,ALB0331UNETGN,,ALB0331,UNET,,,GN,1,1,...,2020-08-14,1,,,,NORTHRNUNETG,TPNZ,TPNZ,2025-06-29,
4,1,ALB1101UNETGN,,ALB1101,UNET,,,GN,1,0,...,2008-05-01,1,,,,NORTHRNUNETG,TPNZ,TPNZ,2025-04-23,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2209,0,WWC0011WFNZEN,WWC0011TENCEN,WWC0011,WFNZ,HEP0331,UNET,EN,1,0,...,2011-01-01,1,2017-06-30,48.0,0002220000WFEDE,WWC0011WFNZE,AMCI,WFNZ,2017-12-14,
2210,0,WWD1102MERIGD,WWD1102MERIGG,WWD1102,MERI,,,GD,1,0,...,2008-12-01,1,2009-02-16,48.0,,WWD1102MERIG,MERG,MERI,2023-08-11,
2211,0,WWD1103MERIGD,WWD1103MERIGG,WWD1103,MERI,,,GD,1,0,...,2008-12-01,1,2009-02-16,48.0,,WWD1103MERIG,MERG,MERI,2023-08-12,
2212,0,WWK0111WAIKEN,WWK0111WAIKEN,WWK0111,WAIK,WRK0331,HAWK,EN,1,0,...,2005-09-01,1,2008-04-30,48.0,,WWK0111WAIKE,,,,


In [208]:
nsp_info.columns.tolist()

['Current flag',
 'NSP',
 'NSP replaced by',
 'POC code',
 'Network participant',
 'Embedded under POC code',
 'Embedded under network participant',
 'Reconciliation type',
 'X flow',
 'I flow',
 'Description',
 'NZTM easting',
 'NZTM northing',
 'Network reporting region ID',
 'Network reporting region',
 'Zone',
 'Island',
 'Start date',
 'Start TP',
 'End date',
 'End TP',
 'SB ICP',
 'Balancing code',
 'MEP',
 'Responsible participant',
 'Certification expiry',
 'Metering information exemption expiry date']

We can see this dataset contains a lot of information about the network supply points (NSPs). The only columns relevant for us are `POC code`, `NZTM easting`, and `NZTM northing`. POC stands for point of connection on the electricity grid. Some NSPs have the same POC code because they consist of multiple generation units within the same facility. NZTM easting and northing give the location of the POC in the New Zealand Transverse Mercator (NZTM) coordinate system. We can remove the unnecessary columns.

In [209]:
nsp_info = nsp_info[['POC code', 'NZTM easting', 'NZTM northing']].copy()

In [210]:
nsp_info[nsp_info['NZTM easting'].isna()]

,POC code,NZTM easting,NZTM northing
1,AKK0011,,
25,BCK0011,,
26,BCK0012,,
30,BJL0011,,
34,BMR0011,,
...,...,...,...
2057,VWC0011,,
2058,VWC0011,,
2059,VWC0011,,
2117,WPH0011,,


Notice there are several POCs without NZTM coordinates. These POCs are usually either no longer operating or are embedded under another POC. As a result, we can disregard these POCs without losing any information.

In [211]:
nsp_info.dropna(inplace=True)
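
The claim that POCs without coordinates are either retired or embedded can be checked before the rows are dropped. The following is a minimal sketch, assuming the check runs on the full table (before the column selection). The rows are made-up toy data that only reuse the real column names, and `full`, `missing`, and `unexplained` are hypothetical names.

```python
import pandas as pd

# Toy rows (illustrative only) reusing column names from the NSP table
full = pd.DataFrame({
    "POC code": ["AAA0011", "BBB0011", "CCC0011", "DDD0011"],
    "Current flag": [1, 0, 1, 1],
    "Embedded under POC code": ["XXX1101", None, None, None],
    "NZTM easting": [None, None, 1750900.0, None],
})

# Rows that dropna() would remove because the coordinate is missing
missing = full[full["NZTM easting"].isna()]

# A missing-coordinate row counts as explained if it is retired or embedded
retired = missing["Current flag"].eq(0)
embedded = missing["Embedded under POC code"].notna()
unexplained = missing[~(retired | embedded)]

print(len(missing), "rows lack coordinates;", len(unexplained), "unexplained")
print(unexplained["POC code"].tolist())
```

Any POC that appears in `unexplained` would deserve a closer look before being discarded.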

Another point to note is that several generation units can share a POC code if they operate within the same facility. We can see this by comparing the total number of rows with the number of unique rows.

In [212]:
print("Total data points: {count}".format(count=len(nsp_info)))

print("Unique data points: {count}".format(count=len(nsp_info.value_counts())))

Total data points: 1906
Unique data points: 485


About three quarters of the rows (1,421 of 1,906) are duplicates. We can drop these without any loss of information.

In [213]:
nsp_info.drop_duplicates(inplace=True)
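
Dropping exact duplicate rows does not by itself guarantee that each POC code is left with a single coordinate pair, and a dictionary keyed on POC code would silently keep only the last pair. The sketch below shows one way to check this, using made-up toy rows (the codes and the altered easting are illustrative, not taken from the dataset).

```python
import pandas as pd

# Toy rows (illustrative only); the second BBB0331 easting is deliberately altered
nsp = pd.DataFrame({
    "POC code": ["AAA0111", "BBB0331", "BBB0331", "CCC1101"],
    "NZTM easting": [1424393.0, 1750900.0, 1750901.0, 1750900.0],
    "NZTM northing": [5097839.0, 5932739.0, 5932739.0, 5932739.0],
}).drop_duplicates()

# Count how many distinct rows remain for each POC code
pairs_per_poc = nsp.groupby("POC code").size()

# POC codes that still map to more than one coordinate pair
ambiguous = pairs_per_poc[pairs_per_poc > 1].index.tolist()
print(ambiguous)
```

An empty list means every POC code maps to exactly one location.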

In [214]:
nsp_info

,POC code,NZTM easting,NZTM northing
0,ABY0111,1424393.0,5097839.0
2,AKL0331,1762701.0,5907699.0
3,ALB0331,1750900.0,5932739.0
4,ALB1101,1750900.0,5932739.0
5,ANI0331,1931640.0,5754616.0
...,...,...,...
2155,WRA0111,1981648.0,5674331.0
2158,WRA0501,1981648.0,5674331.0
2197,WTN0111,1237336.0,4879291.0
2198,WTN0661,1237336.0,4879291.0


This looks good for now. Let's export this data so it is available elsewhere if needed.

In [215]:
nsp_info.to_csv('data/nsp_info.csv', index=False)

## Fuel Generation Data

Now we extract the fuel generation data. This dataset contains a list of points of connection (POCs) corresponding to the power stations across New Zealand. More importantly, it contains the fuel type used for generation at each POC.

In [216]:
fuel_gen = pd.read_csv('https://www.emi.ea.govt.nz/Wholesale/Datasets/Generation/Generation_MD/202211_Generation_MD.csv')

In [217]:
fuel_gen

,Site_Code,POC_Code,Nwk_Code,Gen_Code,Fuel_Code,Tech_Code,Trading_Date,TP1,TP2,TP3,...,TP41,TP42,TP43,TP44,TP45,TP46,TP47,TP48,TP49,TP50
0,ARA,ARA2201,MRPL,aratiatia,Hydro,Hydro,2022-11-01,23640,23590,30130,...,39070,39060,38950,39100,39010,38960,39040,38960,,
1,ARA,ARA2201,MRPL,aratiatia,Hydro,Hydro,2022-11-02,37340,24300,21050,...,37510,36780,36970,36920,36980,36870,36900,26000,,
2,ARA,ARA2201,MRPL,aratiatia,Hydro,Hydro,2022-11-03,25040,15960,10370,...,36520,36410,36480,34060,24770,18960,15440,11810,,
3,ARA,ARA2201,MRPL,aratiatia,Hydro,Hydro,2022-11-04,11830,12730,8000,...,23840,24620,24830,25140,25120,25250,24020,12300,,
4,ARA,ARA2201,MRPL,aratiatia,Hydro,Hydro,2022-11-05,12330,12300,12320,...,22680,22600,22670,22680,23600,23640,23630,23600,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2255,WWD,WWD1103,MERI,west_wind,Wind,Wind,2022-11-26,10451,12149,8548,...,29315,27086,25050,27218,28169,28079,28025,24999,,
2256,WWD,WWD1103,MERI,west_wind,Wind,Wind,2022-11-27,21975,23230,28269,...,2720,1146,297,3718,4911,10490,19926,27001,,
2257,WWD,WWD1103,MERI,west_wind,Wind,Wind,2022-11-28,27339,26956,28241,...,15251,19838,18063,16951,16498,18356,24159,28895,,
2258,WWD,WWD1103,MERI,west_wind,Wind,Wind,2022-11-29,32248,33141,33195,...,32352,30964,30027,30044,30767,28349,28680,29829,,


In [218]:
fuel_gen.columns

Index(['Site_Code', 'POC_Code', 'Nwk_Code', 'Gen_Code', 'Fuel_Code',
       'Tech_Code', 'Trading_Date', 'TP1', 'TP2', 'TP3', 'TP4', 'TP5', 'TP6',
       'TP7', 'TP8', 'TP9', 'TP10', 'TP11', 'TP12', 'TP13', 'TP14', 'TP15',
       'TP16', 'TP17', 'TP18', 'TP19', 'TP20', 'TP21', 'TP22', 'TP23', 'TP24',
       'TP25', 'TP26', 'TP27', 'TP28', 'TP29', 'TP30', 'TP31', 'TP32', 'TP33',
       'TP34', 'TP35', 'TP36', 'TP37', 'TP38', 'TP39', 'TP40', 'TP41', 'TP42',
       'TP43', 'TP44', 'TP45', 'TP46', 'TP47', 'TP48', 'TP49', 'TP50'],
      dtype='object')

This dataset contains generation (in kWh) for every trading period of every day in November 2022. However, we are only concerned with the `POC_Code` and `Fuel_Code` columns. We also keep the `Gen_Code` column, which holds the names of the power stations.

In [219]:
fuel_gen = fuel_gen[['POC_Code', 'Gen_Code', 'Fuel_Code']].copy()

Again, there will be many duplicates: the original table has one row per generation unit per trading date, so removing the date and trading-period columns leaves repeated rows. We can drop these duplicates.

In [220]:
fuel_gen.drop_duplicates(inplace=True)

Now we only want power stations which generate using wind and hydroelectric power.

In [221]:
fuel_gen = fuel_gen[fuel_gen['Fuel_Code'].isin(['Hydro', 'Wind'])].reset_index(drop=True).copy()

In [222]:
fuel_gen

,POC_Code,Gen_Code,Fuel_Code
0,ARA2201,aratiatia,Hydro
1,ARG1101,argyle_wairau,Hydro
2,ARI1101,arapuni,Hydro
3,ARI1102,arapuni,Hydro
4,ASB0661,highbank,Hydro
5,ATI2201,atiamuri,Hydro
6,AVI2201,aviemore,Hydro
7,BEN2202,benmore,Hydro
8,BPE0331,twf_12,Wind
9,BWK1101,waipori,Hydro


Now we have a neat list of 48 wind and hydro-power generation stations in New Zealand. Let's export this data.

In [223]:
fuel_gen.to_csv('data/fuel_gen.csv', index=False)

# Identifying Coordinates

Our goal is to identify the coordinates of every POC in the `fuel_gen` dataset, using the `nsp_info` dataset. The latter contains coordinates for all POCs, not just hydro and wind. To cross-reference quickly between the two datasets, we build a dictionary from `nsp_info`: each key is a POC code and each value is the corresponding NZTM coordinates as a list.

To avoid confusion, I will just import the saved file.

In [224]:
nsp_info = pd.read_csv('data/nsp_info.csv')

In [225]:
nsp_info

,POC code,NZTM easting,NZTM northing
0,ABY0111,1424393.0,5097839.0
1,AKL0331,1762701.0,5907699.0
2,ALB0331,1750900.0,5932739.0
3,ALB1101,1750900.0,5932739.0
4,ANI0331,1931640.0,5754616.0
...,...,...,...
480,WRA0111,1981648.0,5674331.0
481,WRA0501,1981648.0,5674331.0
482,WTN0111,1237336.0,4879291.0
483,WTN0661,1237336.0,4879291.0


Now we create the dictionary.

In [226]:
location_hash = {}

# Map each POC code to its [easting, northing] pair
for poc, easting, northing in nsp_info.itertuples(index=False, name=None):
    location_hash[poc] = [easting, northing]

Now let's identify the locations of the POCs in the `fuel_gen` data. We can store this information in a list. First, let's import the `fuel_gen.csv` file again.

In [227]:
fuel_gen = pd.read_csv('data/fuel_gen.csv')

In [228]:
location_list = []

# Look up each POC's coordinates; a POC missing from nsp_info raises a KeyError
for poc in fuel_gen['POC_Code']:
    location_list.append(location_hash[poc])

Thankfully, this ran without a `KeyError`, which means every POC in the `fuel_gen` data exists in the `nsp_info` data. Mismatched keys are a common source of trouble in this kind of lookup and can throw off the whole process. Let's add this list to the `fuel_gen` dataframe.

In [229]:
fuel_gen[['NZTM easting', 'NZTM northing']] = location_list
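
As an alternative to the dictionary lookup above, the same cross-reference can be done with a join that reports unmatched POCs instead of raising an error. This is only a sketch on made-up toy frames that reuse the notebook's column names; `stations`, `points`, and `unmatched` are hypothetical names.

```python
import pandas as pd

# Toy frames (illustrative only) with the same column names as the notebook
stations = pd.DataFrame({
    "POC_Code": ["AAA2201", "BBB1101", "ZZZ0331"],
    "Fuel_Code": ["Hydro", "Hydro", "Wind"],
})
points = pd.DataFrame({
    "POC code": ["AAA2201", "BBB1101"],
    "NZTM easting": [1873657.0, 1616836.0],
    "NZTM northing": [5721160.0, 5386748.0],
})

# Left join keeps every station; the indicator column marks rows with no match
merged = stations.merge(
    points,
    how="left",
    left_on="POC_Code",
    right_on="POC code",
    validate="many_to_one",  # raises if a POC code has several coordinate rows
    indicator=True,
).drop(columns="POC code")

# Stations whose POC code was not found in the coordinate table
unmatched = merged.loc[merged["_merge"] == "left_only", "POC_Code"].tolist()
print(unmatched)
```

The `validate` argument also guards against a POC code appearing with more than one coordinate pair.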

In [230]:
fuel_gen

,POC_Code,Gen_Code,Fuel_Code,NZTM easting,NZTM northing
0,ARA2201,aratiatia,Hydro,1873657.0,5721160.0
1,ARG1101,argyle_wairau,Hydro,1616836.0,5386748.0
2,ARI1101,arapuni,Hydro,1831834.0,5782895.0
3,ARI1102,arapuni,Hydro,1831834.0,5782895.0
4,ASB0661,highbank,Hydro,1503869.0,5133912.0
5,ATI2201,atiamuri,Hydro,1863994.0,5746271.0
6,AVI2201,aviemore,Hydro,1390245.0,5051586.0
7,BEN2202,benmore,Hydro,1377144.0,5061473.0
8,BPE0331,twf_12,Wind,1824481.0,5537751.0
9,BWK1101,waipori,Hydro,1375377.0,4908483.0


Perfect! But the work is not complete. Some power stations have the same coordinates but different POC codes; these are generation units in the same facility. We can drop these duplicates, keeping the first POC code listed for each location.

In [234]:
fuel_gen = fuel_gen.drop_duplicates(subset=['NZTM easting', 'NZTM northing']).reset_index(drop=True)
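
Dropping duplicates on the coordinates keeps only the first POC code at each site. If the other codes are needed later, one option is to collapse the rows with a group-by instead. The sketch below uses made-up toy rows, and `POC_Codes` is a hypothetical column name.

```python
import pandas as pd

# Toy rows (illustrative only): two POC codes share one site
stations = pd.DataFrame({
    "POC_Code": ["AAA1101", "AAA1102", "BBB0661"],
    "Gen_Code": ["site_a", "site_a", "site_b"],
    "Fuel_Code": ["Hydro", "Hydro", "Hydro"],
    "NZTM easting": [1831834.0, 1831834.0, 1503869.0],
    "NZTM northing": [5782895.0, 5782895.0, 5133912.0],
})

# One row per coordinate pair, keeping all POC codes as a joined string
sites = (
    stations
    .groupby(["NZTM easting", "NZTM northing"], sort=False, as_index=False)
    .agg(
        POC_Codes=("POC_Code", ", ".join),
        Gen_Code=("Gen_Code", "first"),
        Fuel_Code=("Fuel_Code", "first"),
    )
)
print(sites["POC_Codes"].tolist())
```

The result still has one row per location, so the later steps would work the same way.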

In [235]:
fuel_gen

,POC_Code,Gen_Code,Fuel_Code,NZTM easting,NZTM northing
0,ARA2201,aratiatia,Hydro,1873657.0,5721160.0
1,ARG1101,argyle_wairau,Hydro,1616836.0,5386748.0
2,ARI1101,arapuni,Hydro,1831834.0,5782895.0
3,ASB0661,highbank,Hydro,1503869.0,5133912.0
4,ATI2201,atiamuri,Hydro,1863994.0,5746271.0
5,AVI2201,aviemore,Hydro,1390245.0,5051586.0
6,BEN2202,benmore,Hydro,1377144.0,5061473.0
7,BPE0331,twf_12,Wind,1824481.0,5537751.0
8,BWK1101,waipori,Hydro,1375377.0,4908483.0
9,COL0661,coleridge,Hydro,1480658.0,5197735.0


This would be sufficient if the weather API we plan to use accepted NZTM coordinates as input. To be safe, we convert them to latitude and longitude in NZGD2000 (New Zealand Geodetic Datum 2000, version 20180701). The conversion is not straightforward, so I used the online converter at https://www.geodesy.linz.govt.nz/concord/. Entering the NZTM coordinates there gave the latitude and longitude values in the format below. I have verified that these conversions are accurate.

In [240]:
coords = [[-38.61589062,176.14303618,-26.981], [-41.67085104,173.20225110,-17.059], [-38.07202558,175.64291928,-30.510], [-43.94163712,171.80214900,-10.349],
          [-38.39283133,176.02273036,-28.359], [-44.65844302,170.35433494,-10.134], [-44.56559630,170.19357061,-10.000], [-40.28061426,175.64051012,-13.439],
          [-45.94046956,170.10217216,-6.1200], [-43.36376560,171.52712193,-14.314], [-45.18039051,169.30622444,-7.9980], [-39.56649650,174.31011668,-18.307],
          [-37.92434939,175.53746149,-31.127], [-40.41045204,175.63139355,-13.905], [-45.52086826,167.27749984,-7.2430], [-38.11342929,176.81461707,-27.603], 
          [-40.57657526,175.44893017,-13.874], [-38.35273636,175.74426746,-29.037], [-46.30539088,168.35952698,-4.3560], [-45.03318014,170.09335299,-9.2140],
          [-44.26587174,170.02921164,-9.7380], [-44.30180131,170.11055110,-9.8130], [-44.34356091,170.17949166,-9.9230], [-38.40796630,176.08596914,-28.188],
          [-38.14825015,176.22704093,-29.211], [-45.47721291,169.31913607,-8.0540], [-39.15319409,175.83744312,-25.189], [-41.31494147,173.24305864,-16.553],
          [-37.72771589,176.13086945,-30.948], [-44.01380711,170.46085338,-11.229], [-44.12516128,170.21174758,-10.246], [-38.98172132,175.77022447,-25.780],
          [-38.80535422,177.15139892,-22.222], [-40.35426721,175.77371000,-14.698], [-37.73832727,175.15413879,-31.488], [-40.31545684,175.85606683,-15.130],
          [-38.41908188,175.80036275,-28.633], [-38.29166090,175.68340783,-29.559], [-45.85520944,170.47538814,-5.9240], [-44.69209931,170.42902881,-9.9850],
          [-39.75409738,174.63550258,-16.137], [-41.29561783,174.66049610,-12.669]]

This is the output format I received from the website. Within each sublist, the first value is the latitude, second is longitude, and third is the height. Let's add this data to our `fuel_gen` dataframe.

In [241]:
fuel_gen[['Latitude','Longitude','Height']] = DataFrame(coords)
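
The manual conversion could also be reproduced in code. The following is a minimal sketch, not the method used above: it inverts the transverse Mercator projection with a three-term Krueger series, assuming the standard NZTM2000 constants (GRS80 ellipsoid, central meridian 173°E, scale factor 0.9996, false easting 1,600,000 m, false northing 10,000,000 m). The function name `nztm_to_latlon` is hypothetical, and in practice a projection library would be the more usual choice.

```python
import math

# NZTM2000 projection constants (assumed standard values, GRS80 ellipsoid)
A_AXIS = 6378137.0                   # semi-major axis in metres
FLATTENING = 1 / 298.257222101
LON0 = math.radians(173.0)           # central meridian
K0 = 0.9996                          # central meridian scale factor
E0, N0 = 1_600_000.0, 10_000_000.0   # false easting and northing


def nztm_to_latlon(easting, northing):
    """Return (latitude, longitude) in degrees via a 3-term Krueger series."""
    n = FLATTENING / (2 - FLATTENING)
    radius = A_AXIS / (1 + n) * (1 + n**2 / 4 + n**4 / 64)
    xi = (northing - N0) / (K0 * radius)
    eta = (easting - E0) / (K0 * radius)

    # Series coefficients for the inverse projection
    beta = [n / 2 - 2 * n**2 / 3 + 37 * n**3 / 96,
            n**2 / 48 + n**3 / 15,
            17 * n**3 / 480]
    delta = [2 * n - 2 * n**2 / 3 - 2 * n**3,
             7 * n**2 / 3 - 8 * n**3 / 5,
             56 * n**3 / 15]

    xi_p, eta_p = xi, eta
    for j, b in enumerate(beta, start=1):
        xi_p -= b * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
        eta_p -= b * math.cos(2 * j * xi) * math.sinh(2 * j * eta)

    # Conformal latitude, then geodetic latitude and longitude
    chi = math.asin(math.sin(xi_p) / math.cosh(eta_p))
    lat = chi + sum(d * math.sin(2 * j * chi)
                    for j, d in enumerate(delta, start=1))
    lon = LON0 + math.atan2(math.sinh(eta_p), math.cos(xi_p))
    return math.degrees(lat), math.degrees(lon)


# NZTM coordinates of the first station (ARA2201) from the table above
lat, lon = nztm_to_latlon(1873657.0, 5721160.0)
print(round(lat, 4), round(lon, 4))
```

The result for ARA2201 should land very close to the first entry of `coords`, which gives a quick cross-check of the website output.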

We can remove the NZTM coordinates now, since they are redundant. We can also remove the height coordinate for this project.

In [242]:
fuel_gen.drop(columns=['NZTM easting', 'NZTM northing', 'Height'], inplace=True)

In [243]:
fuel_gen

,POC_Code,Gen_Code,Fuel_Code,Latitude,Longitude
0,ARA2201,aratiatia,Hydro,-38.615891,176.143036
1,ARG1101,argyle_wairau,Hydro,-41.670851,173.202251
2,ARI1101,arapuni,Hydro,-38.072026,175.642919
3,ASB0661,highbank,Hydro,-43.941637,171.802149
4,ATI2201,atiamuri,Hydro,-38.392831,176.02273
5,AVI2201,aviemore,Hydro,-44.658443,170.354335
6,BEN2202,benmore,Hydro,-44.565596,170.193571
7,BPE0331,twf_12,Wind,-40.280614,175.64051
8,BWK1101,waipori,Hydro,-45.94047,170.102172
9,COL0661,coleridge,Hydro,-43.363766,171.527122


Finally, we have obtained the latitude and longitude coordinates for all the hydro and wind generation stations across New Zealand. As one final exercise, let's plot these locations on the map of New Zealand.

# Visualizing Power Station Locations

In [247]:
import plotly.express as px

The `plotly` package provides an excellent way to visualize geographical locations on an OpenStreetMap base map.

In [248]:
fig = px.scatter_mapbox(fuel_gen, 
                        lat="Latitude", 
                        lon="Longitude", 
                        hover_name="Gen_Code", 
                        hover_data=["Gen_Code", "Fuel_Code"],
                        color="Fuel_Code",
                        zoom=8, 
                        height=600,
                        width=800)

fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

The output of the code above may not be visible on GitHub, so an image is included in the repository's `figures` directory. It is also shown in `README.md`.