<a href="https://colab.research.google.com/github/niekh-13/geodata-etl-workshop/blob/main/Introductie_GeoPandas_Workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Pandas and GeoPandas: simple ETL scripting

In this workshop for the Kaartviewer inspiratie dagen 2024 you learn the basics of Pandas and GeoPandas by working with open data from Dutch grid operators (netbeheerders) and CBS.

In [21]:
%%capture
# Install necessary packages
!pip install pandas geopandas ipython-extensions

In [46]:
!ls

CoteqNetbeheer_kleinverbruik_01012024.csv   Liander_kleinverbruiksgegevens_20240101.csv
Enexis_kleinverbruiksgegevens_01012024.csv  stedin-kleinverbruikgegevens-2024.csv
kleinverbruiksgegevens-2024.zip


## Step 1: Download data from the chosen grid operator

### Choose one of the grid operators and download its dataset.

In [22]:
%%capture

import requests
import zipfile
import os


#### Liander



In [39]:
# Download Liander
url = "https://www.liander.nl/-/media/files/open-data/kleinverbruikdata/kleinverbruiksgegevens-2024.zip"
response = requests.get(url)
delimiter = ';'

# Get filenames and paths for Liander
filename = 'Liander_kleinverbruiksgegevens_20240101.csv'
zip_path = url.split("/")[-1]

# Write the response content to the zip file
with open(zip_path, "wb") as f:
  f.write(response.content)

# Extract the csv file from zip file
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
  zip_ref.extractall(".")

print(f"{filename} is downloaded")

Liander_kleinverbruiksgegevens_20240101.csv is downloaded
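The Liander cell hard-codes the name of the CSV inside the archive. As a minimal sketch, the archive can instead be asked which CSV files it contains via `ZipFile.namelist()`. The helper name `csv_members` and the in-memory test archive (including its `readme.txt` entry) are assumptions made for illustration; they are not part of the workshop.

```python
import io
import zipfile


def csv_members(zip_source):
    """Return the names of all .csv files inside a zip archive."""
    with zipfile.ZipFile(zip_source, "r") as zip_ref:
        return [name for name in zip_ref.namelist() if name.lower().endswith(".csv")]


# Build a tiny in-memory archive so the sketch runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_ref:
    zip_ref.writestr("Liander_kleinverbruiksgegevens_20240101.csv", "A;B\n1;2\n")
    zip_ref.writestr("readme.txt", "not a csv")

found = csv_members(buffer)
print(found)
```

In the notebook, `csv_members(zip_path)` could be called after the download, and its first entry assigned to `filename`.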


#### Enexis


In [26]:
# Download Enexis
url = "https://enxp433-oda01.s3.eu-west-1.amazonaws.com/kv/Enexis_kleinverbruiksgegevens_01012024.csv"
response = requests.get(url)
delimiter = ';'

# Get filename for Enexis
filename = url.split("/")[-1]

# Write the response content to the csv file
with open(filename, "wb") as f:
  f.write(response.content)

print(f"{filename} is downloaded")


Enexis_kleinverbruiksgegevens_01012024.csv is downloaded


#### Stedin

In [36]:
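!wget https://www.stedin.net/-/media/project/online/files/zakelijk/open-data/stedin-kleinverbruikgegevens-2024.csv

# Set the filename and delimiter so that the loading step reads the Stedin file
filename = 'stedin-kleinverbruikgegevens-2024.csv'
delimiter = '\t'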

--2024-09-13 12:28:04--  https://www.stedin.net/-/media/project/online/files/zakelijk/open-data/stedin-kleinverbruikgegevens-2024.csv
Resolving www.stedin.net (www.stedin.net)... 20.101.166.0
Connecting to www.stedin.net (www.stedin.net)|20.101.166.0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23869452 (23M) [text/csv]
Saving to: ‘stedin-kleinverbruikgegevens-2024.csv’


2024-09-13 12:28:06 (16.9 MB/s) - ‘stedin-kleinverbruikgegevens-2024.csv’ saved [23869452/23869452]
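The download cells set `delimiter` by hand: `';'` for Liander, Enexis and Coteq, and a tab for Stedin. One possible alternative, sketched here under the assumption that the header line contains the real delimiter more often than any other candidate, is to guess it from the first line of the file. The function name `guess_delimiter` is hypothetical.

```python
def guess_delimiter(header_line, candidates=(";", "\t", ",")):
    """Pick the candidate character that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


print(repr(guess_delimiter("NETBEHEERDER;NETGEBIED;STRAATNAAM")))
print(repr(guess_delimiter("NETBEHEERDER\tNETGEBIED\tSTRAATNAAM")))
```

To use it in the notebook, read the first line of `filename` and pass it to the function. This is a heuristic, so the result is worth checking by eye.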



#### Coteq

In [28]:
# Download Coteq
url = "https://d3a07q56iliqjn.cloudfront.net/web-uploads/Documenten/Open-data/CoteqNetbeheer_kleinverbruik_01012024.csv"
response = requests.get(url)
delimiter = ';'

# Get filename for Coteq
filename = url.split("/")[-1]

# Write the response content to the csv file
with open(filename, "wb") as f:
  f.write(response.content)

print(f"{filename} is downloaded")

CoteqNetbeheer_kleinverbruik_01012024.csv is downloaded
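The download cells above repeat the same pattern: derive a file name from the URL, fetch the content, write it to disk. The sketch below wraps that pattern in one helper. It uses the standard-library `urllib` instead of `requests`, so it runs without extra packages; with `requests`, the equivalent failure check would be `response.raise_for_status()`. The names `filename_from_url` and `download_file` are hypothetical, and skipping files that already exist is a design choice of this sketch, not of the workshop.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Take the last path segment of the URL as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_file(url, target_dir="."):
    """Download url into target_dir unless the file is already there."""
    path = os.path.join(target_dir, filename_from_url(url))
    if not os.path.exists(path):
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
        # and the output file is only opened after the request succeeded.
        with urllib.request.urlopen(url) as response, open(path, "wb") as f:
            f.write(response.read())
    return path


enexis_url = "https://enxp433-oda01.s3.eu-west-1.amazonaws.com/kv/Enexis_kleinverbruiksgegevens_01012024.csv"
print(filename_from_url(enexis_url))
```

Calling `filename = download_file(enexis_url)` would then replace the body of a download cell; `delimiter` still has to be set per grid operator.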



## Step 2: Load the grid operator data into pandas

In [41]:
import pandas as pd

columns = [
    "NETBEHEERDER", "NETGEBIED", "STRAATNAAM", "POSTCODE_VAN", "POSTCODE_TOT",
    "WOONPLAATS", "LANDCODE", "PRODUCTSOORT", "VERBRUIKSSEGMENT", "AANSLUITINGEN_AANTAL",
    "LEVERINGSRICHTING_PERC", "FYSIEKE_STATUS_PERC", "SOORT_AANSLUITING_PERC",
    "SOORT_AANSLUITING", "SJV_GEMIDDELD", "SJV_LAAG_TARIEF_PERC", "SLIMME_METER_PERC"
]

# Read the grid operator data with pandas (every column as text)
data = pd.read_csv(filename, sep=delimiter, dtype=str, names=columns, skiprows=1)

# Make the grid operator data uniform for pandas: replace decimal commas with points
data = data.map(lambda x: x.replace(',', '.') if isinstance(x, str) else x)
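Because the file is read with `dtype=str`, every column is still text after the comma replacement. A minimal sketch of the follow-up conversion with `pd.to_numeric` is shown below. The sample rows are made up (only the column names come from the list above), and which columns to treat as numeric is an assumption.

```python
import pandas as pd

# Two made-up rows in the same shape as the grid operator data (all strings).
sample = pd.DataFrame({
    "POSTCODE_VAN": ["1011AA", "1011AC"],
    "AANSLUITINGEN_AANTAL": ["18", "54"],
    "SJV_GEMIDDELD": ["1234.5", "n/a"],
})

numeric_columns = ["AANSLUITINGEN_AANTAL", "SJV_GEMIDDELD"]
for column in numeric_columns:
    # errors="coerce" turns unparseable text into NaN instead of raising.
    sample[column] = pd.to_numeric(sample[column], errors="coerce")

print(sample.dtypes)
```

Postcode columns are deliberately left as text, since they serve as join keys rather than as numbers.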

#### Check the data

In [43]:
print(data.head())
# print(data.info())
# print(data.describe())

  NETBEHEERDER NETGEBIED                  STRAATNAAM POSTCODE_VAN  \
0   Liander NB   LIANDER  De Ruyterkade Steigers         1011AA     
1   Liander NB   LIANDER  De Ruyterkade Steigers         1011AA     
2   Liander NB   LIANDER  De Ruyterkade                  1011AC     
3   Liander NB   LIANDER  De Ruyterkade                  1011AC     
4   Liander NB   LIANDER  Oosterdokskade                 1011AD     

  POSTCODE_TOT                WOONPLAATS LANDCODE PRODUCTSOORT  \
0     1011AB    AMSTERDAM                    NL       GAS        
1     1011AB    AMSTERDAM                    NL       ELK        
2     1011AC    AMSTERDAM                    NL       GAS        
3     1011AC    AMSTERDAM                    NL       ELK        
4     1011AE    AMSTERDAM                    NL       GAS        

  VERBRUIKSSEGMENT AANSLUITINGEN_AANTAL LEVERINGSRICHTING_PERC  \
0     KVB                            18                  100.0   
1     KVB                            54                 
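The printed values appear to be padded with spaces, which would matter later when the postcode is used as a join key. As a sketch, assuming the padding is really in the data and not only in the display, the postcode column can be stripped and checked against the PC6 pattern (four digits followed by two capital letters). The sample values are invented.

```python
import pandas as pd

sample = pd.DataFrame({"POSTCODE_VAN": ["1011AA   ", " 1011AC", "10 11AD"]})

# Remove leading and trailing whitespace before validating or joining.
sample["POSTCODE_VAN"] = sample["POSTCODE_VAN"].str.strip()

# A PC6 postcode is four digits followed by two capital letters.
is_valid = sample["POSTCODE_VAN"].str.fullmatch(r"\d{4}[A-Z]{2}")

# Show the rows that do not look like a PC6 postcode.
print(sample[~is_valid])
```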

## Step 3: Download the CBS postcode data

In [3]:
# Make variables for download
pc6_url = "https://download.cbs.nl/postcode/2024-cbs_pc6_2023_v1.zip"
pc6_dirname = "CBS_Postcode" # Name of the directory

# Download CBS Postcode data
response = requests.get(pc6_url)

# Get filename for CBS Postcode
filename = pc6_url.split("/")[-1]

# Write the response content to the zip file
with open(filename, "wb") as f:
    f.write(response.content)

# Extract the files from zip file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall(f"./{pc6_dirname}")

print(f"{pc6_dirname} data has been downloaded and extracted")

CBS Postcode data has been downloaded and extracted
CBS Wijken en Buurten data has been downloaded and extracted


## Step 4: Read the CBS postcode data

In [None]:
import geopandas as gpd

# Read the CBS postcode data
cbs_postcode_file = "CBS_Postcode/cbs_pc6_2023_v1.gpkg"
cbs_postcode = gpd.read_file(cbs_postcode_file)

#### Check the data

In [None]:
print(cbs_postcode.head())
print(cbs_postcode.info())
print(cbs_postcode.describe())

## Step 5: Join the CBS postcode data to the grid operator data

In [6]:
# Merging the datasets
merged_data = pd.merge(data, cbs_postcode, left_on="POSTCODE_VAN", right_on="PC6")

# Further data processing steps...

total 610M
-rw-r--r-- 1 root root 541M Sep 13 10:14 cbs_pc6_2023_v1.gpkg
-rw-r--r-- 1 root root  69M Sep 13 10:14 pc6_2023_v1_20240828.xlsx
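`pd.merge` defaults to an inner join, so grid operator rows whose postcode has no match are dropped without notice. The sketch below shows one way to make such rows visible, using `how="left"` together with `indicator=True`. The toy frames stand in for `data` and `cbs_postcode`: the key names follow the merge above, while the `inwoners` column and all values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the two datasets; values are made up.
grid = pd.DataFrame({"POSTCODE_VAN": ["1011AA", "1011AC", "9999ZZ"],
                     "AANSLUITINGEN_AANTAL": [18, 54, 3]})
postcodes = pd.DataFrame({"PC6": ["1011AA", "1011AC"],
                          "inwoners": [40, 25]})

# how="left" keeps every grid operator row; indicator=True adds a "_merge"
# column that says whether a matching postcode was found.
checked = pd.merge(grid, postcodes, left_on="POSTCODE_VAN", right_on="PC6",
                   how="left", indicator=True)

# Rows marked "left_only" had no counterpart in the postcode table.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched["POSTCODE_VAN"].tolist())
```

Inspecting `unmatched` before continuing helps to separate real gaps in the postcode data from formatting differences in the key columns.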


## Step 6: Download the CBS wijkbuurtkaart (districts and neighbourhoods map) data

In [None]:
# Make variables for download
cbv_url = "https://download.cbs.nl/regionale-kaarten/wijkbuurtkaart_2023_v1.zip"
cbs_name = "CBS_wijkbuurtkaart" # Name of the directory

# Download CBS Wijken data
response = requests.get(cbv_url)

# Get filename for the CBS wijkbuurtkaart
filename = cbv_url.split("/")[-1]

# Write the response content to the zip file
with open(filename, "wb") as f:
    f.write(response.content)

# Extract the files from zip file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall(f"./{cbs_name}")

print(f"{cbs_name} data has been downloaded and extracted")

## Step 7: Load the CBS wijkbuurtkaart data

In [None]:
# Read the CBS districts and neighbourhoods (wijken en buurten) data
cbs_wijken_file = "CBS_wijkbuurtkaart/wijkenbuurten_2023_v1.gpkg"
cbs_wijkbuurt = gpd.read_file(cbs_wijken_file)


#### Check the data


In [None]:
print(cbs_wijkbuurt.head())
print(cbs_wijkbuurt.info())
print(cbs_wijkbuurt.describe())