### Data understanding: installed PV, CBS versus Enexis data
What does the data look like?
We look at installed PV over time. What are the differences and similarities between a set of selected municipalities (Den Bosch, Arnhem, Best and Loon op Zand)? And how does the CBS data on installed PV compare to the data from the Enexis files that we use? This notebook helps us understand.

In [1]:
# !pip install cbsodata
# Upgrade pip through the interpreter, as pip itself requests when asked to modify itself
!python -m pip install --upgrade pip
!pip install altair --upgrade

!pip install jupyter pandas vega
# need jupyter_client >= 4.2 for sys-prefix below
# (kept on its own line: a trailing '#' after a !pip command is passed to pip as a requirement)
!pip install --upgrade notebook


In [2]:
import cbsodata
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

In [3]:
# Read in 'Zonnestroom; vermogen bedrijven en woningen, regio (indeling 2019)'
zonnestroom_2019 = '84783NED'
df_zonnestroom_2019 = pd.DataFrame(cbsodata.get_data(zonnestroom_2019))

# Keep only dwellings ('Woningen') in the four selected municipalities
selected_municipalities = ["'s-Hertogenbosch", "Loon op Zand", "Arnhem", "Best"]
df_zonnestroom_2019 = df_zonnestroom_2019[
    df_zonnestroom_2019['RegioS'].isin(selected_municipalities)
    & (df_zonnestroom_2019['BedrijfstakkenWoningen'] == 'Woningen')
]

df_zonnestroom_2019.head()

Unnamed: 0,ID,BedrijfstakkenWoningen,RegioS,Perioden,AantalInstallaties_1,OpgesteldVermogenVanZonnepanelen_2
3672,3672,Woningen,Arnhem,2012,336.0,694.0
3673,3673,Woningen,Arnhem,2013,778.0,2005.0
3674,3674,Woningen,Arnhem,2014,1135.0,3124.0
3675,3675,Woningen,Arnhem,2015,1772.0,4619.0
3676,3676,Woningen,Arnhem,2016,2530.0,7197.0


### Check that Altair is at version 4.2.0 so we can use the right charts.

In [4]:
alt.__version__

'4.2.0'

In [5]:
alt.Chart(df_zonnestroom_2019).mark_line().encode(
    x=alt.X("Perioden", bin=False, title='Year'),
    y=alt.Y(alt.repeat('layer'), aggregate='mean', title="Installed PV: number vs. power"),
    color=alt.ColorDatum(alt.repeat('layer'))
).repeat(layer=["AantalInstallaties_1", "OpgesteldVermogenVanZonnepanelen_2"])

### Comparing municipalities

In [6]:
chart = alt.Chart(df_zonnestroom_2019).mark_point().encode(
    alt.X(alt.repeat("column"), type='ordinal', title='Year'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='RegioS:N'
).properties(
    width=375,
    height=250
).repeat(
    column=['Perioden'],
    row=['AantalInstallaties_1', 'OpgesteldVermogenVanZonnepanelen_2'],
    
).interactive()
chart

### Compare the CBS data with the Enexis data

The CBS data runs from 2012 up to the end of 2019; the first Enexis data point is dated 1-1-2020.
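
To see how close the last CBS point and the first Enexis point are in time, a minimal sketch can put both on one date axis. It assumes that a CBS yearly figure describes the situation on 31 December of that year and that an Enexis `Peildatum` such as `202107` refers to the first day of that month. Both helper names are hypothetical and not part of the notebook.

```python
from datetime import date, datetime


def cbs_year_to_date(year):
    """Assumption: a CBS yearly figure describes the situation on 31 December."""
    return date(int(year), 12, 31)


def enexis_peildatum_to_date(peildatum):
    """Assumption: a Peildatum like 202107 refers to the first day of that month."""
    return datetime.strptime(str(peildatum), "%Y%m").date()


last_cbs = cbs_year_to_date(2019)
first_enexis = enexis_peildatum_to_date(202001)

# Number of days between the last CBS point and the first Enexis point
gap_in_days = (first_enexis - last_cbs).days
print(last_cbs, first_enexis, gap_in_days)
```

Under these assumptions the two points are one day apart, which is why they can be compared directly.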

Load the Enexis data (2020 and 2021):

In [7]:
import pandas as pd

Helper functions for recurring work:

In [8]:
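def load_enexis_data(file_name):
    # The Enexis CSV files are semicolon-separated and use Dutch number
    # formatting (decimal comma, point as thousands separator).
    return pd.read_csv(
        file_name,
        sep=';',
        decimal=',',
        thousands='.',
        encoding='unicode_escape',
    )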

In [9]:
def filter_on_4_municipalities(df):
    municipalities = ['Arnhem', 'Best', "'s-Hertogenbosch", "s-Hertogenbosch", 'Loon op Zand']
    return df[df['Gemeente'].isin(municipalities)]

In [10]:
def summarize_number_of_connections(df):
    return df.groupby('Gemeente')['Aantal aansluitingen met opwekinstallatie'].sum()

In [11]:
def summarize_opgesteld_vermogen(df):
    return df.groupby('Gemeente')['Opgesteld vermogen'].sum()
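
To make explicit what these two group-and-sum helpers compute, here is a small sketch in plain Python, without pandas. The numbers are the five 'Best' neighbourhood rows visible in the 1-7-2021 preview further down in this notebook, so the result is a partial sum over those five rows only, not the municipality total.

```python
from collections import defaultdict

# (Gemeente, connections with generation, installed power) for the five
# 'Best' neighbourhoods visible in the 1-7-2021 preview; not the full file.
rows = [
    ("Best", 77, 393),
    ("Best", 113, 420),
    ("Best", 149, 623),
    ("Best", 188, 647),
    ("Best", 12, 122),
]

connections = defaultdict(int)
power = defaultdict(int)
for gemeente, n_connections, installed_power in rows:
    # Same idea as groupby('Gemeente')[column].sum(): one running total per municipality
    connections[gemeente] += n_connections
    power[gemeente] += installed_power

print(dict(connections))
print(dict(power))
```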

Load the four Enexis snapshots (1-1-2020, 1-7-2020, 1-1-2021 and 1-7-2021) from the 'decentrale opwek' files:

In [12]:
decentral_generation_012020 = load_enexis_data('../data/Enexis_decentrale_opwek_kv_(zon_pv)_01012020.csv')
decentral_generation_012020 = filter_on_4_municipalities(decentral_generation_012020)

decentral_generation_072020 = load_enexis_data('../data/Enexis_decentrale_opwek_kv_(zon_pv)_01072020.csv')
decentral_generation_072020 = filter_on_4_municipalities(decentral_generation_072020)

decentral_generation_012021 = load_enexis_data('../data/Enexis_decentrale_opwek_kv_(zon_pv)_01012021.csv')
decentral_generation_012021 = filter_on_4_municipalities(decentral_generation_012021)

decentral_generation_072021 = load_enexis_data('../data/Enexis_decentrale_opwek_kv_(zon_pv)_01072021.csv')
decentral_generation_072021 = filter_on_4_municipalities(decentral_generation_072021)   
decentral_generation_072021.head(75)  
     

Unnamed: 0,ï»¿Peildatum,Netbeheerder,Provincie,Gemeente,CBS Buurt,CBS Buurtcode,Aantal aansluitingen in CBS-buurt,Aantal aansluitingen met opwekinstallatie,Opgesteld vermogen
1962,202107,Enexis,Noord-Brabant,Best,Centrum,7530001.0,1036,77,393
1963,202107,Enexis,Noord-Brabant,Best,Hoge Akker,7530002.0,878,113,420
1964,202107,Enexis,Noord-Brabant,Best,Speelheide,7530003.0,543,149,623
1965,202107,Enexis,Noord-Brabant,Best,De Leeuwerik,7530004.0,807,188,647
1966,202107,Enexis,Noord-Brabant,Best,Villawijk,7530005.0,121,12,122
...,...,...,...,...,...,...,...,...,...
3012,202107,Enexis,Noord-Brabant,'s-Hertogenbosch,A2 zone Rosmalen-Zuid,7960507.0,149,27,212
3013,202107,Enexis,Noord-Brabant,'s-Hertogenbosch,'t Ven,7960601.0,915,182,793
3014,202107,Enexis,Noord-Brabant,'s-Hertogenbosch,Rosmalen-Centrum,7960602.0,1099,135,563
3015,202107,Enexis,Noord-Brabant,'s-Hertogenbosch,Hondsberg,7960603.0,1024,217,811
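
The first column name in this output shows up as `ï»¿Peildatum`. That prefix is what a UTF-8 byte-order mark looks like when its three bytes are decoded one character each, which is how the `unicode_escape` codec treats bytes that are not part of a backslash escape. The sketch below reproduces the effect on a hypothetical two-line file built in memory; whether the real Enexis files are UTF-8 with such a mark is an assumption.

```python
import codecs

# Hypothetical two-line file with a UTF-8 byte-order mark in front.
raw = codecs.BOM_UTF8 + "Peildatum;Netbeheerder\n202107;Enexis\n".encode("utf-8")

# 'unicode_escape' maps each non-escape byte to one character, so the
# three mark bytes survive as visible characters in the first header.
header_escaped = raw.decode("unicode_escape").splitlines()[0]

# 'utf-8-sig' recognises the mark and drops it.
header_clean = raw.decode("utf-8-sig").splitlines()[0]

print(repr(header_escaped))
print(repr(header_clean))
```

If the assumption holds, passing `encoding='utf-8-sig'` to `pd.read_csv` would give a clean `Peildatum` column name; that has not been checked against the files used here.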


With only the selected municipalities left, get the total number of PV connections per municipality:

In [13]:
totalNumberOfPVConnections012020 = summarize_number_of_connections(decentral_generation_012020)
totalNumberOfPVConnections072020 = summarize_number_of_connections(decentral_generation_072020)
totalNumberOfPVConnections012021 = summarize_number_of_connections(decentral_generation_012021)
totalNumberOfPVConnections072021 = summarize_number_of_connections(decentral_generation_072021)

totalNumberOfPVConnections072020

Gemeente
Best               2831.0
Loon op Zand       1367.0
s-Hertogenbosch    7784.0
Name: Aantal aansluitingen met opwekinstallatie, dtype: float64

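Note that 'Arnhem' is not part of the Enexis data, because it is not in the Enexis service area. Also note that this July 2020 output lists 's-Hertogenbosch without its leading apostrophe, as "s-Hertogenbosch".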
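
The July 2020 totals above list the municipality as "s-Hertogenbosch", while the other outputs use "'s-Hertogenbosch". One way to deal with this is to map both spellings to one canonical name before grouping. The helper below is a hypothetical sketch and is not used elsewhere in this notebook.

```python
def normalize_gemeente(name):
    """Return a canonical municipality name (hypothetical helper)."""
    name = name.strip()
    # Spelling seen in the July 2020 totals, without the leading apostrophe.
    if name == "s-Hertogenbosch":
        return "'s-Hertogenbosch"
    return name


names = ["'s-Hertogenbosch", "s-Hertogenbosch", "Best", "Loon op Zand"]
normalized = [normalize_gemeente(n) for n in names]
print(normalized)
```

Applied as `df['Gemeente'].map(normalize_gemeente)` before the `groupby`, this would let all four snapshots be indexed with the same key.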

Do the same for the installed power ('opgesteld vermogen'):

In [14]:
totalOpgesteldVermogen012020 = summarize_opgesteld_vermogen(decentral_generation_012020)
totalOpgesteldVermogen072020 = summarize_opgesteld_vermogen(decentral_generation_072020)
totalOpgesteldVermogen012021 = summarize_opgesteld_vermogen(decentral_generation_012021)
totalOpgesteldVermogen072021 = summarize_opgesteld_vermogen(decentral_generation_072021)

totalOpgesteldVermogen012020

Gemeente
's-Hertogenbosch    25406.0
Best                10147.0
Loon op Zand         5156.0
Name: Opgesteld vermogen, dtype: float64

In [15]:
def create_row(id, regio, period, installations, generative_power):
    return {"ID":id, "BedrijfstakkenWoningen":"Woningen",
        "RegioS":regio, "Perioden":period,
        "AantalInstallaties_1":installations, "OpgesteldVermogenVanZonnepanelen_2":generative_power}

Append the Enexis totals to the CBS data and create one graph from the combined data.

In [26]:
best = "Best"
den_bosch = "'s-Hertogenbosch"
den_bosch_july_2020 = "s-Hertogenbosch"
loon_op_zand = "Loon op Zand"

bestRow01012020 = create_row(67890,"Best (Enexis 2020-1)", "Enexis 2020-1", totalNumberOfPVConnections012020[best], totalOpgesteldVermogen012020[best] )
sHertogenboschRow01012020 =  create_row(67891, "'s-Hertogenbosch (Enexis 2020-1)", "Enexis 2020-1", totalNumberOfPVConnections012020[den_bosch], totalOpgesteldVermogen012020[den_bosch] ) 
loonOpZandRow01012020 = create_row(67892, "Loon op Zand (Enexis 2020-1)", "Enexis 2020-1", totalNumberOfPVConnections012020[loon_op_zand], totalOpgesteldVermogen012020[loon_op_zand] )

bestRow01072020 = create_row(67893,"Best (Enexis 2020-7)", "Enexis 2020-7", totalNumberOfPVConnections072020[best], totalOpgesteldVermogen072020[best] )
sHertogenboschRow01072020 =  create_row(67894,  "'s-Hertogenbosch (Enexis 2020-7)", "Enexis 2020-7", totalNumberOfPVConnections072020[den_bosch_july_2020], totalOpgesteldVermogen072020[den_bosch_july_2020] ) 
loonOpZandRow01072020 = create_row(67895, "Loon op Zand (Enexis 2020-7)", "Enexis 2020-7", totalNumberOfPVConnections072020[loon_op_zand], totalOpgesteldVermogen072020[loon_op_zand] )

bestRow01012021 = create_row(67896,"Best (Enexis 2021-1)", "Enexis 2021-1", totalNumberOfPVConnections012021[best], totalOpgesteldVermogen012021[best] )
sHertogenboschRow01012021 =  create_row(67897, "'s-Hertogenbosch (Enexis 2021-1)", "Enexis 2021-1", totalNumberOfPVConnections012021[den_bosch], totalOpgesteldVermogen012021[den_bosch] ) 
loonOpZandRow01012021 = create_row(67898, "Loon op Zand (Enexis 2021-1)", "Enexis 2021-1", totalNumberOfPVConnections012021[loon_op_zand], totalOpgesteldVermogen012021[loon_op_zand] )

bestRow01072021 = create_row(67899,"Best (Enexis 2021-7)", "Enexis 2021-7", totalNumberOfPVConnections072021[best], totalOpgesteldVermogen072021[best] )
sHertogenboschRow01072021 =  create_row(67900, "'s-Hertogenbosch (Enexis 2021-7)", "Enexis 2021-7", totalNumberOfPVConnections072021[den_bosch], totalOpgesteldVermogen072021[den_bosch] ) 
loonOpZandRow01072021 = create_row(67901, "Loon op Zand (Enexis 2021-7)", "Enexis 2021-7", totalNumberOfPVConnections072021[loon_op_zand], totalOpgesteldVermogen072021[loon_op_zand] )

enexis_rows = [
    bestRow01012020, sHertogenboschRow01012020, loonOpZandRow01012020,
    bestRow01072020, sHertogenboschRow01072020, loonOpZandRow01072020,
    bestRow01012021, sHertogenboschRow01012021, loonOpZandRow01012021,
    bestRow01072021, sHertogenboschRow01072021, loonOpZandRow01072021,
]

# DataFrame.append is deprecated since pandas 1.4 and removed in 2.0;
# pd.concat adds the same rows in one step.
df = pd.concat([df_zonnestroom_2019, pd.DataFrame(enexis_rows)], ignore_index=True)



In [27]:
combinedDataChart = alt.Chart(df).mark_point().encode(
    alt.X(alt.repeat("column"), type='ordinal', title='Year'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='RegioS:N'
).properties(
    width=375,
    height=190
).repeat(
    column=['Perioden'],
    row=['AantalInstallaties_1', 'OpgesteldVermogenVanZonnepanelen_2'],
    
)
combinedDataChart

## Observations:

Comparing municipalities shows the same general shape of the curves; the level depends mainly on the size of the municipality.

The CBS zonnestroom figure for 2019 covers all of 2019, up to 31-12-2019. The Enexis data is dated 1-1-2020, so these values, even though they are spaced 'a year' apart in the graphs, describe practically the same moment. There is no Enexis value for Arnhem in the graphs because Arnhem is not in the Enexis service area.

For the municipalities observed, we see close alignment between the CBS zonnestroom and Enexis data for the number of installations. For the installed power, the Enexis values are slightly higher.
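
To put a number on "slightly higher", the Enexis value can be expressed relative to the CBS value. The sketch below shows the calculation only. The function name is hypothetical and the two input values are made up for illustration; they are not taken from CBS or Enexis.

```python
def relative_difference_pct(cbs_value, enexis_value):
    """Enexis value relative to the CBS value, in percent."""
    return 100 * (enexis_value - cbs_value) / cbs_value


# Made-up numbers for illustration only; they are NOT taken from CBS or Enexis.
example_cbs_power = 1000
example_enexis_power = 1050

diff = relative_difference_pct(example_cbs_power, example_enexis_power)
print(f"{diff:+.1f}%")
```

In the notebook, the inputs would be the 2019 CBS value and the 1-1-2020 Enexis total for the same municipality.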
