### Data understanding: installed PV, CBS versus Enexis data
What does the data look like?
We look at installed PV over time. What are the differences and similarities between a set of selected municipalities ('s-Hertogenbosch (Den Bosch), Arnhem, Best and Loon op Zand)? And how does the CBS data on installed PV compare to the data from the Enexis files that we use? This notebook helps us understand.

In [1]:
# !pip install cbsodata
!pip install --upgrade pip
!pip install altair --upgrade

!pip install jupyter pandas vega
!pip install --upgrade notebook  # need jupyter_client >= 4.2 for sys-prefix below

Collecting pip
  Downloading pip-22.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.2.3
    Uninstalling pip-21.2.3:
      Successfully uninstalled pip-21.2.3
Successfully installed pip-22.1
Collecting altair
  Downloading altair-4.2.0-py3-none-any.whl (812 kB)
Installing collected packages: altair
  Attempting uninstall: altair
    Found existing installation: altair 4.1.0
    Not uninstalling altair at /shared-libs/python3.9/py/lib/python3.9/site-packages, outside environment /root/venv
    Can't uninstall 'altair'. No files were found to uninstall.
Successfully installed altair-4.2.0
Collecting jupyter
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting vega
  Downloading vega-3.6.0-py3-none-any.whl 

In [2]:
import cbsodata
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

In [3]:
# Read in CBS table 84783NED: "Zonnestroom; vermogen bedrijven en woningen, regio (indeling 2019)"
# (solar power; capacity of businesses and homes, by region, 2019 classification)
zonnestroom_2019 = '84783NED'
df_zonnestroom_2019 = pd.DataFrame(cbsodata.get_data(zonnestroom_2019))

# Keep only the selected municipalities, and only the rows for homes ('Woningen')
selected_municipalities = ["'s-Hertogenbosch", "Loon op Zand", "Arnhem", "Best"]
df_zonnestroom_2019 = df_zonnestroom_2019[
    df_zonnestroom_2019['RegioS'].isin(selected_municipalities)
    & (df_zonnestroom_2019['BedrijfstakkenWoningen'] == 'Woningen')
]

df_zonnestroom_2019.head()

Unnamed: 0,ID,BedrijfstakkenWoningen,RegioS,Perioden,AantalInstallaties_1,OpgesteldVermogenVanZonnepanelen_2
3672,3672,Woningen,Arnhem,2012,336.0,694.0
3673,3673,Woningen,Arnhem,2013,778.0,2005.0
3674,3674,Woningen,Arnhem,2014,1135.0,3124.0
3675,3675,Woningen,Arnhem,2015,1772.0,4619.0
3676,3676,Woningen,Arnhem,2016,2530.0,7197.0


### Check that we are on Altair 4.2.0 so we can use the right charts

In [4]:
alt.__version__

'4.2.0'
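
The check above is done by eye. It could also be made explicit in code. The following is a minimal sketch, assuming the only requirement is Altair 4.2 or newer; `version_at_least` is a hypothetical helper, and in the notebook you would pass `alt.__version__` instead of the literal strings used here.

```python
import re


def version_at_least(version, minimum):
    """Return True if a dotted version string is at least `minimum`.

    Hypothetical helper: only the leading digits of each component are
    compared, so suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split(".")[:len(minimum)]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts) >= tuple(minimum)


# The version reported after the upgrade, and the one found before it.
print(version_at_least("4.2.0", (4, 2)))
print(version_at_least("4.1.0", (4, 2)))
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where `'4.10.0'` would sort before `'4.2.0'`.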

In [5]:
alt.Chart(df_zonnestroom_2019).mark_line().encode(
    x=alt.X("Perioden", bin=False, title='Year'),
    y=alt.Y(alt.repeat('layer'), aggregate='mean', title="Installed pV - number vs. power"),
    color=alt.ColorDatum(alt.repeat('layer'))
).repeat(layer=["AantalInstallaties_1", "OpgesteldVermogenVanZonnepanelen_2"])

### Comparing municipalities

In [42]:
chart = alt.Chart(df_zonnestroom_2019).mark_point().encode(
    alt.X(alt.repeat("column"), type='ordinal', title='Year'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='RegioS:N'
).properties(
    width=375,
    height=250
).repeat(
    column=['Perioden'],
    row=['AantalInstallaties_1', 'OpgesteldVermogenVanZonnepanelen_2'],
    
).interactive()
chart

### Compare the CBS data with the Enexis data

The CBS data runs from 2012 to 2019, with each value giving the situation at the end of that year. The Enexis data is a snapshot of 1-1-2020.

Load the Enexis 2020 data:

In [7]:
import pandas as pd

In [8]:
decentral_generation_012020 = '../data/Enexis_decentrale_opwek_kv_(zon_pv)_01012020.csv'
# 'utf-8-sig' strips a leading byte-order mark, which otherwise ends up
# in the name of the first column
df_decentral_generation = pd.read_csv(decentral_generation_012020,
                         sep                = ';',
                         decimal            = ',',
                         thousands          = '.',
                         encoding           = 'utf-8-sig')

Keep only the data for the selected municipalities and get the total number of PV connections in each:

In [9]:
df_decentral_generation = df_decentral_generation[
    df_decentral_generation['Gemeente'].isin(
        ['Arnhem', 'Best', "'s-Hertogenbosch", 'Loon op Zand']
    )
]

totalNumberOfPVConnections = df_decentral_generation.groupby('Gemeente')['Aantal aansluitingen met opwekinstallatie'].sum()

totalNumberOfPVConnections


Gemeente
's-Hertogenbosch    6332.0
Best                2478.0
Loon op Zand        1145.0
Name: Aantal aansluitingen met opwekinstallatie, dtype: float64

Note that Arnhem does not appear in the Enexis data, because it lies outside the Enexis service area.
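
Rather than spotting the missing municipality in the output by eye, the gap can be listed explicitly. A minimal sketch, assuming a frame with a `Gemeente` column like the Enexis file; the small `enexis` frame below is an illustrative stand-in, not the real data, and `selected` and `missing` are names introduced here.

```python
import pandas as pd

selected = ["'s-Hertogenbosch", "Arnhem", "Best", "Loon op Zand"]

# Illustrative stand-in for the Enexis frame; only the 'Gemeente' column matters here.
enexis = pd.DataFrame(
    {"Gemeente": ["Best", "'s-Hertogenbosch", "Loon op Zand", "Best"]}
)

# Municipalities we asked for that have no rows in the data.
present = set(enexis["Gemeente"].unique())
missing = sorted(set(selected) - present)
print(missing)  # ['Arnhem']
```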

In [10]:
df_decentral_generation.tail(50)

Unnamed: 0,Peildatum,Netbeheerder,Provincie,Gemeente,CBS Buurt,CBS Buurtcode,Aantal aansluitingen in CBS-buurt,Aantal aansluitingen met opwekinstallatie,Opgesteld vermogen
1507,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Lokeren,7961004.0,669.0,53.0,166.0
1508,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Maasstroom,7961005.0,901.0,112.0,275.0
1509,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,De Staatsliedenbuurt,7961006.0,779.0,88.0,295.0
1510,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Het Zilverpark,7961007.0,827.0,105.0,405.0
1511,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Maasvallei,7961008.0,991.0,100.0,314.0
1512,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Maasoever,7961009.0,1165.0,143.0,563.0
1513,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Boschveld,7961101.0,1610.0,154.0,295.0
1514,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Paleiskwartier,7961102.0,2006.0,10.0,51.0
1515,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,Deuteren,7961104.0,874.0,52.0,177.0
1516,202001.0,Enexis,Noord-Brabant,'s-Hertogenbosch,De Schutskamp,7961106.0,2579.0,98.0,331.0


Do the same for the installed power ('Opgesteld vermogen'):

In [11]:
# df_decentral_generation was already restricted to the selected municipalities above
totalInstalledPower = df_decentral_generation.groupby('Gemeente')['Opgesteld vermogen'].sum()

totalInstalledPower

Gemeente
's-Hertogenbosch    25406.0
Best                10147.0
Loon op Zand         5156.0
Name: Opgesteld vermogen, dtype: float64

Add the Enexis totals to the CBS data as extra rows and plot everything in one chart.

In [39]:
bestRow = {"ID":67890, "BedrijfstakkenWoningen":"Woningen",
 "RegioS":"Best (Enexis 1-1-2020)", "Perioden":"Enexis 1-1-2020",
 "AantalInstallaties_1":2478.0, "OpgesteldVermogenVanZonnepanelen_2":10147.0}

sHertogenboschRow = {"ID":67891, "BedrijfstakkenWoningen":"Woningen",
 "RegioS":"'s-Hertogenbosch (Enexis 1-1-2020)", "Perioden":"Enexis 1-1-2020",
 "AantalInstallaties_1":6332.0, "OpgesteldVermogenVanZonnepanelen_2":25406}

loonOpZandRow = {"ID":67891, "BedrijfstakkenWoningen":"Woningen",
 "RegioS":"Loon op Zand (Enexis 1-1-2020)", "Perioden":"Enexis 1-1-2020",
 "AantalInstallaties_1":1145.0, "OpgesteldVermogenVanZonnepanelen_2":5156.0}

# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
enexis_rows = pd.DataFrame([bestRow, sHertogenboschRow, loonOpZandRow])
df = pd.concat([df_zonnestroom_2019, enexis_rows], ignore_index=True)

df.tail()


Unnamed: 0,ID,BedrijfstakkenWoningen,RegioS,Perioden,AantalInstallaties_1,OpgesteldVermogenVanZonnepanelen_2
30,4950,Woningen,Loon op Zand,2018,761.0,2669.0
31,4951,Woningen,Loon op Zand,2019,1042.0,3914.0
32,67890,Woningen,Best (Enexis 1-1-2020),Enexis 1-1-2020,2478.0,10147.0
33,67891,Woningen,'s-Hertogenbosch (Enexis 1-1-2020),Enexis 1-1-2020,6332.0,25406.0
34,67891,Woningen,Loon op Zand (Enexis 1-1-2020),Enexis 1-1-2020,1145.0,5156.0
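
The Enexis rows above are typed in by hand from the earlier group totals. They could instead be derived from the aggregated frame, which avoids copy errors. A minimal sketch under stated assumptions: the `enexis` and `cbs` frames below are small stand-ins with made-up values, the `ID` column is left out, and `label` and `enexis_rows` are names introduced here for illustration.

```python
import pandas as pd

# Illustrative stand-in for the filtered Enexis frame (values are made up).
enexis = pd.DataFrame({
    "Gemeente": ["Best", "Best", "Loon op Zand"],
    "Aantal aansluitingen met opwekinstallatie": [10.0, 5.0, 7.0],
    "Opgesteld vermogen": [40.0, 20.0, 30.0],
})

# Illustrative stand-in for the CBS frame (values are made up).
cbs = pd.DataFrame({
    "BedrijfstakkenWoningen": ["Woningen"],
    "RegioS": ["Best"],
    "Perioden": ["2019"],
    "AantalInstallaties_1": [12.0],
    "OpgesteldVermogenVanZonnepanelen_2": [50.0],
})

label = "Enexis 1-1-2020"

# Sum both measures per municipality and rename to the CBS column names.
enexis_rows = (
    enexis.groupby("Gemeente", as_index=False)[
        ["Aantal aansluitingen met opwekinstallatie", "Opgesteld vermogen"]
    ]
    .sum()
    .rename(columns={
        "Gemeente": "RegioS",
        "Aantal aansluitingen met opwekinstallatie": "AantalInstallaties_1",
        "Opgesteld vermogen": "OpgesteldVermogenVanZonnepanelen_2",
    })
)

# Mark the rows as Enexis data, mirroring the hand-written rows.
enexis_rows["RegioS"] = enexis_rows["RegioS"] + f" ({label})"
enexis_rows["Perioden"] = label
enexis_rows["BedrijfstakkenWoningen"] = "Woningen"

combined = pd.concat([cbs, enexis_rows], ignore_index=True)
print(combined)
```

Because the rows come straight from the aggregation, adding or removing a municipality in the filter changes the chart without editing any numbers by hand.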


In [43]:
combinedDataChart = alt.Chart(df).mark_point().encode(
    alt.X(alt.repeat("column"), type='ordinal', title='Year'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='RegioS:N'
).properties(
    width=375,
    height=190
).repeat(
    column=['Perioden'],
    row=['AantalInstallaties_1', 'OpgesteldVermogenVanZonnepanelen_2'],
    
)
combinedDataChart

## Observations:

The municipalities show the same general growth pattern over time; the absolute levels differ mainly with the size of the municipality.

The CBS zonnestroom figures for 2019 cover all of 2019, up to 31-12-2019. The Enexis data is dated 1-1-2020, so although the two sets of values are spaced 'a year' apart in the charts, they describe practically the same moment. The charts show no value for Arnhem on 1-1-2020 because Arnhem is not in the Enexis service area.

For the municipalities observed, the number of installations in the CBS zonnestroom data aligns closely with the Enexis data. For installed power, the Enexis values are higher than the CBS values.
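
To put a number on this comparison, the relative difference can be computed directly. A minimal sketch using only the Loon op Zand values that appear in the tables above (CBS 2019: 1042 installations, 3914 installed power; Enexis 1-1-2020: 1145 and 5156). The dictionary names are introduced here for illustration, and the other municipalities would need their CBS 2019 rows looked up in the same way.

```python
# Values copied from the tables above (Loon op Zand).
cbs_2019 = {"installations": 1042.0, "installed_power": 3914.0}
enexis_2020 = {"installations": 1145.0, "installed_power": 5156.0}

# Relative difference of the Enexis value with respect to the CBS value.
rel_diff = {
    key: (enexis_2020[key] - cbs_2019[key]) / cbs_2019[key]
    for key in cbs_2019
}

for key, value in rel_diff.items():
    print(f"{key}: {value:+.1%}")
# installations: +9.9%
# installed_power: +31.7%
```

For this one municipality, the gap in installed power is clearly larger than the gap in the number of installations.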
