### Data understanding 03 - CBS installed PV
What does the data look like?
We look at how the installed PV developed over time. What are the differences and the similarities between a set of selected municipalities ('s-Hertogenbosch (Den Bosch), Arnhem, Best and Loon op Zand)?

In [1]:
# !pip install cbsodata
!pip install --upgrade pip
!pip install altair --upgrade

!pip install jupyter pandas vega
# need jupyter_client >= 4.2 for sys-prefix below
# (kept on its own line: a trailing comment can be passed to pip as a requirement)
!pip install --upgrade notebook


In [2]:
import cbsodata
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as pl
import altair as alt

In [3]:
# Read in CBS table 84783NED: "Zonnestroom; vermogen bedrijven en woningen, regio (indeling 2019)"
# (Solar power; capacity of businesses and homes, by region, 2019 classification)
zonnestroom_2019 = '84783NED'
df_zonnestroom_2019 = pd.DataFrame(cbsodata.get_data(zonnestroom_2019))

# Keep the four selected municipalities, homes ('Woningen') only
selected_municipalities = ["'s-Hertogenbosch", "Loon op Zand", "Arnhem", "Best"]
df_zonnestroom_2019 = df_zonnestroom_2019[
    df_zonnestroom_2019['RegioS'].isin(selected_municipalities)
    & (df_zonnestroom_2019['BedrijfstakkenWoningen'] == 'Woningen')
]

df_zonnestroom_2019

| | ID | BedrijfstakkenWoningen | RegioS | Perioden | AantalInstallaties_1 | OpgesteldVermogenVanZonnepanelen_2 |
|---|---|---|---|---|---|---|
| 3672 | 3672 | Woningen | Arnhem | 2012 | 336.0 | 694.0 |
| 3673 | 3673 | Woningen | Arnhem | 2013 | 778.0 | 2005.0 |
| 3674 | 3674 | Woningen | Arnhem | 2014 | 1135.0 | 3124.0 |
| 3675 | 3675 | Woningen | Arnhem | 2015 | 1772.0 | 4619.0 |
| 3676 | 3676 | Woningen | Arnhem | 2016 | 2530.0 | 7197.0 |
| 3677 | 3677 | Woningen | Arnhem | 2017 | 3182.0 | 9199.0 |
| 3678 | 3678 | Woningen | Arnhem | 2018 | 4952.0 | 14754.0 |
| 3679 | 3679 | Woningen | Arnhem | 2019 | 6384.0 | 20182.0 |
| 3816 | 3816 | Woningen | Best | 2012 | 81.0 | 253.0 |
| 3817 | 3817 | Woningen | Best | 2013 | 290.0 | 1054.0 |
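To put numbers on the development over time, one option is to look at year-over-year growth per municipality. The sketch below is an illustration and not part of the original notebook: it rebuilds the Arnhem rows from the output above by hand, and the names `df_arnhem` and `growth_pct` are made up for this example. It also assumes `Perioden` holds the year as a string.

```python
import pandas as pd

# Arnhem rows copied by hand from the output above (category 'Woningen').
# Assumption: 'Perioden' is stored as a string holding the year.
df_arnhem = pd.DataFrame({
    "RegioS": ["Arnhem"] * 8,
    "Perioden": ["2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019"],
    "AantalInstallaties_1": [336.0, 778.0, 1135.0, 1772.0,
                             2530.0, 3182.0, 4952.0, 6384.0],
})

# Sort so that consecutive rows within a municipality are consecutive years.
df_arnhem = df_arnhem.sort_values(["RegioS", "Perioden"]).reset_index(drop=True)

# Year-over-year growth in percent, computed within each municipality.
# The first year of each municipality has no predecessor and becomes NaN.
df_arnhem["growth_pct"] = (
    df_arnhem.groupby("RegioS")["AantalInstallaties_1"].pct_change() * 100
)

print(df_arnhem[["Perioden", "AantalInstallaties_1", "growth_pct"]].round(1))
```

Grouping by `RegioS` first means the same lines should also work on the full four-municipality frame without growth being computed across municipality boundaries.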


### Check that Altair is at version 4.2.0 so the charts below work as intended

In [4]:
alt.__version__

'4.2.0'

In [5]:
alt.Chart(df_zonnestroom_2019).mark_line().encode(
    x=alt.X("Perioden", bin=False, title='Year'),
    y=alt.Y(alt.repeat('layer'), aggregate='mean', title="Installed pV - number vs. power"),
    color=alt.ColorDatum(alt.repeat('layer'))
).repeat(layer=["AantalInstallaties_1", "OpgesteldVermogenVanZonnepanelen_2"])

### Comparing municipalities

In [6]:
chart = alt.Chart(df_zonnestroom_2019).mark_point().encode(
    alt.X(alt.repeat("column"), type='ordinal', title='Year'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='RegioS:N'
).properties(
    width=250,
    height=250
).repeat(
    column=['Perioden'],
    row=['AantalInstallaties_1', 'OpgesteldVermogenVanZonnepanelen_2'],
    
).interactive()
chart

### Compare the CBS data with the Enexis data

The CBS data runs from 2012 to 2019, while the Enexis data covers 2020 and 2021, so the two sources cannot be compared for the same date. We can, however, set the 2019 CBS figures next to the 2020 Enexis figures and see whether anything stands out.

Load the Enexis 2020 data:

In [7]:
import pandas as pd

In [8]:
decentral_generation_012020 = '../data/Enexis_decentrale_opwek_kv_(zon_pv)_01012020.csv'
# Semicolon-separated file with Dutch number formatting (decimal comma, dot as thousands separator)
df_decentral_generation = pd.read_csv(
    decentral_generation_012020,
    sep=';',
    decimal=',',
    thousands='.',
    encoding='unicode_escape',
)
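To see what the `sep`, `decimal` and `thousands` arguments do, here is a small sketch on an in-memory sample. The two rows are invented for illustration and are not Enexis values; only the column names are taken from the notebook, and we assume the real file uses the same Dutch number formatting.

```python
import io

import pandas as pd

# Invented two-row sample in the assumed layout of the Enexis file:
# ';' between fields, ',' as decimal mark, '.' as thousands separator.
sample = io.StringIO(
    "Gemeente;Aantal aansluitingen met opwekinstallatie;Opgesteld vermogen\n"
    "Best;1.250;5.100,5\n"
    "Loon op Zand;12;48,25\n"
)

df_sample = pd.read_csv(sample, sep=";", decimal=",", thousands=".")

# Both numeric columns should come out as numbers, not as strings.
print(df_sample.dtypes)
print(df_sample)
```

Without `thousands='.'` a value such as `1.250` would be read as 1.25 or left as text, which would silently distort the sums computed further down.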

Keep only the data for the selected municipalities and get the total number of PV connections in each:

In [9]:
df_decentral_generation = df_decentral_generation[    
    (df_decentral_generation['Gemeente']=='Arnhem') |
    (df_decentral_generation['Gemeente']=='Best') |
    (df_decentral_generation['Gemeente']=="'s-Hertogenbosch") |
    (df_decentral_generation['Gemeente']=='Loon op Zand') 
]

totalNumberOfPVConnections = df_decentral_generation.groupby('Gemeente')['Aantal aansluitingen met opwekinstallatie'].sum()

totalNumberOfPVConnections


Gemeente
's-Hertogenbosch    6332.0
Best                2478.0
Loon op Zand        1145.0
Name: Aantal aansluitingen met opwekinstallatie, dtype: float64

![Picture title](image-20220515-200203.png)

Note that Arnhem does not appear in the Enexis data. Possibly it lies outside their service area.
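A gap like this is easy to miss when reading the output by eye. As a minimal sketch, the check can be made explicit with a set difference; `df_enexis_demo` is a hypothetical stand-in that only contains the municipalities seen in the output above, not the real Enexis frame.

```python
import pandas as pd

selected = ["Arnhem", "Best", "'s-Hertogenbosch", "Loon op Zand"]

# Hypothetical stand-in for the filtered Enexis frame: just the
# municipality names that appear in the groupby output above.
df_enexis_demo = pd.DataFrame(
    {"Gemeente": ["Best", "'s-Hertogenbosch", "Loon op Zand", "Best"]}
)

# Municipalities we asked for that never occur in the data.
missing = sorted(set(selected) - set(df_enexis_demo["Gemeente"].unique()))
print(missing)
```

With the real frame, the same two lines would use `df_decentral_generation['Gemeente']` instead of the stand-in.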

Do the same for the installed capacity ('Opgesteld vermogen'):

In [11]:
# Same municipality filter as above (a no-op if the previous cell has already run)
df_decentral_generation = df_decentral_generation[
    df_decentral_generation['Gemeente'].isin(['Best', 'Arnhem', "'s-Hertogenbosch", 'Loon op Zand'])
]

# Sum the installed capacity per municipality
totalInstalledPower = df_decentral_generation.groupby('Gemeente')['Opgesteld vermogen'].sum()

totalInstalledPower

Gemeente
's-Hertogenbosch    25406.0
Best                10147.0
Loon op Zand         5156.0
Name: Opgesteld vermogen, dtype: float64

Collect the values we need in a data frame. This is done by hand, since it is a one-off.

In [12]:
data = {'Municipality': ['Best', "'s-Hertogenbosch", 'Loon op Zand'], 
'NumberOfInstallations': [2478.0, 6332.0, 1145.0],
'OpgesteldVermogenVanZonnepanelen': [10147.0, 25406.0, 5156.0]}
df = pd.DataFrame(data)

print(df)

       Municipality  NumberOfInstallations  OpgesteldVermogenVanZonnepanelen
0              Best                 2478.0                           10147.0
1  's-Hertogenbosch                 6332.0                           25406.0
2      Loon op Zand                 1145.0                            5156.0
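If the comparison has to be repeated, the same frame could be assembled from the two per-municipality totals instead of being typed in. The sketch below is one possible way under that assumption; the Series `connections` and `power` are rebuilt here from the printed outputs so the example runs on its own, and `df_auto` is a made-up name.

```python
import pandas as pd

# Values copied from the two groupby outputs above, so the sketch is
# self-contained. In the notebook these would be the groupby results.
connections = pd.Series(
    {"'s-Hertogenbosch": 6332.0, "Best": 2478.0, "Loon op Zand": 1145.0},
    name="NumberOfInstallations",
)
power = pd.Series(
    {"'s-Hertogenbosch": 25406.0, "Best": 10147.0, "Loon op Zand": 5156.0},
    name="OpgesteldVermogenVanZonnepanelen",
)

# Align both Series on the municipality index and turn that index
# into a regular 'Municipality' column.
df_auto = (
    pd.concat([connections, power], axis=1)
    .rename_axis("Municipality")
    .reset_index()
)

print(df_auto)
```

Because `pd.concat` aligns on the index, a municipality present in only one of the two Series would show up with a NaN rather than being silently shifted.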


## Conclusion:

The number of installations in 's-Hertogenbosch in 2020 (Enexis data) is higher than the 2019 CBS figure, but the increase looks too small. Before that, the number rose from about 4k to 6k, whereas the step to 2020 is only from 6k to 6.3k. That is possible, of course, but not likely.
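The size of that slowdown can be made concrete with a quick back-of-the-envelope calculation. The sketch below uses the rounded 4k and 6k figures quoted above together with the Enexis total of 6332, so the percentages are approximate; `growth_pct` is a helper defined only for this example.

```python
def growth_pct(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100


# Rounded CBS counts as quoted in the conclusion (approximate values).
cbs_earlier = 4000
cbs_2019 = 6000
# Enexis total for 's-Hertogenbosch from the output above.
enexis_2020 = 6332

print(f"CBS, earlier step:        {growth_pct(cbs_earlier, cbs_2019):.1f}%")
print(f"CBS 2019 to Enexis 2020:  {growth_pct(cbs_2019, enexis_2020):.1f}%")
```

On these rounded inputs the earlier step is about ten times larger in relative terms, which is what makes the 2020 figure look suspicious.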
