## Solar Potential
### Consolidating the collected information - Part 1

The Municipality of Copenhagen publishes many interesting open datasets covering a wide range of topics.

Among them is one with [solar rooftop potential for buildings in Copenhagen](https://data.kk.dk/dataset/soldata-3d-kobenhavn).

The dataset's author also attaches an important caveat: the data should be taken with a grain of salt. Below is a lightly corrected machine translation of the explanation:

"The data is indicative only and gives a hint as to whether solar cells could be a good idea. The map does not take all shadows into account. For example, shadows from trees, chimneys, antennas, small dormers and bay windows are not included. If the roof of your property is flat, the roof may also be more suitable than the map shows. Nor does the map show how the solar cells fit the architecture of the building and the area. Therefore, it is usually a good idea to get help from an adviser to make a more detailed assessment of the possibilities."

Still, it is far better than having nothing, and easier to process than the alternative of extracting similar data from [this live map with rooftop solar potential](http://kbhkort.kk.dk/spatialmap?&selectorgroups=themecontainer%20bygninger%20detaljer&mapext=702689.6%206165734.4%20747310.4%206186265.6&layers=theme-startkort%20theme-disclaimer%20theme-bymodel_bygning_solgrupper_40&mapheight=807&mapwidth=1748&profile=solanalyser). I would rather have worked with that more up-to-date data, but time was simply too limited.

##### Data Format

The data comes as an archived JSON file of about 7 GB with the following structure (a single feature is shown):

In [2]:
{
  "name": "bymodel.SOLDATA_3D_KØBENHAVN",
  "type": "FeatureCollection",
  "crs": {
    "type": "name",
    "properties": {
      "name": "EPSG:25832"
    }
  },
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              728157.043,
              6176689.234,
              26.181
            ],
            [
              728157.321,
              6176689.84,
              26.217
            ],
            [
              728157.527,
              6176689.586,
              26.181
            ],
            [
              728157.043,
              6176689.234,
              26.181
            ]
          ]
        ]
      },
      "properties": {
        "id": 1,
        "old_id": 1,
        "dgn_id": 76,
        "elementid": 1739045,
        "exposedseconds": 13398300,
        "shadowedseconds": 3824100,
        "directinsolation": 405282.860341,
        "diffuseinsolation": 571313.153412,
        "byg_id": 655982500039710,
        "horizontal": 143.972626614887,
        "vertical": 6,
        "area": 0.0983159397859777,
        "samletsol": 976596.013753,
        "solgruppe1": 2,
        "solgruppe2": 2,
        "solgruppe3": 3
      }
    }
  ]
}

{'name': 'bymodel.SOLDATA_3D_KØBENHAVN',
 'type': 'FeatureCollection',
 'crs': {'type': 'name', 'properties': {'name': 'EPSG:25832'}},
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Polygon',
    'coordinates': [[[728157.043, 6176689.234, 26.181],
      [728157.321, 6176689.84, 26.217],
      [728157.527, 6176689.586, 26.181],
      [728157.043, 6176689.234, 26.181]]]},
   'properties': {'id': 1,
    'old_id': 1,
    'dgn_id': 76,
    'elementid': 1739045,
    'exposedseconds': 13398300,
    'shadowedseconds': 3824100,
    'directinsolation': 405282.860341,
    'diffuseinsolation': 571313.153412,
    'byg_id': 655982500039710,
    'horizontal': 143.972626614887,
    'vertical': 6,
    'area': 0.0983159397859777,
    'samletsol': 976596.013753,
    'solgruppe1': 2,
    'solgruppe2': 2,
    'solgruppe3': 3}}]}
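
As a quick sanity check on how the fields relate, the sketch below uses only the sample feature shown above. It suggests that `samletsol` is the sum of `directinsolation` and `diffuseinsolation`, and that `area` is the 3D surface area of the triangle rather than its footprint on the ground. Both readings are inferred from this single feature, not from the dataset documentation, so treat them as assumptions. The helper name `triangle_area_3d` is made up for illustration.

```python
import math

# Values copied from the sample feature above.
ring = [
    (728157.043, 6176689.234, 26.181),
    (728157.321, 6176689.84, 26.217),
    (728157.527, 6176689.586, 26.181),
    (728157.043, 6176689.234, 26.181),  # closing point repeats the first
]
props = {
    "exposedseconds": 13398300,
    "shadowedseconds": 3824100,
    "directinsolation": 405282.860341,
    "diffuseinsolation": 571313.153412,
    "area": 0.0983159397859777,
    "samletsol": 976596.013753,
}


def triangle_area_3d(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


# Assumption: total insolation = direct + diffuse.
total = props["directinsolation"] + props["diffuseinsolation"]
# Assumption: 'area' is measured on the tilted surface, in the units of the CRS.
area = triangle_area_3d(*ring[:3])
# Share of the measured time during which the surface was exposed to the sun.
exposed_share = props["exposedseconds"] / (
    props["exposedseconds"] + props["shadowedseconds"]
)

print("total matches samletsol:", math.isclose(total, props["samletsol"]))
print("computed area:", area, "stored area:", props["area"])
print("exposed share:", exposed_share)
```

If the same relations hold across the whole file, `samletsol` is redundant with the two insolation columns and could be recomputed rather than stored.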

In [1]:
from datetime import datetime
import time

import os
import ast
import json

import utm

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 500)

### Split the 7 GB file into smaller files (400k lines each)

In [None]:
!cd /home/osboxes/courses/ibm/data && split -l 400000 soldata3dkoebenhavn.json solar_split
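
`split -l` cuts on line boundaries, so this step relies on the file holding one feature per line (the cleanup cell further down makes the same assumption). For readers without the `split` utility, a rough Python equivalent might look like the sketch below. The function name `split_by_lines` and the numeric suffixes are hypothetical; `split` itself produces the `aa`, `ab`, ... suffixes seen later in this notebook.

```python
import itertools
import os


def split_by_lines(src_path, out_dir, lines_per_file=400000, prefix="solar_split"):
    """Write consecutive chunks of `lines_per_file` lines to numbered files."""
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    with open(src_path, encoding="utf-8") as src:
        for n in itertools.count():
            # islice pulls at most `lines_per_file` lines without loading the whole file
            chunk = list(itertools.islice(src, lines_per_file))
            if not chunk:
                break
            out_path = os.path.join(out_dir, "{}{:03d}".format(prefix, n))
            with open(out_path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            out_paths.append(out_path)
    return out_paths
```

Only one chunk is held in memory at a time, which is the main reason to split the file before handing it to `json.load`.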

### Get the paths of the files that resulted from splitting the original large JSON file

In [2]:
mydir = os.path.join(os.getcwd(), 'data')
files = [os.path.join(mydir, f) for f in os.listdir(mydir) if os.path.isfile(os.path.join(mydir, f))]
files = [f for f in files if 'solar_split' in f]
files = sorted(files)

#### Clean up these files so that we're able to read them with Pandas

In [6]:
new_dir = os.path.join(os.getcwd(), 'data', 'solar')
os.makedirs(new_dir, exist_ok=True)  # the output directory must exist before writing

for crt, f in enumerate(files):
    with open(f) as fh:
        lines = fh.readlines()

    if crt == 0:
        # first chunk: still has the JSON header, so close the features list and the root object
        lines[-1] += ']}'
    elif crt == len(files) - 1:
        # last chunk: its final line is the original ']}'
        lines[0] = '[' + lines[0][1:]  # add start list bracket, remove the leading comma
        lines[-2] = lines[-2] + ']'    # add end list bracket
        lines = lines[:-1]             # remove ']}'
    else:
        lines[0] = '[' + lines[0][1:]  # add start list bracket, remove the leading comma
        lines[-1] = lines[-1] + ']'    # add end list bracket

    new_f = os.path.join(new_dir, os.path.basename(f))
    print(new_f)
    # 'w' rather than 'a+', so re-running the cell does not append a second copy
    with open(new_f, 'w') as new_fh:
        new_fh.writelines(lines)

/home/osboxes/courses/ibm/data/solar/solar_splitaa
/home/osboxes/courses/ibm/data/solar/solar_splitab
/home/osboxes/courses/ibm/data/solar/solar_splitac
/home/osboxes/courses/ibm/data/solar/solar_splitad
/home/osboxes/courses/ibm/data/solar/solar_splitae
/home/osboxes/courses/ibm/data/solar/solar_splitaf
/home/osboxes/courses/ibm/data/solar/solar_splitag
/home/osboxes/courses/ibm/data/solar/solar_splitah
/home/osboxes/courses/ibm/data/solar/solar_splitai
/home/osboxes/courses/ibm/data/solar/solar_splitaj
/home/osboxes/courses/ibm/data/solar/solar_splitak
/home/osboxes/courses/ibm/data/solar/solar_splital
/home/osboxes/courses/ibm/data/solar/solar_splitam
/home/osboxes/courses/ibm/data/solar/solar_splitan
/home/osboxes/courses/ibm/data/solar/solar_splitao
/home/osboxes/courses/ibm/data/solar/solar_splitap
/home/osboxes/courses/ibm/data/solar/solar_splitaq
/home/osboxes/courses/ibm/data/solar/solar_splitar
/home/osboxes/courses/ibm/data/solar/solar_splitas
/home/osboxes/courses/ibm/data/
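
To make the bracket surgery in the cell above easier to follow, here is a toy version on a tiny file split into three chunks. It assumes the same layout the cell relies on: a header line, one feature per line with a leading comma on every feature except the first, and a closing `]}` line. The `repair` function is an illustrative restatement of the three cases, not the notebook's own code.

```python
import json

# Toy stand-in for the big file, already split into chunks of two lines.
chunks = [
    ['{"type": "FeatureCollection", "features": [\n', '{"id": 1}\n'],
    [',{"id": 2}\n', ',{"id": 3}\n'],
    [',{"id": 4}\n', ']}\n'],
]


def repair(lines, index, total):
    """Turn one chunk into a standalone JSON document."""
    lines = list(lines)
    if index == 0:
        lines[-1] += ']}'              # close the features list and the root object
    else:
        lines[0] = '[' + lines[0][1:]  # swap the leading comma for '['
        if index == total - 1:
            lines = lines[:-1]         # drop the original ']}' line
        lines[-1] += ']'               # close the list
    return ''.join(lines)


parsed = [json.loads(repair(c, i, len(chunks))) for i, c in enumerate(chunks)]

# The first chunk is still an object with a 'features' key; the rest are bare lists.
ids = [f["id"] for f in parsed[0]["features"]]
ids += [f["id"] for part in parsed[1:] for f in part]
print(ids)  # prints [1, 2, 3, 4]
```

This also shows why the next step has to treat the first file differently from the others: only that file keeps the `features` wrapper.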

### Keep only the features dictionary entries and store them in new files in CSV format 

#### Create the new storage location

In [7]:
new_dir = os.path.join(os.getcwd(), 'data', 'solar_clean')
if not os.path.exists(new_dir):
    os.makedirs(new_dir)
    print('Created {}'.format(new_dir))
else:
    print('Dir {} already exists'.format(new_dir))

Dir /home/osboxes/courses/ibm/data/solar_clean already exists


#### Extract the relevant information and store it for further processing

In [9]:
mydir = os.path.join(os.getcwd(), 'data', 'solar')
files = [os.path.join(mydir, f) for f in os.listdir(mydir) if os.path.isfile(os.path.join(mydir, f))]
files = [f for f in files if 'solar_split' in f]
files = sorted(files)

for i, path in enumerate(files):
    with open(path) as f:
        data = json.load(f)
    # Only the first chunk still carries the FeatureCollection wrapper;
    # every other chunk is a bare list of features.
    features = data['features'] if i == 0 else data
    df = pd.DataFrame(features)
    print('File {} has {} rows and columns'.format(path, df.shape))
    # new_dir is the solar_clean directory created in the previous cell
    new_file = os.path.join(new_dir, os.path.basename(path) + '.csv')
    df.to_csv(new_file, index=False, sep=';')
    print('Written to CSV: {}'.format(new_file))

File /home/osboxes/courses/ibm/data/solar/solar_splitaa has (399997, 3) rows and columns
Written to CSV: /home/osboxes/courses/ibm/data/solar_clean/solar_splitaa.csv
File /home/osboxes/courses/ibm/data/solar/solar_splitab has (400000, 3) rows and columns
Written to CSV: /home/osboxes/courses/ibm/data/solar_clean/solar_splitab.csv
File /home/osboxes/courses/ibm/data/solar/solar_splitac has (400000, 3) rows and columns
Written to CSV: /home/osboxes/courses/ibm/data/solar_clean/solar_splitac.csv
File /home/osboxes/courses/ibm/data/solar/solar_splitad has (400000, 3) rows and columns
Written to CSV: /home/osboxes/courses/ibm/data/solar_clean/solar_splitad.csv
File /home/osboxes/courses/ibm/data/solar/solar_splitae has (400000, 3) rows and columns
Written to CSV: /home/osboxes/courses/ibm/data/solar_clean/solar_splitae.csv
File /home/osboxes/courses/ibm/data/solar/solar_splitaf has (400000, 3) rows and columns
Written to CSV: /home/osboxes/courses/ibm/data/solar_clean/solar_splitaf.csv
File
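
One consequence of this step is worth spelling out. Each `geometry` and `properties` cell is a Python dict, so `to_csv` writes it as its `str()` form with single quotes, which `json.loads` rejects. A later step therefore probably needs `ast.literal_eval` (imported at the top) to turn those cells back into dicts; that reading is my assumption. The sketch uses the standard `csv` module, an in-memory buffer and a shortened feature instead of pandas and the real files, so it runs anywhere.

```python
import ast
import csv
import io
import json

# Shortened, illustrative feature (one coordinate, two properties).
row = {
    "type": "Feature",
    "geometry": {"type": "Polygon", "coordinates": [[[728157.043, 6176689.234, 26.181]]]},
    "properties": {"id": 1, "samletsol": 976596.013753},
}

# Write one row the way dict-valued cells end up in a ';'-separated CSV.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=";")
writer.writerow(list(row.keys()))
writer.writerow([str(value) for value in row.values()])

# Read it back as text cells.
buf.seek(0)
record = next(csv.DictReader(buf, delimiter=";"))

try:
    json.loads(record["properties"])
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False  # single-quoted keys are not valid JSON

# literal_eval safely parses Python literals such as dicts, lists and numbers.
properties = ast.literal_eval(record["properties"])
geometry = ast.literal_eval(record["geometry"])
print(parsed_as_json, properties["samletsol"], geometry["type"])
```

An alternative would be to serialize those two columns with `json.dumps` before writing, which keeps the CSV readable by non-Python tools.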

### Remove the old files

In [1]:
!rm -rf data/solar_split*

In [2]:
!rm -rf /home/osboxes/courses/ibm/data/solar