# Hands-on

In this notebook, we provide the steps needed to partially reproduce the results from our paper "DeepSolar tracker: towards unsupervised assessment with open-source data of the accuracy of deep learning-based distributed PV mapping". We run the pipeline over a small area of 600 km² in the Rhône <i> département </i>, covering 42 cities.

This notebook explains the setup needed to run the mapping algorithm. By repeating the same steps, you can run the algorithm over complete <i> départements </i>, which makes it possible to fully reproduce our results and even extend them. To do so, we refer the reader to the IGN's [Geoservices portal](https://geoservices.ign.fr/) to download the complete orthoimages and topological data.

The data necessary to run this notebook can be downloaded from the [Zenodo repository](https://zenodo.org/record/6862675) associated with the paper. This repository is organized into the following folders:

- <b> WEIGHTS </b> : the model weights for the classification and segmentation branches
- <b> RNI_2020 </b> : the 2020 edition of the <i> registre national d'installations </i>, which is used to automatically assess the accuracy of the PV mapping. 
- <b> IGN_TOPO_2021_69 </b> : the topological data for the year 2021 for the Rhône <i> département </i>.
- <b> COMMUNES_2021 </b> : the shapefile of the French cities.
- <b> IGN_ORTHO_2020_69 </b> : a subset of the orthoimagery for the year 2020 for the Rhône, containing 24 images. These images are provided by the IGN under an open licence. The complete records can be accessed [here](https://geoservices.ign.fr/documentation/donnees/ortho/bdortho). The images are split into several folders. Make sure to merge these folders together before further processing.
- <b> LOOK_UP_TABLE </b> : The look-up table, used to infer the installations' tilt angle based on their location and surface. We refer the reader to the [paper](https://arxiv.org/abs/2207.07466) for more details on the construction of this table.
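
The orthoimages have to be merged into a single folder before processing. A minimal sketch for doing this, assuming the downloaded parts were unzipped as subfolders inside `IGN_ORTHO_2020_69`; the helper `merge_subfolders` is hypothetical and not part of the repository:

```python
import os
import shutil


def merge_subfolders(parent):
    """Move every file found in the subfolders of `parent` directly into `parent`.

    Hypothetical helper. Files whose name already exists in `parent` are left
    in place; subfolders that end up empty are removed. Returns the number of
    files moved.
    """
    moved = 0
    # topdown=False visits the deepest folders first, so they can be removed once emptied
    for dirpath, _dirnames, filenames in os.walk(parent, topdown=False):
        if os.path.samefile(dirpath, parent):
            continue  # files already at the top level stay where they are
        for name in filenames:
            target = os.path.join(parent, name)
            if not os.path.exists(target):
                shutil.move(os.path.join(dirpath, name), target)
                moved += 1
        # Remove the subfolder only if nothing is left in it
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
    return moved
```

Once `root_folder` is defined, it would be called as `merge_subfolders(os.path.join(root_folder, "IGN_ORTHO_2020_69"))`.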


Please download the data, unzip the folders into a root folder and specify the path to this root folder. In the following, we assume that the data is organized as follows:

```text
root_folder # Root folder
    |
    - WEIGHTS # Folder containing the model weights
    - RNI_2020 # Folder containing the RNI
    - COMMUNES_2021 # Folder containing the shapes of the cities
    - IGN_ORTHO_2020_69 # Folder containing the sample of orthoimages
    - IGN_TOPO_2021_69 # Folder containing the topological data from 2021 for the Rhône
    - LOOK_UP_TABLE # Folder containing the look-up table
```
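
Before going further, it can help to check that the root folder matches this layout. A minimal sketch, assuming the folder names shown in the tree; `missing_subfolders` is a hypothetical helper:

```python
import os

# Subfolder names taken from the tree above
EXPECTED_SUBFOLDERS = [
    "WEIGHTS", "RNI_2020", "COMMUNES_2021",
    "IGN_ORTHO_2020_69", "IGN_TOPO_2021_69", "LOOK_UP_TABLE",
]


def missing_subfolders(root, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolders that do not exist as directories under `root`."""
    return [name for name in expected if not os.path.isdir(os.path.join(root, name))]
```

An empty list returned by `missing_subfolders(root_folder)` means every expected folder is in place.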

In [None]:
root_folder = 'path/to/the/data/repository'

## 1. Setting up the data and model directories 

Once the data is downloaded, we need to specify the directories in which these files are located. The cell below builds these paths from `root_folder`; edit them if your local folders are named differently. At the end of Section 2, these paths are written to a copy of the `config.yml` file.


In [None]:
import os

# Source data paths
images_directory = os.path.join(root_folder, "IGN_ORTHO_2020_69")
topological_data_directory = os.path.join(root_folder, "IGN_TOPO_2021_69")
model_dir = os.path.join(root_folder, "WEIGHTS")
cities_directory = os.path.join(root_folder, "COMMUNES_2021")
look_up_table_dir = os.path.join(root_folder, "LOOK_UP_TABLE")

## 2. Setting up the working directories and the parameters 

The pipeline consists of three parts: classification, segmentation and aggregation.

- During classification, tile images are cut into thumbnails, and the thumbnails that contain a PV panel are stored in a dedicated folder.
- During segmentation, the thumbnails from this folder are segmented to delineate the PV panels. The segmentation masks are then converted into polygons and stored in a `geojson` file, located in the `data` folder.
- During aggregation, the characteristics of the PV panels are extracted. The extraction method is specified by the user.
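
The cutting step of the classification stage can be illustrated with a simple non-overlapping grid. This is a minimal sketch under assumptions: the pipeline's own cutting strategy (overlap, padding of the borders) may differ, incomplete border patches are simply dropped here, and `patch_origins` is a hypothetical helper:

```python
def patch_origins(width, height, patch_size):
    """Top-left (row, col) pixel offsets of non-overlapping patch_size x patch_size thumbnails.

    Pixels at the right and bottom borders that do not fill a complete
    thumbnail are dropped.
    """
    return [
        (row, col)
        for row in range(0, height - patch_size + 1, patch_size)
        for col in range(0, width - patch_size + 1, patch_size)
    ]
```

Assuming a 25 000 × 25 000 px tile (a 5 km tile at the 0.20 m resolution suggested by the `0M20` field of the tile names) and 299 px thumbnails, this grid yields 83 × 83 = 6889 thumbnails per tile.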


### 2.1. Setting up the working directories

These are the directories in which the intermediate files and the final outputs are stored.

In [None]:
outputs_dir = '../data' # directory that stores the outputs of the model
aux_dir = '../aux' # directory that stores the auxiliary outputs used for inference
temp_dir = '../temp' # directory in which the temporary outputs are stored.
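
If these directories do not exist yet, they can be created up front. A minimal sketch; the pipeline may already create them itself, and `make_working_dirs` is a hypothetical helper:

```python
import os


def make_working_dirs(*dirs):
    """Create each directory (and its parents) if needed; return whether each one now exists."""
    for d in dirs:
        # exist_ok=True makes the call safe to repeat
        os.makedirs(d, exist_ok=True)
    return [os.path.isdir(d) for d in dirs]
```

Here it would be called as `make_working_dirs(outputs_dir, aux_dir, temp_dir)`.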

### 2.2. Choosing which parts of the pipeline to execute

First, we choose which parts of the pipeline to execute. Since we are launching the pipeline for the first time, we run all of them.

In [None]:
run_classification = True
run_segmentation = True
run_aggregation = True

### 2.3. Setting up the parameters

Each part of the pipeline requires some parameters to be executed. 

#### Preprocessing

- `tiles_list` : the list of tiles to process

By default, this attribute is set to `None`. In this case, the list of tiles is created automatically so as to map the complete <i> département </i>. In this tutorial, we focus on a subset of 24 adjacent tiles covering 42 cities west of Lyon, so we provide the list of tiles to process explicitly.


In [None]:
tiles_list = [
    '69-2020-0815-6525-LA93-0M20-E080', '69-2020-0810-6535-LA93-0M20-E080', '69-2020-0830-6535-LA93-0M20-E080', 
    '69-2020-0825-6525-LA93-0M20-E080', '69-2020-0820-6535-LA93-0M20-E080', '69-2020-0835-6525-LA93-0M20-E080', 
    '69-2020-0810-6530-LA93-0M20-E080', '69-2020-0815-6520-LA93-0M20-E080', '69-2020-0820-6530-LA93-0M20-E080', 
    '69-2020-0835-6520-LA93-0M20-E080', '69-2020-0830-6530-LA93-0M20-E080', '69-2020-0825-6520-LA93-0M20-E080', 
    '69-2020-0820-6525-LA93-0M20-E080', '69-2020-0835-6535-LA93-0M20-E080', '69-2020-0830-6525-LA93-0M20-E080', 
    '69-2020-0825-6535-LA93-0M20-E080', '69-2020-0810-6525-LA93-0M20-E080', '69-2020-0815-6535-LA93-0M20-E080', 
    '69-2020-0830-6520-LA93-0M20-E080', '69-2020-0825-6530-LA93-0M20-E080', '69-2020-0820-6520-LA93-0M20-E080', 
    '69-2020-0835-6530-LA93-0M20-E080', '69-2020-0815-6530-LA93-0M20-E080', '69-2020-0810-6520-LA93-0M20-E080'
]
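
The tile names encode the tile location, which makes it possible to check the size of the area being processed. A minimal sketch, assuming that the third and fourth fields are the Lambert-93 (`LA93`) X and Y coordinates of the tile in kilometres and that tiles are 5 km × 5 km (the spacing observed in the list above); the meaning of the last field is not interpreted. `parse_tile_name` and `covered_area_km2` are hypothetical helpers:

```python
def parse_tile_name(name):
    """Split a name such as '69-2020-0815-6525-LA93-0M20-E080' into its fields.

    Assumption: fields 3 and 4 are the Lambert-93 X and Y coordinates in km,
    and '0M20' encodes a 0.20 m ground resolution.
    """
    dpt, year, x_km, y_km, crs, resolution, _suffix = name.split("-")
    return {"dpt": dpt, "year": int(year), "x_km": int(x_km), "y_km": int(y_km),
            "crs": crs, "resolution": resolution}


def covered_area_km2(names, tile_side_km=5):
    """Area covered by the distinct tiles in `names`, assuming square tiles of tile_side_km."""
    corners = {(t["x_km"], t["y_km"]) for t in map(parse_tile_name, names)}
    return len(corners) * tile_side_km ** 2
```

Under these assumptions, the 24 tiles span 6 X positions (810 to 835 km) and 4 Y positions (6520 to 6535 km), so `covered_area_km2(tiles_list)` gives 24 × 25 = 600 km², consistent with the area stated in the introduction.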

If you want to run the detection over a complete <i> département </i>, download the complete archive for that <i> département </i> from the IGN website, specify the path to this archive and set `tiles_list` to `None` in the `config.yml` file.

#### Classification and segmentation

- `patch_size` : the size of the thumbnail that is passed into the classification and segmentation models
- `device` : the device on which inference (classification and segmentation) will be made


In [None]:
patch_size = 299 # Assuming you use the pretrained models provided in the WEIGHTS folder
device = 'cuda' # assuming you have a GPU. Else, replace with 'cpu'
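
Instead of editing `device` by hand, it can be chosen automatically. A minimal sketch, assuming the models run with PyTorch (the `'cuda'`/`'cpu'` device strings suggest so, but this is an assumption); `pick_device` is a hypothetical helper:

```python
def pick_device():
    """Return 'cuda' if PyTorch is installed and sees a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch we cannot query the GPU, so fall back to the CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

With this helper, the cell above would read `device = pick_device()`.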

#### Classification 

- `cls_batch_size` : the number of samples to be processed at the same time
- `cls_threshold` : the classification threshold (above = PV panel, below = no PV panel)
- `cls_model` : the name of the classification model, located in the `model_dir` directory.

In [None]:
cls_batch_size = 512 # Depends on your available VRAM
cls_threshold = 0.4 # assuming you're using the default model
cls_model = "model_bdappv_cls" # or replace with your own model name
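
The two classification parameters can be read as follows: thumbnails are fed to the model in groups of `cls_batch_size`, and a thumbnail is kept when its predicted PV probability is above `cls_threshold`. A minimal sketch; the helpers are hypothetical, and treating a probability exactly equal to the threshold as "no PV panel" is an assumption:

```python
def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def keep_positive(names, probabilities, threshold=0.4):
    """Keep the thumbnails whose PV probability is strictly above the threshold."""
    return [name for name, p in zip(names, probabilities) if p > threshold]
```

A larger batch size processes more thumbnails per forward pass but needs more VRAM, which is why `cls_batch_size` depends on your GPU.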

#### Segmentation

- `seg_threshold` : the segmentation threshold
- `num_gpu` : the number of GPUs to be used. Depends on your infrastructure.
- `seg_batch_size` : the batch size for segmentation. Depends on your available VRAM
- `seg_model` : the name of the segmentation model, located in the `model_dir` directory.

In [None]:
seg_threshold = 0.46
num_gpu = 1
seg_batch_size = 128
seg_model = 'model_bdappv_seg'
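
`seg_threshold` plays the same role at the pixel level: each pixel of the predicted mask is labelled as PV when its probability exceeds the threshold. A minimal sketch on a plain 2-D list, assuming square 0.20 m pixels for the area computation; both helpers are hypothetical:

```python
def binarize_mask(probabilities, threshold=0.46):
    """Turn a 2-D grid of per-pixel PV probabilities into a 0/1 mask."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]


def mask_area_m2(mask, pixel_size_m=0.20):
    """Ground area covered by the positive pixels, assuming square pixels of pixel_size_m."""
    return sum(map(sum, mask)) * pixel_size_m ** 2
```

At 0.20 m per pixel, each positive pixel accounts for 0.04 m² of (horizontally projected) panel surface.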

#### Aggregation

- `filter_building` : whether the polygons should be matched with a building 
- `filter_LUT` : whether the tilt is imputed using the look-up table
- `constant_kWp` : whether the installed capacity is estimated from the surface area using a linear regression model or not.


In [None]:
filter_building = True 
filter_LUT = True
constant_kWp = False
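
To see why the tilt matters, note that the orthoimage only shows the horizontal projection of a panel: a panel of true area A tilted by an angle θ covers A·cos(θ) on the image. A minimal sketch of one plausible correction followed by a linear surface-to-capacity model; the paper describes the method actually used, and the helpers and coefficients here are hypothetical:

```python
import math


def true_surface(projected_m2, tilt_deg):
    """Undo the horizontal projection: a panel tilted by tilt_deg appears as A * cos(tilt)."""
    return projected_m2 / math.cos(math.radians(tilt_deg))


def capacity_kwp(surface_m2, slope, intercept=0.0):
    """Linear surface-to-capacity model; slope and intercept would come from a fitted regression."""
    return slope * surface_m2 + intercept
```

For example, a 20 m² footprint with a 30° tilt corresponds to about 23.1 m² of panel surface, which is the value the capacity model would then receive.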

Finally, we edit the configuration file.

In [None]:
import yaml

# Load the original configuration file
with open("../config.yml") as f:
    config = yaml.safe_load(f)

# Edit the values

config['source_images_dir'] = images_directory
config['model_dir'] = model_dir
config['source_commune_dir'] = cities_directory
config['source_topo_dir'] = topological_data_directory
config['look_up_table_dir'] = look_up_table_dir

config['temp_dir'] = temp_dir
config['aux_dir'] = aux_dir
config['outputs_dir'] = outputs_dir

config['run_classification'] = run_classification
config['run_segmentation'] = run_segmentation
config['run_aggregation'] = run_aggregation

config['tiles_list'] = tiles_list

config['patch_size'] = patch_size
config['device'] = device

config['cls_threshold'] = cls_threshold
config['cls_model'] = cls_model
config['cls_batch_size'] = cls_batch_size

config['seg_threshold'] = seg_threshold 
config['seg_batch_size'] = seg_batch_size
config['seg_model'] = seg_model
config['num_gpu'] = num_gpu

config['filter_building'] = filter_building
config['filter_LUT'] = filter_LUT
config['constant_kWp'] = constant_kWp

# save the config file
# the configuration file is saved in the notebook folder. The original config file, which
# is used if the script is run from the command line, remains unedited.
with open("config.yml", "w") as f:
    yaml.dump(config, f)
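
Before launching the pipeline, the written configuration can be checked for missing entries. A minimal sketch; `REQUIRED_KEYS` is a hypothetical checklist built from the keys set in the cell above (not read from the repository), and only `tiles_list` is allowed to be `None`:

```python
# Hypothetical checklist, built from the keys edited in the cell above
REQUIRED_KEYS = [
    "source_images_dir", "model_dir", "source_commune_dir", "source_topo_dir",
    "look_up_table_dir", "temp_dir", "aux_dir", "outputs_dir", "tiles_list",
    "patch_size", "device", "cls_threshold", "cls_model", "cls_batch_size",
    "seg_threshold", "seg_batch_size", "seg_model", "num_gpu",
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return required keys that are absent or None (tiles_list may legitimately be None)."""
    return [
        key for key in required
        if key not in config or (config[key] is None and key != "tiles_list")
    ]
```

It could be applied to the saved file with `yaml.safe_load`, e.g. `with open("config.yml") as f: print(missing_keys(yaml.safe_load(f)))`.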

## 3. A first run

If you are running inference on a <i> département </i> for the first time, you <b> first need to run the `auxiliary.py` script </b>. This script generates the auxiliary data used throughout the main pipeline. To run it, enter the <i> département </i> number and execute the cell below.

In [None]:
import sys

sys.path.append('../scripts/pipeline_components/')
sys.path.append('../scripts/src/')

In [None]:
%run ../auxiliary.py --dpt=69

Once the auxiliary files have been generated, the main pipeline can be executed.

In [None]:
%run ../main.py --dpt=69

That's it! We have mapped all the installations in our target area. The following files are generated:
- `arrays_{dpt}.geojson` : the file with all segmentation polygons obtained at the end of the segmentation stage.
- `arrays_characteristics_{dpt}.geojson` : a file with the polygons of all installations after filtering. The characteristics of the polygons are also reported.
- `characteristics_{dpt}.csv` : a file where each row is an installation. Contains the characteristics and the localization of the installation.
- `aggregated_characterisitcs_{dpt}.csv` : aggregates the installed capacity and number of installations per city.
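
The per-city aggregation can be recomputed from the installation-level file. A minimal sketch, assuming (hypothetically) that `characteristics_{dpt}.csv` has a `city` column and a `kWp` column; check the actual header of the file and adjust `city_col` and `capacity_col` accordingly:

```python
from collections import defaultdict


def aggregate_by_city(rows, city_col="city", capacity_col="kWp"):
    """Sum the installed capacity and count the installations per city.

    `rows` is an iterable of dicts, e.g. a csv.DictReader. Column names are assumptions.
    """
    totals = defaultdict(lambda: {"kWp": 0.0, "count": 0})
    for row in rows:
        entry = totals[row[city_col]]
        entry["kWp"] += float(row[capacity_col])
        entry["count"] += 1
    return dict(totals)
```

With the standard `csv` module, it would be used as `aggregate_by_city(csv.DictReader(open(path, newline="")))`, where `path` points to `characteristics_69.csv` in `outputs_dir`.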


## 4. Assessing the accuracy of the detections

Now that we have detected the installations, we want to measure the accuracy of the estimates. To do so, we leverage the <i> registre national d'installations </i> (RNI), which aggregates, by city, the installed capacity of installations below 36 kWp. We evaluate our outputs by comparing our estimates with this reference.
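
The comparison can be illustrated with a per-city relative error. This is a minimal sketch, not the metrics computed by `evaluate.py` (those are described in the paper); `relative_errors` is a hypothetical helper, and both inputs are assumed to map a city identifier to an installed capacity in kWp, with the estimates restricted to installations below 36 kWp to match the RNI scope:

```python
def relative_errors(estimates, reference):
    """Per-city relative error (estimate - reference) / reference, plus the overall error.

    Cities with a zero reference are skipped; cities absent from `estimates`
    count as an estimate of 0 kWp.
    """
    per_city = {
        city: (estimates.get(city, 0.0) - ref) / ref
        for city, ref in reference.items()
        if ref > 0
    }
    total_ref = sum(reference.values())
    total_est = sum(estimates.get(city, 0.0) for city in reference)
    total = (total_est - total_ref) / total_ref if total_ref > 0 else None
    return per_city, total
```

Per-city errors of opposite signs can cancel out in the overall error, which is why both are worth inspecting.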

#### Setting up the directory and running the evaluation script

The RNI is located in the directory `rni_path`. We specify the complete path to the RNI directory and the name of the file, then run the evaluation.

In [None]:
rni_path = os.path.join(root_folder, "RNI_2020")
filename = 'RNI_2020.json'
rni_path # Display the source directory, then copy/paste it as the `--source_dir` argument in the cell below.

In [None]:
# run the evaluation
# We also manually specify the `evaluation_dir`, where the results of the evaluation will be stored.
%run ../evaluate.py --dpt=69 --filename='RNI_2020.json' --source_dir='/paste/the/directory/here' --evaluation_dir='../evaluation'
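
The copy/paste step can be avoided by building the `%run` argument string from the variables defined above. A minimal sketch; `evaluate_command` is a hypothetical helper, and paths are quoted with `shlex.quote` in case they contain spaces:

```python
import shlex


def evaluate_command(dpt, filename, source_dir, evaluation_dir):
    """Build the argument string passed to %run for the evaluation script."""
    return " ".join([
        "../evaluate.py",
        f"--dpt={dpt}",
        f"--filename={shlex.quote(filename)}",
        f"--source_dir={shlex.quote(source_dir)}",
        f"--evaluation_dir={shlex.quote(evaluation_dir)}",
    ])
```

In the notebook, the evaluation would then be launched with `get_ipython().run_line_magic("run", evaluate_command(69, filename, rni_path, "../evaluation"))`.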

The outputs are located in the newly created `evaluation_dir` directory. To visualize them, see the `visualization.ipynb` notebook!

Finally, you can remove the copy of `config.yml` that was written to the notebook folder for this run.

In [None]:
os.remove('config.yml')