# Tutorial

This notebook explains how to download the data, set up the directories, and launch the registry creation for a given departement. It assumes that the environment has already been created. The notebook is still under construction.

## 1. Downloads

The following data are needed:

- Images from the IGN
- Topological data
- Geographical coordinates of the cities

We also need a classification model and a segmentation model.


#### Image data

#### Topological data

#### Geographical coordinates of the cities

#### Model weights

<b> Include the RF model weights </b>

Once the models are downloaded, put them in the `model` directory. 



## 2. Setting up the data and model directories 

Once the data are downloaded, we need to specify where the files are located. Enter the local directories in the cell below. Once all settings have been entered, the `config.yml` file is edited automatically.

In [1]:
# Paths

images_directory = ''
topological_data_directory = ''
geographical_coordinates = ''

model_dir = ''

## 3. Setting up the working directories and the parameters 

The pipeline comprises three parts: classification, segmentation, and aggregation.

- Classification: tile images are cut into thumbnails, and the thumbnails that contain a PV panel are stored in a dedicated folder.
- Segmentation: images from the dedicated folder are segmented to delineate the PV panels. The segmentation masks are then converted to polygons and stored in a `geojson` file located in the `data` folder.
- Aggregation: the characteristics of the PV panels are extracted. The extraction method is specified by the user.


### 3.1. Setting up the working directories

The intermediary files and the final outputs are stored in these directories.

In [7]:
outputs_dir = 'data' # directory that stores the outputs of the model
aux_dir = 'aux' # directory that stores the auxiliary outputs used for inference
temp_dir = 'temp' # directory in which the temporary outputs are stored.

### 3.2. Choosing which parts of the pipeline we want to execute

First, we choose which parts of the pipeline to execute. Since we are launching the pipeline for the first time, we run all of them.

In [8]:
run_classification = True
run_segmentation = True
run_aggregation = True

### 3.3. Setting up the parameters

Each part of the pipeline requires some parameters to be executed. 

#### Classification and segmentation

- `patch_size` : the size of the thumbnail that is passed into the classification and segmentation models
- `device` : the device on which inference (classification and segmentation) will be made


In [4]:
patch_size = 299 # assuming you are using the provided pretrained models
device = 'cuda' # assuming you have a GPU. Else, replace with 'cpu'
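
To make the role of `patch_size` concrete, the sketch below lists the top-left corners of the thumbnails cut from one square tile. It assumes that sizes are expressed in pixels, that thumbnails do not overlap, and that the incomplete strip at the right and bottom edges is dropped; the pipeline's actual tiling strategy may differ. The `patch_origins` helper and the tile size of 1000 are hypothetical and chosen only for illustration.

```python
def patch_origins(tile_size, patch_size):
    """Return the (row, col) top-left corners of the thumbnails in a square tile.

    Assumption: thumbnails do not overlap and partial thumbnails at the
    tile border are discarded. The real pipeline may tile differently.
    """
    steps = range(0, tile_size - patch_size + 1, patch_size)
    return [(row, col) for row in steps for col in steps]


# Hypothetical tile of 1000 x 1000 pixels, cut with patch_size = 299.
origins = patch_origins(tile_size=1000, patch_size=299)
print(len(origins))  # 9 thumbnails (3 rows x 3 columns)
print(origins[:3])   # [(0, 0), (0, 299), (0, 598)]
```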

#### Classification 

- `cls_batch_size` : the number of samples to be processed at the same time
- `cls_threshold` : the classification threshold (above = PV panel, below = no PV panel)
- `cls_model` : the name of the classification model, located in the `models_dir` directory.

In [5]:
cls_batch_size = 512 # Depends on your available VRAM
cls_threshold = 0.4 # assuming you're using the default model
cls_model = "model_bdappv_cls" # or replace with your own model name
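
The role of `cls_threshold` can be illustrated with a few made-up scores. The sketch assumes that the classification model outputs one probability per thumbnail and that a thumbnail is kept when its probability is strictly above the threshold. The scores and the `keep_positive` helper are hypothetical.

```python
def keep_positive(probabilities, threshold):
    """Return the indices of the thumbnails whose score is above the threshold."""
    return [index for index, p in enumerate(probabilities) if p > threshold]


# Made-up model outputs for four thumbnails.
scores = [0.05, 0.41, 0.40, 0.93]
print(keep_positive(scores, threshold=0.4))  # [1, 3]
```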

#### Segmentation

- `seg_threshold` : the segmentation threshold
- `num_gpu` : the number of GPUs to be used. Depends on your infrastructure.
- `seg_batch_size` : the batch size for segmentation. Depends on your available VRAM
- `seg_model` : the name of the segmentation model in the `models_dir` directory.

In [6]:
seg_threshold = 0.46
num_gpu = 1
seg_batch_size = 128
seg_model = 'model_bdappv_seg'

#### Aggregation

- `filter_building` : whether the polygons should be matched with a building
- `filter_LUT` : whether the tilt is imputed using the look-up table (`True` in this tutorial)
- `constant_kWp` : whether the installed capacity is estimated from the surface area using a linear regression model (`False` in this tutorial)


In [9]:
filter_building = True 
filter_LUT = True # update to replace with the random forest, which is located in the /models directory
constant_kWp = False

Finally, we edit the configuration file.

In [None]:
# Edit the config file
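
One possible way to write the chosen values into a configuration file is sketched below. The layout of `config.yml` is not described in this notebook, so the flat `key: value` structure, the key names, and the `write_flat_yaml` helper are assumptions to adapt to the actual file. The sketch uses only the standard library and relies on the fact that JSON scalars are also valid YAML scalars. To avoid overwriting the real configuration, it writes to a hypothetical `config_example.yml`.

```python
import json
from pathlib import Path


def write_flat_yaml(path, settings):
    """Write a flat dictionary as `key: value` lines.

    Values are serialized with json.dumps, which yields valid YAML scalars
    (quoted strings, true/false, numbers).
    """
    lines = [f"{key}: {json.dumps(value)}" for key, value in settings.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Values taken from the cells above; the key names are assumptions.
settings = {
    "outputs_dir": "data",
    "run_classification": True,
    "patch_size": 299,
    "cls_threshold": 0.4,
}
write_flat_yaml("config_example.yml", settings)
print(Path("config_example.yml").read_text(encoding="utf-8"))
```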

## 4. A first run

If you are running inference on a departement for the first time, you <b> first need to run the `auxiliary.py` script </b>. This script generates the auxiliary data that are then used throughout the main pipeline. To run it, enter the departement number and execute the cell below.

In [11]:
dpt = 69
# ./auxiliary.py --dpt=dpt

Once the auxiliary files have been generated, the main pipeline can be executed.

In [13]:
# ./main.py --dpt=dpt
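
The two commented commands can also be launched from Python. The sketch below only builds and prints the command lines. It assumes that `auxiliary.py` and `main.py` sit in the current working directory and accept the `--dpt` option exactly as written above; `build_command` and `run_script` are hypothetical helpers.

```python
import subprocess
import sys


def build_command(script, dpt):
    """Build the argument list that runs `script` for one departement."""
    return [sys.executable, script, f"--dpt={dpt}"]


def run_script(script, dpt):
    """Run the script and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(script, dpt), check=True)


dpt = 69
# auxiliary.py comes first because main.py relies on the auxiliary files.
for script in ("auxiliary.py", "main.py"):
    # Print the command instead of running it; call run_script(script, dpt)
    # once the scripts and the data are in place.
    print(" ".join(build_command(script, dpt)))
```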

That's it! We have mapped all installations in the target departement. The following files are generated:
- `arrays_{dpt}.geojson` : all segmentation polygons obtained at the end of the segmentation stage.
- `arrays_characteristics_{dpt}.geojson` : the polygons of all installations after filtering, together with their characteristics.
- `characteristics_{dpt}.csv` : one row per installation, with its characteristics and its location.
- `aggregated_characterisitcs_{dpt}.csv` : the installed capacity and the number of installations aggregated per city.
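
A quick way to check that the run completed is to test whether the four files exist. The sketch assumes that they are all written to `outputs_dir` (`data` in this tutorial), which this notebook states explicitly only for the `geojson` file. `expected_outputs` is a hypothetical helper, and the file names are copied verbatim from the list above.

```python
from pathlib import Path


def expected_outputs(outputs_dir, dpt):
    """Return the paths of the files listed above for one departement."""
    names = [
        f"arrays_{dpt}.geojson",
        f"arrays_characteristics_{dpt}.geojson",
        f"characteristics_{dpt}.csv",
        f"aggregated_characterisitcs_{dpt}.csv",
    ]
    return [Path(outputs_dir) / name for name in names]


# Report which of the expected files are present on disk.
for path in expected_outputs("data", 69):
    status = "found" if path.exists() else "missing"
    print(f"{path}: {status}")
```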


## 5. Monitoring the accuracy of the detections

Now that we have detected the installations, we want to measure the accuracy of the estimates. To do so, we first need to download the <i> registre national d'installations </i> (RNI). The RNI aggregates, by city, the installed capacity of installations below 36 kWp. When evaluating our outputs, we compare our estimates with this reference.
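
The comparison can be pictured with a minimal sketch. It assumes that both the estimates and the RNI have been reduced to a mapping from a city code to an installed capacity. The actual RNI file format and the metrics computed by `evaluate.py` are not described here, so the data structures, the `relative_errors` helper, and all numbers are purely illustrative.

```python
def relative_errors(estimated, reference):
    """Return (estimate - reference) / reference for cities present in both.

    Cities whose reference capacity is zero are skipped to avoid dividing
    by zero.
    """
    errors = {}
    for city, ref_capacity in reference.items():
        if city in estimated and ref_capacity > 0:
            errors[city] = (estimated[city] - ref_capacity) / ref_capacity
    return errors


# Made-up installed capacities, keyed by hypothetical city codes.
estimated = {"69001": 120.0, "69002": 45.0, "69003": 10.0}
reference = {"69001": 100.0, "69002": 50.0, "69004": 30.0}

for city, error in relative_errors(estimated, reference).items():
    print(f"{city}: {error:+.0%}")
```

Only cities that appear in both mappings are compared; a real evaluation would also need to decide how to treat cities that are missing from one side.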

#### Downloading the RNI

The RNI can be accessed here. Since in this example we are working with images released in 2020, we download the RNI for 2020. Other years can be accessed here (2017, 2018, 2019, 2021). 

#### Setting up the directory and running the evaluation script

Put the RNI in the `source_dir` and input its name in the cell below. Then, execute the cell to run the evaluation.

In [14]:
filename = 'RNI_2020.json'
source_dir = '' # enter your path to the RNI here.

# edit the configuration file

# run the evaluation
# ./evaluate.py --dpt=dpt --filename=filename --source_dir=source_dir

The outputs are located in the newly created `evaluation_dir` directory. To visualize them, use the `visualization.ipynb` notebook.