# RAMP starting kit on fire detection

## Table of Contents

- [Introduction](#introduction)
- [The Dataset](#the-dataset)
- [Requirements](#requirements)
- [Preprocessing](#preprocessing)
- [Data exploration](#data-exploration)
- [Base Model](#base-model)

## Introduction

The objective of this challenge is to predict the occurrence of wildfires based on meteorological data and information about the population in French municipalities.

## The Dataset

We merged several datasets to gather diverse information about wildfire occurrence. Meteorological data from Météo France were merged with records of wildfire incidents in French municipalities. The resulting database lets us study how the occurrence of wildfires relates to local weather and municipality characteristics.
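The merge itself is not part of this kit. As a rough sketch of how weather observations can be matched with fire records, the toy example below left-joins two tables on a commune code and a date. The table contents and the column names `commune_code`, `date` and `fire` are hypothetical and do not describe the actual schema.

```python
import pandas as pd

# Hypothetical toy tables; the real column names and join keys are assumptions.
weather = pd.DataFrame({
    "commune_code": ["13001", "13001", "83050"],
    "date": ["2022-07-01", "2022-07-02", "2022-07-01"],
    "temperature": [31.2, 33.5, 29.8],
    "humidity": [28.0, 22.0, 40.0],
})
fires = pd.DataFrame({
    "commune_code": ["13001"],
    "date": ["2022-07-02"],
})

# A left join keeps every weather observation; indicator=True adds a '_merge'
# column telling whether a matching fire record was found.
merged = weather.merge(fires, on=["commune_code", "date"], how="left", indicator=True)
merged["fire"] = (merged["_merge"] == "both").astype(int)
merged = merged.drop(columns="_merge")
print(merged)
```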

For this challenge, the data were preprocessed and then split so that a private test set is held out. This private test set is used to evaluate the submitted models on our servers.
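The exact procedure used to build the private test set is not described here. For local experiments, one option is to hold out a validation set from the training data yourself. The sketch below does a stratified random split with scikit-learn's `train_test_split` on synthetic data; the two features and the roughly 5% positive rate are assumptions. Since the data include a `Date` column, a split by date might mirror the private test set more closely, but that is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training data: 1,000 rows, about 5% fires.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "temperature": rng.normal(20, 5, 1000),
    "humidity": rng.uniform(10, 90, 1000),
})
y = pd.Series((rng.random(1000) < 0.05).astype(int), name="fire")

# stratify=y keeps the share of fires the same in both parts of the split
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_tr), len(X_val), round(y_tr.mean(), 3), round(y_val.mean(), 3))
```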



## Requirements

The notebook relies on the following libraries:

```python
# Geospatial and plotting libraries
import geoviews as gv
import geoviews.feature as gf
from geoviews import dim
import geopandas as gpd
import xarray as xr
from cartopy import crs
import matplotlib.pyplot as plt

# Data handling
import numpy as np
import pandas as pd

# Modelling
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
```

If this cell fails with `ModuleNotFoundError: No module named 'geoviews'`, install the missing package first, for example with `pip install geoviews`.

## Preprocessing

Our database combines several datasets. To keep the challenge simple, we provide the data already merged and have dropped non-essential data in advance, so the dataset contains only the features we consider most relevant.

```python
# Load the training set and the public part of the test set
X_train = pd.read_csv('data/train/X_train.csv')
X_test = pd.read_csv('data/test/public/X_test_public.csv')

Y_train = pd.read_csv('data/train/Y_train.csv')
Y_test = pd.read_csv('data/test/public/Y_test_public.csv')
```
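The `Unnamed: 0` column that appears when `Y_train` is displayed in the data exploration section is what pandas produces when a CSV file was written together with its index. Assuming that is the case here, passing `index_col=0` to `pd.read_csv` restores that column as the index instead of a data column. The toy CSV below (made-up values) shows the difference.

```python
import io

import pandas as pd

# A CSV written with DataFrame.to_csv() keeps the index as an unnamed first column.
csv_text = ",Incendie\n0,0.0\n1,0.0\n2,1.0\n"

default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())  # ['Unnamed: 0', 'Incendie']

# index_col=0 turns that first column back into the index
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())  # ['Incendie']
```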

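The categorical columns are then one-hot encoded. The same column list is used for both sets, and the test set is aligned on the training columns so that both end up with identical features:

```python
# Categorical columns to one-hot encode
categorical_cols = [
    'Date',
    'communes (name)',
    'communes (code)',
    'department (name)',
    'region (name)',
    'Type Touristique',
    'Libellé typologie',
]

X_train = pd.get_dummies(X_train, columns=categorical_cols)
X_test = pd.get_dummies(X_test, columns=categorical_cols)

# get_dummies only creates columns for the categories present in each frame,
# so align the test set on the training columns (missing dummies -> False)
X_test = X_test.reindex(columns=X_train.columns, fill_value=False)
```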
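Because `pd.get_dummies` creates one column per category present in each frame, encoding the training and test sets separately can produce different columns, so a model fitted on one cannot be applied to the other. The toy example below (region values made up) reproduces the mismatch and aligns the test frame on the training columns with `DataFrame.reindex`.

```python
import pandas as pd

# Toy frames: the test set has a region absent from the training set,
# and lacks the two regions the training set has.
train = pd.DataFrame({"temp": [25.0, 30.0], "region": ["Corse", "Occitanie"]})
test = pd.DataFrame({"temp": [28.0], "region": ["Bretagne"]})

train_enc = pd.get_dummies(train, columns=["region"])
test_enc = pd.get_dummies(test, columns=["region"])
print(train_enc.columns.tolist())  # ['temp', 'region_Corse', 'region_Occitanie']
print(test_enc.columns.tolist())   # ['temp', 'region_Bretagne']

# Align on the training columns: unseen categories are dropped and
# missing dummy columns are filled with False.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=False)
print(test_aligned.columns.tolist())  # ['temp', 'region_Corse', 'region_Occitanie']
```

Note that the `Bretagne` row ends up with all region dummies set to False, so information about categories unseen during training is lost rather than causing an error.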

## Data exploration


After one-hot encoding, the training set has 358 columns: numerical features (wind, temperature, humidity, precipitation, latitude, longitude, altitude, region code, ...) followed by boolean dummy columns.

```python
X_train.columns
```

```
Index(['Direction du vent moyen 10 mn', 'Vitesse du vent moyen 10 mn',
       'Température', 'Humidité',
       'Précipitations dans les 24 dernières heures', 'Latitude', 'Longitude',
       'Altitude', 'Température minimale du sol sur 12 heures (en °C)',
       'region (code)',
       ...
       'region (name)_Saint-Barthélemy',
       'region (name)_Saint-Pierre-et-Miquelon',
       'region (name)_Terres australes et antarctiques françaises',
       'region (name)_Île-de-France', 'Type Touristique_Commune touristique',
       'Type Touristique_Station classée de tourisme',
       'Libellé typologie_Communes de densité intermédiaire',
       'Libellé typologie_Communes densément peuplées',
       'Libellé typologie_Communes peu denses',
       'Libellé typologie_Communes très peu denses'],
      dtype='object', length=358)
```

```python
X_train.dtypes
```

```
Direction du vent moyen 10 mn                          float64
Vitesse du vent moyen 10 mn                            float64
Température                                            float64
Humidité                                               float64
Précipitations dans les 24 dernières heures            float64
                                                        ...   
Type Touristique_Station classée de tourisme              bool
Libellé typologie_Communes de densité intermédiaire       bool
Libellé typologie_Communes densément peuplées             bool
Libellé typologie_Communes peu denses                     bool
Libellé typologie_Communes très peu denses                bool
Length: 358, dtype: object
```
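With 358 columns, the truncated listings above are hard to read. A sketch for summarizing the columns by type, using a small hypothetical stand-in for `X_train` (on the real data, the same calls apply directly to `X_train`):

```python
import pandas as pd

# Small stand-in for X_train after one-hot encoding (values made up):
# numeric measurements plus a boolean dummy column.
X = pd.DataFrame({
    "Température": [21.5, 30.1],
    "Humidité": [55.0, 30.0],
    "Type Touristique_Commune touristique": [True, False],
})

# Count columns per dtype instead of scrolling through the full listing
print(X.dtypes.value_counts())

# Split the column names by kind, e.g. to inspect or scale only numeric ones.
# The 'number' selector does not include bool columns.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
bool_cols = X.select_dtypes(include="bool").columns.tolist()
print(numeric_cols, bool_cols)
```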

The target column `Incendie` ("fire" in French) takes the values 0.0 and 1.0:

```python
Y_train
```

```
Unnamed: 0,Incendie
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
...,...
17980,0.0
17981,0.0
17982,0.0
17983,1.0
```
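Since the target is binary, it is worth checking how rare fires are before looking at accuracy. The sketch below uses made-up labels in place of `Y_train`; on the real data, `Y_train['Incendie'].value_counts(normalize=True)` gives the actual class shares.

```python
import pandas as pd

# Stand-in for Y_train with made-up labels (1.0 = fire, 0.0 = no fire)
Y = pd.DataFrame({"Incendie": [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]})

counts = Y["Incendie"].value_counts()
shares = Y["Incendie"].value_counts(normalize=True)
print(counts)
print(shares)

# Accuracy of always predicting the most frequent class: a baseline any model must beat
majority_baseline = shares.max()
print(f"Majority-class accuracy: {majority_baseline:.2f}")  # Majority-class accuracy: 0.80
```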


## Base Model

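As a baseline, we train an XGBoost classifier and report its accuracy on the public test set:

```python
# Use the 'Incendie' column as a 1-D integer label vector; Y_train also
# contains an 'Unnamed: 0' index column that must not be used as a target
# (Y_test is assumed to follow the same format)
y_train = Y_train['Incendie'].astype(int)
y_test = Y_test['Incendie'].astype(int)

# Scaling is not required by tree-based models such as XGBoost, but the
# pipeline makes it easy to swap in a scale-sensitive estimator later
clf = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('classifier', XGBClassifier()),
])

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
```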
