## 📄 Summary of Algerian Forest Fire Dataset

The **Algerian Forest Fire Dataset** contains meteorological data collected from two different regions of Algeria: the **Bejaia region** and the **Sidi Bel-Abbes region**. The dataset spans from **June 2012 to September 2012** and includes various attributes that influence the likelihood of forest fires.

---

### 📊 Attributes

1. **Date** – The date of observation, stored as separate `day`, `month` and `year` columns.
2. **Temperature** – Ambient temperature in Celsius (°C).
3. **RH** – Relative Humidity in percentage (%).
4. **Ws** – Wind speed in kilometers per hour (km/h).
5. **Rain** – Amount of rainfall in millimeters (mm).
6. **FFMC** – *Fine Fuel Moisture Code* (from the FWI system); represents moisture content of surface litter and fine fuels.
7. **DMC** – *Duff Moisture Code*; indicates moisture in moderately compact organic layers.
8. **DC** – *Drought Code*; represents long-term moisture in deep organic layers.
9. **ISI** – *Initial Spread Index*; estimates the expected spread rate of a fire.
10. **BUI** – *Buildup Index*; a combination of DMC and DC reflecting total fuel availability.
11. **FWI** – *Fire Weather Index*; indicates the intensity of a potential fire.
12. **Classes** – Target variable: `fire` or `not fire` (encoded later in this notebook as `1` and `0`).
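The attribute list above can double as a lightweight schema check. The sketch below assumes the column names that appear in the CSV once surrounding whitespace is stripped, with the date split into `day`, `month` and `year`. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Assumed column names (as in the CSV after stripping whitespace; the date is split in three)
EXPECTED_COLUMNS = [
    "day", "month", "year", "Temperature", "RH", "Ws", "Rain",
    "FFMC", "DMC", "DC", "ISI", "BUI", "FWI", "Classes",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: list expected columns absent from df (names are stripped first)."""
    present = {str(c).strip() for c in df.columns}
    return [c for c in EXPECTED_COLUMNS if c not in present]

# Made-up frame with padded column names, like the ones the notebook strips later
toy = pd.DataFrame(columns=[" RH", "Ws ", "Rain ", "Temperature"])
print(missing_columns(toy))
```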

---

### 📍 Regions Covered

- **Bejaia Region** – Located in the northeast of Algeria.
- **Sidi Bel-Abbes Region** – Located in the northwest of Algeria.

---

### 🔍 Usage

This dataset can be utilized for:

- Analyzing the impact of weather conditions on forest fire occurrences.
- Developing machine learning models to **predict fire risk**.
- Studying **seasonal fire trends** in different regions of Algeria.

---

### 🔗 Source

This dataset is publicly available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/).


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Load Dataset

In [None]:
# Load the dataset; the first line of the file is a region title, so the column header is on line 1
df = pd.read_csv("../dataset/Algerian_Forest_firesdataset.csv", header=1)
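The effect of `header=1` can be reproduced without the real file. This sketch uses a small made-up CSV held in memory with `io.StringIO`, assuming a layout like the original file: a title line, then the real header with padded names, then data rows.

```python
import io
import pandas as pd

# Made-up excerpt: a title line, the real header (with padded names), two data rows
raw = io.StringIO(
    "Bejaia Region Dataset\n"
    "day,month,year,Temperature, RH, Ws,Rain ,Classes  \n"
    "01,06,2012,29,57,18,0,not fire   \n"
    "02,06,2012,29,61,13,1.3,not fire   \n"
)

# header=1 takes line 1 (0-based) as the column names; line 0 is skipped
sample = pd.read_csv(raw, header=1)
sample.columns = sample.columns.str.strip()  # remove padding around the names
print(list(sample.columns))
```

Without `header=1`, pandas would treat the title line as the header and the real column names would end up as a data row.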

In [None]:
df.head()

In [None]:
df.info()

## 1. Data Cleaning

In [None]:
# missing values
df.isnull().sum()

In [None]:
df[df.isnull().any(axis=1)]

- The file contains two regional tables stacked one after the other; the second one starts at index 122:

  - **Bejaia Region Dataset**
  - **Sidi-Bel Abbes Region Dataset**

- **Note:** A new `Region` column is added to record which table each row comes from (`0` = Bejaia, `1` = Sidi-Bel Abbes).
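Hard-coding index 122 works for this file but would silently mislabel rows if the file changed. A hedged alternative is to locate the boundary by content. It assumes that, when the CSV is read with `header=1`, the second region's title text lands in the `day` column. The frame below is made up for illustration.

```python
import pandas as pd

# Made-up frame mimicking the raw layout: Bejaia rows, the second region's
# title row, its repeated header row, then Sidi-Bel Abbes rows
raw = pd.DataFrame({
    "day":   ["01", "02", "Sidi-Bel Abbes Region Dataset", "day", "01", "02"],
    "month": ["06", "06", None, "month", "06", "06"],
})

# Locate the title row by its text instead of hard-coding its position
is_title = raw["day"].astype(str).str.contains("Sidi-Bel Abbes")
boundary = raw.loc[is_title].index[0]

# 0 = Bejaia (rows before the title row), 1 = Sidi-Bel Abbes (title row onwards)
raw["Region"] = (raw.index >= boundary).astype(int)
print(boundary, raw["Region"].tolist())
```

The title and repeated header rows are still labelled `1` here; they are removed by the later cleaning steps.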


In [None]:
# Rows up to index 121 are Bejaia (0); index 122 onwards is Sidi-Bel Abbes (1)
df.loc[:121, "Region"] = 0
df.loc[122:, "Region"] = 1

In [None]:
df.info()

In [None]:
df.tail()

In [None]:
# Region was created as a float column; convert it to int
df['Region'] = df['Region'].astype(int)

In [None]:
df.head()

In [None]:
df.isnull().sum()

In [None]:
#remove null values
df = df.dropna().reset_index(drop=True)

In [None]:
df.head()

In [None]:
df.isnull().sum()

In [None]:
df.iloc[[122]]

In [None]:
# After dropna and re-indexing, row 122 repeats the column header of the second region's table, so remove it
df = df.drop(122).reset_index(drop=True)

In [None]:
df.iloc[[122]]
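The repeated header row can also be removed by content rather than by position. This sketch assumes that any row whose `day` value is the literal string `"day"` is a copy of the header. The frame is made up.

```python
import pandas as pd

# Made-up frame containing one repeated header row
frame = pd.DataFrame({
    "day":   ["01", "02", "day", "01"],
    "month": ["06", "06", "month", "06"],
})

# Keep only rows that are not copies of the header, then renumber the index
is_header_repeat = frame["day"].astype(str).str.strip() == "day"
frame = frame.loc[~is_header_repeat].reset_index(drop=True)
print(frame)
```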

In [None]:
df.columns

In [None]:
# Strip leading/trailing spaces from column names (e.g. ' RH' -> 'RH')
df.columns = df.columns.str.strip()

In [None]:
df.columns

In [None]:
df.info()

In [None]:
# Convert the columns from day to Ws to integer type
int_cols = ['day', 'month', 'year', 'Temperature', 'RH', 'Ws']
df[int_cols] = df[int_cols].astype(int)

df.info()

In [None]:
## Find the remaining object (string) columns; all except Classes are converted to float below
objects = [feature for feature in df.columns if df[feature].dtype == 'O']
objects

In [None]:
for i in objects:
    if i!='Classes':
        df[i]=df[i].astype(float)
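`astype` raises an error at the first value it cannot parse. As an alternative (not the approach this notebook uses), `pd.to_numeric(..., errors="coerce")` turns unparseable entries into `NaN`, so they can be inspected before being dropped. The values below are made up; the middle `DC` entry mimics a malformed cell containing two numbers.

```python
import pandas as pd

# Made-up object-typed columns, as they look before conversion
frame = pd.DataFrame({
    "Temperature": ["29", "31", "33"],
    "DC": ["7.6", "14.6 9", "22.1"],  # middle value cannot be parsed as a number
})

# Convert every column; unparseable strings become NaN instead of raising
converted = frame.apply(pd.to_numeric, errors="coerce")
bad_rows = converted[converted.isna().any(axis=1)]
print(converted.dtypes)
print(bad_rows)
```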

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
# Save the cleaned dataset to a CSV file
df.to_csv("Algerian_Forest_firesdataset_Cleaned.csv",index = False)

## 2. EDA

In [None]:
# Drop day, month and year because we are predicting FWI and the date columns are not needed
df_copy = df.drop(['day','month','year'],axis = 1)

In [None]:
df_copy.head()

In [None]:
# Inspect the raw Classes labels before encoding them
df_copy['Classes'].value_counts()

In [None]:
# Encode Classes: labels containing 'not fire' become 0, all others (fire) become 1
df_copy['Classes'] = np.where(df_copy['Classes'].str.contains('not fire'), 0, 1)
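The raw labels appear to carry extra spaces, which is why `str.contains` is used rather than an exact comparison. A slightly stricter sketch strips whitespace first and maps only the two known labels, so an unexpected label shows up as `NaN` instead of silently becoming `1`. The labels below are made up.

```python
import numpy as np
import pandas as pd

# Made-up raw labels with inconsistent trailing spaces
labels = pd.Series(["not fire   ", "fire   ", "fire", "not fire "])

# Notebook approach: anything not containing 'not fire' becomes 1
contains_based = np.where(labels.str.contains("not fire"), 0, 1)

# Stricter alternative: normalise whitespace, then map known labels only
mapped = labels.str.strip().map({"not fire": 0, "fire": 1})

print(contains_based.tolist(), mapped.tolist())
```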

In [None]:
df_copy.head()

In [None]:
df_copy['Classes'].value_counts()

## 3. Data Visualisation

In [None]:
# Plot histograms for all features
plt.style.use('seaborn-v0_8')
df_copy.hist(bins=50, figsize=(20,15))
plt.show()

In [None]:
percentage = df_copy['Classes'].value_counts(normalize = True)*100

In [None]:
# Class counts for the pie chart; value_counts sorts largest first, so labels are taken from the index
sizes = df_copy['Classes'].value_counts()
labels = sizes.index.map({0: 'not fire', 1: 'fire'}).tolist()
colors = ['red' if label == 'fire' else 'lightblue' for label in labels]
explode = (0.1, 0)  # explode the 1st (largest) slice
plt.figure(figsize=(5,5))
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.title('Percentage of Classes in the dataset')
plt.show()

In [None]:
df_copy.corr()

In [None]:
#plot correlation
plt.figure(figsize=(20,10))
sns.heatmap(df_copy.corr(), annot=True, cmap='coolwarm')
plt.show()

### The heatmap indicates that FWI is strongly correlated with several other features, especially the FWI-system indices such as ISI and BUI, from which FWI is computed
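To read the heatmap numerically, the correlations with FWI can be ranked by absolute value. The sketch below uses a small synthetic frame, not the real data. Its numbers are random and made up, and the column relationships are constructed on purpose: one column tracks FWI and one is unrelated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
fwi = rng.uniform(0, 30, size=50)
toy = pd.DataFrame({
    "FWI": fwi,
    "ISI": fwi / 2 + rng.normal(0, 0.5, size=50),  # strongly related by construction
    "RH": rng.uniform(20, 90, size=50),             # unrelated by construction
})

# Correlation of every other column with FWI, strongest (by absolute value) first
fwi_corr = toy.corr()["FWI"].drop("FWI").sort_values(key=abs, ascending=False)
print(fwi_corr)
```

Sorting by absolute value keeps strong negative correlations (such as a humidity effect) near the top instead of at the bottom.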

In [None]:
# plot pair plot
sns.pairplot(df_copy, hue='Classes')
plt.show()

In [None]:
#boxplot
plt.figure(figsize=(20,10))
sns.boxplot(data=df_copy, orient='h')
plt.xticks(rotation=90)
plt.show()

In [None]:
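# Normalise the raw Classes labels (which include extra spaces) to exactly 'fire' / 'not fire' for the plots below
df['Classes'] = np.where(df['Classes'].str.contains('not fire'), 'not fire', 'fire')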

In [None]:
# Monthly fire analysis: Bejaia (Region 0)
df_temp = df.loc[df['Region'] == 0]
sns.set_style('whitegrid')
plt.subplots(figsize=(20,10))
sns.countplot(x='month', data=df_temp, hue='Classes')
plt.xlabel('Month')
plt.ylabel('Number of days')
plt.title('Fire analysis of Bejaia Region', weight='bold')
plt.show()

In [None]:
# Monthly fire analysis: Sidi-Bel Abbes (Region 1)
df_temp = df.loc[df['Region'] == 1]
sns.set_style('whitegrid')
plt.subplots(figsize=(20,10))
sns.countplot(x='month', data=df_temp, hue='Classes')
plt.xlabel('Month')
plt.ylabel('Number of days')
plt.title('Fire analysis of Sidi-Bel Abbes Region', weight='bold')
plt.show()
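The two count plots can be cross-checked with a table. This sketch assumes the `Region`, `month` and string `Classes` columns produced above, and counts fire and not-fire days per region and month with `pd.crosstab`. The rows are made up.

```python
import pandas as pd

# Made-up rows with the same columns as the cleaned DataFrame
toy = pd.DataFrame({
    "Region":  [0, 0, 0, 1, 1, 1],
    "month":   [6, 6, 7, 6, 7, 7],
    "Classes": ["not fire", "fire", "fire", "not fire", "fire", "fire"],
})

# Rows: (Region, month); columns: class label; values: number of days
counts = pd.crosstab([toy["Region"], toy["month"]], toy["Classes"])
print(counts)
```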

## 🔍 Summary of Analysis

### 🧹 Data Cleaning
- Dropped rows containing null values and the repeated header row between the two regional tables.
- Converted columns to appropriate data types for analysis.

### 🌍 Region Segmentation
- The two regional tables were kept in a single DataFrame and distinguished by a new `Region` column:
  - **Bejaia Region** (`Region = 0`)
  - **Sidi Bel-Abbes Region** (`Region = 1`)

### 📊 Exploratory Data Analysis (EDA)
- Encoded the `Classes` column to numerical values (`1` for fire, `0` for no fire).
- Explored feature distributions and pairwise relationships between features.
- Created various visualizations:
  - Histograms
  - Pie charts
  - Heatmaps
  - Pair plots
  - Box plots

### 🔑 Key Findings

#### 🔗 Correlation Analysis
- **Fire Weather Index (FWI)** is strongly correlated with several other features, especially the other FWI-system indices.
- This suggests FWI is likely to be an informative variable for predicting forest fires.

#### 🔥 Class Distribution
- **137 fire occurrences**
- **106 non-fire occurrences**
- Shows a slight imbalance in the target variable.
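The balance figures follow directly from the two counts reported above (243 rows in total):

```python
fire, not_fire = 137, 106
total = fire + not_fire

fire_pct = 100 * fire / total          # share of fire days
not_fire_pct = 100 * not_fire / total  # share of non-fire days
ratio = fire / not_fire                # majority-to-minority ratio

print(total, round(fire_pct, 1), round(not_fire_pct, 1), round(ratio, 2))
# → 243 56.4 43.6 1.29
```

A ratio of roughly 1.3 to 1 is mild, so the classes can usually be used as they are, although stratified splitting keeps the proportions stable across train and test sets.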

#### 📆 Monthly Fire Trends
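- Both regions record many fire days across the **summer months** covered by the data (June to September).
- This suggests a **seasonal pattern** in forest fire activity, although no observations outside the summer period are available for comparison.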

### ✅ Conclusion
Meteorological factors such as **Temperature**, **Relative Humidity (RH)**, **Wind Speed (Ws)**, and **Rainfall** appear to be closely related to forest fire occurrences.  
This dataset is suitable for building **predictive models** that can help in forecasting fires and implementing **preventive measures**.
