# Amazon Fires 


## Context of the problem


Forest fires are a serious threat to the preservation of tropical forests. Brazil has the largest rainforest on the planet, and the Amazon Forest is its most important one. Understanding the frequency and periodicity of fires in the Amazon Forest can therefore help governmental entities adopt new conservation policies to prevent fires and protect the forests.

## Definition of the problem

The objective of this work is to understand the frequency of fires in the Amazon Forest through exploratory data analysis, using the seaborn and matplotlib libraries.


## Tasks for Data Analysis

1. Data Import
2. Data Cleaning
3. Exploratory Data Analysis
4. Interpretation of Results

### Setups

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator, FuncFormatter
import calendar
import folium
from folium import plugins

### 1. Data Import

In [None]:
filepath = "../input/forest-fires-in-brazil/amazon.csv"

amazon_dataframe = pd.read_csv(filepath, encoding = "latin1")

columns = amazon_dataframe.columns

shape = amazon_dataframe.shape

print(amazon_dataframe.head(10), "\n")
print("Number of rows: ", shape[0], "\n")
print("Number of columns: ",  shape[1], "\n")
print("Columns of the dataframe: ", columns)

### 2. Data Cleaning

#### Information about data types:

In [None]:
amazon_dataframe.info()

#### Translating month names to English:

In [None]:
months_renaming = {"Janeiro": "January",
                  "Fevereiro": "February",
                  "Março": "March",
                  "Abril": "April",
                  "Maio": "May",
                  "Junho": "June",
                  "Julho": "July",
                  "Agosto": "August",
                  "Setembro": "September",
                  "Outubro": "October",
                  "Novembro": "November",
                  "Dezembro": "December"}

amazon_dataframe = amazon_dataframe.replace(months_renaming)

amazon_dataframe.head()

#### Converting the date column to datetime

In [None]:
amazon_dataframe["date"] = pd.to_datetime(amazon_dataframe["date"])

amazon_dataframe.date.dtype
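If some entries in the date column cannot be parsed, `pd.to_datetime` raises an error by default. A minimal sketch, using a made-up `raw` series rather than the real data, of turning unparseable entries into `NaT` so they can be inspected afterwards:

```python
import pandas as pd

# Made-up values for illustration: one valid date and one invalid entry
raw = pd.Series(["1998-01-01", "not a date"])

# errors="coerce" replaces unparseable entries with NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")

# Count how many entries could not be parsed
print("Unparseable dates:", parsed.isna().sum())
```

Whether coercing is appropriate depends on the data; inspecting the affected rows before discarding them is usually safer.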


#### Dropping duplicate rows:

In [None]:
number_duplicated_rows = amazon_dataframe.duplicated().sum()

print("Number of duplicated rows: ", number_duplicated_rows)

duplicated_rows = amazon_dataframe[amazon_dataframe.duplicated()]

print("Duplicated rows:\n ", duplicated_rows)


In [None]:
amazon_dataframe.drop_duplicates(inplace=True)

print("New dimensions of the dataset: ", amazon_dataframe.shape)

#### Checking for null values:

In [None]:
number_null_values = amazon_dataframe.isnull().sum()

print("Number of null values:\n", number_null_values)
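The cell above only counts missing values. If any were reported, one way to drop them might look like the sketch below. The `sample` frame is made-up data for illustration, and dropping only on the `number` column is an assumption about which column matters.

```python
import pandas as pd

# Made-up sample with one missing "number" value (illustration only)
sample = pd.DataFrame({
    "state": ["Acre", "Acre", "Bahia"],
    "number": [10.0, None, 5.0],
})

# Count missing values per column, as in the cell above
null_counts = sample.isnull().sum()

# Drop rows where the fire count is missing and rebuild the index
cleaned = sample.dropna(subset=["number"]).reset_index(drop=True)

print(null_counts)
print(cleaned)
```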

### 3. Exploratory Data Analysis

#### Total of Fires

In [None]:
print("Total of fires registered: ", amazon_dataframe.shape[0])

#### Statistical Description

In [None]:
amazon_dataframe.describe()

#### Fires Distribution

In [None]:
plt.hist(amazon_dataframe["number"], bins=100, edgecolor ="k")
plt.xlabel("Number of fires")
plt.ylabel("Frequency")
plt.title("Fires Distribution")


#### Evolution of Fires in Brazil between 1998 and 2017


In [None]:
fires_year = amazon_dataframe.groupby(amazon_dataframe["year"]).count().number
print(fires_year)
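Note that `.count()` returns the number of rows in each group, while the `number` column holds a fire count per row. A small sketch on made-up data showing how the two aggregations differ; which one is appropriate depends on how each row of the dataset is meant to be read.

```python
import pandas as pd

# Made-up records: each row carries its own fire count in "number"
sample = pd.DataFrame({
    "year": [1998, 1998, 1999],
    "number": [10, 20, 5],
})

# Number of rows per year (what .count() returns)
records_per_year = sample.groupby("year")["number"].count()

# Sum of the "number" column per year
fires_per_year = sample.groupby("year")["number"].sum()

print(records_per_year)
print(fires_per_year)
```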

In [None]:
plt.figure(figsize=(12,7))
plot = sns.lineplot(data=amazon_dataframe, x="year", y="number", marker="o")
plot.xaxis.set_major_locator(plt.MaxNLocator(19))
plot.set_xlim(1998, 2017)

#### Distribution of Fires by Month 

In [None]:
month_fires = amazon_dataframe.groupby(amazon_dataframe["month"]).number.count().reset_index()
month_fires = month_fires.sort_values("number", ascending=False)
print(month_fires)


In [None]:
plt.style.use("ggplot")

month_fires.plot(x="month", y="number", kind="bar", figsize=(12,7), color="orange", alpha = 0.5)

plt.title("Distribution of fires by month")
plt.xlabel("Month", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)
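The bars above are not in calendar order. A minimal sketch, using made-up monthly counts, of reordering a month-indexed series chronologically with the standard `calendar` module (this assumes the default English locale for month names):

```python
import calendar
import pandas as pd

# Made-up monthly counts in alphabetical order, as a groupby on month names returns them
month_counts = pd.Series({"April": 3, "February": 7, "January": 5, "March": 2})

# calendar.month_name[0] is an empty string, so skip it
month_order = list(calendar.month_name)[1:]

# Reindex to calendar order, keeping only the months present in the data
ordered = month_counts.reindex([m for m in month_order if m in month_counts.index])

print(ordered)
```

The reordered series can then be passed to `.plot(kind="bar")` in the same way as before.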

#### Distribution of Fires by Day of the Week

In [None]:
fires_weekday = amazon_dataframe.groupby(amazon_dataframe["date"].dt.dayofweek).count().date

# Map each weekday number actually present (Monday=0) to its name
fires_weekday.index = [calendar.day_name[x] for x in fires_weekday.index]
print(fires_weekday)
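If some weekdays have no records, they are simply absent from the grouped result. A minimal sketch, using made-up dates, of making all seven days appear with explicit zero counts:

```python
import calendar
import pandas as pd

# Made-up dates for illustration: two Mondays and one Wednesday
dates = pd.Series(pd.to_datetime(["2017-01-02", "2017-01-09", "2017-01-04"]))

# Count records per weekday number (Monday=0 ... Sunday=6)
counts = dates.groupby(dates.dt.dayofweek).count()

# Make sure all seven days appear, filling absent days with 0
counts = counts.reindex(range(7), fill_value=0)

# Replace weekday numbers with their names
counts.index = [calendar.day_name[i] for i in counts.index]

print(counts)
```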

In [None]:
plt.style.use("ggplot")

fires_weekday.plot(kind="bar", figsize=(12,7), color="orange", alpha = 0.5)

plt.title("Distribution of fires by day of the week")
plt.xlabel("Day of the week", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

#### Distribution of Fires by State

In [None]:
fires_state = amazon_dataframe.groupby(amazon_dataframe["state"]).count().number
print(fires_state)

In [None]:
plt.style.use("ggplot")

fires_state.plot(kind="bar", figsize=(12,7), color="orange", alpha=0.5)

plt.title("Distribution of fires by state")
plt.xlabel("State", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

After analyzing the distribution of fires by year, month, day of the week and state, a detailed analysis was carried out for the states with the highest number of fires.

The states chosen were Mato Grosso, Paraiba and Rio.
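The selection of states can also be done programmatically. A minimal sketch with `Series.nlargest`; the counts in `state_counts` are made up for illustration and are not the values from the dataset:

```python
import pandas as pd

# Made-up per-state counts (illustration only)
state_counts = pd.Series({"Acre": 10, "Mato Grosso": 30, "Paraiba": 25, "Rio": 40})

# Keep the three states with the largest counts, in descending order
top_states = state_counts.nlargest(3)

print(top_states.index.tolist())
```

In the notebook, the same call could be applied to the `fires_state` series computed earlier.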

#### Analysis of the Distribution of Fires in the Mato Grosso State

In [None]:
mato_grosso_dataframe = amazon_dataframe[amazon_dataframe["state"] == "Mato Grosso"]
print(mato_grosso_dataframe.head())

number_fires_mato_grosso = mato_grosso_dataframe.shape[0]
print("\nNumber of fires in Mato Grosso State: ", number_fires_mato_grosso)


In [None]:
year_fires_mato_grosso = mato_grosso_dataframe.groupby(mato_grosso_dataframe["year"]).count().number
print(year_fires_mato_grosso)

In [None]:
plt.style.use("ggplot")

year_fires_mato_grosso.plot(kind="bar", figsize=(12,7), color="black", alpha=0.5)

plt.title("Distribution of fires in Mato Grosso State by Year")
plt.xlabel("Year", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

In [None]:
month_fires_mato_grosso = mato_grosso_dataframe.groupby(mato_grosso_dataframe["month"]).count().number
print(month_fires_mato_grosso)

In [None]:
plt.style.use("ggplot")

month_fires_mato_grosso.plot(kind="bar", figsize=(12,7), color="pink", alpha=0.5)

plt.title("Distribution of fires in Mato Grosso State by Month")
plt.xlabel("Month", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

In [None]:
day_week_fires_mato_grosso = mato_grosso_dataframe.groupby(mato_grosso_dataframe["date"].dt.dayofweek).count().date
day_week_fires_mato_grosso.index = [calendar.day_name[x] for x in day_week_fires_mato_grosso.index]
print(day_week_fires_mato_grosso)

In [None]:
plt.style.use("ggplot")

day_week_fires_mato_grosso.plot(kind="bar", figsize=(12,7), color="yellow", alpha=0.5)

plt.title("Distribution of fires in Mato Grosso State by Day of Week")
plt.xlabel("Day of Week", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

#### Analysis of the Distribution of Fires in Paraiba State

In [None]:
paraiba_fires_dataframe = amazon_dataframe[amazon_dataframe["state"] == "Paraiba"]
print(paraiba_fires_dataframe.head())

number_fires_paraiba = paraiba_fires_dataframe.shape[0]
print("Number of fires in Paraiba: ", number_fires_paraiba)

In [None]:
year_fires_paraiba = paraiba_fires_dataframe.groupby(paraiba_fires_dataframe["year"]).count().number
print(year_fires_paraiba)

In [None]:
plt.style.use("ggplot")

year_fires_paraiba.plot(kind="bar", figsize=(12,7), color="blue", alpha=0.5)

plt.title("Distribution of fires in Paraiba State by Year")
plt.xlabel("Year", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

In [None]:
month_fires_paraiba = paraiba_fires_dataframe.groupby(paraiba_fires_dataframe["month"]).count().number
print(month_fires_paraiba)

In [None]:
plt.style.use("ggplot")

month_fires_paraiba.plot(kind="bar", figsize=(12,7), color="red", alpha=0.5)

plt.title("Distribution of fires in Paraiba State by Month")
plt.xlabel("Month", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

In [None]:
day_week_fires_paraiba = paraiba_fires_dataframe.groupby(paraiba_fires_dataframe["date"].dt.dayofweek).count().date
day_week_fires_paraiba.index = [calendar.day_name[x] for x in day_week_fires_paraiba.index]
print(day_week_fires_paraiba)


In [None]:
plt.style.use("ggplot")

day_week_fires_paraiba.plot(kind="bar", figsize=(12,7), color="gray", alpha=0.5)

plt.title("Distribution of fires in Paraiba State by Day of Week")
plt.xlabel("Day of Week", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

#### Analysis of the Distribution of Fires in Rio State

In [None]:
rio_fires_dataframe = amazon_dataframe[amazon_dataframe["state"] == "Rio"]
print(rio_fires_dataframe.head())

number_fires_rio = rio_fires_dataframe.shape[0]
print("Number of fires in Rio: ", number_fires_rio)

In [None]:
year_fires_rio = rio_fires_dataframe.groupby(rio_fires_dataframe["year"]).count().number
print(year_fires_rio)

In [None]:
plt.style.use("ggplot")

year_fires_rio.plot(kind="bar", figsize=(12,7), color="brown", alpha=0.5)

plt.title("Distribution of fires in Rio State by Year")
plt.xlabel("Year", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

In [None]:
month_fires_rio = rio_fires_dataframe.groupby(rio_fires_dataframe["month"]).count().number
print(month_fires_rio)

In [None]:
plt.style.use("ggplot")

month_fires_rio.plot(kind="bar", figsize=(12,7), color="blue", alpha=0.5)

plt.title("Distribution of fires in Rio State by Month")
plt.xlabel("Month", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)

In [None]:
day_week_fires_rio = rio_fires_dataframe.groupby(rio_fires_dataframe["date"].dt.dayofweek).count().date
day_week_fires_rio.index = [calendar.day_name[x] for x in day_week_fires_rio.index]
print(day_week_fires_rio)

In [None]:
plt.style.use("ggplot")

day_week_fires_rio.plot(kind="bar", figsize=(12,7), color="green", alpha=0.5)

plt.title("Distribution of fires in Rio State by Day of Week")
plt.xlabel("Day of Week", fontsize=16)
plt.ylabel("Number of fires", fontsize=16)
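The three state analyses above repeat the same filter-and-group steps. A possible way to factor them out is sketched below; `summarize_state` is a hypothetical helper name, and the `sample` frame is made-up data.

```python
import pandas as pd


def summarize_state(df, state, by):
    """Count the rows of one state, grouped by the column named in `by`."""
    subset = df[df["state"] == state]
    return subset.groupby(by)["number"].count()


# Made-up records for illustration
sample = pd.DataFrame({
    "state": ["Rio", "Rio", "Acre", "Rio"],
    "year": [1998, 1998, 1998, 1999],
    "number": [1, 2, 3, 4],
})

rio_by_year = summarize_state(sample, "Rio", "year")
print(rio_by_year)
```

Calling the helper with `by="month"` would give the per-month counts in the same way.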

#### Fires Location Analysis

#### New columns: Latitude and Longitude

In [None]:
latitude = {
    'Acre': -9.02, 'Alagoas': -9.57, 'Amapa': 2.05, 'Amazonas': -5.00, 'Bahia': -12.00, 'Ceara': -5.00,
    'Distrito Federal': -15.45, 'Espirito Santo': -20.00, 'Goias': -15.55, 'Maranhao': -5.00, 'Mato Grosso': -14.00,
    'Minas Gerais': -18.50, 'Pará': -3.20, 'Paraiba': -7.00, 'Pernambuco': -8.00, 'Piau': -7.00, 'Rio': -22.90,
    'Rondonia': -11.00, 'Roraima': -2.00, 'Santa Catarina': -27.25, 'Sao Paulo': -23.32, 'Sergipe': -10.30,
    'Tocantins': -10.00,
}


longitude = {
    'Acre': -70.8120, 'Alagoas': -36.7820, 'Amapa': -50.50, 'Amazonas': -65.00, 'Bahia': -42.00, 'Ceara': -40.00,
    'Distrito Federal': -47.45, 'Espirito Santo': -40.45, 'Goias': -50.10, 'Maranhao': -46.00, 'Mato Grosso': -55.00,
    'Minas Gerais': -46.00, 'Pará': -52.00, 'Paraiba': -36.00, 'Pernambuco': -37.00, 'Piau': -43.00, 'Rio': -43.17,
    'Rondonia': -63.00, 'Roraima': -61.30, 'Santa Catarina': -48.30, 'Sao Paulo': -46.37, 'Sergipe': -37.30,
    'Tocantins': -48.00,
}

amazon_dataframe["latitude"] = amazon_dataframe["state"].map(latitude)
amazon_dataframe["longitude"] = amazon_dataframe["state"].map(longitude)
amazon_dataframe.head()
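`Series.map` leaves `NaN` for any state name that is missing from the lookup dictionary, for example because of a spelling or encoding difference. A small sketch of listing such names; the two-entry `latitude_lookup` reuses values from the dictionary above, and "Unknown State" is a made-up name.

```python
import pandas as pd

# Reduced lookup for illustration
latitude_lookup = {"Acre": -9.02, "Rio": -22.90}

# Made-up frame containing one state name that is not in the lookup
sample = pd.DataFrame({"state": ["Acre", "Rio", "Unknown State"]})
sample["latitude"] = sample["state"].map(latitude_lookup)

# State names that did not receive a coordinate
unmapped = sample.loc[sample["latitude"].isna(), "state"].unique().tolist()

print("States without coordinates:", unmapped)
```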

#### Distribution of Fires between 1998-2017 by Geographic Location

In [None]:

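# Note: the "Stamen Terrain" tile set may not be bundled with newer folium releases
brasil_map = folium.Map(location=[-16.1237611, -59.9219642], zoom_start=3.5, tiles='Stamen Terrain')

# Group nearby markers into clusters so the map stays readable
fires = plugins.MarkerCluster().add_to(brasil_map)

# One marker per row; lat/lon avoid shadowing the latitude/longitude dictionaries
for lat, lon in zip(amazon_dataframe.latitude, amazon_dataframe.longitude):
    folium.Marker(
        location=[lat, lon],
        icon=None,
    ).add_to(fires)

brasil_map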

### 4. Interpretation of Results

- The number of fires increased over the last 20 years, and Mato Grosso, Paraiba and Rio were the states with the highest numbers of fires.
- Most fires happened during the spring and summer months (September through February). However, the number of fires in the remaining autumn and winter months is still high. The explanation for these observations is the [tropical climate in Brazil](https://seasonsyear.com/Brazil), with temperatures of 17$^{\circ}$C to 27$^{\circ}$C and precipitation of 12 mm to 240 mm.
- The days of the week with the highest numbers of registered fires are Tuesday, Thursday and the weekend.
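
The seasonal observation can be checked by grouping months into seasons. A minimal sketch, assuming the meteorological season convention for the Southern Hemisphere (the `season_of_month` mapping is an assumption, and `sample` is made-up data):

```python
import pandas as pd

# Southern Hemisphere seasons by month (meteorological convention, assumed here)
season_of_month = {
    "December": "summer", "January": "summer", "February": "summer",
    "March": "autumn", "April": "autumn", "May": "autumn",
    "June": "winter", "July": "winter", "August": "winter",
    "September": "spring", "October": "spring", "November": "spring",
}

# Made-up records for illustration
sample = pd.DataFrame({
    "month": ["January", "July", "September", "October"],
    "number": [4, 1, 6, 3],
})

# Attach a season to each row and aggregate the "number" column per season
sample["season"] = sample["month"].map(season_of_month)
fires_per_season = sample.groupby("season")["number"].sum()

print(fires_per_season)
```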