# Analysis of forest fires in Brazil between 1998 and 2017

<img src="https://media1.tenor.com/images/9d82f04f3b28893bb2a51eb32c28d96f/tenor.gif" width="750" align="center">

### Context

The planet is getting hotter and drier as global warming takes hold, and the destruction caused by forest fires intensifies these impacts (DENNISON et al., 2014). Forest fires cause environmental and economic damage and pose a serious danger to life. Understanding how often they occur over a period of time can help in taking steps to prevent them.

### Data

This dataset reports the number of forest fires in Brazil, broken down by state. The series covers a period of approximately 20 years (1998 to 2017). The data can be found at:

- [dados.gov](http://dados.gov.br/dataset/sistema-nacional-de-informacoes-florestais-snif)

With these data, it is possible to assess the evolution of fires over the years, as well as the regions where they were concentrated.
The Legal Amazon comprises the states of Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima and Tocantins, and part of Maranhão.

###### Importing Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use("ggplot")

###### Analyzing the data

In [None]:
df = pd.read_csv('/kaggle/input/forest-fires-in-brazil/amazon.csv')
df.head()

In [None]:
df.info()

In [None]:
df['number'].describe()

###### Checking for missing values and duplicate values

In [None]:
df.isna().sum()

In [None]:
# Count rows that are exact copies of an earlier row
df.duplicated().sum()

In [None]:
df.drop_duplicates(inplace = True)
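
To make the duplicate check concrete, here is a minimal sketch on a made-up frame. The `toy` data is hypothetical and not taken from the dataset. `duplicated()` flags every repeat of an earlier row, and `drop_duplicates()` keeps only the first occurrence.

```python
import pandas as pd

# Hypothetical toy data: the second and third rows are exact copies of the first.
toy = pd.DataFrame({
    "year": [1998, 1998, 1998, 1999],
    "state": ["Acre", "Acre", "Acre", "Acre"],
    "number": [10.0, 10.0, 10.0, 3.0],
})

# duplicated() marks rows that repeat an earlier row; the first occurrence is not marked.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() returns a new frame that keeps the first occurrence of each row.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(len(deduplicated))
```

Comparing the row count before and after is a quick way to confirm that the number of rows removed matches the number of duplicates counted.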

### Analysis by Year

In [None]:
df['year'].unique()

In [None]:
ax = df.groupby(['year'])['number'].sum()
ax.plot(kind = 'line', figsize=(16, 3), color = 'darkred')

plt.title("Number of Fires per year in Brazil")

The graph shows a sharp increase in fires between 1998 and 2003, followed by a sharp drop between 2003 and 2008. After that, the number of fires keeps rising and falling from year to year. It is therefore difficult to identify a sustained increase in the number of fires, but the numbers remain very high. Let's look at the number of fires per month.
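
Before moving on, the rises and falls described above can also be put into numbers. The sketch below computes the year-over-year change of a yearly series. The values in `yearly` are hypothetical and only illustrate the calculation. In the notebook, the same steps could be applied to `df.groupby('year')['number'].sum()`.

```python
import pandas as pd

# Hypothetical yearly totals, for illustration only (not the real figures).
yearly = pd.Series(
    [100.0, 150.0, 120.0, 180.0],
    index=[2000, 2001, 2002, 2003],
    name="number",
)

# Absolute change from one year to the next (the first year has no predecessor, so it is NaN).
change = yearly.diff()

# Relative change in percent.
pct = (yearly.pct_change() * 100).round(1)

summary = pd.DataFrame({"total": yearly, "change": change, "pct_change": pct})
print(summary)
```

A column of percentage changes that keeps switching sign is what "increasing and decreasing" looks like numerically.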

### Analysis by Month

In [None]:
df['month'].unique()

In [None]:
df = df.replace('Mar�o', 'Março')

In [None]:
df['month'].value_counts()

In [None]:
# Sum the fires per month and put the months in calendar order
ax = df.groupby(['month'])['number'].sum().reindex(['Janeiro','Fevereiro','Março','Abril','Maio','Junho','Julho','Agosto',
                                                           'Setembro','Outubro','Novembro','Dezembro'])
ax.plot(kind = 'bar', figsize=(18, 6), color = 'darkred')
plt.title("Number of Fires per month in Brazil between 1998 and 2017")
plt.xlabel('Months')
plt.ylabel('Number of Fires')  # the bars show sums, not averages

The period from May to November is considered the driest in Brazil, which is why we see more fires in these months. However, we are analyzing all states together, so this relatively dry period cannot be taken as the only factor behind the increase in the number of fires. Let's look at the number of fires by state.
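
Before turning to the states, the seasonal pattern can be summarized as a single number: the share of all fires that falls in the May to November window. This is a minimal sketch with hypothetical monthly totals. `MONTH_ORDER` and `DRY_SEASON` are names introduced here for illustration, and the month spellings are assumed to match the dataset's `month` column.

```python
import pandas as pd

# Month names, assumed to match the spelling in the dataset's `month` column.
MONTH_ORDER = ["Janeiro", "Fevereiro", "Março", "Abril", "Maio", "Junho",
               "Julho", "Agosto", "Setembro", "Outubro", "Novembro", "Dezembro"]

# May (Maio) through November (Novembro), the dry period mentioned in the text.
DRY_SEASON = MONTH_ORDER[4:11]

# Hypothetical monthly totals, for illustration only.
monthly = pd.Series(
    [10, 10, 10, 10, 20, 20, 30, 30, 30, 20, 20, 10],
    index=MONTH_ORDER,
    dtype="float64",
)

# Fraction of all fires that occur in the dry-season months.
dry_share = monthly.loc[DRY_SEASON].sum() / monthly.sum()
print(f"{dry_share:.1%}")
```

Computing the same share per state would show whether the seasonal effect is equally strong everywhere.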

### Analysis by state

In [None]:
df['state'].unique()

In [None]:
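# Fix mis-encoded and truncated state names, touching only the 'state' column.
# 'Rio' is assumed here to stand for Rio de Janeiro.
df['state'] = df['state'].replace({'Par�': 'Pará', 'Rio': 'Rio de Janeiro', 'Piau': 'Piaui'})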

In [None]:
df['state'].value_counts()

In [None]:
ax = df.groupby(['state'])['number'].sum().sort_values(ascending = True)
ax.plot(kind = 'bar', figsize=(12, 8), color = 'darkred')
plt.xticks(rotation =90)
plt.title("Number of Fires per state in Brazil between 1998 and 2017")

We can observe that Mato Grosso has the highest number of fires of all the states, with Paraíba second and São Paulo third. Let's look at the number of fires by region.

### Analysis by regions

In [None]:
Region_state = {'Acre':'Norte', 'Amapa':'Norte', 'Amazonas':'Norte', 'Pará': 'Norte', 'Rondonia': 'Norte', 'Roraima': 'Norte',
                 'Tocantins': 'Norte', 'Alagoas': 'Nordeste', 'Bahia': 'Nordeste', 'Ceara': 'Nordeste', 'Maranhao': 'Nordeste',
                 'Paraiba': 'Nordeste', 'Pernambuco': 'Nordeste', 'Piaui': 'Nordeste', 'Sergipe': 'Nordeste', 
                 'Distrito Federal': 'Centro-Oeste', 'Goias': 'Centro-Oeste', 'Mato Grosso': 'Centro-Oeste',
                 'Rio de Janeiro': 'Sudeste', 'Sao Paulo': 'Sudeste', 'Minas Gerais': 'Sudeste', 'Espirito Santo': 'Sudeste',
                 'Santa Catarina': 'Sul'}

df['Region'] = df['state'].map(Region_state)

In [None]:
df['Region'].value_counts()
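
A state whose name is missing from the mapping dict does not raise an error: `Series.map` silently assigns `NaN`, and `value_counts()` drops `NaN` by default, so the gap is easy to overlook. The sketch below shows one way to list unmapped states. The `region_of` dict and `toy` frame are hypothetical and deliberately incomplete.

```python
import pandas as pd

# Hypothetical, deliberately incomplete mapping: 'Santa Catarina' has no entry.
region_of = {"Acre": "Norte", "Bahia": "Nordeste"}

toy = pd.DataFrame({"state": ["Acre", "Bahia", "Santa Catarina", "Acre"]})
toy["Region"] = toy["state"].map(region_of)

# map() leaves NaN wherever the state is not a key of the dict.
unmapped = sorted(toy.loc[toy["Region"].isna(), "state"].unique())
print(unmapped)

# dropna=False makes the unmapped rows visible in the counts as well.
counts = toy["Region"].value_counts(dropna=False)
print(counts)
```

In the notebook, the same check would be `df.loc[df['Region'].isna(), 'state'].unique()`. An empty result means every state received a region.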

In [None]:
ax = df.groupby(['Region'])['number'].sum().sort_values(ascending = True)
ax.plot(kind = 'bar', figsize=(18, 6), color = 'darkred')
plt.title("Number of Fires by Region in Brazil between 1998 and 2017")
plt.xlabel('Region')
plt.ylabel('Number of Fires')  # the bars show sums, not averages

We can observe that the Northeast is the region with the highest number of fires. This is to be expected, because it is the driest of the regions.

###### Yearly Analysis by Region

In [None]:
sns.set_style('whitegrid')
sns.relplot(x = 'year', y = 'number',
            data = df,
            kind = 'line',
            style = 'Region',
            hue ='Region',
            height = 8.27, 
            aspect = 20/10,
            ci = None,
            markers = True,
            dashes = False)
plt.xlabel('Year')
plt.ylabel('Fire Average')
plt.title('Average Number of Fires per Year in the 5 Brazilian Regions between 1998 and 2017')

###### Monthly Analysis by Region

In [None]:
sns.set_style('whitegrid')
sns.relplot(x = 'month', y = 'number',
            data = df,
            kind = 'line',
            style = 'Region',
            hue ='Region',
            height = 8.27, 
            aspect = 20/10,
            ci = None,
            markers = True,
            dashes = False)
plt.xlabel('Month')
plt.ylabel('Fire Average')
plt.title('Average Number of Fires per Month in the 5 Brazilian Regions between 1998 and 2017')
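
In the monthly plot, `month` is a plain string column, so the x-axis may not follow the calendar. One option is to convert the column to an ordered categorical before grouping or plotting. This is a minimal sketch of the pandas side only. The `toy` frame is hypothetical, `MONTH_ORDER` is a name introduced here, and whether a given seaborn version honours the category order on the axis is an assumption to verify.

```python
import pandas as pd

# Calendar order of the month names, assumed to match the dataset's spelling.
MONTH_ORDER = ["Janeiro", "Fevereiro", "Março", "Abril", "Maio", "Junho",
               "Julho", "Agosto", "Setembro", "Outubro", "Novembro", "Dezembro"]

# Hypothetical toy data with months out of calendar order.
toy = pd.DataFrame({
    "month": ["Março", "Janeiro", "Fevereiro", "Janeiro"],
    "number": [3.0, 1.0, 2.0, 5.0],
})

# An ordered categorical sorts by calendar position instead of alphabetically.
toy["month"] = pd.Categorical(toy["month"], categories=MONTH_ORDER, ordered=True)

# observed=True keeps only the months that actually occur in the data.
monthly_mean = toy.groupby("month", observed=True)["number"].mean()
print(monthly_mean)
```

With the real data, the conversion would be `df['month'] = pd.Categorical(df['month'], categories=MONTH_ORDER, ordered=True)`, applied after the 'Março' fix.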