### Abstract

_Abstract overview of the kernel_

Fires are a serious problem in Brazil. As the dataset description states, "Understanding the frequency of forest fires in a time series can help to take action to prevent them". Pinpointing where and when fires are reported most often should clarify the scope of the problem we are looking at.

### Data Source 

_Describing the origin of the data sources. What is the format of the original data? How to access the data?_ 

The dataset we are presented with is small: it has around 6,500 observations and 5 features, a mix of categorical and numeric values.

### Acquiring and Loading Data

_Presenting the code and methods for acquiring the data. Loading the data into appropriate format for analysis. Explaining the process and results_

In [None]:
#first, let's import all necessary libraries for this analysis
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt

In [None]:
#using the pandas 'read_csv' function to read the amazon csv file, which Kaggle already provides in a ready-to-use format
amazon_df=pd.read_csv('../input/forest-fires-in-brazil/amazon.csv', encoding='latin1')
#examining head of the dataset
amazon_df.head(10)

In [None]:
amazon_df.describe(include="all")

### Understanding Data

_Going over features presented in the dataset for analysis. Explaining the process and the results_

In [None]:
#checking the length of the dataset
len(amazon_df)

In [None]:
#checking if there are any nulls we are dealing with (missing data)
amazon_df.isna().sum()

In [None]:
#checking unique values in the state column
amazon_df.state.unique()


In [None]:
#checking unique values in the month column
amazon_df.month.unique()

Here, it is important to note that the months are not in English. To make the analysis easier for me and for readers, it helps to translate the month names into English.

In [None]:
#creating a dictionary with translations of months
month_map={'Janeiro': 'January', 'Fevereiro': 'February', 'Março': 'March', 'Abril': 'April', 'Maio': 'May',
          'Junho': 'June', 'Julho': 'July', 'Agosto': 'August', 'Setembro': 'September', 'Outubro': 'October',
          'Novembro': 'November', 'Dezembro': 'December'}
#mapping our translated months
amazon_df['month']=amazon_df['month'].map(month_map)
#checking the month column for the second time after the changes were made
amazon_df.month.unique()
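One thing worth guarding against when translating with `map`: any value missing from the dictionary silently becomes `NaN`. Below is a minimal sketch of a check, using a small made-up frame (`demo_df` and `demo_map` are illustrative names, not part of this kernel):

```python
import pandas as pd

# Hypothetical sample; 'Marco' is deliberately spelled without the cedilla
demo_df = pd.DataFrame({'month': ['Janeiro', 'Marco', 'Abril']})
demo_map = {'Janeiro': 'January', 'Março': 'March', 'Abril': 'April'}

translated = demo_df['month'].map(demo_map)
# Values that are absent from the dictionary come back as NaN
unmapped = demo_df.loc[translated.isna(), 'month'].unique().tolist()
print(unmapped)
```

An empty `unmapped` list would suggest that every month name was translated.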

Here we can already point out an interesting observation: the median (50th percentile) across all observations (all months, years and states) is 24 reported fires.

In [None]:
#checking the numeric percentile distribution for the fires reported
amazon_df.number.describe()

In [None]:
#checking how many fires were reported in 20 years
amazon_df.number.sum()

In [None]:
amazon_df.number.plot(kind="box")
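To make the box plot easier to read, here is a rough sketch of the numbers behind it: the quartiles, the interquartile range (IQR) and the upper whisker limit under the common 1.5 × IQR rule (matplotlib's default whisker setting). The `counts` values are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical fire counts, not taken from the dataset
counts = pd.Series([0, 2, 4, 6, 8, 10, 12, 14, 100])

q1 = counts.quantile(0.25)      # lower edge of the box
median = counts.quantile(0.50)  # line inside the box
q3 = counts.quantile(0.75)      # upper edge of the box
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr    # points above this are drawn as outliers

outliers = counts[counts > upper_fence].tolist()
print(q1, median, q3, upper_fence, outliers)
```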

In [None]:
#we are already given the year column; however, for good practice we can also extract it from the date column
amazon_df['Year']=pd.DatetimeIndex(amazon_df['date']).year
#checking unique years in the newly created column
amazon_df.Year.unique()
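An equivalent way to pull the year out, sketched on a made-up frame (`demo_df` is an illustrative name), is to parse the column with `pd.to_datetime` and use the `.dt` accessor. This assumes the dates are ISO-style strings such as `1998-01-01`:

```python
import pandas as pd

# Hypothetical sample of date strings
demo_df = pd.DataFrame({'date': ['1998-01-01', '2005-01-01', '2017-01-01']})

# Parse the strings to datetimes, then extract the year component
demo_df['Year'] = pd.to_datetime(demo_df['date']).dt.year
print(demo_df['Year'].tolist())
```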

### Exploring and Visualizing Data

_Exploring the data by analyzing its statistics and visualizing the values of features and correlations between different features. Explaining the process and the results_

In [None]:
#we are not going to use the old year column and the date column, as they serve no significant purpose anymore
amazon_df.drop(columns=['date', 'year'], inplace=True)
#changing order of columns to the preferred format
amazon_df=amazon_df[['state','number','month','Year']]
#changing names of columns to the preferred format
amazon_df.rename(columns={'state': 'State', 'number': 'Fire_Number', 'month': 'Month'}, inplace=True)
#checking changes made
amazon_df.head()

First, it will be interesting to look at the trend of fires being reported over 20 years.

In [None]:
#creating a list of years we have 
years=list(amazon_df.Year.unique())
#creating an empty list, which will be populated later with amount of fires reported
sub_fires_per_year=[]
#using for loop to extract sum of fires reported for each year and append list above
for i in years:
    y=amazon_df.loc[amazon_df['Year']==i].Fire_Number.sum().round(0)
    sub_fires_per_year.append(y)
#creating a dictionary with results     
fire_year_dic={'Year':years,'Total_Fires':sub_fires_per_year}
#creating a new sub dataframe for later plot 
time_plot_1_df=pd.DataFrame(fire_year_dic)
#checking the dataframe
time_plot_1_df.head(5)
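The same yearly totals can probably be obtained more compactly with `groupby`. This is a minimal sketch on a made-up frame (`demo_df` and `totals` are illustrative names), assuming the column names `Year` and `Fire_Number` used above:

```python
import pandas as pd

# Hypothetical sample with the same column names as the kernel
demo_df = pd.DataFrame({
    'Year': [1998, 1998, 1999, 1999, 1999],
    'Fire_Number': [10.0, 5.3, 3.0, 2.0, 1.0],
})

# Sum the reported fires per year and keep Year as a regular column
totals = (demo_df.groupby('Year', as_index=False)['Fire_Number']
          .sum()
          .rename(columns={'Fire_Number': 'Total_Fires'}))
totals['Total_Fires'] = totals['Total_Fires'].round(0)
print(totals)
```

The explicit loop above is easier to follow step by step; `groupby` mainly saves a few lines and avoids filtering the frame once per year.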

In [None]:
#using plotly Scatter 
time_plot_1=go.Figure(go.Scatter(x=time_plot_1_df.Year, y=time_plot_1_df.Total_Fires,
                                 mode='lines+markers', line={'color': 'red'}))
#layout changes
time_plot_1.update_layout(title='Brazil Fires per 1998-2017 Years',
                   xaxis_title='Year',
                   yaxis_title='Fires')
#showing the figure
time_plot_1.show()

Very interesting! Take a moment to hover over the graph to explore the dynamic features of Plotly. Unfortunately, we can definitely see reported fires growing over the 20 years, with a couple of ups and downs. However, we can look deeper to understand which regions (states) contribute the most and perhaps generate those spikes, and when reports are most likely to peak. Let's keep looking!

In [None]:
#to look deeper, a bit more prep work is required

#putting all available states in a list
states=list(amazon_df.State.unique())
#creating an empty list for each state, to be populated later
acre_list=[]
alagoas_list=[] 
amapa_list=[] 
amazonas_list=[] 
bahia_list=[] 
ceara_list=[]
distrito_list=[] 
espirito_list=[] 
goias_list=[] 
maranhao_list=[] 
mato_list=[] 
minas_list=[]
para_list=[] 
paraiba_list=[] 
perna_list=[]
piau_list=[]
rio_list=[]
rondonia_list=[]
roraima_list=[]
santa_list=[]
sao_list=[]
sergipe_list=[]
tocantins_list=[]

In [None]:
#It gets interesting here

#breaking down fires reported for each state throughout 20 years and populating the empty lists
#mapping each state name to its list, so that one lookup replaces a long if/elif chain
state_lists={'Acre': acre_list, 'Alagoas': alagoas_list, 'Amazonas': amazonas_list, 'Amapa': amapa_list,
             'Bahia': bahia_list, 'Ceara': ceara_list, 'Distrito Federal': distrito_list,
             'Espirito Santo': espirito_list, 'Goias': goias_list, 'Maranhao': maranhao_list,
             'Mato Grosso': mato_list, 'Minas Gerais': minas_list, 'Pará': para_list,
             'Paraiba': paraiba_list, 'Pernambuco': perna_list, 'Piau': piau_list, 'Rio': rio_list,
             'Rondonia': rondonia_list, 'Roraima': roraima_list, 'Santa Catarina': santa_list,
             'Sao Paulo': sao_list, 'Sergipe': sergipe_list, 'Tocantins': tocantins_list}
for st in states:
    #states without a prepared list are skipped
    if st not in state_lists:
        continue
    for ye in years:
        #total fires reported for this state in this year
        y=amazon_df.loc[(amazon_df['State']== st) & (amazon_df['Year']== ye)].Fire_Number.sum().round(0)
        state_lists[st].append(y)

In [None]:
#with those lists populated, now creating a powerful dataframe
time_plot_2_df=pd.DataFrame(list(zip(years, acre_list, alagoas_list, amapa_list, amazonas_list,
                                     bahia_list, ceara_list, distrito_list, espirito_list,
                                     goias_list, maranhao_list, mato_list, minas_list, para_list,
                                     paraiba_list, perna_list, piau_list, rio_list, rondonia_list,
                                     roraima_list, santa_list, sao_list, sergipe_list, tocantins_list)),
                            columns =['Year', 'Acre', 'Alagoas', 'Amapa', 'Amazonas', 'Bahia', 'Ceara',
                                      'Distrito Federal', 'Espirito Santo', 'Goias', 'Maranhao',
                                      'Mato Grosso', 'Minas Gerais', 'Pará', 'Paraiba', 'Pernambuco',
                                      'Piau', 'Rio', 'Rondonia', 'Roraima', 'Santa Catarina',
                                      'Sao Paulo', 'Sergipe', 'Tocantins'])
#checking the dataframe
time_plot_2_df.head(10)
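For comparison, a wide year-by-state table like this one can likely be built in one step with `pivot_table`. The sketch below uses a made-up frame (`demo_df` and `wide` are illustrative names) with the same column names as above:

```python
import pandas as pd

# Hypothetical sample with the same column names as the kernel
demo_df = pd.DataFrame({
    'State': ['Acre', 'Acre', 'Bahia', 'Bahia', 'Acre'],
    'Year': [1998, 1999, 1998, 1999, 1998],
    'Fire_Number': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One row per year, one column per state, summing the reported fires
wide = demo_df.pivot_table(index='Year', columns='State',
                           values='Fire_Number', aggfunc='sum').reset_index()
wide.columns.name = None  # drop the leftover 'State' label on the column axis
print(wide)
```

This avoids maintaining one list per state by hand, at the cost of hiding the individual steps that the loop makes explicit.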

In [None]:
#examining top 10 states with the most fires reported (please ignore the Year observation, it will be removed later)
time_plot_2_df.sum().nlargest(11)

Now we know which 10 states generate the most fire reports. Let's visualize those numbers to get an even better understanding!

In [None]:
#creating a dataframe for bar plot visualization
bar_plot_df=pd.DataFrame(time_plot_2_df.sum().nlargest(11))
#resetting index for first column
bar_plot_df=bar_plot_df.reset_index()
#renaming
bar_plot_df.rename(columns={'index':'State', 0:'Reported_Fires'}, inplace=True)
#removing Year observation
bar_plot_df.drop(bar_plot_df[bar_plot_df.State == 'Year'].index, inplace=True)
#checking dataframe
bar_plot_df
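A possible shortcut, sketched on a made-up wide table (`wide` and `top` are illustrative names): dropping the `Year` column before summing means `nlargest(10)` can be used directly, and there is no `Year` row to remove afterwards.

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per state
wide = pd.DataFrame({
    'Year': [1998, 1999],
    'Acre': [6.0, 2.0],
    'Bahia': [3.0, 4.0],
    'Goias': [10.0, 1.0],
})

# Exclude Year first, then rank the states by total reported fires
top = (wide.drop(columns='Year').sum().nlargest(2)
       .rename_axis('State').reset_index(name='Reported_Fires'))
print(top)
```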

In [None]:
#making barplot
bar_plot=px.bar(bar_plot_df, x='State', y='Reported_Fires', color='Reported_Fires',
           labels={'Reported_Fires':'Count of reported fires ', 'State':'States'}, color_continuous_scale='Reds')
#making layout changes
bar_plot.update_layout(xaxis_tickangle=-45, title_text='Top 10 States for Amount of Reported Fires per 1998-2017 Years')
#outputting plot
bar_plot.show()

Take a moment to hover over the graph to explore the dynamic features of Plotly.

In [None]:
#preparing a figure that will be populated 
time_plot_2 = go.Figure()
#adding individual graphs to the figure
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Mato Grosso'],
                                 mode='lines+markers', name='Mato Grosso', line={'color': 'red'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Paraiba'],
                                 mode='lines+markers', name='Paraiba', line={'color': 'yellow'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Sao Paulo'],
                                 mode='lines+markers', name='Sao Paulo', line={'color': 'green'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Rio'],
                                 mode='lines+markers', name='Rio', line={'color': 'blue'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Bahia'],
                                 mode='lines+markers', name='Bahia', line={'color': 'pink'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Piau'],
                                 mode='lines+markers', name='Piau', line={'color': 'brown'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Goias'],
                                 mode='lines+markers', name='Goias', line={'color': 'grey'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Minas Gerais'],
                                 mode='lines+markers', name='Minas Gerais', line={'color': 'purple'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Tocantins'],
                                 mode='lines+markers', name='Tocantins', line={'color': 'orange'}))
time_plot_2.add_trace(go.Scatter(x=time_plot_2_df.Year, y=time_plot_2_df['Amazonas'],
                                 mode='lines+markers', name='Amazonas', line={'color': 'gold'}))
#making changes to layout
time_plot_2.update_layout(title='Brazil Fires in Top-10 (frequent) regions per 1998-2017 Years',
                   xaxis_title='Year',
                   yaxis_title='Fires')
#outputting plot
time_plot_2.show()

Amazing visualization! Take a moment to hover over the graph to explore the dynamic features of Plotly. I believe this is one of the examples where the Plotly library shows how powerful and dynamic it is. Not only are we presented with powerful visuals, but hovering over the data points shows the details of each one. Double-clicking a state in the legend isolates it for individual analysis, and zoom, lasso select and many other features help set Plotly apart from other libraries. From here, examining each state reveals a couple of interesting trends (Mato Grosso, I'm looking at you).

In [None]:
#creating a sub-dataframe for visualizing these states geographically
geo_plot_df=pd.DataFrame(time_plot_2_df.sum().nlargest(11))
#formatting new dataframe
geo_plot_df.rename(columns={0:'Count'}, inplace=True)
geo_plot_df.reset_index(inplace=True)
geo_plot_df.rename(columns={'index':'State'}, inplace=True)
#removing the Year observation by name rather than by a hard-coded position
geo_plot_df.drop(geo_plot_df[geo_plot_df.State == 'Year'].index, inplace=True)
#checking the new sub-dataframe
geo_plot_df

In [None]:
#taking my time and adding approximate coordinates (latitude and longitude) for these top 10 states, in the row order shown above
#Paraiba, Piau and Tocantins use approximate state-capital coordinates (Joao Pessoa, Teresina, Palmas)
lat=[-16.350000, -7.115, -23.533773, -22.908333, -11.409874, -5.092, -16.328547,
     -19.841644, -10.184, -3.416843]
long=[-56.666668, -34.863, -46.625290, -43.196388, -41.280857, -42.804, -48.953403,
     -43.986511, -48.334, -65.856064]
#adding new coordinates as columns to subdataframe above
geo_plot_df['Lat']=lat
geo_plot_df['Long']=long
#checking changes in subdataframe for geo visualization
geo_plot_df

In [None]:
#using scatter geo with above created subdataframe
fig = px.scatter_geo(data_frame=geo_plot_df, scope='south america',lat='Lat',lon='Long',
                     size='Count', color='State', projection='hammer')
fig.update_layout(
        title_text = '1998-2017 Top-10 States in Brazil with reported fires')
fig.show()

Clusters are the first thing we can see here! Take a moment to hover over the graph to explore the dynamic features of Plotly. There are definitely some states that could be grouped into one for clearer visualization, and Mato Grosso and Amazonas could be combined as well, which, given their geography, may produce easier-to-read bubbles on the map. Give it a shot and let me know in the comments.

In [None]:
#according to different sources, the months from June to November are the hottest in Brazil

#isolating the hottest months by season
month_array_summer=['June','July','August']
month_array_fall=['September','October','November']
#leaving data only for hottest months
box_plot_df_summer=amazon_df.loc[amazon_df['Month'].isin(month_array_summer)]
box_plot_df_fall=amazon_df.loc[amazon_df['Month'].isin(month_array_fall)]
#visualizing reports
box_plot=go.Figure()

box_plot.add_trace(go.Box(y=box_plot_df_summer.Fire_Number, x=box_plot_df_summer.Month,
                          name='Summer', marker_color='#3D9970',
                          boxpoints='all', jitter=0.5, whiskerwidth=0.2,
                          marker_size=2,line_width=2))
box_plot.add_trace(go.Box(y=box_plot_df_fall.Fire_Number, x=box_plot_df_fall.Month,
                         name='Fall', marker_color='#FF851B',
                         boxpoints='all', jitter=0.5, whiskerwidth=0.2,
                          marker_size=2,line_width=2))

box_plot.update_layout(
        title_text = 'Distribution of Fire Reports from 1998-2017 in the hottest months')
box_plot.show()

Looks fantastic! Take a moment to hover over the graph to explore the dynamic features of Plotly. Because every data point is drawn next to its box, the plot shows the full distribution, and hovering over a box reveals its percentile statistics for note-taking. July, October and November definitely stand out with the highest median report counts.
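The medians read off the box plot can also be checked numerically. Below is a minimal sketch on a made-up frame (`demo_df` and `medians` are illustrative names), assuming the `Month` and `Fire_Number` columns used above:

```python
import pandas as pd

# Hypothetical sample, not taken from the dataset
demo_df = pd.DataFrame({
    'Month': ['June', 'June', 'June', 'July', 'July', 'July'],
    'Fire_Number': [1.0, 5.0, 9.0, 2.0, 20.0, 40.0],
})

# Median reported fires per month, highest first
medians = (demo_df.groupby('Month')['Fire_Number']
           .median()
           .sort_values(ascending=False))
print(medians)
```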

### Conclusion

_Final thoughts on the dataset and Plotly_

Plotly was very fun to use with this dataset. With powerful and dynamic visualizations we made a couple of very interesting findings. Unfortunately, we found a positive trend in fire reports over these 20 years, which only highlights the issues and the help needed to preserve tropical forests. We found that Mato Grosso is an extreme observation, and combined with the Amazonas region the frequency it generates would really raise a red flag; for the rest of the states there is no decline, but a steady stream of fire reports year after year! We added approximate coordinates for the given states and visualized them on a map to identify clusters of regions. We also looked at the statistical distributions for the hottest months in Brazil and were able to pinpoint the ones with the highest medians. Overall, this dataset could definitely use more features, so that more information could be analyzed and correlations identified, which would in turn support stronger predictions and machine learning.

I encourage all readers to explore and analyze their data deeply, for the best data-driven decisions that can help improve strategies for projects, research, businesses, etc. If you liked this post or want to start a discussion, upvote and leave a comment!