## Introduction
### Hello everyone, this is my analysis of the Forest Fires in India data.
- I will do an Exploratory Data Analysis (EDA) using the data provided by the dataset author.
- To visualize the results of the EDA, I will use Plotly, a Python module that produces interactive, high-quality graphics.

In [1]:
import pandas as pd ## for working with data.
import numpy as np ## for Linear Algebra.
import plotly.express as px ## Visualization
import plotly.graph_objects as go ## Again, Visualization
import matplotlib.pyplot as plt ## Again, Visualization
pd.set_option('display.max_rows',200)
import os ## file paths and walking the input directory
import warnings
warnings.filterwarnings('ignore') ## I hate warnings.

In [2]:
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/forest-fires-in-india/datafile.csv


In [3]:
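## Use the file listed above explicitly instead of relying on the loop variables left over from the previous cell.
path = '/kaggle/input/forest-fires-in-india/datafile.csv'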

In [4]:
df = pd.read_csv(path) 

In [5]:
## Let's see if the data has any null values or not.
df.isnull().sum() 

States/UTs    0
2010-2011     0
2009-10       0
2008-09       0
dtype: int64

In [6]:
## Let's take a look at the data.
df.head() 

Unnamed: 0,States/UTs,2010-2011,2009-10,2008-09
0,Andaman and Nicobar,0,7,1
1,Andhra Pradesh,1119,1837,2442
2,Arunachal Pradesh,485,576,786
3,Assam,1322,2511,1901
4,Bihar,81,397,143


#### I can create some new columns that show the change in the number of fires recorded from one year to the next (stored as fractions, so 6.0 means +600%). I'll also reshape the data to make it easier to work with.

In [7]:
## Fractional change between consecutive years: (new - old) / old.
df['Percent_first'] = (df['2009-10']-df['2008-09'])/df['2008-09']
df['Percent_second'] = (df['2010-2011']-df['2009-10'])/df['2009-10']
df.fillna(0, inplace=True)
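
One thing to watch in the cell above: `fillna(0)` only catches the `NaN` produced by 0/0. If a state had zero fires in the base year but some fires in the next, the division gives `inf`, which `fillna` leaves alone. Here is a minimal sketch of one way to handle both cases. The `pct_change` helper and the `Example UT` row are hypothetical and only there to show the edge case; whether 0 is the right replacement value is a judgment call.

```python
import numpy as np
import pandas as pd

# Two rows copied from the df.head() output above, plus one hypothetical
# row ("Example UT") with zero fires in the base year.
sample = pd.DataFrame({
    'States/UTs': ['Andaman and Nicobar', 'Andhra Pradesh', 'Example UT'],
    '2009-10': [7, 1837, 5],
    '2008-09': [1, 2442, 0],
})

def pct_change(new, old):
    """Fractional change (new - old) / old, with 0 where old == 0."""
    change = (new - old) / old
    # Division by zero yields inf (x/0) or NaN (0/0); map both to 0.
    return change.replace([np.inf, -np.inf], np.nan).fillna(0)

sample['Percent_first'] = pct_change(sample['2009-10'], sample['2008-09'])
print(sample)
```
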

In [8]:
## Build one long-format frame per year. .copy() makes each one an independent
## frame, so adding the 'Year' column does not trigger SettingWithCopyWarning.
first = df[['2008-09', 'States/UTs', 'Percent_first', 'Percent_second']].copy()
first.loc[:,'Year'] = '2008-09'
first.columns = ['Fires', 'States/UTs', 'Percent_first', 'Percent_second', 'Year']

second = df[['2009-10', 'States/UTs', 'Percent_first', 'Percent_second']].copy()
second.loc[:,'Year'] = '2009-10'
second.columns = ['Fires', 'States/UTs', 'Percent_first', 'Percent_second', 'Year']

third = df[['2010-2011', 'States/UTs', 'Percent_first', 'Percent_second']].copy()
third.loc[:,'Year'] = '2010-11'
third.columns = ['Fires', 'States/UTs', 'Percent_first', 'Percent_second', 'Year']


df1 = pd.concat([first,second,third])
del first,second,third
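
The same wide-to-long reshape can also be done in one step with `DataFrame.melt`. This is only a sketch of an alternative, not what the notebook runs. It uses the first three rows from the `df.head()` output, and the names `wide` and `long_df` are my own. On the real data, `Percent_first` and `Percent_second` would go into `id_vars` as well.

```python
import pandas as pd

# First rows of the dataset, copied from the df.head() output above.
wide = pd.DataFrame({
    'States/UTs': ['Andaman and Nicobar', 'Andhra Pradesh', 'Arunachal Pradesh'],
    '2010-2011': [0, 1119, 485],
    '2009-10': [7, 1837, 576],
    '2008-09': [1, 2442, 786],
})

# melt turns the three year columns into (Year, Fires) pairs, giving one row
# per state and year; the id_vars column is repeated on every row.
long_df = wide.melt(id_vars='States/UTs', var_name='Year', value_name='Fires')

# Use the same label for the last year as the cell above does.
long_df['Year'] = long_df['Year'].replace({'2010-2011': '2010-11'})
print(long_df)
```
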

In [9]:
df1.head()  ## df1 will be our data that we'll use for analysis.

Unnamed: 0,Fires,States/UTs,Percent_first,Percent_second,Year
0,1,Andaman and Nicobar,6.0,-1.0,2008-09
1,2442,Andhra Pradesh,-0.247748,-0.390855,2008-09
2,786,Arunachal Pradesh,-0.267176,-0.157986,2008-09
3,1901,Assam,0.320884,-0.473517,2008-09
4,143,Bihar,1.776224,-0.79597,2008-09


## Exploratory Data Analysis

In [10]:
px.bar(df1, 'States/UTs', 'Fires', color='Year', title='Total Forest Fires by State')

In [11]:
px.line(df1, 'States/UTs', 'Fires', color='Year', title='Fires in States throughout Years')

#### From the graphs above, we can see that:
- Mizoram, Chattisgarh and Madhya Pradesh are the three states with the most forest fires.
- In the second year (2009-10), the number of reported forest fires increased.
- The pattern does not continue in the third year: fire counts decreased in all the states.
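
Observations like these are read off the charts, so it can help to check them numerically. Below is a rough sketch of how that could look, using only three states from the `df.head()` output (so the numbers say nothing about the full dataset). The names `long_df`, `yearly_totals`, `top_states` and `not_decreased` are illustrative.

```python
import pandas as pd

# Long-format sample built from three rows of the df.head() output above.
long_df = pd.DataFrame({
    'States/UTs': ['Andhra Pradesh', 'Assam', 'Bihar'] * 3,
    'Year': ['2008-09'] * 3 + ['2009-10'] * 3 + ['2010-11'] * 3,
    'Fires': [2442, 1901, 143, 1837, 2511, 397, 1119, 1322, 81],
})

# Total fires per year: shows whether the overall count rose or fell.
yearly_totals = long_df.groupby('Year')['Fires'].sum()

# States with the most fires summed over all three years.
top_states = long_df.groupby('States/UTs')['Fires'].sum().nlargest(3)

# States whose count did NOT fall from 2009-10 to 2010-11; an empty list
# means every state in the sample saw a decrease.
by_year = long_df.pivot(index='States/UTs', columns='Year', values='Fires')
not_decreased = by_year[by_year['2010-11'] >= by_year['2009-10']].index.tolist()

print(yearly_totals, top_states, not_decreased, sep='\n\n')
```

In the notebook, the same three statements could be run on `df1` directly.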

In [12]:
temp = df1.groupby(by='States/UTs')['Fires'].sum().sort_values().reset_index()
px.bar(temp.tail(), 'States/UTs', 'Fires', color='Fires', title = 'States with Most Recorded Forest Fires')

In [13]:
temp.head() ## These states/UTs reported zero forest fires, so there is nothing to plot for them.

Unnamed: 0,States/UTs,Fires
0,Puducherry,0
1,Lakshadweep,0
2,Chandigarh,0
3,Dadra and Nagar haveli,0
4,Daman and Diu,0


### Now, let's see the states with the most forest fires in each year.

In [14]:
temp = df1[df1['Year']=='2008-09'].sort_values(by='Fires')
px.bar(temp.tail(), 'States/UTs', 'Fires', title = 'Year : 2008-09')

In [15]:
temp = df1[df1['Year']=='2009-10'].sort_values(by='Fires')
px.bar(temp.tail(), 'States/UTs', 'Fires', title = 'Year : 2009-10')

In [16]:
temp = df1[df1['Year']=='2010-11'].sort_values(by='Fires')
px.bar(temp.tail(), 'States/UTs', 'Fires', title = 'Year : 2010-11')
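
The three cells above repeat the same filter-sort-tail steps once per year. As a sketch, the top states for every year can also be collected in one pass with `groupby`. The helper `top_states_per_year` is hypothetical, and the sample again uses only three states from the `df.head()` output.

```python
import pandas as pd

def top_states_per_year(long_df, n=5):
    """Return the n rows with the most fires within each year."""
    return (long_df.sort_values('Fires', ascending=False)
                   .groupby('Year')
                   .head(n)  # first n rows of each year, i.e. the largest
                   .sort_values(['Year', 'Fires'], ascending=[True, False])
                   .reset_index(drop=True))

# Long-format sample built from three rows of the df.head() output above.
long_df = pd.DataFrame({
    'States/UTs': ['Andhra Pradesh', 'Assam', 'Bihar'] * 3,
    'Year': ['2008-09'] * 3 + ['2009-10'] * 3 + ['2010-11'] * 3,
    'Fires': [2442, 1901, 143, 1837, 2511, 397, 1119, 1322, 81],
})

top2 = top_states_per_year(long_df, n=2)
print(top2)
```

The result could then go to a single `px.bar` call with `facet_col='Year'` to get one panel per year instead of three separate figures.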

### Total fires in each year.

In [17]:
temp = df1.groupby(by='Year')['Fires'].sum()
fig = go.Figure(data=[go.Pie(labels=temp.index, values=temp.values)])
fig.update_traces(marker=dict(line=dict(color='#000000', width=4)))
fig.show()

### Percent change in forest fires.

In [18]:
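## df1 holds each state three times (once per year), so keep one row per state; otherwise the bars stack.
temp = df1.drop_duplicates(subset='States/UTs').sort_values(by='Percent_first')
px.bar(temp, 'States/UTs', 'Percent_first', color='Percent_first', title = 'Change from 2008-09 to 2009-10')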

In [19]:
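## df1 holds each state three times (once per year), so keep one row per state; otherwise the bars stack.
temp = df1.drop_duplicates(subset='States/UTs').sort_values(by='Percent_second')
px.bar(temp, 'States/UTs', 'Percent_second', color='Percent_second', title = 'Change from 2009-10 to 2010-11')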