<a href="https://www.kaggle.com/code/mikedelong/fire-eda-with-bar-charts?scriptVersionId=149333044" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import pandas as pd
df = pd.read_csv(
    filepath_or_buffer='/kaggle/input/global-fire-burned-area/GlobalFireBurnedArea_2022.csv',
    index_col=['ID'],
    parse_dates=['Initialdate', 'Finaldate'],
)
df['year'] = df['Initialdate'].dt.year
df['month'] = df['Initialdate'].dt.month
df.head()

ID,Initialdate,Finaldate,Area_ha,Area_m2,Area_Km2,CountryName,Continent,Region,year,month
25078590,2022-01-09,2022-02-06,50232.10763,502321076.3,502.321076,Ghana,Africa,Western Africa,2022,1
25079092,2022-01-11,2022-02-08,82380.29538,823802953.8,823.802954,Nigeria,Africa,Western Africa,2022,1
25079113,2022-01-11,2022-02-03,36851.12748,368511274.8,368.511275,Nigeria,Africa,Western Africa,2022,1
25083241,2022-01-03,2022-02-12,43303.63519,433036351.9,433.036352,Nigeria,Africa,Western Africa,2022,1
25095507,2022-01-01,2022-02-11,75753.14059,757531405.9,757.531406,Central African Republic,Africa,Middle Africa,2022,1


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 319278 entries, 25078590 to 26100041
Data columns (total 10 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   Initialdate  319278 non-null  datetime64[ns]
 1   Finaldate    319278 non-null  datetime64[ns]
 2   Area_ha      319278 non-null  float64       
 3   Area_m2      319278 non-null  float64       
 4   Area_Km2     319278 non-null  float64       
 5   CountryName  319278 non-null  object        
 6   Continent    319278 non-null  object        
 7   Region       319278 non-null  object        
 8   year         319278 non-null  int32         
 9   month        319278 non-null  int32         
dtypes: datetime64[ns](2), float64(3), int32(2), object(3)
memory usage: 24.4+ MB


It would be helpful to normalize the burned areas by the size of each country, but the dataset has no country-area column.
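One way such an adjustment might look is sketched below. It assumes a separate lookup of country areas, which this dataset does not provide: `country_area_km2` is a hypothetical table whose values are rough total areas included only for illustration, and the fire rows are the five shown in the `df.head()` output above.

```python
import pandas as pd

# The five rows from the df.head() output above.
fires = pd.DataFrame({
    'CountryName': ['Ghana', 'Nigeria', 'Nigeria', 'Nigeria',
                    'Central African Republic'],
    'Area_Km2': [502.321076, 823.802954, 368.511275, 433.036352, 757.531406],
})

# Hypothetical lookup (not part of the dataset): approximate total country
# areas in km2, for illustration only. Replace with an authoritative source.
country_area_km2 = pd.Series({
    'Ghana': 238_500,
    'Nigeria': 923_800,
    'Central African Republic': 623_000,
})

# Sum burned area per country, then express it as a percentage of country area.
burned = fires.groupby('CountryName')['Area_Km2'].sum()
burned_pct = (burned / country_area_km2 * 100).rename('burned_pct')
print(burned_pct.sort_values(ascending=False))
```

The division aligns on the country name, so any country missing from the lookup would come out as NaN rather than silently dropping.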

Unfortunately, three of our columns hold essentially the same data (the burned area in different units), so if we want to build a scatter plot we will need to use one of the dates for the other axis.
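The redundancy of the three area columns can be checked directly. The sketch below uses only the five rows from the `df.head()` output above and the standard conversions (1 ha = 10,000 m2, 1 km2 = 100 ha); the relative tolerance is an assumption chosen to absorb rounding in the printed values.

```python
import numpy as np
import pandas as pd

# Area columns of the five rows from the df.head() output above.
sample = pd.DataFrame({
    'Area_ha': [50232.10763, 82380.29538, 36851.12748, 43303.63519, 75753.14059],
    'Area_m2': [502321076.3, 823802953.8, 368511274.8, 433036351.9, 757531405.9],
    'Area_Km2': [502.321076, 823.802954, 368.511275, 433.036352, 757.531406],
})

# 1 ha = 10,000 m2 and 1 km2 = 100 ha; allow for rounding in the stored values.
m2_matches = np.isclose(sample['Area_m2'], sample['Area_ha'] * 10_000, rtol=1e-6)
km2_matches = np.isclose(sample['Area_Km2'], sample['Area_ha'] / 100, rtol=1e-6)

redundant = bool(m2_matches.all() and km2_matches.all())
print(redundant)
```

Running the same comparison on the full `df` would confirm whether the relationship holds for every row, not just these five.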

In [3]:
from plotly.express import scatter
SAMPLE_FRACTION = 0.03
scatter(data_frame=df.sample(frac=SAMPLE_FRACTION, random_state=2023),
        x='Initialdate', y='Area_ha', color='Continent',
        hover_name='CountryName', log_y=True)

Even a 3% sample (about 9,600 points) is more data than we can meaningfully plot; let's try looking at only the largest fires by area.

In [4]:
scatter(data_frame=df[df['Area_ha'] > 10000],
        x='Initialdate', y='Area_ha', color='Continent',
        hover_name='CountryName', log_y=True)

Even when we restrict the plot to fires larger than 10,000 ha, it is hard to tell which continent has the most fires (although the answer is probably Africa). Let's try a bar plot.

In [5]:
from plotly.express import bar
# value_counts() names its result column 'count' in pandas 2.0 and later
bar(data_frame=df['Continent'].value_counts().to_frame().reset_index(),
    x='Continent', y='count')

In [6]:
bar(data_frame=df[df['Area_ha'] > 10000]['Continent'].value_counts().to_frame().reset_index(), 
    x='Continent', y='count')

Africa dominates both the total number of fires and the number of large fires, and it is not even close.

In [7]:
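# Name the size column explicitly instead of relying on the default column label 0
bar(data_frame=df[['Continent', 'month']].groupby(by=['Continent', 'month']).size().reset_index(name='count'),
    x='Continent', y='count', color='month')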

In this stacked plot we can roughly see how the fire season waxes and wanes in Africa.

In [8]:
# One facet per continent; matches=None lets each facet use its own y-axis range
bar(data_frame=df[['Continent', 'month']].groupby(by=['Continent', 'month']).size().reset_index(name='count'),
    facet_col='Continent', facet_col_wrap=1, x='month',
    y='count', color='month').update_yaxes(matches=None)

The fire season peaks in different months on different continents, but surprisingly the peak months do not split cleanly by hemisphere.
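The peak month per continent can also be read off numerically rather than from the facets. This is a minimal sketch on a synthetic stand-in for `df[['Continent', 'month']]`; the rows in `toy` are made up and do not reflect the real dataset.

```python
import pandas as pd

# Synthetic stand-in for df[['Continent', 'month']]; the values are made up.
toy = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Africa', 'Asia', 'Asia', 'Asia'],
    'month': [1, 1, 8, 3, 3, 4],
})

# Count fires per (continent, month), pivot months into columns,
# then take the month with the largest count in each row.
counts = toy.groupby(['Continent', 'month']).size()
peak_month = counts.unstack('month', fill_value=0).idxmax(axis=1)
print(peak_month)
```

Applying the same three lines to `df` gives one peak month per continent, which makes the comparison between continents easier to state precisely.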

In [9]:
bar(data_frame=df[['Continent', 'month', 'Area_ha']].groupby(by=['Continent', 'month']).sum().reset_index(),
    facet_col='Continent', facet_col_wrap=1,
    x='month',
    y='Area_ha', color='month').update_yaxes(matches=None)

If we switch from the number of fires to the total burned area (the sum of Area_ha) per month, we see a seasonal pattern that is similar but not identical.
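Since total area equals the number of fires times the mean fire size, the count and area plots can only differ where the mean size changes from month to month. The sketch below computes all three quantities side by side on a synthetic stand-in for `df[['Continent', 'month', 'Area_ha']]`; the rows in `toy` are made up, and `fires`, `total_ha`, and `mean_ha` are illustrative column names.

```python
import pandas as pd

# Synthetic stand-in for df[['Continent', 'month', 'Area_ha']]; values are made up.
toy = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Africa', 'Africa'],
    'month': [1, 1, 1, 8],
    'Area_ha': [100.0, 200.0, 300.0, 900.0],
})

# Fire count and total burned area per (continent, month) in one pass.
summary = (
    toy.groupby(['Continent', 'month'])['Area_ha']
    .agg(fires='count', total_ha='sum')
    .reset_index()
)

# total_ha = fires * mean_ha, so months with few but large fires stand out here.
summary['mean_ha'] = summary['total_ha'] / summary['fires']
print(summary)
```

In this toy data month 1 has the most fires while month 8 has the most burned area, which is the kind of divergence the two plots can show.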

In [10]:
bar(df[['Region', 'Area_ha']].groupby(by='Region').mean().reset_index().sort_values(ascending=False, by='Area_ha'), 
    x='Region', y='Area_ha')

Different regions of the world have substantially different mean fire sizes.

In [11]:
from plotly.colors import qualitative
bar(df[['CountryName', 'Region', 'Area_ha']].groupby(by=['CountryName', 'Region']).mean().reset_index().sort_values(ascending=False, by='Area_ha'), 
    y='CountryName', x='Area_ha', color='Region', height=2400, 
    color_discrete_sequence=qualitative.Light24
   )

In [12]:
bar(df[['CountryName', 'Continent', 'Area_ha']].groupby(by=['CountryName', 'Continent']).mean().reset_index().sort_values(ascending=False, by='Area_ha'), 
    y='CountryName', x='Area_ha', color='Continent', height=2400,)

In [13]:
from plotly.express import strip
strip(df[['CountryName', 'Continent', 'Area_ha', 'Region']].groupby(by=['CountryName', 'Continent', 'Region']).mean().reset_index().sort_values(ascending=False, by='Area_ha'),
      hover_name='CountryName', y='Area_ha', color='Continent', x='Region', log_y=True)