In this notebook, I'm trying to decide whether I should use the MODIS or VIIRS data. I originally started with a download of the [MODIS data](http://firemapper.sc.egov.usda.gov/gisdata.php) because it offers more data (it goes back to 2001, whereas VIIRS only goes back to 2012). However, the VIIRS data is at a higher resolution, so it's at least worth taking a look at.

I'm going to start by looking at some basic statistics from each data set for a given year: number of observations, number of fires, and variable distributions. Then I'll do some geographical plotting to compare how the distribution of fire/non-fire observations differs across the two data sets. A number of these functions are pulled from the `map_exploration.ipynb` notebook, also stored in this folder. Normally I would pull these out into a shared module in the name of DRY, but right now I'm doing EDA and I'm not too worried about it.

In [4]:
import pandas as pd

In [7]:
def read_df(year, modis=True):
    """Read in one year of detected-fire data and add a month column.

    Args:
        year: str or int
            Year of data to read; used to build the CSV filename.
        modis: bool
            Whether to read the MODIS (True) or VIIRS (False) data.

    Return:
        Pandas DataFrame
    """
    # The two data sets differ only in the sensor name embedded in the filename.
    sensor = 'MODIS' if modis else 'VIIRS'
    filepath = '../../../data/csvs/detected_fires_' + sensor + '_' + str(year) + '.csv'
    output_df = pd.read_csv(filepath, parse_dates=['date'],
                            true_values=['t'], false_values=['f'])
    output_df['month'] = output_df.date.apply(lambda dt: dt.strftime('%B'))
    output_df.dropna(subset=['region_name'], inplace=True) # These will be obs. in Canada.
    return output_df

def calc_minor_stats(df):
    """Calculate some minor, fixed stats for the inputted DataFrame.

    Return the number of observations, the number of fires, and the
    fraction of obs. that are a fire from the inputted DataFrame.

    Args:
        df: Pandas DataFrame
            Inputted DataFrame to grab some minor stats from.

    Return: int (n_obs), int (n_fires), float (pct_fires, a fraction between 0 and 1)
    """

    n_obs = df.shape[0]
    n_fires = df.query('fire_bool == True').shape[0]
    # float() keeps this a true division under Python 2 as well.
    pct_fires = float(n_fires) / n_obs

    return n_obs, n_fires, pct_fires
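
The cell that printed the per-year summary isn't shown here, so the following is only a minimal sketch of how the read and summary steps fit together. It runs on a tiny made-up CSV held in memory instead of the real files. `SAMPLE_CSV`, its rows, the region names, and the `summarize` helper are all assumptions for illustration. The numbers printed further down come from the original run on the real data, not from this sketch.

```python
import io

import pandas as pd

# Hypothetical stand-in for one yearly CSV; the real files have more columns.
# The last row has no region_name, mimicking the rows that read_df drops.
SAMPLE_CSV = """date,fire_bool,region_name
2012-06-01,t,Region A
2012-06-02,f,Region A
2012-07-15,t,Region B
2012-07-16,f,
"""


def summarize(csv_source, year, label='Modis'):
    """Read one year of data, print a short summary, and return the stats."""
    # Same read_csv options as read_df: 't'/'f' strings become booleans.
    df = pd.read_csv(csv_source, parse_dates=['date'],
                     true_values=['t'], false_values=['f'])
    df['month'] = df['date'].dt.strftime('%B')
    df = df.dropna(subset=['region_name'])

    n_obs = len(df)
    n_fires = int(df['fire_bool'].sum())  # True counts as 1
    pct_fires = n_fires / n_obs if n_obs else float('nan')

    print('Year: {}'.format(year))
    print('-' * 50)
    print('{} - Num. obs: {}, Num. fires: {}, Pct. fires: {}'.format(
        label, n_obs, n_fires, pct_fires))
    return n_obs, n_fires, pct_fires


stats = summarize(io.StringIO(SAMPLE_CSV), 2012)
```

With the real data, the same pattern would loop over the years of interest and call `read_df(year, modis=True)` and `read_df(year, modis=False)` before passing each result to `calc_minor_stats`.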

Year: 2012
--------------------------------------------------
Modis - Num. obs: 247286, Num. fires: 59731, Pct. fires: 0.241546225828
Year: 2013
--------------------------------------------------
Modis - Num. obs: 179295, Num. fires: 26997, Pct. fires: 0.150573077888
Year: 2014
--------------------------------------------------
Modis - Num. obs: 224904, Num. fires: 23932, Pct. fires: 0.106409845979
Year: 2015
--------------------------------------------------
Modis - Num. obs: 145010, Num. fires: 32873, Pct. fires: 0.22669471071
