This notebook explores the geographic distribution of fires and non-fires. I built some plotting functionality in another repository ([dsfuncs](https://github.com/sallamander/dsfuncs)), and I'll use it here to plot fires. I'll start with counties that had a good number of fires in 2015 and look at the distribution of fires/non-fires in those counties during 2015 (probably over a one-month span). After that, I'll look at the distribution of fires/non-fires in those same counties in earlier years. The goal is to identify patterns that I might use in modeling, and specifically to see whether: 

1. Fires clump together, as we might expect. Since the "detected fires" are just centroids detected at one point in the day, it seems reasonable to expect that multiple centroids that are truly fires would be detected close to each other in time and space. 
2. Fires are not present in the same locations from year to year. Forest fires are often described as cyclical in nature, at least when we look at the same location across time. I'll look for evidence that supports this in the exploration here. 

The first thing I'll need to do is read in the data and figure out how to filter it down to a given state and/or county. I'm going to initially focus on forest fires in Washington, California, Colorado, Montana, and Texas. I think this will give me a reasonable mix of places that have enough potential fires to look at, while still being somewhat geographically diverse (unfortunately, nothing in the east will really have enough). 

In [1]:
import pandas as pd
from dsfuncs.geo_plotting import CountyMap
%matplotlib inline

The first step will be to create functions to read in the data, and then only grab those rows of the data set that correspond to a given location (state & county) as well as a given month (or months). 

In [11]:
def read_df(year, modis=True): 
    """Read in a year of data and add a month column. 
    
    Args: 
        year: str (or int)
        modis: bool
            If True, read the MODIS file; otherwise read the VIIRS file. 
        
    Return:
        Pandas DataFrame
    """
    sensor = 'MODIS' if modis else 'VIIRS'
    filepath = '../../../data/csvs/detected_fires_{}_{}.csv'.format(sensor, year)
    output_df = pd.read_csv(filepath, parse_dates=['date'], 
                            true_values=['t'], false_values=['f'])
    # Full month name (e.g. 'July'), used later to filter by month. 
    output_df['month'] = output_df['date'].dt.strftime('%B')
    return output_df
    
def grab_by_location(df, state_names, county_names=None): 
    """Grab the data for the specified states and (optionally) counties. 
    
    Args: 
        df: Pandas DataFrame
        state_names: set (or iterable of strings)
        county_names: set (or iterable of strings), optional
    
    Return: 
        Pandas DataFrame
    """
    if county_names: 
        output_df = df.query('state_name in @state_names and county_name in @county_names')
    else: 
        output_df = df.query('state_name in @state_names')
    return output_df

def grab_by_date(df, months): 
    """Grab the data for a set of specified months.
    
    Args: 
        df: Pandas DataFrame
        months: set (or iterable of strings)
    
    Return: 
        Pandas DataFrame
    """
    
    output_df = df.query("month in @months")
    return output_df
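
To sanity check the reading and filtering logic without touching the real CSVs, here is a minimal sketch on a tiny in-memory CSV. The rows are made up; the column names (`date`, `state_name`, `county_name`, `fire_bool`) are the ones the functions above rely on, and I'm assuming the real files store the boolean as `t`/`f`, as the `true_values`/`false_values` arguments suggest. 

```python
import io

import pandas as pd

# Made-up rows; only the column names mirror the real data.
csv_text = """date,state_name,county_name,fire_bool
2015-07-04,Colorado,Larimer,t
2015-08-15,Colorado,Boulder,f
2015-08-20,Texas,Travis,f
2015-01-02,Montana,Flathead,t
"""

# Same read_csv arguments as read_df, but reading from a string.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'],
                 true_values=['t'], false_values=['f'])
df['month'] = df['date'].dt.strftime('%B')

# Hypothetical filter values, passed as lists.
state_names = ['Colorado', 'Texas']
county_names = ['Larimer', 'Travis']
months = ['July']

# The same query strings used in grab_by_location and grab_by_date;
# the @ prefix refers to the local Python variables.
by_location = df.query('state_name in @state_names and county_name in @county_names')
by_date = by_location.query('month in @months')

print(by_location[['state_name', 'county_name', 'month']])
print(by_date[['state_name', 'county_name', 'month']])
```

Filtering by location first and by month second matches the order I'll use when chaining these functions together. 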

The next step is to get the data into a format for plotting. The `CountyMap` class expects the data as an iterable of three items: 

1. Longitude of the point. 
2. Latitude of the point. 
3. Color to plot the point in. 

I'll create a function that takes the DataFrame already filtered by location/date and puts the data into that format. 

In [15]:
def format_df(df): 
    """Format the data to plot it on maps. 
    
    This function will grab the latitude and longitude 
    columns of the DataFrame, and return those, along 
    with a third column that will be newly generated. This 
    new column will hold what color we want to use to plot 
    the lat/long coordinate - I'll use red for fire and 
    green for non-fire. 
    
    Args: 
        df: Pandas DataFrame
    
    Return: 
        numpy.ndarray
    """
    
    keep_cols = ['long', 'lat', 'fire_bool']
    intermediate_df = df[keep_cols]
    output_df = parse_fire_bool(intermediate_df)
    output_array = output_df.values
    return output_array

def parse_fire_bool(df): 
    """Parse the fire boolean to a color for plotting. 
    
    Args: 
        df: Pandas DataFrame
        
    Return: 
        Pandas DataFrame
    """
    
    # Plot actual fires red and non-fires green. 
    output_df = df.drop('fire_bool', axis=1)
    output_df['plotting_mark'] = df['fire_bool'].apply(lambda f_bool: 'ro' if f_bool else 'go')
    return output_df
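
As a quick check of what the formatted output looks like, here is a small sketch on made-up coordinates. It is not the code above: it swaps the `apply` with a lambda for `Series.map` with a dictionary, which should give the same `'ro'`/`'go'` marks as long as `fire_bool` only holds `True`/`False`. 

```python
import pandas as pd

# Made-up points; the column names match keep_cols in format_df.
toy_df = pd.DataFrame({
    'long': [-105.1, -97.7],
    'lat': [40.6, 30.3],
    'fire_bool': [True, False],
})

# Keep the coordinates and translate the boolean into a plotting mark.
plot_df = toy_df[['long', 'lat']].copy()
plot_df['plotting_mark'] = toy_df['fire_bool'].map({True: 'ro', False: 'go'})

# Each row is now (longitude, latitude, plotting mark).
points = plot_df.values
print(points)
```

Because the array mixes floats and strings, NumPy stores it with `object` dtype. 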

Now let's put all of this into a master function. 

In [20]:
def read_n_parse(year, state_names, county_names=None, months=None, plotting=False): 
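    """Read and parse the data for plotting.
    
    Args: 
        year: str
        state_names: set (or iterable of strings)
        county_names: set (or iterable of strings), optional
        months: set (or iterable of strings), optional
        plotting: bool
            If True, return the data formatted for plotting. 
    
    Return: 
        Pandas DataFrame, or numpy.ndarray if plotting is True
    """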
    
    fires_df = read_df(year)
    if state_names: 
        fires_df = grab_by_location(fires_df, state_names, county_names)
    
    if months: 
        fires_df = grab_by_date(fires_df, months)
    
    if plotting: 
        fires_df = format_df(fires_df)
    return fires_df