# Wildfire and Drought Data Wrangling

### Collect Data

  - &#x2611; Download [wildfire Sqlite DB](https://www.kaggle.com/rtatman/188-million-us-wildfires) from Kaggle
  - &#x2611; Download [drought soil and weather CSVs](https://www.kaggle.com/cdminix/us-drought-meteorological-data) from Kaggle
  - &#x2611; Import soil and weather CSVs into Sqlite
  - &#x2611; Remove non-California data to keep the dataset more focused
  - &#x2611; Remove wildfire and soil/weather data that does not overlap
  - &#x2611; Load in county FIPS codes and geospatial lat/long into Sqlite
  - Add indexes/foreign keys to speed up Sqlite
    - &#x2611; year
    - &#x2611; fips
    - &#x2611; long/lat on fires and soil
  - &#x2611; Truncate Latitude and Longitude to 11 km (1 decimal place)
  - &#x2611; Backfill FIPS_CODE for fire using long/lat (maybe?)
  - Weather by date and long/lat between 2000-01-01 and 2015-12-31 from NASA Power API
  - &#x2611; Drought score by date and FIPS county between 2000-01-01 and 2015-12-31

In [None]:
!brew install spatialite-tools
!pip install -q pandas
!pip install -q pysqlite3
!pip install -q pyspatialite
!pip install -q requests
!pip install -q shapely

In [1]:
import pandas as pd
import re
import time
import sqlite3
import shapely.wkt
from shapely.geometry import Point
import requests

#### California Counties

1. Scrape the United States county table from Wikipedia.
2. Filter out non-California counties.
3. Round longitude and latitude to 1 decimal place (~11 km). This should speed up the analysis and better generalize the location of predicted fires.
4. Join the Wikipedia county data with the geographic boundary data for each California county. The boundary data is stored as `MULTIPOLYGON (((` well-known text (WKT) strings that the shapely Python package can parse.
5. Set the index of the county DataFrame to FIPS which is the unique identifier for each county.
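
As a rough check on the "~11 km" figure, the sketch below estimates how much ground a 0.1 degree step covers. It assumes a spherical Earth with a mean radius of 6371.0088 km; the helper name `cell_size_km` is made up for this illustration and is not used elsewhere in the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: mean Earth radius, spherical approximation


def cell_size_km(lat_deg, step_deg=0.1):
    """Return (north_south_km, east_west_km) covered by step_deg at latitude lat_deg."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180
    north_south = step_deg * km_per_degree
    # Meridians converge toward the poles, so east-west distance shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Approximate southern, central, and northern latitudes of California
for lat in (32.5, 37.0, 42.0):
    ns, ew = cell_size_km(lat)
    print(f'{lat:>5} N: {ns:.1f} km north-south, {ew:.1f} km east-west')
```

Under these assumptions a grid cell is about 11 km tall everywhere but noticeably narrower east to west at California's latitudes, so "11 km" is best read as an upper bound on the cell width.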

In [3]:
county_df = pd.read_html('https://en.wikipedia.org/wiki/User:Michael_J/County_table')[0]
float_degrees = lambda x: float(x.replace('°','').replace('–','-'))
county_df['latitude'] = county_df['Latitude'].apply(float_degrees)
county_df['longitude'] = county_df['Longitude'].apply(float_degrees)
county_df['lat'] = round(county_df['latitude'], 1)
county_df['long'] = round(county_df['longitude'], 1)
county_df['name'] = county_df['County [2]']

county_df = county_df[county_df['State'] == 'CA']
county_df = county_df.loc[:, county_df.columns.intersection(['FIPS', 'name', 'latitude', 'longitude', 'lat', 'long'])]

# Downloaded from https://data.edd.ca.gov/api/views/bpwh-bcb3/rows.csv?accessType=DOWNLOAD
county_geo_df = pd.read_csv('./county_geospatial.csv')
county_geo_df = county_geo_df.loc[:, county_geo_df.columns.intersection(['name', 'geo_multipolygon'])]

county_df = pd.merge(county_df, county_geo_df, left_on='name', right_on='name')
county_df = county_df.set_index('FIPS')

print(county_df.head())

       latitude   longitude   lat   long       name  \
FIPS                                                  
6001  37.648081 -121.913304  37.6 -121.9    Alameda   
6003  38.617610 -119.798999  38.6 -119.8     Alpine   
6005  38.443550 -120.653856  38.4 -120.7     Amador   
6007  39.665959 -121.601919  39.7 -121.6      Butte   
6009  38.187844 -120.555115  38.2 -120.6  Calaveras   

                                       geo_multipolygon  
FIPS                                                     
6001  MULTIPOLYGON (((-122.3110971410252 37.86340197...  
6003  MULTIPOLYGON (((-119.93538249202298 38.8084818...  
6005  MULTIPOLYGON (((-120.25874105290194 38.5799975...  
6007  MULTIPOLYGON (((-121.6354363647807 40.00088422...  
6009  MULTIPOLYGON (((-120.2108859831663 38.50000349...  


Load the counties DataFrame into Sqlite to make joins and analysis using SQL easier.

In [282]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')

cur = conn.cursor()

# IF EXISTS lets this cell run on a fresh database where the table is not there yet.
cur.execute('DROP TABLE IF EXISTS county')
cur.execute('''CREATE TABLE county (
	fips	  					INTEGER NOT NULL,
	name	  					TEXT NOT NULL,
	latitude 					REAL NOT NULL,
	longitude					REAL NOT NULL,
	lat	    					REAL NOT NULL,
	long	  					REAL NOT NULL,
	geo_multipolygon	TEXT NOT NULL,
	PRIMARY KEY(fips)
);''')

county_df.to_sql('county', conn, if_exists='append')

conn.commit()
conn.close()

Calculate the geographic boundaries for California.

In [3]:
ca_bounds = [-180, 90, 180, -90]

for i, county in county_df.iterrows():
  name = county['name']
  geo = shapely.wkt.loads(county['geo_multipolygon'])

  # East
  if (geo.bounds[0] > ca_bounds[0]):
    ca_bounds[0] = geo.bounds[0]

  # South
  if (geo.bounds[1] < ca_bounds[1]):
    ca_bounds[1] = geo.bounds[1]

  # West
  if (geo.bounds[2] < ca_bounds[2]):
    ca_bounds[2] = geo.bounds[2]

  # North
  if (geo.bounds[3] > ca_bounds[3]):
    ca_bounds[3] = geo.bounds[3]

ca_bounds = tuple(ca_bounds)
print(f'California bounds (east-south, west-north): {ca_bounds}')

California bounds (east-south, west-north): (-116.10618166434291, 32.53402817678555, -123.51814169611895, 42.009834867689875)
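
Note that shapely's `bounds` is ordered `(minx, miny, maxx, maxy)`, so the loop above keeps the largest western edge and the smallest eastern edge among the counties. If the goal is instead the conventional envelope that encloses every county, a minimal sketch could look like the following. The `envelope` helper is hypothetical, and the two sample boxes are rounded values taken from the per-county printout later in this notebook.

```python
def envelope(boxes):
    """Combine (minx, miny, maxx, maxy) boxes into the single box that encloses them all."""
    boxes = list(boxes)
    return (
        min(box[0] for box in boxes),  # westernmost edge
        min(box[1] for box in boxes),  # southernmost edge
        max(box[2] for box in boxes),  # easternmost edge
        max(box[3] for box in boxes),  # northernmost edge
    )


# Illustrative, rounded county boxes only
county_boxes = [
    (-124.4, 40.0, -123.4, 41.5),  # Humboldt
    (-116.1, 32.6, -114.5, 33.4),  # Imperial
]
print(envelope(county_boxes))
```

With the real data, the same helper could be fed `shapely.wkt.loads(wkt).bounds` for each value in `county_df['geo_multipolygon']`.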


Adjust the Sqlite `fires` table to help future analysis.
1. Copy `FIPS_CODE` into a new `fips` column.
2. Round `longitude` and `latitude` to 1 decimal place in new `long` and `lat` columns respectively.
3. Add `year`, `month`, and 1 to 3 day look-back date columns.
4. Add indexes on `date`, `long`, `lat`, and `fips` to help speed up the queries.

In [6]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('ALTER TABLE fires ADD COLUMN year INTEGER NOT NULL DEFAULT 0')
cur.execute('UPDATE fires SET year = FIRE_YEAR WHERE year = 0')

cur.execute('ALTER TABLE fires ADD COLUMN month INTEGER NOT NULL DEFAULT 0')
cur.execute("""UPDATE fires SET month = CAST(strftime('%m', DISCOVERY_DATE) as 'INTEGER') WHERE month = 0""")

cur.execute('ALTER TABLE fires ADD COLUMN fips INTEGER NOT NULL DEFAULT 0')
cur.execute('UPDATE fires SET fips = FIPS_CODE WHERE FIPS = 0')

cur.execute('ALTER TABLE fires ADD COLUMN long REAL NOT NULL DEFAULT 0')
cur.execute('ALTER TABLE fires ADD COLUMN lat REAL NOT NULL DEFAULT 0')
cur.execute('UPDATE fires SET long = round(LONGITUDE, 1), lat = round(LATITUDE, 1)')

cur.execute("UPDATE fires SET date = strftime('%Y-%m-%d', DISCOVERY_DATE)")

cur.execute("ALTER TABLE fires ADD COLUMN date_1d_before TEXT NOT NULL DEFAULT ''")
cur.execute("UPDATE fires SET date_1d_before = date(date, '-1 days') where date_1d_before = ''")
cur.execute("ALTER TABLE fires ADD COLUMN date_2d_before TEXT NOT NULL DEFAULT ''")
cur.execute("UPDATE fires SET date_2d_before = date(date, '-2 days') where date_2d_before = ''")
cur.execute("ALTER TABLE fires ADD COLUMN date_3d_before TEXT NOT NULL DEFAULT ''")
cur.execute("UPDATE fires SET date_3d_before = date(date, '-3 days') where date_3d_before = ''")

cur.execute('ALTER TABLE fires ADD COLUMN cause_code INTEGER NOT NULL DEFAULT 0')
cur.execute('ALTER TABLE fires ADD COLUMN cause_descr TEXT')
cur.execute("UPDATE fires SET cause_code = STAT_CAUSE_CODE, cause_descr = STAT_CAUSE_DESCR WHERE cause_code = 0")

cur.execute('DROP INDEX IF EXISTS idx_fires_date_long_lat')
cur.execute('DROP INDEX IF EXISTS idx_fires_ca_date_long_lat')
cur.execute('CREATE INDEX idx_fires_date_long_lat ON fires(date, long, lat)')

cur.execute('DROP INDEX IF EXISTS idx_fires_date_range_long_lat')
cur.execute('CREATE INDEX idx_fires_date_range_long_lat ON fires(date, date_1d_before, date_2d_before, date_3d_before, long, lat)')

cur.execute('DROP INDEX IF EXISTS idx_fires_date_fips')
cur.execute('CREATE INDEX idx_fires_date_fips ON fires(date, fips)')

cur.execute('DROP INDEX IF EXISTS idx_fires_date_1d_before_fips')
cur.execute('CREATE INDEX idx_fires_date_1d_before_fips ON fires(date_1d_before, fips)')

cur.execute('DROP INDEX IF EXISTS idx_fires_date_2d_before_fips')
cur.execute('CREATE INDEX idx_fires_date_2d_before_fips ON fires(date_2d_before, fips)')

cur.execute('DROP INDEX IF EXISTS idx_fires_date_3d_before_fips')
cur.execute('CREATE INDEX idx_fires_date_3d_before_fips ON fires(date_3d_before, fips)')

conn.commit()
conn.close()
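
Re-running the cell above fails at the first `ALTER TABLE ... ADD COLUMN`, because SQLite raises an error when the column already exists. One way to make such a cell re-runnable, sketched here with a hypothetical `add_column_if_missing` helper and an in-memory stand-in for the `fires` table, is to check `PRAGMA table_info` first.

```python
import sqlite3


def add_column_if_missing(conn, table, column, definition):
    """Add column to table unless it is already there. Returns True when it was added."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    existing = {row[1].lower() for row in conn.execute(f'PRAGMA table_info({table})')}
    if column.lower() in existing:
        return False
    conn.execute(f'ALTER TABLE {table} ADD COLUMN {column} {definition}')
    return True


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fires (FIRE_YEAR INTEGER)')  # stand-in for the real table

print(add_column_if_missing(conn, 'fires', 'year', 'INTEGER NOT NULL DEFAULT 0'))
print(add_column_if_missing(conn, 'fires', 'year', 'INTEGER NOT NULL DEFAULT 0'))
conn.close()
```

The table and column names are interpolated directly into the SQL, so this is only suitable for trusted, hard-coded identifiers like the ones in this notebook.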

Create a `weather_geo` table that holds the daily weather details at longitude/latitude points spaced ~11 km apart between 1 Jan 2000 and 31 Dec 2015.

In [223]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('''CREATE TABLE weather_geo (
	date							TEXT NOT NULL,
	month							INTEGER NOT NULL,
	long						  REAL NOT NULL,
	lat							  REAL NOT NULL,
	fips							INTEGER NOT NULL,
	precipitation			REAL NOT NULL,
	pressure					REAL NOT NULL,
	humidity_2m				REAL NOT NULL,
	temp_2m						REAL NOT NULL,
	temp_dew_point_2m	REAL NOT NULL,
	temp_wet_bulb_2m	REAL NOT NULL,
	temp_max_2m				REAL NOT NULL,
	temp_min_2m				REAL NOT NULL,
	temp_range_2m			REAL NOT NULL,
	temp_0m						REAL NOT NULL,
	wind_10m					REAL NOT NULL,
	wind_max_10m			REAL NOT NULL,
	wind_min_10m			REAL NOT NULL,
	wind_range_10m		REAL NOT NULL,
	wind_50m					REAL NOT NULL,
	wind_max_50m			REAL NOT NULL,
	wind_min_50m			REAL NOT NULL,
	wind_range_50m		REAL NOT NULL,
	PRIMARY KEY(date, long, lat)
);''')

cur.execute('DROP INDEX IF EXISTS idx_weather_geo_date_fips')
cur.execute('CREATE INDEX idx_weather_geo_date_fips ON weather_geo (date, fips)')

cur.execute('DROP INDEX IF EXISTS idx_weather_geo_lat_long')
cur.execute('CREATE INDEX idx_weather_geo_lat_long ON weather_geo (lat, long)')

cur.execute('DROP INDEX IF EXISTS idx_weather_geo_month_lat_long')
cur.execute('CREATE INDEX idx_weather_geo_month_lat_long ON weather_geo (month, lat, long)')

conn.commit()
conn.close()

Define the `fetch_weather` function for pulling daily weather values from NASA's POWER API.

In [6]:
# Keep only the leading parameter code from each line of the listing below.
weather_params = [p.strip() for p in re.findall(
r"^\w+",
"""
WS10M_MIN      MERRA2 1/2x1/2 Minimum Wind Speed at 10 Meters (m/s) 
QV2M           MERRA2 1/2x1/2 Specific Humidity at 2 Meters (g/kg) 
T2M_RANGE      MERRA2 1/2x1/2 Temperature Range at 2 Meters (C) 
WS10M          MERRA2 1/2x1/2 Wind Speed at 10 Meters (m/s) 
T2M            MERRA2 1/2x1/2 Temperature at 2 Meters (C) 
WS50M_MIN      MERRA2 1/2x1/2 Minimum Wind Speed at 50 Meters (m/s) 
T2M_MAX        MERRA2 1/2x1/2 Maximum Temperature at 2 Meters (C) 
WS50M          MERRA2 1/2x1/2 Wind Speed at 50 Meters (m/s) 
TS             MERRA2 1/2x1/2 Earth Skin Temperature (C) 
WS50M_RANGE    MERRA2 1/2x1/2 Wind Speed Range at 50 Meters (m/s) 
WS50M_MAX      MERRA2 1/2x1/2 Maximum Wind Speed at 50 Meters (m/s) 
WS10M_MAX      MERRA2 1/2x1/2 Maximum Wind Speed at 10 Meters (m/s) 
WS10M_RANGE    MERRA2 1/2x1/2 Wind Speed Range at 10 Meters (m/s) 
PS             MERRA2 1/2x1/2 Surface Pressure (kPa) 
T2MDEW         MERRA2 1/2x1/2 Dew/Frost Point at 2 Meters (C) 
T2M_MIN        MERRA2 1/2x1/2 Minimum Temperature at 2 Meters (C) 
T2MWET         MERRA2 1/2x1/2 Wet Bulb Temperature at 2 Meters (C) 
PRECTOT        MERRA2 1/2x1/2 Precipitation (mm day-1) 
""",
re.MULTILINE
)]

print(weather_params)

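def fetch_weather(long, lat, start, end):
    """Fetch daily POWER values for one point; start and end are YYYYMMDD strings."""
    response = requests.get(
      'https://power.larc.nasa.gov/api/temporal/daily/point',
      params={
          'parameters': ','.join(weather_params),
          'community': 'SB',
          'longitude': long,
          'latitude': lat,
          'start': start,
          'end': end,
          'format': 'JSON',
      }
    )
    # Surface HTTP errors here rather than as a confusing KeyError on the next line.
    response.raise_for_status()
    return response.json()['properties']['parameter']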

['WS10M_MIN', 'QV2M', 'T2M_RANGE', 'WS10M', 'T2M', 'WS50M_MIN', 'T2M_MAX', 'WS50M', 'TS', 'WS50M_RANGE', 'WS50M_MAX', 'WS10M_MAX', 'WS10M_RANGE', 'PS', 'T2MDEW', 'T2M_MIN', 'T2MWET', 'PRECTOT']
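
`fetch_weather` returns a dictionary keyed by parameter name, where each value maps a `YYYYMMDD` string to a number. The sketch below pivots that shape into one record per day. The sample payload is made up for illustration, and treating `-999` as the missing-data marker is an assumption to verify against the API response header before relying on it.

```python
FILL_VALUE = -999  # assumption: marker used for missing observations


def to_rows(parameter):
    """Pivot {param: {YYYYMMDD: value}} into a date-sorted list of per-day dicts."""
    rows = {}
    for name, series in parameter.items():
        for yyyymmdd, value in series.items():
            iso = f'{yyyymmdd[0:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}'
            row = rows.setdefault(iso, {'date': iso})
            # Store missing observations as None instead of the numeric marker.
            row[name] = None if value == FILL_VALUE else value
    return [rows[day] for day in sorted(rows)]


# Made-up sample in the same shape as the API's `parameter` object
sample = {
    'T2M': {'20000101': 9.1, '20000102': FILL_VALUE},
    'PS': {'20000101': 101.2, '20000102': 101.0},
}
for row in to_rows(sample):
    print(row)
```

Because every weather column in `weather_geo` is declared `NOT NULL`, rows containing `None` would have to be skipped or the schema relaxed before inserting them.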


For each California county, iterate over the 0.1 degree (~11 km) longitude and latitude grid points within that county's geographical bounds and fetch the weather data between 1 Jan 2000 and 31 Dec 2015 for the points that fall inside the county.

In [221]:
start_date = '20000101'
end_date = '20151231'

conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

for fips, county in county_df.iterrows():
  name = county['name']
  geo = shapely.wkt.loads(county.geo_multipolygon)

  long_min = round(geo.bounds[0], 1)
  long_max = round(geo.bounds[2], 1)

  lat_min = round(geo.bounds[1], 1)
  lat_max = round(geo.bounds[3], 1)

  print(f'{name} southwest to northeast: ({lat_min}, {long_min}) to ({lat_max}, {long_max})')

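  # round() instead of int() guards against products such as 37.9 * 10 landing just below the integer.
  for long in range(round(long_min * 10), round(long_max * 10)):
    for lat in range(round(lat_min * 10), round(lat_max * 10)):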
      point = Point(long / 10, lat / 10)
      start = time.time()

      if geo.contains(point):
        # Only process lat/long points that have not been processed yet.
        cur.execute("""SELECT 1 as found FROM weather_geo WHERE lat = :lat and long = :long""", { 'lat': point.y, 'long': point.x })
        found = cur.fetchall()

        if (len(found) == 0):
          print(f'long: {point.x}, lat: {point.y}')
          json = fetch_weather(point.x, point.y, start_date, end_date)

          for date in json['TS'].keys():
            cur.execute('''
              INSERT INTO weather_geo (
                date, month, long, lat, fips, precipitation, pressure, humidity_2m, temp_2m,
                temp_dew_point_2m, temp_wet_bulb_2m, temp_max_2m, temp_min_2m, temp_range_2m,
                temp_0m, wind_10m, wind_max_10m, wind_min_10m, wind_range_10m, wind_50m,
                wind_max_50m, wind_min_50m, wind_range_50m
              )
              VALUES (
                :date, :month, :long, :lat, :fips, :precipitation, :pressure, :humidity_2m, :temp_2m,
                :temp_dew_point_2m, :temp_wet_bulb_2m, :temp_max_2m, :temp_min_2m, :temp_range_2m,
                :temp_0m, :wind_10m, :wind_max_10m, :wind_min_10m, :wind_range_10m, :wind_50m,
                :wind_max_50m, :wind_min_50m, :wind_range_50m
              )
              ''', {
                'date': f'{date[0:4]}-{date[4:6]}-{date[6:8]}',
                'month': int(date[4:6]),
                'long': point.x,
                'lat': point.y,
                'fips': fips,
                'precipitation': json['PRECTOTCORR'][date],
                'pressure': json['PS'][date],
                'humidity_2m': json['QV2M'][date],
                'temp_2m': json['T2M'][date],
                'temp_dew_point_2m': json['T2MDEW'][date],
                'temp_wet_bulb_2m': json['T2MWET'][date],
                'temp_max_2m': json['T2M_MAX'][date],
                'temp_min_2m': json['T2M_MIN'][date],
                'temp_range_2m': json['T2M_RANGE'][date],
                'temp_0m': json['TS'][date],
                'wind_10m': json['WS10M'][date],
                'wind_max_10m': json['WS10M_MAX'][date],
                'wind_min_10m': json['WS10M_MIN'][date],
                'wind_range_10m': json['WS10M_RANGE'][date],
                'wind_50m': json['WS50M'][date],
                'wind_max_50m': json['WS50M_MAX'][date],
                'wind_min_50m': json['WS50M_MIN'][date],
                'wind_range_50m': json['WS50M_RANGE'][date]
              })

          # Commit once per point rather than once per row to cut down on disk syncs.
          conn.commit()

          end = time.time()
          print(f'{name} at {point} took {round(end - start, 1)}s')


conn.close()

Alameda southwest to northeast: (37.5, -122.3) to (37.9, -121.5)
Alpine southwest to northeast: (38.3, -120.1) to (38.9, -119.5)
Amador southwest to northeast: (38.2, -121.0) to (38.7, -120.1)
Butte southwest to northeast: (39.3, -122.1) to (40.2, -121.1)
Calaveras southwest to northeast: (37.8, -121.0) to (38.5, -120.0)
Colusa southwest to northeast: (38.9, -122.8) to (39.4, -121.8)
Contra Costa southwest to northeast: (37.7, -122.4) to (38.1, -121.5)
Del Norte southwest to northeast: (41.4, -124.3) to (42.0, -123.5)
El Dorado southwest to northeast: (38.5, -121.1) to (39.1, -119.9)
Fresno southwest to northeast: (35.9, -120.9) to (37.6, -118.4)
Glenn southwest to northeast: (39.4, -122.9) to (39.8, -121.9)
Humboldt southwest to northeast: (40.0, -124.4) to (41.5, -123.4)
Imperial southwest to northeast: (32.6, -116.1) to (33.4, -114.5)
Inyo southwest to northeast: (35.8, -118.8) to (37.5, -115.6)
Kern southwest to northeast: (34.8, -120.2) to (35.8, -117.6)
Kings southwest to northea
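
The grid walk above scales the rounded bounds by 10 and steps through integers, which avoids accumulating floating point error from repeatedly adding 0.1. A standalone sketch of that idea follows. The `grid_points` helper is hypothetical and, unlike the loop above, it includes the upper edges of the box.

```python
def grid_points(long_min, lat_min, long_max, lat_max):
    """Yield (long, lat) pairs on a 0.1 degree grid, both edges of the box included."""
    # Work in integer tenths of a degree so every coordinate is produced by one division.
    for x in range(round(long_min * 10), round(long_max * 10) + 1):
        for y in range(round(lat_min * 10), round(lat_max * 10) + 1):
            yield x / 10, y / 10


# Alameda's rounded bounds from the printout above
points = list(grid_points(-122.3, 37.5, -121.5, 37.9))
print(len(points), points[0], points[-1])
```

Each candidate point would still need the `geo.contains(point)` check, since a county's bounding box covers more ground than the county itself.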

In [28]:
conn.commit()
conn.close()

20

Load the `soil.csv`, from the [Harmonized World Soil Database v 1.2](https://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/harmonized-world-soil-database-v12/en/), into Sqlite.

In [351]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS soil')
cur.execute('''CREATE TABLE soil (
	long										REAL NOT NULL,
	lat											REAL NOT NULL,
	fips										INTEGER NOT NULL,
	latitude								REAL NOT NULL,
	longitude								REAL NOT NULL,
	elevation								INTEGER NOT NULL,
	slope_005								REAL NOT NULL,
	slope_005_02						REAL NOT NULL,
	slope_02_05							REAL NOT NULL,
	slope_05_10							REAL NOT NULL,
	slope_10_15							REAL NOT NULL,
	slope_15_30							REAL NOT NULL,
	slope_30_45							REAL NOT NULL,
	slope_45								REAL NOT NULL,
	aspect_north						REAL NOT NULL,
	aspect_east							REAL NOT NULL,
	aspect_south						REAL NOT NULL,
	aspect_west							REAL NOT NULL,
	aspect_unknown					REAL NOT NULL,
	water_land							REAL NOT NULL,
	barren_land							REAL NOT NULL,
	urban_land							REAL NOT NULL,
	grass_land							REAL NOT NULL,
	forest_land							REAL NOT NULL,
	partial_cultivated_land	REAL NOT NULL,
	irrigated_land					REAL NOT NULL,
	cultivated_land					REAL NOT NULL,
	nutrient								INTEGER NOT NULL,
	nutrient_retention			INTEGER NOT NULL,
	rooting									INTEGER NOT NULL,
	oxygen									INTEGER NOT NULL,
	excess_salts						INTEGER NOT NULL,
	toxicity								INTEGER NOT NULL,
	workablity							INTEGER NOT NULL
)''')

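# The soil table has no date column, so index the keys it is joined on instead.
cur.execute('DROP INDEX IF EXISTS idx_soil_fips')
cur.execute('CREATE INDEX idx_soil_fips ON soil(fips)')
cur.execute('DROP INDEX IF EXISTS idx_soil_long_lat')
cur.execute('CREATE INDEX idx_soil_long_lat ON soil(long, lat)')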

soil_df = pd.read_csv('./soil.csv')
soil_df['lat'] = round(soil_df['latitude'], 1)
soil_df['long'] = round(soil_df['longitude'], 1)

soil_df = soil_df[soil_df['fips'].isin(county_df.index)]

soil_df.to_sql('soil', conn, if_exists='append', index=False)

conn.commit()
conn.close()

Create a `drought` table for holding the drought score for all California counties between 1 Jan 2000 and 31 Dec 2015.

In [11]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS drought')
cur.execute('''CREATE TABLE drought (
  date          TEXT NOT NULL,
  fips          INTEGER NOT NULL,
  drought_score REAL,
  PRIMARY KEY(date, fips)
)''')

conn.commit()
conn.close()
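
The later update loop only fills in rows that already exist, and this notebook does not show how the empty table gets its rows. One possible way to seed it, assuming one row per day per county with a `NULL` score, is sketched below. The helper names and the in-memory database are illustrative only.

```python
import sqlite3
from datetime import date, timedelta


def daily_dates(start, end):
    """Yield ISO date strings from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def seed_drought(conn, fips_codes, start, end):
    """Insert one NULL-score row per (date, fips); rows that already exist are kept."""
    conn.executemany(
        'INSERT OR IGNORE INTO drought (date, fips) VALUES (?, ?)',
        ((day, fips) for fips in fips_codes for day in daily_dates(start, end)),
    )
    conn.commit()


conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE drought (
  date TEXT NOT NULL, fips INTEGER NOT NULL, drought_score REAL,
  PRIMARY KEY(date, fips))''')

seed_drought(conn, [6001, 6003], date(2000, 1, 1), date(2015, 12, 31))
print(conn.execute('SELECT COUNT(*) FROM drought').fetchone()[0])
conn.close()
```

In the notebook itself, `county_df.index` could supply the FIPS codes.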

Pull the drought scores from the [US Drought Monitor website](https://droughtmonitor.unl.edu/).

In [402]:
def fetch_drought(fips):
    return requests.get(
        'https://usdmdataservices.unl.edu/api/CountyStatistics/GetDroughtSeverityStatisticsByAreaPercent',
        {
            'aoi': fips,
            'startdate': '10/1/1999',
            'enddate': '12/31/2015',
            'statisticsType': 1,
        }
    ).json()

For each county that is missing a drought score, pull the score from the US Drought Monitor.

In [422]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('SELECT DISTINCT fips FROM drought WHERE drought_score IS NULL')

for row in cur.fetchall():
  fips = row[0]
  # County FIPS codes are 5 characters; California's need a leading zero.
  fips_5_char = str(fips).zfill(5)

  print(f'Fetch drought score for {fips_5_char}')
  json = fetch_drought(fips_5_char)

  for item in json:
    drought_score = float(item['D0'])/100 + float(item['D1'])/100 + float(item['D2'])/100 + float(item['D3'])/100 + float(item['D4'])/100

    # Backfill Jan 4 score to Jan 1-3 of 2000 as it seems to be missing
    start = '2000-01-01' if item['ValidStart'] <= '2000-01-04' else item['ValidStart']

    drought_params = { 'fips': fips, 'drought_score': drought_score, 'start': start, 'end': item['ValidEnd'] }
    
    cur.execute('''
      UPDATE drought SET
        drought_score = :drought_score
      WHERE
        fips = :fips AND date >= :start AND date <= :end
    ''', drought_params)

  conn.commit()
  
conn.close()


Fetch drought score for 06023
Fetch drought score for 06025
Fetch drought score for 06027
Fetch drought score for 06029
Fetch drought score for 06031
Fetch drought score for 06033
Fetch drought score for 06035
Fetch drought score for 06037
Fetch drought score for 06039
Fetch drought score for 06041
Fetch drought score for 06043
Fetch drought score for 06045
Fetch drought score for 06047
Fetch drought score for 06049
Fetch drought score for 06051
Fetch drought score for 06053
Fetch drought score for 06055
Fetch drought score for 06057
Fetch drought score for 06059
Fetch drought score for 06061
Fetch drought score for 06063
Fetch drought score for 06065
Fetch drought score for 06067
Fetch drought score for 06069
Fetch drought score for 06071
Fetch drought score for 06073
Fetch drought score for 06075
Fetch drought score for 06077
Fetch drought score for 06079
Fetch drought score for 06081
Fetch drought score for 06083
Fetch drought score for 06085
Fetch drought score for 06087
Fetch drou
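
The two small transformations in the loop above, padding the FIPS code and collapsing the five category percentages into one score, can be checked in isolation. The record below is made up for illustration and only mimics the `D0` to `D4` string fields that the loop reads.

```python
LEVELS = ('D0', 'D1', 'D2', 'D3', 'D4')


def fips_to_5_char(fips):
    """Left-pad an integer county FIPS code to the 5 character form."""
    return str(fips).zfill(5)


def drought_score(item):
    """Sum the D0..D4 area percentages as fractions, giving a value from 0 to 5."""
    return sum(float(item[level]) / 100 for level in LEVELS)


# Made-up record, not real Drought Monitor data
item = {'D0': '100.00', 'D1': '80.00', 'D2': '50.00', 'D3': '10.00', 'D4': '0.00'}
print(fips_to_5_char(6023), round(drought_score(item), 2))
```

Since each category contributes at most 1.0, a score of 5 would mean the whole county sits in every category at once, provided the percentages are cumulative.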

Backfill any missing county identifiers (FIPS codes) for California fires.

In [32]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('SELECT longitude, latitude FROM fires WHERE fips IS NULL')

for row in cur.fetchall():
  long = row[0]
  lat = row[1]
  found = False
  min_dist = 180
  closest_fips = 0

  for fips, county in county_df.iterrows():
    region = shapely.wkt.loads(county['geo_multipolygon'])
    point = Point(long, lat)

    if region.contains(point):
      print(f'{point} is in {fips}')
      cur.execute('''
        UPDATE fires SET fips = :fips
        WHERE longitude = :longitude AND latitude = :latitude
      ''', { 'fips': fips, 'longitude': long, 'latitude': lat })
      conn.commit()
      found = True
      break

    dist = region.boundary.distance(point)

    if min_dist > dist:
      min_dist = dist
      closest_fips = fips

  if not found:
    print(f'{point} not found. Closest county, by {round(min_dist, 3)}, is {closest_fips}')
    cur.execute('''
      UPDATE fires SET fips = :fips
      WHERE longitude = :longitude AND latitude = :latitude
    ''', { 'fips': closest_fips, 'longitude': long, 'latitude': lat })
    conn.commit()
  
conn.close()

POINT (-119.95472222 38.9725) not found. Closest county, by 0.004, is 6017
POINT (-119.925 38.95) not found. Closest county, by 0.002, is 6017
POINT (-119.67388889 38.79555556) not found. Closest county, by 0.017, is 6003
POINT (-119.45583333 38.64138889) not found. Closest county, by 0.015, is 6051
POINT (-120 39.23388889) not found. Closest county, by 0.006, is 6061
POINT (-119.96083333 39.40416667) not found. Closest county, by 0.043, is 6057
POINT (-119.98583333 39.00111111) not found. Closest county, by 0.01, is 6017
POINT (-118.78722222 38.16027778) not found. Closest county, by 0.006, is 6051
POINT (-119.7575 38.83277778) not found. Closest county, by 0.001, is 6003
POINT (-119.935 39.11888889) not found. Closest county, by 0.068, is 6061
POINT (-118.78638889 38.19722222) not found. Closest county, by 0.036, is 6051
POINT (-120 39.46888889) not found. Closest county, by 0.003, is 6091
POINT (-119.29444444 38.63944444) not found. Closest county, by 0.105, is 6051
POINT (-119.3602

In [None]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

df = pd.read_sql_query('select * from fires', conn)

# Names of the columns that contain at least one null value
cols = df.columns[df.isna().any()].tolist()

print(f'Columns with null: {cols}')

conn.close()

In [None]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('ALTER TABLE fires DROP COLUMN LOCAL_FIRE_REPORT_ID')
cur.execute('ALTER TABLE fires DROP COLUMN LOCAL_INCIDENT_ID')
cur.execute('ALTER TABLE fires DROP COLUMN FIRE_CODE')
cur.execute('ALTER TABLE fires DROP COLUMN FIRE_NAME')
cur.execute('ALTER TABLE fires DROP COLUMN ICS_209_INCIDENT_NUMBER')
cur.execute('ALTER TABLE fires DROP COLUMN ICS_209_NAME')
cur.execute('ALTER TABLE fires DROP COLUMN MTBS_ID')
cur.execute('ALTER TABLE fires DROP COLUMN MTBS_FIRE_NAME')
cur.execute('ALTER TABLE fires DROP COLUMN COMPLEX_NAME')
cur.execute('ALTER TABLE fires DROP COLUMN DISCOVERY_TIME')
cur.execute('ALTER TABLE fires DROP COLUMN CONT_DATE')
cur.execute('ALTER TABLE fires DROP COLUMN CONT_DOY')
cur.execute('ALTER TABLE fires DROP COLUMN CONT_TIME')
cur.execute('ALTER TABLE fires DROP COLUMN STATE')
cur.execute('ALTER TABLE fires DROP COLUMN COUNTY')
cur.execute('ALTER TABLE fires DROP COLUMN FIPS_CODE')
cur.execute('ALTER TABLE fires DROP COLUMN FIPS_NAME')

conn.close()

In [20]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('ALTER TABLE fires DROP COLUMN NWCG_REPORTING_AGENCY')
cur.execute('ALTER TABLE fires DROP COLUMN NWCG_REPORTING_UNIT_ID')
cur.execute('ALTER TABLE fires DROP COLUMN NWCG_REPORTING_UNIT_NAME')
cur.execute('ALTER TABLE fires DROP COLUMN SOURCE_REPORTING_UNIT')
cur.execute('ALTER TABLE fires DROP COLUMN SOURCE_REPORTING_UNIT_NAME')
cur.execute('ALTER TABLE fires DROP COLUMN OWNER_CODE')
cur.execute('ALTER TABLE fires DROP COLUMN OWNER_DESCR')
cur.execute('ALTER TABLE fires DROP COLUMN FPA_ID')
cur.execute('ALTER TABLE fires DROP COLUMN OBJECTID')
cur.execute('ALTER TABLE fires DROP COLUMN SOURCE_SYSTEM')
cur.execute('ALTER TABLE fires DROP COLUMN SOURCE_SYSTEM_TYPE')

conn.close()
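
`ALTER TABLE ... DROP COLUMN` needs SQLite 3.35.0 or newer, and a second run of the cells above fails once the columns are gone. The sketch below wraps both concerns in a hypothetical `drop_columns` helper and exercises it on an in-memory table.

```python
import sqlite3


def drop_columns(conn, table, columns):
    """Drop each listed column that exists in table; return the names actually dropped."""
    if sqlite3.sqlite_version_info < (3, 35, 0):
        raise RuntimeError(f'SQLite {sqlite3.sqlite_version} does not support DROP COLUMN')
    existing = {row[1].lower() for row in conn.execute(f'PRAGMA table_info({table})')}
    dropped = []
    for column in columns:
        if column.lower() in existing:
            conn.execute(f'ALTER TABLE {table} DROP COLUMN {column}')
            dropped.append(column)
    return dropped


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fires (fod_id INTEGER, FIRE_CODE TEXT, MTBS_ID TEXT)')
try:
    # The second FIRE_CODE is skipped because only existing columns are dropped once.
    print(drop_columns(conn, 'fires', ['FIRE_CODE', 'MTBS_ID', 'NOT_A_COLUMN']))
except RuntimeError as err:
    print(err)
conn.close()
```

SQLite also refuses to drop a column that is indexed or part of the primary key, so any index on such a column has to be dropped first.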

In [16]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('CREATE TABLE fires_2022_02_19 AS SELECT * FROM fires')

conn.close()

In [183]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('ALTER TABLE fires ADD COLUMN reporting_unit_id INTEGER')
cur.execute('ALTER TABLE fires ADD COLUMN local_incident_id TEXT')
cur.execute('ALTER TABLE fires ADD COLUMN fire_name TEXT')

cur.execute("""
update fires set fire_name = (
  select fires_2000_2018.fire_name
  from fires_2000_2018
  where fires_2000_2018.fod_id = fires.fod_id
)
where fires.fire_name is null
""")

cur.execute("""
update fires set reporting_unit_id = (
  select fires_2000_2018.NWCG_REPORTING_UNIT_ID
  from fires_2000_2018
  where fires_2000_2018.fod_id = fires.fod_id
)
where fires.reporting_unit_id is null
""")

cur.execute("""
update fires set local_incident_id = (
  select ifnull(fires_2000_2018.ICS_209_PLUS_INCIDENT_JOIN_ID, fires_2000_2018.local_incident_id)
  from fires_2000_2018
  where fires_2000_2018.fod_id = fires.fod_id
)
where local_incident_id is null
""")

conn.commit()
conn.close()

In [7]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute("""UPDATE weather_county SET drought_score = (
  select drought_score
  from drought
  where
    drought.date = weather_county.date
    and drought.fips = weather_county.fips
)
where
  drought_score is null
""")

cur.execute('DROP INDEX IF EXISTS idx_weather_county_year')
cur.execute('CREATE INDEX idx_weather_county_year ON weather_county(year)')

conn.commit()
conn.close()

In [12]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute('ALTER TABLE weather_county ADD COLUMN month INTEGER NOT NULL DEFAULT 0')
cur.execute("""UPDATE weather_county SET month = CAST(strftime('%m', date) as 'INTEGER') WHERE month = 0""")

cur.execute('DROP INDEX IF EXISTS idx_weather_county_fips_date')
cur.execute('CREATE INDEX idx_weather_county_fips_date ON weather_county(date, fips)')

conn.commit()
conn.close()

In [3]:
## EXAMPLE ONLY ##
# Loading the spatialite on Mac crashes Python
import sqlite3
import pandas as pd

conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
conn.enable_load_extension(True)

spatiallite_path = '/usr/local/lib/mod_spatialite.dylib'
conn.load_extension(spatiallite_path)

df_past = pd.read_sql_query("""
select ST_AsText(shape), longitude, latitude from fires limit 100
""", conn)

conn.close()

In [109]:
conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

cur.execute("ALTER TABLE fires ADD COLUMN geo_polygon TEXT")

conn.close()

In [220]:
import urllib.parse
import json
import re
from shapely.geometry import Polygon
from shapely import wkt

conn = sqlite3.connect('/Users/eerichmo/Documents/fires.sqlite')
cur = conn.cursor()

df_fires = pd.read_sql_query("""
select fod_id, year, date, fire_name, fire_size, local_incident_id, reporting_unit_id
from fires
where
  geo_polygon is null
  and fire_size_class >= 'D'
  and (
    fire_name is not null
    or local_incident_id is not null
  )
""", conn)

# https://opendata.arcgis.com/datasets/585b8ff97f5c45fe924d3a1221b446c6_0.geojson
url = 'https://egis.fire.ca.gov/arcgis/rest/services/FRAP/FirePerimeters_FS/FeatureServer/0/query?outFields=*&outSR=4326&f=json'

for i, fire in df_fires.iterrows():
  # Reset per fire so values from the previous iteration cannot leak in.
  unit_id = ''
  incident_id = ''

  regex_match = re.search(r'_CA-([^-]+)-([^-]+)_', fire.local_incident_id or '')
  if (regex_match):
    unit_id = regex_match.group(1)
    incident_id = regex_match.group(2).zfill(8)
  else:
    # Resolve unit_id first because incident_id is derived from it.
    if (fire.reporting_unit_id is not None):
      unit_id = fire.reporting_unit_id.replace('USCA', '')

    if (fire.local_incident_id is not None):
      incident_id = fire.local_incident_id.replace(unit_id, '')

  fire_name = '' if fire.fire_name is None else fire.fire_name.replace("'", "''")
  if (len(fire_name)):
    where = urllib.parse.quote(f"ALARM_DATE=DATE '{fire.date}' and STATE='CA' and UNIT_ID='{unit_id}' and FIRE_NAME='{fire_name}'")
  else:
    where = urllib.parse.quote(f"ALARM_DATE=DATE '{fire.date}' and STATE='CA' and UNIT_ID='{unit_id}' and INC_NUM='{incident_id}'")

  result = requests.get(f'{url}&where={where}').json()['features']
  
  if (not len(result)):
    print(f'MISSED: where={where}')
    where = urllib.parse.quote(f"ALARM_DATE=DATE '{fire.date}' and STATE='CA' and FIRE_NAME='{fire_name}' and REPORT_AC={fire.fire_size}")
    result = requests.get(f'{url}&where={where}').json()['features']

  if (not len(result)):
    print(f'MISSED: where={where}')
    where = urllib.parse.quote(f"YEAR_={fire.year} and STATE='CA' and UNIT_ID='{unit_id}' and FIRE_NAME='{fire_name}' and REPORT_AC={fire.fire_size}")
    result = requests.get(f'{url}&where={where}').json()['features']

  if (len(result) == 1):
    geo_json = result[0]['geometry']['rings'][0]
    geo = Polygon(geo_json)

    cur.execute("""
    update fires set geo_polygon = :polygon where fod_id = :fod_id
    """, { 'polygon': wkt.dumps(geo), 'fod_id': fire.fod_id })

    conn.commit()
  else:
    print(f'MISSED: where={where}')
    print(f'Could not find {fire.fire_name} on {fire.date} ({fire.fod_id})')

conn.close()


MISSED: where=ALARM_DATE%3DDATE%20%272005-08-07%27%20and%20STATE%3D%27CA%27%20and%20UNIT_ID%3D%27KRU%27%20and%20FIRE_NAME%3D%27MACE%27
MISSED: where=ALARM_DATE%3DDATE%20%272005-08-07%27%20and%20STATE%3D%27CA%27%20and%20FIRE_NAME%3D%27MACE%27%20and%20REPORT_AC%3D191.0
MISSED: where=YEAR_%3D2005%20and%20STATE%3D%27CA%27%20and%20UNIT_ID%3D%27KRU%27%20and%20FIRE_NAME%3D%27MACE%27%20and%20REPORT_AC%3D191.0
Could not find MACE on 2005-08-07 (2763)
MISSED: where=ALARM_DATE%3DDATE%20%272005-06-25%27%20and%20STATE%3D%27CA%27%20and%20UNIT_ID%3D%27ANF%27%20and%20FIRE_NAME%3D%27OAK%27
MISSED: where=ALARM_DATE%3DDATE%20%272005-06-25%27%20and%20STATE%3D%27CA%27%20and%20FIRE_NAME%3D%27OAK%27%20and%20REPORT_AC%3D125.0
MISSED: where=YEAR_%3D2005%20and%20STATE%3D%27CA%27%20and%20UNIT_ID%3D%27ANF%27%20and%20FIRE_NAME%3D%27OAK%27%20and%20REPORT_AC%3D125.0
Could not find OAK on 2005-06-25 (3785)
MISSED: where=ALARM_DATE%3DDATE%20%272005-06-27%27%20and%20STATE%3D%27CA%27%20and%20UNIT_ID%3D%27ANF%27%20and%20
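
The ID parsing and quote escaping that feed the `where` clauses above can be tested without calling the service. In the sketch below, the sample ID is hypothetical and merely shaped to fit the regular expression used in the loop; `parse_incident` and `sql_quote` are illustrative names.

```python
import re

INCIDENT_RE = re.compile(r'_CA-([^-]+)-([^-]+)_')


def parse_incident(local_incident_id):
    """Return (unit_id, zero-padded incident_id), or None when the pattern does not match."""
    match = INCIDENT_RE.search(local_incident_id or '')
    if not match:
        return None
    return match.group(1), match.group(2).zfill(8)


def sql_quote(value):
    """Escape single quotes for use inside a single-quoted SQL string literal."""
    return value.replace("'", "''")


print(parse_incident('2005_CA-KRU-2763_MACE'))  # hypothetical ID format
print(parse_incident(None))
print(sql_quote("O'NEIL"))
```

Returning `None` for a non-match keeps the fallback path (using `reporting_unit_id`) explicit at the call site.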

In [218]:
import urllib.parse
import json
import re
from shapely.geometry import Polygon, MultiPolygon

url = 'https://egis.fire.ca.gov/arcgis/rest/services/FRAP/FirePerimeters_FS/FeatureServer/0/query?outFields=*&outSR=4326&f=json'

where = urllib.parse.quote(f"YEAR_=2015 and STATE='CA' and UNIT_ID='HUU'")

print(f'{url}&where={where}')

res_json = requests.get(f'{url}&where={where}').json()

for item in res_json['features']:
  print(item['attributes'])

if (len(res_json['features'])):
  geo_json = res_json['features'][0]['geometry']['rings'][0]
  geo = Polygon(geo_json)
  geo



https://egis.fire.ca.gov/arcgis/rest/services/FRAP/FirePerimeters_FS/FeatureServer/0/query?outFields=*&outSR=4326&f=json&where=YEAR_%3D2015%20and%20STATE%3D%27CA%27%20and%20UNIT_ID%3D%27HUU%27
{'OBJECTID': 40960, 'YEAR_': '2015', 'STATE': 'CA', 'AGENCY': 'CDF', 'UNIT_ID': 'HUU', 'FIRE_NAME': 'BALD FIRE', 'INC_NUM': '003924', 'ALARM_DATE': 1433808000000, 'CONT_DATE': 1434240000000, 'CAUSE': 1, 'COMMENTS': None, 'REPORT_AC': None, 'GIS_ACRES': 26.94750404, 'C_METHOD': 1, 'OBJECTIVE': 1, 'FIRE_NUM': None, 'Shape__Area': 192485.38554464, 'Shape__Length': 1644.4127348930501}
{'OBJECTID': 40961, 'YEAR_': '2015', 'STATE': 'CA', 'AGENCY': 'CDF', 'UNIT_ID': 'HUU', 'FIRE_NAME': 'PINE 1-44', 'INC_NUM': '005606', 'ALARM_DATE': 1438214400000, 'CONT_DATE': 1441670400000, 'CAUSE': 1, 'COMMENTS': 'HUMBOLDT COMPLEX', 'REPORT_AC': None, 'GIS_ACRES': 1773.2467041, 'C_METHOD': 7, 'OBJECTIVE': 1, 'FIRE_NUM': None, 'Shape__Area': 12343897.1037467, 'Shape__Length': 19537.718497101898}
{'OBJECTID': 40962, 'YE
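
The `ALARM_DATE` and `CONT_DATE` attributes in the output above are epoch timestamps in milliseconds. A minimal sketch for turning them into the `YYYY-MM-DD` strings used in the `fires` table follows; interpreting the values as UTC is an assumption, and `epoch_ms_to_date` is a name invented for this example.

```python
from datetime import datetime, timezone


def epoch_ms_to_date(ms):
    """Convert epoch milliseconds to an ISO date string, interpreted as UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()


# ALARM_DATE and CONT_DATE of the BALD FIRE record printed above
print(epoch_ms_to_date(1433808000000))
print(epoch_ms_to_date(1434240000000))
```

If the service stores local midnight rather than UTC midnight, the converted date could be off by one day, which is worth checking against a fire with a known alarm date.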