# FILTERING
Here I filter for plots that burned and that were measured on either side of a fire (once before and once after).  

#### What I have done here:
- Using WA_PLOT_FIRE_PAIRS, which was created in QGIS and contains all plot/fire pairs, I filter TREE by keeping rows whose 'PLOT' appears in WA_PLOT_FIRE_PAIRS.
    - **Note that this filtering is wrong right now because PLOT is not the unique identifier.  Instead use a combination of PLOT, COUNTYCD, UNITCD, and STATECD** (STATECD is unnecessary in our case, since every plot is in Washington).  

- Then I add the measurement years of those plots to WA_PLOT_FIRE_PAIRS. This is done correctly, as I use PLOT, COUNTYCD, and UNITCD.  
- Now I can filter FIREPLOTS_MEAS (the plot/fire pairs with measurement years added) to keep only those that burned in between the measurement years.  
- **The final set FIREPLOTS_FIRE_SANDWICH only has the plot/fire pairs with burns in between measurements.**

- **Now we need to see how many trees fall into these plots.** 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
sns.set_style("whitegrid")

Remember to unzip WA_TREE.csv.zip

In [3]:
TREE = pd.read_csv('../Data/WA_TREE.csv')
print(len(TREE))
TREE.columns



504956


Index(['CN', 'PLT_CN', 'PREV_TRE_CN', 'INVYR', 'STATECD', 'UNITCD', 'COUNTYCD',
       'PLOT', 'SUBP', 'TREE',
       ...
       'VOLCSNET_BARK', 'DRYBIO_STEM', 'DRYBIO_STEM_BARK', 'DRYBIO_STUMP_BARK',
       'DRYBIO_BOLE_BARK', 'DRYBIO_BRANCH', 'DRYBIO_FOLIAGE',
       'DRYBIO_SAWLOG_BARK', 'PREV_ACTUALHT_FLD', 'PREV_HT_FLD'],
      dtype='object', length=197)

$N = 504,956$ 
- Now we can remove trees that are in plots that have not burned in the last 24 years.  
- To do this I will use **PLOT_FIRES_GIS**, which was created by intersecting the last 24 years of fire data from the NIFC with our plot latitude and longitude.  *We are introducing our first uncertainty here: the plot latitude and longitude have been 'fuzzed'.*

In [4]:
FIREPLOTS = pd.read_csv('../Data/WA_PLOT_FIRE_PAIRS.csv')
print(len(FIREPLOTS))
print(FIREPLOTS.columns)


7177
Index(['CN', 'SRV_CN', 'CTY_CN', 'PREV_PLT_CN', 'INVYR', 'STATECD', 'UNITCD',
       'COUNTYCD', 'PLOT', 'PLOT_STATUS_CD', 'PLOT_NONSAMPLE_REASN_CD',
       'MEASYEAR', 'MEASMON', 'MEASDAY', 'REMPER', 'KINDCD', 'DESIGNCD',
       'RDDISTCD', 'WATERCD', 'LAT', 'LON', 'ELEV', 'GROW_TYP_CD',
       'MORT_TYP_CD', 'P2PANEL', 'P3PANEL', 'ECOSUBCD', 'CONGCD', 'MANUAL',
       'KINDCD_NC', 'QA_STATUS', 'CREATED_DATE', 'MODIFIED_DATE',
       'MICROPLOT_LOC', 'DECLINATION', 'EMAP_HEX', 'SAMP_METHOD_CD',
       'SUBP_EXAMINE_CD', 'MACRO_BREAKPOINT_DIA', 'INTENSITY', 'CYCLE',
       'SUBCYCLE', 'ECO_UNIT_PNW', 'TOPO_POSITION_PNW',
       'NF_SAMPLING_STATUS_CD', 'NF_PLOT_STATUS_CD',
       'NF_PLOT_NONSAMPLE_REASN_CD', 'P2VEG_SAMPLING_STATUS_CD',
       'P2VEG_SAMPLING_LEVEL_DETAIL_CD', 'INVASIVE_SAMPLING_STATUS_CD',
       'INVASIVE_SPECIMEN_RULE_CD', 'DESIGNCD_P2A', 'MANUAL_DB', 'SUBPANEL',
       'CONDCHNGCD_RMRS', 'FUTFORCD_RMRS', 'MANUAL_NCRS', 'MANUAL_NERS',
       'MANUAL_RMRS', 'PAC

**There are 7177 plot/fire pairs that have burned in the last 24 years.**
- Now we filter TREE by using these plots.  

In [5]:
TREE_IN_BURNED_PLOTS = TREE[TREE['PLOT'].isin(FIREPLOTS['PLOT'])]
print("The number of tree measurements in fire plots is " + str(len(TREE_IN_BURNED_PLOTS)))


The number of tree measurements in fire plots is 128183
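Since PLOT alone does not identify a plot, the filter above can let in trees from unrelated plots that happen to share a PLOT number. Below is a minimal sketch of a composite-key filter, assuming the key columns UNITCD, COUNTYCD, and PLOT. The toy frames `tree` and `fireplots` and all their values are invented for illustration; only the column names come from the data above.

```python
import pandas as pd

# Toy stand-ins for TREE and FIREPLOTS (values are made up for illustration)
tree = pd.DataFrame({
    "CN":       [1, 2, 3, 4],
    "UNITCD":   [8, 8, 9, 9],
    "COUNTYCD": [7, 7, 19, 47],
    "PLOT":     [100, 200, 100, 300],
})
fireplots = pd.DataFrame({
    "UNITCD":    [8, 9],
    "COUNTYCD":  [7, 47],
    "PLOT":      [100, 300],
    "FIRE_YEAR": [2012, 2014],
})

key = ["UNITCD", "COUNTYCD", "PLOT"]

# PLOT-only filter: tree CN=3 is kept even though its plot (9, 19, 100) never burned,
# because PLOT 100 also exists in unit 8, county 7
plot_only = tree[tree["PLOT"].isin(fireplots["PLOT"])]

# Composite-key filter: inner merge against the de-duplicated burned-plot keys,
# so a plot with several fires does not duplicate tree rows
burned_keys = fireplots[key].drop_duplicates()
composite = tree.merge(burned_keys, on=key, how="inner")

print(sorted(plot_only["CN"].tolist()))
print(sorted(composite["CN"].tolist()))
```

On the real data, the same merge with `TREE` and `FIREPLOTS` in place of the toy frames would be expected to return fewer rows than the PLOT-only filter, but that has not been run here.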


In [6]:

# Step 1: Identify the first measurements (where PREV_TRE_CN is NaN)
first_measurement_trees = TREE_IN_BURNED_PLOTS[TREE_IN_BURNED_PLOTS['PREV_TRE_CN'].isna()]
first_measurement_trees = first_measurement_trees[['CN']].rename(columns={'CN': 'Root_CN'})

# Step 2: Join first measurements with their subsequent measurements to get trees measured 2 times
second_measurement_trees = first_measurement_trees.merge(
    TREE_IN_BURNED_PLOTS, left_on='Root_CN', right_on='PREV_TRE_CN', suffixes=('', '_2'))

# Join again to get trees measured 3 times
third_measurement_trees = second_measurement_trees.merge(
    TREE_IN_BURNED_PLOTS, left_on='CN', right_on='PREV_TRE_CN', suffixes=('_2', '_3'))

# Step 3: Count trees measured exactly once, twice, three times, etc.
num_measured_once = len(first_measurement_trees) - len(second_measurement_trees)
num_measured_twice = len(second_measurement_trees) - len(third_measurement_trees)
num_measured_three_times = len(third_measurement_trees)

print("The number of trees measured once is:" +str(num_measured_once))
print("The number of trees measured twice is:"+ str(num_measured_twice))
print("The number of trees measured three times is:"+ str(num_measured_three_times))


The number of trees measured once is:18135
The number of trees measured twice is:54304
The number of trees measured three times is:0


$N = 128,183$
- This is still a lot of trees in plots that have burned.  
- Now we need to filter for PLOTs that have been measured on either side of a burn.
    - These measurement years are in the *MEASYEAR* column of the **WA_PLOT.csv** dataset.

- **PLOT_FIRES_GIS** contains fire/plot pairs.  
    - Let's go through these pairs, referencing WA_PLOT to check whether the measurement years fall on either side of the fire year.
        - This will filter **PLOT_FIRES_GIS** and add the two measurement years as columns to **PLOT_FIRES_GIS**.

In [7]:
PLOTS = pd.read_csv('../Data/WA_PLOT.csv')
PLOTS.columns

Index(['CN', 'SRV_CN', 'CTY_CN', 'PREV_PLT_CN', 'INVYR', 'STATECD', 'UNITCD',
       'COUNTYCD', 'PLOT', 'PLOT_STATUS_CD', 'PLOT_NONSAMPLE_REASN_CD',
       'MEASYEAR', 'MEASMON', 'MEASDAY', 'REMPER', 'KINDCD', 'DESIGNCD',
       'RDDISTCD', 'WATERCD', 'LAT', 'LON', 'ELEV', 'GROW_TYP_CD',
       'MORT_TYP_CD', 'P2PANEL', 'P3PANEL', 'ECOSUBCD', 'CONGCD', 'MANUAL',
       'KINDCD_NC', 'QA_STATUS', 'CREATED_DATE', 'MODIFIED_DATE',
       'MICROPLOT_LOC', 'DECLINATION', 'EMAP_HEX', 'SAMP_METHOD_CD',
       'SUBP_EXAMINE_CD', 'MACRO_BREAKPOINT_DIA', 'INTENSITY', 'CYCLE',
       'SUBCYCLE', 'ECO_UNIT_PNW', 'TOPO_POSITION_PNW',
       'NF_SAMPLING_STATUS_CD', 'NF_PLOT_STATUS_CD',
       'NF_PLOT_NONSAMPLE_REASN_CD', 'P2VEG_SAMPLING_STATUS_CD',
       'P2VEG_SAMPLING_LEVEL_DETAIL_CD', 'INVASIVE_SAMPLING_STATUS_CD',
       'INVASIVE_SPECIMEN_RULE_CD', 'DESIGNCD_P2A', 'MANUAL_DB', 'SUBPANEL',
       'CONDCHNGCD_RMRS', 'FUTFORCD_RMRS', 'MANUAL_NCRS', 'MANUAL_NERS',
       'MANUAL_RMRS', 'PAC_ISLA

In [8]:
FIREPLOTS.sample(3)

Unnamed: 0,CN,SRV_CN,CTY_CN,PREV_PLT_CN,INVYR,STATECD,UNITCD,COUNTYCD,PLOT,PLOT_STATUS_CD,...,PREV_PLOT_STATUS_CD_RMRS,REUSECD1,REUSECD2,REUSECD3,GRND_LYR_SAMPLING_STATUS_CD,GRND_LYR_SAMPLING_METHOD_CD,IRWINID,FIRE_YEAR,INCIDENT,GIS_ACRES
4805,174763655020004,44513110020004,82010497,22398400000000.0,2013,53,8,7,81771,1,...,,,,,,,,2002,Deer Point,43363.09
2126,345936697489998,310278370489998,85010497,8604981000000.0,2016,53,9,19,93656,1,...,,,,,,,,1924,UNNAMED,6374.49
6989,24479790010900,24456185010900,92010497,,2003,53,8,47,83033,2,...,,,,,,,,2014,Carlton,501.39


In [9]:
FIREPLOTS_MEAS = FIREPLOTS.copy()
FIREPLOTS_MEAS['MEASYEAR1'] = None
FIREPLOTS_MEAS['MEASYEAR2'] = None
# Add MEASYEAR1 (earliest) and MEASYEAR2 (latest measurement year) to each plot/fire pair.
# Afterwards we filter for rows where the fire occurs in between.
nums_of_measurements = np.zeros(len(FIREPLOTS))
for index,row in FIREPLOTS_MEAS.iterrows():
    plot = row['PLOT']
    unitcd = row['UNITCD']
    countycd = row['COUNTYCD']
    
    uniqueplot = PLOTS[(PLOTS['PLOT']==plot) & (PLOTS['UNITCD'] == unitcd) & (PLOTS['COUNTYCD']==countycd)]
    if len(uniqueplot) >= 2:
        FIREPLOTS_MEAS.at[index,'MEASYEAR1'] = min(uniqueplot.MEASYEAR)
        FIREPLOTS_MEAS.at[index,'MEASYEAR2'] = max(uniqueplot.MEASYEAR)

FIREPLOTS_MEAS = FIREPLOTS_MEAS.dropna(subset = ['MEASYEAR1','MEASYEAR2'])
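The row-by-row loop above could also be written as a single groupby and merge, which is usually much faster on several thousand pairs. The sketch below is an alternative, not the code that produced the numbers in this notebook. The toy frames `plots` and `fireplots` and their values are invented, and it counts non-null MEASYEAR values rather than rows, which differs from `len(uniqueplot)` only if MEASYEAR has missing values.

```python
import pandas as pd

# Toy stand-ins for PLOTS and FIREPLOTS (values are made up for illustration)
plots = pd.DataFrame({
    "UNITCD":   [8, 8, 8, 9],
    "COUNTYCD": [7, 7, 7, 19],
    "PLOT":     [100, 100, 100, 100],
    "MEASYEAR": [2003, 2013, 2008, 2016],
})
fireplots = pd.DataFrame({
    "UNITCD":    [8, 8, 9],
    "COUNTYCD":  [7, 7, 19],
    "PLOT":      [100, 100, 100],
    "FIRE_YEAR": [2005, 2015, 2010],
})
key = ["UNITCD", "COUNTYCD", "PLOT"]

# Earliest and latest measurement year per plot, plus the number of measurements
years = (
    plots.groupby(key)["MEASYEAR"]
    .agg(MEASYEAR1="min", MEASYEAR2="max", n_meas="count")
    .reset_index()
)

# Keep only plots measured at least twice, as the loop does
years = years[years["n_meas"] >= 2].drop(columns="n_meas")

# The inner merge drops pairs whose plot has fewer than two measurements
fireplots_meas = fireplots.merge(years, on=key, how="inner")

# Keep pairs where the fire falls strictly between the two measurement years
sandwich = fireplots_meas[
    (fireplots_meas["FIRE_YEAR"] > fireplots_meas["MEASYEAR1"])
    & (fireplots_meas["FIRE_YEAR"] < fireplots_meas["MEASYEAR2"])
]
print(sandwich)
```

A side effect of this version is that MEASYEAR1 and MEASYEAR2 stay integer columns instead of the object dtype produced by initializing them to `None`.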

In [10]:
FIREPLOTS_FIRE_SANDWICH = FIREPLOTS_MEAS[(FIREPLOTS_MEAS['FIRE_YEAR']>FIREPLOTS_MEAS['MEASYEAR1']) & (FIREPLOTS_MEAS['FIRE_YEAR']<FIREPLOTS_MEAS['MEASYEAR2'])]
print(len(FIREPLOTS_FIRE_SANDWICH))
print(FIREPLOTS_FIRE_SANDWICH['PLOT'].nunique())

1978
579


**IF** I have done this properly, there are 1978 fire/plot pairs that are sandwiched between two measurements.  These pairs are on 579 plots.  \
Let me check a few rows.

In [11]:
FIREPLOTS_FIRE_SANDWICH[['MEASYEAR1','FIRE_YEAR','MEASYEAR2']].sample(10)

| | MEASYEAR1 | FIRE_YEAR | MEASYEAR2 |
|---|---|---|---|
| 754 | 2008 | 2013 | 2018 |
| 807 | 2010 | 2012 | 2021 |
| 4569 | 2009 | 2015 | 2019 |
| 318 | 2007 | 2016 | 2017 |
| 6435 | 2002 | 2005 | 2012 |
| 2047 | 2011 | 2015 | 2021 |
| 783 | 2003 | 2012 | 2013 |
| 1728 | 2011 | 2015 | 2021 |
| 5208 | 2008 | 2010 | 2018 |
| 3402 | 2004 | 2006 | 2014 |


**BANG!** \
Look at how those fires are sandwiched
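Eyeballing a random sample is a good sanity check, but the same condition can be asserted over every row. The sketch below uses three rows copied from the sample above as a stand-in for FIREPLOTS_FIRE_SANDWICH. The names `ok`, `years_before`, and `years_after` are hypothetical helpers, not columns in the data.

```python
import pandas as pd

# Three rows copied from the sample output above
sandwich = pd.DataFrame({
    "MEASYEAR1": [2008, 2010, 2007],
    "FIRE_YEAR": [2013, 2012, 2016],
    "MEASYEAR2": [2018, 2021, 2017],
})

# True where the fire year lies strictly between the two measurement years
ok = (sandwich["MEASYEAR1"] < sandwich["FIRE_YEAR"]) & (
    sandwich["FIRE_YEAR"] < sandwich["MEASYEAR2"]
)
assert ok.all(), "some fires are not strictly between the two measurements"

# Years from the first measurement to the fire, and from the fire to the last measurement
years_before = sandwich["FIRE_YEAR"] - sandwich["MEASYEAR1"]
years_after = sandwich["MEASYEAR2"] - sandwich["FIRE_YEAR"]
print(years_before.tolist(), years_after.tolist())
```

The two gap columns may be worth keeping: a tree re-measured one year after a fire and a tree re-measured nine years after are both "sandwiched", but they describe quite different post-fire windows.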

OK.

I propose that we use this **FIREPLOTS_FIRE_SANDWICH** dataset to filter our trees.  This will give us our final dataset.  

PLEASE double check my work and logic behind the filtering.  

#### Let's see how many trees fall into these plots that have this measurement structure.  
- First I make a new column in **FIREPLOTS_FIRE_SANDWICH** and **TREE** which is just a space-separated concatenation of *UNITCD, COUNTYCD, and PLOT*.  IN THIS ORDER!
- The combination of these three columns is called 'UNIQUE_PLOT_ID'.

In [12]:
# Work on an explicit copy so the new column is not assigned to a slice of FIREPLOTS_MEAS
# (this avoids pandas' SettingWithCopyWarning)
FIREPLOTS_FIRE_SANDWICH = FIREPLOTS_FIRE_SANDWICH.copy()
FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'] = FIREPLOTS_FIRE_SANDWICH['UNITCD'].astype(str)+' '+FIREPLOTS_FIRE_SANDWICH['COUNTYCD'].astype(str)+' '+FIREPLOTS_FIRE_SANDWICH['PLOT'].astype(str)
TREE['UNIQUE_PLOT_ID'] = TREE['UNITCD'].astype(str)+' '+TREE['COUNTYCD'].astype(str)+' '+TREE['PLOT'].astype(str)

FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'].nunique()


579
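The separator in the concatenated ID matters as much as the column order. Without it, two different plots can collapse into the same string. A small sketch with made-up code values shows the collision:

```python
import pandas as pd

# Two distinct (UNITCD, COUNTYCD, PLOT) combinations, values invented for illustration
df = pd.DataFrame({
    "UNITCD":   [1, 11],
    "COUNTYCD": [11, 1],
    "PLOT":     [5, 5],
})

# No separator: "1" + "11" + "5" and "11" + "1" + "5" both give "1115"
no_sep = df["UNITCD"].astype(str) + df["COUNTYCD"].astype(str) + df["PLOT"].astype(str)

# Space separator: "1 11 5" and "11 1 5" stay distinct
with_sep = (
    df["UNITCD"].astype(str) + " " + df["COUNTYCD"].astype(str) + " " + df["PLOT"].astype(str)
)

print(no_sep.nunique(), with_sep.nunique())
```

Merging directly on the three columns avoids building a string key at all, at the cost of carrying three columns through every join.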

In [13]:
TREES_FOR_US = TREE[TREE['UNIQUE_PLOT_ID'].isin(FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'])]
len(TREES_FOR_US)

25518

$N = 25,518$ measurements! Now let's see how many trees were measured only once and get rid of those.

In [14]:
first_measurement_trees_ofseries = TREES_FOR_US[TREES_FOR_US['CN'].isin(TREES_FOR_US['PREV_TRE_CN']) & TREES_FOR_US['PREV_TRE_CN'].isna()]
print("The number of trees/measurements that are first measurements of a series is " + str(first_measurement_trees_ofseries.shape[0]))
subsequent_measurement_trees = TREES_FOR_US[TREES_FOR_US['PREV_TRE_CN'].notna()]
print("The number of measurements that are subsequent measurements of a series is " + str(subsequent_measurement_trees.shape[0]))
multiple_measurement_cns = pd.concat([first_measurement_trees_ofseries['CN'], subsequent_measurement_trees['CN']])
single_measurement_trees = TREES_FOR_US[~TREES_FOR_US['CN'].isin(multiple_measurement_cns)]
print("The number of trees that were measured once is " + str(single_measurement_trees.shape[0]))

## Now find number of trees in each category
# Step 1: Identify the first measurements (where PREV_TRE_CN is NaN)
first_measurement_trees = TREES_FOR_US[TREES_FOR_US['PREV_TRE_CN'].isna()]
#first_measurement_trees = first_measurement_trees[['CN']].rename(columns={'CN': 'Root_CN'})

# Step 2: Join first measurements with their subsequent measurements to get trees measured 2 times
second_measurement_trees = first_measurement_trees.merge(
    TREES_FOR_US, left_on='CN', right_on='PREV_TRE_CN', suffixes=('_1', ''))

# Join again to get trees measured 3 times
third_measurement_trees = second_measurement_trees.merge(
    TREES_FOR_US, left_on='CN', right_on='PREV_TRE_CN', suffixes=('_2', '_3'))


# Step 3: Count trees measured exactly once, twice, three times, etc.
num_measured_once = len(first_measurement_trees) - len(second_measurement_trees)
num_measured_twice = len(second_measurement_trees) - len(third_measurement_trees)
num_measured_three_times = len(third_measurement_trees)

print("The number of trees measured once is:" +str(num_measured_once))
print("The number of trees measured twice is:"+ str(num_measured_twice))
print("The number of trees measured three times is:"+ str(num_measured_three_times))

The number of trees/measurements that are first measurements of a series is 12059
The number of measurements that are subsequent measurements of a series is 12255
The number of trees that were measured once is 1204
The number of trees measured once is:1204
The number of trees measured twice is:12059
The number of trees measured three times is:0


Note that there are 196 measurements that were not captured in the trees measured once or twice categories ($25518 - 1204 - 2 \times 12059 = 196$). Let's dig into that.

In [17]:
# Look at trees not captured in above code
trees_captured_CN =pd.concat([first_measurement_trees['CN'], second_measurement_trees['CN']])
trees_notcaptured = TREES_FOR_US[~TREES_FOR_US['CN'].isin(trees_captured_CN)]
multiple_measurement_trees_notcaptured = trees_notcaptured[trees_notcaptured['CN'].isin(trees_notcaptured['PREV_TRE_CN'])]
misc_trees_notcaptured = trees_notcaptured[trees_notcaptured
print("The number of trees measured twice in this uncaptured tree dataset is:" +str(len(multiple_measurement_trees_notcaptured)))


The number of trees measured twice in this uncaptured tree dataset is:98


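The uncaptured measurements belong to trees that were measured twice: 98 trees with 2 measurements each account for all 196. My code didn't catch them because the first measurement of each of these trees does not have a null value for PREV_TRE_CN.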
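One way to catch such trees is to treat a record as the start of a series when its PREV_TRE_CN is either null or points to a CN that is not in the filtered dataset. The sketch below assumes that explanation, namely that the earlier record exists in FIA but fell outside TREES_FOR_US, which has not been verified here. The toy frame `trees` and its CN values are invented.

```python
import pandas as pd

# Toy stand-in for TREES_FOR_US (CN values are made up for illustration)
trees = pd.DataFrame({
    "CN":          [10, 11, 20, 30, 31],
    "PREV_TRE_CN": [None, 10, None, 99, 30],  # 99 is not a CN in this frame
})

# A record continues a series only if its PREV_TRE_CN matches a CN in the data;
# NaN never matches, so null PREV_TRE_CN is covered by the same test
continues_series = trees["PREV_TRE_CN"].isin(trees["CN"])
is_root = ~continues_series

# A record has a follow-up if some other record points back to it
has_next = trees["CN"].isin(trees["PREV_TRE_CN"])

# Strict definition (null PREV_TRE_CN only) misses tree 30
strict_first = trees.loc[trees["PREV_TRE_CN"].isna() & has_next, "CN"]

# Relaxed definition also accepts roots whose previous record is absent
relaxed_first = trees.loc[is_root & has_next, "CN"]

# Trees with a single measurement in this dataset
measured_once = trees.loc[is_root & ~has_next, "CN"]

print(strict_first.tolist(), relaxed_first.tolist(), measured_once.tolist())
```

With the relaxed definition, the count of series starts would be expected to line up with the count of second measurements, though that should be confirmed on the real data.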

In [23]:
#Get rid of trees measured once 

TREES_FOR_US_multipleMeasurements= TREES_FOR_US[TREES_FOR_US['CN'].isin(multiple_measurement_cns)]
print(len(TREES_FOR_US_multipleMeasurements))

24314
