# FILTERING
Here I filter for plots that burned and that were measured on either side of a fire (once before it and once after it).  

#### What I have done here:
- Using WA_FIRE_PLOT_PAIRS, which was created in QGIS and contains all plot/fire pairs, I filter TREE by keeping rows whose 'PLOT' appears in WA_FIRE_PLOT_PAIRS.
    - **Note that this filtering is wrong right now because PLOT is not the unique identifier.  Instead use a combination of PLOT, COUNTYCD, UNITCD, and STATECD** (STATECD is unnecessary in our case).  

- Then I add the measurement years of those plots to WA_FIRE_PLOT_PAIRS. This is done correctly, as I use PLOT, COUNTYCD, and UNITCD.  
- Now I can filter FIREPLOTS_MEAS (the plot/fire pairs with measurement years added) to keep only those that burned in between the measurement years.  
- **The final set FIREPLOTS_FIRE_SANDWICH only has the plot/fire pairs with burns in between measurements.**

- **Now we need to see how many trees fall into these plots.** 
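
The composite-key filtering described in the note above can be sketched as follows. This is a minimal illustration on made-up toy frames (`tree`, `fire_plot_pairs`, and all their values are hypothetical stand-ins, not the real TREE or WA_FIRE_PLOT_PAIRS data); it only shows why matching on PLOT alone keeps too many rows.

```python
import pandas as pd

# Toy stand-ins for TREE and WA_FIRE_PLOT_PAIRS (all values are made up).
tree = pd.DataFrame({
    "CN": [1, 2, 3, 4],
    "STATECD": [53, 53, 53, 53],
    "UNITCD": [8, 8, 9, 9],
    "COUNTYCD": [7, 47, 19, 19],
    "PLOT": [100, 100, 100, 200],
})
fire_plot_pairs = pd.DataFrame({
    "STATECD": [53, 53],
    "UNITCD": [8, 9],
    "COUNTYCD": [7, 19],
    "PLOT": [100, 200],
    "INCIDENT": ["Fire A", "Fire B"],
})

key_cols = ["STATECD", "UNITCD", "COUNTYCD", "PLOT"]

# One row per burned plot, so the merge cannot duplicate tree rows.
burned_keys = fire_plot_pairs[key_cols].drop_duplicates()

# The inner merge keeps only trees whose full key matches a burned plot.
tree_on_burned_plots = tree.merge(burned_keys, on=key_cols, how="inner")

# Filtering on PLOT alone keeps all four trees, including the two whose
# (UNITCD, COUNTYCD, PLOT) combination never burned.
plot_only = tree[tree["PLOT"].isin(fire_plot_pairs["PLOT"])]

print(tree_on_burned_plots["CN"].tolist())
print(len(plot_only))
```

Dropping duplicate keys before the merge matters because a plot that burned several times would otherwise repeat each of its trees once per fire.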

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
sns.set_style("whitegrid")

Remember to unzip WA_TREE.csv.zip

In [57]:
TREE = pd.read_csv('/Users/henrycladouhos/Desktop/WA_TREE.csv')
print(len(TREE))
TREE.columns


504956


Index(['CN', 'PLT_CN', 'PREV_TRE_CN', 'INVYR', 'STATECD', 'UNITCD', 'COUNTYCD',
       'PLOT', 'SUBP', 'TREE',
       ...
       'VOLCSNET_BARK', 'DRYBIO_STEM', 'DRYBIO_STEM_BARK', 'DRYBIO_STUMP_BARK',
       'DRYBIO_BOLE_BARK', 'DRYBIO_BRANCH', 'DRYBIO_FOLIAGE',
       'DRYBIO_SAWLOG_BARK', 'PREV_ACTUALHT_FLD', 'PREV_HT_FLD'],
      dtype='object', length=197)

$N = 504,956$ 
- Now we can remove trees that are in plots that have not burned in the last 24 years.  
- To do this I will use **PLOT_FIRES_GIS**, which was created by intersecting the last 24 years of fire data from the NIFC with our plot latitude and longitude.  *We are introducing our first uncertainty here: the plot latitude and longitude have been 'fuzzed'.*

In [58]:
FIREPLOTS = pd.read_csv('../Data/DISSOLVED_FIRE_PLOT_PAIRS.csv')
print(len(FIREPLOTS))
print(FIREPLOTS.columns)


6168
Index(['CN', 'SRV_CN', 'CTY_CN', 'PREV_PLT_CN', 'INVYR', 'STATECD', 'UNITCD',
       'COUNTYCD', 'PLOT', 'PLOT_STATUS_CD', 'PLOT_NONSAMPLE_REASN_CD',
       'MEASYEAR', 'MEASMON', 'MEASDAY', 'REMPER', 'KINDCD', 'DESIGNCD',
       'RDDISTCD', 'WATERCD', 'LAT', 'LON', 'ELEV', 'GROW_TYP_CD',
       'MORT_TYP_CD', 'P2PANEL', 'P3PANEL', 'ECOSUBCD', 'CONGCD', 'MANUAL',
       'KINDCD_NC', 'QA_STATUS', 'CREATED_DATE', 'MODIFIED_DATE',
       'MICROPLOT_LOC', 'DECLINATION', 'EMAP_HEX', 'SAMP_METHOD_CD',
       'SUBP_EXAMINE_CD', 'MACRO_BREAKPOINT_DIA', 'INTENSITY', 'CYCLE',
       'SUBCYCLE', 'ECO_UNIT_PNW', 'TOPO_POSITION_PNW',
       'NF_SAMPLING_STATUS_CD', 'NF_PLOT_STATUS_CD',
       'NF_PLOT_NONSAMPLE_REASN_CD', 'P2VEG_SAMPLING_STATUS_CD',
       'P2VEG_SAMPLING_LEVEL_DETAIL_CD', 'INVASIVE_SAMPLING_STATUS_CD',
       'INVASIVE_SPECIMEN_RULE_CD', 'DESIGNCD_P2A', 'MANUAL_DB', 'SUBPANEL',
       'CONDCHNGCD_RMRS', 'FUTFORCD_RMRS', 'MANUAL_NCRS', 'MANUAL_NERS',
       'MANUAL_RMRS', 'PAC

- This is still a lot of trees that appear in plots that have burned.  
- Now we need to filter for PLOTs that have been measured on either side of a burn.
    - These measurement years are shown in the **WA_PLOT.csv** dataset in the *MEASYEAR* column.

- **PLOT_FIRES_GIS** contains fire/plot pairs.  
    - Let's go through these pairs, referencing WA_PLOT to check whether the measurement years are on either side of the fire year.
        - This will filter **PLOT_FIRES_GIS** and will add the two measurement years as columns to **PLOT_FIRES_GIS**.

In [59]:
PLOTS = pd.read_csv('../Data/WA_PLOT.csv')
PLOTS.columns

Index(['CN', 'SRV_CN', 'CTY_CN', 'PREV_PLT_CN', 'INVYR', 'STATECD', 'UNITCD',
       'COUNTYCD', 'PLOT', 'PLOT_STATUS_CD', 'PLOT_NONSAMPLE_REASN_CD',
       'MEASYEAR', 'MEASMON', 'MEASDAY', 'REMPER', 'KINDCD', 'DESIGNCD',
       'RDDISTCD', 'WATERCD', 'LAT', 'LON', 'ELEV', 'GROW_TYP_CD',
       'MORT_TYP_CD', 'P2PANEL', 'P3PANEL', 'ECOSUBCD', 'CONGCD', 'MANUAL',
       'KINDCD_NC', 'QA_STATUS', 'CREATED_DATE', 'MODIFIED_DATE',
       'MICROPLOT_LOC', 'DECLINATION', 'EMAP_HEX', 'SAMP_METHOD_CD',
       'SUBP_EXAMINE_CD', 'MACRO_BREAKPOINT_DIA', 'INTENSITY', 'CYCLE',
       'SUBCYCLE', 'ECO_UNIT_PNW', 'TOPO_POSITION_PNW',
       'NF_SAMPLING_STATUS_CD', 'NF_PLOT_STATUS_CD',
       'NF_PLOT_NONSAMPLE_REASN_CD', 'P2VEG_SAMPLING_STATUS_CD',
       'P2VEG_SAMPLING_LEVEL_DETAIL_CD', 'INVASIVE_SAMPLING_STATUS_CD',
       'INVASIVE_SPECIMEN_RULE_CD', 'DESIGNCD_P2A', 'MANUAL_DB', 'SUBPANEL',
       'CONDCHNGCD_RMRS', 'FUTFORCD_RMRS', 'MANUAL_NCRS', 'MANUAL_NERS',
       'MANUAL_RMRS', 'PAC_ISLA

In [60]:
FIREPLOTS.sample(3)

Unnamed: 0,CN,SRV_CN,CTY_CN,PREV_PLT_CN,INVYR,STATECD,UNITCD,COUNTYCD,PLOT,PLOT_STATUS_CD,...,PREV_PLOT_STATUS_CD_RMRS,REUSECD1,REUSECD2,REUSECD3,GRND_LYR_SAMPLING_STATUS_CD,GRND_LYR_SAMPLING_METHOD_CD,INCIDENT,FIRE_YEAR_,area,perimeter
2077,29882959010497,29878196010497,85010497,,2009,53,9,19,78601,1,...,,,,,,,COPPERBTTE,1994,24902290.0,19791.706786
4364,216958092020004,22132266010497,82010497,,2005,53,8,7,97594,3,...,,,,,,,Cougar Creek,2018,172813200.0,111863.33352
1880,273642188489998,273493072489998,92010497,15343640000000.0,2015,53,8,47,56250,2,...,,,,,,,Whitmore,2021,235960200.0,121302.814355


In [61]:
FIREPLOTS_MEAS = FIREPLOTS.copy()
FIREPLOTS_MEAS['MEASYEAR1'] = None
FIREPLOTS_MEAS['MEASYEAR2'] = None
# This loop only adds MEASYEAR1 and MEASYEAR2 (the first and last measurement year of each plot).
# Afterwards we will filter for rows where the fire occurs in between.
nums_of_measurements = np.zeros(len(FIREPLOTS))
for index,row in FIREPLOTS_MEAS.iterrows():
    plot = row['PLOT']
    unitcd = row['UNITCD']
    countycd = row['COUNTYCD']
    
    uniqueplot = PLOTS[(PLOTS['PLOT']==plot) & (PLOTS['UNITCD'] == unitcd) & (PLOTS['COUNTYCD']==countycd)]
    if len(uniqueplot) >= 2:
        FIREPLOTS_MEAS.at[index,'MEASYEAR1'] = min(uniqueplot.MEASYEAR)
        FIREPLOTS_MEAS.at[index,'MEASYEAR2'] = max(uniqueplot.MEASYEAR)

FIREPLOTS_MEAS = FIREPLOTS_MEAS.dropna(subset = ['MEASYEAR1','MEASYEAR2'])
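
The row-by-row lookup above could also be written as a group-by followed by a merge. The sketch below is an assumed alternative, not what the cell above runs; `plots`, `fireplots`, and their values are toy stand-ins for PLOTS and FIREPLOTS.

```python
import pandas as pd

# Toy stand-ins for PLOTS and FIREPLOTS (all values are made up).
plots = pd.DataFrame({
    "UNITCD": [8, 8, 8, 9],
    "COUNTYCD": [7, 7, 47, 19],
    "PLOT": [100, 100, 100, 200],
    "MEASYEAR": [2003, 2012, 2009, 2011],
})
fireplots = pd.DataFrame({
    "UNITCD": [8, 8, 9],
    "COUNTYCD": [7, 47, 19],
    "PLOT": [100, 100, 200],
    "FIRE_YEAR_": [2004, 2012, 2015],
})

key_cols = ["UNITCD", "COUNTYCD", "PLOT"]

# First and last measurement year per plot, plus the number of plot records.
meas = (
    plots.groupby(key_cols)["MEASYEAR"]
    .agg(MEASYEAR1="min", MEASYEAR2="max", N_MEAS="size")
    .reset_index()
)

# Keep only plots with at least two records, mirroring len(uniqueplot) >= 2.
meas = meas[meas["N_MEAS"] >= 2].drop(columns="N_MEAS")

# The inner merge drops fire/plot pairs without a qualifying measurement pair,
# which plays the role of the dropna step.
fireplots_meas = fireplots.merge(meas, on=key_cols, how="inner")
print(fireplots_meas)
```

On a table of a few thousand fire/plot pairs either approach is fast enough; the merge version mainly avoids filtering PLOTS once per row.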

In [62]:
FIREPLOTS_FIRE_SANDWICH = FIREPLOTS_MEAS[(FIREPLOTS_MEAS['FIRE_YEAR_']>FIREPLOTS_MEAS['MEASYEAR1']) & (FIREPLOTS_MEAS['FIRE_YEAR_']<FIREPLOTS_MEAS['MEASYEAR2'])]
print(len(FIREPLOTS_FIRE_SANDWICH))
print(FIREPLOTS_FIRE_SANDWICH['PLOT'].nunique())

1596
579
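
The strict comparison in the cell above means that a fire in the same calendar year as a measurement is dropped, since the year alone does not say which came first. A compact way to express the same test is `Series.between` with `inclusive="neither"` (this string form assumes pandas 1.3 or newer). The `pairs` frame below is made-up toy data.

```python
import pandas as pd

# Toy fire/plot pairs (values are made up).
pairs = pd.DataFrame({
    "MEASYEAR1": [2003, 2004, 2009],
    "FIRE_YEAR_": [2004, 2004, 2021],
    "MEASYEAR2": [2012, 2014, 2019],
})

# inclusive="neither" makes both bounds strict, matching the > and < test.
sandwiched = pairs["FIRE_YEAR_"].between(
    pairs["MEASYEAR1"], pairs["MEASYEAR2"], inclusive="neither"
)
fire_sandwich = pairs[sandwiched].copy()

# Sanity check: every kept fire year lies strictly between the two visits.
assert (fire_sandwich["MEASYEAR1"] < fire_sandwich["FIRE_YEAR_"]).all()
assert (fire_sandwich["FIRE_YEAR_"] < fire_sandwich["MEASYEAR2"]).all()
print(len(fire_sandwich))
```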


**IF** I have done this properly, there are 1596 fire/plot pairs in which the fire is sandwiched between two measurements.  These pairs are on 579 plots.  \
Let me check a few rows.

In [63]:
FIREPLOTS_FIRE_SANDWICH[['MEASYEAR1','FIRE_YEAR_','MEASYEAR2']].sample(10)

Unnamed: 0,MEASYEAR1,FIRE_YEAR_,MEASYEAR2
4997,2004,2013,2014
4495,2009,2014,2019
3830,2011,2015,2021
5308,2011,2017,2021
1614,2004,2006,2014
1559,2003,2006,2013
4516,2005,2014,2015
1939,2003,2005,2013
3552,2006,2015,2016
1005,2006,2015,2016


**BANG!** \
Look at how those fires are sandwiched

OK.

I propose that we use this **FIREPLOTS_FIRE_SANDWICH** dataset to filter our trees.  This will give us our final dataset.  

PLEASE double check my work and logic behind the filtering.  

#### Let's see how many trees fall into these plots that have this measurement structure.  
- First I make a new column in **FIREPLOTS_FIRE_SANDWICH** and **TREE** which is just a concatenation of *UNITCD, COUNTYCD, and PLOT*, IN THIS ORDER!
- The combination of these three columns is called 'UNIQUE_PLOT_ID'.

In [64]:
FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'] =FIREPLOTS_FIRE_SANDWICH['UNITCD'].astype(str)+' '+FIREPLOTS_FIRE_SANDWICH['COUNTYCD'].astype(str)+' '+FIREPLOTS_FIRE_SANDWICH['PLOT'].astype(str)
TREE['UNIQUE_PLOT_ID'] = TREE['UNITCD'].astype(str)+' '+TREE['COUNTYCD'].astype(str)+' '+TREE['PLOT'].astype(str)

FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'].nunique()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'] =FIREPLOTS_FIRE_SANDWICH['UNITCD'].astype(str)+' '+FIREPLOTS_FIRE_SANDWICH['COUNTYCD'].astype(str)+' '+FIREPLOTS_FIRE_SANDWICH['PLOT'].astype(str)


579
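
The warning printed above appears because FIREPLOTS_FIRE_SANDWICH was created by boolean-mask filtering, and pandas cannot tell whether the new column is meant for the filtered result or for the original frame. A minimal sketch of one way to avoid it, assuming it is acceptable to take an explicit copy after filtering (the toy frame and its values are hypothetical):

```python
import pandas as pd

# Toy stand-in for FIREPLOTS_MEAS (values are made up).
fireplots_meas = pd.DataFrame({
    "UNITCD": [8, 8, 9],
    "COUNTYCD": [7, 47, 19],
    "PLOT": [66742, 79990, 75050],
    "FIRE_YEAR_": [2004, 2012, 2021],
    "MEASYEAR1": [2003, 2009, 2009],
    "MEASYEAR2": [2012, 2019, 2019],
})

mask = (fireplots_meas["FIRE_YEAR_"] > fireplots_meas["MEASYEAR1"]) & (
    fireplots_meas["FIRE_YEAR_"] < fireplots_meas["MEASYEAR2"]
)

# .copy() makes the filtered frame independent of fireplots_meas, so adding
# a column to it is unambiguous and pandas has nothing to warn about.
fire_sandwich = fireplots_meas[mask].copy()

# Same key layout as the notebook: UNITCD, COUNTYCD, PLOT joined by spaces.
key_cols = ["UNITCD", "COUNTYCD", "PLOT"]
fire_sandwich["UNIQUE_PLOT_ID"] = (
    fire_sandwich[key_cols].astype(str).agg(" ".join, axis=1)
)

print(fire_sandwich["UNIQUE_PLOT_ID"].tolist())
```

The same `.copy()` pattern would apply to any other filtered frame that later receives new columns.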

In [65]:
TREES_FOR_US = TREE[TREE['UNIQUE_PLOT_ID'].isin(FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'])]
len(TREES_FOR_US)

25518

$N=25518$ measurements! Now let's see how many trees were only measured once and get rid of those.

In [66]:
first_measurement_trees_ofseries = TREES_FOR_US[TREES_FOR_US['CN'].isin(TREES_FOR_US['PREV_TRE_CN']) & TREES_FOR_US['PREV_TRE_CN'].isna()]
print("The number of trees/measurements that are first measurements of a series is " + str(first_measurement_trees_ofseries.shape[0]))
subsequent_measurement_trees = TREES_FOR_US[TREES_FOR_US['PREV_TRE_CN'].notna()]
print("The number of measurements that are subsequent measurements of a series is " + str(subsequent_measurement_trees.shape[0]))
multiple_measurement_cns = pd.concat([first_measurement_trees_ofseries['CN'], subsequent_measurement_trees['CN']])
single_measurement_trees = TREES_FOR_US[~TREES_FOR_US['CN'].isin(multiple_measurement_cns)]
print("The number of trees that were measured once is " + str(single_measurement_trees.shape[0]))

## Now find number of trees in each category
# Step 1: Identify the first measurements (where PREV_TRE_CN is NaN)
first_measurement_trees = TREES_FOR_US[TREES_FOR_US['PREV_TRE_CN'].isna()]
#first_measurement_trees = first_measurement_trees[['CN']].rename(columns={'CN': 'Root_CN'})

# Join first measurement with the subsequent measurements to get trees measured 2 times
second_measurement_trees = first_measurement_trees.merge(
    TREES_FOR_US, left_on='CN', right_on='PREV_TRE_CN', suffixes=('_1', ''))

# Join again to get trees measured 3 times
third_measurement_trees = second_measurement_trees.merge(
    TREES_FOR_US, left_on='CN', right_on='PREV_TRE_CN', suffixes=('_2', '_3'))


# Step 3: Count trees measured exactly once, twice, three times, etc.
num_measured_once = len(first_measurement_trees) - len(second_measurement_trees)
num_measured_twice = len(second_measurement_trees) - len(third_measurement_trees)
num_measured_three_times = len(third_measurement_trees)

print("The number of trees measured once is:" +str(num_measured_once))
print("The number of trees measured twice is:"+ str(num_measured_twice))
print("The number of trees measured three times is:"+ str(num_measured_three_times))

The number of trees/measurements that are first measurements of a series is 12059
The number of measurements that are subsequent measurements of a series is 12255
The number of trees that were measured once is 1204
The number of trees measured once is:1204
The number of trees measured twice is:12059
The number of trees measured three times is:0


Note that there are 196 measurements that were not captured in the trees measured once or twice categories. Let's dig into that.

In [67]:
# Look at trees not captured in above code
trees_captured_CN =pd.concat([first_measurement_trees['CN'], second_measurement_trees['CN']])
trees_notcaptured = TREES_FOR_US[~TREES_FOR_US['CN'].isin(trees_captured_CN)]
multiple_measurement_trees_notcaptured = trees_notcaptured[trees_notcaptured['CN'].isin(trees_notcaptured['PREV_TRE_CN'])]
print("The number of trees measured twice in this uncaptured tree dataset is:" +str(len(multiple_measurement_trees_notcaptured)))


The number of trees measured twice in this uncaptured tree dataset is:98


In [68]:
first_trees_prev_nonNaN = trees_notcaptured[trees_notcaptured['CN'].isin(multiple_measurement_trees_notcaptured['CN'])]

The uncaptured trees were measured twice. My code didn't catch them because their first measurement did not have a null value for PREV_TRE_CN.
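
One way to catch these cases in a single pass is to treat a record as the start of a series whenever its PREV_TRE_CN is missing or points to a record outside the filtered subset. The sketch below illustrates that idea on made-up toy records; it is an assumed alternative to the cells above, not the code that produced the counts.

```python
import pandas as pd

nan = float("nan")

# Toy tree records: CN is the record id, PREV_TRE_CN links to the prior visit.
# The value 99 refers to a record that is not in this subset.
trees = pd.DataFrame({
    "CN": [10, 11, 20, 30, 31],
    "PREV_TRE_CN": [nan, 10, nan, 99, 30],
})

# True where the previous record exists inside this subset.
prev_in_subset = trees["PREV_TRE_CN"].isin(trees["CN"])
# True where some later record in this subset points back to this one.
has_successor = trees["CN"].isin(trees["PREV_TRE_CN"])

# A record starts a series if its predecessor is missing OR outside the subset.
is_first = ~prev_in_subset

first_of_pair = trees[is_first & has_successor]
measured_once = trees[is_first & ~has_successor]
second_of_pair = trees[prev_in_subset]

print(first_of_pair["CN"].tolist())
print(measured_once["CN"].tolist())
print(second_of_pair["CN"].tolist())
```

Here record 30 is classified as a first measurement even though its PREV_TRE_CN is not null, which is the situation of the uncaptured trees.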

In [69]:
#Get rid of trees measured once 

TREES_FOR_US_multipleMeasurements= TREES_FOR_US[TREES_FOR_US['CN'].isin(multiple_measurement_cns)]
print(len(TREES_FOR_US_multipleMeasurements))

24314


##### Adding fire and measurement info to TREES_FOR_US
- MEASYEAR1, MEASYEAR2
- FIRE YEAR (Multiple)
- FIRE SIZE


What's the max number of fires that occurred on a plot?


In [70]:
plot_numfires = []
for index, row in FIREPLOTS_FIRE_SANDWICH.iterrows():
    plot_name = row['UNIQUE_PLOT_ID']
    numfires = len(FIREPLOTS_FIRE_SANDWICH[FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID']==plot_name])

    plot_numfires.append([plot_name,numfires])

plot_numfires = np.array(plot_numfires)
plot_numfires = pd.DataFrame(plot_numfires, columns = ['UNIQUE_PLOT_ID','NUMFIRES'])
plot_numfires = plot_numfires.drop_duplicates()

plot_numfires['NUMFIRES'] = plot_numfires['NUMFIRES'].astype(int)
plot_numfires['UNIQUE_PLOT_ID'] = plot_numfires['UNIQUE_PLOT_ID'].astype(str)

print(plot_numfires['NUMFIRES'].min())
print(plot_numfires['NUMFIRES'].max())

plot_numfires['NUMFIRES'].idxmax()
plot_numfires.loc[13]

2
8


UNIQUE_PLOT_ID    8 47 79990
NUMFIRES                   8
Name: 13, dtype: object

In [71]:
FIREPLOTS_FIRE_SANDWICH[FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'] == '8 47 79990']

Unnamed: 0,CN,SRV_CN,CTY_CN,PREV_PLT_CN,INVYR,STATECD,UNITCD,COUNTYCD,PLOT,PLOT_STATUS_CD,...,REUSECD3,GRND_LYR_SAMPLING_STATUS_CD,GRND_LYR_SAMPLING_METHOD_CD,INCIDENT,FIRE_YEAR_,area,perimeter,MEASYEAR1,MEASYEAR2,UNIQUE_PLOT_ID
30,29883386010497,29878201010497,92010497,,2009,53,8,47,79990,2,...,,,,Antoine 2,2012,27677780.0,28197.567994,2009,2019,8 47 79990
31,558629649126144,532456097126144,92010497,29883390000000.0,2019,53,8,47,79990,2,...,,,,Antoine 2,2012,27677780.0,28197.567994,2009,2019,8 47 79990
2301,29883386010497,29878201010497,92010497,,2009,53,8,47,79990,2,...,,,,Reach,2015,349790800.0,231497.057459,2009,2019,8 47 79990
2322,558629649126144,532456097126144,92010497,29883390000000.0,2019,53,8,47,79990,2,...,,,,Reach,2015,349790800.0,231497.057459,2009,2019,8 47 79990
4495,29883386010497,29878201010497,92010497,,2009,53,8,47,79990,2,...,,,,Carlton,2014,1035210000.0,436020.430783,2009,2019,8 47 79990
4559,558629649126144,532456097126144,92010497,29883390000000.0,2019,53,8,47,79990,2,...,,,,Carlton,2014,1035210000.0,436020.430783,2009,2019,8 47 79990
5731,29883386010497,29878201010497,92010497,,2009,53,8,47,79990,2,...,,,,Carlton Complex,2014,1020137000.0,407768.451876,2009,2019,8 47 79990
5792,558629649126144,532456097126144,92010497,29883390000000.0,2019,53,8,47,79990,2,...,,,,Carlton Complex,2014,1020137000.0,407768.451876,2009,2019,8 47 79990


The minimum number of fires on one plot is 1 and the maximum is 4.  Note that the printed values 2 and 8 are doubled because FIREPLOTS_FIRE_SANDWICH contains two measurement rows for each plot.  
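
To get the fire count per plot without the doubling, one option is to drop the repeated measurement rows before counting. This is a sketch on made-up toy rows ("Fire X" and its values are hypothetical), assuming a fire is identified by its INCIDENT name and FIRE_YEAR_.

```python
import pandas as pd

# Toy FIREPLOTS_FIRE_SANDWICH: each fire appears once per plot measurement row.
fire_sandwich = pd.DataFrame({
    "UNIQUE_PLOT_ID": ["8 7 66742"] * 4 + ["9 19 75050"] * 2,
    "INCIDENT": ["Deep Harbor"] * 2 + ["Domke Lake"] * 2 + ["Fire X"] * 2,
    "FIRE_YEAR_": [2004, 2004, 2007, 2007, 2015, 2015],
})

# Count distinct (incident, year) pairs per plot, so the duplicate
# measurement rows do not inflate the total.
fires_per_plot = (
    fire_sandwich.drop_duplicates(["UNIQUE_PLOT_ID", "INCIDENT", "FIRE_YEAR_"])
    .groupby("UNIQUE_PLOT_ID")
    .size()
    .rename("NUMFIRES")
)

print(fires_per_plot.min(), fires_per_plot.max())
print(fires_per_plot.idxmax())
```

Using `idxmax()` on a Series indexed by UNIQUE_PLOT_ID returns the plot id directly, so no hard-coded row label is needed.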

Looking at the plot with this maximum number of fires, I NOTICE A PROBLEM...
- Each fire actually has multiple unconnected components.
- So the Deep Harbor fire has three unconnected components that overlapped with the plot with UNIQUE_PLOT_ID = 8 7 66742.
- Not sure what to do with this.

**OK, what I did was take the union of all the fire polygons.**  
Notice that now there are only two rows in the dataset below for each fire, one for each measurement.  

In [72]:
FIREPLOTS_FIRE_SANDWICH[FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID'] == '8 7 66742']

Unnamed: 0,CN,SRV_CN,CTY_CN,PREV_PLT_CN,INVYR,STATECD,UNITCD,COUNTYCD,PLOT,PLOT_STATUS_CD,...,REUSECD3,GRND_LYR_SAMPLING_STATUS_CD,GRND_LYR_SAMPLING_METHOD_CD,INCIDENT,FIRE_YEAR_,area,perimeter,MEASYEAR1,MEASYEAR2,UNIQUE_PLOT_ID
757,24126063010900,24120389010900,82010497,,2002,53,8,7,66742,1,...,0.0,,,Deep Harbor,2004,123553700.0,75743.179625,2003,2012,8 7 66742
771,30764138020004,48466120010497,82010497,24126060000000.0,2012,53,8,7,66742,1,...,,,,Deep Harbor,2004,123553700.0,75743.179625,2003,2012,8 7 66742
2155,24126063010900,24120389010900,82010497,,2002,53,8,7,66742,1,...,0.0,,,Domke Lake,2007,47689180.0,47513.496931,2003,2012,8 7 66742
2157,30764138020004,48466120010497,82010497,24126060000000.0,2012,53,8,7,66742,1,...,,,,Domke Lake,2007,47689180.0,47513.496931,2003,2012,8 7 66742


Let's add MEASYEAR1 and MEASYEAR2, fire year, fire size, and incident name.  
These last three may repeat up to 4 times (the maximum number of times a plot burned in between measurements was 4).

In [73]:
TREES_FOR_US.columns
TREES_FOR_US.sample(3)

Unnamed: 0,CN,PLT_CN,PREV_TRE_CN,INVYR,STATECD,UNITCD,COUNTYCD,PLOT,SUBP,TREE,...,DRYBIO_STEM,DRYBIO_STEM_BARK,DRYBIO_STUMP_BARK,DRYBIO_BOLE_BARK,DRYBIO_BRANCH,DRYBIO_FOLIAGE,DRYBIO_SAWLOG_BARK,PREV_ACTUALHT_FLD,PREV_HT_FLD,UNIQUE_PLOT_ID
232608,45135972020004,48206622010497,,2011,53,8,77,81960,4,5,...,135.111236,25.013058,1.700383,22.870092,32.302475,0.0,,,,8 77 81960
228262,44961089020004,48205698010497,,2011,53,8,7,91586,2,20,...,3327.746123,654.336883,27.448936,625.383065,258.814594,310.328346,618.348,,,8 7 91586
420548,645088780126144,484819432489998,30520370000000.0,2018,53,9,19,75050,3,336,...,2407.53064,451.426076,14.851749,435.174232,648.515352,168.984129,427.21955,92.0,92.0,9 19 75050


I think this code is working.  \
I know it's not pretty.  \
It is adding measurement years and relevant fire information (up to 4 times!).  

In [74]:
TREES_FOR_US['MEASYEAR'] = None
columns_to_add = ['INCIDENT1', 'FIREYEAR1', 'AREA1', 'PERIM1','INCIDENT2', 'FIREYEAR2', 'AREA2', 'PERIM2','INCIDENT3', 'FIREYEAR3', 'AREA3', 'PERIM3','INCIDENT4', 'FIREYEAR4', 'AREA4', 'PERIM4']

# Initialize the new fire columns
for col in columns_to_add:
    TREES_FOR_US[col] = None

for index1, row1 in TREES_FOR_US.iterrows():
    # Assign MEASYEAR (MEASYEAR1 for a first measurement, MEASYEAR2 for a second)
    pltid = row1['UNIQUE_PLOT_ID']
    firesonplot = FIREPLOTS_FIRE_SANDWICH[FIREPLOTS_FIRE_SANDWICH['UNIQUE_PLOT_ID']==pltid]
    # Check if PREV_TRE_CN is NaN or if the CN is in first_trees_prev_nonNaN;
    # either case means this row is the tree's first measurement
    if pd.isna(row1['PREV_TRE_CN']) or row1['CN'] in first_trees_prev_nonNaN['CN'].values:
        TREES_FOR_US.loc[index1, 'MEASYEAR'] = firesonplot['MEASYEAR1'].iloc[0]
    else:
        #if we are on the second measurement lets add fire info as well
        TREES_FOR_US.loc[index1, 'MEASYEAR'] = firesonplot['MEASYEAR2'].iloc[0]
        
        i=1
        for index2, row2 in firesonplot.iterrows():
            if i==1 or TREES_FOR_US.loc[index1,f'INCIDENT{i-1}']!=row2['INCIDENT']:
                TREES_FOR_US.loc[index1,f'INCIDENT{i}'] = row2['INCIDENT']
                TREES_FOR_US.loc[index1,f'FIREYEAR{i}'] = row2['FIRE_YEAR_']
                TREES_FOR_US.loc[index1, f'AREA{i}'] = row2['area']
                TREES_FOR_US.loc[index1,f'PERIM{i}'] = row2['perimeter']
                i = i+1



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  TREES_FOR_US['MEASYEAR'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  TREES_FOR_US[col] = None

NOTE THAT: area is in square meters and perim is in meters.
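
The row-by-row loop in the cell above could also be expressed as a reshape plus a merge. The sketch below is an assumed alternative on made-up toy data ("Fire X" and the toy `trees` frame are hypothetical): it numbers the distinct fires on each plot, pivots them into INCIDENT1, FIREYEAR1, AREA1, PERIM1, ... columns, and attaches them to second measurements only.

```python
import pandas as pd

nan = float("nan")

# Toy stand-ins for FIREPLOTS_FIRE_SANDWICH and TREES_FOR_US (values made up).
fire_sandwich = pd.DataFrame({
    "UNIQUE_PLOT_ID": ["8 7 66742"] * 4 + ["9 19 75050"] * 2,
    "INCIDENT": ["Deep Harbor"] * 2 + ["Domke Lake"] * 2 + ["Fire X"] * 2,
    "FIRE_YEAR_": [2004, 2004, 2007, 2007, 2015, 2015],
    "area": [123553700.0] * 2 + [47689180.0] * 2 + [1000000.0] * 2,
    "perimeter": [75743.18] * 2 + [47513.50] * 2 + [4000.0] * 2,
})
trees = pd.DataFrame({
    "CN": [1, 2, 3, 4],
    "PREV_TRE_CN": [nan, 1, nan, 3],
    "UNIQUE_PLOT_ID": ["8 7 66742", "8 7 66742", "9 19 75050", "9 19 75050"],
})

# One row per distinct fire on each plot, numbered 1, 2, ... in row order.
fires = fire_sandwich.drop_duplicates(
    ["UNIQUE_PLOT_ID", "INCIDENT", "FIRE_YEAR_"]
).copy()
fires["k"] = fires.groupby("UNIQUE_PLOT_ID").cumcount() + 1

# Spread the fires across columns: one block of columns per fire number.
wide = fires.pivot(
    index="UNIQUE_PLOT_ID",
    columns="k",
    values=["INCIDENT", "FIRE_YEAR_", "area", "perimeter"],
)

# Flatten the (name, k) column pairs into INCIDENT1, FIREYEAR1, AREA1, PERIM1, ...
prefix = {"INCIDENT": "INCIDENT", "FIRE_YEAR_": "FIREYEAR",
          "area": "AREA", "perimeter": "PERIM"}
wide.columns = [f"{prefix[name]}{k}" for name, k in wide.columns]
wide = wide.reset_index()

# Attach fire info to second measurements only, as the loop does.
second = trees[trees["PREV_TRE_CN"].notna()]
second_with_fires = second.merge(wide, on="UNIQUE_PLOT_ID", how="left")
print(second_with_fires[["CN", "INCIDENT1", "INCIDENT2"]])
```

Plots with fewer fires than the maximum simply get missing values in the higher-numbered columns, which matches the layout the loop produces.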

We can look at the trees that were in both the Deep Harbor fire and the Domke Lake fire.  

In [75]:
TREES_FOR_US[(TREES_FOR_US['INCIDENT1']=='Deep Harbor') & (TREES_FOR_US['INCIDENT2']=='Domke Lake')]

Unnamed: 0,CN,PLT_CN,PREV_TRE_CN,INVYR,STATECD,UNITCD,COUNTYCD,PLOT,SUBP,TREE,...,AREA2,PERIM2,INCIDENT3,FIREYEAR3,AREA3,PERIM3,INCIDENT4,FIREYEAR4,AREA4,PERIM4
251742,177512750020004,30764138020004,24126080000000.0,2012,53,8,7,66742,1,100,...,47689184.150893,47513.496931,,,,,,,,
251743,177512751020004,30764138020004,24126090000000.0,2012,53,8,7,66742,1,101,...,47689184.150893,47513.496931,,,,,,,,
251744,177512754020004,30764138020004,24126090000000.0,2012,53,8,7,66742,2,102,...,47689184.150893,47513.496931,,,,,,,,
251745,177512755020004,30764138020004,24126090000000.0,2012,53,8,7,66742,2,103,...,47689184.150893,47513.496931,,,,,,,,
251746,177512756020004,30764138020004,24126090000000.0,2012,53,8,7,66742,2,104,...,47689184.150893,47513.496931,,,,,,,,
251747,177512757020004,30764138020004,24126100000000.0,2012,53,8,7,66742,2,105,...,47689184.150893,47513.496931,,,,,,,,
251748,177512758020004,30764138020004,24126100000000.0,2012,53,8,7,66742,3,106,...,47689184.150893,47513.496931,,,,,,,,
251749,177512759020004,30764138020004,24126100000000.0,2012,53,8,7,66742,3,107,...,47689184.150893,47513.496931,,,,,,,,
251750,177512760020004,30764138020004,24126100000000.0,2012,53,8,7,66742,3,108,...,47689184.150893,47513.496931,,,,,,,,
251751,177512761020004,30764138020004,24126100000000.0,2012,53,8,7,66742,3,109,...,47689184.150893,47513.496931,,,,,,,,


In [76]:
TREES_FOR_US[pd.isna(TREES_FOR_US['INCIDENT1'])]['PREV_TRE_CN'].info()

<class 'pandas.core.series.Series'>
Index: 13361 entries, 25391 to 504079
Series name: PREV_TRE_CN
Non-Null Count  Dtype  
--------------  -----  
98 non-null     float64
dtypes: float64(1)
memory usage: 208.8 KB


Quick check above: almost all of the trees with a null INCIDENT1 (None/NaN) are first measurements, i.e. they haven't burned yet and have no PREV_TRE_CN.  
- 98 trees have a null INCIDENT1 but do have PREV_TRE_CN values.  

Let's just get rid of the trees that were only measured once, using Allie's TREES_FOR_US_multipleMeasurements df.  

In [77]:
len(TREES_FOR_US)

25518

In [78]:
TREES_FOR_US.loc[TREES_FOR_US['CN'].isin(first_trees_prev_nonNaN['CN']), 'PREV_TRE_CN'] = pd.NA

In [79]:
TREES_FOR_US_FINAL = TREES_FOR_US[~TREES_FOR_US['CN'].isin(single_measurement_trees['CN'])]

In [80]:
len(TREES_FOR_US_FINAL)

24314

In [81]:
TREES_FOR_US_FINAL.to_csv('../Data/TREES_FOR_US.csv')

In [82]:
TREES_FOR_US_FINAL[pd.isna(TREES_FOR_US_FINAL['INCIDENT1'])]['PREV_TRE_CN'].info()

<class 'pandas.core.series.Series'>
Index: 12157 entries, 25391 to 235657
Series name: PREV_TRE_CN
Non-Null Count  Dtype  
--------------  -----  
0 non-null      float64
dtypes: float64(1)
memory usage: 190.0 KB


Above: showing that all the trees with a null INCIDENT1 also have a null PREV_TRE_CN.

In [None]:
first = TREES_FOR_US_FINAL[pd.isna(TREES_FOR_US_FINAL['PREV_TRE_CN'])]
second = TREES_FOR_US_FINAL[~pd.isna(TREES_FOR_US_FINAL['PREV_TRE_CN'])]

print(len(first),len(second))

#first['CN'].nunique()
#second['PREV_TRE_CN'].nunique()

len(first[first['CN'].isin(second['PREV_TRE_CN'])])




12157 12157


12157

In [92]:
from sklearn.model_selection import train_test_split

In [93]:
first_train, first_test = train_test_split(first, test_size = .2, random_state=216, shuffle=True)

In [94]:
second_train = second[second['PREV_TRE_CN'].isin(first_train['CN'])]

In [97]:
#len(first_train)
len(second_train)

9725
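
A quick consistency check on the paired split can be sketched as follows. This toy version uses pandas' `DataFrame.sample` as a stand-in for `train_test_split`, and the `first`/`second` frames and their values are made up; the point is only that each split should contain complete first/second pairs.

```python
import pandas as pd

# Toy first and second measurements linked through PREV_TRE_CN (values made up).
first = pd.DataFrame({"CN": [1, 2, 3, 4, 5]})
second = pd.DataFrame({
    "CN": [11, 12, 13, 14, 15],
    "PREV_TRE_CN": [1, 2, 3, 4, 5],
})

# pandas' sample stands in for sklearn's train_test_split in this sketch.
first_train = first.sample(frac=0.8, random_state=216)
first_test = first.drop(first_train.index)

# Second measurements follow their first measurement into the same split.
second_train = second[second["PREV_TRE_CN"].isin(first_train["CN"])]
second_test = second[second["PREV_TRE_CN"].isin(first_test["CN"])]

# Each split should hold complete pairs and share no record with the other.
assert len(first_train) == len(second_train)
assert len(first_test) == len(second_test)
assert not set(second_train["CN"]) & set(second_test["CN"])
print(len(first_train), len(first_test))
```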