# Welcome  

Notebook Author: Samuel Alter  
Notebook Subject: Capstone Project - Fire Perimeters Processing

BrainStation Winter 2023: Data Science

This notebook reads the `.geojson` fire perimeter files created in `QGIS`, handles the `NaN` rows, and cleans the data for the next step: modeling the perimeter dataset with `statsmodels` and `sklearn`.

In [1]:
# imports

import numpy as np
import pandas as pd
import geopandas as gpd

# Join fire and nofire datasets together

I created two point layers: one in areas that experienced no fire and one in areas that experienced fire. Each layer had elevation and aspect values joined to it from the underlying raster layers. Because each layer lies completely within either a fire or a nofire area, I already know the fire status of every point. I simply have to concatenate the two layers to get a dataset of fire/nofire point locations, which I can then feed into a model.

## Read in data

In [2]:
layer_fire=gpd.read_file('/Users/sra/Desktop/Data_Science_2023/_capstone/00_capstone_data/shapefiles/joins/patch_fire_elev_asp.geojson')
print(layer_fire.shape)
display(layer_fire.head())

(9918, 6)


Unnamed: 0,id,layer,path,elevation1,aspect1,geometry
0,0,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,52.0,14.74356,POINT (310476.000 3778264.200)
1,1,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,45.0,21.03751,POINT (310552.800 3778264.200)
2,2,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,45.0,333.434967,POINT (310629.600 3778264.200)
3,3,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,51.0,283.392487,POINT (310706.400 3778264.200)
4,4,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,63.0,229.289154,POINT (310783.200 3778264.200)


In [14]:
# are there any duplicates?
dups=layer_fire[layer_fire.duplicated()]
print(dups)

Empty GeoDataFrame
Columns: [id, layer, path, elevation1, aspect1, geometry]
Index: []


In [227]:
layer_nofire=gpd.read_file('/Users/sra/Desktop/Data_Science_2023/_capstone/00_capstone_data/shapefiles/joins/patch_nofire_elev_asp.geojson')
print(layer_nofire.shape)
display(layer_nofire.head())

(9918, 6)


Unnamed: 0,id,layer,path,elevation1,aspect1,geometry
0,0,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,229.0,90.0,POINT (361436.400 3782022.600)
1,1,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,230.0,180.0,POINT (361513.200 3782022.600)
2,2,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,227.0,147.528809,POINT (361590.000 3782022.600)
3,3,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,226.0,149.03624,POINT (361666.800 3782022.600)
4,4,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,228.0,206.565048,POINT (361743.600 3782022.600)


## Inspect dataframes for `NaN`: use `.isna()` and impute values if necessary

In [228]:
layer_nofire.isna().sum()

id              0
layer           0
path            0
elevation1      0
aspect1       194
geometry        0
dtype: int64

In [229]:
layer_nofire[layer_nofire['aspect1'].isna()]

Unnamed: 0,id,layer,path,elevation1,aspect1,geometry
60,60,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,209.0,,POINT (366044.400 3782022.600)
99,99,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,207.0,,POINT (369039.600 3782022.600)
146,146,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,201.0,,POINT (372649.200 3782022.600)
156,156,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,197.0,,POINT (373417.200 3782022.600)
171,171,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,192.0,,POINT (374569.200 3782022.600)
...,...,...,...,...,...,...
9744,2220,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,5.0,,POINT (308350.200 3778880.400)
9747,2223,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,4.0,,POINT (308580.600 3778880.400)
9762,2238,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,6.0,,POINT (309732.600 3778880.400)
9782,2258,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,5.0,,POINT (308350.200 3778803.600)


In [230]:
perc_nan_nofire=(layer_nofire['aspect1'].isna().sum())/(layer_nofire.shape[0])*100
perc_nan_nofire

1.9560395240976005

In [231]:
print(f'The percentage of null values in the aspect column is:\n~{round(perc_nan_nofire,2)}')

The percentage of null values in the aspect column is:
~1.96


In [232]:
layer_fire.isna().sum()

id            0
layer         0
path          0
elevation1    0
aspect1       1
geometry      0
dtype: int64

In [233]:
layer_fire[layer_fire['aspect1'].isna()]

Unnamed: 0,id,layer,path,elevation1,aspect1,geometry
78,78,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,101.0,,POINT (316466.400 3778264.200)


In [234]:
perc_nan_fire=(layer_fire['aspect1'].isna().sum())/(layer_fire.shape[0])*100
perc_nan_fire

0.01008267795926598

In [235]:
print(f'The percentage of null values in the aspect column is:\n~{round(perc_nan_fire,2)}')

The percentage of null values in the aspect column is:
~0.01
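
The share of missing values computed above can also be obtained in one step, because the mean of a boolean mask is the fraction of `True` entries. The following is a minimal sketch on a made-up toy dataframe (the name `toy` and its values are assumptions for illustration, not the project data):

```python
import numpy as np
import pandas as pd

# toy stand-in for a layer: four points, one with a missing aspect
toy = pd.DataFrame({'aspect1': [90.0, np.nan, 180.0, 270.0]})

# isna() gives a boolean mask; its mean is the fraction of missing rows
perc_nan = toy['aspect1'].isna().mean() * 100
print(f'{perc_nan:.2f}% of aspect1 values are missing')
```

This is equivalent to dividing `isna().sum()` by the number of rows, as done in the cells above.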


I want to impute an aspect value for the `NaN` rows. Looking at the map, the aspect raster actually has a value at these points, and I'm not sure why the join returned `NaN`. There are too many points to update manually. To impute the aspect, I could assign a random value to each point or copy values from adjacent points. I will try a nearest-neighbors approach to impute the missing data.

In [236]:
from scipy.spatial import KDTree

# create a KDTree from the x,y coordinates of the points
tree = KDTree(np.array(layer_nofire.geometry.apply(lambda geom: (geom.x, geom.y))).tolist())

# boolean mask marking the rows whose 'aspect1' value is NaN
nan_idx = layer_nofire['aspect1'].isna()

# iterate over the index labels of the NaN rows and impute the values
for idx in nan_idx[nan_idx].index:
    # query the 4 points closest to the current point; because the point is
    # itself in the tree, it is returned as one of the 4 (at distance 0)
    _, neighbor_idx = tree.query(np.array(layer_nofire.loc[idx].geometry.coords)[0], k=4)
    
    # average the neighbors' 'aspect1' values; dropna() discards the point's
    # own NaN, so at most 3 neighbor values contribute to the mean
    # (the KDTree positions match the labels of the default integer index)
    neighbor_vals = layer_nofire.loc[neighbor_idx].aspect1.dropna()
    imputed_val = neighbor_vals.mean()
    
    # set the imputed value for the current index
    layer_nofire.loc[idx, 'aspect1'] = imputed_val

In [237]:
layer_nofire.iloc[60,:]

id                                                           60
layer                                         patch_city_points
path          /Users/sra/Desktop/Data_Science_2023/_capstone...
elevation1                                                209.0
aspect1                                              151.466237
geometry                    POINT (366044.3999999993 3782022.6)
Name: 60, dtype: object

In [238]:
layer_nofire.isna().sum()

id            0
layer         0
path          0
elevation1    0
aspect1       0
geometry      0
dtype: int64

It worked! Now for the `layer_fire` dataset.

In [239]:
# create a KDTree from the x,y coordinates of the points
tree = KDTree(np.array(layer_fire.geometry.apply(lambda geom: (geom.x, geom.y))).tolist())

# boolean mask marking the rows whose 'aspect1' value is NaN
nan_idx = layer_fire['aspect1'].isna()

# iterate over the index labels of the NaN rows and impute the values
for idx in nan_idx[nan_idx].index:
    # query the 4 points closest to the current point; because the point is
    # itself in the tree, it is returned as one of the 4 (at distance 0)
    _, neighbor_idx = tree.query(np.array(layer_fire.loc[idx].geometry.coords)[0], k=4)
    
    # average the neighbors' 'aspect1' values; dropna() discards the point's
    # own NaN, so at most 3 neighbor values contribute to the mean
    # (the KDTree positions match the labels of the default integer index)
    neighbor_vals = layer_fire.loc[neighbor_idx].aspect1.dropna()
    imputed_val = neighbor_vals.mean()
    
    # set the imputed value for the current index
    layer_fire.loc[idx, 'aspect1'] = imputed_val

In [240]:
layer_fire.isna().sum()

id            0
layer         0
path          0
elevation1    0
aspect1       0
geometry      0
dtype: int64

In [241]:
layer_fire.iloc[78,:]

id                                                           78
layer                                        patch_fire1_points
path          /Users/sra/Desktop/Data_Science_2023/_capstone...
elevation1                                                101.0
aspect1                                              273.147054
geometry                    POINT (316466.3999999991 3778264.2)
Name: 78, dtype: object

It worked!
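
The two imputation cells above repeat the same logic, so it could be wrapped in a reusable function. The sketch below is one possible version, not the notebook's exact code: `impute_from_neighbors` is a hypothetical helper, it assumes a plain dataframe with numeric `x` and `y` columns instead of a geometry column, it queries `k + 1` points and discards the query point itself so that `k` true neighbors are considered, and it reads from a snapshot of the original values rather than updating in place.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def impute_from_neighbors(df, col, k=3):
    """Fill NaNs in `col` with the mean of the k nearest other points.

    Hypothetical helper: assumes `df` has numeric 'x' and 'y' columns.
    """
    coords = df[['x', 'y']].to_numpy(dtype=float)
    values = df[col].to_numpy(dtype=float)  # snapshot of the original values
    tree = KDTree(coords)
    filled = values.copy()

    for pos in np.flatnonzero(np.isnan(values)):
        # ask for k + 1 points because the query point is itself in the tree
        _, nbrs = tree.query(coords[pos], k=k + 1)
        # drop the point itself and any padding index returned when the
        # tree holds fewer than k + 1 points
        nbrs = [n for n in np.atleast_1d(nbrs) if n != pos and n < len(values)][:k]
        neighbor_vals = values[np.array(nbrs, dtype=int)]
        # ignore neighbors that are themselves missing
        neighbor_vals = neighbor_vals[~np.isnan(neighbor_vals)]
        if neighbor_vals.size:
            filled[pos] = neighbor_vals.mean()

    out = df.copy()
    out[col] = filled
    return out


# toy 3x3 grid with one missing value in the centre (made-up numbers)
toy = pd.DataFrame({
    'x': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'y': [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'aspect1': [10.0, 20.0, 30.0, 40.0, np.nan, 60.0, 70.0, 80.0, 90.0],
})
result = impute_from_neighbors(toy, 'aspect1', k=4)
print(result.loc[4, 'aspect1'])  # mean of the four edge-adjacent values
```

Returning a copy leaves the input untouched, which makes it easy to compare the dataframe before and after imputation.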

## Combine `layer_fire` and `layer_nofire` datasets

First, I need to create a column denoting whether each point comes from the fire layer or the nofire layer.

In [242]:
layer_fire['fire']=1
layer_fire.head(3)

Unnamed: 0,id,layer,path,elevation1,aspect1,geometry,fire
0,0,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,52.0,14.74356,POINT (310476.000 3778264.200),1
1,1,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,45.0,21.03751,POINT (310552.800 3778264.200),1
2,2,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,45.0,333.434967,POINT (310629.600 3778264.200),1


In [243]:
layer_nofire['fire']=0
layer_nofire.head(3)

Unnamed: 0,id,layer,path,elevation1,aspect1,geometry,fire
0,0,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,229.0,90.0,POINT (361436.400 3782022.600),0
1,1,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,230.0,180.0,POINT (361513.200 3782022.600),0
2,2,patch_city_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,227.0,147.528809,POINT (361590.000 3782022.600),0


In [244]:
layer_combine=pd.concat([layer_fire,layer_nofire],axis=0)
layer_combine

Unnamed: 0,id,layer,path,elevation1,aspect1,geometry,fire
0,0,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,52.0,14.743560,POINT (310476.000 3778264.200),1
1,1,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,45.0,21.037510,POINT (310552.800 3778264.200),1
2,2,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,45.0,333.434967,POINT (310629.600 3778264.200),1
3,3,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,51.0,283.392487,POINT (310706.400 3778264.200),1
4,4,patch_fire1_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,63.0,229.289154,POINT (310783.200 3778264.200),1
...,...,...,...,...,...,...,...
9913,2389,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,22.0,270.000000,POINT (309655.800 3778573.200),0
9914,2390,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,18.0,59.036243,POINT (309732.600 3778573.200),0
9915,2391,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,20.0,264.805573,POINT (309809.400 3778573.200),0
9916,2392,patch_farm_points,/Users/sra/Desktop/Data_Science_2023/_capstone...,17.0,63.434952,POINT (309886.200 3778573.200),0


In [245]:
layer_combine.describe()

Unnamed: 0,id,elevation1,aspect1,fire
count,19836.0,19836.0,19836.0,19836.0
mean,3142.362069,281.611867,178.903351,0.5
std,2213.394884,203.469005,103.025075,0.500013
min,0.0,1.0,0.437359,0.0
25%,1239.0,185.0,93.789381,0.0
50%,2564.5,212.0,176.790237,0.5
75%,5044.0,357.0,261.869904,1.0
max,7523.0,924.0,360.0,1.0


In [246]:
layer_combine[layer_combine['fire']==0].describe()

Unnamed: 0,id,elevation1,aspect1,fire
count,9918.0,9918.0,9918.0,9918.0
mean,3142.362069,158.715568,176.766364,0.0
std,2213.450681,88.066381,105.962961,0.0
min,0.0,1.0,1.4688,0.0
25%,1239.25,171.0,90.0,0.0
50%,2564.5,197.0,171.869904,0.0
75%,5043.75,208.0,270.0,0.0
max,7523.0,324.0,360.0,0.0


In [247]:
layer_combine[layer_combine['fire']==1].describe()

Unnamed: 0,id,elevation1,aspect1,fire
count,9918.0,9918.0,9918.0,9918.0
mean,3142.362069,404.508167,181.040339,1.0
std,2213.450681,211.749173,99.960587,0.0
min,0.0,32.0,0.437359,1.0
25%,1239.25,237.0,107.251465,1.0
50%,2564.5,357.0,178.736259,1.0
75%,5043.75,569.0,257.900124,1.0
max,7523.0,924.0,360.0,1.0


Mean elevation is about $282$ meters in the combined dataset.
* In the fire areas, the mean elevation is about $405$ meters.
* In the nofire areas, the mean elevation is about $159$ meters.

Mean aspect is almost identical between the two areas: about $181$ degrees in the fire areas and about $177$ degrees in the nofire areas.
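
One caveat when comparing aspect: it is a compass bearing, so $0$ and $360$ degrees point in the same direction, and an arithmetic mean can be misleading for values that straddle north. A circular mean avoids this by averaging the sine and cosine components. The sketch below uses made-up angles, and `circular_mean_deg` is a hypothetical helper name, not part of the notebook:

```python
import numpy as np


def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees, returned in [0, 360)."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # average the unit vectors, then convert the resulting vector to an angle
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return np.rad2deg(mean_angle) % 360.0


# two slopes that both face almost due north (made-up values)
north_facing = [350.0, 10.0]
print(np.mean(north_facing))            # arithmetic mean points south
print(circular_mean_deg(north_facing))  # circular mean points north (0/360)
```

Applying the same function to the `aspect1` values of each group would show whether the near-identical arithmetic means also hold for the mean direction.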

In [248]:
# clean up dataframe to have just elevation, aspect, and fire

layer_combine=layer_combine[['elevation1','aspect1','fire']]
layer_combine

Unnamed: 0,elevation1,aspect1,fire
0,52.0,14.743560,1
1,45.0,21.037510,1
2,45.0,333.434967,1
3,51.0,283.392487,1
4,63.0,229.289154,1
...,...,...,...
9913,22.0,270.000000,0
9914,18.0,59.036243,0
9915,20.0,264.805573,0
9916,17.0,63.434952,0


In [249]:
# rename the columns and rebuild a clean 0..n-1 index
# (rename returns a new dataframe, which avoids the SettingWithCopyWarning
# raised when assigning into a slice of the original dataframe)
layer_combine=layer_combine.rename(columns={'elevation1':'elevation','aspect1':'aspect'})
layer_combine=layer_combine.reset_index(drop=True)
layer_combine

Unnamed: 0,elevation,aspect,fire
0,52.0,14.743560,1
1,45.0,21.037510,1
2,45.0,333.434967,1
3,51.0,283.392487,1
4,63.0,229.289154,1
...,...,...,...
19831,22.0,270.000000,0
19832,18.0,59.036243,0
19833,20.0,264.805573,0
19834,17.0,63.434952,0


### Write `layer_combine` to a `.csv`

In [250]:
path='/Users/sra/Desktop/Data_Science_2023/_capstone/00_capstone_data/shapefiles/joins/layer_combine.csv'

In [251]:
layer_combine.to_csv(path_or_buf=path,index=False)

# Now the `layer_combine.csv` file will be used in the Geoanalysis notebook.