-
Notifications
You must be signed in to change notification settings - Fork 1
/
PhilaFire_Final_Presi_1.Rmd
1243 lines (1019 loc) · 50.8 KB
/
PhilaFire_Final_Presi_1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Understanding and Forecasting the Community Impacts of Structure Fire"
author: "Kendra Hills, Myron Bañez, & Ben Keel"
date: "MUSA Practicum Feb 2023"
output:
html_document:
toc: yes
toc_float: yes
toc_depth: 3
fig_caption: yes
theme: lumen
---
## **Introduction**
![](/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Picture1.png)
### Abstract
This projects explores the experiences of properties and neighborhoods after fires occur with the purpose of developing a predictive model that can inform the Philadelphia Fire Department (PFD) on the likelihood of redevelopment or vacancies for fire impacted structures. This intelligence will allow PFD and partner agencies to understand when pro-active expertise and services might have their highest impact. We want to thank Commissioner Adam Theil, Kathy Matheson, Andrew Newell the PFD, and our instructors Matt Harris and Michael Fichman for their guidance and support for this project.
### Motivation & Use Case
In 2022, there were 1.2 million structure fires in the country that led to 2,500 deaths — 276 of them children. Last year major cities like Philadelphia and New York grappled with severe and deadly structure fires. In Philadelphia specifically,41 people died from fires, while nearly 200 were injured,and thousands were displaced.
As it stands, the PFD has very limited knowledge about what happens after their job is complete, and there is no programmed set of economic development interventions that are used by public agencies in Philadelphia as a response to fire. The PFD expressed their desire to better understand and predict consequences of a fire so that they can better understand recovery patterns.
Through a storytelling lens that will contextualize our research at the incident level, this project development will provide the PFD with predictions of X for each property, depending on fire severity, and visualize it to allow them to study and gain understanding of aftermath trends. Our application will be be used as an interactive tool to assist PFD and partner agencies to understand when pro-active expertise and services might have their highest impact.
```{r Setup, include=FALSE}
knitr::opts_chunk$set(echo= TRUE, warning = FALSE, message = FALSE)
# Set Up
library(boxr)
library(mapview)
library(sf)
library(tidyverse)
library(knitr)
library(kableExtra)
library(tigris)
library(viridis)
library(dplyr)
library(tidycensus)
library(ggplot2)
library(RSocrata)
library(lubridate)
library(janitor)
library(proxy)
library(FNN)
library(plotROC)
library(pROC)
library(ggcorrplot)
options(scipen = 999)
mapTheme <- theme(plot.title =element_text(size=12),
plot.subtitle = element_text(size=8),
plot.caption = element_text(size = 6),
axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_line(colour = 'transparent'),
panel.grid.minor=element_blank(),
legend.direction = "vertical",
legend.position = "right",
#plot.margin = margin(1, 1, 1, 1, 'cm'),
legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))
#Color Palettes
palette2 <- c("#b9cfcf", "#e19825")
palette3_sat <- c("#e19825","#d55816","#7b230b")
palette3_desat <- c("#B19C7D","#7F5F52","#262626")
palette4 <- c("#f1c82b","#e19825","#d55816","#7b230b")
palette4_desat <- c("#B19C7D","#B27D49","#7F5F52","#262626")
palette5_sat <- c("#f1c82b","#e19825","#d55816","#7b230b","#413028")
pallette5_desat <- c("#ead5b7","#d2b190","#b18e6f","#7f5f52","#413028")
palette7_cats <- c("#b9cfcf","#20b1ae","#e19825","#7b230b","#b47c49", "#3f3128", "#8f8172")
#Sources for Graphs
creditFire <- "Source: Philadelphia Fire Department"
creditOpen <- "Source: Open Data Philly"
g<-glimpse
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")
```
```{r Box Set Up, include=FALSE}
#Loading in Fire Data
## INSERT BOX WORKFLOW HERE ##
box_auth(client_id = "zhmuvtkofhuu21py025uhr8cffzew2om",
client_secret = "k39pMJM1FUnBmGnRrM9zVUWk1eth0EVi")
box_setwd(186732420366)
box_getwd()
box_ls()
list <- box_ls() %>% as.data.frame()
structureFire <- box_read_excel(1093000179542)
dat <- structureFire
```
```{r Cleaning The Dataset, include=FALSE}
# Creating geometry for the fires
dat <- dat %>% drop_na("Longitude", "Latitude") %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = "EPSG:4326")
# Cleaning the column names
dat<-clean_names(dat)
# Address string
dat <-
dat %>% mutate(street_type = ifelse(street_type == 'AV', "AVE", street_type)) %>%
unite(address, c('address_number', 'street_prefix', 'street_name', 'street_type'), sep = " ", remove = FALSE, na.rm=TRUE)
# Extracting quarter
dat <- dat %>% mutate(quarter = floor_date(alarm_date, unit="quarter"))
# Reducing columns
dat <- dat %>%
dplyr::select(address, quarter, property_use, incident_number, number_of_exposures, incident_type, building_status, fire_spread, no_spread, code_description, geometry, alarm_date, cad_nature_code_description,
minor_damage, significant_damage, heavy_damage, extreme_damage)
# Removing duplicates
dat <- dat[!duplicated(dat$incident_number),]
```
## **Exploratory Analysis**
### The Data: Understanding Fire Incidents
With the help of the PFD, we were given access to proprietary data that consist of rich and extensive fire data collected by the department dating back to 2009.
To better understand the outcomes of fire impacted fires,we first conducted prelminatry literature review research on why fires occur in the first place. The following model, originally developed by Charles Jennings(1996),is a conceptualized model that represents the interrelationships between environmental, structural, and human factors as they relate to fire. We find this model useful as a way to devlop more powerful predictors of the incidence of fire and nuanced model to determine their social and economic impacts in the future.
![Jennings, 1996](/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/Presentation images /Concept_diagram.png)
### How Many Fires Occur Per Address?
```{r Counting the Number of Fires Per Address, echo=FALSE, message=FALSE, warning=FALSE}
#Count the number of Fires per address
nFires_perAddress <- dat%>%
st_drop_geometry()%>%
count(address, sort=TRUE)%>%
left_join(dplyr::select(dat, address), by="address")%>% #removed na.rm=TRUE
st_as_sf()
#remove duplicates from above
nFires_perAddress <- nFires_perAddress[!duplicated(nFires_perAddress$address),]
#Barplot of Counts of Fires for Each Address
nFires_perAddress%>%
filter(n < 7)%>%
ggplot()+
geom_bar(mapping=aes(x=as.factor(n)), fill="#A5300F")+
labs(title="Number of Fires Per Address",
subtitle="Philadelphia County, 2009-2022")+
xlab("Count of Fires")+
ylab("Number of Structures")+
theme(panel.background = element_rect(fill = "#f3efe0"))
```
There have been at least one incident of a fire for over 15,000 structures in Philadelphia.
### Counts Over Time
```{r Fire Counts Over Time, echo=FALSE}
#Line plot of Fires per quarter, Min to Max
dat %>%
ggplot(aes(x=as_date(quarter))) +
geom_bar(fill="#A5300F")+
labs(title = "Quarterly Count of Unique Fire Incidents",
subtitle = "Philadelphia County, 2009 Q1 - 2022 Q4",
y = "Number of Fires")+
scale_x_date(name = "Year", date_breaks = "1 year")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
```
There is little unique patterns to the data, and it appears that fires have been relatively consistent over the last 15 years.
- Note: fires dip from July to September
### Property Use
Focusing on residential cou
```{r Building Use, echo=FALSE}
#Building Use for All Fires
#Plotting the Frequency of Fires based on Property Use
nFires_perAddress_BuildAll <- nFires_perAddress %>%
left_join(st_drop_geometry(dplyr::select(dat, property_use, address)), by="address")
#Bar plot of property use counts
dplyr::select(nFires_perAddress_BuildAll, -address)%>%
st_drop_geometry()%>%
gather(Variable, value, -n)%>%
count(Variable, value)%>%
group_by(Variable)%>%
filter(n > 150)%>%
ggplot(., aes(value, n))+
geom_bar(position = "dodge", stat="identity", fill="#A5300F") +
labs(x="Category", y="Frequency",
title = "Top 10 Property Uses Among All Structure Fires",
subtitle = "Philadelphia County, 2009-2022")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
```
If we cut out non-residential buildings, which have very different types of fires and post-fire outcomes than residential buildings (supercategory 4), we still keep the vast majority of the data.
```{r eval=FALSE, include=FALSE}
#Count Fires by PropUse and Add Super Category column
## Error with lines 203-205
#nFires_perAddress_PropUse <- nFires_perAddress %>%
# left_join(st_drop_geometry(dplyr::select(structureFire_sf_addressU,`Property Use`, address)), by="address")%>%
#mutate(Property_Use_SuperCat = substr(`Property Use`, 1, 1))
#Commenting this out for now, fix later!
```
```{r Share of Property Use Amongst Super Categories, echo=FALSE}
#Chart of Building Status counts
nFires_perAddress_BuildAll%>%
mutate(Property_Use_SuperCat = substr(property_use, 1, 1))%>%
dplyr::select(-address, -property_use)%>%
st_drop_geometry()%>%
gather(Variable, value, -n)%>%
count(Variable, value, sort = TRUE)%>%
group_by(Variable)%>%
summarize(`Property Use Supercategory` = value, `Share (%)` = round((n/sum(n)*100), 2))%>%
kable()%>%
kable_styling()
```
### Measures of Fire Severity
When measuring outcomes of buildings that experience a fire, the obvious question is how severe the fire was. There are multiple possible determinants in the data. We ended up using the first one, the ordinal variable labeled "fire spread". Here's how we made our decision:
- Disruption of fire spread within he fire spread severity. SO using fire spread is ideal
- Incident type: weighted to one cateogry, not as useful
- No record for a lot of fire incidents
#### Fire Spread
In our research, time had a big correlation with damage. The longer fire burns close to metal and concrete, the hotter those critical infrastructure pieces get, and the less able they are to hold weight.
Source: https://www.homego.com/blog/house-fire-damage/
This was the simplest and most normally-distributed feature as compared to the three others we considered.
```{r Bar Chart of Fire Spread, echo=FALSE}
dat %>%
st_drop_geometry()%>%
count(fire_spread)%>%
ggplot()+
geom_col(mapping=aes(x=as.factor(fire_spread), y=n, fill=fire_spread))+
labs(title = "Number of Fires by Fire Spread",
subtitle = "Philadelphia County, 2009-2022",
caption = creditFire,
x="Fire Spread Code",
y="Number of Fires")+
scale_fill_manual(values = palette5_sat,
name = "Fire Spread \nConfined To:",
labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
theme(plot.title = element_text(size=18),
panel.background = element_rect(fill = "#f3efe0"))
```
#### Floor Damage Counts
Originally these counts seemed useful as direct measures of damage, but the large amount of "no record" instances means that it can't function as a reliable metric.
```{r Fire Damage Counts Setup, include=FALSE}
g(dat)
#How to measure severity in detail beyond
sFire_severity <- dat%>%
st_drop_geometry()%>%
dplyr::select(incident_type, minor_damage, significant_damage, heavy_damage, extreme_damage, fire_spread)%>%
mutate(Worst_Damage = ifelse(extreme_damage > 0, "Extreme",
ifelse(heavy_damage > 0, "Heavy",
ifelse(significant_damage > 0, "Significant",
ifelse(minor_damage > 0, "Minor", "No Record")))))%>%
count(Worst_Damage, fire_spread)
```
```{r Fire Damage Counts, echo=FALSE}
ggplot(sFire_severity)+
geom_col(mapping=aes(x=as.factor(Worst_Damage), y=n, fill=fire_spread))+
labs(title = "Number of Fires by Worst Recorded Floor Damage",
subtitle = "Philadelphia County, 2009-2022",
caption = creditFire,
x="Incident Type Code",
y="Number of Fires")+
scale_fill_manual(values = palette5_sat,
name = "Fire Spread \nConfined To:",
labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
theme(plot.title = element_text(size=18),
panel.background = element_rect(fill = "#f3efe0"))
```
#### Cad Nature Code Description
We have more specific categories, like CAD nature code description.
```{r undefined echo=FALSE}
#Count the unique values for CAD
nFires_CADDescr <- dat %>%
st_drop_geometry%>%
group_by(fire_spread)%>%
count(cad_nature_code_description, sort=TRUE)
#Barplot of counts, by count
nFires_CADDescr%>%
filter(n>50)%>%
ggplot()+
geom_col(mapping=aes(x=cad_nature_code_description, y=n, fill=fire_spread))+
labs(title="Frequency of Fire Types",
subtitle="Philadelphia County, 2009-2022, Above 50 Unique Incidents")+
xlab("CAD Nature Code Description")+
ylab("Number of Fires")+
scale_fill_manual(values = palette5_sat,
name = "Fire Spread \nConfined To:",
labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
```
#### Incident Type
Incident type measures whether the interior or exterior of the structure has collapsed. Considering outcomes like demolition, vacancy, and major construction for homes affected by a fire, then collapse seems like a strong candidate for collelation.
However, the vast majority of cases are type 1110: no collapse. This doesn't make it a very helpful measure for severity, especially when we can see fire spread varying so heavily inside these categories.
```{r Incident Type Bar Plot All Severities, echo=FALSE}
dat %>%
st_drop_geometry()%>%
count(incident_type, fire_spread)%>%
ggplot()+
geom_col(mapping=aes(x=as.factor(incident_type), y=n, fill=fire_spread))+
labs(title = "Number of Fires by Incident Type, All Severities",
subtitle = "Philadelphia County, 2009-2022",
caption = creditFire,
x="Incident Type Code",
y="Number of Fires")+
scale_fill_manual(values = palette5_sat,
name = "Fire Spread \nConfined To:",
labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
theme(plot.title = element_text(size=18),
panel.background = element_rect(fill = "#f3efe0"))
```
```{r Incident Type Bar Plot Greater Severities, echo=FALSE}
dat %>%
st_drop_geometry()%>%
count(incident_type, fire_spread)%>%
filter(incident_type != 111 & incident_type != 1110)%>%
ggplot()+
geom_col(mapping=aes(x=as.factor(incident_type), y=n, fill=fire_spread))+
labs(title = "Number of Fires by Incident Type, Greater Severities",
subtitle = "Philadelphia County, 2009-2022",
caption = creditFire,
x="Incident Type Code",
y="Number of Fires")+
scale_fill_manual(values = palette5_sat,
name = "Fire Spread \nConfined To:",
labels = c("Object", "Room", "Floor", "Building", "Beyond", "NA"))+
theme(plot.title = element_text(size=18),
panel.background = element_rect(fill = "#f3efe0"))
```
As a result, we picked fire spread as our measure of severity.
### Outliers
We will classify an outlier as an observation more than 3 standard deviations away from the population mean.
The mean number of fires per location is 1.115, weighted by the large amount of places where only 1 fire has occurred. The standard deviation of this fire population is 0.521. The result, 2.678, means any of the 293 locations with three or more fires will be classified as an outlier.
```{r ACS Data Loading, include=FALSE}
acs_vars <- c("B01001_001E")
acsTractsPHL.2020 <- get_acs(geography = "tract",
year = 2020,
variables = acs_vars,
geometry = TRUE,
state = "PA",
county = "Philadelphia",
output = "wide")
```
```{r undefined, echo=FALSE}
nFires_perAddress_Outliers <- filter(nFires_perAddress, n>2)
ggplot()+
geom_sf(data=acsTractsPHL.2020, fill='#f0efe0', color='dark gray')+
geom_sf(data=nFires_perAddress_Outliers, aes(color=q5(n)), alpha=0.5)+
scale_color_manual(values=palette5_sat, labels=qBr(nFires_perAddress_Outliers, "n"))+
labs(title = "Addresses With 3+ Fires") + mapTheme()
```
Apart from centers of population, there is not an obvious geographic distribution of the outliers. More research could be useful with ACS data to determine correlations.
```{r echo=FALSE}
#Plotting the Frequency of Fires based on the Outliers' property use
dplyr::select(nFires_perAddress_BuildAll, -address)%>%
filter(n>2)%>%
st_drop_geometry()%>%
gather(Variable, value, -n)%>%
count(Variable, value)%>%
group_by(Variable)%>%
filter(n>23)%>%
ggplot(., aes(value, n))+
geom_bar(position = "dodge", stat="identity", fill="#A5300F") +
labs(x="Category", y="Frequency",
title = "Top 10 Property Uses Among Addresses with 3+ Fires",
subtitle = "Philadelphia County, 2009-2022",
credit = creditFire)+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
```
There is a difference among property use, in that 215, the designation for "schools, high/junior/middle" is now in the top 3. Code 500, the designation for "mercantile, business, other", also has a greater share than in the complete data set.
Regardless of the cause, these school and commercial fires are very different in their causes and outcomes than a residential fire. This outlier analysis supports our decision to remove the non-residential fires from our research in order to focus what narratives we can observe.
## **Data Wrangling**: Panal Data Analysis
A key element to the exploratory analysis is understanding how fires relate to properties and other relevant data that will help us better predict post fire impacted properties. To start, we have decided to work with 311 complaints, permit data, and property assessment data to further explore and craft differnt post fire scenarios.
### Fire Panel - Initial, Count, and Final Panel
The fire panel is created as the base template containing the key information related to fire incidents. The dataset first undergoes numerous operations to refine the information.
The current structure of the initial dataset contains information regarding fire incidents from January 2009 - December 2022. Despite having the same address, there will be a new record of fire incident for that address as each fire has a unique incident number. In order to see the count and severity of properties with and without a fire incident we create a panel to display the observations of every possible address and time combination.
```{r Initial Panel, echo=TRUE}
# Initial panel
dat.panel <-
expand.grid(quarter = unique(dat$quarter),
address = unique(dat$address))
```
```{r Count Panel, echo=TRUE}
# Count Panel
count.panel <-
dat %>%
st_drop_geometry() %>%
group_by(quarter, address, fire_spread) %>%
count(address, sort=TRUE)
# Changing address to factor for join purposes later
count.panel$address <-
as.factor(count.panel$address)
```
```{r Final Panel, echo=TRUE}
# Final Panel
final.panel <- left_join(dat.panel, count.panel, by=c("address", "quarter")) # Join
final.panel <- final.panel %>% dplyr::select(address, quarter, fire_spread, n) %>% rename(count = n) # Condensing & renaming
final.panel[is.na(final.panel)] <- 0 # Assigning 0 to NA for everything
final.panel$fire_spread <- as.numeric(final.panel$fire_spread) # Making fire_spread numeric
# Calculating the maximum severity score
final.panel <-
final.panel %>%
group_by(address, quarter, count) %>% summarise(severity_index = max(fire_spread))
g(final.panel)
```
### OPA Panel
```{r OPA Upload and building counts, echo=TRUE}
#Data from https://www.opendataphilly.org/dataset/opa-property-assessments
#metadata at: https://metadata.phila.gov/#home/datasetdetails/5543865f20583086178c4ee5/representationdetails/55d624fdad35c7e854cb21a4/?view_287_page=3
# Reading the data
# Kendra
#opa_dat <- read_csv("/Users/kendrae.hills/Desktop/Spring 2023/MUSA_Practicum/opa_properties_public-2.csv")
# Myron
opa_dat <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-opa_properties_public.csv")
# Creating geometry for the properties
opa_dat <- opa_dat%>%
drop_na(lng, lat)%>%
st_as_sf(coords = c("lng", "lat"),
crs = "EPSG:4326")
g(opa_dat)
```
```{r OPA Setup, echo=TRUE}
# Reducing columns
opa_dat_small_sf <- opa_dat[!duplicated(opa_dat$location),] %>%
dplyr::select(location, category_code, category_code_description, building_code, building_code_description, building_code_new, building_code_description_new, total_area, total_livable_area, owner_1, owner_2, market_value, market_value_date, mailing_street, number_of_bedrooms, number_of_bathrooms, number_stories, interior_condition, assessment_date, year_built, year_built_estimate, zoning, quality_grade, central_air, exterior_condition, fireplaces, fuel, taxable_building, topography, type_heater, sale_price, separate_utilities)%>%
rename(address = location)
# Filtering for residential properties
opa_dat_small_sf <- opa_dat_small_sf %>% filter(category_code == 1 | category_code == 2 | category_code == 3)
# Extracting just the addresses
opa_dat_small <- opa_dat_small_sf%>%
dplyr::select(address)%>%
st_drop_geometry()
```
```{r Time Panel, echo=TRUE}
# Time Panel
quarter <- c("Q1", "Q2", "Q3", "Q4") # Creating quarters
year <- c(2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022) # Creating years
comb <- expand.grid(year = year, quarter = quarter)%>%
mutate(yq = paste(year, ":", quarter))%>%
mutate(yqDT = yq(yq))%>%
arrange(yqDT)
time.panel <- as_tibble(comb)%>%
dplyr::select(yqDT)
```
```{r OPA Panel, include=FALSE}
opa.panel <- expand.grid(address = opa_dat_small$address,
quarter = time.panel$yqDT)
```
### Combined Panel
```{r OPA + Fire Panel, echo=TRUE}
# Combining OPA and Fire
opa_count.panel <- full_join(opa.panel, final.panel, by=c("address", "quarter")) # Join
opa_count.panel[is.na(opa_count.panel)] <- 0 # Assigning 0 to NA for everything
```
# Adding Open Data
### 311 Call Data
```{r loading 311 data, include=FALSE}
#311 Data Upload, downloaded from https://data.phila.gov/visualizations/311-requests/
All311 <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-311Calls.csv")
AllLI <- st_read("/Users/myronbanez/Desktop/Coding/PhilaFireData/complaints.geojson")
```
Filtering 311 Data to the appropriate categories. Data is limited to only 2014-2020. START RUNNING HERE
```{r undefined , include=FALSE}
#Filtering to only the fire/building-relevant terms
#strictly filtering to vacancy complaints for initial combination
property311 <- filter(All311,
#service_name == "Building Dangerous" |
#service_name == "Dangerous Building Complaint " |
#service_name == "Fire Safety Complaint" |
#service_name == "Maintenance Complaint" |
#service_name == "Maintenance Residential or Commercial" |
service_name == "Vacant House or Commercial" ) %>%
#service_name == "Fire Residential or Commercial" |
#service_name == "Complaints against Fire or EMS"
dplyr::select(objectid, service_request_id, status, service_name, service_code, requested_datetime, agency_responsible, address, zipcode, lat, lon)%>%
drop_na(lat, lon, address)%>%
st_as_sf(coords = c("lon", "lat"),
crs = "EPSG:4326")
#Reducing variables and calculating the quarter of the calls
prop311_small <- property311%>%
dplyr::select(service_name, requested_datetime, address)%>%
st_drop_geometry()%>%
mutate(quarter = floor_date(requested_datetime, unit="quarter"))
#counting the calls per address per quarter
vacant311_count <- prop311_small%>%
group_by(address, quarter)%>%
count(address, sort=TRUE)%>%
rename(n_311Vacant = n)
vacant311_count%>%
group_by(quarter)%>%
summarize(count = sum(n_311Vacant))%>%
ggplot(aes(x=quarter, y=count))+
geom_col(fill="#A5300F")+
labs(title="311 Vacancy Complaints",
subtitle="Philadelphia County, 2009-2022")+
xlab("Date, Rounded to Beginning of Quarter")+
ylab("Number of Fires")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
#Data is now ready to join with panel
```
### L&I Data
Designations for L&I data switch during certain periods, but it does cover the span of all the fire data.
```{r LI filter}
#strictly filtering to vacancy complaints for initial combination
vacantLI <- filter(AllLI,
complaintcodename == "VACANT HOUSE" |
complaintcodename == "VACANT HOUSE RESIDENTIAL" |
complaintcodename == "SPECIAL VACANT HOUSE" |
complaintcodename == "VACANT PROPERTY COMPLAINT" ) %>%
dplyr::select(address, addressobjectid, complaintdate, complaintcodename, geometry)%>%
mutate(quarter = as_date(floor_date(complaintdate, unit="quarter")))%>%
st_set_crs("EPSG:4326")
vacantLI_count <- vacantLI%>%
st_drop_geometry%>%
drop_na(address)%>%
group_by(address, quarter)%>%
count(address, sort=TRUE)%>%
rename(n_Vacant = n,
address = address,
quarter = quarter)
vacantLI_count$address <- as.factor(vacantLI_count$address)
vacantLI_count%>%
group_by(quarter)%>%
summarize(count = sum(n_Vacant))%>%
ggplot(aes(x=quarter, y=count))+
geom_col(fill="#A5300F")+
labs(title="L&I Vacancy Complaints",
subtitle="Philadelphia County, 2009-2022")+
xlab("Date, Rounded to Beginning of Quarter")+
ylab("Number of Fires")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
#Data is now ready to join with panel
```
We combine these for better coverage 2014-2020, but we take out the calls that happened during the same time on the same address.
```{r}
#Outer join to ensure no date/quarter combos are the same, getting unique values
vacant311_count_Clean <- vacant311_count%>%
anti_join(vacantLI_count, by=c("address", "quarter"))
#Row bind L&I and Clean 311 together.
vacantLI311 <- rbind(vacantLI_count, vacant311_count_Clean)
vacantLI311%>%
group_by(quarter)%>%
summarize(count = sum(n_Vacant))%>%
ggplot(aes(x=quarter, y=count))+
geom_col(fill="#A5300F")+
labs(title="L&I Vacancy Complaints",
subtitle="Philadelphia County, 2009-2022")+
xlab("Date, Rounded to Beginning of Quarter")+
ylab("Number of Fires")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
vacant311_count$address <- as.factor(vacant311_count$address)
```
### Permit Data
Importing Permit Data by combining their two sets together.
```{r Permit Data, echo=TRUE}
#Importing Permit Data
#Data from https://www.opendataphilly.org/dataset/licenses-and-inspections-building-permits
#metadata at: https://metadata.phila.gov/#home/datasetdetails/5543868920583086178c4f8f/representationdetails/5e9a01ac801624001585ca11/
permits0715 <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-permits_0715.csv")
permits1623 <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/OpenDataPhilly-permits_1623.csv")
permitsAll <- rbind(permits0715, permits1623)
permits_sf <- permitsAll%>%
drop_na(lng, lat)%>%
st_as_sf(coords = c("lng", "lat"),
crs = "EPSG:4326")
```
Cleaning Permit Data and Creating Count Table
```{r Permit Data Clean and Count, echo=TRUE}
permits_sf_res <- permits_sf%>%
dplyr::select(permittype, permitdescription, permitissuedate, commercialorresidential, address)%>%
# filter(permitdescription == "DEMOLITION PERMIT" |
# permitdescription == "GENERAL PERMIT" |
# permitdescription == "NEW CONTRUCTION PERMIT" |
# permitdescription == "RESIDENTIAL BUILDING PERMIT" |
# permitdescription == "FAST FORM BUILDING PERMIT" |
# permitdescription == "ALTERATION PERMIT")%>%
filter(commercialorresidential != "COMMERCIAL")%>%
mutate(quarter = as_date(floor_date(permitissuedate, unit="quarter")))%>%
filter(year(quarter) >= 2009)
permits_count <- permits_sf_res %>%
st_drop_geometry()%>%
group_by(address, quarter)%>%
count(address, sort = TRUE)%>%
rename(n_permits = n)
permits_count%>%
ggplot(aes(x=quarter, y=n_permits))+
geom_col(fill="#A5300F")+
labs(title="Permit Records",
subtitle="Philadelphia County, 2009-2022")+
xlab("Date, Rounded to Beginning of Quarter")+
ylab("Number of Records")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
```
### Real Estate Transfer Data
```{r SHORTCUT: Real Estate Transfers Count Panel}
transfers_count <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/transfers_count.csv")
```
```{r Real Estate Transfers Load-In}
#transfers <- read_csv("Data/OpenDataPhilly-transfers.csv")
#Select relevant fields and filter to fire data range
#transfers_data <- transfers%>%
# dplyr:: select(objectid, recording_date, street_address, document_type)%>%
# filter(year(recording_date) > 2008,
# !is.na(street_address))%>%
# rename(address = street_address)%>%
# mutate(quarter = as_date(floor_date(recording_date, unit="quarter")))
```
```{r Real Estate Transfers Count Panel}
#transfers_count <- transfers_data %>%
# group_by(address, quarter)%>%
# count(address, sort = TRUE)%>%
# rename(n_transfers = n)
#write.csv(transfers_count, "~/Desktop/Coding/MUSA Practicum/MUSA_Practicum-/Data/transfers_count.csv")
```
```{r Real Estate Transfers ggplot}
transfers_count%>%
ggplot(aes(x=quarter, y=n_transfers))+
geom_col(fill="#A5300F")+
labs(title="Real Estate Transfer Records",
subtitle="Philadelphia County, 2009-2022")+
xlab("Date, Rounded to Beginning of Quarter")+
ylab("Number of Records")+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.background = element_rect(fill = "#f3efe0"))
```
### Joining Open Data to Existing Panel
```{r SHORTCUT: Join 311 and Permits to panel_FireOPA}
panel_OPAFireOpenData <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/panel_OPAFireOpenData.csv")
```
```{r Join 311 and Permits to panel_FireOPA,include=FALSE}
#panel_OPAFire311 <- left_join(opa_count.panel, vacant311_count, by=c("address", "quarter"))
#panel_OPAFire311$n_311Vacant[is.na(panel_OPAFire311$n_311Vacant)] <- 0
#panel_OPAFire311Permit <- left_join(panel_OPAFire311, permits_count, by=c("address", "quarter"))
#panel_OPAFire311Permit$n_permits[is.na(panel_OPAFire311Permit$n_permits)] <- 0
#panel_OPAFireOpenData <- left_join(panel_OPAFire311Permit, transfers_count, by=c("quarter", "address"))
#panel_OPAFireOpenData$n_transfers[is.na(panel_OPAFireOpenData$n_transfers)] <- 0
```
# Calculating Outcomes
Objective is to know which fires had vacancy complaints, permit requests, or real estate transfers in the months or years following their fire.
```{r Calculate Average Quarters Until Result}
#filter panel to just outcomes or fires
panel_Positives <- panel_OPAFireOpenData %>%
filter(count > 0 | n_Vacant > 0 | n_permits > 0 | n_transfers > 0)
#filter to just fires, then combine those addresses with additional outcomes and building categories
panel_FirePositives <- panel_OPAFireOpenData %>%
filter(count > 0)%>%
dplyr::select(address)%>%
distinct(address, .keep_all = TRUE)%>%
left_join(panel_Positives, by="address")%>%
left_join(dplyr::select(opa_dat_small_sf, address, category_code_description, mailing_street, building_code_description), by="address")%>%
mutate(condo = ifelse(grepl("CONDO", building_code_description) == TRUE, TRUE, FALSE),
owner_occ = ifelse(condo == FALSE & category_code_description != "MULTI FAMILY",
ifelse(address == mailing_street, TRUE, FALSE),
NA))%>%
dplyr::select(-mailing_street, -condo, -building_code_description)%>%
st_drop_geometry()
#mutate(diff = interval(lag(quarter, n=1),quarter) %/% years(1))
#Eliminate the data that comes before the fires, as those are not outcomes of the fire
panel_FirePositives <- panel_FirePositives%>%
group_by(address)%>%
mutate(f = cumsum(count))%>% #counts the cumulative sum of the number of fires at that address so far.
filter(f > 0)%>% #if that number is zero, then we don't want the data
dplyr::select(-f)
# edit here if we wanna play with the lags
#Join to get fire incident number and date
#calculate the difference between the quarter of the incident date and the quarter of the outcome
panel_FirePositivesDiff <- panel_FirePositives%>%
left_join(st_drop_geometry(dplyr::select(dat, incident_number,address, quarter)), by=c("address"))%>%
group_by(incident_number)%>% #some addresses have multiple fires, so we use incident number instead
mutate(mSinceFire = interval(quarter.y, quarter.x) %/% months(1),
ySinceFire = mSinceFire / 12,
cat_code = toupper(category_code_description))%>%
filter(mSinceFire >= -1,#Eliminate entries before fires (occurs because of incident_number group duplicates)
mSinceFire < 49, #Eliminate entries after four years, as they are irrelevant (arbitrary)
!(count > 0 & ySinceFire > 0))%>% #For addresses with multiple incidents, take out repeated fire observ's
dplyr::select(-category_code_description)%>%
st_as_sf()
#Chart:
#For every outcome, what is the median time since a fire occurred?
#panel_FirePositivesDiff%>%
# st_drop_geometry()%>%
# ungroup()%>%
# filter(!(n_Vacant == 0 & n_permits == 0 & n_transfers == 0)) %>%
# mutate(ySinceFire = mSinceFire / 12)%>%
# dplyr::select(n_Vacant, n_permits, n_transfers, ySinceFire)%>%
# gather(Variable, value, -ySinceFire)%>%
# filter(value > 0)%>%
# group_by(Variable)%>%
# summarize(`Median Years Since Fire` = median(ySinceFire))%>%
# kable()%>%
# kable_styling()
```
## Results (Boolean)
The outcomes here can be used as the dependent variable for our model, when combined back with all the other properties. Our 2-years-after-fire results panel has 14,545 observations, down from the original data's ~21,000, so we lost about 40% of incidents. We'll have to check why.
```{r SHORTCUT: Calculate Boolean Outcome Code}
panel_Results2Y <- read_csv("/Users/myronbanez/Desktop/Coding/PhilaFireData/panel_Results2Y.csv")
```
```{r Calculate Boolean Outcome Code}
#six months boolean outcome code
#panel_Results2Q <- panel_OPAFireOpenData %>%
# mutate(fireVacant2Q = ifelse(address == lag(address, n=1) &
# (count > 0 | lag(count, n=1) > 0) &
# (n_311Vacant > 0) > 0 , 1, 0),
# firePermit2Q = ifelse(address == lag(address, n=1) &
# (count > 0 | lag(count, n=1) > 0) &
# (n_permits > 0) > 0 , 1, 0),
# fireTransfer2Q = ifelse(address == lag(address, n=1) &
# (count > 0 | lag(count, n=1) > 0) &
# (n_transfers > 0) > 0 , 1, 0))
#2 Year Outcomes for Each Incident
#panel_Results2Y <- panel_FirePositivesDiff %>%
# st_drop_geometry()%>%
# dplyr::select(-mSinceFire, -cat_code, -quarter.y)%>%
# filter(., ySinceFire <= 2)%>% # edit here if we wanna play with the lags
# group_by(address, incident_number)%>%
# summarize(count = sum(count),
# severity_index = max(severity_index),
# outcome_vacant = sum(n_311Vacant),
# outcome_permit = sum(n_permits),
# outcome_transfer = sum(n_transfers),
# quarter = min(quarter.x))
# Later: Join back to original dataset to get the spatial features
```
# Feature Engineering
### OPA and Fire Dataset
```{r Cleaning Dataframes and Slight Engineering}
# OPA - Creating an OPA dataset to get just the variables we want for feature engineering and doing slight feature engineering
# Numeric
opa_dat_small_sf_num <- opa_dat_small_sf %>%
dplyr::select(address, total_livable_area, market_value, sale_price, number_of_bedrooms, number_of_bathrooms, number_stories,
interior_condition) %>% st_drop_geometry()
opa_dat_small_sf_num[is.na(opa_dat_small_sf_num)] <- 0 # Assigning 0 to NA for everything
# Categorical
opa_dat_small_sf_cat <- opa_dat_small_sf %>%
dplyr::select(address, quality_grade, year_built, central_air, exterior_condition, fireplaces, fuel, taxable_building,
topography, type_heater) %>% st_drop_geometry()
opa_dat_small_sf_fe <- left_join(opa_dat_small_sf_num,opa_dat_small_sf_cat, by = "address") # Joining
# OPA - Quality Grade
opa_dat_small_sf_fe <-
opa_dat_small_sf_fe %>%
mutate(grade = case_when(
quality_grade == "A+" |quality_grade == "A" | quality_grade == "A-" | quality_grade == "B+" | quality_grade == "B" |
quality_grade == "B-"| quality_grade == "C+"| quality_grade == "C" ~ "Average or Better",
TRUE ~ "Below Average"))
# OPA - Fuel
opa_dat_small_sf_fe <-
opa_dat_small_sf_fe %>%
mutate(fuel_type = case_when(
fuel == "A" | fuel == "B" ~ "Fossil",
fuel == "D" | fuel == "F" ~ "Solid",
fuel == "C" | fuel == "E" ~ "Alternative",
TRUE ~ "Other"))
# OPA - Topography
opa_dat_small_sf_fe <-
opa_dat_small_sf_fe %>%
mutate(topo = case_when(
topography == "A" ~ "Above Street Level",
topography == "B" ~ "Below Street Level",
topography == "C" ~ "Flood Plain",
topography == "D" ~ "Rocky",
topography == "F" ~ "Street Level",
TRUE ~ "Other"))
# OPA - Heater
opa_dat_small_sf_fe <-
opa_dat_small_sf_fe %>%
mutate(electric_heater = case_when(
type_heater == "C" ~ "Yes",
TRUE ~ "No"))
# OPA - Central Air
opa_dat_small_sf_fe <-
opa_dat_small_sf_fe %>%
mutate(air_central = case_when(
central_air == "Y" ~ "Yes",
TRUE ~ "No"))
# OPA - Fireplace
opa_dat_small_sf_fe <-
opa_dat_small_sf_fe %>%
mutate(fireplace = case_when(
fireplaces > 0 ~ "Yes",
TRUE ~ "No"))
# OPA - Year Built
opa_dat_small_sf_fe$year_built <- as.numeric(as.character(opa_dat_small_sf_fe$year_built))
# OPA - Cleaning
opa_dat_small_sf_fe <- opa_dat_small_sf_fe %>% dplyr::select(-quality_grade, -central_air, -fuel, -topography, -type_heater)
opa_dat_small_sf_fe[is.na(opa_dat_small_sf_fe)] <- 0 # Assigning 0 to NA for everything
# Fire Data - Creating an fire dataset to get just the variables we want for feature engineering
dat_fe <- dat %>% dplyr::select(address, incident_type, number_of_exposures, minor_damage, significant_damage, heavy_damage, extreme_damage)
```
```{r Joining}
panel_Results2Y_sf <- left_join(panel_Results2Y, opa_dat_small_sf_fe, by="address") # Joining with OPA FE variables
panel_Results2Y_sf[is.na(panel_Results2Y_sf)] <- 0 # Assigning 0 to NA for everything
# Joining panel_Results2Y with Time outcome FE variables
panel_FirePositivesDiff_fe <- panel_FirePositivesDiff %>%
dplyr::select(address, mSinceFire, ySinceFire, cat_code, owner_occ) %>% st_drop_geometry() # Condensing dataframe
fire_panel <- merge(panel_Results2Y_sf,panel_FirePositivesDiff_fe ) # Join
fire_panel <- fire_panel[!duplicated(fire_panel$incident_number),] # Removing Duplicates
```
```{r Feature Engineering}
# Turning outcomes into binary
fire_panel <-fire_panel %>%
mutate(vacant = case_when(
outcome_vacant > 0 ~ 1,
TRUE ~ 0
))
fire_panel <-fire_panel %>%
mutate(permit = case_when(
outcome_permit > 0 ~ 1,
TRUE ~ 0
))
fire_panel <-fire_panel %>%
mutate(transfer = case_when(
outcome_transfer > 0 ~ 1,
TRUE ~ 0
))
# Market Value
fire_panel <-
fire_panel %>%
mutate(mkt_value = case_when(
market_value < 25000 ~ "Low", # < $250,000
market_value > 24999 & market_value < 500000 ~ "Medium", # $250,000-$500,000
TRUE ~ "High")) # $500,000+
# Number of Bedrooms
fire_panel <-
fire_panel %>%
mutate(bedrooms = case_when(
number_of_bedrooms < 4 ~ "Low", # 1-3 Bedrooms
number_of_bedrooms > 3 & number_of_bedrooms < 8 ~ "Medium", # 4-7 Bedrooms
TRUE ~ "High")) # 8+
# Number of Bathrooms
fire_panel <-
fire_panel %>%
mutate(bathrooms = case_when(
number_of_bathrooms < 3 ~ "Low", # 1-2 Bathroom