**NYC Construction Worker Accident Reports**

Construction work is among the most dangerous jobs in the city, and injuries and fatalities have risen over the years. Even during the pandemic, the data show that NYC construction permits were high and worker injuries exceeded their 2019 levels. The most common accident, and often a fatal one, is a fall.

This project aims to build a classifier that flags which accident reports describe falls, and so to estimate how many of these accidents are the result of a fall.

The data was compiled from the NYC Department of Buildings' monthly construction-related accident reports [https://www1.nyc.gov/site/buildings/about/construction-related-accident-reports.page], covering January 2019 to March 2021.

In [1]:
import pandas as pd
import numpy as np
import re

In [10]:
df = pd.read_csv('total_injuries.csv')
pd.set_option('display.max_colwidth', 800)  # show long incident descriptions in full

df.shape

(17582, 9)

The dataframe appears to contain many empty rows, so we drop rows with missing values.

In [12]:
df = df.dropna()

In [13]:
df.shape

(721, 9)
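Note that dropna() with its default how='any' removes every row that has at least one missing field, not only fully empty rows, which may account for part of the drop from 17,582 to 721 rows. The pure-Python sketch below uses made-up rows (hypothetical values, not taken from the dataset) to contrast three rules; in pandas they correspond to df.dropna(), df.dropna(how='all') and df.dropna(subset=['Incident Description Details']).

```python
# Hypothetical rows standing in for the CSV; None marks a missing field.
rows = [
    {"Borough": "Brooklyn", "Zip Code": 11207.0, "Details": "Worker fell off a ladder."},
    {"Borough": None, "Zip Code": None, "Details": None},                  # fully empty
    {"Borough": "Queens", "Zip Code": None, "Details": "Worker slipped."},  # partly empty
]

def drop_any(rows):
    """Keep only rows with no missing field (like df.dropna(), how='any')."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_all(rows):
    """Drop only rows where every field is missing (like df.dropna(how='all'))."""
    return [r for r in rows if any(v is not None for v in r.values())]

def drop_missing_in(rows, column):
    """Drop rows missing one specific column (like df.dropna(subset=[column]))."""
    return [r for r in rows if r[column] is not None]

print(len(drop_any(rows)), len(drop_all(rows)), len(drop_missing_in(rows, "Details")))
# prints: 1 2 2
```

If only the description matters for this project, the subset form keeps partly filled reports that the default rule throws away.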

In [14]:
df.sample(10)

,Unnamed: 0,Incident Unique ID,Incident Date & Time,Street Name,Borough,Zip Code,Incident Description Details,Injury Count,Fatality Count
13046,9,1000000000.0,2019-08-16 13:15:00,Linden Blvd,Brooklyn,11207.0,An ironworker was struck by a piece of steel on his right foot while the steel was being rigged.,1.0,0.0
1852,33,1000001000.0,2020-01-30 08:45:00,WEST 56 STREET,Manhattan,10019.0,GTECH ELEVATOR MECHANIC - RICHARD SHARP CLAIMS HE LOST HIS BALANCE ON THE 20 TH FLOOR. WORKER TRIED TO BRACE HIMSELF AND CLAIMS HE INJURED HIS RIGHT WRIST IN THE PROCESS OF BRACING FROM FALLING.,1.0,0.0
1828,9,1000001000.0,2020-01-14 12:00:00,Willoughby Street,Brooklyn,11201.0,Worker was installing a wooden column when he tweaked his lower back. Worker explains that he has a problem with his back and may need to respond to his own medical facility.,1.0,0.0
11928,4,1000000000.0,2019-02-26 16:17:00,Bedford Ave,Brooklyn,11205.0,Construction worker Antonio Rivas was assisting foreman by vertically passing concrete admixture compound from ground to hopper on the rear of concrete truck when the truck drivers interference caused bag to bust and disperse in the immediate area.,1.0,0.0
3675,38,1000001000.0,2020-03-21 12:00:00,16th Street,Brooklyn,11215.0,"To safeguard site, unsafe supported scaffolding was being removed from the front elevation of new building under construction, a construction worker fell from the supported scaffolding platform to sidewalk of neighbors front yard, the fall was approximately 10', and also its understood that his harness may have pulled the 1 supported scaffolding frame with his fall, which landed on to the neighbors front awning damaging it. It is unclear how he fell as the other workers were not paying attention to him, and he was then transported to the hospital via ambulance.",1.0,0.0
8266,33,1000001000.0,2020-08-20 14:25:00,East 61st St,Manhattan,10065.0,"While the employee was working, employee suffered an epileptic seizure. The employee was not injured as a result of the seizure but was taken by ambulance to the hospital, treated and released.",1.0,0.0
6440,25,1000001000.0,2020-07-20 10:00:00,Hudson Yards,Manhattan,10001.0,The employee was standing at the window on the SE corner of the 79th floor. He was on the phone requesting a coffee from a fellow employee and has a seizure. He fell to the ground. The person on the receiving end of the phone call heard a thump as well as the Turner laborer who was working outside of the door. When the medic arrived the employee was not coherent. The medic checked his vitals and how responsive he was and instructed someone to call 911. His fellow employee informed the medic that he has previously had a seizure. The employee works for the owner. The ambulance was called,1.0,0.0
10087,36,1000001000.0,2020-11-18 14:00:00,BEACH CHANNEL DRIVE,Queens,11692.0,"A sheet of plywood was blown from a workers hand due to wind gusts and struck the lower right leg of worker (Juan Loja) who sustained a slight injury. He declined medical treatment, went home, & returned to work this morning.",1.0,0.0
16676,3,1000001000.0,2019-12-02 14:00:00,Albee Square,Brooklyn,11201.0,Worker was moving a bag of mortar from pallet to mixer when he slipped and fell onto the floor. Worker was able to walk to hoist and walk from hoist to medics office where he received treatment but was sent out to urgent care for evaluation due to pain. Worker had X-ray and MRI completed that both came back negative and was given two days of rest until he was able to come back to work.,1.0,0.0
14876,21,1000000000.0,2019-10-24 08:00:00,Third Avenue,Manhattan,10035.0,The worker was tightening a bolt on a pipe when the wrench slipped and hit him in the forehead,1.0,0.0


We now add a column to df that marks each row as a fall accident (1) or not (0), based on whether the description contains "fall" or "fell" in lowercase or uppercase. This column serves as our target, y.

In [16]:
# 1 if the description contains 'fall' or 'fell' in all-lowercase or all-uppercase, else 0
df['is_fall'] = (df['Incident Description Details']
                 .str.contains('fall|fell|FALL|FELL', na=False)
                 .astype(int))
df.head()

,Unnamed: 0,Incident Unique ID,Incident Date & Time,Street Name,Borough,Zip Code,Incident Description Details,Injury Count,Fatality Count,is_fall
0,0,1000001000.0,2021-01-04 07:00:00,North 14th St,Brooklyn,11249.0,"On 1/31/20 at 10 AM Mr. Mendoza claims some kind of debris went into his eye. He thought the issue with his eye was not a problem. Today 1/4/21 he called out of work complaining his eye is still bothering him from 1/31/20, he reported to his superiors. As of today no medical attention was sought or received.",1.0,0.0,0
1,1,1000001000.0,2021-01-04 11:00:00,east 14th st,Manhattan,10003.0,"Jeremy DeYoung, reported an accident on the jobsite today. Jeremy stated that he was working alone on the South side of the Roof (21st floor) cutting plastic shims with a box cutter knife in his right hand, when the knife slipped causing him to cut his left hand on his middle finger.",1.0,0.0,0
2,2,1000001000.0,2021-01-04 10:30:00,park ave,Manhattan,10022.0,Worker was walking with a GC employee on a finished client occupied floor for punch list purposes touching up small paint inconsistencies. Worker was stepping on the first or second step of a ladder and fell back. Felt discomfort in his thumb and went to Medrite to get it checked out.,1.0,0.0,1
3,3,1000001000.0,2021-01-06 16:30:00,Pennsylvania Avenue,Brooklyn,11207.0,"Allegedly, a member of the public driving down Pennsylvania Avenue had a seizure. She struck another car, which drove over the curb on Pennsylvania Avenue, through the fence, scaffold and into the site. The condition of the drivers are unknown. The car struck the security guard booth, and the booth pushed an Agra worker.",3.0,0.0,0
4,4,1000001000.0,2021-01-06 16:00:00,Summerfield Street,Queens,11385.0,"A worker, from ""Do All Interiors"", fell off an A-frame ladder while applying compound. She required medical attention and was taken off site in an ambulance.",1.0,0.0,1
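The labelling rule above is a case-sensitive substring match, so a report that begins "Fell from..." gets is_fall = 0, while "fellow" (as in "his fellow employee") counts as a fall. The sketch below uses Python's re module on hypothetical sentences to compare the notebook's rule with a case-insensitive, whole-word variant; the variant and its pattern are assumptions for illustration, and adopting it would change some labels and therefore the scores reported further down.

```python
import re

def is_fall_original(text):
    """Mirror of the notebook rule: substring match on four exact-case patterns."""
    return int(bool(re.search(r"fall|fell|FALL|FELL", text)))

# Hypothetical alternative: whole words only, any letter case.
FALL_WORDS = re.compile(r"\b(?:fall(?:s|ing|en)?|fell)\b", re.IGNORECASE)

def is_fall_words(text):
    return int(bool(FALL_WORDS.search(text)))

examples = [
    "Worker fell off an A-frame ladder.",     # both rules: 1
    "Fell from scaffold onto the sidewalk.",  # original rule misses title case
    "A fellow worker cut his left hand.",     # original rule matches 'fell' inside 'fellow'
    "WORKER FALLING FROM ROOF",               # both rules: 1
]
for text in examples:
    print(is_fall_original(text), is_fall_words(text), text)
```

The word-boundary anchors (\b) are what stop "fellow" from matching, and re.IGNORECASE covers mixed-case spellings such as "Fell" and "Fall".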


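To turn the descriptions into numeric features, we use scikit-learn's TfidfVectorizer, which weights each word's count in a report by how rare that word is across all reports. The resulting matrix serves as our X dataframe.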

In [17]:
from sklearn.feature_extraction.text import TfidfVectorizer

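# Make a TF-IDF vectorizer (lowercases text and splits it into words by default)
vectorizer = TfidfVectorizer()
# Learn the vocabulary and compute TF-IDF weights for every description
matrix = vectorizer.fit_transform(df['Incident Description Details'])
# Convert the sparse matrix to a dense dataframe, one column per word
# (get_feature_names_out replaces get_feature_names, removed in scikit-learn 1.2)
words_df = pd.DataFrame(matrix.toarray(),
                        columns=vectorizer.get_feature_names_out())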
words_df.head()

,00,0047,00pm,01,01am,02,02am,03,0431,05,...,yelled,yesterday,yet,yo,yoni,yordan,york,zapata,zone,zw
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
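With its default settings, TfidfVectorizer lowercases the text, keeps tokens of two or more word characters, multiplies each raw term count by a smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1 (n documents, df(t) of them containing t), and then scales each row to unit L2 norm. That is why most entries above are 0.0: each report uses only a handful of words from the full vocabulary. The pure-Python sketch below reproduces that default computation on two hypothetical sentences; it illustrates the formula and is not a replacement for the library.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # TfidfVectorizer's default token_pattern

def tfidf(docs):
    """Return (vocabulary, rows) using smoothed idf and L2-normalised rows."""
    tokenised = [TOKEN.findall(d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    doc_freq = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        weights = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0  # avoid dividing by zero
        rows.append([w / norm for w in weights])
    return vocab, rows

vocab, rows = tfidf(["Worker fell off a ladder", "Worker slipped on a ladder"])
print(vocab)  # 'a' is dropped: single-character tokens are not kept
for row in rows:
    print([round(w, 3) for w in row])
```

Words shared by both sentences ("worker", "ladder") get idf 1.0, while words unique to one sentence ("fell", "slipped") get a larger weight, which is what lets rare but telling words stand out.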


Time to build the classifier!

In [18]:
X = words_df
y = df.is_fall

In [19]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
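When neither test_size nor train_size is given, train_test_split holds out 25% of the rows for testing and rounds the test count up; with 721 rows that gives 181 test rows, matching the 181 predictions counted in the confusion matrix further down. Because no random_state is passed, the split (and every score below) changes from run to run; passing random_state=0 and stratify=y would make it reproducible and keep the fall/not-fall ratio the same in both parts. A small sketch of the size arithmetic (the helper name split_sizes is hypothetical):

```python
import math

def split_sizes(n_samples, test_size=0.25):
    """Mirror train_test_split's default sizing: test rows rounded up, the rest for training."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(721))  # (540, 181)
```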

In [20]:
from sklearn.multiclass import OutputCodeClassifier
from sklearn.ensemble import RandomForestClassifier

# OutputCodeClassifier is a multiclass strategy (error-correcting output codes);
# with only two classes, a plain RandomForestClassifier would also work.
# The model is fitted in the next cell.
clf = OutputCodeClassifier(
    estimator=RandomForestClassifier(random_state=0),
    random_state=0)

In [21]:
clf.fit(X_train, y_train)

OutputCodeClassifier(estimator=RandomForestClassifier(random_state=0),
                     random_state=0)

In [22]:
clf.score(X_test, y_test)

0.9337016574585635

In [23]:
from sklearn.metrics import confusion_matrix

y_true = y_test
y_pred = clf.predict(X_test)
matrix = confusion_matrix(y_true, y_pred)

label_names = pd.Series(['not fall', 'fall'])
pd.DataFrame(matrix,
     columns='Predicted ' + label_names,
     index='Is ' + label_names)

,Predicted not fall,Predicted fall
Is not fall,118,0
Is fall,12,51


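The accuracy looks good (about 93%): all 118 non-fall reports are classified correctly, but 12 of the 63 fall reports are missed. A high score is partly expected, since is_fall was built from the words "fall" and "fell", which are themselves features in X. The test set holds only 181 reports because train_test_split reserves 25% of the 721 rows for testing by default.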
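Accuracy alone hides which kind of mistake the model makes. Taking the four counts from the confusion matrix above (118, 0, 12, 51), a short sketch computes precision and recall for the fall class; sklearn.metrics.classification_report(y_test, y_pred) reports the same quantities directly.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall for the positive ('fall') class."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Counts from the OutputCodeClassifier confusion matrix
acc, prec, rec = binary_metrics(tn=118, fp=0, fn=12, tp=51)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
# accuracy=0.9337 precision=1.0000 recall=0.8095
```

With these numbers every predicted fall is a real fall, but roughly one fall report in five is missed, so recall is the figure to watch if the goal is to count falls.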

In [27]:
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)

DecisionTreeClassifier(max_depth=5)

In [28]:
clf.score(X_test, y_test)

0.9834254143646409

In [29]:
import eli5

feature_names=list(X.columns)
eli5.show_weights(clf, feature_names=feature_names, show=['description', 'feature_importances'])

Weight,Feature
0.7344,fell
0.1923,fall
0.0322,falling
0.0164,fallen
0.0083,slipped
0.0083,hitting
0.0080,ladder
0,event
0,excavated
0,examination
