# Predicting Blight Ticket Payment

[Blight violations](http://www.detroitmi.gov/How-Do-I/Report/Blight-Complaint-FAQs) are issued by the city to individuals who allow their properties to remain in a deteriorated condition. Every year, the city of Detroit issues millions of dollars in fines to residents and every year, many of these fines remain unpaid. Enforcing unpaid blight fines is a costly and tedious process, so the city wants to know: how can we increase blight ticket compliance? The goal of this project is to predict whether a given blight ticket will be paid on time. The raw data includes addresses.csv, latlons.csv, train.csv and test.csv. 

<br>

**File descriptions** 

    train.csv - the training set (all tickets issued 2004-2011)
    test.csv - the test set (all tickets issued 2012-2016)
    addresses.csv & latlons.csv - mapping from ticket id to addresses, and from addresses to lat/lon coordinates. 
     Note: misspelled addresses may be incorrectly geolocated.
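
The two-step mapping described above (ticket id to address, then address to coordinates) can be sketched with toy data. The rows below are invented stand-ins, not values from `addresses.csv` or `latlons.csv`; only the column names `ticket_id`, `address`, `lat` and `lon` follow the files described here.

```python
import pandas as pd

# Toy stand-ins for addresses.csv and latlons.csv (values are made up for illustration).
addresses = pd.DataFrame({
    "ticket_id": [1, 2, 3],
    "address": ["100 main st, Detroit MI", "200 oak ave, Detroit MI", "300 elm rd, Detroit MI"],
})
latlons = pd.DataFrame({
    "address": ["100 main st, Detroit MI", "200 oak ave, Detroit MI"],
    "lat": [42.33, 42.40],
    "lon": [-83.05, -83.10],
})

# A left join keeps every ticket, even when its address has no coordinates.
ticket_coords = addresses.merge(latlons, on="address", how="left")

# Tickets whose address could not be geolocated show up with NaN in lat/lon.
missing = ticket_coords[ticket_coords["lat"].isna()]
print(missing)
```

Listing the rows with missing coordinates before imputing them makes it easier to judge how much of the data any fill strategy will affect.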

<br>

**Data fields**

train.csv & test.csv

    ticket_id - unique identifier for tickets
    agency_name - Agency that issued the ticket
    inspector_name - Name of inspector that issued the ticket
    violator_name - Name of the person/organization that the ticket was issued to
    violation_street_number, violation_street_name, violation_zip_code - Address where the violation occurred
    mailing_address_str_number, mailing_address_str_name, city, state, zip_code, non_us_str_code, country - Mailing address of the violator
    ticket_issued_date - Date and time the ticket was issued
    hearing_date - Date and time the violator's hearing was scheduled
    violation_code, violation_description - Type of violation
    disposition - Judgment and judgment type
    fine_amount - Violation fine amount, excluding fees
    admin_fee - $20 fee assigned to responsible judgments
    state_fee - $10 fee assigned to responsible judgments
    late_fee - 10% fee assigned to responsible judgments
    discount_amount - discount applied, if any
    clean_up_cost - DPW clean-up or graffiti removal cost
    judgment_amount - Sum of all fines and fees
    grafitti_status - Flag for graffiti violations
    
train.csv only

    payment_amount - Amount paid, if any
    payment_date - Date payment was made, if it was received
    payment_status - Current payment status as of Feb 1 2017
    balance_due - Fines and fees still owed
    collection_status - Flag for payments in collections
    compliance [target variable for prediction] 
     Null = Not responsible
     0 = Responsible, non-compliant
     1 = Responsible, compliant
    compliance_detail - More information on why each ticket was marked compliant or non-compliant
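
Because a null `compliance` value means the violator was found not responsible, those rows carry no label and are usually excluded before training. A minimal sketch of that filter on an invented four-row frame (the data and the names `tickets`, `responsible` and `compliance_rate` are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Invented example rows: one "not responsible" ticket (NaN) and three labelled ones.
tickets = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4],
    "compliance": [np.nan, 0.0, 1.0, 0.0],
})

# Keep only tickets with a label, i.e. where the violator was found responsible.
responsible = tickets.dropna(subset=["compliance"])
y = responsible["compliance"].astype(int)

# Share of labelled tickets that were paid on time (the positive class).
compliance_rate = y.mean()
print(len(responsible), compliance_rate)
```

Checking the share of positive labels early is useful because a heavily imbalanced target makes plain accuracy a misleading metric.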

In [40]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

In [33]:
address = pd.read_csv('addresses.csv')
latlon = pd.read_csv('latlons.csv')
address_latlon = pd.merge(address, latlon, how='left', on='address')

In [70]:
# Define X_train, y_train
df1 = pd.read_csv('train.csv', encoding = "ISO-8859-1")
columns_to_keep = ['ticket_id','fine_amount', 'admin_fee', 'state_fee', 'late_fee', 'discount_amount', 'judgment_amount', 'compliance']
df1 = df1[columns_to_keep].dropna()
train = pd.merge(df1, address_latlon, how ='left', left_on = 'ticket_id', right_on = 'ticket_id')
X_train = train[['ticket_id', 'fine_amount', 'admin_fee', 'state_fee', 'late_fee', 'discount_amount', 'judgment_amount', 'lat','lon']]
# Forward-fill missing lat/lon values (fillna(method='pad') is deprecated in recent pandas)
X_train = X_train.ffill().astype('float').set_index('ticket_id')
y_train = train['compliance'].astype('float')
X_train.head()
# len([i for i in y_train if i == 1])/len(y_train)

ticket_id,fine_amount,admin_fee,state_fee,late_fee,discount_amount,judgment_amount,lat,lon
22056.0,250.0,20.0,10.0,25.0,0.0,305.0,42.390729,-83.124268
27586.0,750.0,20.0,10.0,75.0,0.0,855.0,42.326937,-83.135118
22046.0,250.0,20.0,10.0,25.0,0.0,305.0,42.145257,-83.208233
18738.0,750.0,20.0,10.0,75.0,0.0,855.0,42.433466,-83.023493
18735.0,100.0,20.0,10.0,10.0,0.0,140.0,42.388641,-83.037858
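
The forward fill used above copies the previous row's `lat`/`lon` into any ticket whose address was not geolocated, so a filled ticket inherits the location of whichever ticket happens to precede it. The sketch below contrasts that with median imputation on invented coordinates. Median imputation is an alternative named here for comparison, not the notebook's method.

```python
import numpy as np
import pandas as pd

# Invented coordinates; ticket 102 has no geolocation.
coords = pd.DataFrame(
    {"lat": [42.39, np.nan, 42.14], "lon": [-83.12, np.nan, -83.20]},
    index=pd.Index([101, 102, 103], name="ticket_id"),
)

# Forward fill: ticket 102 inherits the coordinates of ticket 101.
padded = coords.ffill()

# Alternative (assumption): fill with the column median instead of a neighbour's value.
median_filled = coords.fillna(coords.median())

print(padded.loc[102])
print(median_filled.loc[102])
```

Neither option recovers the true location. The median does not depend on row order, while the forward fill does.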


In [71]:
# Define X_test
df2 = pd.read_csv('test.csv', encoding = "ISO-8859-1")
columns_to_keep = ['ticket_id','fine_amount', 'admin_fee', 'state_fee', 'late_fee', 'discount_amount', 'judgment_amount']
df2 = df2[columns_to_keep].dropna()
test = pd.merge(df2, address_latlon, how ='left', left_on = 'ticket_id', right_on = 'ticket_id')
X_test = test[['ticket_id', 'fine_amount', 'admin_fee', 'state_fee', 'late_fee', 'discount_amount', 'judgment_amount', 'lat','lon']]
# Forward-fill missing lat/lon values (fillna(method='pad') is deprecated in recent pandas)
X_test = X_test.ffill().astype('float').set_index('ticket_id')
X_test.head()

ticket_id,fine_amount,admin_fee,state_fee,late_fee,discount_amount,judgment_amount,lat,lon
284932.0,200.0,20.0,10.0,20.0,0.0,250.0,42.407581,-82.986642
285362.0,1000.0,20.0,10.0,100.0,0.0,1130.0,42.426239,-83.238259
285361.0,100.0,20.0,10.0,10.0,0.0,140.0,42.426239,-83.238259
285338.0,200.0,20.0,10.0,20.0,0.0,250.0,42.309661,-83.122426
285346.0,100.0,20.0,10.0,10.0,0.0,140.0,42.30883,-83.121116
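
`test.csv` has no `compliance` column, so predictions on it cannot be scored locally. One way to estimate model quality is to hold out part of the labelled data with `train_test_split`, which the notebook imports but does not use. The sketch below does this on synthetic data. The features, the label-generating rule and the choice of ROC AUC as the metric are all assumptions made for illustration, not results from the blight data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the fee columns (not real ticket data).
rng = np.random.default_rng(0)
n = 1000
fine_amount = rng.choice([100.0, 250.0, 750.0, 1000.0], size=n)
late_fee = 0.1 * fine_amount
X = np.column_stack([fine_amount, late_fee])

# Synthetic labels: smaller fines are made more likely to be paid, purely for illustration.
p_paid = 1.0 / (1.0 + fine_amount / 200.0)
y = (rng.random(n) < p_paid).astype(int)

# Hold out 25% of the labelled rows, keeping the class ratio similar in both parts.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The pipeline fits the scaler on the fitting split only, then the classifier.
model = make_pipeline(MinMaxScaler(), LogisticRegression(solver="lbfgs"))
model.fit(X_fit, y_fit)

# Score the held-out rows by predicted probability of the positive class.
val_probs = model.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, val_probs)
print(f"validation AUC: {auc:.3f}")
```

Wrapping the scaler and classifier in one pipeline keeps the scaling parameters from being fitted on validation rows, which mirrors the `fit_transform` on train and `transform` on test pattern used in the notebook.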


In [73]:
# Train the Logistic Regression Classifier
# Return the probability that the corresponding blight ticket will be paid on time.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
clf = LogisticRegression(solver = 'lbfgs').fit(X_train_scaled, y_train)
y_predict = clf.predict(X_test_scaled)
y_probs = pd.Series(clf.predict_proba(X_test_scaled)[:,1], index = X_test.index)
y_probs
# len([i for i in y_predict if i == 1])/len(y_predict)   

ticket_id
284932.0    0.068681
285362.0    0.011564
285361.0    0.078909
285338.0    0.077714
285346.0    0.095258
285345.0    0.077824
285347.0    0.105227
285342.0    0.141703
285530.0    0.015174
284989.0    0.039391
285344.0    0.104634
285343.0    0.014087
285340.0    0.014172
285341.0    0.105208
285349.0    0.095283
285348.0    0.077846
284991.0    0.039393
285532.0    0.044423
285406.0    0.035723
285001.0    0.040920
285006.0    0.013920
285405.0    0.011515
285337.0    0.033575
285496.0    0.094910
285497.0    0.069985
285378.0    0.011462
285589.0    0.033926
285585.0    0.065198
285501.0    0.086192
285581.0    0.011418
              ...   
376367.0    0.020968
376366.0    0.060797
376362.0    0.058583
376363.0    0.064989
376365.0    0.020968
376364.0    0.060797
376228.0    0.067380
376265.0    0.060276
376286.0    0.126908
376320.0    0.065406
376314.0    0.065675
376327.0    0.145703
376385.0    0.134070
376435.0    0.234654
376370.0    0.145983
376434.0    0.089379
376