# Auto Insurance Analysis

## Technical Notebook

## Project Goals

- Analyze auto insurance data.
- Build a logistic regression model to predict crash probability for auto insurance customers.
- Build a linear regression model to predict crash cost for auto insurance customers.
- Use the model results to compute a crash percentage for each customer and assign customers to new risk profiles.
- Determine premium costs based on the risk profiles.

## Summary of Data

The dataset for this project contains 6043 records of auto insurance data. Each record
represents a customer at an auto insurance company. Using this data, we aim to identify which factors
influence the likelihood of a car crash and then estimate the cost of resolving the resulting claim. The data is typical of what a company in the insurance field would provide: a flat file of client records.

### Library Import

```python
# Import libraries
%run ../python_files/imports
```

### Data Import

```python
# Import the cleaned, prepared data from the EDA
%run ../python_files/auto_insurance_eda
```

## Modeling

### Logistic Regression Model

Here, we are building a logistic regression model to predict crash probability for auto insurance customers.
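For reference, a logistic regression models the probability of a crash as the sigmoid of a linear combination of the features. The NumPy sketch below uses made-up coefficients and two hypothetical customers, not the fitted values from this notebook, to show how a probability and a 0/1 label are produced.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_crash_probability(X, coef, intercept):
    """P(crash = 1 | x) = sigmoid(intercept + x . coef) for each row of X."""
    return sigmoid(intercept + X @ coef)

# Hypothetical coefficients and two hypothetical customers (2 features each)
coef = np.array([0.8, -0.5])
intercept = -1.0
X = np.array([[1.0, 0.0],
              [2.0, 1.0]])

probs = predict_crash_probability(X, coef, intercept)
# scikit-learn's predict() labels a row 1 when the linear score is > 0,
# which is the same as probability > 0.5
labels = (probs > 0.5).astype(int)
print(probs, labels)
```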

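##### Model Implementation (statsmodels)

```python
# Fit with statsmodels to get a coefficient table with standard errors and p-values.
# sm.Logit does not add an intercept on its own; add_constant adds one
# (and skips it if x_train_log already contains a constant column).
logit_model = sm.Logit(y_train_log, sm.add_constant(x_train_log))
result = logit_model.fit()
print(result.summary2())
```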
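The `Coef.` column in `summary2()` is on the log-odds scale. Exponentiating a coefficient gives an odds ratio: the multiplicative change in the odds of a crash for a one-unit increase in that feature, holding the other features fixed. For the fitted model, `np.exp(result.params)` applies this transformation; the sketch below uses hypothetical feature names and coefficient values.

```python
import numpy as np

# Hypothetical feature names and log-odds coefficients (not from the fitted model)
feature_names = ["prior_claims", "vehicle_age", "years_licensed"]
log_odds = np.array([0.69, 0.10, -0.22])

odds_ratios = np.exp(log_odds)
for name, beta, ratio in zip(feature_names, log_odds, odds_ratios):
    # ratio > 1 raises the odds of a crash, ratio < 1 lowers them
    change = (ratio - 1.0) * 100
    print(f"{name}: coef={beta:+.2f}, odds ratio={ratio:.3f} ({change:+.1f}% odds per unit)")
```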

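##### Model Fitting (scikit-learn)

```python
# Fit the same model with scikit-learn for prediction and evaluation.
# scikit-learn applies L2 regularization by default (C=1.0), so its coefficients
# can differ slightly from the statsmodels fit above.
# max_iter is raised from the default of 100 to reduce convergence warnings.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(x_train_log, y_train_log)
```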

##### Predicting Test Set Results and Calculating Accuracy

```python
# Predict crash labels for the test set and report overall accuracy
y_pred_log = logreg.predict(x_test_log)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg.score(x_test_log, y_test_log)))
```
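Accuracy is the share of test rows whose predicted label matches the true label. If crashes are a minority class in this data (worth checking with `y_test_log.mean()`, assuming 0/1 labels), a model can reach high accuracy simply by predicting "no crash" for everyone, so it helps to compare against that baseline. A NumPy sketch on made-up labels, where the model does no better than the baseline:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def majority_baseline_accuracy(y_true):
    """Accuracy of a model that always predicts the most common class."""
    y_true = np.asarray(y_true)
    _, counts = np.unique(y_true, return_counts=True)
    return float(counts.max() / y_true.size)

# Made-up labels: 1 = crash, 0 = no crash
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 0])

print(f"model accuracy:    {accuracy(y_true, y_pred):.2f}")
print(f"majority baseline: {majority_baseline_accuracy(y_true):.2f}")
```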

##### Confusion Matrix

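```python
# Rows = true class, columns = predicted class.
# Store the result under a new name so the confusion_matrix function is not overwritten.
conf_matrix = confusion_matrix(y_test_log, y_pred_log)
print(conf_matrix)
```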
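For binary labels, scikit-learn's `confusion_matrix` puts true classes on the rows and predicted classes on the columns, in sorted label order, so with 1 = crash the printed array reads `[[TN, FP], [FN, TP]]`. The sketch below builds the same 2×2 layout by hand from made-up labels to make each cell explicit.

```python
import numpy as np

def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows = true, cols = predicted)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    matrix = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Made-up labels: 1 = crash, 0 = no crash
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 1, 0, 1, 0, 1])

cm = binary_confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # the same unpacking order works on scikit-learn's output
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```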

##### Interpretation of Results

```python
# Precision, recall and F1-score for each class
print(classification_report(y_test_log, y_pred_log))
```
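For the crash class (label 1), the precision, recall and F1-score columns of `classification_report` follow directly from the confusion-matrix counts; the report lists the same metrics for class 0 plus macro and weighted averages. A sketch with hypothetical counts:

```python
def binary_metrics(tp, fp, fn):
    """Precision, recall and F1 for the positive (crash) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted crashes, share that were real
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real crashes, share that were caught
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts for the crash class
precision, recall, f1 = binary_metrics(tp=30, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```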

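```python
# ROC curve
# Both the AUC and the curve use the predicted probability of the crash class (label 1).
# Computing AUC from hard 0/1 predictions would not match the plotted curve
# and discards the ranking information in the scores.
crash_probs = logreg.predict_proba(x_test_log)[:, 1]
logit_roc_auc = roc_auc_score(y_test_log, crash_probs)
fpr, tpr, thresholds = roc_curve(y_test_log, crash_probs)
plt.figure()
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')  # chance-level reference line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.savefig('Log_ROC')
plt.show()
```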
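The AUC measures how well the scores rank crashes above non-crashes: it equals the probability that a randomly chosen crash customer receives a higher score than a randomly chosen non-crash customer, with ties counting half. That is why it should be computed from `predict_proba` scores rather than from 0/1 predictions, which collapse the ranking to a single threshold. A brute-force NumPy sketch of this pairwise definition on made-up scores:

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Made-up labels and predicted crash probabilities
y_true = np.array([0, 0, 1, 1, 0, 1])
probs = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

auc_from_probs = pairwise_auc(y_true, probs)
auc_from_labels = pairwise_auc(y_true, (probs > 0.5).astype(int))
print(f"AUC from probabilities: {auc_from_probs:.3f}")
print(f"AUC from hard labels:   {auc_from_labels:.3f}")
```

On these made-up values the hard labels lose the information that one crash customer (score 0.35) still outranks two of the three non-crash customers.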

### Linear Regression Model

Here, we are building a linear regression model to predict crash cost for auto insurance customers.
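`LinearRegression` fits ordinary least squares: it chooses the intercept and coefficients that minimize the sum of squared errors. A minimal NumPy sketch of the same fit with `np.linalg.lstsq`, on synthetic data whose true coefficients are known (the relationship and feature count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: cost = 500 + 120*x1 + 40*x2 + noise (hypothetical relationship)
n = 200
X = rng.normal(size=(n, 2))
y = 500 + 120 * X[:, 0] + 40 * X[:, 1] + rng.normal(scale=5.0, size=n)

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coefs = beta[0], beta[1:]
print(f"intercept={intercept:.1f}, coefficients={np.round(coefs, 1)}")
```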

```python
# Fit ordinary least squares on the training data
linreg_model = LinearRegression()
linreg_model.fit(x_train_lin, y_train_lin)
```

#### Model Results

```python
# Calculate the R-squared value on the test set
y_pred_lin = linreg_model.predict(x_test_lin)
print('Linear Regression R squared: %.4f' % linreg_model.score(x_test_lin, y_test_lin))
```
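Calling `score` on a fitted `LinearRegression` returns the coefficient of determination, R² = 1 - SS_res / SS_tot, computed on whatever data is passed in (here, the test set). A short sketch with made-up values:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted crash costs
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])
print(f"R^2 = {r_squared(y_true, y_pred):.4f}")
```

Unlike the training R² of an OLS fit with an intercept, the test-set value can be negative when the model predicts worse than the mean of `y_test_lin`.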

```python
# Calculate the root mean squared error (RMSE) on the test set
mse_lin = mean_squared_error(y_test_lin, y_pred_lin)  # argument order is (y_true, y_pred)
rmse_lin = np.sqrt(mse_lin)
print('Linear Regression RMSE: %.4f' % rmse_lin)
```

```python
# Calculate the mean absolute error (MAE) on the test set
mae_lin = mean_absolute_error(y_test_lin, y_pred_lin)  # argument order is (y_true, y_pred)
print('Linear Regression MAE: %.4f' % mae_lin)
```
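RMSE and MAE are both expressed in the same units as the crash cost, but RMSE squares the errors before averaging, so a few large misses raise it much more than they raise MAE. A sketch with made-up predictions that have the same MAE but different RMSE:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

y_true = np.array([1000.0, 1000.0, 1000.0, 1000.0])
small_misses = np.array([1100.0, 900.0, 1100.0, 900.0])    # every error is 100
one_big_miss = np.array([1000.0, 1000.0, 1000.0, 1400.0])  # a single error of 400

for name, pred in [("small misses", small_misses), ("one big miss", one_big_miss)]:
    print(f"{name}: MAE={mae(y_true, pred):.1f}, RMSE={rmse(y_true, pred):.1f}")
```

A large gap between `rmse_lin` and `mae_lin` therefore hints that a small number of claims are being predicted very poorly.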

```python
# Plot residuals against predicted values
visualizer = ResidualsPlot(linreg_model)
visualizer.fit(x_train_lin, y_train_lin)  # Fit the training data to the visualizer
visualizer.score(x_test_lin, y_test_lin)  # Evaluate the model on the test data
visualizer.show()                         # Finalize and render the figure
```
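In a residuals plot, each point is one customer's residual (the gap between actual and predicted cost) plotted against the predicted cost; for a well-specified model the points scatter evenly around zero with no trend or funnel shape. The NumPy sketch below, on synthetic data, computes residuals for an OLS fit and checks two properties that hold on the training data whenever the model includes an intercept: the residuals average to zero and are uncorrelated with the fitted values. Test-set residuals carry no such guarantee, which is what makes them informative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data (hypothetical): one feature, linear signal plus noise
x = rng.uniform(0, 10, size=100)
y = 300 + 50 * x + rng.normal(scale=20.0, size=100)

# Fit OLS with an intercept
X_design = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_hat = X_design @ beta

residuals = y - y_hat
print(f"mean training residual: {residuals.mean():.2e}")
print(f"corr(residuals, fitted): {np.corrcoef(residuals, y_hat)[0, 1]:.2e}")
```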

## Results and Conclusions

Compare each customer's predicted crash percentage (`crash_percentage`) with their predicted crash cost (`crash_cost`).
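A minimal sketch of this comparison and of the risk-profile and premium steps from the project goals, assuming per-customer arrays of predicted crash probability (`crash_percentage`) and predicted cost given a crash (`crash_cost`). The expected-cost rule, the tier cut-offs, the tier names and the 25% premium loading are all hypothetical placeholders, not values or methods taken from this analysis.

```python
import numpy as np

# Hypothetical model outputs for five customers
crash_percentage = np.array([0.05, 0.12, 0.30, 0.08, 0.45])      # predicted P(crash)
crash_cost = np.array([4000.0, 6500.0, 5200.0, 9000.0, 7000.0])  # predicted cost if a crash occurs

# How closely do the two model outputs move together?
correlation = np.corrcoef(crash_percentage, crash_cost)[0, 1]

# Assumed pricing rule: expected claim cost = probability x cost
expected_cost = crash_percentage * crash_cost

# Assign risk profiles from crash probability (hypothetical cut-offs at 10% and 25%)
tier_names = np.array(["low", "medium", "high"])
tier_index = np.digitize(crash_percentage, bins=[0.10, 0.25])
risk_profile = tier_names[tier_index]

# Premium = expected cost plus a hypothetical 25% loading for expenses and margin
LOADING = 0.25
premium = expected_cost * (1 + LOADING)

print(f"corr(crash_percentage, crash_cost) = {correlation:.2f}")
for p, c, tier, prem in zip(crash_percentage, crash_cost, risk_profile, premium):
    print(f"p={p:.2f} cost={c:7.0f} profile={tier:<6} premium={prem:8.2f}")
```

Binning on probability alone keeps the profiles easy to explain; binning on `expected_cost` instead would let a low-probability but very expensive customer land in a higher tier.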