In [None]:
library(tidyverse)
library(data.table)
library(plotly) # for interactive plotting
library(DT) # for interactive tabulation
library(broom) # for tidy statistical summaries
library(caret) # for regression performance measures
library(psych) # for pairwise comparisons
library(GGally) # for pairwise comparisons
library(magrittr) # for the %<>% assignment pipe
library(lindia) # for qqplots

In [None]:
options(repr.matrix.max.rows=20, repr.matrix.max.cols=15) # limit the rows (top and bottom) and columns printed for tables

In [None]:
datapath <- "~/data_ad454"

# Logistic Regression

We continue with the realty dataset.

Recall that we calculated the variable premium_neigh: the premium of a property's unit price over the median unit price of its neighborhood.

Now we will classify the properties as premium or discount.

Let's first import the realty dataset:

In [None]:
realty_data3 <- readRDS(sprintf("%s/rds/06_02_realty_data3.rds", datapath))

In [None]:
realty_data3

Let's add the binary variable premium, which takes 1 when premium_neigh is above 0, and 0 otherwise:

In [None]:
realty_data3[, premium := as.integer(premium_neigh > 0)]
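As a quick check of how this coding behaves, here is a minimal base R sketch on a small hypothetical vector `premium_neigh_demo` (made-up values, not from the realty data). A premium of exactly 0 is coded as 0, and a missing premium stays NA, which is one reason the later `na.omit` step matters.

```R
# Hypothetical premium values (not from the realty data)
premium_neigh_demo <- c(-0.15, 0, 0.02, NA, 0.40)

# Same rule as above: 1 if the premium is strictly above 0, else 0; NA stays NA
premium_demo <- as.integer(premium_neigh_demo > 0)
print(premium_demo)
#> [1]  0  0  1 NA  1
```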

Let's see the structure:

In [None]:
realty_data3 %>% str

Now select a subset of the variables, including all logical columns:

In [None]:
vars <- c("premium", "esyali", "krediye_uygunluk", "bina_yasi", "kat_sayisi", "kat", realty_data3 %>% keep(is.logical) %>% names)
vars

Then keep only these columns and drop the rows with any missing values:

In [None]:
realty_data4 <- realty_data3 %>% select(all_of(vars)) %>% na.omit

Your tasks are to:

- Partition the dataset into a 70% train set and a 30% test set.
- Create and run a logistic regression model that explains premium with all other variables, **without an intercept**. Note that because the neighborhood median is the baseline for the premium, the two classes are nearly balanced.
- Print the summary of the model. Compare and interpret the null and residual deviances, and create a table of the coefficients that are significant at the 5% level.
- Calculate the fitted probabilities of the positive case ("1") for the train set, and the fitted classes with a cut value of 0.5.
- Create a confusion matrix. You may use the code template below:

```R
# caret expects predicted classes in rows and actual (reference) classes in columns
table(fitted = fitted_classes, actual = actual_classes) %>% caret::confusionMatrix(positive = "1")
```
- What are the TP, TN, FP, FN counts? Interpret accuracy, sensitivity and specificity.
- Interpret Kappa: what level of agreement between the actual and fitted classes does it indicate?
- Create an ROC curve and calculate the AUC. How much better than pure random guessing is the model?
- Calculate the predicted probabilities of the positive case ("1") for the test set, and the predicted classes with a cut value of 0.5.
- Create a confusion matrix for the test set, similar to the one above.
- What are the TP, TN, FP, FN counts? Interpret accuracy, sensitivity and specificity.
- Interpret Kappa: what level of agreement between the actual and predicted classes does it indicate?
- Compare the confusion-matrix results of the train and test sets.
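The partition, the no-intercept model and the deviance comparison can be sketched in base R. This is a minimal sketch on simulated data: the data frame `sim`, its columns `x1` and `x2`, and the seed are hypothetical stand-ins for `realty_data4`, not the realty variables. It shows one detail worth knowing when reading the summary: without an intercept, `glm` compares the model against a null model that predicts p = 0.5 for every row, so the null deviance equals 2n log(2).

```R
set.seed(454)  # hypothetical seed, for reproducibility only
n <- 500

# Hypothetical stand-in for realty_data4: one numeric and one logical predictor
sim <- data.frame(x1 = rnorm(n), x2 = runif(n) < 0.5)
sim$premium <- rbinom(n, 1, plogis(1.2 * sim$x1 - 0.8 * sim$x2))

# 70/30 split by simple random sampling of row indices
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# "- 1" removes the intercept from the model formula
fit <- glm(premium ~ . - 1, data = train, family = binomial)

# Without an intercept, the null model has linear predictor 0, i.e. p = 0.5
# for every row, so the null deviance equals 2 * n * log(2)
c(null = fit$null.deviance, residual = fit$deviance)

# Likelihood-ratio test of the fitted model against that null model
pchisq(fit$null.deviance - fit$deviance,
       df = fit$df.null - fit$df.residual, lower.tail = FALSE)

# Coefficients significant at the 5% level
coefs <- summary(fit)$coefficients
coefs[coefs[, "Pr(>|z|)"] < 0.05, , drop = FALSE]

# Fitted (train) and predicted (test) classes with a cut value of 0.5
train_classes <- as.integer(fitted(fit) > 0.5)
test_classes  <- as.integer(predict(fit, newdata = test, type = "response") > 0.5)
```

Note the coefficients `x2FALSE` and `x2TRUE`: when the intercept is dropped, R codes the first factor or logical predictor in the formula with one column per level, so a model with logical predictors still contains an intercept-like term. For a stratified split that keeps the class shares similar in both sets, `caret::createDataPartition(y, p = 0.7, list = FALSE)` is an alternative to plain `sample()`.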
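To see where the confusion-matrix statistics come from, here is a minimal base R sketch that computes them by hand. The vectors `actual_classes` and `fitted_probs` are hypothetical, made-up values, not model output. The table uses the same orientation caret expects (predicted classes in rows, actual classes in columns), and Kappa is Cohen's Kappa: observed agreement corrected for the agreement expected by chance.

```R
# Hypothetical actual classes and fitted probabilities (not from the realty data)
actual_classes <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
fitted_probs   <- c(0.9, 0.7, 0.7, 0.6, 0.3, 0.55, 0.2, 0.1, 0.8, 0.45)

# Cut value of 0.5: probabilities above it become class 1
fitted_classes <- as.integer(fitted_probs > 0.5)

# Predicted classes in rows, actual classes in columns (the orientation caret uses)
cm <- table(fitted = factor(fitted_classes, levels = c(0, 1)),
            actual = factor(actual_classes, levels = c(0, 1)))

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

accuracy_val    <- (TP + TN) / sum(cm)
sensitivity_val <- TP / (TP + FN)  # share of actual 1s classified as 1
specificity_val <- TN / (TN + FP)  # share of actual 0s classified as 0

# Cohen's Kappa: chance agreement assumes fitted and actual classes are
# independent with the same marginal totals as the table
p_exp     <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2
kappa_val <- (accuracy_val - p_exp) / (1 - p_exp)

c(TP = TP, TN = TN, FP = FP, FN = FN)
round(c(accuracy = accuracy_val, sensitivity = sensitivity_val,
        specificity = specificity_val, kappa = kappa_val), 3)
```

On these made-up vectors the counts are TP = 5, TN = 4, FP = 1, FN = 0, giving accuracy 0.9, sensitivity 1, specificity 0.8 and Kappa 0.8; `caret::confusionMatrix` should report the same figures for this table. A widely used rule of thumb (Landis and Koch) reads Kappa of 0.41 to 0.60 as moderate, 0.61 to 0.80 as substantial and above 0.80 as almost perfect agreement.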
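The ROC curve and AUC can also be computed without extra packages. This is a minimal base R sketch on hypothetical vectors `actual` and `probs` (made-up values): it sweeps the cut value from high to low, records the true and false positive rates, and computes the AUC twice, once with the trapezoidal rule and once from ranks (the probability that a randomly chosen positive gets a higher score than a randomly chosen negative). Packages such as pROC (`roc()` and `auc()`) provide the same in one call.

```R
# Hypothetical actual classes and fitted probabilities (not from the realty data)
actual <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
probs  <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.5, 0.35, 0.2, 0.1)

# ROC curve: sweep the cut value from high to low and record TPR and FPR
cuts <- c(Inf, sort(unique(probs), decreasing = TRUE))
tpr  <- sapply(cuts, function(cut) mean(probs[actual == 1] >= cut))
fpr  <- sapply(cuts, function(cut) mean(probs[actual == 0] >= cut))

plot(fpr, tpr, type = "l", xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)", main = "ROC curve")
abline(0, 1, lty = 2)  # diagonal = pure random guessing, AUC 0.5

# AUC via the trapezoidal rule over the ROC points
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Same AUC via ranks (Mann-Whitney form); ties get average ranks
n1 <- sum(actual == 1); n0 <- sum(actual == 0)
auc_rank <- (sum(rank(probs)[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(trapezoid = auc_trap, rank = auc_rank)
```

Both versions give an AUC of 0.76 on these made-up values. Since random guessing corresponds to an AUC of 0.5, one way to express "how much better than random" is the Gini coefficient, 2 * AUC - 1, which is 0.52 here.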

# Answer