In [None]:
# install packages
install.packages('RCurl')
install.packages('ggplot2')

## Goal: predict office occupancy using the listed predictors

__Office occupancy data__
1. Occupancy: 0 (not occupied), 1 (occupied)
2. Temperature: in Celsius
3. Humidity: relative humidity as a percentage
4. Light: measured in lux
5. CO2: in ppm

## 1. Data Preparation

In [30]:
library(RCurl) # provides getURL(), which allows reading data from GitHub
library(ggplot2)

In [34]:
# url <- getURL("https://raw.githubusercontent.com/LuisM78/Occupancy-detection-data/master/datatest.txt")
# occ <- read.csv(text = url)

In [35]:
# Read the file directly from the URL instead, since the code above raised an error on my laptop.
url <- "https://raw.githubusercontent.com/LuisM78/Occupancy-detection-data/master/datatest.txt"
occ <- read.csv(url)

In [27]:
head(occ)

| | date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 140 | 2015-02-02 14:19:00 | 23.7 | 26.272 | 585.2 | 749.2 | 0.004764163 | 1 |
| 141 | 2015-02-02 14:19:59 | 23.718 | 26.29 | 578.4 | 760.4 | 0.004772661 | 1 |
| 142 | 2015-02-02 14:21:00 | 23.73 | 26.23 | 572.6667 | 769.6667 | 0.004765153 | 1 |
| 143 | 2015-02-02 14:22:00 | 23.7225 | 26.125 | 493.75 | 774.75 | 0.004743773 | 1 |
| 144 | 2015-02-02 14:23:00 | 23.754 | 26.2 | 488.6 | 779.0 | 0.004766594 | 1 |
| 145 | 2015-02-02 14:23:59 | 23.76 | 26.26 | 568.6667 | 790.0 | 0.004779332 | 1 |


In [28]:
head(occ[ , c(2,3,4,5,7)])

| | Temperature | Humidity | Light | CO2 | Occupancy |
|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 140 | 23.7 | 26.272 | 585.2 | 749.2 | 1 |
| 141 | 23.718 | 26.29 | 578.4 | 760.4 | 1 |
| 142 | 23.73 | 26.23 | 572.6667 | 769.6667 | 1 |
| 143 | 23.7225 | 26.125 | 493.75 | 774.75 | 1 |
| 144 | 23.754 | 26.2 | 488.6 | 779.0 | 1 |
| 145 | 23.76 | 26.26 | 568.6667 | 790.0 | 1 |


In [29]:
summary(occ[ , c(2,3,4,5,7)])

  Temperature       Humidity         Light             CO2        
 Min.   :20.20   Min.   :22.10   Min.   :   0.0   Min.   : 427.5  
 1st Qu.:20.65   1st Qu.:23.26   1st Qu.:   0.0   1st Qu.: 466.0  
 Median :20.89   Median :25.00   Median :   0.0   Median : 580.5  
 Mean   :21.43   Mean   :25.35   Mean   : 193.2   Mean   : 717.9  
 3rd Qu.:22.36   3rd Qu.:26.86   3rd Qu.: 442.5   3rd Qu.: 956.3  
 Max.   :24.41   Max.   :31.47   Max.   :1697.2   Max.   :1402.2  
   Occupancy     
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.3647  
 3rd Qu.:1.0000  
 Max.   :1.0000  

Many observations have a Light value of zero: both the first quartile and the median are 0 while the mean is 193.2, so the distribution is likely strongly right-skewed. Occupancy is a categorical variable, so a numerical summary is not meaningful for it.
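The two observations about Light and Occupancy can be checked more directly with a share of zeros and a frequency table. The snippet below is a minimal sketch on a small, made-up data frame (`toy` and its values are hypothetical stand-ins, not the real data); in the notebook the same calls would be applied to `occ`.

```r
# Hypothetical stand-in for `occ`; the values are illustrative only.
toy <- data.frame(
  Light     = c(0, 0, 0, 0, 585.2, 578.4, 0, 493.75),
  Occupancy = c(0, 0, 0, 0, 1, 1, 0, 1)
)

# Share of rows where Light is exactly zero.
zero_share <- mean(toy$Light == 0)

# A categorical variable is better summarised by counts per level
# than by quartiles and a mean.
occ_counts <- table(factor(toy$Occupancy, levels = c(0, 1)))

print(zero_share)
print(occ_counts)
```

With the real data, `mean(occ$Light == 0)` and `table(occ$Occupancy)` would play the same roles.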

## 2. Modelling using the glm() function

In [40]:
is.factor(occ$Occupancy)

In [41]:
occ$Occupancy <- as.factor(occ$Occupancy)

In [42]:
glmod <- glm(Occupancy ~ Temperature + Humidity + Light + CO2, 
             data=occ, family="binomial")

In [43]:
summary(glmod)


Call:
glm(formula = Occupancy ~ Temperature + Humidity + Light + CO2, 
    family = "binomial", data = occ)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4969  -0.0624  -0.0179   0.1038   2.6544  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -29.316563  11.038232  -2.656  0.00791 ** 
Temperature  -0.333612   0.318492  -1.047  0.29488    
Humidity      1.353727   0.298368   4.537  5.7e-06 ***
Light         0.021921   0.001586  13.819  < 2e-16 ***
CO2          -0.006839   0.003257  -2.099  0.03578 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3496.96  on 2664  degrees of freedom
Residual deviance:  375.66  on 2660  degrees of freedom
AIC: 385.66

Number of Fisher Scoring iterations: 9


* $\hat{\beta_0}$ = -29.316563: Assuming the model is correct, the average log odds of an office being occupied when temperature, humidity, light, and CO2 are all equal to zero is approximately -29.3.
    * The odds scale: $e^{\hat{\beta_0}}\approx 0$, so the average odds that an office is occupied when temperature, humidity, light, and CO2 are all equal to zero are essentially 0. This seems to make sense, although such values lie far outside the observed data (the minimum temperature is 20.20 and the minimum CO2 is 427.5), so the intercept is an extrapolation.
* $\hat{\beta_3}$ = 0.021921: A one-lux increase in light, with all other predictors held constant, is associated with an increase of approximately 0.022 in the log odds of occupancy on average.
    * The odds scale: $e^{\hat{\beta_3}}\approx 1.02$. A one-lux increase in light, with all other predictors held constant, __multiplies__ the odds of occupancy by about 1.02 on average, i.e. an increase of roughly 2%.
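The conversions from the log-odds scale to the odds scale can be reproduced by exponentiating the estimates. This is a small sketch that copies the two estimates from the `summary(glmod)` output by hand; the variable names are my own, and the 100-lux case is an added illustration of how the multiplier compounds.

```r
# Estimates copied from the summary(glmod) output.
beta0 <- -29.316563  # intercept
beta3 <- 0.021921    # Light

# Odds when every predictor equals zero (practically 0).
odds_at_zero <- exp(beta0)

# Multiplicative change in the odds for a one-lux increase in Light.
odds_ratio_light <- exp(beta3)

# The multiplier compounds: a 100-lux increase multiplies the odds
# by exp(100 * beta3), not by 100 * exp(beta3).
odds_ratio_100lux <- exp(100 * beta3)

print(c(odds_at_zero = odds_at_zero,
        odds_ratio_light = odds_ratio_light,
        odds_ratio_100lux = odds_ratio_100lux))
```

In the notebook itself, `exp(coef(glmod))` would give the odds-scale value for every coefficient at once.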

## 3. Mathematical formula

* Estimated odds of occupancy

$$e^{\hat{\eta}}=e^{\hat{\beta_0}+\hat{\beta_1}x_1+\hat{\beta_2}x_2+\hat{\beta_3}x_3+\hat{\beta_4}x_4}=\frac{\hat{p}}{1-\hat{p}} $$

* A one-lux increase in light ($x_3$)

$$e^{{\hat{\eta}}_{+1}}
=e^{\hat{\beta_0}+\hat{\beta_1}x_1+\hat{\beta_2}x_2+\hat{\beta_3}(x_3+1)+\hat{\beta_4}x_4}
=e^{\hat{\beta_3}}e^{\hat{\beta_0}+\hat{\beta_1}x_1+\hat{\beta_2}x_2+\hat{\beta_3}x_3+\hat{\beta_4}x_4}
=e^{\hat{\beta_3}}e^{\hat{\eta}}
=e^{\hat{\beta_3}}\frac{\hat{p}}{1-\hat{p}}
$$

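The same derivation applies to temperature, humidity, and CO2: a one-unit increase in $x_j$, with the other predictors held constant, multiplies the estimated odds by $e^{\hat{\beta_j}}$.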
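The identity can also be checked numerically. The sketch below hard-codes the estimates from the `summary(glmod)` output and evaluates the odds at the first row shown by `head(occ)`; the helper `est_odds()` is a hypothetical name introduced only for this illustration.

```r
# Estimates copied from the summary(glmod) output.
beta <- c(Intercept = -29.316563, Temperature = -0.333612,
          Humidity = 1.353727, Light = 0.021921, CO2 = -0.006839)

# Hypothetical helper: estimated odds exp(eta) for one observation.
est_odds <- function(temperature, humidity, light, co2) {
  eta <- beta[["Intercept"]] +
    beta[["Temperature"]] * temperature +
    beta[["Humidity"]] * humidity +
    beta[["Light"]] * light +
    beta[["CO2"]] * co2
  exp(eta)
}

# First row of head(occ): 23.7 C, 26.272 %, 585.2 lux, 749.2 ppm.
odds_base <- est_odds(23.7, 26.272, 585.2, 749.2)

# Increase one predictor by one unit and hold the others fixed.
ratio_light    <- est_odds(23.7, 26.272, 586.2, 749.2) / odds_base
ratio_humidity <- est_odds(23.7, 27.272, 585.2, 749.2) / odds_base

# Each ratio should match exp() of the corresponding coefficient.
print(all.equal(ratio_light, exp(beta[["Light"]])))
print(all.equal(ratio_humidity, exp(beta[["Humidity"]])))

# Odds convert back to a probability via p = odds / (1 + odds).
p_hat <- odds_base / (1 + odds_base)
print(p_hat)
```

The ratios do not depend on the starting values of the predictors, which is why the interpretation holds for any observation.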