# Quantification
*****************
**"The quantification task for machine learning: given a limited training set with class labels, induce a quantifier that takes an unlabeled test set as input and returns its best estimate of the number of cases in each class."**
_Quantifying counts and costs via classification_, George Forman

## Data

The data used in this demo are published at: http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+#

### Occupancy Detection Data Set

_Abstract_: Experimental data used for binary classification (room occupancy) from Temperature, Humidity, Light and CO2. Ground-truth occupancy was obtained from time stamped pictures that were taken every minute.

Occupancy=[0,1]
![](http://users.dsic.upv.es/~flip/caspdm/imagenes/pairs_plot_green_blue_time.png)
![](http://users.dsic.upv.es/~flip/caspdm/imagenes/VarImp_modelRF_All.png)

In [2]:
require(class)  # provides knn()
data<-read.csv("data.csv")

# Columns 3 to 8: the five sensor features followed by the Occupancy label
head(data[,3:8],10)



| | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|
| 1 | 23.18 | 27.272 | 426.0 | 721.25 | 0.004792988 | 1 |
| 2 | 23.15 | 27.2675 | 429.5 | 714.0 | 0.004783441 | 1 |
| 3 | 23.15 | 27.245 | 426.0 | 713.5 | 0.004779464 | 1 |
| 4 | 23.15 | 27.2 | 426.0 | 708.25 | 0.004771509 | 1 |
| 5 | 23.1 | 27.2 | 426.0 | 704.5 | 0.004756993 | 1 |
| 6 | 23.1 | 27.2 | 419.0 | 701.0 | 0.004756993 | 1 |
| 7 | 23.1 | 27.2 | 419.0 | 701.6667 | 0.004756993 | 1 |
| 8 | 23.1 | 27.2 | 419.0 | 699.0 | 0.004756993 | 1 |
| 9 | 23.1 | 27.2 | 419.0 | 689.3333 | 0.004756993 | 1 |
| 10 | 23.075 | 27.175 | 419.0 | 688.0 | 0.004745351 | 1 |


Number of instances:

In [3]:
nrow(data)

Number of instances of class "0":

In [4]:
nrow(data[which(data$Occupancy==0),])

Number of instances of class "1":

In [5]:
nrow(data[which(data$Occupancy==1),])

The dataset is split, in row order, into three sets:
+ 60% training (training the probabilistic model).
+ 20% validation (to calculate the cutoffs).
+ 20% test (to evaluate the results).


In [6]:
train<-head(data,12336)
test<-tail(data,4112)
validation<-data[12337:16448,]
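
The row counts above are hard-coded. As a minimal sketch, the same boundaries can be derived from the 60/20/20 proportions, assuming the data frame has 20560 rows (the total implied by the three index ranges if the sets do not overlap). The names `n`, `train_idx`, `val_idx` and `test_idx` are illustrative and do not appear in the original.

```r
# Assumed row count; in the notebook this would be nrow(data).
n <- 20560

# Sizes of the first two blocks; the test set takes whatever remains.
n_train <- round(0.6 * n)
n_val   <- round(0.2 * n)

# Consecutive index ranges, in row order (no shuffling).
train_idx <- seq_len(n_train)
val_idx   <- n_train + seq_len(n_val)
test_idx  <- (n_train + n_val + 1):n

range(train_idx)
range(val_idx)
range(test_idx)
```

With these indices, `data[train_idx, ]`, `data[val_idx, ]` and `data[test_idx, ]` would select the three blocks.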

Percentage of class "0" instances in the test set:

In [7]:
paste(round(nrow(test[which(test$Occupancy==0),])*100/nrow(test),digits=2),"%")

Percentage of class "1" instances in the test set:

In [8]:
paste(round(nrow(test[which(test$Occupancy==1),])*100/nrow(test),digits=2),"%")
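
Both percentages can also be read off in one step. The sketch below uses a small made-up label vector (`occ` is a hypothetical stand-in for `test$Occupancy`) and base R's `table()` and `prop.table()`.

```r
# Hypothetical labels: three instances of class 0, one of class 1.
occ <- c(0, 0, 0, 1)

# Class counts, turned into proportions, then into rounded percentages.
pct <- round(100 * prop.table(table(occ)), 2)
pct
```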


*************

## Classification
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

In this demo we used the k-NN algorithm.

### k-Nearest Neighbour Classification (Package: class)

* **Description:**
k-nearest neighbour classification for test set from training set. For each row of the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification is decided by majority vote, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.

* **Usage:**
knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)
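
As a small illustration of the call, the sketch below runs `knn()` on made-up data. The objects `toy_train`, `toy_cl` and `toy_test` and their values are assumptions chosen for the example, not part of the Occupancy data.

```r
library(class)

set.seed(1)  # knn() breaks voting ties at random

# Hypothetical training rows: two low readings (class 0), two high readings (class 1).
toy_train <- data.frame(Light = c(0, 10, 400, 450),
                        CO2   = c(440, 450, 700, 720))
toy_cl <- factor(c(0, 0, 1, 1))

# Hypothetical rows to classify: one near each group.
toy_test <- data.frame(Light = c(5, 430),
                       CO2   = c(445, 710))

# k = 3: each test row is labelled by a majority vote of its 3 nearest training rows.
# prob = TRUE attaches the share of votes won by the predicted class.
toy_pred <- knn(toy_train, toy_test, toy_cl, k = 3, prob = TRUE)

as.numeric(as.character(toy_pred))
attr(toy_pred, "prob")
```

The result is a factor, which is why the notebook converts it with `as.numeric(as.character(...))` before comparing it with the numeric labels.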



Validation set:

In [9]:
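# Features are columns 3 to 7; column 8 (Occupancy) is the label and is not used as a feature
cl<-train[,8]
validation_sol<-validation[,8]
validation_pred<-knn(train[,3:7], validation[,3:7], cl, k = 10)
# knn() returns a factor; convert it back to numeric 0/1
validation_pred<-as.numeric(as.character(validation_pred))
sol<-cbind(validation_sol,validation_pred)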


Misclassified instances (validation set):

In [10]:
# Number of rows where the actual and predicted labels differ
sum(sol[,1]!=sol[,2])

Percentage of misclassified instances (validation set):

In [11]:
paste(round(sum(sol[,1]!=sol[,2])*100/nrow(validation),2),"%")
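
As a sketch of how such counts can be obtained, using small made-up vectors rather than the notebook's data: comparing two label vectors element by element gives the number and share of misclassified instances, and `table()` breaks the errors down by class. The names `actual` and `predicted` are illustrative.

```r
# Hypothetical labels for six instances.
actual    <- c(0, 0, 1, 1, 1, 0)
predicted <- c(0, 1, 1, 1, 0, 0)

# Count and share of instances where the prediction differs from the truth.
n_wrong    <- sum(actual != predicted)
error_rate <- mean(actual != predicted)

# Confusion matrix: rows are the actual classes, columns the predicted ones.
confusion <- table(actual = actual, predicted = predicted)

n_wrong
round(100 * error_rate, 2)
confusion
```

The off-diagonal cells of the confusion matrix are the false positives and false negatives for each class.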

Test set:

In [12]:
test_sol<-test[,8]
# Features are columns 3 to 7; column 8 (Occupancy) is the label and is not used as a feature
test_pred<-knn(train[,3:7], test[,3:7], cl, k = 10)
test_pred<-as.numeric(as.character(test_pred))
sol<-cbind(test_sol,test_pred)

Misclassified instances (test set):

In [13]:
# Number of rows where the actual and predicted labels differ
sum(sol[,1]!=sol[,2])

*************

## Quantification

### CC (Classify and count)

Description...


In [23]:
# CLASSIFY AND COUNT
# yTestPred: predicted class in the test set
# clase: class to obtain the quantification
CC <- function(yTestPred,clase){
	return(length(which(yTestPred==clase))/length(yTestPred))
}
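
In symbols, as a reading of the function above rather than a quotation from the source: if $T$ is the test set, $\hat{y}(x)$ the class predicted for instance $x$, and $c$ the class of interest (`clase`), the value returned by `CC` is the fraction of test instances predicted as $c$:

$$
\hat{p}_{CC}(c) = \frac{\left|\{x \in T : \hat{y}(x) = c\}\right|}{|T|}
$$

The result is a proportion between 0 and 1, not a percentage.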

Prediction for class "0":

In [24]:
paste(round(CC(test_pred,"0")*100,2),"%")

Prediction for class "1":

In [26]:
paste(round(CC(test_pred,"1")*100,2),"%")

### AC (Adjusted count)
Description...

In [28]:
# ADJUSTED COUNT
# yVal: actual class in the validation set
# yValPred: predicted class in the validation set
# yTestPred: predicted class in the test set
# clase: class to obtain the quantification
AC <- function(yVal,yValPred,yTestPred,clase){
  # True-positive and false-positive rates for clase, estimated on the validation set
  PV<-length(which(yVal==clase))
  NV<-length(yVal)-PV
  TPRV<-length(which(yValPred==clase & yVal==clase))/PV
  FPRV<-length(which(yValPred==clase & yVal!=clase))/NV

  if((TPRV-FPRV)==0){
    # The correction is undefined when both rates are equal: fall back to 0.5
    res<-0.5
  } else {
    # Correct the classify-and-count estimate using the validation rates
    res<-(CC(yTestPred,clase)-FPRV)/(TPRV-FPRV)
  }

  # Clip the estimate to the valid range [0, 1]
  res[res<0]<-0
  res[res>1]<-1

  return(res)
}
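
The correction applied in `AC` can be motivated as follows. This is a sketch of the usual argument, writing $\mathrm{tpr}$ and $\mathrm{fpr}$ for the rates the code calls `TPRV` and `FPRV`, and $p$ for the unknown true proportion of class $c$ (`clase`) in the test set. Assuming the classifier behaves on the test set as it did on the validation set, the expected fraction of test instances predicted as $c$ is made up of true positives and false positives:

$$
P(\hat{y} = c) = \mathrm{tpr} \cdot p + \mathrm{fpr} \cdot (1 - p)
$$

Replacing the left-hand side with the classify-and-count estimate $\hat{p}_{CC}$ returned by `CC` and solving for $p$ gives the adjusted estimate:

$$
\hat{p}_{AC} = \frac{\hat{p}_{CC} - \mathrm{fpr}}{\mathrm{tpr} - \mathrm{fpr}}
$$

The expression is undefined when $\mathrm{tpr} = \mathrm{fpr}$, which is the case the function handles by returning 0.5, and it can fall outside $[0, 1]$ when the rates are poorly estimated, which is why the result is clipped.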

Prediction for class "0":

In [29]:
paste(round(AC(validation_sol,validation_pred,test_pred,"0")*100,2),"%")

Prediction for class "1":

In [30]:
paste(round(AC(validation_sol,validation_pred,test_pred,"1")*100,2),"%")
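
To see the size of the adjustment, the arithmetic can be traced with made-up numbers. The rates and the true proportion below are hypothetical values chosen for illustration, not results from the Occupancy data, and `tpr`, `fpr`, `p_true`, `p_cc` and `p_ac` are illustrative names.

```r
# Hypothetical classifier rates and true class proportion.
tpr    <- 0.8   # share of actual positives predicted as positive
fpr    <- 0.1   # share of actual negatives predicted as positive
p_true <- 0.3   # true proportion of positives in the test set

# Expected classify-and-count estimate: true positives plus false positives.
p_cc <- tpr * p_true + fpr * (1 - p_true)

# Adjusted count: undo the effect of the error rates, then clip to [0, 1].
p_ac <- (p_cc - fpr) / (tpr - fpr)
p_ac <- min(max(p_ac, 0), 1)

round(p_cc, 4)
round(p_ac, 4)
```

In this idealised setting classify-and-count reports 0.31 while the adjusted count recovers 0.3. On real data the rates are themselves estimates, so the recovery is only approximate.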