## Predicting Airbnb Prices

#### Model 1:

In [1]:
import numpy as np
import pandas as pd
from gurobipy import *

In [2]:
df = pd.read_csv('AirbnbTrain.csv')
df1 = pd.read_csv('AirbnbTest.csv')

In [3]:
df.head()

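| | latitude | longitude | Entire home | accommodates | bathrooms | bedrooms | beds | cleaning_fee | minimum_nights | number_of_reviews | review_scores_rating | instant_bookable | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34.103701 | -118.332241 | 1 | 13 | 2.0 | 3 | 2 | 150 | 2 | 1 | 100 | 1 | 350 |
| 1 | 34.099484 | -118.331645 | 1 | 8 | 2.0 | 2 | 4 | 150 | 1 | 11 | 96 | 1 | 190 |
| 2 | 34.104321 | -118.329662 | 1 | 4 | 1.0 | 0 | 1 | 55 | 1 | 1 | 80 | 0 | 85 |
| 3 | 34.101028 | -118.317848 | 0 | 2 | 1.0 | 1 | 1 | 20 | 1 | 8 | 98 | 0 | 75 |
| 4 | 34.098292 | -118.32498 | 1 | 2 | 1.0 | 1 | 1 | 20 | 1 | 11 | 96 | 0 | 130 |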


In [4]:
df1.head()

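| | latitude | longitude | Entire home | accommodates | bathrooms | bedrooms | beds | cleaning_fee | minimum_nights | number_of_reviews | review_scores_rating | instant_bookable | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34.100604 | -118.341787 | 0 | 2 | 1.0 | 1 | 1 | 40 | 1 | 261 | 96 | 1 | 100 |
| 1 | 34.100607 | -118.350583 | 1 | 8 | 2.0 | 2 | 2 | 100 | 2 | 10 | 98 | 0 | 300 |
| 2 | 34.10061 | -118.347617 | 1 | 2 | 1.0 | 1 | 1 | 80 | 2 | 1 | 100 | 1 | 125 |
| 3 | 34.100611 | -118.34218 | 1 | 3 | 1.0 | 0 | 2 | 55 | 2 | 54 | 97 | 1 | 169 |
| 4 | 34.100618 | -118.342791 | 1 | 4 | 1.0 | 1 | 1 | 70 | 2 | 233 | 92 | 1 | 119 |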


In [5]:
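# Splitting into X (the 12 feature columns) and Y (price, the last column)
X = df.iloc[:,:-1].values
Y = df.iloc[:,-1].values
n = len(df.index)   # number of training listings
m = X.shape[1]      # number of features (12)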

In [6]:
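m1 = Model()

# Regression coefficients, one per feature, free in sign (no intercept term is included)
beta = m1.addVars(m, lb = -GRB.INFINITY)
# z[i] bounds the absolute residual |Y[i] - X[i]·beta| of listing i (default lb = 0)
z = m1.addVars(n)

# Together, these two constraint families linearize the absolute value
for i in range(n):
    m1.addConstr(z[i] >= Y[i] - sum(beta[j]*X[i][j] for j in range(m)))

for i in range(n):
    m1.addConstr(z[i] >= sum(beta[j]*X[i][j] for j in range(m)) - Y[i])

# Minimize the mean absolute error on the training set
m1.setObjective((sum(z[i] for i in range(n))/n), GRB.MINIMIZE)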

Set parameter Username
Academic license - for non-commercial use only - expires 2024-01-05
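
Cell [6] fits a least-absolute-deviations (LAD) regression written as a linear program. Writing $x_i \in \mathbb{R}^{12}$ for the feature row of listing $i$ and $y_i$ for its price, the model built there is:

$$
\begin{aligned}
\min_{\beta,\,z}\quad & \frac{1}{n}\sum_{i=1}^{n} z_i \\
\text{s.t.}\quad & z_i \ge y_i - x_i^\top \beta, && i = 1,\dots,n,\\
& z_i \ge x_i^\top \beta - y_i, && i = 1,\dots,n,\\
& z_i \ge 0,\quad \beta \in \mathbb{R}^{12}.
\end{aligned}
$$

At the optimum each $z_i$ equals $|y_i - x_i^\top \beta|$, so the objective is the training mean absolute error. The sizes in the Gurobi log are consistent with $n = 1700$: $2n = 3400$ rows, $n + 12 = 1712$ columns, and an objective coefficient of $1/n \approx 6 \times 10^{-4}$.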


In [7]:
m1.update()
m1.optimize()

Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (mac64[rosetta2])

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 3400 rows, 1712 columns and 41372 nonzeros
Model fingerprint: 0xc48519e8
Coefficient statistics:
  Matrix range     [5e-01, 5e+02]
  Objective range  [6e-04, 6e-04]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+01, 2e+03]
Presolve time: 0.02s
Presolved: 3400 rows, 1712 columns, 41372 nonzeros

Concurrent LP optimizer: primal simplex, dual simplex, and barrier
Showing barrier log only...

Ordering time: 0.00s

Barrier statistics:
 Dense cols : 12
 Free vars  : 12
 AA' NZ     : 2.995e+04
 Factor NZ  : 3.260e+04 (roughly 2 MB of memory)
 Factor Ops : 4.141e+05 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   2.33436347e+03  0.00000000e+00  0.00e+00 0.00e+00  2.

In [8]:
for i in range(m):
    print(df1.columns[i],':',beta[i].x)

latitude : 292.9273144176643
longitude : 84.7352951197452
Entire home : 33.22412746150925
accommodates : 10.587079438384528
bathrooms : 28.741561239586503
bedrooms : 20.217437710193302
beds : -2.8104760271025704
cleaning_fee : 0.4182472935851169
minimum_nights : -1.8166763809057307
number_of_reviews : -0.029190391189600314
review_scores_rating : 0.27329921119059414
instant_bookable : 3.953641200791045


In [9]:
X1 = df1.iloc[:,:-1].values
Y1 = df1.iloc[:,-1].values
n1 = len(df1.index)

In [18]:
# Mean absolute error of Model 1 on the test set
error1 = 0
for i in range(n1):
    s = 0
    for j in range(m):
        s = s + X1[i][j]*beta[j].X
    error1 = error1 + abs(s - Y1[i])
pe1 = error1/n1
print('Prediction error:', pe1)

Prediction error: 34.606120752768526
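
The double loop above computes the test-set mean absolute error $\frac{1}{n_1}\sum_i |x_i^\top \beta - y_i|$. A vectorized NumPy equivalent might look like the sketch below; the function name `mean_absolute_error` and the toy arrays are illustrative only. With the notebook's variables it could be called as `mean_absolute_error(X1, Y1, [beta[j].X for j in range(m)])`.

```python
import numpy as np

def mean_absolute_error(X, y, beta):
    """Mean of |X @ beta - y| over all rows (no intercept term)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    residuals = X @ beta - y  # prediction minus observed price
    return float(np.mean(np.abs(residuals)))

# Toy check: predictions are [3, 7], absolute errors are [1, 1]
X_toy = [[1, 2], [3, 4]]
y_toy = [2, 8]
print(mean_absolute_error(X_toy, y_toy, [1, 1]))  # 1.0
```

The matrix product replaces the inner loop over features, and `np.mean` replaces the running sum divided by `n1`.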


#### Model 2:

In [10]:
m2 = Model()

beta2 = m2.addVars(m, lb = -GRB.INFINITY)
z2 = m2.addVars(n)
# a[j] = 1 if feature j is allowed a nonzero coefficient
a = m2.addVars(m, vtype = GRB.BINARY)
# Big-M bound: must exceed the magnitude of any coefficient the model needs
M = 500

for i in range(n):
    m2.addConstr(z2[i] >= Y[i] - sum(beta2[j]*X[i][j] for j in range(m)))

for i in range(n):
    m2.addConstr(z2[i] >= sum(beta2[j]*X[i][j] for j in range(m)) - Y[i])

# If a[i] = 0 these force beta2[i] = 0; otherwise -M <= beta2[i] <= M
for i in range(m):
    m2.addConstr(beta2[i] <= M*a[i])
    m2.addConstr(beta2[i] >= -M*a[i])

# Select exactly 3 features
m2.addConstr(sum(a[i] for i in range(m)) == 3)

m2.setObjective((sum(z2[i] for i in range(n))/n), GRB.MINIMIZE)

In [11]:
m2.update()
m2.optimize()

Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (mac64[rosetta2])

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 3425 rows, 1724 columns and 41432 nonzeros
Model fingerprint: 0x607ae293
Variable types: 1712 continuous, 12 integer (12 binary)
Coefficient statistics:
  Matrix range     [5e-01, 5e+02]
  Objective range  [6e-04, 6e-04]
  Bounds range     [1e+00, 1e+00]
  RHS range        [3e+00, 2e+03]
Found heuristic solution: objective 144.9682353
Presolve removed 828 rows and 414 columns
Presolve time: 0.05s
Presolved: 2597 rows, 1310 columns, 31298 nonzeros
Variable types: 1298 continuous, 12 integer (12 binary)

Root relaxation: objective 3.573002e+01, 1939 iterations, 0.14 seconds (0.41 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   35.73002    0   10  144.96824   35.73002  75.4%  
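
Model 2 adds a binary $a_j$ per feature and the big-M constraints $-M a_j \le \beta_j \le M a_j$: when $a_j = 0$ the coefficient is forced to zero, and $\sum_j a_j = 3$ keeps exactly three features. This only behaves as intended if $M$ is larger than any coefficient magnitude the model would otherwise choose; a coefficient sitting at $\pm M$ hints that $M$ is too small. One rough sanity check, sketched below, compares $M = 500$ against the unrestricted Model 1 coefficients printed by cell [8]. This is a heuristic only (the sparse model's coefficients can differ), and the helper names `big_m_is_loose` and `largest_coefficient` are hypothetical.

```python
# Model 1 coefficients as printed by cell [8]
model1_coefs = {
    "latitude": 292.9273144176643,
    "longitude": 84.7352951197452,
    "Entire home": 33.22412746150925,
    "accommodates": 10.587079438384528,
    "bathrooms": 28.741561239586503,
    "bedrooms": 20.217437710193302,
    "beds": -2.8104760271025704,
    "cleaning_fee": 0.4182472935851169,
    "minimum_nights": -1.8166763809057307,
    "number_of_reviews": -0.029190391189600314,
    "review_scores_rating": 0.27329921119059414,
    "instant_bookable": 3.953641200791045,
}

def big_m_is_loose(coefs, M, tol=1e-6):
    """Hypothetical helper: True if every |coef| is strictly below M (with tolerance)."""
    return all(abs(c) < M - tol for c in coefs)

def largest_coefficient(coefs):
    """Return (name, value) of the coefficient with the largest magnitude."""
    name = max(coefs, key=lambda k: abs(coefs[k]))
    return name, coefs[name]

print(big_m_is_loose(model1_coefs.values(), 500))  # True
print(big_m_is_loose(model1_coefs.values(), 200))  # False: latitude is about 293
print(largest_coefficient(model1_coefs))           # ('latitude', 292.9273144176643)
```

After solving, the same idea applies directly to the fitted model, for example `max(abs(beta2[j].X) for j in range(m)) < M`.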

##### Question 1:

In [23]:
for i in range(m):
    # Compare with a tolerance: solver "zeros" may be tiny nonzero values
    if abs(beta2[i].X) > 1e-6:
        print(df1.columns[i], ':', beta2[i].X)

Entire home : 52.0
accommodates : 14.0
bedrooms : 32.0
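
With only three nonzero coefficients, Model 2's prediction reduces to price ≈ 52·(Entire home) + 14·accommodates + 32·bedrooms, with no intercept. As a worked example, the first listing in `df1.head()` (Entire home 0, accommodates 2, bedrooms 1) gets 0 + 28 + 32 = 60 against an actual price of 100, and the second (1, 8, 2) gets 52 + 112 + 64 = 228 against 300. The small sketch below reproduces that arithmetic; the dictionary layout and the name `predict_price` are just illustrative choices.

```python
# Nonzero Model 2 coefficients, as printed by cell [23]
model2_coefs = {"Entire home": 52.0, "accommodates": 14.0, "bedrooms": 32.0}

def predict_price(listing, coefs):
    """Linear prediction using only the selected features (no intercept)."""
    return sum(coef * listing[name] for name, coef in coefs.items())

# First two rows of df1.head(), restricted to the selected features
test_rows = [
    ({"Entire home": 0, "accommodates": 2, "bedrooms": 1}, 100),
    ({"Entire home": 1, "accommodates": 8, "bedrooms": 2}, 300),
]

for listing, actual in test_rows:
    pred = predict_price(listing, model2_coefs)
    print(pred, actual, abs(pred - actual))
# 60.0 100 40.0
# 228.0 300 72.0
```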


##### Question 2:

In [24]:
# Mean absolute error of Model 2 on the test set
error2 = 0
for i in range(n1):
    s = 0
    for j in range(m):
        s = s + X1[i][j]*beta2[j].X
    error2 = error2 + abs(s - Y1[i])
pe2 = error2/n1
print('Prediction error:', pe2)

Prediction error: 37.73676680972818


#### Model 3:

In [12]:
m3 = Model()

beta3 = m3.addVars(m, lb = -GRB.INFINITY)
z3 = m3.addVars(n)
a3 = m3.addVars(m, vtype = GRB.BINARY)

for i in range(n):
    m3.addConstr(z3[i] >= Y[i] - sum(beta3[j]*X[i][j] for j in range(m)))

for i in range(n):
    m3.addConstr(z3[i] >= sum(beta3[j]*X[i][j] for j in range(m)) - Y[i])

# Same big-M linking as Model 2 (M = 500 from cell [10])
for i in range(m):
    m3.addConstr(beta3[i] <= M*a3[i])
    m3.addConstr(beta3[i] >= -M*a3[i])

m3.addConstr(sum(a3[i] for i in range(m)) == 3)
# Force feature 6 ('beds') to be one of the three selected features
m3.addConstr(a3[6] == 1)

m3.setObjective((sum(z3[i] for i in range(n))/n), GRB.MINIMIZE)
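
The constraint `a3[6] == 1` forces feature index 6 into the model. Counting from `latitude` at index 0, index 6 is `beds`, which is consistent with `beds` appearing in the Model 3 output. Looking the index up by name, as sketched below, avoids a silent mismatch if the column order ever changes; the helper `feature_index` is hypothetical. In the notebook it could be used as `m3.addConstr(a3[feature_index(df.columns[:-1], 'beds')] == 1)`.

```python
# Feature columns in the order used for X (price, the target, excluded)
feature_columns = [
    "latitude", "longitude", "Entire home", "accommodates", "bathrooms",
    "bedrooms", "beds", "cleaning_fee", "minimum_nights",
    "number_of_reviews", "review_scores_rating", "instant_bookable",
]

def feature_index(columns, name):
    """Position of a named feature; raises ValueError if it is missing."""
    return list(columns).index(name)

print(feature_index(feature_columns, "beds"))  # 6
```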

In [13]:
m3.update()
m3.optimize()

Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (mac64[rosetta2])

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 3426 rows, 1724 columns and 41433 nonzeros
Model fingerprint: 0xc4d00964
Variable types: 1712 continuous, 12 integer (12 binary)
Coefficient statistics:
  Matrix range     [5e-01, 5e+02]
  Objective range  [6e-04, 6e-04]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 2e+03]
Found heuristic solution: objective 144.9682353
Presolve removed 831 rows and 415 columns
Presolve time: 0.04s
Presolved: 2595 rows, 1309 columns, 31293 nonzeros
Variable types: 1298 continuous, 11 integer (11 binary)

Root relaxation: objective 3.573002e+01, 1542 iterations, 0.13 seconds (0.37 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   35.73002    0   10  144.96824   35.73002  75.4%  

##### Question 1:

In [26]:
for i in range(m):
    # Compare with a tolerance: solver "zeros" may be tiny nonzero values
    if abs(beta3[i].X) > 1e-6:
        print(df1.columns[i], ':', beta3[i].X)

Entire home : 67.875
bedrooms : 47.375
beds : 12.125


##### Question 2:

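The variable `accommodates` was dropped in Model 3. Forcing `beds` into the model leaves room for only two other features, and the solver kept `Entire home` and `bedrooms`. Once the number of beds and bedrooms is known, it largely determines how many guests a listing can accommodate, so `accommodates` adds little extra information; it is also correlated with `beds`.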
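
The correlation claim can be checked directly: on the full training data, `df[['accommodates', 'beds', 'bedrooms']].corr()` gives the pairwise Pearson correlations. The sketch below runs the same calculation with NumPy on the five training rows shown by `df.head()` so that it is self-contained. Five rows are far too few to say anything reliable about the data set, so treat the resulting numbers only as a demonstration of the check.

```python
import numpy as np

# accommodates, bedrooms and beds from the five rows shown by df.head()
accommodates = np.array([13, 8, 4, 2, 2])
bedrooms = np.array([3, 2, 0, 1, 1])
beds = np.array([2, 4, 1, 1, 1])

# np.corrcoef treats each array as one variable and returns the
# Pearson correlation matrix (rows/columns: accommodates, bedrooms, beds)
corr = np.corrcoef([accommodates, bedrooms, beds])
print(np.round(corr, 2))

r_acc_beds = corr[0, 2]      # accommodates vs beds
r_acc_bedrooms = corr[0, 1]  # accommodates vs bedrooms
```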

##### Question 3:

In [25]:
# Mean absolute error of Model 3 on the test set
error3 = 0
for i in range(n1):
    s = 0
    for j in range(m):
        s = s + X1[i][j]*beta3[j].X
    error3 = error3 + abs(s - Y1[i])
pe3 = error3/n1
print('Prediction error:', pe3)

Prediction error: 38.59960658082976
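
Putting the three test-set errors side by side: restricting the model to its best three features raises the mean absolute error by about 3.13 (roughly 9.0%) relative to the full 12-feature Model 1, and additionally forcing `beds` in raises it by about 3.99 (roughly 11.5%). The short sketch below does that arithmetic from the printed values. Because Model 2's feasible set is contained in Model 1's, and Model 3's in Model 2's, the training error can only go up from one model to the next; on the test set that ordering is not guaranteed, although it holds here.

```python
# Test-set mean absolute errors printed by cells [18], [24] and [25]
errors = {
    "Model 1 (12 features)": 34.606120752768526,
    "Model 2 (best 3 features)": 37.73676680972818,
    "Model 3 (3 features, beds forced)": 38.59960658082976,
}

baseline = errors["Model 1 (12 features)"]
for name, mae in errors.items():
    increase = mae - baseline
    print(f"{name}: MAE {mae:.2f}, +{increase:.2f} ({increase / baseline:.1%}) vs Model 1")
```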
