This notebook contains the code and rationale for computing a **score** for each listing from its feature vector.


**UNDER CONSTRUCTION!!**

The final feature vector **used only for calculation** contains the following fields:

**HomeAway**
- `$listingId`
- `$priceRent`
- `$reviewCount`
- `$reviewAverage`

**Crime Data**
- `$crimeScore`

**Yelp**
- `$uc1` - User-selected Category 1
- `$uc2` - User-selected Category 2
- `$uc3` - User-selected Category 3

For testing, a sample feature vector with these fields is created below.

In [1]:
# Imports
import pandas as pd


In [58]:
## Create test case
# Create test case feature vector
features = pd.DataFrame(index = [0], 
                        data = {
                            "listingId": 7512702,
                            "priceRent": 1150,
                            "reviewCount": 19,
                            "reviewAverage": 4.7894735,
                            "crimeScore": -1,
                            "uc1": 5,
                            "uc2": 9,
                            "uc3": 4
                        })
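# Add two more test rows derived from the first row (+10 and -3 on every field).
# DataFrame.append was removed in pandas 2.0, so the rows are added with pd.concat.
base = features.iloc[0]
features = pd.concat([features, (base + 10).to_frame().T, (base - 3).to_frame().T])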

# Create test case weight vector
weights = pd.DataFrame(index = [0],
                      data = {
                          'priceRent': 5,
                          'ratingScore': 5,
                          'crimeScore': 5,
                          'uc1': 3,
                          'uc2': 5,
                          'uc3': 2
                      })

In [59]:
features

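|   | listingId | priceRent | reviewCount | reviewAverage | crimeScore | uc1 | uc2 | uc3 |
|---|-----------|-----------|-------------|---------------|------------|-----|-----|-----|
| 0 | 7512702.0 | 1150.0 | 19.0 | 4.789473 | -1.0 | 5.0 | 9.0 | 4.0 |
| 0 | 7512712.0 | 1160.0 | 29.0 | 14.789473 | 9.0 | 15.0 | 19.0 | 14.0 |
| 0 | 7512699.0 | 1147.0 | 16.0 | 1.789473 | -4.0 | 2.0 | 6.0 | 1.0 |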


In [51]:
weights

|   | priceRent | ratingScore | crimeScore | uc1 | uc2 | uc3 |
|---|-----------|-------------|------------|-----|-----|-----|
| 0 | 5 | 5 | 5 | 3 | 5 | 2 |


An interesting Reddit thread on normalizing product review ratings: https://www.reddit.com/r/statistics/comments/4svy2e/how_would_i_normalize_product_review_ratings/

In [60]:
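# Combine review count and review average into one score by smoothing with a Beta prior (posterior mean).
# reviewAverage / 5 is treated as the fraction of positive outcomes among reviewCount reviews;
# the Beta(3, 2) prior pulls listings with few reviews toward 3 / (3 + 2) = 0.6.
prior_alpha = 3
prior_beta = 2

features['ratingScore'] = (prior_alpha + features['reviewCount'] * features['reviewAverage'] / 5) / (prior_alpha + prior_beta + features['reviewCount'])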
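
The smoothing step above can be sketched as a small reusable function. This is a minimal sketch, assuming ratings lie on a 0 to 5 scale; the helper name `smoothed_rating`, its `max_rating` parameter, and every row except the third (taken from the notebook's first test row) are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def smoothed_rating(review_count, review_average, prior_alpha=3, prior_beta=2, max_rating=5):
    """Hypothetical helper: posterior-mean style score, assuming ratings in 0..max_rating."""
    # Treat review_average / max_rating as the fraction of "positive" outcomes.
    positive = review_count * (review_average / max_rating)
    return (prior_alpha + positive) / (prior_alpha + prior_beta + review_count)

# Illustrative values only; the third row mirrors the notebook's first test row.
listings = pd.DataFrame({
    "reviewCount": [0, 1, 19, 500],
    "reviewAverage": [0.0, 5.0, 4.7894735, 4.7894735],
})
listings["ratingScore"] = smoothed_rating(listings["reviewCount"], listings["reviewAverage"])
print(listings)
```

A listing with no reviews falls back to the prior mean of 0.6, and the score moves toward `reviewAverage / 5` as the review count grows. The test row whose `reviewAverage` was shifted to 14.79 scores above 1 only because that synthetic value is off the 0 to 5 scale.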

In [61]:
features

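|   | listingId | priceRent | reviewCount | reviewAverage | crimeScore | uc1 | uc2 | uc3 | ratingScore |
|---|-----------|-----------|-------------|---------------|------------|-----|-----|-----|-------------|
| 0 | 7512702.0 | 1150.0 | 19.0 | 4.789473 | -1.0 | 5.0 | 9.0 | 4.0 | 0.883333 |
| 0 | 7512712.0 | 1160.0 | 29.0 | 14.789473 | 9.0 | 15.0 | 19.0 | 14.0 | 2.611145 |
| 0 | 7512699.0 | 1147.0 | 16.0 | 1.789473 | -4.0 | 2.0 | 6.0 | 1.0 | 0.415539 |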


In [65]:
features = features.drop(['reviewCount', 'reviewAverage'], axis=1)
features

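|   | listingId | priceRent | crimeScore | uc1 | uc2 | uc3 | ratingScore |
|---|-----------|-----------|------------|-----|-----|-----|-------------|
| 0 | 7512702.0 | 1150.0 | -1.0 | 5.0 | 9.0 | 4.0 | 0.883333 |
| 0 | 7512712.0 | 1160.0 | 9.0 | 15.0 | 19.0 | 14.0 | 2.611145 |
| 0 | 7512699.0 | 1147.0 | -4.0 | 2.0 | 6.0 | 1.0 | 0.415539 |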


In [71]:
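# Scoring Algorithm

features['finalScore'] = 0.0 # initialize the running score

for col in weights.columns:
    # weights is a one-row DataFrame; take the scalar so the product does not depend on index alignment
    features['finalScore'] = features['finalScore'] + (weights[col].iloc[0] * features[col]) # weighted sum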
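
The weighted sum in the loop above can also be written as a single dot product. The following is a minimal sketch, not the notebook's method; it rebuilds the first test row and the test weights so that it runs on its own, and the variable name `w` is an assumption introduced for illustration.

```python
import pandas as pd

# First test row from the notebook, after ratingScore has been computed.
features = pd.DataFrame({
    "listingId": [7512702.0],
    "priceRent": [1150.0],
    "crimeScore": [-1.0],
    "uc1": [5.0], "uc2": [9.0], "uc3": [4.0],
    "ratingScore": [21.2 / 24],  # (3 + 18.2) / (5 + 19), about 0.883333
})
weights = pd.DataFrame(index=[0], data={
    "priceRent": 5, "ratingScore": 5, "crimeScore": 5, "uc1": 3, "uc2": 5, "uc3": 2,
})

w = weights.iloc[0]  # Series of weights indexed by feature name
# Select the weighted columns in the same order, then take the row-wise sum of weight * feature.
features["finalScore"] = features[w.index].dot(w)
print(features[["listingId", "finalScore"]])
```

For this row the result matches the loop's 5817.416667. Note that `priceRent` alone contributes 5750 of that total, because the features are on very different scales.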

In [75]:
# Sort listings by `finalScore`
features = features.sort_values(by=['finalScore'], ascending=False)

In [78]:
# Return top 10 listings
output = features.head(10)
output

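|   | listingId | priceRent | crimeScore | uc1 | uc2 | uc3 | ratingScore | finalScore |
|---|-----------|-----------|------------|-----|-----|-----|-------------|------------|
| 0 | 7512712.0 | 1160.0 | 9.0 | 15.0 | 19.0 | 14.0 | 2.611145 | 6026.055727 |
| 0 | 7512702.0 | 1150.0 | -1.0 | 5.0 | 9.0 | 4.0 | 0.883333 | 5817.416667 |
| 0 | 7512699.0 | 1147.0 | -4.0 | 2.0 | 6.0 | 1.0 | 0.415539 | 5755.077694 |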
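
Sorting and then taking the head can be combined into one call with `DataFrame.nlargest`. This is a small sketch using the `finalScore` values from the three test rows; the frame name `scored` and the cutoff `top_n = 2` are assumptions chosen for illustration (the notebook returns the top 10).

```python
import pandas as pd

# finalScore values from the notebook's three test rows.
scored = pd.DataFrame({
    "listingId": [7512702, 7512712, 7512699],
    "finalScore": [5817.416667, 6026.055727, 5755.077694],
})

top_n = 2  # hypothetical cutoff for illustration; the notebook uses 10
top = scored.nlargest(top_n, "finalScore")  # rows with the highest finalScore, in descending order
print(top)
```

With no tied scores this returns the same rows, in the same order, as `sort_values(by=['finalScore'], ascending=False).head(top_n)`.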


In [None]:
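def score_noSA(feature_vector, weight_vector): # returns the features with a weighted-sum `finalScore` per listing
    feature_vector = feature_vector.copy() # avoid mutating the caller's DataFrame
    feature_vector['finalScore'] = 0.0 # initialize before accumulating (the column does not exist yet)
    for col in weight_vector.columns:
        # weight_vector is a one-row DataFrame; take the scalar so index alignment cannot produce NaN
        feature_vector['finalScore'] += weight_vector[col].iloc[0] * feature_vector[col] # weighted sum
    return feature_vector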