## Problem definition

As hosts, if we charge above the market price for a living space we'd like to rent out, renters will choose more affordable alternatives that are similar to ours. If we set our nightly price too low, we'll miss out on potential revenue.

One strategy we could use is to:

* find a few listings that are similar to ours,
* average the listed price for the ones most similar to ours,
* set our listing price to this calculated average price.
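
The three steps above can be sketched in a few lines of plain Python. The listings below are made-up values for illustration only (they are not taken from the dataset), `suggest_price` is a hypothetical helper name, and measuring similarity by the gap in guest capacity is an assumption of this sketch.

```python
# Hypothetical (accommodates, nightly_price) pairs; not real data.
toy_listings = [(2, 90.0), (4, 150.0), (3, 110.0), (6, 300.0), (3, 130.0)]

def suggest_price(our_accommodates, listings, k=3):
    # 1. Rank listings by similarity: a smaller capacity gap means more similar.
    ranked = sorted(listings, key=lambda item: abs(item[0] - our_accommodates))
    # 2. Average the price of the k most similar listings.
    nearest_prices = [price for _, price in ranked[:k]]
    # 3. Use that average as our own listing price.
    return sum(nearest_prices) / len(nearest_prices)

suggested = suggest_price(3, toy_listings)
print(suggested)
```

The choice of `k=3` here is arbitrary; how many similar listings to average is a parameter of the strategy.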

The process of discovering patterns in existing data to make a prediction is called **machine learning**. In our case, we want to use data on local listings to predict the optimal price for us to set. In this mission, we'll explore a specific machine learning technique called k-nearest neighbors, which mirrors the strategy we just described. Before we dive further into machine learning and **k-nearest neighbors**, let's get familiar with the dataset we'll be working with.

## Introduction to the data

In [2]:
import pandas as pd

dc_listings = pd.read_csv('dc_airbnb.csv')
print(dc_listings)

     host_response_rate host_acceptance_rate  host_listings_count  \
0                   92%                  91%                   26   
1                   90%                 100%                    1   
2                   90%                 100%                    2   
3                  100%                  NaN                    1   
4                   92%                  67%                    1   
...                 ...                  ...                  ...   
3718               100%                  60%                    1   
3719               100%                  50%                    1   
3720               100%                 100%                    2   
3721                88%                 100%                    1   
3722                70%                 100%                    1   

      accommodates        room_type  bedrooms  bathrooms  beds    price  \
0                4  Entire home/apt       1.0        1.0   2.0  $160.00   
1                6  E

## Euclidean distance

In [7]:
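import numpy as np

# Compare the first listing with a living space that accommodates 3 people.
our_acc_value = 3
first_living_space_value = dc_listings.loc[0, 'accommodates']

# With a single feature, the Euclidean distance is the absolute difference.
first_distance = np.linalg.norm(first_living_space_value - our_acc_value)

print(first_distance)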

1.0
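
For reference, the Euclidean distance between two observations is the square root of the sum of squared differences across their features. With a single feature such as `accommodates`, it reduces to the absolute difference, which is why the value above is 1.0. A minimal sketch follows; `euclidean_distance` is a hypothetical helper, and the two-feature values are invented purely to show the general case.

```python
import numpy as np

def euclidean_distance(q, p):
    # Square root of the sum of squared feature differences.
    q = np.atleast_1d(np.asarray(q, dtype=float))
    p = np.atleast_1d(np.asarray(p, dtype=float))
    return float(np.sqrt(np.sum((q - p) ** 2)))

# One feature: identical to the absolute difference.
one_feature = euclidean_distance(4, 3)

# Two hypothetical features, e.g. (accommodates, bedrooms).
two_features = euclidean_distance([4, 1], [1, 5])

print(one_feature, two_features)
```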


## Calculate distance for all observations

In [9]:
# Absolute difference between each listing's capacity and ours (3 people).
dc_listings['distance'] = (dc_listings['accommodates'] - 3).abs()


In [10]:
dc_listings['distance'].value_counts()

1     2294
2      503
0      461
3      279
5       73
4       35
7       22
6       17
9       12
13       8
8        7
12       6
11       4
10       2
Name: distance, dtype: int64

## Randomizing and sorting

In [12]:
np.random.permutation(dc_listings.index)

array([ 182, 2501, 2292, ..., 2740, 2590,  248], dtype=int64)

In [14]:
dc_listings = dc_listings.loc[np.random.permutation(dc_listings.index), :]

In [17]:
dc_listings = dc_listings.sort_values(by='distance')

In [19]:
dc_listings.price.head(10)

3219     $99.00
2957    $120.00
1015    $139.00
2868    $125.00
2744    $100.00
1177    $200.00
930     $100.00
2893    $145.00
2348    $138.00
2327    $115.00
Name: price, dtype: object
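
Many listings share the same distance (2,294 rows have a distance of 1), so without shuffling, which tied rows land at the top would tend to depend on the order of the original file. Shuffling first makes the choice among tied rows arbitrary. The sketch below illustrates this on a made-up frame; the toy values, the seed, and the `kind='mergesort'` argument (a stable sort, chosen so the shuffled order of ties is kept) are assumptions of this illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up listings; several rows tie on distance.
toy = pd.DataFrame({
    'accommodates': [3, 4, 2, 3, 8, 4],
    'price': [100.0, 150.0, 90.0, 120.0, 400.0, 160.0],
})
toy['distance'] = (toy['accommodates'] - 3).abs()

# Shuffle the rows with a seeded generator so the run is repeatable.
rng = np.random.default_rng(0)
shuffled = toy.iloc[rng.permutation(len(toy))]

# A stable sort keeps the shuffled order among rows with equal distance.
ranked = shuffled.sort_values(by='distance', kind='mergesort')
print(ranked)
```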

## Average price

In [22]:
# Strip thousands separators and dollar signs, then convert to float.
dc_listings['price'] = dc_listings['price'].str.replace(',', '', regex=False).str.replace('$', '', regex=False).astype(float)

In [36]:
mean_price = dc_listings.price[:5].mean()

In [37]:
print(mean_price)

116.6
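
The `price` column is read in as text (note `dtype: object` in the earlier output), so the dollar signs and thousands separators have to be removed before an average can be computed. The sketch below shows the conversion on hypothetical price strings. It passes `regex=False` so that `$` is treated as a literal character rather than as a regular-expression end-of-string anchor.

```python
import pandas as pd

# Hypothetical price strings in the same format as the dataset.
raw_prices = pd.Series(['$100.00', '$1,250.00', '$150.00'])

clean_prices = (
    raw_prices
    .str.replace(',', '', regex=False)   # drop thousands separators
    .str.replace('$', '', regex=False)   # drop the currency symbol
    .astype(float)
)
mean_clean = clean_prices.mean()
print(clean_prices.tolist(), mean_clean)
```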


## Function to make predictions

In [38]:
# Reapply the changes we made to the `dc_listings` DataFrame.
dc_listings = pd.read_csv('dc_airbnb.csv')
# regex=False treats ',' and '$' as literal characters ('$' is a regex anchor).
stripped_commas = dc_listings['price'].str.replace(',', '', regex=False)
stripped_dollars = stripped_commas.str.replace('$', '', regex=False)
dc_listings['price'] = stripped_dollars.astype('float')
# Shuffle the rows so that ties in distance are broken arbitrarily.
dc_listings = dc_listings.loc[np.random.permutation(len(dc_listings))]

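def predict_price(new_listing):
    # Work on a copy so the shared DataFrame is not modified.
    temp_df = dc_listings.copy()
    # Distance between each listing's capacity and the new listing's.
    temp_df['distance'] = temp_df['accommodates'].apply(lambda x: abs(x - new_listing))
    # Average the price of the 5 nearest listings (head() returns 5 rows).
    predicted_price = temp_df.sort_values(by='distance').head()['price'].mean()
    return predicted_price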

acc_one = predict_price(1)
acc_two = predict_price(2)
acc_four = predict_price(4)
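
The function above always averages the five nearest listings, because `head()` returns five rows by default. As a hedged sketch, a variant could take the number of neighbors `k` and the DataFrame as arguments, which also makes it easy to try on a small frame. `predict_price_k` and the toy values below are hypothetical and not part of the notebook.

```python
import pandas as pd

def predict_price_k(listings, new_accommodates, k=5):
    # Copy so the caller's DataFrame is left untouched.
    temp_df = listings.copy()
    temp_df['distance'] = (temp_df['accommodates'] - new_accommodates).abs()
    # Mean price of the k nearest rows (stable sort keeps tie order).
    nearest = temp_df.sort_values(by='distance', kind='mergesort').head(k)
    return nearest['price'].mean()

# Made-up listings for illustration only.
toy_listings = pd.DataFrame({
    'accommodates': [1, 2, 2, 4, 6],
    'price': [60.0, 80.0, 100.0, 150.0, 300.0],
})
pred_k2 = predict_price_k(toy_listings, 2, k=2)
pred_k3 = predict_price_k(toy_listings, 2, k=3)
print(pred_k2, pred_k3)
```

Passing the frame in explicitly avoids relying on the global `dc_listings`, at the cost of a slightly longer call.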