<center><img src="https://github.com/DACSS-Spatial/data_forSpatial/raw/main/logo.png" width="700"></center>

<a target="_blank" href="https://colab.research.google.com/github/DACSS-Spatial/GDF_OPS_applications/blob/main/airbnb.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# POST ON AIRBNB!


I have a place I wish to rent on AirBnb:


In [None]:
myEstate="80 Rugg Rd, Allston, MA 02134"

It is an **entire unit** with **two bedrooms**.

## The Problem



> How much may I charge for my unit?

## The rationale

> I will find a fair price based on what other owners are charging nearby.

## Getting ready

### Installations

We will need some help from two new libraries:

- **PySAL** will offer functions to interpolate, that is, to estimate an unknown value at a location based on known values at other locations in the greater area of reference.

- **H3** will divide the greater area of reference into smaller pieces by creating a grid of hexagons. [H3 is an Uber idea](https://www.uber.com/en-PE/blog/h3/).

In [None]:
# !pip install pysal h3

### The Data

We will work on Boston data.

Besides the Boston [boundaries map](https://www.mass.gov/info-details/massgis-data-municipalities), we will get the prices of other listings from [insideairbnb](https://insideairbnb.com/get-the-data/).

Let's get the data:




In [None]:
import pandas as pd
import geopandas as gpd


linkBostonBorder="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/BOSTON/GISDATA.TOWNSSURVEY_POLYM.zip"
boston=gpd.read_file(linkBostonBorder)

linkBostonAIrbnb="https://github.com/DACSS-Spatial/data_forSpatial/raw/refs/heads/main/BOSTON/listings.csv"
airbnb_all=pd.read_csv(linkBostonAIrbnb)

The **boston** GDF is already projected, and has just one row.


In [None]:
boston

On the other hand, **airbnb_all** is a DF with several columns:

In [None]:
airbnb_all.info()

Let's keep some relevant columns:

In [None]:
keep=['id','price','bedrooms','property_type','latitude','longitude']
airbnb=airbnb_all[keep].copy()
airbnb.head()

Recheck the data types:

In [None]:
airbnb.info()

We need to clean and format the **price** column:

In [None]:
airbnb.price.str.replace(r'\$|\,', '', regex=True).astype(float)
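
To see what that regular expression does to a single value, here is a minimal sketch in plain Python. The helper name `clean_price` and the sample strings are made up for illustration; the pattern is the same one used in the cell above.

```python
import re

def clean_price(raw):
    """Strip '$' and ',' from a price string and return it as a float."""
    return float(re.sub(r'\$|\,', '', raw))

# Hypothetical raw values, in the style of the 'price' column
examples = ["$1,250.00", "$85.00", "300"]
cleaned = [clean_price(p) for p in examples]
print(cleaned)
```

The pandas version applies the same substitution to every row at once and then converts the whole column with `astype(float)`.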

Let's make the change, and get rid of missing data:

In [None]:
#then

airbnb['price']=airbnb.price.str.replace(r'\$|\,', '', regex=True).astype(float)
# bye missing data
airbnb.dropna(inplace=True)

# check
airbnb.info()

Using bedrooms and property type, I will keep the ones similar to mine:

In [None]:
pd.crosstab(airbnb.property_type,airbnb.bedrooms)


Let's proceed:

In [None]:
conditionText="bedrooms==2 & property_type=='Entire rental unit'"
airbnb_source=airbnb.query(conditionText).copy()
airbnb_source

Time to turn the DF of airbnb into a GDF:

- No duplicated units:

In [None]:
# Create a list of columns that must be identical to be considered a duplicate
key_columns = ['latitude', 'longitude']

# Remove rows that are identical across the key columns
airbnb_source_unique = airbnb_source.drop_duplicates(subset=key_columns)

- Into GDF:

In [None]:
airbnb_source_gdf = gpd.GeoDataFrame(
    airbnb_source_unique,
    geometry=gpd.points_from_xy(airbnb_source_unique.longitude,
                                airbnb_source_unique.latitude),
    crs='EPSG:4326') #long / lat



airbnb_source_gdf=airbnb_source_gdf.to_crs(boston.crs)

Let's see both layers:

In [None]:
base=boston.plot(figsize=(10,10),alpha=0.1,edgecolor='black')
airbnb_source_gdf.plot(ax=base,marker="+",color='red')

## The need for Spatial Interpolation


We have a unit for rent, but there is no catalog that tells us how much one should charge. We could ask friends and family for suggestions, but we decide to estimate the rental charge based on what others charge. Then, with lots of prices around, we need to see how proximity plays a role in proposing a fair price for our estate. That is when interpolation comes to our help.

Spatial Interpolation assumes Waldo Tobler's [idea](https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography) that proximity influences spatially located phenomena. So, spatial interpolation is the mathematical process that formalizes this idea. It takes our scattered known prices and creates a continuous surface of predicted prices that covers our entire area of interest, thus allowing us to estimate the price for our specific unlisted location.

Spatial Interpolation has two flavors:
- Areal Interpolation
- Point Interpolation

Our case is **Point Interpolation**, an interpolation of values (prices) from a sparse set of points (the known AirBnb units similar to mine) to any other location in between (we will prepare this soon).

### Creating Target Grid

We have known data:

- Current AirBnb unit locations similar to mine.
- The rental prices of those AirBnb units.
- The address of my unit.

We do not have:
- The rental price of my unit.

We need a GRID of points, where each point ideally represents a potential rental location with its own rental price. Since not every point on the grid has a rental price, that value will be estimated from the locations with known rental prices.

How to get that grid?

- **option 1**. Let's get a satellite image of the area that covers Boston. The image should have a resolution high enough to identify areas at least at the house level. Several operations would be needed (defining pixel size, extent/clipping, and aligning coordinates), and then we would get the centroid of each square retrieved.

- **option 2**. Let's use UBER's H3 grid. This will split Boston into lots of hexagons with a proper resolution, and then we get the centroids of those hexagons.

The easiest choice, and likely the better one, would be **option 2**. These are the steps:

- Call the "tobler" library.
- Use the "tobler.util.h3fy" function on the "boston" GDF with a "resolution" of 10.
- You would get a warning. To avoid it:
  * In _tobler.util.h3fy_, use _boston_ reprojected to CRS 4326.
  * Reproject the result of _tobler.util.h3fy_ back to the original _boston_ CRS.




In [None]:
import tobler

# this gives a warning, no worries
tobler.util.h3fy(boston, resolution=10)

In [None]:
# this gives no warning, but NOT really needed:
boston_grid10=tobler.util.h3fy(boston.to_crs(4326), resolution=10).to_crs(boston.crs)

Notice we used a practical level of **resolution** for our case; you can use higher or lower [values](https://gist.github.com/colbyn/001064f00385d253b42693c3889f9beb) depending on the project.

Now, we get the points from the hexagons:

In [None]:
boston_target_locations = boston_grid10.centroid.get_coordinates()
boston_target_locations.head()

We do not estimate prices for the hexagons themselves; we estimate prices for their centroids, which are now stored in **boston_target_locations**.

### The Estimation approaches


Let's organize the input for the estimations:

- These are coordinates of the airbnb units:

In [None]:
airbnb_source_locations = airbnb_source_gdf.get_coordinates()
airbnb_source_locations

- These are the known rental prices

In [None]:
airbnb_source_gdf.price

- This is the target grid:

In [None]:
boston_target_locations

Notice that none of the data above are geo data, but the projected locations (x,y) can be used in other programs. Let's make several estimations:

- **approach 1**: I will charge the same as the closest airbnb similar to mine.

In [None]:
from scipy.interpolate import griddata

boston_grid10["nearest"] = griddata(points=airbnb_source_locations,
                                    values=airbnb_source_gdf.price,
                                    xi=boston_target_locations,
                                    method="nearest")

# here we have:

boston_grid10.plot('nearest', legend=True,figsize=(10,10))
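
As a rough illustration of what the "nearest" approach does for a single target point, here is a sketch in plain Python. The coordinates, prices, and the helper `nearest_price` are hypothetical; this is not how `griddata` is implemented internally, only the idea behind it.

```python
import math

# Hypothetical known listings: ((x, y) in meters, nightly price)
known = [((0.0, 0.0), 200.0),
         ((100.0, 0.0), 300.0),
         ((0.0, 500.0), 150.0)]

def nearest_price(target, known):
    """Return the price of the known point closest to the target."""
    closest = min(known, key=lambda item: math.dist(target, item[0]))
    return closest[1]

# The target is about 22 m from the second listing, so it copies that price
estimate_nearest = nearest_price((80.0, 10.0), known)
print(estimate_nearest)
```

Every grid centroid simply copies the price of its closest listing, which is why the resulting map looks like a patchwork of flat zones.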

- **approach 2**: I will charge the average of the closest similar AirBnb rental units. The number of closest units, the **K** value, may vary. I will use 10.

In [None]:
from sklearn.neighbors import KNeighborsRegressor

interpolation_uniform = KNeighborsRegressor(n_neighbors=10,
                                            weights="uniform").\
                                            fit(airbnb_source_locations,
                                                airbnb_source_gdf.price)

boston_grid10["knn10_uniform"] = interpolation_uniform.predict(boston_target_locations)

# here we have
boston_grid10.plot("knn10_uniform", legend=True,figsize=(10,10))

- **approach 3**: I will charge based on some close AirBnb rental units, but I will weight the average based on their distance to me: a closer unit will weigh more, a farther unit will weigh less. This is also called IDW (Inverse Distance Weighting).

In [None]:



interpolation_IDW = KNeighborsRegressor(n_neighbors=10,
                                            weights="distance").\
                                            fit(airbnb_source_locations,
                                                airbnb_source_gdf.price)

boston_grid10["IDW_10"] = interpolation_IDW.predict(boston_target_locations)

# here we have
boston_grid10.plot("IDW_10", legend=True,figsize=(10,10))
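
To make the difference between approaches 2 and 3 concrete, the sketch below computes both averages by hand for one target point. The data and the helper `knn_estimate` are hypothetical, and the code is a simplified picture of what `weights="uniform"` and `weights="distance"` mean (weights of 1 versus 1/distance), not scikit-learn's actual implementation.

```python
import math

# Hypothetical known listings: ((x, y) in meters, nightly price)
known = [((10.0, 0.0), 300.0),       # 10 m from the target
         ((0.0, 20.0), 200.0),       # 20 m from the target
         ((-20.0, 0.0), 100.0),      # 20 m from the target
         ((3000.0, 4000.0), 1000.0)] # 5000 m away, ignored when k=3

def knn_estimate(target, known, k, weights="uniform"):
    """Average the prices of the k nearest points, equally or by 1/distance."""
    ranked = sorted(known, key=lambda item: math.dist(target, item[0]))[:k]
    dists = [math.dist(target, xy) for xy, _ in ranked]
    prices = [price for _, price in ranked]
    if weights == "uniform":
        return sum(prices) / len(prices)
    # Simplification: a target on top of a listing just takes that price
    for d, price in zip(dists, prices):
        if d == 0:
            return price
    w = [1 / d for d in dists]
    return sum(wi * pi for wi, pi in zip(w, prices)) / sum(w)

uniform_price = knn_estimate((0.0, 0.0), known, k=3, weights="uniform")
idw_price = knn_estimate((0.0, 0.0), known, k=3, weights="distance")
print(uniform_price, idw_price)
```

With these made-up numbers the uniform average is 200, while the distance-weighted one is pulled toward 225 because the closest listing is the most expensive of the three.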

- **approach 4**: I will charge based on all AirBnb rental units located within a fixed radius of 1000 meters from my unit. I will then weight the average based on their distance to me, so a closer unit will weigh more, and a farther unit will weigh less. This is essentially Inverse Distance Weighting (IDW) where the size of the local neighborhood is defined by a distance (1000m) instead of a fixed number of neighbors.

In [None]:
from sklearn.neighbors import RadiusNeighborsRegressor

interpolation_radius = RadiusNeighborsRegressor(
    radius=1000, weights="distance"
)
interpolation_radius.fit(
    airbnb_source_locations, airbnb_source_gdf.price
)

boston_grid10["radius_1000"] = interpolation_radius.predict(boston_target_locations)

boston_grid10.plot("radius_1000", legend=True, missing_kwds={'color': 'lightgrey'},figsize=(10,10))
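
The radius idea can also be sketched by hand. The example below uses hypothetical data and a hypothetical helper `radius_idw`; it returns NaN when no listing falls inside the radius, which mirrors the missing values that the plot above paints in light grey.

```python
import math

# Hypothetical known listings: ((x, y) in meters, nightly price)
known = [((10.0, 0.0), 300.0),
         ((0.0, 20.0), 200.0),
         ((-20.0, 0.0), 100.0),
         ((3000.0, 4000.0), 1000.0)]

def radius_idw(target, known, radius):
    """Inverse-distance weighted average of all points within the radius."""
    nearby = [(math.dist(target, xy), price) for xy, price in known
              if math.dist(target, xy) <= radius]
    if not nearby:
        return math.nan  # nobody close enough: no estimate
    # Simplification: a target on top of a listing just takes that price
    for d, price in nearby:
        if d == 0:
            return price
    total_weight = sum(1 / d for d, _ in nearby)
    return sum(price / d for d, price in nearby) / total_weight

inside_price = radius_idw((0.0, 0.0), known, radius=1000)
remote_price = radius_idw((50000.0, 50000.0), known, radius=1000)
print(inside_price, remote_price)
```

Unlike the K-neighbors version, the number of listings used changes from place to place: dense areas average many prices, sparse areas few or none.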

## So, how much should I charge?

Some previous steps:


1. Find the coordinates of my unit: This is a good time to *geocode*.



In [None]:
from geopy.geocoders import Nominatim
from shapely.geometry import Point

geolocator = Nominatim(user_agent="theGeocoder")

myEstate_Address = geolocator.geocode(myEstate)

# see
myEstate_Address

The **geolocator.geocode** returned a **Location** structure. You can access each piece of information from this structure like this:

In [None]:
myEstate_Address.address, myEstate_Address.longitude,myEstate_Address.latitude

You can create a GDF with that:

In [None]:
myEstats_gdf4326 = gpd.GeoDataFrame(
    {'address': [myEstate_Address.address]},
    geometry=[Point(myEstate_Address.longitude, myEstate_Address.latitude)],
    crs="EPSG:4326" # because of lon/lat
)

# reprojecting
myEstats_gdf = myEstats_gdf4326.to_crs(boston.crs)

# here it is
myEstats_gdf


2. Find in which hexagon my unit is located, and bring those prices:

In [None]:
myEstats_gdf.sjoin(
    boston_grid10,
    how="left",
    predicate="within"
)
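
The spatial join above asks, for each hexagon, whether my point lies within it. A minimal sketch of such a point-in-polygon test, using the classic ray-casting idea in plain Python, is shown below. The hexagon coordinates and the function name are hypothetical, and this is not necessarily the algorithm GeoPandas uses internally.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside

# Hypothetical hexagon of "radius" 1 centered at the origin
hexagon = [(1.0, 0.0), (0.5, 0.866), (-0.5, 0.866),
           (-1.0, 0.0), (-0.5, -0.866), (0.5, -0.866)]

is_inside = point_in_polygon((0.2, 0.1), hexagon)
is_outside = point_in_polygon((2.0, 0.1), hexagon)
print(is_inside, is_outside)
```

Once the containing hexagon is found, the join attaches that hexagon's columns (the four estimated prices) to the row of my unit.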

You decide:

| Price Interpolation Method | Price Recommendation | Rationale |
|---------------------------|----------------------|-----------|
| **nearest** | The most conservative starting price. | It's based only on the price of the single closest competitor. |
| **knn10_uniform** | A balanced average price. | It smooths out the highest/lowest outliers by averaging the prices of the k nearest neighbors equally. |
| **IDW_10** | The most data-driven price (Recommended). | This is your Inverse Distance Weighting (IDW) result. It's the most robust prediction as it gives greater influence to very nearby, similar properties. |
| **radius_1000** | A highly localized price (Use with caution). | Only reflects the market of properties within a short, fixed distance. If you have a NaN here, it suggests your property is too remote for a localized comparison. |



Since we have no missing values, we could choose the 'radius_1000' option:

In [None]:
base=boston_grid10.plot("radius_1000", legend=True, missing_kwds={'color': 'lightgrey'},figsize=(10,10))
myEstats_gdf.plot(ax=base,color='red')

______

[BACK TO MAIN MENU](https://dacss-spatial.github.io/GDF_OPS_applications/)