<a href="https://colab.research.google.com/github/lamphgg/Airbnb_Amsterdam/blob/main/Numpy_Project_code_PFDI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Downloading the Dataset


In [1]:
%%capture
!pip install numpy pandas streamlit gdown currencyconverter

In [2]:
import numpy as np

# For readability purposes, we will disable scientific notation for numbers
np.set_printoptions(suppress=True)

In [3]:
import gdown

# Download the dataset from Google Drive
# This file is based on data from: http://insideairbnb.com/get-the-data/
file_id_1 = "13fyESiH1ZEnMV6eabAyhe20t4W6peEWK"
downloaded_file_1 = "WK1_Airbnb_Amsterdam_listings_proj.csv"

gdown.download(id=file_id_1, output=downloaded_file_1)

Downloading...
From: https://drive.google.com/uc?id=13fyESiH1ZEnMV6eabAyhe20t4W6peEWK
To: /content/WK1_Airbnb_Amsterdam_listings_proj.csv
100%|██████████| 246k/246k [00:00<00:00, 73.3MB/s]


'WK1_Airbnb_Amsterdam_listings_proj.csv'

## Preprocessing the Dataset


In [4]:
from numpy import genfromtxt

# Read the pipe-delimited file as strings; the price column still contains "$"
my_data = genfromtxt(downloaded_file_1, delimiter='|', dtype=str)

In [5]:
print(my_data[:,:4])

[['' '0' '1' '2']
 ['id' '23726706' '35815036' '31553121']
 ['price' '$88.00' '$105.00' '$152.00']
 ['latitude' '52.34916' '52.42419' '52.43237']
 ['longitude' '4.97879' '4.95689' '4.91821']]


In [6]:
# Remove the first column and row
matrix = my_data[1:,1:]
matrix[:,:4]

array([['23726706', '35815036', '31553121', '34745823'],
       ['$88.00', '$105.00', '$152.00', '$87.00'],
       ['52.34916', '52.42419', '52.43237', '52.2962'],
       ['4.97879', '4.95689', '4.91821', '5.01231']], dtype='<U18')

In [7]:
# Transpose the matrix so that each row is one listing
matrix = matrix.T
print(matrix[:5,:])

[['23726706' '$88.00' '52.34916' '4.97879']
 ['35815036' '$105.00' '52.42419' '4.95689']
 ['31553121' '$152.00' '52.43237' '4.91821']
 ['34745823' '$87.00' '52.2962' '5.01231']
 ['44586947' '$160.00' '52.31475' '5.0303']]
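
The slicing and transposing steps above can be sketched on a toy array. The values below are made up for illustration and only mimic the layout of the real file (one row per field, one column per listing). The sketch also shows that a transpose is not the same operation as a 90 degree rotation.

```python
import numpy as np

# Toy stand-in for the raw file: a header row, a label column,
# then one row per field and one column per listing (made-up values).
raw = np.array([
    ["", "0", "1"],
    ["id", "101", "102"],
    ["price", "$10.00", "$1,250.00"],
])

fields_by_listing = raw[1:, 1:]          # drop the header row and the label column
listings_by_field = fields_by_listing.T  # transpose: each row is now one listing

print(listings_by_field.shape)  # (2, 2)
print(listings_by_field[0])     # ['101' '$10.00']

# A transpose mirrors the array along its diagonal; np.rot90 rotates it instead.
square = np.array([[1, 2], [3, 4]])
is_same_as_rotation = np.array_equal(square.T, np.rot90(square))
print(is_same_as_rotation)      # False
```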


In [8]:
# Remove the dollar sign and the comma
matrix = np.char.replace(matrix, "$", "")
matrix = np.char.replace(matrix, ",", "")

In [9]:
# Check if the dollar sign is in our dataset
matrix[(np.char.find(matrix, "$") > -1)]

array([], dtype='<U18')

In [10]:
# Check if the comma sign is in our dataset
matrix[(np.char.find(matrix, ",") > -1)]

array([], dtype='<U18')
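
As a minimal sketch of the cleaning and checking steps above, the example below applies `np.char.replace` and `np.char.find` to a small made-up array. `np.char.find` returns `-1` wherever the substring is absent, which is why comparing against `-1` produces a mask of the entries that still contain the character.

```python
import numpy as np

# Made-up strings covering the cases in the dataset: "$", "," and plain numbers
prices = np.array(["$88.00", "$1,250.00", "52.34916"])

cleaned = np.char.replace(prices, "$", "")
cleaned = np.char.replace(cleaned, ",", "")

# Boolean mask of entries that still contain a dollar sign or a comma
has_dollar = np.char.find(cleaned, "$") > -1
has_comma = np.char.find(cleaned, ",") > -1

print(cleaned)           # ['88.00' '1250.00' '52.34916']
print(has_dollar.any())  # False
print(has_comma.any())   # False
```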

In [11]:
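# Convert the strings to float32. Note that float32 keeps only about 7
# significant digits, so some 8-digit IDs get rounded (31553121 -> 31553120 below)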
matrix = np.float32(matrix)
print(matrix[:5,:])

[[23726706.            88.            52.34916        4.97879]
 [35815036.           105.            52.42419        4.95689]
 [31553120.           152.            52.43237        4.91821]
 [34745824.            87.            52.2962         5.01231]
 [44586948.           160.            52.31475        5.0303 ]]
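
The output above shows some listing IDs changing by one (for example `31553121` became `31553120.`). The sketch below reproduces this with three IDs taken from the dataset and compares `float32` with `float64`. Using `float64` is only one possible way to keep the IDs exact; it is shown here as an illustration, not as what the notebook does.

```python
import numpy as np

# Three listing IDs from the dataset, still stored as strings
ids = np.array(["23726706", "31553121", "34745823"])

# float32 has a 24-bit significand, so not every 8-digit integer is representable
ids_f32 = ids.astype(np.float32).astype(np.int64)
# float64 has a 53-bit significand, enough to hold these IDs exactly
ids_f64 = ids.astype(np.float64).astype(np.int64)

print(ids_f32.tolist())  # [23726706, 31553120, 34745824]
print(ids_f64.tolist())  # [23726706, 31553121, 34745823]
```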


## Convert currency

In [12]:
from currency_converter import CurrencyConverter

cc = CurrencyConverter()

# Entries: airbnb_id, price_usd, latitude, longitude
print(matrix[:5,:])

[[23726706.            88.            52.34916        4.97879]
 [35815036.           105.            52.42419        4.95689]
 [31553120.           152.            52.43237        4.91821]
 [34745824.            87.            52.2962         5.01231]
 [44586948.           160.            52.31475        5.0303 ]]


In [13]:
print('\tSecond column is:\n', matrix[:, 1])

	Second column is:
 [ 88. 105. 152. ... 180. 174.  65.]


In [14]:
cc.currencies

{'AUD',
 'BGN',
 'BRL',
 'CAD',
 'CHF',
 'CNY',
 'CYP',
 'CZK',
 'DKK',
 'EEK',
 'EUR',
 'GBP',
 'HKD',
 'HRK',
 'HUF',
 'IDR',
 'ILS',
 'INR',
 'ISK',
 'JPY',
 'KRW',
 'LTL',
 'LVL',
 'MTL',
 'MXN',
 'MYR',
 'NOK',
 'NZD',
 'PHP',
 'PLN',
 'ROL',
 'RON',
 'RUB',
 'SEK',
 'SGD',
 'SIT',
 'SKK',
 'THB',
 'TRL',
 'TRY',
 'USD',
 'ZAR'}

#### Converting to EUR

In [15]:
eur_rate = cc.convert(1,'USD','EUR')

# Multiply the dollar column by the USD-to-EUR exchange rate
matrix[:,1] = matrix[:,1] * eur_rate
print(matrix[:,1])

[ 81.37599  97.09636 140.55853 ... 166.4509  160.90253  60.10727]


In [16]:
# Apply 7% inflation to the price column, now in EUR (multiply by 1.00 + 0.07)
matrix[:,1] = matrix[:,1] * (1.00 + 0.07)
matrix[:,1]

array([ 87.07232, 103.89311, 150.39764, ..., 178.10246, 172.16571,
        64.31478], dtype=float32)

In [17]:
# Round the new currency column to 2 decimals (rounds to nearest, not down)
# np.round replaces np.round_, which was removed in NumPy 2.0
matrix[:,1] = np.round(matrix[:,1], decimals=2)
matrix[:,1]

array([ 87.07, 103.89, 150.4 , ..., 178.1 , 172.17,  64.31], dtype=float32)
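
The three price steps above (convert, add inflation, round) can be checked by hand on the first few listings. The sketch below assumes a fixed rate of `0.924727`, which is roughly what the outputs above imply; the real rate comes from `CurrencyConverter` and depends on its bundled data. It also contrasts rounding to the nearest cent with rounding down, since the two give different results for some prices.

```python
import numpy as np

usd_prices = np.array([88.0, 105.0, 152.0])  # first three prices in the dataset
eur_rate = 0.924727   # ASSUMPTION: approximate rate implied by the outputs above
inflation = 0.07      # 7% inflation, as in the notebook

eur_prices = usd_prices * eur_rate * (1.0 + inflation)

# np.round rounds to the nearest value at the given number of decimals
nearest = np.round(eur_prices, decimals=2)
# Rounding down to 2 decimals needs an explicit floor
floored = np.floor(eur_prices * 100) / 100

print(nearest)
print(floored)
```

With these inputs the third price rounds to 150.40 but floors to 150.39, so the choice between the two is visible in the data.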

#### Choose a location: Van Gogh Museum

The coordinates come from [this Google search](https://www.google.com/search?q=coordinates+of+van+gogh+museum+amsterdam&rlz=1C1ONGR_enUS1042US1042&oq=coordinates+of+van+gogh+museum+amsterdam&aqs=chrome..69i57j0i22i30i625j0i390l3.5845j0j7&sourceid=chrome&ie=UTF-8).




In [18]:
# Coordinates of the Van Gogh Museum
latitude = 52.3584
longitude = 4.8811

In [19]:
import math

def from_location_to_airbnb_listing_in_meters(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Haversine formula. Source: https://community.esri.com/t5/coordinate-reference-systems-blog
    # /distance-on-a-sphere-the-haversine-formula/ba-p/902128
    # Works on scalars only: the math module does not accept arrays

    R = 6371000  # Radius of Earth in meters
    phi_1 = math.radians(lat1)
    phi_2 = math.radians(lat2)

    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = (
        math.sin(delta_phi / 2.0) ** 2
        + math.cos(phi_1) * math.cos(phi_2) * math.sin(delta_lambda / 2.0) ** 2
    )

    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    meters = R * c  # Output distance in meters

    return round(meters, 0)
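
A quick way to gain confidence in a haversine implementation is to test it on cases with a known answer. The sketch below uses a compact, hypothetical helper named `haversine_m` (not part of the notebook) with the same formula and Earth radius as above, and checks three cases: identical points, one degree of latitude along a meridian, and the two museum coordinates that appear in this notebook.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000):
    """Great-circle distance in meters between two points given in degrees."""
    phi_1, phi_2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2.0) ** 2
        + math.cos(phi_1) * math.cos(phi_2) * math.sin(delta_lambda / 2.0) ** 2
    )
    return 2 * radius_m * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Identical points should be zero meters apart
same_point = haversine_m(52.3584, 4.8811, 52.3584, 4.8811)

# One degree of latitude is radius * (pi / 180), about 111.2 km
one_degree_north = haversine_m(0.0, 0.0, 1.0, 0.0)

# Van Gogh Museum to Rijksmuseum: a few hundred meters
museum_to_museum = haversine_m(52.3584, 4.8811, 52.3600, 4.8852)

print(same_point, round(one_degree_north), round(museum_to_museum))
```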

In [20]:
# Create a loop or vectorized way to calculate the distance,
# going over all latitude and longitude entries in the dataset

vectorizing_function = np.vectorize(from_location_to_airbnb_listing_in_meters)
distance = vectorizing_function(latitude, longitude, matrix[:,2], matrix[:,3])

In [21]:
%%timeit -r 4 -n 100

# Allow a Python function to be used in a (semi-)vectorized way
conv_to_meters = np.vectorize(from_location_to_airbnb_listing_in_meters)

# Apply the function, use timeit
conv_to_meters(latitude, longitude, matrix[:, 2], matrix[:, 3])

36.4 ms ± 10.8 ms per loop (mean ± std. dev. of 4 runs, 100 loops each)
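
`np.vectorize` is a convenience wrapper: it still calls the Python function once per element, so it behaves like a loop rather than like compiled NumPy code. The sketch below, which uses a small made-up scalar function, shows that `np.vectorize`, an explicit list comprehension, and a native NumPy expression all give the same values; only the last one avoids the per-element Python call.

```python
import math

import numpy as np

def sin_squared(x: float) -> float:
    """Scalar-only function built on the math module (illustrative)."""
    return math.sin(x) ** 2

values = np.linspace(0.0, 1.0, 5)

via_vectorize = np.vectorize(sin_squared)(values)          # one Python call per element
via_loop = np.array([sin_squared(v) for v in values])      # the equivalent explicit loop
via_numpy = np.sin(values) ** 2                            # a single array operation

print(np.allclose(via_vectorize, via_loop))   # True
print(np.allclose(via_vectorize, via_numpy))  # True
```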


---

#### (Extra Credit) Task 12: NumPy all the way!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers#corise_cl9k5z30h005w3b6p5jv0vs82)

Now convert the `from_location_to_airbnb_listing_in_meters` function into a pure NumPy function. You can do this by changing all the imported math functions into their NumPy variant. ***For now just calculate these numbers. Don't add it as a new column to your matrix.***

In [22]:
def from_location_to_airbnb_listing_in_meters(lat1: float, lon1: float, lat2: np.ndarray, lon2: np.ndarray) -> np.ndarray:
    # Pure NumPy haversine: lat2 and lon2 can be whole columns of coordinates
    R = 6371000  # Radius of Earth in meters
    phi_1 = np.radians(lat1)
    phi_2 = np.radians(lat2)

    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)

    a = (
        np.sin(delta_phi / 2.0) ** 2
        + np.cos(phi_1) * np.cos(phi_2) * np.sin(delta_lambda / 2.0) ** 2
    )

    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

    meters = R * c  # Output distance in meters

    return np.around(meters, 0)

In [23]:
# Run the converted NumPy method and check if it works
from_location_to_airbnb_listing_in_meters(latitude, longitude, matrix[:, 2], matrix[:, 3])

array([6714., 8943., 8602., ..., 7680., 4432., 5600.])

In [24]:
%%timeit -r 4 -n 100
from_location_to_airbnb_listing_in_meters(latitude, longitude, matrix[:, 2], matrix[:, 3])

617 µs ± 107 µs per loop (mean ± std. dev. of 4 runs, 100 loops each)


WOW! The pure NumPy version takes about 617 µs per call versus 36.4 ms for the `np.vectorize` version, roughly a 60x speed-up, just from switching the `math` functions to their NumPy variants. Awesome!
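
The reason the NumPy variants can take a whole column at once is broadcasting: a scalar such as the museum latitude is combined with every element of the array in a single operation. The `math` functions, by contrast, only accept single numbers. The sketch below illustrates both points with three latitudes from the dataset.

```python
import math

import numpy as np

listing_lats = np.array([52.34916, 52.42419, 52.43237])  # first three latitudes
origin_lat = 52.3584                                      # Van Gogh Museum

# The scalar is broadcast against the array: one subtraction per element,
# then np.radians converts the whole result in one call.
delta_phi = np.radians(listing_lats - origin_lat)
print(delta_phi.shape)  # (3,)

# math.radians expects a single number and rejects a multi-element array.
try:
    math.radians(listing_lats)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False
print(math_accepts_arrays)  # False
```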

---

## Prep the Dataset for Download!


Now that we've created a function to calculate the distance in meters for every Airbnb listing, we'll perform this calculation on the entire dataset and add the outputs to the matrix as a new column.

Next to that, we'll add another column that contains only ones and zeros to represent the "color" of an entry/row. This column can be used later if you want to turn this dataset into an app using [Streamlit](https://streamlit.io/). This resource is great for when you want to translate your Python projects into an interactive website. More on that in the next section.

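As you'll see from the code, we'll also add our favorite location as an entry/row. We've selected the coordinates of the Rijksmuseum; note that the distances above were measured from the Van Gogh Museum, so if you change this entry to your own favorite location in Amsterdam, use the same coordinates for the distance calculation.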




In [25]:
# Run the previous method
meters = from_location_to_airbnb_listing_in_meters(
    latitude, longitude, matrix[:, 2], matrix[:, 3]
)

# Add an axis to make concatenation possible
meters = meters.reshape(-1, 1)

# Append the distance in meters to the matrix
matrix = np.concatenate((matrix, meters), axis=1)

In [26]:
# Append a color to the matrix
colors = np.zeros(meters.shape)
matrix = np.concatenate((matrix, colors), axis=1)

# Append our entry to the matrix
fav_entry = np.array([1, 0, 52.3600, 4.8852, 0, 1]).reshape(1, -1) # Change coordinates to your favorite location
matrix = np.concatenate((fav_entry, matrix), axis=0)

# Entries: airbnb_id, price, latitude, longitude,
# meters from favorite point, color
matrix[:5, :]

array([[       1.        ,        0.        ,       52.36      ,
               4.8852    ,        0.        ,        1.        ],
       [23726706.        ,       87.06999969,       52.34915924,
               4.97878981,     6714.        ,        0.        ],
       [35815036.        ,      103.88999939,       52.42419052,
               4.95689011,     8943.        ,        0.        ],
       [31553120.        ,      150.3999939 ,       52.43236923,
               4.91821003,     8602.        ,        0.        ],
       [34745824.        ,       86.08000183,       52.2961998 ,
               5.01231003,    11284.        ,        0.        ]])
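
The reshaping in the cells above is needed because `np.concatenate` requires all inputs to have the same number of dimensions. A minimal sketch on a made-up 2x2 table: a 1-D array of shape `(2,)` becomes a column with `reshape(-1, 1)`, and a 1-D array of shape `(3,)` becomes a row with `reshape(1, -1)`.

```python
import numpy as np

table = np.array([[1.0, 10.0],
                  [2.0, 20.0]])        # 2 rows, 2 columns (made-up values)

new_column = np.array([100.0, 200.0])  # shape (2,): one value per row
with_column = np.concatenate((table, new_column.reshape(-1, 1)), axis=1)

new_row = np.array([0.0, 0.0, 0.0])    # shape (3,): one value per column
with_row = np.concatenate((new_row.reshape(1, -1), with_column), axis=0)

print(with_column.shape)  # (2, 3)
print(with_row.shape)     # (3, 3)
```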

In [27]:
# Export the data to use in the primer for next week
np.savetxt("WK1_Airbnb_Amsterdam_listings_proj_solution.csv", matrix, delimiter=",")

Great! By running all the cells above, you've saved the matrix here on your Google Colab instance. Let's now look into how to download the dataset to your local machine.
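
As a small sketch of what `np.savetxt` writes, the example below saves a made-up two-row array to a temporary file and reads it back. By default the numbers are written in scientific notation, which `np.loadtxt` and pandas parse without trouble. The file name `example.csv` is a placeholder for illustration.

```python
import os
import tempfile

import numpy as np

data = np.array([[23726706.0, 87.07],
                 [35815036.0, 103.89]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # placeholder file name

    # Default format writes every value in scientific notation
    np.savetxt(path, data, delimiter=",")
    with open(path) as handle:
        first_line = handle.readline().strip()

    # Reading the file back recovers the same values
    restored = np.loadtxt(path, delimiter=",")

print(first_line)
print(np.allclose(restored, data))  # True
```

If a more readable file is preferred, `np.savetxt` also accepts a `fmt` argument such as `fmt="%.2f"`, at the cost of fixing the number of decimals for every column.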

### Download the Dataset to Local Machine!

In [28]:
from google.colab import files

# Download the file locally
files.download('WK1_Airbnb_Amsterdam_listings_proj_solution.csv')


In [29]:
%%writefile streamlit_app.py
import pandas as pd
import plotly.express as px
import streamlit as st

# Display title and text
st.title("Week 1 - Data and visualization")
st.markdown("Here we can see the dataframe created during this week's project.")

# Read dataframe
dataframe = pd.read_csv(
    "WK1_Airbnb_Amsterdam_listings_proj_solution.csv",
    names=[
        "Airbnb Listing ID",
        "Price",
        "Latitude",
        "Longitude",
        "Meters from chosen location",
        "Location",
    ],
)

# We have a limited budget, therefore we would like to exclude
# listings with a price above 100 euros per night
dataframe = dataframe[dataframe["Price"] <= 100]

# Display as integer
dataframe["Airbnb Listing ID"] = dataframe["Airbnb Listing ID"].astype(int)
# Round off the values and prefix the currency symbol
dataframe["Price"] = "€ " + dataframe["Price"].round(2).astype(str) # <--- CHANGE THIS SYMBOL IF YOU CHOSE A CURRENCY OTHER THAN EUR
# Rename the number to a string
dataframe["Location"] = dataframe["Location"].replace(
    {1.0: "To visit", 0.0: "Airbnb listing"}
)

# Display dataframe and text
st.dataframe(dataframe)
st.markdown("Below is a map showing all the Airbnb listings with a red dot and the location we've chosen with a blue dot.")

# Create the plotly express figure
fig = px.scatter_mapbox(
    dataframe,
    lat="Latitude",
    lon="Longitude",
    color="Location",
    zoom=11,
    height=500,
    width=800,
    hover_name="Price",
    hover_data=["Meters from chosen location", "Location"],
    labels={"color": "Locations"},
)
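# Center the map on the first row (our chosen location); update_geos has no effect on mapbox figures
fig.update_layout(mapbox_center=dict(lat=dataframe.iloc[0, 2], lon=dataframe.iloc[0, 3]))
# "stamen-terrain" tiles may need an API key in recent Plotly versions; "open-street-map" needs no token
fig.update_layout(mapbox_style="open-street-map")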

# Show the figure
st.plotly_chart(fig, use_container_width=True)

Writing streamlit_app.py


### Make an App on Streamlit!

We are going to use [Streamlit Share](https://share.streamlit.io/) to host this project. It's a website that allows us to host interactive projects for free online!

<center>
  <img src=https://i.ibb.co/gRhj6Jd/Screen-Shot-2022-11-10-at-3-58-17-PM.png width="500" align="center" />
</center>
<br/>

Five out of the six columns in the dataset are used as follows:
- **Price**: Hovering over a blue/red dot displays in **bold** the price at the top
- **Latitude**: Used to plot the blue/red dot on the map
- **Longitude**: Used to plot the blue/red dot on the map
- **Meters from favorite point**: Hovering over a blue/red dot displays the number of meters from the blue point
- **Color**: Dependent on the category it's assigned

To visualize this, we will again use a library called [Streamlit](https://streamlit.io/).
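
The way the app turns the numeric color column into categories can be sketched with plain pandas. The rows below are made up (one favorite location with color `1.0`, two listings with color `0.0`), and the sketch uses `Series.map` as one possible way to relabel the values; the app itself uses `replace`.

```python
import pandas as pd

# Made-up rows in the same layout as the exported CSV (subset of columns)
dataframe = pd.DataFrame(
    {
        "Price": [0.0, 87.07, 150.4],
        "Location": [1.0, 0.0, 0.0],  # 1.0 = chosen location, 0.0 = Airbnb listing
    }
)

# Keep rows within budget; the chosen location has price 0 and always stays
affordable = dataframe[dataframe["Price"] <= 100].copy()

# Turn the numeric flag into a readable category used for the dot color
affordable["Location"] = affordable["Location"].map(
    {1.0: "To visit", 0.0: "Airbnb listing"}
)

print(affordable)
```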

In [30]:
from google.colab import files

# Download the file locally
files.download('streamlit_app.py')


In [31]:
%%writefile requirements.txt
pandas
streamlit
plotly

Writing requirements.txt


In [32]:
from google.colab import files

# Download the file locally
files.download('requirements.txt')
