# Airbnb analysis for European capital cities
## 1. Introduction
Listing data for Airbnb rentals in various cities across the world is published at http://insideairbnb.com/get-the-data.html. In this project I analyze that data to find insights into Airbnb listing prices across major European capital cities.

In [19]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

## 2. Data Gathering
I downloaded the data for each city from the page linked in the introduction to my local hard drive, with one folder per city. Each city's listings are stored as a compressed CSV file, listings.csv.gz.
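
The files do not need to be unpacked before loading, because pandas infers gzip compression from the `.gz` extension. The sketch below illustrates this by writing a tiny made-up table to a temporary `listings.csv.gz` and reading it back. The sample rows and the temporary directory are assumptions for illustration only and are not part of the Airbnb data.

```python
import os
import tempfile

import pandas as pd

# Tiny made-up sample standing in for a real listings file (illustrative only)
sample = pd.DataFrame({"id": [1, 2], "price": ["$50.00", "$75.00"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "listings.csv.gz")

    # Compression is inferred from the '.gz' extension, both on write and on read
    sample.to_csv(gz_path, index=False)
    roundtrip = pd.read_csv(gz_path)

print(roundtrip.shape)
```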

In [41]:
# Listing all the folders with different city information stored in the local hard drive

path = r'C:\Users\srini\Projects\Airbnb'
folders= os.listdir(path)
folders

['Amsterdam',
 'Berlin',
 'Brussels',
 'Copenhagen',
 'Lisbon',
 'London',
 'Madrid',
 'Oslo',
 'Paris',
 'Rome',
 'Stockholm']

In [31]:
# Importing the listing information for Amsterdam
df= pd.read_csv(os.path.join(path,folders[0], 'listings.csv.gz'))
df.head(2)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,2818,https://www.airbnb.com/rooms/2818,20200508171622,2020-05-09,Quiet Garden View Room & Super Fast WiFi,Quiet Garden View Room & Super Fast WiFi,I'm renting a bedroom (room overlooking the ga...,Quiet Garden View Room & Super Fast WiFi I'm r...,none,"Indische Buurt (""Indies Neighborhood"") is a ne...",...,t,f,strict_14_with_grace_period,f,f,1,0,1,0,2.05
1,9693,https://www.airbnb.com/rooms/9693,20200508171622,2020-05-09,Top Location on Canal (Center Flat),You will love your stay here. It is a beautifu...,This beautiful apartment in the heart of 17th ...,You will love your stay here. It is a beautifu...,none,You will be on a beautiful quite canal and wil...,...,t,f,moderate,f,f,1,1,0,0,0.45


In [37]:
df.shape

(19278, 106)

Since every city folder stores its data under the same file name, listings.csv.gz, we can loop over the folders, read each file, and merge the results into a master dataframe.

In [56]:
# Merging the listings across all cities to form a master dataframe
frames = []
for city in folders:
    try:
        # reading the compressed csv file for each city
        frames.append(pd.read_csv(os.path.join(path, city, 'listings.csv.gz'), low_memory=False))
    except Exception as err:
        # reporting the city that could not be read, and why, instead of failing silently
        print(city, err)

# joining all the files in a single step; ignore_index avoids repeated row labels
df = pd.concat(frames, ignore_index=True)
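
When the files are concatenated directly, the merged rows do not record which folder they were read from. A minimal sketch of one way to keep that information follows, assuming the same one-folder-per-city layout. The function name `load_all_listings` and the `source_city` column are hypothetical names introduced for illustration, and the demo builds small made-up files in a temporary directory rather than touching the real data.

```python
import os
import tempfile

import pandas as pd


def load_all_listings(root, filename="listings.csv.gz"):
    """Read `filename` from every sub-folder of `root` and stack the results.

    `source_city` is a hypothetical helper column holding the folder name.
    """
    frames = []
    for city in sorted(os.listdir(root)):
        file_path = os.path.join(root, city, filename)
        if not os.path.isfile(file_path):
            # Skip stray files or folders without a listings file
            print(f"skipped: {city}")
            continue
        frame = pd.read_csv(file_path, low_memory=False)
        frame["source_city"] = city
        frames.append(frame)
    # A single concat at the end avoids re-copying the growing dataframe
    return pd.concat(frames, ignore_index=True)


# Demo on made-up data in a temporary directory (not the real Airbnb files)
with tempfile.TemporaryDirectory() as root:
    for city, ids in {"Amsterdam": [1, 2], "Berlin": [3]}.items():
        os.makedirs(os.path.join(root, city))
        pd.DataFrame({"id": ids}).to_csv(
            os.path.join(root, city, "listings.csv.gz"), index=False
        )
    demo = load_all_listings(root)

print(demo["source_city"].value_counts())
```

Collecting the frames in a list and concatenating once also scales better than growing the dataframe inside the loop.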


In [55]:
# Saving the merged dataframe to the local hard drive
df.to_csv(os.path.join(path,'listing_master.csv'), index= False)
df.shape

(320401, 106)

The master dataframe consists of 320,401 rows and 106 columns.

## 3. Data Wrangling

In [51]:
for i,col_name in enumerate(df.columns):
    print(col_name, df.iloc[0,i])

id 2818
listing_url https://www.airbnb.com/rooms/2818
scrape_id 20200508171622
last_scraped 2020-05-09
name Quiet Garden View Room & Super Fast WiFi
summary Quiet Garden View Room & Super Fast WiFi
space I'm renting a bedroom (room overlooking the garden) in my apartment in Amsterdam,  The room is located to the east of the city centre in a quiet, typical Amsterdam neighbourhood the "Indische Buurt". Amsterdam’s historic centre is less than 15 minutes away by bike or tram. The features of the room are: - Twin beds (80 x 200 cm, down quilts and pillows)  - 2 pure cotton towels for each guest  - reading lamps - bedside table - wardrobe - table with chairs - tea and coffee making facilities - mini bar - alarm clock - Hi-Fi system with cd player, connection for mp3 player / phone - map of Amsterdam and public transport - Wi-Fi Internet connection  Extra services: - Bike rental
description Quiet Garden View Room & Super Fast WiFi I'm renting a bedroom (room overlooking the garden) in my apa

Only some columns are essential for our current analysis. We will make a subset of our dataframe with the essential columns.

In [59]:
# Making a list of the essential columns
essential_columns= ['id', 'name', 'host_id','host_since', 'host_response_time','host_response_rate','host_acceptance_rate',\
'host_is_superhost','host_total_listings_count','host_verifications','street','city','country',\
 'property_type','room_type','accommodates','bathrooms','bedrooms','beds','bed_type','amenities','square_feet',\
'price','weekly_price','monthly_price','security_deposit','cleaning_fee','guests_included','extra_people','minimum_nights',\
'maximum_nights','availability_30','availability_60','availability_90','availability_365', 'number_of_reviews', \
'number_of_reviews_ltm','first_review','last_review','review_scores_rating','review_scores_accuracy',\
'review_scores_cleanliness','review_scores_checkin','review_scores_communication','review_scores_location',\
'review_scores_value','is_business_travel_ready','cancellation_policy','require_guest_profile_picture','require_guest_phone_verification',\
'calculated_host_listings_count','calculated_host_listings_count_entire_homes',\
'calculated_host_listings_count_private_rooms','calculated_host_listings_count_shared_rooms','reviews_per_month']


In [61]:
# Making a subset dataframe with essential columns
df = df[essential_columns]

# Saving a backup of the subset dataframe
df.to_csv(os.path.join(path,'listing_master_clean.csv'), index= False)

df.shape

(320401, 55)
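
Selecting with `df[essential_columns]` raises a `KeyError` if any listed name is missing from the dataframe, for example because of a typo in the list. A minimal sketch of a pre-check is shown below. The helper name `select_columns` and the toy dataframe are assumptions made for illustration.

```python
import pandas as pd


def select_columns(frame, wanted):
    """Return `frame` restricted to `wanted`, reporting any names not found."""
    missing = [col for col in wanted if col not in frame.columns]
    if missing:
        print(f"columns not found: {missing}")
    present = [col for col in wanted if col in frame.columns]
    # .copy() gives an independent dataframe, so later edits do not touch the original
    return frame[present].copy()


# Toy dataframe standing in for the master dataframe (illustrative only)
toy = pd.DataFrame({"id": [1], "name": ["Room"], "scrape_id": [99]})
subset = select_columns(toy, ["id", "name", "price"])
print(subset.columns.tolist())
```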