# 🏡 💰 Who is putting their homes on Airbnb? 💰🏡

I've always been curious about the people who list their homes on Airbnb in New York City. Who are these people? Why do they have so many homes? Let's get nosy. More broadly, I wanted to find out how many hosts could be doing it commercially, because if many are, Airbnb is not the cute home-sharing platform it claims to be 😔 and is worsening the city's housing crisis 👎

* The data source: http://insideairbnb.com/get-the-data.html
* This data is sourced from publicly available information on the Airbnb site. I did not perform the scrape myself.

### Let's start by looking at some rows and exploring the data a little bit

In [1]:
import pandas as pd
df = pd.read_csv('listings1.csv', low_memory=False)
df.tail(3)



Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
38182,54150715,Private PATIO in PRIVATE Room | 5mins to Manha...,340322917,Elizabeth,Brooklyn,Bedford-Stuyvesant,40.692611,-73.933902,Private room,61,13,0,,,1,365,0,
38183,54152478,Center of Columbia University in UWS New York,438705539,Qingnan,Manhattan,Morningside Heights,40.804473,-73.963734,Private room,60,1,0,,,1,159,0,
38184,54161645,Escape to Haven in Manhattan-West 57th Street-...,355450429,Mike,Manhattan,Midtown,40.7644,-73.97796,Entire home/apt,110,1,0,,,4,5,0,


### * Entire home listings are the most common and have a higher average price than shared or private room listings (hotel rooms average higher still, but there are only 209 of them).

In [2]:
df[['room_type','price']].groupby('room_type')\
.agg(['mean','count'])\
.sort_values(by=('price','mean'), ascending=False).round(2)

room_type,price (mean),price (count)
Hotel room,237.07,209
Entire home/apt,210.97,20376
Shared room,150.48,566
Private room,96.8,17034
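
If the two-level column header in that output is annoying, pandas' named aggregation is one way to get flat column names. This is only a sketch on a tiny made-up dataframe (`toy` and its prices are invented for illustration, not taken from `listings1.csv`); the column names `mean_price` and `listings` are my own choices.

```python
import pandas as pd

# Toy stand-in for the listings data (made-up values, not from the real file).
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room"],
    "price": [200, 300, 100, 50],
})

# Named aggregation yields flat columns instead of a
# (price, mean) / (price, count) two-level header.
summary = (
    toy.groupby("room_type")
    .agg(mean_price=("price", "mean"), listings=("price", "count"))
    .sort_values("mean_price", ascending=False)
    .round(2)
)
print(summary)
```

Swapping `toy` for the real `df` should give the same numbers as the cell above, just with friendlier headers.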


### * I used the data above for this viz: https://datawrapper.dwcdn.net/B3V0R/2/

### * There are 38,185 unique listings and 25,574 hosts 

In [4]:
len(pd.unique(df['id']))

38185

In [5]:
len(pd.unique(df['host_id']))

25574
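
A slightly shorter way to get the same counts is `Series.nunique()`. The sketch below uses a small invented dataframe (`toy` is hypothetical) and also shows how a listings-per-host figure could be derived; with the counts above, 38,185 listings over 25,574 hosts works out to roughly 1.49 listings per host.

```python
import pandas as pd

# Made-up example: five listings spread over three hosts.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "host_id": [10, 10, 20, 30, 30],
})

# nunique() counts distinct values, same result as len(pd.unique(...)).
n_listings = toy["id"].nunique()
n_hosts = toy["host_id"].nunique()

# Distinct listings per host, then the average across hosts.
listings_per_host = toy.groupby("host_id")["id"].nunique()
print(n_listings, n_hosts)
print(round(listings_per_host.mean(), 2))
```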

### * We are only interested in hosts who have multiple listings. Let's see what % of listings they control. The idea is to isolate each 'room_type' in the dataframe and calculate what percentage of its listings belong to hosts who have more than one listing. And then you repeat for the next room type!

In [16]:
entire_home = df[df.room_type == 'Entire home/apt']

In [18]:
len(pd.unique(entire_home['host_id']))

15588

In [19]:
# Keep only listings whose host has more than one listing in the scrape
new_df = entire_home[entire_home.calculated_host_listings_count != 1]

In [22]:
len(pd.unique(new_df['host_id']))

2211

In [23]:
len(pd.unique(entire_home['id']))

20376

In [24]:
len(pd.unique(new_df['id']))

6999
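
The isolate, filter, count steps for entire homes could be wrapped in a small helper so the "repeat" part is one call per room type. This is a sketch, not what the notebook actually ran: `multi_host_share` is a hypothetical name, and `toy` is invented data that only borrows the column names from the listings file. Like the cells here, it treats any `calculated_host_listings_count` other than 1 as a multi-listing host.

```python
import pandas as pd

def multi_host_share(frame, room_type):
    """Counts for one room type, plus the % of its listings held by multi-listing hosts."""
    subset = frame[frame["room_type"] == room_type]
    multi = subset[subset["calculated_host_listings_count"] != 1]
    n_listings = subset["id"].nunique()
    n_multi = multi["id"].nunique()
    return {
        "listings": n_listings,
        "multi_listings": n_multi,
        "hosts": subset["host_id"].nunique(),
        "multi_hosts": multi["host_id"].nunique(),
        "pct_listings": round(100 * n_multi / n_listings, 1),
    }

# Made-up rows using the same column names as the listings file.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "host_id": [10, 10, 20, 30, 40, 40],
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 2,
    "calculated_host_listings_count": [2, 2, 1, 1, 2, 2],
})

for rt in toy["room_type"].unique():
    print(rt, multi_host_share(toy, rt))
```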

In [25]:
private_room = df[df.room_type == 'Private room']

In [28]:
newer_df = private_room[private_room.calculated_host_listings_count != 1]

In [26]:
len(pd.unique(private_room['host_id']))

10543

In [29]:
len(pd.unique(newer_df['host_id']))

2662

In [31]:
len(pd.unique(private_room['id']))

17034

In [32]:
len(pd.unique(newer_df['id']))

9153

In [33]:
hotel_room = df[df.room_type == 'Hotel room']

In [35]:
newest_df = hotel_room[hotel_room.calculated_host_listings_count != 1]

In [36]:
len(pd.unique(hotel_room['host_id']))

73

In [37]:
len(pd.unique(newest_df['host_id']))

32

In [38]:
len(pd.unique(hotel_room['id']))

209

In [39]:
len(pd.unique(newest_df['id']))

168

In [40]:
shared_room = df[df.room_type == 'Shared room']

In [41]:
old_df = shared_room[shared_room.calculated_host_listings_count != 1]

In [42]:
len(pd.unique(shared_room['host_id']))

404

In [43]:
len(pd.unique(old_df['host_id']))

160

In [44]:
len(pd.unique(shared_room['id']))

566

In [46]:
len(pd.unique(old_df['id']))

322
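
To turn the counts printed above into percentages, plain arithmetic is enough. The numbers in `counts` are copied from the cell outputs in this section; the dictionary layout and the one-decimal rounding are just my choices for this sketch.

```python
# Counts copied from the cell outputs above, per room type:
# (all listings, listings by multi-listing hosts, all hosts, multi-listing hosts)
counts = {
    "Entire home/apt": (20376, 6999, 15588, 2211),
    "Private room": (17034, 9153, 10543, 2662),
    "Hotel room": (209, 168, 73, 32),
    "Shared room": (566, 322, 404, 160),
}

shares = {}
for room_type, (listings, multi_listings, hosts, multi_hosts) in counts.items():
    pct_listings = round(100 * multi_listings / listings, 1)
    pct_hosts = round(100 * multi_hosts / hosts, 1)
    shares[room_type] = (pct_listings, pct_hosts)
    print(f"{room_type}: {pct_listings}% of listings, {pct_hosts}% of hosts")
```

So multi-listing hosts account for about 34.3% of entire-home listings, 53.7% of private rooms, 80.4% of hotel rooms and 56.9% of shared rooms, while making up a much smaller share of the hosts in each group.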

### * I used the data above for this viz: https://datawrapper.dwcdn.net/G2nDr/1/ 