# PROJECT 1 - BLOG POST

### PDX AIRBNB DATASET


### Source
Inside Airbnb is an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world.
By analyzing publicly available information about a city's Airbnb listings, Inside Airbnb provides filters and key metrics so you can see how Airbnb is being used to compete with the residential housing market.

About Inside Airbnb: http://insideairbnb.com/about.html <BR>
Dataset of Portland, OR: http://insideairbnb.com/get-the-data.html 


### Dataset files description

listings.csv.gz - Detailed Airbnb Listings data for Portland, OR.<BR>
listings.csv - Summary information and metrics for listings in Portland (good for visualisations).<BR>
reviews.csv.gz - Detailed Review Data for listings in Portland.<BR>
reviews.csv - Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).<BR>
calendar.csv.gz - Detailed Calendar Data for listings in Portland.<BR>
neighbourhoods.csv - Neighbourhood list for geo filter. Sourced from city or open source GIS files.<BR>
neighbourhoods.geojson - GeoJSON file of neighbourhoods of the city.<BR>
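
The compressed `.csv.gz` files listed above do not need to be unpacked by hand: `pd.read_csv` infers gzip compression from the file extension. Below is a minimal sketch of that, assuming a tiny stand-in file written to a temporary directory. The `load_gzipped_csv` helper and the sample rows are hypothetical and not part of the Inside Airbnb data.

```python
import gzip
import os
import tempfile

import pandas as pd


def load_gzipped_csv(path):
    """Read a gzip-compressed CSV; pandas infers the compression from '.gz'."""
    return pd.read_csv(path, compression="infer")


# Build a tiny stand-in for listings.csv.gz (made-up rows, for illustration only).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "listings.csv.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
    fh.write("id,name,price\n1,Sample listing A,75\n2,Sample listing B,80\n")

sample = load_gzipped_csv(sample_path)
print(sample.shape)
```

With the real download, the same helper would be pointed at wherever the `.csv.gz` file was saved.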


In [1]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style()

The data provided by the Inside Airbnb website contains two CSV files with information on the listings posted on the Airbnb site.

- ***listings_full.csv:*** this file contains all the information that was scraped from the Airbnb site <br>
- ***listings.csv:*** this file is a summarized version of the file above, with only the columns considered most relevant.
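
One way to see how the summary file relates to the full file is to compare their column names. The sketch below uses two small stand-in frames with made-up values. The `shared_and_extra_columns` helper is hypothetical; with the real data you would pass the frames loaded from `listings_full.csv` and `listings.csv` instead.

```python
import pandas as pd


def shared_and_extra_columns(full, summary):
    """Return (columns present in both frames, columns only in the summary)."""
    full_cols = set(full.columns)
    shared = [c for c in summary.columns if c in full_cols]
    summary_only = [c for c in summary.columns if c not in full_cols]
    return shared, summary_only


# Stand-in frames with made-up values, for illustration only.
full_demo = pd.DataFrame(
    {"id": [1], "name": ["A"], "listing_url": ["u"], "host_id": [10]}
)
summary_demo = pd.DataFrame(
    {"id": [1], "name": ["A"], "host_id": [10], "price": [75]}
)

shared, summary_only = shared_and_extra_columns(full_demo, summary_demo)
print(shared)
print(summary_only)
```

Any name that ends up in `summary_only` is a column that was renamed or derived for the summary file, so it is worth checking before joining the two tables.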


In [2]:
# load listings_full dataset 
listings_full = pd.read_csv('./data/listings_full.csv')
listings_full.head()

| | id | listing_url | scrape_id | last_scraped | name | description | neighborhood_overview | picture_url | host_id | host_url | ... | review_scores_communication | review_scores_location | review_scores_value | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12899 | https://www.airbnb.com/rooms/12899 | 20201222151945 | 2020-12-22 | Alberta Arts 2 bedroom suite, charming 1906 house | Firstly, please know that we are practicing so... | We're within walking distance of a grocery and... | https://a0.muscache.com/pictures/225005/cf0083... | 49682 | https://www.airbnb.com/users/show/49682 | ... | 10.0 | 10.0 | 10.0 | 14-218887-000-00-HO | t | 1 | 1 | 0 | 0 | 4.15 |
| 1 | 18130 | https://www.airbnb.com/rooms/18130 | 20201222151945 | 2020-12-22 | Charming Roseway Bungalow | An adorable 1925 bungalow home in the Roseway ... | | https://a0.muscache.com/pictures/42c1b5ca-4aca... | 69824 | https://www.airbnb.com/users/show/69824 | ... | 10.0 | 10.0 | 10.0 | City registration pending | f | 1 | 1 | 0 | 0 | 2.0 |
| 2 | 29931 | https://www.airbnb.com/rooms/29931 | 20201222151945 | 2020-12-22 | Lovely SW Victorian w/Bonus Room and Hot Tub | This house is wonderfully located near downtow... | While quiet and safe, the neighborhood also ha... | https://a0.muscache.com/pictures/c9a3aeaf-8d99... | 79786 | https://www.airbnb.com/users/show/79786 | ... | 10.0 | 10.0 | 9.0 | 15-142887-000-00-HO | f | 3 | 3 | 0 | 0 | 0.41 |
| 3 | 37676 | https://www.airbnb.com/rooms/37676 | 20201222151945 | 2020-12-22 | Mt. Hood View in the Pearl District | This 1,000 SF loft is located in the heart of ... | The Pearl district enjoys a walkability score ... | https://a0.muscache.com/pictures/212298/16fb6b... | 162158 | https://www.airbnb.com/users/show/162158 | ... | 10.0 | 10.0 | 10.0 | | f | 1 | 1 | 0 | 0 | 0.97 |
| 4 | 41601 | https://www.airbnb.com/rooms/41601 | 20201222151945 | 2020-12-22 | Grandpa's Bunkhouse-Backyard Studio | Safety First: We have 2 studios here at our ho... | Ours is a neighborhood just off Noodle Flat in... | https://a0.muscache.com/pictures/294ac44c-168d... | 179045 | https://www.airbnb.com/users/show/179045 | ... | 10.0 | 10.0 | 10.0 | 14-240877-000-00-HO | t | 2 | 2 | 0 | 0 | 1.79 |


In [4]:
# General visualization of the data information
listings_full.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3541 entries, 0 to 3540
Data columns (total 74 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            3541 non-null   int64  
 1   listing_url                                   3541 non-null   object 
 2   scrape_id                                     3541 non-null   int64  
 3   last_scraped                                  3541 non-null   object 
 4   name                                          3541 non-null   object 
 5   description                                   3533 non-null   object 
 6   neighborhood_overview                         2805 non-null   object 
 7   picture_url                                   3541 non-null   object 
 8   host_id                                       3541 non-null   int64  
 9   host_url                                      3541 non-null   o

In [68]:
listings_full.number_of_reviews.value_counts()

0      477
1      171
2      101
3       64
4       60
      ... 
472      1
464      1
462      1
458      1
629      1
Name: number_of_reviews, Length: 423, dtype: int64

**The 'listings_full' dataset contains 3541 entries (rows) and 74 columns.**

**The column data types (dtypes) are float64, int64 and object.**
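
The row count, column count, and dtype breakdown can also be read off programmatically instead of from the `info()` printout. This is a minimal sketch on a stand-in frame with made-up values; on the real data the same two calls would be applied to `listings_full`.

```python
import pandas as pd

# Stand-in frame with made-up values; replace with listings_full on real data.
demo = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["A", "B", "C"],
        "reviews_per_month": [4.15, 2.0, 0.41],
    }
)

# shape is a (rows, columns) tuple.
n_rows, n_cols = demo.shape

# Count how many columns fall under each dtype.
dtype_counts = demo.dtypes.astype(str).value_counts()

print(n_rows, n_cols)
print(dtype_counts)
```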

In [28]:
listings = pd.read_csv('./data/listings.csv')
listings.head()

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12899 | Alberta Arts 2 bedroom suite, charming 1906 house | 49682 | Ali And David | | Concordia | 45.56488 | -122.63418 | Entire home/apt | 75 | 2 | 551 | 2020-10-19 | 4.15 | 1 | 361 |
| 1 | 18130 | Charming Roseway Bungalow | 69824 | Katie | | Roseway | 45.55342 | -122.58002 | Entire home/apt | 80 | 3 | 2 | 2020-11-30 | 2.0 | 1 | 0 |
| 2 | 29931 | Lovely SW Victorian w/Bonus Room and Hot Tub | 79786 | Ken | | Hayhurst | 45.48278 | -122.72089 | Entire home/apt | 200 | 2 | 53 | 2019-08-31 | 0.41 | 3 | 204 |
| 3 | 37676 | Mt. Hood View in the Pearl District | 162158 | Paul | | Pearl | 45.52555 | -122.68193 | Entire home/apt | 100 | 30 | 123 | 2019-10-11 | 0.97 | 1 | 365 |
| 4 | 41601 | Grandpa's Bunkhouse-Backyard Studio | 179045 | Jean | | Roseway | 45.54935 | -122.58425 | Entire home/apt | 169 | 3 | 226 | 2020-09-05 | 1.79 | 2 | 365 |


In [29]:
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3541 entries, 0 to 3540
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              3541 non-null   int64  
 1   name                            3541 non-null   object 
 2   host_id                         3541 non-null   int64  
 3   host_name                       3539 non-null   object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   3541 non-null   object 
 6   latitude                        3541 non-null   float64
 7   longitude                       3541 non-null   float64
 8   room_type                       3541 non-null   object 
 9   price                           3541 non-null   int64  
 10  minimum_nights                  3541 non-null   int64  
 11  number_of_reviews               3541 non-null   int64  
 12  last_review                     30

In [72]:
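# select the numeric columns via the public API instead of the private _get_numeric_data()
num_data = listings.select_dtypes(include='number')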
num_data.columns.tolist()

['id',
 'host_id',
 'neighbourhood_group',
 'latitude',
 'longitude',
 'price',
 'minimum_nights',
 'number_of_reviews',
 'reviews_per_month',
 'calculated_host_listings_count',
 'availability_365']