# Rental Dataset

## Scope: The goal of this notebook is to explore and understand every column in this dataset.

This may seem daunting with 20 columns, but fortunately every column is already clean and needs no further processing. Most columns get only a basic exploration here, since future notebooks will examine relationships between pairs of columns in depth. A few, such as `rent`, get deeper analysis in this notebook.
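The claim that the columns are clean can be checked rather than taken on trust. The following is a minimal sketch on a toy frame built from three rows of this notebook's own preview output, restricted to a few columns. The names `toy_df`, `missing_per_column`, and `flag_values` are illustrative and not part of the original analysis.

```python
import pandas as pd

# Toy frame: three rows copied from the notebook's preview, subset of columns.
toy_df = pd.DataFrame({
    "rental_id": [1545, 2472, 10234],
    "rent": [2550, 11500, 3000],
    "bedrooms": [0.0, 2.0, 3.0],
    "has_gym": ["yes", "no", "no"],
    "borough": ["Manhattan", "Manhattan", "Queens"],
})

# Count missing values per column; all zeros means nothing needs imputing.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# A yes/no flag column should contain only those two labels.
flag_values = set(toy_df["has_gym"].unique())
print(flag_values <= {"yes", "no"})
```

Running the same two checks on the full `rental_df` would be one way to confirm the "clean" assumption before relying on it.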

## Import Relevant Libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# printing out the version of pandas used for this analysis
print(pd.__version__)

1.2.1


## Get Data

In [18]:
rental_df = pd.read_csv('new_rental')

In [19]:
rental_df.head()

Unnamed: 0.1,Unnamed: 0,rental_id,building_id,rent,bedrooms,bathrooms,size_sqft,min_to_subway,floor,building_age_yrs,...,has_roofdeck,has_washer_dryer,has_doorman,has_elevator,has_dishwasher,has_patio,has_gym,neighborhood,submarket,borough
0,0,1545,44518357,2550,0.0,1,480,9,2.0,17,...,yes,no,no,yes,yes,no,yes,Upper East Side,All Upper East Side,Manhattan
1,1,2472,94441623,11500,2.0,2,2000,4,1.0,96,...,no,no,no,no,no,no,no,Greenwich Village,All Downtown,Manhattan
2,2,10234,87632265,3000,3.0,1,1000,4,1.0,106,...,no,no,no,no,no,no,no,Astoria,Northwest Queens,Queens
3,3,2919,76909719,4500,1.0,1,916,2,51.0,29,...,yes,no,yes,yes,yes,no,no,Midtown,All Midtown,Manhattan
4,4,2790,92953520,4795,1.0,1,975,3,8.0,31,...,no,no,yes,yes,yes,no,yes,Greenwich Village,All Downtown,Manhattan


In [20]:
# The preview above shows an unwanted extra index column ("Unnamed: 0").
# Re-reading the file with index_col=0 uses that column as the index instead.
rental_df = pd.read_csv('new_rental', index_col=0)

In [21]:
rental_df.head()

Unnamed: 0,rental_id,building_id,rent,bedrooms,bathrooms,size_sqft,min_to_subway,floor,building_age_yrs,no_fee,has_roofdeck,has_washer_dryer,has_doorman,has_elevator,has_dishwasher,has_patio,has_gym,neighborhood,submarket,borough
0,1545,44518357,2550,0.0,1,480,9,2.0,17,yes,yes,no,no,yes,yes,no,yes,Upper East Side,All Upper East Side,Manhattan
1,2472,94441623,11500,2.0,2,2000,4,1.0,96,no,no,no,no,no,no,no,no,Greenwich Village,All Downtown,Manhattan
2,10234,87632265,3000,3.0,1,1000,4,1.0,106,no,no,no,no,no,no,no,no,Astoria,Northwest Queens,Queens
3,2919,76909719,4500,1.0,1,916,2,51.0,29,no,yes,no,yes,yes,yes,no,no,Midtown,All Midtown,Manhattan
4,2790,92953520,4795,1.0,1,975,3,8.0,31,no,no,no,yes,yes,yes,no,yes,Greenwich Village,All Downtown,Manhattan
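A stray `Unnamed: 0` column typically appears when a DataFrame was saved together with its index and then read back without `index_col`. The sketch below reproduces that round trip with an in-memory buffer rather than the real file. The names `demo_df`, `buffer`, `naive`, and `fixed` are illustrative, and it is only an assumption that `new_rental` was produced this way.

```python
import io
import pandas as pd

demo_df = pd.DataFrame({"rental_id": [1545, 2472], "rent": [2550, 11500]})

# to_csv writes the index as an unnamed first column by default.
buffer = io.StringIO()
demo_df.to_csv(buffer)

# Reading it back naively turns the old index into a data column.
buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))

# index_col=0 tells pandas to use that first column as the index instead.
buffer.seek(0)
fixed = pd.read_csv(buffer, index_col=0)
print(list(fixed.columns))
```

Writing the file with `to_csv(..., index=False)` in the first place would avoid the extra column entirely, if the file is under your control.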


In [22]:
rental_df.tail()

Unnamed: 0,rental_id,building_id,rent,bedrooms,bathrooms,size_sqft,min_to_subway,floor,building_age_yrs,no_fee,has_roofdeck,has_washer_dryer,has_doorman,has_elevator,has_dishwasher,has_patio,has_gym,neighborhood,submarket,borough
4995,1964,73060494,2650,1.0,1,686,9,4.0,3,yes,no,no,no,no,no,no,no,Astoria,Northwest Queens,Queens
4996,5686,92994390,6675,2.0,2,988,5,10.0,9,yes,yes,yes,yes,yes,yes,no,yes,Tribeca,All Downtown,Manhattan
4997,9679,7689663,1699,0.0,1,250,2,5.0,96,no,no,no,no,no,no,no,no,Little Italy,All Downtown,Manhattan
4998,5188,62828354,3475,1.0,1,651,6,5.0,14,yes,no,yes,yes,yes,yes,no,yes,Midtown West,All Midtown,Manhattan
4999,4718,67659586,4500,1.0,1,816,4,11.0,9,no,yes,yes,yes,yes,no,yes,yes,Tribeca,All Downtown,Manhattan


In [23]:
rental_df.sample(10)

Unnamed: 0,rental_id,building_id,rent,bedrooms,bathrooms,size_sqft,min_to_subway,floor,building_age_yrs,no_fee,has_roofdeck,has_washer_dryer,has_doorman,has_elevator,has_dishwasher,has_patio,has_gym,neighborhood,submarket,borough
4718,6909,87356164,3988,1.0,1,630,12,14.0,4,yes,no,no,no,no,no,no,no,Midtown West,All Midtown,Manhattan
2285,2250,62828586,11000,2.0,3,1607,5,8.0,88,no,yes,yes,yes,yes,yes,no,yes,Upper East Side,All Upper East Side,Manhattan
4266,6190,79496356,3995,1.5,1,900,12,24.0,36,yes,no,no,no,yes,yes,no,no,Upper East Side,All Upper East Side,Manhattan
2651,4994,15555409,2300,1.0,1,500,2,4.0,116,no,no,no,no,no,no,no,no,Upper East Side,All Upper East Side,Manhattan
3000,6796,11790470,1825,1.0,1,773,13,5.0,60,yes,no,no,no,no,yes,no,no,Forest Hills,Central Queens,Queens
2798,4704,73605280,3675,1.0,1,850,4,4.0,76,no,yes,yes,yes,no,yes,no,yes,East Village,All Downtown,Manhattan
4020,3679,95366382,1800,2.0,1,920,5,4.0,62,no,no,no,no,no,no,no,no,Rego Park,Central Queens,Queens
630,2961,57174723,2600,0.0,1,570,2,19.0,13,no,no,no,no,no,no,no,no,Financial District,All Downtown,Manhattan
4034,2910,88062922,3800,1.0,1,700,10,9.0,9,yes,no,yes,no,no,yes,no,no,Gramercy Park,All Downtown,Manhattan
3215,2134,6434691,15000,3.0,2,1700,6,7.0,12,no,no,no,no,no,no,no,no,Lower East Side,All Downtown,Manhattan


In [28]:
list(rental_df.columns)

['rental_id',
 'building_id',
 'rent',
 'bedrooms',
 'bathrooms',
 'size_sqft',
 'min_to_subway',
 'floor',
 'building_age_yrs',
 'no_fee',
 'has_roofdeck',
 'has_washer_dryer',
 'has_doorman',
 'has_elevator',
 'has_dishwasher',
 'has_patio',
 'has_gym',
 'neighborhood',
 'submarket',
 'borough']

As stated in the scope, the goal of this notebook is to go through each column, analyze the data, and see what information can be gleaned.

## `rental_id`

In [32]:
rental_df.rental_id.dtype

dtype('int64')

In [34]:
rental_df.rental_id.nunique()

5000

The cells above show that there are 5000 unique `rental_id` values, which matches the 5000 rows suggested by the index (0 to 4999). This column appears to exist simply to identify each apartment listing. My assumption is that each apartment is assigned a rental id for administrative purposes.
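A distinct count equal to the row count means the column can serve as a row identifier. Here is a short sketch of that check on a toy frame, with values taken from the first five rows of the preview output. The names `toy_df`, `is_key`, and `n_duplicates` are illustrative.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "rental_id": [1545, 2472, 10234, 2919, 2790],
    "building_id": [44518357, 94441623, 87632265, 76909719, 92953520],
})

# A column is a valid row identifier when every value occurs exactly once.
is_key = toy_df["rental_id"].is_unique
print(is_key)

# Equivalent check: number of distinct values equals the number of rows.
print(toy_df["rental_id"].nunique() == len(toy_df))

# duplicated() flags repeats, which is handy if the check ever fails.
n_duplicates = int(toy_df["rental_id"].duplicated().sum())
print(n_duplicates)
```

On the full dataset, `rental_df.rental_id.is_unique` would express the same conclusion in a single call.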

## `building_id`