# Pandas Qcut and Cut

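#### In regression tasks (here, predicting the price of an Airbnb listing), it can be helpful to discretize the target into quantile-based bins: buckets that each contain roughly the same number of observations, rather than buckets of equal width. Predicting a price band instead of an exact price turns the task into a classification problem, which may improve accuracy and reduce prediction uncertainty when a price range is an acceptable answer.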
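A minimal sketch of that idea, using a small synthetic `prices` Series (made-up values, not the Airbnb data). Passing `labels=False` to `pd.qcut` returns integer bin codes instead of interval labels, and those codes could serve as class labels for a classifier.

```python
import pandas as pd

# Synthetic, right-skewed "prices" for illustration only (not the Airbnb data)
prices = pd.Series([20, 35, 40, 55, 60, 75, 90, 110, 150, 400, 800, 2000])

# labels=False returns the integer bin index (0 = cheapest quartile) instead of Interval labels
price_class = pd.qcut(prices, q=4, labels=False)

print(price_class.tolist())                            # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(price_class.value_counts().sort_index().tolist())  # [3, 3, 3, 3]
```

Because the 12 values are distinct, each of the four classes receives exactly three of them.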

```python
import pandas as pd
import numpy as np
```

```python
# Load the New York City Airbnb listings data
airbnb = pd.read_csv('ny_airbnb_data/AB_NYC_2019.csv')
airbnb.head()
```

|   | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |


### Say we’d like to discretize the price variable into four quantile bins (quartiles). `pd.qcut` computes the bin edges from the sample quantiles for whatever number of bins you pass as `q`, so each bucket holds roughly the same number of listings (exact equality can break when many listings share the same price at a bin edge). The bucket widths, by contrast, can differ widely.
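To see what "roughly the same number of items" means in practice, here is a sketch on hypothetical, right-skewed prices (synthetic values, not the Airbnb data) comparing `pd.qcut` with `pd.cut` given an integer `bins`, which instead splits the range into equal-width intervals.

```python
import pandas as pd

# Synthetic skewed prices (illustration only): most values are small, a few are very large
prices = pd.Series([20, 35, 40, 55, 60, 75, 90, 110, 150, 400, 800, 2000])

# qcut: edges at the sample quartiles, so every bin gets the same count here
equal_count = pd.qcut(prices, q=4)

# cut with an integer: 4 bins of equal *width* over [min, max]; counts can be very uneven
equal_width = pd.cut(prices, bins=4)

# sort_index() orders the counts by bin (category order), not by frequency
print(equal_count.value_counts().sort_index().tolist())  # [3, 3, 3, 3]
print(equal_width.value_counts().sort_index().tolist())  # [10, 1, 0, 1]
```

With a long right tail, equal-width bins put almost every value in the first bucket, which is the imbalance quantile bins are meant to avoid.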

```python
airbnb1 = airbnb.copy()
# q is the number of quantile-based bins (4 -> quartiles)
airbnb1['price_bins'] = pd.qcut(airbnb1['price'], q=4)
airbnb1.head()
```

|   | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | price_bins |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 | (106.0, 175.0] |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 | (175.0, 10000.0] |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 | (106.0, 175.0] |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 | (69.0, 106.0] |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 | (69.0, 106.0] |


```python
# First ten bin assignments (positional slice)
airbnb1['price_bins'].iloc[0:10]
```

```
0      (106.0, 175.0]
1    (175.0, 10000.0]
2      (106.0, 175.0]
3       (69.0, 106.0]
4       (69.0, 106.0]
5    (175.0, 10000.0]
6      (-0.001, 69.0]
7       (69.0, 106.0]
8       (69.0, 106.0]
9      (106.0, 175.0]
Name: price_bins, dtype: category
Categories (4, interval[float64]): [(-0.001, 69.0] < (69.0, 106.0] < (106.0, 175.0] < (175.0, 10000.0]]
```
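In the output above, the first interval's left edge is shown as -0.001 rather than as the minimum price. `qcut` intervals are right-closed, so pandas shifts the displayed lowest edge slightly below the minimum to keep the minimum itself inside the first bin. The unshifted numeric edges can be recovered with `retbins=True`. A sketch on synthetic data (made-up values with a minimum of 0; the names `binned` and `edges` are just illustrative):

```python
import pandas as pd

# Synthetic prices (illustration only) whose minimum is 0
prices = pd.Series([0, 10, 20, 30, 40, 50, 60, 70])

# retbins=True also returns the numeric bin edges used for the binning
binned, edges = pd.qcut(prices, q=4, retbins=True)

print(edges)                     # true quartile edges, starting at the minimum (0.0)
print(binned.cat.categories[0])  # first interval label: left edge displayed just below 0
```

The returned `edges` can then be passed to `pd.cut` to apply the same bins to new data.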

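### Cut: binning data with fixed edges

Unlike `qcut`, `pd.cut` places values into bins whose edges you supply (or, when `bins` is an integer, into that many equal-width bins). Here the edges reuse the quartile boundaries found above, and `labels` gives each bin a name.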

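```python
airbnb2 = airbnb.copy()

# Same edges as the qcut quartiles above; intervals are right-closed: (0, 69], (69, 106], ...
bins = (0, 69, 106, 175, 10000)
price_status = ['Low', 'Medium', 'High', 'VeryHigh']

# include_lowest=True makes the first bin [0, 69], so a price of exactly 0
# is labeled 'Low' instead of becoming NaN
airbnb2['price_group'] = pd.cut(airbnb2['price'], bins=bins, labels=price_status,
                                include_lowest=True)
airbnb2.head()
```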

|   | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | price_group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 | High |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 | VeryHigh |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 | High |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 | Medium |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 | Medium |
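The bins used with `pd.cut` start at 0, and `pd.cut` intervals are right-closed by default, so a value equal to the lowest edge falls outside every bin. The first `qcut` interval, (-0.001, 69.0], suggests the price column's minimum is 0, so any listing priced at exactly 0 would be assigned NaN without extra handling. A sketch on synthetic prices (made-up values, not the Airbnb data) showing the effect of `include_lowest=True`:

```python
import pandas as pd

# Synthetic prices (illustration only), including a listing priced at exactly 0
prices = pd.Series([0, 50, 69, 70, 150, 500])
bins = (0, 69, 106, 175, 10000)
labels = ['Low', 'Medium', 'High', 'VeryHigh']

# Default (right=True, include_lowest=False): intervals are (0, 69], (69, 106], ...
# so 0 lies outside every bin and becomes NaN
default = pd.cut(prices, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, i.e. [0, 69]
fixed = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)

print(default.tolist())  # first entry is NaN
print(fixed.tolist())    # ['Low', 'Low', 'Low', 'Medium', 'High', 'VeryHigh']
```

An alternative is to start the first edge just below the minimum (for example -1), but `include_lowest=True` keeps the bin edges identical to the quartile boundaries.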
