# SLU2 - Subsetting data in Pandas: Examples notebook

In this example notebook we will work with Airbnb's listings dataset to go over the concepts learned in this unit.



In [None]:
import pandas as pd

# This is an option to preview fewer rows in the notebook's cell outputs
pd.options.display.max_rows = 10

## 1 - Using the index 

### Read airbnb_input data with neighborhood as index

In [None]:
# Read the data in file airbnb_input.csv into a pandas DataFrame and use column neighborhood as the DataFrame index.
df = pd.read_csv('data/airbnb_input.csv', index_col='neighborhood')

# Preview the first rows of the DataFrame.
df.head()
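To see what `index_col` does without the real file, here is a minimal sketch that reads a made-up three-row CSV from memory. The rows and the reduced set of columns are assumptions for illustration only; they are not taken from `airbnb_input.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for airbnb_input.csv (values are made up).
csv_text = """room_id,host_id,neighborhood,price
101,10,Alvalade,40.0
102,20,Arroios,55.0
103,30,Alvalade,30.0
"""

toy = pd.read_csv(io.StringIO(csv_text), index_col='neighborhood')

# The chosen column becomes the index and leaves the regular columns.
print(toy.index.name)     # neighborhood
print(list(toy.columns))  # ['room_id', 'host_id', 'price']
```

Note that an index does not have to be unique: here `Alvalade` labels two rows.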

### Sort index alphabetically (descending)

In [None]:
df = df.sort_index(ascending=False)
df

### Reset neighborhood index, keeping it as a column, and set host_id as index

In [None]:
df = df.reset_index(drop=False).set_index('host_id').sort_index()
df
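The chain above can be followed step by step on a small toy frame. This is only a sketch: the frame and its values are hypothetical, and only the column names `neighborhood` and `host_id` come from the notebook.

```python
import pandas as pd

# Hypothetical toy frame indexed by neighborhood (values are made up).
toy = pd.DataFrame(
    {'host_id': [30, 10, 20], 'price': [30.0, 40.0, 55.0]},
    index=pd.Index(['Alvalade', 'Arroios', 'Alvalade'], name='neighborhood'),
)

# drop=False keeps the old index as a regular column,
# set_index moves host_id out of the columns and into the index,
# sort_index orders the rows by the new index.
toy = toy.reset_index(drop=False).set_index('host_id').sort_index()

print(list(toy.columns))  # ['neighborhood', 'price']
print(list(toy.index))    # [10, 20, 30]
```

With `drop=True` the `neighborhood` values would be discarded instead of kept as a column.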

### Add host with id 1  

In [None]:
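# Assigning to a label that does not exist yet adds a new row.
# The values must be given in the same order as the DataFrame columns.
df.loc[1] = ['Alvalade', 567, 'Private room', 2, 4.5, 2, 1, 24]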
df

As we can see, the new host is in the last position of our DataFrame, which means the DataFrame is no longer sorted along the index. Sorting the index again fixes this.

In [None]:
df = df.sort_index()
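The same append-then-sort behaviour can be reproduced on a toy frame. The one-column frame below is a hypothetical example, not the Airbnb data.

```python
import pandas as pd

# Hypothetical one-column frame, already sorted by host_id.
toy = pd.DataFrame({'price': [40.0, 55.0]},
                   index=pd.Index([10, 20], name='host_id'))

# Label 1 does not exist yet, so .loc adds it as a new last row.
toy.loc[1] = [24.0]
print(list(toy.index))                    # [10, 20, 1]
print(toy.index.is_monotonic_increasing)  # False

# sort_index returns a new frame ordered by index label.
toy = toy.sort_index()
print(list(toy.index))                    # [1, 10, 20]
```

Keeping the index sorted matters for the label-based slices used later, which rely on the labels being in order.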

### Select last 7 rows

In [None]:
df[-7:]

In [None]:
df.iloc[-7:]

### Select rows at positions 25 to 32 (position 33 is excluded)

In [None]:
df[25:33]

In [None]:
df.iloc[25:33]
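Positional slices follow the usual Python convention: the start is included and the stop is excluded. A small sketch on a hypothetical ten-row frame, whose integer labels deliberately differ from the positions:

```python
import pandas as pd

# Hypothetical frame: positions 0..9, index labels 100..109.
toy = pd.DataFrame({'price': list(range(0, 100, 10))},
                   index=pd.Index(list(range(100, 110)), name='host_id'))

# Positions 2, 3 and 4 are returned; position 5 is excluded.
subset = toy.iloc[2:5]
print(list(subset.index))  # [102, 103, 104]

# A plain integer slice in [] also selects rows by position, not by label.
print(toy[2:5].equals(subset))  # True

# Negative positions count from the end.
print(list(toy.iloc[-3:].index))  # [107, 108, 109]
```

`iloc` is usually the clearer choice, since it makes the positional intent explicit.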

### Select between host 1 and 15000

In [None]:
df.loc[1:15000]
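Unlike positional slices, label slices with `.loc` include both endpoints. On a sorted index the endpoints do not even have to exist as labels. The sketch below uses hypothetical host ids and prices.

```python
import pandas as pd

# Hypothetical frame with a sorted integer index of host ids.
toy = pd.DataFrame({'price': [24.0, 40.0, 55.0, 30.0]},
                   index=pd.Index([1, 500, 15000, 20000], name='host_id'))

# Both endpoints are included when they exist as labels.
print(list(toy.loc[1:15000].index))  # [1, 500, 15000]

# On a sorted index, bounds that are not labels still work:
# every label that falls inside the range is returned.
print(list(toy.loc[2:14999].index))  # [500]
```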

### Select columns reviews and price for hosts with id between 15000 and 60000

In [None]:
df.loc[15000:60000,['reviews','price']]
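`.loc` takes a row selector first and a column selector second. As a rough illustration on hypothetical data, a label slice for the rows can be combined with a list of column names:

```python
import pandas as pd

# Hypothetical frame; only the column names mirror the notebook.
toy = pd.DataFrame(
    {'reviews': [3, 8, 1, 0],
     'price': [24.0, 40.0, 55.0, 30.0],
     'accommodates': [2, 4, 2, 6]},
    index=pd.Index([1, 15000, 42000, 70000], name='host_id'),
)

# Rows: labels from 15000 to 60000 (inclusive); columns: the two listed names.
subset = toy.loc[15000:60000, ['reviews', 'price']]

print(list(subset.index))    # [15000, 42000]
print(list(subset.columns))  # ['reviews', 'price']
```

Passing a list of columns always returns a DataFrame, even when the list has a single name.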

### Update reviews to value 5 for room_id 33348

In [None]:
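# Boolean mask selects the rows; the column label selects where to write.
df.loc[df.room_id == 33348, 'reviews'] = 5
# Show the updated row(s).
df.loc[df.room_id == 33348, :]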
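Because `room_id` is a regular column here rather than the index, the rows are selected with a boolean mask. A minimal sketch with made-up room ids and review counts:

```python
import pandas as pd

# Hypothetical frame indexed by host_id, with room_id as a regular column.
toy = pd.DataFrame({'room_id': [101, 102, 103], 'reviews': [2, 7, 0]},
                   index=pd.Index([10, 20, 30], name='host_id'))

# The comparison yields one True/False value per row.
mask = toy['room_id'] == 102
print(mask.tolist())  # [False, True, False]

# Doing the selection and the assignment in a single .loc call
# modifies toy itself.
toy.loc[mask, 'reviews'] = 5
print(toy['reviews'].tolist())  # [2, 5, 0]
```

Writing `toy[mask]['reviews'] = 5` instead would chain two selections and may assign to a temporary copy, which is why the single `.loc` call is preferred.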

### Drop *host_id* 1 and column *accommodates*

In [None]:
# Drop row
df_new = df.drop(labels=1)
# Drop column
df_new = df_new.drop(columns='accommodates')
# Show header
df_new.head()
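`drop` returns a new DataFrame by default and leaves the original untouched, which is why the result is stored in `df_new`. A small sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame; values are made up.
toy = pd.DataFrame({'price': [24.0, 40.0], 'accommodates': [2, 4]},
                   index=pd.Index([1, 10], name='host_id'))

# Drop the row labelled 1, then the accommodates column.
trimmed = toy.drop(labels=1).drop(columns='accommodates')

print(toy.shape)              # (2, 2), the original is unchanged
print(trimmed.shape)          # (1, 1)
print(list(trimmed.index))    # [10]
print(list(trimmed.columns))  # ['price']
```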