## Introduction
In this lesson, we will practice filtering, sorting and grouping data. We'll also cover method chaining, which you are likely to encounter when reading other people's Pandas code.

## Loading the Data
When cleaning data, filtering is a powerful Pandas tool for removing unwanted or incorrect data.

In the following examples, we will use the California housing dataset. It is available on Kaggle at https://www.kaggle.com/datasets/camnugent/california-housing-prices and is also included in the resources for this lecture.

In [None]:
import pandas as pd
df = pd.read_csv('housing.csv')  # relative path: housing.csv must be in the current working directory
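
The real `housing.csv` has more columns and thousands of rows. If you want to try the commands in this lesson before downloading it, a minimal sketch like the one below builds a tiny stand-in. The rows are made up and only a subset of the real column names is used; `io.StringIO` lets `pd.read_csv` read text held in memory as if it were a file.

```python
import io

import pandas as pd

# Hypothetical, made-up rows using a subset of the column names in housing.csv.
# The empty field in the third data row becomes NaN when parsed.
csv_text = """total_rooms,total_bedrooms,households,median_house_value,ocean_proximity
1000,200,180,300000,NEAR BAY
30000,5000,4500,250000,INLAND
1500,,300,180000,INLAND
2000,400,380,500000,NEAR OCEAN
"""

# read_csv accepts any file-like object, not just a path on disk.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy.shape)  # (4, 5)
print(toy.dtypes)  # total_bedrooms is float64 because it contains a NaN
```
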

## Inspecting the Data

In [None]:
df.info()  # column names, non-null counts and dtypes

In [None]:
df.describe()  # count, mean, std, min, quartiles and max of each numeric column

In [None]:
df.nunique()  # number of distinct values in each column

In [None]:
df.isnull()  # True wherever a value is missing (NaN)

In [None]:
df.isnull().sum()  # number of missing values in each column
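
Why does `.sum()` turn `isnull()` into a count? A minimal sketch on a hypothetical toy frame (made-up values, one deliberately missing) shows the intermediate steps: `isnull()` returns a True/False frame of the same shape, and summing it counts the `True` values per column because `True` is treated as 1 and `False` as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with made-up values; total_bedrooms has one NaN.
toy = pd.DataFrame({
    "total_rooms": [1000, 30000, 1500, 2000],
    "total_bedrooms": [200.0, 5000.0, np.nan, 400.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"],
})

# Same shape as toy, True where a value is missing.
mask = toy.isnull()

# Summing booleans column by column counts the missing cells.
missing_per_column = mask.sum()
print(missing_per_column)

# nunique() ignores NaN by default, so total_bedrooms has 3 distinct values.
print(toy.nunique())
```
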

## Filtering

In [None]:
df.loc[:, 'total_rooms']  # all rows, only the total_rooms column

In [None]:
df.loc[df.loc[:, 'total_rooms'] > 25000, :]  # rows with more than 25,000 total rooms, all columns
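
The filter above works in two steps: the comparison builds a boolean Series (a "mask") with one True/False per row, and `.loc` keeps only the rows where the mask is True. The sketch below uses a hypothetical toy frame with made-up values to show both steps separately, plus how to combine two conditions with `&`.

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
toy = pd.DataFrame({
    "total_rooms": [1000, 30000, 1500, 26000],
    "median_house_value": [300000, 250000, 180000, 500000],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"],
})

# Step 1: the comparison returns a boolean Series, one value per row.
mask = toy.loc[:, "total_rooms"] > 25000

# Step 2: the mask selects rows; ":" keeps every column.
big = toy.loc[mask, :]

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & binds more tightly than > and ==.
big_inland = toy.loc[(toy["total_rooms"] > 25000) & (toy["ocean_proximity"] == "INLAND"), :]

print(big)         # rows 1 and 3
print(big_inland)  # row 1 only
```
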

## Sorting

In [None]:
df.sort_values("median_house_value")

In [None]:
df.sort_values("median_house_value", ascending=False)

In [None]:
df  # unchanged: sort_values returned a new DataFrame. Reassign the result or use inplace=True to keep the sort

In [None]:
df.sort_values("median_house_value", ascending=False, inplace=True)  # sorts df itself and returns None
df
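
Reassigning the result, as in `df = df.sort_values(...)`, is an alternative to `inplace=True` with the same lasting effect. The sketch below, on a hypothetical toy frame with made-up values, confirms that `sort_values` leaves the original untouched and that each row keeps its original index label after sorting.

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
toy = pd.DataFrame({
    "median_house_value": [300000, 250000, 180000, 500000],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"],
})

# sort_values returns a NEW DataFrame; toy keeps its original order.
sorted_copy = toy.sort_values("median_house_value", ascending=False)
print(toy["median_house_value"].tolist())          # [300000, 250000, 180000, 500000]
print(sorted_copy["median_house_value"].tolist())  # [500000, 300000, 250000, 180000]

# Rows carry their original index labels with them.
print(sorted_copy.index.tolist())                  # [3, 0, 1, 2]

# Reassigning keeps the sort, just like inplace=True would.
toy = toy.sort_values("median_house_value", ascending=False)
```
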

## Aggregate Functions 

In [None]:
df.loc[:,"housing_median_age"].mean()

In [None]:
df.loc[:,"housing_median_age"].max()
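
Each aggregate function reduces a whole column to a single number. When you want several of them at once, `Series.agg` accepts a list of function names and returns one result per name. A minimal sketch on a hypothetical column of made-up ages:

```python
import pandas as pd

# Hypothetical made-up ages standing in for housing_median_age.
toy = pd.DataFrame({"housing_median_age": [10, 20, 30, 52]})
age = toy.loc[:, "housing_median_age"]

print(age.mean())  # 28.0
print(age.max())   # 52

# agg() applies several aggregate functions in one call and returns a
# Series indexed by function name.
summary = age.agg(["min", "mean", "max"])
print(summary)
```
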

## Grouping with Aggregate Functions

In [None]:
df.groupby(['ocean_proximity']).mean()  # average of every other column within each ocean_proximity category
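
`groupby` splits the rows by the values in `ocean_proximity`, `mean()` averages the remaining columns within each group, and the group labels become the index of the result. The cell above works on the housing data because `ocean_proximity` is its only text column and it is the grouping key. In pandas 2.0 and later, `mean()` raises a `TypeError` if any other non-numeric column is present; passing `numeric_only=True` skips such columns instead. A sketch on a hypothetical toy frame with made-up values:

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
toy = pd.DataFrame({
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR BAY"],
    "total_rooms": [1000, 2000, 3000, 6000],
    "median_house_value": [100000, 400000, 200000, 500000],
})

# numeric_only=True would skip any non-numeric columns rather than raise.
group_means = toy.groupby(["ocean_proximity"]).mean(numeric_only=True)
print(group_means)
# INLAND:   total_rooms 2000.0, median_house_value 150000.0
# NEAR BAY: total_rooms 4000.0, median_house_value 450000.0
```
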

## Method Chaining
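- Avoids cluttering your code with intermediate variables (the intermediate DataFrames are still created, but nothing holds on to them once the chain finishes, so their memory can be released sooner)
- But long chains can be harder to read and debug, because the intermediate results are never visible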

Continuing the previous example, we can chain `.loc` onto the grouped result to get the average of just the "total_rooms" column:

In [None]:
df.groupby(['ocean_proximity']).mean().loc[:,'total_rooms']
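
The chain above averages every column and then throws all but one away. Selecting the column right after `groupby` gives the same result while only averaging `total_rooms`, which avoids wasted work on wide tables. A sketch on a hypothetical toy frame with made-up values that checks the two forms agree:

```python
import pandas as pd

# Hypothetical toy frame with made-up values.
toy = pd.DataFrame({
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR BAY"],
    "total_rooms": [1000, 2000, 3000, 6000],
    "median_house_value": [100000, 400000, 200000, 500000],
})

# As in the cell above: average every column, then keep one.
via_loc = toy.groupby(["ocean_proximity"]).mean().loc[:, "total_rooms"]

# Select the column first, then average only that column.
via_selection = toy.groupby("ocean_proximity")["total_rooms"].mean()

print(via_selection)  # INLAND 2000.0, NEAR BAY 4000.0
```
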

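### Another example of method chaining

The first cell below stores each intermediate result in its own variable; the second cell produces the same result with a single chain.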

In [None]:
df2 = df.loc[:, ['total_rooms', 'households']]  # keep two columns
df3 = df2.fillna(0)                              # replace missing values with 0
df4 = df3.head()                                 # keep the first 5 rows
df4

In [None]:
df.loc[:, ['total_rooms', 'households']].fillna(0).head()
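
One way to ease the readability and debugging cost of long chains is to wrap the chain in parentheses and put one method per line. Each step can then carry its own comment and can be commented out temporarily to inspect an intermediate result. A sketch on a hypothetical toy frame with made-up values (and deliberately missing entries) that checks the chained and step-by-step versions match:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with made-up values and two missing entries.
toy = pd.DataFrame({
    "total_rooms": [1000, np.nan, 1500, 2000, 800, 1200],
    "households": [180, 300, np.nan, 380, 150, 210],
    "median_income": [3.5, 2.1, 4.0, 5.2, 1.9, 2.8],
})

# Step-by-step version: every intermediate result gets a name.
two_cols = toy.loc[:, ["total_rooms", "households"]]
filled = two_cols.fillna(0)
first_rows = filled.head()

# Chained version: parentheses let the chain span several lines.
chained = (
    toy.loc[:, ["total_rooms", "households"]]
    .fillna(0)  # replace NaN with 0
    .head()     # first 5 rows
)
print(chained)
```
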