# Inspection, Selection & Filtering

After loading the data, we inspect it, then select, filter, and sort the parts of the dataset we need.

In [1]:
import pandas as pd

sales = pd.read_csv("../data/raw/sales.csv")
sales.head()

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
| 1 | 1002 | C102 | Phone | Electronics | 20000 | 2 | Mumbai | 2024-01-06 |
| 2 | 1003 | C103 | Shoes | Fashion | 3000 | 1 | Pune | 2024-01-07 |
| 3 | 1004 | C101 | Headphones | Electronics | 2000 | 3 | Delhi | 2024-01-07 |
| 4 | 1005 | C104 | Tshirt | Fashion | 800 | 2 | Bangalore | 2024-01-08 |
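`head()` is the only inspection call used here. A minimal sketch of a few other common inspection calls follows. It assumes the ten rows visible in this notebook's outputs are the whole file and inlines them as a hypothetical `CSV_TEXT` string, so it runs without `../data/raw/sales.csv`:

```python
import io

import pandas as pd

# Hypothetical inline copy of the rows shown in this notebook's outputs
CSV_TEXT = """order_id,customer_id,product,category,price,quantity,city,date
1001,C101,Laptop,Electronics,55000,1,Delhi,2024-01-05
1002,C102,Phone,Electronics,20000,2,Mumbai,2024-01-06
1003,C103,Shoes,Fashion,3000,1,Pune,2024-01-07
1004,C101,Headphones,Electronics,2000,3,Delhi,2024-01-07
1005,C104,Tshirt,Fashion,800,2,Bangalore,2024-01-08
1006,C105,Watch,Accessories,2500,1,Chennai,2024-01-09
1007,C102,Laptop,Electronics,60000,1,Mumbai,2024-01-10
1008,C106,Backpack,Accessories,1500,2,Delhi,2024-01-10
1009,C103,Phone,Electronics,18000,1,Pune,2024-01-11
1010,C104,Shoes,Fashion,3500,1,Bangalore,2024-01-11
"""

sales = pd.read_csv(io.StringIO(CSV_TEXT))

n_rows, n_cols = sales.shape            # (number of rows, number of columns)
column_names = list(sales.columns)      # column labels in file order
# Columns that read_csv parsed as numbers (the date column stays as text here)
numeric_cols = list(sales.select_dtypes("number").columns)

print(n_rows, n_cols)
print(sales.dtypes)     # one dtype per column
sales.info()            # non-null counts, dtypes and memory usage
print(sales.tail(3))    # the last three rows
```

Note that `date` is read as text unless it is parsed explicitly, for example with `parse_dates=["date"]` in `read_csv`.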


## Selecting Columns

In [2]:
sales["product"]

0        Laptop
1         Phone
2         Shoes
3    Headphones
4        Tshirt
5         Watch
6        Laptop
7      Backpack
8         Phone
9         Shoes
Name: product, dtype: str

In [3]:
sales[["product", "price"]]

|   | product | price |
|---|---|---|
| 0 | Laptop | 55000 |
| 1 | Phone | 20000 |
| 2 | Shoes | 3000 |
| 3 | Headphones | 2000 |
| 4 | Tshirt | 800 |
| 5 | Watch | 2500 |
| 6 | Laptop | 60000 |
| 7 | Backpack | 1500 |
| 8 | Phone | 18000 |
| 9 | Shoes | 3500 |
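The two cells above differ in more than the number of columns: a single label returns a one-dimensional `Series`, while a list of labels returns a `DataFrame`. A minimal sketch that checks this, again using a hypothetical inline `CSV_TEXT` copy of the notebook's rows:

```python
import io

import pandas as pd

# Hypothetical inline copy of the rows shown in this notebook's outputs
CSV_TEXT = """order_id,customer_id,product,category,price,quantity,city,date
1001,C101,Laptop,Electronics,55000,1,Delhi,2024-01-05
1002,C102,Phone,Electronics,20000,2,Mumbai,2024-01-06
1003,C103,Shoes,Fashion,3000,1,Pune,2024-01-07
1004,C101,Headphones,Electronics,2000,3,Delhi,2024-01-07
1005,C104,Tshirt,Fashion,800,2,Bangalore,2024-01-08
1006,C105,Watch,Accessories,2500,1,Chennai,2024-01-09
1007,C102,Laptop,Electronics,60000,1,Mumbai,2024-01-10
1008,C106,Backpack,Accessories,1500,2,Delhi,2024-01-10
1009,C103,Phone,Electronics,18000,1,Pune,2024-01-11
1010,C104,Shoes,Fashion,3500,1,Bangalore,2024-01-11
"""
sales = pd.read_csv(io.StringIO(CSV_TEXT))

# One label in single brackets -> Series named after the column
product_col = sales["product"]
# A list of labels -> DataFrame, even when the list holds only one column
product_frame = sales[["product"]]
subset = sales[["product", "price"]]

print(type(product_col).__name__)    # Series
print(type(product_frame).__name__)  # DataFrame
print(subset.shape)                  # (10, 2)
```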


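## Selecting Rows

`loc` selects rows by index label, while `iloc` selects them by integer position. With the default index (0, 1, 2, …) both return the same row for the same number.

In [4]:
# Only the last expression in a cell is displayed, so the output below is the row labelled 2
sales.loc[0]
sales.loc[2]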

order_id             1003
customer_id          C103
product             Shoes
category          Fashion
price                3000
quantity                1
city                 Pune
date           2024-01-07
Name: 2, dtype: object

In [5]:
# Again only the last expression is displayed: the row at position 1
sales.iloc[0]
sales.iloc[1]

order_id              1002
customer_id           C102
product              Phone
category       Electronics
price                20000
quantity                 2
city                Mumbai
date            2024-01-06
Name: 1, dtype: object
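Labels and positions stop lining up as soon as rows are reordered or filtered, which is where `loc` and `iloc` start to give different answers. A minimal sketch, assuming the same ten rows (inlined as a hypothetical `CSV_TEXT`) and using an illustrative `by_price` variable:

```python
import io

import pandas as pd

# Hypothetical inline copy of the rows shown in this notebook's outputs
CSV_TEXT = """order_id,customer_id,product,category,price,quantity,city,date
1001,C101,Laptop,Electronics,55000,1,Delhi,2024-01-05
1002,C102,Phone,Electronics,20000,2,Mumbai,2024-01-06
1003,C103,Shoes,Fashion,3000,1,Pune,2024-01-07
1004,C101,Headphones,Electronics,2000,3,Delhi,2024-01-07
1005,C104,Tshirt,Fashion,800,2,Bangalore,2024-01-08
1006,C105,Watch,Accessories,2500,1,Chennai,2024-01-09
1007,C102,Laptop,Electronics,60000,1,Mumbai,2024-01-10
1008,C106,Backpack,Accessories,1500,2,Delhi,2024-01-10
1009,C103,Phone,Electronics,18000,1,Pune,2024-01-11
1010,C104,Shoes,Fashion,3500,1,Bangalore,2024-01-11
"""
sales = pd.read_csv(io.StringIO(CSV_TEXT))

# Sorting keeps each row's original index label, so labels no longer match positions
by_price = sales.sort_values("price")

label_zero = by_price.loc[0, "product"]        # row whose LABEL is 0 -> "Laptop"
position_zero = by_price.iloc[0]["product"]    # first row by POSITION -> "Tshirt"

# Both accessors also take a column selector: loc uses names, iloc uses positions
cell_by_label = sales.loc[2, "product"]        # label 2, column "product" -> "Shoes"
cell_by_position = sales.iloc[1, 2]            # row position 1, column position 2 -> "Phone"

# Slicing: loc includes the end label, iloc excludes the end position
loc_slice = sales.loc[0:2]    # labels 0, 1, 2 -> 3 rows
iloc_slice = sales.iloc[0:2]  # positions 0, 1 -> 2 rows
```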

## Conditional Filtering

In [6]:
sales[sales["price"] > 5000]

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
| 1 | 1002 | C102 | Phone | Electronics | 20000 | 2 | Mumbai | 2024-01-06 |
| 6 | 1007 | C102 | Laptop | Electronics | 60000 | 1 | Mumbai | 2024-01-10 |
| 8 | 1009 | C103 | Phone | Electronics | 18000 | 1 | Pune | 2024-01-11 |


In [7]:
sales[(sales["city"] == "Delhi") & (sales["price"] > 2000)]

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
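Each comparison inside the brackets produces a boolean Series (a mask), and `sales[mask]` keeps the rows where it is `True`. The parentheses in `In [7]` are required because Python's `&` and `|` bind more tightly than `==` and `>`. A minimal sketch of the other mask operators, `isin()`, and `loc` with a mask plus a column list, assuming the same inlined rows (`CSV_TEXT` and the result names are illustrative):

```python
import io

import pandas as pd

# Hypothetical inline copy of the rows shown in this notebook's outputs
CSV_TEXT = """order_id,customer_id,product,category,price,quantity,city,date
1001,C101,Laptop,Electronics,55000,1,Delhi,2024-01-05
1002,C102,Phone,Electronics,20000,2,Mumbai,2024-01-06
1003,C103,Shoes,Fashion,3000,1,Pune,2024-01-07
1004,C101,Headphones,Electronics,2000,3,Delhi,2024-01-07
1005,C104,Tshirt,Fashion,800,2,Bangalore,2024-01-08
1006,C105,Watch,Accessories,2500,1,Chennai,2024-01-09
1007,C102,Laptop,Electronics,60000,1,Mumbai,2024-01-10
1008,C106,Backpack,Accessories,1500,2,Delhi,2024-01-10
1009,C103,Phone,Electronics,18000,1,Pune,2024-01-11
1010,C104,Shoes,Fashion,3500,1,Bangalore,2024-01-11
"""
sales = pd.read_csv(io.StringIO(CSV_TEXT))

# A comparison on a column returns a boolean Series aligned with the rows
expensive = sales["price"] > 5000

# | means "or", ~ means "not"; each condition needs its own parentheses
mumbai_or_cheap = sales[(sales["city"] == "Mumbai") | (sales["price"] < 1000)]
not_expensive = sales[~expensive]

# isin() tests membership in a list of values
delhi_or_pune = sales[sales["city"].isin(["Delhi", "Pune"])]

# loc takes a row mask and a list of columns in one step
electronics = sales.loc[sales["category"] == "Electronics", ["order_id", "price"]]
```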


## Using the `query()` Method

In [8]:
sales.query("price > 5000")

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
| 1 | 1002 | C102 | Phone | Electronics | 20000 | 2 | Mumbai | 2024-01-06 |
| 6 | 1007 | C102 | Laptop | Electronics | 60000 | 1 | Mumbai | 2024-01-10 |
| 8 | 1009 | C103 | Phone | Electronics | 18000 | 1 | Pune | 2024-01-11 |


In [9]:
sales.query("city == 'Delhi'")

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
| 3 | 1004 | C101 | Headphones | Electronics | 2000 | 3 | Delhi | 2024-01-07 |
| 7 | 1008 | C106 | Backpack | Accessories | 1500 | 2 | Delhi | 2024-01-10 |
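`query()` takes the same conditions as a string, which can spell out `and`/`or`, refer to Python variables with `@`, and test membership with `in`. A minimal sketch, assuming the same inlined rows and an illustrative `min_price` threshold, that also checks the result matches the equivalent boolean mask:

```python
import io

import pandas as pd

# Hypothetical inline copy of the rows shown in this notebook's outputs
CSV_TEXT = """order_id,customer_id,product,category,price,quantity,city,date
1001,C101,Laptop,Electronics,55000,1,Delhi,2024-01-05
1002,C102,Phone,Electronics,20000,2,Mumbai,2024-01-06
1003,C103,Shoes,Fashion,3000,1,Pune,2024-01-07
1004,C101,Headphones,Electronics,2000,3,Delhi,2024-01-07
1005,C104,Tshirt,Fashion,800,2,Bangalore,2024-01-08
1006,C105,Watch,Accessories,2500,1,Chennai,2024-01-09
1007,C102,Laptop,Electronics,60000,1,Mumbai,2024-01-10
1008,C106,Backpack,Accessories,1500,2,Delhi,2024-01-10
1009,C103,Phone,Electronics,18000,1,Pune,2024-01-11
1010,C104,Shoes,Fashion,3500,1,Bangalore,2024-01-11
"""
sales = pd.read_csv(io.StringIO(CSV_TEXT))

min_price = 5000  # illustrative threshold held in a Python variable

# @min_price pulls the variable into the expression; "and" replaces &
mumbai_expensive = sales.query("price > @min_price and city == 'Mumbai'")

# The same filter written as a boolean mask
mask_version = sales[(sales["price"] > min_price) & (sales["city"] == "Mumbai")]
same_result = mumbai_expensive.equals(mask_version)

# "in" with a list behaves like isin()
in_cities = sales.query("city in ['Chennai', 'Bangalore']")
```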


## Sorting Data

In [10]:
sales.sort_values("price")

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 4 | 1005 | C104 | Tshirt | Fashion | 800 | 2 | Bangalore | 2024-01-08 |
| 7 | 1008 | C106 | Backpack | Accessories | 1500 | 2 | Delhi | 2024-01-10 |
| 3 | 1004 | C101 | Headphones | Electronics | 2000 | 3 | Delhi | 2024-01-07 |
| 5 | 1006 | C105 | Watch | Accessories | 2500 | 1 | Chennai | 2024-01-09 |
| 2 | 1003 | C103 | Shoes | Fashion | 3000 | 1 | Pune | 2024-01-07 |
| 9 | 1010 | C104 | Shoes | Fashion | 3500 | 1 | Bangalore | 2024-01-11 |
| 8 | 1009 | C103 | Phone | Electronics | 18000 | 1 | Pune | 2024-01-11 |
| 1 | 1002 | C102 | Phone | Electronics | 20000 | 2 | Mumbai | 2024-01-06 |
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
| 6 | 1007 | C102 | Laptop | Electronics | 60000 | 1 | Mumbai | 2024-01-10 |


In [11]:
sales.sort_values("price", ascending=False)

|   | order_id | customer_id | product | category | price | quantity | city | date |
|---|---|---|---|---|---|---|---|---|
| 6 | 1007 | C102 | Laptop | Electronics | 60000 | 1 | Mumbai | 2024-01-10 |
| 0 | 1001 | C101 | Laptop | Electronics | 55000 | 1 | Delhi | 2024-01-05 |
| 1 | 1002 | C102 | Phone | Electronics | 20000 | 2 | Mumbai | 2024-01-06 |
| 8 | 1009 | C103 | Phone | Electronics | 18000 | 1 | Pune | 2024-01-11 |
| 9 | 1010 | C104 | Shoes | Fashion | 3500 | 1 | Bangalore | 2024-01-11 |
| 2 | 1003 | C103 | Shoes | Fashion | 3000 | 1 | Pune | 2024-01-07 |
| 5 | 1006 | C105 | Watch | Accessories | 2500 | 1 | Chennai | 2024-01-09 |
| 3 | 1004 | C101 | Headphones | Electronics | 2000 | 3 | Delhi | 2024-01-07 |
| 7 | 1008 | C106 | Backpack | Accessories | 1500 | 2 | Delhi | 2024-01-10 |
| 4 | 1005 | C104 | Tshirt | Fashion | 800 | 2 | Bangalore | 2024-01-08 |
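`sort_values` returns a new DataFrame and leaves `sales` untouched, keeping the original index labels on each row. A minimal sketch of sorting on several columns with a per-column direction, renumbering the result with `reset_index(drop=True)`, and using `nlargest` as a shortcut for "sort descending, take the first n", assuming the same inlined rows (result names are illustrative):

```python
import io

import pandas as pd

# Hypothetical inline copy of the rows shown in this notebook's outputs
CSV_TEXT = """order_id,customer_id,product,category,price,quantity,city,date
1001,C101,Laptop,Electronics,55000,1,Delhi,2024-01-05
1002,C102,Phone,Electronics,20000,2,Mumbai,2024-01-06
1003,C103,Shoes,Fashion,3000,1,Pune,2024-01-07
1004,C101,Headphones,Electronics,2000,3,Delhi,2024-01-07
1005,C104,Tshirt,Fashion,800,2,Bangalore,2024-01-08
1006,C105,Watch,Accessories,2500,1,Chennai,2024-01-09
1007,C102,Laptop,Electronics,60000,1,Mumbai,2024-01-10
1008,C106,Backpack,Accessories,1500,2,Delhi,2024-01-10
1009,C103,Phone,Electronics,18000,1,Pune,2024-01-11
1010,C104,Shoes,Fashion,3500,1,Bangalore,2024-01-11
"""
sales = pd.read_csv(io.StringIO(CSV_TEXT))

# Category A to Z, then price high to low within each category
by_cat_price = sales.sort_values(["category", "price"], ascending=[True, False])
# Replace the shuffled labels with a fresh 0..n-1 index
by_cat_price = by_cat_price.reset_index(drop=True)

# The three most expensive orders
top3 = sales.nlargest(3, "price")
```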


## Conclusion

Selecting columns and rows, filtering with boolean conditions or `query()`, and sorting are core steps before cleaning and analysis.