# Essential Pandas Operations

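This lecture covers pandas operations that don't fit neatly into a single category but come up constantly in data manipulation: inspecting unique values, selecting rows, applying functions, working with columns, sorting, and building pivot tables.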



**Example DataFrame**

In [5]:
import pandas as pd
inventory = pd.DataFrame({'item_id':[101,102,103,104],
                         'warehouse_qty':[45,78,33,91],
                         'category':['electronics','furniture','electronics','office']})
inventory

|   | item_id | warehouse_qty | category    |
|---|---------|---------------|-------------|
| 0 | 101     | 45            | electronics |
| 1 | 102     | 78            | furniture   |
| 2 | 103     | 33            | electronics |
| 3 | 104     | 91            | office      |


## Info on Unique Values

| Operation               | Code Example                              | Description                                      |
|-------------------------|-------------------------------------------|--------------------------------------------------|
| Get unique values       | `inventory['category'].unique()`          | Returns array of unique categories               |
| Count unique values     | `inventory['category'].nunique()`         | Returns count of distinct categories             |
| Value counts            | `inventory['warehouse_qty'].value_counts()` | Returns frequency of each quantity              |

In [17]:
inventory['category'].unique()

array(['electronics', 'furniture', 'office'], dtype=object)

In [19]:
inventory['category'].nunique()

3

In [21]:
inventory['warehouse_qty'].value_counts()

warehouse_qty
45    1
78    1
33    1
91    1
Name: count, dtype: int64
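Every quantity in the example is distinct, so each count above is 1. As a small sketch of a case where the counts differ, the same call can be made on the `category` column; the `normalize=True` variant returns proportions instead of raw counts. The variable names `category_counts` and `category_share` are illustrative only.

```python
import pandas as pd

inventory = pd.DataFrame({'item_id': [101, 102, 103, 104],
                          'warehouse_qty': [45, 78, 33, 91],
                          'category': ['electronics', 'furniture', 'electronics', 'office']})

# Count how many items fall into each category (most frequent first)
category_counts = inventory['category'].value_counts()

# normalize=True returns each category's share of all rows instead of a raw count
category_share = inventory['category'].value_counts(normalize=True)

print(category_counts)
print(category_share)
```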

## Selecting Data

In [28]:
# Select items with multiple conditions
high_stock = inventory[(inventory['warehouse_qty']>40) & 
                      (inventory['category']=='electronics')]

In [30]:
high_stock

|   | item_id | warehouse_qty | category    |
|---|---------|---------------|-------------|
| 0 | 101     | 45            | electronics |
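The `&` example above combines conditions with AND. A minimal sketch of the other common boolean-mask forms follows: `|` for OR, `~` for NOT, and `isin` for membership in a list. The result names (`low_or_office`, `not_electronics`, `furniture_or_office`) are made up for illustration.

```python
import pandas as pd

inventory = pd.DataFrame({'item_id': [101, 102, 103, 104],
                          'warehouse_qty': [45, 78, 33, 91],
                          'category': ['electronics', 'furniture', 'electronics', 'office']})

# | means OR; each condition still needs its own parentheses
low_or_office = inventory[(inventory['warehouse_qty'] < 40) |
                          (inventory['category'] == 'office')]

# ~ negates a boolean mask
not_electronics = inventory[~(inventory['category'] == 'electronics')]

# isin() tests membership in a list of allowed values
furniture_or_office = inventory[inventory['category'].isin(['furniture', 'office'])]

print(low_or_office)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operator forms are used here.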


## Applying Functions

In [34]:
def add_tax(price):
    return price * 1.08  # Adding 8% tax

# Create price column
inventory['price'] = [199.99, 149.50, 299.00, 39.99]

# Apply custom function
inventory['price_with_tax'] = inventory['price'].apply(add_tax)

# Apply built-in function
inventory['category_length'] = inventory['category'].apply(len)

In [36]:
inventory

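|   | item_id | warehouse_qty | category    | price  | price_with_tax | category_length |
|---|---------|---------------|-------------|--------|----------------|-----------------|
| 0 | 101     | 45            | electronics | 199.99 | 215.9892       | 11              |
| 1 | 102     | 78            | furniture   | 149.5  | 161.46         | 9               |
| 2 | 103     | 33            | electronics | 299.0  | 322.92         | 11              |
| 3 | 104     | 91            | office      | 39.99  | 43.1892        | 6               |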
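For simple arithmetic and string lengths, `apply` is not strictly required. The sketch below shows one possible vectorized alternative that should produce the same two columns; `TAX_RATE` is a hypothetical constant standing in for the 8% used in `add_tax`.

```python
import pandas as pd

inventory = pd.DataFrame({'category': ['electronics', 'furniture', 'electronics', 'office'],
                          'price': [199.99, 149.50, 299.00, 39.99]})

TAX_RATE = 0.08  # assumption: same 8% rate as add_tax above

# Arithmetic on a Series is applied to every element at once
inventory['price_with_tax'] = inventory['price'] * (1 + TAX_RATE)

# The .str accessor exposes string methods for the whole column
inventory['category_length'] = inventory['category'].str.len()

print(inventory)
```

Vectorized operations are usually faster than `apply` on large columns, while `apply` remains the more flexible choice for logic that has no built-in equivalent.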


## Column Operations


**Permanently Removing a Column**

In [39]:
del inventory['category_length']


In [41]:
inventory

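|   | item_id | warehouse_qty | category    | price  | price_with_tax |
|---|---------|---------------|-------------|--------|----------------|
| 0 | 101     | 45            | electronics | 199.99 | 215.9892       |
| 1 | 102     | 78            | furniture   | 149.5  | 161.46         |
| 2 | 103     | 33            | electronics | 299.0  | 322.92         |
| 3 | 104     | 91            | office      | 39.99  | 43.1892        |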
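`del` modifies the DataFrame in place. When the original should stay intact, `drop` is a commonly used alternative that returns a new DataFrame. A minimal sketch, with `trimmed` as an illustrative name:

```python
import pandas as pd

inventory = pd.DataFrame({'item_id': [101, 102, 103, 104],
                          'warehouse_qty': [45, 78, 33, 91],
                          'price': [199.99, 149.50, 299.00, 39.99]})

# drop() returns a copy without the listed columns; inventory itself is unchanged
trimmed = inventory.drop(columns=['price'])

print(trimmed.columns.tolist())    # columns of the new DataFrame
print(inventory.columns.tolist())  # original still has 'price'
```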


**Get column and index names:**

In [43]:
inventory.columns


Index(['item_id', 'warehouse_qty', 'category', 'price', 'price_with_tax'], dtype='object')

In [45]:
inventory.index


RangeIndex(start=0, stop=4, step=1)

## Sorting Data


In [48]:
# Sort by multiple columns
inventory.sort_values(by=['category','warehouse_qty'], ascending=[True,False])

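|   | item_id | warehouse_qty | category    | price  | price_with_tax |
|---|---------|---------------|-------------|--------|----------------|
| 0 | 101     | 45            | electronics | 199.99 | 215.9892       |
| 2 | 103     | 33            | electronics | 299.0  | 322.92         |
| 1 | 102     | 78            | furniture   | 149.5  | 161.46         |
| 3 | 104     | 91            | office      | 39.99  | 43.1892        |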
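Note that the index labels in the output above (0, 2, 1, 3) travel with their rows, and that `sort_values` returns a new DataFrame rather than reordering `inventory` itself. The sketch below shows one way to keep the sorted result and renumber its index; `sorted_inventory` is an illustrative name.

```python
import pandas as pd

inventory = pd.DataFrame({'item_id': [101, 102, 103, 104],
                          'warehouse_qty': [45, 78, 33, 91],
                          'category': ['electronics', 'furniture', 'electronics', 'office']})

# Sort by quantity, largest first, and assign the result to a new variable
sorted_inventory = inventory.sort_values(by='warehouse_qty', ascending=False)

# reset_index(drop=True) replaces the old labels with a fresh 0..n-1 index
sorted_inventory = sorted_inventory.reset_index(drop=True)

print(sorted_inventory)
```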


## Pivot Tables


In [51]:
sales_data = pd.DataFrame({
    'region':['East','East','West','West','North','North'],
    'product':['A','B','A','B','A','B'],
    'revenue':[2500,1800,3200,2100,2900,1500],
    'units_sold':[125,90,160,105,145,75]
})

# Create pivot table
sales_data.pivot_table(values='revenue', 
                      index='region', 
                      columns='product', 
                      aggfunc='sum',
                      margins=True)

| region / product | A    | B    | All   |
|------------------|------|------|-------|
| East             | 2500 | 1800 | 4300  |
| North            | 2900 | 1500 | 4400  |
| West             | 3200 | 2100 | 5300  |
| All              | 8600 | 5400 | 14000 |
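To make it clearer what the pivot table computes, the sketch below rebuilds the same region-by-product grid (without the `All` margins) using `groupby` followed by `unstack`. This is one equivalent formulation offered for illustration, not a claim about how `pivot_table` is implemented; `grouped` and `pivoted` are illustrative names.

```python
import pandas as pd

sales_data = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West', 'North', 'North'],
    'product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'revenue': [2500, 1800, 3200, 2100, 2900, 1500],
})

# Sum revenue per (region, product) pair, then move 'product' into the columns
grouped = (sales_data
           .groupby(['region', 'product'])['revenue']
           .sum()
           .unstack('product'))

# The same grid via pivot_table, leaving out margins=True
pivoted = sales_data.pivot_table(values='revenue', index='region',
                                 columns='product', aggfunc='sum')

print(grouped)
```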


## Quick Reference Table

| Operation        | Example Use Case            | Syntax Example                                  |
|------------------|-----------------------------|------------------------------------------------|
| Unique values    | Find all product categories | `df['category'].unique()`                     |
| Value counts     | Count orders by status      | `df['status'].value_counts()`                 |
| Apply function   | Clean text data             | `df['text'].apply(str.lower)`                 |
| Sort values      | Sort products by price      | `df.sort_values('price', ascending=False)`     |
| Handle missing   | Fill empty dates            | `df['date'].fillna(pd.Timestamp('today'))`    |
| Pivot table      | Analyze sales by region     | `df.pivot_table(values='sales', index='region')` |
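The "Handle missing" row is the only entry in the table without a worked example in this lecture. A minimal sketch follows, using a hypothetical `stock` DataFrame with one missing quantity; filling with 0 is an assumption made for illustration, and the right fill value depends on the data.

```python
import pandas as pd

# Hypothetical data: the quantity for item 102 is missing (NaN)
stock = pd.DataFrame({'item_id': [101, 102, 103],
                      'warehouse_qty': [45, float('nan'), 33]})

# isna() flags the missing entries
missing_mask = stock['warehouse_qty'].isna()

# fillna() returns a new Series with the gaps replaced; assign it back to keep it
stock['warehouse_qty'] = stock['warehouse_qty'].fillna(0)

print(missing_mask.tolist())
print(stock)
```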