# Pandas Cheat Sheet

In [2]:
import pandas as pd

## Creating DataFrames

### From Scratch

In [5]:
# Creating sample data for demonstration
employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

performance = pd.DataFrame({
    'id': [3, 4, 5, 6],
    'name': ['Charlie', 'David', 'Eve', 'Frank'],
    'performance_score': [90, 85, 88, 75]
})
# Display dataframes and shape
print(employees.shape)
display(employees)
print(performance.shape)
display(performance)

(5, 5)


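|   | id | name    | age | salary | department |
|---|----|---------|-----|--------|------------|
| 0 | 1  | Alice   | 25  | 50000  | HR         |
| 1 | 2  | Bob     | 30  | 60000  | IT         |
| 2 | 3  | Charlie | 35  | 70000  | Finance    |
| 3 | 4  | David   | 40  | 80000  | IT         |
| 4 | 5  | Eve     | 22  | 45000  | HR         |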


(4, 3)


|   | id | name    | performance_score |
|---|----|---------|-------------------|
| 0 | 3  | Charlie | 90                |
| 1 | 4  | David   | 85                |
| 2 | 5  | Eve     | 88                |
| 3 | 6  | Frank   | 75                |


### From a CSV/JSON
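
A minimal sketch of loading data with `pd.read_csv` and `pd.read_json`. The in-memory strings `csv_text` and `json_text` are made up for illustration and stand in for real files; in practice you would pass a file path instead of a `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical data standing in for files on disk (not from the original notebook)
csv_text = "id,name,age\n1,Alice,25\n2,Bob,30\n"
json_text = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'

# read_csv accepts a file path or any file-like object
from_csv = pd.read_csv(io.StringIO(csv_text))

# orient="records" tells pandas the JSON is a list of row objects
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(from_csv.shape)
print(from_json.shape)
```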

## Querying DataFrames

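### iloc: integer-based indexing
`df.iloc[row_index, column_index]`
- row_index: integer position(s) of the row(s), counting from 0
- column_index: integer position(s) of the column(s), counting from 0
- if only one value is passed, it is treated as a row position
- slices exclude the stop position, so `1:3` returns positions 1 and 2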

| Usage                                      | Example                          |
|--------------------------------------------|----------------------------------|
| Select a single row                        | `employees.iloc[2]`              |
| Select multiple rows                       | `employees.iloc[[0, 2]]`         |
| Select a row range                         | `employees.iloc[1:3]`            |
| Select a single column                     | `employees.iloc[:, 1]`           |
| Select multiple columns                    | `employees.iloc[:, [0, 2]]`      |
| Select a row & column                      | `employees.iloc[0, 1]`           |
| Select row & column ranges                 | `employees.iloc[1:3, 0:2]`       |
| Select the last row                        | `employees.iloc[-1]`             |
| Select last 3 rows & last 2 columns        | `employees.iloc[-3:, -2:]`       |
| Assign a value                             | `employees.iloc[0, 1] = "Updated"` |


In [32]:
employees.iloc[[0]]  # First row, all columns (columns default to all); the inner list makes the result a DataFrame instead of a Series

|   | id | name  | age | salary | department |
|---|----|-------|-----|--------|------------|
| 0 | 1  | Alice | 25  | 50000  | HR         |


In [36]:
employees.iloc[:, [3]]  # All rows of the 4th column; : selects every row, and the inner list keeps the result a DataFrame

|   | salary |
|---|--------|
| 0 | 50000  |
| 1 | 60000  |
| 2 | 70000  |
| 3 | 80000  |
| 4 | 45000  |


In [38]:
employees.iloc[3, 3]  # Value at the 4th row and 4th column; without the inner lists the result is a single value

80000

In [39]:
employees.iloc[:, [0, 2]]  # All rows of the 1st and 3rd columns (positions 0 and 2)

|   | id | age |
|---|----|-----|
| 0 | 1  | 25  |
| 1 | 2  | 30  |
| 2 | 3  | 35  |
| 3 | 4  | 40  |
| 4 | 5  | 22  |


### loc: label-based indexing
`df.loc[row_label, column_label]`
- row_label: row index label(s), or a boolean condition
- column_label: column name(s)
- if only one value is passed, it is treated as a row label
- label slices include both endpoints, so `1:3` returns labels 1, 2, and 3

| Usage                                      | Example                          |
|--------------------------------------------|----------------------------------|
| Select a single row                        | `employees.loc[2]`               |
| Select multiple rows                       | `employees.loc[[0, 2]]`          |
| Select a row range                         | `employees.loc[1:3]`             |
| Select a single column                     | `employees.loc[:, 'ColumnName']` |
| Select multiple columns                    | `employees.loc[:, ['Col1', 'Col2']]` |
| Select a row & column                      | `employees.loc[0, 'ColumnName']` |
| Select row & column ranges                 | `employees.loc[1:3, 'Col1':'Col2']` |
| Select the last row                        | `employees.loc[employees.index[-1]]` |
| Select last 3 rows & last 2 columns        | `employees.loc[employees.index[-3:], employees.columns[-2:]]` |
| Assign a value                             | `employees.loc[0, 'ColumnName'] = "Updated"` |
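
A short sketch of the `loc` patterns from the table, run against the sample `employees` data (rebuilt here so the snippet runs on its own). The variable names `rows`, `it_staff`, and `last` are just illustrative. It also shows how a `loc` label slice differs from an `iloc` position slice.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# Label slice: both endpoints are included, so this returns labels 1, 2 and 3
rows = employees.loc[1:3, ['name', 'salary']]

# The same slice with iloc stops before position 3 and returns only 2 rows
rows_by_position = employees.iloc[1:3]

# A boolean condition as the row selector, a column name as the column selector
it_staff = employees.loc[employees['department'] == 'IT', 'name']

# Last row, looked up by its label
last = employees.loc[employees.index[-1]]

print(len(rows), len(rows_by_position))
print(it_staff.tolist())
```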

### Apply
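
A minimal sketch of `apply` on a single column and across rows, using the sample `employees` data. The derived columns `salary_k` and `label` and the helper `make_label` are made up for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# Series.apply calls the function once per value in the column
employees['salary_k'] = employees['salary'].apply(lambda s: s / 1000)

# DataFrame.apply with axis=1 passes each row to the function as a Series
def make_label(row):
    return f"{row['name']} ({row['department']})"

employees['label'] = employees.apply(make_label, axis=1)

print(employees[['salary_k', 'label']])
```

For simple arithmetic like the first case, the vectorized form `employees['salary'] / 1000` gives the same result and is usually faster; `apply` is most useful when the logic does not vectorize easily.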

### Regex

| **Symbol**        | **Meaning**                          |
|-------------------|--------------------------------------|
| `^`               | Start of string                      |
| `$`               | End of string                        |
| `.`               | Any character except a newline       |
| `\d`              | Digit (0–9)                          |
| `\D`              | Non-digit                            |
| `\w`              | Word character (letters, digits, _)   |
| `\s`              | Whitespace                           |
| `*`               | 0 or more occurrences                |
| `+`               | 1 or more occurrences                |
| `?`               | 0 or 1 occurrence                    |


### Group By
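
A minimal sketch of grouping the sample `employees` data by department. The output column names `headcount` and `max_age` are illustrative choices.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# One aggregate on one column: average salary per department
avg_salary = employees.groupby('department')['salary'].mean()

# Several named aggregates at once: new_column=(source_column, function)
summary = employees.groupby('department').agg(
    headcount=('id', 'count'),
    max_age=('age', 'max'),
)

print(avg_salary)
print(summary)
```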

## Joining DataFrames
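
A sketch of joining the two sample frames with `merge`. Both frames share `id` and `name`, so this example joins on both columns to avoid getting suffixed `name_x` / `name_y` columns; joining on `id` alone is also possible.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

performance = pd.DataFrame({
    'id': [3, 4, 5, 6],
    'name': ['Charlie', 'David', 'Eve', 'Frank'],
    'performance_score': [90, 85, 88, 75]
})

keys = ['id', 'name']

# inner: only rows whose keys appear in both frames (ids 3, 4, 5)
inner = employees.merge(performance, on=keys, how='inner')

# left: every employee; performance_score is NaN where there is no match
left = employees.merge(performance, on=keys, how='left')

# outer: every row from either frame (ids 1 through 6)
outer = employees.merge(performance, on=keys, how='outer')

print(inner.shape, left.shape, outer.shape)
```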

## Organizing Results

### Drop Duplicates
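
A minimal sketch of `drop_duplicates` on the sample `employees` data. Using `department` as the `subset` is an illustrative choice.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# Keep the first row seen for each department (keep='first' is the default)
first_per_dept = employees.drop_duplicates(subset='department')

# Keep the last row seen for each department instead
last_per_dept = employees.drop_duplicates(subset='department', keep='last')

# Without subset, rows are dropped only when every column matches
with_repeat = pd.concat([employees, employees.iloc[[0]]])
deduped = with_repeat.drop_duplicates()

print(first_per_dept['name'].tolist())
print(last_per_dept['name'].tolist())
```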

### Rename Columns
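
A minimal sketch of renaming columns with `rename`. The new names `employee_id` and `annual_salary` are made up for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# Map old names to new names; columns not listed are left unchanged
renamed = employees.rename(columns={'id': 'employee_id', 'salary': 'annual_salary'})

# rename returns a new DataFrame, so the original keeps its column names
print(list(renamed.columns))
print(list(employees.columns))
```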

### Sort Values
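
A minimal sketch of `sort_values` on the sample `employees` data, first by one column and then by two columns with different directions.

```python
import pandas as pd

employees = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 40, 22],
    'salary': [50000, 60000, 70000, 80000, 45000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# Highest salary first
by_salary = employees.sort_values('salary', ascending=False)

# Department A to Z, then oldest first within each department
by_dept_age = employees.sort_values(['department', 'age'], ascending=[True, False])

print(by_salary['name'].tolist())
print(by_dept_age['name'].tolist())
```

Sorting keeps the original index labels; chain `.reset_index(drop=True)` if a fresh 0-based index is wanted.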