

## Quick Reference Summary

### Most Common Operations by Category

**Data Import/Export:** `read_csv()`, `read_excel()`, `to_csv()`, `to_excel()`, `read_sql()`, `to_sql()`

**Data Inspection:** `head()`, `tail()`, `info()`, `describe()`, `shape`, `dtypes`, `columns`, `index`

**Selection:** `loc[]`, `iloc[]`, `[]`, `query()`, `at[]`, `iat[]`

**Cleaning:** `dropna()`, `fillna()`, `drop_duplicates()`, `replace()`, `rename()`

**Transformation:** `apply()`, `map()`, `applymap()` (deprecated since pandas 2.1; use `DataFrame.map()`), `assign()`, `pipe()`

**Aggregation:** `groupby()`, `agg()`, `pivot_table()`, `crosstab()`

**Combining:** `merge()`, `join()`, `concat()` (`append()` was removed in pandas 2.0; use `concat()`)

**Reshaping:** `pivot()`, `melt()`, `stack()`, `unstack()`, `transpose()`

**String Operations:** `.str.` methods (lower, upper, contains, replace, split, etc.)

**DateTime:** `.dt.` methods (year, month, day, strftime, etc.), `to_datetime()`

**Statistics:** `mean()`, `median()`, `sum()`, `std()`, `corr()`, `describe()`

**Sorting:** `sort_values()`, `sort_index()`, `rank()`

**Window Functions:** `rolling()`, `expanding()`, `ewm()`

---

## Tips for Learning Pandas Efficiently

1. **Start with Level 1** - Master basic operations before moving to advanced features
2. **Practice with real datasets** - Use Kaggle datasets or create your own
3. **Use method chaining** - Keeps multi-step transformations readable and avoids throwaway intermediate variables
4. **Learn one category at a time** - Don't try to memorize everything at once
5. **Understand Series vs. DataFrame operations** - Some methods exist on only one of the two, or behave differently on each
6. **Always check the documentation** - `help(df.function)` or `df.function?` in Jupyter
7. **Use vectorized operations** - Avoid loops whenever possible for performance
8. **Experiment in Jupyter notebooks** - Great for learning and testing
9. **Learn keyboard shortcuts** - `Tab` for autocomplete, `Shift+Tab` for documentation
10. **Practice, practice, practice** - The more you use it, the more natural it becomes
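
As a rough illustration of tips 3 and 7, the sketch below computes a derived column with vectorized arithmetic and then chains several steps into one expression. The `sales` DataFrame and its column names are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units": [10, 3, 7, 12],
    "price": [2.5, 4.0, 2.5, 4.0],
})

# Vectorized arithmetic: works on whole columns, no explicit Python loop
sales["revenue"] = sales["units"] * sales["price"]

# Method chaining: filter -> group -> aggregate -> sort in one expression
summary = (
    sales
    .query("units > 5")
    .groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

Each step in the chain returns a new object, so `sales` itself is only changed by the explicit column assignment.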


# Complete Pandas Reference Guide - Level 1 to 3

## Table of Contents
- [Level 1: Beginner Functions](#level-1-beginner-functions)
- [Level 2: Intermediate Functions](#level-2-intermediate-functions)
- [Level 3: Advanced Functions](#level-3-advanced-functions)

---

## Level 1: Beginner Functions

### Importing Pandas
```python
import pandas as pd
import numpy as np
```

### Creating DataFrames and Series

#### `pd.DataFrame()`
Creates a DataFrame from various data structures.
```python
# From dictionary
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# From list of lists
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# From numpy array
df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=['A', 'B'])
```

#### `pd.Series()`
Creates a one-dimensional labeled array.
```python
s = pd.Series([1, 2, 3, 4, 5])
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
```

### Reading and Writing Data

#### `pd.read_csv()`
Reads a CSV file into a DataFrame.
```python
df = pd.read_csv('file.csv')
df = pd.read_csv('file.csv', sep=';', header=0, index_col=0)
```

#### `pd.read_excel()`
Reads an Excel file into a DataFrame.
```python
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')
```

#### `pd.read_json()`
Reads a JSON file into a DataFrame.
```python
df = pd.read_json('file.json')
```

#### `df.to_csv()`
Writes DataFrame to a CSV file.
```python
df.to_csv('output.csv', index=False)
```

#### `df.to_excel()`
Writes DataFrame to an Excel file.
```python
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)
```

#### `df.to_json()`
Writes DataFrame to a JSON file.
```python
df.to_json('output.json')
```

### Basic DataFrame Information

#### `df.head()`
Returns the first n rows (default 5).
```python
df.head()      # First 5 rows
df.head(10)    # First 10 rows
```

#### `df.tail()`
Returns the last n rows (default 5).
```python
df.tail()      # Last 5 rows
df.tail(10)    # Last 10 rows
```

#### `df.shape`
Returns a tuple of (rows, columns).
```python
rows, cols = df.shape
```

#### `df.info()`
Displays concise summary of DataFrame including data types and memory usage.
```python
df.info()
```

#### `df.describe()`
Generates descriptive statistics for numerical columns.
```python
df.describe()                    # Numeric columns only
df.describe(include='all')       # All columns
```

#### `df.columns`
Returns column labels.
```python
columns = df.columns
df.columns = ['new_col1', 'new_col2']  # Rename all columns (list length must match the column count)
```

#### `df.index`
Returns the index (row labels).
```python
index = df.index
df.index = range(1, len(df) + 1)  # Replace the index with 1-based integers
```

#### `df.dtypes`
Returns the data type of each column.
```python
types = df.dtypes
```

### Selecting Data

#### `df['column']` or `df.column`
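Selects a single column (returns Series). Attribute access (`df.column`) only works when the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.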
```python
col = df['column_name']
col = df.column_name
```

#### `df[['col1', 'col2']]`
Selects multiple columns (returns DataFrame).
```python
subset = df[['col1', 'col2', 'col3']]
```

#### `df.loc[]`
Label-based indexing for rows and columns.
```python
df.loc[0]                      # Single row by label
df.loc[0:5]                    # Rows labeled 0 through 5 (end label included)
df.loc[:, 'column']            # All rows, one column
df.loc[0:5, ['col1', 'col2']]  # Rows and columns
```

#### `df.iloc[]`
Integer position-based indexing.
```python
df.iloc[0]                     # First row
df.iloc[0:5]                   # First 5 rows
df.iloc[:, 0]                  # All rows, first column
df.iloc[0:5, 0:3]              # Rows 0-4, columns 0-2
```
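
The slicing difference between `loc` and `iloc` is easy to miss, so here is a small sketch on a made-up DataFrame (`df_demo` is a hypothetical name): label slices include the end label, position slices exclude the end position.

```python
import pandas as pd

# Hypothetical data with a default-style integer index
df_demo = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[0, 1, 2, 3])

by_label = df_demo.loc[0:2]      # labels 0, 1, 2 -> end label is included
by_position = df_demo.iloc[0:2]  # positions 0, 1 -> end position is excluded

print(len(by_label), len(by_position))
```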

#### `df[df['column'] > value]`
Boolean indexing to filter rows.
```python
df[df['age'] > 30]
df[(df['age'] > 30) & (df['city'] == 'NYC')]
```

### Basic Operations

#### `df.drop()`
Removes rows or columns.
```python
df.drop('column_name', axis=1)           # Drop column
df.drop(['col1', 'col2'], axis=1)        # Drop multiple columns
df.drop([0, 1, 2], axis=0)               # Drop rows by index
df.drop('column_name', axis=1, inplace=True)  # Modify in place
```

#### `df.rename()`
Renames columns or index labels.
```python
df.rename(columns={'old_name': 'new_name'})
df.rename(columns={'col1': 'new1', 'col2': 'new2'})
df.rename(index={0: 'row1', 1: 'row2'})
```

#### `df.sort_values()`
Sorts DataFrame by values.
```python
df.sort_values('column')                         # Ascending
df.sort_values('column', ascending=False)        # Descending
df.sort_values(['col1', 'col2'], ascending=[True, False])
```

#### `df.sort_index()`
Sorts by index labels.
```python
df.sort_index()
df.sort_index(ascending=False)
```

#### `df.reset_index()`
Resets the index to default integer index.
```python
df.reset_index(drop=True)          # Drop old index
df.reset_index()                   # Keep old index as column
```

#### `df.set_index()`
Sets one or more columns as the index.
```python
df.set_index('column_name')
df.set_index(['col1', 'col2'])
```

### Handling Missing Data

#### `df.isna()` or `df.isnull()`
Detects missing values (returns boolean DataFrame).
```python
df.isna()
df['column'].isna()
```

#### `df.notna()` or `df.notnull()`
Detects non-missing values.
```python
df.notna()
df['column'].notna()
```

#### `df.dropna()`
Removes rows or columns with missing values.
```python
df.dropna()                    # Drop rows with any NaN
df.dropna(axis=1)              # Drop columns with any NaN
df.dropna(how='all')           # Drop rows where all are NaN
df.dropna(thresh=2)            # Keep rows with at least 2 non-NaN
df.dropna(subset=['col1'])     # Drop based on specific columns
```

#### `df.fillna()`
Fills missing values.
```python
df.fillna(0)                           # Fill with 0
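df.ffill()                             # Forward fill (fillna(method='ffill') is deprecated since pandas 2.1)
df.bfill()                             # Backward fill (fillna(method='bfill') is deprecated since pandas 2.1)
df.fillna(df.mean(numeric_only=True))  # Fill numeric columns with their column mean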
df['column'].fillna(value)             # Fill specific column
```
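
To make the fill strategies concrete, the following sketch applies three of them to one small Series. The data is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two consecutive gaps
s = pd.Series([1.0, np.nan, np.nan, 4.0])

filled_const = s.fillna(0)         # gaps become 0
filled_ffill = s.ffill()           # gaps take the last valid value before them
filled_mean = s.fillna(s.mean())   # gaps take the mean of the non-missing values

print(filled_const.tolist())
print(filled_ffill.tolist())
print(filled_mean.tolist())
```

Which strategy is appropriate depends on the data: forward fill suits ordered observations, while a constant or mean fill ignores ordering.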

### Basic Statistics

#### `df.mean()`
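Calculates the mean. Since pandas 2.0, `numeric_only` defaults to `False`, so pass `numeric_only=True` when the DataFrame contains non-numeric columns.
```python
df.mean(numeric_only=True) # Mean of all numeric columns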
df['column'].mean()        # Mean of specific column
df.mean(axis=1)            # Mean across rows
```

#### `df.median()`
Calculates median.
```python
df.median()
df['column'].median()
```

#### `df.sum()`
Calculates sum.
```python
df.sum()
df['column'].sum()
df.sum(axis=1)             # Sum across rows
```

#### `df.min()` and `df.max()`
Finds minimum and maximum values.
```python
df.min()
df.max()
df['column'].min()
```

#### `df.std()`
Calculates standard deviation.
```python
df.std()
df['column'].std()
```

#### `df.var()`
Calculates variance.
```python
df.var()
df['column'].var()
```

#### `df.count()`
Counts non-null values.
```python
df.count()
df['column'].count()
```

#### `df['column'].value_counts()`
Counts unique values in a Series.
```python
df['column'].value_counts()
df['column'].value_counts(normalize=True)  # As proportions
```

#### `df['column'].unique()`
Returns unique values in a Series.
```python
df['column'].unique()
```

#### `df.nunique()`
Counts unique values.
```python
df['column'].nunique()
df.nunique()
```

---


## Level 2: Intermediate Functions

### Data Manipulation

#### `df.apply()`
Applies a function along an axis.
```python
df['column'].apply(lambda x: x * 2)
df.apply(lambda x: x.max() - x.min())      # Apply to columns
df.apply(lambda x: x.max() - x.min(), axis=1)  # Apply to rows
```

#### `df.applymap()`
Applies a function element-wise. Deprecated since pandas 2.1 in favor of `df.map()`.
```python
df.applymap(lambda x: x * 2)   # pandas < 2.1
df.map(lambda x: x * 2)        # pandas >= 2.1
```

#### `df['column'].map()`
Maps values of a Series using a dictionary or function. With a dictionary, values that have no matching key become NaN.
```python
df['column'].map({'old': 'new', 'a': 'b'})
df['column'].map(lambda x: x * 2)
```

#### `df.replace()`
Replaces values in DataFrame.
```python
df.replace(0, np.nan)
df.replace([0, 1], [10, 100])
df.replace({'col1': {0: 10}, 'col2': {1: 100}})
df['column'].replace('old', 'new')
```

#### `df.astype()`
Converts data types.
```python
df['column'].astype(int)
df['column'].astype('category')
df.astype({'col1': int, 'col2': str})
```

#### `df.copy()`
Creates a copy of the DataFrame.
```python
df_copy = df.copy()        # Deep copy
df_copy = df.copy(deep=False)  # Shallow copy
```

### String Operations

#### `df['column'].str.lower()`
Converts strings to lowercase.
```python
df['column'].str.lower()
```

#### `df['column'].str.upper()`
Converts strings to uppercase.
```python
df['column'].str.upper()
```

#### `df['column'].str.strip()`
Removes leading/trailing whitespace.
```python
df['column'].str.strip()
df['column'].str.lstrip()   # Left strip
df['column'].str.rstrip()   # Right strip
```

#### `df['column'].str.contains()`
Checks if string contains pattern.
```python
df['column'].str.contains('pattern')
df['column'].str.contains('pattern', case=False)
df['column'].str.contains('pat1|pat2', regex=True)
```

#### `df['column'].str.replace()`
Replaces substring.
```python
df['column'].str.replace('old', 'new')
df['column'].str.replace(r'\d+', '', regex=True)
```

#### `df['column'].str.split()`
Splits strings.
```python
df['column'].str.split(',')
df['column'].str.split(',', expand=True)  # Creates separate columns
```

#### `df['column'].str.startswith()` / `endswith()`
Checks string start/end.
```python
df['column'].str.startswith('prefix')
df['column'].str.endswith('suffix')
```

#### `df['column'].str.len()`
Gets string length.
```python
df['column'].str.len()
```

#### `df['column'].str.slice()`
Slices strings.
```python
df['column'].str.slice(0, 5)  # First 5 characters
df['column'].str[:5]          # Same as above
```

### Date/Time Operations

#### `pd.to_datetime()`
Converts to datetime.
```python
pd.to_datetime(df['date_column'])
pd.to_datetime('2023-01-01')
pd.to_datetime(df['date'], format='%Y-%m-%d')
```

#### `df['date'].dt.year` / `month` / `day`
Extracts date components.
```python
df['date'].dt.year
df['date'].dt.month
df['date'].dt.day
df['date'].dt.hour
df['date'].dt.minute
df['date'].dt.dayofweek    # Monday=0, Sunday=6
df['date'].dt.day_name()   # 'Monday', 'Tuesday', etc.
```

#### `df['date'].dt.strftime()`
Formats datetime as string.
```python
df['date'].dt.strftime('%Y-%m-%d')
df['date'].dt.strftime('%B %d, %Y')
```

#### `pd.date_range()`
Creates a range of dates.
```python
pd.date_range('2023-01-01', '2023-12-31')
pd.date_range('2023-01-01', periods=10)
pd.date_range('2023-01-01', periods=12, freq='M')  # Month ends; the alias is 'ME' in pandas >= 2.2
```

### Grouping and Aggregation

#### `df.groupby()`
Groups DataFrame by one or more columns.
```python
df.groupby('column')
df.groupby(['col1', 'col2'])
df.groupby('column').mean(numeric_only=True)  # numeric_only avoids errors on text columns (pandas >= 2.0)
df.groupby('column')['value'].sum()
```

#### `df.agg()` / `df.aggregate()`
Applies multiple aggregation functions.
```python
df.groupby('column').agg(['mean', 'sum', 'count'])
df.groupby('column').agg({'col1': 'mean', 'col2': 'sum'})
df.agg({'col1': ['mean', 'std'], 'col2': 'sum'})
```

#### `df.transform()`
Transforms values while keeping same shape.
```python
df.groupby('column').transform('mean')
df.groupby('column')['value'].transform(lambda x: x - x.mean())
```
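
The difference between aggregating and transforming is mostly about output shape. A minimal sketch, using a hypothetical `df_demo` with made-up column names:

```python
import pandas as pd

# Hypothetical data: two teams with per-row scores
df_demo = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 3, 10]})

# Aggregation: one result per group
per_group = df_demo.groupby("team")["score"].mean()

# Transformation: one result per original row, aligned to the original index
per_row = df_demo.groupby("team")["score"].transform("mean")

print(per_group)
print(per_row)
```

Because `per_row` keeps the original index, it can be assigned straight back as a new column, for example to compute each row's deviation from its group mean.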

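#### `df.groupby().filter()`
Keeps or drops whole groups based on a condition. Not to be confused with `df.filter()`, which selects rows or columns by label.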
```python
df.groupby('column').filter(lambda x: len(x) > 5)
df.groupby('column').filter(lambda x: x['value'].sum() > 100)
```

### Combining DataFrames

#### `pd.concat()`
Concatenates DataFrames along an axis.
```python
pd.concat([df1, df2])                    # Vertical stack
pd.concat([df1, df2], axis=1)            # Horizontal stack
pd.concat([df1, df2], ignore_index=True)
pd.concat([df1, df2], keys=['df1', 'df2'])
```

#### `df.merge()`
Merges DataFrames (SQL-style joins).
```python
pd.merge(df1, df2, on='key')                  # Inner join
pd.merge(df1, df2, on='key', how='left')      # Left join
pd.merge(df1, df2, on='key', how='right')     # Right join
pd.merge(df1, df2, on='key', how='outer')     # Outer join
pd.merge(df1, df2, left_on='key1', right_on='key2')
```
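
To see how the join types differ, the sketch below merges two small made-up tables. The names `left`, `right`, `left_val`, and `right_val` are hypothetical; `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Hypothetical tables that share keys 2 and 3
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Inner join: only keys present in both tables
inner = pd.merge(left, right, on="key")

# Outer join: all keys; _merge records the source of each row
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(outer.sort_values("key"))
```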

#### `df.join()`
Joins DataFrames on index.
```python
df1.join(df2)
df1.join(df2, how='left')
df1.join(df2, on='key')
```

#### `df.append()`
Appended rows. Deprecated in pandas 1.4 and removed in pandas 2.0; use `pd.concat()` instead.
```python
# Old way (raises AttributeError in pandas >= 2.0)
df.append(df2)
# New way
pd.concat([df, df2], ignore_index=True)
```

### Reshaping Data

#### `df.pivot()`
Reshapes data based on column values.
```python
df.pivot(index='date', columns='category', values='value')
```

#### `df.pivot_table()`
Creates a pivot table with aggregation.
```python
df.pivot_table(values='value', index='row', columns='col')
df.pivot_table(values='value', index='row', columns='col', aggfunc='mean')
df.pivot_table(values='value', index='row', columns='col', fill_value=0)
```

#### `df.melt()`
Unpivots DataFrame from wide to long format.
```python
df.melt(id_vars=['id'], value_vars=['col1', 'col2'])
df.melt(id_vars='id', var_name='variable', value_name='value')
```
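
As a sketch of how `melt()` and `pivot()` relate, the example below takes a small wide table to long format and back again. The table and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per quarter
wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# Wide -> long: one row per (id, quarter) pair
long = wide.melt(id_vars="id", var_name="quarter", value_name="sales")

# Long -> wide again: quarters become columns, id becomes the index
back = long.pivot(index="id", columns="quarter", values="sales")

print(long)
print(back)
```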

#### `df.stack()` and `df.unstack()`
Pivots a level of column/row labels.
```python
df.stack()      # Columns to rows
df.unstack()    # Rows to columns
df.unstack(level=0)
```

#### `df.transpose()` or `df.T`
Transposes DataFrame (swap rows and columns).
```python
df.T
df.transpose()
```

### Duplicate Handling

#### `df.duplicated()`
Identifies duplicate rows.
```python
df.duplicated()
df.duplicated(subset=['column'])
df.duplicated(keep='first')    # Mark duplicates except first
df.duplicated(keep='last')     # Mark duplicates except last
df.duplicated(keep=False)      # Mark all duplicates
```

#### `df.drop_duplicates()`
Removes duplicate rows.
```python
df.drop_duplicates()
df.drop_duplicates(subset=['column'])
df.drop_duplicates(keep='first')
```

### Binning and Discretization

#### `pd.cut()`
Bins values into discrete intervals.
```python
pd.cut(df['age'], bins=3)
pd.cut(df['age'], bins=[0, 18, 65, 100], labels=['Child', 'Adult', 'Senior'])
```

#### `pd.qcut()`
Bins based on quantiles.
```python
pd.qcut(df['value'], q=4)    # Quartiles
pd.qcut(df['value'], q=10)   # Deciles
```
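
The following sketch contrasts the two binning functions on made-up data: `pd.cut()` uses fixed edges (right-inclusive by default), while `pd.qcut()` picks edges so that each bin holds roughly the same number of values.

```python
import pandas as pd

# Hypothetical ages, binned with fixed edges: (0, 18], (18, 65], (65, 100]
ages = pd.Series([5, 17, 30, 64, 80])
groups = pd.cut(ages, bins=[0, 18, 65, 100], labels=["Child", "Adult", "Senior"])

# Hypothetical values, binned into quartiles; labels=False returns bin numbers
values = pd.Series(range(1, 9))
quartile = pd.qcut(values, q=4, labels=False)

print(groups.tolist())
print(quartile.tolist())
```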

### Statistical Functions

#### `df.corr()`
Computes the pairwise correlation matrix. In pandas >= 2.0, pass `numeric_only=True` if the DataFrame has non-numeric columns.
```python
df.corr()
df.corr(method='pearson')
df.corr(method='spearman')
```

#### `df.cov()`
Computes covariance matrix.
```python
df.cov()
```

#### `df.rank()`
Ranks values.
```python
df['column'].rank()
df['column'].rank(ascending=False)
df['column'].rank(method='dense')
```

#### `df.pct_change()`
Computes percentage change.
```python
df['column'].pct_change()
df.pct_change(periods=2)
```

#### `df.diff()`
Computes first discrete difference.
```python
df['column'].diff()
df.diff(periods=2)
```

#### `df.cumsum()` / `df.cumprod()`
Computes cumulative sum/product.
```python
df['column'].cumsum()
df['column'].cumprod()
df.cumsum()
```

#### `df.cummax()` / `df.cummin()`
Computes cumulative maximum/minimum.
```python
df['column'].cummax()
df['column'].cummin()
```

---




## Level 3: Advanced Functions

### Advanced Grouping

#### `df.groupby().size()`
Returns size of each group.
```python
df.groupby('column').size()
```

#### `df.groupby().nth()`
Takes nth value from each group.
```python
df.groupby('column').nth(0)    # First row of each group
df.groupby('column').nth(-1)   # Last row of each group
```

#### `df.groupby().cumsum()` / `cumcount()`
Cumulative operations within groups.
```python
df.groupby('column')['value'].cumsum()
df.groupby('column').cumcount()
```

#### `df.groupby().rolling()`
Rolling window calculations within groups.
```python
df.groupby('category')['value'].rolling(window=3).mean()
```

#### `df.groupby().expanding()`
Expanding window calculations within groups.
```python
df.groupby('category')['value'].expanding().mean()
```

### Window Functions

#### `df.rolling()`
Creates rolling window calculations.
```python
df['column'].rolling(window=3).mean()
df['column'].rolling(window=3, min_periods=1).sum()
df.rolling(window=5).agg(['mean', 'std'])
```
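
A small sketch of what `min_periods` changes, on an invented Series: without it, a window yields NaN until it is full; with `min_periods=1`, partial windows are averaged as well.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])  # hypothetical data

# Default: the first two positions are NaN because the window is not yet full
strict = s.rolling(window=3).mean()

# min_periods=1: partial windows at the start are allowed
lenient = s.rolling(window=3, min_periods=1).mean()

print(strict.tolist())
print(lenient.tolist())
```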

#### `df.expanding()`
Creates expanding window calculations.
```python
df['column'].expanding().mean()
df['column'].expanding(min_periods=3).sum()
```

#### `df.ewm()`
Exponentially weighted moving calculations.
```python
df['column'].ewm(span=10).mean()
df['column'].ewm(alpha=0.5).mean()
```

### Multi-Index Operations

#### `pd.MultiIndex.from_tuples()`
Creates MultiIndex from tuples.
```python
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)])
df = pd.DataFrame({'value': [1, 2, 3]}, index=index)
```

#### `pd.MultiIndex.from_product()`
Creates MultiIndex from cartesian product.
```python
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2, 3]])
```

#### `df.xs()`
Cross-section from MultiIndex.
```python
df.xs('A', level=0)
df.xs(('A', 1), level=[0, 1])
```

#### `df.swaplevel()`
Swaps levels in MultiIndex.
```python
df.swaplevel(0, 1)
```

### Advanced Merging

#### `pd.merge_asof()`
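Merges on the nearest key rather than requiring an exact match. Both DataFrames must be sorted by the key. By default (`direction='backward'`), each left row matches the last right row whose key is less than or equal to its own.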
```python
pd.merge_asof(df1, df2, on='date')
pd.merge_asof(df1, df2, on='date', by='category')
```
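
A minimal sketch of `merge_asof()` matching behavior, using made-up `trades` and `quotes` tables (both names and all values are hypothetical). It compares the default backward search with `direction='nearest'`.

```python
import pandas as pd

# Hypothetical data; both tables are already sorted by "time"
trades = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 09:00:04", "2023-01-01 09:00:13"]),
    "qty": [100, 200],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2023-01-01 09:00:00", "2023-01-01 09:00:10", "2023-01-01 09:00:15",
    ]),
    "price": [10.0, 10.5, 11.0],
})

# Default: most recent quote at or before each trade
backward = pd.merge_asof(trades, quotes, on="time")

# Nearest quote in either direction
nearest = pd.merge_asof(trades, quotes, on="time", direction="nearest")

print(backward)
print(nearest)
```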

#### `pd.merge_ordered()`
Merge with optional fill method.
```python
pd.merge_ordered(df1, df2, on='date', fill_method='ffill')
```

### Categorical Data

#### `df['column'].astype('category')`
Converts to categorical data type.
```python
df['column'] = df['column'].astype('category')
```

#### `df['column'].cat.categories`
Gets categories.
```python
categories = df['column'].cat.categories
```

#### `df['column'].cat.codes`
Gets integer codes for categories.
```python
codes = df['column'].cat.codes
```

#### `df['column'].cat.rename_categories()`
Renames categories.
```python
df['column'].cat.rename_categories({'old': 'new'})
```

#### `df['column'].cat.reorder_categories()`
Reorders categories.
```python
df['column'].cat.reorder_categories(['low', 'medium', 'high'])
```

#### `df['column'].cat.add_categories()` / `remove_categories()`
Adds or removes categories.
```python
df['column'].cat.add_categories(['new_cat'])
df['column'].cat.remove_categories(['unwanted'])
```

### Performance Optimization

#### `df.memory_usage()`
Shows memory usage of each column.
```python
df.memory_usage(deep=True)
```

#### `df.select_dtypes()`
Selects columns by data type.
```python
df.select_dtypes(include=['int64', 'float64'])
df.select_dtypes(exclude=['object'])
```

#### `pd.eval()`
Evaluates expression efficiently.
```python
df.eval('C = A + B')
result = pd.eval('df1 + df2')
```

#### `df.query()`
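Filters rows using an expression string. Often more readable than boolean indexing, and can be faster on large DataFrames when `numexpr` is installed.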
```python
df.query('age > 30')
df.query('age > 30 and city == "NYC"')
df.query('age > @threshold')  # Use variable
```

### Advanced Indexing

#### `df.where()`
Replaces values where condition is False.
```python
df.where(df > 0, 0)  # Keep values > 0, replace everything else with 0
df.where(df['age'] > 30, other=0)
```

#### `df.mask()`
Replaces values where condition is True.
```python
df.mask(df < 0, 0)  # Replace negative values with 0
```
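
`where()` and `mask()` are mirror images of each other, which the sketch below shows on an invented Series. It also shows that omitting the replacement value yields NaN.

```python
import pandas as pd

s = pd.Series([-2, 0, 3])  # hypothetical data

kept = s.where(s > 0, 0)       # keep where the condition is True, else 0
masked = s.mask(s < 0, 0)      # replace where the condition is True with 0
no_default = s.where(s > 0)    # no replacement given -> NaN where False

print(kept.tolist())
print(masked.tolist())
print(no_default.tolist())
```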

#### `df.at[]` and `df.iat[]`
Fast scalar value access.
```python
value = df.at[0, 'column']      # Label-based
value = df.iat[0, 0]            # Position-based
df.at[0, 'column'] = new_value  # Setting value
```

#### `df.get()`
Gets item with default value if not found.
```python
df.get('column', default=None)
```

### Sampling and Random

#### `df.sample()`
Random sampling of rows.
```python
df.sample(n=10)                  # 10 random rows
df.sample(frac=0.1)              # 10% of rows
df.sample(n=5, replace=True)     # With replacement
df.sample(n=10, random_state=42) # Reproducible
```

#### `df.nlargest()` / `df.nsmallest()`
Returns n largest/smallest values.
```python
df.nlargest(10, 'column')
df.nsmallest(5, 'column')
df.nlargest(10, ['col1', 'col2'])
```

### Advanced String Operations

#### `df['column'].str.extract()`
Extracts groups from regex pattern.
```python
df['column'].str.extract(r'(\d+)')
df['column'].str.extract(r'(\w+)-(\d+)', expand=True)
```

#### `df['column'].str.findall()`
Finds all occurrences of pattern.
```python
df['column'].str.findall(r'\d+')
```

#### `df['column'].str.match()`
Checks if string matches pattern.
```python
df['column'].str.match(r'^\d+')
```

#### `df['column'].str.pad()`
Pads strings to specified width.
```python
df['column'].str.pad(10, side='left', fillchar='0')
```

#### `df['column'].str.wrap()`
Wraps long strings.
```python
df['column'].str.wrap(width=50)
```

### Time Series Specific

#### `df.resample()`
Resamples time-series data.
```python
df.resample('D').mean()     # Daily mean
df.resample('W').sum()      # Weekly sum
df.resample('M').last()     # Monthly last value ('ME' in pandas >= 2.2)
df.resample('Q').agg(['mean', 'sum'])  # Quarterly ('QE' in pandas >= 2.2)
```

#### `df.shift()`
Shifts index by desired number of periods.
```python
df['column'].shift(1)       # Shift down 1 row
df['column'].shift(-1)      # Shift up 1 row
df.shift(periods=2, freq='D')  # Shift time index
```
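
The two modes of `shift()` do different things: without `freq` the values move and the index stays put; with `freq` the index moves and the values stay put. A small sketch on an invented daily series:

```python
import pandas as pd

# Hypothetical daily series
s = pd.Series(
    [100, 110, 121],
    index=pd.date_range("2023-01-01", periods=3, freq="D"),
)

prev = s.shift(1)                # values move down one row; first becomes NaN
change = s - prev                # day-over-day difference
moved = s.shift(1, freq="D")     # same values, index moved forward one day

print(change.tolist())
print(moved.index[0])
```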

#### `df.tshift()`
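Shifted the time index. Deprecated in pandas 1.1 and removed in pandas 2.0; use `shift()` with `freq` instead.
```python
df.shift(freq='D')   # Replacement for the removed df.tshift(freq='D')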
```

#### `df.asfreq()`
Converts to specified frequency.
```python
df.asfreq('D')
df.asfreq('H', method='ffill')  # Hourly; the alias is 'h' in pandas >= 2.2
```

#### `df.between_time()`
Selects values between times of day.
```python
df.between_time('09:00', '17:00')
```

#### `df.at_time()`
Selects values at a particular time of day.
```python
df.at_time('10:30')
```

### Sparse Data

#### `pd.SparseDtype()`
Creates sparse data type.
```python
df['column'] = df['column'].astype(pd.SparseDtype('int', 0))
```

#### `df.sparse.to_dense()`
Converts sparse to dense.
```python
df.sparse.to_dense()
```

### Style and Formatting

#### `df.style.highlight_max()`
Highlights maximum values.
```python
df.style.highlight_max(color='lightgreen')
df.style.highlight_max(subset=['col1', 'col2'])
```

#### `df.style.highlight_min()`
Highlights minimum values.
```python
df.style.highlight_min(color='lightcoral')
```

#### `df.style.background_gradient()`
Applies color gradient.
```python
df.style.background_gradient(cmap='viridis')
df.style.background_gradient(subset=['col1'], cmap='Blues')
```

#### `df.style.format()`
Formats display values.
```python
df.style.format('{:.2f}')
df.style.format({'col1': '{:.2%}', 'col2': '${:.2f}'})
```

#### `df.style.bar()`
Displays bars in cells.
```python
df.style.bar(color='lightblue')
df.style.bar(subset=['col1'], color='#d65f5f')
```

### Advanced IO

#### `pd.read_sql()`
Reads from SQL database.
```python
import sqlite3
conn = sqlite3.connect('database.db')
# 'my_table' is a placeholder; the bare word "table" is a reserved SQL keyword
df = pd.read_sql('SELECT * FROM my_table', conn)
df = pd.read_sql_query('SELECT * FROM my_table WHERE id > 5', conn)
```

#### `df.to_sql()`
Writes to SQL database.
```python
df.to_sql('table_name', conn, if_exists='replace')
df.to_sql('table_name', conn, if_exists='append')
```

#### `pd.read_parquet()`
Reads Parquet file.
```python
df = pd.read_parquet('file.parquet')
```

#### `df.to_parquet()`
Writes to Parquet format.
```python
df.to_parquet('file.parquet', compression='gzip')
```

#### `pd.read_hdf()`
Reads HDF5 file.
```python
df = pd.read_hdf('file.h5', 'key')
```

#### `df.to_hdf()`
Writes to HDF5 format.
```python
df.to_hdf('file.h5', key='data', mode='w')
```

#### `pd.read_pickle()`
Reads pickled pandas object.
```python
df = pd.read_pickle('file.pkl')
```

#### `df.to_pickle()`
Writes to pickle format.
```python
df.to_pickle('file.pkl')
```

#### `pd.read_clipboard()`
Reads from clipboard.
```python
df = pd.read_clipboard()
```

#### `df.to_clipboard()`
Writes to clipboard.
```python
df.to_clipboard()
```

### Advanced Aggregation

#### `df.pipe()`
Applies chainable functions.
```python
def func1(df):
    return df[df['value'] > 0]

def func2(df):
    return df.sort_values('value')

result = df.pipe(func1).pipe(func2)
```

#### `df.assign()`
Assigns new columns (chainable).
```python
df.assign(new_col=lambda x: x['col1'] * 2)
df.assign(col1=lambda x: x['col1'] * 2, col2=lambda x: x['col1'] + x['col2'])
```

#### `pd.crosstab()`
Computes cross-tabulation of two or more factors.
```python
pd.crosstab(df['col1'], df['col2'])
pd.crosstab(df['col1'], df['col2'], normalize='all')
pd.crosstab(df['col1'], df['col2'], values=df['col3'], aggfunc='mean')
```

#### `pd.cut()` with advanced options
More sophisticated binning.
```python
pd.cut(df['age'], bins=5, labels=['A', 'B', 'C', 'D', 'E'])
pd.cut(df['age'], bins=[0, 18, 30, 50, 100], right=False)
pd.cut(df['age'], bins=5, retbins=True)  # Returns bins used
```

### Advanced Data Validation

#### `df.isin()`
Checks if values are in a list.
```python
df['column'].isin([1, 2, 3])
df.isin({'col1': [1, 2], 'col2': ['a', 'b']})
```

#### `df['column'].between()`
Checks if Series values are between two bounds (both bounds inclusive by default).
```python
df['column'].between(10, 20)
df['column'].between(10, 20, inclusive='neither')
```

#### `df.clip()`
Trims values at input thresholds.
```python
df['column'].clip(lower=0, upper=100)
df.clip(lower=0)
```

#### `df.interpolate()`
Fills NaN values using interpolation.
```python
df['column'].interpolate()
df['column'].interpolate(method='linear')
df['column'].interpolate(method='polynomial', order=2)
df['column'].interpolate(method='time')
```

### Advanced Boolean Operations

#### `df.all()`
Checks if all values are True.
```python
df.all()
df['column'].all()
(df > 0).all()
```

#### `df.any()`
Checks if any value is True.
```python
df.any()
df['column'].any()
(df > 100).any()
```

#### `df.equals()`
Checks if two DataFrames are equal.
```python
df1.equals(df2)
```

### Memory Optimization

#### Downcast numeric types
```python
df['int_col'] = pd.to_numeric(df['int_col'], downcast='integer')
df['float_col'] = pd.to_numeric(df['float_col'], downcast='float')
```

#### Convert to categorical for repeated strings
```python
df['category_col'] = df['category_col'].astype('category')
```

#### Use sparse for mostly zero/NaN data
```python
df['sparse_col'] = df['sparse_col'].astype(pd.SparseDtype('float', np.nan))
```
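
As a rough sketch of the savings these conversions can give, the example below measures memory before and after on a made-up DataFrame. The actual numbers depend on the data and the pandas version, so none are shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: small integers and a heavily repeated string column
df_demo = pd.DataFrame({
    "n": np.arange(1000, dtype="int64"),
    "label": ["red", "green"] * 500,
})

before = df_demo.memory_usage(deep=True).sum()

# Values 0..999 fit in a 16-bit integer
df_demo["n"] = pd.to_numeric(df_demo["n"], downcast="integer")
# Two distinct strings are stored once each, plus small integer codes
df_demo["label"] = df_demo["label"].astype("category")

after = df_demo.memory_usage(deep=True).sum()

print(df_demo.dtypes)
print(f"before: {before} bytes, after: {after} bytes")
```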

### Advanced Index Operations

#### `df.reindex()`
Conforms DataFrame to new index.
```python
df.reindex([0, 1, 2, 3, 4])
df.reindex(columns=['col1', 'col2'])
df.reindex(index=new_index, fill_value=0)
df.reindex(index=new_index, method='ffill')
```

#### `df.reindex_like()`
Reindexes to match another DataFrame.
```python
df1.reindex_like(df2)
```

#### `df.align()`
Aligns two DataFrames on their axes.
```python
df1_aligned, df2_aligned = df1.align(df2, join='inner')
df1_aligned, df2_aligned = df1.align(df2, join='outer', fill_value=0)
```

### Advanced Apply Operations

#### `df.applymap()` replacement with `map()`
Apply function element-wise (newer versions).
```python
df.map(lambda x: x * 2 if isinstance(x, (int, float)) else x)
```

#### Apply with result_type
```python
df.apply(lambda x: [x.min(), x.max()], axis=1, result_type='expand')  # result_type only takes effect with axis=1
df.apply(lambda x: pd.Series([x.min(), x.max()]), axis=1)
```

### Advanced Combining

#### `df.combine()`
Combines two DataFrames column by column. The function receives one Series from each DataFrame and returns a Series or scalar.
```python
df1.combine(df2, lambda s1, s2: s1 if s1.sum() > s2.sum() else s2)
```

#### `df.combine_first()`
Updates null elements with value from another DataFrame.
```python
df1.combine_first(df2)
```

#### `df.update()`
Updates values in place.
```python
df1.update(df2)
df.update(df2, overwrite=False)
```

### Advanced Iteration

#### `df.items()`
Iterates over (column name, Series) pairs.
```python
for col_name, col_data in df.items():
    print(col_name, col_data.sum())
```

#### `df.iterrows()`
Iterates over (index, Series) pairs for each row.
```python
for idx, row in df.iterrows():
    print(idx, row['column'])
```

#### `df.itertuples()`
Iterates over rows as named tuples (faster than iterrows).
```python
for row in df.itertuples():
    print(row.Index, row.column_name)
```

#### `df.items()` for columns
```python
for name, series in df.items():
    print(f"{name}: {series.mean()}")
```

### JSON Operations

#### `pd.json_normalize()`
Normalizes semi-structured JSON data.
```python
df = pd.json_normalize(json_data)
df = pd.json_normalize(json_data, record_path='items')
df = pd.json_normalize(json_data, record_path='items', meta=['id', 'name'])
```
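
To show what `record_path` and `meta` do, the sketch below flattens a small invented payload. The `json_data` structure and its field names are hypothetical.

```python
import pandas as pd

# Hypothetical nested records: each customer has a list of items
json_data = [
    {"id": 1, "name": "Ann", "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"id": 2, "name": "Bob", "items": [{"sku": "c", "qty": 5}]},
]

# One output row per entry in "items"; "id" and "name" are repeated from the parent
flat = pd.json_normalize(json_data, record_path="items", meta=["id", "name"])

print(flat)
```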

### Options and Settings

#### `pd.set_option()` / `pd.get_option()`
Sets/gets pandas options.
```python
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)
pd.set_option('display.float_format', '{:.2f}'.format)
pd.get_option('display.max_rows')
```

#### `pd.reset_option()`
Resets option to default.
```python
pd.reset_option('display.max_rows')
pd.reset_option('all')
```

#### Context manager for temporary options
```python
with pd.option_context('display.max_rows', 10, 'display.max_columns', 5):
    print(df)
```

### Advanced Type Conversion

#### `pd.to_numeric()`
Converts to numeric type with error handling.
```python
pd.to_numeric(df['column'], errors='coerce')  # Invalid -> NaN
pd.to_numeric(df['column'], errors='ignore')  # Return the input unchanged on failure (deprecated since pandas 2.2)
pd.to_numeric(df['column'], downcast='integer')
```

#### `pd.to_timedelta()`
Converts to timedelta.
```python
pd.to_timedelta(df['duration'])
pd.to_timedelta('1 days 2 hours')
```

#### `df.convert_dtypes()`
Converts to best possible dtypes.
```python
df.convert_dtypes()
```

#### `df.infer_objects()`
Attempts to infer better dtypes for object columns.
```python
df.infer_objects()
```

### Extension Arrays and Nullable Types

#### Nullable integer types
```python
df['col'] = df['col'].astype('Int64')  # Capital I for nullable
df['col'] = df['col'].astype(pd.Int64Dtype())
```

#### Nullable boolean
```python
df['col'] = df['col'].astype('boolean')
df['col'] = df['col'].astype(pd.BooleanDtype())
```

#### String type (dedicated dtype, clearer than `object`)
```python
df['col'] = df['col'].astype('string')
df['col'] = df['col'].astype(pd.StringDtype())
```
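
One reason to reach for the nullable types is that a missing value no longer forces an integer column to float. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical data: a missing value makes the default dtype float64
s = pd.Series([1, None, 3])

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>
nullable = s.astype("Int64")

print(s.dtype, nullable.dtype)
print(nullable)
```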

### Accessor Methods

#### `.str` accessor
Beyond the basics covered earlier, the `.str` accessor also offers:
```python
df['text'].str.extractall(r'(\d+)')
df['text'].str.get_dummies(sep=',')
df['text'].str.normalize('NFKD')
```

#### `.dt` accessor advanced
```python
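df['date'].dt.tz_localize('UTC')        # Timezone-naive -> timezone-aware
df['date'].dt.tz_convert('US/Eastern')  # Requires timezone-aware values
df['date'].dt.to_period('M')
df['date'].dt.to_timestamp()            # Requires Period dtype (e.g. after to_period)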
df['date'].dt.quarter
df['date'].dt.is_leap_year
df['date'].dt.days_in_month
```

#### `.cat` accessor advanced
```python
df['cat_col'].cat.set_categories(['low', 'med', 'high'], ordered=True)
df['cat_col'].cat.as_ordered()
df['cat_col'].cat.as_unordered()
```

### Advanced Plotting (Basic Integration)

#### `df.plot()`
Basic plotting capabilities.
```python
df.plot()
df.plot(kind='bar')
df.plot(kind='barh')  # Horizontal bar
df.plot(kind='hist')
df.plot(kind='box')
df.plot(kind='scatter', x='col1', y='col2')
df.plot(kind='area')
df.plot(kind='pie', y='column')
df.plot(kind='density')
df.plot(kind='hexbin', x='col1', y='col2', gridsize=20)
```

#### `df.hist()`
Plots histogram.
```python
df.hist()
df['column'].hist(bins=20)
df.hist(column='col', by='category')
```

#### `df.boxplot()`
Creates box plot.
```python
df.boxplot()
df.boxplot(column='value', by='category')
```

### Working with Intervals

#### `pd.Interval()`
Creates an interval.
```python
interval = pd.Interval(left=0, right=5)
```

#### `pd.interval_range()`
Creates IntervalIndex.
```python
intervals = pd.interval_range(start=0, end=5)
intervals = pd.interval_range(start=0, periods=5, freq=1)
```

#### `pd.IntervalIndex.from_tuples()`
Creates from tuples.
```python
pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])
```
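Intervals are closed on the right by default, which matters when testing membership or binning. A short sketch (the `values` series is made up for illustration):

```python
import pandas as pd

interval = pd.Interval(left=0, right=5)       # closed on the right by default: (0, 5]
print(0 in interval, 5 in interval)           # False True

bins = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])
values = pd.Series([0.5, 1.0, 2.5])           # hypothetical sample values
binned = pd.cut(values, bins)                 # assign each value to its interval
print(binned.tolist())
```

Note that `1.0` lands in `(0, 1]`, not `(1, 2]`, because the right edge is inclusive.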

### Advanced Missing Data Handling

#### `df.bfill()` / `df.ffill()`
Backward/forward fill (alternative to fillna).
```python
df.ffill()    # Forward fill
df.bfill()    # Backward fill
df.ffill(limit=2)  # Limit consecutive fills
```

#### `df.interpolate()` advanced methods
```python
df.interpolate(method='polynomial', order=3)
df.interpolate(method='spline', order=2)
df.interpolate(method='akima')
df.interpolate(method='krogh')
```
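The methods listed above are delegated to SciPy, so they need it installed. The basic behavior is easier to see with the default linear method, sketched here on a tiny made-up series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

linear = s.interpolate()              # default method='linear'
print(linear.tolist())                # [1.0, 2.0, 3.0, 4.0]

limited = s.interpolate(limit=1)      # fill at most one consecutive NaN
print(limited.tolist())               # second gap value stays NaN
```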

### Testing Functions

#### `pd.testing.assert_frame_equal()`
Asserts that two DataFrames are equal.
```python
pd.testing.assert_frame_equal(df1, df2)
pd.testing.assert_frame_equal(df1, df2, check_dtype=False)
```

#### `pd.testing.assert_series_equal()`
Asserts that two Series are equal.
```python
pd.testing.assert_series_equal(s1, s2)
```
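These assertions raise `AssertionError` with a description of the first mismatch. A minimal sketch showing the effect of `check_dtype`, using two hypothetical frames that hold the same values with different dtypes:

```python
import pandas as pd

left = pd.DataFrame({'x': [1, 2, 3]})            # int64 column
right = pd.DataFrame({'x': [1.0, 2.0, 3.0]})     # float64 column, same values

try:
    pd.testing.assert_frame_equal(left, right)
    strict_passed = True
except AssertionError as exc:
    strict_passed = False
    print(str(exc).splitlines()[0])   # first line of the mismatch report

# Passes: values match once dtype differences are ignored
pd.testing.assert_frame_equal(left, right, check_dtype=False)
```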

### Utility Functions

#### `pd.factorize()`
Encodes object as enumerated type.
```python
codes, uniques = pd.factorize(df['column'])
```
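As an illustration of what the two return values contain, here is a sketch on a small made-up series:

```python
import pandas as pd

labels = pd.Series(['b', 'a', 'b', 'c'])
codes, uniques = pd.factorize(labels)
print(codes)      # integer code per row, numbered in order of first appearance
print(uniques)    # Index of the distinct values

# Round trip: take() maps each code back to its label
restored = uniques.take(codes)
print(list(restored))
```

Pass `sort=True` if the codes should follow sorted order of the values instead of order of appearance.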

#### `pd.get_dummies()`
Converts categorical variables to dummy/indicator variables.
```python
pd.get_dummies(df['category'])
pd.get_dummies(df, columns=['cat1', 'cat2'])
pd.get_dummies(df['category'], prefix='cat')
pd.get_dummies(df['category'], drop_first=True)
```

#### `pd.wide_to_long()`
Reshapes wide to long format with more control than melt.
```python
pd.wide_to_long(df, stubnames='value', i='id', j='year')
```
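The call above assumes column names made of a stub plus a numeric suffix. A sketch with a hypothetical `wide` frame whose columns are `value2020` and `value2021`:

```python
import pandas as pd

wide = pd.DataFrame({
    'id': [1, 2],
    'value2020': [10, 20],
    'value2021': [11, 21],
})

# The suffix after the stub ('2020', '2021') becomes the 'year' index level
long = pd.wide_to_long(wide, stubnames='value', i='id', j='year')
print(long)
print(long.loc[(1, 2021), 'value'])
```

The result is indexed by `(id, year)`; call `.reset_index()` to get both back as regular columns.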

#### `pd.from_dummies()`
Converts dummy variables back to categorical (inverse of get_dummies).
```python
pd.from_dummies(dummies_df)
```
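A round trip makes the inverse relationship concrete. This sketch assumes pandas 1.5 or later (where `from_dummies` is available) and uses a made-up `colors` series:

```python
import pandas as pd

colors = pd.Series(['red', 'green', 'red', 'blue'], name='color')

dummies = pd.get_dummies(colors, prefix='color')   # color_blue, color_green, color_red
restored = pd.from_dummies(dummies, sep='_')       # one column named after the prefix

print(dummies.columns.tolist())
print(restored['color'].tolist())
```

The `sep` argument tells `from_dummies` where the prefix ends and the category name begins.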

### Performance Tips

1. **Use vectorized operations instead of loops**
```python
# Bad
for i in range(len(df)):
    df.loc[i, 'new'] = df.loc[i, 'a'] + df.loc[i, 'b']

# Good
df['new'] = df['a'] + df['b']
```

2. **Use query() for complex filtering**
```python
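# More readable; can be faster on large frames when numexpr is installed
df.query('age > 30 & city == "NYC"')

# Equivalent boolean mask; often just as fast on small frames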
df[(df['age'] > 30) & (df['city'] == 'NYC')]
```

3. **Use category dtype for repeated strings**
```python
df['category'] = df['category'].astype('category')
```

4. **Use itertuples() instead of iterrows()**
```python
# Faster
for row in df.itertuples():
    process(row.column)

# Slower
for idx, row in df.iterrows():
    process(row['column'])
```

5. **Use eval() for complex expressions**
```python
df = df.eval('result = (a + b) * c')  # returns a new DataFrame unless inplace=True
```
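Before swapping one form for another, it helps to confirm they give the same result. The sketch below does that on a hypothetical `people` frame; it does not measure speed, which depends on data size and is best checked with `timeit` on your own data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
people = pd.DataFrame({
    'age': rng.integers(18, 70, size=1_000),
    'city': rng.choice(['NYC', 'LA'], size=1_000),
})

# query() and a boolean mask select the same rows
by_mask = people[(people['age'] > 30) & (people['city'] == 'NYC')]
by_query = people.query('age > 30 and city == "NYC"')
print(by_mask.equals(by_query))

# Loop and vectorized versions produce the same values
looped = [row.age * 2 for row in people.itertuples()]
vectorized = people['age'] * 2
print(looped == vectorized.tolist())
```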

### Common Patterns and Best Practices

#### Method Chaining
```python
result = (df
    .query('age > 30')
    .groupby('city')
    .agg(salary=('salary', 'mean'), count=('salary', 'size'))  # named aggregation
    .reset_index()
    .sort_values('salary', ascending=False)
)
```

#### Creating a DataFrame from scratch efficiently
```python
import numpy as np
import pandas as pd

# Using a dictionary
df = pd.DataFrame({
    'A': range(1000),
    'B': np.random.randn(1000),
    'C': pd.date_range('2020-01-01', periods=1000)
})
```

#### Conditional column creation
```python
df['category'] = df['value'].apply(
    lambda x: 'high' if x > 100 else 'low' if x < 50 else 'medium'
)

# Or using np.select for better performance
conditions = [df['value'] > 100, df['value'] < 50]
choices = ['high', 'low']
df['category'] = np.select(conditions, choices, default='medium')
```

#### Handling large files in chunks
```python
chunk_iter = pd.read_csv('large_file.csv', chunksize=10000)
for chunk in chunk_iter:
    process(chunk)
```
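In the snippet above, `process` is a placeholder. One common shape for it is a running aggregate, sketched here with an in-memory CSV standing in for a large file (the `amount` column is hypothetical):

```python
import io
import pandas as pd

# Stand-in for a large file on disk (hypothetical data)
csv_text = "amount\n" + "\n".join(str(i) for i in range(10))

total = 0
rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk['amount'].sum()   # aggregate each chunk, keep only the running result
    rows += len(chunk)

print(total, rows)   # 45 10
```

Only one chunk is held in memory at a time, so peak memory is bounded by `chunksize` rather than by the file size.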

#### Creating bins and labels together
```python
df['age_group'] = pd.cut(
    df['age'], 
    bins=[0, 18, 35, 60, 100],
    labels=['Youth', 'Young Adult', 'Middle Age', 'Senior']
)
```
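Bin edges are right-inclusive by default, so values exactly on an edge and the lowest edge itself deserve a check. A sketch with made-up boundary ages:

```python
import pandas as pd

ages = pd.Series([0, 18, 19, 35, 36, 100])
bins = [0, 18, 35, 60, 100]
labels = ['Youth', 'Young Adult', 'Middle Age', 'Senior']

groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())    # age 0 falls outside (0, 18] and becomes NaN

# include_lowest=True makes the first bin include its left edge
groups_incl = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(groups_incl.tolist())
```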


# Complete NumPy Guide: Vectors, 2D Arrays, and 3D Arrays

## Understanding NumPy Array Dimensions

### What are Vectors, 2D Arrays, and 3D Arrays?

**1D Array (Vector)**
- Single row or column of numbers
- Shape: `(n,)` 
- Example: `[1, 2, 3, 4, 5]`
- Used for: single list of values, time series data

**2D Array (Matrix)**
- Table with rows and columns
- Shape: `(rows, columns)`
- Example: `[[1, 2, 3], [4, 5, 6]]` → 2 rows, 3 columns
- Used for: spreadsheets, images (grayscale), data tables

**3D Array (Tensor)**
- Stack of 2D arrays
- Shape: `(depth, rows, columns)`
- Example: Stack of 3 matrices
- Used for: RGB images, video frames, batches of data
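
The three cases can be compared side by side through `ndim` and `shape`. A minimal sketch (the variable names are only for illustration):

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
tensor = np.array([matrix, matrix + 6, matrix + 12])   # stack of 3 matrices

for name, arr in [('vector', vector), ('matrix', matrix), ('tensor', tensor)]:
    print(name, arr.ndim, arr.shape)
# vector 1 (5,)
# matrix 2 (2, 3)
# tensor 3 (3, 2, 3)
```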

---

## LEVEL 1: Basic Vector Operations (1D Arrays)

### Creating Vectors

**np.array() - Create from list**
```python
import numpy as np

vec = np.array([1, 2, 3, 4, 5])
# Result: [1 2 3 4 5]
# Shape: (5,)
```

**np.arange() - Range of values**
```python
vec = np.arange(0, 10, 2)
# Result: [0 2 4 6 8]
# Shape: (5,)
```

**np.linspace() - Evenly spaced values**
```python
vec = np.linspace(0, 1, 5)
# Result: [0.   0.25 0.5  0.75 1.  ]
# Shape: (5,)
```

**np.zeros() - Vector of zeros**
```python
vec = np.zeros(5)
# Result: [0. 0. 0. 0. 0.]
```

**np.ones() - Vector of ones**
```python
vec = np.ones(5)
# Result: [1. 1. 1. 1. 1.]
```

**np.random.rand() - Random vector**
```python
vec = np.random.rand(5)
# Result: [0.23 0.67 0.45 0.89 0.12]  # random values
```

### Accessing Vector Elements

**Indexing (starts from 0)**
```python
vec = np.array([10, 20, 30, 40, 50])
vec[0]      # First element: 10
vec[2]      # Third element: 30
vec[-1]     # Last element: 50
vec[-2]     # Second last: 40
```

**Slicing**
```python
vec[1:4]    # Elements from index 1 to 3: [20 30 40]
vec[:3]     # First 3 elements: [10 20 30]
vec[2:]     # From index 2 onwards: [30 40 50]
vec[::2]    # Every 2nd element: [10 30 50]
```

### Vector Operations

**Addition**
```python
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
result = vec1 + vec2
# Result: [5 7 9]
```

**Multiplication (element-wise)**
```python
result = vec1 * vec2
# Result: [4 10 18]
```

**Scalar Operations**
```python
vec = np.array([1, 2, 3, 4])
vec + 10        # Add 10 to all: [11 12 13 14]
vec * 5         # Multiply all by 5: [5 10 15 20]
vec ** 2        # Square all: [1 4 9 16]
```

**Vector Statistics**
```python
vec = np.array([1, 2, 3, 4, 5])
vec.sum()       # Sum: 15
vec.mean()      # Average: 3.0
vec.max()       # Maximum: 5
vec.min()       # Minimum: 1
vec.std()       # Standard deviation: ~1.414 (population, ddof=0)
```

**Dot Product (important for vectors!)**
```python
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
result = np.dot(vec1, vec2)
# Result: 1*4 + 2*5 + 3*6 = 32
```

---

## LEVEL 1: Basic 2D Array Operations (Matrices)

### Creating 2D Arrays

**np.array() - From nested list**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
# Shape: (2, 3) → 2 rows, 3 columns
```

**np.zeros() - 2D array of zeros**
```python
matrix = np.zeros((3, 4))
# Creates 3x4 matrix of zeros
# Shape: (3, 4)
```

**np.ones() - 2D array of ones**
```python
matrix = np.ones((2, 5))
# Creates 2x5 matrix of ones
```

**np.eye() - Identity matrix**
```python
matrix = np.eye(3)
# Result:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```

**np.random.rand() - Random 2D array**
```python
matrix = np.random.rand(3, 4)
# Creates 3x4 matrix with random values
```

**np.arange().reshape() - Range reshaped**
```python
matrix = np.arange(12).reshape(3, 4)
# Result:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
```

### Understanding 2D Array Shape

```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

matrix.shape        # (2, 3) → 2 rows, 3 columns
matrix.ndim         # 2 → 2 dimensions
matrix.size         # 6 → total elements
```

### Accessing 2D Array Elements

**Single Element**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

matrix[0, 0]    # First row, first column: 1
matrix[1, 2]    # Second row, third column: 6
matrix[-1, -1]  # Last row, last column: 9
```

**Row Access**
```python
matrix[0]       # First row: [1 2 3]
matrix[1]       # Second row: [4 5 6]
matrix[-1]      # Last row: [7 8 9]
```

**Column Access**
```python
matrix[:, 0]    # First column: [1 4 7]
matrix[:, 1]    # Second column: [2 5 8]
matrix[:, -1]   # Last column: [3 6 9]
```

**Submatrix (Slicing)**
```python
matrix[0:2, 1:3]    # First 2 rows, columns 1-2
# Result:
# [[2 3]
#  [5 6]]

matrix[:2, :]       # First 2 rows, all columns
matrix[:, 1:]       # All rows, from column 1 onwards
```

### 2D Array Operations

**Row-wise Operations (axis=1)**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

matrix.sum(axis=1)      # Sum each row: [6 15]
matrix.mean(axis=1)     # Mean of each row: [2. 5.]
matrix.max(axis=1)      # Max in each row: [3 6]
```

**Column-wise Operations (axis=0)**
```python
matrix.sum(axis=0)      # Sum each column: [5 7 9]
matrix.mean(axis=0)     # Mean of each column: [2.5 3.5 4.5]
matrix.min(axis=0)      # Min in each column: [1 2 3]
```

**Matrix Addition**
```python
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A + B
# Result:
# [[ 6  8]
#  [10 12]]
```

**Matrix Multiplication**
```python
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.dot(A, B)  # or A @ B
# Result:
# [[19 22]
#  [43 50]]
```

**Transpose**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
transposed = matrix.T
# Result:
# [[1 4]
#  [2 5]
#  [3 6]]
```

---

## LEVEL 1: Basic 3D Array Operations

### Creating 3D Arrays

**np.zeros() - 3D array of zeros**
```python
arr_3d = np.zeros((2, 3, 4))
# Shape: (2, 3, 4)
# 2 layers, 3 rows, 4 columns
```

**np.ones() - 3D array of ones**
```python
arr_3d = np.ones((3, 2, 5))
# 3 layers, 2 rows, 5 columns
```

**np.arange().reshape() - Range to 3D**
```python
arr_3d = np.arange(24).reshape(2, 3, 4)
# Creates 2x3x4 array with values 0-23
# Shape: (2, 3, 4)
```

**From nested lists**
```python
arr_3d = np.array([
    [[1, 2], [3, 4]],          # First layer
    [[5, 6], [7, 8]]           # Second layer
])
# Shape: (2, 2, 2)
```

### Understanding 3D Array Structure

```python
arr_3d = np.arange(24).reshape(2, 3, 4)
# Shape: (2, 3, 4)
# Think of it as: 2 matrices, each with 3 rows and 4 columns

print(arr_3d.shape)     # (2, 3, 4)
print(arr_3d.ndim)      # 3
print(arr_3d.size)      # 24
```

### Accessing 3D Array Elements

**Single Element**
```python
arr_3d = np.arange(24).reshape(2, 3, 4)
arr_3d[0, 0, 0]     # First layer, first row, first column
arr_3d[1, 2, 3]     # Second layer, third row, fourth column
```

**Getting Layers**
```python
arr_3d[0]           # First complete layer (3x4 matrix)
arr_3d[1]           # Second complete layer (3x4 matrix)
```

**Getting Rows from specific layer**
```python
arr_3d[0, 0]        # First row of first layer
arr_3d[1, 2]        # Third row of second layer
```

**Getting Columns**
```python
arr_3d[:, :, 0]     # First column from all layers
arr_3d[0, :, 1]     # Second column from first layer
```

### 3D Array Operations

**Operations along different axes**
```python
arr_3d = np.arange(24).reshape(2, 3, 4)

# axis=0 → across layers
arr_3d.sum(axis=0)      # Sum across layers → (3, 4) result

# axis=1 → across rows (within each layer)
arr_3d.sum(axis=1)      # Sum across rows → (2, 4) result

# axis=2 → across columns (within each row)
arr_3d.sum(axis=2)      # Sum across columns → (2, 3) result
```

**Flattening 3D to 1D**
```python
arr_3d = np.arange(8).reshape(2, 2, 2)
flat = arr_3d.flatten()
# Result: [0 1 2 3 4 5 6 7]
```

---

## LEVEL 2: Intermediate Vector/Matrix Operations

### Vector Advanced Operations

**Vector Magnitude (Length)**
```python
vec = np.array([3, 4])
magnitude = np.linalg.norm(vec)
# Result: 5.0 (because sqrt(3² + 4²) = 5)
```

**Unit Vector (Normalized)**
```python
vec = np.array([3, 4])
unit_vec = vec / np.linalg.norm(vec)
# Result: [0.6 0.8]
```

**Cross Product (3D vectors only)**
```python
vec1 = np.array([1, 0, 0])
vec2 = np.array([0, 1, 0])
cross = np.cross(vec1, vec2)
# Result: [0 0 1]
```

**Outer Product**
```python
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5])
outer = np.outer(vec1, vec2)
# Result:
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
```

### 2D Array Reshaping

**Reshape**
```python
arr = np.arange(12)
matrix = arr.reshape(3, 4)      # 3x4 matrix
matrix = arr.reshape(4, 3)      # 4x3 matrix
matrix = arr.reshape(2, 6)      # 2x6 matrix
matrix = arr.reshape(2, -1)     # 2 rows, auto-calculate columns
```

**Flatten vs Ravel**
```python
matrix = np.array([[1, 2], [3, 4]])
flat1 = matrix.flatten()    # Creates copy
flat2 = matrix.ravel()      # Creates view (when possible)
```
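The copy/view distinction shows up as soon as the flattened array is modified. A small sketch:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
flat_copy = matrix.flatten()
flat_view = matrix.ravel()

flat_copy[0] = 99     # does not touch matrix
flat_view[1] = -1     # writes through to matrix

print(matrix)
# [[ 1 -1]
#  [ 3  4]]
print(np.shares_memory(matrix, flat_copy), np.shares_memory(matrix, flat_view))
# False True
```

For non-contiguous input (for example a transposed array), `ravel()` has to copy as well, so `np.shares_memory` is the reliable way to check.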

### Stacking and Splitting

**Vertical Stack (vstack) - Stack rows**
```python
arr1 = np.array([[1, 2, 3]])
arr2 = np.array([[4, 5, 6]])
result = np.vstack((arr1, arr2))
# Result:
# [[1 2 3]
#  [4 5 6]]
```

**Horizontal Stack (hstack) - Stack columns**
```python
arr1 = np.array([[1], [2]])
arr2 = np.array([[3], [4]])
result = np.hstack((arr1, arr2))
# Result:
# [[1 3]
#  [2 4]]
```

**Concatenate**
```python
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

# Along rows (axis=0)
result = np.concatenate((arr1, arr2), axis=0)
# Result:
# [[1 2]
#  [3 4]
#  [5 6]]

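# Along columns (axis=1): row counts must match
arr3 = np.array([[5], [6]])
result = np.concatenate((arr1, arr3), axis=1)
# Result:
# [[1 2 5]
#  [3 4 6]]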
```

**Split**
```python
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

# Split into 2 parts along columns
left, right = np.hsplit(matrix, 2)
# left: [[1 2], [5 6]]
# right: [[3 4], [7 8]]
```

### Matrix Operations

**Determinant**
```python
matrix = np.array([[1, 2], [3, 4]])
det = np.linalg.det(matrix)
# Result: approximately -2.0 (subject to floating-point rounding)
```

**Inverse**
```python
matrix = np.array([[1, 2], [3, 4]])
inv = np.linalg.inv(matrix)
# Result (approximately):
# [[-2.   1. ]
#  [ 1.5 -0.5]]
```

**Eigenvalues and Eigenvectors**
```python
matrix = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)
```
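The returned eigenvectors are stored as columns, which is easy to get wrong. A sketch that checks the defining relation `A v = λ v` for each pair:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Column i of eigenvectors pairs with eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(matrix @ v, eigenvalues[i] * v))   # True

# Sum of eigenvalues equals the trace; product equals the determinant
print(np.isclose(eigenvalues.sum(), np.trace(matrix)))
print(np.isclose(eigenvalues.prod(), np.linalg.det(matrix)))
```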

**Matrix Power**
```python
matrix = np.array([[1, 2], [3, 4]])
squared = np.linalg.matrix_power(matrix, 2)
# Same as: matrix @ matrix
```

---

## LEVEL 2: Intermediate 3D Operations

### Stacking 2D Arrays into 3D

**Stack along new axis**
```python
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Stack to create depth
arr_3d = np.stack((matrix1, matrix2), axis=0)
# Shape: (2, 2, 2)
```

**dstack - Depth stack**
```python
arr_3d = np.dstack((matrix1, matrix2))
# Stacks along third dimension
```
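Both calls give shape `(2, 2, 2)` here, but the inputs end up along different axes. A sketch that shows where each original matrix can be found:

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

stacked = np.stack((matrix1, matrix2), axis=0)
depth = np.dstack((matrix1, matrix2))

print(np.array_equal(stacked[0], matrix1))        # True: inputs lie along the first axis
print(np.array_equal(depth[:, :, 0], matrix1))    # True: inputs lie along the last axis
print(np.array_equal(depth, np.stack((matrix1, matrix2), axis=2)))   # True
```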

### Transposing 3D Arrays

```python
arr_3d = np.arange(24).reshape(2, 3, 4)
# Original shape: (2, 3, 4)

transposed = np.transpose(arr_3d, (1, 0, 2))
# New shape: (3, 2, 4)
# Axes reordered: new axis 0 = old axis 1, new axis 1 = old axis 0, axis 2 unchanged
```

### Swapping Axes

```python
arr_3d = np.arange(24).reshape(2, 3, 4)
swapped = np.swapaxes(arr_3d, 0, 1)
# Swaps first and second axes
# New shape: (3, 2, 4)
```

---

## LEVEL 3: Advanced Operations

### Broadcasting

**What is Broadcasting?**
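Broadcasting lets NumPy combine arrays of different shapes without copying data. Shapes are compared from the last dimension backward; two dimensions are compatible when they are equal or one of them is 1, and a missing leading dimension is treated as 1.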

**1D + 2D Broadcasting**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
vec = np.array([10, 20, 30])

result = matrix + vec
# vec is broadcast to each row
# Result:
# [[11 22 33]
#  [14 25 36]]
```

**Column Broadcasting**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
vec = np.array([[10], [20]])

result = matrix + vec
# Result:
# [[11 12 13]
#  [24 25 26]]
```
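Whether two shapes broadcast can be checked without building the arrays. This sketch assumes NumPy 1.20 or later for `np.broadcast_shapes`, and also shows a failing case with a fix (`bad_vec` is a made-up name):

```python
import numpy as np

print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)
print(np.broadcast_shapes((2, 3), (2, 1)))   # (2, 3)

matrix = np.ones((2, 3))
bad_vec = np.array([10, 20])                 # shape (2,): trailing sizes 3 vs 2 clash
try:
    matrix + bad_vec
    failed = False
except ValueError:
    failed = True
print(failed)                                # True

# Adding a trailing axis turns shape (2,) into (2, 1), which broadcasts per row
fixed = matrix + bad_vec[:, np.newaxis]
print(fixed.shape)                           # (2, 3)
```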

### Advanced Indexing

**Boolean Masking (2D)**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Get all values > 5
result = matrix[matrix > 5]
# Result: [6 7 8 9]

# Set all values > 5 to 0
matrix[matrix > 5] = 0
```

**Fancy Indexing (2D)**
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Get specific rows
rows = matrix[[0, 2]]
# Result: [[1 2 3], [7 8 9]]

# Get specific elements
elements = matrix[[0, 1, 2], [0, 1, 2]]
# Gets: (0,0), (1,1), (2,2) → diagonal
# Result: [1 5 9]
```

### Advanced 3D Operations

**Meshgrid - Creating coordinate arrays**
```python
x = np.array([1, 2, 3])
y = np.array([4, 5])
X, Y = np.meshgrid(x, y)
# X: [[1 2 3]
#      [1 2 3]]
# Y: [[4 4 4]
#      [5 5 5]]
```

**Einstein Summation (einsum)**
```python
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.einsum('ij,jk->ik', A, B)
# Same as: A @ B

# Trace (sum of diagonal)
trace = np.einsum('ii', A)

# Batch matrix multiplication (3D)
A_batch = np.random.rand(10, 3, 4)  # 10 matrices of 3x4
B_batch = np.random.rand(10, 4, 5)  # 10 matrices of 4x5
C_batch = np.einsum('bij,bjk->bik', A_batch, B_batch)
# Result: 10 matrices of 3x5
```
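Each `einsum` expression above has a conventional equivalent, which gives a simple way to check the subscript strings. A sketch using a seeded generator for the batch data:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(np.array_equal(np.einsum('ij,jk->ik', A, B), A @ B))   # True
print(np.einsum('ii', A) == np.trace(A))                     # True (both are 5)

rng = np.random.default_rng(0)
A_batch = rng.random((10, 3, 4))
B_batch = rng.random((10, 4, 5))
C_batch = np.einsum('bij,bjk->bik', A_batch, B_batch)
print(C_batch.shape)                                         # (10, 3, 5)
print(np.allclose(C_batch, A_batch @ B_batch))               # True: @ also batches
```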

### Tensor Operations (3D+)

**Batched Operations**
```python
# Process multiple images at once
images = np.random.rand(32, 224, 224, 3)
# 32 images, 224x224 pixels, 3 color channels

# Mean across color channels
gray_images = images.mean(axis=3)
# Result: (32, 224, 224)
```

**4D Arrays (Common in Deep Learning)**
```python
# Batch of RGB images
batch = np.random.rand(32, 64, 64, 3)
# Shape: (batch_size, height, width, channels)

print(batch.shape)      # (32, 64, 64, 3)
print(batch[0].shape)   # (64, 64, 3) - First image
```

---

## Quick Reference: Axis Understanding

### For 2D Arrays (Matrix)
```python
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# axis=0 → Down the rows (column-wise operation)
matrix.sum(axis=0)  # [5 7 9] - sum each column

# axis=1 → Across the columns (row-wise operation)
matrix.sum(axis=1)  # [6 15] - sum each row
```

### For 3D Arrays
```python
arr_3d = np.arange(24).reshape(2, 3, 4)
# Shape: (2, 3, 4)

# axis=0 → across depth (layers)
# axis=1 → across rows
# axis=2 → across columns
```

### Memory Rule
- **axis=0** → First dimension (outermost)
- **axis=1** → Second dimension (middle)
- **axis=2** → Third dimension (innermost)
- **axis=-1** → Last dimension
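
One way to internalize the rule: the axis you reduce over is the one that disappears from the shape. A short sketch; the `keepdims` argument is an addition not covered above, and it keeps the reduced axis with length 1.

```python
import numpy as np

arr_3d = np.arange(24).reshape(2, 3, 4)

for axis in (0, 1, 2, -1):
    print(axis, arr_3d.sum(axis=axis).shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)
# -1 (2, 3)

kept = arr_3d.sum(axis=1, keepdims=True)
print(kept.shape)   # (2, 1, 4)
```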

---

## Common Use Cases

### Image as 2D Array (Grayscale)
```python
image = np.random.randint(0, 256, (100, 100))
# 100x100 grayscale image
# Each value is pixel intensity (0-255)
```

### Image as 3D Array (RGB)
```python
image = np.random.randint(0, 256, (100, 100, 3))
# 100x100 RGB image
# Shape: (height, width, channels)
# channels: 0=Red, 1=Green, 2=Blue
```

### Video as 4D Array
```python
video = np.random.randint(0, 256, (30, 1080, 1920, 3), dtype=np.uint8)
# Shape: (frames, height, width, channels)
# 30 frames, 1920x1080 resolution (width x height), RGB; uint8 keeps memory manageable
```

### Dataset as 2D Array
```python
data = np.array([[25, 170, 1],    # age, height, gender
                 [30, 165, 0],
                 [28, 180, 1]])
# Shape: (samples, features)
# Each row is one data sample
```

This guide covers the core operations on vectors, 2D arrays, and 3D arrays, from creation and indexing to broadcasting and batched tensor operations.