---
# **Pandas CheatSheet**
---

---
## **Pandas Series**
---

### **1. 🛠️ Series Creation**
- `pd.Series(data=None, index=None, dtype=None, name=None, copy=False)` *(the internal `fastpath` argument is deprecated since pandas 2.1)*
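
A minimal sketch of the constructor above, assuming illustrative data (the names `prices` and `stock` are made up for this example):

```python
import pandas as pd

# From a list with explicit labels; dtype and name are optional.
prices = pd.Series([1.5, 2.0, 3.25], index=["a", "b", "c"], name="price")

# From a dict: the keys become the index labels.
stock = pd.Series({"apple": 3, "pear": 0})

print(prices)
print(stock.index.tolist())
```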

---

### **2. 🔍 Accessing Data**
- `Series.iloc[]`
- `Series.loc[]`
- `Series.at[]`
- `Series.iat[]`
- `Series.get(key, default=None)`
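
The accessors above differ mainly in whether they use labels or positions. A small sketch with made-up data:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["x", "y", "z"])

by_label = s.loc["y"]              # label-based lookup
by_position = s.iloc[0]            # position-based lookup
fast_label = s.at["z"]             # scalar access by label
fast_position = s.iat[1]           # scalar access by position
missing = s.get("q", default=-1)   # no KeyError for an absent label

# Label slices include the stop label; positional slices exclude the stop.
label_slice = s.loc["x":"y"]
pos_slice = s.iloc[0:1]
```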

---

### **3. ✏️ Modifying Values**
- `Series.at[] =`
- `Series.iat[] =`
- `Series.loc[] =`
- `Series.iloc[] =`
- `Series.update(other)`
- `Series.where(cond, other=nan, inplace=False, axis=None, level=None)` *(`errors` and `try_cast` were removed in pandas 2.0)*
- `Series.mask(cond, other=nan, inplace=False, axis=None, level=None)` *(`errors` and `try_cast` were removed in pandas 2.0)*
- `Series.replace(to_replace=None, value=None, inplace=False, regex=False)` *(`limit` and `method` are deprecated since pandas 2.1)*
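
`where` and `mask` are mirror images of each other, which is easy to mix up. A short sketch with illustrative values:

```python
import pandas as pd

s = pd.Series([1, 5, 10, 15])

kept = s.where(s > 4, other=0)          # keep values where cond is True
hidden = s.mask(s > 4, other=0)         # replace values where cond is True
swapped = s.replace({5: 50, 15: 150})   # value-to-value mapping
```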

---

### **4. ➕ Mathematical Operations**
- `Series.add(other, level=None, fill_value=None)`
- `Series.sub(other, level=None, fill_value=None)`
- `Series.mul(other, level=None, fill_value=None)`
- `Series.div(other, level=None, fill_value=None)`
- `Series.truediv(other, level=None, fill_value=None)`
- `Series.floordiv(other, level=None, fill_value=None)`
- `Series.mod(other, level=None, fill_value=None)`
- `Series.pow(other, level=None, fill_value=None)`
- `Series.radd(), rsub(), rmul(), rdiv(), rfloordiv(), rmod(), rpow()` (same params)
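
The method forms matter mostly because of `fill_value`, which the plain operators do not offer. A sketch with made-up Series:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

plain = a + b                     # "x" has no partner, so the result is NaN
filled = a.add(b, fill_value=0)   # the missing side is treated as 0
reverse = a.rsub(10)              # computes 10 - a
```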

---

### **5. 🧩 Handling Missing Values**
- `Series.isna()`
- `Series.notna()`
- `Series.fillna(value=None, axis=None, inplace=False, limit=None)` *(`method` is deprecated since pandas 2.1, use `Series.ffill()` / `Series.bfill()`; `downcast` is deprecated since 2.2)*
- `Series.dropna(axis=0, inplace=False, how=None)`
- `Series.replace(...)` (also handles NaNs)
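
A minimal sketch of the missing-value helpers above, using `ffill()` rather than the `method=` argument (illustrative data):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

n_missing = int(s.isna().sum())   # count the gaps
zero_filled = s.fillna(0)         # constant fill
forward = s.ffill()               # carry the last valid value forward
dropped = s.dropna()              # drop the gaps, keep original labels
```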

---

### **6. 📊 Ranking & Sorting**
- `Series.sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)`
- `Series.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)`
- `Series.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)`
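
How ties are ranked depends on `method`. A sketch with made-up values containing one tie:

```python
import pandas as pd

s = pd.Series([30, 10, 20, 10], index=["d", "a", "c", "b"])

by_value = s.sort_values(ascending=False)
by_label = s.sort_index()
avg_rank = s.rank(method="average")   # ties share the mean of their ranks
min_rank = s.rank(method="min")       # ties share the lowest rank
```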

---

### **7. 🛠️ Apply Custom Functions**
- `Series.map(arg, na_action=None)`
- `Series.apply(func, args=(), **kwargs)` *(`convert_dtype` is deprecated since pandas 2.1)*
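
A short sketch contrasting `map` (dict or function, element by element) with `apply` (function plus extra arguments). The helper `scale` is hypothetical:

```python
import pandas as pd

codes = pd.Series(["a", "b", None])

# map with a dict: keys that are missing from the dict become NaN.
names = codes.map({"a": "alpha", "b": "beta"})

# map with a function, skipping missing values.
upper = codes.map(str.upper, na_action="ignore")

def scale(x, factor):
    # Hypothetical helper used only for this example.
    return x * factor

# apply forwards extra positional arguments to the function.
scaled = pd.Series([1, 2, 3]).apply(scale, args=(10,))
```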

---

### **8. 🔤 String Operations**
(Use via `.str`)
- `.str.lower(), upper(), title(), capitalize(), swapcase(), strip(), lstrip(), rstrip()`
- `.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)`
- `.str.contains(pat, case=True, flags=0, na=None, regex=True)`
- `.str.startswith(pat, na=None)`
- `.str.endswith(pat, na=None)`
- `.str.findall(pat, flags=0)`
- `.str.extract(pat, flags=0, expand=True)`
- `.str.get(i)`
- `.str.len()`
- `.str.slice(start=None, stop=None, step=None)`
- `.str.split(pat=None, n=-1, expand=False)`
- `.str.cat(others=None, sep=None, na_rep=None, join='left')`
- `.str.pad(width, side='left', fillchar=' ')`
- `.str.zfill(width)`
- `.str.repeat(repeats)`
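
`.str` methods chain and propagate missing values. A sketch on made-up names:

```python
import pandas as pd

s = pd.Series(["  Alice Smith ", "bob jones", None])

clean = s.str.strip().str.title()
has_smith = clean.str.contains("Smith", na=False)    # treat missing as False
parts = clean.str.split(" ", expand=True)            # one column per token
initials = clean.str.extract(r"^(\w)\w* (\w)", expand=True)
padded = pd.Series(["7", "42"]).str.zfill(3)
```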

---

### **9. 🔢 Indexing**
- `Series.index`
- `Series.reindex(index=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)`
- `Series.reset_index(level=None, drop=False, name=None, inplace=False)`
- `Series.set_axis(labels, axis=0)` *(`inplace` was removed in pandas 2.0)*

---

### **10. 🔗 Combining Multiple Series**
- `pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)`
- `Series.append(to_append, ignore_index=False, verify_integrity=False)` *(Removed in pandas 2.0; use `pd.concat`)*
- `Series.combine(other, func, fill_value=None)`
- `Series.combine_first(other)`
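
A sketch of the three combining styles above with made-up Series; `pd.concat` stands in for the removed `Series.append`:

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan], index=["x", "y"])
b = pd.Series([10.0, 20.0, 30.0], index=["x", "y", "z"])
c = pd.Series([5.0, 50.0], index=["x", "y"])

stacked = pd.concat([a, b], ignore_index=True)   # end-to-end, fresh 0..n-1 index
patched = a.combine_first(b)                     # fill a's gaps from b
largest = c.combine(b, max, fill_value=0)        # element-wise custom rule
```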

---

### **11. 📈 Statistical Functions**
- `Series.mean(axis=None, skipna=True, numeric_only=False)`
- `Series.median(axis=None, skipna=True, numeric_only=False)`
- `Series.mode(dropna=True)`
- `Series.std(axis=None, skipna=True, ddof=1, numeric_only=False)`
- `Series.var(axis=None, skipna=True, ddof=1, numeric_only=False)`
- `Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0)`
- `Series.min(axis=None, skipna=True, numeric_only=False)`
- `Series.max(axis=None, skipna=True, numeric_only=False)`
- `Series.idxmin(axis=0, skipna=True)`
- `Series.idxmax(axis=0, skipna=True)`
- `Series.cumsum(axis=None, skipna=True)`
- `Series.cumprod(axis=None, skipna=True)`
- `Series.cummax(axis=None, skipna=True)`
- `Series.cummin(axis=None, skipna=True)`
- `Series.describe(percentiles=None, include=None, exclude=None)` *(`datetime_is_numeric` was removed in pandas 2.0)*

---

### **12. 🏷️ Label Alignment**
(Applies to operations like addition/subtraction)
- `Series.align(other, join='outer', axis=None, level=None, copy=True, fill_value=None)` *(`method`, `limit`, `fill_axis`, and `broadcast_axis` are deprecated since pandas 2.1)*
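
`align` makes explicit the label alignment that arithmetic performs implicitly. A sketch with made-up Series:

```python
import pandas as pd

left = pd.Series([1, 2], index=["a", "b"])
right = pd.Series([10, 20], index=["b", "c"])

# Outer join: both results share the union of the labels, gaps become NaN.
outer_l, outer_r = left.align(right, join="outer")

# Inner join: both results keep only the shared labels.
inner_l, inner_r = left.align(right, join="inner")
```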

---

### **13. 📡 Broadcasting**
(Implicit in arithmetic & comparison operations)
- `Series.add()`, `sub()`, `mul()`, etc. (with `fill_value`)
- Supports broadcasting with scalar, Series, or aligned index objects.

---

## **🏷️ Pandas Label (Index)**

---

### **0. 🏗️ Creating Labels (Index)**
#### 🔹 **For Series**
- `pd.Series(data, index=...)`

#### 🔹 **For DataFrame**
- `pd.DataFrame(data, index=..., columns=...)`

#### 🔹 **Dedicated Index Creation (all types)**
- `pd.Index(data=None, dtype=None, copy=False, name=None)`
- `pd.RangeIndex(start=0, stop=None, step=1, name=None)`
- `pd.MultiIndex.from_arrays(arrays, sortorder=None, names=None)`
- `pd.MultiIndex.from_tuples(tuples, sortorder=None, names=None)`
- `pd.MultiIndex.from_product(iterables, sortorder=None, names=None)`
- `pd.CategoricalIndex(data, dtype=None, copy=False, name=None, categories=None, ordered=None)`
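
A minimal sketch of a few of the constructors above; the level names and values are illustrative:

```python
import pandas as pd

plain = pd.Index(["a", "b", "c"], name="key")
rng = pd.RangeIndex(start=0, stop=6, step=2)

# Cartesian product of the two lists: 2 years x 2 quarters = 4 entries.
mi = pd.MultiIndex.from_product(
    [["2023", "2024"], ["Q1", "Q2"]], names=["year", "quarter"]
)

s = pd.Series(range(4), index=mi)
q1_2024 = s.loc[("2024", "Q1")]   # select with a full tuple key
```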

---

### **1. 🔍 Accessing Index (Labels)**
- `obj.index`

---

### **2. ✏️ Modifying Index**
- `obj.set_axis(labels, axis=0)` *(`inplace` was removed in pandas 2.0)*
- `obj.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)` *(DataFrame only)*
- `obj.reset_index(level=None, drop=False, name=None, inplace=False)`  
- `obj.rename_axis(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)`

---

### **3. 🔄 Reindexing**
- `obj.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)`

---

### **4. 🧭 Index Properties**
- `.index.name`
- `.index.names` *(for MultiIndex)*
- `.index.dtype`
- `.index.size`
- `.index.shape`
- `.index.nbytes`
- `.index.ndim`
- `.index.empty`
- `.index.hasnans`
- `.index.is_monotonic` *(Removed in pandas 2.0; use `.index.is_monotonic_increasing`)*
- `.index.is_monotonic_increasing`
- `.index.is_monotonic_decreasing`
- `.index.is_unique`

---

### **5. 🛠️ Index Manipulation**
- `obj.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)`
- `obj.swaplevel(i=-2, j=-1, axis=0)` *(for MultiIndex)*
- `obj.reorder_levels(order, axis=0)` *(for MultiIndex)*
- `obj.droplevel(level, axis=0)` *(for MultiIndex)*
- `obj.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')`
- `obj.rename_axis(...)` *(see above)*

---

### **6. 🧩 Index Alignment & Joining**
- `obj.align(other, join='outer', axis=None, level=None, copy=True, fill_value=None)` *(`method`, `limit`, `fill_axis`, and `broadcast_axis` are deprecated since pandas 2.1)*
- `obj.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)` *(DataFrame only)*
- `obj.combine_first(other)`  
- `pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)`

---

### **7. 🔢 Index Comparison and Set Logic**
- `.equals(other)`
- `.isin(values, level=None)`
- `.intersection(other, sort=False)`
- `.union(other, sort=None)`
- `.difference(other, sort=None)`
- `.symmetric_difference(other, result_name=None, sort=None)`
- `.isna()`
- `.notna()`
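
The set-logic methods above behave like their Python `set` counterparts but return an `Index`. A sketch with made-up labels:

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

common = left.intersection(right)
either = left.union(right)
only_left = left.difference(right)
exclusive = left.symmetric_difference(right)
membership = left.isin(["a", "z"])   # boolean NumPy array
```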

---

### **8. 🔍 Indexing with Labels**
- `obj.loc[label]`
- `obj.loc[start:stop]`
- `obj.at[label]`
- `obj.get(key, default=None)` *(Series)*

---

## **🧾Pandas DataFrame** 

---

### **1️⃣ All the Ways of Pandas DataFrame Creation**

- `pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)`
- `pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, ...)`
- `pd.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, ...)`
- `pd.DataFrame.from_dict(data, orient='columns', dtype=None)`
- `pd.DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)`

---

### **2️⃣ Accessing the Data in Pandas DataFrame**

- `df['col']`
- `df.col`
- `df.loc[row_label, col_label]`
- `df.iloc[row_idx, col_idx]`
- `df.at[label, column]`
- `df.iat[i, j]`
- `df.get(key, default=None)`

---

### **3️⃣ Modifying the Values in the DataFrame**

- `df.at[label, column] = value`
- `df.iat[i, j] = value`
- `df.loc[row_labels, column_labels] = value`
- `df.iloc[rows, cols] = value`
- `df.replace(to_replace=None, value=None, inplace=False, regex=False)` *(`limit` and `method` are deprecated since pandas 2.1)*
- `df.update(other, join='left', overwrite=True, filter_func=None, errors='ignore')`
- `df.set_value(index, col, value, takeable=False)` *(Removed in pandas 1.0; use `df.at[...]` or `df.iat[...]`)*

---

### **4️⃣ Filtering and Boolean Indexing**

- `df[df['col'] > value]`
- `df.query(expr, inplace=False, **kwargs)`
- `df.where(cond, other=nan, inplace=False, axis=None, level=None)` *(`errors` and `try_cast` were removed in pandas 2.0)*
- `df.mask(cond, other=nan, inplace=False, axis=None, level=None)` *(`errors` and `try_cast` were removed in pandas 2.0)*

---

### **5️⃣ Grouping and Aggregating Data**

- `df.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)` *(`axis` is deprecated since pandas 2.1)*
- `.agg(func=None, *args, **kwargs)` *(on the GroupBy object)*
- `.transform(func, *args, **kwargs)` *(on the GroupBy object)*
- `.apply(func, *args, **kwargs)` *(on the GroupBy object)*
- `.filter(func, dropna=True, *args, **kwargs)` *(on the GroupBy object)*
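
The GroupBy methods above differ in the shape of what they return. A sketch on a made-up table:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 3, 10, 20, 30],
})

grouped = df.groupby("team")["score"]

summary = grouped.agg(["mean", "max"])               # one row per group
centered = df["score"] - grouped.transform("mean")   # same length as df
big_teams = df.groupby("team").filter(lambda g: len(g) >= 3)  # keep whole groups
```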

---

### **6️⃣ Mathematical Operations in Pandas DataFrame**

- `.add(other, axis='columns', level=None, fill_value=None)`
- `.sub(other, axis='columns', level=None, fill_value=None)`
- `.mul(other, axis='columns', level=None, fill_value=None)`
- `.div(other, axis='columns', level=None, fill_value=None)`
- `.pow(other, axis='columns', level=None, fill_value=None)`
- `.abs()`
- `.round(decimals=0)`
- `.clip(lower=None, upper=None, axis=None, inplace=False)`

---

### **7️⃣ Handling the Missing Values in Pandas DataFrame**

- `.isna()`
- `.notna()`
- `.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)`
- `.fillna(value=None, axis=None, inplace=False, limit=None)` *(`method` is deprecated since pandas 2.1, use `.ffill()` / `.bfill()`; `downcast` is deprecated since 2.2)*
- `.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None)`

---

### **8️⃣ Searching and Sorting in Pandas DataFrame**

- `.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)`
- `.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)`
- `.idxmax(axis=0, skipna=True)`
- `.idxmin(axis=0, skipna=True)`
- `.nlargest(n, columns, keep='first')`
- `.nsmallest(n, columns, keep='first')`

---

### **9️⃣ Apply Custom Functions with `map`, `apply`, `applymap`**

- `.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)`
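- `.applymap(func)` *(element-wise on DataFrame; deprecated since pandas 2.1 in favor of `DataFrame.map`)*
- `.map(func, na_action=None)` *(Series; also element-wise on DataFrame since pandas 2.1)*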

---

### **🔟 String and Datetime Operations in Pandas DataFrame**

#### 🧵 String Operations:
- `.str.upper()`, `.str.lower()`, `.str.strip()`, `.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)` *(`regex` defaults to `False` since pandas 2.0)*
- `.str.contains(pat, case=True, flags=0, na=None, regex=True)`
- `.str.startswith(pat)`, `.str.endswith(pat)`
- `.str.extract(pat, flags=0, expand=True)`

#### 🕒 Datetime Operations:
- `.dt.year`, `.dt.month`, `.dt.day`, `.dt.hour`, `.dt.minute`, `.dt.second`
- `.dt.strftime(format)`
- `.dt.weekday`, `.dt.dayofweek`, `.dt.dayofyear`, `.dt.is_month_end`

---

### **1️⃣1️⃣ Pandas DataFrame Indexing**

- `df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)`
- `df.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')`
- `df.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)`

---

### **1️⃣2️⃣ Multiple DataFrame Combining**

- `pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)`
- `df.append(other, ignore_index=False, verify_integrity=False, sort=False)` *(Removed in pandas 2.0; use `pd.concat`)*
- `df.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)`
- `df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)`
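
A small sketch of `merge` and `concat` on two made-up tables; `indicator=True` shows where each row came from:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": ["a", "b", "z"]})
customers = pd.DataFrame({"cust": ["a", "b", "c"], "city": ["Oslo", "Rome", "Lima"]})

inner = orders.merge(customers, on="cust", how="inner")   # matching keys only
left = orders.merge(customers, on="cust", how="left", indicator=True)
stacked = pd.concat([orders, orders], ignore_index=True)  # row-wise stacking
```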

---

### **1️⃣3️⃣ Pandas DataFrame Statistical Functions**

- `.mean(axis=0, skipna=True, numeric_only=False)`
- `.median(axis=0, skipna=True, numeric_only=False)`
- `.mode(axis=0, numeric_only=False, dropna=True)`
- `.std(axis=0, skipna=True, ddof=1, numeric_only=False)`
- `.var(axis=0, skipna=True, ddof=1, numeric_only=False)`
- `.sum(axis=0, skipna=True, numeric_only=False, min_count=0)`
- `.min(axis=0, skipna=True, numeric_only=False)`
- `.max(axis=0, skipna=True, numeric_only=False)`
- `.count(axis=0, numeric_only=False)`
- `.describe(percentiles=None, include=None, exclude=None)`

---

### **1️⃣4️⃣ Pandas DataFrame MultiIndexing**

- `pd.MultiIndex.from_arrays(arrays, sortorder=None, names=None)`
- `pd.MultiIndex.from_tuples(tuples, sortorder=None, names=None)`
- `pd.MultiIndex.from_product(iterables, sortorder=None, names=None)`
- `df.set_index([...])`
- `.swaplevel(i=-2, j=-1, axis=0)`
- `.reorder_levels(order, axis=0)`
- `.droplevel(level, axis=0)`
- `.sort_index(level=None, ...)`

---

### **1️⃣5️⃣ Pandas DataFrame Missing and Filling Data**

- `.isna()`, `.notna()`
- `.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)`
- `.fillna(value=None, axis=None, inplace=False, limit=None)` *(`method` is deprecated since pandas 2.1, use `.ffill()` / `.bfill()`)*
- `.replace(to_replace=None, value=None, inplace=False, regex=False)` *(`limit` and `method` are deprecated since pandas 2.1)*
- `.interpolate(...)` *(same as above)*

---

## **🔢Pandas DataFrame: Data Types Handling**

---

### **🧬 Data Type Checking & Conversion**

#### ✅ Checking Data Types
- `df.dtypes`  
  → Returns the data type of each column.

- `df.get_dtype_counts()` *(Removed in pandas 1.0; use `df.dtypes.value_counts()`)*  
  → Counts of dtypes in the DataFrame.

- `df.infer_objects()`  
  → Attempts to infer better dtypes (e.g., `object` → `int`, `float`, etc.)

- `df.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)`  
  → Converts to best possible dtypes using new extension dtypes.

---

#### 🔄 Changing/Converting Data Types
- `df.astype(dtype, copy=True, errors='raise')`  
  - `dtype`: dict or single type to convert to  
  - `copy`: whether to return a copy even when no conversion is needed; `astype` never modifies the object in place  
  - `errors`: {‘raise’, ‘ignore’} — raise error or ignore invalid conversion

- `pd.to_numeric(arg, errors='raise', downcast=None)`  
  - `arg`: Series, array, list  
  - `errors`: {‘raise’, ‘coerce’, ‘ignore’} (`'ignore'` is deprecated since pandas 2.2)  
  - `downcast`: {‘integer’, ‘signed’, ‘unsigned’, ‘float’}

- `pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=True, unit=None, origin='unix', cache=True)` *(`infer_datetime_format` is deprecated since pandas 2.0)*

- `pd.to_timedelta(arg, unit=None, errors='raise')`

- `df.to_string(...)`, `df.to_pickle(...)`, `df.to_json(...)` *(DataFrame/Series methods for serialization, not type conversion)*
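
A sketch of the conversion helpers above on a made-up frame with one unparseable value; `errors="coerce"` turns it into a missing value instead of raising:

```python
import pandas as pd

raw = pd.DataFrame({
    "qty": ["1", "2", "x"],
    "when": ["2024-01-05", "2024-02-10", "2024-03-15"],
})

qty = pd.to_numeric(raw["qty"], errors="coerce")        # "x" becomes NaN
when = pd.to_datetime(raw["when"], format="%Y-%m-%d")   # explicit format
as_int = pd.Series(["1", "2"]).astype("int64")          # strict conversion
nullable = qty.astype("Int64")                          # nullable integer keeps the gap as <NA>
```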

---

## **⚙️ Pandas Data Processing**

---

### **1️⃣ Data Selection & Filtering**

- `df.loc[row_labels, column_labels]`
- `df.iloc[row_indices, column_indices]`
- `df.at[label, column]`
- `df.iat[i, j]`
- `df.query(expr, inplace=False, **kwargs)`
- `df.get(key, default=None)`
- `df.filter(items=None, like=None, regex=None, axis=None)`

---

### **2️⃣ Data Cleaning**

- `df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')`
- `df.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)`
- `df.duplicated(subset=None, keep='first')`
- `df.replace(to_replace=None, value=None, inplace=False, regex=False)` *(`limit` and `method` are deprecated since pandas 2.1)*
- `df.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')`
- `df.astype(dtype, copy=True, errors='raise')`
- `df.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)`

---

### **3️⃣ Handling Missing Data**

- `df.isna()`
- `df.notna()`
- `df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)`
- `df.fillna(value=None, axis=None, inplace=False, limit=None)` *(`method` is deprecated since pandas 2.1, use `df.ffill()` / `df.bfill()`)*
- `df.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None)`

---

### **4️⃣ Data Aggregation & Transformation**

- `df.agg(func=None, axis=0, *args, **kwargs)`
- `df.transform(func, axis=0, *args, **kwargs)`
- `df.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)` *(`axis` is deprecated since pandas 2.1)*
- `df.pivot(index=None, columns=None, values=None)`
- `df.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)`
- `df.stack(level=-1, dropna=True)`
- `df.unstack(level=-1, fill_value=None)`
- `df.melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)`
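
`pivot` and `melt` are roughly inverse reshapes. A sketch on a made-up long-format table:

```python
import pandas as pd

long = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 12, 7, 9],
})

# Long to wide: one row per city, one column per year.
wide = long.pivot(index="city", columns="year", values="sales")

# pivot_table aggregates, so duplicate index/column pairs are allowed.
totals = long.pivot_table(values="sales", index="city", aggfunc="sum")

# Wide back to long.
back = wide.reset_index().melt(id_vars="city", var_name="year", value_name="sales")
```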

---

### **5️⃣ Data Sorting & Ranking**

- `df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)`
- `df.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)`
- `df.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)`

---

### **6️⃣ Data Combining / Merging**

- `pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)`
- `df.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)`
- `df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)`
- `df.append(other, ignore_index=False, verify_integrity=False, sort=False)` *(Removed in pandas 2.0; use `pd.concat`)*

---

### **7️⃣ Reshaping DataFrames**

- `df.transpose()` or `df.T`
- `df.stack(level=-1, dropna=True)`
- `df.unstack(level=-1, fill_value=None)`
- `df.pivot(...)`
- `df.melt(...)`
- `df.explode(column, ignore_index=False)`

---

## **🕒 Pandas Time-Series Operations**

---

### **1️⃣ Date Range Creation**

- `pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, inclusive='both', unit=None, **kwargs)`
- `pd.bdate_range(start=None, end=None, periods=None, freq='B', tz=None, normalize=True, name=None, weekmask=None, holidays=None, inclusive='both', **kwargs)`
- `pd.timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None)`
- `pd.period_range(start=None, end=None, periods=None, freq=None, name=None)`
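
A sketch of the four range constructors above with illustrative start dates (2024-01-05 is a Friday, so the business-day range skips the weekend):

```python
import pandas as pd

days = pd.date_range(start="2024-01-01", periods=3, freq="D")
bdays = pd.bdate_range(start="2024-01-05", periods=3)            # business days only
steps = pd.timedelta_range(start="0h", periods=3, freq="6h")     # fixed-size steps
months = pd.period_range(start="2024-01", periods=3, freq="M")   # monthly periods
```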

---

### **2️⃣ Converting to Date/Time Objects**

- `pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=True, unit=None, origin='unix', cache=True)` *(`infer_datetime_format` is deprecated since pandas 2.0)*
- `pd.to_timedelta(arg, unit=None, errors='raise')`
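- `Series.dt.to_period(freq=None)`, `DatetimeIndex.to_period(freq=None)`, `DataFrame.to_period(freq=None, axis=0)` *(there is no top-level `pd.to_period`)*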

---

### **3️⃣ Date/Time Components Access**

- `df['col'].dt.year`
- `df['col'].dt.month`
- `df['col'].dt.day`
- `df['col'].dt.hour`
- `df['col'].dt.minute`
- `df['col'].dt.second`
- `df['col'].dt.microsecond`
- `df['col'].dt.nanosecond`
- `df['col'].dt.isocalendar().week` *(replaces `.dt.week` and `.dt.weekofyear`, removed in pandas 2.0)*
- `df['col'].dt.dayofweek`
- `df['col'].dt.weekday`
- `df['col'].dt.dayofyear`
- `df['col'].dt.is_month_start`
- `df['col'].dt.is_month_end`
- `df['col'].dt.is_quarter_start`
- `df['col'].dt.is_quarter_end`
- `df['col'].dt.is_year_start`
- `df['col'].dt.is_year_end`
- `df['col'].dt.daysinmonth`
- `df['col'].dt.quarter`
- `df['col'].dt.to_period(freq=None)`

---

### **4️⃣ Resampling and Frequency Conversion**

- `df.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, on=None, level=None, origin='start_day', offset=None, group_keys=False)` *(`loffset` and `base` were removed in pandas 2.0; use `offset` or `origin`)*
- `df.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)`
- `df.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')`
- `df.tz_convert(tz, axis=0, level=None, copy=True)`
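
A sketch of resampling and time-zone handling on made-up daily data. `resample` aggregates per bin, `asfreq` only selects, and `tz_localize` must come before `tz_convert` on naive timestamps:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

two_day_sum = ts.resample("2D").sum()   # downsample: 3 bins of 2 days each
every_other = ts.asfreq("2D")           # pick the values at the new frequency

naive = pd.Series([1, 2], index=pd.date_range("2024-06-01 12:00", periods=2, freq="D"))
utc = naive.tz_localize("UTC")          # attach a time zone to naive timestamps
oslo = utc.tz_convert("Europe/Oslo")    # same instants, different wall clock
```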

---

### **5️⃣ Time Shifting**

- `df.shift(periods=1, freq=None, axis=0, fill_value=None)`
- `df.tshift(periods=1, freq=None, axis=0)` *(Removed in pandas 2.0; use `shift` with `freq`)*

---

### **6️⃣ Time-Series Window Functions**

- `df.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)`
- `df.expanding(min_periods=1, axis=0, method='single')` *(`center` was removed in pandas 2.0)*
- `df.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0, times=None, method='single')`
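
The three window types above, sketched on a short made-up Series. Each returns a window object that still needs an aggregation such as `.mean()` or `.sum()`:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

roll = s.rolling(window=2).mean()                       # NaN until the window is full
roll_min1 = s.rolling(window=2, min_periods=1).mean()   # allow partial windows
running = s.expanding().sum()                           # everything seen so far
smooth = s.ewm(alpha=0.5, adjust=False).mean()          # exponentially weighted
```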

---

### **7️⃣ Offsets and Frequencies**

- `pd.offsets.Day(n=1)`
- `pd.offsets.BusinessDay(n=1)`
- `pd.offsets.MonthEnd(n=1)`
- `pd.offsets.MonthBegin(n=1)`
- `pd.offsets.QuarterEnd(startingMonth=3, n=1)`
- `pd.offsets.QuarterBegin(startingMonth=1, n=1)`
- `pd.offsets.YearEnd(month=12, n=1)`
- `pd.offsets.YearBegin(month=1, n=1)`

(📍 All offsets can be added or subtracted from `Timestamp`, `DatetimeIndex`, or `Series`.)
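
A sketch of offset arithmetic on illustrative timestamps (2024-01-05 is a Friday):

```python
import pandas as pd

ts = pd.Timestamp("2024-01-15")

next_day = ts + pd.offsets.Day(1)
month_end = ts + pd.offsets.MonthEnd(1)                  # rolls forward to the month end
next_bday = pd.Timestamp("2024-01-05") + pd.offsets.BusinessDay(1)  # Friday -> Monday
quarter_end = ts + pd.offsets.QuarterEnd(startingMonth=3)
```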

---

## **🧩Pandas Categorical Operations**

---

### **1️⃣ Creating Categorical Data**

- `pd.Categorical(values, categories=None, ordered=None, dtype=None)` *(`fastpath` is deprecated since pandas 2.1)*  
- `pd.CategoricalDtype(categories=None, ordered=False)`

- `Series.astype('category')`  
- `pd.Series(data, dtype='category')`

---

### **2️⃣ Accessing Categorical Properties**

- `series.cat.categories`  
- `series.cat.codes`  
- `series.cat.ordered`  
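- `series.dtype` *(a `CategoricalDtype`; the `.cat` accessor exposes only `categories`, `codes`, and `ordered` as properties)*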

---

### **3️⃣ Modifying Categorical Attributes**

- `series.cat.rename_categories(new_categories)`  
  - `new_categories`: list-like, dict, or callable

- `series.cat.reorder_categories(new_categories, ordered=None)`  
  - `new_categories`: list-like  
  - `ordered`: bool

- `series.cat.set_categories(new_categories, ordered=None, rename=False)`  
  - `new_categories`: list-like  
  - `ordered`: bool  
  - `rename`: bool

- `series.cat.as_ordered()`  
- `series.cat.as_unordered()`

*(The `inplace` parameter of these methods was removed in pandas 2.0; each call returns a new object.)*

---

### **4️⃣ Removing Categories**

- `series.cat.remove_unused_categories()`

- `series.cat.remove_categories(removals)`  
  - `removals`: category or list of categories; values in a removed category become NaN

---

### **5️⃣ Sorting Categorical Data**

- `series.sort_values(ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)` *(sorts by category order rather than lexical order)*

---

### **6️⃣ Comparison & Boolean Operations**

*(Two categoricals can be compared only if they have the same categories, and the same order when ordered)*  
- Equality: `==`, `!=` (any categorical); ordering: `<`, `>`, `<=`, `>=` (only if `ordered=True`)
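
A sketch tying the categorical sections together: set an explicit order, rename, sort, and compare. The size labels are made up:

```python
import pandas as pd

sizes = pd.Series(["M", "S", "L", "M"], dtype="category")

# Give the categories an explicit order: S < M < L.
ordered = sizes.cat.set_categories(["S", "M", "L"], ordered=True)

renamed = ordered.cat.rename_categories({"S": "small", "M": "medium", "L": "large"})
sorted_sizes = ordered.sort_values()   # follows category order, not alphabet
at_least_m = ordered >= "M"            # ordering comparison needs ordered=True
```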

---

### **7️⃣ Conversion**

- `series.astype(str)`  
- `series.astype(int)` (converts the category values themselves, which must be integer-like; for the integer codes use `cat.codes`)  
- `series.cat.codes` → returns integer codes for categories

---

## **🧵Pandas Parallel Processing**

---

### **1️⃣ Using `swifter` for Parallel Apply**

- `df.swifter.apply(func, axis=0, raw=False, result_type=None, args=(), by_row=True)`
- `df.swifter.progress_bar(enable=True/False)`
- **Install**: `pip install swifter`
- **Wraps**: Pandas `apply()` with automatic parallelization using `dask` or `modin`

---

### **2️⃣ Using `modin.pandas` for Parallel DataFrame Execution**

- `import modin.pandas as pd`
- Supports almost all native Pandas functions (e.g., `.read_csv()`, `.apply()`, `.groupby()` etc.)
- **Backend Options**: Ray or Dask
  - `MODIN_ENGINE=ray` or `MODIN_ENGINE=dask`
- **Install**:  
  - `pip install modin[ray]`  
  - or `pip install modin[dask]`

---

### **3️⃣ Using `pandarallel` for Parallel Apply**

- `from pandarallel import pandarallel`  
  - `pandarallel.initialize(progress_bar=True, nb_workers=N)`
- `df.parallel_apply(func, axis=0)`
- `series.parallel_apply(func)`
- **Install**: `pip install pandarallel`

---

### **4️⃣ Using `joblib` for Parallel Processing with Pandas**

- `from joblib import Parallel, delayed`
- `Parallel(n_jobs=N)(delayed(func)(row) for _, row in df.iterrows())` *(`iterrows()` yields `(index, row)` pairs)*
- **Parameters**:
  - `n_jobs`: Number of parallel workers
  - `backend`: ‘loky’ | ‘multiprocessing’ | ‘threading’

---

### **5️⃣ Using `dask.dataframe` for Scalable Parallel DataFrames**

- `import dask.dataframe as dd`
- `dd.from_pandas(df, npartitions=N)`
- `dd.read_csv("file.csv")`
- `ddf.compute()` → triggers computation
- Supports:
  - `.groupby()`
  - `.apply()`
  - `.map_partitions()`
  - `.merge()`, etc.

---

### **6️⃣ Using `concurrent.futures` for Pandas Row/Chunk Parallelization**

- `from concurrent.futures import ProcessPoolExecutor`
- Manual chunking:  
  - `np.array_split(df, N)`
  - `executor.map(func, chunks)`
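
A sketch of the manual chunking pattern above. Two substitutions are assumptions made for this example: it uses `ThreadPoolExecutor` instead of `ProcessPoolExecutor` so it runs without a `__main__` guard, and it slices with `iloc` instead of `np.array_split`. The worker `chunk_total` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def chunk_total(chunk):
    # Hypothetical per-chunk work: sum one column.
    return int(chunk["value"].sum())

df = pd.DataFrame({"value": range(10)})
n_chunks = 3
size = -(-len(df) // n_chunks)  # ceiling division so no rows are dropped
chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]

# executor.map preserves the order of the input chunks.
with ThreadPoolExecutor(max_workers=n_chunks) as executor:
    partials = list(executor.map(chunk_total, chunks))

total = sum(partials)
```

For CPU-bound work, swapping in `ProcessPoolExecutor` usually requires the worker to be defined at module level and the pool to be created under `if __name__ == "__main__":`.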

---

### **7️⃣ Parallel File I/O in Pandas**

- Use `modin.pandas.read_csv()`  
- Use Dask’s `dd.read_csv()`

---

### **8️⃣ Best Practices & Tips**

- Prefer **column-wise operations** or **vectorization** before going parallel
- Use element-wise **`.applymap()`** (renamed `DataFrame.map` in pandas 2.1) sparingly; it is not parallelized by default
- Always **monitor CPU/RAM** when using `modin`, `dask`, etc.
- Choose `swifter` for minimal setup, `modin` for full-scale replacement, and `dask` for large data pipelines

---

## **📊Pandas Plotting**

---

### **1️⃣ General Plotting Method**

- `DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=None, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, **kwargs)` *(`sort_columns` was removed in pandas 2.0)*

- `Series.plot(...)` → same parameters as above.

---

### **2️⃣ Specific Plot Types**

> All of these methods inherit most parameters from `.plot()`.

---

#### 🔹 `line` (default)
- `df.plot.line(...)`

---

#### 🔹 `bar` / `barh`
- `df.plot.bar(x=None, y=None, stacked=False, **kwargs)`
- `df.plot.barh(x=None, y=None, stacked=False, **kwargs)`

---

#### 🔹 `hist`
- `df.plot.hist(bins=10, **kwargs)`

---

#### 🔹 `box`
- `df.plot.box(by=None, **kwargs)`

---

#### 🔹 `kde` / `density`
- `df.plot.kde(bw_method=None, ind=None, **kwargs)`
- `df.plot.density(bw_method=None, ind=None, **kwargs)`

---

#### 🔹 `area`
- `df.plot.area(stacked=True, **kwargs)`

---

#### 🔹 `pie`
- `series.plot.pie(labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=None, radius=None, counterclock=True, wedgeprops=None, textprops=None, center=(0, 0), frame=False, rotatelabels=False, normalize=None, **kwargs)`

> Also available on a **DataFrame**: pass `y='col'` to plot one column, or `subplots=True` to draw one pie per column.

---

#### 🔹 `scatter`
- `df.plot.scatter(x, y, s=None, c=None, **kwargs)`

---

#### 🔹 `hexbin`
- `df.plot.hexbin(x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs)` *(when omitted, `reduce_C_function` defaults to `np.mean` and `gridsize` to `100`)*

---

### **3️⃣ Subplots & Layout Control**

- `subplots=True`
- `layout=(rows, cols)`
- `sharex=True | False`
- `sharey=True | False`

---

### **4️⃣ Plot Styling & Customization**

- `title='...'`
- `style=['r--', 'g-', ...]`
- `colormap='viridis'`
- `legend=True | False`
- `grid=True | False`
- `fontsize=...`
- `rot=angle`
- `xlim=(min, max)`
- `ylim=(min, max)`
- `figsize=(width, height)`

---

### **5️⃣ Save Plot**

- `plt.savefig("filename.png")` after `df.plot(...)`

---