| **Function**                          | **Description**                                                                     | **Example**                                                                                                                         |
|--------------------------------------|-------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| **`pd.read_csv()`** | Reads a CSV file into a DataFrame. By default the first row is treated as the header. If the first row contains data, pass `header=None`. | import pandas as pd<br>df = pd.read_csv("data.csv") |
| **`DataFrame.to_csv("filename.csv")`** | Writes a DataFrame to a CSV file. By default the row index is also written, which is useful when it carries data. If you don't need it, pass `index=False`. | df.to_csv("filename.csv") |
| **`pd.read_json()`** | Reads a JSON file into a DataFrame. | import pandas as pd<br>df = pd.read_json("data.json") |
| **`DataFrame.to_json("filename.json")`** | Converts the DataFrame to a JSON string and writes it to the file. | df.to_json("filename.json") |
| **`DataFrame1.equals(DataFrame2)`** | Checks whether two DataFrames are equal. The following conditions must be met:<br><img src="/images/dataframe_equals.png" alt="dataframe_equals.png"> | df1.equals(df2) |
| **`DataFrame.head()`**            | Shows the first *n* rows of a DataFrame (default = 5).                              | df.head(3)  # Shows first 3 rows                                                                                   |
| **`DataFrame.tail()`**            | Shows the last *n* rows of a DataFrame (default = 5).                               | df.tail(3)  # Shows last 3 rows<br>                                                                                    |
| **`DataFrame.shape`**             | Returns a tuple of (rows, columns).                                                 | rows, cols = df.shape<br>rows=df.shape[0]<br>cols=df.shape[1]                                                                                              |
| **`DataFrame.columns`**           | Lists column names.                                                                 | df.columns                                                                                                         |
| **`DataFrame.info()`**            | Prints info about a DataFrame: column dtypes, non-null counts, etc.                 | df.info()<br>                                                                                                          |
| **`pd.set_option()`** | Sets a pandas option. The example shows every column in the output instead of truncating it. | pd.set_option("display.max_columns", None) |
| **`DataFrame.describe()`**        | Provides summary statistics for numeric columns (count, mean, std, etc.).           | df.describe()                                                                                                      |
| **`DataFrame.dtypes`**            | Shows the data type of each column.                                                 | df.dtypes                                                                                                          |
| **`DataFrame.select_dtypes(include=<data type>)`** | Selects only the columns with the given data types. | df.select_dtypes(include='number') |
| **`DataFrame.isnull()`** | Returns a DataFrame of booleans indicating missing values (`NaN` or `None`). | df.isnull().sum()  # Count missing in each column |
| **`DataFrame.reset_index()`** | Resets the row index to the default sequential integers (0, 1, 2, ...). Pass `drop=True` to discard the old index instead of keeping it as a column. | df.reset_index(drop=True, inplace=True) |
| **`DataFrame.dropna()`** | Drops rows (or columns) with missing values. To drop only the rows with missing values in specific columns, pass `subset=["column_name"]`. | # Drop rows with any missing values<br>df_clean = df.dropna()<br> |
| **`DataFrame.fillna()`**         | Fills missing values with a specified value or method.                              | # Replace missing in 'Age' with the mean<br>df['Age'] = df['Age'].fillna(df['Age'].mean())<br>                           |
| **`DataFrame.drop()`**           | Drops specified rows or columns from a DataFrame.                                   | # Drop a column named 'Unneeded'<br>df_dropped = df.drop(columns=['Unneeded'])<br>                                       |
| **`DataFrame.duplicated()`**     | Returns booleans for duplicate rows.                                                | df.duplicated().sum()  # Count duplicates<br>                                                                          |
| **`DataFrame.drop_duplicates()`**| Removes duplicate rows. By default (`keep="first"`) the first occurrence is kept; pass `keep="last"` to keep the last one or `keep=False` to drop all duplicates. | df_unique = df.drop_duplicates()<br> |
| **`DataFrame.rename()`**         | Renames columns or index labels.                                                    | df_renamed = df.rename(columns={'oldName': 'newName'})<br>                                                            |
| **`DataFrame.astype()`**         | Changes the data type of a column.                                                 | df['Age'] = df['Age'].astype(int)<br>                                                                                  |
| **`DataFrame.sort_values()`**    | Sorts by a specified column (or columns).                                           | # Sort by 'Age' descending<br>df_sorted = df.sort_values(by='Age', ascending=False)<br>                                  |
| **`DataFrame.loc`**              | Label-based row/column selection.                                                   | # Select rows where 'City' == 'NYC' and columns 'Name' & 'Age'<br>df_nyc = df.loc[df['City'] == 'NYC', ['Name','Age']]<br> |
| **`DataFrame.iloc`**             | Integer position-based row/column selection. Note that `iloc` does not accept a boolean Series such as `df['Age'] > 30`; use `loc` for condition-based filtering. | # Select first 5 rows and first 2 columns<br>df_subset = df.iloc[:5, :2]<br> |
| **`DataFrame.value_counts()`**   | Called on a single column (a Series), returns how many times each unique value appears. Called on a DataFrame, it counts unique rows. | df['City'].value_counts()<br><img src="images/value_counts.png"><br>|
| **`DataFrame.nunique()`**        | Counts number of unique values for each column (or overall if used on a Series).    | df.nunique()<br><img src="images/nunique.png">|
| **`DataFrame.groupby()`**        | Groups data by a column or multiple columns for aggregations.                       | # Average age per city<br>df.groupby('City')['Age'].mean()<br>                                                          |
| **`DataFrame.agg()`**           | Applies aggregation(s) on grouped/un-grouped data (e.g., `mean`, `sum`, `count`).   | df.groupby('City').agg({'Age': 'mean', 'Income': 'sum'})<br>                                                          |
| **`DataFrame.apply()`**          | Applies a function to each column or row (or to each element when called on a Series). | # Convert the 'Name' column to lowercase<br>df['Name'] = df['Name'].apply(str.lower)<br> |
| **`DataFrame.applymap()`**       | Applies a function elementwise across the entire DataFrame. Deprecated since pandas 2.1 in favor of `DataFrame.map()`. | # Add 1 to every numeric value in df<br>df_num = df.select_dtypes(include='number')<br>df_num.applymap(lambda x: x + 1)<br> |
| **`DataFrame.merge()`**          | Merges DataFrames based on common columns or indexes (SQL-style joins).             | # Merge df1 and df2 on 'id'<br>df_merged = pd.merge(df1, df2, on='id', how='inner')<br>                                  |
| **`pd.concat()`**                | Concatenates DataFrames along rows or columns.                                      | # Concatenate two DataFrames vertically<br>df_concat = pd.concat([df1, df2], axis=0)<br>                                |
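| **`DataFrame.pivot()`**          | Reshapes data by turning the unique values of one column into multiple columns. In the example, each unique `Date` becomes a row label (as if the data were grouped by `Date`) and each unique `City` becomes a column. Raises an error if an index/column pair appears more than once; use `pivot_table()` in that case. | df_pivoted = df.pivot(index='Date', columns='City', values='Sales')<br> |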
| **`DataFrame.pivot_table()`**    | Creates a spreadsheet-style pivot table as a DataFrame; can use aggregation funcs.  | # Average 'Sales' for each 'City' and 'Store'<br>pd.pivot_table(df, values='Sales', index='City', columns='Store', aggfunc='mean')<br> |
| **`pd.melt()`**                  | Unpivots (or “melts”) a DataFrame from wide format to long format.                  | # Melt pivoted data to long format<br>df_melted = pd.melt(df_pivoted, ignore_index=False, value_name='Sales')<br>        |
| **`DataFrame.replace()`**        | Replaces values in a DataFrame.                                                     | # Replace all occurrences of '?' with NaN<br>import numpy as np<br>df = df.replace('?', np.nan)<br> |
| **`DataFrame.sample()`**         | Returns a random sample of rows.                                                    | # Sample 5 random rows<br>df_sample = df.sample(n=5)<br>                                                               |
| **`DataFrame.transform()`**      | Applies a function to each group (similar to `apply`, but the result must have the same shape as the input). | # Standardize each group separately<br>df.groupby('City')['Income'].transform(lambda x: (x - x.mean()) / x.std())<br> |
| **`DataFrame.join()`**           | Joins columns of another DataFrame. Joins on the index by default, which makes it convenient for index-based merges. | # Join df2 (same index) onto df1<br>df_joined = df1.join(df2, lsuffix='_1', rsuffix='_2')<br> |
| **`DataFrame.resample()`**       | Aggregates time-series data at a new frequency (requires a DateTime index). | # Resample daily data to monthly, summing values<br># ('M' means month end; pandas 2.2+ prefers the alias 'ME')<br>df.resample('M').sum()<br> |
| **`DataFrame.rolling()`**        | Provides rolling window calculations on time series or numeric data.                | # 7-day rolling mean for a time series<br>df['rolling_mean'] = df['value'].rolling(7).mean()<br>                       |
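| **`DataFrame.corr()`**           | Computes pairwise correlation of columns. Column pairs whose coefficient has a large absolute value (close to 1 or -1) are highly correlated, so one column of the pair can potentially be dropped. In pandas 2.0+, pass `numeric_only=True` if the DataFrame has non-numeric columns. | correlation_matrix = df.corr(numeric_only=True)<br> |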
| **`DataFrame.clip()`**           | Trims values below/above thresholds.                                                | # Clip outliers to the range [0, 100]<br>df['score'] = df['score'].clip(lower=0, upper=100)<br>                        |
| **`DataFrame.query()`**          | Queries a DataFrame using a boolean expression.                                     | # Select rows where Age > 30 and City == 'NYC'<br>df_filtered = df.query('Age > 30 and City == "NYC"')<br> |
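
Several rows above mention optional arguments (`header=None`, `subset=`, `keep=`, `index=False`) without showing them in use. The sketch below strings them together on a small, made-up CSV string; the data and the column names `Name`, `Age`, and `City` are assumptions for illustration only, and `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical headerless CSV: the first line is data, not column names.
raw = "Alice,30,NYC\nBob,,LA\nAlice,30,NYC\nCara,41,NYC\n"

# header=None keeps the first line as data; names= supplies column labels.
df = pd.read_csv(io.StringIO(raw), header=None, names=["Name", "Age", "City"])

# Drop only the rows where 'Age' is missing.
df_clean = df.dropna(subset=["Age"])

# keep="first" is the default: the first copy of each duplicate row survives.
# reset_index(drop=True) renumbers the remaining rows from 0.
df_unique = df_clean.drop_duplicates(keep="first").reset_index(drop=True)

# With no path given, to_csv returns the CSV text instead of writing a file.
# index=False leaves the row index out of the output.
csv_text = df_unique.to_csv(index=False)
print(csv_text)
```

Passing a filename as the first argument to `to_csv` writes the same text to disk instead of returning it.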

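The `loc`, `iloc`, and `query` rows describe three ways to select the same data. As a rough comparison, the sketch below applies one filter with each of them; the sample DataFrame is hypothetical, and converting the mask with `.to_numpy()` is one possible way to use a condition with `iloc`, not the only one.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Cara", "Dan"],
        "Age": [30, 25, 41, 35],
        "City": ["NYC", "LA", "NYC", "NYC"],
    }
)

# Boolean Series: True for rows that satisfy both conditions.
mask = (df["Age"] > 30) & (df["City"] == "NYC")

# loc accepts the boolean Series directly, plus column labels.
by_loc = df.loc[mask, ["Name", "Age"]]

# iloc needs a plain boolean array and integer column positions.
by_iloc = df.iloc[mask.to_numpy(), [0, 1]]

# query expresses the same row filter as a string.
by_query = df.query('Age > 30 and City == "NYC"')[["Name", "Age"]]

print(by_loc)
```

All three results hold the same rows and columns, so the choice is mostly about readability: `loc` for label-based filtering, `iloc` when only positions are known, and `query` for longer conditions.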