# Combining and Merging Data Sets

<!DOCTYPE html>
<html>
<head>
<style>
  table {
    border-collapse: collapse;
    width: 100%;
  }

  th, td {
    border: 1px solid black;
    padding: 8px;
    text-align: left;
  }
</style>
</head>
<body>

<h2>Methods for Combining Data in Pandas Objects</h2>

<table>
  <tr>
    <th>Method</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>pandas.merge</code></td>
    <td>Connects rows in DataFrames based on one or more keys. Implements database join operations.</td>
  </tr>
  <tr>
    <td><code>pandas.concat</code></td>
    <td>Glues or stacks together objects along an axis.</td>
  </tr>
  <tr>
    <td><code>combine_first</code></td>
    <td>Instance method that splices overlapping data to fill missing values in one object with values from another.</td>
  </tr>
</table>

</body>
</html>


# Database-style DataFrame Merges


<!DOCTYPE html>
<html>
<head>
<style>
  table {
    border-collapse: collapse;
    width: 100%;
    border: 1px solid #ddd;
  }

  th, td {
    padding: 8px;
    text-align: left;
    border-bottom: 1px solid #ddd;
  }

  th {
    background-color: #f2f2f2;
  }
</style>
</head>
<body>

<h2>Pandas Merge Parameters</h2>

<table>
  <tr>
    <th>Parameter</th>
    <th>Description</th>
    <th>Example</th>
  </tr>
  <tr>
    <td><code>left</code></td>
    <td>The first dataframe to be merged.</td>
    <td><code>merged_df = pd.merge(left_df, right_df)</code></td>
  </tr>
  <tr>
    <td><code>right</code></td>
    <td>The second dataframe to be merged.</td>
    <td></td>
  </tr>
  <tr>
    <td><code>how</code></td>
    <td>The type of merge to be performed. Options: 'inner' (the default), 'outer', 'left', 'right', 'cross'.</td>
    <td><code>merged_df = pd.merge(left_df, right_df, how='inner')</code></td>
  </tr>
  <tr>
    <td><code>on</code> / <code>left_on</code> / <code>right_on</code></td>
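    <td>Column or index level names to join on. Use <code>on</code> when the name exists in both dataframes; use <code>left_on</code> and <code>right_on</code> when the key names differ.</td>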
    <td><code>merged_df = pd.merge(df1, df2, on='key')</code></td>
  </tr>
  <tr>
    <td><code>left_index</code> / <code>right_index</code></td>
    <td>Use the index from the left/right dataframe as the join key.</td>
    <td><code>merged_df = pd.merge(left_df, right_df, left_index=True, right_index=True)</code></td>
  </tr>
  <tr>
    <td><code>suffixes</code></td>
    <td>Suffixes to add to overlapping column names in case of name conflicts.</td>
    <td><code>merged_df = pd.merge(df1, df2, on='key', suffixes=('_left', '_right'))</code></td>
  </tr>
  <tr>
    <td><code>sort</code></td>
    <td>Sort the merged result by join keys. Default is False.</td>
    <td><code>merged_df = pd.merge(df1, df2, on='key', sort=True)</code></td>
  </tr>
  <tr>
    <td><code>indicator</code></td>
    <td>Adds a column (named <code>_merge</code> when True, or the given string) that indicates the source of each row. Its values are 'left_only', 'right_only', or 'both'.</td>
    <td><code>merged_df = pd.merge(df1, df2, on='key', indicator=True)</code></td>
  </tr>
  <tr>
    <td><code>validate</code></td>
    <td>Check that the merge keys have the stated relationship and raise a <code>MergeError</code> if they do not. Options: 'one_to_one', 'one_to_many', 'many_to_one', 'many_to_many'.</td>
    <td><code>merged_df = pd.merge(df1, df2, on='key', validate='many_to_one')</code></td>
  </tr>
  <tr>
    <td><code>copy</code></td>
    <td>If False, avoid copying data unnecessarily. Default is True in older pandas versions; with Copy-on-Write enabled the keyword is ignored.</td>
    <td><code>merged_df = pd.merge(df1, df2, on='key', copy=False)</code></td>
  </tr>
</table>

</body>
</html>


In [1]:
import pandas as pd

# Sample dataframes
left_df = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
right_df = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# Merge based on 'key' (named explicitly; by default merge joins on all shared column names)
merged_df_inner = pd.merge(left_df, right_df, on='key', how='inner')
merged_df_left = pd.merge(left_df, right_df, on='key', how='left')
merged_df_right = pd.merge(left_df, right_df, on='key', how='right')
merged_df_outer = pd.merge(left_df, right_df, on='key', how='outer')

# Merge based on index
merged_df_index = pd.merge(left_df, right_df, left_index=True, right_index=True)

# Merge with suffixes
merged_df_suffix = pd.merge(left_df, right_df, on='key', suffixes=('_left', '_right'))

# Merge with indicator
merged_df_indicator = pd.merge(left_df, right_df, on='key', indicator=True)

# Merge with validation
merged_df_validate = pd.merge(left_df, right_df, on='key', validate='many_to_one')

# Merge without copying data
merged_df_no_copy = pd.merge(left_df, right_df, on='key', copy=False)
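
The `indicator` and `validate` parameters are easier to understand from their effect than from their description. The following is a minimal sketch, assuming the same hypothetical sample frames as above plus an illustrative `dup_right` frame with a duplicated key; it is meant as an illustration rather than a complete treatment.

```python
import pandas as pd

# Hypothetical sample data, mirroring the frames used above
left_df = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
right_df = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# indicator=True adds a `_merge` column whose values are
# 'left_only', 'right_only', or 'both'
outer = pd.merge(left_df, right_df, on='key', how='outer', indicator=True)
merge_counts = outer['_merge'].value_counts().to_dict()
print(merge_counts)

# validate raises MergeError when the keys break the declared relationship.
# The right-hand keys below contain a duplicate, so 'one_to_one' is rejected.
dup_right = pd.DataFrame({'key': ['B', 'B'], 'value2': [5, 6]})
try:
    pd.merge(left_df, dup_right, on='key', validate='one_to_one')
    validation_failed = False
except pd.errors.MergeError as exc:
    validation_failed = True
    print('validate rejected the merge:', exc)
```

Counting the `_merge` values is a quick way to see how many rows matched on each side before deciding which join type to keep.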


# Merging on Index


Cheat sheet for merging on indexes in pandas:

| Merge Type    | Method                             | Description                                                |
|---------------|------------------------------------|------------------------------------------------------------|
| Left Join     | `df1.merge(df2, how='left', left_index=True, right_index=True)`   | Include all rows from `df1`, and matching rows from `df2` |
| Inner Join    | `df1.merge(df2, how='inner', left_index=True, right_index=True)` | Include only matching rows from both DataFrames           |
| Outer Join    | `df1.merge(df2, how='outer', left_index=True, right_index=True)` | Include all rows from both DataFrames                    |
| Right Join    | `df1.merge(df2, how='right', left_index=True, right_index=True)` | Include all rows from `df2`, and matching rows from `df1` |
| MultiIndex    | `df1.merge(df2, how='inner', on='level_name')` | Merge on a named index level shared by both MultiIndex DataFrames (`merge` has no `level` argument) |
| Suffixes      | `df1.merge(df2, how='outer', left_index=True, right_index=True, suffixes=('_suffix1', '_suffix2'))` | Specify suffixes for overlapping column names            |

Replace `df1` and `df2` with your actual DataFrame variables, and customize the parameters according to your needs. Keep in mind that this is a simplified overview, and pandas offers more advanced options and variations for merging dataframes based on indexes. Always consult the official pandas documentation for complete and up-to-date information: https://pandas.pydata.org/docs/user_guide/merging.html

In [2]:
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])
df2 = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]}, index=['x', 'y', 'z'])

# Merge on Index
# --------------

# Left Join (how='left' must be passed explicitly; the default is 'inner')
result_left = df1.merge(df2, how='left', left_index=True, right_index=True)
print("Left Join:\n", result_left)

# Inner Join
result_inner = df1.merge(df2, left_index=True, right_index=True, how='inner')
print("Inner Join:\n", result_inner)

# Outer Join
result_outer = df1.merge(df2, left_index=True, right_index=True, how='outer')
print("Outer Join:\n", result_outer)

# Right Join
result_right = df1.merge(df2, left_index=True, right_index=True, how='right')
print("Right Join:\n", result_right)

# Merge on Specific Index Levels
# ------------------------------

# Create MultiIndex DataFrames
df1_multi = df1.copy()
df1_multi['level_2'] = ['p', 'q', 'r']
df1_multi = df1_multi.set_index(['level_2'], append=True)

df2_multi = df2.copy()
df2_multi['level_2'] = ['p', 'q', 'r']
df2_multi = df2_multi.set_index(['level_2'], append=True)

# Merge on specific index levels
# DataFrame.merge() has no `level` argument (passing one raises a TypeError);
# name the shared index level in `on` instead
result_multi = df1_multi.merge(df2_multi, on='level_2', how='inner')
print("Merge on specific index levels:\n", result_multi)

# Join with Different Suffixes
# ----------------------------

# Both frames share the column name 'A', so the suffixes tell the two copies apart
df1_suffix = pd.DataFrame({'A': [4, 5, 6]}, index=['y', 'z', 'w'])
df2_suffix = pd.DataFrame({'A': [7, 8, 9]}, index=['y', 'z', 'w'])

result_suffix = df1_suffix.merge(df2_suffix, left_index=True, right_index=True, how='outer', suffixes=('_df1', '_df2'))
print("Merge with suffixes:\n", result_suffix)


Left Join:
    A  B  C   D
x  1  4  7  10
y  2  5  8  11
z  3  6  9  12
Inner Join:
    A  B  C   D
x  1  4  7  10
y  2  5  8  11
z  3  6  9  12
Outer Join:
    A  B  C   D
x  1  4  7  10
y  2  5  8  11
z  3  6  9  12
Right Join:
    A  B  C   D
x  1  4  7  10
y  2  5  8  11
z  3  6  9  12


# Concatenating Along an Axis

<!DOCTYPE html>
<html>
<head>
  <style>
    table {
      border-collapse: collapse;
      width: 100%;
    }

    th, td {
      border: 1px solid black;
      padding: 8px;
      text-align: left;
    }

    th {
      background-color: #f2f2f2;
    }
  </style>
</head>
<body>

<table>
  <tr>
    <th>Argument</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>objs</td>
    <td>List or dict of pandas objects to be concatenated. The only required argument.</td>
  </tr>
  <tr>
    <td>axis</td>
    <td>Axis to concatenate along; defaults to 0.</td>
  </tr>
  <tr>
    <td>join</td>
    <td>One of 'inner', 'outer', defaulting to 'outer'; whether to intersect (inner) or union (outer) the indexes along the other axes.</td>
  </tr>
  <tr>
    <td>join_axes</td>
    <td>Specific indexes to use for the other n-1 axes instead of performing union/intersection logic. Deprecated in pandas 0.25 and removed in pandas 1.0; reindex the result instead.</td>
  </tr>
  <tr>
    <td>keys</td>
    <td>Values to associate with objects being concatenated, forming a hierarchical index along the concatenation axis. Can either be a list or array of arbitrary values, an array of tuples, or a list of arrays (if multiple level arrays passed in levels).</td>
  </tr>
  <tr>
    <td>levels</td>
    <td>Specific indexes to use as hierarchical index level or levels if keys passed.</td>
  </tr>
  <tr>
    <td>names</td>
    <td>Names for created hierarchical levels if keys and / or levels passed.</td>
  </tr>
  <tr>
    <td>verify_integrity</td>
    <td>Check new axis in concatenated object for duplicates and raise exception if so. By default (False) allows duplicates.</td>
  </tr>
  <tr>
    <td>ignore_index</td>
    <td>Do not preserve indexes along concatenation axis, instead producing a new range(total_length) index.</td>
  </tr>
</table>

</body>
</html>
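
The arguments in the table can be tried on a few small Series. This is a minimal sketch with made-up data; the variable names and the level names `source` and `label` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical Series with non-overlapping indexes
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

# Default axis=0: stack the values and indexes end to end
stacked = pd.concat([s1, s2, s3])

# axis=1: each Series becomes a column; join='outer' unions the row labels
side_by_side = pd.concat([s1, s2, s3], axis=1)

# join='inner' keeps only the labels present in every object
s4 = pd.concat([s1, s3])
inner = pd.concat([s1, s4], axis=1, join='inner')

# keys builds a hierarchical index along the concatenation axis
keyed = pd.concat([s1, s2, s3], keys=['one', 'two', 'three'],
                  names=['source', 'label'])

# ignore_index discards the original labels and renumbers from 0
renumbered = pd.concat([s1, s2], ignore_index=True)

# verify_integrity raises ValueError when the new axis has duplicates
try:
    pd.concat([s1, s1], verify_integrity=True)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Selecting `keyed.loc['two']` returns the piece that came from `s2`, which is the main reason to pass `keys`.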


# Combining Data with Overlap

Another data combination situation can’t be expressed as either a merge or concatenation operation.
You may have two datasets whose indexes overlap in full or in part. As a motivating example,
consider NumPy’s `where` function, which expresses a vectorized if-else.
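
The vectorized if-else mentioned above can be sketched as follows. The two Series are made-up sample data, and they share the same index order on purpose, because `np.where` works position by position and does not align labels.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with gaps in different places
a = pd.Series([np.nan, 2.5, np.nan, 3.5], index=['w', 'x', 'y', 'z'])
b = pd.Series([0.0, 1.0, 2.0, np.nan], index=['w', 'x', 'y', 'z'])

# np.where(condition, x, y): take x where the condition holds, otherwise y
patched = np.where(pd.isna(a), b, a)

# combine_first gives the same values, but aligns on index labels first
combined = a.combine_first(b)
```

Here `patched` is a plain NumPy array, while `combined` is a Series that keeps the index.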

Combining data with overlap in pandas means handling multiple datasets that share some index labels, where values may be missing or conflicting
and you want one dataset to take priority over the other. The `combine_first()` method is often used for this purpose. Here's how you can use it:


In [None]:

import numpy as np
import pandas as pd

# Sample DataFrames with overlapping data
data1 = {'A': [1, 2, np.nan], 'B': [4, np.nan, 6]}
data2 = {'A': [3, np.nan, 5], 'B': [np.nan, 7, 8]}

df1 = pd.DataFrame(data1, index=['x', 'y', 'z'])
df2 = pd.DataFrame(data2, index=['y', 'z', 'w'])

# Combine overlapping data using combine_first()
combined_df = df1.combine_first(df2)
print(combined_df)



In this example, `combine_first()` takes values from `df1` and fills its missing values with the corresponding values from `df2`, if available. The result is indexed by the union of both indexes, and a value that is missing in both `df1` and `df2` stays NaN.
The example uses `np.nan` for missing values, so NumPy must be imported with `import numpy as np` at the beginning of the script.

Keep in mind that this method works well for scenarios where you want to prioritize values from one DataFrame over another. If you need more complex merging or handling of overlapping data, other methods like merging, joining, or reshaping data might be more suitable.


# Reshaping and Pivoting

## Reshaping with Hierarchical Indexing

## Pivoting “Long” to “Wide” Format

Note: read these topics from the book.

Pandas is a popular Python library for data manipulation and analysis. It offers a wide range of methods for data transformation, which allow you to reshape, clean, and modify your data. Here are some of the most common methods of data transformation in Pandas:

1. **Filtering and Selection:**
   - `df[condition]`: Select rows based on a condition.
   - `df.loc[row_labels, column_labels]`: Access a group of rows and columns by labels or a boolean array.
   - `df.iloc[row_indices, column_indices]`: Access a group of rows and columns by integer indices.
   - `df.query("condition")`: Filter rows using a query expression.

2. **Sorting and Reordering:**
   - `df.sort_values(by=columns)`: Sort rows based on values in specified columns.
   - `df.sort_index()`: Sort rows by index labels.
   - `df.sort_values(by=columns, ascending=[True, False])`: Sort using custom ascending orders.

3. **Aggregation and Grouping:**
   - `df.groupby(by=columns)`: Group data based on one or more columns.
   - `grouped.aggregate(func)`: Apply aggregation functions (e.g., sum, mean) to groups.
   - `grouped.transform(func)`: Apply a transformation function to groups while preserving index.
   - `grouped.apply(func)`: Apply a custom function to groups.

4. **Pivoting and Reshaping:**
   - `df.pivot_table(values, index, columns)`: Create a pivot table based on specified values and columns.
   - `df.melt(id_vars, value_vars)`: Unpivot/melt the DataFrame to long format.
   - `df.stack()` and `df.unstack()`: Reshape between wide and long formats using index levels.

5. **Data Cleaning:**
   - `df.drop(columns=columns)`: Remove specified columns.
   - `df.dropna()`: Remove rows with missing values.
   - `df.fillna(value)`: Replace missing values with a specific value.
   - `df.replace(old_value, new_value)`: Replace specific values in the DataFrame.

6. **Data Transformation:**
   - `df.apply(func, axis)`: Apply a function along rows or columns.
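   - `df.transform(func)` or `Series.transform(func)`: Apply a function that returns a result with the same shape as the input.
   - `Series.map(mapping)`: Replace values using a dictionary, a Series, or a function. `DataFrame.map(func)` (pandas 2.1 and later) applies a function element-wise.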

7. **String Operations:**
   - `df['column'].str.lower()` and `df['column'].str.upper()`: Convert strings to lowercase or uppercase.
   - `df['column'].str.replace(old, new)`: Replace substrings in strings.
   - `df['column'].str.split(separator)`: Split strings based on a separator.

8. **Datetime Operations:**
   - `pd.to_datetime(series)`: Convert a column to datetime format.
   - `df['column'].dt.year` and similar attributes: Extract year, month, day, etc. from datetime columns.
   - `df['column'].dt.strftime(format)`: Format datetime values as strings.

9. **Combining Data:**
   - `pd.concat([df1, df2], axis)`: Concatenate DataFrames vertically or horizontally.
   - `df1.merge(df2, on=columns)`: Merge DataFrames based on common columns.

These are just some of the many methods available in Pandas for data transformation. The library is quite extensive and versatile, making it a powerful tool for working with various types of data transformation tasks.
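
Items 6 and 8 of the list (data transformation and datetime operations) can be sketched on a small frame. The column names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'city': ['nyc', 'sf', 'nyc'],
    'amount': [10.0, 20.0, 30.0],
    'when': ['2024-01-15', '2024-02-20', '2025-03-05'],
})

# Series.map with a dictionary replaces each value by its lookup result
df['city_name'] = df['city'].map({'nyc': 'New York', 'sf': 'San Francisco'})

# DataFrame.apply calls the function once per column (axis=0 is the default)
spread = df[['amount']].apply(lambda col: col.max() - col.min())

# transform returns a result with the same shape as its input
doubled = df['amount'].transform(lambda x: x * 2)

# Convert strings to datetimes, then use the .dt accessor
df['when'] = pd.to_datetime(df['when'])
df['year'] = df['when'].dt.year
df['month_label'] = df['when'].dt.strftime('%Y-%m')
```

Note that `apply` reduces each column to one number here, while `transform` keeps one output value per input row.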



<h2>Filtering and Selection</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Arguments</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df[condition]</code></td>
    <td><code>condition</code>: Boolean expression</td>
    <td>Select rows based on a condition.</td>
  </tr>
  <tr>
    <td><code>df.loc[row_labels, column_labels]</code></td>
    <td><code>row_labels</code>: Label or label list<br><code>column_labels</code>: Label or label list</td>
    <td>Access rows and columns by labels.</td>
  </tr>
  <tr>
    <td><code>df.iloc[row_indices, column_indices]</code></td>
    <td><code>row_indices</code>: Integer or integer list<br><code>column_indices</code>: Integer or integer list</td>
    <td>Access rows and columns by indices.</td>
  </tr>
  <tr>
    <td><code>df.query("condition")</code></td>
    <td><code>condition</code>: Query expression as string</td>
    <td>Filter rows using a query.</td>
  </tr>
</table>
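
The four selection styles in the table can be compared side by side. This is a minimal sketch on a made-up frame; the row labels `r1` to `r4` and the column names are illustrative.

```python
import pandas as pd

# Hypothetical sample data with string row labels
df = pd.DataFrame(
    {'name': ['Ann', 'Bob', 'Cy', 'Di'],
     'age': [31, 25, 47, 38],
     'dept': ['ops', 'dev', 'dev', 'ops']},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Boolean mask: keep rows where the condition is True
over_30 = df[df['age'] > 30]

# Label-based selection of rows and columns
by_label = df.loc[['r1', 'r3'], ['name', 'age']]

# Position-based selection: first two rows, first and third columns
by_position = df.iloc[0:2, [0, 2]]

# Query expression written as a string
queried = df.query("age > 30 and dept == 'ops'")
```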


<h2>Sorting and Reordering</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Arguments</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df.sort_values(by=columns)</code></td>
    <td><code>by</code>: Column or list of columns to sort by<br><code>ascending</code>: Boolean or list of booleans</td>
    <td>Sort rows based on column values.</td>
  </tr>
  <tr>
    <td><code>df.sort_index()</code></td>
    <td>Optional: <code>axis</code>, <code>ascending</code></td>
    <td>Sort rows by index labels.</td>
  </tr>
  <tr>
    <td><code>df.sort_values(by=columns, ascending=[True, False])</code></td>
    <td><code>by</code>: Column or list of columns to sort by<br><code>ascending</code>: Boolean or list of booleans</td>
    <td>Sort with custom ascending orders.</td>
  </tr>
</table>
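
A short sketch of the sorting calls in the table, using made-up data with a deliberately shuffled index:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'team': ['b', 'a', 'b', 'a'], 'score': [3, 9, 7, 1]},
                  index=[2, 0, 3, 1])

# Sort rows by one column, ascending by default
by_score = df.sort_values(by='score')

# One ascending flag per sort column: team A-Z, then score high to low
mixed = df.sort_values(by=['team', 'score'], ascending=[True, False])

# Sort rows by their index labels
by_index = df.sort_index()
```

All three calls return a new DataFrame and leave `df` unchanged.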



<h2>Aggregation and Grouping</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Arguments</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df.groupby(by=columns)</code></td>
    <td><code>by</code>: Column or list of columns to group by</td>
    <td>Group data based on columns.</td>
  </tr>
  <tr>
    <td><code>grouped.aggregate(func)</code></td>
    <td><code>func</code>: Aggregation function</td>
    <td>Apply aggregation functions.</td>
  </tr>
  <tr>
    <td><code>grouped.transform(func)</code></td>
    <td><code>func</code>: Transformation function</td>
    <td>Apply transformation to groups.</td>
  </tr>
  <tr>
    <td><code>grouped.apply(func)</code></td>
    <td><code>func</code>: Custom function</td>
    <td>Apply custom function to groups.</td>
  </tr>
</table>
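
The difference between `aggregate`, `transform`, and `apply` shows up in the shape of the result. A minimal sketch, assuming a made-up frame with a `team` and a `points` column:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'team': ['a', 'a', 'b', 'b', 'b'],
                   'points': [10, 20, 5, 15, 40]})

grouped = df.groupby('team')['points']

# aggregate: one value per group
totals = grouped.aggregate('sum')

# transform: one value per original row, aligned with df's index
group_mean = grouped.transform('mean')

# apply: any custom function called once per group
ranges = grouped.apply(lambda s: s.max() - s.min())
```

Because `group_mean` keeps the original index, it can be assigned straight back as a new column, for example `df['team_mean'] = group_mean`.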



<h2>Pivoting and Reshaping</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Arguments</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df.pivot_table(...)</code></td>
    <td><code>values</code>: Column or list of columns to aggregate<br><code>index</code>: Column or list of columns for index<br><code>columns</code>: Column or list of columns for columns</td>
    <td>Create pivot table.</td>
  </tr>
  <tr>
    <td><code>df.melt(...)</code></td>
    <td><code>id_vars</code>: Column or list of columns to retain<br><code>value_vars</code>: Column or list of columns to melt</td>
    <td>Unpivot/melt DataFrame.</td>
  </tr>
  <tr>
    <td><code>df.stack()</code> and <code>df.unstack()</code></td>
    <td>Optional: <code>level</code></td>
    <td>Reshape between wide and long.</td>
  </tr>
</table>
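
The reshaping methods in the table are roughly inverses of one another, which a round trip makes visible. This is a sketch on made-up long-format data; `date`, `item`, and `value` are illustrative column names.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, item) pair
long_df = pd.DataFrame({
    'date': ['2024-01', '2024-01', '2024-02', '2024-02'],
    'item': ['x', 'y', 'x', 'y'],
    'value': [1, 2, 3, 4],
})

# Long to wide: one row per date, one column per item
wide = long_df.pivot_table(values='value', index='date',
                           columns='item', aggfunc='sum')

# Wide back to long with melt
back_to_long = wide.reset_index().melt(
    id_vars='date', value_vars=['x', 'y'],
    var_name='item', value_name='value')

# stack moves the column labels into the inner index level;
# unstack moves that level back out to the columns
stacked = wide.stack()
unstacked = stacked.unstack()
```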


<h2>Data Cleaning</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Arguments</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df.drop(columns=columns)</code></td>
    <td><code>columns</code>: Column or list of columns to drop</td>
    <td>Remove specified columns.</td>
  </tr>
  <tr>
    <td><code>df.dropna()</code></td>
    <td>Optional: <code>axis</code>, <code>how</code>, <code>subset</code></td>
    <td>Remove rows with missing values.</td>
  </tr>
  <tr>
    <td><code>df.fillna(value)</code></td>
    <td><code>value</code>: Value to fill missing values with</td>
    <td>Replace missing values.</td>
  </tr>
  <tr>
    <td><code>df.replace(old_value, new_value)</code></td>
    <td><code>old_value</code>: Value to replace<br><code>new_value</code>: Replacement value</td>
    <td>Replace specific values.</td>
  </tr>
</table>
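
A minimal sketch of the cleaning calls in the table, on a made-up frame that has a missing number, a placeholder string, and a column to discard:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': ['x', 'y', 'unknown'],
                   'tmp': [0, 0, 0]})

# Remove a column that is not needed
trimmed = df.drop(columns=['tmp'])

# Drop every row that contains at least one missing value
complete_rows = trimmed.dropna()

# Fill missing values; a dict sets the fill value per column
filled = trimmed.fillna({'a': 0.0})

# Replace a specific value wherever it occurs
replaced = trimmed.replace('unknown', 'z')
```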


<h2>Data Transformation</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Arguments</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df.apply(func, axis)</code></td>
    <td><code>func</code>: Function to apply<br><code>axis</code>: Axis along which to apply function</td>
    <td>Apply function along axis.</td>
  </tr>
  <tr>
    <td><code>df.transform(func)</code> and <code>Series.transform(...)</code></td>
    <td><code>func</code>: Function to apply</td>
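    <td>Apply a function that returns a result with the same shape as the input.</td>
  </tr>
  <tr>
    <td><code>Series.map(mapping)</code></td>
    <td><code>mapping</code>: Dictionary, Series, or function</td>
    <td>Replace each value using the mapping.</td>
  </tr>
</table>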

<h2>String Manipulation</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>Concatenation</td>
    <td>Combining strings together.</td>
  </tr>
  <tr>
    <td>Substring Extraction</td>
    <td>Getting a part of a string.</td>
  </tr>
  <tr>
    <td>Replacing</td>
    <td>Replacing one substring with another.</td>
  </tr>
  <tr>
    <td>Splitting</td>
    <td>Breaking a string into parts based on a delimiter.</td>
  </tr>
  <tr>
    <td>Stripping</td>
    <td>Removing leading and trailing whitespace from strings.</td>
  </tr>
  <tr>
    <td>Case Conversion</td>
    <td>Converting strings to lowercase or uppercase.</td>
  </tr>
  <!-- Add other string manipulation operations -->
</table>

<h2>String Object Methods</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>str.lower()</code> and <code>str.upper()</code></td>
    <td>Convert strings to lowercase or uppercase.</td>
  </tr>
  <tr>
    <td><code>str.replace(old, new)</code></td>
    <td>Replace substrings in strings.</td>
  </tr>
  <tr>
    <td><code>str.split(separator)</code></td>
    <td>Split strings based on separator.</td>
  </tr>
  <tr>
    <td><code>str.strip()</code></td>
    <td>Remove leading and trailing whitespace from strings.</td>
  </tr>
  <tr>
    <td><code>str.startswith(prefix)</code> and <code>str.endswith(suffix)</code></td>
    <td>Check if a string starts with a specified prefix or ends with a specified suffix.</td>
  </tr>
  <tr>
    <td><code>str.isnumeric()</code> and <code>str.isalpha()</code></td>
    <td>Check if a string consists only of numeric characters or only of alphabetic characters.</td>
  </tr>
  <tr>
    <td><code>str.join(iterable)</code></td>
    <td>Join elements of an iterable with the string as a separator.</td>
  </tr>
  <tr>
    <td><code>str.capitalize()</code> and <code>str.title()</code></td>
    <td>Capitalize the first character of a string or capitalize each word.</td>
  </tr>
  <!-- Add other string object methods -->
</table>
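
The built-in string methods in the table can be chained on an ordinary Python string. The sample text is made up for illustration.

```python
# Hypothetical raw input with stray whitespace
raw = '  Data Wrangling,with pandas  '

cleaned = raw.strip()                      # remove leading/trailing whitespace
parts = cleaned.split(',')                 # split on the comma
joined = ' '.join(parts)                   # join the pieces with a space
lowered = joined.lower()                   # lowercase copy
renamed = joined.replace('pandas', 'Pandas')

starts_with_data = joined.startswith('Data')
is_number = '2024'.isnumeric()             # every character is numeric
is_letters = 'abc1'.isalpha()              # False: the '1' is not a letter

capitalized = 'hello world'.capitalize()   # first character only
titled = 'hello world'.title()             # first character of each word
```

Strings are immutable, so each method returns a new string and `raw` itself is never modified.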


<h2>Regular Expressions</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>re.search(pattern, string)</code></td>
    <td>Search for a pattern in a string.</td>
  </tr>
  <tr>
    <td><code>re.match(pattern, string)</code></td>
    <td>Match a pattern at the beginning of a string.</td>
  </tr>
  <tr>
    <td><code>re.findall(pattern, string)</code></td>
    <td>Find all occurrences of a pattern in a string.</td>
  </tr>
  <tr>
    <td><code>re.sub(pattern, replacement, string)</code></td>
    <td>Replace occurrences of a pattern with a replacement.</td>
  </tr>
  <tr>
    <td><code>re.split(pattern, string)</code></td>
    <td>Split a string by occurrences of a pattern.</td>
  </tr>
  <tr>
    <td><code>re.compile(pattern)</code></td>
    <td>Compile a regular expression pattern for reuse.</td>
  </tr>
  <tr>
    <td><code>re.finditer(pattern, string)</code></td>
    <td>Find all occurrences of a pattern as iterator.</td>
  </tr>
  <tr>
    <td><code>re.subn(pattern, replacement, string)</code></td>
    <td>Replace occurrences of a pattern with a replacement and count.</td>
  </tr>
  <!-- Add other regular expression methods -->
</table>
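
The `re` functions in the table differ mainly in what they return. The sketch below uses a made-up text and a deliberately simple email pattern, which is an assumption for illustration and not a complete email validator.

```python
import re

# Hypothetical input text
text = 'Dave dave@google.com, Steve steve@gmail.com'

# Compile once, reuse many times
pattern = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

emails = pattern.findall(text)         # list of all matching substrings
first = pattern.search(text)           # first match anywhere in the string
at_start = pattern.match(text)         # only matches at position 0, so None here
redacted, count = pattern.subn('REDACTED', text)  # new string plus number of replacements
fields = re.split(r',\s*', text)       # split on a comma and optional whitespace
starts = [m.start() for m in pattern.finditer(text)]  # iterate over match objects
```

`search` and `match` return a match object or `None`, so code that calls `.group()` on the result should check for `None` first.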


<h2>Vectorized String Functions in Pandas</h2>
<table>
  <tr>
    <th>Method</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><code>df['column'].str.lower()</code> and <code>df['column'].str.upper()</code></td>
    <td>Apply lowercase or uppercase to strings in a DataFrame column.</td>
  </tr>
  <tr>
    <td><code>df['column'].str.replace(old, new)</code></td>
    <td>Replace substrings in strings within a DataFrame column.</td>
  </tr>
  <tr>
    <td><code>df['column'].str.split(separator)</code></td>
    <td>Split strings in a DataFrame column based on a separator.</td>
  </tr>
  <tr>
    <td><code>df['column'].str.strip()</code></td>
    <td>Remove leading and trailing whitespace from strings.</td>
  </tr>
  <tr>
    <td><code>df['column'].str.len()</code></td>
    <td>Compute the length of strings in a DataFrame column.</td>
  </tr>
  <tr>
    <td><code>df['column'].str.contains(pattern)</code></td>
    <td>Check if strings in a DataFrame column contain a pattern.</td>
  </tr>
  <tr>
    <td><code>df['column'].str.extract(pattern)</code></td>
    <td>Extract substrings from strings using a regular expression.</td>
  </tr>
  <!-- Add other vectorized string functions -->
</table>
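
The vectorized `.str` methods apply the same operations to a whole column and skip missing values. A minimal sketch, assuming a made-up `email` column that contains one missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with inconsistent case, whitespace, and a gap
df = pd.DataFrame({'email': [' Dave@Google.com ', 'steve@gmail.com', np.nan]})

# Methods can be chained; missing values stay missing
clean = df['email'].str.strip().str.lower()

lengths = clean.str.len()

# na=False treats missing values as "does not contain"
is_gmail = clean.str.contains('gmail', na=False)

# extract returns one column per capture group
domains = clean.str.extract(r'@([a-z.]+)$')

# split yields a list per row
split_parts = clean.str.split('@')

# regex=False replaces the literal substring
spelled_out = clean.str.replace('@', ' at ', regex=False)
```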
