# Pandas DataFrame Iteration Interactive Explorer

This notebook provides interactive demonstrations of different ways to iterate over a Pandas DataFrame: **iterrows()**, **itertuples()**, and **items()** (column-wise). You'll also explore why *vectorization usually beats explicit Python loops* and how to refactor loop-based code.

---

### Exercise Link: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/uf-bme/BME3053C-Fall-2025/blob/main/interactive-exercises/DataFrame_Iteration.ipynb)

---

## Learning Objectives
- Understand what `iterrows()` yields and its pitfalls
- Compare `iterrows()` with `itertuples()` (faster, namedtuple-based attribute access)
- Use `DataFrame.items()` to iterate over columns
- Benchmark performance of each approach
- Refactor a loop using vectorized Pandas operations
- Learn best practices for when (and when not) to iterate

> ⚠️ **Key Principle:** Prefer vectorized operations. Row-wise iteration in Python should be a *last resort* (e.g., calling Python-only logic that cannot be vectorized, complex stateful operations, or interoperability with external APIs).
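
To make the principle concrete, here is a minimal sketch that computes the same total twice, once with a row loop and once with a vectorized expression, and times both with `timeit`. The data, the column names `A` and `B`, and the helper names `total_with_loop` and `total_vectorized` are illustrative assumptions, not part of the notebook's generated dataset. Absolute timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data, not the notebook's generated DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(1, 100, size=2_000),
    "B": rng.integers(1, 100, size=2_000),
})


def total_with_loop(frame):
    """Row-wise version: one Python-level iteration per row."""
    total = 0
    for row in frame.itertuples(index=False):
        total += row.A + row.B
    return total


def total_vectorized(frame):
    """Vectorized version: the addition and the sum run on whole columns."""
    return int((frame["A"] + frame["B"]).sum())


loop_result = total_with_loop(df)
vector_result = total_vectorized(df)

# timeit repeats each call, which smooths out one-off noise.
loop_seconds = timeit.timeit(lambda: total_with_loop(df), number=5)
vector_seconds = timeit.timeit(lambda: total_vectorized(df), number=5)
print(f"loop: {loop_seconds:.4f}s  vectorized: {vector_seconds:.4f}s")
```

Both functions return the same number; only the amount of work done in the Python interpreter differs.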

In [7]:
import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display, HTML, Markdown
import time
import textwrap

## 1. Generate a Sample DataFrame
Use the controls below to create a synthetic dataset. This will feed the subsequent interactive explorers.

In [8]:
# Create a sample DataFrame generator
def create_dataframe_generator():
    # Create widgets for DataFrame customization
    num_rows = widgets.IntSlider(
        value=5, min=3, max=20, step=1,
        description='Rows:', style={'description_width': 'initial'}
    )

    dataset_selector = widgets.Dropdown(
        options={
            'Student Scores': {'type': 'students'},
            'Sales Data': {'type': 'sales'},
            'Scientific Measurements': {'type': 'science'},
            'Financial Records': {'type': 'financial'}
        },
        value={'type': 'students'},
        description='Dataset:'
    )

    output = widgets.Output()

    # Global variable to store the current DataFrame
    global current_df
    current_df = None

    def generate_dataframe(*args):
        global current_df
        with output:
            output.clear_output()

            rows = num_rows.value
            dataset_type = dataset_selector.value['type']

            np.random.seed(42)  # For reproducible results

            if dataset_type == 'students':
                names = [f'Student_{i+1}' for i in range(rows)]
                math_scores = np.random.randint(60, 100, rows)
                science_scores = np.random.randint(55, 95, rows)
                english_scores = np.random.randint(65, 98, rows)

                current_df = pd.DataFrame({
                    'Name': names,
                    'Math': math_scores,
                    'Science': science_scores,
                    'English': english_scores
                })

            elif dataset_type == 'sales':
                products = ['Widget', 'Gadget', 'Tool', 'Device', 'Item']
                product_names = np.random.choice(products, rows)
                quantities = np.random.randint(1, 50, rows)
                prices = np.round(np.random.uniform(10.0, 100.0, rows), 2)

                current_df = pd.DataFrame({
                    'Product': product_names,
                    'Quantity': quantities,
                    'Price': prices,
                    'Total': np.round(quantities * prices, 2)
                })

            elif dataset_type == 'science':
                temperatures = np.round(np.random.uniform(20.0, 35.0, rows), 1)
                pressures = np.round(np.random.uniform(1000.0, 1020.0, rows), 1)
                humidity = np.random.randint(30, 80, rows)

                current_df = pd.DataFrame({
                    'Temperature': temperatures,
                    'Pressure': pressures,
                    'Humidity': humidity,
                    'Experiment_ID': [f'EXP_{i+1:03d}' for i in range(rows)]
                })

            elif dataset_type == 'financial':
                companies = ['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'AMZN']
                company_names = np.random.choice(companies, rows)
                prices = np.round(np.random.uniform(100.0, 300.0, rows), 2)
                volumes = np.random.randint(1000, 10000, rows)

                current_df = pd.DataFrame({
                    'Symbol': company_names,
                    'Price': prices,
                    'Volume': volumes,
                    'Market_Cap': np.round(prices * volumes / 1000, 2)
                })

            # Display the DataFrame
            html_content = "<div style='font-family: monospace; font-size: 14px; color: #000;'>"
            html_content += "<div style='background-color: #f9f9f9; padding: 15px; border-left: 4px solid #2196f3; margin-bottom: 10px;'>"
            html_content += "<h4 style='color: #000; margin-top: 0;'>📊 Generated DataFrame</h4>"
            html_content += f"<p><strong>Shape:</strong> {current_df.shape[0]} rows × {current_df.shape[1]} columns</p>"
            html_content += f"<p><strong>Columns:</strong> {list(current_df.columns)}</p>"
            html_content += "</div>"

            # Convert DataFrame to HTML and style it
            df_html = current_df.to_html(classes='dataframe', table_id='sample_df')
            styled_df = df_html.replace('<table', '<table style="border-collapse: collapse; margin: 10px 0; font-size: 12px;"')
            styled_df = styled_df.replace('<th>', '<th style="background-color: #f0f0f0; padding: 8px; border: 1px solid #ddd; text-align: left;">')
            styled_df = styled_df.replace('<td>', '<td style="padding: 8px; border: 1px solid #ddd;">')

            html_content += styled_df
            html_content += "</div>"

            display(HTML(html_content))

    # Link widgets to update function
    num_rows.observe(generate_dataframe, 'value')
    dataset_selector.observe(generate_dataframe, 'value')

    # Display the interface
    display(widgets.VBox([
        widgets.HTML("<h3>🎛️ DataFrame Generator</h3>"),
        widgets.HTML("<p>Customize your DataFrame for the interactive examples below:</p>"),
        num_rows,
        dataset_selector,
        output
    ]))

    # Generate initial DataFrame
    generate_dataframe()

# Create the DataFrame generator
create_dataframe_generator()

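
If the widgets do not render (for example in a static viewer), a plain function can stand in for the generator. The sketch below is only an approximation of the "Student Scores" option: `make_students_df` is a hypothetical helper that does not exist in the notebook, and it uses `np.random.default_rng` rather than `np.random.seed`, so the values will differ from the ones the widget produces.

```python
import numpy as np
import pandas as pd


def make_students_df(rows=5, seed=42):
    """Hypothetical helper: build a 'Student Scores' style dataset without widgets."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "Name": [f"Student_{i + 1}" for i in range(rows)],
        # Same score ranges as the generator cell; upper bounds are exclusive.
        "Math": rng.integers(60, 100, size=rows),
        "Science": rng.integers(55, 95, size=rows),
        "English": rng.integers(65, 98, size=rows),
    })


df = make_students_df(rows=5)
print(df)
```

Calling the function twice with the same `seed` gives identical frames, which keeps later comparisons reproducible.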

In [9]:
# Example 1: DataFrame.iterrows() Explorer
def create_iterrows_explorer():
    # Create widgets for this example
    max_rows_slider = widgets.IntSlider(
        value=3, min=1, max=10, step=1,
        description='Show rows:', style={'description_width': 'initial'}
    )

    show_details = widgets.ToggleButtons(
        options=['Basic Usage', 'Index + Values', 'Data Types', 'Common Pitfalls'],
        value='Basic Usage',
        description='View:'
    )

    output = widgets.Output()

    def update_display(*args):
        with output:
            output.clear_output()

            # Use the global DataFrame from the generator
            if 'current_df' not in globals() or current_df is None:
                display(HTML("<p style='color: red;'>Please run the DataFrame generator first!</p>"))
                return

            max_rows = max_rows_slider.value
            view_mode = show_details.value
            df_subset = current_df.head(max_rows)

            html_content = "<div style='font-family: monospace; font-size: 14px; color: #000;'>"
            html_content += "<div style='background-color: #f9f9f9; padding: 10px; border-left: 4px solid #ff5722; margin-bottom: 10px;'>"
            html_content += "<h4 style='color: #000;'>🔴 DataFrame.iterrows() - Row-by-Row Iteration</h4>"

            if view_mode == 'Basic Usage':
                html_content += "<p><strong>Code:</strong></p>"
                html_content += "<code>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'Index: {index}, Row data: {row.to_dict()}')</code><br><br>"

                html_content += "<div style='background-color: #ffebee; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for index, row in df_subset.iterrows():
                    row_dict = row.to_dict()
                    html_content += f"Index: {index}, Row: {row_dict}<br>"
                html_content += "</div>"

            elif view_mode == 'Index + Values':
                html_content += "<p><strong>Code:</strong></p>"
                html_content += "<code>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'Row {index}:')<br>&nbsp;&nbsp;&nbsp;&nbsp;for col_name, value in row.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(f'&nbsp;&nbsp;{col_name}: {value}')</code><br><br>"

                html_content += "<div style='background-color: #ffebee; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for index, row in df_subset.iterrows():
                    html_content += f"<strong>Row {index}:</strong><br>"
                    for col_name, value in row.items():
                        html_content += f"&nbsp;&nbsp;{col_name}: {value}<br>"
                    html_content += "<br>"
                html_content += "</div>"

            elif view_mode == 'Data Types':
                html_content += "<p><strong>Code:</strong></p>"
                html_content += "<code>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'Row {index} types: {[type(val).__name__ for val in row]}')</code><br><br>"

                html_content += "<div style='background-color: #ffebee; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for index, row in df_subset.iterrows():
                    types_list = [type(val).__name__ for val in row]
                    html_content += f"Row {index} types: {types_list}<br>"
                html_content += "</div>"

                html_content += "<div style='background-color: #fff3e0; padding: 8px; margin: 10px 0; border-radius: 4px;'>"
                html_content += "<strong>⚠️ Notice:</strong> iterrows() yields each row as a pandas Series. A Series has a single dtype, so rows that mix dtypes are upcast (for example, int to float, or everything to object)."
                html_content += "</div>"

            elif view_mode == 'Common Pitfalls':
                html_content += "<div style='background-color: #ffcdd2; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>❌ Common Pitfalls with iterrows():</h5>"
                html_content += "<ul>"
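                html_content += "<li><strong>Type Changes:</strong> Each row is returned as a Series with a single dtype, so rows that mix dtypes are upcast</li>"
                html_content += "<li><strong>Performance:</strong> Very slow for large DataFrames compared with vectorized operations</li>"
                html_content += "<li><strong>Index Is a Label:</strong> The yielded index is the row's index label, which is not necessarily its 0-based position</li>"
                html_content += "<li><strong>Memory Usage:</strong> Creates a new Series object for each row; writing to it may not change the original DataFrame</li>"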
                html_content += "</ul>"
                html_content += "</div>"

                html_content += "<div style='background-color: #c8e6c9; padding: 10px; margin: 10px 0; border-radius: 4px;'>"
                html_content += "<h5>✅ Better Alternatives:</h5>"
                html_content += "<ul>"
                html_content += "<li><strong>Vectorized operations:</strong> df['new_col'] = df['col1'] + df['col2']</li>"
                html_content += "<li><strong>itertuples():</strong> Faster than iterrows(), preserves types</li>"
                html_content += "<li><strong>apply():</strong> For row-wise operations that can't be vectorized</li>"
                html_content += "</ul>"
                html_content += "</div>"

            # Show DataFrame info
            html_content += f"<p><strong>DataFrame shape:</strong> {df_subset.shape}</p>"
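            # Report the rows actually shown: head() returns fewer rows when the slider exceeds the DataFrame length
            html_content += f"<p><strong>Showing rows:</strong> 0 to {len(df_subset) - 1}</p>"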
            html_content += "</div></div>"

            display(HTML(html_content))

    # Link widgets to update function
    max_rows_slider.observe(update_display, 'value')
    show_details.observe(update_display, 'value')

    # Display the interface
    display(widgets.VBox([
        widgets.HTML("<h3>🔴 Example 1: iterrows() Explorer</h3>"),
        widgets.HTML("<p>Explore how iterrows() works and its characteristics:</p>"),
        max_rows_slider,
        show_details,
        output
    ]))

    # Initial update
    update_display()

# Run Example 1
create_iterrows_explorer()

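
The dtype pitfall that the explorer describes can be reproduced on a tiny frame without any widgets. This is a minimal sketch on made-up data (the column names `qty` and `ratio` are assumptions). It shows an integer value being upcast to float inside the row Series, and a write to the row leaving the DataFrame unchanged.

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
df = pd.DataFrame({"qty": [1, 2], "ratio": [0.5, 1.5]})

# Each row comes back as a Series, and a Series has a single dtype,
# so the integer value is upcast to float to sit next to the float value.
index, row = next(df.iterrows())
print(df["qty"].dtype)  # dtype of the column in the DataFrame
print(row.dtype)        # dtype of the row Series
print(type(row["qty"]).__name__)

# Writing to the row should not be relied on to update the DataFrame.
# Here the row holds its own upcast data, so the frame keeps its value.
row["qty"] = 99
print(df.loc[index, "qty"])
```

If the per-row types matter, `itertuples()` or a vectorized expression avoids the upcast entirely.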

In [10]:
# Example 2: DataFrame.itertuples() Explorer
def create_itertuples_explorer():
    # Create widgets for this example
    max_rows_slider = widgets.IntSlider(
        value=3, min=1, max=10, step=1,
        description='Show rows:', style={'description_width': 'initial'}
    )

    include_index = widgets.Checkbox(
        value=True,
        description='Include Index',
        style={'description_width': 'initial'}
    )

    named_tuple = widgets.Checkbox(
        value=True,
        description='Named Tuple',
        style={'description_width': 'initial'}
    )

    view_mode = widgets.ToggleButtons(
        options=['Basic Usage', 'Attribute Access', 'vs iterrows()'],
        value='Basic Usage',
        description='View:'
    )

    output = widgets.Output()

    def update_display(*args):
        with output:
            output.clear_output()

            # Use the global DataFrame from the generator
            if 'current_df' not in globals() or current_df is None:
                display(HTML("<p style='color: red;'>Please run the DataFrame generator first!</p>"))
                return

            max_rows = max_rows_slider.value
            inc_index = include_index.value
            named = named_tuple.value
            mode = view_mode.value
            df_subset = current_df.head(max_rows)

            html_content = "<div style='font-family: monospace; font-size: 14px; color: #000;'>"
            html_content += "<div style='background-color: #f9f9f9; padding: 10px; border-left: 4px solid #4caf50; margin-bottom: 10px;'>"
            html_content += "<h4 style='color: #000;'>🟢 DataFrame.itertuples() - Faster Row Iteration</h4>"

            if mode == 'Basic Usage':
                html_content += f"<p><strong>Parameters:</strong> index={inc_index}, name={'Pandas' if named else None}</p>"
                html_content += "<code>for row in df.itertuples(index={}, name={}):<br>&nbsp;&nbsp;&nbsp;&nbsp;print(row)</code><br><br>".format(inc_index, "'Pandas'" if named else "None")

                html_content += "<div style='background-color: #e8f5e8; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for row in df_subset.itertuples(index=inc_index, name='Pandas' if named else None):
                    html_content += f"{row}<br>"
                html_content += "</div>"

            elif mode == 'Attribute Access':
                if not named:
                    html_content += "<p style='color: orange;'>⚠️ Enable 'Named Tuple' to see attribute access</p>"

                html_content += "<p><strong>Code:</strong></p>"
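                # Doubled braces render as literal { } so the displayed f-strings are valid; the position follows the index setting
                html_content += "<code>for row in df.itertuples(index={}, name='Pandas'):<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'Accessing by attribute: {{row.{}}}')<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'Accessing by index: {{row[{}]}}')</code><br><br>".format(inc_index, list(df_subset.columns)[0], 1 if inc_index else 0)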

                html_content += "<div style='background-color: #e8f5e8; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for i, row in enumerate(df_subset.itertuples(index=inc_index, name='Pandas' if named else None)):
                    if i >= 2:  # Show only first 2 rows for clarity
                        break
                    if named and hasattr(row, df_subset.columns[0].replace(' ', '_')):
                        first_col = df_subset.columns[0].replace(' ', '_')
                        attr_value = getattr(row, first_col, 'N/A')
                        html_content += f"Row {i+1} - Attribute access: row.{first_col} = {attr_value}<br>"
                        html_content += f"Row {i+1} - Index access: row[{'1' if inc_index else '0'}] = {row[1 if inc_index else 0]}<br><br>"
                    else:
                        html_content += f"Row {i+1} - Index access: row[{'1' if inc_index else '0'}] = {row[1 if inc_index else 0]}<br><br>"
                html_content += "</div>"

            elif mode == 'vs iterrows()':
                html_content += "<div style='display: flex; gap: 10px;'>"

                # iterrows column
                html_content += "<div style='flex: 1; background-color: #ffebee; padding: 8px; border-radius: 4px;'>"
                html_content += "<h5>iterrows() - Slower</h5>"
                html_content += "<code>for idx, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;# Returns Series object<br>&nbsp;&nbsp;&nbsp;&nbsp;print(type(row))</code><br><br>"

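                # perf_counter() is a monotonic, high-resolution clock; time.time() can be too coarse for a few rows
                start_time = time.perf_counter()
                for idx, row in df_subset.iterrows():
                    pass  # Just iterate
                iterrows_time = time.perf_counter() - start_time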

                html_content += f"<strong>Time:</strong> {iterrows_time:.6f}s<br>"
                html_content += f"<strong>Type:</strong> {type(row).__name__}<br>"
                html_content += "<strong>Pros:</strong> Familiar, returns Series<br>"
                html_content += "<strong>Cons:</strong> Slow, type changes"
                html_content += "</div>"

                # itertuples column
                html_content += "<div style='flex: 1; background-color: #e8f5e8; padding: 8px; border-radius: 4px;'>"
                html_content += "<h5>itertuples() - Faster</h5>"
                html_content += "<code>for row in df.itertuples():<br>&nbsp;&nbsp;&nbsp;&nbsp;# Returns namedtuple<br>&nbsp;&nbsp;&nbsp;&nbsp;print(type(row))</code><br><br>"

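                # perf_counter() is a monotonic, high-resolution clock; time.time() can be too coarse for a few rows
                start_time = time.perf_counter()
                for row in df_subset.itertuples():
                    pass  # Just iterate
                itertuples_time = time.perf_counter() - start_time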

                html_content += f"<strong>Time:</strong> {itertuples_time:.6f}s<br>"
                html_content += f"<strong>Type:</strong> {type(row).__name__}<br>"
                html_content += "<strong>Pros:</strong> Fast, preserves types<br>"
                html_content += "<strong>Cons:</strong> Less familiar syntax"
                html_content += "</div>"

                html_content += "</div>"

                if iterrows_time > 0:
                    speedup = iterrows_time / itertuples_time if itertuples_time > 0 else float('inf')
                    html_content += f"<p><strong>⚡ Speed improvement:</strong> ~{speedup:.1f}x faster with itertuples()</p>"

            # Show current settings
            html_content += f"<p><strong>Settings:</strong> index={inc_index}, named={named}</p>"
            html_content += f"<p><strong>DataFrame shape:</strong> {df_subset.shape}</p>"
            html_content += "</div></div>"

            display(HTML(html_content))

    # Link widgets to update function
    max_rows_slider.observe(update_display, 'value')
    include_index.observe(update_display, 'value')
    named_tuple.observe(update_display, 'value')
    view_mode.observe(update_display, 'value')

    # Display the interface
    display(widgets.VBox([
        widgets.HTML("<h3>🟢 Example 2: itertuples() Explorer</h3>"),
        widgets.HTML("<p>Explore the faster itertuples() method and compare with iterrows():</p>"),
        widgets.HBox([max_rows_slider, include_index, named_tuple]),
        view_mode,
        output
    ]))

    # Initial update
    update_display()

# Run Example 2
create_itertuples_explorer()

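
The `index` and `name` parameters, and the way `itertuples()` handles awkward column names, can also be checked on a small static frame. The sketch below uses made-up data; the column `unit price` is an assumption chosen because it is not a valid Python identifier, so pandas replaces it with a positional field name.

```python
import pandas as pd

# Hypothetical frame; "unit price" is deliberately not a valid Python identifier.
df = pd.DataFrame({
    "qty": [1, 2],
    "ratio": [0.5, 1.5],
    "unit price": [9.99, 4.5],
})

# Default call: a namedtuple per row, with the index as the first field.
first = next(df.itertuples())
print(first._fields)
print(first.qty, first.ratio)  # each value keeps its own column's type

# Invalid identifiers are replaced by positional names, so look the field up
# through _fields (or by position) instead of typing the column name.
unit_price = getattr(first, first._fields[3])
print(unit_price)

# name=None yields plain tuples; index=False drops the index field.
plain = next(df.itertuples(index=False, name=None))
print(plain)
```

Plain tuples are the cheapest form to build, while the named form is easier to read when columns are accessed by attribute.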

In [11]:
# Example 3: DataFrame.items() - Column-wise Iteration
def create_items_explorer():
    # Create widgets for this example
    operation_selector = widgets.ToggleButtons(
        options=['Basic Items', 'Column Stats', 'Data Analysis', 'Column Transformation'],
        value='Basic Items',
        description='Operation:'
    )

    show_values = widgets.IntSlider(
        value=3, min=1, max=10, step=1,
        description='Show values:', style={'description_width': 'initial'}
    )

    output = widgets.Output()

    def update_display(*args):
        with output:
            output.clear_output()

            # Use the global DataFrame from the generator
            if 'current_df' not in globals() or current_df is None:
                display(HTML("<p style='color: red;'>Please run the DataFrame generator first!</p>"))
                return

            operation = operation_selector.value
            num_values = show_values.value

            html_content = "<div style='font-family: monospace; font-size: 14px; color: #000;'>"
            html_content += "<div style='background-color: #f9f9f9; padding: 10px; border-left: 4px solid #ff9800; margin-bottom: 10px;'>"
            html_content += "<h4 style='color: #000;'>🟠 DataFrame.items() - Column-wise Iteration</h4>"

            if operation == 'Basic Items':
                html_content += "<p><strong>Code:</strong></p>"
                # f-string so the slider value is shown; doubled braces render as literal { } in the displayed code
                html_content += f"<code>for column_name, column_data in df.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'Column: {{column_name}}')<br>&nbsp;&nbsp;&nbsp;&nbsp;print(f'First {num_values} values: {{column_data.head({num_values}).tolist()}}')</code><br><br>"

                html_content += "<div style='background-color: #fff3e0; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for column_name, column_data in current_df.items():
                    html_content += f"<strong>Column:</strong> {column_name}<br>"
                    values_list = column_data.head(num_values).tolist()
                    html_content += f"First {num_values} values: {values_list}<br>"
                    html_content += f"Data type: {column_data.dtype}<br><br>"
                html_content += "</div>"

            elif operation == 'Column Stats':
                html_content += "<p><strong>Code:</strong></p>"
                html_content += "<code>for col_name, col_data in df.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;if pd.api.types.is_numeric_dtype(col_data):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(f'{col_name}: mean={col_data.mean():.2f}')<br>&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(f'{col_name}: unique_count={col_data.nunique()}')</code><br><br>"

                html_content += "<div style='background-color: #fff3e0; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Output:</strong><br>"
                for col_name, col_data in current_df.items():
                    # is_numeric_dtype also covers widths such as int32, which a literal 'int64'/'float64' check would miss
                    if pd.api.types.is_numeric_dtype(col_data):
                        mean_val = col_data.mean()
                        html_content += f"<strong>{col_name}:</strong> mean={mean_val:.2f}, std={col_data.std():.2f}<br>"
                    else:
                        unique_count = col_data.nunique()
                        html_content += f"<strong>{col_name}:</strong> unique_count={unique_count}, type='{col_data.dtype}'<br>"
                html_content += "</div>"

            elif operation == 'Data Analysis':
                html_content += "<p><strong>Code:</strong></p>"
                html_content += "<code>analysis = {}<br>for col_name, col_data in df.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;analysis[col_name] = {<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'type': str(col_data.dtype),<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'null_count': col_data.isnull().sum(),<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'unique_values': col_data.nunique()<br>&nbsp;&nbsp;&nbsp;&nbsp;}</code><br><br>"

                html_content += "<div style='background-color: #fff3e0; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Analysis Results:</strong><br>"
                analysis = {}
                for col_name, col_data in current_df.items():
                    analysis[col_name] = {
                        'type': str(col_data.dtype),
                        'null_count': col_data.isnull().sum(),
                        'unique_values': col_data.nunique(),
                        'memory_usage': col_data.memory_usage(deep=True)
                    }

                for col_name, stats in analysis.items():
                    html_content += f"<strong>{col_name}:</strong><br>"
                    html_content += f"&nbsp;&nbsp;Type: {stats['type']}<br>"
                    html_content += f"&nbsp;&nbsp;Nulls: {stats['null_count']}<br>"
                    html_content += f"&nbsp;&nbsp;Unique: {stats['unique_values']}<br>"
                    html_content += f"&nbsp;&nbsp;Memory: {stats['memory_usage']} bytes<br><br>"
                html_content += "</div>"

            elif operation == 'Column Transformation':
                html_content += "<p><strong>Code:</strong></p>"
                html_content += "<code>transformed = {}<br>for col_name, col_data in df.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;if col_data.dtype == 'object':<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;transformed[f'{col_name}_length'] = col_data.str.len()<br>&nbsp;&nbsp;&nbsp;&nbsp;elif pd.api.types.is_numeric_dtype(col_data):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;transformed[f'{col_name}_normalized'] = (col_data - col_data.mean()) / col_data.std()</code><br><br>"

                html_content += "<div style='background-color: #fff3e0; padding: 8px; border-radius: 4px;'>"
                html_content += "<strong>Transformation Results:</strong><br>"
                transformed = {}
                for col_name, col_data in current_df.items():
                    if col_data.dtype == 'object':
                        try:
                            transformed[f'{col_name}_length'] = col_data.str.len()
                            lengths = transformed[f'{col_name}_length'].head(num_values).tolist()
                            html_content += f"<strong>{col_name}_length:</strong> {lengths}<br>"
                        except AttributeError:
                            # The .str accessor raises AttributeError when the column does not hold strings
                            html_content += f"<strong>{col_name}:</strong> Cannot calculate length<br>"
                    elif pd.api.types.is_numeric_dtype(col_data):
                        # Z-score normalization; is_numeric_dtype also covers widths such as int32
                        transformed[f'{col_name}_normalized'] = (col_data - col_data.mean()) / col_data.std()
                        normalized = transformed[f'{col_name}_normalized'].head(num_values).round(2).tolist()
                        html_content += f"<strong>{col_name}_normalized:</strong> {normalized}<br>"
                html_content += "</div>"

            # Show iteration info
            html_content += f"<p><strong>Total columns:</strong> {len(current_df.columns)}</p>"
            html_content += f"<p><strong>Column names:</strong> {list(current_df.columns)}</p>"
            html_content += "</div></div>"

            display(HTML(html_content))

    # Link widgets to update function
    operation_selector.observe(update_display, 'value')
    show_values.observe(update_display, 'value')

    # Display the interface
    display(widgets.VBox([
        widgets.HTML("<h3>🟠 Example 3: items() Explorer - Column-wise Iteration</h3>"),
        widgets.HTML("<p>Explore how to iterate over DataFrame columns using items():</p>"),
        operation_selector,
        show_values,
        output
    ]))

    # Initial update
    update_display()

# Run Example 3
create_items_explorer()

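
Column-wise iteration pairs naturally with a dtype test, because each `(name, Series)` pair has one dtype. The sketch below is a static, widget-free version of that idea on made-up data. It uses `pandas.api.types.is_numeric_dtype`, which accepts any numeric width (for example int32 as well as int64); the `summary` dictionary and the column names are illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame with one text column and two numeric columns.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Math": [90, 75, 60],
    "Ratio": [0.5, 0.25, 0.75],
})

summary = {}
for col_name, col_data in df.items():
    if is_numeric_dtype(col_data):
        # Numeric columns: record the mean.
        summary[col_name] = float(col_data.mean())
    else:
        # Everything else: record the number of distinct values.
        summary[col_name] = int(col_data.nunique())

print(summary)
```

Because the loop runs once per column rather than once per row, its cost stays small even when the DataFrame has many rows.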

In [12]:
# Example 4: Performance Comparison
def create_performance_comparison():
    # Create widgets for this example
    dataset_size = widgets.IntSlider(
        value=1000, min=100, max=5000, step=100,
        description='Dataset size:', style={'description_width': 'initial'}
    )

    operation_type = widgets.ToggleButtons(
        options=['Simple Sum', 'Conditional Logic', 'String Operations'],
        value='Simple Sum',
        description='Operation:'
    )

    run_benchmark = widgets.Button(
        description='🚀 Run Benchmark',
        button_style='primary'
    )

    output = widgets.Output()

    def create_test_dataframe(size):
        """Create a test DataFrame for benchmarking"""
        np.random.seed(42)
        return pd.DataFrame({
            'A': np.random.randint(1, 100, size),
            'B': np.random.randint(1, 100, size),
            'C': np.random.choice(['cat', 'dog', 'bird', 'fish'], size),
            'D': np.random.uniform(0, 1, size)
        })

    def benchmark_methods(*args):
        with output:
            output.clear_output()

            size = dataset_size.value
            operation = operation_type.value
            test_df = create_test_dataframe(size)

            html_content = "<div style='font-family: monospace; font-size: 14px; color: #000;'>"
            html_content += "<div style='background-color: #f9f9f9; padding: 10px; border-left: 4px solid #9c27b0; margin-bottom: 10px;'>"
            html_content += "<h4 style='color: #000;'>🟣 Performance Benchmark Results</h4>"
            html_content += f"<p><strong>Dataset size:</strong> {size:,} rows</p>"
            html_content += f"<p><strong>Operation:</strong> {operation}</p>"

            results = {}

            if operation == 'Simple Sum':
                # iterrows approach
                # perf_counter() is a monotonic, high-resolution clock intended for timing short intervals
                start_time = time.perf_counter()
                sum_iterrows = 0
                for index, row in test_df.iterrows():
                    sum_iterrows += row['A'] + row['B']
                results['iterrows'] = time.perf_counter() - start_time

                # itertuples approach
                start_time = time.perf_counter()
                sum_itertuples = 0
                for row in test_df.itertuples():
                    sum_itertuples += row.A + row.B
                results['itertuples'] = time.perf_counter() - start_time

                # Vectorized approach
                start_time = time.perf_counter()
                sum_vectorized = (test_df['A'] + test_df['B']).sum()
                results['vectorized'] = time.perf_counter() - start_time

                # apply approach
                start_time = time.perf_counter()
                sum_apply = test_df.apply(lambda row: row['A'] + row['B'], axis=1).sum()
                results['apply'] = time.perf_counter() - start_time

            elif operation == 'Conditional Logic':
                # iterrows approach
                # perf_counter() is a monotonic, high-resolution clock intended for timing short intervals
                start_time = time.perf_counter()
                count_iterrows = 0
                for index, row in test_df.iterrows():
                    if row['A'] > 50 and row['C'] == 'cat':
                        count_iterrows += 1
                results['iterrows'] = time.perf_counter() - start_time

                # itertuples approach
                start_time = time.perf_counter()
                count_itertuples = 0
                for row in test_df.itertuples():
                    if row.A > 50 and row.C == 'cat':
                        count_itertuples += 1
                results['itertuples'] = time.perf_counter() - start_time

                # Vectorized approach
                start_time = time.perf_counter()
                count_vectorized = len(test_df[(test_df['A'] > 50) & (test_df['C'] == 'cat')])
                results['vectorized'] = time.perf_counter() - start_time

                # apply approach
                start_time = time.perf_counter()
                count_apply = test_df.apply(lambda row: 1 if row['A'] > 50 and row['C'] == 'cat' else 0, axis=1).sum()
                results['apply'] = time.perf_counter() - start_time

            elif operation == 'String Operations':
                # iterrows approach
                start_time = time.perf_counter()
                lengths_iterrows = []
                for index, row in test_df.iterrows():
                    lengths_iterrows.append(len(row['C']))
                results['iterrows'] = time.perf_counter() - start_time

                # itertuples approach
                start_time = time.perf_counter()
                lengths_itertuples = []
                for row in test_df.itertuples():
                    lengths_itertuples.append(len(row.C))
                results['itertuples'] = time.perf_counter() - start_time

                # Vectorized approach
                start_time = time.perf_counter()
                lengths_vectorized = test_df['C'].str.len().tolist()
                results['vectorized'] = time.perf_counter() - start_time

                # apply approach (element-wise on a single column, not row-wise)
                start_time = time.perf_counter()
                lengths_apply = test_df['C'].apply(len).tolist()
                results['apply'] = time.perf_counter() - start_time

            # Sort results by performance
            sorted_results = sorted(results.items(), key=lambda x: x[1])

            # Create visualization
            html_content += "<div style='background-color: #f3e5f5; padding: 15px; border-radius: 4px;'>"
            html_content += "<h5>⏱️ Execution Times (seconds):</h5>"

            colors = {'vectorized': '#4caf50', 'itertuples': '#ff9800', 'apply': '#2196f3', 'iterrows': '#f44336'}

            for i, (method, time_taken) in enumerate(sorted_results):
                color = colors.get(method, '#666')
                bar_width = int((time_taken / sorted_results[-1][1]) * 200) if sorted_results[-1][1] > 0 else 1

                html_content += "<div style='margin: 10px 0;'>"
                html_content += f"<strong>{method}:</strong> {time_taken:.6f}s "
                if i == 0:
                    html_content += "🏆 (Fastest)"
                elif i == len(sorted_results) - 1:
                    html_content += "🐌 (Slowest)"
                html_content += "<br>"
                html_content += f"<div style='background-color: {color}; width: {bar_width}px; height: 20px; border-radius: 3px; margin: 5px 0;'></div>"
                html_content += "</div>"

            # Show speed improvements
            fastest_time = sorted_results[0][1]
            html_content += "<h5>🚀 Speed Improvements:</h5>"
            for method, time_taken in sorted_results[1:]:
                if fastest_time > 0:
                    speedup = time_taken / fastest_time
                    html_content += f"<strong>{sorted_results[0][0]}</strong> is {speedup:.1f}x faster than <strong>{method}</strong><br>"

            html_content += "</div>"

            # Best practices
            html_content += "<div style='background-color: #e1f5fe; padding: 10px; margin: 15px 0; border-radius: 4px;'>"
            html_content += "<h5>💡 Key Takeaways:</h5>"
            html_content += "<ul>"
            html_content += "<li><strong>Vectorized operations:</strong> Usually fastest, because the loop runs in compiled code rather than in Python</li>"
            html_content += "<li><strong>itertuples():</strong> Usually much faster than iterrows() when row iteration is needed</li>"
            html_content += "<li><strong>apply():</strong> Convenient for custom logic, but apply(axis=1) still calls a Python function once per row and is often no faster than itertuples()</li>"
            html_content += "<li><strong>iterrows():</strong> Slowest option, since it builds a Series for every row; avoid it in performance-sensitive code</li>"
            html_content += "</ul>"
            html_content += "</div>"

            html_content += "</div></div>"

            display(HTML(html_content))

    # Link button to benchmark function
    run_benchmark.on_click(benchmark_methods)

    # Display the interface
    display(widgets.VBox([
        widgets.HTML("<h3>🟣 Example 4: Performance Comparison</h3>"),
        widgets.HTML("<p>Compare the performance of different iteration methods:</p>"),
        widgets.HBox([dataset_size, operation_type]),
        run_benchmark,
        output
    ]))

# Run Example 4
create_performance_comparison()

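
Single-shot wall-clock timings like the ones in Example 4 can be noisy, especially on small DataFrames. The sketch below shows one way to get more repeatable numbers with the standard library's `timeit` module. The DataFrame `bench_df` and the helpers `sum_itertuples`, `sum_vectorized`, and `best_time` are hypothetical names chosen for illustration; they are not part of the widget code above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: two integer columns, similar in spirit to the benchmark above
rng = np.random.default_rng(0)
bench_df = pd.DataFrame({
    "A": rng.integers(0, 100, size=1_000),
    "B": rng.integers(0, 100, size=1_000),
})


def sum_itertuples(df):
    """Row-by-row sum using itertuples()."""
    total = 0
    for row in df.itertuples(index=False):
        total += row.A + row.B
    return total


def sum_vectorized(df):
    """Column-wise sum using vectorized arithmetic."""
    return (df["A"] + df["B"]).sum()


def best_time(func, df, repeat=3, number=5):
    """Return the best average time per call, in seconds, over `repeat` runs."""
    timings = timeit.repeat(lambda: func(df), repeat=repeat, number=number)
    return min(timings) / number


for func in (sum_itertuples, sum_vectorized):
    print(f"{func.__name__}: {best_time(func, bench_df):.6f}s per call")
```

Taking the minimum over several repeats reduces the influence of background activity on the machine; the absolute numbers will still vary between environments.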

In [13]:
# Example 5: Vectorization vs Loops - Refactoring Examples
def create_vectorization_examples():
    # Create widgets for this example
    example_selector = widgets.ToggleButtons(
        options=['Calculate BMI', 'Grade Assignment', 'Price Categories', 'Text Processing'],
        value='Calculate BMI',
        description='Example:'
    )

    output = widgets.Output()

    def update_display(*args):
        with output:
            output.clear_output()

            example = example_selector.value

            html_content = "<div style='font-family: monospace; font-size: 14px; color: #000;'>"
            html_content += "<div style='background-color: #f9f9f9; padding: 10px; border-left: 4px solid #607d8b; margin-bottom: 10px;'>"
            html_content += "<h4 style='color: #000;'>⚫ Vectorization vs Loops - Refactoring Examples</h4>"

            if example == 'Calculate BMI':
                # Create sample data
                sample_data = pd.DataFrame({
                    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
                    'Weight_kg': [65, 80, 75, 58],
                    'Height_m': [1.65, 1.80, 1.75, 1.62]
                })

                html_content += "<h5>📊 Sample Data:</h5>"
                html_content += sample_data.to_html(classes='dataframe', index=False).replace('<table', '<table style="font-size: 12px; border-collapse: collapse;"').replace('<th>', '<th style="background-color: #f0f0f0; padding: 5px; border: 1px solid #ddd;"').replace('<td>', '<td style="padding: 5px; border: 1px solid #ddd;"')

                html_content += "<div style='display: flex; gap: 10px; margin: 15px 0;'>"

                # Loop approach
                html_content += "<div style='flex: 1; background-color: #ffebee; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>❌ Loop Approach (Slow)</h5>"
                html_content += "<code style='font-size: 11px;'># Using iterrows<br>bmi_list = []<br>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;bmi = row['Weight_kg'] / (row['Height_m'] ** 2)<br>&nbsp;&nbsp;&nbsp;&nbsp;bmi_list.append(bmi)<br>df['BMI'] = bmi_list</code><br><br>"

                # Calculate using loop simulation
                bmi_loop = []
                for index, row in sample_data.iterrows():
                    bmi = row['Weight_kg'] / (row['Height_m'] ** 2)
                    bmi_loop.append(round(bmi, 1))

                html_content += f"<strong>Result:</strong> {bmi_loop}"
                html_content += "</div>"

                # Vectorized approach
                html_content += "<div style='flex: 1; background-color: #e8f5e8; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>✅ Vectorized Approach (Fast)</h5>"
                html_content += "<code style='font-size: 11px;'># Using vectorized operations<br>df['BMI'] = df['Weight_kg'] / (df['Height_m'] ** 2)</code><br><br>"

                # Calculate using vectorization
                bmi_vectorized = (sample_data['Weight_kg'] / (sample_data['Height_m'] ** 2)).round(1).tolist()

                html_content += f"<strong>Result:</strong> {bmi_vectorized}<br>"
                html_content += "<strong>Benefits:</strong> Faster, cleaner, more readable"
                html_content += "</div>"

                html_content += "</div>"

            elif example == 'Grade Assignment':
                # Create sample data
                sample_data = pd.DataFrame({
                    'Student': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
                    'Score': [95, 87, 76, 92, 68]
                })

                html_content += "<h5>📊 Sample Data:</h5>"
                html_content += sample_data.to_html(classes='dataframe', index=False).replace('<table', '<table style="font-size: 12px; border-collapse: collapse;"').replace('<th>', '<th style="background-color: #f0f0f0; padding: 5px; border: 1px solid #ddd;"').replace('<td>', '<td style="padding: 5px; border: 1px solid #ddd;"')

                html_content += "<div style='display: flex; gap: 10px; margin: 15px 0;'>"

                # Loop approach
                html_content += "<div style='flex: 1; background-color: #ffebee; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>❌ Loop Approach</h5>"
                html_content += "<code style='font-size: 11px;'>grades = []<br>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;score = row['Score']<br>&nbsp;&nbsp;&nbsp;&nbsp;if score >= 90:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;grade = 'A'<br>&nbsp;&nbsp;&nbsp;&nbsp;elif score >= 80:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;grade = 'B'<br>&nbsp;&nbsp;&nbsp;&nbsp;elif score >= 70:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;grade = 'C'<br>&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;grade = 'D'<br>&nbsp;&nbsp;&nbsp;&nbsp;grades.append(grade)<br>df['Grade'] = grades</code><br><br>"

                # Calculate using loop simulation
                grades_loop = []
                for index, row in sample_data.iterrows():
                    score = row['Score']
                    if score >= 90:
                        grade = 'A'
                    elif score >= 80:
                        grade = 'B'
                    elif score >= 70:
                        grade = 'C'
                    else:
                        grade = 'D'
                    grades_loop.append(grade)

                html_content += f"<strong>Result:</strong> {grades_loop}"
                html_content += "</div>"

                # Vectorized approach
                html_content += "<div style='flex: 1; background-color: #e8f5e8; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>✅ Vectorized Approach</h5>"
                html_content += "<code style='font-size: 11px;'># Using pd.cut with left-closed bins<br>df['Grade'] = pd.cut(df['Score'], <br>&nbsp;&nbsp;&nbsp;&nbsp;bins=[0, 70, 80, 90, np.inf], <br>&nbsp;&nbsp;&nbsp;&nbsp;labels=['D', 'C', 'B', 'A'],<br>&nbsp;&nbsp;&nbsp;&nbsp;right=False)</code><br><br>"

                # Calculate using vectorization
                # right=False makes the bins left-closed ([70, 80), [80, 90), ...), so a score of
                # exactly 70, 80, or 90 gets the same grade as the loop's >= comparisons
                grades_vectorized = pd.cut(sample_data['Score'],
                                           bins=[0, 70, 80, 90, np.inf],
                                           labels=['D', 'C', 'B', 'A'],
                                           right=False).tolist()

                html_content += f"<strong>Result:</strong> {grades_vectorized}<br>"
                html_content += "<strong>Alternative:</strong> np.where() for simpler conditions"
                html_content += "</div>"

                html_content += "</div>"

            elif example == 'Price Categories':
                # Create sample data
                sample_data = pd.DataFrame({
                    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
                    'Price': [1200, 25, 80, 300, 150]
                })

                html_content += "<h5>📊 Sample Data:</h5>"
                html_content += sample_data.to_html(classes='dataframe', index=False).replace('<table', '<table style="font-size: 12px; border-collapse: collapse;"').replace('<th>', '<th style="background-color: #f0f0f0; padding: 5px; border: 1px solid #ddd;"').replace('<td>', '<td style="padding: 5px; border: 1px solid #ddd;"')

                html_content += "<div style='display: flex; gap: 10px; margin: 15px 0;'>"

                # Loop approach
                html_content += "<div style='flex: 1; background-color: #ffebee; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>❌ Loop Approach</h5>"
                html_content += "<code style='font-size: 11px;'>categories = []<br>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;price = row['Price']<br>&nbsp;&nbsp;&nbsp;&nbsp;if price &lt; 50:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;category = 'Budget'<br>&nbsp;&nbsp;&nbsp;&nbsp;elif price &lt; 200:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;category = 'Mid-range'<br>&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;category = 'Premium'<br>&nbsp;&nbsp;&nbsp;&nbsp;categories.append(category)<br>df['Category'] = categories</code><br><br>"

                # Calculate using loop simulation
                categories_loop = []
                for index, row in sample_data.iterrows():
                    price = row['Price']
                    if price < 50:
                        category = 'Budget'
                    elif price < 200:
                        category = 'Mid-range'
                    else:
                        category = 'Premium'
                    categories_loop.append(category)

                html_content += f"<strong>Result:</strong> {categories_loop}"
                html_content += "</div>"

                # Vectorized approach
                html_content += "<div style='flex: 1; background-color: #e8f5e8; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>✅ Vectorized Approach</h5>"
                html_content += "<code style='font-size: 11px;'># Using np.where (nested)<br>df['Category'] = np.where(df['Price'] &lt; 50, 'Budget',<br>&nbsp;&nbsp;&nbsp;&nbsp;np.where(df['Price'] &lt; 200, 'Mid-range', 'Premium'))</code><br><br>"

                # Calculate using vectorization
                categories_vectorized = np.where(sample_data['Price'] < 50, 'Budget',
                                                np.where(sample_data['Price'] < 200, 'Mid-range', 'Premium')).tolist()

                html_content += f"<strong>Result:</strong> {categories_vectorized}<br>"
                html_content += "<strong>Alternative:</strong> pd.cut() for more complex binning"
                html_content += "</div>"

                html_content += "</div>"

            elif example == 'Text Processing':
                # Create sample data
                sample_data = pd.DataFrame({
                    'Name': ['John Doe', 'jane smith', 'BOB JOHNSON', 'Alice Brown'],
                    'Email': ['john@email.com', 'JANE@GMAIL.COM', 'bob@work.org', 'alice@test.net']
                })

                html_content += "<h5>📊 Sample Data:</h5>"
                html_content += sample_data.to_html(classes='dataframe', index=False).replace('<table', '<table style="font-size: 12px; border-collapse: collapse;"').replace('<th>', '<th style="background-color: #f0f0f0; padding: 5px; border: 1px solid #ddd;"').replace('<td>', '<td style="padding: 5px; border: 1px solid #ddd;"')

                html_content += "<div style='display: flex; gap: 10px; margin: 15px 0;'>"

                # Loop approach
                html_content += "<div style='flex: 1; background-color: #ffebee; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>❌ Loop Approach</h5>"
                html_content += "<code style='font-size: 11px;'>formatted_names = []<br>domains = []<br>for index, row in df.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;# Format name<br>&nbsp;&nbsp;&nbsp;&nbsp;name = row['Name'].title()<br>&nbsp;&nbsp;&nbsp;&nbsp;formatted_names.append(name)<br>&nbsp;&nbsp;&nbsp;&nbsp;# Extract domain<br>&nbsp;&nbsp;&nbsp;&nbsp;email = row['Email'].lower()<br>&nbsp;&nbsp;&nbsp;&nbsp;domain = email.split('@')[1]<br>&nbsp;&nbsp;&nbsp;&nbsp;domains.append(domain)</code><br><br>"

                # Calculate using loop simulation
                formatted_names_loop = []
                domains_loop = []
                for index, row in sample_data.iterrows():
                    name = row['Name'].title()
                    formatted_names_loop.append(name)
                    email = row['Email'].lower()
                    domain = email.split('@')[1]
                    domains_loop.append(domain)

                html_content += f"<strong>Names:</strong> {formatted_names_loop}<br>"
                html_content += f"<strong>Domains:</strong> {domains_loop}"
                html_content += "</div>"

                # Vectorized approach
                html_content += "<div style='flex: 1; background-color: #e8f5e8; padding: 10px; border-radius: 4px;'>"
                html_content += "<h5>✅ Vectorized Approach</h5>"
                html_content += "<code style='font-size: 11px;'># Using string methods<br>df['Formatted_Name'] = df['Name'].str.title()<br>df['Domain'] = df['Email'].str.lower().str.split('@').str[1]</code><br><br>"

                # Calculate using vectorization
                formatted_names_vec = sample_data['Name'].str.title().tolist()
                domains_vec = sample_data['Email'].str.lower().str.split('@').str[1].tolist()

                html_content += f"<strong>Names:</strong> {formatted_names_vec}<br>"
                html_content += f"<strong>Domains:</strong> {domains_vec}<br>"
                html_content += "<strong>Benefits:</strong> Concise, no explicit loop, and .str methods pass missing values through as NaN instead of raising"
                html_content += "</div>"

                html_content += "</div>"

            # General tips
            html_content += "<div style='background-color: #e3f2fd; padding: 15px; margin: 15px 0; border-radius: 4px;'>"
            html_content += "<h5>🎯 Vectorization Tips:</h5>"
            html_content += "<ul>"
            html_content += "<li><strong>Mathematical operations:</strong> Use arithmetic operators directly on Series/DataFrame</li>"
            html_content += "<li><strong>Conditional logic:</strong> Use np.where(), pd.cut(), or boolean indexing</li>"
            html_content += "<li><strong>String operations:</strong> Use .str accessor methods</li>"
            html_content += "<li><strong>Aggregations:</strong> Use .sum(), .mean(), .groupby() instead of loops</li>"
            html_content += "<li><strong>Element-wise functions:</strong> Use .apply() only when vectorization isn't possible</li>"
            html_content += "</ul>"
            html_content += "</div>"

            html_content += "</div></div>"

            display(HTML(html_content))

    # Link widget to update function
    example_selector.observe(update_display, 'value')

    # Display the interface
    display(widgets.VBox([
        widgets.HTML("<h3>⚫ Example 5: Vectorization vs Loops</h3>"),
        widgets.HTML("<p>Learn how to refactor loop-based code into fast vectorized operations:</p>"),
        example_selector,
        output
    ]))

    # Initial update
    update_display()

# Run Example 5
create_vectorization_examples()

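
Nested `np.where` calls, as in the Price Categories example, get hard to read beyond two or three branches. One alternative is `np.select`, which takes a list of conditions and a matching list of choices and picks the first condition that is true, much like an `if`/`elif` chain. The sketch below reuses the grade thresholds from the Grade Assignment example; the DataFrame `scores` is hypothetical sample data that deliberately includes the boundary values 90 and 70.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, including values that sit exactly on a threshold
scores = pd.DataFrame({"Score": [95, 90, 87, 70, 68]})

# Conditions are checked in order; the first True one wins, like if/elif
conditions = [
    scores["Score"] >= 90,
    scores["Score"] >= 80,
    scores["Score"] >= 70,
]
choices = ["A", "B", "C"]

# `default` plays the role of the final `else` branch
scores["Grade"] = np.select(conditions, choices, default="D")
print(scores)
```

Because the conditions use the same `>=` comparisons as the loop, boundary scores get the same grade in both versions.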

## 🎯 Summary and Best Practices

### When to Use Each Method

| Method | Use Case | Performance | Best For |
|--------|----------|-------------|----------|
| **Vectorized operations** | Mathematical, logical, and string operations | ⚡ Fastest | Most pandas operations |
| **`.apply()`** | Custom functions that can't be vectorized | 🐢 Slow with `axis=1` (one Python call per row) | Row- or column-wise custom functions |
| **`.itertuples()`** | When row iteration is unavoidable | 🚀 Moderate | Accessing row data as attributes |
| **`.iterrows()`** | Legacy code, debugging | 🐌 Slow | Avoid in production |
| **`.items()`** | Column-wise operations, metadata analysis | ⚡ Fast (one step per column) | Column processing, data profiling |
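
As a rough illustration of the `.items()` row in the table, the sketch below builds a small per-column profile. The DataFrame `df` and the `profile` dictionary are hypothetical; the point is only that `.items()` yields `(column_name, Series)` pairs, so the loop runs once per column rather than once per row.

```python
import pandas as pd

# Hypothetical sample data with an integer, a float, and a string column
df = pd.DataFrame({
    "A": [10, 20, 30],
    "B": [1.5, None, 3.5],
    "C": ["cat", "dog", "cat"],
})

profile = {}
for col_name, series in df.items():
    # Each `series` is a whole column, so these calls are vectorized internally
    profile[col_name] = {
        "dtype": str(series.dtype),
        "missing": int(series.isna().sum()),
        "unique": int(series.nunique()),
    }

for col_name, stats in profile.items():
    print(col_name, stats)
```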

### 📋 Decision Flowchart

1. **Can you vectorize?** → Use vectorized operations
2. **Need custom function?** → Use `.apply()`
3. **Must iterate rows?** → Use `.itertuples()`
4. **Working with columns?** → Use `.items()`
5. **Debugging/prototyping?** → `.iterrows()` is acceptable
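
The first four steps of the list above can be sketched on one small DataFrame. Everything here is illustrative: the `orders` data, the `shipping_label` function (a stand-in for arbitrary Python logic), and the budget rule are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({
    "item": ["pen", "book", "lamp", "desk"],
    "price": [2.0, 15.0, 40.0, 250.0],
    "qty": [10, 2, 1, 1],
})

# 1. Can you vectorize? Plain arithmetic works on whole columns.
orders["total"] = orders["price"] * orders["qty"]


# 2. Need a custom function? Apply it element-wise to one column.
def shipping_label(item):
    # Hypothetical stand-in for arbitrary Python logic
    return f"{item.upper()}-{len(item)}"


orders["label"] = orders["item"].apply(shipping_label)

# 3. Must iterate rows? Here each decision depends on state carried over
#    from earlier rows (a shrinking budget), so a loop is a natural fit.
budget = 100.0
accepted = []
for row in orders.itertuples(index=False):
    if row.total <= budget:
        budget -= row.total
        accepted.append(True)
    else:
        accepted.append(False)
orders["accepted"] = accepted

# 4. Working with columns? items() yields (name, Series) pairs.
dtypes = {name: str(col.dtype) for name, col in orders.items()}

print(orders)
print(dtypes)
```

Step 3 is the kind of case where iteration is reasonable: whether an order is accepted depends on what was accepted before it, which a single column expression does not capture directly.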

### ⚠️ Common Pitfalls to Avoid

- **Type changes with `iterrows()`**: Each row is returned as a single Series, so values in rows with mixed dtypes are upcast (for example, integers become floats)
- **Performance assumptions**: Always benchmark with realistic data sizes
- **Unnecessary iteration**: Many operations that seem to need loops can be vectorized
- **Memory usage**: Collecting per-row results in Python lists on a large DataFrame can use far more memory than the equivalent vectorized column
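
The type-change pitfall is easy to see on a tiny example. This is a minimal sketch with a hypothetical DataFrame `df` that has one integer and one float column; the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({"visits": [1, 2], "ratio": [0.5, 1.5]})

# iterrows() packs each row into one Series, which has a single dtype,
# so the integer value is upcast to float to match the other column.
_, first_row = next(df.iterrows())
print(first_row.dtype, type(first_row["visits"]))

# itertuples() keeps each value's own type.
first_tuple = next(df.itertuples(index=False))
print(type(first_tuple.visits), type(first_tuple.ratio))
```

If the original dtypes matter inside the loop, `itertuples()` is usually the safer choice.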