# Chapter 2: Building an Interactive Job Dashboard

## Learning Objectives

By the end of this chapter, you will:
- Build a complete interactive dashboard using real job data
- Create dynamic visualizations that respond to user input
- Implement data filtering and exploration features
- Generate PDF reports from dashboard data
- Apply dashboard design best practices

## Introduction to Dashboard Building

> **Instructor Cue:** Start by asking: "What makes a good dashboard? What would you want to see if you were analyzing job market data?" Emphasize that dashboards are powerful communication tools that make data accessible to non-technical users.

A dashboard transforms raw data into actionable insights through interactive visualizations. Today we'll build a job market dashboard that allows users to:

- Explore job opportunities by location and role
- Filter data based on salary ranges and companies
- Visualize trends and patterns
- Generate custom reports

The goal is to create something that HR professionals, job seekers, or recruiters could actually use in their daily work.

## Setting Up Our Dashboard

Let's start by creating a new Streamlit app specifically for our job dashboard. We will build a new file called `job_dashboard_app.py` in the `03_module/app` directory, which is where the `%%writefile` cells below write it.

> **Instructor Cue:** Have everyone create this new file. Explain that we're creating a separate file to keep our dashboard focused and organized.

In [None]:
%%writefile app/job_dashboard_app.py

# <START> Streamlit setup
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import streamlit as st

# Configure the page
st.set_page_config(
    page_title="Job Market Dashboard",
    page_icon="💼",
    layout="wide",
    initial_sidebar_state="expanded"
)

st.title("💼 Job Market Dashboard")
st.markdown("Explore job opportunities and market trends with interactive visualizations")
# <END> Streamlit setup

> **Instructor Cue:** Explain the `st.set_page_config()` function and its parameters. The `layout="wide"` parameter is particularly important for dashboards as it uses the full browser width.

## Loading and Preparing the Data

Now let's load our job data and add some basic data exploration features:

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> Data loading and caching
@st.cache_data
def load_job_data():
    """Load and prepare job data for dashboard"""
    try:
        df = pd.read_csv('data/indeed_jobs_combined.csv')

        # Clean salary data for analysis
        df['salary_clean'] = df['salary'].str.replace(r'[\$,]', '', regex=True)
        df['salary_clean'] = df['salary_clean'].str.extract(r'(\d+)', expand=False)
        df['salary_numeric'] = pd.to_numeric(df['salary_clean'], errors='coerce')

        # Extract state from location for better grouping
        df['state'] = df['location'].str.extract(r'([A-Z]{2})$')

        # Convert scraped_at to datetime
        df['scraped_at'] = pd.to_datetime(df['scraped_at'])

        return df
    except FileNotFoundError:
        st.error("Job data file not found. Please ensure 'data/indeed_jobs_combined.csv' exists.")
        return None

# Load the data
df = load_job_data()

if df is not None:
    st.success(f"✅ Loaded {len(df)} job listings successfully!")
else:
    st.stop()
# <END> Data loading and caching

> **Instructor Cue:** Explain the `@st.cache_data` decorator - this is crucial for performance. Without caching, the CSV would be reloaded every time someone interacts with the dashboard. Also point out the error handling for missing files.
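
To see what the cleaning steps in `load_job_data()` actually produce, it can help to run the same regular expressions on a few hand-written rows. The sketch below is only an illustration: the sample salary and location strings are invented, not taken from the real CSV. It shows that the salary pattern keeps the first number it finds (so a range keeps its lower bound and an hourly rate stays a small number), and that the state pattern only matches when the location ends in two capital letters.

```python
import pandas as pd

# Invented sample rows; the real CSV may be formatted differently
sample = pd.DataFrame({
    "salary": ["$85,000 a year", "$50,000 - $70,000 a year", "$25 an hour", None],
    "location": ["Austin, TX", "Remote", "Denver, CO 80202", "Boston, MA"],
})

# Same steps as load_job_data(): strip "$" and ",", then keep the first run of digits
cleaned = sample["salary"].str.replace(r"[\$,]", "", regex=True)
cleaned = cleaned.str.extract(r"(\d+)", expand=False)
sample["salary_numeric"] = pd.to_numeric(cleaned, errors="coerce")

# Same pattern as load_job_data(): two capital letters at the very end of the string
sample["state"] = sample["location"].str.extract(r"([A-Z]{2})$", expand=False)

print(sample[["salary", "salary_numeric", "location", "state"]])
```

Rows such as the hourly rate or the location with a ZIP code are worth checking in the real data before trusting the salary and state charts.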

## Building the Sidebar Controls

The sidebar will contain all our interactive controls. This keeps the main area clean for visualizations:

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> Sidebar for filters and controls
st.sidebar.header("🔍 Dashboard Controls")

# Location filter
available_states = sorted(df['state'].dropna().unique())
selected_states = st.sidebar.multiselect(
    "Select States:",
    options=available_states,
    default=available_states[:3],  # Default to first 3 states
    help="Choose which states to include in the analysis"
)

# Occupation filter
available_occupations = sorted(df['target_occupation'].dropna().unique())  # dropna() avoids sorting NaN with strings
selected_occupation = st.sidebar.selectbox(
    "Select Occupation:",
    options=['All'] + available_occupations,
    help="Filter by specific job category"
)

# Salary range filter
min_salary = int(df['salary_numeric'].min()) if df['salary_numeric'].notna().any() else 0
max_salary = int(df['salary_numeric'].max()) if df['salary_numeric'].notna().any() else 200000

salary_range = st.sidebar.slider(
    "Salary Range ($):",
    min_value=min_salary,
    max_value=max_salary,
    value=(min_salary, max_salary),
    step=5000,
    format="$%d",
    help="Filter jobs by salary range"
)

# Company filter
top_companies = df['company_name'].value_counts().head(10).index.tolist()
selected_companies = st.sidebar.multiselect(
    "Select Companies:",
    options=['All'] + top_companies,
    default=['All'],
    help="Filter by specific companies"
)
# <END> Sidebar for filters and controls

> **Instructor Cue:** Walk through each filter type and explain why each one is useful. Demonstrate how the `help` parameter provides context. The `multiselect` vs `selectbox` choice is important for user experience.

## Implementing Data Filtering Logic

Now we need to apply the filters to our data:

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> Apply filters to the data
def filter_data(df, states, occupation, salary_range, companies):
    """Apply user-selected filters to the dataframe"""
    filtered_df = df.copy()

    # Filter by state
    if states:
        filtered_df = filtered_df[filtered_df['state'].isin(states)]

    # Filter by occupation
    if occupation != 'All':
        filtered_df = filtered_df[filtered_df['target_occupation'] == occupation]

    # Filter by salary range
    filtered_df = filtered_df[
        (filtered_df['salary_numeric'] >= salary_range[0]) &
        (filtered_df['salary_numeric'] <= salary_range[1])
    ]

    # Filter by company (use the `companies` argument, not the global sidebar variable)
    if companies and 'All' not in companies:
        filtered_df = filtered_df[filtered_df['company_name'].isin(companies)]

    return filtered_df

# Get filtered data
filtered_df = filter_data(df, selected_states, selected_occupation, salary_range, selected_companies)

# Display summary metrics
col1, col2, col3, col4 = st.columns(4)

with col1:
    st.metric("Total Jobs", len(filtered_df))

with col2:
    st.metric("Unique Companies", filtered_df['company_name'].nunique())

with col3:
    avg_salary = filtered_df['salary_numeric'].mean()
    st.metric("Avg Salary", f"${avg_salary:,.0f}" if not pd.isna(avg_salary) else "N/A")

with col4:
    st.metric("States Covered", filtered_df['state'].nunique())

# <END> Apply filters to the data

> **Instructor Cue:** Explain how the filtering function works step by step. The metrics row provides immediate feedback about how filters affect the data. Point out how `st.columns()` creates a professional dashboard layout.
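
One detail of the salary filter is easy to miss: comparisons against a missing value are always false, so listings without a parsed salary are dropped whenever the range filter runs. The sketch below isolates that behavior on invented rows. The helper name `filter_by_salary` and its `keep_missing` flag are hypothetical and not part of the dashboard code; they only show one way the choice could be made explicit.

```python
import pandas as pd

def filter_by_salary(df, salary_range, keep_missing=False):
    """Keep rows whose salary lies in the inclusive range (hypothetical helper)."""
    low, high = salary_range
    in_range = df["salary_numeric"].between(low, high)  # NaN compares as False
    if keep_missing:
        # Optionally keep listings that did not publish a salary
        in_range = in_range | df["salary_numeric"].isna()
    return df[in_range]

# Invented sample data
jobs = pd.DataFrame({
    "job_title": ["Analyst", "Engineer", "Nurse"],
    "salary_numeric": [60000.0, 120000.0, float("nan")],
})

strict = filter_by_salary(jobs, (50000, 100000))
lenient = filter_by_salary(jobs, (50000, 100000), keep_missing=True)

print(strict["job_title"].tolist())
print(lenient["job_title"].tolist())
```

Whether listings without a salary should stay visible is a design decision; the point is that the metrics row reflects whichever choice the filter makes.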

## Creating Interactive Visualizations

Now for the exciting part - creating visualizations that respond to our filters:

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> Main dashboard content
if len(filtered_df) > 0:

    # Jobs by Location Chart
    st.subheader("📍 Jobs by Location")

    location_summary = filtered_df['state'].value_counts().head(10)

    if not location_summary.empty:
        fig, ax = plt.subplots(figsize=(10, 6))
        sns.barplot(data=location_summary.reset_index(), x='count', y='state', hue='state', ax=ax, palette='viridis', legend=False)
        ax.set_title('Number of Jobs by State')
        ax.set_xlabel('Number of Jobs')
        ax.set_ylabel('State')

        # Add value labels on bars
        for i, v in enumerate(location_summary.values):
            ax.text(v + 0.1, i, str(v), va='center')

        st.pyplot(fig)
        plt.close(fig)  # Close the figure so it does not stay in memory across reruns

    # Salary Distribution
    st.subheader("💰 Salary Distribution")

    salary_data = filtered_df.dropna(subset=['salary_numeric'])

    if not salary_data.empty:
        col1, col2 = st.columns(2)

        with col1:
            fig, ax = plt.subplots(figsize=(8, 5))
            sns.histplot(data=salary_data, x='salary_numeric', bins=20, ax=ax)
            ax.set_title('Salary Distribution')
            ax.set_xlabel('Salary ($)')
            ax.set_ylabel('Number of Jobs')
            ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))
            st.pyplot(fig)
            plt.close(fig)

        with col2:
            # Salary by occupation
            if len(salary_data['target_occupation'].unique()) > 1:
                fig, ax = plt.subplots(figsize=(8, 5))
                sns.boxplot(data=salary_data, y='target_occupation', x='salary_numeric', ax=ax)

                ax.set_title('Salary by Occupation')
                ax.set_xlabel('Salary ($)')
                ax.set_ylabel('Occupation')
                ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

                st.pyplot(fig)
                plt.close(fig)

    # Top Companies
    st.subheader("🏢 Top Hiring Companies")

    company_counts = filtered_df['company_name'].value_counts().head(10)

    if not company_counts.empty:
        fig, ax = plt.subplots(figsize=(10, 6))
        sns.barplot(x=company_counts.values, y=company_counts.index, hue=company_counts.index, ax=ax, palette='Set2', legend=False)

        ax.set_title('Top 10 Companies by Job Postings')
        ax.set_xlabel('Number of Job Postings')
        ax.set_ylabel('Company')

        # Add value labels
        # for i, v in enumerate(company_counts.values):
        #     ax.text(v + 0.1, i, str(v), va='center')

        st.pyplot(fig)
        plt.close(fig)

else:
    st.warning("No jobs found matching your criteria. Please adjust your filters.")

# <END> Main dashboard content

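> **Instructor Cue:** Walk through each visualization and explain why it's valuable. Point out that every Matplotlib figure should be released once it has been rendered: `plt.close(fig)` frees the figure, whereas `plt.clf()` only clears its contents and leaves it open. Explain the formatter for salary axes to make them more readable.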
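
The two Matplotlib details from this section can be checked outside Streamlit. The sketch below assumes a headless `Agg` backend so it runs without a display. It calls the salary formatter directly and counts the figures pyplot is still tracking after `plt.clf()` and after `plt.close(fig)`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Same formatting rule as the salary axes: divide by 1000 and append "K"
salary_formatter = FuncFormatter(lambda x, pos: f"${x / 1000:.0f}K")
label = salary_formatter(85000, 0)

plt.close("all")  # start with no open figures
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(salary_formatter)

plt.clf()  # clears the figure's contents only
open_after_clf = len(plt.get_fignums())

plt.close(fig)  # releases the figure itself
open_after_close = len(plt.get_fignums())

print(label, open_after_clf, open_after_close)
```

Because a Streamlit script reruns in the same process on every interaction, figures that are cleared but never closed can pile up over a long session.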

## Adding Data Table Display

Users often want to see the raw data behind visualizations:

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> Detailed Job Listings
st.subheader("📋 Detailed Job Listings")

# Select columns to display
display_columns = ['job_title', 'company_name', 'location', 'salary', 'target_occupation']

# Add search functionality
search_term = st.text_input("🔍 Search job titles, companies, or descriptions:", placeholder="e.g., Python, Google, Engineer")

if search_term:
    # regex=False treats the search term as literal text, so input such as "C++" or "(" is safe
    search_mask = (
        filtered_df['job_title'].str.contains(search_term, case=False, na=False, regex=False) |
        filtered_df['company_name'].str.contains(search_term, case=False, na=False, regex=False) |
        filtered_df['job_description'].str.contains(search_term, case=False, na=False, regex=False)
    )
    search_results = filtered_df[search_mask]
    st.write(f"Found {len(search_results)} jobs matching '{search_term}'")
    st.dataframe(search_results[display_columns], use_container_width=True)
else:
    # Show all filtered data
    st.dataframe(filtered_df[display_columns], use_container_width=True)

# Add download button for filtered data
csv_data = filtered_df.to_csv(index=False)
st.download_button(
    label="📥 Download Filtered Data as CSV",
    data=csv_data,
    file_name=f"job_data_filtered_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
    mime="text/csv"
)
# <END> Detailed Job Listings

> **Instructor Cue:** Demonstrate the search functionality and explain how it uses pandas string operations. The download button is a great example of Streamlit's built-in functionality for data export.
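
By default, pandas `str.contains` interprets its pattern as a regular expression, so characters such as `(` or `+` in a search box can raise an error or match rows the user did not intend. The sketch below shows a literal, case-insensitive search with `regex=False`. The sample rows and the helper name `literal_search` are invented for illustration.

```python
import pandas as pd

# Invented sample listings
jobs = pd.DataFrame({
    "job_title": ["C++ Developer", "C Programmer", "Senior Engineer (Remote)"],
    "company_name": ["Acme", "Globex", "Initech"],
})

def literal_search(df, term, columns=("job_title", "company_name")):
    """Return rows where any listed column contains `term` as plain text (hypothetical helper)."""
    mask = pd.Series(False, index=df.index)
    for column in columns:
        # regex=False: the term is matched literally; na=False: missing text never matches
        mask = mask | df[column].str.contains(term, case=False, na=False, regex=False)
    return df[mask]

cpp_jobs = literal_search(jobs, "c++")
remote_jobs = literal_search(jobs, "(remote")

print(cpp_jobs["job_title"].tolist())
print(remote_jobs["job_title"].tolist())
```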

## PDF Report Generation

Now let's add the ability to generate PDF reports. First, we need to install the fpdf2 library:

> **Instructor Cue:** Have everyone install fpdf2 if they haven't already: `pip install fpdf2`

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> PDF Report Generation
def create_job_report(filtered_df: pd.DataFrame, selected_states: list, selected_occupation: str):
    """Generate a PDF report from the filtered job data"""
    from fpdf import FPDF
    from fpdf.enums import XPos, YPos
    import tempfile

    class JobReportPDF(FPDF):
        def header(self):
            self.set_font('helvetica', 'B', 16)
            self.cell(0, 10, 'Job Market Analysis Report', 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT, align='C')
            self.ln(10)

        def footer(self):
            self.set_y(-15)
            self.set_font('helvetica', 'I', 8)
            self.cell(0, 10, f'Generated on {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}', 0, new_x=XPos.RIGHT, new_y=YPos.TOP, align='C')

    pdf: FPDF = JobReportPDF()
    FONT_NAME = 'Helvetica'

    pdf.add_page()

    # Report summary
    pdf.set_font(FONT_NAME, 'B', 14)
    pdf.cell(0, 10, 'Executive Summary', 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
    pdf.ln(5)

    pdf.set_font(FONT_NAME, '', 12)

    # Create HTML summary content
    avg_salary = filtered_df['salary_numeric'].mean()
    avg_salary_str = f"${avg_salary:,.0f}" if not pd.isna(avg_salary) else "N/A"

    summary_html = f"""
    <h3>Job Market Analysis</h3>
    <p>This report analyzes {len(filtered_df)} job listings across {filtered_df['state'].nunique()} states.</p>

    <h4>Key Findings:</h4>
    <ul>
        <li>Total job postings: {len(filtered_df)}</li>
        <li>Unique companies: {filtered_df['company_name'].nunique()}</li>
        <li>Average salary: {avg_salary_str} (where available)</li>
        <li>Most common occupation: {filtered_df['target_occupation'].value_counts().index[0]}</li>
        <li>Top hiring state: {filtered_df['state'].value_counts().index[0]}</li>
    </ul>
    """

    # Write HTML content to PDF
    pdf.write_html(summary_html)

    pdf.ln(10)

    # Top companies section
    pdf.set_font(FONT_NAME, 'B', 14)

    company_counts = filtered_df['company_name'].value_counts().head(10)

    # Create HTML content for top companies
    companies_html = "<h4>Top Hiring Companies</h4><ul>"
    for company, count in company_counts.items():
        companies_html += f"<li><b>{company}</b>: {count} jobs</li>"
    companies_html += "</ul>"

    # Write HTML content to PDF
    pdf.write_html(companies_html)

    pdf.ln(10)

    # Salary insights
    if not filtered_df['salary_numeric'].isna().all():
        pdf.set_font(FONT_NAME, 'B', 14)
        pdf.cell(0, 10, 'Salary Insights', 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
        pdf.ln(5)

        pdf.set_font(FONT_NAME, '', 10)
        salary_stats = filtered_df['salary_numeric'].describe()

        pdf.cell(0, 6, f"- Minimum salary: ${salary_stats['min']:,.0f}", 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
        pdf.cell(0, 6, f"- Maximum salary: ${salary_stats['max']:,.0f}", 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
        pdf.cell(0, 6, f"- Median salary: ${salary_stats['50%']:,.0f}", 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
        pdf.cell(0, 6, f"- Average salary: ${salary_stats['mean']:,.0f}", 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)

    # Generate and add plots to PDF
    if len(filtered_df) > 0:  # Only add visualizations if we have data
        pdf.add_page()
        pdf.set_font(FONT_NAME, 'B', 14)
        pdf.cell(0, 10, 'Data Visualizations', 0, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
        pdf.ln(5)

    # Location chart
    location_summary = filtered_df['state'].value_counts().head(10)
    if not location_summary.empty:
        fig, ax = plt.subplots(figsize=(10, 6))
        sns.barplot(data=location_summary.reset_index(), x='count', y='state', hue='state', ax=ax, palette='viridis', legend=False)
        ax.set_title('Number of Jobs by State')
        ax.set_xlabel('Number of Jobs')
        ax.set_ylabel('State')

        # Add value labels on bars
        for i, v in enumerate(location_summary.values):
            ax.text(v + 0.1, i, str(v), va='center')

        # Save the figure to a temporary file and add it to the PDF
        with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmpfile:
            fig.savefig(tmpfile.name, bbox_inches='tight', dpi=300)
            pdf.image(tmpfile.name, x=10, y=None, w=180)
        plt.close(fig)

    pdf.ln(5)

    # Salary distribution
    salary_data = filtered_df.dropna(subset=['salary_numeric'])
    if not salary_data.empty:
        fig, ax = plt.subplots(figsize=(10, 6))
        sns.histplot(data=salary_data, x='salary_numeric', bins=20, ax=ax)
        ax.set_title('Salary Distribution')
        ax.set_xlabel('Salary ($)')
        ax.set_ylabel('Number of Jobs')
        ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

        # Save the figure to a temporary file and add it to the PDF
        with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmpfile:
            fig.savefig(tmpfile.name, bbox_inches='tight', dpi=300)
            pdf.image(tmpfile.name, x=10, y=None, w=180)
        plt.close(fig)

    pdf.ln(5)

    # Top companies chart
    company_counts = filtered_df['company_name'].value_counts().head(10)
    if not company_counts.empty:
        fig, ax = plt.subplots(figsize=(10, 6))
        sns.barplot(x=company_counts.values, y=company_counts.index, hue=company_counts.index, ax=ax, palette='Set2', legend=False)
        ax.set_title('Top 10 Companies by Job Postings')
        ax.set_xlabel('Number of Job Postings')
        ax.set_ylabel('Company')

        # Save the figure to a temporary file and add it to the PDF
        with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmpfile:
            fig.savefig(tmpfile.name, bbox_inches='tight', dpi=300)
            pdf.image(tmpfile.name, x=10, y=None, w=180)
        plt.close(fig)

    # Return PDF as bytes
    return bytes(pdf.output())
# <END> PDF Report Generation
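
The report builds its company list as an HTML string, so a company name that contains `&` or `<` could be read as markup rather than text. One possible safeguard, sketched below with the standard library's `html.escape`, is to escape each name before it is interpolated. The helper name `companies_to_html` and the sample names are hypothetical.

```python
from html import escape

def companies_to_html(company_counts):
    """Build the company list with names escaped for HTML (hypothetical helper).

    `company_counts` can be any mapping with .items(), such as a dict
    or the pandas Series returned by value_counts().
    """
    items = "".join(
        f"<li><b>{escape(str(name))}</b>: {count} jobs</li>"
        for name, count in company_counts.items()
    )
    return f"<h4>Top Hiring Companies</h4><ul>{items}</ul>"

# Invented company names containing markup-significant characters
html_snippet = companies_to_html({"AT&T": 4, "Smith <Staffing>": 2})
print(html_snippet)
```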


In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> PDF Report Download Section
st.subheader("📄 Generate PDF Report")
st.write("Generate a comprehensive PDF report based on your current filters and analysis.")

if st.button("📄 Generate Report", type="primary", use_container_width=True):
    if len(filtered_df) > 0:
        with st.spinner("Generating PDF report..."):
            pdf_bytes = create_job_report(filtered_df, selected_states, selected_occupation)

            st.download_button(
                label="📥 Download PDF Report",
                data=pdf_bytes,
                file_name=f"job_market_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.pdf",
                mime="application/pdf"
            )

            st.success("✅ Report generated successfully!")
    else:
        st.error("No data available for report generation. Please adjust your filters.")
# <END>

> **Instructor Cue:** Walk through the PDF generation code step by step. Explain how we create a custom PDF class with headers and footers. Point out how the report includes both summary statistics and detailed insights.
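
The report function saves each chart to a temporary file created with `delete=False`, so those files stay on disk after the report is built. An alternative worth considering is to render the figure into an in-memory buffer. The sketch below covers only the Matplotlib half and uses a hypothetical helper, `figure_to_png_buffer`. Passing the buffer to `pdf.image()` relies on fpdf2 accepting an `io.BytesIO` object, which is my reading of its documentation and should be verified against the installed version.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def figure_to_png_buffer(fig, dpi=150):
    """Render a figure to an in-memory PNG and close it (hypothetical helper)."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight", dpi=dpi)
    plt.close(fig)   # release the figure once it has been rendered
    buffer.seek(0)   # rewind so the next reader starts at the first byte
    return buffer

# Invented data for a small chart
fig, ax = plt.subplots(figsize=(4, 3))
ax.barh(["TX", "CA"], [12, 9])
png_buffer = figure_to_png_buffer(fig)

# Assumed usage inside the report function (not executed here):
# pdf.image(png_buffer, x=10, w=180)
print(len(png_buffer.getvalue()), "bytes of PNG data")
```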

## Adding Help and Instructions

Good dashboards include help for users:

In [None]:
%%writefile -a app/job_dashboard_app.py

# <START> Help section in sidebar
st.sidebar.markdown("---")
st.sidebar.subheader("📖 How to Use")
st.sidebar.markdown("""
**Filters:**
- Use state selection to focus on specific regions
- Choose occupation types to analyze particular job categories
- Adjust salary range to find jobs in your budget
- Select companies to compare specific employers

**Visualizations:**
- Bar charts show job distribution across states
- Salary histograms reveal pay patterns
- Company rankings identify top hirers

**Actions:**
- Download filtered data as CSV
- Generate PDF reports for sharing
- Search within job listings
""")

# Footer
st.markdown("---")
st.markdown("""
<div style='text-align: center; color: #666;'>
<p>💼 Job Market Dashboard | Built with Streamlit | Data from Indeed API</p>
</div>
""", unsafe_allow_html=True)
# <END>

> **Instructor Cue:** Explain how help text makes the dashboard self-explanatory. The footer adds a professional touch.

## Running Your Complete Dashboard

Save your `job_dashboard_app.py` file and run it with:

In [None]:
# %%bash
# streamlit run app/job_dashboard_app.py

> **Instructor Cue:** Have everyone run their complete dashboard. Walk around and help troubleshoot any issues. Common problems include missing imports, file path issues, or data type errors.

## Dashboard Testing and Validation

Let's test our dashboard systematically:

> **Instructor Cue:** Guide the class through each test case to ensure everything works:

1. **Filter Testing:**
   - Try different state combinations
   - Change occupation filters
   - Adjust salary ranges
   - Test company selections

2. **Visualization Testing:**
   - Verify charts update with filters
   - Check that empty data shows appropriate messages
   - Ensure salary formatting is readable

3. **Export Testing:**
   - Download CSV data
   - Generate and download PDF report
   - Verify file contents

4. **Search Testing:**
   - Search for specific job titles
   - Try company names
   - Test partial matches

## Dashboard Enhancement Ideas

> **Instructor Cue:** These are bonus features for advanced students or homework:

1. **Time Series Analysis:**
   - Add date-based filtering
   - Show job posting trends over time

2. **Geographic Mapping:**
   - Use Streamlit's built-in mapping for location data

3. **Advanced Analytics:**
   - Salary regression analysis
   - Job requirement text analysis

4. **User Preferences:**
   - Save filter preferences in session state
   - Remember user settings between sessions
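
As a starting point for the time series idea, the `scraped_at` column that `load_job_data()` already converts to datetimes can be resampled into daily counts. The sketch below uses invented timestamps. In the dashboard, the resulting series could be handed to a chart such as `st.line_chart`.

```python
import pandas as pd

# Invented sample timestamps; the real values come from the scraped_at column
jobs = pd.DataFrame({
    "job_title": ["Analyst", "Engineer", "Nurse", "Designer"],
    "scraped_at": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 17:30",
        "2024-01-02 08:15", "2024-01-04 12:00",
    ]),
})

# Count listings per calendar day; days without listings appear as 0
daily_counts = jobs.set_index("scraped_at").resample("D").size()

print(daily_counts)
```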

## Best Practices for Dashboard Design

> **Instructor Cue:** Share these principles for creating effective dashboards:

1. **User-First Design:**
   - Think about your audience's needs
   - Make common tasks easy
   - Provide clear guidance

2. **Performance Optimization:**
   - Use caching for expensive operations
   - Limit data processing in the main thread
   - Consider data sampling for large datasets

3. **Visual Hierarchy:**
   - Most important information first
   - Consistent color schemes
   - Clear section boundaries

4. **Error Handling:**
   - Graceful degradation when data is missing
   - Clear error messages
   - Fallback options

## Exercise: Customize Your Dashboard

> **Instructor Cue:** Give participants 15 minutes to enhance their dashboard:

Choose one or more enhancements:

1. **Add a new filter** (e.g., job posting date range)
2. **Create a new visualization** (e.g., salary by state)
3. **Improve the PDF report** (add more insights or better formatting)
4. **Enhance the user interface** (add icons, colors, or layout improvements)

## Troubleshooting Common Issues

> **Instructor Cue:** Keep this handy for addressing problems:

**Charts not updating:**
- Check filter logic in `filter_data()` function
- Verify data types match filter expectations

**PDF generation fails:**
- Ensure fpdf2 is installed
- Check for special characters in data (for example, characters outside Latin-1 that the built-in Helvetica font cannot encode)
- Verify file paths are correct
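
For the special-character case, a quick way to find or neutralize problem text is to round-trip it through Latin-1, which is the encoding the built-in PDF fonts are generally limited to. The sketch below is a lossy workaround with a hypothetical helper name. Registering a Unicode TrueType font with fpdf2 is the more complete fix.

```python
def to_latin1_safe(text):
    """Replace characters that Latin-1 cannot represent with '?' (hypothetical helper)."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

# Accented Latin-1 characters survive; the en dash (U+2013) does not
kept = to_latin1_safe("Caf\u00e9 Manager")
replaced = to_latin1_safe("Data Engineer \u2013 Remote")

print(kept)
print(replaced)
```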

**Performance issues:**
- Add `@st.cache_data` to expensive functions
- Consider data sampling for large datasets
- Close each figure with `plt.close(fig)` after rendering it

**Layout problems:**
- Check column configurations
- Verify container widths
- Test on different screen sizes

## Key Takeaways

> **Instructor Cue:** Summarize the key learning points:

- **Dashboards transform data into insights** - We converted raw job data into an interactive analysis tool
- **User experience matters** - Filters, search, and clear navigation make dashboards usable
- **Caching improves performance** - `@st.cache_data` prevents unnecessary data reloading
- **Export capabilities add value** - CSV and PDF downloads make insights shareable
- **Error handling ensures reliability** - Good dashboards handle edge cases gracefully

## Real-World Applications

> **Instructor Cue:** Connect this to practical uses:

The techniques you've learned can be applied to:

- **Sales dashboards** - Track revenue, customers, and trends
- **Marketing analytics** - Monitor campaign performance and ROI
- **Operations monitoring** - Real-time system metrics and alerts
- **Financial reporting** - Budget tracking and expense analysis
- **Research presentations** - Interactive data exploration for stakeholders

## Next Steps

> **Instructor Cue:** Preview how this connects to the final module:

In our next module, we'll take dashboard building to the next level by integrating AI capabilities. We'll add:

- **Intelligent insights** - AI-generated summaries of dashboard data
- **Natural language queries** - Ask questions about your data in plain English
- **Automated recommendations** - AI suggestions based on data patterns
- **Dynamic report generation** - AI-written analysis and insights

The dashboard we built today will serve as the foundation for these advanced AI-powered features!

## Homework Challenge

> **Instructor Cue:** Optional assignment for motivated learners:

Create a dashboard for a different dataset:
1. Find a CSV dataset online (e.g., from Kaggle)
2. Apply the same dashboard patterns we learned
3. Add at least 3 interactive filters
4. Include 2-3 meaningful visualizations
5. Implement PDF report generation

Share your creation with the class for feedback and inspiration!