# 🩺 **Process WIP Data Function**

I wrote the Python function `process_wip_data` to automate a manual Excel task that took about 45 minutes; it now takes about 1 minute for me and my coworker. The function processes Work-In-Progress (WIP) PPMC data, cleans it, and generates a summary report of pending cases. It handles edge cases such as a missing source file, an empty file, or unusable dates, and it creates the output folder when needed. Here's what the function does:

---

## **📋 Key Features**
1. **File Validation**:  
   - Ensures the source file exists. If not, it gracefully exits with a helpful error message.
   - Stops with an error message if the file contains no rows, or if every `RequestDate` value is missing or fails to parse.

2. **Data Cleaning**:  
   - Filters out rows based on insurer requirements:
     - Removes records from previous financial years, so the report covers only the current financial year.
     - Excludes completed cases with predefined statuses, because the report shows pending cases only.
   - Classifies the remaining rows into categories that are easier to read (e.g., `Max Attempts`, `DND`, `Working On It`).

3. **Categorization**:  
   - Creates a `Status` column based on business rules.
   - Categorizes records into `Workable` and `Non_Workable` types for overall summarization.

4. **Reporting**:  
   - Prepares two key reports:
     1. **Aggregated Report**: Counts the records for each `Type` and `Status` combination.
     2. **Detailed Sheets**: Exports the aggregated results, raw data, and cleaned data to one Excel file as the sheets `Report`, `Raw_Data`, and `Cleaned_Data`.

5. **Directory Handling**:  
   - Automatically creates the destination folder if it doesn’t exist.


In [20]:
def process_wip_data():
    """
    Process Work-in-Progress (WIP) data to filter and categorize entries,
    then generate a summary report and save results to an Excel file.

    This function:
    - Filters data based on financial year and completion status.
    - Categorizes rows into statuses and types.
    - Generates a grouped report with counts of statuses by type.
    - Handles missing files, empty data, and other unexpected issues.
    - Saves the raw, cleaned, and grouped data to an Excel file.

    Args:
        None (user inputs file paths interactively).

    Returns:
        None
    """
    import pandas as pd
    from datetime import datetime
    import os

    try:
        # Prompt user for file paths
        source = input("Enter the file path of the source data (CSV): ").strip()
        destination = input("Enter the file path for the output Excel file: ").strip()

        # Check if the source file exists
        if not os.path.isfile(source):
            print(f"Error: Source file '{source}' not found.")
            return

        # Load the raw data with selected columns
        selected_columns = [
            "RequestDate", "PatientName", "ApplicationId",
            "AppointmentStatus", "LastCallStatus",
            "NumberofAttempts", "DND"
        ]
        try:
            df_raw = pd.read_csv(source, usecols=selected_columns)
        except Exception as e:
            print(f"Error loading the source file: {e}")
            return

        # Check if the data is empty
        if df_raw.empty:
            print("Error: The source file contains no data.")
            return

        # Create a copy for processing
        df = df_raw.copy()

        # Convert the 'RequestDate' column to datetime
        df["RequestDate"] = pd.to_datetime(df["RequestDate"], format="%d/%m/%Y", errors="coerce")

        # Stop only if every date is missing or failed to parse (such values become NaT)
        if df["RequestDate"].isna().all():
            print("Error: All 'RequestDate' values are invalid or missing.")
            return

        # Filter out data from previous financial years (rows with NaT dates are dropped here too)
        financial_year_end = datetime(2024, 3, 31)
        df = df[df["RequestDate"] > financial_year_end]

        # Filter out completed cases
        completed_statuses = [
            "QC Approved", "QC APPROVED", "Reports Uploaded",
            "Appointment Attended", "QC Rejected", "Sent For Interpretation"
        ]
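        # .copy() makes df an independent frame, so the column assignments
        # below do not trigger pandas' SettingWithCopyWarning
        df = df[~df["AppointmentStatus"].isin(completed_statuses)].copy()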

        # Initialize the 'Status' column
        df["Status"] = None

        # Assign statuses based on conditions
        df.loc[df["NumberofAttempts"] > 30, "Status"] = "Max Attempts"
        df.loc[(df["Status"].isna()) & (df["DND"] == "Yes"), "Status"] = "DND"

        appointment_status_cases = [
            "Cancelled", "Cancelled by insurer",
            "Appointment Confirmed", "Order sent to partner"
        ]
        df.loc[
            (df["AppointmentStatus"].isin(appointment_status_cases)) & (df["Status"].isna()),
            "Status"
        ] = df["AppointmentStatus"]

        # Assign remaining statuses based on 'LastCallStatus'
        df.loc[df["Status"].isna(), "Status"] = df["LastCallStatus"]

        # Fill any remaining null values in 'Status'
        df["Status"] = df["Status"].fillna("Non Contactable")

        # Categorize "Working On It" cases
        working_on_it_statuses = [
            "Appointment Request Received", "Direct Medical",
            "Location Constraint", "Medical Done Report Awaited", "Reminder"
        ]
        df.loc[df["Status"].isin(working_on_it_statuses), "Status"] = "Working On It"

        # Classify rows as 'Workable' or 'Non_Workable'
        workable_statuses = [
            "Appointment Confirmed", "Callback", "Non Contactable",
            "Order sent to partner", "Working On It"
        ]
        df["Type"] = "Non_Workable"  # Default to Non_Workable
        df.loc[df["Status"].isin(workable_statuses), "Type"] = "Workable"

        # Convert 'RequestDate' back to string format for final output
        df["RequestDate"] = df["RequestDate"].dt.strftime("%d/%m/%Y")

        # Generate a grouped report with counts of 'Status' by 'Type'
        grouped_report = (
            df.groupby(["Type", "Status"])
            .size()
            .reset_index(name="Count")
        )

        # Ensure the destination directory exists
        dest_dir = os.path.dirname(destination)
        if dest_dir:
            os.makedirs(dest_dir, exist_ok=True)

        # Save results to an Excel file
        with pd.ExcelWriter(destination) as writer:
            grouped_report.to_excel(writer, sheet_name="Report", index=False)
            df_raw.to_excel(writer, sheet_name="Raw_Data", index=False)
            df.to_excel(writer, sheet_name="Cleaned_Data", index=False)

        print("Processing complete. Output saved to:", destination)

    except Exception as e:
        print(f"An unexpected error occurred: {e}")


In [22]:
process_wip_data()

Enter the file path of the source data (CSV):  mis.csv
Enter the file path for the output Excel file:  test.xlsx


Processing complete. Output saved to: test.xlsx
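
To try a run like the one above without real data, a small sample CSV can be generated first. The column names come from `selected_columns` in the function. Everything else here is made up for illustration: the file name `mis_sample.csv`, the patient rows, and the `Pending` appointment status.

```python
import csv
import os
import tempfile

# Column names taken from selected_columns in process_wip_data.
columns = [
    "RequestDate", "PatientName", "ApplicationId",
    "AppointmentStatus", "LastCallStatus",
    "NumberofAttempts", "DND",
]

# Made-up rows for illustration only; dates use the dd/mm/YYYY
# format that the function parses. "Pending" is a hypothetical status.
rows = [
    ["15/03/2024", "Patient A", "APP-001", "Pending", "Callback", 2, "No"],
    ["02/04/2024", "Patient B", "APP-002", "Pending", "Callback", 3, "No"],
    ["10/05/2024", "Patient C", "APP-003", "QC Approved", "", 1, "No"],
    ["21/06/2024", "Patient D", "APP-004", "Pending", "", 35, "No"],
    ["05/07/2024", "Patient E", "APP-005", "Pending", "Reminder", 4, "Yes"],
]

# Write the sample into a fresh temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "mis_sample.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(columns)
    writer.writerows(rows)

print("Sample written to:", sample_path)
```

Entering the printed path at the first prompt should exercise most branches. Reading the rules as written, Patient A falls before the financial-year cutoff, Patient C has a completed status, Patient D exceeds 30 attempts (`Max Attempts`), Patient E is flagged `DND`, and Patient B keeps `Callback` and is classed as `Workable`.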
