## Invoice Generation Script

This notebook reads billable item data from a CSV file, groups the items by invoice number, and generates a separate Word document (.docx) for each invoice, including calculated totals and tax.

### 1. Install Necessary Library

First, we need to ensure the `python-docx` library is installed. This library allows us to create and manipulate Microsoft Word documents using Python. The `%pip` magic command ensures we install it into the current kernel's environment.
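
Note that the package is installed as `python-docx` but imported as `docx`. As an optional check, the sketch below uses the standard library to confirm that a module can be found in the current environment before the import cell runs. The helper name `is_installed` is illustrative and is not part of the notebook.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located in this environment."""
    return importlib.util.find_spec(module_name) is not None


# The import name is "docx", even though the pip package is "python-docx".
print("python-docx available:", is_installed("docx"))
```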

In [None]:
%pip install python-docx

### 2. Import Libraries

Next, we import the required Python libraries:
* `pandas`: For reading and manipulating the data from the CSV file efficiently.
* `docx` (specifically `Document`): For creating the Word document structure.
* `docx.shared` (`Pt`): For specifying font sizes in points.
* `docx.enum.text` (`WD_ALIGN_PARAGRAPH`): For setting text alignment.
* `pathlib` (`Path`): For handling file paths and creating directories in a way that works across different operating systems.

In [None]:
import pandas as pd
from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
from pathlib import Path

### 3. Configure Tax Rate

Set the sales tax rate that will be applied to the subtotal of each invoice. This is defined as a decimal (e.g., 0.10 represents 10%).
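
To make the arithmetic concrete, here is a small worked example of how a rate such as 0.10 turns a subtotal into a tax amount and a grand total. The notebook itself uses plain floats; this sketch swaps in `decimal.Decimal` as one possible way to round to whole cents, and `compute_totals` is a hypothetical helper name rather than part of the notebook.

```python
from decimal import Decimal, ROUND_HALF_UP


def compute_totals(subtotal, tax_rate):
    """Return (tax_amount, grand_total), each rounded to whole cents."""
    cent = Decimal("0.01")
    # Go through str() so 0.10 becomes Decimal("0.1"), not a binary float artifact.
    subtotal = Decimal(str(subtotal))
    tax_amount = (subtotal * Decimal(str(tax_rate))).quantize(cent, rounding=ROUND_HALF_UP)
    grand_total = (subtotal + tax_amount).quantize(cent, rounding=ROUND_HALF_UP)
    return tax_amount, grand_total


tax, total = compute_totals(250.00, 0.10)
print(f"Tax: ${tax:,.2f}  Grand total: ${total:,.2f}")
# prints: Tax: $25.00  Grand total: $275.00
```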

In [None]:
# Set the tax rate here (e.g., 0.07 for 7%)
TAX_RATE = 0.10

### 4. Load and Preview Data

Read the invoice line items from the `billable_items.csv` file into a pandas DataFrame. We also tell pandas to parse the 'Date' column as datetime objects. Finally, display the DataFrame to verify the data has been loaded correctly.
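
The rest of the notebook reads the columns `Invoice Number`, `Client Name`, `Date`, `Item`, `Quantity`, and `Unit Price`. The sketch below shows one way to check for them right after loading. The sample rows are invented purely to illustrate the assumed layout, and the names `REQUIRED_COLUMNS` and `sample_df` are not part of the notebook.

```python
import io

import pandas as pd

# Column names taken from the lookups in the later cells of this notebook.
REQUIRED_COLUMNS = [
    "Invoice Number", "Client Name", "Date", "Item", "Quantity", "Unit Price",
]

# Made-up rows standing in for billable_items.csv (illustration only).
sample_csv = io.StringIO(
    "Invoice Number,Client Name,Date,Item,Quantity,Unit Price\n"
    "INV-001,Example Client A,2025-05-01,Consulting,2,150.00\n"
    "INV-001,Example Client A,2025-05-01,Travel,1,80.50\n"
    "INV-002,Example Client B,2025-05-03,Design,3,95.00\n"
)
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])

# Fail early with a clear message if any expected column is absent.
missing = [col for col in REQUIRED_COLUMNS if col not in sample_df.columns]
if missing:
    raise ValueError(f"CSV is missing required columns: {missing}")

print(sample_df.dtypes)
```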

In [None]:
# Ensure 'billable_items.csv' is in the same directory as the notebook
# or provide the full path.
df = pd.read_csv("billable_items.csv", parse_dates=["Date"])

df

### 5. Define Invoice Creation Function

This cell defines the main function, `create_invoice`, which takes four arguments: the invoice number, the DataFrame group containing all items for that invoice, the output directory path, and the tax rate.

Inside the function, it:
* Creates a new Word document.
* Sets default font style.
* Extracts common information (client name, date).
* Adds the header section (Title, Client Info, Invoice #, Date).
* Creates a table for the line items (Description, Quantity, Unit Price, Line Total).
* Iterates through the line items in the group, adds them to the table, and calculates the subtotal.
* Calculates the tax amount and grand total.
* Adds the totals section (Subtotal, Tax, Grand Total) aligned to the right.
* Saves the completed document to the specified output path with the filename `{invoice_num}.docx`.
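
The subtotal step can also be checked separately from the Word document. The sketch below is a vectorized alternative to the row-by-row loop, not the notebook's own method: it assumes `pd.to_numeric(..., errors="coerce")` is acceptable for flagging bad values, and the `items` frame holds made-up data with one deliberately invalid quantity.

```python
import pandas as pd

# Made-up line items; "three" is deliberately non-numeric.
items = pd.DataFrame({
    "Item": ["Consulting", "Travel", "Design"],
    "Quantity": ["2", "1", "three"],
    "Unit Price": ["150.00", "80.50", "95.00"],
})

# errors="coerce" turns unparseable values into NaN instead of raising.
quantity = pd.to_numeric(items["Quantity"], errors="coerce")
unit_price = pd.to_numeric(items["Unit Price"], errors="coerce")

# Keep only rows where both values converted cleanly.
valid = quantity.notna() & unit_price.notna()
line_totals = (quantity * unit_price)[valid]
subtotal = float(line_totals.sum())

print(f"Valid rows: {int(valid.sum())}, skipped: {int((~valid).sum())}")
print(f"Subtotal: ${subtotal:,.2f}")
```

Running a check like this before calling `create_invoice` makes it easier to see which rows would be skipped and why.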

In [None]:
def create_invoice(invoice_num, group, output_path, tax_rate):
    doc = Document()
    # Set default font for the document
    doc.styles["Normal"].font.name = "Calibri"
    doc.styles["Normal"].font.size = Pt(12)  # 12 pt body text

    # Extract common invoice information from the first row of the group
    first_row = group.iloc[0]
    client_name = first_row["Client Name"]

    # Ensure date is formatted correctly even if read as string initially
    try:
        invoice_date_dt = pd.to_datetime(first_row["Date"])
        # Format the date to a more readable format (e.g., "May 01, 2025")
        invoice_date = invoice_date_dt.strftime("%B %d, %Y")
    except (ValueError, TypeError):
        invoice_date = str(first_row["Date"]) # Fallback

    # Add a title to the document (using paragraph for consistent font size)
    p_title = doc.add_paragraph()
    run_title = p_title.add_run('INVOICE')
    run_title.font.bold = True
    p_title.alignment = WD_ALIGN_PARAGRAPH.CENTER

    # Add client/invoice details
    p_info = doc.add_paragraph()
    p_info.add_run("Client: ").bold = True
    p_info.add_run(f"{client_name}\n")
    p_info.add_run("Invoice Number: ").bold = True
    p_info.add_run(f"{invoice_num}\n")
    p_info.add_run("Date: ").bold = True
    p_info.add_run(invoice_date)

    doc.add_paragraph()  # Add some space

    # Line items table
    table = doc.add_table(rows=1, cols=4)
    table.style = "Table Grid"
    table.autofit = True # Use autofit

    # Add header row
    header_row_cells = table.rows[0].cells
    header_row_cells[0].text = "Item Description"
    header_row_cells[1].text = "Quantity"
    header_row_cells[2].text = "Unit Price"
    header_row_cells[3].text = "Line Total"

    # Make header bold
    for cell in header_row_cells:
        if cell.paragraphs and cell.paragraphs[0].runs:
            cell.paragraphs[0].runs[0].font.bold = True

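    subtotal = 0.0
    # Add data rows
    for index, row in group.iterrows():
        try:
            item = str(row["Item"])
            # Convert quantity and unit price; non-numeric text raises ValueError
            quantity = pd.to_numeric(row["Quantity"])
            unit_price = pd.to_numeric(row["Unit Price"])
            # Blank cells load as NaN, which to_numeric passes through silently,
            # so reject them explicitly to keep NaN out of the totals
            if pd.isna(quantity) or pd.isna(unit_price):
                raise ValueError("missing Quantity or Unit Price")

            line_total = quantity * unit_price
            subtotal += line_total

            row_cells = table.add_row().cells
            row_cells[0].text = item
            row_cells[1].text = str(quantity)
            row_cells[2].text = f"${unit_price:,.2f}"
            row_cells[3].text = f"${line_total:,.2f}"
        except (ValueError, TypeError) as conv_err:
            # index + 2 maps the default 0-based index to a CSV line number
            # (the header occupies line 1)
            print(
                f"  Skipping row {index+2} in CSV due to data conversion error: {conv_err}. Check Quantity/Unit Price."
            )
            continue  # Skip this row and continue with the next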

    # Add some space before the totals
    doc.add_paragraph()

    # Calculate totals
    tax_amount = subtotal * tax_rate
    grand_total = subtotal + tax_amount

    # Add a subtotal, tax, and grand total section (Right aligned)
    # Subtotal
    p_subtotal = doc.add_paragraph()
    p_subtotal.alignment = WD_ALIGN_PARAGRAPH.RIGHT
    run_sub_label = p_subtotal.add_run('Subtotal: ')
    run_sub_label.font.bold = True
    run_sub_value = p_subtotal.add_run(f"${subtotal:,.2f}")

    # Tax
    p_tax = doc.add_paragraph()
    p_tax.alignment = WD_ALIGN_PARAGRAPH.RIGHT
    run_tax_label = p_tax.add_run(f'Tax ({tax_rate:.1%}): ')
    run_tax_label.font.bold = True
    run_tax_value = p_tax.add_run(f"${tax_amount:,.2f}")

    # Grand Total
    p_total = doc.add_paragraph()
    p_total.alignment = WD_ALIGN_PARAGRAPH.RIGHT
    run_total_label = p_total.add_run('Grand Total: ')
    run_total_label.font.bold = True
    run_total_value = p_total.add_run(f"${grand_total:,.2f}")
    run_total_value.font.bold = True

    # --- Save the document ---
    output_filename = output_path / f"{invoice_num}.docx"
    try:
        doc.save(output_filename)
        print(f"  Invoice '{output_filename}' generated successfully.")
    except Exception as e:
        print(f"  Error saving invoice {invoice_num}: {e}")

### 6. Main Execution Logic

This is the main part of the script that orchestrates the invoice generation:
1.  **Create Output Directory:** Defines the name for the output folder (`generated_invoices`) and uses `pathlib` to create it. `mkdir(parents=True, exist_ok=True)` safely creates the directory without errors if it already exists.
2.  **Group Data:** Groups the rows in the DataFrame based on the unique values in the 'Invoice Number' column.
3.  **Iterate and Generate:** Loops through each invoice group. For each group, it prints a status message indicating which invoice is being processed and then calls the `create_invoice` function, passing the necessary details to generate the Word document for that specific invoice.
4.  **Completion Message:** Prints a final message when all invoices have been processed.
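
The grouping step can be seen in isolation with a small sketch. Iterating over a `groupby` object yields one `(key, sub-DataFrame)` pair per unique invoice number, which is the pair passed on to `create_invoice`. The `df_demo` frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up data: two line items for INV-001 and one for INV-002.
df_demo = pd.DataFrame({
    "Invoice Number": ["INV-001", "INV-001", "INV-002"],
    "Client Name": ["Example Client A", "Example Client A", "Example Client B"],
    "Quantity": [2, 1, 3],
    "Unit Price": [150.00, 80.50, 95.00],
})

summary = {}
# Each iteration yields the group key and the rows that belong to it.
for invoice_num, group in df_demo.groupby("Invoice Number"):
    subtotal = float((group["Quantity"] * group["Unit Price"]).sum())
    summary[invoice_num] = (len(group), subtotal)
    print(f"{invoice_num}: {len(group)} line item(s), subtotal ${subtotal:,.2f}")
```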

In [None]:
# Create a directory to save the invoices
output_path = Path("generated_invoices")
output_path.mkdir(parents=True, exist_ok=True)
print(f"Output directory '{output_path}' ensured.")

# Group the data by "Invoice Number"
# Check if 'Invoice Number' column exists before grouping
if 'Invoice Number' not in df.columns:
    print("Error: Column 'Invoice Number' not found in the DataFrame.")
else:
    grouped_data = df.groupby("Invoice Number")
    print(f"Found {len(grouped_data)} unique invoices to generate.")

    # Iterate through each group and generate invoice
    if len(grouped_data) > 0:
        for invoice_num, group in grouped_data:
            # Ensure the group is not empty and has necessary columns before proceeding
            if not group.empty and 'Client Name' in group.columns:
                print(
                    f"Processing Invoice: {invoice_num} for Client: {group.iloc[0]['Client Name']}..."
                )
                create_invoice(invoice_num, group, output_path, TAX_RATE)
            else:
                print(f"  Skipping empty or invalid group for Invoice: {invoice_num}")
        print("\nInvoice generation process complete.")
    else:
        print("No invoice groups found to process.")