# Generating Invoices from a CSV file

This notebook reads billable item data from a CSV file, groups the items by invoice number, and generates a separate Word document (.docx) for each invoice, including calculated totals and tax.


## Install Packages

First, we need to ensure the `python-docx` library is installed. This library allows us to create and manipulate Microsoft Word documents using Python. The `%pip` magic command ensures we install it into the current kernel's environment.


In [None]:
%pip install python-docx
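You can optionally confirm that the install worked before continuing. The distribution is named `python-docx`, but the module you import is `docx`. The sketch below looks up the installed version through the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and is used only for this check.

```python
from importlib.metadata import version, PackageNotFoundError


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The distribution is named "python-docx", but the import name is "docx"
print("python-docx:", installed_version("python-docx"))
```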

## Import Packages

Next, we import the required Python libraries:

- `pandas`: For reading and manipulating the data from the input CSV file.
- `docx` (specifically `Document`): For creating a Word document programmatically.
- `docx.shared` (`Pt`): For specifying font sizes in points.
- `docx.enum.text` (`WD_ALIGN_PARAGRAPH`): For setting text alignment.
- `pathlib` (`Path`): For handling file paths and creating directories in a way that works across different operating systems.


In [8]:
import pandas as pd
from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
from pathlib import Path

## Configure Tax Rate

Set the sales tax rate that will be applied to the subtotal of each invoice. This is defined as a decimal (e.g., 0.10 represents 10%).


In [9]:
# Set the tax rate here (e.g., 0.07 for 7%)
TAX_RATE = 0.10
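Here is a worked example of how `TAX_RATE` is applied, using the two INV001 line items from the data preview below. 4 × $150.00 + 1 × $300.00 gives a subtotal of $900.00. At 10%, the tax is $90.00 and the grand total is $990.00. The sketch repeats that arithmetic with the same f-string formats that `create_invoice()` uses. The `items` list is copied by hand from the preview and is only for illustration.

```python
TAX_RATE = 0.10

# INV001 line items as (quantity, unit price), copied from the data preview
items = [(4, 150.0), (1, 300.0)]

subtotal = sum(q * p for q, p in items)  # 4 * 150.0 + 1 * 300.0 = 900.0
tax_amount = subtotal * TAX_RATE         # 900.0 * 0.10 = 90.0
grand_total = subtotal + tax_amount      # 900.0 + 90.0 = 990.0

# Same formatting as the totals section of the invoice
print(f"Subtotal: ${subtotal:,.2f}")
print(f"Tax ({TAX_RATE:.1%}): ${tax_amount:,.2f}")
print(f"Grand Total: ${grand_total:,.2f}")
```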

## Load and Preview Data

Read the invoice line items from `billable_items.csv`, which is loaded directly from its GitHub URL, into a pandas DataFrame. The `parse_dates` argument tells pandas to parse the "Date" column as datetime values. Finally, display the DataFrame to verify that the data loaded correctly.


In [None]:
df = pd.read_csv(
    "https://raw.githubusercontent.com/subwaymatch/mba564b-2025-redesign/main/01-generate-invoices/data/billable_items.csv",
    parse_dates=["Date"],
)

df

|   | Invoice Number | Date | Client Name | Item | Quantity | Unit Price |
|---|---|---|---|---|---|---|
| 0 | INV001 | 2025-05-01 | Acme Corp | Consulting Hour - Strategy | 4 | 150.0 |
| 1 | INV001 | 2025-05-01 | Acme Corp | Report Generation | 1 | 300.0 |
| 2 | INV002 | 2025-05-02 | Beta LLC | Data Analysis Service | 10 | 75.5 |
| 3 | INV002 | 2025-05-02 | Beta LLC | Custom Visualization | 2 | 250.0 |
| 4 | INV003 | 2025-05-03 | Gamma Inc | Software License - Annual | 1 | 1200.0 |
| 5 | INV003 | 2025-05-03 | Gamma Inc | Support Contract | 1 | 400.0 |
| 6 | INV003 | 2025-05-03 | Gamma Inc | Training Session | 3 | 200.0 |
| 7 | INV004 | 2025-05-04 | Acme Corp | Widget Installation | 5 | 50.0 |
| 8 | INV004 | 2025-05-04 | Acme Corp | Travel Expenses | 1 | 125.75 |
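`pd.read_csv` accepts a local path or a file-like object as well as a URL, so you can try the load step offline. The sketch below parses a two-row excerpt of the data from an in-memory string, with the rows copied from the preview above. It then checks that `parse_dates` turned `Date` into a datetime column. The names `csv_text` and `sample` are illustrative.

```python
import io

import pandas as pd

# A two-row excerpt of the data, embedded as text so this runs without network access
csv_text = """Invoice Number,Date,Client Name,Item,Quantity,Unit Price
INV001,2025-05-01,Acme Corp,Consulting Hour - Strategy,4,150.0
INV001,2025-05-01,Acme Corp,Report Generation,1,300.0
"""

# StringIO makes the string behave like an open file
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# "Date" should now be a datetime64 column rather than plain strings
print(sample.dtypes)
```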


## Define the Invoice Creation Function

This cell defines the main function, `create_invoice()`. This function takes the invoice number, the DataFrame group containing all items for that specific invoice, the output directory path, and the tax rate as input.

Inside the function, it:

- Creates a new Word document.
- Sets the default font of the `Normal` style to Calibri, 13 pt.
- Extracts common information (client name, date).
- Adds the header section (Title, Client Info, Invoice #, Date).
- Creates a table for the line items (Description, Quantity, Unit Price, Line Total).
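- Iterates through the line items in the group, adds each one to the table, and accumulates the subtotal. Rows whose Quantity or Unit Price cannot be converted to a number are skipped, and a warning is printed.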
- Calculates the tax amount and grand total.
- Adds the totals section (Subtotal, Tax, Grand Total) aligned to the right.
- Saves the completed document to the specified output path with the filename `{invoice_num}.docx`.
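`create_invoice()` formats the date with `strftime("%B %d, %Y")`. Because `%d` is the zero-padded day of the month, 2025-05-01 is rendered as "May 01, 2025". If you prefer "May 1, 2025", one option is to build the string from the timestamp's `.day` and `.year` attributes, as sketched below. This is a suggestion, not what the notebook does. Month names from `%B` follow the current locale, so they are English only under an English or default C locale.

```python
import pandas as pd

invoice_date_dt = pd.to_datetime("2025-05-01")

# "%d" zero-pads the day of the month, so this gives "May 01, 2025"
padded = invoice_date_dt.strftime("%B %d, %Y")

# Using .day (an int) avoids the leading zero, giving "May 1, 2025"
unpadded = f"{invoice_date_dt.strftime('%B')} {invoice_date_dt.day}, {invoice_date_dt.year}"

print(padded)
print(unpadded)
```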


In [11]:
def create_invoice(invoice_num, group, output_path, tax_rate):
    doc = Document()
    doc.styles["Normal"].font.name = "Calibri"
    doc.styles["Normal"].font.size = Pt(13)

    # Extract common invoice information from the first row of the group
    first_row = group.iloc[0]
    client_name = first_row["Client Name"]

    # Ensure date is formatted correctly even if read as string initially
    invoice_date_dt = pd.to_datetime(first_row["Date"])

    # Format the date to a more readable format (e.g., "January 01, 2025")
    invoice_date = invoice_date_dt.strftime("%B %d, %Y")

    # Add a title to the document
    doc.add_heading("INVOICE", level=0)

    p_info = doc.add_paragraph()
    p_info.add_run("Client: ").bold = True
    p_info.add_run(f"{client_name}\n")
    p_info.add_run("Invoice Number: ").bold = True
    p_info.add_run(f"{invoice_num}\n")
    p_info.add_run("Date: ").bold = True
    p_info.add_run(invoice_date)

    doc.add_paragraph()  # Add some space

    # Line items table
    table = doc.add_table(rows=1, cols=4)
    table.style = "Table Grid"

    # Add header row
    header_row_cells = table.rows[0].cells
    header_row_cells[0].text = "Item Description"
    header_row_cells[1].text = "Quantity"
    header_row_cells[2].text = "Unit Price"
    header_row_cells[3].text = "Line Total"

    for cell in header_row_cells:
        cell.paragraphs[0].runs[0].font.bold = True

    subtotal = 0.0
    # Add data rows
    for index, row in group.iterrows():
        try:
            item = row["Item"]
            # Convert quantity and unit price to numbers; bad values raise ValueError/TypeError
            quantity = pd.to_numeric(row["Quantity"])
            unit_price = pd.to_numeric(row["Unit Price"])

            line_total = quantity * unit_price
            subtotal += line_total

            row_cells = table.add_row().cells
            row_cells[0].text = item
            row_cells[1].text = str(quantity)
            row_cells[2].text = f"${unit_price:,.2f}"
            row_cells[3].text = f"${line_total:,.2f}"
        except (ValueError, TypeError) as conv_err:
            print(
                f"  Skipping row {index+2} in CSV due to data conversion error: {conv_err}. Check Quantity/Unit Price."
            )
            continue  # Skip this row and continue with the next

    # Add some space before the totals
    doc.add_paragraph()

    # Add a subtotal, tax, and grand total section
    p_total_section = doc.add_paragraph()

    tax_amount = subtotal * tax_rate
    grand_total = subtotal + tax_amount

    p_total_section.alignment = WD_ALIGN_PARAGRAPH.RIGHT
    p_total_section.add_run("Subtotal: ").bold = True
    p_total_section.add_run(f"${subtotal:,.2f}\n")
    p_total_section.add_run(f"Tax ({tax_rate:.1%}): ").bold = True
    p_total_section.add_run(f"${tax_amount:,.2f}\n")
    p_total_section.add_run("Grand Total: ").bold = True
    p_total_section.add_run(f"${grand_total:,.2f}")

    # --- Save the document ---
    output_filename = output_path / f"{invoice_num}.docx"
    doc.save(output_filename)
    print(f"  Invoice '{output_filename}' generated successfully.")

## Generate Invoices for Each Invoice Number

This is the main part of the script that orchestrates the invoice generation:

1.  **Create Output Directory:** Defines the output folder name (`generated_invoices`) and uses `pathlib` to create it. `mkdir(parents=True, exist_ok=True)` also creates any missing parent directories and does not raise an error if the folder already exists.
2.  **Group Data:** Groups the rows in the DataFrame based on the unique values in the 'Invoice Number' column.
3.  **Iterate and Generate:** Loops through each invoice group. For each group, it prints a status message indicating which invoice is being processed and then calls the `create_invoice` function, passing the necessary details to generate the Word document for that specific invoice.
4.  **Completion Message:** Prints a final message when all invoices have been processed.
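Two details of `df.groupby(...)` matter for this loop. First, keys are sorted by default, so invoices are processed in ascending invoice-number order. Second, each group keeps the row labels it had in the original DataFrame. The second point is why `create_invoice()` can report `index+2` as a CSV line number, assuming the default `RangeIndex` from `read_csv`, a single header line, and no blank or multi-line rows. The small DataFrame below is made up for illustration.

```python
import pandas as pd

# Deliberately out of order to show that groupby sorts its keys
df = pd.DataFrame({
    "Invoice Number": ["INV002", "INV001", "INV002"],
    "Item": ["Data Analysis Service", "Report Generation", "Custom Visualization"],
})

grouped = df.groupby("Invoice Number")  # sort=True is the default
print(len(grouped))  # number of distinct invoice numbers

for invoice_num, group in grouped:
    # Each group keeps its original row labels from df
    print(invoice_num, list(group.index))
```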


In [12]:
# Create a directory to save the invoices
output_path = Path("generated_invoices")
output_path.mkdir(parents=True, exist_ok=True)

# Group the data by "Invoice Number"
grouped_data = df.groupby("Invoice Number")
print(f"Found {len(grouped_data)} unique invoices to generate.")

# Iterate through each group and generate invoice
for invoice_num, group in grouped_data:
    print(
        f"Processing Invoice: {invoice_num} for Client: {group.iloc[0]['Client Name']}..."
    )

    create_invoice(invoice_num, group, output_path, TAX_RATE)

print("\nInvoice generation process complete.")

Found 4 unique invoices to generate.
Processing Invoice: INV001 for Client: Acme Corp...
  Invoice 'generated_invoices\INV001.docx' generated successfully.
Processing Invoice: INV002 for Client: Beta LLC...
  Invoice 'generated_invoices\INV002.docx' generated successfully.
Processing Invoice: INV003 for Client: Gamma Inc...
  Invoice 'generated_invoices\INV003.docx' generated successfully.
Processing Invoice: INV004 for Client: Acme Corp...
  Invoice 'generated_invoices\INV004.docx' generated successfully.

Invoice generation process complete.
