# Win32com

- ```win32com``` is a Python package, part of the PyWin32 library, that lets Python communicate with Windows applications such as Outlook, Excel, and Word through their COM **(Component Object Model)** interfaces. In short, it lets a Python script drive Outlook much as a VBA macro would.
    - ```win32com.client``` is the submodule used to control these applications as a client. You use it to start (or attach to) a COM application (Outlook, Excel, etc.), send it commands (read emails, write Excel files, etc.), and access its objects (folders, messages, workbooks, etc.).
    - **COM (Component Object Model)** is a Microsoft technology that allows different programs to talk to each other. You can think of it as a standard way for Windows programs to expose their features and objects (like emails, folders, Excel cells, etc.) so that other programs - like Python - can control them.
        - For instance, Outlook exposes a COM interface where folders, messages, etc. are accessible.

- **Installing PyWin32**:
    - ```shell
        pip install pywin32
        ```

## Key Concepts for Outlook Automation

- **Connect to Outlook application:**
    - ```python
        import win32com.client

        outlook = win32com.client.Dispatch("Outlook.Application")
        namespace = outlook.GetNamespace("MAPI")
        ```

        - ```outlook```: a COM object representing the Outlook Application (the entry point to all of its functionality).
        - ```Dispatch("Outlook.Application")```: attaches to the running Outlook process, or launches Outlook if it is not running.
        - ```.GetNamespace("MAPI")```: returns the MAPI (Messaging API) `NameSpace` object, the top-level object through which all Outlook data (Inbox, Calendar, Contacts, etc.) is accessed. `"MAPI"` is the only namespace type Outlook supports.


- **Accessing Folders:**
    - ```python
        inbox = namespace.GetDefaultFolder(6)
        ```

        - **6** is the ID of the Inbox, one of Outlook's built-in folder IDs (the `OlDefaultFolders` enumeration). Some other IDs include:
            - 5: Sent Mail
            - 3: Deleted Items
            - 4: Outbox
            - 16: Drafts
            - 9: Calendar
    - Once you have the Inbox, you can navigate to subfolders:
        - ```python
            personal_folder = inbox.Folders["Personal Folder"]
            ```
    - You can keep chaining:
        - ```python
            from_leah = personal_folder.Folders['Leah']
            ```
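
Hard-coded IDs such as `6` are easy to mistype. One option is to give the IDs from the list above readable names with an `IntEnum`. This is a minimal sketch: the class and member names are hypothetical labels chosen for these notes, while Outlook's own constant names are `olFolderInbox`, `olFolderSentMail`, `olFolderDeletedItems`, `olFolderOutbox`, `olFolderDrafts` and `olFolderCalendar`.

```python
from enum import IntEnum


class OlDefaultFolders(IntEnum):
    """Subset of Outlook's default-folder IDs (hypothetical member names)."""
    DELETED_ITEMS = 3
    OUTBOX = 4
    SENT_MAIL = 5
    INBOX = 6
    CALENDAR = 9
    DRAFTS = 16


# Against a live session (Windows + Outlook only):
# inbox = namespace.GetDefaultFolder(int(OlDefaultFolders.INBOX))

print(OlDefaultFolders.INBOX.value)  # 6
print(OlDefaultFolders(16).name)     # DRAFTS
```

If you create the application with `win32com.client.gencache.EnsureDispatch("Outlook.Application")` instead of `Dispatch`, pywin32 generates early-bound wrappers and the official names become available on `win32com.client.constants` (e.g. `constants.olFolderInbox`), which makes a hand-written enum unnecessary.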

- **Accessing Mail Items:**
    - Each folder contains ```.Items```, a **collection** of emails and possibly other Outlook items (meetings, etc.)
    - ```python
        for item in personal_folder.Items:
            if item.Class == 43: # item.Class returns an integer
                print(item.Subject)
        ```
        - You filter by ```item.Class``` because ```.Items``` can include:
            - 43: MailItem
            - 26: AppointmentItem (calendar)
            - 48: TaskItem
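
Because `.Items` mixes item types, the `Class` check is worth wrapping in a small generator so every loop gets it for free. A minimal sketch; `mail_items` and `OL_MAIL` are hypothetical names, and the demo uses `SimpleNamespace` stand-ins instead of live Outlook items so it runs anywhere.

```python
from types import SimpleNamespace

OL_MAIL = 43  # MailItem class ID


def mail_items(items):
    """Yield only MailItem objects from an Outlook Items collection.

    Works with any iterable whose elements expose a ``Class`` attribute.
    """
    for item in items:
        # getattr with a default skips objects that have no Class property
        if getattr(item, "Class", None) == OL_MAIL:
            yield item


# Stand-ins mimicking a folder that holds mail and a meeting (not real COM items)
fake_items = [
    SimpleNamespace(Class=43, Subject="Weekly report"),
    SimpleNamespace(Class=26, Subject="Team meeting"),
    SimpleNamespace(Class=43, Subject="Invoice"),
]
print([m.Subject for m in mail_items(fake_items)])  # ['Weekly report', 'Invoice']
```

With Outlook, the same call is `for item in mail_items(personal_folder.Items): print(item.Subject)`.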


- **Useful MailItem Properties in Outlook:**

    - When working with Outlook emails using `win32com`, you interact with `MailItem` objects (Class ID `43`). Below are some of the most useful properties you can access:

| Property          | Type     | Description |
|------------------|----------|-------------|
| `Subject`         | `str`    | The subject line of the email |
| `Body`            | `str`    | The plain text body of the email |
| `HTMLBody`        | `str`    | The body of the email in HTML format |
| `SenderName`      | `str`    | The display name of the sender |
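| `SenderEmailAddress` | `str` | The sender's email address. For senders in the same Exchange organization this can be an X.500-style address (`/O=EXCHANGELABS/...`) rather than an SMTP address |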
| `To`              | `str`    | The recipient(s) in the "To" field |
| `CC`              | `str`    | The recipients in the "CC" field |
| `ReceivedTime`    | `datetime` | The date and time the email was received |
| `SentOn`          | `datetime` | The date and time the email was sent |
| `Attachments`     | `Attachments` collection | Use `.Count` and `.Item(index)` to access attachments (indices start at 1) |
| `Categories`      | `str`    | The category names assigned to the email, as one delimited string (e.g., "Control Request, Urgent") |
| `Unread`          | `bool`   | `True` if the message is unread |
| `EntryID`         | `str`    | Unique ID for the email (used to retrieve the item again later) |
| `Parent`          | `Folder` object | The folder where the message resides |
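
COM collections such as `Attachments` are indexed from 1, not 0. A minimal sketch of reading attachment names under that convention; `attachment_names` and `FakeAttachments` are hypothetical helpers, and the stand-in class only mimics the `Count`/`Item(index)` interface so the example runs without Outlook. On a real email you would call `attachment_names(item.Attachments)`; each `Attachment` also has a `SaveAsFile(path)` method for writing it to disk.

```python
from types import SimpleNamespace


def attachment_names(attachments):
    """Return the FileName of every attachment in an Outlook Attachments collection.

    COM collections are 1-based, so indices run from 1 to Count inclusive.
    """
    return [attachments.Item(i).FileName for i in range(1, attachments.Count + 1)]


class FakeAttachments:
    """Hypothetical stand-in that mimics Attachments.Count and Attachments.Item(index)."""

    def __init__(self, names):
        self._names = list(names)

    @property
    def Count(self):
        return len(self._names)

    def Item(self, index):
        # Reject 0 and out-of-range indices, as a 1-based collection would
        if not 1 <= index <= len(self._names):
            raise IndexError(f"index {index} out of range (collection is 1-based)")
        return SimpleNamespace(FileName=self._names[index - 1])


print(attachment_names(FakeAttachments(["report.pdf", "data.xlsx"])))  # ['report.pdf', 'data.xlsx']
```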

- Example usage:
    - ```python
        for item in folder.Items:
            if item.Class == 43:  # MailItem
                print("Subject:", item.Subject)
                print("Received:", item.ReceivedTime)
                print("Body starts with:", item.Body[:50])
                print("Category:", item.Categories)
        ```
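
`Categories` comes back as a single string, so a check like `"Control Request" in item.Categories` (used in the scripts below) is a substring test and would also match a category named `"Control Request - Closed"`. A minimal sketch of an exact match; `has_category` is a hypothetical helper, and the comma delimiter is an assumption: Outlook uses the Windows list-separator setting for this string, so `sep` may need changing on machines configured differently.

```python
def has_category(categories, name, sep=","):
    """Return True if `name` is exactly one of the categories in an Outlook Categories string."""
    if not categories:  # empty string or None: no categories assigned
        return False
    return name in (part.strip() for part in categories.split(sep))


print(has_category("Control Request, Urgent", "Control Request"))   # True
print(has_category("Control Request - Closed", "Control Request"))  # False
print("Control Request" in "Control Request - Closed")              # True (plain substring test)
```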

- **Filtering with ```.Restrict()```**
    - Looping over every email is slow for large mailboxes. Instead, you can filter (like SQL ```WHERE```) with ```.Restrict()```.
    - ```python
        filtered_items = personal_folder.Items.Restrict("[Categories] = 'From Leah'")
        ```
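    - This uses Outlook's **Jet** filter syntax (property names in square brackets, compared much like a SQL ```WHERE``` clause). ```.Restrict()``` also accepts **DASL** queries, which start with ```@SQL=```. You can filter by: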
        - ```[Subject]```
        - ```[Categories]```
        - ```[ReceivedTime]```
        - ```[SenderEmailAddress]```
        - etc.
    - ```python
        filtered_items = personal_folder.Items.Restrict("[ReceivedTime] >= '01/01/2024 12:00 AM'")  # Outlook reads the date using the Windows regional settings; with US settings use 'MM/DD/YYYY HH:MM AM/PM'
        ```
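
Building the filter string by hand makes it easy to drop a quote, so it can help to generate it from a `datetime`. A minimal sketch; `received_since_filter` is a hypothetical helper, and the US date format is an assumption that only holds when Windows uses US regional settings.

```python
from datetime import datetime, timedelta


def received_since_filter(cutoff):
    """Build a Jet-syntax Restrict() filter matching items received at or after `cutoff`.

    Assumes US regional settings (MM/DD/YYYY, 12-hour clock with AM/PM).
    """
    stamp = cutoff.strftime("%m/%d/%Y %I:%M %p")  # e.g. 06/06/2025 03:02 PM (no seconds)
    return f"[ReceivedTime] >= '{stamp}'"


print(received_since_filter(datetime(2025, 6, 6, 15, 2)))  # [ReceivedTime] >= '06/06/2025 03:02 PM'

# Last 7 days, the same window the tracker scripts below use
last_week = received_since_filter(datetime.now() - timedelta(days=7))
# recent = personal_folder.Items.Restrict(last_week)  # Windows + Outlook only
```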

- **Sorting with ```.Sort()```**
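    - In Outlook automation with ```win32com```, the ```.Sort()``` method **sorts the items of a collection (like the emails in your Inbox) before you loop through them.** This is especially useful if you only care about the newest or oldest emails.
    - Call ```.Sort()``` on a variable that holds the collection. Each access to ```folder.Items``` returns a new ```Items``` object, so ```folder.Items.Sort(...)``` followed by ```for item in folder.Items``` loses the sort order.
    - ```python
        items = personal_folder.Items  # Keep one reference to the collection
        items.Sort("[ReceivedTime]", True)  # True sorts in descending order (newest first). False sorts in ascending order.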
        ```
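
Sorting pays off when combined with an early exit: once items are sorted newest-first, you can stop as soon as you have enough emails. A minimal sketch; `newest_mail` and `FakeItems` are hypothetical, and `FakeItems` only imitates the `Sort(property, descending)` call so the demo runs without Outlook. The helper stores `folder.Items` in a local variable before sorting, since Outlook hands back a new collection on each access.

```python
from datetime import datetime
from itertools import islice
from types import SimpleNamespace

OL_MAIL = 43  # MailItem class ID


def newest_mail(folder, n=5):
    """Return up to `n` MailItems from `folder`, newest first."""
    items = folder.Items                # hold one reference to the collection
    items.Sort("[ReceivedTime]", True)  # descending: newest first
    mail = (item for item in items if item.Class == OL_MAIL)
    return list(islice(mail, n))        # stop iterating after n matches


class FakeItems(list):
    """Hypothetical stand-in for an Outlook Items collection (sorts by one property)."""

    def Sort(self, prop, descending=False):
        name = prop.strip("[]")
        self.sort(key=lambda it: getattr(it, name), reverse=descending)


demo_folder = SimpleNamespace(Items=FakeItems([
    SimpleNamespace(Class=43, Subject="A", ReceivedTime=datetime(2025, 6, 1)),
    SimpleNamespace(Class=43, Subject="C", ReceivedTime=datetime(2025, 6, 3)),
    SimpleNamespace(Class=26, Subject="Meeting", ReceivedTime=datetime(2025, 6, 4)),
    SimpleNamespace(Class=43, Subject="B", ReceivedTime=datetime(2025, 6, 2)),
]))
print([m.Subject for m in newest_mail(demo_folder, 2)])  # ['C', 'B']
```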

- **Loop over Subfolders**
    - ```python
        for subfolder in main_folder.Folders:  # main_folder and each subfolder are Outlook Folder objects (think of a folder on your desktop)
            print(subfolder.Name)  # You can't loop over a folder itself, only over its .Folders collection (its direct subfolders), just as you list the folders inside a desktop folder
        ```
    - ```python
        for subfolder in main_folder.Folders:
            for item in subfolder.Items: # .Items represent a collection of items inside each subfolder
                print(item.Subject)
        ```
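
The loop above only reaches direct subfolders. To visit every folder below a starting point, the same idea can be made recursive. A minimal sketch; `walk_folders` and `fake_folder` are hypothetical helpers, and the demo builds a small tree from `SimpleNamespace` stand-ins that expose only `Name` and `Folders`.

```python
from types import SimpleNamespace


def walk_folders(folder, path=""):
    """Yield (path, folder) for `folder` and every folder beneath it, depth-first."""
    current = f"{path}/{folder.Name}" if path else folder.Name
    yield current, folder
    for sub in folder.Folders:  # direct subfolders
        yield from walk_folders(sub, current)


def fake_folder(name, *children):
    """Hypothetical stand-in exposing Name and Folders like an Outlook folder."""
    return SimpleNamespace(Name=name, Folders=list(children))


tree = fake_folder(
    "Inbox",
    fake_folder("Personal Folder", fake_folder("From Leah")),
    fake_folder("Receipts"),
)
for p, _ in walk_folders(tree):
    print(p)
# Inbox
# Inbox/Personal Folder
# Inbox/Personal Folder/From Leah
# Inbox/Receipts
```

With Outlook, `walk_folders(namespace.GetDefaultFolder(6))` would list the Inbox and all of its nested subfolders.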
    

- **Example**
    - ```python
        import win32com.client

        # Connect to Outlook and get the MAPI namespace
        outlook = win32com.client.Dispatch('Outlook.Application')
        namespace = outlook.GetNamespace('MAPI')

        # Navigate to the target folder
        inbox = namespace.GetDefaultFolder(6)  # 6 = Inbox
        personal_inbox = inbox.Folders['Personal Folder']
        from_leah = personal_inbox.Folders['From Leah']

        # Get mail items and sort by ReceivedTime (newest first)
        items = from_leah.Items
        items.Sort('[ReceivedTime]', True)

        # Loop through emails and print key details
        for item in items:
            if item.Class == 43:  # Ensure it's a MailItem
                print('Subject:', item.Subject)
                print('Body:', item.Body)
                print('ReceivedTime:', item.ReceivedTime)
                print('Sender Email:', item.SenderEmailAddress)
                print('---\n')
        ```
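
The scripts below reach a shared mailbox by chaining `.Folders(...)` one level at a time. A minimal sketch of a helper that does the same from a single path string; `get_folder`, `fake_folder` and the "/"-separated path format are hypothetical, and the demo substitutes dictionaries for Outlook's folder collections. With a live session the call would look like `get_folder(namespace, "td.gsfmo@td.com/Inbox/Control Request")`. A missing name raises `KeyError` in the demo; against Outlook it typically raises a COM error instead.

```python
from types import SimpleNamespace


def get_folder(root, path):
    """Follow a "/"-separated path of folder names, starting from `root`.

    `root` can be the MAPI namespace (first segment = mailbox/store name)
    or any folder; each step indexes `.Folders` by name.
    """
    folder = root
    for name in path.strip("/").split("/"):
        folder = folder.Folders[name]
    return folder


def fake_folder(name, *children):
    """Hypothetical stand-in: Folders is a dict keyed by folder name."""
    return SimpleNamespace(Name=name, Folders={c.Name: c for c in children})


fake_session = fake_folder(
    "session",
    fake_folder("mailbox@example.com", fake_folder("Inbox", fake_folder("Control Request"))),
)
print(get_folder(fake_session, "mailbox@example.com/Inbox/Control Request").Name)  # Control Request
```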

In [None]:
# Import dependencies
from openpyxl import load_workbook
import pandas as pd
import win32com.client
import re
from datetime import datetime, timedelta

# Connect to Excel Tracker
wb = load_workbook('GSFMO CR and IR Tracker.xlsx')
sheet = wb['CR IR Tracker']

# Connect to Outlook
outlook = win32com.client.Dispatch('Outlook.Application')
namespace = outlook.GetNamespace('MAPI')

# Retrieve the GSFMO folder and its subfolders
gsfmo_folder = namespace.Folders('td.gsfmo@td.com')
gsfmo_inbox = gsfmo_folder.Folders('Inbox')
gsfmo_cr = gsfmo_inbox.Folders('Control Request')

# Store today's date and cutoff date for inbox filtering
now = datetime.now()
cutoff = now - timedelta(days=7)
cutoff_str = cutoff.strftime('%m/%d/%Y %I:%M%p') # Convert the datetime object into U.S. time string format (e.g. 06/06/2025 03:02PM)

# Initiate REGEX to read date patterns in Email items
date_pattern = r"""
    (Start\s*Date|From)              # Match Start Date label
    [\s:–—\-]*                       # Skip over spaces/dashes/colons
    (?P<start>                       # Start named group
        \d{1,2}[\s\-\/]*[A-Za-z]+[\s\-\/]*\d{2,4}     # e.g., 05-JUN-2025 or 5 JUN 2025
        |
        \d{1,2}[\-/]\d{1,2}[\-/]\d{2,4}               # e.g., 05/06/2025
    )
    .*?                              # Allow anything in between (non-greedy)
    (End\s*Date|To)                  # Match End Date label
    [\s:–—\-]*                       # Skip over spaces/dashes/colons
    (?P<end>                         # End named group
        \d{1,2}[\s\-\/]*[A-Za-z]+[\s\-\/]*\d{2,4}     # e.g., 18-JUN-2025
        |
        \d{1,2}[\-/]\d{1,2}[\-/]\d{2,4}               # e.g., 18/06/2025
    )
"""

closure_pattern = r"""
    (Date\s*of\s*Successful\s*Implementation) # Match label
    [\s:\-–—\u00A0\u200B]* # Separator: colon, dash, or invisible space
    \n* # Optional line break
    [\s\u00A0\u200B]* # Whitespace or invisible space
    (?P<closure> # Named group
        \d{1,2}(st|nd|rd|th)? # Day with optional ordinal
        [\s\-–—/]* # Separator(s)
        [A-Za-z]+ # Month
        [\s\-–—/]* # Separator(s)
        \d{2,4} # Year
    )
"""
date_formats = [
    "%d %B %Y",     # 28 May 2025
    "%d-%b-%Y",     # 28-May-2025
    "%d-%B-%Y",     # 28-June-2025
    "%Y-%m-%d",     # 2025-05-28
    "%m/%d/%Y",     # 05/28/2025 (month first; a day-first date such as 18/06/2025 will not parse)
    "%d %b %Y",     # 28 Jun 2025 (%Y needs a four-digit year)
]

def parse_date(date_str):
    for fmt in date_formats:
        try:
            return datetime.strptime(date_str.strip(), fmt)
        except ValueError:
            continue
    return None  # Could not parse

def clean_date_str(s):
    # Remove spaces around dashes or slashes to normalize dates like '05- JUN -2025' => '05-JUN-2025'
    return re.sub(r'\s*([-\/])\s*', r'\1', s.strip())

# Loop over each vendor folder
for vendor_folder in gsfmo_cr.Folders:
    filtered_items = vendor_folder.Items.Restrict(f"[ReceivedTime] >= '{cutoff_str}'") # Filter for items from the last n days
    for item in filtered_items:
        if item.Class == 43 and "Control Request" in item.Categories: # For Mailbox items with "Control Request" tag...

            # Loop through the Excel sheet to match Subject + Raised Date
            subject = item.Subject
            raised_date = item.ReceivedTime.strftime('%m%d%Y')
            sender_email = item.SenderEmailAddress

            match_found = False
            last_row = sheet.max_row
            last_col = sheet.max_column



            for row in sheet.iter_rows(min_row=3, max_row=last_row, min_col=1, max_col=last_col):
                excel_raised_date = str(row[4].value).strip()
                row_date = excel_raised_date[1:] if excel_raised_date.startswith("'") else excel_raised_date
                if subject == row[6].value and raised_date == row_date and sender_email == row[7].value:
                    match_found = True
                    break

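            if not match_found:
                nontz_receivedTime = item.ReceivedTime.replace(tzinfo=None)
                # Fiscal year runs November 1 to October 31
                if nontz_receivedTime >= datetime(2025, 11, 1):
                    fiscal_year = "FY 2026"
                elif nontz_receivedTime >= datetime(2024, 11, 1):
                    fiscal_year = "FY 2025"
                else:
                    fiscal_year = "Unknown"  # Without this, fiscal_year is undefined (or stale) for older emails

                if "Temporary" in item.Body or "temporary" in item.Body:
                    temp_perm = 'T'
                elif "Permanent" in item.Body or "permanent" in item.Body:
                    temp_perm = 'P'
                else:
                    temp_perm = None  # Neither keyword found; avoids reusing the previous email's value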
                
                match = re.search(date_pattern, item.Body, re.DOTALL | re.IGNORECASE | re.VERBOSE)
                if match:
                    start_raw = match.group('start')
                    end_raw = match.group('end')

                    start_dt = parse_date(clean_date_str(start_raw))
                    if start_dt is not None:
                        start_date = start_dt.strftime('%d-%m-%Y')
                    else:
                        start_date = "Enter manually"

                    end_dt = parse_date(clean_date_str(end_raw))
                    if end_dt is not None:
                        end_date = end_dt.strftime('%d-%m-%Y')
                    else:
                        end_date = "Enter manually"

                else:
                    start_date = "Enter manually"
                    end_date = "Enter manually"

                sheet.append([fiscal_year, vendor_folder.Name, "CR", temp_perm, raised_date, None, subject, sender_email, start_date, end_date, None, None, None, None, "Open"])
                        

        
        elif item.Class == 43 and item.SenderEmailAddress != "/O=EXCHANGELABS/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=9257D7775D104A96B4D8681CB2375415-VRMSUP, TD": # Filter for closure emails that are not sent by GSFMO          
            match = re.search(closure_pattern, item.Body, re.IGNORECASE | re.VERBOSE)
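            if match:
                last_row = sheet.max_row  # Recompute: otherwise last_row/last_col are only set in the "Control Request" branch
                last_col = sheet.max_column
                for row in sheet.iter_rows(min_row=3, max_row=last_row, min_col=1, max_col=last_col):
                    # Match vendor, the tracked subject inside the reply's subject ("Re: <subject>"), and Open status in column 15 (index 14)
                    if row[1].value == vendor_folder.Name and row[6].value and row[6].value in item.Subject and str(row[14].value).strip().lower() == 'open':
                        row[14].value = 'Closed'
                        row[15].value = item.ReceivedTime.strftime('%d-%b-%Y')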



wb.save('GSFMO CR and IR Tracker.xlsx')


In [None]:
# Import required libraries
from openpyxl import load_workbook  # To read/write Excel files
from openpyxl.styles import PatternFill, Font
import pandas as pd  # Not currently used in code but useful for data handling
import win32com.client  # To interact with Outlook via COM
import re  # Regular expressions for pattern matching
from datetime import datetime, timedelta  # Date and time manipulation

# Connect to the Excel tracker workbook and select the specific worksheet
wb = load_workbook('GSFMO CR and IR Tracker for testing.xlsx')
sheet = wb['CR IR Tracker']

# Connect to Outlook application and get the MAPI namespace (email folders)
outlook = win32com.client.Dispatch('Outlook.Application')
namespace = outlook.GetNamespace('MAPI')

# Access the GSFMO mailbox folder, then its Inbox and finally the "Control Request" folder
gsfmo_folder = namespace.Folders('td.gsfmo@td.com')
gsfmo_inbox = gsfmo_folder.Folders('Inbox')
gsfmo_cr = gsfmo_inbox.Folders('Control Request')

# Store the current date and calculate the cutoff date (7 days ago) to filter recent emails
now = datetime.now()
cutoff = now - timedelta(days=7)
cutoff_str = cutoff.strftime('%m/%d/%Y %I:%M%p')  # Format cutoff date string as used by Outlook Restrict method

# Define regex patterns to extract start/end dates and closure dates from email bodies
date_pattern = r"""
    (Start\s*Date|From)              # Match "Start Date" or "From" label
    [\s:–—\-]*                      # Ignore spaces, colons, dashes between label and date
    (?P<start>                      # Capture group for start date
        \d{1,2}[\s\-\/]*[A-Za-z]+[\s\-\/]*\d{2,4}     # e.g. 05-JUN-2025 or 5 JUN 2025
        |
        \d{1,2}[\-/]\d{1,2}[\-/]\d{2,4}               # e.g. 05/06/2025
    )
    .*?                             # Allow any characters (non-greedy) between start and end date
    (End\s*Date|To)                 # Match "End Date" or "To" label
    [\s:–—\-]*                     # Ignore spaces, colons, dashes between label and date
    (?P<end>                       # Capture group for end date
        \d{1,2}[\s\-\/]*[A-Za-z]+[\s\-\/]*\d{2,4}     # e.g. 18-JUN-2025
        |
        \d{1,2}[\-/]\d{1,2}[\-/]\d{2,4}               # e.g. 18/06/2025
    )
"""

closure_pattern = r"""
    (Date\s*of\s*Successful\s*Implementation) # Match closure date label
    [\s:\-–—\u00A0\u200B]*  # Ignore colon, dash, or invisible spaces
    \n*                      # Optional line breaks
    [\s\u00A0\u200B]*        # Ignore whitespace or invisible spaces
    (?P<closure>             # Capture group for closure date
        \d{1,2}(st|nd|rd|th)?  # Day with optional ordinal (e.g. 1st, 2nd)
        [\s\-–—/]*           # Separator(s)
        [A-Za-z]+            # Month name
        [\s\-–—/]*           # Separator(s)
        \d{2,4}              # Year
    )
"""

# List of date formats to try when parsing date strings from emails
date_formats = [
    "%d %B %Y",     # 28 May 2025
    "%d-%b-%Y",     # 28-May-2025
    "%d-%B-%Y",     # 28-June-2025
    "%Y-%m-%d",     # 2025-05-28
    "%m/%d/%Y",     # 05/28/2025 (month first; a day-first date such as 18/06/2025 will not parse)
    "%d %b %Y",     # 28 Jun 2025 (%Y needs a four-digit year)
]

def parse_date(date_str):
    """
    Attempt to parse a date string into a datetime object using multiple possible formats.
    Returns None if no format matches.

    Args:
        date_str (str): The date string extracted from email body.

    Returns:
        datetime or None: Parsed datetime object or None if parsing failed.
    """
    for fmt in date_formats:
        try:
            return datetime.strptime(date_str.strip(), fmt)
        except ValueError:
            continue
    return None
 
def clean_date_str(s):
    """
    Normalize date strings by removing spaces around dashes or slashes.

    Args:
        s (str): Raw date string with inconsistent spacing.

    Returns:
        str: Cleaned date string.
    """
    return re.sub(r'\s*([-\/])\s*', r'\1', s.strip())

# Main processing loop: Iterate over each vendor folder under Control Request
for vendor_folder in gsfmo_cr.Folders:

    # Filter emails received after the cutoff date (7 days ago)
    filtered_items = vendor_folder.Items.Restrict(f"[ReceivedTime] >= '{cutoff_str}'")

    # Process each email item in filtered items
    for item in filtered_items:

        # Check if item is a Mail item (Class 43) and categorized as "Control Request"
        if item.Class == 43 and "Control Request" in item.Categories:
 
            # Extract key email properties for matching
            subject = item.Subject
            raised_date = item.ReceivedTime.strftime('%m%d%Y')  # Format received date as MMDDYYYY string
            sender_email = item.SenderEmailAddress

            match_found = False  # Flag to track if email already exists in Excel

            last_row = sheet.max_row
            last_col = sheet.max_column

            # Loop through Excel rows to check if email record already exists
            for row in sheet.iter_rows(min_row=3, max_row=last_row, min_col=1, max_col=last_col):
                excel_raised_date = str(row[4].value).strip()

                # Remove leading apostrophe if Excel stored the date as a string with apostrophe (e.g. '06062025)
                row_date = excel_raised_date[1:] if excel_raised_date.startswith("'") else excel_raised_date
 
                # Check if Subject, Raised Date, and Sender Email match an existing row
                if subject == row[6].value and raised_date == row_date and sender_email == row[7].value:
                    match_found = True
                    break  # Stop searching once a match is found

            # If no matching record found, append new data to Excel
            if not match_found:

                # Remove timezone info from ReceivedTime for comparison
                nontz_receivedTime = item.ReceivedTime.replace(tzinfo=None)

                # Determine fiscal year based on ReceivedTime
                if nontz_receivedTime >= datetime(2025, 11, 1):
                    fiscal_year = "FY 2026"
                elif nontz_receivedTime >= datetime(2024, 11, 1):
                    fiscal_year = "FY 2025"
                else:
                    fiscal_year = "Unknown"

                # Determine if request is Temporary or Permanent based on keywords in email body
                if "Temporary" in item.Body or "temporary" in item.Body:
                    temp_perm = 'T'
                elif "Permanent" in item.Body or "permanent" in item.Body:
                    temp_perm = 'P'
                else:
                    temp_perm = None
 
                # Extract start and end dates from email body using regex pattern
                match = re.search(date_pattern, item.Body, re.DOTALL | re.IGNORECASE | re.VERBOSE)
                if match:
                    start_raw = match.group('start')
                    end_raw = match.group('end')

                    # Parse and format start date; default to "Enter manually" if parsing fails
                    start_dt = parse_date(clean_date_str(start_raw))
                    if start_dt is not None:
                        start_date = start_dt
                    else:
                        start_date = "Enter manually"

                    # Parse and format end date; default to "Enter manually" if parsing fails
                    end_dt = parse_date(clean_date_str(end_raw))
                    if end_dt is not None:
                        end_date = end_dt
                    else:
                        end_date = "Enter manually"
                else:
                    # If no dates found by regex, require manual entry
                    start_date = "Enter manually"
                    end_date = "Enter manually"

                # Append a new row to the Excel sheet with extracted and default data
                sheet.append([
                    fiscal_year,          # Fiscal Year (FY 2025 or FY 2026)
                    vendor_folder.Name,   # Vendor folder name
                    "CR",                 # Fixed value "CR" to indicate Control Request
                    temp_perm,            # Temporary or Permanent flag
                    raised_date,          # Raised date in MMDDYYYY format
                    None,                 # Placeholder for a field (empty)
                    subject,              # Email subject
                    sender_email,         # Sender's email address
                    start_date,           # Extracted or manual start date
                    end_date,             # Extracted or manual end date
                    None, None, None, None,  # Placeholders for additional fields (empty)
                    "Open"                # Status set to "Open"
                ])

                last_row_idx = sheet.max_row           

                if isinstance(start_date, datetime):
                    sheet.cell(row=last_row_idx, column=9).number_format = 'dd-mmm-yyyy'            

                if isinstance(end_date, datetime):
                    sheet.cell(row=last_row_idx, column=10).number_format = 'dd-mmm-yyyy'              
 
        # Additional check for closure emails NOT sent by a specific system account
        elif item.Class == 43 and item.SenderEmailAddress != "/O=EXCHANGELABS/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=9257D7775D104A96B4D8681CB2375415-VRMSUP, TD":

            # Search for closure date in email body
            match = re.search(closure_pattern, item.Body, re.IGNORECASE | re.VERBOSE)
            if match:
                last_row = sheet.max_row  # Update last_row in case rows changed
                last_col = sheet.max_column

                # Loop through Excel rows to find matching vendor, subject, and open status
                for row in sheet.iter_rows(min_row=3, max_row=last_row, min_col=1, max_col=last_col):

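                    # Check the vendor, that the tracked subject appears in the reply's subject (e.g. "RE: <subject>"),
                    # and that the status is "open"; the truthiness check skips empty rows, where `None in str` raises TypeError
                    if (row[1].value == vendor_folder.Name and
                        row[6].value and row[6].value in item.Subject and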
                        str(row[14].value).strip().lower() == 'open'):

                        # Mark the status as Closed and update closure date
                        row[14].value = 'Closed'
                        row[14].fill = PatternFill(fill_type = 'solid', fgColor = 'FF0000')
                        row[15].value = item.ReceivedTime.strftime('%d-%b-%Y')

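# Save changes back to the workbook loaded above (saving to 'GSFMO CR and IR Tracker.xlsx' would overwrite the production tracker with this test copy)
wb.save('GSFMO CR and IR Tracker for testing.xlsx')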