# RENT AUTOMATION BOT (PROOF OF CONCEPT, PROTOTYPE, DEPLOYMENT)

***

<img src= 'images\rent bot.jpg' width='200'>


# Problem Statement and Project Objectives

***

## Problem Statement
The primary challenge addressed in this notebook is the automation of rent payment tracking for Lemaiyan Heights. Traditional rent collection and record-keeping can be error-prone and time-consuming, especially when dealing with multiple payment sources and tenants. Key challenges include:
- Manual reconciliation of payments.
- Risk of duplicate entries due to repetitive processing of the same payment notifications.
- Difficulty in maintaining up-to-date financial records and tenant-specific payment histories.

## Project Objectives
The notebook aims to achieve the following objectives:
- **Automate Payment Parsing:** Use regular expressions to extract payment details from email notifications reliably.
- **Ensure Data Integrity:** Avoid double-processing of payments by tracking unique payment references.
- **Dynamic Sheet Management:** Automatically update or create individual tenant sheets based on the received payment details.
- **Seamless Integration:** Connect with the Google platform (Gmail and Google Sheets) to streamline data extraction and real-time updates.
- **User-Friendly Dashboard:** Provide a streamlined overview through a Streamlit app, allowing users to initiate the payment bot and view processing logs and summaries.

This solution demonstrates a proof of concept that leverages Python, Pandas, openpyxl, gspread, and Google APIs to enhance the rent collection process, optimize administrative workflows, and reduce the possibility of human error.
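
The data-integrity objective comes down to keeping a set of normalised payment references and skipping anything already in it. The block below is a minimal sketch of that idea, not the notebook's implementation: `filter_new_payments` is a hypothetical helper, the amounts are illustrative, and the two reference codes are borrowed from the sample output further down.

```python
def filter_new_payments(payments, processed_refs):
    """Return payments whose reference is unseen, adding each new ref to the set."""
    fresh = []
    for payment in payments:
        # Normalise so that 'abc123 ' and 'ABC123' count as the same reference
        ref = payment["Ref"].strip().upper()
        if ref in processed_refs:
            continue
        processed_refs.add(ref)
        fresh.append(payment)
    return fresh


seen = {"R80VIOA2YW"}  # refs logged on a previous run
batch = [
    {"Ref": "r80vioa2yw", "Amount": 19000.0},   # already processed
    {"Ref": "WURDRH7ZBU", "Amount": 12000.0},   # new
    {"Ref": "WURDRH7ZBU ", "Amount": 12000.0},  # same notification delivered twice
]
new_payments = filter_new_payments(batch, seen)
print(len(new_payments), sorted(seen))
```

Adding each reference to the set as soon as it is accepted also protects against the same notification appearing twice within a single batch.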

***


## PROOF OF CONCEPT

* Randomly generated dataset to simulate results

In [1]:
import random
from faker import Faker
from pathlib import Path
import string

fake = Faker()

# Define possible account codes (A1-A6, B1-B6, ..., G1-G6)
accounts = [f"{l}{n}" for l in "ABCDEFG" for n in range(1, 7)]

email_template = ("Dear Customer, your payment of KES {amount} for account: PAYLEMAIYAN #{code} "
                  "has been received from {name} {phone} on {date_time}. "
                  "M-Pesa Ref: {mpesa_ref}")
def random_ref_code(length=10):
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))
dummy_emails = []
for _ in range(200):
    amount = f"{random.randint(5, 20) * 1000}.00"
    code = random.choice(accounts)
    name = fake.name()
    phone = f"{random.randint(700, 799)}****{random.randint(100,999)}"
    date_time = fake.date_time_this_year().strftime('%d/%m/%Y %I:%M %p')
    mpesa_ref = random_ref_code(10)
    email_text = email_template.format(
        amount=amount,
        code=code,
        name=name,
        phone=phone,
        date_time=date_time,
        mpesa_ref=mpesa_ref
    )
    dummy_emails.append(email_text)

# Save to data/dummy_emails_200.txt
out_path = Path("data/dummy_emails_200.txt")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text("\n\n".join(dummy_emails))

print(f"Created {len(dummy_emails)} dummy emails in {out_path}")
print("Sample email:\n", dummy_emails[0])

Created 200 dummy emails in data\dummy_emails_200.txt
Sample email:
 Dear Customer, your payment of KES 19000.00 for account: PAYLEMAIYAN #A3 has been received from Kathleen Jones 759****602 on 04/02/2025 09:06 AM. M-Pesa Ref: R80VIOA2YW


* Logic engine

In [2]:
# --------------------------------------------------------
# Reads MPESA email notifications from dummy_emails_200.txt,
# parses payment info, updates dummy_rent_tracker.xlsx,
# and avoids double-logging by checking the ProcessedRefs sheet.

import pandas as pd
import re
from pathlib import Path
from openpyxl import load_workbook
import warnings
warnings.filterwarnings('ignore')

# --- CONFIGURATION ---
DATA_DIR = Path('data')
EMAIL_FILE = DATA_DIR / 'dummy_emails_200.txt'
SPREADSHEET_FILE = DATA_DIR / 'dummy_rent_tracker.xlsx'

# --- 1. Load Dummy Emails ---
with open(EMAIL_FILE, 'r') as f:
    email_texts = f.read().split('\n\n')

print(f"Loaded {len(email_texts)} emails.")

Loaded 200 emails.


In [3]:
email_texts[22]

'Dear Customer, your payment of KES 20000.00 for account: PAYLEMAIYAN #G4 has been received from Blake Lee 719****900 on 08/05/2025 08:08 AM. M-Pesa Ref: 1NRVADULXZ'

In [4]:
# --- 2. Load Workbook and All Sheet Names ---
wb = load_workbook(SPREADSHEET_FILE)
sheet_names = wb.sheetnames

# --- 3. Load ProcessedRefs (deduplication) ---
try:
    processed_refs_df = pd.read_excel(SPREADSHEET_FILE, sheet_name='ProcessedRefs')
    processed_refs = set(str(ref).strip().upper() for ref in processed_refs_df['Ref'] if pd.notna(ref))
except Exception:
    processed_refs = set()
    print("ProcessedRefs sheet is empty or missing. Will create it.")

print(f"Found {len(processed_refs)} previously processed refs.")

Found 1250 previously processed refs.


In [5]:
# --- 4. Regex Parser Function ---
def extract_payment_info(email_body):
    pattern = (
        r'payment of KES ([\d,]+\.\d{2}) '
        r'for account: PAYLEMAIYAN\s*#?\s*([A-Za-z]\d{1,2})'
        r' has been received from (.+?) '
        r'(\S{1,13}) '  # masked phone has no spaces, so it cannot absorb a short surname
        r'on (\d{2}/\d{2}/\d{4} \d{1,2}:\d{2} [AP]M)\. '
        r'M-Pesa Ref: (\w+)'
    )
    match = re.search(pattern, email_body, flags=re.IGNORECASE)
    if match:
        return {
            'Amount': float(match.group(1).replace(',', '').strip()),
            'AccountCode': match.group(2).strip().upper(),
            'Payer': match.group(3).strip(),
            'Phone': match.group(4).strip(),
            'Date': match.group(5).strip(),
            'Ref': match.group(6).strip().upper(),
        }
    return None
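
One edge case worth checking with a pattern of this shape is the boundary between the payer name and the phone number. If the phone group accepts any character (`.{1,13}`), a lazy name group can stop early and let a short surname slide into the phone field. The sketch below compares that with a non-whitespace phone group (`\S{1,13}`). The payer "Bo Li" is a made-up example, and both patterns are cut-down fragments for illustration only.

```python
import re

# Phone group accepts anything, including spaces
LOOSE = re.compile(r"received from (.+?) (.{1,13}) on ")
# Phone group accepts only non-whitespace characters
STRICT = re.compile(r"received from (.+?) (\S{1,13}) on ")

text = "has been received from Bo Li 719****900 on 08/05/2025 08:08 AM."

loose_payer, loose_phone = LOOSE.search(text).groups()
strict_payer, strict_phone = STRICT.search(text).groups()

print(repr(loose_payer), repr(loose_phone))
print(repr(strict_payer), repr(strict_phone))
```

With the loose group the surname lands in the phone field because "Li 719****900" is exactly 13 characters; the strict group keeps the full name together.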

In [6]:
# --- 5. Process Emails, Update or Create Sheets ---


updates_log = []
new_refs = []
updates_per_sheet = {}

# Open the workbook in append mode; new sheets are added through this writer
writer = pd.ExcelWriter(SPREADSHEET_FILE, engine='openpyxl', mode='a', if_sheet_exists='overlay')

# Loading a Master payments file
try:
    payment_history_df = pd.read_excel(SPREADSHEET_FILE, sheet_name='PaymentHistory')
except Exception:
    payment_history_df = pd.DataFrame(columns=[
        'Date', 'Amount', 'Ref', 'Payer', 'Phone', 'Payment Mode', 'AccountCode', 'TenantSheet'
    ])


for email in email_texts:
    payment_data = extract_payment_info(email)
    if not payment_data:
        updates_log.append("Skipped email: Could not parse payment info.")
        continue

    ref = payment_data['Ref'].upper().strip()
    if ref in processed_refs:
        updates_log.append(f"Duplicate ignored (Ref {ref})")
        continue

    account_code = payment_data['AccountCode']
    payer_name = payment_data['Payer'].replace(" ", "_")[:15]
    # --- 6. Try to match an existing tenant sheet ---
    target_sheet = None
    for s in sheet_names:
        # Take just the code part from the sheet name
        sheet_token = s.split()[0].replace('-', '').upper().strip()
        if account_code == sheet_token and 'PROCESSEDREFS' not in s.upper() and 'PAYMENTHISTORY' not in s.upper():
            target_sheet = s
            break

    # --- 7. If no sheet found, CREATE it ---
    if target_sheet is None:
        target_sheet = f"{account_code} - {payer_name if payer_name else 'AutoAdded'}"
        print(f"Creating new sheet: {target_sheet} for new tenant {account_code}")
        new_tenant_df = pd.DataFrame(columns=[
            'Date', 'Amount', 'Ref', 'Payer', 'Phone', 'Payment Mode'
        ])
        new_tenant_df.to_excel(writer, sheet_name=target_sheet, index=False)
        updates_log.append(f"Created new sheet: {target_sheet}")
        sheet_names.append(target_sheet)  # So we don't create it twice

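    # --- 8. Append payment to tenant sheet ---
    # Read the sheet from the writer's in-memory workbook rather than from disk:
    # nothing is saved until writer.close(), so re-reading the file would drop
    # rows appended earlier in this run.
    rows = list(writer.book[target_sheet].values)
    if rows and any(c is not None for c in rows[0]):
        df = pd.DataFrame(rows[1:], columns=rows[0])
    else:
        df = pd.DataFrame(columns=['Date', 'Amount', 'Ref', 'Payer', 'Phone', 'Payment Mode'])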

    new_row = pd.DataFrame({
        'Date': [payment_data['Date']],
        'Amount': [payment_data['Amount']],
        'Ref': [payment_data['Ref']],
        'Payer': [payment_data['Payer']],
        'Phone': [payment_data['Phone']],
        'Payment Mode': ['MPESA Payment'],
    })
    df = pd.concat([df, new_row], ignore_index=True)
    df.to_excel(writer, sheet_name=target_sheet, index=False)
    updates_log.append(f"Logged payment for {account_code} - Ref {ref}")
    new_refs.append(ref)
    updates_per_sheet.setdefault(target_sheet, 0)
    updates_per_sheet[target_sheet] += 1

    # --- 9. Add to PaymentHistory sheet ---
    new_hist_row = new_row.copy()
    new_hist_row['AccountCode'] = account_code
    new_hist_row['TenantSheet'] = target_sheet
    payment_history_df = pd.concat([payment_history_df, new_hist_row], ignore_index=True)

In [7]:
# --- 10. Save PaymentHistory sheet
payment_history_df.to_excel(writer, sheet_name='PaymentHistory', index=False)

# --- 11. Update ProcessedRefs sheet
try:
    refs_df = pd.read_excel(SPREADSHEET_FILE, sheet_name='ProcessedRefs')
except Exception:
    refs_df = pd.DataFrame({'Ref': []})
if new_refs:
    new_refs_df = pd.DataFrame({'Ref': new_refs})
    updated_refs = pd.concat([refs_df, new_refs_df], ignore_index=True)
    updated_refs.to_excel(writer, sheet_name='ProcessedRefs', index=False)
    updates_log.append(f"ProcessedRefs updated with {len(new_refs)} new refs.")

writer.close()

print("\n--- Processing Summary ---")
for log in updates_log:
    print(log)
print("\nUpdates per tenant sheet:")
for k, v in updates_per_sheet.items():
    print(f"{k}: {v} payments appended")


--- Processing Summary ---
Logged payment for A3 - Ref R80VIOA2YW
Logged payment for C4 - Ref WURDRH7ZBU
Logged payment for D2 - Ref LTW0VCTS5S
Logged payment for A6 - Ref WP2X99YUDD
Logged payment for G4 - Ref MUH400LVXG
Logged payment for A3 - Ref 3RB37S9E7O
Logged payment for A3 - Ref LRVN72GBQD
Logged payment for F6 - Ref FRGEE8A03G
Logged payment for F1 - Ref SXFW54V1I9
Logged payment for G3 - Ref IY79MGDB1U
Logged payment for F2 - Ref 8AEOYFJRQ8
Logged payment for C3 - Ref 2CT6AX00NG
Logged payment for E1 - Ref JEFNXKQ14B
Logged payment for D4 - Ref 1OKUUKCM8U
Logged payment for C2 - Ref 6KZBWB5WB5
Logged payment for G2 - Ref RRYATOC3YD
Logged payment for G2 - Ref CBIHOA0YP1
Logged payment for A1 - Ref V5W74BU5YC
Logged payment for B2 - Ref L2YI1I2XWG
Logged payment for E5 - Ref AKPXDAOR4G
Logged payment for G6 - Ref X1W9MGLQ64
Logged payment for B2 - Ref 5V2F3WF2HQ
Logged payment for G4 - Ref 1NRVADULXZ
Logged payment for A3 - Ref X5DK2WVQC2
Logged payment for F1 - Ref 9EW3MP7T

## PROTOTYPE

* Integrating the proof of concept with the Google platform.
* The first step is to create a dummy Gmail account and populate it with dummy emails like those above.

### DUMMY EMAIL GENERATION
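
The Gmail API takes an outgoing message as a base64url-encoded RFC 2822 string in a `raw` field. The sketch below builds such a payload and decodes it again, with no network call, to show what the sending step passes along. `to_raw` is a hypothetical helper, and the addresses and body text are placeholders.

```python
import base64
from email import message_from_bytes
from email.mime.text import MIMEText


def to_raw(text, sender, recipient, subject):
    """Build the {'raw': ...} body expected by users().messages().send()."""
    msg = MIMEText(text)
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode()}


payload = to_raw(
    "Your M-Pesa payment of KES 10,000.00 for account: PAYLEMAIYAN #a3 has been received.",
    sender="Sender <sender@example.com>",   # placeholder address
    recipient="sandbox@example.com",        # placeholder address
    subject="NCBA TRANSACTIONS STATUS UPDATE",
)

# Round trip: decode the payload back into a message object
decoded = message_from_bytes(base64.urlsafe_b64decode(payload["raw"]))
body_text = decoded.get_payload(decode=True).decode().strip()
print(decoded["Subject"])
print(body_text)
```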

In [1]:
# Loading dependencies for sending email notifications
import base64, random, string, time, datetime
from faker import Faker
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from email.mime.text import MIMEText


In [5]:
# ---------- SEND 40 TEST EMAILS INTO SANDBOX GMAIL ----------

fake = Faker()
SCOPES = [
    'https://www.googleapis.com/auth/gmail.send',     # to inject test mail
    'https://www.googleapis.com/auth/gmail.readonly',
    'https://www.googleapis.com/auth/gmail.modify' # to call getProfile
]
flow   = InstalledAppFlow.from_client_secrets_file('bot_secret.json', SCOPES)
creds  = flow.run_local_server(port=0)
gmail  = build('gmail', 'v1', credentials=creds)
user_email = gmail.users().getProfile(userId='me').execute()['emailAddress']

accounts = [f"{l}{n}" for l in "ABCDEFG" for n in range(1,7)]
def rand_ref(): return ''.join(random.choices(string.ascii_uppercase+string.digits, k=10))

def make_msg(text):
    m = MIMEText(text)
    m['From'] = 'NCB <ncbcustomer@ncbgroup.com>'
    m['To']   = user_email
    m['Subject'] = 'NCBA TRANSACTIONS STATUS UPDATE'
    return {'raw': base64.urlsafe_b64encode(m.as_bytes()).decode()}

for _ in range(40):
    code  = random.choice(accounts)
    code_fragment = f"#{code.lower()}" if random.random()>.4 else code   # hash optional
    amt   = f"{random.randint(5,20)*1000:,}.00"
    name  = fake.name().title()
    phone = f"0{random.randint(100,999)}***{random.randint(100,999)}"
    dt    = fake.date_time_this_year().strftime('%d/%m/%Y %I:%M %p')
    ref   = rand_ref()
    body  = (f"Your M-Pesa payment of KES {amt} for account: PAYLEMAIYAN {code_fragment} "
             f"has been received from {name} {phone} on {dt}. M-Pesa Ref: {ref}. NCBA, Go for it.")
    gmail.users().messages().send(userId='me', body=make_msg(body)).execute()
    time.sleep(0.3) # Adjusted sleep time to avoid rate limits

print("✅ 40 dummy messages delivered to", user_email)

Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=899105285450-50tdk35cnnrrich3nlr0d80kdp2qeovr.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A53635%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.send+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&state=Dt8GnubiuGBZcnEm2f3Exs4ILuX5AJ&access_type=offline
✅ 40 dummy messages delivered to dmmccntdev@gmail.com


### GOOGLE SHEET GENERATION

In [2]:
# ---------- LOAD GSPREAD LIBRARIES FOR GOOGLE SHEETS ----------
import time  # used to throttle writes in the migration cell
import pandas as pd, gspread, openpyxl
from google.oauth2.service_account import Credentials

In [5]:
# ---------- ONE‑TIME MIGRATION EXCEL → GOOGLE SHEETS ----------


SRC_EXCEL = 'data/2025 RENT TRACKING - Lemaiyan Heights.xlsx'  # original data file
DEST_SHEET = 'RENT TRACKING-Lemaiyan Heights' # New file in google sheets

creds = Credentials.from_service_account_file('bot_service.json',
    scopes=['https://www.googleapis.com/auth/spreadsheets',
            'https://www.googleapis.com/auth/drive'])
gc = gspread.authorize(creds)
sh = gc.open(DEST_SHEET)

wb = openpyxl.load_workbook(SRC_EXCEL, data_only=True)
for ws in wb.worksheets:
    title = ws.title[:99]  # Sheets title limit
    if title in [s.title for s in sh.worksheets()]:
        sheet = sh.worksheet(title)
    else:
        sheet = sh.add_worksheet(title, rows=2000, cols=10)

    data = [[str(cell) if cell is not None else '' for cell in row] for row in ws.iter_rows(values_only=True)]
    
    # Populate the worksheet
    sheet.update(values=data, range_name='A1', value_input_option='USER_ENTERED')
    time.sleep(2)  # Wait 2 seconds per write

    # bold the first 7 rows and freeze the header row
    sheet.format('1:7', {'textFormat': {'bold': True}})
    sheet.freeze(rows=1)

print("Bootstrap complete – Google Sheet mirrors the Excel file.")


Bootstrap complete – Google Sheet mirrors the Excel file.
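
The migration turns every Excel cell into a string before uploading it. The sketch below isolates that step on plain tuples, which is the shape `iter_rows(values_only=True)` yields, so the effect on numbers, dates and empty cells is easy to inspect. `rows_to_grid` is a hypothetical name and the sample rows are invented.

```python
from datetime import datetime


def rows_to_grid(rows):
    """Convert row tuples into lists of strings, mapping None to an empty string."""
    return [["" if cell is None else str(cell) for cell in row] for row in rows]


sample_rows = [
    ("Month", "Amount Due", "Date due"),
    ("January-2025", 15000, datetime(2025, 1, 5)),
    ("February-2025", None, None),
]
grid = rows_to_grid(sample_rows)
for line in grid:
    print(line)
```

Note that a `datetime` cell becomes `'2025-01-05 00:00:00'`. If the sheet is expected to hold `dd/mm/YYYY` text, formatting dates explicitly before upload may be safer than relying on `str()`.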


### BOT-SERVICE
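
Before any payment is written, the bot has to work out which column is which on each tenant sheet, because header labels vary. The block below is a reduced sketch of that normalise-then-match step. It uses a three-key alias table rather than the full one, and the names `norm` and `header_map` are illustrative stand-ins for the helpers in the cell that follows.

```python
import re

# Strip everything that is not a word character, whitespace or "/"
_PUNCT = re.compile(r"[^\w\s/]+")


def norm(label):
    """Lower-case, trim, drop punctuation and collapse runs of spaces."""
    s = str(label or "").replace("\xa0", " ").strip().lower()
    s = _PUNCT.sub("", s)
    return re.sub(r"\s+", " ", s)


# Aliases must be stored in their normalised form, or they can never match
ALIASES = {
    "amount_due": {"amount due", "rent due", "rent kes"},
    "amount_paid": {"amount paid", "paid kes"},
    "ref": {"ref number", "mpesa ref"},
}


def header_map(row):
    """Map each logical key to the index of the first matching header cell."""
    tokens = [norm(cell) for cell in row]
    colmap = {}
    for key, names in ALIASES.items():
        for i, token in enumerate(tokens):
            if token in names:
                colmap[key] = i
                break
    return colmap


row = ["Month", "Rent (KES)", "Amount Paid:", "M-Pesa Ref"]
print(header_map(row))
```

Because brackets, colons and hyphens are removed during normalisation, a header such as "Rent (KES)" is looked up as "rent kes".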

In [None]:
# ================================
#  RENT RPA — PROTOTYPE
#  Gmail -> Google Sheets
# ================================

import re, base64, time
import pandas as pd
from datetime import datetime, timedelta
from email.mime.text import MIMEText
from IPython.display import display

# --- Google APIs ---
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google.oauth2.service_account import Credentials
import gspread
from gspread.utils import rowcol_to_a1
from dateutil.relativedelta import relativedelta

# ---------------- CONFIG ----------------
CLIENT_SECRET = 'bot_secret.json'        # Gmail OAuth Desktop credentials
SERVICE_KEY   = 'bot_service.json'      # Sheets service account (shared on the target Sheet)
SHEET_NAME    = 'RENT TRACKING-Lemaiyan Heights'  # exact Google Sheet NAME
GMAIL_QUERY   = 'PAYLEMAIYAN subject:"NCBA TRANSACTIONS STATUS UPDATE" newer_than:365d'  # tweak as needed

# This prototype uses a unified event schema for consistency:
PAYMENT_COLS  = ['Date Paid','Amount Paid','REF Number','Payer','Phone','Payment Mode']
MAX_PHONE_LEN = 13
REF_LEN       = 10


# ----- AUTH -----
gmail_flow = InstalledAppFlow.from_client_secrets_file(
    CLIENT_SECRET,
    scopes=[
        'https://www.googleapis.com/auth/gmail.modify',  # read + mark read
        'https://www.googleapis.com/auth/gmail.readonly',
        'https://www.googleapis.com/auth/gmail.send'
    ]
)
gmail_creds = gmail_flow.run_local_server(port=0)
gmail = build('gmail', 'v1', credentials=gmail_creds)

sheets_creds = Credentials.from_service_account_file(
    SERVICE_KEY,
    scopes=['https://www.googleapis.com/auth/spreadsheets',
            'https://www.googleapis.com/auth/drive'])
gc = gspread.authorize(sheets_creds)
sh = gc.open(SHEET_NAME)

# ----- PARSER (flexible account code; 10-char ref) -----
PATTERN = re.compile(
    rf'payment of KES ([\d,]+\.\d{{2}}) '
    rf'for account: PAYLEMAIYAN\s*#?\s*([A-Za-z]\d{{1,2}})'
    rf' has been received from (.+?) '
    rf'(\S{{1,{MAX_PHONE_LEN}}}) '
    rf'on (\d{{2}}/\d{{2}}/\d{{4}} \d{{1,2}}:\d{{2}} [AP]M)\. '
    rf'M-Pesa Ref: ([A-Z0-9]{{{REF_LEN}}})',
    flags=re.IGNORECASE
)

def parse_email(text: str):
    m = PATTERN.search(text or "")
    if not m:
        return None
    amt, code, payer, phone, dt, ref = m.groups()
    return {
        'Date Paid':   dt.strip(),                      # dd/mm/YYYY hh:mm AM/PM
        'Amount Paid': float(amt.replace(',', '')),
        'REF Number':  ref.upper(),
        'Payer':       payer.strip(),
        'Phone':       phone.strip(),
        'Payment Mode':'MPESA Payment',
        'AccountCode': code.upper(),                    # used for routing to the tenant sheet
    }

# ----- GMAIL: get message text (best-effort) -----
def get_message_text(service, msg_id):
    msg = service.users().messages().get(userId="me", id=msg_id, format="full").execute()
    payload = msg.get("payload", {})
    body_texts = []

    def walk(part):
        mime = part.get("mimeType", "")
        data = part.get("body", {}).get("data")
        parts = part.get("parts", [])
        if mime == "text/plain" and data:
            body_texts.append(base64.urlsafe_b64decode(data).decode("utf-8", errors="ignore"))
        for p in parts:
            walk(p)

    walk(payload)
    if body_texts:
        return "\n".join(body_texts)
    return msg.get("snippet", "")

# Normalizer: lower, trim, collapse spaces, strip punctuation/currency/nbsp
_PUNCT = re.compile(r"[^\w\s/]+", re.UNICODE)
def _norm(s):
    if s is None: return ""
    s = str(s).replace("\xa0", " ")  # nbsp -> space
    s = s.strip().lower()
    s = _PUNCT.sub("", s)            # remove punctuation such as ( ) : - but keep "/"
    s = re.sub(r"\s+", " ", s)
    return s

# Broad alias sets (normalized)
ALIASES = {
    "month": {
        "month","month/period","period","rent month","billing month"
    },
    "amount_due": {
        "amount due","rent due","due","amountdue","monthly rent","rent","amount due kes","rent kes"
    },
    "amount_paid": {
        "amount paid","paid","amt paid","paid kes","amountpaid"
    },
    "date_paid": {
        "date paid","paid date","payment date","datepaid"
    },
    "ref": {
        "ref number","ref","reference","ref no","reference no","mpesa ref","mpesa reference","receipt","receipt no"
    },
    "date_due": {
        "date due","due date","rent due date","datedue"
    },
    "prepay_arrears": {
        "prepayment/arrears","prepayment","arrears","balance","bal","prepayment arrears","carry forward","cf"
    },
    "penalties": {
        "penalties","penalty","late fee","late fees","fine","fines"
    },
}

REQUIRED_KEYS = ["month","amount_due","amount_paid","date_paid","ref","date_due","prepay_arrears","penalties"]

# --- Helper to score header row ---
def _score_header(row_norm):
    """How many required columns does this row satisfy?"""
    hits = 0
    for key in REQUIRED_KEYS:
        if any(a in row_norm for a in ALIASES[key]):
            hits += 1
    return hits

# --- Helper to map row tokens to column keys ---
def _header_map_from_row(row):
    """Return (colmap) by matching normalized row tokens against aliases."""
    row_norm = [_norm(c) for c in row]
    colmap = {}
    for key, aliases in ALIASES.items():
        for i, token in enumerate(row_norm):
            if token in aliases:
                colmap[key] = i
                break
    return colmap


# --- Helper to detect or create a header row ---
def _detect_or_create_header(ws):
    """
    Find a header row in the first 10 rows.
    If none reaches a threshold (>=4 matches), insert a standard header at row 1.
    Returns (header_row_idx_0based, header_list, colmap).
    """
    all_data = ws.get_all_values()
    max_rows = len(all_data)                 # 0 for a completely empty sheet
    probe_rows = min(max(max_rows, 1), 10)   # always probe at least one row
    last_col = ws.col_count or 12
    rn = f"A1:{rowcol_to_a1(probe_rows, last_col)}"
    values = ws.get_values(rn)  # rectangular cut

    best_idx, best_hits, best_map = None, -1, None
    for idx, row in enumerate(values):
        colmap = _header_map_from_row(row)
        hits = len(colmap)
        if hits > best_hits:
            best_idx, best_hits, best_map = idx, hits, colmap

    if best_hits >= 4:
        header = ws.row_values(best_idx+1)
        missing_keys = [k for k in REQUIRED_KEYS if k not in best_map]
        if missing_keys:
            standard_columns = {
                "month": "Month",
                "amount_due": "Amount Due",
                "amount_paid": "Amount paid",
                "date_paid": "Date paid",
                "ref": "REF Number",
                "date_due": "Date due",
                "prepay_arrears": "Prepayment/Arrears",
                "penalties": "Penalties"
            }
            for key in missing_keys:
                header.append(standard_columns[key])
            ws.update(values=[header], range_name=f"{best_idx+1}:{best_idx+1}", value_input_option="USER_ENTERED")
            best_map = _header_map_from_row(header)
        return best_idx, header, best_map
    # No good header found: create standard header on row 1
    header = ['Month','Amount Due','Amount paid','Date paid','REF Number','Date due','Prepayment/Arrears','Penalties']
    if max_rows == 0:
        ws.update(values=[header], range_name="1:1", value_input_option="USER_ENTERED")
    else:
        ws.insert_row(header, index=1, value_input_option="USER_ENTERED")
    return 0, header, _header_map_from_row(header)


# --- Helper to convert date string to month key ---
def _month_key_from_date_str(date_str):
    dt = datetime.strptime(date_str, '%d/%m/%Y %I:%M %p')
    return dt.strftime('%B-%Y'), dt   # e.g., January-2025

# --- Helper to find the month row in values ---
def _find_month_row(values, month_col_idx, month_key):
    for r in range(1, len(values)):  # skip header at 0
        cell = str(values[r][month_col_idx]).strip()
        if not cell:
            continue
        # accept "Jan-2025"/"JAN 2025"/"January 2025"
        if cell.lower().startswith(month_key.lower()[:3]) and month_key[-4:] in cell:
            return r
    return None

# --- Helper to convert row/col to letter(s) ---
def _col_letter(row, col):
    """Return column letter(s) for a given 1-based row/col using A1 conversion."""
    return re.sub(r'\d+', '', rowcol_to_a1(row, col))


# --- Main function to update tenant month row ---
def update_tenant_month_row(tenant_ws, payment):
    """
    Realtime version:
      - Writes ONLY: Amount paid, Date paid, REF Number
      - Sets once-per-row formulas for:
          Prepayment/Arrears = N(previous balance) + N(Amount paid) - N(Amount Due) - N(Penalties)
          Penalties          = IF(DATEVALUE(LEFT(DatePaid,10)) > DATEVALUE(DateDue)+2, 3000, 0)
    """

    # --- detect/insert header (uses _detect_or_create_header defined above) ---
    header_row0, header, colmap = _detect_or_create_header(tenant_ws)
    missing = [k for k in REQUIRED_KEYS if k not in colmap]
    if missing:
        raise ValueError(f"Sheet '{tenant_ws.title}' missing required columns after normalization: {missing}")

    # Reload values from header row downward
    all_vals = tenant_ws.get_all_values()
    vals = all_vals[header_row0:]
    base_row_1based = header_row0 + 1

    # --- find or create the month row ---
    month_key, pay_dt = _month_key_from_date_str(payment['Date Paid'])
    row_rel = _find_month_row(vals, colmap['month'], month_key)
    if row_rel is None:
        new_row = [''] * len(header)
        new_row[colmap['month']] = month_key
        new_row[colmap['amount_due']] = '0'
        new_row[colmap['amount_paid']] = '0'
        new_row[colmap['date_paid']] = ''
        new_row[colmap['ref']] = ''
        # Set Date due to the previous row's Date due (pinned to the 5th) plus one month.
        # Fall back to the payment date plus one month if there is no previous
        # Date due or it cannot be parsed.
        new_date_due = pay_dt + relativedelta(months=1)
        if len(vals) > 1 and vals[-1][colmap['date_due']]:
            try:
                last_date_due = datetime.strptime(vals[-1][colmap['date_due']], "%d/%m/%Y").replace(day=5)
                new_date_due = last_date_due + relativedelta(months=1)
            except ValueError:
                pass
        # Always write Date due, whichever branch produced it
        new_row[colmap['date_due']] = new_date_due.strftime("%d/%m/%Y")

        # prepay/arrears and penalties will be set as FORMULAS after append
        tenant_ws.append_row(new_row, value_input_option='USER_ENTERED')
        all_vals = tenant_ws.get_all_values()
        vals = all_vals[header_row0:]
        row_rel = len(vals) - 1

    row_abs_1based = base_row_1based + row_rel
    row = vals[row_rel]

    # --- helpers to coerce numbers/strings ---
    def _num(v):
        try:
            s = str(v).replace(',','').strip()
            return float(s) if s else 0.0
        except ValueError:
            return 0.0
    def _str(v):
        return '' if v is None else str(v)

    # current row values
    due0   = _num(row[colmap['amount_due']])
    paid0  = _num(row[colmap['amount_paid']])
    ref0   = _str(row[colmap['ref']])

    pay_amt = float(payment['Amount Paid'])

    # (if you previously tracked arrears carryover in this cell, you can ignore that here
    #  because the balance is now a live formula: Paid - Due)
    paid1 = paid0 + pay_amt

    # --- 1) write the three direct fields ---
    updates = {
        colmap['amount_paid']:  paid1,
        colmap['date_paid']:    payment['Date Paid'],
        colmap['ref']:          (payment['REF Number'] if not ref0 else f"{ref0}, {payment['REF Number']}")
    }

    # Write each touched cell on its own, in one batch call, so that any column
    # lying between them is not blanked out.
    batch = [
        {'range': rowcol_to_a1(row_abs_1based, cidx + 1), 'values': [[str(val)]]}
        for cidx, val in sorted(updates.items())
    ]

    for attempt in range(5):
        try:
            tenant_ws.batch_update(batch, value_input_option='USER_ENTERED')
            break
        except gspread.exceptions.APIError as e:
            # gspread raises APIError rather than googleapiclient's HttpError
            if e.response.status_code == 429:
                time.sleep(5 * (attempt+1))
                continue
            raise

    # --- 2) ensure the formula cells are present (set once; they’ll recalc automatically) ---
    col_letters = {k: _col_letter(row_abs_1based, colmap[k] + 1) for k in colmap}
    # addresses for this row:
    amt_paid_addr = f"{col_letters['amount_paid']}{row_abs_1based}"
    amt_due_addr  = f"{col_letters['amount_due']}{row_abs_1based}"
    date_paid_addr= f"{col_letters['date_paid']}{row_abs_1based}"
    date_due_addr = f"{col_letters['date_due']}{row_abs_1based}"
    bal_addr      = f"{col_letters['prepay_arrears']}{row_abs_1based}"
    pen_addr      = f"{col_letters['penalties']}{row_abs_1based}"


    # Penalties formula: if DatePaid > DateDue + 2 days, penalty = 3000
    pen_formula = f"=IF(DATEVALUE(LEFT({date_paid_addr},10))>DATEVALUE({date_due_addr})+2, 3000, 0)"

    # Balance formula: first data row -> N(amt_paid) - N(amt_due) - N(penalty)
    #                  later rows     -> N(prev_bal) + N(amt_paid) - N(amt_due) - N(penalty)
    # The first data row sits directly below the header row.
    if row_abs_1based == base_row_1based + 1:
        bal_formula = f"=N({amt_paid_addr})-N({amt_due_addr})-N({pen_addr})"
    else:
        prev_bal_addr = f"{col_letters['prepay_arrears']}{row_abs_1based-1}"
        bal_formula = f"=N({prev_bal_addr})+N({amt_paid_addr})-N({amt_due_addr})-N({pen_addr})"


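    # Only set the formulas when the cells do not already hold one.
    # Ask for the FORMULA rendering; the default returns the computed value,
    # which never starts with "=".
    current_bal = tenant_ws.acell(bal_addr, value_render_option='FORMULA').value or ""
    current_pen = tenant_ws.acell(pen_addr, value_render_option='FORMULA').value or ""
    needs_bal = not str(current_bal).startswith("=")
    needs_pen = not str(current_pen).startswith("=")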

    # Set any missing formulas in a single batch
    body = []
    if needs_bal:
        body.append({'range': bal_addr, 'values': [[bal_formula]]})
    if needs_pen:
        body.append({'range': pen_addr, 'values': [[pen_formula]]})
    if body:
        tenant_ws.batch_update(body, value_input_option='USER_ENTERED')

    # Return info (no computed numbers now—Sheet will reflect in realtime)
    return {
        'sheet': tenant_ws.title,
        'month_row': row_abs_1based,
        'paid_before': paid0,
        'paid_after': paid1,
        'ref_added': payment['REF Number'],
        'formulas_set': {'balance': needs_bal, 'penalties': needs_pen},
        'balance_addr': bal_addr,    
        'penalties_addr': pen_addr       
    }



# ----- META SHEETS (ProcessedRefs, PaymentHistory) -----
def ensure_meta(ws_name, header):
    try:
        ws = sh.worksheet(ws_name)
    except gspread.WorksheetNotFound:
        ws = sh.add_worksheet(ws_name, rows=2000, cols=max(10, len(header)))
        ws.append_row(header)
    return ws

refs_ws = ensure_meta("ProcessedRefs", ["Ref"])
hist_ws = ensure_meta("PaymentHistory", PAYMENT_COLS + ['AccountCode','TenantSheet','Month'])

# Load processed refs into a set
ref_vals = refs_ws.get_all_values()
processed_refs = set((r[0] or '').upper() for r in ref_vals[1:]) if len(ref_vals) > 1 else set()

# ----- GMAIL FETCH + PARSE -----
print("🔎 Searching Gmail…")
result = gmail.users().messages().list(userId="me", q=GMAIL_QUERY, maxResults=200).execute()
msg_list = result.get("messages", [])
print(f"Found {len(msg_list)} candidate emails.")

parsed, errors = [], []
for m in msg_list:
    try:
        text = get_message_text(gmail, m["id"])
        pay = parse_email(text)
        if not pay:
            errors.append(f"Could not parse message id {m['id']}")
            continue
        if pay['REF Number'] in processed_refs:
            continue
        parsed.append((m["id"], pay))
    except Exception as e:
        errors.append(f"Error reading message {m['id']}: {e}")

print(f"✅ Parsed {len(parsed)} new payments.")

# ----- APPLY: to tenant sheets + PaymentHistory + ProcessedRefs -----
logs = []
tenant_tally = {}

# Cache worksheets to reduce calls
worksheets = {ws.title: ws for ws in sh.worksheets()}

def find_or_create_tenant_sheet(account_code: str):
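    for title, ws in worksheets.items():
        t = title.upper()
        # Match the code as a whole token so that "A1" does not pick up an "A10 ..." sheet
        if re.match(rf'{re.escape(account_code)}(?!\d)', t) and 'PROCESSEDREFS' not in t and 'PAYMENTHISTORY' not in t:
            return ws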
    title = f"{account_code} - AutoAdded"
    ws = sh.add_worksheet(title, rows=1000, cols=12)
    ws.update(values=[['Month','Amount Due','Amount paid','Date paid','REF Number','Date due','Prepayment/Arrears','Penalties']],
              range_name='A1', value_input_option='USER_ENTERED')
    ws.format('1:1', {'textFormat': {'bold': True}})
    ws.freeze(rows=1)
    worksheets[title] = ws
    logs.append(f"➕ Created tenant sheet: {title}")
    return ws

for msg_id, p in parsed:
    tenant_ws = find_or_create_tenant_sheet(p['AccountCode'])
    info = update_tenant_month_row(tenant_ws, p)
    # Log the change; balance and penalty cells are formulas that recalculate in the sheet
    logs.append(
        f"🧾 {info['sheet']} R{info['month_row']} | "
        f"Paid {info['paid_before']}→{info['paid_after']} | "
        f"Ref {info['ref_added']} | Bal/penalties will auto-update in sheet"
    )
    
    tenant_tally[info['sheet']] = tenant_tally.get(info['sheet'], 0) + 1

    # PaymentHistory
    dt = datetime.strptime(p['Date Paid'], '%d/%m/%Y %I:%M %p')
    mon = dt.strftime('%Y-%m')
    hist_ws.append_row(
        [p[k] for k in PAYMENT_COLS] + [p['AccountCode'], tenant_ws.title, mon],
        value_input_option='USER_ENTERED'
    )

    # ProcessedRefs
    refs_ws.append_row([p['REF Number']], value_input_option='RAW')
    processed_refs.add(p['REF Number'])

    # Mark Gmail read (optional)
    try:
        gmail.users().messages().modify(userId='me', id=msg_id, body={'removeLabelIds': ['UNREAD']}).execute()
    except HttpError:
        pass

    time.sleep(2)  # throttle writes

# ----- GROUPED MONTHLY SUMMARY (display) -----
hist_vals = hist_ws.get_all_values()
if len(hist_vals) > 1:
    df = pd.DataFrame(hist_vals[1:], columns=hist_vals[0])
    with pd.option_context('display.float_format', '{:,.2f}'.format):
        df['Amount Paid'] = pd.to_numeric(df['Amount Paid'], errors='coerce').fillna(0.0)
        grouped = df.groupby('Month', dropna=False).agg(
            Payments=('REF Number','count'),
            TotalAmount=('Amount Paid','sum')
        ).reset_index().sort_values('Month')
        display(grouped)
else:
    print("No payment history yet.")

# ----- LOGS -----
print("\n------ BOT LOG ------")
for line in logs:
    print(line)
print("\nPayments per tenant sheet:")
for t, c in tenant_tally.items():
    print(f"  {t}: {c} payment(s)")
if errors:
    print("\nNon-fatal parse/read issues:")
    for e in errors:
        print("  -", e)
print("\n✅ Prototype run complete.")


🔎 Searching Gmail…
Found 200 candidate emails.
✅ Parsed 0 new payments.


|   | Month | Payments | TotalAmount |
|---|---|---|---|
| 0 | 2025-01 | 51 | 682000 |
| 1 | 2025-02 | 42 | 559000 |
| 2 | 2025-03 | 71 | 853000 |
| 3 | 2025-04 | 68 | 845000 |
| 4 | 2025-05 | 45 | 545000 |
| 5 | 2025-06 | 54 | 712000 |
| 6 | 2025-07 | 56 | 690000 |
| 7 | 2025-08 | 28 | 318000 |



------ BOT LOG ------

Payments per tenant sheet:

✅ Prototype run complete.
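
The run above parsed 0 new payments because every REF Number was already recorded in ProcessedRefs. That de-duplication step can be sketched in isolation. This is a minimal illustration, not the notebook's exact code: `filter_new_payments` and the sample refs are hypothetical names and data.

```python
from typing import Dict, Iterable, List, Set


def filter_new_payments(payments: Iterable[Dict], processed_refs: Set[str]) -> List[Dict]:
    """Return payments whose REF Number is unseen; processed_refs is updated in place."""
    fresh: List[Dict] = []
    for pay in payments:
        ref = str(pay.get("REF Number", "")).strip().upper()
        if not ref or ref in processed_refs:
            continue  # skip blank refs and anything already processed
        processed_refs.add(ref)  # also blocks repeats within the same batch
        fresh.append(pay)
    return fresh


# Hypothetical sample data, for illustration only
seen = {"QWE123ABC9"}
batch = [
    {"REF Number": "QWE123ABC9", "Amount Paid": 12000.0},  # already processed
    {"REF Number": "zxc987lkj1", "Amount Paid": 15000.0},  # new, lowercase ref
    {"REF Number": "ZXC987LKJ1", "Amount Paid": 15000.0},  # repeat of the previous ref
]
new_payments = filter_new_payments(batch, seen)
print(len(new_payments))
```

Normalizing the ref to upper case before the lookup matches how the prototype loads ProcessedRefs, so the same payment is not counted twice because of casing.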


## DEPLOYMENT CODE

* In this integration, we extend the proof-of-concept prototype into a Streamlit app. The app retains the core functionality:
    * OAuth-based Google authentication to securely access Gmail, Google Sheets, and Drive.
    * Parsing payment notification emails with regular expressions.
    * Updating the relevant tenant sheets and maintaining metadata such as ProcessedRefs and PaymentHistory.

* Streamlit provides a user-friendly dashboard where users can trigger the payment bot and view real-time summaries and logs. This separation of the UI from the backend logic enables rapid deployment and easier scalability.
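
The regular-expression parsing step listed above can be illustrated with a reduced example. This is a sketch under assumptions: `SKETCH_PATTERN` is a simplified, hypothetical pattern that accepts a single date format, and the sample notification text is invented to resemble the email template used earlier in the notebook.

```python
import re
from typing import Dict, Optional

# Simplified pattern for illustration; it accepts only dd/mm/yyyy hh:mm AM/PM dates.
SKETCH_PATTERN = re.compile(
    r"payment of KES ([\d,]+\.\d{2})\s+"
    r"for account:\s*PAYLEMAIYAN\s*#?\s*([A-Za-z]\d{1,2})\s+"
    r"has been received from\s+(.+?)\s+(\d{9,13})\s+"
    r"on\s+(\d{2}/\d{2}/\d{4}\s+\d{1,2}:\d{2}\s+[AP]M)\.?\s+"
    r"M-?Pesa Ref:\s*([A-Za-z0-9]{8,16})",
    flags=re.IGNORECASE,
)


def parse_sketch(text: str) -> Optional[Dict]:
    """Extract payment fields from one notification, or return None if it does not match."""
    m = SKETCH_PATTERN.search(text or "")
    if not m:
        return None
    amount, code, payer, phone, paid_on, ref = m.groups()
    return {
        "Amount Paid": float(amount.replace(",", "")),
        "AccountCode": code.upper(),
        "Payer": payer.strip(),
        "Phone": phone,
        "Date Paid": paid_on,
        "REF Number": ref.upper(),
    }


# Hypothetical notification text, for illustration only
sample = (
    "Dear Customer, your payment of KES 12,000.00 for account: PAYLEMAIYAN #A3 "
    "has been received from JANE DOE 0712345678 on 05/01/2025 10:30 AM. "
    "M-Pesa Ref: QWE123ABC9. Thank you."
)
result = parse_sketch(sample)
print(result)
```

Returning `None` for non-matching text lets the caller log the message id as a parse issue and move on, which is how the prototype loop treats unparseable emails.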


In [None]:
# The bot_logic (backend)

"""
Core sheet-update logic for Rent RPA.

WHY these choices:
- Header-aware & additive: avoids breaking existing tabs.
- Robust parser: tolerates case, '#' omission, varied date formats.
- Business rules baked into formulas: easy to audit in-sheet.
- Minimal API calls with backoff: protects against quota spikes.
- MonthKey + sorting: keeps months ordered even with mixed month strings.
"""

from __future__ import annotations
from copy import deepcopy
from typing import Dict, List, Optional, Tuple
from datetime import datetime
import re
import time
from gspread.utils import rowcol_to_a1

# Match workbook schema (first 6 columns used by PaymentHistory)
PAYMENT_COLS = ['Date Paid', 'Amount Paid', 'REF Number', 'Payer', 'Phone', 'Comments']

# --- NCBA email parser (tolerant to case, '#', date formats) -----------------

DATE_PAT = (
    r'(?:'
    r'\d{2}/\d{2}/\d{4}\s+\d{1,2}:\d{2}\s+[APMapm]{2}'
    r'|'
    r'\d{4}[-/]\d{2}[-/]\d{2}[ T]\d{1,2}:\d{2}(?:\s?[APMapm]{2})?'
    r'|'
    r'\d{2}[-/]\d{2}[-/]\d{4}\s+\d{1,2}:\d{2}(?:\s?[APMapm]{2})?'
    r')'
)

# WHY: Explicit groups for amount, account code, payer, phone, date, ref.
PATTERN = re.compile(
    rf'payment of KES ([\d,]+\.\d{{2}})\s*'
    rf'for account:\s*PAYLEMAIYAN\s*#?\s*([A-Za-z]\d{{1,2}})\s*'
    rf'has been received from\s+(.+?)\s+(.{{1,13}})\s+'
    rf'on\s+({DATE_PAT})\.?\s+'
    rf'M-?Pesa Ref:\s*([A-Za-z0-9\-\s]{{6,32}})',
    flags=re.IGNORECASE
)

def _normalize_ref(ref_raw: str, min_len: int = 8, max_len: int = 16) -> str | None:
    core = re.sub(r'[^A-Za-z0-9]', '', (ref_raw or '')).upper()
    return core if min_len <= len(core) <= max_len else None  # WHY: reject obviously bad refs

def _normalize_payer(name: str) -> str:
    n = (name or "").strip()
    return " ".join(part.capitalize() for part in re.split(r"\s+", n) if part)  # WHY: tidy casing

def _normalize_date_ddmmyyyy(dt_str: str) -> str:
    dt_str = (dt_str or "").strip()
    fmts = [
        "%d/%m/%Y %I:%M %p", "%d/%m/%Y %H:%M",
        "%d-%m-%Y %I:%M %p", "%d-%m-%Y %H:%M",
        "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M",
        "%Y-%m-%d %I:%M %p", "%Y/%m/%d %I:%M %p",
        "%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d", "%Y/%m/%d",
    ]
    for f in fmts:
        try:
            dt = datetime.strptime(dt_str, f)
            return dt.strftime("%d/%m/%Y")
        except ValueError:
            continue
    return dt_str  # WHY: keep original if parsing fails; written as text

def parse_email(text: str) -> Optional[Dict]:
    m = PATTERN.search(text or "")
    if not m:
        return None
    amt, code, payer, phone, dt_str, ref_raw = m.groups()
    ref = _normalize_ref(ref_raw)
    if not ref:
        return None  # WHY: REF is our idempotency key
    return {
        'Date Paid':   _normalize_date_ddmmyyyy(dt_str),
        'Amount Paid': float((amt or "0").replace(',', '')) if amt else 0.0,
        'REF Number':  ref,
        'Payer':       _normalize_payer(payer),
        'Phone':       (phone or "").strip(),
        'AccountCode': (code or "").upper(),       # WHY: route to sheet by code
        'Comments':    "",                         # WHY: keep schema stable; never overwrite user comments
    }

# --- Canonical headers & helpers --------------------------------------------

ALIASES = {
    "month": {"month","month/period","period","rent month","billing month"},
    "amount_due": {"amount due","rent due","due","monthly rent","rent","amount due kes","rent kes"},
    "amount_paid": {"amount paid","paid","amt paid","paid kes","amountpaid"},
    "date_paid": {"date paid","paid date","payment date","datepaid","date of payment"},
    "ref": {"ref number","ref","reference","ref no","reference no","mpesa ref","mpesa reference","receipt","receipt no"},
    "date_due": {"date due","due date","rent due date","datedue"},
    "prepay_arrears": {"prepayment/arrears","prepayment","arrears","balance","bal","prepayment arrears","carry forward","cf"},
    "penalties": {"penalties","penalty","late fee","late fees","fine","fines"},
    "comments": {"comments","comment","remarks","notes","note"},
}
REQUIRED_KEYS = ["month","amount_due","amount_paid","date_paid","ref","date_due","prepay_arrears","penalties","comments"]
CANONICAL_NAME = {
    "month": "Month",
    "amount_due": "Amount Due",
    "amount_paid": "Amount Paid",
    "date_paid": "Date Paid",
    "ref": "REF Number",
    "date_due": "Date Due",
    "prepay_arrears": "Prepayment/Arrears",
    "penalties": "Penalties",
    "comments": "Comments",
}
CANONICAL_SET = set(CANONICAL_NAME.values())

def _norm_header(s: str) -> str:
    s = str(s or "").replace("\xa0", " ").strip().lower()
    s = re.sub(r"[^\w\s/]+", "", s)
    s = re.sub(r"\s+", " ", s)
    return s

def _alias_key_for(normalized_header: str) -> Optional[str]:
    for key, aliases in ALIASES.items():
        if normalized_header in aliases:
            return key
    return None

def _header_colmap(header: List[str]) -> Dict[str, int]:
    # WHY: build mapping across canonical/alias names; first win per key
    colmap: Dict[str, int] = {}
    seen = set()
    for idx, name in enumerate(header):
        name_str = str(name or "")
        norm = _norm_header(name_str)
        if name_str in CANONICAL_SET:
            key = next(k for k, v in CANONICAL_NAME.items() if v == name_str)
        else:
            key = _alias_key_for(norm)
        if key and key not in seen:
            colmap[key] = idx
            seen.add(key)
    return colmap

# --- Utilities: backoff, grid safety, header detection, month helpers --------

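def _with_backoff(fn, *args, **kwargs):
    delay = 1.0
    last_exc = None
    for _ in range(6):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            status = (getattr(getattr(e, "resp", None), "status", None)
                      or getattr(getattr(e, "response", None), "status_code", None))
            if status == 429 or "quota" in str(e).lower() or "rate limit" in str(e).lower():
                last_exc = e
                time.sleep(delay); delay *= 2; continue  # WHY: survive transient quota errors
            raise
    raise last_exc  # WHY: surface the failure instead of silently returning None after all retries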

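def _with_backoff_factory(fn_factory, *, max_tries=6):
    delay = 1.0
    last_exc = None
    for _ in range(max_tries):
        try:
            return fn_factory()
        except Exception as e:
            status = (getattr(getattr(e, "resp", None), "status", None)
                      or getattr(getattr(e, "response", None), "status_code", None))
            if status == 429 or "quota" in str(e).lower() or "rate limit" in str(e).lower():
                last_exc = e
                time.sleep(delay); delay *= 2; continue
            raise
    if last_exc is not None:
        raise last_exc  # WHY: do not hide a write that never succeeded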

def _strip_ws_prefix(a1: str) -> str:
    s = str(a1 or "")
    # WHY: gspread returns "'Sheet'!A1" sometimes; batch_update expects bare A1
    while True:
        m = re.match(r"^'[^']+'!(.+)$", s)
        if not m: break
        s = m.group(1)
    return s

def _ensure_grid_size(ws, need_rows: Optional[int] = None, need_cols: Optional[int] = None):
    try:
        cur_rows = ws.row_count
        cur_cols = ws.col_count
        if need_rows is not None and need_rows > cur_rows:
            _with_backoff(ws.add_rows, need_rows - cur_rows)
        if need_cols is not None and need_cols > cur_cols:
            _with_backoff(ws.add_cols, need_cols - cur_cols)
    except Exception:
        pass  # WHY: some mocks/older APIs lack row_count/col_count

def _detect_header_row(all_vals, scan_rows: int = 30) -> int:
    # WHY: headers often sit at row 7; detect by scoring canonical/alias hits
    def score(row):
        s = 0
        for cell in row:
            if not cell: continue
            txt = str(cell).strip()
            if txt in CANONICAL_SET or _alias_key_for(_norm_header(txt)):
                s += 1
        return s
    best_i, best_score = 0, 0
    limit = min(len(all_vals), max(1, scan_rows))
    for i in range(limit):
        sc = score(all_vals[i])
        if sc > best_score:
            best_i, best_score = i, sc
    return best_i if best_score >= 3 else 0

def _parse_month_cell(s: str) -> Optional[Tuple[int,int]]:
    if not s: return None
    s = str(s).strip()
    m = re.match(r"^([A-Za-z]{3,9})[- ](\d{4})$", s)
    if m:
        name, year = m.group(1), int(m.group(2))
        for fmt in ("%b", "%B"):
            try:
                dt = datetime.strptime(f"01 {name} {year}", f"%d {fmt} %Y")
                return (dt.year, dt.month)
            except ValueError:
                pass
    m = re.match(r"^(\d{4})[-/](\d{1,2})$", s)
    if m: return (int(m.group(1)), int(m.group(2)))
    m = re.match(r"^(\d{1,2})[-/](\d{4})$", s)
    if m: return (int(m.group(2)), int(m.group(1)))
    return None

def _choose_month_display(existing_samples: List[str], dt: datetime) -> str:
    # WHY: preserve sheet style (e.g., "Sep-2025" vs "2025-09")
    for v in existing_samples:
        t = (v or "").strip()
        if not t: continue
        if re.fullmatch(r"\d{4}[-/]\d{2}", t):
            return f"{dt.year}-{dt.month:02d}"
        if re.fullmatch(r"[A-Za-z]{3,9}[- ]\d{4}", t):
            return dt.strftime("%b-%Y")
        if re.fullmatch(r"\d{2}[-/]\d{4}", t):
            return f"{dt.month:02d}-{dt.year}"
    return dt.strftime("%b-%Y")

def _ensure_monthkey_and_fill(ws, header_row0: int, header: list[str], colmap: dict):
    # WHY: stable sorting regardless of "Month" display format
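    norm = [str(h).strip().lower() for h in header]
    if "monthkey" not in norm:
        header.append("MonthKey")
        norm.append("monthkey")
        _ensure_grid_size(ws, need_cols=len(header))  # WHY: the new column may exceed the grid
        ws.update(f"{header_row0+1}:{header_row0+1}", [header], value_input_option="USER_ENTERED")
        colmap.clear(); colmap.update(_header_colmap(header))
    # WHY: "monthkey" is not an ALIASES key, so colmap never holds it; locate it in the header
    mk_idx = norm.index("monthkey")
    mon_idx = colmap["month"]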

    vals = ws.get_all_values()
    data = vals[header_row0+1:]
    updates = []
    for i, row in enumerate(data, start=header_row0+2):
        mon_txt = row[mon_idx] if len(row) > mon_idx else ""
        ym = _parse_month_cell(mon_txt)
        if not ym: continue
        y, m = ym
        a1 = rowcol_to_a1(i, mk_idx+1)
        updates.append({"range": a1, "values": [[f"{y:04d}-{m:02d}"]]})
    if updates:
        ws.batch_update(updates, value_input_option="USER_ENTERED")

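def _sort_by_monthkey(ws, header_row0: int, header: list[str], colmap: dict):
    try:
        mk_idx = [str(h).strip().lower() for h in header].index("monthkey")
    except ValueError:
        return
    first_data_row = header_row0 + 2
    if ws.row_count < first_data_row:
        return
    # WHY: restrict the sort to data rows so the header (and anything above it) stays in place
    data_range = f"A{first_data_row}:{rowcol_to_a1(ws.row_count, len(header))}"
    ws.sort((mk_idx+1, 'asc'), range=data_range)  # WHY: sheets-native sort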

# --- Conditional formatting: arrears/penalties highlights --------------------

def _ensure_conditional_formatting(ws, header_row0: int, colmap: Dict[str,int], cache: Dict, debug: Optional[List[str]] = None):
    if cache.get("cf_applied"): return
    try:
        sheet_id = ws.id
    except Exception:
        return
    start_row = header_row0 + 1
    requests = [
        {   # WHY: highlight negative balances (arrears)
            "addConditionalFormatRule": {
                "rule": {
                    "ranges": [{
                        "sheetId": sheet_id,
                        "startRowIndex": start_row,
                        "startColumnIndex": colmap["prepay_arrears"],
                        "endColumnIndex": colmap["prepay_arrears"] + 1
                    }],
                    "booleanRule": {
                        "condition": {"type": "NUMBER_LESS", "values": [{"userEnteredValue": "0"}]},
                        "format": {"backgroundColor": {"red": 1.0, "green": 0.84, "blue": 0.84}}
                    }
                },
                "index": 0
            }
        },
        {   # WHY: highlight any penalties
            "addConditionalFormatRule": {
                "rule": {
                    "ranges": [{
                        "sheetId": sheet_id,
                        "startRowIndex": start_row,
                        "startColumnIndex": colmap["penalties"],
                        "endColumnIndex": colmap["penalties"] + 1
                    }],
                    "booleanRule": {
                        "condition": {"type": "NUMBER_GREATER", "values": [{"userEnteredValue": "0"}]},
                        "format": {"backgroundColor": {"red": 1.0, "green": 1.0, "blue": 0.6}}
                    }
                },
                "index": 1
            }
        },
    ]
    try:
        ws.spreadsheet.batch_update({"requests": requests})
        cache["cf_applied"] = True
        if debug is not None: debug.append("Applied conditional formatting.")
    except Exception as e:
        if debug is not None: debug.append(f"CF skipped: {e}")

# --- Per-sheet cache (reduce API chatter within one run) ---------------------

_sheet_cache: Dict[str, Dict] = {}
def clear_cache(): _sheet_cache.clear()

# --- Main: update a tenant tab for a given payment ---------------------------

def update_tenant_month_row(ws, payment: Dict, debug: Optional[List[str]] = None) -> Dict:
    title = ws.title
    if debug is not None:
        debug.append(f"[{title}] Start REF={payment.get('REF Number')} Amt={payment.get('Amount Paid')} DatePaid={payment.get('Date Paid')}")

    if title not in _sheet_cache:
        all_vals = _with_backoff(ws.get_all_values)
        header_row0 = _detect_header_row(all_vals)
        header = list(all_vals[header_row0]) if len(all_vals) > header_row0 else []
        rows   = [list(r) for r in all_vals[header_row0+1:]] if len(all_vals) > header_row0+1 else []
        colmap = _header_colmap(header)

        added_any = False
        for key in REQUIRED_KEYS:
            if key not in colmap:
                header.append(CANONICAL_NAME[key])
                for r in rows: r.append("None" if key == "comments" else "")
                added_any = True

        _ensure_grid_size(ws, need_rows=header_row0+1, need_cols=len(header))
        _with_backoff(ws.update, f"{header_row0+1}:{header_row0+1}", [header], value_input_option='USER_ENTERED')

        cache = {"header_row0": header_row0, "header": header, "rows": rows, "colmap": _header_colmap(header), "cf_applied": False}
        _sheet_cache[title] = cache
        if debug is not None:
            debug.append(f"[{title}] Header row {header_row0+1}, added_missing={added_any}")
    else:
        cache = _sheet_cache[title]

    header_row0, header, rows, colmap = cache["header_row0"], cache["header"], cache["rows"], cache["colmap"]

    dt_paid = datetime.strptime(payment['Date Paid'], '%d/%m/%Y')
    target_y, target_m = dt_paid.year, dt_paid.month
    existing_month_samples = [r[colmap['month']] for r in rows if len(r) > colmap['month'] and r[colmap['month']]]
    month_display = _choose_month_display(existing_month_samples, dt_paid)

    row_abs = None; row_idx = None
    for idx_in_rows, r in enumerate(rows, start=1):
        if len(r) <= colmap['month']: continue
        ym = _parse_month_cell(r[colmap['month']])
        if ym and ym == (target_y, target_m):
            row_idx = idx_in_rows - 1
            row_abs = header_row0 + 1 + idx_in_rows
            break

    if row_abs is None:
        last_due = "0"
        for r in reversed(rows):
            if len(r) > colmap['amount_due'] and str(r[colmap['amount_due']]).strip():
                last_due = r[colmap['amount_due']]
                break
        new_row = [""] * len(header)
        new_row[colmap['month']]      = month_display
        new_row[colmap['amount_due']] = last_due
        if "comments" in colmap: new_row[colmap['comments']] = "None"
        rows.append(new_row)
        row_idx = len(rows) - 1
        row_abs = header_row0 + 1 + (row_idx + 1)
        _ensure_grid_size(ws, need_rows=row_abs, need_cols=len(header))
        _with_backoff(ws.update, f"{row_abs}:{row_abs}", [new_row], value_input_option='USER_ENTERED')
        if debug is not None:
            debug.append(f"[{title}] Created month row R{row_abs}; carry Amount Due={last_due}")

    row_vals = rows[row_idx]
    while len(row_vals) < len(header): row_vals.append("")

    def _num(v):
        try: return float(str(v).replace(",", "").strip() or 0)
        except Exception: return 0.0

    paid_before = _num(row_vals[colmap['amount_paid']])
    paid_after  = paid_before + float(payment['Amount Paid'])
    prev_ref    = row_vals[colmap['ref']] or ""
    ref_new     = payment['REF Number'] if not prev_ref else f"{prev_ref}, {payment['REF Number']}"
    due_str     = datetime(target_y, target_m, 5).strftime("%d/%m/%Y")

    row_vals[colmap['month']]       = month_display
    row_vals[colmap['amount_paid']] = str(paid_after)
    row_vals[colmap['date_paid']]   = payment['Date Paid']
    row_vals[colmap['ref']]         = ref_new
    row_vals[colmap['date_due']]    = due_str

    amt_paid_addr  = rowcol_to_a1(row_abs, colmap['amount_paid']+1)
    amt_due_addr   = rowcol_to_a1(row_abs, colmap['amount_due']+1)
    date_paid_addr = rowcol_to_a1(row_abs, colmap['date_paid']+1)
    date_due_addr  = rowcol_to_a1(row_abs, colmap['date_due']+1)
    pen_addr       = rowcol_to_a1(row_abs, colmap['penalties']+1)
    bal_addr       = rowcol_to_a1(row_abs, colmap['prepay_arrears']+1)
    prev_bal_addr  = rowcol_to_a1(row_abs-1, colmap['prepay_arrears']+1)

    # WHY: normalize text dates so comparisons work even if stored as text
    dpaid_expr  = f"IF(ISTEXT({date_paid_addr}), DATEVALUE({date_paid_addr}), {date_paid_addr})"
    ddue_expr   = f"IF(ISTEXT({date_due_addr}),  DATEVALUE({date_due_addr}),  {date_due_addr})"
    # WHY: N() maps the header text above the first data row to 0 instead of producing #VALUE!
    prev_bal_safe = f"IFERROR(N({prev_bal_addr}),0)"
    net_after = f"({prev_bal_safe} + N({amt_paid_addr}) - N({amt_due_addr}))"

    # BUSINESS RULE: Penalty if paid > due + 2 days AND net balance negative
    pen_formula = (
        f"=IF(AND(LEN({date_paid_addr})>0, LEN({date_due_addr})>0, "
        f"{net_after} < 0, "
        f"{dpaid_expr} > {ddue_expr} + 2), 3000, 0)"
    )

    # Rolling balance (first data row vs subsequent)
    if row_abs == header_row0 + 2:
        bal_formula = f"=N({amt_paid_addr})-N({amt_due_addr})-N({pen_addr})"
    else:
        bal_formula = f"=N({prev_bal_addr})+N({amt_paid_addr})-N({amt_due_addr})-N({pen_addr})"

    _target_cols = [
        colmap['month']+1, colmap['amount_paid']+1, colmap['date_paid']+1,
        colmap['ref']+1, colmap['date_due']+1, colmap['penalties']+1, colmap['prepay_arrears']+1
    ]
    _ensure_grid_size(ws, need_rows=row_abs, need_cols=max(_target_cols))

    updates = [
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['month']+1)),          "values": [[month_display]]},
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['amount_paid']+1)),    "values": [[paid_after]]},
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['date_paid']+1)),      "values": [[payment['Date Paid']]]},
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['ref']+1)),            "values": [[ref_new]]},
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['date_due']+1)),       "values": [[due_str]]},
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['penalties']+1)),      "values": [[pen_formula]]},
        {"range": _strip_ws_prefix(rowcol_to_a1(row_abs, colmap['prepay_arrears']+1)), "values": [[bal_formula]]},
    ]
    _with_backoff_factory(lambda: ws.batch_update(deepcopy(updates), value_input_option='USER_ENTERED'))

    if debug is not None:
        debug.append(f"[{title}] R{row_abs}: paid {paid_before}→{paid_after}, due={due_str}")

    _ensure_conditional_formatting(ws, header_row0, colmap, cache, debug)
    _ensure_monthkey_and_fill(ws, header_row0, header, colmap)
    _sort_by_monthkey(ws, header_row0, header, colmap)

    return {
        "sheet": ws.title,
        "row": row_abs,
        "month": month_display,
        "paid_before": paid_before,
        "paid_after": paid_after,
        "date_due": due_str,
        "month_row": row_abs,
        "ref_added": payment.get("REF Number"),
        "formulas_set": {"balance": True, "penalties": True},
    }
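
The penalty and rolling-balance formulas that `update_tenant_month_row` writes into each row can be mirrored in plain Python to sanity-check expected values. This is a minimal sketch, assuming the rule as stated in the code comments (rent due on the 5th, a 3000 KES penalty when payment lands more than 2 days after the due date and the net balance is negative). `month_balance` and the sample figures are hypothetical and not part of `bot_logic`.

```python
from datetime import date, timedelta
from typing import Tuple

PENALTY_KES = 3000  # flat penalty used in the sheet formula
GRACE_DAYS = 2      # days allowed after the due date
DUE_DAY = 5         # rent is assumed due on the 5th


def month_balance(prev_balance: float, amount_due: float,
                  amount_paid: float, date_paid: date) -> Tuple[float, float]:
    """Return (penalty, closing_balance) for one month row."""
    due = date(date_paid.year, date_paid.month, DUE_DAY)
    net = prev_balance + amount_paid - amount_due        # balance before any penalty
    late = date_paid > due + timedelta(days=GRACE_DAYS)  # paid after due date + grace
    penalty = PENALTY_KES if (late and net < 0) else 0
    return penalty, net - penalty


# Hypothetical figures, for illustration only
late_short = month_balance(0, 15000, 10000, date(2025, 3, 10))    # late and underpaid
on_time_full = month_balance(0, 15000, 15000, date(2025, 3, 4))   # on time, paid in full
late_but_full = month_balance(0, 15000, 15000, date(2025, 3, 20)) # late, nothing owed
print(late_short, on_time_full, late_but_full)
```

Keeping the rule in sheet formulas makes it auditable by the landlord, while a helper like this one is convenient for unit tests that do not touch the Sheets API.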

In [None]:
# The front-end

"""
streamlit_app.py — RentRPA Streamlit UI

What this app does
------------------
- Authenticates to Google via OAuth (Gmail + Sheets scopes).
- Searches Gmail for payment alerts, parses them, and de-duplicates by REF.
- For each parsed payment, finds the matching tenant tab and updates the correct month row
  using `bot_logic.update_tenant_month_row` (which is header-row aware and non-destructive).
- Writes a clean PaymentHistory and ProcessedRefs to help with metrics and avoiding duplicates.
- Shows portfolio metrics (income this month, total prepayments/arrears, penalty frequency).
- Optional weekly automation (opt-in checkbox) runs Mondays 09:00 EAT *while the app is open*.

Key UX / Safety details
-----------------------
- The app never renames user headers or deletes columns/rows. It adds missing canonical columns to the far right.
- It assumes Date Due is always the 5th of the month, and computes Penalties + Prepayment/Arrears accordingly.
- The Comments column is never overwritten; new rows get "None" for Comments so a human can fill it in later.
"""

"""
Streamlit UI to ingest NCBA emails → Google Sheets.

WHY these choices:
- Safe OAuth flow & checker: avoids hard-crash when secrets missing.
- Robust Gmail parsing with throttling & dedupe by REF.
- History writes align with workbook header; tolerant to missing fields.
- Optional tenant tab auto-create with canonical headers.
- Portfolio metrics computed from PaymentHistory + per-tenant balances.
"""

import json, time, base64, re
import streamlit as st
import pandas as pd
from datetime import datetime, timedelta

from google_auth_oauthlib.flow import Flow
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from google.auth.transport.requests import Request
import gspread
from gspread.exceptions import APIError

# Extra: capture common OAuth errors to show helpful guidance
try:
    from oauthlib.oauth2.rfc6749.errors import (
        InvalidGrantError, MismatchingStateError, InvalidClientError, InvalidRequestError,
    )
except Exception:  # pragma: no cover
    InvalidGrantError = MismatchingStateError = InvalidClientError = InvalidRequestError = Exception

from bot_logic import (
    parse_email,
    update_tenant_month_row,
    PAYMENT_COLS,
    clear_cache
)

# ---------------------------------------------------------------------------
# 0) PAGE CHROME + TOP HELP
# ---------------------------------------------------------------------------

st.set_page_config(page_title="Rent RPA (Gmail → Sheets)", page_icon="🏠", layout="wide")
st.title("🏠 Rent RPA — Gmail → Google Sheets")
st.markdown(
    """
<div style="padding:10px;border:1px solid #ddd;border-radius:8px;background:#f7f5f4;margin-bottom:8px">
<b>Rules:</b> Date due = <b>5th</b>. <b>Penalty</b> = 3000 KES if paid <i>after</i> due + 2 days and balance is negative.<br>
<b>Safety:</b> We append missing headers; never overwrite <b>Comments</b>.
</div>
""",
    unsafe_allow_html=True,
)
st.caption("Tokens live only in your session memory. No server-side persistence.")

# ---------------------------------------------------------------------------
# 1) OAUTH CONFIG — lenient and debug-friendly
# ---------------------------------------------------------------------------

SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/gmail.send",
    "https://www.googleapis.com/auth/gmail.readonly",
]

ENV            = st.secrets.get("ENV", "local")
GOOGLE_OAUTH   = st.secrets.get("google_oauth", {})
CLIENT_ID      = GOOGLE_OAUTH.get("client_id")
CLIENT_SECRET  = GOOGLE_OAUTH.get("client_secret")
REDIRECT_LOCAL = GOOGLE_OAUTH.get("redirect_uri_local")
REDIRECT_PROD  = GOOGLE_OAUTH.get("redirect_uri_prod")
REDIRECT_URI   = REDIRECT_PROD if ENV == "prod" else REDIRECT_LOCAL

def build_flow(state: str | None = None) -> Flow:
    """Create an OAuth 2.0 Flow; `state` is persisted across the auth dance.
    Why: `state` guards against CSRF; the callback compares the returned value with the saved one.
    """
    client_config = {
        "web": {
            "client_id": CLIENT_ID or "",
            "project_id": "rent-rpa",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
            "client_secret": CLIENT_SECRET or "",
            "redirect_uris": [REDIRECT_URI or ""],
        }
    }
    return Flow.from_client_config(client_config, scopes=SCOPES, redirect_uri=REDIRECT_URI or "", state=state)

def get_creds():
    if "creds_json" in st.session_state:
        return Credentials.from_authorized_user_info(json.loads(st.session_state["creds_json"]), SCOPES)
    return None

def store_creds(creds: Credentials):
    st.session_state["creds_json"] = creds.to_json()

def oauth_setup_checker():
    issues, tips = [], []
    if not CLIENT_ID or not CLIENT_SECRET:
        issues.append("Missing CLIENT_ID or CLIENT_SECRET in st.secrets['google_oauth'].")
    if not REDIRECT_URI:
        issues.append("Missing REDIRECT_URI (set redirect_uri_local / redirect_uri_prod).")
    else:
        if not REDIRECT_URI.startswith(("http://", "https://")):
            issues.append("REDIRECT_URI must start with http:// or https://")
        if "localhost" in REDIRECT_URI and not REDIRECT_URI.startswith("http://"):
            tips.append("For localhost, use http:// and include the exact port (e.g., http://localhost:8501/)")
        if not REDIRECT_URI.endswith("/"):
            tips.append("Ensure trailing slash matches GCP OAuth settings.")
    with st.expander("🔧 OAuth Setup Checker", expanded=False):
        # st.code expects a string, so serialize the dict to real JSON first
        st.code(json.dumps({"ENV": ENV, "CLIENT_ID_suffix": (CLIENT_ID or "")[-12:], "REDIRECT_URI": REDIRECT_URI, "SCOPES": SCOPES}, indent=2), language="json")
        if issues: st.error("\n".join(f"• {i}" for i in issues))
        if tips:   st.info("\n".join(f"• {t}" for t in tips))
    return not issues

# Try refresh
creds = get_creds()
if creds and not creds.valid and creds.refresh_token:
    try:
        creds.refresh(Request()); store_creds(creds)
    except Exception:
        # Why: refresh_token can be revoked or expired; force re-auth cleanly.
        creds = None

# OAuth callback
params = st.query_params
if "code" in params and "state" in params and "creds_json" not in st.session_state:
    returned_state = params.get("state")
    saved_state = st.session_state.get("oauth_state")

    # Guard: if the state doesn't match, stop to avoid CSRF & invalid_grant
    if saved_state and returned_state != saved_state:
        st.error("OAuth state mismatch. Please retry."); st.stop()
    flow = build_flow(state=saved_state)
    try:
        flow.fetch_token(code=params["code"])
    except (MismatchingStateError, InvalidRequestError):
        st.error("OAuth parameters invalid. Retry sign-in."); st.stop()
    except (InvalidClientError, InvalidGrantError) as e:  # pragma: no cover
        st.error("Google rejected the OAuth exchange. Check redirect URI and retry.")
        st.stop()
    creds = flow.credentials
    store_creds(creds)
    st.query_params.clear()
    st.success("Signed in.")
    st.rerun()

# Auth gate
if not creds or not creds.valid:
    oauth_setup_checker()
    flow = build_flow()
    auth_url, state = flow.authorization_url(
        access_type="offline",
        include_granted_scopes="true",  # WHY: Google's documented value is the lowercase string "true"
        prompt="consent",
    )
    # Persist state so the callback can verify it and rebuild Flow
    st.session_state["oauth_state"] = state
    st.link_button("🔐 Sign in with Google", auth_url, use_container_width=True)
    st.stop()

# --- Inputs -----------------------------------------------------------------

sheet_url = st.text_input("Google Sheet URL", placeholder="https://docs.google.com/spreadsheets/d/xxxxxxxxxxxxxxxxxxxx/edit#gid=0")
gmail_query = st.text_input(
    "Gmail search query",
    value='PAYLEMAIYAN subject:"NCBA TRANSACTIONS STATUS UPDATE" newer_than:365d',
    help="Use Gmail operators (is:unread, after:, before:) to narrow results."
)

c1, c2, c3, c4, c5, c6 = st.columns([1,1,1,1,1,1])
with c1: mark_read = st.checkbox("Mark processed as Read", value=True)
with c2: throttle_ms = st.number_input("Throttle (ms) between writes", min_value=0, value=200, step=50)
with c3: max_results = st.number_input("Max messages to scan", min_value=10, max_value=1000, value=200, step=10)
with c4: weekly_auto = st.checkbox("Enable weekly automation", value=False)
with c5: verbose_debug = st.checkbox("Verbose debug", value=True)
with c6: create_if_missing = st.checkbox("Auto-create tenant tabs", value=True)
run_now = st.button("▶️ Run Bot Now", type="primary", use_container_width=True)

# --- Helpers ----------------------------------------------------------------

def extract_sheet_id(url: str) -> str:
    try: return url.split("/d/")[1].split("/")[0]
    except Exception: return ""

def _decode_base64url(data: str) -> str:
    """Decode base64url Gmail parts safely (padding fixed)."""
    padding = '=' * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding).decode("utf-8", errors="ignore")

def _strip_html(html: str) -> str:
    """Basic HTML to text for simple emails that don't include text/plain parts."""
    text = re.sub(r"<br\s*/?>", "\n", html, flags=re.I)
    text = re.sub(r"<[^>]+>", "", text)
    return text

def get_message_text(service, msg_id):
    """
    Fetch a Gmail message by id and return best-effort text.
    - Prefers text/plain parts
    - Falls back to stripped HTML
    - Finally, uses snippet if nothing else
    """
    msg = service.users().messages().get(userId="me", id=msg_id, format="full").execute()
    payload = msg.get("payload", {})
    body_texts = []
    def walk(part):
        mime = part.get("mimeType", "")
        data = part.get("body", {}).get("data")
        parts = part.get("parts", [])
        if mime == "text/plain" and data:
            body_texts.append(_decode_base64url(data))
        elif mime == "text/html" and data and not body_texts:
            body_texts.append(_strip_html(_decode_base64url(data)))
        for p in parts or []: walk(p)
    walk(payload or {})
    return "\n".join(body_texts) if body_texts else msg.get("snippet", "")

def with_backoff(fn, *args, **kwargs):
    """Backoff wrapper for Sheets operations (append_rows, update, etc.)."""
    delay = 1.0
    last_exc = None
    for _ in range(6):
        try:
            return fn(*args, **kwargs)
        except APIError as e:
            # Retry only on HTTP 429 (rate limited); re-raise anything else
            if getattr(getattr(e, "response", None), "status_code", None) != 429:
                raise
            last_exc = e
        except Exception as e:
            # Retry on quota-style messages from other client errors
            if "Rate Limit Exceeded" not in str(e) and "quota" not in str(e).lower():
                raise
            last_exc = e
        time.sleep(delay); delay *= 2
    # WHY: surface the failure instead of silently returning None once retries run out
    raise last_exc

def should_auto_run():
    """
    Weekly automation gate:
    - Fires Mondays 09:00 EAT (UTC+3) when the app is open.
    - Only once per week (tracked in session).
    """
    if not weekly_auto: return False
    now_utc = datetime.utcnow()
    eat = now_utc + timedelta(hours=3)
    in_window = (eat.weekday() == 0 and eat.hour == 9)
    last = st.session_state.get("last_auto_run_at")
    if in_window and (last is None or (now_utc - last) > timedelta(days=6)):
        return True
    return False

# --- Main run ---------------------------------------------------------------

auto_trigger = should_auto_run()
if auto_trigger:
    st.info("🤖 Weekly automation window detected — running.")
run_now = run_now or auto_trigger

if run_now:
    # Reset bot_logic cache at the start of each run to avoid stale header/col counts
    clear_cache()

    # Validate inputs
    if not sheet_url:
        st.error("Please paste your Google Sheet URL."); st.stop()
    sheet_id = extract_sheet_id(sheet_url)
    if not sheet_id:
        st.error("That doesn't look like a valid Google Sheet URL."); st.stop()

    # Build clients
    gmail = build("gmail", "v1", credentials=creds)
    gs = gspread.authorize(creds)

    # Open spreadsheet
    try:
        sh = gs.open_by_key(sheet_id)
    except Exception as e:
        st.error(f"Could not open the Google Sheet. Ensure edit access.\n\n{e}")
        st.stop()

    def ensure_meta(ws_name, header):
        try:
            ws = sh.worksheet(ws_name)
        except gspread.WorksheetNotFound:
            ws = sh.add_worksheet(title=ws_name, rows=2000, cols=max(10, len(header)))
            with_backoff(ws.update, "1:1", [header], value_input_option="USER_ENTERED")
        return ws

    # Headers aligned to workbook
    refs_ws = ensure_meta("ProcessedRefs", ["Refs"])  # WHY: matches workbook
    hist_ws = ensure_meta("PaymentHistory", PAYMENT_COLS + ['AccountCode','TenantSheet','Month'])

    # Load processed refs to avoid duplicates
    ref_vals = with_backoff(refs_ws.get_all_values)
    processed_refs = set((r[0] or '').upper() for r in ref_vals[1:]) if len(ref_vals) > 1 else set()

    # Gmail search
    st.write("🔎 Searching Gmail…")
    result = gmail.users().messages().list(userId="me", q=gmail_query, maxResults=int(max_results)).execute()
    messages = result.get("messages", [])
    st.write(f"Found {len(messages)} candidate emails.")

    # Parse & filter new payments
    parsed, errors = [], []
    for m in messages:
        try:
            text = get_message_text(gmail, m["id"])
            if "PAYLEMAIYAN" not in (text or "").upper():
                continue
            pay = parse_email(text)
            if not pay:
                errors.append(f"Could not parse message id {m['id']}")
                continue
            ref_norm = (pay.get('REF Number','') or '').upper()
            if ref_norm in processed_refs:
                continue  # WHY: idempotent writes by REF
            if ref_norm:
                processed_refs.add(ref_norm)  # WHY: also skip repeats of the same REF within this run
            parsed.append((m["id"], pay))
        except Exception as e:
            errors.append(f"Error reading message {m['id']}: {e}")

    st.success(f"Parsed {len(parsed)} new payments.")

    # Find/create tenant worksheet by AccountCode (prefix match)
    worksheets = {ws.title: ws for ws in sh.worksheets()}

    def find_or_create_tenant_sheet(account_code: str):
        acct = (account_code or '').strip().upper()
        for title, ws in worksheets.items():
            t = title.strip().upper()
            if t.startswith(acct) and title not in ("ProcessedRefs","PaymentHistory"):
                return ws
        if not create_if_missing:
            raise RuntimeError(f"No tenant sheet found for AccountCode {account_code}")
        # Create with header on row 7 (matches your schema)
        title = f"{account_code} - AutoAdded"
        ws = sh.add_worksheet(title=title, rows=2000, cols=20)
        # Canonical header (row 7) — WHY: matches logic aliasing and keeps uniform
        hdr = ['Month','Date Due','Amount Due','Amount Paid','Date Paid','REF Number','Comments','Prepayment/Arrears','Penalties']
        with_backoff(ws.update, "7:7", [hdr], value_input_option="USER_ENTERED")
        try: ws.freeze(rows=7)
        except Exception: pass
        worksheets[title] = ws
        st.info(f"➕ Created tenant sheet: {title}")
        return ws

    # Process payments
    hist_rows, ref_rows, logs = [], [], []
    debug_accum = [] if verbose_debug else None

    for idx, (msg_id, p) in enumerate(parsed, start=1):
        ws = find_or_create_tenant_sheet(p["AccountCode"])
        info = update_tenant_month_row(ws, p, debug_accum)

        logs.append(f"🧾 {info.get('sheet')} R{info.get('row')} | {info.get('month')} | Paid {info.get('paid_before')}→{info.get('paid_after')} | Ref {p.get('REF Number')}")

        # Append to history using aligned schema; preserve extras at end
        hist_rows.append([p.get(k, "") for k in PAYMENT_COLS] + [
            p.get('AccountCode',''), ws.title, datetime.strptime(p["Date Paid"], "%d/%m/%Y").strftime("%Y-%m")
        ])

        ref_val = (p.get("REF Number","") or "").upper()
        ref_rows.append([ref_val])
        processed_refs.add(ref_val)

        # Optionally mark Gmail as read
        if mark_read:
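            try:
                gmail.users().messages().modify(userId="me", id=msg_id, body={"removeLabelIds": ["UNREAD"]}).execute()
            except Exception as e:
                # WHY: non-fatal, but report it so a missing Gmail modify scope is not hidden
                errors.append(f"Could not mark message {msg_id} as read: {e}")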

        if throttle_ms > 0: time.sleep(throttle_ms / 1000.0)

        # Flush in batches
        if len(hist_rows) >= 100 or idx == len(parsed):
            with_backoff(hist_ws.append_rows, hist_rows, value_input_option="USER_ENTERED"); hist_rows.clear()
            with_backoff(refs_ws.append_rows, ref_rows, value_input_option="RAW"); ref_rows.clear()

    # Run results
    st.success("Ingestion complete.")
    st.subheader("Run Log")
    if logs: st.code("\n".join(logs), language="text")
    if errors:
        st.subheader("Non-fatal Parse/Read Errors"); st.code("\n".join(errors), language="text")
    if debug_accum:
        st.subheader("Verbose Debug (bot_logic)"); st.code("\n".join(debug_accum), language="text")

    # --- Metrics -------------------------------------------------------------
    st.subheader("📊 Portfolio Metrics")

    # PaymentHistory aggregates
    hist_vals = with_backoff(hist_ws.get_all_values)
    income_this_month = 0.0
    if len(hist_vals) > 1:
        df_hist = pd.DataFrame(hist_vals[1:], columns=hist_vals[0])
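        for col in ("Amount Paid",):
            # WHY: Sheets may return formatted strings such as "12,500"; strip separators first
            cleaned = df_hist[col].astype(str).str.replace(",", "", regex=False)
            df_hist[col] = pd.to_numeric(cleaned, errors="coerce").fillna(0.0)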
        this_month = datetime.now().strftime("%Y-%m")
        income_this_month = float(df_hist.loc[df_hist["Month"] == this_month, "Amount Paid"].sum())
        grouped = df_hist.groupby("Month", dropna=False).agg(Payments=("REF Number","count"), TotalAmount=("Amount Paid","sum")).reset_index().sort_values("Month")
        st.markdown("**Payment History — by Month**")
        st.dataframe(grouped, use_container_width=True)

    # Per-tenant balances & penalties (scan each tenant tab)

    total_prepay = 0.0
    total_arrears = 0.0
    penalty_freq = {}

    def parse_float(x):
        try: return float(str(x).replace(",", "").strip())
        except Exception: return 0.0

    def detect_header_row(all_vals, scan_rows: int = 30) -> int:
        # Lightweight in-app variant
        def score(row):
            s = 0
            for cell in row:
                t = str(cell or "").strip().lower()
                if t in {"month","amount due","amount paid","date paid","ref number","date due","prepayment/arrears","penalties","comments"}:
                    s += 1
            return s
        best_i, best_score = 0, 0
        limit = min(len(all_vals), max(1, scan_rows))
        for i in range(limit):
            sc = score(all_vals[i])
            if sc > best_score: best_i, best_score = i, sc
        return best_i if best_score >= 3 else 0

    for ws in sh.worksheets():
        name = ws.title.upper()
        if name in ("PAYMENTHISTORY", "PROCESSEDREFS"):
            continue
        try:
            vals = with_backoff(ws.get_all_values)
            if not vals: continue
            header_row0 = detect_header_row(vals)
            header = [c.strip() for c in vals[header_row0]]
            rows = vals[header_row0+1:]
            if not rows: continue

            def idx(colname):
                try: return header.index(colname)
                except ValueError: return -1

            i_bal = idx("Prepayment/Arrears")
            i_pen = idx("Penalties")
            i_mon = idx("Month")

            # Latest balance = last populated Month row (or last row if Month empty)
            latest_row = None
            for r in reversed(rows):
                if i_mon != -1 and len(r) > i_mon and str(r[i_mon]).strip():
                    latest_row = r; break
            if latest_row is None and rows:
                latest_row = rows[-1]

            if latest_row and i_bal != -1 and len(latest_row) > i_bal:
                bal = parse_float(latest_row[i_bal])
                if bal > 0: total_prepay += bal
                elif bal < 0: total_arrears += abs(bal)

            if i_pen != -1:
                cnt = 0
                for r in rows:
                    if len(r) > i_pen and parse_float(r[i_pen]) > 0:
                        cnt += 1
                acct = ws.title.split(" - ")[0].strip().upper()
                penalty_freq[acct] = penalty_freq.get(acct, 0) + cnt

        except Exception:
            continue

    c1, c2, c3 = st.columns(3)
    c1.metric("Income (this month)", f"{income_this_month:,.0f} KES")
    c2.metric("Total Prepayments", f"{total_prepay:,.0f} KES")
    c3.metric("Total Arrears", f"{total_arrears:,.0f} KES")

    if penalty_freq:
        df_pen = pd.DataFrame(
            [{"AccountCode": k, "Penalty Rows": v} for k, v in penalty_freq.items()]
        ).sort_values("Penalty Rows", ascending=False)
        st.markdown("**Penalty frequency by AccountCode** (rows with penalties > 0):")
        st.dataframe(df_pen, use_container_width=True)

    if auto_trigger:
        st.session_state["last_auto_run_at"] = datetime.utcnow()

# ---------------------------------------------------------------------------
# 5) FOOTER
# ---------------------------------------------------------------------------

st.divider()
st.caption(
    "Rent-RPA © {year}. Tips: keep Gmail queries narrow; use Google Sheets (not Excel uploads). "
    "Made by [Eugene Maina](https://github.com/eugene-maina72)."
    .format(year=datetime.now().year)
)


***

# Workflow Summary

- **Project Overview:**  
    Developed a rent automation bot for Lemaiyan Heights that processes payment notifications and tracks rent payment histories.

- **Dummy Data Generation:**  
    - Created a dummy dataset of email notifications to simulate incoming rent payment emails.  
    - Utilized Faker for generating realistic customer names, phone numbers, dates, and payment amounts.

- **Payment Parsing & Data Extraction:**  
    - Developed regex-based functions to extract payment details (amount, account code, payer, phone, date, and reference) from email texts.
    - Handled variable formatting and ensured proper deduplication using a set of processed payment references.

- **Excel Workbook Management:**  
    - Implemented logic to update an Excel workbook (`dummy_rent_tracker.xlsx`) with the parsed payment data.
    - Managed multiple sheets: tenant-specific sheets, a master payment history, and a deduplication sheet ("ProcessedRefs").
    - Used Pandas and openpyxl to read, update, and create sheets dynamically when a tenant’s sheet was not found.

- **Integration with Google Services:**  
    - Extended the proof-of-concept to interact with Gmail and Google Sheets:
        - Sent dummy emails into a Gmail sandbox for testing.
        - Migrated data from Excel to Google Sheets.
        - Automated the process to fetch emails, parse their content, and update the corresponding tenant sheets on Google Sheets in real time.
    - Employed OAuth for secure access to Gmail and Google Sheets.

- **Deployment and Prototyping:**  
    - The final implementation was adapted for a deployment scenario, integrating with both Gmail and Google Sheets and wrapping Sheets calls in retry-with-backoff to stay within API quotas.
    - A Streamlit app serves as the UI dashboard for triggering the payment bot and viewing run logs and summary metrics.

This notebook documents the end-to-end automation of rent payment processing, from dummy email generation to real-time Google Sheets updates.
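
The parsing step summarized above can be sketched as follows. This is a minimal illustration, not the notebook's `parse_email`: it assumes only the opening of the notification template used in the proof of concept (`your payment of KES {amount} for account: PAYLEMAIYAN #{code}`), and the names `PAYMENT_PREFIX_RE` and `parse_payment_prefix` are hypothetical. The remaining fields (payer, phone, date, reference) are left out because their wording depends on the full template.

```python
import re
from typing import Optional

# Assumption: matches only the opening of the notification template.
# Account codes follow the A1-G6 scheme used for the dummy dataset.
PAYMENT_PREFIX_RE = re.compile(
    r"payment of KES\s*(?P<amount>[\d,]+(?:\.\d+)?)\s*"
    r"for account:\s*PAYLEMAIYAN\s*#(?P<code>[A-G][1-6])\b",
    re.IGNORECASE,
)


def parse_payment_prefix(text: str) -> Optional[dict]:
    """Return the amount and account code, or None if the text does not match."""
    match = PAYMENT_PREFIX_RE.search(text or "")
    if not match:
        return None
    return {
        # Strip thousands separators before converting to a number
        "Amount Paid": float(match.group("amount").replace(",", "")),
        "AccountCode": match.group("code").upper(),
    }


sample = "Dear Customer, your payment of KES 12,500.00 for account: PAYLEMAIYAN #B3 has been received"
print(parse_payment_prefix(sample))
```

Returning `None` on a mismatch, rather than raising, lets the caller record the message as a non-fatal parse error and move on to the next email.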


***
## Conclusions
- The rent automation bot successfully simulates the processing of payment emails, extracting key information using regex.
- Data is reliably logged into an Excel workbook with dedicated sheets for tenant details, payment history, and deduplication (ProcessedRefs).
- Integration with Google services (Sheets and Gmail) demonstrates that the core functionality can be extended to a cloud-based, real-time workflow.
- Generating dummy emails with Faker exercised the pipeline across varied names, phone numbers, dates, and amounts; malformed or truncated emails remain to be covered by the edge-case tests listed under Next Steps.
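
The deduplication behaviour noted above comes down to normalizing each reference and skipping any that were already recorded. A minimal sketch, assuming payments are dicts with a `REF Number` key as in the app code; `filter_new_payments` is a hypothetical helper, not a function from the notebook:

```python
from typing import Dict, Iterable, List


def filter_new_payments(payments: Iterable[Dict], processed_refs: Iterable[str]) -> List[Dict]:
    """Keep only payments whose normalized reference has not been seen before."""
    # Normalize stored references the same way as incoming ones
    seen = {ref.strip().upper() for ref in processed_refs if ref}
    fresh = []
    for payment in payments:
        ref = (payment.get("REF Number") or "").strip().upper()
        if not ref or ref in seen:
            continue  # skip payments without a reference and repeats
        seen.add(ref)  # also guards against repeats inside the same batch
        fresh.append(payment)
    return fresh


batch = [{"REF Number": "abc123"}, {"REF Number": "ABC123 "}, {"REF Number": "XYZ9"}]
print(filter_new_payments(batch, processed_refs=["xyz9"]))
```

Skipping payments that have no reference at all is a design choice made here for the sketch; a production run might instead route them to a manual-review list.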

## Next Steps
- Expand automated tests and CI:
    - Unit tests for the email parser, date/amount normalization, and sheet-update functions.
    - Integration tests that run against a sandbox Gmail + test Google Sheet.
    - Linting and pre-commit checks.

- Harden parsing and validation:
    - Accept more date formats and optional/missing fields (phone, payer).
    - Add strict validation and clear error messages for malformed refs or amounts.
    - Add synthetic edge-case tests (truncated bodies, extra punctuation).

- Reliability & quotas:
    - Centralize exponential backoff/retry strategy for Gmail/Sheets calls.
    - Batch writes where possible and add rate-limit monitoring.
    - Add idempotency guards beyond REF (timestamps, message-id).

- Security & config:
    - Move secrets to secure storage (Vault / environment / Streamlit secrets).
    - Document required OAuth scopes (Gmail, Sheets) and least-privilege roles for any service accounts.
    - Rotate keys and provide instructions for safe local dev.

- Observability & alerts:
    - Structured logging (JSON), run-level summary logs and error reports.
    - Export metrics (processed count, duplicates, parse failures) to a monitoring dashboard and alert on spikes.

- Deployment & scheduling:
    - Provide a reproducible deployment (Docker image or Cloud Function / Cloud Run).
    - Add scheduled runner (Cloud Scheduler / cron) that triggers idempotent runs.
    - Optionally expose a webhook or Pub/Sub ingestion for near-real-time processing.

- UX & operational improvements:
    - Add Streamlit UI controls for dry-run, replay window, and manual retry of failed parses.
    - Improve runbook with common troubleshooting steps and rollback plan.
    - Add tenant management UI to rename/move sheets safely.

- Data governance:
    - Mask or hash PII (phone numbers) in logs and histories.
    - Define retention policy for PaymentHistory and ProcessedRefs.
    - Provide a migration plan to consolidate historical Excel data into Google Sheets preserving audit trail.

- Documentation:
    - README with architecture diagram, setup steps, OAuth instructions, and testing guide.
    - Contributing guide and API surface docs for bot_logic functions.

- Roadmap items:
    - Add reconciliation with bank statements for end-to-end verification.
    - Enable multi-property support and role-based access for property managers.
    - Add automated monthly financial reports and export to accounting systems.
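
As one illustration of the data-governance item above, phone numbers could be masked for display and pseudonymized for joins before they reach logs or history sheets. This is a sketch under stated assumptions: `mask_phone` and `pseudonymize_phone` are hypothetical helpers, and the secret key is assumed to come from secure storage rather than source code. A keyed HMAC is used instead of a bare hash because phone numbers have a small search space and an unkeyed hash can be reversed by enumeration.

```python
import hashlib
import hmac
import re


def mask_phone(phone: str, visible: int = 3) -> str:
    """Replace all but the last `visible` digits with asterisks."""
    digits = re.sub(r"\D", "", phone or "")
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


def pseudonymize_phone(phone: str, key: bytes) -> str:
    """Return a stable keyed digest so the same number maps to the same token."""
    digits = re.sub(r"\D", "", phone or "")
    digest = hmac.new(key, digits.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability in sheets and logs


print(mask_phone("+254 712 345 678"))
print(pseudonymize_phone("+254 712 345 678", key=b"replace-with-secret-from-secure-storage"))
```

The masked form suits human-readable run logs, while the pseudonym keeps rows for the same payer linkable without storing the number itself.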

## 👨‍💻 Author

Eugene Maina  
Data Scientist | RPA Developer

* [LinkedIn](https://www.linkedin.com/in/eugene-maina-4a8b9a128/) | [GitHub](https://github.com/eugene-maina72) | [Email](mailto:eugenemaina72@gmail.com)
