# Project Overview: 


## Collaborators:

1. Agnes Chomba

2. Derrick Malinga

3. Erick Okacha

4. Judah Odida

5. Lucas Ominde

6. Nick Mwai

7. Olgah Omollo

# FinComBot - Compliance Chatbot 

## 1. Background
Financial institutions face increasing pressure to comply with stringent regulatory frameworks governing customer onboarding, Know Your Customer (KYC), Customer Due Diligence (CDD), Enhanced Due Diligence (EDD), Anti-Money Laundering (AML), Counter-Terrorism Financing (CTF), Counter-Proliferation Financing (CPF), and sanctions screening. These obligations are complex, continuously evolving, and vary across jurisdictions.

Staff often struggle to access and interpret regulatory documents and internal policies, leading to:
- Delays in onboarding, which affect customer experience and revenue.
- Inconsistent application of compliance procedures.
- Overdependence on compliance officers for basic guidance.
- Increased risk of regulatory breaches, which may lead to fines from regulators and put the bank's license at risk of suspension.





## 2. Business Objective

a.) Build a chatbot that retrieves accurate compliance information from the bank’s KYC/AML/CTF/CPF policies and responds to staff queries.



## 3. Target Audience

a.) Front office / Relationship Managers (who onboard customers)

b.) Operations staff (who process documents)

c.) Compliance officers (for guidance validation)

d.) New staff (as a training tool)

e.) Risk & Audit teams (for oversight)


## 4. Data Understanding
Data Source:

a. Internal compliance policy, stored in Word (.docx) format. Contains: KYC procedures, AML red flags, CDD/EDD checklists, risk rating methodology, and regulatory guidelines (FATF, CBK, CMA).

Data Characteristics:

- Unstructured text (paragraphs, checklists)
- Multiple sections (policies, procedures, workflows)
- Needs preprocessing before AI ingestion
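
One common preprocessing step for unstructured policy text is to split it into overlapping chunks before it is embedded. The sketch below is only an illustration of that idea, not the project's actual method: the helper name `chunk_words` and the default `chunk_size` and `overlap` values are assumptions, and word-based windows are just one of several possible chunking strategies.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Tiny illustration with made-up tokens (not real policy text)
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighbouring chunks, at the cost of some duplicated text.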


### 1. Data Preprocessing

In [1]:
import pandas as pd
import docx
from docx.shared import Pt

In [2]:
import os

# Ensure TensorFlow is disabled (avoids Keras issues); set these before importing sentence_transformers
os.environ["USE_TF"] = "0"
os.environ["TRANSFORMERS_NO_TF"] = "1"

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import docx


Load the raw document (.docx) 

In [3]:
from docx import Document

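def load_docx(file_path):
    """Return the document's non-empty paragraphs joined by newlines."""
    doc = Document(file_path)
    # Note: doc.paragraphs covers body paragraphs only; text inside tables is not included
    return "\n".join(para.text for para in doc.paragraphs if para.text.strip())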

# Path is relative to the notebook's working directory (Windows-style separator)
text = load_docx(r"Data\SEC5 - OPENING OF ACCOUNTS (004).docx.docx")
print(text[:500])  # preview first 500 characters


OPENING OF ACCOUNTS
TABLE OF CONTENTS
1	INTRODUCTION	11
1.1	General	11
2	ACCOUNT OPENING REQUIREMENTS	11
2.1	Know Your Customer (KYC)	11
2.2	Account Opening Requirements on Referee	11
2.3	Documentation Required for Account Opening	12
2.4	Account Opening Requirements for Foreign Nationals	34
2.4.1	Resident Foreign Nationals	34
2.4.2	Account Opening Requirements for Non-Resident Foreign Nationals (As Per CBK Prudential Guidelines).	35
2.4.3	Minimum requirements for Resident Foreigners	35
2.4.4	Acc
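
The preview above shows that the extracted text begins with table-of-contents entries (section number, title, page number), which carry little meaning for retrieval. A minimal sketch of filtering them out follows. It assumes the entries arrive as tab-separated lines, as they appear in the preview, and the names `TOC_ENTRY` and `drop_toc_entries` are hypothetical. The sample string is made up to mirror the preview rather than read from the file.

```python
import re

# Matches lines such as "2.4.1<TAB>Resident Foreign Nationals<TAB>34":
# a dotted section number, a tab, a title, a tab, then a page number.
TOC_ENTRY = re.compile(r"^\d+(?:\.\d+)*\t.+\t\d+$")


def drop_toc_entries(text):
    """Remove lines that look like table-of-contents entries (assumed format)."""
    kept = [line for line in text.splitlines() if not TOC_ENTRY.match(line.strip())]
    return "\n".join(kept)


# Made-up sample shaped like the preview output
sample = (
    "OPENING OF ACCOUNTS\n"
    "1\tINTRODUCTION\t11\n"
    "2.4.1\tResident Foreign Nationals\t34\n"
    "An ordinary body paragraph."
)
print(drop_toc_entries(sample))
```

Section headings in the body that lack a trailing page number are left untouched, so the filter should be checked against the real document before relying on it.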
