# Testing & Evaluation
This notebook evaluates the CTR model, the embedding-based retrieval,
and the end-to-end ad ranking system.


Install Dependencies

In [None]:
!pip install -q sentence-transformers faiss-cpu scikit-learn pandas numpy joblib matplotlib
!pip install -q gdown


Imports

In [None]:
import numpy as np
import pandas as pd
import joblib
import faiss

from sentence_transformers import SentenceTransformer
from sklearn.metrics import roc_auc_score, log_loss, precision_recall_curve, auc
import matplotlib.pyplot as plt


Download Model Files in Colab

In [None]:
import gdown

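# Download the entire Drive folder (replace the URL with your own folder link).
# The cells below expect the downloaded files under models/.
folder_url = "https://drive.google.com/drive/folders/1gcAqlsae7r9-af-X7rUlAe_lfCggqODv"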
!gdown --folder {folder_url}

Load Models & Data

In [None]:
# Load models
ctr_model = joblib.load("models/ctr_model.pkl")
ad_embeddings = np.load("models/ad_embeddings.npy")

# Load FAISS index
index = faiss.read_index("models/faiss.index")

# Load embedder
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Ads
ads = pd.DataFrame({
    "ad_id": range(1, 7),
    "ad_text": [
        "Boost your startup productivity with AI tools",
        "Learn Python and Machine Learning from industry experts",
        "Travel smarter with exclusive flight deals",
        "Upgrade your home gym with smart fitness equipment",
        "Secure your business with cloud security solutions",
        "Discover healthy meal plans tailored for you"
    ],
    "historical_ctr": [0.042, 0.061, 0.033, 0.029, 0.054, 0.038]
})


Load Test Articles

In [None]:
url = "https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/train.csv"
articles = pd.read_csv(url, header=None)
articles.columns = ["label", "title", "description"]
articles["text"] = articles["title"] + " " + articles["description"]

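# Sample 300 articles as the evaluation set. These rows come from train.csv,
# so this is only a true holdout if the CTR model was not trained on them.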
articles = articles.sample(300, random_state=7).reset_index(drop=True)
articles.head()


**CTR MODEL EVALUATION**

Rebuild Test Feature Matrix

In [None]:
article_embeddings = embedder.encode(
    articles["text"].tolist(),
    show_progress_bar=True
)

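# Fixed seed so the simulated click labels are reproducible across runs
rng = np.random.default_rng(7)

X_test, y_test = [], []

for art_emb in article_embeddings:
    for i, ad_emb in enumerate(ad_embeddings):
        # Features: article-ad similarity (dot product) and the ad's historical CTR
        sim = np.dot(art_emb, ad_emb)
        ctr = ads.iloc[i]["historical_ctr"]

        X_test.append([sim, ctr])
        # Simulated click label: Bernoulli draw with probability min(10 * ctr, 0.5)
        y_test.append(rng.binomial(1, min(ctr * 10, 0.5)))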

X_test = np.array(X_test)
y_test = np.array(y_test)

X_test.shape, y_test.mean()


CTR Metrics

In [None]:
preds = ctr_model.predict_proba(X_test)[:, 1]

print("ROC-AUC:", roc_auc_score(y_test, preds))
print("Log Loss:", log_loss(y_test, preds))


Precision–Recall Curve

In [None]:
precision, recall, _ = precision_recall_curve(y_test, preds)
pr_auc = auc(recall, precision)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title(f"Precision-Recall Curve (AUC={pr_auc:.3f})")
plt.show()


**RETRIEVAL QUALITY (VECTOR SEARCH)**

Top-K Similarity Check

In [None]:
def retrieve_ads(article_text, k=3):
    emb = embedder.encode([article_text])
    faiss.normalize_L2(emb)
    scores, idx = index.search(emb, k)
    return ads.iloc[idx[0]]["ad_text"].tolist()

retrieve_ads("AI tools for improving startup productivity", k=3)


Qualitative Retrieval Test (Multiple Queries)

In [None]:
queries = [
    "machine learning course for engineers",
    "secure cloud infrastructure for companies",
    "fitness equipment for home workouts"
]

for q in queries:
    print("\nQUERY:", q)
    for ad in retrieve_ads(q):
        print("-", ad)


**RANKING SYSTEM EVALUATION**

Full Ranking Function

In [None]:
def rank_ads(article_text, top_k=3):
    emb = embedder.encode([article_text])
    faiss.normalize_L2(emb)

    scores, idx = index.search(emb, len(ads))

    ranked = []
    for i in idx[0]:
        sim = np.dot(emb[0], ad_embeddings[i])
        ctr = ads.iloc[i]["historical_ctr"]
        ctr_pred = ctr_model.predict_proba([[sim, ctr]])[0, 1]

        final_score = 0.7 * ctr_pred + 0.3 * sim
        ranked.append((ads.iloc[i]["ad_text"], final_score))

    ranked.sort(key=lambda x: x[1], reverse=True)
    return ranked[:top_k]


End-to-End Test

In [None]:
article = "This article discusses how AI startups can scale faster using cloud tools"

results = rank_ads(article)

for ad, score in results:
    print(f"{score:.3f} — {ad}")


**SIMPLE SYSTEM-LEVEL METRIC**

Mean Reciprocal Rank (MRR)

In [None]:
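import re

def mean_reciprocal_rank(articles, relevant_keyword):
    # Whole-word, case-insensitive match, so "AI" does not match "tailored"
    pattern = re.compile(rf"\b{re.escape(relevant_keyword)}\b", re.IGNORECASE)

    ranks = []
    for text in articles["text"][:50]:
        ranked_ads = rank_ads(text, top_k=len(ads))
        rr = 0.0  # articles with no matching ad contribute 0
        for idx, (ad, _) in enumerate(ranked_ads):
            if pattern.search(ad):
                rr = 1 / (idx + 1)
                break
        ranks.append(rr)
    # Guard against an empty input, where np.mean would return nan
    return float(np.mean(ranks)) if ranks else 0.0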

mean_reciprocal_rank(articles, "AI")


## Evaluation Summary

### CTR Model Evaluation
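In the recorded run, the CTR model reached a ROC-AUC of **0.616**, a Log Loss of **0.655**, and a Precision-Recall AUC of **0.495**. A ROC-AUC of 0.5 corresponds to random ranking, so the model separates clicks from non-clicks only modestly. The click labels are simulated from each ad's historical CTR rather than observed, so these metrics describe how well the model recovers that simulated process, not real user behavior.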
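
To put these numbers in context, it helps to compare them with a predictor that ignores the features. The sketch below assumes the labels follow the simulation used in this notebook (a click with probability `min(10 * ctr, 0.5)`, with each ad appearing equally often). It computes the expected positive rate, which is the PR-AUC a random ranker would reach, and the log loss of always predicting that rate. The function names are illustrative and not part of the notebook.

```python
import math

# Historical CTRs of the six ads used in this notebook
historical_ctr = [0.042, 0.061, 0.033, 0.029, 0.054, 0.038]

def expected_positive_rate(ctrs, scale=10, cap=0.5):
    """Expected share of clicks when each ad is labelled Bernoulli(min(scale * ctr, cap))."""
    probs = [min(scale * c, cap) for c in ctrs]
    return sum(probs) / len(probs)

def constant_log_loss(p):
    """Log loss of always predicting probability p when the true positive rate is p."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

base_rate = expected_positive_rate(historical_ctr)
baseline_loss = constant_log_loss(base_rate)
print(f"expected positive rate: {base_rate:.3f}")
print(f"constant-predictor log loss: {baseline_loss:.3f}")
```

Under these assumptions the baseline positive rate is about 0.40 and the constant-predictor log loss about 0.67, so the reported PR-AUC and Log Loss are modest improvements over a feature-free baseline.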

### Retrieval Quality (Vector Search)
The `retrieve_ads` function embeds the query text with the sentence-embedding model and searches the FAISS index for the nearest ads. For the query 'AI tools for improving startup productivity', it retrieved 'Boost your startup productivity with AI tools' as a top result, and the top-k ads for the other test queries also looked relevant. These checks are qualitative: no relevance labels are available, so retrieval quality is judged by inspection rather than by a recall or precision figure.
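
One way to sanity-check the index is to compare its results with an exact search over the same embeddings. The sketch below assumes the FAISS index scores by inner product over L2-normalized vectors, which the notebook's query normalization suggests but does not confirm. `brute_force_top_k` is a hypothetical helper, and the 2-D vectors are toy stand-ins for real embeddings.

```python
import numpy as np

def brute_force_top_k(query_vec, item_vecs, k=3):
    """Return indices and cosine scores of the k items most similar to the query."""
    q = np.asarray(query_vec, dtype=np.float32)
    items = np.asarray(item_vecs, dtype=np.float32)
    # L2-normalize so the dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    scores = items @ q
    # Sort by descending score and keep the first k indices
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Toy 2-D vectors standing in for real ad embeddings
toy_ads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top_idx, top_scores = brute_force_top_k([1.0, 0.1], toy_ads, k=2)
print(top_idx, top_scores)
```

With only six ads, an exact search like this is cheap, so any disagreement with the index output would point to a mismatch in normalization or in the order of the stored embeddings.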

### Ranking System Evaluation
The `rank_ads` function combines predicted CTR and semantic similarity into a final score of 0.7 × predicted CTR + 0.3 × similarity. In an end-to-end test with the article "This article discusses how AI startups can scale faster using cloud tools", the top results included relevant ads such as "Boost your startup productivity with AI tools", which received a final score of 0.504.
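
The effect of the 0.7/0.3 weighting is easier to see on a few numbers. The sketch below applies the same blend to hypothetical (ad, predicted CTR, similarity) triples; the ad names and values are made up for illustration and are not outputs of the notebook.

```python
def blend_score(ctr_pred, sim, w_ctr=0.7, w_sim=0.3):
    """Weighted blend of predicted CTR and similarity, as in rank_ads."""
    return w_ctr * ctr_pred + w_sim * sim

# Hypothetical (ad, predicted CTR, similarity) triples, for illustration only
candidates = [
    ("ad_a", 0.40, 0.60),
    ("ad_b", 0.45, 0.20),
    ("ad_c", 0.35, 0.80),
]

# Score every candidate and sort from highest to lowest final score
ranked = sorted(
    ((name, blend_score(ctr, sim)) for name, ctr, sim in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

Here `ad_c` has the lowest predicted CTR but ranks first, because its similarity advantage outweighs the small CTR gap even at a weight of 0.3.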

### System-Level Metric (Mean Reciprocal Rank - MRR)
The Mean Reciprocal Rank (MRR) measures how high the first relevant ad appears in each ranking, where an ad counts as relevant if its text contains the given keyword. With the keyword 'AI', evaluated over the first 50 sampled articles (which are not filtered by topic), the MRR was **0.330** in the recorded run. Since six ads are ranked, this roughly corresponds to the first matching ad appearing around the third position.
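
How the keyword is matched affects this number. The sketch below, using hypothetical helper names, computes reciprocal rank with whole-word matching and counts rankings with no matching ad as 0. The toy rankings reuse ad texts from this notebook.

```python
import re

def reciprocal_rank(ranked_texts, keyword):
    """1/rank of the first text containing keyword as a whole word, else 0.0."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    for position, text in enumerate(ranked_texts, start=1):
        if pattern.search(text):
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank_of(rankings, keyword):
    """Average reciprocal rank over several ranked lists; misses count as 0."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank(r, keyword) for r in rankings) / len(rankings)

# Three toy rankings built from ad texts in this notebook
rankings = [
    ["Boost your startup productivity with AI tools",
     "Travel smarter with exclusive flight deals"],
    ["Discover healthy meal plans tailored for you",
     "Travel smarter with exclusive flight deals",
     "Boost your startup productivity with AI tools"],
    ["Discover healthy meal plans tailored for you"],
]
mrr = mean_reciprocal_rank_of(rankings, "AI")
print(f"MRR: {mrr:.3f}")
```

With plain substring matching, 'ai' also occurs inside 'tailored', so the meal-plan ad would count as a hit and inflate the score. Whole-word matching avoids that.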