<img src="awin_peec_logos.png" alt="PEEC x Awin Insights Dashboard" width="30%">

# The Unofficial AI Visibility and Affiliate Performance Cookbook

An unofficial experimental connector that overlays **AI citation visibility data** from [PEEC AI](https://peec.ai) with **affiliate transaction data** from [Awin](https://awin.com), helping you understand which publisher domains are both cited by AI models and driving affiliate revenue.

> **Disclaimer**: This is **not** an official product of PEEC AI or Awin. It is a community project sharing example code of how affiliate and AI visibility platform data can be combined to yield insights. Use at your own discretion.

---

**What you can do with this tool:**
- Pull AI citation data from PEEC — which domains/URLs do AI models (ChatGPT, Gemini, Perplexity, Claude, etc.) cite?
- Pull Awin transaction data for your advertiser account
- Match PEEC-cited domains to Awin publisher domains using normalised hostname matching
- Build enriched reports combining citation metrics with transaction revenue
- Run gap analysis to find high-citation domains with no Awin relationship — potential recruitment targets

---

| Step | What You'll Do |
|---|---|
| 1 | Set up workspace, download scripts, configure session |
| 2 | Load styles & PEEC client |
| 3 | Pull PEEC citation data |
| 4 | Generate domain & URL reports |
| 5 | Pull Awin transaction data |
| 6 | Match domains & build enriched report |
| 7 | Gap analysis: find recruitment targets |

---

**Run each cell in order.** Step 1 handles all setup automatically.

*Example output — enriched report combining AI citation data with affiliate transaction metrics:*

![PEEC x Awin Insights Dashboard](peec_awin_insights_dashboard.png)

# Step 1: Setup & Configure Session

This cell handles everything you need to get started:
1. Installs dependencies
2. Detects your environment (Colab or local)
3. Sets up a workspace folder
4. Downloads the latest scripts from GitHub
5. Configures your API keys, project, date range, and advertiser ID

---

<details>
<summary><b>API Keys — Colab setup (click to expand)</b></summary>

1. Click the key icon in the left sidebar (Secrets)
2. Add two secrets:
   - Name: `PEEC_API_KEY` — Value: your PEEC AI API key
   - Name: `AWAPI` — Value: your Awin API token
3. Toggle "Notebook access" ON for both
4. Run this cell

</details>

<details>
<summary><b>API Keys — Local setup (click to expand)</b></summary>

1. Create a `.env` file in your project folder
2. Add:
   ```
   PEEC_API_KEY=your-peec-key-here
   AWAPI=your-awin-token-here
   ```
3. Run this cell

</details>
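With either setup, the notebook expects two names, `PEEC_API_KEY` and `AWAPI`, to be readable. The sketch below shows one possible lookup order: Colab Secrets first, then a `.env` file via python-dotenv, then plain environment variables. `load_api_key` is a hypothetical helper written for illustration. The real logic lives in `cell_01_session_config.py` and may differ.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(name: str) -> Optional[str]:
    """Hypothetical helper: return a key from Colab Secrets, a .env file, or the environment."""
    # 1. Colab Secrets: google.colab is only importable inside Colab
    try:
        from google.colab import userdata  # type: ignore
    except ImportError:
        userdata = None
    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            pass  # secret missing or "Notebook access" not toggled on

    # 2. A .env file in the current folder (python-dotenv; skipped if not installed)
    try:
        from dotenv import load_dotenv
        load_dotenv(Path.cwd() / ".env")  # variables already in the environment win
    except ImportError:
        pass

    # 3. Plain environment variable (also populated by load_dotenv above)
    return os.environ.get(name) or None
```

Usage: `peec_key = load_api_key("PEEC_API_KEY")` and `awin_token = load_api_key("AWAPI")`. A `None` result means neither setup route supplied the key.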

---

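**Run the cell below** to install dependencies and set up your workspace (in Colab, choose a location and click **"Set Up Workspace"**). Then run the *Configure Your Session* cell and click **"Confirm Settings"** once you've set your project, dates, and advertiser ID.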

In [None]:
# =============================================================================
# BOOTSTRAP: Pip installs + Workspace + Script Download
# =============================================================================

import json
import os
import subprocess
import sys
import urllib.request
import urllib.error
from datetime import datetime
from pathlib import Path

# ── Pip installs (silent) ──────────────────────────────────────────
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--quiet",
     "requests", "pandas", "python-dotenv", "ipywidgets"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print("\u2705 Dependencies installed.")

import ipywidgets as widgets
from IPython.display import display, clear_output

# ── Environment detection ─────────────────────────────────────────
try:
    from google.colab import drive  # type: ignore
except ImportError:
    drive = None

IN_COLAB = "google.colab" in sys.modules

# ── GitHub settings ───────────────────────────────────────────────
GITHUB_REPO = "smartaces/peec-awin-connector"
GITHUB_BRANCH = "main"
GITHUB_RAW_BASE = f"https://raw.githubusercontent.com/{GITHUB_REPO}/{GITHUB_BRANCH}/scripts"

SCRIPT_FILES = [
    "cell_00_pip_installs.py",
    "cell_01_session_config.py",
    "cell_02_css_styling.py",
    "cell_03_peec_client.py",
    "cell_04_peec_data_pull.py",
    "cell_05_domain_report.py",
    "cell_06_url_report.py",
    "cell_07_awin_transactions.py",
    "cell_09_enriched_report.py",
    "cell_10_gap_analysis.py",
]


# ── Detect local scripts in scripts/ subfolder next to notebook ───
def _local_scripts_available():
    """Check if cell scripts exist in a scripts/ folder next to the notebook."""
    return (Path.cwd() / "scripts" / "cell_01_session_config.py").is_file()


# ── Location selection ────────────────────────────────────────────
def _get_location_options():
    if IN_COLAB:
        return [
            ("Google Drive (/content/drive/MyDrive)", "drive"),
            ("Colab Temporary (/content)", "colab"),
            ("Local Folder (current directory)", "local"),
        ]
    return [("Local Folder (current directory)", "local")]


def _resolve_base_path(selection):
    if selection == "drive":
        mount_point = Path("/content/drive")
        if not mount_point.exists() or not os.path.ismount(mount_point):
            print("\U0001f50c Mounting Google Drive...")
            drive.mount(str(mount_point))
        return mount_point / "MyDrive"
    elif selection == "colab":
        return Path("/content")
    return Path.cwd()


def _download_scripts(target_dir):
    """Download all scripts from GitHub into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    print(f"\U0001f4e5 Downloading latest scripts from GitHub into {target_dir} ...")
    success = 0
    for filename in SCRIPT_FILES:
        url = f"{GITHUB_RAW_BASE}/{filename}"
        dest = target_dir / filename
        try:
            req = urllib.request.Request(url, headers={"User-Agent": "PEEC-Awin-Connector"})
            with urllib.request.urlopen(req, timeout=30) as resp:
                content = resp.read()
            with open(dest, "wb") as fp:
                fp.write(content)
            print(f"   \u2022 {filename} \u2713")
            success += 1
        except (urllib.error.URLError, urllib.error.HTTPError, IOError) as e:
            print(f"   \u2022 {filename} \u2717 ({e})")
    print(f"\u2705 Downloaded {success}/{len(SCRIPT_FILES)} scripts.")
    return success == len(SCRIPT_FILES)


# ── Helper: read a script with UTF-8 encoding ────────────────────
def _read_script(path):
    """Read a script file with explicit UTF-8 encoding (needed on Windows)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# ── Workspace setup (runs immediately — no button needed locally) ─
import __main__

base = _resolve_base_path("local") if not IN_COLAB else None

if not IN_COLAB:
    # Local: set up workspace automatically
    workspace = (base / "peec_awin_workspace").resolve()
    workspace.mkdir(parents=True, exist_ok=True)
    for name in ["output", "logs"]:
        (workspace / name).mkdir(parents=True, exist_ok=True)

    if _local_scripts_available():
        scripts_dir = Path.cwd() / "scripts"
        print(f"\U0001f4c2 Using local scripts from: {scripts_dir}")
    else:
        scripts_dir = workspace / "scripts"
        scripts_dir.mkdir(parents=True, exist_ok=True)
        existing = sum(1 for f in SCRIPT_FILES if (scripts_dir / f).is_file())
        if existing < len(SCRIPT_FILES):
            _download_scripts(scripts_dir)
        else:
            print(f"\u2705 {existing}/{len(SCRIPT_FILES)} scripts already in workspace.")

    __main__.IN_COLAB = IN_COLAB
    __main__.WORKSPACE_ROOT = workspace
    __main__.PATHS = {
        "scripts": scripts_dir,
        "output": workspace / "output",
        "logs": workspace / "logs",
    }
    os.environ["WORKSPACE_ROOT"] = str(workspace)

    print(f"\U0001f4c2 Workspace: {workspace}")
    print(f"   Output:  {workspace / 'output'}")
    print(f"\n\u2705 Ready. Run the next cell to configure your session.")

else:
    # Colab: show location picker + setup button
    _loc_dd = widgets.Dropdown(
        options=_get_location_options(),
        value="drive",
        description="Location:",
        style={"description_width": "70px"},
        layout=widgets.Layout(width="420px"),
    )
    _setup_btn = widgets.Button(
        description="  Set Up Workspace", button_style="info",
        icon="folder-open", layout=widgets.Layout(width="200px", height="36px"),
    )
    _update_btn = widgets.Button(
        description="  Download Latest from GitHub", button_style="warning",
        icon="download", layout=widgets.Layout(width="260px", height="36px"),
    )
    _setup_output = widgets.Output()

    def _on_setup(b):
        with _setup_output:
            clear_output()
            try:
                cb = _resolve_base_path(_loc_dd.value)
                ws = (cb / "peec_awin_workspace").resolve()
                ws.mkdir(parents=True, exist_ok=True)
                for name in ["output", "logs"]:
                    (ws / name).mkdir(parents=True, exist_ok=True)

                sd = ws / "scripts"
                sd.mkdir(parents=True, exist_ok=True)
                existing = sum(1 for f in SCRIPT_FILES if (sd / f).is_file())
                if existing < len(SCRIPT_FILES):
                    _download_scripts(sd)
                else:
                    print(f"\u2705 {existing}/{len(SCRIPT_FILES)} scripts already in workspace.")

                __main__.IN_COLAB = IN_COLAB
                __main__.WORKSPACE_ROOT = ws
                __main__.PATHS = {
                    "scripts": sd,
                    "output": ws / "output",
                    "logs": ws / "logs",
                }
                os.environ["WORKSPACE_ROOT"] = str(ws)

                print(f"\U0001f4c2 Workspace: {ws}")
                print(f"\n\u2705 Ready. Run the next cell to configure your session.")
            except Exception as e:
                print(f"\u274c Error: {e}")

    def _on_update(b):
        with _setup_output:
            clear_output()
            if hasattr(__main__, "PATHS") and __main__.PATHS is not None:
                _download_scripts(Path(__main__.PATHS["scripts"]))
            else:
                print("\u26a0\ufe0f Set up workspace first.")

    _setup_btn.on_click(_on_setup)
    _update_btn.on_click(_on_update)

    display(
        widgets.HTML("<h3>\U0001f4c1 Workspace Setup</h3>"),
        widgets.HTML("<p>Choose where to store project files, then click <b>Set Up Workspace</b>.</p>"),
        _loc_dd,
        widgets.HBox([_setup_btn, _update_btn], layout=widgets.Layout(margin="8px 0 10px 0")),
        _setup_output,
    )

### Configure Your Session

Set your PEEC project, reporting date range (e.g. 1 February 2026 to 5 February 2026; the same range is applied automatically to both the PEEC and Awin data pulls), and Awin advertiser ID below, then click **"Confirm Settings"**.
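A single date range drives both pulls, so it helps to keep the settings in one object and validate them once. This is a minimal sketch. The field names `peec_project_id`, `awin_advertiser_id`, `start_date` and `end_date` are hypothetical and are not necessarily what `cell_01_session_config.py` stores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical container for the values chosen in the settings widget."""
    peec_project_id: str
    awin_advertiser_id: int
    start_date: date
    end_date: date

    def __post_init__(self):
        # Reject an inverted range before any API call is made
        if self.start_date > self.end_date:
            raise ValueError("start_date must be on or before end_date")

    @property
    def days(self) -> int:
        # Inclusive of both ends: 1 Feb to 5 Feb counts as 5 days
        return (self.end_date - self.start_date).days + 1


config = SessionConfig("my-project", 12345, date(2026, 2, 1), date(2026, 2, 5))
print(config.days)  # 5
```

The object is frozen, so the PEEC and Awin steps cannot drift apart by editing their own copies of the dates.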

In [None]:
# Session Configuration — API keys, project, dates, advertiser ID
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_01_session_config.py"),
     str(scripts_dir / "cell_01_session_config.py"), "exec"))

# Step 2: Load Styles & PEEC Client

This cell loads the visual styling and initialises the PEEC API client with your project's lookup tables (prompts, tags, topics, models).

**Run the cell below.** You should see confirmation that the client is ready.
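Lookup tables like these are typically used to turn the IDs in raw citation rows into readable labels. This is a minimal sketch under two assumptions that do not come from PEEC's documented response format: each lookup arrives as a list of records with `id` and `name` fields, and citation rows carry fields such as `model_id` and `prompt_id`. `build_lookup` and `resolve` are hypothetical helpers.

```python
from typing import Dict, Iterable, Mapping


def build_lookup(records: Iterable[Mapping], key: str = "id", label: str = "name") -> Dict:
    """Map each record's id to its display name, skipping incomplete records."""
    return {r[key]: r[label] for r in records if key in r and label in r}


def resolve(row: Mapping, lookups: Dict[str, Dict]) -> dict:
    """Return a copy of row with a readable label added for each <field>_id it contains."""
    out = dict(row)
    for field, table in lookups.items():
        raw_id = row.get(f"{field}_id")
        if raw_id is not None:
            out[field] = table.get(raw_id, raw_id)  # fall back to the raw id if unknown
    return out


models = build_lookup([{"id": "m1", "name": "ChatGPT"}, {"id": "m2", "name": "Gemini"}])
prompts = build_lookup([{"id": "p7", "name": "best travel insurance"}])
row = resolve({"url": "https://example.com/a", "model_id": "m2", "prompt_id": "p9"},
              {"model": models, "prompt": prompts})
```

If an ID is missing from its table, `resolve` keeps the raw value. A stale lookup then shows up as an unlabelled ID rather than failing the whole report.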

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])

# Load CSS styling
exec(compile(_read_script(scripts_dir / "cell_02_css_styling.py"),
     str(scripts_dir / "cell_02_css_styling.py"), "exec"))

# Load PEEC client, helpers, and lookups
exec(compile(_read_script(scripts_dir / "cell_03_peec_client.py"),
     str(scripts_dir / "cell_03_peec_client.py"), "exec"))

# Step 3: Pull PEEC Citation Data

This cell fetches citation data from PEEC AI for your configured date range and project. It pulls:
- Domain classifications
- URL-level citation report broken down by prompt and model

**Click "Pull Data"** to fetch the data. Once complete, you can run the reports below.

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_04_peec_data_pull.py"),
     str(scripts_dir / "cell_04_peec_data_pull.py"), "exec"))

# Step 4: PEEC Reports

Two report views from your citation data:

| Report | What it shows |
|---|---|
| **Domain Report** | One row per domain: total citations, average position, unique pages, models |
| **URL Report** | One row per URL, with clickable links and prompt counts |

Use the filters to drill down, then download as CSV.

### Domain-Level Report

One row per domain — aggregate citation metrics across all URLs. Filter by page type, domain type, or model.
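Domain-level metrics like these can be computed from citation rows with a single pandas `groupby`. This sketch assumes the pulled data has been flattened to one row per citation, with hypothetical `domain`, `url`, `model` and `position` columns. The real column names may differ.

```python
import pandas as pd

# Hypothetical long-format citation rows: one row per citation of a URL by a model
rows = pd.DataFrame({
    "domain":   ["a.com", "a.com", "a.com", "b.org"],
    "url":      ["https://a.com/x", "https://a.com/x", "https://a.com/y", "https://b.org/z"],
    "model":    ["chatgpt", "gemini", "chatgpt", "perplexity"],
    "position": [1, 3, 2, 4],
})

domain_report = (
    rows.groupby("domain")
    .agg(
        total_citations=("url", "size"),   # every citation row counts once
        avg_position=("position", "mean"),
        unique_pages=("url", "nunique"),   # distinct URLs cited on the domain
        models=("model", lambda s: ", ".join(sorted(s.unique()))),
    )
    .sort_values("total_citations", ascending=False)
    .reset_index()
)
```

Calling `domain_report.to_csv("domain_report.csv", index=False)` produces a CSV export similar to the widget's download.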

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_05_domain_report.py"),
     str(scripts_dir / "cell_05_domain_report.py"), "exec"))

### URL / Page-Level Report

One row per URL — drill down to individual pages with clickable links. Use the filters to narrow by page type, domain type, model, or keyword.
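A URL-level view can be built with a pandas `groupby` on `url`. The sketch below uses hypothetical `url` and `prompt_id` columns and hypothetical helpers `filter_keyword` and `as_link`. It counts citations and distinct prompts per page, applies a case-insensitive literal keyword filter, and renders URLs as HTML-escaped links.

```python
import html

import pandas as pd

# Hypothetical citation rows: one row per (url, prompt) citation
rows = pd.DataFrame({
    "url":       ["https://a.com/best-laptops", "https://a.com/best-laptops", "https://b.org/review"],
    "prompt_id": ["p1", "p2", "p1"],
})

url_report = (
    rows.groupby("url")
    .agg(citations=("prompt_id", "size"), prompts=("prompt_id", "nunique"))
    .reset_index()
)


def filter_keyword(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Keep rows whose URL contains keyword (case-insensitive, no regex)."""
    return df[df["url"].str.contains(keyword, case=False, regex=False)]


def as_link(url: str) -> str:
    """Render a URL as a clickable anchor, escaping it for safe HTML output."""
    safe = html.escape(url, quote=True)
    return f'<a href="{safe}" target="_blank" rel="noopener">{safe}</a>'


laptops = filter_keyword(url_report, "LAPTOP")
```

In a notebook, `display(HTML(laptops.assign(url=laptops["url"].map(as_link)).to_html(escape=False, index=False)))` renders the links as clickable. This needs `from IPython.display import HTML, display`. Passing `escape=False` is safe here only because `as_link` already escaped each URL.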

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_06_url_report.py"),
     str(scripts_dir / "cell_06_url_report.py"), "exec"))

# Step 5: Pull Awin Transaction Data

This cell fetches transaction-level data from the Awin API for your configured advertiser and date range.

The Awin API has a 31-day maximum window per request, so the tool automatically chunks larger date ranges.
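Chunking a range this way is plain date arithmetic. This is a minimal sketch that assumes inclusive calendar-day windows of at most 31 days. `chunk_date_range` and `awin_query_params` are hypothetical helpers. The `startDate`/`endDate`/`timezone` parameter names and the `yyyy-MM-ddThh:mm:ss` format follow Awin's Transactions API documentation, but check them against the current docs before relying on them.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def chunk_date_range(start: date, end: date, max_days: int = 31) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most max_days days."""
    if start > end:
        raise ValueError("start must be on or before end")
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)  # next window starts the following day
    return chunks


def awin_query_params(window: Tuple[date, date], timezone: str = "UTC") -> Dict[str, str]:
    """Hypothetical helper: format one window as Awin-style query parameters covering whole days."""
    start, end = window
    return {
        "startDate": f"{start.isoformat()}T00:00:00",
        "endDate": f"{end.isoformat()}T23:59:59",
        "timezone": timezone,
    }


windows = chunk_date_range(date(2026, 1, 1), date(2026, 3, 15))
# [(2026-01-01, 2026-01-31), (2026-02-01, 2026-03-03), (2026-03-04, 2026-03-15)]
```

Each window would become one request, and the per-window results would then be concatenated (for example with `pd.concat`). Because the windows never overlap, a transaction should not be fetched twice. De-duplicating on the transaction ID is still a cheap safeguard.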

**Click "Pull Transactions"** to fetch. You can optionally filter by transaction status.

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_07_awin_transactions.py"),
     str(scripts_dir / "cell_07_awin_transactions.py"), "exec"))

# Step 6: Enriched Report — Domain Match + Citations + Transactions

This cell does everything in one step:

1. **Domain Match**: Matches PEEC-cited domains to Awin publisher domains using normalised hostnames
2. **Publisher Names**: Pulls the Awin publisher report to get accurate publisher names (saved as a separate CSV)
3. **AI Models**: Adds the AI models that cite each domain, shown as 3-character codes
4. **Exclude Filter**: Removes noise domains (e.g. google, facebook)
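The matching step depends on reducing both sides to the same hostname form. The sketch below shows one plausible normalisation: lowercase the value, then strip the scheme, port, trailing dot and a leading `www.`. It follows that with a substring-based exclude filter and a left merge that flags matches. The sample frames, column names and normalisation rules are assumptions and are not necessarily what `cell_09_enriched_report.py` does.

```python
from urllib.parse import urlsplit

import pandas as pd


def normalise_host(value: str) -> str:
    """Reduce a URL or bare domain to a comparable hostname."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # make urlsplit treat a bare domain as the network location
    host = (urlsplit(value).hostname or "").rstrip(".")
    return host[4:] if host.startswith("www.") else host


# Hypothetical inputs: PEEC citation totals per domain, Awin publishers with their site URLs
peec = pd.DataFrame({"domain": ["www.Example.com", "reviews.test.org", "google.com"],
                     "citations": [12, 5, 40]})
awin = pd.DataFrame({"publisher_url": ["https://example.com/", "http://other.net"],
                     "publisher_id": [101, 202],
                     "commission": [55.0, 10.0]})

EXCLUDE = ("google", "facebook")  # substring match, so "google" also drops "google.co.uk"

peec["host"] = peec["domain"].map(normalise_host)
awin["host"] = awin["publisher_url"].map(normalise_host)
peec = peec[~peec["host"].map(lambda h: any(term in h for term in EXCLUDE))]

# Left merge keeps every cited domain; _merge == "both" marks an Awin match
enriched = peec.merge(awin, on="host", how="left", indicator=True)
matched = enriched[enriched["_merge"] == "both"]
```

Exact hostname matching treats `blog.example.com` and `example.com` as different publishers. Collapsing both to the registrable domain would require the Public Suffix List (for example via the `tldextract` package), at the cost of merging sub-sites that really are separate.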

**Click "Build Enriched Report"** to run. Use the exclude filter to clean up the output.

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_09_enriched_report.py"),
     str(scripts_dir / "cell_09_enriched_report.py"), "exec"))

# Step 7: Gap Analysis

This cell identifies domains and pages that AI models cite frequently but where you have **no Awin publisher relationship**. These are your potential recruitment targets.

Use the filters to:
- Focus on specific domain types (e.g. editorial, blog)
- Search for domains containing specific keywords
- Exclude noise domains (google, wikipedia, youtube, etc.)
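At its core, a gap analysis like this is an anti-join: take the cited hosts, remove the hosts that have an Awin relationship, then narrow the result with the filters above. This sketch uses hypothetical column names (`host`, `domain_type`, `citations`) and a hypothetical `find_gaps` helper. It assumes hosts were already normalised the same way on both sides.

```python
import re
from typing import Iterable, Optional

import pandas as pd


def find_gaps(cited: pd.DataFrame, awin_hosts: Iterable[str],
              domain_types: Optional[Iterable[str]] = None,
              keyword: Optional[str] = None,
              exclude: Iterable[str] = ()) -> pd.DataFrame:
    """Return cited hosts with no Awin relationship, filtered and ranked by citations."""
    gaps = cited[~cited["host"].isin(set(awin_hosts))]  # anti-join against Awin publishers
    exclude = list(exclude)
    if exclude:  # guard: an empty pattern would match, and drop, every row
        pattern = "|".join(re.escape(term) for term in exclude)
        gaps = gaps[~gaps["host"].str.contains(pattern, case=False, regex=True)]
    if domain_types:
        gaps = gaps[gaps["domain_type"].isin(list(domain_types))]
    if keyword:
        gaps = gaps[gaps["host"].str.contains(keyword, case=False, regex=False)]
    return gaps.sort_values("citations", ascending=False).reset_index(drop=True)


cited = pd.DataFrame({
    "host": ["example.com", "bestgadgets.blog", "en.wikipedia.org", "techreviews.net", "youtube.com"],
    "domain_type": ["editorial", "blog", "reference", "editorial", "ugc"],
    "citations": [12, 9, 30, 7, 25],
})
targets = find_gaps(cited, awin_hosts={"example.com"},
                    domain_types=["editorial", "blog"],
                    exclude=["google", "wikipedia", "youtube"])
```

Ranking by citation count puts the most-cited unpartnered sites first. A recruitment shortlist would likely start from the top of `targets`.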

In [None]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_10_gap_analysis.py"),
     str(scripts_dir / "cell_10_gap_analysis.py"), "exec"))