# PEEC AI + Awin Connector

This notebook connects **PEEC AI citation data** with **Awin affiliate transaction data** to help you understand where AI models cite your publishers and identify recruitment opportunities.

**What you can do with this tool:**
- Pull AI citation data from PEEC (which domains/URLs do AI models cite?)
- Pull Awin transaction data for your advertiser account
- Match PEEC-cited domains to Awin publisher domains
- Build enriched reports combining citation + transaction metrics
- Run gap analysis to find high-citation domains with no Awin relationship

---

| Step | What you'll do |
|------|----------------|
| 1 | Set up workspace, download scripts, configure session |
| 2 | Load styles & PEEC client |
| 3 | Pull PEEC citation data |
| 4 | Generate domain & URL reports |
| 5 | Pull Awin transaction data |
| 6 | Match domains & build enriched report |
| 7 | Gap analysis: find recruitment targets |

---

**Run each cell in order.** Step 1 handles all setup automatically.

# Step 1: Setup & Configure Session

The two cells below handle everything you need to get started:
1. Installs dependencies
2. Detects your environment (Colab or local)
3. Sets up a workspace folder
4. Downloads the latest scripts from GitHub
5. Configures your API keys, project, date range, and advertiser ID

---

<details>
<summary><b>API Keys: Colab setup (click to expand)</b></summary>

1. Click the key icon in the left sidebar (Secrets)
2. Add two secrets:
   - Name: `PEEC_API_KEY`, Value: your PEEC AI API key
   - Name: `AWAPI`, Value: your Awin API token
3. Toggle "Notebook access" ON for both
4. Run this cell

</details>

<details>
<summary><b>API Keys: Local setup (click to expand)</b></summary>

1. Create a `.env` file in your project folder
2. Add:
   ```
   PEEC_API_KEY=your-peec-key-here
   AWAPI=your-awin-token-here
   ```
3. Run this cell

</details>
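
As a rough illustration of how the two setups above can feed the same code path, the sketch below reads both keys from Colab Secrets when running in Colab and from `.env` or the process environment otherwise. The function name `load_api_keys` is hypothetical, and this is not the code in `cell_01_session_config.py`; only the secret names `PEEC_API_KEY` and `AWAPI` come from this notebook.

```python
import os
from pathlib import Path


def load_api_keys(names=("PEEC_API_KEY", "AWAPI")):
    """Return {secret_name: value} from Colab Secrets or the environment.

    Hypothetical helper for illustration only.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is None:
        try:
            # python-dotenv is one of the packages the bootstrap cell installs.
            from dotenv import load_dotenv
            # Copies KEY=value pairs from .env into os.environ.
            # A missing file is ignored; existing variables are not overridden.
            load_dotenv(Path.cwd() / ".env")
        except ImportError:
            pass  # Fall back to variables already set in the environment.

    keys = {}
    for name in names:
        # In Colab, userdata.get raises its own error if the secret is
        # missing or "Notebook access" is switched off.
        value = userdata.get(name) if userdata is not None else os.environ.get(name)
        if not value:
            raise RuntimeError(f"Missing API key: {name}")
        keys[name] = value
    return keys
```

Calling `load_api_keys()` returns a dictionary keyed by secret name, or raises `RuntimeError` naming the first key it could not find.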

---

**Run the two cells below** (in Colab, click **"Set Up Workspace"** after the first one), then click **"Confirm Settings"** once you've set your project, dates, and advertiser ID.

In [1]:
# =============================================================================
# BOOTSTRAP: Pip installs + Workspace + Script Download
# =============================================================================

import json
import os
import subprocess
import sys
import urllib.request
import urllib.error
from datetime import datetime
from pathlib import Path

# ── Pip installs (silent) ──────────────────────────────────────────
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--quiet",
     "requests", "pandas", "python-dotenv", "ipywidgets"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print("\u2705 Dependencies installed.")

import ipywidgets as widgets
from IPython.display import display, clear_output

# ── Environment detection ─────────────────────────────────────────
try:
    from google.colab import drive  # type: ignore
except ImportError:
    drive = None

IN_COLAB = "google.colab" in sys.modules

# ── GitHub settings ───────────────────────────────────────────────
GITHUB_REPO = "smartaces/peec-awin-connector"
GITHUB_BRANCH = "main"
GITHUB_RAW_BASE = f"https://raw.githubusercontent.com/{GITHUB_REPO}/{GITHUB_BRANCH}/scripts"

SCRIPT_FILES = [
    "cell_00_pip_installs.py",
    "cell_01_session_config.py",
    "cell_02_css_styling.py",
    "cell_03_peec_client.py",
    "cell_04_peec_data_pull.py",
    "cell_05_domain_report.py",
    "cell_06_url_report.py",
    "cell_07_awin_transactions.py",
    "cell_09_enriched_report.py",
    "cell_10_gap_analysis.py",
]


# ── Detect local scripts in scripts/ subfolder next to notebook ───
def _local_scripts_available():
    """Check if cell scripts exist in a scripts/ folder next to the notebook."""
    return (Path.cwd() / "scripts" / "cell_01_session_config.py").is_file()


# ── Location selection ────────────────────────────────────────────
def _get_location_options():
    if IN_COLAB:
        return [
            ("Google Drive (/content/drive/MyDrive)", "drive"),
            ("Colab Temporary (/content)", "colab"),
            ("Local Folder (current directory)", "local"),
        ]
    return [("Local Folder (current directory)", "local")]


def _resolve_base_path(selection):
    if selection == "drive":
        mount_point = Path("/content/drive")
        if not mount_point.exists() or not os.path.ismount(mount_point):
            print("\U0001f50c Mounting Google Drive...")
            drive.mount(str(mount_point))
        return mount_point / "MyDrive"
    elif selection == "colab":
        return Path("/content")
    return Path.cwd()


def _download_scripts(target_dir):
    """Download all scripts from GitHub into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    print(f"\U0001f4e5 Downloading latest scripts from GitHub into {target_dir} ...")
    success = 0
    for filename in SCRIPT_FILES:
        url = f"{GITHUB_RAW_BASE}/{filename}"
        dest = target_dir / filename
        try:
            req = urllib.request.Request(url, headers={"User-Agent": "PEEC-Awin-Connector"})
            with urllib.request.urlopen(req, timeout=30) as resp:
                content = resp.read()
            with open(dest, "wb") as fp:
                fp.write(content)
            print(f"   \u2022 {filename} \u2713")
            success += 1
        except (urllib.error.URLError, urllib.error.HTTPError, IOError) as e:
            print(f"   \u2022 {filename} \u2717 ({e})")
    print(f"\u2705 Downloaded {success}/{len(SCRIPT_FILES)} scripts.")
    return success == len(SCRIPT_FILES)


# ── Helper: read a script with UTF-8 encoding ────────────────────
def _read_script(path):
    """Read a script file with explicit UTF-8 encoding (needed on Windows)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# ── Workspace setup (runs immediately; no button needed locally) ─
import __main__

base = _resolve_base_path("local") if not IN_COLAB else None

if not IN_COLAB:
    # Local: set up workspace automatically
    workspace = (base / "peec_awin_workspace").resolve()
    workspace.mkdir(parents=True, exist_ok=True)
    for name in ["output", "logs"]:
        (workspace / name).mkdir(parents=True, exist_ok=True)

    if _local_scripts_available():
        scripts_dir = Path.cwd() / "scripts"
        print(f"\U0001f4c2 Using local scripts from: {scripts_dir}")
    else:
        scripts_dir = workspace / "scripts"
        scripts_dir.mkdir(parents=True, exist_ok=True)
        existing = sum(1 for f in SCRIPT_FILES if (scripts_dir / f).is_file())
        if existing < len(SCRIPT_FILES):
            _download_scripts(scripts_dir)
        else:
            print(f"\u2705 {existing}/{len(SCRIPT_FILES)} scripts already in workspace.")

    __main__.IN_COLAB = IN_COLAB
    __main__.WORKSPACE_ROOT = workspace
    __main__.PATHS = {
        "scripts": scripts_dir,
        "output": workspace / "output",
        "logs": workspace / "logs",
    }
    os.environ["WORKSPACE_ROOT"] = str(workspace)

    print(f"\U0001f4c2 Workspace: {workspace}")
    print(f"   Output:  {workspace / 'output'}")
    print(f"\n\u2705 Ready. Run the next cell to configure your session.")

else:
    # Colab: show location picker + setup button
    _loc_dd = widgets.Dropdown(
        options=_get_location_options(),
        value="drive",
        description="Location:",
        style={"description_width": "70px"},
        layout=widgets.Layout(width="420px"),
    )
    _setup_btn = widgets.Button(
        description="  Set Up Workspace", button_style="info",
        icon="folder-open", layout=widgets.Layout(width="200px", height="36px"),
    )
    _update_btn = widgets.Button(
        description="  Download Latest from GitHub", button_style="warning",
        icon="download", layout=widgets.Layout(width="260px", height="36px"),
    )
    _setup_output = widgets.Output()

    def _on_setup(b):
        with _setup_output:
            clear_output()
            try:
                cb = _resolve_base_path(_loc_dd.value)
                ws = (cb / "peec_awin_workspace").resolve()
                ws.mkdir(parents=True, exist_ok=True)
                for name in ["output", "logs"]:
                    (ws / name).mkdir(parents=True, exist_ok=True)

                sd = ws / "scripts"
                sd.mkdir(parents=True, exist_ok=True)
                existing = sum(1 for f in SCRIPT_FILES if (sd / f).is_file())
                if existing < len(SCRIPT_FILES):
                    _download_scripts(sd)
                else:
                    print(f"\u2705 {existing}/{len(SCRIPT_FILES)} scripts already in workspace.")

                __main__.IN_COLAB = IN_COLAB
                __main__.WORKSPACE_ROOT = ws
                __main__.PATHS = {
                    "scripts": sd,
                    "output": ws / "output",
                    "logs": ws / "logs",
                }
                os.environ["WORKSPACE_ROOT"] = str(ws)

                print(f"\U0001f4c2 Workspace: {ws}")
                print(f"\n\u2705 Ready. Run the next cell to configure your session.")
            except Exception as e:
                print(f"\u274c Error: {e}")

    def _on_update(b):
        with _setup_output:
            clear_output()
            if hasattr(__main__, "PATHS") and __main__.PATHS is not None:
                _download_scripts(Path(__main__.PATHS["scripts"]))
            else:
                print("\u26a0\ufe0f Set up workspace first.")

    _setup_btn.on_click(_on_setup)
    _update_btn.on_click(_on_update)

    display(
        widgets.HTML("<h3>\U0001f4c1 Workspace Setup</h3>"),
        widgets.HTML("<p>Choose where to store project files, then click <b>Set Up Workspace</b>.</p>"),
        _loc_dd,
        widgets.HBox([_setup_btn, _update_btn], layout=widgets.Layout(margin="8px 0 10px 0")),
        _setup_output,
    )

✅ Dependencies installed.
📂 Using local scripts from: c:\Users\james\OneDrive\Documents\projects\peec_awin_connector\scripts
📂 Workspace: C:\Users\james\OneDrive\Documents\projects\peec_awin_connector\peec_awin_workspace
   Output:  C:\Users\james\OneDrive\Documents\projects\peec_awin_connector\peec_awin_workspace\output

✅ Ready. Run the next cell to configure your session.


In [None]:
# Session Configuration: API keys, project, dates, advertiser ID
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_01_session_config.py"),
     str(scripts_dir / "cell_01_session_config.py"), "exec"))

✅ Loaded Peec AI key from environment variable 'PEEC_API_KEY'.
✅ Loaded Awin key from environment variable 'AWAPI'.
🔐 Both API keys configured.

⏳ Loading PEEC projects...
✅ Found 4 project(s).



HTML(value='<div class="peec-header">⚙️ Session Configuration</div><div class="peec-sub">Set your parameters o…

Dropdown(description='PEEC Project:', layout=Layout(width='450px'), options={'TechRadar (PITCH)': 'or_0d17ad7f…

Text(value='', description='Awin Advertiser ID:', layout=Layout(width='300px'), placeholder='e.g. 4567', style…

HBox(children=(DatePicker(value=datetime.date(2026, 1, 1), description='Start date:', layout=Layout(width='240…

Button(button_style='success', description='  Confirm Settings', icon='check', layout=Layout(height='36px', wi…


# Step 2: Load Styles & PEEC Client

This cell loads the visual styling and initialises the PEEC API client with your project's lookup tables (prompts, tags, topics, models).

**Run the cell below.** You should see confirmation that the client is ready.

In [3]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])

# Load CSS styling
exec(compile(_read_script(scripts_dir / "cell_02_css_styling.py"),
     str(scripts_dir / "cell_02_css_styling.py"), "exec"))

# Load PEEC client, helpers, and lookups
exec(compile(_read_script(scripts_dir / "cell_03_peec_client.py"),
     str(scripts_dir / "cell_03_peec_client.py"), "exec"))

✅ Styles loaded.
⏳ Loading lookups for Boots (CUSTOMER)...
✅ Ready - 10 prompts, 0 tags, 5 topics, 17 models


# Step 3: Pull PEEC Citation Data

This cell fetches citation data from PEEC AI for your configured date range and project. It pulls:
- Domain classifications
- URL-level citation report broken down by prompt and model

**Click "Pull Data"** to fetch the data. Once complete, you can run the reports below.

In [4]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_04_peec_data_pull.py"),
     str(scripts_dir / "cell_04_peec_data_pull.py"), "exec"))

HTML(value='<div class="peec-header">📊 Peec AI - Citation Data</div><div class="peec-sub">Project: <b>Boots (C…

Button(button_style='info', description='  Pull Data', icon='cloud-download', layout=Layout(height='36px', wid…


# Step 4: PEEC Reports

Two report views from your citation data:

| Report | What it shows |
|--------|---------------|
| **Domain Report** | One row per domain: total citations, avg position, unique pages, models |
| **URL Report** | One row per URL, with clickable links and prompt counts |

Use the filters to drill down, then download as CSV.
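
To make the relationship between the two views concrete, here is a minimal sketch of how URL-level rows could be rolled up into the domain-level columns listed in the table. The function name, the row field names (`domain`, `url`, `model`, `position`, `citations`), and the citation-weighted average are all assumptions for illustration; the actual logic in `cell_05_domain_report.py` may differ.

```python
from collections import defaultdict


def build_domain_report(url_rows):
    """Collapse URL-level rows into one summary row per domain.

    Hypothetical helper; field names are illustrative, not the PEEC schema.
    """
    grouped = defaultdict(list)
    for row in url_rows:
        grouped[row["domain"]].append(row)

    report = []
    for domain, rows in grouped.items():
        total = sum(r["citations"] for r in rows)
        # Assumption: average position is weighted by citation count.
        weighted = sum(r["position"] * r["citations"] for r in rows)
        report.append({
            "domain": domain,
            "total_citations": total,
            "avg_position": round(weighted / total, 2) if total else None,
            "unique_pages": len({r["url"] for r in rows}),
            "models": sorted({r["model"] for r in rows}),
        })
    # Most-cited domains first.
    return sorted(report, key=lambda r: r["total_citations"], reverse=True)
```

Each input row represents one URL cited by one model; the output has one dictionary per domain, sorted by total citations.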

In [5]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_05_domain_report.py"),
     str(scripts_dir / "cell_05_domain_report.py"), "exec"))

HTML(value='<div class="peec-header">🌐 Domain-Level Report</div><div class="peec-sub">One row per domain - cit…

HTML(value='<div class="peec-section">Filters</div>')

HBox(children=(Dropdown(description='Page type:', layout=Layout(width='260px'), options=('All', 'ARTICLE', 'CA…

HBox(children=(Text(value='', description='Prompt contains:', layout=Layout(width='380px'), placeholder='e.g. …

Button(button_style='success', description='  ⬇ Download CSV', layout=Layout(height='36px', width='160px'), st…


In [6]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_06_url_report.py"),
     str(scripts_dir / "cell_06_url_report.py"), "exec"))

HTML(value='<div class="peec-header">🔗 URL / Page-Level Report</div><div class="peec-sub">One row per URL - ci…

HTML(value='<div class="peec-section">Filters</div>')

HBox(children=(Dropdown(description='Page type:', layout=Layout(width='260px'), options=('All', 'ARTICLE', 'CA…

HBox(children=(Text(value='', description='Prompt contains:', layout=Layout(width='380px'), placeholder='e.g. …

Text(value='', description='URL contains:', layout=Layout(width='380px'), placeholder='e.g. boots.com', style=…

Button(button_style='success', description='  ⬇ Download CSV', layout=Layout(height='36px', width='160px'), st…


# Step 5: Pull Awin Transaction Data

This cell fetches transaction-level data from the Awin API for your configured advertiser and date range.

The Awin API has a 31-day maximum window per request, so the tool automatically chunks larger date ranges.
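
The chunking described above can be sketched as follows. This is an illustrative helper, not the code in `cell_07_awin_transactions.py`: the name `chunk_date_range` is hypothetical, and it assumes the 31-day limit counts both the start and end dates of each window.

```python
from datetime import date, timedelta


def chunk_date_range(start, end, max_days=31):
    """Split [start, end] into consecutive windows of at most max_days days.

    Assumption: both endpoints of a window count toward the limit.
    """
    if end < start:
        raise ValueError("end must not be before start")

    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Last day of this window, clipped to the overall end date.
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        # The next window starts the day after this one ends, so none overlap.
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


for window_start, window_end in chunk_date_range(date(2026, 1, 1), date(2026, 3, 15)):
    print(window_start, "->", window_end)
# 2026-01-01 -> 2026-01-31
# 2026-02-01 -> 2026-03-03
# 2026-03-04 -> 2026-03-15
```

Each window would then drive one API request, with the results concatenated afterwards.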

**Click "Pull Transactions"** to fetch. You can optionally filter by transaction status.

In [7]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_07_awin_transactions.py"),
     str(scripts_dir / "cell_07_awin_transactions.py"), "exec"))

HTML(value='<div class="peec-header">📊 Awin Transaction Report</div><div class="peec-sub">Pull individual tran…

Dropdown(description='Status:', layout=Layout(width='200px'), options=('All', 'pending', 'approved', 'declined…

HBox(children=(Button(button_style='info', description='  Pull Transactions', icon='cloud-download', layout=La…


# Step 6: Enriched Report (Domain Match + Citations + Transactions)

This cell does everything in one step:

1. **Domain Match**: Matches PEEC-cited domains to Awin publisher domains using normalised hostnames
2. **Publisher Names**: Pulls the Awin publisher report to get accurate publisher names (saved as a separate CSV)
3. **AI Models**: Adds which AI models cite each domain (3-character codes)
4. **Exclude Filter**: Removes noise domains (e.g. google, facebook)
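
For readers wondering what matching on "normalised hostnames" might involve, the sketch below shows one plausible approach: lowercase the value, drop the scheme, port, path, and a leading `www.`, then compare. Both function names are hypothetical and the exact rules in `cell_09_enriched_report.py` may differ (for example, in how subdomains are treated).

```python
from urllib.parse import urlsplit


def normalise_domain(value):
    """Reduce a URL or bare hostname to a comparable lowercase host.

    Hypothetical helper; subdomains other than "www." are kept as-is.
    """
    value = value.strip().lower()
    if "://" not in value:
        # Prefix "//" so urlsplit parses a bare host as the network location.
        value = "//" + value
    host = (urlsplit(value).hostname or "").rstrip(".")
    return host[4:] if host.startswith("www.") else host


def match_domains(peec_domains, awin_domains):
    """Map each PEEC-cited domain to True if it matches an Awin publisher domain."""
    awin = {normalise_domain(d) for d in awin_domains}
    return {d: normalise_domain(d) in awin for d in peec_domains}
```

With this approach, `https://WWW.Example.com:443/reviews` and `example.com` normalise to the same host, while `blog.example.com` stays distinct. Whether subdomains should collapse to the registered domain is a design choice this sketch leaves open.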

**Click "Build Enriched Report"** to run. Use the exclude filter to clean up the output.

In [10]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_09_enriched_report.py"),
     str(scripts_dir / "cell_09_enriched_report.py"), "exec"))

HTML(value='<div class="peec-header">📊 Enriched Report - Domain Match + Citations + Transactions</div><div cla…

HTML(value='<div class="peec-section" style="margin-bottom:8px">ℹ️ <b>Models</b> = AI models citing this domai…

SelectMultiple(description='Domain types:', index=(0,), layout=Layout(height='80px', width='300px'), options=(…

HTML(value='<div style="font-size:11px;color:#888;margin:-4px 0 4px 0">Ctrl+click to select types to include (…

Text(value='', description='Exclude domains:', layout=Layout(width='600px'), placeholder='e.g. google, faceboo…

HBox(children=(Button(button_style='info', description='  Build Enriched Report', icon='bar-chart', layout=Lay…


# Step 7: Gap Analysis

This step identifies domains and pages that AI models cite frequently but where you have **no Awin publisher relationship**. These are your potential recruitment targets.

Use the filters to:
- Focus on specific domain types (e.g. editorial, blog)
- Search for domains containing specific keywords
- Exclude noise domains (google, wikipedia, youtube, etc.)
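
The core of the gap analysis can be pictured as a set difference followed by filtering and ranking. The sketch below is illustrative only: `find_gaps` and its parameters are hypothetical names, it assumes domains on both sides are already in the same normalised form, and it omits the domain-type filter.

```python
def find_gaps(citations, awin_domains, exclude=(), contains=""):
    """Return (domain, citation_count) pairs with no Awin relationship.

    citations    -- dict mapping domain to total citation count
    awin_domains -- iterable of domains that already have an Awin relationship
    exclude      -- substrings marking noise domains to drop
    contains     -- optional substring the domain must include
    """
    awin = set(awin_domains)
    gaps = [
        (domain, count)
        for domain, count in citations.items()
        if domain not in awin
        and contains in domain
        and not any(term in domain for term in exclude)
    ]
    # Highest-cited domains first: these are the strongest recruitment targets.
    return sorted(gaps, key=lambda item: item[1], reverse=True)
```

For example, with citation counts for `example.com`, `example.net`, `example.org`, and `google.com`, an Awin list containing only `example.org`, and `exclude=("google",)`, the result keeps `example.com` and `example.net`, ordered by citation count.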

In [11]:
import __main__
from pathlib import Path

scripts_dir = Path(__main__.PATHS["scripts"])
exec(compile(_read_script(scripts_dir / "cell_10_gap_analysis.py"),
     str(scripts_dir / "cell_10_gap_analysis.py"), "exec"))

HTML(value='<div class="peec-header">🔍 Gap Analysis - AI-Cited Domains NOT in Awin</div><div class="peec-sub">…

HTML(value='<div class="peec-section">Filters</div>')

HBox(children=(Dropdown(description='Domain type:', layout=Layout(width='260px'), options=('All',), style=Desc…

Text(value='', description='Domain contains:', layout=Layout(width='400px'), placeholder='e.g. boots, lookfant…

Text(value='', description='Exclude domains:', layout=Layout(width='600px'), placeholder='e.g. google, wikiped…


