# 🔧 Run All Setup Tasks

This notebook automates the common steps:
1. Install dependencies (if running in a fresh environment)
2. Check `.env`
3. Run BigQuery setup
4. Run Vertex AI Search export/setup
5. Run pytest suite

⚠️ Requires a populated `.env` file and a one-time `gcloud auth application-default login`.
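
Step 5 can be run from a cell after the earlier steps have succeeded. This is a minimal sketch, assuming `pytest` is installed in the kernel's environment and the tests are discoverable from the repo root. The helper name `run_pytest` and the `-q` flag are illustrative choices, not part of the repo.

```python
import subprocess
import sys


def run_pytest(extra_args=None, runner=subprocess.run):
    """Run pytest with the kernel's interpreter and return its exit code.

    `runner` is injectable only so the helper can be exercised without
    launching pytest; by default it is subprocess.run.
    """
    # Using sys.executable keeps pytest in the same environment as the notebook kernel.
    cmd = [sys.executable, "-m", "pytest", "-q", *(extra_args or [])]
    completed = runner(cmd)
    return completed.returncode
```

Calling `run_pytest()` with no arguments runs the whole suite; an exit code of 0 means every collected test passed.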

In [3]:
# --- Make repo imports (app/, agents/, data/, shared/) work inside this notebook ---
import os, sys, pathlib, subprocess

def find_repo_root(start: pathlib.Path | None = None) -> pathlib.Path | None:
    """
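    Find the repo root by (in order):
    1) The Git root, if it contains requirements.txt or pyproject.toml
    2) The nearest ancestor with requirements.txt or pyproject.toml plus at least
       one of the top-level dirs app/, agents/, data/, shared/
    Returns None if neither check succeeds.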
    """
    start = start or pathlib.Path().resolve()

    # 1) Try git
    try:
        git_root = subprocess.check_output(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=str(start),
            text=True
        ).strip()
        p = pathlib.Path(git_root)
        if (p / "requirements.txt").exists() or (p / "pyproject.toml").exists():
            return p
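    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, or git is not installed: fall through to the marker search
        pass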

    # 2) Walk upwards looking for new layout markers
    markers = {"app", "agents", "data", "shared"}
    for p in [start, *start.parents]:
        has_manifest = (p / "requirements.txt").exists() or (p / "pyproject.toml").exists()
        has_any_marker = any((p / m).exists() for m in markers)
        if has_manifest and has_any_marker:
            return p
    return None

# Optional: allow a manual override via the REPO_ROOT env var; otherwise auto-detect.
# The override is read first so it still works when auto-detection fails.
override = os.getenv("REPO_ROOT")
root = pathlib.Path(override).resolve() if override else find_repo_root()
if root is None:
    raise RuntimeError(
        "Could not locate repo root. Ensure this notebook is inside the repo "
        "or set REPO_ROOT to your repo path."
    )

# 1) Put repo root at the front of sys.path so `import agents`, `import shared`, etc. work
root_str = str(root)
if root_str not in sys.path:
    sys.path.insert(0, root_str)

# 2) Switch working dir to repo root so relative paths work
os.chdir(root_str)

print("Repo root:", root_str)
print("Python exe:", sys.executable)
print("sys.path[0]:", sys.path[0])


Repo root: C:\Users\reube\OneDrive\Desktop\Kaggle\geomarket-insight
Python exe: C:\Users\reube\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe
sys.path[0]: C:\Users\reube\OneDrive\Desktop\Kaggle\geomarket-insight


In [2]:
# ✅ 1. Install requirements (optional if already installed)
# %pip targets the running kernel's environment, which `!pip` does not guarantee
%pip install -r requirements.txt




[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: C:\Users\reube\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe -m pip install --upgrade pip


In [2]:
# ✅ 2. Check environment values
from cli import env_check
env_check.main()

✅ Environment OK
PROJECT_ID = abstract-arc-469501-e2
DATASET_NAME = bigquery_comp_dataset
GCS_BUCKET = geomarket-insight-ingest
SEARCH_LOCATION = global
GOOGLE_CLOUD_LOCATION = us-central1
GEMINI_MODEL = gemini-2.5-flash
ADK_LOCATION = us-central1
ONTOLOGY_FILE = shared/schemas/ontology/categories.yaml
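
The implementation of `cli.env_check` is not shown here. As a rough sketch of the kind of check it performs, the snippet below assumes the variables are already loaded into the process environment and that the eight names printed above are the required set. `missing_env_vars` is a hypothetical helper, not the repo's function.

```python
import os

# Names taken from the output above; treating all of them as required is an assumption.
REQUIRED_VARS = [
    "PROJECT_ID", "DATASET_NAME", "GCS_BUCKET", "SEARCH_LOCATION",
    "GOOGLE_CLOUD_LOCATION", "GEMINI_MODEL", "ADK_LOCATION", "ONTOLOGY_FILE",
]


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Environment OK")
```

Passing a plain dict as `environ` makes the check easy to try without touching the real environment.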


In [3]:
# ✅ 3. Run BigQuery setup (dataset + tables)
from data.tasks.bq_materialize import main as bq_materialize_main
bq_materialize_main()


✅ Dataset ready: abstract-arc-469501-e2.bigquery_comp_dataset [us-central1]
✅ poi_entities created
✅ poi_entities search view created
✅ area_indicators created
✅ area_boundaries created
✅ org_locations created
✅ org_locations search view created


In [3]:
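# ✅ 4. Run Vertex AI Search setup (datastores + schemas)
from data.publish.setup_search_infra import main as setup_search_infra_main
setup_search_infra_main()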


=== Infra for shared/schemas/datastore/org_schema.json ===
→ Ensuring datastore org_location
   projects/597564348526/locations/global/collections/default_collection/dataStores/org_location
→ Upserting schema default_schema on datastore org_location
   schema upserted: default_schema

=== Infra for shared/schemas/datastore/poi_schema.json ===
→ Ensuring datastore poi_location
   projects/597564348526/locations/global/collections/default_collection/dataStores/poi_location
→ Upserting schema default_schema on datastore poi_location
   schema upserted: default_schema

✅ setup_search_infra completed.


In [4]:
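# ✅ 4 (cont.). Export the BigQuery search views to GCS for Vertex AI Search ingest
from data.publish.setup_search_ingest import main as setup_search_ingest_main
setup_search_ingest_main()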

✅ Bucket already exists: geomarket-insight-ingest
✅ Bucket ready: gs://geomarket-insight-ingest

=== Ingest for shared/schemas/datastore/org_schema.json ===


RuntimeError: [export_table_to_jsonl] Failed export
  table_id      = abstract-arc-469501-e2.bigquery_comp_dataset.org_locations_search
  destination   = gs://geomarket-insight-ingest/search_exports/org_locations_search.jsonl
  location      = us-central1
  user_agent    = 'geomarket-insight'
Original error: NotFound: 404 POST https://bigquery.googleapis.com/bigquery/v2/projects/abstract-arc-469501-e2/jobs?prettyPrint=false: Not found: Dataset abstract-arc-469501-e2:bigquery_comp_dataset
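
The traceback shows the export job failing because BigQuery could not find `bigquery_comp_dataset`, even though the earlier materialize step reported it ready. The log does not say why. One thing worth ruling out is that the dataset lives in a different location (or project) than the export job targets. Below is a minimal diagnostic sketch, assuming a `google.cloud.bigquery.Client`. `check_dataset_location` is a hypothetical helper, not part of the repo.

```python
def check_dataset_location(client, dataset_id, job_location):
    """Return the dataset's location, raising if it differs from job_location.

    `client` is expected to be a google.cloud.bigquery.Client; only its
    get_dataset() method and the returned Dataset.location are used.
    """
    dataset = client.get_dataset(dataset_id)  # raises NotFound if the dataset is missing
    actual = dataset.location or ""
    if actual.lower() != job_location.lower():
        raise RuntimeError(
            f"{dataset_id} is in {actual!r}, but the job targets {job_location!r}"
        )
    return actual


# Usage (requires google-cloud-bigquery and application-default credentials):
# from google.cloud import bigquery
# client = bigquery.Client(project="abstract-arc-469501-e2")
# check_dataset_location(
#     client, "abstract-arc-469501-e2.bigquery_comp_dataset", "us-central1"
# )
```

If the call itself raises `NotFound`, the dataset is missing from that project entirely, which points at the materialize step or the project configuration rather than at the location.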