# Competitor Research & Notion Database Population

## Step 0: Setup

1.  **Notion Integration Token:** Create a Notion integration and get your internal integration token. See [Notion's Guide](https://developers.notion.com/docs/create-a-notion-integration). Grant the integration access to the parent page where the database will be created.
2.  **Parent Page ID:** In Notion, create or choose the page that will host the new Competitor Database, then copy its ID from the page URL. The ID is the final 32 characters of the URL: for `https://www.notion.so/My-Page-abcdef1234567890abcdef1234567890`, the ID is `abcdef1234567890abcdef1234567890`. Make sure this page is shared with your integration.
3.  **Install requirements:** Install the dependencies listed in `requirements.txt` (`pip install -r requirements.txt`). The file should include `google-cloud-aiplatform`, `google-generativeai`, and `notion-client`. The cells below also import `pandas` and `dotenv` (the `python-dotenv` package).
4.  **Log in to Google Cloud** with Application Default Credentials (ADC): run `gcloud auth application-default login`.
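
The page ID described in item 2 can also be extracted programmatically. The sketch below is a hypothetical helper (neither function is part of `utils.py`) and assumes the ID is the run of 32 hexadecimal characters at the end of the URL path. It also shows the dashed form in which IDs appear in the Step 2 output.

```python
import re
from urllib.parse import urlparse


def extract_notion_page_id(url: str) -> str:
    """Return the 32-character page ID at the end of a Notion page URL path."""
    path = urlparse(url).path  # drops any ?query or #fragment
    match = re.search(r"([0-9a-fA-F]{32})$", path.rstrip("/"))
    if match is None:
        raise ValueError(f"No 32-character page ID found in: {url}")
    return match.group(1).lower()


def to_dashed_uuid(page_id: str) -> str:
    """Format a 32-character ID as 8-4-4-4-12 groups."""
    return "-".join(
        [page_id[:8], page_id[8:12], page_id[12:16], page_id[16:20], page_id[20:]]
    )


page_id = extract_notion_page_id(
    "https://www.notion.so/My-Page-abcdef1234567890abcdef1234567890"
)
print(page_id)                  # abcdef1234567890abcdef1234567890
print(to_dashed_uuid(page_id))  # abcdef12-3456-7890-abcd-ef1234567890
```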

In [20]:
# General Imports
import asyncio
import os
import json
from pandas import read_csv  # only read_csv is used below; avoids a wildcard import
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv('.env')

# Import Vertex AI SDK
import vertexai.generative_models as generative_models
from vertexai.generative_models import Tool, GenerationConfig

# Import utility functions (ensure utils.py is in the same directory or accessible in PYTHONPATH)
from utils import (
    CSV_SCHEMA,
    research_competitors_async,
    setup_notion_database,
    populate_notion_db_from_folder,
)
from notion_client import Client as NotionSyncClient # For DB creation

In [11]:
# --- Config parameters ---
NOTION_API_TOKEN = os.getenv("NOTION_API_TOKEN")  # @param {type:string} TODO: Fill in your Notion Integration Token (e.g., secret_xxxxxxxxxxx)
NOTION_PARENT_PAGE_ID = os.getenv("NOTION_PARENT_PAGE_ID")  # @param {type:string} TODO: Fill in the ID of the Notion page to host the database (e.g., abcdef1234567890abcdef1234567890)
NOTION_DATABASE_ID = os.getenv("NOTION_DATABASE_ID")  # @param {type:string} TODO: Fill in the ID of the Notion Database (e.g., xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

# Defaults from env with config.json fallback
NOTION_DATABASE_NAME = os.getenv("NOTION_DATABASE_NAME", "")

COMPETITORS = read_csv("competitors.csv")["Competitor"].tolist()  # @param {type:raw} List of competitor names to research

OUTPUT_FOLDER = os.getenv("OUTPUT_FOLDER", "")

# Load non-sensitive defaults from config.json (initial_research) once.
# If the file is missing or unreadable, fall back to an empty dict.
try:
    with open('config.json', 'r') as f:
        initial = json.load(f).get('initial_research', {})
except Exception:
    initial = {}

# Environment variables take precedence; config.json and hard-coded defaults fill the gaps.
if not NOTION_DATABASE_NAME:
    NOTION_DATABASE_NAME = initial.get('notion_database_name', 'Compete Analysis DB')
if not OUTPUT_FOLDER:
    OUTPUT_FOLDER = initial.get('output_folder', 'competitor_research_json')

COMPANY_CONTEXT = os.getenv("COMPANY_CONTEXT", "")
if not COMPANY_CONTEXT:
    context = initial.get('company_context', '')
    # company_context may be a single string or a list of lines
    COMPANY_CONTEXT = '\n'.join(context) if isinstance(context, list) else context.strip()
if not COMPANY_CONTEXT:
    raise ValueError("COMPANY_CONTEXT is not set in env or config.json.")

# Ensure output folder exists
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

print(f"Output folder for JSON: {os.path.abspath(OUTPUT_FOLDER)}")
print(f"Schema fields to be used: {len(CSV_SCHEMA)} fields including '{CSV_SCHEMA[0]}' and '{CSV_SCHEMA[-1]}'")

# Verify critical environment variables
print("\nEnvironment Variables Check:")
print(f"NOTION_API_TOKEN: {'Set' if NOTION_API_TOKEN else 'Not Set'}")
print(f"NOTION_PARENT_PAGE_ID: {'Set' if NOTION_PARENT_PAGE_ID else 'Not Set'}")
print(f"COMPANY_CONTEXT: {'Set' if COMPANY_CONTEXT else 'Not Set'}")

Output folder for JSON: /Users/elenamatay/Cursor Projects/compete-automate-notion-brainster/competitor_research_json
Schema fields to be used: 53 fields including 'Competitor Name' and 'Notes_QualitativeInsights'

Environment Variables Check:
NOTION_API_TOKEN: Set
NOTION_PARENT_PAGE_ID: Set
COMPANY_CONTEXT: Set
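
For reference, the config cell above reads three keys under `initial_research` in `config.json`. The sketch below shows an illustrative structure with those keys and the env-first lookup order; the values are placeholders rather than the project's real settings, and `resolve_setting` is a hypothetical helper, not something defined in `utils.py`.

```python
import json
import os

# Illustrative config.json content: the keys match those read in the config cell,
# the values are placeholders.
example_config = {
    "initial_research": {
        "notion_database_name": "Compete Analysis DB",
        "output_folder": "competitor_research_json",
        "company_context": ["Line one of the company description.", "Line two."],
    }
}


def resolve_setting(env_name, config, key, default=""):
    """Environment variable wins, then config.json, then the default."""
    value = os.getenv(env_name, "")
    if value:
        return value
    value = config.get("initial_research", {}).get(key, default)
    # A list of lines is joined into one newline-separated string.
    return "\n".join(value) if isinstance(value, list) else value


print(json.dumps(example_config, indent=2))
print(resolve_setting("OUTPUT_FOLDER", example_config, "output_folder"))
```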


## Step 1: Research Competitors (Async)

This step will use Gemini 1.5 Pro to research each competitor based on the `CSV_SCHEMA` defined in `utils.py`. The results for each competitor will be saved as a JSON file in the `OUTPUT_FOLDER`.
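
`research_competitors_async` lives in `utils.py` and is not shown in this notebook. As a rough sketch of the pattern its log output suggests (all competitors queued at once, one JSON file per competitor), the code below uses a hypothetical `fake_research` coroutine in place of the Gemini call. The function names and the file-naming rule (spaces replaced by underscores) are assumptions.

```python
import asyncio
import json
import os
import tempfile


async def fake_research(name: str) -> dict:
    """Stand-in for the model call; returns a minimal record (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"Competitor Name": name}


async def research_one(name, output_folder, research_fn):
    """Research one competitor and save the record as <Name>.json."""
    record = await research_fn(name)
    path = os.path.join(output_folder, name.replace(" ", "_") + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return path


async def research_all(names, output_folder, research_fn=fake_research):
    """Run all competitors concurrently and keep only the successful paths."""
    results = await asyncio.gather(
        *(research_one(n, output_folder, research_fn) for n in names),
        return_exceptions=True,  # one failure should not cancel the others
    )
    return [r for r in results if isinstance(r, str)]


def run_demo(names):
    """Run the sketch against a temporary folder and return the file names written."""
    with tempfile.TemporaryDirectory() as folder:
        paths = asyncio.run(research_all(names, folder))
        return sorted(os.path.basename(p) for p in paths)


print(run_demo(["AppFolio", "Arteco Fincas"]))
```

Inside a notebook, where an event loop is already running, call `await research_all(...)` directly instead of going through `asyncio.run`.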

In [3]:
# Configure the search tool using Vertex AI
search_tool = Tool.from_dict({
    "google_search": {}
})

# Use Vertex AI's GenerationConfig
config = GenerationConfig(
    temperature=0.1,
    top_p=1.0
)

request_args = {
    "generation_config": config,
    "tools": [search_tool],
    "stream": False
}

async def main_research():
    if not COMPETITORS:
        print("No competitors listed. Add competitor names to the 'Competitor' column of competitors.csv.")
        return
    
    print(f"Starting research for {len(COMPETITORS)} competitors: {COMPETITORS}")
    successful_json_files = await research_competitors_async(
        competitors_list=COMPETITORS,
        output_folder_path=OUTPUT_FOLDER,
        company_context=COMPANY_CONTEXT,
        request_args=request_args  # Pass the configured request args
    )
    print(f"\nResearch complete. {len(successful_json_files)} JSON files created in {OUTPUT_FOLDER}:")
    for f_path in successful_json_files:
        print(f" - {f_path}")

await main_research()

Starting research for 4 competitors: ['AppFolio', 'Arteco Fincas', 'Avail', 'Buildium']
[11:41:03] Queueing research for: AppFolio
[11:41:03] Queueing research for: Arteco Fincas
[11:41:03] Queueing research for: Avail
[11:41:03] Queueing research for: Buildium


Falling back to grpc since no async rest credentials were detected.
Falling back to grpc since no async rest credentials were detected.
Falling back to grpc since no async rest credentials were detected.
Falling back to grpc since no async rest credentials were detected.


[11:41:03] Attempt 1 to research AppFolio...
[11:41:03] Attempt 1 to research Arteco Fincas...
[11:41:03] Attempt 1 to research Avail...
[11:41:03] Attempt 1 to research Buildium...
[11:41:51] Successfully researched and saved data for Arteco Fincas to competitor_research_json/Arteco_Fincas.json
[11:41:58] Successfully researched and saved data for Avail to competitor_research_json/Avail.json
[11:42:00] Successfully researched and saved data for AppFolio to competitor_research_json/AppFolio.json
[11:42:08] Successfully researched and saved data for Buildium to competitor_research_json/Buildium.json
Finished researching all competitors. 4 successful out of 4.

Research complete. 4 JSON files created in competitor_research_json:
 - competitor_research_json/AppFolio.json
 - competitor_research_json/Arteco_Fincas.json
 - competitor_research_json/Avail.json
 - competitor_research_json/Buildium.json


## Step 2: Create or Verify Notion Database

This step will create a new database in your Notion workspace under the specified `NOTION_PARENT_PAGE_ID` if `NOTION_DATABASE_ID` is not already provided. The database columns will match the `CSV_SCHEMA`. 
If `NOTION_DATABASE_ID` is provided, this step will simply confirm its use.
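
`setup_notion_database` is defined in `utils.py`, so the exact property mapping is not visible here. The following is a minimal sketch of how a schema list could be turned into a create-database payload, assuming the first field becomes the title property and every other field is plain rich text. The real helper may use richer property types, and `build_database_payload` is a hypothetical name.

```python
def build_database_payload(parent_page_id, database_name, schema):
    """Build keyword arguments for a Notion 'create database' request."""
    if not schema:
        raise ValueError("schema must contain at least one field")
    # A Notion database needs exactly one title property; assume it is the first field.
    properties = {schema[0]: {"title": {}}}
    for field in schema[1:]:
        properties[field] = {"rich_text": {}}
    return {
        "parent": {"type": "page_id", "page_id": parent_page_id},
        "title": [{"type": "text", "text": {"content": database_name}}],
        "properties": properties,
    }


payload = build_database_payload(
    "abcdef1234567890abcdef1234567890",  # placeholder page ID
    "Compete Analysis DB",
    ["Competitor Name", "Notes_QualitativeInsights"],
)
print(payload["properties"])
```

With `notion-client`, a payload of this shape would typically be sent with `Client(auth=token).databases.create(**payload)`. Check which Notion API version your integration targets before relying on it, since the database endpoints have changed between versions.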

In [3]:
NOTION_DATABASE_ID = ""

In [3]:
# Step 2: Create or verify Notion database
new_db_id = await setup_notion_database(
    notion_token=NOTION_API_TOKEN,
    parent_page_id=NOTION_PARENT_PAGE_ID,
    database_name=NOTION_DATABASE_NAME,
    database_id=NOTION_DATABASE_ID
)

# Keep the returned ID so Step 3 populates this database
# (assumes setup_notion_database returns the database ID)
if new_db_id:
    NOTION_DATABASE_ID = new_db_id

print(f"\nFinal Notion Database ID to be used for population: {NOTION_DATABASE_ID}")

Attempting to create Notion Database titled 'Compete Analysis DB' under parent page ID: 278ffe52f86180f18cf0dce58835ead1
Creating Notion database 'Compete Analysis DB' under page ID 278ffe52f86180f18cf0dce58835ead1...
Successfully set property order for database 278ffe52-f861-8147-b069-e117f3ce4b7d
Successfully created Notion database with ID: 278ffe52-f861-8147-b069-e117f3ce4b7d
Link: https://www.notion.so/278ffe52f8618147b069e117f3ce4b7d
Successfully created Notion Database. New ID: 278ffe52-f861-8147-b069-e117f3ce4b7d
Link: https://www.notion.so/278ffe52f8618147b069e117f3ce4b7d

Final Notion Database ID to be used for population: 278ffe52-f861-8147-b069-e117f3ce4b7d


## Step 3: Populate Notion Database

This step will take the JSON files from the `OUTPUT_FOLDER` and populate the Notion database specified by `NOTION_DATABASE_ID`.
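
The Step 3 log shows each competitor being updated when a page with the same name already exists. Here is a sketch of that create-or-update decision, written against a small in-memory stand-in (`FakeNotion`, hypothetical) rather than the real client so the logic is easy to follow. `populate_notion_db_from_folder` itself may work differently.

```python
class FakeNotion:
    """In-memory stand-in for a Notion database keyed by competitor name (hypothetical)."""

    def __init__(self):
        self.pages = {}  # name -> (page_id, record)
        self._next_id = 1

    def find_page_id(self, name):
        entry = self.pages.get(name)
        return entry[0] if entry else None

    def create_page(self, name, record):
        page_id = f"page-{self._next_id}"
        self._next_id += 1
        self.pages[name] = (page_id, dict(record))
        return page_id

    def update_page(self, name, record):
        page_id, _ = self.pages[name]
        self.pages[name] = (page_id, dict(record))
        return page_id


def upsert_competitor(store, record, title_field="Competitor Name"):
    """Update the page if the competitor already exists, otherwise create it."""
    name = record[title_field]
    if store.find_page_id(name) is not None:
        return "updated", store.update_page(name, record)
    return "created", store.create_page(name, record)


store = FakeNotion()
print(upsert_competitor(store, {"Competitor Name": "Avail", "Notes": "first pass"}))
print(upsert_competitor(store, {"Competitor Name": "Avail", "Notes": "second pass"}))
```

The first call creates the page and the second updates it in place, so re-running Step 3 does not produce duplicate rows. Against the real API, the lookup would be a database query filtered on the title property, and the writes would be page create and update requests.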

In [12]:
# Step 3: Populate Notion database

async def main_populate():
    if not NOTION_DATABASE_ID:
        print("Error: NOTION_DATABASE_ID is not set. Cannot populate database. Please ensure Step 2 was successful or provide a valid ID.")
        return
    if not NOTION_API_TOKEN:
        print("Error: NOTION_API_TOKEN is not set. Cannot populate Notion database.")
        return
    
    # Check if the output folder has any JSON files from Step 1
    json_files_exist = any(f.endswith('.json') for f in os.listdir(OUTPUT_FOLDER))
    if not json_files_exist:
        print(f"No JSON files found in {OUTPUT_FOLDER}. Run Step 1 to generate them before populating.")
        return
        
    print(f"Populating Notion database ID: {NOTION_DATABASE_ID} from folder: {OUTPUT_FOLDER}")
    await populate_notion_db_from_folder(
        output_folder=OUTPUT_FOLDER,
        database_id=NOTION_DATABASE_ID,
        notion_token=NOTION_API_TOKEN
    )
    print("Population process complete. Check your Notion database.")

await main_populate()

Populating Notion database ID: 278ffe52-f861-8147-b069-e117f3ce4b7d from folder: competitor_research_json
Competitor 'Arteco Fincas' already exists (ID: 278ffe52-f861-8136-b7af-f6b793574b0c). Updating.
Competitor 'Buildium' already exists (ID: 278ffe52-f861-8119-b335-d23eb2d8cda3). Updating.
Successfully updated 'Buildium' in Notion.
Successfully updated 'Arteco Fincas' in Notion.
Competitor 'Avail' already exists (ID: 278ffe52-f861-81a5-aeff-c7a40a320007). Updating.
Successfully updated 'Avail' in Notion.
Competitor 'AppFolio' already exists (ID: 278ffe52-f861-817c-bc44-d0d84e2776fa). Updating.
Successfully updated 'AppFolio' in Notion.
Finished populating Notion database. 4/4 competitors processed successfully.
Population process complete. Check your Notion database.


## Step 4: Update Competitors Database

This step runs `update_competitor_research.py`. According to its output, the script re-researches the competitors already in the database, searches for new competitors, and generates an executive summary of the top changes.

In [None]:
!python update_competitor_research.py

Found 4 competitors to check for updates...

Searching for new competitors...
Performing full re-research for 'AppFolio'...
Performing full re-research for 'Buildium'...
Performing full re-research for 'Arteco Fincas'...
Performing full re-research for 'Avail'...
Discovery complete. Found 2 potential new competitors.
Successfully updated research for 'Avail'.
Successfully updated research for 'Buildium'.
Successfully updated research for 'AppFolio'.
Attempt 1 failed for 'Arteco Fincas': Unterminated string starting at: line 71 column 16 (char 6436)
Successfully updated research for 'Arteco Fincas'.

Generating final executive summary of top changes...
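
The `Unterminated string` message in the output above is a JSON parsing error: the first reply for 'Arteco Fincas' was not valid JSON, and a later attempt succeeded. Below is a minimal sketch of that parse-and-retry pattern, assuming the model reply arrives as a string. `parse_with_retry` and the sample replies are hypothetical, not taken from `update_competitor_research.py`.

```python
import json


def parse_with_retry(generate, max_attempts=3):
    """Call generate() until its text parses as JSON or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        text = generate()
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            last_error = exc
            print(f"Attempt {attempt} failed: {exc}")
    raise RuntimeError(f"No valid JSON after {max_attempts} attempts") from last_error


# Hypothetical replies: the first is cut off mid-string, the second is complete.
replies = iter(
    ['{"Competitor Name": "Arteco Fin', '{"Competitor Name": "Arteco Fincas"}']
)
print(parse_with_retry(lambda: next(replies)))
```

Capping the number of attempts keeps a persistently malformed reply from looping forever, and chaining the last `JSONDecodeError` preserves the position information for debugging.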
