# Competitor Research & Notion Database Population

## Step 0: Setup

1.  **Notion Integration Token:** Create a Notion integration and copy its internal integration token. See [Notion's Guide](https://developers.notion.com/docs/create-a-notion-integration). Grant the integration access to the parent page where the database will be created.
2.  **Parent Page ID:** Create or choose the Notion page that will host the new Competitor Database, and make sure it is shared with your integration. The page ID is the 32-character string at the end of the page URL: for `https://www.notion.so/My-Page-abcdef1234567890abcdef1234567890`, the ID is `abcdef1234567890abcdef1234567890`.
3.  **Install requirements:** Run `pip install -r requirements.txt`. The file should include `google-cloud-aiplatform`, `google-generativeai`, and `notion-client`. The notebook also imports `pandas` and `dotenv` (the `python-dotenv` package), so make sure those are installed too.
4.  **Environment variables:** Set `NOTION_API_TOKEN` and `NOTION_PARENT_PAGE_ID` in your environment or in a `.env` file. Set `NOTION_DATABASE_ID` as well if you want to reuse an existing database. The config cell reads these values after calling `load_dotenv()`.
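
If you would rather not copy the page ID by hand, the URL rule described in the setup list can be sketched as a small helper. This is a minimal sketch, not part of `utils.py`: the function name `extract_notion_page_id` is hypothetical, and it assumes the ID is the last 32 hexadecimal characters of the final URL path segment (with or without dashes).

```python
import re
from typing import Optional


def extract_notion_page_id(url: str) -> Optional[str]:
    """Return the 32-character hex ID at the end of a Notion page URL, or None.

    Hypothetical helper for illustration; not part of utils.py.
    """
    # Drop the query string and fragment so values such as ?v=... are ignored
    path = url.split("?", 1)[0].split("#", 1)[0]
    # Keep only the last path segment, e.g. "My-Page-abcdef..."
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # IDs can appear in dashed UUID form, so normalize before checking
    candidate = last_segment.replace("-", "").lower()[-32:]
    if re.fullmatch(r"[0-9a-f]{32}", candidate):
        return candidate
    return None
```

For the example URL in the list, `extract_notion_page_id("https://www.notion.so/My-Page-abcdef1234567890abcdef1234567890")` yields `abcdef1234567890abcdef1234567890`.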

In [1]:
# General Imports
import asyncio
import os
from pandas import read_csv  # Only read_csv is used in this notebook
from dotenv import load_dotenv

load_dotenv()

# Import Vertex AI SDK
import vertexai.generative_models as generative_models
from vertexai.generative_models import Tool, GenerationConfig

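# Import utility functions (ensure utils.py is in the same directory or accessible in PYTHONPATH)
# Explicit imports make it clear which names this notebook relies on
from utils import (
    CSV_SCHEMA,
    populate_notion_db_from_folder,
    research_competitors_async,
    setup_notion_database,
)
from notion_client import Client as NotionSyncClient # For DB creation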

In [2]:
# --- Config parameters ---
NOTION_API_TOKEN = os.getenv("NOTION_API_TOKEN")  # @param {type:string} TODO: Fill in your Notion Integration Token (e.g., secret_xxxxxxxxxxx)
NOTION_PARENT_PAGE_ID = os.getenv("NOTION_PARENT_PAGE_ID")  # @param {type:string} TODO: Fill in the ID of the Notion page to host the database (e.g., abcdef1234567890abcdef1234567890)
NOTION_DATABASE_NAME = "Compete Analysis DB"  # @param {type:string} Name for the new Notion Database
NOTION_DATABASE_ID = os.getenv("NOTION_DATABASE_ID")  # @param {type:string} TODO: Fill in the ID of the Notion Database (e.g., xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

COMPETITORS = read_csv("competitors.csv")["Competitor"].tolist()  # @param {type:raw} List of competitor names to research

TOPIC_DOMAIN = "AI/ML solutions"  # @param {type:string} General domain for the research context provided to the LLM
RESEARCH_GOAL = "To identify key offerings, strengths, weaknesses, target markets, and pricing models for each competitor, to inform our startup's strategic positioning."   # @param {type:string} Specific goal of the research for the LLM

OUTPUT_FOLDER = "competitor_research_json"  # @param {type:string} Folder to save the intermediate JSON research files

# Ensure output folder exists
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

print(f"Output folder for JSON: {os.path.abspath(OUTPUT_FOLDER)}")
print(f"Schema fields to be used: {len(CSV_SCHEMA)} fields including '{CSV_SCHEMA[0]}' and '{CSV_SCHEMA[-1]}'")

# Verify critical environment variables
print("\nEnvironment Variables Check:")
print(f"NOTION_API_TOKEN: {'Set' if NOTION_API_TOKEN else 'Not Set'}")
print(f"NOTION_PARENT_PAGE_ID: {'Set' if NOTION_PARENT_PAGE_ID else 'Not Set'}")

Output folder for JSON: /Users/elenamatay/VSCode projects/compete-automate/competitor_research_json
Schema fields to be used: 56 fields including 'CompetitorID' and 'Notes_QualitativeInsights'

Environment Variables Check:
NOTION_API_TOKEN: Set
NOTION_PARENT_PAGE_ID: Set
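
The check above only prints whether each variable is set, so a missing token surfaces later as a failed API call. A minimal sketch of a fail-fast alternative follows. The helper name `require_env` is hypothetical and not part of `utils.py`; it treats empty strings as missing.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def require_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the requested variables, raising if any is unset or empty.

    Hypothetical helper for illustration; not part of utils.py.
    """
    # Default to the real process environment; a mapping can be passed for testing
    source = os.environ if env is None else env
    names = list(names)
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: source[name] for name in names}
```

In this notebook it could be called as `require_env(["NOTION_API_TOKEN", "NOTION_PARENT_PAGE_ID"])` right after `load_dotenv()`.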


## Step 1: Research Competitors (Async)

This step uses Gemini 1.5 Pro to research each competitor against the fields in `CSV_SCHEMA` (defined in `utils.py`). Competitors are researched concurrently, and the result for each one is saved as a JSON file in `OUTPUT_FOLDER`.
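
The actual logic lives in `research_competitors_async` in `utils.py`, which is not shown here. As a rough illustration of the pattern (concurrent tasks, per-competitor retries, one JSON file per success), here is a minimal sketch. Every name in it (`research_all`, `research_with_retry`, the `fetch` callable, the retry and concurrency defaults) is an assumption for illustration and may differ from the real implementation.

```python
import asyncio
import json
import os
from typing import Any, Awaitable, Callable, Dict, List, Optional

# Hypothetical type: an async callable that returns research data for one competitor
Fetch = Callable[[str], Awaitable[Dict[str, Any]]]


async def research_with_retry(name: str, fetch: Fetch, output_folder: str,
                              max_attempts: int = 3, base_delay: float = 1.0) -> Optional[str]:
    """Research one competitor, retrying on failure; return the JSON path or None."""
    for attempt in range(1, max_attempts + 1):
        try:
            data = await fetch(name)
            path = os.path.join(output_folder, f"{name}.json")
            with open(path, "w", encoding="utf-8") as fh:
                json.dump(data, fh, ensure_ascii=False, indent=2)
            return path
        except Exception as exc:
            if attempt == max_attempts:
                print(f"Giving up on {name} after {attempt} attempts: {exc}")
                return None
            # Exponential backoff between attempts
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return None


async def research_all(names: List[str], fetch: Fetch, output_folder: str,
                       max_concurrency: int = 5, max_attempts: int = 3,
                       base_delay: float = 1.0) -> List[str]:
    """Research all competitors concurrently; return paths of the files written."""
    os.makedirs(output_folder, exist_ok=True)
    # Cap the number of in-flight requests to stay within API rate limits
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(name: str) -> Optional[str]:
        async with semaphore:
            return await research_with_retry(name, fetch, output_folder,
                                             max_attempts, base_delay)

    results = await asyncio.gather(*(bounded(n) for n in names))
    return [path for path in results if path is not None]
```

In a notebook cell this would be awaited directly, for example `await research_all(COMPETITORS, my_fetch, OUTPUT_FOLDER)`, where `my_fetch` stands in for whatever coroutine calls the model.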

In [3]:
# Start with a test run on just two competitors (this overrides the list loaded from competitors.csv)
COMPETITORS = ["Woz", "Rocketable"]  # @param {type:raw} Competitor names for the test run

In [4]:
# Configure the search tool using Vertex AI
search_tool = Tool.from_dict({
    "google_search": {}
})

# Use Vertex AI's GenerationConfig
config = GenerationConfig(
    temperature=0.1,
    top_p=1.0
)

request_args = {
    "generation_config": config,
    "tools": [search_tool],
    "stream": False
}

async def main_research():
    if not COMPETITORS:
        print("No competitors listed. Please add competitors to the COMPETITORS list.")
        return
    
    print(f"Starting research for {len(COMPETITORS)} competitors: {COMPETITORS}")
    successful_json_files = await research_competitors_async(
        competitors_list=COMPETITORS,
        topic_domain=TOPIC_DOMAIN,
        research_goal=RESEARCH_GOAL,
        output_folder_path=OUTPUT_FOLDER,
        request_args=request_args  # Pass the configured request args
    )
    print(f"\nResearch complete. {len(successful_json_files)} JSON files created in {OUTPUT_FOLDER}:")
    for f_path in successful_json_files:
        print(f" - {f_path}")

await main_research()

Starting research for 2 competitors: ['Woz', 'Rocketable']
Queueing research for: Woz
Queueing research for: Rocketable
Attempt 1 to research Woz...
Attempt 1 to research Rocketable...
Successfully researched and saved data for Woz to competitor_research_json/Woz.json
Successfully researched and saved data for Rocketable to competitor_research_json/Rocketable.json
Finished researching all competitors. 2 successful out of 2.

Research complete. 2 JSON files created in competitor_research_json:
 - competitor_research_json/Woz.json
 - competitor_research_json/Rocketable.json


## Step 2: Create or Verify Notion Database

This step creates a new database under the page given by `NOTION_PARENT_PAGE_ID`, with columns matching `CSV_SCHEMA`, unless `NOTION_DATABASE_ID` is already set.
If `NOTION_DATABASE_ID` is set, the step reuses that database and only confirms the ID.
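
To make the "columns match the schema" idea concrete, here is a minimal sketch of how a create-database request body could be assembled from a list of field names. It is not the code in `setup_notion_database`: the helper name `build_database_payload` is hypothetical, and it assumes one title column with every other field stored as rich text.

```python
from typing import Any, Dict, List


def build_database_payload(parent_page_id: str, database_name: str,
                           schema_fields: List[str], title_field: str) -> Dict[str, Any]:
    """Build the request body for creating a Notion database under a page.

    Hypothetical helper for illustration; not part of utils.py.
    """
    if title_field not in schema_fields:
        raise ValueError(f"Title field '{title_field}' is not in the schema")

    properties: Dict[str, Any] = {}
    for field in schema_fields:
        # A Notion database needs exactly one title property;
        # this sketch stores everything else as rich text
        properties[field] = {"title": {}} if field == title_field else {"rich_text": {}}

    return {
        "parent": {"type": "page_id", "page_id": parent_page_id},
        "title": [{"type": "text", "text": {"content": database_name}}],
        "properties": properties,
    }

# With notion-client, the payload would typically be sent as:
#   Client(auth=token).databases.create(**payload)
# (untested here; the exact call may vary with the client and API version)
```

A real schema would likely want richer column types (URL, select, number) for some fields; rich text is just the simplest type that accepts any value.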

In [5]:
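# Step 2: Create or verify Notion database
new_db_id = await setup_notion_database(
    notion_token=NOTION_API_TOKEN,
    parent_page_id=NOTION_PARENT_PAGE_ID,
    database_name=NOTION_DATABASE_NAME,
    database_id=NOTION_DATABASE_ID
)

# Carry the returned ID forward so Step 3 also works when a new database was just created
# (assumes setup_notion_database returns the ID of the database to use)
if new_db_id:
    NOTION_DATABASE_ID = new_db_id

print(f"\nFinal Notion Database ID to be used for population: {NOTION_DATABASE_ID}")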

Using existing Notion Database ID: 20c0e4393c5481ea8b72d21c4a351dbe

Final Notion Database ID to be used for population: 20c0e4393c5481ea8b72d21c4a351dbe


## Step 3: Populate Notion Database

This step reads the JSON files in `OUTPUT_FOLDER` and writes them to the Notion database identified by `NOTION_DATABASE_ID`. Competitors that already exist in the database are updated rather than duplicated.
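
As an illustration of what populating involves, the sketch below converts one research record into Notion page properties. It is not the code in `populate_notion_db_from_folder`. The helper names are hypothetical, and the sketch assumes every non-title field is stored as rich text. Text is split into 2000-character chunks because Notion limits the content of a single text object to that length.

```python
import json
from typing import Any, Dict, List

NOTION_TEXT_LIMIT = 2000  # Maximum characters per text object in the Notion API


def to_rich_text(value: Any, limit: int = NOTION_TEXT_LIMIT) -> List[Dict[str, Any]]:
    """Convert a value to a list of Notion text objects, chunked to the size limit."""
    text = "" if value is None else str(value)
    return [
        {"type": "text", "text": {"content": text[i:i + limit]}}
        for i in range(0, len(text), limit)
    ]


def build_page_properties(record: Dict[str, Any], title_field: str) -> Dict[str, Any]:
    """Map one research record to Notion page properties (title plus rich text).

    Hypothetical helper for illustration; not part of utils.py.
    """
    name = record.get(title_field)
    if not name:
        raise ValueError(f"Record has no value for title field '{title_field}'")

    properties: Dict[str, Any] = {title_field: {"title": to_rich_text(name)}}
    for key, value in record.items():
        if key == title_field:
            continue
        # Serialize nested structures so they fit in a text column
        if isinstance(value, (list, dict)):
            value = json.dumps(value, ensure_ascii=False)
        properties[key] = {"rich_text": to_rich_text(value)}
    return properties

# An upsert would then look up the page by title and either update or create it,
# e.g. with notion-client (untested here; method names may vary by client version):
#   notion.pages.update(page_id=existing_id, properties=props)
#   notion.pages.create(parent={"database_id": db_id}, properties=props)
```

Keying the lookup on the title property is what makes re-running this step safe: a second run updates the existing rows, as the output of the cell below shows.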

In [6]:
async def main_populate():
    if not NOTION_DATABASE_ID:
        print("Error: NOTION_DATABASE_ID is not set. Cannot populate database. Please ensure Step 2 was successful or provide a valid ID.")
        return
    if not NOTION_API_TOKEN:
        print("Error: NOTION_API_TOKEN is not set. Cannot populate Notion database.")
        return
    
    # Check if the output folder has any JSON files from Step 1
    json_files_exist = any(f.endswith('.json') for f in os.listdir(OUTPUT_FOLDER))
    if not json_files_exist:
        print(f"No JSON files found in {OUTPUT_FOLDER}. Run Step 1 to generate them before populating.")
        return
        
    print(f"Populating Notion database ID: {NOTION_DATABASE_ID} from folder: {OUTPUT_FOLDER}")
    await populate_notion_db_from_folder(
        output_folder=OUTPUT_FOLDER,
        database_id=NOTION_DATABASE_ID,
        notion_token=NOTION_API_TOKEN,
        title_field_name="Competitor Name" # This must match the title property in your Notion DB and schema
    )
    print("Population process complete. Check your Notion database.")

await main_populate()

Populating Notion database ID: 20c0e4393c5481ea8b72d21c4a351dbe from folder: competitor_research_json
Competitor 'Rocketable' already exists (ID: 20c0e439-3c54-81fb-af98-da981acc4bd4). Updating.
Competitor 'Woz' already exists (ID: 20c0e439-3c54-8111-8e9c-ec3057acf5c0). Updating.
Successfully updated 'Woz' in Notion.
Successfully updated 'Rocketable' in Notion.
Finished populating Notion database. 2/2 competitors processed successfully.
Population process complete. Check your Notion database.
