# Competitor Research & Notion Database Population

## Step 0: Setup

1.  **Notion Integration Token:** Create a Notion integration and get your internal integration token. See [Notion's Guide](https://developers.notion.com/docs/create-a-notion-integration). Grant the integration access to the parent page where the database will be created.
2.  **Parent Page ID:** In Notion, create or choose an existing page where you want the new Competitor Database to be created. Get its ID. You can get the page ID from its URL (the last part of the URL, e.g., for `https://www.notion.so/My-Page-abcdef1234567890abcdef1234567890`, the ID is `abcdef1234567890abcdef1234567890`). Make sure this page is shared with your integration.
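3.  **Install requirements:** Install the dependencies listed in `requirements.txt` (`pip install -r requirements.txt`). The file should include `google-cloud-aiplatform`, `google-generativeai`, and `notion-client`. The cells below also import `pandas` and `dotenv` (installed as `python-dotenv`), so make sure those are available too.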
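
To avoid copying the page ID by hand, the rule described in item 2 can be sketched as a small helper. The function name `extract_notion_page_id` is hypothetical and not part of `utils.py`; it assumes the ID is the 32-character hex string at the end of the URL path, optionally written with hyphens.

```python
import re
from typing import Optional


def extract_notion_page_id(url: str) -> Optional[str]:
    """Return the 32-character hex ID at the end of a Notion page URL, or None.

    Hypothetical helper: assumes the ID is the trailing part of the URL path.
    """
    # Drop any query string or fragment (for example "?pvs=4") before matching.
    path = url.split("?", 1)[0].split("#", 1)[0]
    # Keep only the last path segment, e.g. "My-Page-abcdef...".
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # Remove hyphens so hyphenated UUIDs and "Title-<id>" slugs both work.
    match = re.search(r"([0-9a-fA-F]{32})$", last_segment.replace("-", ""))
    return match.group(1).lower() if match else None
```

The result can be written to the `NOTION_PARENT_PAGE_ID` entry of your `.env` file; a return value of `None` means the URL did not end in a recognizable ID.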

In [1]:
# General Imports
import asyncio
import os
import json
from pandas import read_csv  # Only read_csv is used; avoids a wildcard import
from dotenv import load_dotenv

load_dotenv()

# Import utility functions (ensure utils.py is in the same directory or accessible in PYTHONPATH)
from utils import research_competitors_async, populate_notion_db_from_folder, create_notion_db_from_schema, CSV_SCHEMA
from notion_client import Client as NotionSyncClient # For DB creation

In [4]:
# --- Config parameters ---
NOTION_API_TOKEN = os.getenv("NOTION_API_TOKEN")  # @param {type:string} TODO: Fill in your Notion Integration Token (e.g., secret_xxxxxxxxxxx)
NOTION_PARENT_PAGE_ID = os.getenv("NOTION_PARENT_PAGE_ID")  # @param {type:string} TODO: Fill in the ID of the Notion page to host the database (e.g., abcdef1234567890abcdef1234567890)
NOTION_DATABASE_NAME = "Compete Analysis DB"  # @param {type:string} Name for the new Notion Database
# Optional: Fill this if DB already exists, otherwise it will be created.
# Format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (with or without hyphens)
NOTION_DATABASE_ID = None  # @param {type:string}

COMPETITORS = read_csv("competitors.csv")["Competitor"].tolist()  # @param {type:raw} List of competitor names to research

TOPIC_DOMAIN = "Cloud platforms and AI/ML solutions for enterprises and developers"  # @param {type:string} General domain for the research context provided to the LLM
RESEARCH_GOAL = "To identify key offerings, strengths, weaknesses, target markets, and pricing models for each competitor, to inform Seido's strategic positioning."  # @param {type:string} Specific goal of the research for the LLM

OUTPUT_FOLDER = "competitor_research_json"  # @param {type:string} Folder to save the intermediate JSON research files

# Ensure output folder exists
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

print(f"Output folder for JSON: {os.path.abspath(OUTPUT_FOLDER)}")
print(f"Schema fields to be used: {len(CSV_SCHEMA)} fields including '{CSV_SCHEMA[0]}' and '{CSV_SCHEMA[-1]}'")

Output folder for JSON: /Users/elenamatay/VSCode projects/compete-automate/competitor_research_json
Schema fields to be used: 53 fields including 'CompetitorID' and 'Notes_QualitativeInsights'


## Step 1: Research Competitors (Async)

This step will use Gemini 1.5 Pro to research each competitor based on the `CSV_SCHEMA` defined in `utils.py`. The results for each competitor will be saved as a JSON file in the `OUTPUT_FOLDER`.
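
Once the JSON files exist, a quick completeness check can show which schema fields the model left out. The sketch below is illustrative only: `find_missing_fields` is a hypothetical helper, and it assumes each file holds a single JSON object keyed by the field names in `CSV_SCHEMA`, which may not match what `utils.py` actually writes.

```python
import json
from pathlib import Path


def find_missing_fields(json_path, schema_fields):
    """Return the schema fields that are absent from one research JSON file.

    Assumption: the file contains one JSON object whose keys are field names.
    """
    record = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Preserve schema order so the report is easy to compare with the schema.
    return [field for field in schema_fields if field not in record]
```

For example, `find_missing_fields(path, CSV_SCHEMA)` could be called on each path returned by the research step, and any non-empty result reviewed before populating Notion.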

In [5]:
# Let's first do a test run with just two competitors (this overrides the list loaded from competitors.csv)
COMPETITORS = ["Woz", "Rocketable"]  # @param {type:raw} List of competitor names to research

In [6]:
async def main_research():
    if not COMPETITORS:
        print("No competitors listed. Please add competitors to the COMPETITORS list.")
        return
    
    print(f"Starting research for {len(COMPETITORS)} competitors: {COMPETITORS}")
    successful_json_files = await research_competitors_async(
        competitors_list=COMPETITORS,
        topic_domain=TOPIC_DOMAIN,
        research_goal=RESEARCH_GOAL,
        output_folder_path=OUTPUT_FOLDER
    )
    print(f"\nResearch complete. {len(successful_json_files)} JSON files created in {OUTPUT_FOLDER}:")
    for f_path in successful_json_files:
        print(f" - {f_path}")

# Recent Jupyter versions support top-level await, so the call below works as-is in a notebook.
# In a plain Python script there is no running event loop: use asyncio.run(main_research()) instead.
await main_research()

Starting research for 2 competitors: ['Woz', 'Rocketable']
Queueing research for: Woz
Queueing research for: Rocketable
Attempt 1 to research Woz...
Attempt 1 for Woz failed: An unexpected error occurred: When passing a list with SafetySettings objects, every item in a list must be a SafetySetting object.
Retrying...
Attempt 1 to research Rocketable...
Attempt 1 for Rocketable failed: An unexpected error occurred: When passing a list with SafetySettings objects, every item in a list must be a SafetySetting object.
Retrying...
Attempt 2 to research Woz...
Attempt 2 for Woz failed: An unexpected error occurred: When passing a list with SafetySettings objects, every item in a list must be a SafetySetting object.
Retrying...
Attempt 2 to research Rocketable...
Attempt 2 for Rocketable failed: An unexpected error occurred: When passing a list with SafetySettings objects, every item in a list must be a SafetySetting object.
Retrying...
Attempt 3 to research Woz...
Attempt 3 for Woz failed: A

## Step 2: Create or Verify Notion Database

This step will create a new database in your Notion workspace under the specified `NOTION_PARENT_PAGE_ID` if `NOTION_DATABASE_ID` is not already provided. The database columns will match the `CSV_SCHEMA`. 
If `NOTION_DATABASE_ID` is provided, this step will simply confirm its use.

In [None]:
async def setup_notion_database():
    global NOTION_DATABASE_ID # Allow modification of the global variable

    if NOTION_DATABASE_ID:
        print(f"Using existing Notion Database ID: {NOTION_DATABASE_ID}")
        # You might add a check here to ensure the DB exists and is accessible
        return NOTION_DATABASE_ID
    
    if not NOTION_API_TOKEN:
        print("Error: NOTION_API_TOKEN is not set. Cannot create Notion database.")
        return None
    if not NOTION_PARENT_PAGE_ID:
        print("Error: NOTION_PARENT_PAGE_ID is not set. Cannot create Notion database.")
        return None

    print(f"Attempting to create Notion Database titled '{NOTION_DATABASE_NAME}' under parent page ID: {NOTION_PARENT_PAGE_ID}")
    
    sync_notion_client = NotionSyncClient(auth=NOTION_API_TOKEN)
    
    # Competitor Name is the designated title property in our CSV_SCHEMA
    title_property_name_in_schema = "Competitor Name"
    
    # create_notion_db_from_schema is an async function
    new_db_id = await create_notion_db_from_schema(
        notion_sync_client=sync_notion_client, # The function expects a sync client for this operation
        parent_page_id=NOTION_PARENT_PAGE_ID,
        db_title=NOTION_DATABASE_NAME,
        title_property_name=title_property_name_in_schema
    )
    
    if new_db_id:
        print(f"Successfully created/verified Notion Database. New ID: {new_db_id}")
        NOTION_DATABASE_ID = new_db_id # Update the global variable for the next step
    else:
        print("Failed to create Notion Database. Please check logs and Notion settings (API Token, Parent Page ID, sharing permissions).")
        print("You may need to create the database manually in Notion and then set its ID in the NOTION_DATABASE_ID variable.")
    
    return new_db_id

await setup_notion_database()

print(f"\nFinal Notion Database ID to be used for population: {NOTION_DATABASE_ID}")
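
When `NOTION_DATABASE_ID` is supplied, the cell above returns it without checking that the database exists or is shared with the integration. A minimal sketch of such a check follows. The helper name `database_is_accessible` is hypothetical; it relies on the `databases.retrieve` endpoint of `notion-client` and catches any exception broadly, which is an assumption made to keep the example short.

```python
def database_is_accessible(notion_client, database_id):
    """Return True if the database can be retrieved with the given client.

    Hypothetical helper: notion_client is expected to be a notion_client.Client
    (or any object exposing databases.retrieve(database_id=...)).
    """
    try:
        notion_client.databases.retrieve(database_id=database_id)
        return True
    except Exception as exc:  # Broad catch is an assumption; narrow it as needed
        print(f"Could not access database {database_id}: {exc}")
        return False
```

With the names used in this notebook, the call would look like `database_is_accessible(NotionSyncClient(auth=NOTION_API_TOKEN), NOTION_DATABASE_ID)`.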

## Step 3: Populate Notion Database

This step will take the JSON files from the `OUTPUT_FOLDER` and populate the Notion database specified by `NOTION_DATABASE_ID`.

In [None]:
async def main_populate():
    if not NOTION_DATABASE_ID:
        print("Error: NOTION_DATABASE_ID is not set. Cannot populate database. Please ensure Step 2 was successful or provide a valid ID.")
        return
    if not NOTION_API_TOKEN:
        print("Error: NOTION_API_TOKEN is not set. Cannot populate Notion database.")
        return
    
    # Check if the output folder has any JSON files from Step 1
    json_files_exist = any(f.endswith('.json') for f in os.listdir(OUTPUT_FOLDER))
    if not json_files_exist:
        print(f"No JSON files found in {OUTPUT_FOLDER}. Run Step 1 to generate them before populating.")
        return
        
    print(f"Populating Notion database ID: {NOTION_DATABASE_ID} from folder: {OUTPUT_FOLDER}")
    await populate_notion_db_from_folder(
        output_folder=OUTPUT_FOLDER,
        database_id=NOTION_DATABASE_ID,
        notion_token=NOTION_API_TOKEN,
        title_field_name="Competitor Name" # This must match the title property in your Notion DB and schema
    )
    print("Population process complete. Check your Notion database.")

await main_populate()
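
If population is ever split into one request per file, the number of requests in flight can be capped with an `asyncio.Semaphore`. This is a generic sketch, not how `populate_notion_db_from_folder` works internally (its concurrency model is not shown in this notebook); `run_with_limit` and its `max_concurrent` parameter are hypothetical names.

```python
import asyncio


async def run_with_limit(coroutine_factories, max_concurrent=3):
    """Run zero-argument coroutine factories with at most max_concurrent in flight.

    Results are returned in the same order as the factories.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(factory):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in coroutine_factories))
```

In a notebook this would be awaited directly, for example `await run_with_limit([lambda p=p: upload(p) for p in paths])`, where `upload` stands in for whatever per-file coroutine you use.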

## Next Steps & Considerations

- **Error Handling**: The `utils.py` functions include some error handling (e.g., for LLM retries, JSON parsing, Notion API calls). Review logs for any issues.
- **Notion Property Types**: `create_notion_db_from_schema` currently creates most columns as Rich Text and `WebsiteURL` as URL. If you need other Notion property types (Date, Number, Select, Multi-select, etc.) for specific fields, customize `map_data_to_notion_properties` and `create_notion_db_from_schema` in `utils.py` by updating the `properties` dictionary in both functions according to the Notion API specification.
- **Idempotency**: The `add_json_to_notion_db` function attempts to find existing entries by 'Competitor Name' to update them, making the population step somewhat idempotent. If 'Competitor Name' is not unique or changes, this might lead to duplicates or missed updates. Consider using the `CompetitorID` (if reliably generated and unique) for more robust duplicate checking by modifying the filter in `add_json_to_notion_db`.
- **Rate Limits**: For a large number of competitors, be mindful of Notion API rate limits. Async processing helps with I/O-bound work, but Notion still throttles clients that send too many requests too quickly and responds with HTTP 429. Check whether the `notion-client` version you have installed retries throttled requests before relying on it.
- **LLM Prompting**: The prompt in `research_competitor_to_json` is generic. You might want to fine-tune it further for your specific needs or if the LLM struggles with certain fields (e.g., providing more examples, or being more prescriptive about the format of list-based fields).
- **Authentication**: If running locally, ensure your Google Cloud Application Default Credentials (ADC) are set up correctly and that the Gemini API is enabled in your project.
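
As an illustration of the property-type customization mentioned above, the sketch below builds a `properties` dictionary in the shape the Notion API expects when creating a database. It is not the code in `utils.py`: `build_property_schema` and its `typed_fields` argument are hypothetical, and every field without an explicit type falls back to `rich_text`.

```python
def build_property_schema(field_names, title_field, typed_fields=None):
    """Build a Notion database `properties` dict from a list of field names.

    typed_fields maps a field name to a Notion property type such as
    "url", "number", "date", "select", or "multi_select" (hypothetical argument).
    """
    typed_fields = typed_fields or {}
    properties = {}
    for name in field_names:
        if name == title_field:
            # Every Notion database needs exactly one title property.
            properties[name] = {"title": {}}
            continue
        notion_type = typed_fields.get(name, "rich_text")
        if notion_type in ("select", "multi_select"):
            # Start with no predefined options.
            properties[name] = {notion_type: {"options": []}}
        elif notion_type == "number":
            properties[name] = {"number": {"format": "number"}}
        else:
            properties[name] = {notion_type: {}}
    return properties
```

For instance, `build_property_schema(CSV_SCHEMA, "Competitor Name", {"WebsiteURL": "url"})` would reproduce the current behavior described above, and further entries in `typed_fields` would opt individual columns into other types. The matching value format in `map_data_to_notion_properties` would need to change as well.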