# Quick Start with AutoDDG

This notebook demonstrates how to generate dataset descriptions, expand them for search, and evaluate their quality using **AutoDDG**.

---

## 1. Imports

In [None]:
import pandas as pd
from openai import OpenAI

from autoddg import AutoDDG, GPTEvaluator
from autoddg.utils import get_sample

## 2. Initialise the OpenAI Client

In [None]:
my_api_key = "YOUR_OPENAI_API_KEY"  # Replace with your key; never commit a real key
client = OpenAI(api_key=my_api_key)
model_name = "gpt-4o-mini"
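
Hardcoding a key in a notebook makes it easy to leak by accident. As a minimal sketch, the key can instead be read from the `OPENAI_API_KEY` environment variable, which the OpenAI SDK also checks by default. The helper name `resolve_api_key` and the `PLACEHOLDER` constant are illustrative and not part of AutoDDG.

```python
import os

PLACEHOLDER = "YOUR_OPENAI_API_KEY"  # same placeholder as in the cell above


def resolve_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `env_var`, or the placeholder if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or PLACEHOLDER


my_api_key = resolve_api_key()
if my_api_key == PLACEHOLDER:
    print("OPENAI_API_KEY is not set; replace the placeholder before calling the API.")
```

The resulting `my_api_key` can be passed to `OpenAI(api_key=my_api_key)` exactly as in the cell above.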

## 3. Load Dataset and Prepare Context

Here we sample 100 rows, profile the full dataset, extract semantic information from the sample, and generate a short topic from the title, the original description, and the sample.

In [None]:
# Instantiate AutoDDG
auto_ddg = AutoDDG(client=client, model_name=model_name)

# Load dataset and its metadata
csv_file = "clark_dataset.csv"
title = "Renal Cell Carcinoma"
original_description = (
    "This study reports a large-scale proteogenomic analysis of ccRCC to discern the functional impact "
    "of genomic alterations and provides evidence for rational treatment selection stemming from ccRCC pathobiology"
)
csv_df = pd.read_csv(csv_file)

# Sample rows
sample_df, dataset_sample = get_sample(csv_df, sample_size=100)

# Generate profiles
basic_profile, structural_profile = auto_ddg.profile_dataframe(csv_df)
semantic_profile_details = auto_ddg.analyze_semantics(sample_df)
semantic_profile = "\n".join(
    section for section in [structural_profile, semantic_profile_details] if section
)

# Generate topic
data_topic = auto_ddg.generate_topic(
    title=title,
    original_description=original_description,
    dataset_sample=dataset_sample,
)
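
To make two of the steps above concrete, here is a small self-contained sketch on a toy DataFrame. It is not AutoDDG's implementation: `sample_as_text` is a hypothetical stand-in that only assumes a sampler returns a sampled DataFrame together with a text rendering of it, and `join_sections` mirrors the `"\n".join(...)` expression used to build `semantic_profile`, which skips empty sections.

```python
import pandas as pd


def sample_as_text(df: pd.DataFrame, sample_size: int = 100, seed: int = 0):
    """Hypothetical sampler: return (sampled DataFrame, CSV text of the sample)."""
    n = min(sample_size, len(df))  # never request more rows than exist
    sample_df = df.sample(n=n, random_state=seed)
    return sample_df, sample_df.to_csv(index=False)


def join_sections(*sections):
    """Join profile sections with newlines, dropping empty or None entries."""
    return "\n".join(section for section in sections if section)


# Toy data for illustration only
toy_df = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})
toy_sample_df, toy_sample_text = sample_as_text(toy_df, sample_size=2)
print(toy_sample_text)
print(join_sections("structural notes", "", "semantic notes"))
```

Filtering out empty sections means that a missing structural or semantic profile does not leave a stray blank line in the text passed on to the description step.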

## 4. Generate Descriptions

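We create both a **general dataset description**, built from the sample, the profiles, and the topic, and a **search-focused description**, which expands the general description using the same topic.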

In [None]:
# General description
prompt, description = auto_ddg.describe_dataset(
    dataset_sample=dataset_sample,
    dataset_profile=basic_profile,
    use_profile=True,
    semantic_profile=semantic_profile,
    use_semantic_profile=True,
    data_topic=data_topic,
    use_topic=True,
)

# Search-focused description
search_prompt, search_focused_description = auto_ddg.expand_description_for_search(
    description=description,
    topic=data_topic,
)

### General Description

In [None]:
description

### Search-Focused Description

In [None]:
search_focused_description
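
Evaluating a bare string in a cell shows its `repr`, with quotes and escaped newlines, which is hard to read for long text. A minimal sketch of a display helper, assuming the descriptions are plain strings (the name `show` is illustrative):

```python
import textwrap


def show(label: str, text: str, width: int = 80) -> str:
    """Print `text` under an underlined label, wrapped to `width` columns."""
    # Collapse runs of whitespace (including newlines) before wrapping
    wrapped = textwrap.fill(" ".join(text.split()), width=width)
    block = f"{label}\n{'-' * len(label)}\n{wrapped}"
    print(block)
    return block


# Placeholder text; in the notebook, pass `description` or
# `search_focused_description` instead.
demo_block = show("Demo", "placeholder text " * 20, width=40)
```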

## 5. Evaluate Quality

Finally, we attach a `GPTEvaluator` and use it to score both descriptions.

In [None]:
# Attach evaluator
auto_ddg.set_evaluator(GPTEvaluator(gpt4_api_key=my_api_key))

# Score descriptions
general_score = auto_ddg.evaluate_description(description)
search_score = auto_ddg.evaluate_description(search_focused_description)

print("Score of the general description:", general_score)
print("Score of the search-focused description:", search_score)
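
When more than two descriptions need scoring, the repeated calls can be folded into a loop. The sketch below is a hypothetical helper, not part of AutoDDG: `score_all` accepts any callable that maps a description string to a score, treats the returned score as opaque, and skips empty descriptions so that no API call is spent on them. A stub evaluator stands in for the real one here.

```python
from typing import Any, Callable, Dict


def score_all(
    evaluate: Callable[[str], Any], descriptions: Dict[str, str]
) -> Dict[str, Any]:
    """Apply `evaluate` to each non-empty description; map empty ones to None."""
    results: Dict[str, Any] = {}
    for label, text in descriptions.items():
        if not text or not text.strip():
            results[label] = None  # nothing to score
            continue
        results[label] = evaluate(text)
    return results


def fake_evaluate(text: str) -> int:
    """Stub evaluator for illustration: counts whitespace-separated words."""
    return len(text.split())


demo_scores = score_all(fake_evaluate, {"general": "a b c", "search": ""})
print(demo_scores)
```

In the notebook, the same helper could be called as `score_all(auto_ddg.evaluate_description, {"general": description, "search-focused": search_focused_description})`.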