# Contextualizing Conversations with OpenAI (Async)

This notebook processes JSON files containing conversations and generates a short cultural context summary for each conversation.  
It sends **asynchronous requests** to the OpenAI API so that multiple files can be processed concurrently instead of one at a time.  

In [7]:
import os
import json
import asyncio
from openai import AsyncOpenAI

### API Client Setup
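We initialize the **AsyncOpenAI client** with our API key, set the input and output folders, and create a semaphore that caps the number of requests in flight at 10.  
The client makes non-blocking calls to the OpenAI API, so the notebook can wait on several requests at once when processing multiple files.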

In [13]:
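# Replace the placeholder with your key. If api_key is omitted, the client
# reads the OPENAI_API_KEY environment variable instead.
client = AsyncOpenAI(api_key='YOUR-API-KEY')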

input_folder = "../Data Cleaning 2/Json Files"
output_folder = "./context"
os.makedirs(output_folder, exist_ok=True)

# Limit concurrent requests to avoid hitting rate limits
semaphore = asyncio.Semaphore(10)
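
To illustrate how the semaphore caps concurrency, here is a minimal, self-contained sketch that makes no API calls. The names `worker`, `run_demo`, and `state` are hypothetical and exist only for this illustration; `asyncio.sleep` stands in for the network request.

```python
import asyncio

async def worker(i, semaphore, state):
    # At most `limit` workers can be inside this block at the same time.
    async with semaphore:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for an API request
        state["active"] -= 1
    return i

async def run_demo(n_tasks=25, limit=10):
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(worker(i, semaphore, state) for i in range(n_tasks))
    )
    return results, state["peak"]

# In a notebook cell, use `results, peak = await run_demo()` instead.
results, peak = asyncio.run(run_demo())
print(f"finished {len(results)} tasks, peak concurrency {peak}")
```

All 25 tasks are scheduled at once, but the recorded peak never exceeds the limit passed to `asyncio.Semaphore`.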

### Extracting Cultural Context
This function:
1. Builds a **system prompt** instructing the model to summarize only the *cultural context* of a conversation.  
2. Joins the conversation messages into a single text block.  
3. Sends the request asynchronously to the model.  
4. Returns the cultural context summary.  

In [14]:
async def extract_context(conversation, country, language):
    system_prompt = (
        f"You are an assistant that summarizes only the cultural context of a conversation from {country}. "
        f"Return a short and precise sentence. Do not include any user details, no message content, and no summaries of what was said. "
        f"Focus only on cultural factors that might influence the communication tone, expectations, or interaction style in {country}. "
        f"Keep the output in {language}, and keep it under 100 words."
    )

    # Keep only non-empty user and bot messages, one "role: content" line each
    full_text = "\n".join(
        f"{m['role']}: {m['content']}"
        for m in conversation
        if m['role'] in ['user', 'bot'] and m['content'].strip()
    )

    user_prompt = f"Conversation:\n{full_text}\n\nContext summary:"

    async with semaphore:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.5,
        )

    return response.choices[0].message.content.strip()
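
The message-joining step in `extract_context` can be checked on its own, without calling the API. The sketch below assumes a hypothetical helper `build_transcript` and made-up sample messages; unlike the cell above, it also tolerates a missing or `None` content field, which is an addition for illustration.

```python
def build_transcript(conversation):
    """Join non-empty user/bot messages into 'role: content' lines."""
    lines = []
    for m in conversation:
        content = m.get("content") or ""  # treat missing/None content as empty
        if m.get("role") in ("user", "bot") and content.strip():
            lines.append(f"{m['role']}: {content}")
    return "\n".join(lines)

# Hypothetical sample data, not taken from the real dataset
sample = [
    {"role": "system", "content": "internal note"},
    {"role": "user", "content": "Hello, I need help with my order."},
    {"role": "bot", "content": "   "},
    {"role": "bot", "content": "Of course, what is the order number?"},
]

transcript = build_transcript(sample)
print(transcript)
```

Only the second and fourth messages survive: the `system` message is filtered out by role, and the whitespace-only bot message is dropped as empty.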

### Processing Individual Files
This function:
- Loads the JSON file.  
- Extracts metadata like `country`, `language`, and `conversation_id`.  
- Skips files without a valid `country`.  
- Calls `extract_context` to get the cultural summary.  
- Returns a dictionary with the results, or `None` if the file was skipped or the request failed.  

In [15]:
async def process_file(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    country = data.get("country")
    if not country or str(country).lower() == "null":
        print(f"⏭️ Skipped (no country): {file_path}")
        return None

    conversation_id = data.get("conversation_id", "unknown_id")
    language = data.get("language", "en")
    messages = data.get("messages", [])

    try:
        context = await extract_context(messages, country, language)
        print(f"✅ Processed: {conversation_id}")
        return {
            "conversation_id": conversation_id,
            "context": context
        }
    except Exception as e:
        print(f"❌ Error with {conversation_id}: {e}")
        return None
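
The skip rule in `process_file` can be isolated and tested separately. This is a small sketch with a hypothetical helper name, `has_valid_country`, that mirrors the same check: a missing, empty, or literal `"null"` country (in any letter case) counts as invalid.

```python
def has_valid_country(data):
    """Return True when the record carries a usable `country` value."""
    country = data.get("country")
    return bool(country) and str(country).lower() != "null"

# Hypothetical records for illustration
records = [
    {"country": "Japan"},
    {"country": None},
    {"country": "NULL"},
    {"country": ""},
    {},
]
flags = [has_valid_country(r) for r in records]
print(flags)
```

Only the first record passes the check, so only it would be sent to the model.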


### Main Function
The `main` function:
1. Lists all `.json` files in the input folder (limited to 10 for testing).  
2. Processes them concurrently using `asyncio.gather`.  
3. Saves the results to a new JSON file.  

In [16]:
async def main():
    files = os.listdir(input_folder)
    json_files = [f for f in files if f.endswith(".json")][:10]

    tasks = [
        process_file(os.path.join(input_folder, file))
        for file in json_files
    ]

    results = await asyncio.gather(*tasks)
    results = [r for r in results if r]  # Remove None results

    output_path = os.path.join(output_folder, "context_summary.json")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)

    print("\n🏁 Processing complete. Results saved in:", output_path)
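
One detail worth knowing about `asyncio.gather`: it returns results in the order the awaitables were passed in, not the order in which they finish. That is why the progress log can list conversations out of order while the saved list follows the input order. A minimal sketch, using a hypothetical `fake_process` coroutine and made-up delays in place of real files:

```python
import asyncio

async def fake_process(name, delay):
    # Stand-in for process_file: "skip" plays the role of a file with no country.
    await asyncio.sleep(delay)
    return None if name == "skip" else {"conversation_id": name}

async def run_gather_demo():
    # "b" finishes first and "a" last, but gather keeps the input order.
    jobs = [("a", 0.03), ("skip", 0.01), ("b", 0.0)]
    results = await asyncio.gather(*(fake_process(n, d) for n, d in jobs))
    return [r for r in results if r]  # drop None results, as in main()

# In a notebook cell, use `kept = await run_gather_demo()` instead.
kept = asyncio.run(run_gather_demo())
print(kept)
```

The filtered list contains `a` followed by `b`, even though `b` completed first.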

### Run the Script
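This block starts the asynchronous workflow. Because of the `[:10]` slice in `main`, at most 10 files are processed; remove the slice to process every file in the input folder.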

In [17]:
await main()

⏭️ Skipped (no country): ../Data Cleaning 2/Json Files\conversation4.json
✅ Processed: conversation3
✅ Processed: conversation5
✅ Processed: conversation6
✅ Processed: conversation7
✅ Processed: conversation8
✅ Processed: conversation10
✅ Processed: conversation1
✅ Processed: conversation2
✅ Processed: conversation9

🏁 Processing complete. Results saved in: ./context\context_summary.json
