from openai import OpenAI
import json
from tqdm import tqdm
import os
import pandas as pd

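# Prefer the OPENAI_API_KEY environment variable; the literal is only a placeholder
api_key = os.environ.get("OPENAI_API_KEY", "your_api_key")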

instruction = """
You are an expert in polymer science, tasked with applying specialized knowledge to write a brief, insightful one-paragraph caption for a polymer. Your goal is to combine the structured data provided with expert analysis to enhance the understanding of the polymer's structure and function. The output caption should cover the following:
Basic Details: Include the polymer's name, chemical class, molecular formula, and synthesis method.
Structural Features: Highlight unique structural aspects, such as specific functional groups or molecular arrangements.
Key Properties: Emphasize notable properties, such as mechanical strength, thermal stability, biocompatibility, or transparency.
Applications: Relate the polymer properties to practical uses in industries or technologies.

Here is the structured polymer JSON data:
"""
examples = """
Please refer to the following examples and use clear, concise, and scientifically accurate language.

"Poly(1-methylethylene), or polypropylene (PP), is a lightweight, semi-crystalline polyolefin with the molecular formula C3H6 and a formula weight of 42.08 g/mol. Synthesized through the addition polymerization of propylene monomers, it features a unique isotactic arrangement of methyl groups that enhances crystallinity and strength. Known for its exceptional chemical resistance, thermal stability, and low density, PP is widely utilized in packaging, automotive parts, textiles, and household goods. Its durability, affordability, and versatility make it indispensable in applications requiring long-term performance and cost-effectiveness."

"Poly(methyl methacrylate) (PMMA) is a transparent, lightweight polymer belonging to the polyacrylic and polyvinyl classes, with the molecular formula C5H8O2 and a molecular weight of 100.12 g/mol. It is typically synthesized through radical addition polymerization in solution. PMMA features a rigid backbone structure with methyl ester functional groups, contributing to its exceptional optical clarity, mechanical strength, and resistance to UV radiation and weathering. These properties make it indispensable for applications such as optical lenses, protective transparent barriers, signage, and lightweight glass alternatives, particularly in architectural, automotive, and medical industries."

"Poly(lactic acid) (PLA), a biodegradable aliphatic polyester with the molecular formula C3H4O2, is synthesized through the ring-opening polymerization of L-lactide. Distinguished by its renewable origin and biocompatibility, PLA features ester linkages that enable controlled biodegradation. It exhibits notable thermal processability and mechanical properties suitable for diverse applications. Widely utilized in biomedical fields for sutures, drug delivery systems, and tissue scaffolds, PLA's eco-friendly and thermoplastic nature also makes it an ideal material for sustainable packaging, disposable goods, and 3D printing."
"""

def process_keys(api_key, keys, json_input, instruction, examples, csv_path):
    """Generate a caption for each polymer key and append it to the CSV at csv_path."""
    client = OpenAI(api_key=api_key)
    for key_smi in tqdm(keys, desc="Processing polymers"):
        data = json_input[key_smi]
        data_str = json.dumps(data, ensure_ascii=False)
        try:
            completion = client.chat.completions.create(
                model="gpt-4o-2024-08-06",
                messages=[
                    {"role": "system", "content": instruction},
                    {"role": "user", "content": data_str + examples},
                ],
                max_tokens=2048
            )
            description = completion.choices[0].message.content
            # Append one row; pandas quotes the field and escapes embedded quotes, commas, and newlines
            pd.DataFrame([[key_smi, description]]).to_csv(
                csv_path, mode='a', header=False, index=False, encoding='utf-8'
            )

        except Exception as e:
            print(f"Error processing {key_smi}: {e}")

# Load JSON input (the context manager closes the file handle)
with open('json_input.json', 'r', encoding='utf-8') as f:
    json_input = json.load(f)

smi_set = set(json_input.keys())

csv_path = 'polymer_description.csv'
# Create the CSV with a header row if it does not exist yet; otherwise load earlier results
if not os.path.exists(csv_path):
    df = pd.DataFrame(columns=['smiles', 'description'])
    df.to_csv(csv_path, index=False)
else:
    df = pd.read_csv(csv_path)

processed_smi_set = set(df['smiles'])

# Remaining smiles to process
test_smiles = list(smi_set - processed_smi_set)

print(f"There are {len(test_smiles)} polymers to be processed.")

# Process keys
process_keys(api_key, test_smiles, json_input, instruction, examples, csv_path)
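
Descriptions returned by the model can contain commas, double quotes, and line breaks, so each CSV row needs proper quoting to stay readable by `pd.read_csv`. A minimal sketch of one way to append rows with Python's standard `csv` module and check the round trip, without any API call. The helper name `append_row` and the sample strings are hypothetical and not part of the script above.

```python
import csv
import io


def append_row(handle, smiles, description):
    """Write one (smiles, description) row; csv.writer handles the quoting."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([smiles, description])


# In-memory buffer standing in for the CSV file (assumption for illustration)
buffer = io.StringIO()
append_row(buffer, "smiles", "description")  # header row

# A description with embedded double quotes, a comma, and a newline
tricky = 'A "glassy", rigid polymer.\nUsed in lenses.'
append_row(buffer, "*CC(*)C", tricky)

# Read everything back and confirm the fields survive unchanged
buffer.seek(0)
rows = list(csv.reader(buffer))
```

With a real file, the same helper could be called on a handle opened with `open(csv_path, 'a', encoding='utf-8', newline='')`, which is the mode the `csv` documentation recommends for writers.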
