# Prompt Visualization

# Prompt Crafting

In [1]:
from utils.pipeline_utils import load_sql_generator
from utils.llm_utils import load_sql_model

In [14]:
from openai import OpenAI

client = OpenAI()

META_PROMPT = """
Given a current prompt and a change description, produce a detailed system prompt to guide a language model in completing the task effectively.

Your final output will be the full corrected prompt verbatim. However, before that, at the very beginning of your response, use <reasoning> tags to analyze the prompt and determine the following, explicitly:
<reasoning>
- Reasoning: (yes/no) Does the current prompt use reasoning, analysis, or chain of thought? 
    - Identify: (max 10 words) if so, which section(s) utilize reasoning?
    - Conclusion: (yes/no) is the chain of thought used to determine a conclusion?
    - Ordering: (before/after) is the chain of thought located before or after the conclusion?
- Structure: (yes/no) does the input prompt have a well defined structure
- Examples: (yes/no) does the input prompt have few-shot examples
    - Representative: (1-5) if present, how representative are the examples?
- Complexity: (1-5) how complex is the input prompt?
    - Task: (1-5) how complex is the implied task?
    - Necessity: ()
- Specificity: (1-5) how detailed and specific is the prompt? (not to be confused with length)
- Prioritization: (list) what 1-3 categories are the MOST important to address.
- Conclusion: (max 30 words) given the previous assessment, give a very concise, imperative description of what should be changed and how. This does not have to adhere strictly to only the categories listed.
</reasoning>
    
# Guidelines

- Understand the Task: Grasp the main objective, goals, requirements, constraints, and expected output.
- Minimal Changes: If an existing prompt is provided, improve it only if it's simple. For complex prompts, enhance clarity and add missing elements without altering the original structure.
- Reasoning Before Conclusions: Encourage reasoning steps before any conclusions are reached. ATTENTION! If the user provides examples where the reasoning happens afterward, REVERSE the order! NEVER START EXAMPLES WITH CONCLUSIONS!
    - Reasoning Order: Call out reasoning portions of the prompt and conclusion parts (specific fields by name). For each, determine the ORDER in which this is done, and whether it needs to be reversed.
    - Conclusion, classifications, or results should ALWAYS appear last.
- Examples: Include high-quality examples if helpful, using placeholders [in brackets] for complex elements.
    - Consider what kinds of examples may need to be included, how many, and whether they are complex enough to benefit from placeholders.
- Clarity and Conciseness: Use clear, specific language. Avoid unnecessary instructions or bland statements.
- Formatting: Use markdown features for readability. DO NOT USE ``` CODE BLOCKS UNLESS SPECIFICALLY REQUESTED.
- Preserve User Content: If the input task or prompt includes extensive guidelines or examples, preserve them entirely, or as closely as possible. If they are vague, consider breaking down into sub-steps. Keep any details, guidelines, examples, variables, or placeholders provided by the user.
- Constants: DO include constants in the prompt, such as guides, rubrics, and examples, as they are not susceptible to prompt injection.
- Output Format: Explicitly state the most appropriate output format, in detail. This should include length and syntax (e.g. short sentence, paragraph, JSON, etc.)
    - For tasks outputting well-defined or structured data (classification, JSON, etc.) bias toward outputting a JSON.
    - JSON should never be wrapped in code blocks (```) unless explicitly requested.

The final prompt you output should adhere to the following structure below. Do not include any additional commentary, only output the completed system prompt. SPECIFICALLY, do not include any additional messages at the start or end of the prompt. (e.g. no "---")

[Concise instruction describing the task - this should be the first line in the prompt, no section header]

[Additional details as needed.]

[Optional sections with headings or bullet points for detailed steps.]

# Steps [optional]

[optional: a detailed breakdown of the steps necessary to accomplish the task]

# Output Format

[Specifically call out how the output should be formatted, be it response length, structure e.g. JSON, markdown, etc]

# Examples [optional]

[Optional: 1-3 well-defined examples with placeholders if necessary. Clearly mark where examples start and end, and what the input and output are. Use placeholders as necessary.]
[If the examples are shorter than what a realistic example is expected to be, make a reference with () explaining how real examples should be longer / shorter / different. AND USE PLACEHOLDERS! ]

# Notes [optional]

[optional: edge cases, details, and an area to call or repeat out specific important considerations]
[NOTE: you must start with a <reasoning> section. The immediate next token you produce should be <reasoning>]
""".strip()

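def generate_prompt(task_or_prompt: str):
    """Rewrite a prompt with META_PROMPT; the reply opens with a <reasoning> block, then the new prompt."""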
    completion = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.5,
        messages=[
            {
                "role": "system",
                "content": META_PROMPT,
            },
            {
                "role": "user",
                "content": "Current Prompt:\n" + task_or_prompt,
            },
        ],
    )

    return completion.choices[0].message.content
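META_PROMPT tells the model to open with a `<reasoning>` block and only then emit the rewritten prompt, so the string returned by `generate_prompt` mixes both. Below is a minimal sketch for separating them, assuming the response is well formed. `split_reasoning` is an illustrative helper, not part of `utils`:

```python
import re

# A leading <reasoning>...</reasoning> block (possibly multi-line),
# followed by everything else, which should be the rewritten prompt.
_REASONING_RE = re.compile(r"^\s*<reasoning>(.*?)</reasoning>\s*(.*)\Z", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Split a meta-prompt response into (reasoning, prompt).

    If the response does not start with a <reasoning> block, the reasoning
    part is returned empty and the whole response is treated as the prompt.
    """
    match = _REASONING_RE.match(response)
    if match is None:
        return "", response.strip()
    reasoning, prompt = match.groups()
    return reasoning.strip(), prompt.strip()


# Usage with a response shaped like the ones printed in this notebook.
example = "<reasoning>\n- Simple Change: yes\n</reasoning>\n\nAs a SQL expert, craft a query."
reasoning, prompt = split_reasoning(example)
print(reasoning)
print(prompt)
```

In the notebook this would be applied as `reasoning, prompt = split_reasoning(new_direct_prompt)`. If a response has no tags, the function returns an empty reasoning string instead of raising an error.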

## Direct Prompt

In [4]:
model_name = 'gpt-4o-mini-2024-07-18'
promptV_gen = 'dir_v8'
sql_gen_method = 'direct'

# Load the LLM, then wrap it in an SQL generator for the chosen prompt version and method
model = load_sql_model(model_name)
sql_gen = load_sql_generator(
    model=model,
    promptV_gen=promptV_gen,
    sql_gen_method=sql_gen_method
)


2025-06-10 03:28:49,227 - INFO - Loading model gpt-4o-mini-2024-07-18 with temperature=0.0, max_tokens=4000, top_p=None


In [4]:
print(sql_gen.get_prompt(["table"]))



As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes.
The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context.
ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is from ZTF, including the object, candidate and 

In [None]:


new_direct_prompt = generate_prompt(sql_gen.get_prompt(["table"]))
print(new_direct_prompt)



2025-06-09 06:01:53,023 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: yes, the prompt uses reasoning in the "Context" section.
    - Identify: Context section
    - Conclusion: yes, the chain of thought is used to determine a conclusion.
    - Ordering: before, the reasoning is located before the conclusion.
- Structure: yes, the input prompt has a well-defined structure.
- Examples: no, the input prompt does not have few-shot examples.
    - Representative: 0, as there are no examples present.
- Complexity: 4
    - Task: 4, the task involves crafting complex SQL queries with specific conditions.
    - Necessity: The complexity is necessary due to the detailed requirements and context.
- Specificity: 4, the prompt is detailed and specific about the task and context.
- Prioritization: Reasoning, Structure, Specificity
- Conclusion: Ensure reasoning precedes conclusions, add examples for clarity, and maintain structure.
</reasoning>

Craft a PostgreSQL query for the ALeRCE Database in 2023 based on user requests. Analyze user reque
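The assessment above follows the META_PROMPT checklist as `- Field: value` bullets, with sub-fields indented under their parent. Turning it into a dictionary can make it easier to compare assessments across prompts. The following is a rough sketch that assumes this bullet layout. `parse_assessment` is a hypothetical helper. Nested keys are prefixed with their parent because `Conclusion` appears both at the top level and under `Reasoning`:

```python
import re

# "- Field: value" at any indentation; leading whitespace marks a sub-field.
_FIELD_RE = re.compile(r"^(?P<indent>\s*)-\s*(?P<key>[^:]+):\s*(?P<value>.*)$")


def parse_assessment(reasoning: str) -> dict[str, str]:
    """Parse a <reasoning> checklist into a flat dict keyed "Parent/Child" for sub-fields."""
    fields: dict[str, str] = {}
    parent = None
    for line in reasoning.splitlines():
        match = _FIELD_RE.match(line)
        if match is None:
            continue  # skip tags, blank lines and free text
        key = match["key"].strip()
        value = match["value"].strip()
        if match["indent"]:
            # Indented bullet: attach it to the most recent top-level field.
            fields[f"{parent}/{key}" if parent else key] = value
        else:
            parent = key
            fields[key] = value
    return fields


sample = """<reasoning>
- Reasoning: yes, the prompt uses reasoning in the "Context" section.
    - Identify: Context section
    - Conclusion: yes, the chain of thought is used to determine a conclusion.
- Complexity: 4
    - Task: 4, the task involves crafting complex SQL queries with specific conditions.
- Prioritization: Reasoning, Structure, Specificity
</reasoning>"""
parsed = parse_assessment(sample)
print(parsed["Reasoning/Identify"])
print(parsed["Prioritization"])
```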

In [12]:
print(new_direct_prompt)

<reasoning>
- Simple Change: yes
</reasoning>

As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes. The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context. ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is f
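Here the model reports `Simple Change: yes`, and the text that follows appears to open with the original wording, so the changes are hard to spot by eye. The following is a small sketch built on the standard-library `difflib.unified_diff`. `prompt_diff` is an illustrative name. Strip the `<reasoning>` block from the rewritten text before comparing:

```python
import difflib


def prompt_diff(original: str, rewritten: str, context: int = 1) -> str:
    """Return a unified diff between two prompts, compared line by line."""
    diff = difflib.unified_diff(
        original.splitlines(),
        rewritten.splitlines(),
        fromfile="original",
        tofile="rewritten",
        n=context,      # number of unchanged lines shown around each change
        lineterm="",    # lines are already split, so no trailing newlines
    )
    return "\n".join(diff)


before = "Craft a PostgreSQL query.\nUse the ALeRCE tables."
after = "Craft a PostgreSQL query.\nUse the ALeRCE tables.\nReturn only SQL."
print(prompt_diff(before, after))
```

The original prompt puts whole paragraphs on single lines, so a line-level diff is coarse. Splitting the text into sentences before diffing would produce finer hunks.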

## Step-by-step Prompt

### Simple

In [5]:

model_name = 'gpt-4o-mini-2024-07-18'
promptV_gen = 'sbs_v4'
sql_gen_method = 'step-by-step'

# Load the LLM, then wrap it in an SQL generator for the chosen prompt version and method
model = load_sql_model(model_name)
sql_gen = load_sql_generator(
    model=model,
    promptV_gen=promptV_gen,
    sql_gen_method=sql_gen_method
)

print(sql_gen.get_prompt(difficulty_class='simple', tables_list=["table1", "table2"]))

2025-06-10 03:30:22,377 - INFO - Loading model gpt-4o-mini-2024-07-18 with temperature=0.0, max_tokens=4000, top_p=None
2025-06-10 03:30:22,378 - INFO - OPENAI_API_KEY found in environment variables.




# As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes.
The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context.
ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is from ZTF, including the object, candidate an

In [9]:
new_simple_prompt = generate_prompt(sql_gen.get_prompt(difficulty_class='simple', tables_list=["table1", "table2"]))
print(new_simple_prompt)



2025-06-10 03:31:39,930 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: no
    - Identify: N/A
    - Conclusion: no
    - Ordering: N/A
- Structure: yes
- Examples: no
    - Representative: N/A
- Complexity: 4
    - Task: 4
    - Necessity: 
- Specificity: 4
- Prioritization: Reasoning, Examples, Clarity
- Conclusion: Add reasoning steps before conclusions, include examples, and clarify instructions.
</reasoning>

Craft a PostgreSQL query for the ALeRCE Database in 2023, focusing on user requests and table specifics.

Understand the context of the ALeRCE pipeline and the Q3C extension for spatial queries. Pay attention to explicit user conditions and maintain awareness of the broader context.

# Context

## ALeRCE Pipeline Details
- Stamp Classifier: All alerts related to new objects undergo stamp-based classification.
- Light Curve Classifier: A hierarchical random forest classifier with four models and 15 classes.
- Classifiers: 
  - 'lc_classifier_top': [Periodic, Stochastic, Transient]
  - 'lc_classifier_transient': [SNIa, SNIb
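The `Necessity:` field in this assessment is empty. This is not surprising, since META_PROMPT gives that field no scale (`Necessity: ()`). Below is a quick check for blank fields, written as a hypothetical helper that assumes the same `- Field: value` bullet layout:

```python
import re

# Checklist bullets of the form "- Field: value", at any indentation.
_FIELD_RE = re.compile(r"^\s*-\s*([^:]+):(.*)$")


def empty_fields(reasoning: str) -> list[str]:
    """Return the names of checklist fields whose value is blank."""
    blank = []
    for line in reasoning.splitlines():
        match = _FIELD_RE.match(line)
        if match and not match.group(2).strip():
            blank.append(match.group(1).strip())
    return blank


assessment = "\n".join([
    "- Complexity: 4",
    "    - Task: 4",
    "    - Necessity:",
    "- Specificity: 4",
])
print(empty_fields(assessment))
```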

### Medium

In [15]:
print("Step-by-step Prompt:")
print(sql_gen.get_prompt( difficulty_class='medium', tables_list=["table1", "table2"]).split("SQL Query Generation Prompt: ")[0])
print("SQL Generation Prompt:")
print(sql_gen.get_prompt( difficulty_class='medium', tables_list=["table1", "table2"]).split("SQL Query Generation Prompt: ")[1])


Step-by-step Prompt:


# Your task is to DECOMPOSE the user request into a series of steps required to generate a PostgreSQL query that will be used for retrieving requested information from the ALeRCE database.
For this, outline a detailed decomposition plan for its systematic resolution, describing and breaking down the problem into subtasks and/or subqueries. 
Be careful to put all the information and details needed in the description, like conditions, the table and column names, and the details of the database schema. This is very important to ensure the query is optimal and accurate.
Take in consideration the advices, conditions and names from "General Context" and details of the database, or the query will not be optimal.
# DON'T RETURN ANY SQL CODE, just the description of each step required to generate it.
Creating a decomposition plan to generate a PostgreSQL query for retrieving information from the ALeRCE astronomy broker database involves several steps. ALeRCE (Automatic Le
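The medium prompt is split on `"SQL Query Generation Prompt: "` into two parts. The first is a decomposition prompt that asks for a plan and explicitly forbids SQL. The second is a generation prompt that presumably consumes that plan. The sketch below shows one plausible way to chain the two stages. The `LLM` callable signature, `two_stage_sql`, and the stage-2 message layout are assumptions made for illustration, not the implementation inside `utils.pipeline_utils`:

```python
from typing import Callable

# Hypothetical LLM interface: (system_prompt, user_message) -> response text.
LLM = Callable[[str, str], str]

SEPARATOR = "SQL Query Generation Prompt: "


def two_stage_sql(llm: LLM, full_prompt: str, request: str) -> tuple[str, str]:
    """Run the decomposition prompt, then the SQL prompt conditioned on the plan."""
    # Raises ValueError if the marker is missing (tuple unpacking fails).
    plan_prompt, gen_prompt = full_prompt.split(SEPARATOR, 1)
    # Stage 1: natural-language decomposition plan (the prompt forbids SQL here).
    plan = llm(plan_prompt, request)
    # Stage 2: the generation prompt sees the original request plus the plan.
    sql = llm(gen_prompt, f"Request:\n{request}\n\nDecomposition plan:\n{plan}")
    return plan, sql


# Dry run with a stub model that answers according to the system prompt it receives.
def stub_llm(system: str, user: str) -> str:
    return "1. Filter objects" if system.startswith("Decompose") else "SELECT 1;"


plan, sql = two_stage_sql(stub_llm, "Decompose the request. " + SEPARATOR + "Write the SQL.", "list objects")
print(plan)
print(sql)
```

Keeping the model behind a plain callable lets you dry-run the sketch without API calls. You can later swap in a wrapper around `client.chat.completions.create`.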

In [16]:
new_med_sbs_prompt = generate_prompt(sql_gen.get_prompt(difficulty_class='medium', tables_list=["table1", "table2"]).split("SQL Query Generation Prompt: ")[0])
print(new_med_sbs_prompt)



2025-06-10 03:39:32,150 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: yes 
    - Identify: Steps 1 to 5 utilize reasoning
    - Conclusion: no 
    - Ordering: before 
- Structure: yes 
- Examples: no 
    - Representative: 0 
- Complexity: 4 
    - Task: 4 
    - Necessity: 4 
- Specificity: 4 
- Prioritization: Reasoning, Specificity, Complexity
- Conclusion: Improve clarity by restructuring and adding examples to guide decomposition steps.
</reasoning>

Decompose the user request into a series of steps to generate a PostgreSQL query for retrieving requested information from the ALeRCE database. 

Outline a detailed decomposition plan for systematic resolution, breaking down the problem into subtasks and/or subqueries. Include all necessary information such as conditions, table and column names, and database schema details to ensure the query is optimal and accurate. Consider the advice, conditions, and names from the "General Context" and database details. Do not return any SQL code, only the description of each step required 

In [17]:
new_med_gen_prompt = generate_prompt(sql_gen.get_prompt(difficulty_class='medium', tables_list=["table1", "table2"]).split("SQL Query Generation Prompt: ")[1])
print(new_med_gen_prompt)



2025-06-10 03:40:45,950 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: yes Does the current prompt use reasoning, analysis, or chain of thought? 
    - Identify: decomposed planification section
    - Conclusion: no is the chain of thought used to determine a conclusion?
    - Ordering: before is the chain of thought located before or after 
- Structure: yes does the input prompt have a well defined structure
- Examples: no does the input prompt have few-shot examples
    - Representative: 0 if present, how representative are the examples?
- Complexity: 4 how complex is the input prompt?
    - Task: 4 how complex is the implied task?
    - Necessity: 
- Specificity: 4 how detailed and specific is the prompt? (not to be confused with length)
- Prioritization: Reasoning, Examples, Output Format
- Conclusion: The prompt requires examples for clarity and a clear output format. Improve reasoning and specify output format.
</reasoning>

As a SQL expert, craft a PostgreSQL query for the Automatic Learning for the Rapid Classification of 
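In this assessment the model copied the checklist questions after its answers (e.g. `yes Does the current prompt use reasoning...`). For fields whose answer is enumerated (yes/no, before/after, or a score), a small normalizer can recover the leading token. Below is a sketch with `leading_answer` as a hypothetical helper. It accepts 0 because the earlier outputs use it for `Representative`, even though the rubric says 1-5:

```python
import re

# Enumerated answers allowed by the rubric: yes/no, before/after, or a 0-5 score.
_ANSWER_RE = re.compile(r"^(yes|no|before|after|[0-5])\b", re.IGNORECASE)


def leading_answer(value: str) -> str:
    """Drop echoed question text after an enumerated answer.

    Values that do not start with an enumerated token (free-text fields such
    as "Identify" or "Prioritization") are returned unchanged, minus padding.
    """
    value = value.strip()
    match = _ANSWER_RE.match(value)
    return match.group(1).lower() if match else value


print(leading_answer("yes Does the current prompt use reasoning, analysis, or chain of thought? "))
print(leading_answer("4 how complex is the input prompt?"))
print(leading_answer("Reasoning, Examples, Output Format"))
```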

### Hard

In [14]:
print(sql_gen.get_prompt(difficulty_class='advanced', tables_list=["table1", "table2"]))




# Your task is to DECOMPOSE the user request into a series of steps required to generate a PostgreSQL query that will be used for retrieving requested information from the ALeRCE database.
For this, outline a detailed decomposition plan for its systematic resolution, describing and breaking down the problem into subtasks and/or subqueries. 
Be careful to put all the information and details needed in the description, like conditions, the table and column names, and the details of the database schema. This is very important to ensure the query is optimal and accurate.
Take in consideration the advices, conditions and names from "General Context" and details of the database, or the query will not be optimal.
The request is a very difficult and advanced query, so you will need to use JOINs, INTERSECTs and UNIONs statements, together with Nested queries. It is very important that you give every possible detail in each step, describing the statements and the nested-queries that are require
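Each cell in this section repeats the same pattern for one difficulty class. It fetches the prompt, optionally splits it on the `"SQL Query Generation Prompt: "` marker, and passes each part to `generate_prompt`. The following is a sketch of batching that work. It assumes `get_prompt` accepts the `difficulty_class` and `tables_list` keywords used above; `collect_prompts` and `rewrite_all` are illustrative names:

```python
from typing import Callable

SEPARATOR = "SQL Query Generation Prompt: "
DIFFICULTIES = ("simple", "medium", "advanced")


def collect_prompts(get_prompt: Callable[..., str], tables: list[str]) -> dict[str, str]:
    """Gather every prompt part to rewrite, keyed by difficulty (and stage)."""
    prompts: dict[str, str] = {}
    for difficulty in DIFFICULTIES:
        text = get_prompt(difficulty_class=difficulty, tables_list=tables)
        parts = text.split(SEPARATOR)
        if len(parts) == 2:
            # Two-stage prompt: decomposition plan + SQL generation.
            prompts[f"{difficulty}/plan"], prompts[f"{difficulty}/sql"] = parts
        else:
            # Single prompt (or an unexpected layout): keep it whole.
            prompts[difficulty] = text
    return prompts


def rewrite_all(prompts: dict[str, str], rewrite: Callable[[str], str]) -> dict[str, str]:
    """Apply a rewriting function such as generate_prompt to each part."""
    return {name: rewrite(text) for name, text in prompts.items()}


# Dry run: a stub stands in for sql_gen.get_prompt and str.upper for generate_prompt.
def stub_get_prompt(difficulty_class: str, tables_list: list[str]) -> str:
    if difficulty_class == "simple":
        return "one prompt"
    return "plan prompt " + SEPARATOR + "sql prompt"


print(rewrite_all(collect_prompts(stub_get_prompt, ["table1"]), str.upper))
```

With the notebook objects, this becomes `rewrite_all(collect_prompts(sql_gen.get_prompt, ["table1", "table2"]), generate_prompt)`, which makes one API call per prompt part.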