# Prompt Visualization

# Prompt Crafting

In [1]:
from utils.pipeline_utils import load_sql_generator
from utils.llm_utils import load_sql_model

In [14]:
from openai import OpenAI

client = OpenAI()

META_PROMPT = """
Given a current prompt and a change description, produce a detailed system prompt to guide a language model in completing the task effectively.

Your final output will be the full corrected prompt verbatim. However, before that, at the very beginning of your response, use <reasoning> tags to analyze the prompt and determine the following, explicitly:
<reasoning>
- Reasoning: (yes/no) Does the current prompt use reasoning, analysis, or chain of thought? 
    - Identify: (max 10 words) if so, which section(s) utilize reasoning?
    - Conclusion: (yes/no) is the chain of thought used to determine a conclusion?
    - Ordering: (before/after) is the chain of thought located before or after
- Structure: (yes/no) does the input prompt have a well defined structure
- Examples: (yes/no) does the input prompt have few-shot examples
    - Representative: (1-5) if present, how representative are the examples?
- Complexity: (1-5) how complex is the input prompt?
    - Task: (1-5) how complex is the implied task?
    - Necessity: ()
- Specificity: (1-5) how detailed and specific is the prompt? (not to be confused with length)
- Prioritization: (list) what 1-3 categories are the MOST important to address.
- Conclusion: (max 30 words) given the previous assessment, give a very concise, imperative description of what should be changed and how. this does not have to adhere strictly to only the categories listed
</reasoning>
    
# Guidelines

- Understand the Task: Grasp the main objective, goals, requirements, constraints, and expected output.
- Minimal Changes: If an existing prompt is provided, improve it only if it's simple. For complex prompts, enhance clarity and add missing elements without altering the original structure.
- Reasoning Before Conclusions: Encourage reasoning steps before any conclusions are reached. ATTENTION! If the user provides examples where the reasoning happens afterward, REVERSE the order! NEVER START EXAMPLES WITH CONCLUSIONS!
    - Reasoning Order: Call out reasoning portions of the prompt and conclusion parts (specific fields by name). For each, determine the ORDER in which this is done, and whether it needs to be reversed.
    - Conclusion, classifications, or results should ALWAYS appear last.
- Examples: Include high-quality examples if helpful, using placeholders [in brackets] for complex elements.
   - What kinds of examples may need to be included, how many, and whether they are complex enough to benefit from placeholders.
- Clarity and Conciseness: Use clear, specific language. Avoid unnecessary instructions or bland statements.
- Formatting: Use markdown features for readability. DO NOT USE ``` CODE BLOCKS UNLESS SPECIFICALLY REQUESTED.
- Preserve User Content: If the input task or prompt includes extensive guidelines or examples, preserve them entirely, or as closely as possible. If they are vague, consider breaking down into sub-steps. Keep any details, guidelines, examples, variables, or placeholders provided by the user.
- Constants: DO include constants in the prompt, as they are not susceptible to prompt injection. Such as guides, rubrics, and examples.
- Output Format: Explicitly state the most appropriate output format, in detail. This should include length and syntax (e.g. short sentence, paragraph, JSON, etc.)
    - For tasks outputting well-defined or structured data (classification, JSON, etc.) bias toward outputting a JSON.
    - JSON should never be wrapped in code blocks (```) unless explicitly requested.

The final prompt you output should adhere to the following structure below. Do not include any additional commentary, only output the completed system prompt. SPECIFICALLY, do not include any additional messages at the start or end of the prompt. (e.g. no "---")

[Concise instruction describing the task - this should be the first line in the prompt, no section header]

[Additional details as needed.]

[Optional sections with headings or bullet points for detailed steps.]

# Steps [optional]

[optional: a detailed breakdown of the steps necessary to accomplish the task]

# Output Format

[Specifically call out how the output should be formatted, be it response length, structure e.g. JSON, markdown, etc]

# Examples [optional]

[Optional: 1-3 well-defined examples with placeholders if necessary. Clearly mark where examples start and end, and what the input and output are. Use placeholders as necessary.]
[If the examples are shorter than what a realistic example is expected to be, make a reference with () explaining how real examples should be longer / shorter / different. AND USE PLACEHOLDERS! ]

# Notes [optional]

[optional: edge cases, details, and an area to call out or repeat specific important considerations]
[NOTE: you must start with a <reasoning> section. the immediate next token you produce should be <reasoning>]
""".strip()

def generate_prompt(task_or_prompt: str) -> str:
    """Send the current prompt through META_PROMPT and return the model's rewrite."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.5,
        messages=[
            {
                "role": "system",
                "content": META_PROMPT,
            },
            {
                "role": "user",
                "content": "Current Prompt:\n" + task_or_prompt,
            },
        ],
    )

    return completion.choices[0].message.content
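
Because the meta-prompt asks for a `<reasoning>` block ahead of the final prompt, the string returned by `generate_prompt` mixes the analysis with the rewritten prompt. Below is a minimal sketch of separating the two, assuming the model emits one well-formed `<reasoning>...</reasoning>` pair at the very start of its reply. `split_reasoning` is a hypothetical helper, not part of the notebook's utilities.

```python
import re

# Hypothetical helper (not in the original notebook): separates the
# <reasoning> analysis from the prompt text that follows it.
_REASONING_RE = re.compile(r"\A\s*<reasoning>(.*?)</reasoning>\s*", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, prompt). Reasoning is '' when no block is found."""
    match = _REASONING_RE.match(response)
    if match is None:
        # Assumption: without a leading block, the whole reply is the prompt.
        return "", response.strip()
    return match.group(1).strip(), response[match.end():].strip()
```

With it, `reasoning, prompt = split_reasoning(generate_prompt(...))` would leave only the prompt text to store or reuse.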

## Direct Prompt

In [4]:
model_name = 'gpt-4o-mini-2024-07-18'
promptV_gen = 'dir_v8'
sql_gen_method = 'direct'

# Load the LLM backend
model = load_sql_model(model_name)
# Build the SQL generator around it
sql_gen = load_sql_generator(
    model=model,
    promptV_gen=promptV_gen,
    sql_gen_method=sql_gen_method
)


2025-06-10 03:28:49,227 - INFO - Loading model gpt-4o-mini-2024-07-18 with temperature=0.0, max_tokens=4000, top_p=None


In [4]:
print(sql_gen.get_prompt(["table"]))



As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes.
The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context.
ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is from ZTF, including the object, candidate and 

In [None]:


new_direct_prompt = generate_prompt(sql_gen.get_prompt(["table"]))
print(new_direct_prompt)



2025-06-09 06:01:53,023 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: yes, the prompt uses reasoning in the "Context" section.
    - Identify: Context section
    - Conclusion: yes, the chain of thought is used to determine a conclusion.
    - Ordering: before, the reasoning is located before the conclusion.
- Structure: yes, the input prompt has a well-defined structure.
- Examples: no, the input prompt does not have few-shot examples.
    - Representative: 0, as there are no examples present.
- Complexity: 4
    - Task: 4, the task involves crafting complex SQL queries with specific conditions.
    - Necessity: The complexity is necessary due to the detailed requirements and context.
- Specificity: 4, the prompt is detailed and specific about the task and context.
- Prioritization: Reasoning, Structure, Specificity
- Conclusion: Ensure reasoning precedes conclusions, add examples for clarity, and maintain structure.
</reasoning>

Craft a PostgreSQL query for the ALeRCE Database in 2023 based on user requests. Analyze user reque

In [12]:
print(new_direct_prompt)

<reasoning>
- Simple Change: yes
</reasoning>

As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes. The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context. ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is f

## Step-by-step Prompt

### Simple

In [5]:

model_name = 'gpt-4o-mini-2024-07-18'
promptV_gen = 'sbs_v4'
sql_gen_method = 'step-by-step'

# Load the LLM backend
model = load_sql_model(model_name)
# Build the SQL generator around it
sql_gen = load_sql_generator(
    model=model,
    promptV_gen=promptV_gen,
    sql_gen_method=sql_gen_method
)

print(sql_gen.get_prompt(difficulty_class='simple', tables_list=["table1", "table2"]))

2025-06-10 03:30:22,377 - INFO - Loading model gpt-4o-mini-2024-07-18 with temperature=0.0, max_tokens=4000, top_p=None
2025-06-10 03:30:22,378 - INFO - OPENAI_API_KEY found in environment variables.




# As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes.
The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context.
ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is from ZTF, including the object, candidate an

In [9]:
new_simple_prompt = generate_prompt(sql_gen.get_prompt(difficulty_class='simple', tables_list=["table1", "table2"]))
print(new_simple_prompt)



2025-06-10 03:31:39,930 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: no
    - Identify: N/A
    - Conclusion: no
    - Ordering: N/A
- Structure: yes
- Examples: no
    - Representative: N/A
- Complexity: 4
    - Task: 4
    - Necessity: 
- Specificity: 4
- Prioritization: Reasoning, Examples, Clarity
- Conclusion: Add reasoning steps before conclusions, include examples, and clarify instructions.
</reasoning>

Craft a PostgreSQL query for the ALeRCE Database in 2023, focusing on user requests and table specifics.

Understand the context of the ALeRCE pipeline and the Q3C extension for spatial queries. Pay attention to explicit user conditions and maintain awareness of the broader context.

# Context

## ALeRCE Pipeline Details
- Stamp Classifier: All alerts related to new objects undergo stamp-based classification.
- Light Curve Classifier: A hierarchical random forest classifier with four models and 15 classes.
- Classifiers: 
  - 'lc_classifier_top': [Periodic, Stochastic, Transient]
  - 'lc_classifier_transient': [SNIa, SNIb

### Medium

In [15]:
# Build the medium prompt once, then split it at the marker between its two parts
medium_prompt = sql_gen.get_prompt(difficulty_class='medium', tables_list=["table1", "table2"])
sbs_part, gen_part = medium_prompt.split("SQL Query Generation Prompt: ")[:2]
print("Step-by-step Prompt:")
print(sbs_part)
print("SQL Generation Prompt:")
print(gen_part)


Step-by-step Prompt:


# Your task is to DECOMPOSE the user request into a series of steps required to generate a PostgreSQL query that will be used for retrieving requested information from the ALeRCE database.
For this, outline a detailed decomposition plan for its systematic resolution, describing and breaking down the problem into subtasks and/or subqueries. 
Be careful to put all the information and details needed in the description, like conditions, the table and column names, and the details of the database schema. This is very important to ensure the query is optimal and accurate.
Take in consideration the advices, conditions and names from "General Context" and details of the database, or the query will not be optimal.
# DON'T RETURN ANY SQL CODE, just the description of each step required to generate it.
Creating a decomposition plan to generate a PostgreSQL query for retrieving information from the ALeRCE astronomy broker database involves several steps. ALeRCE (Automatic Le
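
The medium prompt bundles two parts separated by the literal marker `SQL Query Generation Prompt: `. Indexing the result of `split` raises a bare `IndexError` if the marker is ever missing. A sketch using `str.partition`, which allows a clearer error, is shown here; `MARKER` and `split_medium_prompt` are hypothetical names introduced for illustration.

```python
MARKER = "SQL Query Generation Prompt: "


def split_medium_prompt(full_prompt: str) -> tuple[str, str]:
    """Return (decomposition_prompt, generation_prompt) split at the first MARKER."""
    head, sep, tail = full_prompt.partition(MARKER)
    if not sep:
        # partition returns an empty separator when the marker is absent.
        raise ValueError(f"marker {MARKER!r} not found in prompt")
    return head, tail
```

One difference from `split(...)[1]`: if the marker appeared more than once, `partition` keeps everything after the first occurrence in the second part.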

In [16]:
new_med_sbs_prompt = generate_prompt(sql_gen.get_prompt(difficulty_class='medium', tables_list=["table1", "table2"]).split("SQL Query Generation Prompt: ")[0])
print(new_med_sbs_prompt)



2025-06-10 03:39:32,150 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: yes 
    - Identify: Steps 1 to 5 utilize reasoning
    - Conclusion: no 
    - Ordering: before 
- Structure: yes 
- Examples: no 
    - Representative: 0 
- Complexity: 4 
    - Task: 4 
    - Necessity: 4 
- Specificity: 4 
- Prioritization: Reasoning, Specificity, Complexity
- Conclusion: Improve clarity by restructuring and adding examples to guide decomposition steps.
</reasoning>

Decompose the user request into a series of steps to generate a PostgreSQL query for retrieving requested information from the ALeRCE database. 

Outline a detailed decomposition plan for systematic resolution, breaking down the problem into subtasks and/or subqueries. Include all necessary information such as conditions, table and column names, and database schema details to ensure the query is optimal and accurate. Consider the advice, conditions, and names from the "General Context" and database details. Do not return any SQL code, only the description of each step required 

In [17]:
new_med_gen_prompt = generate_prompt(sql_gen.get_prompt(difficulty_class='medium', tables_list=["table1", "table2"]).split("SQL Query Generation Prompt: ")[1])
print(new_med_gen_prompt)



2025-06-10 03:40:45,950 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


<reasoning>
- Reasoning: yes Does the current prompt use reasoning, analysis, or chain of thought? 
    - Identify: decomposed planification section
    - Conclusion: no is the chain of thought used to determine a conclusion?
    - Ordering: before is the chain of thought located before or after 
- Structure: yes does the input prompt have a well defined structure
- Examples: no does the input prompt have few-shot examples
    - Representative: 0 if present, how representative are the examples?
- Complexity: 4 how complex is the input prompt?
    - Task: 4 how complex is the implied task?
    - Necessity: 
- Specificity: 4 how detailed and specific is the prompt? (not to be confused with length)
- Prioritization: Reasoning, Examples, Output Format
- Conclusion: The prompt requires examples for clarity and a clear output format. Improve reasoning and specify output format.
</reasoning>

As a SQL expert, craft a PostgreSQL query for the Automatic Learning for the Rapid Classification of 

### Hard

In [14]:
print(sql_gen.get_prompt(difficulty_class='advanced', tables_list=["table1", "table2"]))




# Your task is to DECOMPOSE the user request into a series of steps required to generate a PostgreSQL query that will be used for retrieving requested information from the ALeRCE database.
For this, outline a detailed decomposition plan for its systematic resolution, describing and breaking down the problem into subtasks and/or subqueries. 
Be careful to put all the information and details needed in the description, like conditions, the table and column names, and the details of the database schema. This is very important to ensure the query is optimal and accurate.
Take in consideration the advices, conditions and names from "General Context" and details of the database, or the query will not be optimal.
The request is a very difficult and advanced query, so you will need to use JOINs, INTERSECTs and UNIONs statements, together with Nested queries. It is very important that you give every possible detail in each step, describing the statements and the nested-queries that are require

# Schema Linking

In [None]:
import os
from openai import OpenAI

# Read the API key from the environment instead of hardcoding it in the notebook
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


In [15]:
!conda list openai
!conda list httpx

# packages in environment at /home/jespejo/miniconda3/envs/llm:
#
# Name                    Version                   Build  Channel
openai                    1.97.1                   pypi_0    pypi
# packages in environment at /home/jespejo/miniconda3/envs/llm:
#
# Name                    Version                   Build  Channel
httpx                     0.28.1                   pypi_0    pypi


In [14]:
!pip install openai==1.97.1
!pip install httpx==0.28.1

Collecting openai==1.97.1
  Using cached openai-1.97.1-py3-none-any.whl.metadata (29 kB)
Using cached openai-1.97.1-py3-none-any.whl (764 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.55.0
    Uninstalling openai-1.55.0:
      Successfully uninstalled openai-1.55.0
Successfully installed openai-1.97.1
Collecting httpx==0.28.1
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Using cached httpx-0.28.1-py3-none-any.whl (73 kB)
Installing collected packages: httpx
  Attempting uninstall: httpx
    Found existing installation: httpx 0.27.2
    Uninstalling httpx-0.27.2:
      Successfully uninstalled httpx-0.27.2
Successfully installed httpx-0.28.1


In [47]:

def pred_tables(schema_prompt: str, query: str, prompting: str) -> str:
    """Ask the model which tables a user request needs.

    `prompting` selects where the schema prompt is placed:
    'user', 'system', or 'both'.
    """
    request_suffix = f"\nThe user request is the following: {query}"

    if prompting == 'user':
        # Schema prompt and request packed into a single user message
        messages = [
            {"role": "user", "content": schema_prompt + request_suffix},
        ]
    elif prompting == 'system':
        # Schema prompt as the system message, raw request as the user message
        messages = [
            {"role": "system", "content": schema_prompt},
            {"role": "user", "content": query},
        ]
    elif prompting == 'both':
        # Request appended to the system message and repeated as the user message
        messages = [
            {"role": "system", "content": schema_prompt + request_suffix},
            {"role": "user", "content": query},
        ]
    else:
        raise ValueError(f"Invalid prompting type: {prompting!r}")

    completion = client.chat.completions.create(
        model="gpt-4o-2024-11-20",
        temperature=0.0,
        max_tokens=1000,
        messages=messages,
    )

    return completion.choices[0].message.content
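
`pred_tables` returns the raw model text, while the schema-linking prompts ask for a Python-style list such as `['object', 'taxonomy']`. Below is a small sketch of turning that text into a real list, assuming the reply contains at most one such list literal. `parse_tables` is a hypothetical helper, and falling back to an empty list on malformed replies is a design choice of this sketch, not notebook behavior.

```python
import ast
import re


def parse_tables(response: str) -> list[str]:
    """Extract the first [...] literal from a model reply as a list of names."""
    match = re.search(r"\[.*?\]", response, re.DOTALL)
    if match is None:
        # No list present, e.g. an empty-string reply meaning "no table needed".
        return []
    try:
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        # The bracketed text was not a valid Python literal.
        return []
    # Keep only string entries; anything else is treated as malformed.
    return [name for name in parsed if isinstance(name, str)]
```

`ast.literal_eval` is used instead of `eval` so that the model's reply is never executed as code.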

In [2]:
from prompts.SchemaLinkingPrompts import sl_final_instructions_v2, prompt_schema_linking_v0
from prompts.DBSchemaPrompts import alerce_tables_desc

schema_linking_prompt = prompt_schema_linking_v0.format(db_schema=alerce_tables_desc, final_instructions=sl_final_instructions_v2)
print(schema_linking_prompt)


# Given the user request, select the tables needed to generate a SQL query
# The Database has the following tables:

TABLE "object": contains basic filter and time–aggregated statistics such as location, number of observations, and the times of first and last detection.
TABLE "probability": classification probabilities associated to a given object, classifier, and class. Contain the object classification probabilities, including those from the stamp and light curve classifiers, and from different versions of these classifiers.
TABLE "feature": contains the object light curve statistics and other features used for ML classification and which are stored as json files in our database.
TABLE "magstat": contains time–aggregated statistics separated by filter, such as the average magnitude, the initial magnitude change rate, number of detections, etc.
TABLE "non_detection": contains the limiting magnitudes of previous non–detections separated by filter.
TABLE "detection": contains the objec

In [3]:
schema_linking_prompt

'\n# Given the user request, select the tables needed to generate a SQL query\n# The Database has the following tables:\n\nTABLE "object": contains basic filter and time–aggregated statistics such as location, number of observations, and the times of first and last detection.\nTABLE "probability": classification probabilities associated to a given object, classifier, and class. Contain the object classification probabilities, including those from the stamp and light curve classifiers, and from different versions of these classifiers.\nTABLE "feature": contains the object light curve statistics and other features used for ML classification and which are stored as json files in our database.\nTABLE "magstat": contains time–aggregated statistics separated by filter, such as the average magnitude, the initial magnitude change rate, number of detections, etc.\nTABLE "non_detection": contains the limiting magnitudes of previous non–detections separated by filter.\nTABLE "detection": contains

In [7]:
tables_linking_prompt_V2 = '''
# Given the user request, select the tables needed to generate a SQL query
# The Database has the following tables:
TABLE "object": contains basic filter and time-aggregated statistics such as location, number of observations, and the times of first and last detection.
TABLE "probability": classification probabilities associated to a given object, classifier, and class. Contain the object classification probabilities, including those from the stamp and light curve classifiers, and from different versions of these classifiers.
TABLE "feature": contains the object light curve statistics and other features used for ML classification and which are stored as json files in our database.
TABLE "magstat": contains time-aggregated statistics separated by filter, such as the average magnitude, the initial magnitude change rate, number of detections, etc.
TABLE "non_detection": contains the limiting magnitudes of previous non-detections separated by filter.
TABLE "detection": contains the object light curves including their difference and corrected magnitudes and associated errors separated by filter.
TABLE "step": contains the different pipeline steps and their versions.
TABLE "taxonomy": contains details about the different taxonomies used in our stamp and light curve classifiers, which can evolve with time.
TABLE "feature_version": contains the version of the feature extraction and preprocessing steps used to generate the features.
TABLE "xmatch": contains the object cross-matches and associated cross-match catalogs.
TABLE "allwise": contains the AllWISE catalog information for the objects.
TABLE "dataquality": detailed object information regarding the quality of the data
TABLE "gaia_ztf": GAIA objects near detected ZTF objects
TABLE "ss_ztf": known solar system objects near detected objects
TABLE "ps1_ztf": PanSTARRS objects near detected ZTF objects
TABLE "reference": properties of the reference images used to build templates
TABLE "pipeline": information about the different pipeline steps and their versions
TABLE "information_schema.tables": information about the database tables and columns
TABLE "forced_photometry": contains the forced photometry measurements for each object, including the object position, magnitude, and associated errors, and the photometry of the reference image.

# Give ONLY the TABLES that are needed to generate the SQL query, nothing more
# Give the answer in the following format: ['table1', 'table2', ...]
# For example, if the TABLES needed for the user request are TABLE object and TABLE taxonomy, then you should type: ['object', 'taxonomy']
# Remember to use the exact name of the TABLES, as they are written in the DATABASE SCHEMA
# Just give the tables and ignore any other task given in the request given as "request".
'''

In [5]:
# Check difference between prompts
import difflib

d = difflib.Differ()
diff = list(d.compare(schema_linking_prompt.strip().splitlines(), tables_linking_prompt_V2.strip().splitlines()))

for line in diff:
    print("Line:", line)

Line:   # Given the user request, select the tables needed to generate a SQL query
Line:   # The Database has the following tables:
Line: - 
Line: - TABLE "object": contains basic filter and time–aggregated statistics such as location, number of observations, and the times of first and last detection.
Line: ?                                               ^

Line: + TABLE "object": contains basic filter and time-aggregated statistics such as location, number of observations, and the times of first and last detection.
Line: ?                                               ^

Line:   TABLE "probability": classification probabilities associated to a given object, classifier, and class. Contain the object classification probabilities, including those from the stamp and light curve classifiers, and from different versions of these classifiers.
Line:   TABLE "feature": contains the object light curve statistics and other features used for ML classification and which are stored as json files in
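
`Differ` prints every line, so the few real differences (here, an en dash versus a hyphen in "time–aggregated") are easy to miss among the unchanged ones. This sketch lists only the changed lines and can optionally normalize dashes before comparing; `changed_lines` and its `normalize_dashes` flag are hypothetical names.

```python
import difflib


def changed_lines(a: str, b: str, normalize_dashes: bool = False) -> list[str]:
    """Return only the removed/added lines between two prompts."""
    if normalize_dashes:
        # Map the en dash (U+2013) to an ASCII hyphen before comparing.
        a, b = a.replace("\u2013", "-"), b.replace("\u2013", "-")
    diff = difflib.ndiff(a.strip().splitlines(), b.strip().splitlines())
    # ndiff prefixes: '- ' only in a, '+ ' only in b, '? ' hints, '  ' common.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Calling `changed_lines(schema_linking_prompt, tables_linking_prompt_V2, normalize_dashes=True)` would then show whatever differs beyond punctuation.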

In [23]:
# query 14
query='''For the next list of oids: ['ZTF23aavzgjg' 'ZTF23aaynzyk' 'ZTF23aavqxos' 'ZTF23aaknyni'\n 'ZTF23aavsdtc' 'ZTF18aandkua' 'ZTF23aaxfewt' 'ZTF23aavshwi'\n 'ZTF22aawasao' 'ZTF23aaxgvnt'], return the unique object identifier, candidate identifier, filter identifier, modified julian date, magnitud, magnitud error, whether the object has stamps, deep learning real bogus score, the star galaxy score of the nearest object, and the distance to the nearest source in panstarrs for objects that have a deep learning real bogus score greater than 0.5 and that either have a star galaxy score less than 0.5 or a distance to the nearest panstarrs source smaller than 1 arcsec.'''

In [24]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['detection', 'ps1_ztf']
Second try:  ['detection', 'ps1_ztf']
Third try:  ['detection', 'ps1_ztf']


In [25]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['object', 'detection', 'probability', 'ps1_ztf']
Second try:  ['object', 'detection', 'probability', 'ps1_ztf']
Third try:  ['object', 'detection', 'probability', 'ps1_ztf']
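
With the original `schema_linking_prompt`, the 'system' and 'user' placements return different table sets. The sketch below makes that difference explicit. `diff_tables` is a hypothetical helper, and the two input lists are copied from the outputs printed above.

```python
def diff_tables(a: list[str], b: list[str]) -> dict[str, list[str]]:
    """Report which table names appear in only one of two predictions."""
    set_a, set_b = set(a), set(b)
    return {
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        "shared": sorted(set_a & set_b),
    }


system_tables = ['detection', 'ps1_ztf']
user_tables = ['object', 'detection', 'probability', 'ps1_ztf']
print(diff_tables(system_tables, user_tables))
```

Here the 'user' placement adds `object` and `probability` on top of the two tables both placements agree on.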


In [26]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object', 'detection', 'probability', 'ps1_ztf']
Second try:  ['object', 'detection', 'probability', 'ps1_ztf']
Third try:  ['object', 'detection', 'probability', 'ps1_ztf']


In [27]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object', 'detection', 'probability', 'ps1_ztf']
Second try:  ['object', 'detection', 'probability', 'ps1_ztf']
Third try:  ['object', 'detection', 'probability', 'ps1_ztf']


In [28]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object', 'detection', 'probability', 'ps1_ztf']
Second try:  ['object', 'detection', 'probability', 'ps1_ztf']
Third try:  ['object', 'detection', 'probability', 'ps1_ztf']


In [29]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['detection', 'probability', 'ps1_ztf']
Second try:  ['detection', 'ps1_ztf']
Third try:  ['detection', 'probability', 'ps1_ztf']
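
The three 'both' tries disagree even at `temperature=0.0`, and comparing printed lines by eye does not scale past a few runs. Here is a sketch of tallying repeated answers. `tally_answers` is a hypothetical helper that accepts any zero-argument callable, so it can wrap `pred_tables` in a lambda. The stand-in below replays the three outputs shown above instead of calling the API.

```python
from collections import Counter
from typing import Callable


def tally_answers(predict: Callable[[], str], n_tries: int = 3) -> Counter:
    """Call `predict` n_tries times and count each distinct (stripped) answer."""
    return Counter(predict().strip() for _ in range(n_tries))


# Stand-in for the model, replaying the three 'both' outputs shown above.
_replayed = iter([
    "['detection', 'probability', 'ps1_ztf']",
    "['detection', 'ps1_ztf']",
    "['detection', 'probability', 'ps1_ztf']",
])
counts = tally_answers(lambda: next(_replayed))
print(counts.most_common(1)[0])
```

Against the real model, the callable would be something like `lambda: pred_tables(schema_prompt=tables_linking_prompt_V2, query=query, prompting='both')`.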


In [20]:
# Same final instructions
tables_linking_prompt_same = '''
# Given the user request, select the tables needed to generate a SQL query
# The Database has the following tables:
TABLE "object": contains basic filter and time-aggregated statistics such as location, number of observations, and the times of first and last detection.
TABLE "probability": classification probabilities associated to a given object, classifier, and class. Contain the object classification probabilities, including those from the stamp and light curve classifiers, and from different versions of these classifiers.
TABLE "feature": contains the object light curve statistics and other features used for ML classification and which are stored as json files in our database.
TABLE "magstat": contains time-aggregated statistics separated by filter, such as the average magnitude, the initial magnitude change rate, number of detections, etc.
TABLE "non_detection": contains the limiting magnitudes of previous non-detections separated by filter.
TABLE "detection": contains the object light curves including their difference and corrected magnitudes and associated errors separated by filter.
TABLE "step": contains the different pipeline steps and their versions.
TABLE "taxonomy": contains details about the different taxonomies used in our stamp and light curve classifiers, which can evolve with time.
TABLE "feature_version": contains the version of the feature extraction and preprocessing steps used to generate the features.
TABLE "xmatch": contains the object cross-matches and associated cross-match catalogs.
TABLE "allwise": contains the AllWISE catalog information for the objects.
TABLE "dataquality": detailed object information regarding the quality of the data
TABLE "gaia_ztf": GAIA objects near detected ZTF objects
TABLE "ss_ztf": known solar system objects near detected objects
TABLE "ps1_ztf": PanSTARRS objects near detected ZTF objects
TABLE "reference": properties of the reference images used to build templates
TABLE "pipeline": information about the different pipeline steps and their versions
TABLE "information_schema.tables": information about the database tables and columns
TABLE "forced_photometry": contains the forced photometry measurements for each object, including the object position, magnitude, and associated errors, and the photometry of the reference image.

# Give ONLY the TABLES that are needed to generate the SQL query.
# Give the answer in the following format: ['table1', 'table2', 'table3', ...]. For example, if the TABLES needed for the user request are TABLE object and TABLE taxonomy, then you should type: ['object', 'taxonomy']
# Just give the tables and ignore any other task given in the request given as "request".
# Remember to use the exact name of the TABLES, as they are written in the DATABASE SCHEMA. Do NOT create table names.
# If you think that no table mentioned above is needed, then type: ""
'''
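
The prompt above asks for the answer as a Python-style list (`['table1', 'table2', ...]`) or `""` when no table is needed. A minimal sketch of turning such a reply into a real list follows; the helper name `parse_table_list` is hypothetical and not part of the notebook.

```python
import ast


def parse_table_list(reply: str) -> list[str]:
    """Parse a reply such as "['object', 'taxonomy']" into a list of table names.

    An empty reply or the literal "" (the prompt's "no table needed" answer)
    yields an empty list. Anything else that is not a list of strings raises
    ValueError.
    """
    text = reply.strip()
    if text in ("", '""', "''"):
        return []
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Reply is not a Python list literal: {reply!r}") from exc
    if not isinstance(value, list) or not all(isinstance(t, str) for t in value):
        raise ValueError(f"Expected a list of table names, got: {reply!r}")
    return value


print(parse_table_list("['object', 'taxonomy']"))
print(parse_table_list('""'))
```

`ast.literal_eval` is used instead of `eval` so that a malformed or unexpected reply cannot execute code.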

In [30]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))

First try:  ['object', 'detection', 'probability', 'ps1_ztf']
Second try:  ['object', 'detection', 'probability', 'ps1_ztf']
Third try:  ['object', 'detection', 'probability', 'ps1_ztf']


In [31]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))

First try:  ['object', 'detection', 'probability', 'ps1_ztf']
Second try:  ['detection', 'object', 'probability', 'ps1_ztf']
Third try:  ['detection', 'object', 'probability', 'ps1_ztf']
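
The three replies above contain the same tables in a different order. A small sketch of an order-insensitive comparison across repeated tries is given below, assuming each reply has already been parsed into a Python list; `summarize_tries` is a hypothetical helper.

```python
from collections import Counter


def summarize_tries(tries: list[list[str]]) -> dict:
    """Group repeated predictions by table set, ignoring the order of tables."""
    if not tries:
        raise ValueError("at least one try is required")
    counts = Counter(frozenset(t) for t in tries)
    majority_set, votes = counts.most_common(1)[0]
    return {
        "distinct_sets": len(counts),       # how many different table sets appeared
        "majority": sorted(majority_set),   # most frequent set, sorted for display
        "agreement": votes / len(tries),    # share of tries that returned it
    }


# The three 'user' outputs printed above.
tries = [
    ['object', 'detection', 'probability', 'ps1_ztf'],
    ['detection', 'object', 'probability', 'ps1_ztf'],
    ['detection', 'object', 'probability', 'ps1_ztf'],
]
print(summarize_tries(tries))
```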


In [53]:
# query 19
query='''Write a script in PostgreSQL that return objects which appeared between march 1st and april 1st of 2021, which have at most one detection, and which are classified as asteroids by the stamp classifier with a probability greater than 0.7. The query must return the columns oid, meanra, meandec, ndet, firstMJD, class name and probability.'''

In [54]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['object', 'probability']
Second try:  ['object', 'detection', 'probability', 'taxonomy']
Third try:  ['object', 'probability']


In [55]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['object', 'probability']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [56]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object', 'probability']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [57]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object', 'probability']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [48]:
# query 12
query='''Give me all the SNe that first occurred between february 13 and september 10 and that are within the polygon defined by the following coordinates '((-20, -20), (-2, -20), (20, 1), (10, 10))'::polygon  . Return the oids, the mean ra and dec.'''

In [49]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['object']
Second try:  ['object']
Third try:  ['object']


In [50]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['object']
Second try:  ['object']
Third try:  ['object']


In [51]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object']
Second try:  ['object']
Third try:  ['object']


In [52]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object']
Second try:  ['object']
Third try:  ['object']


In [8]:
# query 104
query='''Find at most 30 ZTF objects that have a probability larger than 0.9 of being an asteroid in the stamp classifier version 'stamp_classifier_1.0.4'. Return the following columns: identifier of the ZTF and candidate, distance from the nearest Solar System object, MPC archive magnitude and name. Include also the following columns related to each candidate in the output table: filter identifier, FWHM from SExtractor, number of PS1 calibrators used, and exposure time'''

In [9]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['probability', 'ss_ztf', 'detection']
Second try:  ['probability', 'ss_ztf', 'detection']
Third try:  ['probability', 'ss_ztf', 'detection']


In [10]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['object', 'probability', 'ss_ztf', 'detection']
Second try:  ['object', 'probability', 'ss_ztf', 'detection']
Third try:  ['object', 'probability', 'ss_ztf', 'detection']


In [11]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object', 'probability', 'ss_ztf', 'detection']
Second try:  ['object', 'probability', 'ss_ztf', 'detection']
Third try:  ['object', 'probability', 'ss_ztf', 'detection']


In [12]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object', 'probability', 'ss_ztf', 'detection']
Second try:  ['object', 'probability', 'ss_ztf', 'detection']
Third try:  ['object', 'probability', 'ss_ztf', 'detection']


In [13]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))

First try:  ['probability', 'ss_ztf', 'detection']
Second try:  ['probability', 'ss_ztf', 'detection']
Third try:  ['probability', 'ss_ztf', 'detection']


In [14]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))

First try:  ['object', 'probability', 'ss_ztf', 'detection']
Second try:  ['object', 'probability', 'ss_ztf', 'detection']
Third try:  ['object', 'probability', 'ss_ztf', 'detection']


In [None]:
# query 90
query='''Find at most 100 ZTF objects that have a multiband period lower than 5 days in the 'lc_classifier_1.2.1-P' feature version. Return all columns from the 'probability' table for such objects, including only data for the light curve classifier, with rankings either 1 or 2'''

In [66]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['probability', 'feature_version']
Second try:  ['probability', 'feature_version']
Third try:  ['probability', 'feature', 'feature_version']


In [67]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['feature', 'feature_version', 'probability']
Second try:  ['feature', 'feature_version', 'probability']
Third try:  ['feature', 'feature_version', 'probability']


In [75]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['probability', 'feature', 'feature_version']
Second try:  ['feature', 'feature_version', 'probability']
Third try:  ['feature', 'feature_version', 'probability']


In [74]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['feature', 'feature_version', 'probability']
Second try:  ['object', 'feature', 'feature_version', 'probability']
Third try:  ['object', 'feature', 'feature_version', 'probability']


In [71]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['object', 'feature', 'feature_version', 'probability']
Second try:  ['object', 'feature', 'feature_version', 'probability']
Third try:  ['object', 'feature', 'feature_version', 'probability']


In [72]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))

First try:  ['probability', 'feature', 'feature_version']
Second try:  ['probability', 'feature', 'feature_version']
Third try:  ['probability', 'feature', 'feature_version']


In [73]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))

First try:  ['feature', 'feature_version', 'probability']
Second try:  ['feature', 'feature_version', 'probability']
Third try:  ['feature', 'feature_version', 'probability']


In [78]:
# query 34
query='''Given this list of oids ['ZTF17aaadpsi' 'ZTF19aaduncs' 'ZTF18abnvehl' 'ZTF19abrqsxy' 'ZTF19aaduodl' 'ZTF19aadovdv' 'ZTF18aammkke' 'ZTF18abtriul' 'ZTF17aabwtky' 'ZTF18abwjpfy'], write a PostgresSQL script that returns the information of the features 'Amplitude' or 'Multiband_period' associated with the objects in the list.'''

In [79]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [80]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [81]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['object', 'feature']


In [82]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object', 'feature']
Second try:  ['object', 'feature']
Third try:  ['object', 'feature']


In [86]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [83]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'system'))

First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [84]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_same, query= query, prompting= 'user'))

First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [33]:
# query 27
query='''Return the oids, meanra, meandec, ndet, firstmjd, deltajd, g_r_max, the classifier and class name, the ranking and the probability columns for each class of each object classified by the lc_classifier, with 100 or more detections that are most likely to be cepheid, with a probability larger than 0.76. '''

In [41]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [35]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['object', 'probability']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [36]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [37]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object', 'probability']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [53]:
# query 67
query='''For objects with ZTF identifiers 'ZTF18acxlskz', 'ZTF22aanppbi' and 'ZTF22abunrft', find all rows in the 'gaia_ztf' table where the nearest Gaia source lies within 1.5 arcsec of the ZTF object detection'''

In [54]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [55]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [56]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [57]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [58]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'both'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [59]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['object', 'gaia_ztf']
Second try:  ['object', 'gaia_ztf']
Third try:  ['object', 'gaia_ztf']


In [91]:
# query 83
query='''Get all columns in the 'allwise' table for the ZTF object 'ZTF21aazqwxv' '''

In [92]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['allwise']
Second try:  ['allwise']
Third try:  ['allwise']


In [93]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['allwise']
Second try:  ['allwise']
Third try:  ['allwise']


In [94]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['allwise']
Second try:  ['allwise']
Third try:  ['allwise']


In [95]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['allwise']
Second try:  ['allwise']
Third try:  ['allwise']


In [96]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['allwise']
Second try:  ['allwise']
Third try:  ['allwise']


In [None]:
# query 84
query='''For the ZTF object 'ZTF19aascdol' get the following information about its ALLWISE match(es): identifier in ZTF and in the catalog, distance between counterparts, and magnitudes in filters WISE W1 to W4'''

In [None]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['allwise']
Second try:  ['allwise']
Third try:  ['allwise']


In [None]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['object', 'allwise']
Second try:  ['object', 'allwise']
Third try:  ['object', 'allwise']


In [None]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['object', 'allwise']
Second try:  ['object', 'allwise']
Third try:  ['object', 'allwise']


In [None]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['object', 'allwise']
Second try:  ['object', 'allwise']
Third try:  ['object', 'allwise']


In [None]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['object', 'allwise']
Second try:  ['object', 'allwise']
Third try:  ['object', 'allwise']


### Simon's openai version

In [1]:
!conda list openai
!conda list httpx

# packages in environment at /home/jespejo/miniconda3/envs/llm:
#
# Name                    Version                   Build  Channel
openai                    1.55.0                   pypi_0    pypi
# packages in environment at /home/jespejo/miniconda3/envs/llm:
#
# Name                    Version                   Build  Channel
httpx                     0.27.2                   pypi_0    pypi


In [5]:
!pip install openai==1.55.0
!pip install httpx==0.27.2

Collecting httpx==0.27.2
  Using cached httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Using cached httpx-0.27.2-py3-none-any.whl (76 kB)
Installing collected packages: httpx
  Attempting uninstall: httpx
    Found existing installation: httpx 0.28.1
    Uninstalling httpx-0.28.1:
      Successfully uninstalled httpx-0.28.1
Successfully installed httpx-0.27.2
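
The cells above pin `openai` and `httpx`, and the pip log shows httpx 0.28.1 being replaced by 0.27.2. A short sketch for confirming from inside the kernel that the pins are in effect; `check_pins` is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the cells above.
PINNED = {"openai": "1.55.0", "httpx": "0.27.2"}


def check_pins(pins: dict[str, str]) -> dict[str, tuple]:
    """Return {package: (installed_or_None, pinned, matches)} for each pin."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted, installed == wanted)
    return report


for name, (installed, wanted, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: installed={installed} pinned={wanted} {status}")
```

If a mismatch is reported right after installing, the kernel may need a restart before the new version is imported.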


In [None]:
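import os

from openai import OpenAI

# Prefer the OPENAI_API_KEY environment variable; fall back to a placeholder.
OPENAI_APIKEY = os.environ.get("OPENAI_API_KEY", "your_api_key_here")
client = OpenAI(api_key=OPENAI_APIKEY)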


In [3]:

def pred_tables(schema_prompt: str, query: str, prompting: str):
    """Return the model's raw reply listing the tables needed for `query`.

    prompting selects where the schema prompt and the request are placed:
      'user'   - prompt and request together in a single user message
      'system' - prompt as the system message, request as the user message
      'both'   - request appended to the system prompt and repeated as the user message
    """
    prompt_with_request = schema_prompt + f"\nThe user request is the following: {query}"
    params = {"model": "gpt-4o-2024-11-20", "temperature": 0.0, "max_tokens": 1000}

    if prompting == 'user':
        messages = [{"role": "user", "content": prompt_with_request}]
    elif prompting == 'system':
        messages = [
            {"role": "system", "content": schema_prompt},
            {"role": "user", "content": query},
        ]
    elif prompting == 'both':
        messages = [
            {"role": "system", "content": prompt_with_request},
            {"role": "user", "content": query},
        ]
        # The 'both' mode is sent without a max_tokens cap.
        del params["max_tokens"]
    else:
        raise ValueError("Invalid prompting type")

    completion = client.chat.completions.create(messages=messages, **params)
    return completion.choices[0].message.content
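
The cells in this notebook call `pred_tables` three times for each combination of prompt and prompting mode. That grid can be sketched as one loop, as below. The predictor is passed in as an argument, and `fake_predict` is a hypothetical stand-in so the sketch runs without an API key; in the notebook one would pass `pred_tables` and the real prompt variables instead.

```python
from itertools import product
from typing import Callable


def sweep(predict: Callable[..., str], prompts: dict[str, str], modes: list[str],
          query: str, n_tries: int = 3) -> dict[tuple[str, str], list[str]]:
    """Call predict n_tries times for every (prompt name, prompting mode) pair."""
    results = {}
    for (name, prompt), mode in product(prompts.items(), modes):
        results[(name, mode)] = [
            predict(schema_prompt=prompt, query=query, prompting=mode)
            for _ in range(n_tries)
        ]
    return results


# Hypothetical stand-in predictor: always returns the same reply.
def fake_predict(schema_prompt: str, query: str, prompting: str) -> str:
    return "['gaia_ztf']"


results = sweep(fake_predict, {"demo_prompt": "..."}, ["system", "user", "both"], "demo query")
for (name, mode), replies in results.items():
    print(name, mode, replies)
```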

In [4]:
# query 67
query='''For objects with ZTF identifiers 'ZTF18acxlskz', 'ZTF22aanppbi' and 'ZTF22abunrft', find all rows in the 'gaia_ztf' table where the nearest Gaia source lies within 1.5 arcsec of the ZTF object detection'''

In [8]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system'))

First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [9]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [10]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [11]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [12]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'both'))


First try:  ['gaia_ztf']
Second try:  ['gaia_ztf']
Third try:  ['gaia_ztf']


In [13]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both'))


First try:  ['object', 'gaia_ztf']
Second try:  ['object', 'gaia_ztf']
Third try:  ['object', 'gaia_ztf']


### gpt-4.1-2025-04-14

In [54]:

def pred_tables(schema_prompt: str, query: str, prompting: str, model: str = "gpt-4o-2024-11-20"):
    """Return the model's raw text answer (the predicted table list) for `query`.

    `prompting` selects where the prompt is placed: 'user', 'system', or 'both'.
    """
    # Prompt with the request appended; used by the 'user' and 'both' modes.
    prompt_with_query = schema_prompt + f"\nThe user request is the following: {query}"
    extra = {"max_tokens": 4000}

    if prompting == 'user':
        # Everything goes in a single user message; no system message is sent.
        messages = [{"role": "user", "content": prompt_with_query}]
    elif prompting == 'system':
        # Prompt in the system message, request alone in the user message.
        messages = [
            {"role": "system", "content": schema_prompt},
            {"role": "user", "content": query},
        ]
    elif prompting == 'both':
        # The request appears twice: appended to the system prompt and as the user message.
        messages = [
            {"role": "system", "content": prompt_with_query},
            {"role": "user", "content": query},
        ]
        extra = {}  # this mode sets no max_tokens cap
    else:
        raise ValueError(f"Invalid prompting type: {prompting!r}")

    completion = client.chat.completions.create(
        model=model,
        temperature=0.0,
        messages=messages,
        **extra,
    )
    return completion.choices[0].message.content
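
The cells in this section call `pred_tables` three times by hand for each configuration. A minimal sketch of a helper that runs the same repetition in a loop and parses each answer into a list. `run_trials` and `fake_predict` are hypothetical names that do not exist in the notebook, and the stub replaces the API call so the snippet runs offline. It also assumes the model answers with a Python-style list literal, as in the outputs shown here.

```python
import ast
from typing import Callable, List


def run_trials(predict: Callable[[], str], n_trials: int = 3) -> List[List[str]]:
    """Call `predict` n_trials times and parse each answer into a list of table names."""
    results = []
    for _ in range(n_trials):
        raw = predict()
        # Assumption: the answer is a list literal such as "['feature']".
        tables = ast.literal_eval(raw.strip())
        results.append(list(tables))
    return results


def fake_predict() -> str:
    # Hypothetical stand-in for a pred_tables call; returns a fixed answer.
    return "['object', 'probability']"


trials = run_trials(fake_predict)
for i, tables in enumerate(trials, start=1):
    print(f"Try {i}: {tables}")
```

In the notebook the stub could be swapped for something like `lambda: pred_tables(schema_prompt=schema_linking_prompt, query=query, prompting='system')`. Note that `ast.literal_eval` raises if the model wraps the list in extra text, so a real run may need a more forgiving parser.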

In [55]:
# query 34
query='''Given this list of oids ['ZTF17aaadpsi' 'ZTF19aaduncs' 'ZTF18abnvehl' 'ZTF19abrqsxy' 'ZTF19aaduodl' 'ZTF19aadovdv' 'ZTF18aammkke' 'ZTF18abtriul' 'ZTF17aabwtky' 'ZTF18abwjpfy'], write a PostgresSQL script that returns the information of the features 'Amplitude' or 'Multiband_period' associated with the objects in the list.'''

In [56]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))

First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [57]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [58]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [59]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [60]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))


First try:  ['feature']
Second try:  ['feature']
Third try:  ['feature']


In [28]:
# query 27
query='''Return the oids, meanra, meandec, ndet, firstmjd, deltajd, g_r_max, the classifier and class name, the ranking and the probability columns for each class of each object classified by the lc_classifier, with 100 or more detections that are most likely to be cepheid, with a probability larger than 0.76. '''

In [34]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))

First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [35]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [36]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [37]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [39]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability']
Third try:  ['object', 'probability']


In [41]:
# query 19
query='''Write a script in PostgreSQL that return objects which appeared between march 1st and april 1st of 2021, which have at most one detection, and which are classified as asteroids by the stamp classifier with a probability greater than 0.7. The query must return the columns oid, meanra, meandec, ndet, firstMJD, class name and probability.'''

In [47]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))

First try:  ['object', 'detection', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'detection', 'probability', 'taxonomy']


In [43]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'detection', 'probability', 'taxonomy']
Second try:  ['object', 'detection', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [44]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'detection', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [45]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [46]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [48]:
# query 30
query='''Return a Sql query to get the predicted class for each object according to the light curve classifier. The query should return the oid, the class name and the probability of the class for each object.'''


In [49]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))

First try:  ['object', 'probability', 'taxonomy']
Second try:  ['probability', 'taxonomy']
Third try:  ['probability', 'taxonomy']


In [50]:
print("First try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= schema_linking_prompt, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [51]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'system', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [52]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'user', model='gpt-4.1-2025-04-14'))


First try:  ['object', 'probability', 'taxonomy']
Second try:  ['object', 'probability', 'taxonomy']
Third try:  ['object', 'probability', 'taxonomy']


In [53]:
print("First try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Second try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))
print("Third try: ", pred_tables(schema_prompt= tables_linking_prompt_V2, query= query, prompting= 'both', model='gpt-4.1-2025-04-14'))


First try:  ['probability', 'taxonomy']
Second try:  ['probability', 'taxonomy']
Third try:  ['probability', 'taxonomy']
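
Several runs above disagree across tries even at `temperature=0.0` (for example, `detection` comes and goes for query 19). A small sketch of how that instability might be summarized; the `summarize` helper is hypothetical, and the `runs` list is copied from the query 19 output for `schema_linking_prompt` with `prompting='system'`.

```python
from collections import Counter

# Outputs copied from the query 19 run (schema_linking_prompt, prompting='system').
runs = [
    ['object', 'detection', 'probability', 'taxonomy'],
    ['object', 'probability', 'taxonomy'],
    ['object', 'detection', 'probability', 'taxonomy'],
]


def summarize(runs):
    """Return the majority table set, the share of runs matching it, and the unstable tables."""
    counts = Counter(frozenset(r) for r in runs)
    top_set, top_count = counts.most_common(1)[0]
    # Tables present in at least one run but not in every run.
    seen_anywhere = set().union(*runs)
    seen_everywhere = set.intersection(*(set(r) for r in runs))
    unstable = seen_anywhere - seen_everywhere
    return sorted(top_set), top_count / len(runs), sorted(unstable)


majority, agreement, unstable = summarize(runs)
print("Majority set:", majority)
print("Agreement:", round(agreement, 2))
print("Unstable tables:", unstable)
```

Comparing table sets rather than lists ignores ordering, which seems appropriate here since only the choice of tables matters.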


# Direct Generation prompts

In [None]:
general_task_simon='''As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes.
The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context.
ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is from ZTF, including the object, candidate and filter identifiers, and other relevant information.  
The user values the personality of a knowledgeable SQL expert, so ensuring accuracy is paramount. Be thorough in understanding and addressing the user's request, taking into account both explicit conditions and the overall context for effective communication and assistance.
'''
general_context_simon='''Given the following text, please thoroughly analyze and provide a detailed explanation of your understanding. Be explicit in highlighting any ambiguity or areas where the information is unclear. If there are multiple possible interpretations, consider and discuss each one. Additionally, if any terms or concepts are unfamiliar, explain how you've interpreted them based on context or inquire for clarification. Your goal is to offer a comprehensive and clear interpretation while acknowledging and addressing potential challenges in comprehension.
"## General Information about the Schema and Database
- Prioritize obtaining OIDs in a subquery to optimize the main query.
- Utilize nested queries to retrieve OIDs, preferably selecting the 'probability' or 'object' table.
- Avoid JOIN clauses; instead, favor nested queries.
## Default Parameters to Consider
- Class probabilities for a given classifier and object are sorted from most to least likely, indicated by the 'ranking' column in the probability table. Hence, the most probable class should have 'ranking'=1.
- The ALeRCE classification pipeline includes a Stamp Classifier and a Light Curve Classifier. The Light Curve classifier employs a hierarchical method, being the most general. If no classifier is specified, use 'classifier_name='lc_classifier' when selecting probabilities.
- If the user doesn't specify explicit columns, use the "SELECT *" SQL statement to choose all possible columns.
- Avoid changing the names of columns or tables unless necessary for the SQL query.
## ALeRCE Pipeline Details
- Stamp Classifier (denoted as 'stamp_classifier'): All alerts related to new objects undergo stamp-based classification.
- Light Curve Classifier (denoted as 'lc_classifier'): A balanced hierarchical random forest classifier employing four models and 15 classes.
- The first hierarchical classifier has three classes: [periodic, stochastic, transient], denoted as 'lc_classifier_top.'
- Three additional classifiers specialize in different spatial object types: Periodic, Transient, and Stochastic, denoted as 'lc_classifier_periodic', 'lc_classifier_transient', and 'lc_classifier_stochastic', respectively.
- The 15 classes are separated for each object type:
  - Transient: [SNe Ia ('SNIa'), SNe Ib/c ('SNIbc'), SNe II ('SNII'), and Super Luminous SNe ('SLSN')].
  - Stochastic: [Active Galactic Nuclei ('AGN'), Quasi Stellar Object ('QSO'), 'Blazar', Cataclysmic Variable/Novae ('CV/Nova'), and Young Stellar Object ('YSO')].
  - Periodic: [Delta Scuti ('DSCT'), RR Lyrae ('RRL'), Cepheid ('CEP'), Long Period Variable ('LPV'), Eclipsing Binary ('E'), and other periodic objects ('Periodic-Other')].
## Probability Variable Names
- classifier_name=('lc_classifier', 'lc_classifier_top', 'lc_classifier_transient', 'lc_classifier_stochastic', 'lc_classifier_periodic', 'stamp_classifier')
- Classes in 'lc_classifier'= ('SNIa', 'SNIbc', 'SNII', 'SLSN', 'QSO', 'AGN', 'Blazar', 'CV/Nova', 'YSO', 'LPV', 'E', 'DSCT', 'RRL', 'CEP', 'Periodic-Other')
- Classes in 'lc_classifier_top'= ('transient', 'stochastic', 'periodic')
- Classes in 'lc_classifier_transient'= ('SNIa', 'SNIbc', 'SNII', 'SLSN')
- Classes in 'lc_classifier_stochastic'= ('QSO', 'AGN', 'Blazar', 'CV/Nova', 'YSO')
- Classes in 'lc_classifier_periodic'= ('LPV', 'E', 'DSCT', 'RRL', 'CEP', 'Periodic-Other')
- Classes in 'stamp_classifier'= ('SN', 'AGN', 'VS', 'asteroid', 'bogus')

If a query involves selecting astronomical objects based on their celestial coordinates, the Q3C extension for PostgreSQL provides a suite of specialized functions optimized for this purpose. 
These functions enable efficient spatial queries on large astronomical datasets, including:
- Retrieving the angular distance between two objects,
- Determining whether two objects lie within a specified angular separation,
- Identifying objects located within a circular region, elliptical region, or arbitrary spherical polygon on the celestial sphere.

The following functions are available in the Q3C extension:
- q3c_dist(ra1, dec1, ra2, dec2) -- returns the distance in degrees between two points (ra1,dec1) and (ra2,dec2)
- q3c_join(ra1, dec1, ra2, dec2, radius)  -- returns true if (ra1, dec1) is within radius spherical distance of (ra2, dec2).
- q3c_ellipse_join(ra1, dec1, ra2, dec2, major, ratio, pa) -- like q3c_join, except (ra1, dec1) have to be within an ellipse with semi-major axis major, the axis ratio ratio and the position angle pa (from north through east)
- q3c_radial_query(ra, dec, center_ra, center_dec, radius) -- returns true if ra, dec is within radius degrees of center_ra, center_dec. This is the main function for cone searches.
- q3c_ellipse_query(ra, dec, center_ra, center_dec, maj_ax, axis_ratio, PA ) -- returns true if ra, dec is within the ellipse from center_ra, center_dec. The ellipse is specified by semi-major axis, axis ratio and positional angle.
- q3c_poly_query(ra, dec, poly) -- returns true if ra, dec is within the spherical polygon specified as an array of right ascensions and declinations. Alternatively poly can be an PostgreSQL polygon type.

It can be useful to define a set of astronomical sources with associated coordinates directly in a SQL query, you can use a WITH clause such as:
    WITH catalog (source_id, ra, dec) AS (
        VALUES ('source_name', ra_value, dec_value),
        ...)
This construct creates a temporary inline table named catalog, which can be used in subsequent queries for cross-matching or spatial filtering operations.
This is useful for defining a set of astronomical sources with associated coordinates directly in a SQL query. Then, you can use the Q3C functions to perform spatial queries on this temporary table (e.g. 'FROM catalog c').
Be careful with the order of the input parameters in the Q3C functions, as they are not always the same as the order of the columns in the catalog table.
'''

final_instructions_simon='''
# Remember to use only the schema provided, using the names of the tables and columns as they are given in the schema. You can use the information provided in the context to help you understand the schema and the request.
# Assume that everything the user asks for is in the schema provided, you do not need to use any other table or column. Do NOT CHANGE the names of the tables or columns unless the user explicitly asks you to do so in the request, giving you the new name to use.
# Answer ONLY with the SQL query, do not include any additional or explanatory text. If you want to add something, add COMMENTS IN PostgreSQL format so that the user can understand.
# Using valid PostgreSQL, the names of the tables and columns, and the information given in 'Context', answer the following request for the tables provided above.
'''
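
To make the Q3C part of `general_context_simon` concrete, here is a minimal sketch of the kind of cone-search query the prompt is steering the model toward. `cone_search_sql` is a hypothetical helper, and the choice of the `object` table with the `oid`, `meanra`, and `meandec` columns is an assumption based on the column names used in the queries earlier in this notebook. The argument order follows the `q3c_radial_query(ra, dec, center_ra, center_dec, radius)` signature quoted in the prompt.

```python
def cone_search_sql(center_ra: float, center_dec: float, radius_deg: float) -> str:
    """Build a cone-search query with q3c_radial_query (all angles in degrees)."""
    # Coerce to float so arbitrary text cannot be interpolated into the SQL string.
    ra, dec, radius = float(center_ra), float(center_dec), float(radius_deg)
    return (
        "SELECT oid, meanra, meandec\n"
        "FROM object\n"  # assumed table; column names taken from the example requests
        f"WHERE q3c_radial_query(meanra, meandec, {ra}, {dec}, {radius})"
    )


sql = cone_search_sql(150.0, 2.2, 0.01)
print(sql)
```

The table columns come first and the search center second, which is the ordering pitfall the prompt warns about.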


In [2]:
from prompts.DirectSQLGenPrompts import prompt_gen_task_v8, prompt_gen_context_v15, q3c_info, final_instructions_sql_gen_v19

In [18]:
# Check difference between prompts
import difflib

d = difflib.Differ()
diff_gen_task = list(d.compare(general_task_simon.strip().splitlines(), prompt_gen_task_v8.strip().splitlines()))
diff_gen_cntx = list(d.compare(general_context_simon.strip().splitlines(), (prompt_gen_context_v15 + q3c_info).strip().splitlines()))
diff_final_inst = list(d.compare(final_instructions_simon.strip().splitlines(), final_instructions_sql_gen_v19.strip().splitlines()))

for line in diff_gen_cntx:
    print("Line:", line)

Line:   Given the following text, please thoroughly analyze and provide a detailed explanation of your understanding. Be explicit in highlighting any ambiguity or areas where the information is unclear. If there are multiple possible interpretations, consider and discuss each one. Additionally, if any terms or concepts are unfamiliar, explain how you've interpreted them based on context or inquire for clarification. Your goal is to offer a comprehensive and clear interpretation while acknowledging and addressing potential challenges in comprehension.
Line:   "## General Information about the Schema and Database
Line:   - Prioritize obtaining OIDs in a subquery to optimize the main query.
Line:   - Utilize nested queries to retrieve OIDs, preferably selecting the 'probability' or 'object' table.
Line:   - Avoid JOIN clauses; instead, favor nested queries.
Line:   ## Default Parameters to Consider
Line:   - Class probabilities for a given classifier and object are sorted from most to lea

In [9]:
for line in diff_gen_task:
    print("Line:", line)

Line:   As a SQL expert with a willingness to assist users, you are tasked with crafting a PostgreSQL query for the Automatic Learning for the Rapid Classification of Events (ALeRCE) Database in 2023. This database serves as a repository for information about individual spatial objects, encompassing various statistics, properties, detections, and features observed by survey telescopes.
Line:   The tables within the database are categorized into three types: time and band independent (e.g., object, probability), time-independent (e.g., magstats), and time and band-dependent (e.g., detection). Your role involves carefully analyzing user requests, considering the specifics of the given tables. It is crucial to pay attention to explicit conditions outlined by the user and always maintain awareness of the broader context.
Line:   ALeRCE processes data from the alert stream of the Zwicky Transient Facility (ZTF), so unless a specific catalog is mentioned, the data is from ZTF, including the 

In [10]:
for line in diff_final_inst:
    print("Line:", line)

Line:   # Remember to use only the schema provided, using the names of the tables and columns as they are given in the schema. You can use the information provided in the context to help you understand the schema and the request.
Line:   # Assume that everything the user asks for is in the schema provided, you do not need to use any other table or column. Do NOT CHANGE the names of the tables or columns unless the user explicitly asks you to do so in the request, giving you the new name to use.
Line:   # Answer ONLY with the SQL query, do not include any additional or explanatory text. If you want to add something, add COMMENTS IN PostgreSQL format so that the user can understand.
Line:   # Using valid PostgreSQL, the names of the tables and columns, and the information given in 'Context', answer the following request for the tables provided above.
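
The loops above print every line of the `difflib.Differ` output, including unchanged ones (those prefixed with two spaces). A minimal sketch of filtering the output down to the lines that differ, using two short made-up strings in place of the prompts:

```python
import difflib

# Made-up stand-ins for the two prompt versions being compared.
old = "line one\nline two\nline three"
new = "line one\nline 2\nline three"

differ = difflib.Differ()
diff = list(differ.compare(old.splitlines(), new.splitlines()))

# Differ prefixes: "  " unchanged, "- " only in the first input,
# "+ " only in the second, "? " intraline hints.
changed = [line for line in diff if line[:2] in ("- ", "+ ")]
for line in changed:
    print(line)
```

Applied to `diff_gen_task`, `diff_gen_cntx`, or `diff_final_inst`, an empty `changed` list would indicate that the two prompt versions are identical line by line.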
