# Google Gemini Meta Prompt for generating JSON Schema

This notebook demonstrates a meta-prompt for Gemini 2.0 Flash that generates a JSON Schema from a user's description. This is useful for:

*   Dynamically creating JSON Schemas based on varying user requirements.
*   Getting started with structured outputs in Gemini by automatically generating the necessary schema.
*   Ensuring the generated JSON conforms to a specific structure, which is crucial for data validation and interoperability.

If you already know which schema you want to use, take a look at the [Structured Outputs](./gemini-structured-outputs.ipynb) notebook instead, or learn more about structured output with Gemini here:
- [https://ai.google.dev/gemini-api/docs/structured-output](https://ai.google.dev/gemini-api/docs/structured-output)

In [27]:
import os
from google import genai

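# Create the client. The API key is read from the GEMINI_API_KEY environment
# variable; "xxx" is only a placeholder fallback and will not authenticate.
api_key = os.getenv("GEMINI_API_KEY", "xxx")
client = genai.Client(api_key=api_key)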

In [49]:
# Used to generate valid JSON Schema that can be used to generate structured output with Gemini
meta_prompt = """You are a JSON Schema expert. Your task is to create a JSON schema based on the user input. The schema will be used to extract data.

You must also make sure:
- All fields in an object are set as required
- All objects must have properties defined
- Order matters! If the values are dependent or would require additional information, make sure to include the additional information in the description. The same applies to "reasoning" or "thinking" fields, which should come before the conclusion.
- $defs must be defined under the schema param
- Return only the schema JSON not more, use ```json to start and ``` to end the JSON schema

Restrictions:
- You cannot use examples. If you think examples are helpful, include them in the description.
- You cannot use default values. If you think defaults are helpful, include them in the description.
- Never include a $schema
- The "type" needs to be a single value, no arrays

Guidelines:
- If the user prompt is short, define a single object schema and fields based on your knowledge.
- If the user prompt describes the data in detail, only use that data in the schema. Don't add more fields than the user asked for.

Examples:

Input: Cookie Recipes
Output: ```json
{{
   "title":"Cookie Recipe",
   "description":"Schema for a cookie recipe, including ingredients and quantities. The 'ingredients' array lists each ingredient along with its corresponding quantity and unit of measurement. The 'instructions' array provides a step-by-step guide to preparing the cookies. The order of instructions is important.",
   "type":"object",
   "properties":{{
      "name":{{
         "type":"string",
         "description":"The name of the cookie recipe."
      }},
      "description":{{
         "type":"string",
         "description":"A short description of the cookie, including taste and textures."
      }},
      "ingredients":{{
         "type":"array",
         "description":"A list of ingredients required for the recipe.",
         "items":{{
            "type":"object",
            "description":"An ingredient with its quantity and unit.",
            "properties":{{
               "name":{{
                  "type":"string",
                  "description":"The name of the ingredient (e.g., flour, sugar, butter)."
               }},
               "quantity":{{
                  "type":"number",
                  "description":"The amount of the ingredient needed."
               }},
               "unit":{{
                  "type":"string",
                  "description":"The unit of measurement for the ingredient (e.g., cups, grams, teaspoons). Use abbreviations like 'tsp' for teaspoon and 'tbsp' for tablespoon."
               }}
            }},
            "required":[
               "name",
               "quantity",
               "unit"
            ]
         }}
      }},
      "instructions":{{
         "type":"array",
         "description":"A sequence of steps to prepare the cookie recipe. The order of instructions matters.",
         "items":{{
            "type":"string",
            "description":"A single instruction step."
         }}
      }}
   }},
   "required":[
      "name",
      "description",
      "ingredients",
      "instructions"
   ],
   "$defs":{{
      "ingredient":{{
         "type":"object",
         "description":"An ingredient with its quantity and unit.",
         "properties":{{
            "name":{{
               "type":"string",
               "description":"The name of the ingredient (e.g., flour, sugar, butter)."
            }},
            "quantity":{{
               "type":"number",
               "description":"The amount of the ingredient needed."
            }},
            "unit":{{
               "type":"string",
               "description":"The unit of measurement for the ingredient (e.g., cups, grams, teaspoons). Use abbreviations like 'tsp' for teaspoon and 'tbsp' for tablespoon."
            }}
         }},
         "required":[
            "name",
            "quantity",
            "unit"
         ]
      }}
   }}
}}
```

Input: Book with title, author, and publication year.
Output: ```json
{{
    "type": "object",
    "properties": {{
        "title": {{
            "type": "string",
            "description": "The title of the book."
        }},
        "author": {{
            "type": "string",
            "description": "The author of the book."
        }},
        "publicationYear": {{
            "type": "integer",
            "description": "The year the book was published."
        }}
    }},
    "required": [
        "title",
        "author",
        "publicationYear"
    ]
}}
```

Input: {user_input}"""
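
The doubled braces in `meta_prompt` are there because the string is later passed through `str.format`, which treats single braces as placeholders. A minimal sketch of that behavior, using a hypothetical shortened `template` rather than the full prompt:

```python
import json

# Hypothetical, shortened template; it only mirrors the brace escaping used in meta_prompt.
template = 'Example output: {{"type": "object", "properties": {{}}}}\nInput: {user_input}'

# str.format fills {user_input} and collapses every {{ and }} to a single brace.
rendered = template.format(user_input="Book with title and author")
print(rendered)

# After formatting, the embedded example is plain JSON again and can be parsed.
example_json = rendered.splitlines()[0][len("Example output: "):]
parsed = json.loads(example_json)
print(parsed)
```

Only the template itself is scanned for placeholders, so braces inside the substituted user input are inserted as-is and do not need escaping.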

The following example generates a schema from user input and then uses that schema to generate structured output.

In [52]:
import json
import re

# Ask Gemini to generate a schema for the given description
response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=meta_prompt.format(user_input="Cookie Recipes with ingredients and quantity"),
)
# Extract the JSON schema from the fenced json block in the response
match = re.search(r"```json\s*(.*?)\s*```", response.text, re.DOTALL)
if match is None:
    # Fail early instead of passing response_schema=None to the next call
    raise ValueError("No fenced json block found in the model response")
json_schema = json.loads(match.group(1).strip())


# generate a recipe based on the schema
recipe = client.models.generate_content(
    model='gemini-2.0-flash',
    contents="Cookie Recipes",
    config={
        'response_mime_type': 'application/json',
        'response_schema': json_schema,
    },
)

print(json.loads(recipe.text))

{'ingredients': [{'name': 'Flour', 'quantity': 2.5}, {'name': 'Sugar', 'quantity': 1}, {'name': 'Butter', 'quantity': 1}, {'name': 'Eggs', 'quantity': 2}, {'name': 'Vanilla Extract', 'quantity': 1}, {'name': 'Chocolate Chips', 'quantity': 2}], 'name': 'Chocolate Chip Cookies'}
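
Because the schema is itself model output, it may be worth checking it against the rules stated in the meta-prompt before passing it as `response_schema`. The sketch below is one way to do that in plain Python; `find_violations` is a hypothetical helper written for this notebook, not part of the `google-genai` SDK, and it only covers the rules about `$schema`, `examples`, `default`, single-valued `type`, and required properties.

```python
def find_violations(schema, path="$"):
    """Return messages for meta-prompt rules that the given schema dict breaks."""
    problems = []
    # Restrictions: no $schema, no examples, no default values
    for banned in ("$schema", "examples", "default"):
        if banned in schema:
            problems.append(f"{path}: '{banned}' is not allowed")
    # Restriction: "type" must be a single value, not an array
    if "type" in schema and not isinstance(schema["type"], str):
        problems.append(f"{path}: 'type' must be a single string")
    # Rules: objects must define properties, and every property must be required
    if schema.get("type") == "object":
        props = schema.get("properties")
        if not props:
            problems.append(f"{path}: object has no properties")
        else:
            missing = sorted(set(props) - set(schema.get("required", [])))
            if missing:
                problems.append(f"{path}: not marked required: {missing}")
    # Recurse into nested schemas
    for name, sub in (schema.get("properties") or {}).items():
        problems += find_violations(sub, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):
        problems += find_violations(schema["items"], f"{path}[]")
    for name, sub in (schema.get("$defs") or {}).items():
        problems += find_violations(sub, f"{path}.$defs.{name}")
    return problems


# Hand-written schemas for illustration (not model output)
good_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title", "year"],
}
bad_schema = {
    "type": "object",
    "properties": {
        "title": {"type": ["string", "null"]},
        "year": {"type": "integer", "default": 2000},
    },
    "required": ["title"],
}

print(find_violations(good_schema))
for message in find_violations(bad_schema):
    print(message)
```

In the cell above, a call such as `find_violations(json_schema)` could run right after the schema is parsed, with the second request skipped or retried when the returned list is not empty.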
