In [9]:
def append_text_to_system_prompt(prompt, text):
    """Append a text part to the prompt list, creating the list if needed."""
    if not prompt:
        prompt = []

    prompt.append({"type": "text", "text": text})
    return prompt


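def append_image_to_system_prompt(prompt, encoded_image, mime_type=None):
    """Append a base64-encoded image to the prompt list as a data-URL part."""
    if not prompt:
        prompt = []

    if mime_type is None:
        # Base64 of the PNG signature always starts with "iVBORw0KGgo"; otherwise keep the JPEG default.
        mime_type = "image/png" if encoded_image.startswith("iVBORw0KGgo") else "image/jpeg"

    prompt.append({"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded_image}"}})
    return prompt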
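
The image helper takes an already-encoded string, so the MIME type in the data URL has to be chosen separately from the file itself. Below is a minimal sketch of one way to derive it from the file name with the standard library's `mimetypes` module. The function name `image_file_to_data_url` and the JPEG fallback are assumptions made for illustration; they are not part of the original notebook.

```python
import base64
import mimetypes


def image_file_to_data_url(path, fallback_mime="image/jpeg"):
    """Read an image file and return it as a base64 data URL (hypothetical helper)."""
    # guess_type returns (type, encoding); type is None for unknown extensions.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        mime_type = fallback_mime

    # The context manager closes the file as soon as the bytes are read.
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")

    return f"data:{mime_type};base64,{encoded}"
```

The returned string could be placed directly in the `url` field of an `image_url` part.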

In [10]:
content = append_text_to_system_prompt(None, """You extract the text from a provided image of handwritten notes. The final output is markdown text.
- The handwritten notes will also be written in markdown.
- For example, the handwritten notes will use symbols like # for headers, - for bullet points, _ for italics, * for bold, and other possible markdown characters.

## Extraction Rules
- First, take a step back and understand the overall structure of the image.
    - Is there color coding? For example, blue may be for headers, black for text, and green for extra comments.
    - Is there text in multiple columns that could be represented as tables? Sometimes columns are enclosed in grids; sometimes they simply have column line dividers.
    - Are there isolated boxes?
    - Is there text connected by a line to other text, like mind maps and branches?
    - Are there drawings such as flow charts, graphs, swim lanes, etc.?
    - Are there line dividers to indicate a separate topic/section?
    - Are there headings, perhaps underlined, prefixed with # or color coded?
- Determine the section layout based on the observations.
- Process each section separately.

## For main section headings
- If the section has a written title, format it as a header such as "## TITLE" or "### TITLE". You can determine the level of the heading (i.e. #, ##, ###, and so on).
- If the section does not have a written title, name it "{PLACEHOLDER_HEADER}", for example "## {PLACEHOLDER_HEADER}" or "### {PLACEHOLDER_HEADER}". You should still determine the level or sub-level of the heading.
- Do not invent or rephrase section titles.

## For text enclosed in a box that's not part of a table or grid
- Treat it as a subsection.
- If it has a written title, format it as a header such as "## TITLE" or "### TITLE". You can determine the level of the heading (i.e. ##, ###, and so on).
- If it does not have a written title, name it "{PLACEHOLDER_HEADER}", for example "## {PLACEHOLDER_HEADER}" or "### {PLACEHOLDER_HEADER}". You should still determine the level of the heading.

## For tables
- Format as markdown tables.
- Determine if there is a column header. If there isn't, name it "{PLACEHOLDER_HEADER}".

For example:
""")

content

[{'type': 'text',
  'text': 'You extract the text from a provided image of handwritten notes. The final output is markdown text.\n- The handwritten notes will also be written in markdown.\n- For example, the handwritten notes will use symbols like # for headers, - for bullet points, _ for italics, * for bold, and other possible markdown characters.\n\n## Extraction Rules\n- First, take a step back and understand the overall structure of the image.\n    - Is there color coding? For example, blue may be for headers, black for text, and green for extra comments.\n    - Is there text in multiple columns that could be represented as tables? Sometimes columns are enclosed in grids; sometimes they simply have column line dividers.\n    - Are there isolated boxes?\n    - Is there text connected by a line to other text, like mind maps and branches?\n    - Are there drawings such as flow charts, graphs, swim lanes, etc.?\n    - Are there line dividers to indicate a separate topic/section?\n    - Ar
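
The prompt asks the model to write the literal token `{PLACEHOLDER_HEADER}` wherever a title or column header is missing. One way this convention might be used downstream is to locate those spots in the returned markdown for manual review. The function below is a hypothetical sketch: it assumes the model reproduces the token exactly, and `sample_output` is made-up text for illustration, not real model output.

```python
PLACEHOLDER = "{PLACEHOLDER_HEADER}"


def find_placeholder_lines(markdown_text):
    """Return (line_number, line) pairs that still contain the placeholder token."""
    return [
        (number, line)
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if PLACEHOLDER in line
    ]


# Made-up sample standing in for a model response.
sample_output = "\n".join([
    "## Groceries",
    "- milk",
    "### {PLACEHOLDER_HEADER}",
    "| Item | {PLACEHOLDER_HEADER} |",
    "| --- | --- |",
])
for number, line in find_placeholder_lines(sample_output):
    print(number, line)
```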

In [11]:
import base64

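IMAGE_PATH = "../media/notes-sample.png"
# Read the image inside a context manager so the file handle is closed promptly.
with open(IMAGE_PATH, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("ascii")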

content = append_image_to_system_prompt(content, encoded_image)

content

[{'type': 'text',
  'text': 'You extract the text from a provided image of handwritten notes. The final output is markdown text.\n- The handwritten notes will also be written in markdown.\n- For example, the handwritten notes will use symbols like # for headers, - for bullet points, _ for italics, * for bold, and other possible markdown characters.\n\n## Extraction Rules\n- First, take a step back and understand the overall structure of the image.\n    - Is there color coding? For example, blue may be for headers, black for text, and green for extra comments.\n    - Is there text in multiple columns that could be represented as tables? Sometimes columns are enclosed in grids; sometimes they simply have column line dividers.\n    - Are there isolated boxes?\n    - Is there text connected by a line to other text, like mind maps and branches?\n    - Are there drawings such as flow charts, graphs, swim lanes, etc.?\n    - Are there line dividers to indicate a separate topic/section?\n    - Ar
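
Echoing `content` prints the full prompt text and, once the image is appended, the base64 payload as well. Below is a small sketch of a more compact way to inspect the list. `summarize_parts` is a hypothetical helper name, and it assumes parts shaped like the ones built by the two helper functions.

```python
def summarize_parts(parts, preview_chars=40):
    """Return one short line per content part instead of the full payload."""
    lines = []
    for index, part in enumerate(parts):
        if part.get("type") == "text":
            text = part["text"]
            # Flatten newlines so each summary stays on a single line.
            preview = text[:preview_chars].replace("\n", " ")
            lines.append(f"[{index}] text ({len(text)} chars): {preview}")
        elif part.get("type") == "image_url":
            url = part["image_url"]["url"]
            # Keep only the "data:<mime>;base64" header, not the encoded bytes.
            header = url.split(",", 1)[0]
            lines.append(f"[{index}] image_url ({len(url)} chars): {header}")
        else:
            lines.append(f"[{index}] unknown part type: {part.get('type')!r}")
    return lines


# Stand-in data; the real list would come from the cells above.
example = [
    {"type": "text", "text": "You extract the text\nfrom an image."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}},
]
for line in summarize_parts(example):
    print(line)
```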

In [12]:
content = append_text_to_system_prompt(content, """
should result in ...
""")

content

[{'type': 'text',
  'text': 'You extract the text from a provided image of handwritten notes. The final output is markdown text.\n- The handwritten notes will also be written in markdown.\n- For example, the handwritten notes will use symbols like # for headers, - for bullet points, _ for italics, * for bold, and other possible markdown characters.\n\n## Extraction Rules\n- First, take a step back and understand the overall structure of the image.\n    - Is there color coding? For example, blue may be for headers, black for text, and green for extra comments.\n    - Is there text in multiple columns that could be represented as tables? Sometimes columns are enclosed in grids; sometimes they simply have column line dividers.\n    - Are there isolated boxes?\n    - Is there text connected by a line to other text, like mind maps and branches?\n    - Are there drawings such as flow charts, graphs, swim lanes, etc.?\n    - Are there line dividers to indicate a separate topic/section?\n    - Ar
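
The three cells above build the prompt as an ordered sequence: instructions, an example image, then text introducing the expected result. Below is a sketch of that layout as a single function. It assumes the expected markdown for the example follows as part of the final text part. `build_few_shot_parts` and its arguments are hypothetical, and the placeholder strings stand in for content the notebook does not show.

```python
def build_few_shot_parts(instructions, example_image_b64, expected_markdown,
                         mime_type="image/png"):
    """Assemble instruction text, one example image, and its expected output."""
    return [
        {"type": "text", "text": instructions},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{example_image_b64}"},
        },
        # Assumption: the expected markdown follows the image as plain text.
        {"type": "text", "text": f"\nshould result in:\n{expected_markdown}"},
    ]


parts = build_few_shot_parts(
    instructions="Extract the handwritten notes as markdown.\n\nFor example:\n",
    example_image_b64="<base64 of the example image>",  # placeholder, not real data
    expected_markdown="<expected markdown for the example>",  # placeholder
)
print([part["type"] for part in parts])
```

Keeping the order in one place makes it harder to append the example image and its expected output out of sequence.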