In [13]:
import os
import sys
import anthropic
# import ollama
import random
import pandas as pd
from tqdm import tqdm
from google.generativeai.types import RequestOptions
from google.api_core import retry
from typing import List, Tuple
import json
from openai import OpenAI
import datetime

current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)

if parent_dir not in sys.path:
    sys.path.append(parent_dir)

In [10]:
base_prompt = """
# General Instructions
You are a helpful assistant tasked with highlighting key facts in text. Given a question and answer pair, you need to identify distinct sections of the text that were needed to answer the question. You are only concerned with identifying excerpts of the text that are ESSENTIAL to answering the question. You should ignore irrelevant information that does not help answer the final question. You should extract the exact parts of the text, not summarized versions of the text. After you have decided what specific quotes to use, tag the original question and answer with xml tags around those quotes. The xml tags should follow the format of <fact1>, <fact2>, etc. Here is what your response should look like:

### Fact Extraction Reasoning:
.... (this is your scratch pad to decide which information is most important)

### Reformatted Text
...

# Fact Extraction Details
Each block of text has a large amount of information that could be a valid fact. However, you should only extract the most important quotes. Try to keep each quote as short as possible while still maintaining the essential information.

## Properly Done Example
Here is an example of properly extracting the information needed to answer the final question:

### Original Text:
Question: The following paragraphs each describe a set of seven objects arranged in a fixed order. The statements are logically consistent within each paragraph. In an antique car show, there are seven vehicles: a bus, a motorcyle, a hatchback, a station wagon, a minivan, a truck, and a limousine. The station wagon is the fourth-newest. The motorcyle is newer than the truck. The station wagon is older than the hatchback. The minivan is newer than the hatchback. The bus is newer than the minivan. The truck is newer than the limousine.
Options:
(A) The bus is the third-oldest
(B) The motorcyle is the third-oldest
(C) The hatchback is the third-oldest
(D) The station wagon is the third-oldest
(E) The minivan is the third-oldest
(F) The truck is the third-oldest
(G) The limousine is the third-oldest

Answer: To determine which vehicle is the third-oldest, we need to analyze the given facts:
1. **The station wagon is the fourth-newest**: This means there are three vehicles that are newer than the station wagon.
2. **The motorcycle is newer than the truck**: This indicates that the motorcycle is not the oldest and is positioned above the truck in terms of age.
3. **The station wagon is older than the hatchback**: This means the hatchback is newer than the station wagon.
4. **The minivan is newer than the hatchback**: This places the minivan above the hatchback in terms of age.
5. **The bus is newer than the minivan**: This means the bus is the newest among the minivan and the hatchback.
6. **The truck is newer than the limousine**: This indicates that the limousine is older than the truck.
Now, let's summarize the order based on the information:
- Since the station wagon is the fourth-newest, we can denote the order as follows:
  - 1st: Bus (newest)
  - 2nd: Minivan (newer than hatchback)
  - 3rd: Motorcycle (newer than truck)
  - 4th: Station Wagon (given)
  - 5th: Hatchback (older than station wagon)
  - 6th: Truck (newer than limousine)
  - 7th: Limousine (oldest)
From this arrangement, we can see that the third-oldest vehicle is the **motorcycle**. 
The answer is {B}.

### Fact Extraction Reasoning:
The question asks us to determine the third-oldest vehicle in the list. To answer this question, we need all the relative ages of the vehicles. I'll focus on just tagging the specific ages of each of the vehicles in the text.

### Reformatted Text:
Question: The following paragraphs each describe a set of seven objects arranged in a fixed order. The statements are logically consistent within each paragraph. In an antique car show, there are seven vehicles: a bus, a motorcyle, a hatchback, a station wagon, a minivan, a truck, and a limousine. The <fact1>station wagon is the fourth-newest</fact1>. The <fact2>motorcyle is newer than the truck</fact2>. The <fact3>station wagon is older than the hatchback</fact3>. The <fact4>minivan is newer than the hatchback</fact4>. The <fact5>bus is newer than the minivan</fact5>. The <fact6>truck is newer than the limousine</fact6>.
Options:
(A) The bus is the third-oldest
(B) The motorcyle is the third-oldest
(C) The hatchback is the third-oldest
(D) The station wagon is the third-oldest
(E) The minivan is the third-oldest
(F) The truck is the third-oldest
(G) The limousine is the third-oldest

Answer: To determine which vehicle is the third-oldest, we need to analyze the given facts:
1. **<fact1>The station wagon is the fourth-newest</fact1>**: This means there are three vehicles that are newer than the station wagon.
2. **<fact2>The motorcycle is newer than the truck</fact2>**: This indicates that the motorcycle is not the oldest and is positioned above the truck in terms of age.
3. **<fact3>The station wagon is older than the hatchback</fact3>**: This means the hatchback is newer than the station wagon.
4. **<fact4>The minivan is newer than the hatchback</fact4>**: This places the minivan above the hatchback in terms of age.
5. **<fact5>The bus is newer than the minivan</fact5>**: This means the bus is the newest among the minivan and the hatchback.
6. **<fact6>The truck is newer than the limousine</fact6>**: This indicates that the limousine is older than the truck.
Now, let's summarize the order based on the information:
- Since the <fact1>station wagon is the fourth-newest</fact1>, we can denote the order as follows:
  - 1st: Bus (newest)
  - 2nd: Minivan (newer than hatchback)
  - 3rd: Motorcycle (newer than truck)
  - 4th: Station Wagon (given)
  - 5th: Hatchback (older than station wagon)
  - 6th: Truck (newer than limousine)
  - 7th: Limousine (oldest)
From this arrangement, we can see that the third-oldest vehicle is the **motorcycle**. 
The answer is {B}.

## What NOT to Extract
Given an inputted text, there are large amounts of possible facts that could be used. However, not all of these citations are relevant to the final question. You should only extract information that is essential to answering the question. Here is an example of a poorly tagged response that has too many extracted facts:

### Original Text:
Question: Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?

Answer: Betty currently has half of the $100 she needs, which is $50. Her parents give her $15, and her grandparents give her twice that amount, which is $30. Adding the $15 from her parents and $30 from her grandparents to the $50 she already has, Betty now has $50 + $15 + $30 = $95. Since the wallet costs $100, she still needs $100 - $95 = $5 more to buy the wallet.

### Fact Extraction Reasoning:
...

### Reformatted Text:
Question: <fact1>Betty is saving money</fact1> for a <fact2>new wallet</fact2> which costs <fact3>$100</fact3>. <fact4>Betty has only half</fact4> of the <fact5>money she needs</fact5>. <fact6>Her parents decided to give</fact6> her <fact7>$15 for that purpose</fact7>, and <fact8>her grandparents twice as much as her parents</fact8>. <fact9>How much more money</fact9> <fact10>does Betty need to buy the wallet</fact10>?

Answer: <fact4>Betty currently has half</fact4> of the <fact3>$100</fact3> she needs, which is $50. <fact6>Her parents give her</fact6> <fact7>$15</fact7>, and <fact8>her grandparents give her twice that amount</fact8>, which is $30. Adding the <fact7>$15</fact7> from her parents and <fact8>$30</fact8> from her grandparents to the <fact4>$50 she already has</fact4>, Betty now has <fact4>$50</fact4> + <fact7>$15</fact7> + <fact8>$30</fact8> = $95. Since the wallet costs <fact3>$100</fact3>, she still needs <fact3>$100</fact3> - $95 = $5 more to buy the <fact2>wallet</fact2>.

While all of these tags do properly wrap the quotes from the original text, they are not all necessary to answer the question. Tags like <fact1>, <fact2>, <fact5>, etc contain information that is not essential to answering the final question. Additionally, the tags should be as concise as possible while still providing the necessary information. Many of these tags are redundant or overly verbose, making the response less clear and concise.

**Tag the following question:**
"""
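
The prompt above asks for a response with two sections, a reasoning scratch pad followed by the tagged text. A minimal sketch of separating the two is shown here. `split_response` and `_SECTION_RE` are hypothetical names that are not part of the notebook, and the trailing colon on each header is treated as optional because the prompt uses both `### Reformatted Text` and `### Reformatted Text:`.

```python
import re
from typing import Tuple

# Header names mirror the response format requested in base_prompt.
# The trailing colon is optional because the prompt itself uses both forms.
_SECTION_RE = re.compile(
    r"###\s*Fact Extraction Reasoning:?\s*(?P<reasoning>.*?)"
    r"###\s*Reformatted Text:?\s*(?P<tagged>.*)",
    re.DOTALL,
)


def split_response(response: str) -> Tuple[str, str]:
    """Hypothetical helper: return (reasoning, tagged_text) from a model response."""
    match = _SECTION_RE.search(response)
    if match is None:
        raise ValueError("response does not follow the expected two-section format")
    return match.group("reasoning").strip(), match.group("tagged").strip()


# Made-up response used only to illustrate the expected layout.
example = (
    "### Fact Extraction Reasoning:\n"
    "Only the block positions matter.\n\n"
    "### Reformatted Text\n"
    "Question: <fact1>Block C is left of block B</fact1>."
)
reasoning, tagged = split_response(example)
```

Raising on a missing section, rather than returning empty strings, makes malformed responses visible before they reach the visualization step.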

In [11]:
question = """
Question: We have three blocks, A, B and C. Block A has a medium blue square. Below block A is block B which has one medium black square. To the left of block B there is block C which has two medium blue squares. Medium blue square number one is below medium blue square number two. A medium yellow square is below medium blue square number two and medium blue square number one. What is to the left of the black thing? a medium blue square that is in block A or a medium blue square number two?
(a) medium blue square  that is in block A
(b) medium blue square  number two
(c) both of them
(d) none of them
Answer: Block A has a medium blue square, and Block B (below A) has a black square. Block C, to the left of Block B, has two medium blue squares, with blue square number two on top and number one below it. Since the blue squares in Block C are directly to the left of the black square in Block B.
The answer is {B}.

"""

In [12]:
prompt = base_prompt + question
print(prompt)


# General Instructions
You are a helpful assistant tasked with highlighting key facts in text. Given a question and answer pair, you need to identify distinct sections of the text that were needed to answer the question. You are only concerned with identifying excerpts of the text that are ESSENTIAL to answering the question. You should ignore irrelevant information that does not help answer the final question. You should extract the exact parts of the text, not summarized versions of the text. After you have decided what specific quotes to use, tag the original question and answer with xml tags around those quotes. The xml tags should follow the format of <fact1>, <fact2>, etc. Here is what your response should look like:

### Fact Extraction Reasoning:
.... (this is your scratch pad to decide which information is most important)

### Reformatted Text
...

# Fact Extraction Details
Each block of text has a large amount of information that could be a valid fact. However, you should on

In [33]:
questions = [
    """Question: We have three blocks, A, B and C. Block A has a medium blue square. Below block A is block B which has one medium black square. To the left of block B there is block C which has two medium blue squares. Medium blue square number one is below medium blue square number two. A medium yellow square is below medium blue square number two and medium blue square number one. What is to the left of the black thing? a medium blue square that is in block A or a medium blue square number two?
(a) medium blue square  that is in block A
(b) medium blue square  number two
(c) both of them
(d) none of them
Answer: Block A has a medium blue square, and Block B (below A) has a black square. Block C, to the left of Block B, has two medium blue squares, with blue square number two on top and number one below it. Since the blue squares in Block C are directly to the left of the black square in Block B.
The answer is {B}.
    """,
    """Question: We have two blocks. Lets call them A and B. There are two small yellow triangles in block A. Small yellow triangle number one is above and near to small yellow triangle number two. To the right of block A there is block B which contains one small blue triangle. To the left of and far from a small blue circle is a big blue circle. It is above the small blue triangle. The small blue triangle is touching the bottom edge of this block. To the right of the small blue triangle is the small blue circle. Which object is to the right of a small yellow triangle? the small blue circle or the small blue triangle that is touching the bottom edge of a block?
(a) the small blue circle
(b) the small blue triangle that is touching the bottom edge of a block
(c) both of them
(d) none of them
Answer: In block A, there are two small yellow triangles. To the right of block A, block B contains a small blue triangle touching the bottom edge and a small blue circle to its right. Since both the small blue circle and the small blue triangle are to the right of the small yellow triangles.
The answer is {C}.
    """,
    """Question: There are three blocks. We call them A, B and C. Block A contains two big black circles. There is also a small yellow triangle touching the bottom edge of this block. Big black circle number two is touching the right edge of this block. Big black circle number one is to the left of and near to big black circle number two. This shape is above the small yellow triangle. Above block A is block B with a big black triangle in it. Above block B there is block C. It contains two big yellow triangles. Big yellow triangle number one is touching the right edge of this block. It is above big yellow triangle number two. What is above the big black triangle? a small yellow triangle which is touching the bottom edge of a block or a big yellow triangle?
(a) small yellow triangle which is touching the bottom edge of a block
(b) big yellow triangle
(c) both of them
(d) none of them
Answer: Block A contains a small yellow triangle touching the bottom edge, and Block C (above Block B) contains two big yellow triangles, with big yellow triangle number one above big yellow triangle number two. Since Block C is directly above Block B (which contains the big black triangle), the object above the big black triangle is a big yellow triangle.
The answer is {B}.""",
"""Question: We have three blocks, A, B and C. Block B is below C. Block A is to the left of B. Block A contains a medium yellow triangle and a small yellow circle. The medium yellow triangle is to the left of and above the small yellow circle. Block B contains one small yellow triangle. And block C contains one medium yellow circle, one medium black circle and one small black circle. A medium blue circle is near to the small black circle. This object is near to and to the left of the medium yellow circle which is to the right of, near to and above the medium black circle. It is above the medium black circle. Which object is to the left of a small yellow triangle? the small yellow circle or the medium yellow circle?
(a) the small yellow circle
(b) the medium yellow circle
(c) both of them
(d) none of them
Answer: Block A contains a small yellow circle, and Block B contains a small yellow triangle. Since Block A is to the left of Block B, and the small yellow circle is in Block A, it is to the left of the small yellow triangle. The medium yellow circle, which is in Block C, is above and to the right of the small yellow triangle, but not to the left of it.
The answer is {A}.
""",
"""Question: There are two blocks, A and B. Block A has one big yellow triangle and a big black square. Below the big black square there is the big yellow triangle. It is touching the bottom edge of this block. It also contains one small yellow square. The big black square is to the left of and near to the small yellow square. Above block A we have block B which contains one big black triangle and a big black circle. The big black triangle is near to the big black circle. What is below the big black triangle? a big black square or a circle?
(a) big black square
(b) circle
(c) both of them
(d) none of them
Answer: Block A, which is below Block B, contains a big black square. Since the big black triangle is in Block B and Block A is directly below Block B, the object directly below the big black triangle is the big black square in Block A.
The answer is {A}.""",
"""Question: There are three blocks. Lets call them A, B and C. Block A contains a big blue circle and a medium black circle. The big blue circle is touching the bottom edge of this block. A big yellow circle is to the left of the medium black circle. The medium black circle is touching the right edge of this block. To the left of the medium black circle there is the big blue circle. It is below the yellow object which is to the left of the medium black circle. To the left of block A we have block B with a big blue square in it. Below block A there is block C which has a medium yellow square. This block also has one medium yellow circle. Above the medium yellow square there is the medium yellow circle. What is to the right of the big blue square? a medium yellow circle or a medium yellow square?
(a) medium yellow circle
(b) medium yellow square
(c) both of them
(d) none of them
Answer: Block B contains the big blue square, and Block C, which is below Block A, contains both the medium yellow square and the medium yellow circle. Since the medium yellow circle is above the medium yellow square and both are located in Block C, which is below Block A, neither of these objects is directly to the right of the big blue square in Block B.
The answer is {D}."""
             ]

In [31]:
answers = [
    """Question: We have three blocks, A, B and C. Block A has a medium blue square. Below block A is block B which has one medium black square. To the left of block B there is block C which has two medium blue squares. Medium blue square number one is below medium blue square number two. A medium yellow square is below medium blue square number two and medium blue square number one. What is to the left of the black thing? a medium blue square that is in block A or a medium blue square number two?
(a) medium blue square  that is in block A
(b) medium blue square  number two
(c) both of them
(d) none of them
Reformatted Question: We have three blocks, A, B, and C. <fact1>Block A has a medium blue square</fact1>. <fact2>Below block A is block B, which has one medium black square</fact2>. <fact3>To the left of block B, there is block C, which has two medium blue squares</fact3>. <fact4>Medium blue square number one is below medium blue square number two</fact4>. A medium yellow square is below medium blue square number two and medium blue square number one. What is to the left of the black thing? A medium blue square that is in block A or a medium blue square number two?
Answer: <fact1>Block A has a medium blue square</fact1>, and <fact2>Block B (below A) has a black square</fact2>. <fact3>Block C, to the left of Block B, has two medium blue squares</fact3>, with <fact4>blue square number two on top and number one below it</fact4>. Since <fact3>the blue squares in Block C are directly to the left of the black square in Block B</fact3>, the object to the left of the black thing is medium blue square number two.
The answer is {B}.
    """,
    """Question: We have two blocks. Lets call them A and B. There are two small yellow triangles in block A. Small yellow triangle number one is above and near to small yellow triangle number two. To the right of block A there is block B which contains one small blue triangle. To the left of and far from a small blue circle is a big blue circle. It is above the small blue triangle. The small blue triangle is touching the bottom edge of this block. To the right of the small blue triangle is the small blue circle. Which object is to the right of a small yellow triangle? the small blue circle or the small blue triangle that is touching the bottom edge of a block?
(a) the small blue circle
(b) the small blue triangle that is touching the bottom edge of a block
(c) both of them
(d) none of them
Reformatted Question: We have two blocks. Let's call them A and B. <fact1>There are two small yellow triangles in block A</fact1>. Small yellow triangle number one is above and near to small yellow triangle number two. <fact2>To the right of block A, there is block B which contains one small blue triangle</fact2>. The small blue triangle is touching the bottom edge of block B. <fact3>To the right of the small blue triangle is the small blue circle</fact3>. To the left of and far from a small blue circle is a big blue circle. It is above the small blue triangle. Which object is to the right of a small yellow triangle? The small blue circle or the small blue triangle that is touching the bottom edge of a block?
Answer: In block A, <fact1>there are two small yellow triangles</fact1>. To the right of block A, <fact2>block B contains a small blue triangle touching the bottom edge</fact2> and <fact3>a small blue circle to its right</fact3>. Since both the small blue circle and the small blue triangle are to the right of the small yellow triangles.
The answer is {C}.
""",
"""Question: There are three blocks. We call them A, B and C. Block A contains two big black circles. There is also a small yellow triangle touching the bottom edge of this block. Big black circle number two is touching the right edge of this block. Big black circle number one is to the left of and near to big black circle number two. This shape is above the small yellow triangle. Above block A is block B with a big black triangle in it. Above block B there is block C. It contains two big yellow triangles. Big yellow triangle number one is touching the right edge of this block. It is above big yellow triangle number two. What is above the big black triangle? a small yellow triangle which is touching the bottom edge of a block or a big yellow triangle?
(a) small yellow triangle which is touching the bottom edge of a block
(b) big yellow triangle
(c) both of them
(d) none of them
Reformatted Question: There are three blocks. We call them A, B, and C. Block A contains two big black circles. <fact1>There is also a small yellow triangle touching the bottom edge of this block</fact1>. Big black circle number two is touching the right edge of this block. Big black circle number one is to the left of and near to big black circle number two. This shape is above the small yellow triangle. Above block A is block B with a big black triangle in it. <fact2>Above block B there is block C. It contains two big yellow triangles</fact2>. <fact3>Big yellow triangle number one is touching the right edge of this block</fact3>. It is above big yellow triangle number two. What is above the big black triangle? A small yellow triangle which is touching the bottom edge of a block or a big yellow triangle?
Answer: <fact1>Block A contains a small yellow triangle touching the bottom edge</fact1>, and <fact2>Block C (above Block B) contains two big yellow triangles</fact2>, with <fact3>big yellow triangle number one above big yellow triangle number two</fact3>. Since <fact2>Block C is directly above Block B (which contains the big black triangle)</fact2>, the object above the big black triangle is a big yellow triangle.
The answer is {B}.""",
"""Question: We have three blocks, A, B and C. Block B is below C. Block A is to the left of B. Block A contains a medium yellow triangle and a small yellow circle. The medium yellow triangle is to the left of and above the small yellow circle. Block B contains one small yellow triangle. And block C contains one medium yellow circle, one medium black circle and one small black circle. A medium blue circle is near to the small black circle. This object is near to and to the left of the medium yellow circle which is to the right of, near to and above the medium black circle. It is above the medium black circle. Which object is to the left of a small yellow triangle? the small yellow circle or the medium yellow circle?
(a) the small yellow circle
(b) the medium yellow circle
(c) both of them
(d) none of them
Reformatted Question: We have three blocks, A, B, and C. Block B is below C. <fact1>Block A is to the left of B</fact1>. <fact2>Block A contains a medium yellow triangle and a small yellow circle</fact2>. The medium yellow triangle is to the left of and above the small yellow circle. <fact3>Block B contains one small yellow triangle</fact3>. <fact4>Block C contains one medium yellow circle, one medium black circle, and one small black circle</fact4>. A medium blue circle is near to the small black circle. This object is near to and to the left of the medium yellow circle, which is to the right of, near to, and above the medium black circle. It is above the medium black circle. Which object is to the left of a small yellow triangle? The small yellow circle or the medium yellow circle?
Answer: <fact2>Block A contains a small yellow circle</fact2>, and <fact3>Block B contains a small yellow triangle</fact3>. Since <fact1>Block A is to the left of Block B</fact1>, and <fact2>the small yellow circle is in Block A</fact2>, it is to the left of the small yellow triangle. The <fact4>medium yellow circle, which is in Block C</fact4>, is above and to the right of the small yellow triangle, but not to the left of it.
The answer is {A}.""",
"""Question: There are two blocks, A and B. Block A has one big yellow triangle and a big black square. Below the big black square there is the big yellow triangle. It is touching the bottom edge of this block. It also contains one small yellow square. The big black square is to the left of and near to the small yellow square. Above block A we have block B which contains one big black triangle and a big black circle. The big black triangle is near to the big black circle. What is below the big black triangle? a big black square or a circle?
(a) big black square
(b) circle
(c) both of them
(d) none of them
Reformatted Question: There are two blocks, A and B. <fact1>Block A has one big yellow triangle and a big black square</fact1>. Below the big black square, there is the big yellow triangle. It is touching the bottom edge of this block. It also contains one small yellow square. The big black square is to the left of and near to the small yellow square. <fact2>Above block A we have block B, which contains one big black triangle and a big black circle</fact2>. The big black triangle is near to the big black circle. What is below the big black triangle? A big black square or a circle?
Answer: <fact1>Block A, which is below Block B, contains a big black square</fact1>. Since <fact2>the big black triangle is in Block B</fact2> and <fact1>Block A is directly below Block B</fact1>, the object directly below the big black triangle is the big black square in Block A.
The answer is {A}.""",
"""Question: There are three blocks. Lets call them A, B and C. Block A contains a big blue circle and a medium black circle. The big blue circle is touching the bottom edge of this block. A big yellow circle is to the left of the medium black circle. The medium black circle is touching the right edge of this block. To the left of the medium black circle there is the big blue circle. It is below the yellow object which is to the left of the medium black circle. To the left of block A we have block B with a big blue square in it. Below block A there is block C which has a medium yellow square. This block also has one medium yellow circle. Above the medium yellow square there is the medium yellow circle. What is to the right of the big blue square? a medium yellow circle or a medium yellow square?
(a) medium yellow circle
(b) medium yellow square
(c) both of them
(d) none of them
Reformatted Question: There are three blocks. Let's call them A, B, and C. Block A contains a big blue circle and a medium black circle. The big blue circle is touching the bottom edge of this block. A big yellow circle is to the left of the medium black circle. The medium black circle is touching the right edge of this block. To the left of the medium black circle, there is the big blue circle. It is below the yellow object, which is to the left of the medium black circle. <fact1>To the left of block A, we have block B with a big blue square in it</fact1>. <fact2>Below block A, there is block C, which has a medium yellow square</fact2>. This block also has one medium yellow circle. <fact3>Above the medium yellow square, there is the medium yellow circle</fact3>. What is to the right of the big blue square? A medium yellow circle or a medium yellow square?
Answer: <fact1>Block B contains the big blue square</fact1>, and <fact2>Block C, which is below Block A, contains both the medium yellow square and the medium yellow circle</fact2>. Since <fact3>the medium yellow circle is above the medium yellow square</fact3> and both are located in Block C, which is below Block A, neither of these objects is directly to the right of the big blue square in Block B.
The answer is {D}."""
    
]
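
The prompt requires tags to wrap exact quotes from the text, so one way to sanity-check a tagged string is to strip the tags and compare the result with the untagged source. The sketch below assumes tags always have the form `<factN>`. `strip_fact_tags` and `unbalanced_tags` are hypothetical helpers, and the sample strings are shortened excerpts in the style of the examples above.

```python
import re
from typing import List

# Matches both opening and closing fact tags, e.g. <fact1> and </fact1>.
FACT_TAG_RE = re.compile(r"</?fact\d+>")


def strip_fact_tags(tagged: str) -> str:
    """Remove every <factN> / </factN> tag, leaving the wrapped text in place."""
    return FACT_TAG_RE.sub("", tagged)


def unbalanced_tags(tagged: str) -> List[str]:
    """Return tag names whose opening and closing counts differ."""
    names = set(re.findall(r"</?(fact\d+)>", tagged))
    return sorted(
        name
        for name in names
        if tagged.count(f"<{name}>") != tagged.count(f"</{name}>")
    )


original = "Block A is to the left of B. Block B contains one small yellow triangle."
tagged = (
    "<fact1>Block A is to the left of B</fact1>. "
    "<fact2>Block B contains one small yellow triangle</fact2>."
)

# True only when tagging left the underlying text untouched.
text_preserved = strip_fact_tags(tagged) == original
```

A strict equality check like this would flag the reformatted questions above that add commas or capitalization, so a looser comparison (for example, after normalizing punctuation and case) may be more practical.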


In [34]:
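# Create the client once; it reads OPENAI_API_KEY from the environment.
client = OpenAI()


def query_4o(prompt: str) -> str:
    """Send a single user prompt to GPT-4o and return the text of the reply."""
    completion = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the tagging as deterministic as possible
    )
    return completion.choices[0].message.content


# Tag every question and keep the raw responses for later visualization.
responses = []
for question in questions:
    prompt = base_prompt + question
    responses.append(query_4o(prompt))
    print(responses[-1])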

### Fact Extraction Reasoning:
To determine what is to the left of the black square in Block B, we need to focus on the spatial arrangement of the blocks and the squares within them. The key information is the position of Block C relative to Block B and the arrangement of the squares within Block C.

### Reformatted Text
Question: We have three blocks, A, B and C. Block A has a medium blue square. Below block A is block B which has <fact1>one medium black square</fact1>. To the left of block B there is block C which has <fact2>two medium blue squares</fact2>. Medium blue square number one is below medium blue square number two. A medium yellow square is below medium blue square number two and medium blue square number one. What is to the left of the black thing? a medium blue square that is in block A or a medium blue square number two?
(a) medium blue square that is in block A
(b) medium blue square number two
(c) both of them
(d) none of them
Answer: Block A has a medium blue square,

In [39]:
with open("output.txt", "w") as file:
    # Write each string on a new line
    for string in responses:
        file.write(string + "\n")  # Add newline character at the end
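
Each response spans many lines, so a newline-separated text file does not record where one response ends and the next begins. A possible alternative, sketched here with hypothetical helper names and a hypothetical `output.jsonl` path, stores one JSON object per line so the list can be restored exactly.

```python
import json
import os
import tempfile
from typing import List


def save_responses_jsonl(responses: List[str], path: str) -> None:
    """Write one JSON object per line; newlines inside a response are escaped."""
    with open(path, "w", encoding="utf-8") as f:
        for i, text in enumerate(responses):
            f.write(json.dumps({"index": i, "response": text}) + "\n")


def load_responses_jsonl(path: str) -> List[str]:
    """Read the responses back in their original order."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["response"] for line in f if line.strip()]


# Made-up multi-line responses, written to a temporary directory.
sample = ["### Reformatted Text\nQuestion: ...", "second\nresponse"]
path = os.path.join(tempfile.mkdtemp(), "output.jsonl")
save_responses_jsonl(sample, path)
restored = load_responses_jsonl(path)
```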

In [37]:
import re
import os

def add_color_to_tags_new(text):
    """
    This function finds all unique tags in the text and assigns each a color from a predefined palette.
    It then replaces the tags with styled <span> elements that include the assigned background color.
    """
    # Find all unique opening tags in the text using regex
    tags = set(re.findall(r'<([A-Za-z]+\d*)>', text))

    # Predefined color palette
    color_palette = [
        'lightyellow', 'lightblue', 'lightgreen', 'lightcoral',
        'lightcyan', 'lightpink', 'lightsalmon', 'lightgray',
        'lightgoldenrodyellow', 'lightseagreen', 'lightskyblue',
        'lightsteelblue',
        'lavender', 'peachpuff', 'paleturquoise', 'wheat', 'mistyrose'
    ]

    # Dictionary to hold tag-color mapping
    tag_color_mapping = {}

    # Assign colors to tags, cycling through the color palette if necessary
    for i, tag in enumerate(sorted(tags)):
        color = color_palette[i % len(color_palette)]
        tag_color_mapping[tag] = color

    # Function to replace tags with styled spans including class names
    def replace_tag(match):
        tag = match.group(1)
        content = match.group(2)
        color = tag_color_mapping.get(tag, 'lightgray')  # Default color if not found
        return f'<span class="{tag}" style="background-color: {color}; font-weight: bold; padding: 2px 4px; border-radius: 3px;">{content}</span>'

    # Regex to find tags and replace them with styled spans
    # This regex handles multi-line content within tags
    tag_regex = re.compile(r'<([A-Za-z]+\d*)>\s*([\s\S]*?)\s*</\1>')

    # Replace all tags with styled spans
    text = tag_regex.sub(replace_tag, text)

    return text


def highlight_final_answer(text):
    """
    This function highlights the final answer enclosed in curly braces {}.
    """
    # Regex to find content within curly braces
    final_answer_regex = re.compile(r'\{([^}]+)\}')

    # Replace with a styled span (uses the .final-answer CSS class defined in the page template)
    highlighted_text = final_answer_regex.sub(
        lambda match: f'<span class="final-answer">{match.group(1)}</span>',
        text
    )

    return highlighted_text


def create_highlight_html(questions, output_file='questions_visualization.html'):
    """
    This function takes a list of question strings, applies color highlighting to any tags within them,
    and generates an HTML file to display the visualized questions.
    """
    html_content = """
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Questions Visualization</title>
        <style>
            body {{
                font-family: Arial, sans-serif;
                margin: 20px;
                background-color: #f9f9f9;
            }}
            .container {{
                background-color: #ffffff;
                padding: 20px 25px;
                margin-bottom: 20px;
                border-radius: 8px;
                box-shadow: 0 2px 5px rgba(0,0,0,0.1);
            }}
            .question-header {{
                font-size: 1.3em;
                margin-bottom: 10px;
                color: #333333;
            }}
            .question-body {{
                font-size: 1.1em;
                line-height: 1.6;
                white-space: pre-wrap; /* Preserve line breaks */
            }}
            .final-answer {{
                margin-top: 15px;
                padding: 10px;
                background-color: #e6f7ff;
                border-left: 4px solid #1890ff;
                font-weight: bold;
            }}
            /* Styles for the highlighted spans */
            .highlighted {{
                padding: 2px 4px;
                border-radius: 3px;
                display: inline-block;
            }}
        </style>
    </head>
    <body>
    <h1>Questions Visualization</h1>
    <div class="summary">
        <strong>Total Questions:</strong> {total_questions}
    </div>
    <hr>
    """

    # Fill in the summary count now, before any question text is appended.
    # Calling str.format() on the finished page would raise on stray braces in a question.
    html_content = html_content.format(total_questions=len(questions))

    # Process each question and append to HTML content
    for idx, question in enumerate(questions, 1):
        try:
            # Apply color to tags in the question
            highlighted_question = add_color_to_tags_new(question)

            # Highlight the final answer enclosed in {}
            highlighted_question = highlight_final_answer(highlighted_question)

            # Build the HTML structure for each question
            html_content += "<div class='container'>"
            html_content += f"<div class='question-header'><strong>Question {idx}:</strong></div>"
            html_content += f"<div class='question-body'>{highlighted_question}</div>"
            html_content += "</div>\n"
        except Exception as e:
            print(f"Cannot process question {idx}: {e}")
            continue

    # Close the HTML tags
    html_content += """
    </body>
    </html>
    """

    # Write the HTML content to the output file
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(html_content)

    print(f"HTML content has been successfully written to {output_file}")


def main():
    """
    Main function to execute the visualization process.
    """
    # `responses` is the list of tagged strings produced in an earlier cell

    # Specify the output HTML file name
    output_html_file = 'questions_visualization.html'

    # Generate the HTML visualization
    create_highlight_html(responses, output_html_file)


if __name__ == "__main__":
    main()


HTML content has been successfully written to questions_visualization.html
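
To make the tag-to-span substitution concrete, here is a minimal, self-contained sketch of the same regex technique applied to one short string. The function `colorize` and its two-color palette are hypothetical simplifications for illustration, not the notebook's `add_color_to_tags_new`.

```python
import re

# Matches <name>...</name> pairs; \1 forces the closing tag to repeat the opening name.
TAG_RE = re.compile(r'<([A-Za-z]+\d*)>\s*([\s\S]*?)\s*</\1>')


def colorize(text, palette=("lightyellow", "lightblue")):
    """Wrap each <tagN>...</tagN> pair in a span with a per-tag background color."""
    tags = sorted(set(re.findall(r'<([A-Za-z]+\d*)>', text)))
    # Cycle through the palette if there are more tags than colors
    colors = {tag: palette[i % len(palette)] for i, tag in enumerate(tags)}

    def to_span(match):
        tag, content = match.group(1), match.group(2)
        return f'<span class="{tag}" style="background-color: {colors[tag]};">{content}</span>'

    return TAG_RE.sub(to_span, text)


sample = "Block B has <fact1>one medium black square</fact1>."
result = colorize(sample)
print(result)
# Block B has <span class="fact1" style="background-color: lightyellow;">one medium black square</span>.
```

Because the tag names are sorted before colors are assigned, a given tag name gets the same color wherever it appears within one string.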


## responses + answers

In [38]:
import re
import os

def add_color_to_tags_new(text):
    """
    This function finds all unique tags in the text and assigns each a color from a predefined palette.
    It then replaces the tags with styled <span> elements that include the assigned background color.
    """
    # Find all unique opening tags in the text using regex
    tags = set(re.findall(r'<([A-Za-z]+\d*)>', text))

    # Predefined color palette
    color_palette = [
        'lightyellow', 'lightblue', 'lightgreen', 'lightcoral',
        'lightcyan', 'lightpink', 'lightsalmon', 'lightgray',
        'lightgoldenrodyellow', 'lightseagreen', 'lightskyblue',
        'lightsteelblue',
        'lavender', 'peachpuff', 'paleturquoise', 'wheat', 'mistyrose'
    ]

    # Dictionary to hold tag-color mapping
    tag_color_mapping = {}

    # Assign colors to tags, cycling through the color palette if necessary
    for i, tag in enumerate(sorted(tags)):
        color = color_palette[i % len(color_palette)]
        tag_color_mapping[tag] = color

    # Function to replace tags with styled spans including class names
    def replace_tag(match):
        tag = match.group(1)
        content = match.group(2)
        color = tag_color_mapping.get(tag, 'lightgray')  # Default color if not found
        return f'<span class="{tag}" style="background-color: {color}; font-weight: bold; padding: 2px 4px; border-radius: 3px;">{content}</span>'

    # Regex to find tags and replace them with styled spans
    # This regex handles multi-line content within tags
    tag_regex = re.compile(r'<([A-Za-z]+\d*)>\s*([\s\S]*?)\s*</\1>')

    # Replace all tags with styled spans
    text = tag_regex.sub(replace_tag, text)

    return text


def highlight_final_answer(text):
    """
    This function highlights the final answer enclosed in curly braces {}.
    """
    # Regex to find content within curly braces
    final_answer_regex = re.compile(r'\{([^}]+)\}')

    # Replace with a styled span
    highlighted_text = final_answer_regex.sub(
        lambda match: f'<span class="final-answer">{match.group(1)}</span>',
        text
    )

    return highlighted_text


def create_highlight_html(responses, answers, output_file='questions_visualization.html'):
    """
    This function takes lists of responses and answers, applies color highlighting to any tags within them,
    and generates an HTML file to display the visualized questions and answers side by side.
    """
    html_content = """
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Questions and Answers Visualization</title>
        <style>
            body {{
                font-family: Arial, sans-serif;
                margin: 20px;
                background-color: #f9f9f9;
            }}
            .container {{
                background-color: #ffffff;
                padding: 20px 25px;
                margin-bottom: 20px;
                border-radius: 8px;
                box-shadow: 0 2px 5px rgba(0,0,0,0.1);
                display: flex;
                flex-direction: row;
                gap: 20px;
            }}
            .response, .answer {{
                flex: 1;
            }}
            .response-header, .answer-header {{
                font-size: 1.2em;
                margin-bottom: 10px;
                color: #333333;
                border-bottom: 2px solid #e0e0e0;
                padding-bottom: 5px;
            }}
            .response-body, .answer-body {{
                font-size: 1.1em;
                line-height: 1.6;
                white-space: pre-wrap; /* Preserve line breaks */
            }}
            .final-answer {{
                display: inline-block;
                margin-top: 15px;
                padding: 10px;
                background-color: #e6f7ff;
                border-left: 4px solid #1890ff;
                font-weight: bold;
                border-radius: 3px;
            }}
            /* Styles for the highlighted spans */
            .highlighted {{
                padding: 2px 4px;
                border-radius: 3px;
                display: inline-block;
            }}
            /* Responsive adjustments */
            @media (max-width: 768px) {{
                .container {{
                    flex-direction: column;
                }}
            }}
            .answer-body {{
                margin-top: 19rem;
            }}
        </style>
    </head>
    <body>
    <h1>Questions and Answers Visualization</h1>
    <div class="summary">
        <strong>Total Pairs:</strong> {total_pairs}
    </div>
    <hr>
    """

    # Process only as many pairs as both lists can supply
    total_pairs = min(len(responses), len(answers))

    # Fill in the summary count now, before any response or answer text is appended.
    # Calling str.format() on the finished page would raise on stray braces in the text.
    html_content = html_content.format(total_pairs=total_pairs)

    # Process each pair and append to HTML content
    for idx in range(total_pairs):
        try:
            response = responses[idx]
            answer = answers[idx]

            # Apply color to tags in the response and answer
            highlighted_response = add_color_to_tags_new(response)
            highlighted_answer = add_color_to_tags_new(answer)

            # Highlight the final answer enclosed in {}
            highlighted_response = highlight_final_answer(highlighted_response)
            highlighted_answer = highlight_final_answer(highlighted_answer)

            # Build the HTML structure for each pair
            html_content += "<div class='container'>"
            html_content += "<div class='response'>"
            html_content += f"<div class='response-header'><strong>Response {idx + 1}:</strong></div>"
            html_content += f"<div class='response-body'>{highlighted_response}</div>"
            html_content += "</div>"
            html_content += "<div class='answer'>"
            html_content += f"<div class='answer-header'><strong>Answer {idx + 1}:</strong></div>"
            html_content += f"<div class='answer-body'>{highlighted_answer}</div>"
            html_content += "</div>"
            html_content += "</div>\n"
        except Exception as e:
            print(f"Cannot process pair {idx + 1}: {e}")
            continue

    # Close the HTML tags
    html_content += """
    </body>
    </html>
    """

    # Write the HTML content to the output file
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(html_content)

    print(f"HTML content has been successfully written to {output_file}")


def main():

    # Ensure that the number of responses and answers match
    if len(responses) != len(answers):
        print("Warning: The number of responses and answers do not match. Only matching pairs will be processed.")

    # Specify the output HTML file name
    output_html_file = 'questions_visualization.html'

    # Generate the HTML visualization
    create_highlight_html(responses, answers, output_html_file)


if __name__ == "__main__":
    main()


HTML content has been successfully written to questions_visualization.html
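
The responses and answers are inserted into the page as raw text, so a literal `<`, `>` or `&` in a question (for example a comparison such as `x < 3`) would be interpreted by the browser as markup. A minimal sketch of one way around this, assuming the tags always follow the `<fact1>`, `<fact2>` naming from the prompt: escape the whole string first, then match the escaped form of the tags. The function `escape_then_highlight` and its single fixed color are hypothetical, not part of the notebook.

```python
import html
import re

# After html.escape, "<fact1>" becomes "&lt;fact1&gt;", so the pattern targets the escaped form.
ESCAPED_TAG_RE = re.compile(r'&lt;(fact\d+)&gt;([\s\S]*?)&lt;/\1&gt;')


def escape_then_highlight(text, color="lightyellow"):
    """Escape HTML-special characters, then turn escaped fact tags into styled spans."""
    escaped = html.escape(text, quote=False)
    return ESCAPED_TAG_RE.sub(
        lambda m: f'<span class="{m.group(1)}" style="background-color: {color};">{m.group(2)}</span>',
        escaped,
    )


sample = "If x < 3 & y > 1, then <fact1>x + y > 1</fact1>."
result = escape_then_highlight(sample)
print(result)
# If x &lt; 3 &amp; y &gt; 1, then <span class="fact1" style="background-color: lightyellow;">x + y &gt; 1</span>.
```

Restricting the pattern to `fact\d+` is deliberate in this sketch: any other angle-bracketed text stays escaped and is shown literally instead of being treated as a tag.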
