# Introduction to the Brands-CompSent-19-SCQRE Dataset

This notebook analyzes subjective comparative questions from the Brands-CompSent-19-SCQRE dataset (Brands domain). The goal is to extract the components of each comparison (the entities being compared, the compared aspect, the preference, and any constraint) so that automated question answering systems can better handle subjective comparisons.



## Loading and Displaying Data from JSON

We load the dataset from its JSON file and convert it into a Pandas DataFrame for easier analysis. The following code downloads the file, extracts each comparative relation, and presents the relations in a structured tabular format.

### Read from JSON

In [9]:
import json
import pandas as pd
import requests

def json_to_table(url):
    """Downloads a JSON file from a URL and flattens it into a Pandas DataFrame with one row per relation.

    Args:
        url: The URL to the JSON file.

    Returns:
        A Pandas DataFrame representing the data, or None if an error occurs.
    """
    try:
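        response = requests.get(url, timeout=30)  # Avoid hanging on a stalled connection
        response.raise_for_status()  # Raise an error for bad responses (4xx and 5xx)

        # Decode with the stdlib so malformed input raises json.JSONDecodeError;
        # recent requests releases make response.json() raise a RequestException
        # subclass instead, which would be reported as a download error.
        data = json.loads(response.content)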

        rows = []
        row_id = 1
        for item in data:
            question = item['Question']
            for relation in item['Relations']:
                row = {
                    'ID': row_id,
                    'Question': question,
                    'Subject Entity': ', '.join(relation['Subject Entity']),
                    'Compared Aspect': relation['Aspect'],  # Renamed from JSON key 'Aspect'
                    'Object Entity': ', '.join(relation['Object Entity']),
                    'Preference Category': relation['Preference'],  # Renamed from JSON key 'Preference'
                    'Constraint': ', '.join(relation['Constraint'])
                }
                rows.append(row)
                row_id += 1

        df = pd.DataFrame(rows)
        return df

    except requests.exceptions.RequestException as e:
        print(f"Error retrieving data from URL: {e}")
        return None
    except json.JSONDecodeError:
        print("Error: Invalid JSON format.")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None


# Example usage
url = "https://raw.githubusercontent.com/mahsamb/SCRQD/main/Brands-CompSent-19-SCQRE.json"  # URL of the JSON file
df = json_to_table(url)

if df is not None:
    print(df.head(5))  # Display only the first 5 rows

   ID                                           Question Subject Entity  \
0   1  Did you like the Toyota over the Ford and does...         Toyota   
1   2  Did you like the Toyota over the Ford and does...         Toyota   
2   3  Does the Toyota Tundra get worse fuel economy ...  Toyota Tundra   
3   4                    Is Apple better than Microsoft?          Apple   
4   5                   Is Google better than Microsoft?         Google   

  Compared Aspect          Object Entity Preference Category Constraint  
0             All                   Ford              Better       none  
1            look                  Chevy              Better       none  
2    fuel economy  Ford F-150, Dodge Ram               Worse       none  
3             All              Microsoft              Better       none  
4             All              Microsoft              Better       none  
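The parsing loop in `json_to_table` does not depend on the download, so it can be pulled out into a pure function and checked offline. The sketch below assumes the record layout implied by the code above: a list of objects with `Question` and `Relations` keys, where entity and constraint fields are lists of strings. `flatten_relations` and `sample` are hypothetical names, and the sample record is reconstructed by hand from the Apple/Microsoft row of the output, not copied from the file.

```python
# A minimal sketch: json_to_table's parsing loop as a pure function.
# `flatten_relations` and `sample` are hypothetical; the record layout is
# assumed from the keys json_to_table reads.
import pandas as pd


def flatten_relations(data):
    """Return a DataFrame with one row per comparative relation."""
    rows = []
    for item in data:
        for relation in item["Relations"]:
            rows.append({
                "ID": len(rows) + 1,  # sequential relation ID, starting at 1
                "Question": item["Question"],
                "Subject Entity": ", ".join(relation["Subject Entity"]),
                "Compared Aspect": relation["Aspect"],
                "Object Entity": ", ".join(relation["Object Entity"]),
                "Preference Category": relation["Preference"],
                "Constraint": ", ".join(relation["Constraint"]),
            })
    return pd.DataFrame(rows)


# Hand-written record shaped like one entry of the dataset (assumed layout).
sample = [{
    "Question": "Is Apple better than Microsoft?",
    "Relations": [{
        "Subject Entity": ["Apple"],
        "Aspect": "All",
        "Object Entity": ["Microsoft"],
        "Preference": "Better",
        "Constraint": ["none"],
    }],
}]

print(flatten_relations(sample))
```

In the notebook, the same function could be applied to the downloaded list, for example `flatten_relations(json.loads(requests.get(url, timeout=30).content))`, keeping network errors and parsing errors in separate places.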


## Relation Extraction and Representation

The relationships between entities within questions are extracted and represented in a structured table format. Each row in the table corresponds to a single comparative relation extracted from a question. The table is organized as follows:

* **ID:** A unique identifier for each relation.
* **Question:** The original question from which the relations were extracted. This provides context for the relations.
* **Subject Entity:** The primary entity being compared.
* **Compared Aspect:** The specific feature or attribute being compared between the entities (`All` when the question compares the entities overall).
* **Object Entity:** The entity or entities the subject entity is compared against.
* **Preference:** The direction of the comparison from the subject entity's point of view, i.e., whether the subject is considered better or worse regarding the compared aspect (e.g., "Better," "Strong Better," "Worse," "Equal"). The first table above labels this column **Preference Category**.
* **Constraint:** Any conditions or limitations on the comparison (`none` when there are none).

This structure allows for a detailed and nuanced representation of the comparative relationships present in the questions. For example, the question "Does the Toyota Tundra get worse fuel economy than the larger Ford F-150 or Dodge Ram?" would be decomposed into a table entry with:

* **Subject Entity:** Toyota Tundra
* **Compared Aspect:** fuel economy
* **Object Entity:** Ford F-150, Dodge Ram
* **Preference:** Worse
* **Constraint:** none
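pandas can perform the same one-row-per-relation decomposition with `pd.json_normalize`, as an alternative to the explicit loop used in this notebook. This is a minimal sketch, assuming the JSON record for the Tundra question stores its entities and constraint as lists of strings, as the `', '.join(...)` calls in the loader imply; the record is written by hand for illustration, not copied from the file. Unlike the notebook's functions, `json_normalize` keeps the original key names (`Aspect`, `Preference`).

```python
# A minimal sketch, assuming list-valued entity and constraint fields.
# The record below is hand-written from the Tundra example in the text.
import pandas as pd

record = {
    "Question": ("Does the Toyota Tundra get worse fuel economy than "
                 "the larger Ford F-150 or Dodge Ram?"),
    "Relations": [{
        "Subject Entity": ["Toyota Tundra"],
        "Aspect": "fuel economy",
        "Object Entity": ["Ford F-150", "Dodge Ram"],
        "Preference": "Worse",
        "Constraint": ["none"],
    }],
}

# One row per element of 'Relations'; 'Question' is copied onto each row.
table = pd.json_normalize([record], record_path="Relations", meta=["Question"])

# Collapse list-valued cells into comma-separated strings.
for column in ["Subject Entity", "Object Entity", "Constraint"]:
    table[column] = table[column].str.join(", ")

print(table.loc[0, "Object Entity"])  # Ford F-150, Dodge Ram
```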

This structured representation facilitates analysis and comparison of different relations across multiple questions. Viewing the first few rows of the table provides a concrete illustration of this structure and how it captures the comparative information.

In [10]:
import json
import pandas as pd
import requests
from IPython.display import display, HTML

def json_to_html_table(url, num_rows=5):
    """Reads a JSON file from a URL, extracts data, and displays it as an HTML table.

    Args:
        url: The URL to the JSON file.
        num_rows: The number of rows to display (default is 5).
    """
    try:
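        response = requests.get(url, timeout=30)  # Avoid hanging on a stalled connection
        response.raise_for_status()  # Raise an error for bad responses

        # Decode with the stdlib so malformed input raises json.JSONDecodeError
        data = json.loads(response.content)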

        rows = []
        row_id = 1
        for item in data:
            question = item['Question']
            for relation in item['Relations']:
                row = {
                    'ID': row_id,
                    'Question': question,
                    'Subject Entity': ', '.join(relation['Subject Entity']),
                    'Compared Aspect': relation['Aspect'],
                    'Object Entity': ', '.join(relation['Object Entity']),
                    'Preference': relation['Preference'],
                    'Constraint': ', '.join(relation['Constraint'])
                }
                rows.append(row)
                row_id += 1

        df = pd.DataFrame(rows)

        # Display only the specified number of rows
        df_display = df.head(num_rows)

        # Convert the DataFrame to an HTML table string; escape=True renders
        # characters such as '<' or '&' in questions as text, not markup
        html_table = df_display.to_html(index=False, classes='styled-table', escape=True)

        # Display the HTML table
        display(HTML(html_table))

        # Include CSS styling within the HTML output for standalone display
        display(HTML('''
        <style>
        .styled-table {
            border-collapse: collapse;
            margin: 25px 0;
            font-size: 0.9em;
            font-family: sans-serif;
            min-width: 400px;
            box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
        }
        .styled-table thead tr {
            background-color: #009879;
            color: #ffffff;
            text-align: left;
        }
        .styled-table th,
        .styled-table td {
            padding: 12px 15px;
        }
        .styled-table tbody tr {
            border-bottom: 1px solid #dddddd;
        }

        .styled-table tbody tr:nth-of-type(even) {
            background-color: #f3f3f3;
        }

        .styled-table tbody tr:last-of-type {
            border-bottom: 2px solid #009879;
        }
        </style>
        '''))

    except requests.exceptions.RequestException as e:
        print(f"Error retrieving data from URL: {e}")
    except json.JSONDecodeError:
        print("Error: Invalid JSON format.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

# Example usage
url = "https://raw.githubusercontent.com/mahsamb/SCRQD/main/Brands-CompSent-19-SCQRE.json"  # Use raw URL to access JSON
json_to_html_table(url, num_rows=10)  # Display 10 rows

| ID | Question | Subject Entity | Compared Aspect | Object Entity | Preference | Constraint |
|---|---|---|---|---|---|---|
| 1 | Did you like the Toyota over the Ford and does it look better than the Chevy? | Toyota | All | Ford | Better | none |
| 2 | Did you like the Toyota over the Ford and does it look better than the Chevy? | Toyota | look | Chevy | Better | none |
| 3 | Does the Toyota Tundra get worse fuel economy than the larger Ford F-150 or Dodge Ram? | Toyota Tundra | fuel economy | Ford F-150, Dodge Ram | Worse | none |
| 4 | Is Apple better than Microsoft? | Apple | All | Microsoft | Better | none |
| 5 | Is Google better than Microsoft? | Google | All | Microsoft | Better | none |
| 6 | Does Pepsi work better in flavored colas than Coke? | Pepsi | flavored colas | Coke | Better | none |
| 7 | Isn't it pretty bad when a Ford has much better reliability than a BMW? | Ford | reliability | BMW | Strong Better | none |
| 8 | Do Nokia executives intimate that their mapping application is superior to both Google and Apple Maps? | Nokia | mapping application | Google Maps, Apple Maps | Strong Better | none |
| 9 | Is it far superior to the Ranger and Colorado, with a better appearance than Toyota and Nissan? | it | All | Ranger, Colorado | Strong Better | none |
| 10 | Is it far superior to the Ranger and Colorado, with a better appearance than Toyota and Nissan? | it | appearance | Toyota, Nissan | Better | none |
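Because each row is one relation, simple pandas aggregations summarize the dataset. As an illustration, the sketch below tallies the preference labels of the ten relations in the preview above, with the labels copied from that output. On the full DataFrame returned by `json_to_table`, the equivalent call would be `df['Preference Category'].value_counts()`; the small `preview` frame here is a hand-built stand-in.

```python
# Tally the preference labels of the ten preview relations shown above.
# `preview` is a hand-built stand-in for the notebook's DataFrame.
import pandas as pd

preview = pd.DataFrame({
    "ID": range(1, 11),
    "Preference": ["Better", "Better", "Worse", "Better", "Better",
                   "Better", "Strong Better", "Strong Better",
                   "Strong Better", "Better"],
})

# Number of relations per label, most frequent first.
counts = preview["Preference"].value_counts()
print(counts)
```

In this preview, six relations are "Better", three are "Strong Better", and one is "Worse"; the proportions for the full dataset may differ.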


## Counting Questions

The following code counts the total number of questions, i.e., the number of top-level records in the JSON file. A single question can contain several relations; their distribution is counted in the next section.

In [11]:
import json
import requests

def count_questions(url):
    """Counts the total number of questions in a JSON file from a URL.

    Args:
        url: The URL to the JSON file.

    Returns:
        The number of questions, or None if an error occurs.
    """
    try:
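        response = requests.get(url, timeout=30)  # Avoid hanging on a stalled connection
        response.raise_for_status()  # Raise an error for bad responses

        # Decode with the stdlib so malformed input raises json.JSONDecodeError
        data = json.loads(response.content)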

        question_count = len(data)
        return question_count

    except requests.exceptions.RequestException as e:
        print(f"Error retrieving data from URL: {e}")
        return None
    except json.JSONDecodeError:
        print("Error: Invalid JSON format.")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None

# Example usage
url = "https://raw.githubusercontent.com/mahsamb/SCRQD/main/Brands-CompSent-19-SCQRE.json"  # Use raw URL to access JSON
question_count = count_questions(url)

if question_count is not None:
    print(f"Number of questions: {question_count}")

Number of questions: 927
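Questions and relations can be counted in a single pass over the parsed list, since each relation becomes one table row. This is a minimal sketch; `count_questions_and_relations` and the two-record `sample` are hypothetical, and the `.get('Relations', [])` default mirrors the one used in `count_relations_per_question` below.

```python
# A minimal sketch: count questions and relations in one pass.
# `count_questions_and_relations` and `sample` are hypothetical.
def count_questions_and_relations(data):
    """Return (number of questions, total number of relations)."""
    num_questions = len(data)
    # A question without a 'Relations' key contributes zero relations.
    num_relations = sum(len(item.get("Relations", [])) for item in data)
    return num_questions, num_relations


sample = [
    {"Question": "q1", "Relations": [{}, {}]},
    {"Question": "q2", "Relations": [{}]},
]
print(count_questions_and_relations(sample))  # (2, 3)
```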


## Counting Relations per Question

This code analyzes a JSON file to determine the distribution of relations per question. It counts how many questions have one, two, three, or more relations, providing a summary of relation occurrences across the dataset.

In [12]:
import json
import requests

def count_relations_per_question(url):
    """Counts the number of questions with different numbers of relations from a JSON file at a URL.

    Args:
        url: The URL to the JSON file.

    Returns:
        A dictionary where keys are the number of relations and values are the
        corresponding counts of questions, or None if an error occurs.
    """
    try:
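        response = requests.get(url, timeout=30)  # Avoid hanging on a stalled connection
        response.raise_for_status()  # Raise an error for bad responses

        # Decode with the stdlib so malformed input raises json.JSONDecodeError
        data = json.loads(response.content)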

        relation_counts = {}
        for item in data:
            num_relations = len(item.get('Relations', []))
            relation_counts[num_relations] = relation_counts.get(num_relations, 0) + 1

        return relation_counts

    except requests.exceptions.RequestException as e:
        print(f"Error retrieving data from URL: {e}")
        return None
    except json.JSONDecodeError:
        print("Error: Invalid JSON format.")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None

# Example usage
url = "https://raw.githubusercontent.com/mahsamb/SCRQD/main/Brands-CompSent-19-SCQRE.json"  # Use raw URL to access JSON
relation_counts = count_relations_per_question(url)

if relation_counts is not None:
    for num_relations, count in sorted(relation_counts.items()):  # Ascending by relation count
        print(f"Questions with {num_relations} relation(s): {count}")

Questions with 1 relation(s): 821
Questions with 2 relation(s): 93
Questions with 3 relation(s): 7
Questions with 4 relation(s): 6
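The distribution can be cross-checked with a little arithmetic. The bucket counts should add up to the 927 questions reported earlier, and weighting each bucket by its number of relations gives the total number of relations. The counts below are copied from the output; `distribution` is just a local name for them.

```python
# Cross-check the relations-per-question distribution printed above.
distribution = {1: 821, 2: 93, 3: 7, 4: 6}  # copied from the output

num_questions = sum(distribution.values())
num_relations = sum(k * v for k, v in distribution.items())

print(num_questions)  # 927
print(num_relations)  # 1052
print(f"{num_relations / num_questions:.2f} relations per question on average")
print(f"{distribution[1] / num_questions:.1%} of questions hold a single relation")
```

This gives 1,052 relations, about 1.13 per question, with 88.6% of questions holding a single relation. Since `json_to_table` emits one row per relation, `len(df)` should also be 1,052.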
