In [1]:
import pandas as pd
import json
import re
import csv
import time
from tqdm import tqdm
from ollama import chat


# Load the merged review data (JSON Lines: one record per line)
input_file = 'merged_200_papers.json'
df = pd.read_json(input_file, lines=True)

llm_fields = [
    "llm_Comprehensiveness", "llm_Vagueness", "llm_Objectivity", "llm_Fairness", "llm_Actionability", 
    "llm_Constructiveness", "llm_Relevance Alignment", "llm_Clarity and Readability", "llm_Usage of Technical Terms",
    "llm_Factuality", "llm_Overall Quality", "llm_overall_score_100", "llm_Sentiment Polarity", "llm_Politeness", 
]


# Check for missing fields and add them if not present
for field in llm_fields:
    if field not in df.columns:
        df[field] = pd.NA

# Pattern to extract JSON block
pattern = re.compile(r"<review_assessment>\s*(\{.*?\})\s*</review_assessment>", re.DOTALL)

# Define prompt template
template = """# REVIEW-QUALITY JUDGE

## 0 — ROLE

You are **ReviewInspector-LLM**, a rigorous, impartial meta-reviewer.
Your goal is to assess the quality of a single peer-review against a predefined set of criteria and to provide precise, structured evaluations.

## 1 — INPUTS

Title: {title}
Abstract: {abstract}
Review: {review_text}

## 2 — EVALUATION CRITERIA

Return **only** the scale value or label at right (no rationale text).

| #  | Criterion                    | Allowed scale / label                       | Description                                                                |
| -- | ---------------------------- | ------------------------------------------- | -------------------------------------------------------------------------- |
| 1  | **Comprehensiveness**        | integer **0-5**                             | Extent to which the review covers all key aspects of the paper.            |
| 2  | **Usage of Technical Terms** | integer **0-5**                             | Appropriateness and frequency of domain-specific vocabulary.               |
| 3  | **Factuality**               | **factual / partially factual / unfactual** | Accuracy of the statements made in the review.                             |
| 4  | **Sentiment Polarity**       | **negative / neutral / positive**           | Overall sentiment conveyed by the reviewer.                                |
| 5  | **Politeness**               | **polite / neutral / impolite**             | Tone and manner of the review language.                                    |
| 6  | **Vagueness**                | **none / low / moderate / high / extreme**  | Degree of ambiguity or lack of specificity in the review.                  |
| 7  | **Objectivity**              | integer **0-5**                             | Presence of unbiased, evidence-based commentary.                           |
| 8  | **Fairness**                 | integer **0-5**                             | Perceived impartiality and balance in judgments.                           |
| 9  | **Actionability**            | integer **0-5**                             | Helpfulness of the review in suggesting clear next steps.                  |
| 10 | **Constructiveness**         | integer **0-5**                             | Degree to which the review offers improvements rather than just criticism. |
| 11 | **Relevance Alignment**      | integer **0-5**                             | How well the review relates to the content and scope of the paper.         |
| 12 | **Clarity and Readability**  | integer **0-5**                             | Ease of understanding the review, including grammar and structure.         |
| 13 | **Overall Quality**          | integer **0-100**                           | Holistic evaluation of the review's usefulness and professionalism.        |

## 3 — SCORING GUIDELINES

For 0-5 scales:

* 5 = Outstanding
* 4 = Strong
* 3 = Adequate
* 2 = Weak
* 1 = Very weak
* 0 = Absent/irrelevant

## 4 — ANALYSIS & COMPUTATION (silent)

1. Read and understand the review in the context of the paper title and abstract.
2. Extract quantitative and qualitative signals (e.g., term usage, factual consistency, tone, clarity).
3. Map observations to the corresponding scoring scales.

## 5 — OUTPUT FORMAT (strict)  
Return **exactly one** JSON block wrapped in the tag below — **no comments or extra text**.

```json
<review_assessment>
{{
  "paper_title": "{title}",
  "criteria": {{
    "Comprehensiveness":       ...,
    "Usage of Technical Terms":   ...,
    "Factuality":    ...,
    "Sentiment Polarity":      ...,
    "Politeness":  ...,
    "Vagueness":          ...,
    "Objectivity":             ...,
    "Fairness":         ...,
    "Actionability":        ...,
    "Constructiveness":    ...,
    "Relevance Alignment":    ...,
    "Clarity and Readability":    ...,
    "Overall Quality":     ...
  }},
  "overall_score_100": ...
}}
</review_assessment>
```
"""

df

Unnamed: 0,paper_id,title,abstract,review_text,authors,reviewer,review_date,review_rating,review_confidence,review_soundness,...,llm_Fairness,llm_Actionability,llm_Constructiveness,llm_Relevance Alignment,llm_Clarity and Readability,llm_Usage of Technical Terms,llm_Factuality,llm_Overall Quality,llm_Sentiment Polarity,llm_Politeness
0,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,**Summary:** \nThis paper presents an open-sou...,"['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_EGJf,1701662567826,6.0,3.0,3.0,...,,,,,,,,,,
1,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,This paper proposes a comprehensive library fo...,"['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_DWom,1699636125239,6.0,3.0,3.0,...,,,,,,,,,,
2,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,"This author introduces LyCORIS, an open source...","['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_PnHf,1699636125143,6.0,4.0,4.0,...,,,,,,,,,,
3,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,"The authors propose LyCORIS, an open-source li...","['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_ekPo,1699636125075,8.0,4.0,3.0,...,,,,,,,,,,
4,0,$\nu$-ensembles: Improving deep ensemble calib...,We present a method to improve the calibration...,"This paper introduces ν-ensembles, a novel dee...","['~Konstantinos_Pitas1', '~Julyan_Arbel1']",Reviewer_HFRa,1699636992453,3.0,4.0,2.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
656,118,Module Extraction for Efficient Object Query o...,The extraction of logically-independent fragme...,The submission addresses the problem of partit...,,Anonymous,03/May/2014,,,,...,,,,,,,,,,
657,61,EARTh: an Environmental Application Reference ...,The paper aims at providing a description of E...,This revision addresses my concerns. I am part...,,Natasha Noy,22/Jul/2013,,,,...,,,,,,,,,,
658,76,Facilitating Data Discovery by Connecting Rela...,"In this study, we investigate two approaches t...",The paper presents and compares RDF/XML (in th...,,Anonymous,15/Jun/2013,,,,...,,,,,,,,,,
659,76,Facilitating Data Discovery by Connecting Rela...,"In this study, we investigate two approaches t...",This paper investigates two different approach...,,Ghislain Hachey,17/Jun/2013,,,,...,,,,,,,,,,
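The scoring loop in the next cell pulls the JSON out of the `<review_assessment>` tag with `pattern` and passes it to `json.loads`. Most failures recorded in this notebook read `Expecting value: line 6 column 22`, which is where the `"Factuality"` value would start if the model copies the template's spacing. That is consistent with the model emitting a bare label such as `partially factual` without quotes, although the raw responses are not shown, so this is an inference. A minimal sketch of a more tolerant extractor under that assumption follows. The names `quote_bare_labels` and `extract_assessment` are hypothetical helpers and not part of the original notebook.

```python
import json
import re

# Same tag pattern as the notebook's `pattern`.
TAG_PATTERN = re.compile(
    r"<review_assessment>\s*(\{.*?\})\s*</review_assessment>", re.DOTALL
)

# Hypothetical repair rule: a value that starts with a letter and runs to the
# end of its line (minus an optional trailing comma) is treated as a bare label.
BARE_LABEL = re.compile(
    r'(":[ \t]*)([A-Za-z][A-Za-z ]*?)[ \t]*(,?)[ \t]*$', re.MULTILINE
)
JSON_LITERALS = {"true", "false", "null"}


def quote_bare_labels(raw: str) -> str:
    """Wrap unquoted label values in double quotes; leave JSON literals alone."""
    def fix(m: re.Match) -> str:
        value = m.group(2)
        if value in JSON_LITERALS:
            return m.group(0)
        return f'{m.group(1)}"{value}"{m.group(3)}'

    return BARE_LABEL.sub(fix, raw)


def extract_assessment(content: str) -> dict:
    """Parse the tagged JSON block, retrying once after quoting bare labels."""
    match = TAG_PATTERN.search(content)
    if not match:
        raise ValueError("No JSON block found")
    raw = match.group(1)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(quote_bare_labels(raw))


sample = """<review_assessment>
{
  "paper_title": "T",
  "criteria": {
    "Comprehensiveness": 4,
    "Factuality":    partially factual,
    "Politeness": "polite"
  },
  "overall_score_100": 70
}
</review_assessment>"""
parsed = extract_assessment(sample)
```

Strict parsing is tried first, so well-formed responses are never altered. The repair only handles the one failure shape assumed here. Other malformed outputs still raise, and a response with no tag still raises `ValueError("No JSON block found")`.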


In [2]:
# Sampling options for llama3:8b: temperature 0 and a fixed seed for reproducibility
temperature = 0
seed = 42


# Process each row
for idx, row in tqdm(df.iterrows(), total=len(df), desc="Scoring with LLM"):
    # Skip if all llm fields are already filled
    if all(pd.notna(row.get(field, pd.NA)) for field in llm_fields):
        continue
    # if idx >= 5:
    #     break

    prompt = template.format(
        title=row['title'],
        abstract=row['abstract'],
        review_text=row['review_text'],
    )
    
    for attempt in range(5):
        try:
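            # Note: temperature and seed are identical on every attempt,
            # so a retry may reproduce the same malformed output.
            response = chat(
                "llama3:8b",
                messages=[{'role': 'user', 'content': prompt}],
                options={'temperature': temperature, 'seed': seed},
            )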
            content = response['message']['content']
            match = pattern.search(content)
            if not match:
                raise ValueError("No JSON block found")

            parsed = json.loads(match.group(1))
            print(parsed["overall_score_100"])
            for key, val in parsed["criteria"].items():
                df.at[idx, f"llm_{key}"] = val
            df.at[idx, "llm_overall_score_100"] = parsed["overall_score_100"]

            # Save after every successful row
            # df.to_csv(input_file, index=False, quoting=csv.QUOTE_ALL)
            break

        except Exception as e:
            print(f"❌ Error at row {idx}, attempt {attempt + 1}: {e}")
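In the run recorded in this notebook, several rows report the same parse error on all five attempts. That is what one would expect when every retry uses `temperature = 0` and the same `seed`. One possible mitigation, sketched below, keeps the first attempt deterministic and then retries with a small positive temperature and a new seed. The function `score_with_retries`, the `retry_temperature` value of 0.3, and the `call_model` and `parse` callables are all assumptions made for illustration. In the notebook, `call_model` would wrap the existing `chat("llama3:8b", ...)` call and return `response['message']['content']`.

```python
import json
from typing import Callable, Optional


def score_with_retries(
    call_model: Callable[[float, int], str],
    parse: Callable[[str], dict],
    base_seed: int = 42,
    max_attempts: int = 5,
    retry_temperature: float = 0.3,  # hypothetical value, tune as needed
) -> Optional[dict]:
    """Return the first parseable result, or None if every attempt fails."""
    for attempt in range(max_attempts):
        # First attempt is deterministic; later attempts sample differently.
        temperature = 0.0 if attempt == 0 else retry_temperature
        seed = base_seed + attempt
        try:
            return parse(call_model(temperature, seed))
        except (ValueError, KeyError) as e:  # JSONDecodeError is a ValueError
            print(f"Error on attempt {attempt + 1} "
                  f"(temperature={temperature}, seed={seed}): {e}")
    return None


# Stand-in for the LLM so the sketch runs without Ollama:
# it returns malformed text at temperature 0 and valid JSON otherwise.
calls = []


def fake_model(temperature: float, seed: int) -> str:
    calls.append((temperature, seed))
    return "not json" if temperature == 0.0 else '{"overall_score_100": 70}'


result = score_with_retries(fake_model, json.loads)
```

Retrying at a higher temperature trades strict reproducibility for a better chance of a parseable answer. Recording which attempt succeeded for each row would keep the final scores traceable.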

Scoring with LLM:   0%|          | 0/661 [00:00<?, ?it/s]

Scoring with LLM:   0%|          | 1/661 [00:03<39:58,  3.63s/it]

60


Scoring with LLM:   0%|          | 2/661 [00:07<39:39,  3.61s/it]

80


Scoring with LLM:   0%|          | 3/661 [00:10<38:48,  3.54s/it]

80


Scoring with LLM:   1%|          | 4/661 [00:14<38:56,  3.56s/it]

80


Scoring with LLM:   1%|          | 5/661 [00:17<39:13,  3.59s/it]

70


❌ Error at row 5, attempt 1: Expecting value: line 6 column 22 (char 208)


Scoring with LLM:   1%|          | 6/661 [00:24<51:31,  4.72s/it]

60


❌ Error at row 6, attempt 1: Expecting value: line 6 column 22 (char 208)


❌ Error at row 6, attempt 2: Expecting value: line 6 column 22 (char 208)


Scoring with LLM:   1%|          | 7/661 [00:33<1:07:13,  6.17s/it]

80


Scoring with LLM:   1%|          | 8/661 [00:37<58:33,  5.38s/it]  

80


Scoring with LLM:   1%|▏         | 9/661 [00:41<52:24,  4.82s/it]

80


Scoring with LLM:   2%|▏         | 10/661 [00:45<49:15,  4.54s/it]

70


Scoring with LLM:   2%|▏         | 11/661 [00:48<46:38,  4.31s/it]

80


Scoring with LLM:   2%|▏         | 12/661 [00:52<44:17,  4.09s/it]

60


Scoring with LLM:   2%|▏         | 13/661 [00:55<41:43,  3.86s/it]

80


Scoring with LLM:   2%|▏         | 14/661 [01:00<44:42,  4.15s/it]

70


Scoring with LLM:   2%|▏         | 15/661 [01:04<41:56,  3.90s/it]

60


Scoring with LLM:   2%|▏         | 16/661 [01:07<39:50,  3.71s/it]

60


Scoring with LLM:   3%|▎         | 17/661 [01:10<39:01,  3.64s/it]

70


Scoring with LLM:   3%|▎         | 18/661 [01:14<38:07,  3.56s/it]

80


Scoring with LLM:   3%|▎         | 19/661 [01:17<38:38,  3.61s/it]

70


Scoring with LLM:   3%|▎         | 20/661 [01:21<38:09,  3.57s/it]

70


Scoring with LLM:   3%|▎         | 21/661 [01:24<37:41,  3.53s/it]

80


❌ Error at row 21, attempt 1: Expecting value: line 6 column 22 (char 185)


Scoring with LLM:   3%|▎         | 22/661 [01:33<53:56,  5.07s/it]

80


Scoring with LLM:   3%|▎         | 23/661 [01:37<49:13,  4.63s/it]

60


Scoring with LLM:   4%|▎         | 24/661 [01:40<45:00,  4.24s/it]

80


Scoring with LLM:   4%|▍         | 25/661 [01:43<42:18,  3.99s/it]

60


Scoring with LLM:   4%|▍         | 26/661 [01:47<41:09,  3.89s/it]

80


Scoring with LLM:   4%|▍         | 27/661 [01:51<40:09,  3.80s/it]

70


Scoring with LLM:   4%|▍         | 28/661 [01:54<40:28,  3.84s/it]

80


Scoring with LLM:   4%|▍         | 29/661 [01:58<38:29,  3.65s/it]

80


Scoring with LLM:   5%|▍         | 30/661 [02:01<38:02,  3.62s/it]

40


Scoring with LLM:   5%|▍         | 31/661 [02:04<35:56,  3.42s/it]

70


Scoring with LLM:   5%|▍         | 32/661 [02:07<35:02,  3.34s/it]

60


Scoring with LLM:   5%|▍         | 33/661 [02:11<37:21,  3.57s/it]

60


Scoring with LLM:   5%|▌         | 34/661 [02:16<39:02,  3.74s/it]

70


❌ Error at row 34, attempt 1: Expecting property name enclosed in double quotes: line 7 column 36 (char 256)


Scoring with LLM:   5%|▌         | 35/661 [02:23<51:22,  4.92s/it]

80


Scoring with LLM:   5%|▌         | 36/661 [02:27<46:27,  4.46s/it]

70


Scoring with LLM:   6%|▌         | 37/661 [02:30<43:43,  4.20s/it]

70


Scoring with LLM:   6%|▌         | 38/661 [02:34<42:15,  4.07s/it]

80


Scoring with LLM:   6%|▌         | 39/661 [02:37<40:16,  3.89s/it]

80


❌ Error at row 39, attempt 1: Expecting value: line 6 column 22 (char 208)


❌ Error at row 39, attempt 2: Expecting value: line 6 column 22 (char 208)


❌ Error at row 39, attempt 3: Expecting value: line 6 column 22 (char 208)


❌ Error at row 39, attempt 4: Expecting value: line 6 column 22 (char 208)


Scoring with LLM:   6%|▌         | 40/661 [02:54<1:19:04,  7.64s/it]

❌ Error at row 39, attempt 5: Expecting value: line 6 column 22 (char 208)


Scoring with LLM:   6%|▌         | 41/661 [02:57<1:06:10,  6.40s/it]

80


❌ Error at row 41, attempt 1: No JSON block found


❌ Error at row 41, attempt 2: No JSON block found


❌ Error at row 41, attempt 3: No JSON block found


❌ Error at row 41, attempt 4: No JSON block found


Scoring with LLM:   6%|▋         | 42/661 [03:35<2:43:17, 15.83s/it]

❌ Error at row 41, attempt 5: No JSON block found


Scoring with LLM:   7%|▋         | 43/661 [03:39<2:05:53, 12.22s/it]

70


Scoring with LLM:   7%|▋         | 44/661 [03:42<1:38:30,  9.58s/it]

60


Scoring with LLM:   7%|▋         | 45/661 [03:46<1:19:20,  7.73s/it]

80


Scoring with LLM:   7%|▋         | 46/661 [03:50<1:06:50,  6.52s/it]

80


Scoring with LLM:   7%|▋         | 47/661 [03:53<57:33,  5.63s/it]  

70


Scoring with LLM:   7%|▋         | 48/661 [03:57<52:28,  5.14s/it]

80


Scoring with LLM:   7%|▋         | 49/661 [04:01<50:15,  4.93s/it]

80


Scoring with LLM:   8%|▊         | 50/661 [04:05<45:54,  4.51s/it]

70


Scoring with LLM:   8%|▊         | 51/661 [04:09<43:04,  4.24s/it]

80


Scoring with LLM:   8%|▊         | 52/661 [04:12<41:13,  4.06s/it]

70


Scoring with LLM:   8%|▊         | 53/661 [04:16<39:15,  3.87s/it]

80


Scoring with LLM:   8%|▊         | 54/661 [04:19<37:48,  3.74s/it]

80


Scoring with LLM:   8%|▊         | 55/661 [04:23<36:41,  3.63s/it]

60


Scoring with LLM:   8%|▊         | 56/661 [04:26<35:47,  3.55s/it]

80


Scoring with LLM:   9%|▊         | 57/661 [04:29<35:00,  3.48s/it]

80


Scoring with LLM:   9%|▉         | 58/661 [04:32<33:15,  3.31s/it]

70


Scoring with LLM:   9%|▉         | 59/661 [04:35<33:21,  3.32s/it]

80


Scoring with LLM:   9%|▉         | 60/661 [04:40<36:20,  3.63s/it]

70


Scoring with LLM:   9%|▉         | 61/661 [04:43<36:26,  3.64s/it]

80


Scoring with LLM:   9%|▉         | 62/661 [04:47<36:31,  3.66s/it]

80


Scoring with LLM:  10%|▉         | 63/661 [04:51<35:35,  3.57s/it]

70


Scoring with LLM:  10%|▉         | 64/661 [04:54<35:30,  3.57s/it]

70


Scoring with LLM:  10%|▉         | 65/661 [04:58<35:32,  3.58s/it]

60


Scoring with LLM:  10%|▉         | 66/661 [05:01<35:11,  3.55s/it]

70


Scoring with LLM:  10%|█         | 67/661 [05:04<34:25,  3.48s/it]

80


Scoring with LLM:  10%|█         | 68/661 [05:08<34:20,  3.48s/it]

70


Scoring with LLM:  10%|█         | 69/661 [05:11<34:22,  3.48s/it]

80


Scoring with LLM:  11%|█         | 70/661 [05:16<37:40,  3.83s/it]

60


❌ Error at row 70, attempt 1: Expecting value: line 6 column 22 (char 210)


❌ Error at row 70, attempt 2: Expecting value: line 6 column 22 (char 210)


❌ Error at row 70, attempt 3: Expecting value: line 6 column 22 (char 210)


❌ Error at row 70, attempt 4: Expecting value: line 6 column 22 (char 210)


Scoring with LLM:  11%|█         | 71/661 [05:32<1:14:02,  7.53s/it]

❌ Error at row 70, attempt 5: Expecting value: line 6 column 22 (char 210)


❌ Error at row 71, attempt 1: Expecting value: line 6 column 22 (char 210)


❌ Error at row 71, attempt 2: Expecting value: line 6 column 22 (char 210)


❌ Error at row 71, attempt 3: Expecting value: line 6 column 22 (char 210)


❌ Error at row 71, attempt 4: Expecting value: line 6 column 22 (char 210)


Scoring with LLM:  11%|█         | 72/661 [05:49<1:40:53, 10.28s/it]

❌ Error at row 71, attempt 5: Expecting value: line 6 column 22 (char 210)


Scoring with LLM:  11%|█         | 73/661 [05:53<1:21:50,  8.35s/it]

80


Scoring with LLM:  11%|█         | 74/661 [05:56<1:07:08,  6.86s/it]

80


Scoring with LLM:  11%|█▏        | 75/661 [06:00<57:04,  5.84s/it]  

70


Scoring with LLM:  11%|█▏        | 76/661 [06:04<51:11,  5.25s/it]

80


Scoring with LLM:  12%|█▏        | 77/661 [06:07<46:13,  4.75s/it]

80


Scoring with LLM:  12%|█▏        | 78/661 [06:10<42:00,  4.32s/it]

60


Scoring with LLM:  12%|█▏        | 79/661 [06:14<39:43,  4.10s/it]

80


Scoring with LLM:  12%|█▏        | 80/661 [06:18<38:10,  3.94s/it]

70


Scoring with LLM:  12%|█▏        | 81/661 [06:21<37:28,  3.88s/it]

80


Scoring with LLM:  12%|█▏        | 82/661 [06:25<35:35,  3.69s/it]

70


Scoring with LLM:  13%|█▎        | 83/661 [06:28<33:49,  3.51s/it]

70


❌ Error at row 83, attempt 1: Expecting value: line 6 column 22 (char 212)


❌ Error at row 83, attempt 2: Expecting value: line 6 column 22 (char 212)


❌ Error at row 83, attempt 3: Expecting value: line 6 column 22 (char 212)


❌ Error at row 83, attempt 4: Expecting value: line 6 column 22 (char 212)


Scoring with LLM:  13%|█▎        | 84/661 [06:44<1:10:10,  7.30s/it]

❌ Error at row 83, attempt 5: Expecting value: line 6 column 22 (char 212)


Scoring with LLM:  13%|█▎        | 85/661 [06:47<59:17,  6.18s/it]  

80


Scoring with LLM:  13%|█▎        | 86/661 [06:51<52:27,  5.47s/it]

60


Scoring with LLM:  13%|█▎        | 87/661 [06:56<50:11,  5.25s/it]

60


Scoring with LLM:  13%|█▎        | 88/661 [06:59<45:16,  4.74s/it]

80


Scoring with LLM:  13%|█▎        | 89/661 [07:03<40:23,  4.24s/it]

70


Scoring with LLM:  14%|█▎        | 90/661 [07:06<38:46,  4.07s/it]

60


Scoring with LLM:  14%|█▍        | 91/661 [07:10<38:40,  4.07s/it]

80


Scoring with LLM:  14%|█▍        | 92/661 [07:14<36:43,  3.87s/it]

80


Scoring with LLM:  14%|█▍        | 93/661 [07:17<35:30,  3.75s/it]

80


Scoring with LLM:  14%|█▍        | 94/661 [07:21<35:30,  3.76s/it]

60


Scoring with LLM:  14%|█▍        | 95/661 [07:25<36:27,  3.86s/it]

60


Scoring with LLM:  15%|█▍        | 96/661 [07:28<34:45,  3.69s/it]

40


Scoring with LLM:  15%|█▍        | 97/661 [07:32<33:48,  3.60s/it]

60


Scoring with LLM:  15%|█▍        | 98/661 [07:35<32:48,  3.50s/it]

80


Scoring with LLM:  15%|█▍        | 99/661 [07:39<32:55,  3.52s/it]

70


Scoring with LLM:  15%|█▌        | 100/661 [07:42<33:18,  3.56s/it]

70


Scoring with LLM:  15%|█▌        | 101/661 [07:46<34:49,  3.73s/it]

80


Scoring with LLM:  15%|█▌        | 102/661 [07:50<33:35,  3.61s/it]

70


Scoring with LLM:  16%|█▌        | 103/661 [07:53<33:27,  3.60s/it]

90


Scoring with LLM:  16%|█▌        | 104/661 [07:57<33:44,  3.63s/it]

80


Scoring with LLM:  16%|█▌        | 105/661 [08:00<33:28,  3.61s/it]

80


Scoring with LLM:  16%|█▌        | 106/661 [08:04<33:48,  3.66s/it]

80


Scoring with LLM:  16%|█▌        | 107/661 [08:08<33:41,  3.65s/it]

80


Scoring with LLM:  16%|█▋        | 108/661 [08:12<33:57,  3.68s/it]

80


Scoring with LLM:  16%|█▋        | 109/661 [08:15<33:44,  3.67s/it]

80


Scoring with LLM:  17%|█▋        | 110/661 [08:19<33:50,  3.68s/it]

80


Scoring with LLM:  17%|█▋        | 111/661 [08:23<33:19,  3.63s/it]

70


Scoring with LLM:  17%|█▋        | 112/661 [08:26<33:29,  3.66s/it]

60


Scoring with LLM:  17%|█▋        | 113/661 [08:29<32:04,  3.51s/it]

80


Scoring with LLM:  17%|█▋        | 114/661 [08:32<30:28,  3.34s/it]

70


Scoring with LLM:  17%|█▋        | 115/661 [08:36<30:41,  3.37s/it]

80


Scoring with LLM:  18%|█▊        | 116/661 [08:40<33:54,  3.73s/it]

60


Scoring with LLM:  18%|█▊        | 117/661 [08:44<34:25,  3.80s/it]

60


Scoring with LLM:  18%|█▊        | 118/661 [08:48<33:39,  3.72s/it]

80


Scoring with LLM:  18%|█▊        | 119/661 [08:51<33:20,  3.69s/it]

60


Scoring with LLM:  18%|█▊        | 120/661 [08:55<33:08,  3.68s/it]

70


Scoring with LLM:  18%|█▊        | 121/661 [08:59<32:58,  3.66s/it]

70


Scoring with LLM:  18%|█▊        | 122/661 [09:02<32:28,  3.62s/it]

70


❌ Error at row 122, attempt 1: Expecting value: line 6 column 22 (char 204)


❌ Error at row 122, attempt 2: Expecting value: line 6 column 22 (char 204)


❌ Error at row 122, attempt 3: Expecting value: line 6 column 22 (char 204)


❌ Error at row 122, attempt 4: Expecting value: line 6 column 22 (char 204)


Scoring with LLM:  19%|█▊        | 123/661 [09:19<1:08:07,  7.60s/it]

80


Scoring with LLM:  19%|█▉        | 124/661 [09:23<56:37,  6.33s/it]  

80


Scoring with LLM:  19%|█▉        | 125/661 [09:26<49:46,  5.57s/it]

80


Scoring with LLM:  19%|█▉        | 126/661 [09:29<42:50,  4.80s/it]

80


Scoring with LLM:  19%|█▉        | 127/661 [09:33<39:14,  4.41s/it]

80


Scoring with LLM:  19%|█▉        | 128/661 [09:37<39:20,  4.43s/it]

60


❌ Error at row 128, attempt 1: Expecting value: line 6 column 22 (char 173)


❌ Error at row 128, attempt 2: Expecting value: line 6 column 22 (char 173)


❌ Error at row 128, attempt 3: Expecting value: line 6 column 22 (char 173)


❌ Error at row 128, attempt 4: Expecting value: line 6 column 22 (char 173)


Scoring with LLM:  20%|█▉        | 129/661 [09:51<1:05:10,  7.35s/it]

❌ Error at row 128, attempt 5: Expecting value: line 6 column 22 (char 173)


❌ Error at row 129, attempt 1: Expecting value: line 6 column 22 (char 173)


❌ Error at row 129, attempt 2: Expecting value: line 6 column 22 (char 173)


❌ Error at row 129, attempt 3: Expecting value: line 6 column 22 (char 173)


❌ Error at row 129, attempt 4: Expecting value: line 6 column 22 (char 173)


Scoring with LLM:  20%|█▉        | 130/661 [10:08<1:29:38, 10.13s/it]

❌ Error at row 129, attempt 5: Expecting value: line 6 column 22 (char 173)


❌ Error at row 130, attempt 1: Expecting value: line 6 column 22 (char 188)


❌ Error at row 130, attempt 2: Expecting value: line 6 column 22 (char 188)


❌ Error at row 130, attempt 3: Expecting value: line 6 column 22 (char 188)


❌ Error at row 130, attempt 4: Expecting value: line 6 column 22 (char 188)


Scoring with LLM:  20%|█▉        | 131/661 [10:24<1:45:44, 11.97s/it]

❌ Error at row 130, attempt 5: Expecting value: line 6 column 22 (char 188)


Scoring with LLM:  20%|█▉        | 132/661 [10:28<1:23:52,  9.51s/it]

80


Scoring with LLM:  20%|██        | 133/661 [10:32<1:08:07,  7.74s/it]

60


❌ Error at row 133, attempt 1: Expecting value: line 6 column 22 (char 188)


❌ Error at row 133, attempt 2: Expecting value: line 6 column 22 (char 188)


❌ Error at row 133, attempt 3: Expecting value: line 6 column 22 (char 188)


❌ Error at row 133, attempt 4: Expecting value: line 6 column 22 (char 188)


Scoring with LLM:  20%|██        | 134/661 [10:48<1:30:25, 10.29s/it]

❌ Error at row 133, attempt 5: Expecting value: line 6 column 22 (char 188)


❌ Error at row 134, attempt 1: Expecting value: line 6 column 22 (char 227)


❌ Error at row 134, attempt 2: Expecting value: line 6 column 22 (char 227)


❌ Error at row 134, attempt 3: Expecting value: line 6 column 22 (char 227)


❌ Error at row 134, attempt 4: Expecting value: line 6 column 22 (char 227)


Scoring with LLM:  20%|██        | 135/661 [11:05<1:46:42, 12.17s/it]

❌ Error at row 134, attempt 5: Expecting value: line 6 column 22 (char 227)


Scoring with LLM:  21%|██        | 136/661 [11:08<1:24:47,  9.69s/it]

80


Scoring with LLM:  21%|██        | 137/661 [11:12<1:08:38,  7.86s/it]

60


Scoring with LLM:  21%|██        | 138/661 [11:16<57:27,  6.59s/it]  

70


Scoring with LLM:  21%|██        | 139/661 [11:19<49:30,  5.69s/it]

70


Scoring with LLM:  21%|██        | 140/661 [11:25<49:33,  5.71s/it]

60


Scoring with LLM:  21%|██▏       | 141/661 [11:29<43:49,  5.06s/it]

80


Scoring with LLM:  21%|██▏       | 142/661 [11:32<39:09,  4.53s/it]

80


Scoring with LLM:  22%|██▏       | 143/661 [11:36<37:43,  4.37s/it]

80


Scoring with LLM:  22%|██▏       | 144/661 [11:39<34:59,  4.06s/it]

80


Scoring with LLM:  22%|██▏       | 145/661 [11:43<34:52,  4.06s/it]

60


Scoring with LLM:  22%|██▏       | 146/661 [11:47<35:14,  4.10s/it]

80


Scoring with LLM:  22%|██▏       | 147/661 [11:51<34:18,  4.00s/it]

80


Scoring with LLM:  22%|██▏       | 148/661 [11:54<31:56,  3.74s/it]

80


Scoring with LLM:  23%|██▎       | 149/661 [11:58<32:24,  3.80s/it]

80


Scoring with LLM:  23%|██▎       | 150/661 [12:02<31:12,  3.66s/it]

70


Scoring with LLM:  23%|██▎       | 151/661 [12:07<34:54,  4.11s/it]

70


Scoring with LLM:  23%|██▎       | 152/661 [12:11<34:56,  4.12s/it]

80


❌ Error at row 152, attempt 1: Expecting value: line 6 column 22 (char 192)


Scoring with LLM:  23%|██▎       | 153/661 [12:18<42:57,  5.07s/it]

70


Scoring with LLM:  23%|██▎       | 154/661 [12:22<39:35,  4.69s/it]

70


Scoring with LLM:  23%|██▎       | 155/661 [12:26<37:13,  4.41s/it]

80


❌ Error at row 155, attempt 1: Expecting value: line 6 column 22 (char 192)


Scoring with LLM:  24%|██▎       | 156/661 [12:32<42:02,  4.99s/it]

70


Scoring with LLM:  24%|██▍       | 157/661 [12:36<40:02,  4.77s/it]

60


Scoring with LLM:  24%|██▍       | 158/661 [12:40<36:40,  4.38s/it]

60


Scoring with LLM:  24%|██▍       | 159/661 [12:43<34:30,  4.13s/it]

80


Scoring with LLM:  24%|██▍       | 160/661 [12:47<33:03,  3.96s/it]

80


Scoring with LLM:  24%|██▍       | 161/661 [12:50<31:54,  3.83s/it]

80


Scoring with LLM:  25%|██▍       | 162/661 [12:54<30:30,  3.67s/it]

60


Scoring with LLM:  25%|██▍       | 163/661 [12:57<30:10,  3.64s/it]

80


Scoring with LLM:  25%|██▍       | 164/661 [13:01<29:55,  3.61s/it]

70


❌ Error at row 164, attempt 1: Expecting value: line 6 column 22 (char 205)


❌ Error at row 164, attempt 2: Expecting value: line 6 column 22 (char 205)


❌ Error at row 164, attempt 3: Expecting value: line 6 column 22 (char 205)


❌ Error at row 164, attempt 4: Expecting value: line 6 column 22 (char 205)


Scoring with LLM:  25%|██▍       | 165/661 [13:18<1:02:32,  7.57s/it]

❌ Error at row 164, attempt 5: Expecting value: line 6 column 22 (char 205)


Scoring with LLM:  25%|██▌       | 166/661 [13:22<54:00,  6.55s/it]  

85


Scoring with LLM:  25%|██▌       | 167/661 [13:25<46:13,  5.61s/it]

40


❌ Error at row 167, attempt 1: Expecting value: line 6 column 22 (char 205)


❌ Error at row 167, attempt 2: Expecting value: line 6 column 22 (char 205)


❌ Error at row 167, attempt 3: Expecting value: line 6 column 22 (char 205)


❌ Error at row 167, attempt 4: Expecting value: line 6 column 22 (char 205)


Scoring with LLM:  25%|██▌       | 168/661 [13:42<1:13:45,  8.98s/it]

❌ Error at row 167, attempt 5: Expecting value: line 6 column 22 (char 205)


Scoring with LLM:  26%|██▌       | 169/661 [13:45<59:01,  7.20s/it]  

80


Scoring with LLM:  26%|██▌       | 170/661 [13:48<48:58,  5.98s/it]

70


Scoring with LLM:  26%|██▌       | 171/661 [13:51<41:56,  5.14s/it]

80


Scoring with LLM:  26%|██▌       | 172/661 [13:55<37:34,  4.61s/it]

80


Scoring with LLM:  26%|██▌       | 173/661 [13:58<34:06,  4.19s/it]

80


Scoring with LLM:  26%|██▋       | 174/661 [14:02<32:30,  4.01s/it]

60


Scoring with LLM:  26%|██▋       | 175/661 [14:05<30:03,  3.71s/it]

80


Scoring with LLM:  27%|██▋       | 176/661 [14:08<28:37,  3.54s/it]

80


Scoring with LLM:  27%|██▋       | 177/661 [14:11<27:55,  3.46s/it]

80


Scoring with LLM:  27%|██▋       | 178/661 [14:14<27:13,  3.38s/it]

80


Scoring with LLM:  27%|██▋       | 179/661 [14:18<27:26,  3.42s/it]

60


Scoring with LLM:  27%|██▋       | 180/661 [14:21<27:21,  3.41s/it]

70


Scoring with LLM:  27%|██▋       | 181/661 [14:25<27:59,  3.50s/it]

80


Scoring with LLM:  28%|██▊       | 182/661 [14:28<28:10,  3.53s/it]

70


Scoring with LLM:  28%|██▊       | 183/661 [14:32<27:01,  3.39s/it]

60


Scoring with LLM:  28%|██▊       | 184/661 [14:35<26:26,  3.33s/it]

80


Scoring with LLM:  28%|██▊       | 185/661 [14:38<27:15,  3.44s/it]

80


❌ Error at row 185, attempt 1: Expecting value: line 6 column 22 (char 236)


❌ Error at row 185, attempt 2: Expecting value: line 6 column 22 (char 236)


❌ Error at row 185, attempt 3: Expecting value: line 6 column 22 (char 236)


❌ Error at row 185, attempt 4: Expecting value: line 6 column 22 (char 236)


Scoring with LLM:  28%|██▊       | 186/661 [14:56<1:00:43,  7.67s/it]

❌ Error at row 185, attempt 5: Expecting value: line 6 column 22 (char 236)


Scoring with LLM:  28%|██▊       | 187/661 [15:00<50:56,  6.45s/it]  

80


❌ Error at row 187, attempt 1: Expecting value: line 6 column 22 (char 236)


❌ Error at row 187, attempt 2: Expecting value: line 6 column 22 (char 236)


❌ Error at row 187, attempt 3: Expecting value: line 6 column 22 (char 236)


Scoring with LLM:  28%|██▊       | 188/661 [15:13<1:08:24,  8.68s/it]

80


Scoring with LLM:  29%|██▊       | 189/661 [15:17<55:32,  7.06s/it]  

70


Scoring with LLM:  29%|██▊       | 190/661 [15:20<46:56,  5.98s/it]

60


Scoring with LLM:  29%|██▉       | 191/661 [15:24<42:07,  5.38s/it]

80


Scoring with LLM:  29%|██▉       | 192/661 [15:28<38:12,  4.89s/it]

80


Scoring with LLM:  29%|██▉       | 193/661 [15:31<34:51,  4.47s/it]

60


Scoring with LLM:  29%|██▉       | 194/661 [15:35<33:07,  4.26s/it]

60


Scoring with LLM:  30%|██▉       | 195/661 [15:38<30:15,  3.89s/it]

70


Scoring with LLM:  30%|██▉       | 196/661 [15:42<29:55,  3.86s/it]

80


Scoring with LLM:  30%|██▉       | 197/661 [15:46<31:01,  4.01s/it]

80


Scoring with LLM:  30%|██▉       | 198/661 [15:50<29:39,  3.84s/it]

80


Scoring with LLM:  30%|███       | 199/661 [15:53<27:32,  3.58s/it]

70


Scoring with LLM:  30%|███       | 200/661 [15:56<27:03,  3.52s/it]

80


Scoring with LLM:  30%|███       | 201/661 [15:59<26:29,  3.46s/it]

80


Scoring with LLM:  31%|███       | 202/661 [16:03<26:06,  3.41s/it]

70


Scoring with LLM:  31%|███       | 203/661 [16:06<25:16,  3.31s/it]

80


Scoring with LLM:  31%|███       | 204/661 [16:09<25:32,  3.35s/it]

90


Scoring with LLM:  31%|███       | 205/661 [16:13<25:53,  3.41s/it]

60


Scoring with LLM:  31%|███       | 206/661 [16:16<26:14,  3.46s/it]

80


Scoring with LLM:  31%|███▏      | 207/661 [16:19<24:53,  3.29s/it]

70


❌ Error at row 207, attempt 1: Expecting value: line 6 column 22 (char 214)


❌ Error at row 207, attempt 2: Expecting value: line 6 column 22 (char 214)


❌ Error at row 207, attempt 3: Expecting value: line 6 column 22 (char 214)


❌ Error at row 207, attempt 4: Expecting value: line 6 column 22 (char 214)


Scoring with LLM:  31%|███▏      | 208/661 [16:36<55:20,  7.33s/it]

❌ Error at row 207, attempt 5: Expecting value: line 6 column 22 (char 214)


Scoring with LLM:  32%|███▏      | 209/661 [16:39<45:08,  5.99s/it]

80


Scoring with LLM:  32%|███▏      | 210/661 [16:43<40:11,  5.35s/it]

70


Scoring with LLM:  32%|███▏      | 211/661 [16:46<35:54,  4.79s/it]

70


Scoring with LLM:  32%|███▏      | 212/661 [16:50<32:33,  4.35s/it]

80


Scoring with LLM:  32%|███▏      | 213/661 [16:53<30:17,  4.06s/it]

70


Scoring with LLM:  32%|███▏      | 214/661 [16:57<29:24,  3.95s/it]

80


❌ Error at row 214, attempt 1: Expecting value: line 6 column 22 (char 195)


❌ Error at row 214, attempt 2: Expecting value: line 6 column 22 (char 195)


Scoring with LLM:  33%|███▎      | 215/661 [17:06<40:34,  5.46s/it]

80


Scoring with LLM:  33%|███▎      | 216/661 [17:09<36:41,  4.95s/it]

60


Scoring with LLM:  33%|███▎      | 217/661 [17:12<32:26,  4.38s/it]

80


❌ Error at row 217, attempt 1: Expecting value: line 6 column 22 (char 195)


Scoring with LLM:  33%|███▎      | 218/661 [17:19<36:59,  5.01s/it]

80


Scoring with LLM:  33%|███▎      | 219/661 [17:23<34:26,  4.68s/it]

80


Scoring with LLM:  33%|███▎      | 220/661 [17:27<32:21,  4.40s/it]

80


Scoring with LLM:  33%|███▎      | 221/661 [17:31<31:23,  4.28s/it]

80


Scoring with LLM:  34%|███▎      | 222/661 [17:35<30:47,  4.21s/it]

70


Scoring with LLM:  34%|███▎      | 223/661 [17:38<28:57,  3.97s/it]

60


Scoring with LLM:  34%|███▍      | 224/661 [17:42<29:57,  4.11s/it]

80


Scoring with LLM:  34%|███▍      | 225/661 [17:46<29:29,  4.06s/it]

90


Scoring with LLM:  34%|███▍      | 226/661 [17:50<28:07,  3.88s/it]

80


Scoring with LLM:  34%|███▍      | 227/661 [17:54<27:38,  3.82s/it]

90


Scoring with LLM:  34%|███▍      | 228/661 [17:57<26:59,  3.74s/it]

80


Scoring with LLM:  35%|███▍      | 229/661 [18:01<26:59,  3.75s/it]

80


❌ Error at row 229, attempt 1: Expecting value: line 6 column 22 (char 223)


Scoring with LLM:  35%|███▍      | 230/661 [18:08<34:09,  4.76s/it]

80


Scoring with LLM:  35%|███▍      | 231/661 [18:12<31:36,  4.41s/it]

80


Scoring with LLM:  35%|███▌      | 232/661 [18:15<29:34,  4.14s/it]

90


Scoring with LLM:  35%|███▌      | 233/661 [18:19<28:15,  3.96s/it]

80


Scoring with LLM:  35%|███▌      | 234/661 [18:22<27:33,  3.87s/it]

80


Scoring with LLM:  36%|███▌      | 235/661 [18:25<25:56,  3.65s/it]

70


Scoring with LLM:  36%|███▌      | 236/661 [18:29<25:20,  3.58s/it]

80


Scoring with LLM:  36%|███▌      | 237/661 [18:33<25:59,  3.68s/it]

80


Scoring with LLM:  36%|███▌      | 238/661 [18:36<25:14,  3.58s/it]

70


Scoring with LLM:  36%|███▌      | 239/661 [18:40<25:25,  3.62s/it]

80


Scoring with LLM:  36%|███▋      | 240/661 [18:44<25:47,  3.68s/it]

80


Scoring with LLM:  36%|███▋      | 241/661 [18:47<24:37,  3.52s/it]

70


Scoring with LLM:  37%|███▋      | 242/661 [18:50<24:13,  3.47s/it]

80


Scoring with LLM:  37%|███▋      | 243/661 [18:53<23:42,  3.40s/it]

80


Scoring with LLM:  37%|███▋      | 244/661 [18:56<23:03,  3.32s/it]

70


Scoring with LLM:  37%|███▋      | 245/661 [19:00<23:05,  3.33s/it]

80


Scoring with LLM:  37%|███▋      | 246/661 [19:03<23:46,  3.44s/it]

70


Scoring with LLM:  37%|███▋      | 247/661 [19:08<25:20,  3.67s/it]

60


Scoring with LLM:  38%|███▊      | 248/661 [19:11<25:12,  3.66s/it]

80


Scoring with LLM:  38%|███▊      | 249/661 [19:16<27:30,  4.01s/it]

70


Scoring with LLM:  38%|███▊      | 250/661 [19:20<26:06,  3.81s/it]

80


Scoring with LLM:  38%|███▊      | 251/661 [19:23<25:45,  3.77s/it]

80


Scoring with LLM:  38%|███▊      | 252/661 [19:27<26:37,  3.91s/it]

70


Scoring with LLM:  38%|███▊      | 253/661 [19:31<26:46,  3.94s/it]

80


Scoring with LLM:  38%|███▊      | 254/661 [19:35<25:55,  3.82s/it]

80


Scoring with LLM:  39%|███▊      | 255/661 [19:39<25:31,  3.77s/it]

80


Scoring with LLM:  39%|███▊      | 256/661 [19:43<26:03,  3.86s/it]

80


Scoring with LLM:  39%|███▉      | 257/661 [19:47<26:09,  3.88s/it]

70


Scoring with LLM:  39%|███▉      | 258/661 [19:50<25:30,  3.80s/it]

80


Scoring with LLM:  39%|███▉      | 259/661 [19:54<25:24,  3.79s/it]

80


❌ Error at row 259, attempt 1: Expecting value: line 6 column 22 (char 229)


❌ Error at row 259, attempt 2: Expecting value: line 6 column 22 (char 229)


❌ Error at row 259, attempt 3: Expecting value: line 6 column 22 (char 229)


❌ Error at row 259, attempt 4: Expecting value: line 6 column 22 (char 229)


Scoring with LLM:  39%|███▉      | 260/661 [20:10<50:42,  7.59s/it]

70


Scoring with LLM:  39%|███▉      | 261/661 [20:14<42:30,  6.38s/it]

70


Scoring with LLM:  40%|███▉      | 262/661 [20:18<37:38,  5.66s/it]

80


Scoring with LLM:  40%|███▉      | 263/661 [20:22<34:21,  5.18s/it]

80


Scoring with LLM:  40%|███▉      | 264/661 [20:26<32:01,  4.84s/it]

70


Scoring with LLM:  40%|████      | 265/661 [20:29<29:03,  4.40s/it]

70


Scoring with LLM:  40%|████      | 266/661 [20:33<27:35,  4.19s/it]

80


Scoring with LLM:  40%|████      | 267/661 [20:37<26:07,  3.98s/it]

70


Scoring with LLM:  41%|████      | 268/661 [20:40<25:33,  3.90s/it]

70


Scoring with LLM:  41%|████      | 269/661 [20:44<24:47,  3.79s/it]

80


Scoring with LLM:  41%|████      | 270/661 [20:47<24:13,  3.72s/it]

80


Scoring with LLM:  41%|████      | 271/661 [20:51<24:31,  3.77s/it]

60


Scoring with LLM:  41%|████      | 272/661 [20:55<23:54,  3.69s/it]

80


Scoring with LLM:  41%|████▏     | 273/661 [20:58<23:23,  3.62s/it]

80


Scoring with LLM:  41%|████▏     | 274/661 [21:03<24:42,  3.83s/it]

70


Scoring with LLM:  42%|████▏     | 275/661 [21:07<24:47,  3.85s/it]

80


Scoring with LLM:  42%|████▏     | 276/661 [21:10<24:15,  3.78s/it]

80


Scoring with LLM:  42%|████▏     | 277/661 [21:14<23:59,  3.75s/it]

80


Scoring with LLM:  42%|████▏     | 278/661 [21:17<23:38,  3.70s/it]

80


Scoring with LLM:  42%|████▏     | 279/661 [21:21<23:32,  3.70s/it]

80


Scoring with LLM:  42%|████▏     | 280/661 [21:25<23:46,  3.74s/it]

70


Scoring with LLM:  43%|████▎     | 281/661 [21:28<23:05,  3.65s/it]

60


Scoring with LLM:  43%|████▎     | 282/661 [21:32<22:41,  3.59s/it]

80


Scoring with LLM:  43%|████▎     | 283/661 [21:35<22:29,  3.57s/it]

80


Scoring with LLM:  43%|████▎     | 284/661 [21:39<22:13,  3.54s/it]

60


Scoring with LLM:  43%|████▎     | 285/661 [21:42<22:05,  3.53s/it]

80


Scoring with LLM:  43%|████▎     | 286/661 [21:45<21:07,  3.38s/it]

70


Scoring with LLM:  43%|████▎     | 287/661 [21:49<21:42,  3.48s/it]

80


Scoring with LLM:  44%|████▎     | 288/661 [21:53<21:31,  3.46s/it]

80


Scoring with LLM:  44%|████▎     | 289/661 [21:56<21:53,  3.53s/it]

80


Scoring with LLM:  44%|████▍     | 290/661 [21:59<21:13,  3.43s/it]

80


❌ Error at row 290, attempt 1: Expecting value: line 6 column 22 (char 228)


❌ Error at row 290, attempt 2: Expecting value: line 6 column 22 (char 228)


❌ Error at row 290, attempt 3: Expecting value: line 6 column 22 (char 228)


❌ Error at row 290, attempt 4: Expecting value: line 6 column 22 (char 228)


Scoring with LLM:  44%|████▍     | 291/661 [22:17<47:54,  7.77s/it]

❌ Error at row 290, attempt 5: Expecting value: line 6 column 22 (char 228)


Scoring with LLM:  44%|████▍     | 292/661 [22:21<40:32,  6.59s/it]

80


Scoring with LLM:  44%|████▍     | 293/661 [22:26<36:31,  5.96s/it]

80


Scoring with LLM:  44%|████▍     | 294/661 [22:29<32:33,  5.32s/it]

80


Scoring with LLM:  45%|████▍     | 295/661 [22:34<31:32,  5.17s/it]

80


Scoring with LLM:  45%|████▍     | 296/661 [22:38<28:55,  4.76s/it]

80


Scoring with LLM:  45%|████▍     | 297/661 [22:41<26:27,  4.36s/it]

60


Scoring with LLM:  45%|████▌     | 298/661 [22:45<24:36,  4.07s/it]

80


❌ Error at row 298, attempt 1: Expecting value: line 6 column 22 (char 209)


❌ Error at row 298, attempt 2: Expecting value: line 6 column 22 (char 209)


❌ Error at row 298, attempt 3: Expecting value: line 6 column 22 (char 209)


❌ Error at row 298, attempt 4: Expecting value: line 6 column 22 (char 209)


Scoring with LLM:  45%|████▌     | 299/661 [23:02<48:04,  7.97s/it]

❌ Error at row 298, attempt 5: Expecting value: line 6 column 22 (char 209)


Scoring with LLM:  45%|████▌     | 300/661 [23:06<40:52,  6.79s/it]

80


Scoring with LLM:  46%|████▌     | 301/661 [23:10<34:55,  5.82s/it]

80


Scoring with LLM:  46%|████▌     | 302/661 [23:13<30:20,  5.07s/it]

80


Scoring with LLM:  46%|████▌     | 303/661 [23:17<28:27,  4.77s/it]

80


Scoring with LLM:  46%|████▌     | 304/661 [23:20<26:06,  4.39s/it]

80


Scoring with LLM:  46%|████▌     | 305/661 [23:24<24:41,  4.16s/it]

80


Scoring with LLM:  46%|████▋     | 306/661 [23:28<23:44,  4.01s/it]

70


Scoring with LLM:  46%|████▋     | 307/661 [23:31<22:39,  3.84s/it]

80


Scoring with LLM:  47%|████▋     | 308/661 [23:35<22:40,  3.85s/it]

80


Scoring with LLM:  47%|████▋     | 309/661 [23:39<21:54,  3.73s/it]

80


Scoring with LLM:  47%|████▋     | 310/661 [23:42<21:29,  3.68s/it]

80


Scoring with LLM:  47%|████▋     | 311/661 [23:46<21:09,  3.63s/it]

80


Scoring with LLM:  47%|████▋     | 312/661 [23:49<21:23,  3.68s/it]

70


Scoring with LLM:  47%|████▋     | 313/661 [23:53<21:12,  3.66s/it]

60


Scoring with LLM:  48%|████▊     | 314/661 [23:57<21:07,  3.65s/it]

80


Scoring with LLM:  48%|████▊     | 315/661 [24:00<20:45,  3.60s/it]

60


Scoring with LLM:  48%|████▊     | 316/661 [24:04<21:02,  3.66s/it]

90


Scoring with LLM:  48%|████▊     | 317/661 [24:07<20:53,  3.64s/it]

80


Scoring with LLM:  48%|████▊     | 318/661 [24:11<20:03,  3.51s/it]

80


Scoring with LLM:  48%|████▊     | 319/661 [24:14<19:17,  3.38s/it]

80


Scoring with LLM:  48%|████▊     | 320/661 [24:17<18:50,  3.32s/it]

80


Scoring with LLM:  49%|████▊     | 321/661 [24:20<18:45,  3.31s/it]

80


Scoring with LLM:  49%|████▊     | 322/661 [24:24<19:27,  3.44s/it]

80


Scoring with LLM:  49%|████▉     | 323/661 [24:28<19:42,  3.50s/it]

80


Scoring with LLM:  49%|████▉     | 324/661 [24:31<19:41,  3.51s/it]

80


Scoring with LLM:  49%|████▉     | 325/661 [24:35<20:12,  3.61s/it]

60


❌ Error at row 325, attempt 1: Expecting value: line 6 column 22 (char 191)


❌ Error at row 325, attempt 2: Expecting value: line 6 column 22 (char 191)


❌ Error at row 325, attempt 3: Expecting value: line 6 column 22 (char 191)


❌ Error at row 325, attempt 4: Expecting value: line 6 column 22 (char 191)


Scoring with LLM:  49%|████▉     | 326/661 [24:52<42:58,  7.70s/it]

❌ Error at row 325, attempt 5: Expecting value: line 6 column 22 (char 191)


Scoring with LLM:  49%|████▉     | 327/661 [24:56<36:02,  6.47s/it]

80


Scoring with LLM:  50%|████▉     | 328/661 [24:59<31:14,  5.63s/it]

80


Scoring with LLM:  50%|████▉     | 329/661 [25:03<27:20,  4.94s/it]

60


Scoring with LLM:  50%|████▉     | 330/661 [25:06<25:09,  4.56s/it]

60


Scoring with LLM:  50%|█████     | 331/661 [25:10<23:11,  4.22s/it]

80


Scoring with LLM:  50%|█████     | 332/661 [25:13<21:54,  3.99s/it]

80


Scoring with LLM:  50%|█████     | 333/661 [25:17<21:38,  3.96s/it]

80


Scoring with LLM:  51%|█████     | 334/661 [25:21<20:46,  3.81s/it]

80


Scoring with LLM:  51%|█████     | 335/661 [25:24<20:10,  3.71s/it]

80


Scoring with LLM:  51%|█████     | 336/661 [25:28<19:27,  3.59s/it]

80


Scoring with LLM:  51%|█████     | 337/661 [25:31<19:23,  3.59s/it]

80


Scoring with LLM:  51%|█████     | 338/661 [25:34<18:58,  3.52s/it]

80


Scoring with LLM:  51%|█████▏    | 339/661 [25:38<18:54,  3.52s/it]

80


Scoring with LLM:  51%|█████▏    | 340/661 [25:42<19:33,  3.66s/it]

70


Scoring with LLM:  52%|█████▏    | 341/661 [25:46<20:08,  3.78s/it]

80


Scoring with LLM:  52%|█████▏    | 342/661 [25:50<20:09,  3.79s/it]

80


Scoring with LLM:  52%|█████▏    | 343/661 [25:54<20:14,  3.82s/it]

60


Scoring with LLM:  52%|█████▏    | 344/661 [25:57<19:31,  3.70s/it]

80


Scoring with LLM:  52%|█████▏    | 345/661 [26:01<18:58,  3.60s/it]

80


Scoring with LLM:  52%|█████▏    | 346/661 [26:04<17:57,  3.42s/it]

70


Scoring with LLM:  52%|█████▏    | 347/661 [26:06<16:58,  3.24s/it]

80


Scoring with LLM:  53%|█████▎    | 348/661 [26:10<17:16,  3.31s/it]

70


Scoring with LLM:  53%|█████▎    | 349/661 [26:13<17:27,  3.36s/it]

80


Scoring with LLM:  53%|█████▎    | 350/661 [26:17<17:19,  3.34s/it]

80


Scoring with LLM:  53%|█████▎    | 351/661 [26:20<17:13,  3.33s/it]

80


Scoring with LLM:  53%|█████▎    | 352/661 [26:23<17:15,  3.35s/it]

80


Scoring with LLM:  53%|█████▎    | 353/661 [26:27<17:01,  3.32s/it]

70


Scoring with LLM:  54%|█████▎    | 354/661 [26:30<17:06,  3.34s/it]

70


Scoring with LLM:  54%|█████▎    | 355/661 [26:33<17:03,  3.35s/it]

80


Scoring with LLM:  54%|█████▍    | 356/661 [26:37<17:39,  3.47s/it]

70


Scoring with LLM:  54%|█████▍    | 357/661 [26:41<18:21,  3.62s/it]

80


Scoring with LLM:  54%|█████▍    | 358/661 [26:45<18:34,  3.68s/it]

70


Scoring with LLM:  54%|█████▍    | 359/661 [26:48<17:38,  3.51s/it]

80


Scoring with LLM:  54%|█████▍    | 360/661 [26:51<17:07,  3.41s/it]

70


Scoring with LLM:  55%|█████▍    | 361/661 [26:54<16:25,  3.29s/it]

80


Scoring with LLM:  55%|█████▍    | 362/661 [26:57<15:52,  3.19s/it]

70


Scoring with LLM:  55%|█████▍    | 363/661 [27:01<16:29,  3.32s/it]

90


❌ Error at row 363, attempt 1: Expecting value: line 6 column 22 (char 220)


❌ Error at row 363, attempt 2: Expecting value: line 6 column 22 (char 220)


❌ Error at row 363, attempt 3: Expecting value: line 6 column 22 (char 220)


Scoring with LLM:  55%|█████▌    | 364/661 [27:17<35:11,  7.11s/it]

60


Scoring with LLM:  55%|█████▌    | 365/661 [27:20<29:58,  6.08s/it]

70


Scoring with LLM:  55%|█████▌    | 366/661 [27:24<26:10,  5.32s/it]

70


Scoring with LLM:  56%|█████▌    | 367/661 [27:27<23:28,  4.79s/it]

60


Scoring with LLM:  56%|█████▌    | 368/661 [27:31<21:57,  4.50s/it]

80


Scoring with LLM:  56%|█████▌    | 369/661 [27:35<21:18,  4.38s/it]

70


Scoring with LLM:  56%|█████▌    | 370/661 [27:40<21:24,  4.41s/it]

80


Scoring with LLM:  56%|█████▌    | 371/661 [27:44<20:28,  4.24s/it]

80


Scoring with LLM:  56%|█████▋    | 372/661 [27:47<19:12,  3.99s/it]

80


Scoring with LLM:  56%|█████▋    | 373/661 [27:50<18:09,  3.78s/it]

80


Scoring with LLM:  57%|█████▋    | 374/661 [27:54<18:06,  3.79s/it]

80


Scoring with LLM:  57%|█████▋    | 375/661 [27:58<18:32,  3.89s/it]

80


Scoring with LLM:  57%|█████▋    | 376/661 [28:02<18:22,  3.87s/it]

70


Scoring with LLM:  57%|█████▋    | 377/661 [28:06<18:49,  3.98s/it]

80


Scoring with LLM:  57%|█████▋    | 378/661 [28:10<18:27,  3.91s/it]

80


Scoring with LLM:  57%|█████▋    | 379/661 [28:14<17:49,  3.79s/it]

70


Scoring with LLM:  57%|█████▋    | 380/661 [28:17<17:20,  3.70s/it]

90


❌ Error at row 380, attempt 1: Expecting value: line 6 column 22 (char 188)


Scoring with LLM:  58%|█████▊    | 381/661 [28:24<21:43,  4.65s/it]

70


Scoring with LLM:  58%|█████▊    | 382/661 [28:28<20:18,  4.37s/it]

80


Scoring with LLM:  58%|█████▊    | 383/661 [28:31<19:17,  4.16s/it]

80


Scoring with LLM:  58%|█████▊    | 384/661 [28:35<18:20,  3.97s/it]

80


❌ Error at row 384, attempt 1: Expecting value: line 6 column 22 (char 185)


Scoring with LLM:  58%|█████▊    | 385/661 [28:42<22:12,  4.83s/it]

70


Scoring with LLM:  58%|█████▊    | 386/661 [28:46<21:08,  4.61s/it]

70


Scoring with LLM:  59%|█████▊    | 387/661 [28:49<19:39,  4.30s/it]

80


Scoring with LLM:  59%|█████▊    | 388/661 [28:53<18:05,  3.98s/it]

80


Scoring with LLM:  59%|█████▉    | 389/661 [28:56<17:06,  3.78s/it]

80


Scoring with LLM:  59%|█████▉    | 390/661 [28:59<16:41,  3.70s/it]

80


Scoring with LLM:  59%|█████▉    | 391/661 [29:03<16:05,  3.58s/it]

70


Scoring with LLM:  59%|█████▉    | 392/661 [29:07<16:16,  3.63s/it]

70


Scoring with LLM:  59%|█████▉    | 393/661 [29:10<16:16,  3.64s/it]

80


Scoring with LLM:  60%|█████▉    | 394/661 [29:14<16:27,  3.70s/it]

80


Scoring with LLM:  60%|█████▉    | 395/661 [29:18<17:20,  3.91s/it]

70


Scoring with LLM:  60%|█████▉    | 396/661 [29:23<18:14,  4.13s/it]

80


Scoring with LLM:  60%|██████    | 397/661 [29:27<17:45,  4.03s/it]

80


Scoring with LLM:  60%|██████    | 398/661 [29:30<16:49,  3.84s/it]

80


Scoring with LLM:  60%|██████    | 399/661 [29:34<16:18,  3.73s/it]

80


Scoring with LLM:  61%|██████    | 400/661 [29:37<15:57,  3.67s/it]

70


Scoring with LLM:  61%|██████    | 401/661 [29:41<15:33,  3.59s/it]

80


Scoring with LLM:  61%|██████    | 402/661 [29:45<15:53,  3.68s/it]

60


Scoring with LLM:  61%|██████    | 403/661 [29:48<16:00,  3.72s/it]

70


Scoring with LLM:  61%|██████    | 404/661 [29:52<15:14,  3.56s/it]

80


Scoring with LLM:  61%|██████▏   | 405/661 [29:55<15:10,  3.56s/it]

80


Scoring with LLM:  61%|██████▏   | 406/661 [29:58<14:47,  3.48s/it]

80


Scoring with LLM:  62%|██████▏   | 407/661 [30:02<14:41,  3.47s/it]

80


Scoring with LLM:  62%|██████▏   | 408/661 [30:05<14:40,  3.48s/it]

70


Scoring with LLM:  62%|██████▏   | 409/661 [30:08<13:57,  3.32s/it]

80


Scoring with LLM:  62%|██████▏   | 410/661 [30:12<14:39,  3.50s/it]

80


Scoring with LLM:  62%|██████▏   | 411/661 [30:17<15:35,  3.74s/it]

80


Scoring with LLM:  62%|██████▏   | 412/661 [30:21<16:14,  3.91s/it]

70


❌ Error at row 412, attempt 1: Expecting value: line 6 column 22 (char 207)


Scoring with LLM:  62%|██████▏   | 413/661 [30:28<20:07,  4.87s/it]

80


Scoring with LLM:  63%|██████▎   | 414/661 [30:31<18:02,  4.38s/it]

70


Scoring with LLM:  63%|██████▎   | 415/661 [30:35<17:10,  4.19s/it]

80


Scoring with LLM:  63%|██████▎   | 416/661 [30:39<16:31,  4.05s/it]

70


Scoring with LLM:  63%|██████▎   | 417/661 [30:42<16:01,  3.94s/it]

80


Scoring with LLM:  63%|██████▎   | 418/661 [30:46<15:18,  3.78s/it]

80


❌ Error at row 418, attempt 1: Expecting value: line 6 column 22 (char 200)


❌ Error at row 418, attempt 2: Expecting value: line 6 column 22 (char 200)


❌ Error at row 418, attempt 3: Expecting value: line 6 column 22 (char 200)


❌ Error at row 418, attempt 4: Expecting value: line 6 column 22 (char 200)


Scoring with LLM:  63%|██████▎   | 419/661 [31:03<31:01,  7.69s/it]

❌ Error at row 418, attempt 5: Expecting value: line 6 column 22 (char 200)


Scoring with LLM:  64%|██████▎   | 420/661 [31:07<26:30,  6.60s/it]

85


Scoring with LLM:  64%|██████▎   | 421/661 [31:12<24:29,  6.12s/it]

80


Scoring with LLM:  64%|██████▍   | 422/661 [31:16<22:23,  5.62s/it]

80


Scoring with LLM:  64%|██████▍   | 423/661 [31:20<20:14,  5.10s/it]

60


Scoring with LLM:  64%|██████▍   | 424/661 [31:24<19:01,  4.82s/it]

80


Scoring with LLM:  64%|██████▍   | 425/661 [31:28<17:30,  4.45s/it]

90


Scoring with LLM:  64%|██████▍   | 426/661 [31:32<16:51,  4.31s/it]

80


Scoring with LLM:  65%|██████▍   | 427/661 [31:36<16:14,  4.17s/it]

70


Scoring with LLM:  65%|██████▍   | 428/661 [31:39<15:37,  4.02s/it]

70


Scoring with LLM:  65%|██████▍   | 429/661 [31:43<15:25,  3.99s/it]

90


Scoring with LLM:  65%|██████▌   | 430/661 [31:47<15:08,  3.93s/it]

80


Scoring with LLM:  65%|██████▌   | 431/661 [31:51<14:46,  3.85s/it]

80


Scoring with LLM:  65%|██████▌   | 432/661 [31:54<14:16,  3.74s/it]

80


Scoring with LLM:  66%|██████▌   | 433/661 [31:58<14:03,  3.70s/it]

90


Scoring with LLM:  66%|██████▌   | 434/661 [32:01<13:50,  3.66s/it]

80


Scoring with LLM:  66%|██████▌   | 435/661 [32:05<14:23,  3.82s/it]

60


Scoring with LLM:  66%|██████▌   | 436/661 [32:09<14:14,  3.80s/it]

80


Scoring with LLM:  66%|██████▌   | 437/661 [32:13<14:36,  3.91s/it]

60


Scoring with LLM:  66%|██████▋   | 438/661 [32:17<14:40,  3.95s/it]

60


Scoring with LLM:  66%|██████▋   | 439/661 [32:21<14:11,  3.83s/it]

80


Scoring with LLM:  67%|██████▋   | 440/661 [32:25<14:22,  3.90s/it]

60


Scoring with LLM:  67%|██████▋   | 441/661 [32:29<14:13,  3.88s/it]

80


Scoring with LLM:  67%|██████▋   | 442/661 [32:33<14:02,  3.85s/it]

60


Scoring with LLM:  67%|██████▋   | 443/661 [32:36<13:10,  3.63s/it]

80


Scoring with LLM:  67%|██████▋   | 444/661 [32:39<12:16,  3.40s/it]

80


Scoring with LLM:  67%|██████▋   | 445/661 [32:42<11:55,  3.31s/it]

80


Scoring with LLM:  67%|██████▋   | 446/661 [32:45<11:50,  3.31s/it]

80


Scoring with LLM:  68%|██████▊   | 447/661 [32:48<11:46,  3.30s/it]

20


Scoring with LLM:  68%|██████▊   | 448/661 [32:51<11:29,  3.24s/it]

60


Scoring with LLM:  68%|██████▊   | 449/661 [32:55<11:32,  3.27s/it]

60


Scoring with LLM:  68%|██████▊   | 450/661 [32:58<11:19,  3.22s/it]

60


Scoring with LLM:  68%|██████▊   | 451/661 [33:02<11:56,  3.41s/it]

70


Scoring with LLM:  68%|██████▊   | 452/661 [33:05<11:36,  3.33s/it]

60


Scoring with LLM:  69%|██████▊   | 453/661 [33:09<12:24,  3.58s/it]

80


Scoring with LLM:  69%|██████▊   | 454/661 [33:13<12:45,  3.70s/it]

80


Scoring with LLM:  69%|██████▉   | 455/661 [33:17<12:56,  3.77s/it]

80


Scoring with LLM:  69%|██████▉   | 456/661 [33:21<12:51,  3.76s/it]

80


Scoring with LLM:  69%|██████▉   | 457/661 [33:24<12:34,  3.70s/it]

70


Scoring with LLM:  69%|██████▉   | 458/661 [33:28<12:17,  3.63s/it]

90


Scoring with LLM:  69%|██████▉   | 459/661 [33:31<12:18,  3.66s/it]

80


Scoring with LLM:  70%|██████▉   | 460/661 [33:36<12:47,  3.82s/it]

90


Scoring with LLM:  70%|██████▉   | 461/661 [33:39<12:26,  3.73s/it]

80


Scoring with LLM:  70%|██████▉   | 462/661 [33:43<12:52,  3.88s/it]

60


Scoring with LLM:  70%|███████   | 463/661 [33:47<12:34,  3.81s/it]

40


Scoring with LLM:  70%|███████   | 464/661 [33:51<12:28,  3.80s/it]

60


Scoring with LLM:  70%|███████   | 465/661 [33:56<13:48,  4.23s/it]

60


Scoring with LLM:  70%|███████   | 466/661 [34:00<13:46,  4.24s/it]

70


Scoring with LLM:  71%|███████   | 467/661 [34:04<12:59,  4.02s/it]

90


Scoring with LLM:  71%|███████   | 468/661 [34:07<12:14,  3.81s/it]

90


Scoring with LLM:  71%|███████   | 469/661 [34:11<11:59,  3.75s/it]

85


Scoring with LLM:  71%|███████   | 470/661 [34:15<12:11,  3.83s/it]

85


Scoring with LLM:  71%|███████▏  | 471/661 [34:19<12:54,  4.08s/it]

60


Scoring with LLM:  71%|███████▏  | 472/661 [34:23<12:36,  4.00s/it]

40


Scoring with LLM:  72%|███████▏  | 473/661 [34:27<12:24,  3.96s/it]

70


Scoring with LLM:  72%|███████▏  | 474/661 [34:31<12:16,  3.94s/it]

70


Scoring with LLM:  72%|███████▏  | 475/661 [34:35<11:53,  3.84s/it]

80


Scoring with LLM:  72%|███████▏  | 476/661 [34:38<11:28,  3.72s/it]

80


Scoring with LLM:  72%|███████▏  | 477/661 [34:42<11:31,  3.76s/it]

90


Scoring with LLM:  72%|███████▏  | 478/661 [34:46<11:53,  3.90s/it]

80


Scoring with LLM:  72%|███████▏  | 479/661 [34:50<12:00,  3.96s/it]

80


Scoring with LLM:  73%|███████▎  | 480/661 [34:54<11:53,  3.94s/it]

90


❌ Error at row 480, attempt 1: Expecting value: line 6 column 22 (char 244)


❌ Error at row 480, attempt 2: Expecting value: line 6 column 22 (char 244)


❌ Error at row 480, attempt 3: Expecting value: line 6 column 22 (char 244)


❌ Error at row 480, attempt 4: Expecting value: line 6 column 22 (char 244)


Scoring with LLM:  73%|███████▎  | 481/661 [35:13<25:28,  8.49s/it]

❌ Error at row 480, attempt 5: Expecting value: line 6 column 22 (char 244)


Scoring with LLM:  73%|███████▎  | 482/661 [35:17<21:23,  7.17s/it]

60


Scoring with LLM:  73%|███████▎  | 483/661 [35:21<18:19,  6.18s/it]

60


Scoring with LLM:  73%|███████▎  | 484/661 [35:25<15:56,  5.40s/it]

80


Scoring with LLM:  73%|███████▎  | 485/661 [35:28<14:10,  4.83s/it]

80


Scoring with LLM:  74%|███████▎  | 486/661 [35:32<13:15,  4.55s/it]

80


Scoring with LLM:  74%|███████▎  | 487/661 [35:36<12:20,  4.26s/it]

60


Scoring with LLM:  74%|███████▍  | 488/661 [35:40<11:52,  4.12s/it]

80


Scoring with LLM:  74%|███████▍  | 489/661 [35:44<11:47,  4.12s/it]

80


Scoring with LLM:  74%|███████▍  | 490/661 [35:47<11:18,  3.97s/it]

80


Scoring with LLM:  74%|███████▍  | 491/661 [35:51<10:53,  3.84s/it]

90


Scoring with LLM:  74%|███████▍  | 492/661 [35:54<10:05,  3.58s/it]

80


Scoring with LLM:  75%|███████▍  | 493/661 [35:57<09:52,  3.53s/it]

90


Scoring with LLM:  75%|███████▍  | 494/661 [36:01<09:47,  3.52s/it]

90


Scoring with LLM:  75%|███████▍  | 495/661 [36:04<09:32,  3.45s/it]

90


Scoring with LLM:  75%|███████▌  | 496/661 [36:07<09:31,  3.46s/it]

60


Scoring with LLM:  75%|███████▌  | 497/661 [36:11<09:26,  3.45s/it]

70


Scoring with LLM:  75%|███████▌  | 498/661 [36:14<09:07,  3.36s/it]

80


Scoring with LLM:  75%|███████▌  | 499/661 [36:17<08:45,  3.24s/it]

80


Scoring with LLM:  76%|███████▌  | 500/661 [36:20<08:54,  3.32s/it]

80


Scoring with LLM:  76%|███████▌  | 501/661 [36:24<09:11,  3.45s/it]

85


Scoring with LLM:  76%|███████▌  | 502/661 [36:28<09:15,  3.50s/it]

70


Scoring with LLM:  76%|███████▌  | 503/661 [36:31<09:20,  3.54s/it]

80


Scoring with LLM:  76%|███████▌  | 504/661 [36:35<09:21,  3.58s/it]

80


Scoring with LLM:  76%|███████▋  | 505/661 [36:38<08:52,  3.41s/it]

80


Scoring with LLM:  77%|███████▋  | 506/661 [36:41<08:06,  3.14s/it]

80


Scoring with LLM:  77%|███████▋  | 507/661 [36:44<07:56,  3.10s/it]

80


Scoring with LLM:  77%|███████▋  | 508/661 [36:46<07:27,  2.93s/it]

80


Scoring with LLM:  77%|███████▋  | 509/661 [36:49<07:07,  2.81s/it]

80


Scoring with LLM:  77%|███████▋  | 510/661 [36:52<07:33,  3.00s/it]

80


Scoring with LLM:  77%|███████▋  | 511/661 [36:55<07:39,  3.06s/it]

80


Scoring with LLM:  77%|███████▋  | 512/661 [36:59<07:47,  3.14s/it]

80


Scoring with LLM:  78%|███████▊  | 513/661 [37:02<08:11,  3.32s/it]

80


Scoring with LLM:  78%|███████▊  | 514/661 [37:05<07:42,  3.14s/it]

80


Scoring with LLM:  78%|███████▊  | 515/661 [37:09<07:49,  3.21s/it]

80


Scoring with LLM:  78%|███████▊  | 516/661 [37:11<07:12,  2.98s/it]

80


Scoring with LLM:  78%|███████▊  | 517/661 [37:14<06:53,  2.87s/it]

80


Scoring with LLM:  78%|███████▊  | 518/661 [37:17<07:20,  3.08s/it]

90


Scoring with LLM:  79%|███████▊  | 519/661 [37:21<07:42,  3.26s/it]

80


Scoring with LLM:  79%|███████▊  | 520/661 [37:25<07:57,  3.39s/it]

90


Scoring with LLM:  79%|███████▉  | 521/661 [37:28<08:13,  3.53s/it]

80


Scoring with LLM:  79%|███████▉  | 522/661 [37:32<08:15,  3.56s/it]

40


Scoring with LLM:  79%|███████▉  | 523/661 [37:36<08:18,  3.62s/it]

80


Scoring with LLM:  79%|███████▉  | 524/661 [37:39<08:03,  3.53s/it]

70


Scoring with LLM:  79%|███████▉  | 525/661 [37:43<08:07,  3.59s/it]

80


Scoring with LLM:  80%|███████▉  | 526/661 [37:47<08:28,  3.77s/it]

80


Scoring with LLM:  80%|███████▉  | 527/661 [37:51<08:18,  3.72s/it]

80


Scoring with LLM:  80%|███████▉  | 528/661 [37:54<08:07,  3.67s/it]

70


Scoring with LLM:  80%|████████  | 529/661 [37:58<07:57,  3.62s/it]

80


Scoring with LLM:  80%|████████  | 530/661 [38:01<07:51,  3.60s/it]

80


Scoring with LLM:  80%|████████  | 531/661 [38:05<07:41,  3.55s/it]

90


Scoring with LLM:  80%|████████  | 532/661 [38:08<07:34,  3.53s/it]

80


Scoring with LLM:  81%|████████  | 533/661 [38:12<07:35,  3.56s/it]

80


Scoring with LLM:  81%|████████  | 534/661 [38:15<07:28,  3.53s/it]

90


Scoring with LLM:  81%|████████  | 535/661 [38:19<07:26,  3.54s/it]

20


Scoring with LLM:  81%|████████  | 536/661 [38:23<07:33,  3.63s/it]

40


Scoring with LLM:  81%|████████  | 537/661 [38:26<07:24,  3.59s/it]

70


Scoring with LLM:  81%|████████▏ | 538/661 [38:31<08:26,  4.11s/it]

80


Scoring with LLM:  82%|████████▏ | 539/661 [38:35<07:42,  3.79s/it]

80


Scoring with LLM:  82%|████████▏ | 540/661 [38:38<07:12,  3.57s/it]

80


Scoring with LLM:  82%|████████▏ | 541/661 [38:41<07:10,  3.59s/it]

80


Scoring with LLM:  82%|████████▏ | 542/661 [38:45<07:09,  3.61s/it]

80


Scoring with LLM:  82%|████████▏ | 543/661 [38:48<06:49,  3.47s/it]

80


Scoring with LLM:  82%|████████▏ | 544/661 [38:51<06:41,  3.44s/it]

80


Scoring with LLM:  82%|████████▏ | 545/661 [38:54<06:22,  3.29s/it]

80


Scoring with LLM:  83%|████████▎ | 546/661 [38:58<06:34,  3.43s/it]

80


Scoring with LLM:  83%|████████▎ | 547/661 [39:02<06:32,  3.44s/it]

80


Scoring with LLM:  83%|████████▎ | 548/661 [39:05<06:27,  3.43s/it]

80


Scoring with LLM:  83%|████████▎ | 549/661 [39:09<06:28,  3.47s/it]

80


Scoring with LLM:  83%|████████▎ | 550/661 [39:12<06:19,  3.42s/it]

80


Scoring with LLM:  83%|████████▎ | 551/661 [39:15<06:19,  3.45s/it]

80


Scoring with LLM:  84%|████████▎ | 552/661 [39:18<05:59,  3.30s/it]

90


Scoring with LLM:  84%|████████▎ | 553/661 [39:23<06:30,  3.62s/it]

80


Scoring with LLM:  84%|████████▍ | 554/661 [39:26<06:27,  3.62s/it]

80


Scoring with LLM:  84%|████████▍ | 555/661 [39:30<06:19,  3.58s/it]

80


Scoring with LLM:  84%|████████▍ | 556/661 [39:33<06:13,  3.56s/it]

70


Scoring with LLM:  84%|████████▍ | 557/661 [39:37<06:17,  3.63s/it]

60


Scoring with LLM:  84%|████████▍ | 558/661 [39:41<06:25,  3.75s/it]

70


Scoring with LLM:  85%|████████▍ | 559/661 [39:44<06:10,  3.63s/it]

80


Scoring with LLM:  85%|████████▍ | 560/661 [39:48<06:01,  3.58s/it]

80


Scoring with LLM:  85%|████████▍ | 561/661 [39:51<05:44,  3.44s/it]

80


Scoring with LLM:  85%|████████▌ | 562/661 [39:54<05:40,  3.44s/it]

90


Scoring with LLM:  85%|████████▌ | 563/661 [39:58<05:36,  3.44s/it]

80


Scoring with LLM:  85%|████████▌ | 564/661 [40:01<05:23,  3.34s/it]

90


Scoring with LLM:  85%|████████▌ | 565/661 [40:05<05:48,  3.63s/it]

60


Scoring with LLM:  86%|████████▌ | 566/661 [40:09<05:49,  3.68s/it]

60


Scoring with LLM:  86%|████████▌ | 567/661 [40:13<05:41,  3.63s/it]

70


Scoring with LLM:  86%|████████▌ | 568/661 [40:16<05:36,  3.62s/it]

80


Scoring with LLM:  86%|████████▌ | 569/661 [40:19<05:18,  3.46s/it]

80


Scoring with LLM:  86%|████████▌ | 570/661 [40:22<05:06,  3.37s/it]

80


Scoring with LLM:  86%|████████▋ | 571/661 [40:26<05:11,  3.46s/it]

90


Scoring with LLM:  87%|████████▋ | 572/661 [40:30<05:14,  3.53s/it]

80


Scoring with LLM:  87%|████████▋ | 573/661 [40:34<05:19,  3.63s/it]

90


Scoring with LLM:  87%|████████▋ | 574/661 [40:38<05:35,  3.86s/it]

80


Scoring with LLM:  87%|████████▋ | 575/661 [40:42<05:30,  3.85s/it]

90


Scoring with LLM:  87%|████████▋ | 576/661 [40:46<05:30,  3.89s/it]

70


Scoring with LLM:  87%|████████▋ | 577/661 [40:50<05:30,  3.94s/it]

40


Scoring with LLM:  87%|████████▋ | 578/661 [40:54<05:17,  3.83s/it]

60


Scoring with LLM:  88%|████████▊ | 579/661 [40:57<05:15,  3.85s/it]

80


Scoring with LLM:  88%|████████▊ | 580/661 [41:01<05:07,  3.80s/it]

60


Scoring with LLM:  88%|████████▊ | 581/661 [41:05<05:01,  3.77s/it]

80


Scoring with LLM:  88%|████████▊ | 582/661 [41:08<04:55,  3.74s/it]

80


Scoring with LLM:  88%|████████▊ | 583/661 [41:12<04:46,  3.67s/it]

80


Scoring with LLM:  88%|████████▊ | 584/661 [41:16<04:40,  3.64s/it]

70


Scoring with LLM:  89%|████████▊ | 585/661 [41:19<04:33,  3.60s/it]

80


Scoring with LLM:  89%|████████▊ | 586/661 [41:23<04:32,  3.64s/it]

90


Scoring with LLM:  89%|████████▉ | 587/661 [41:27<04:34,  3.71s/it]

80


Scoring with LLM:  89%|████████▉ | 588/661 [41:30<04:30,  3.70s/it]

90


Scoring with LLM:  89%|████████▉ | 589/661 [41:34<04:30,  3.75s/it]

80


Scoring with LLM:  89%|████████▉ | 590/661 [41:38<04:27,  3.77s/it]

60


Scoring with LLM:  89%|████████▉ | 591/661 [41:43<04:46,  4.09s/it]

60


Scoring with LLM:  90%|████████▉ | 592/661 [41:48<05:10,  4.50s/it]

60


Scoring with LLM:  90%|████████▉ | 593/661 [41:53<05:05,  4.49s/it]

60


Scoring with LLM:  90%|████████▉ | 594/661 [41:57<04:56,  4.42s/it]

80


Scoring with LLM:  90%|█████████ | 595/661 [42:01<04:47,  4.35s/it]

80


Scoring with LLM:  90%|█████████ | 596/661 [42:05<04:27,  4.12s/it]

60


Scoring with LLM:  90%|█████████ | 597/661 [42:09<04:17,  4.03s/it]

80


Scoring with LLM:  90%|█████████ | 598/661 [42:13<04:14,  4.04s/it]

80


Scoring with LLM:  91%|█████████ | 599/661 [42:16<04:01,  3.89s/it]

80


Scoring with LLM:  91%|█████████ | 600/661 [42:20<03:49,  3.77s/it]

80


Scoring with LLM:  91%|█████████ | 601/661 [42:23<03:37,  3.62s/it]

80


Scoring with LLM:  91%|█████████ | 602/661 [42:27<03:32,  3.61s/it]

80


Scoring with LLM:  91%|█████████ | 603/661 [42:30<03:33,  3.67s/it]

80


Scoring with LLM:  91%|█████████▏| 604/661 [42:34<03:26,  3.62s/it]

60


Scoring with LLM:  92%|█████████▏| 605/661 [42:38<03:27,  3.70s/it]

70


Scoring with LLM:  92%|█████████▏| 606/661 [42:42<03:37,  3.95s/it]

60


Scoring with LLM:  92%|█████████▏| 607/661 [42:47<03:51,  4.30s/it]

60


Scoring with LLM:  92%|█████████▏| 608/661 [42:51<03:43,  4.21s/it]

70


Scoring with LLM:  92%|█████████▏| 609/661 [42:55<03:31,  4.06s/it]

70


Scoring with LLM:  92%|█████████▏| 610/661 [42:58<03:14,  3.82s/it]

80


Scoring with LLM:  92%|█████████▏| 611/661 [43:02<03:02,  3.65s/it]

80


Scoring with LLM:  93%|█████████▎| 612/661 [43:05<02:53,  3.53s/it]

80


Scoring with LLM:  93%|█████████▎| 613/661 [43:09<02:53,  3.62s/it]

80


Scoring with LLM:  93%|█████████▎| 614/661 [43:12<02:42,  3.45s/it]

90


Scoring with LLM:  93%|█████████▎| 615/661 [43:15<02:41,  3.52s/it]

80


Scoring with LLM:  93%|█████████▎| 616/661 [43:19<02:35,  3.45s/it]

80


Scoring with LLM:  93%|█████████▎| 617/661 [43:23<02:38,  3.60s/it]

60


Scoring with LLM:  93%|█████████▎| 618/661 [43:27<02:38,  3.69s/it]

60


Scoring with LLM:  94%|█████████▎| 619/661 [43:31<02:49,  4.04s/it]

60


Scoring with LLM:  94%|█████████▍| 620/661 [43:35<02:39,  3.90s/it]

80


Scoring with LLM:  94%|█████████▍| 621/661 [43:39<02:31,  3.79s/it]

90


Scoring with LLM:  94%|█████████▍| 622/661 [43:42<02:23,  3.67s/it]

80


Scoring with LLM:  94%|█████████▍| 623/661 [43:45<02:16,  3.60s/it]

90


Scoring with LLM:  94%|█████████▍| 624/661 [43:49<02:11,  3.56s/it]

80


Scoring with LLM:  95%|█████████▍| 625/661 [43:52<02:06,  3.52s/it]

80


Scoring with LLM:  95%|█████████▍| 626/661 [43:56<02:04,  3.55s/it]

80


Scoring with LLM:  95%|█████████▍| 627/661 [43:59<01:57,  3.46s/it]

80


Scoring with LLM:  95%|█████████▌| 628/661 [44:02<01:46,  3.24s/it]

80


Scoring with LLM:  95%|█████████▌| 629/661 [44:05<01:38,  3.09s/it]

80


Scoring with LLM:  95%|█████████▌| 630/661 [44:07<01:31,  2.94s/it]

80


Scoring with LLM:  95%|█████████▌| 631/661 [44:10<01:29,  3.00s/it]

60


Scoring with LLM:  96%|█████████▌| 632/661 [44:14<01:35,  3.31s/it]

60


❌ Error at row 632, attempt 1: Expecting value: line 6 column 22 (char 246)


❌ Error at row 632, attempt 2: Expecting value: line 6 column 22 (char 246)


❌ Error at row 632, attempt 3: Expecting value: line 6 column 22 (char 246)


❌ Error at row 632, attempt 4: Expecting value: line 6 column 22 (char 246)


Scoring with LLM:  96%|█████████▌| 633/661 [44:32<03:34,  7.64s/it]

❌ Error at row 632, attempt 5: Expecting value: line 6 column 22 (char 246)


Scoring with LLM:  96%|█████████▌| 634/661 [44:36<02:57,  6.58s/it]

80


Scoring with LLM:  96%|█████████▌| 635/661 [44:40<02:28,  5.70s/it]

80


Scoring with LLM:  96%|█████████▌| 636/661 [44:44<02:10,  5.21s/it]

80


Scoring with LLM:  96%|█████████▋| 637/661 [44:48<01:58,  4.94s/it]

60


Scoring with LLM:  97%|█████████▋| 638/661 [44:52<01:45,  4.57s/it]

60


Scoring with LLM:  97%|█████████▋| 639/661 [44:56<01:37,  4.45s/it]

80


Scoring with LLM:  97%|█████████▋| 640/661 [44:59<01:26,  4.12s/it]

70


Scoring with LLM:  97%|█████████▋| 641/661 [45:03<01:17,  3.88s/it]

70


Scoring with LLM:  97%|█████████▋| 642/661 [45:06<01:09,  3.66s/it]

70


Scoring with LLM:  97%|█████████▋| 643/661 [45:10<01:06,  3.67s/it]

80


Scoring with LLM:  97%|█████████▋| 644/661 [45:14<01:05,  3.84s/it]

60


Scoring with LLM:  98%|█████████▊| 645/661 [45:18<01:01,  3.83s/it]

70


Scoring with LLM:  98%|█████████▊| 646/661 [45:22<00:58,  3.89s/it]

80


Scoring with LLM:  98%|█████████▊| 647/661 [45:26<00:54,  3.91s/it]

70


Scoring with LLM:  98%|█████████▊| 648/661 [45:30<00:50,  3.92s/it]

70


Scoring with LLM:  98%|█████████▊| 649/661 [45:34<00:47,  3.95s/it]

60


Scoring with LLM:  98%|█████████▊| 650/661 [45:37<00:42,  3.88s/it]

40


Scoring with LLM:  98%|█████████▊| 651/661 [45:41<00:38,  3.87s/it]

70


Scoring with LLM:  99%|█████████▊| 652/661 [45:45<00:33,  3.75s/it]

70


Scoring with LLM:  99%|█████████▉| 653/661 [45:48<00:29,  3.74s/it]

70


Scoring with LLM:  99%|█████████▉| 654/661 [45:53<00:28,  4.05s/it]

40


Scoring with LLM:  99%|█████████▉| 655/661 [45:57<00:24,  4.02s/it]

60


Scoring with LLM:  99%|█████████▉| 656/661 [46:02<00:20,  4.19s/it]

70


Scoring with LLM:  99%|█████████▉| 657/661 [46:06<00:16,  4.21s/it]

60


Scoring with LLM: 100%|█████████▉| 658/661 [46:09<00:11,  3.93s/it]

90


Scoring with LLM: 100%|█████████▉| 659/661 [46:13<00:07,  3.80s/it]

60


Scoring with LLM: 100%|█████████▉| 660/661 [46:17<00:03,  3.84s/it]

60


Scoring with LLM: 100%|██████████| 661/661 [46:20<00:00,  3.73s/it]

Scoring with LLM: 100%|██████████| 661/661 [46:20<00:00,  4.21s/it]

40
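The `Expecting value: line 6 column 22` messages for row 632 have the form of a `json.JSONDecodeError`, and the same position is reported on all five attempts, so the retries appear to hit the same malformed field each time. A minimal sketch of a parsing helper that reports the offending line instead of only the position is shown below. The helper name `parse_assessment` and its return convention are assumptions for illustration, not part of the original loop; the regex is the one defined at the top of the notebook.

```python
import json
import re

# Same pattern as defined at the top of the notebook.
pattern = re.compile(r"<review_assessment>\s*(\{.*?\})\s*</review_assessment>", re.DOTALL)

def parse_assessment(raw_text):
    """Hypothetical helper: return (parsed_dict, error_message); one of them is None."""
    match = pattern.search(raw_text)
    if match is None:
        return None, "no <review_assessment> block found"
    block = match.group(1)
    try:
        return json.loads(block), None
    except json.JSONDecodeError as exc:
        # exc.lineno is 1-indexed within the extracted block.
        lines = block.splitlines()
        bad_line = lines[exc.lineno - 1] if exc.lineno <= len(lines) else ""
        return None, f"{exc.msg} at line {exc.lineno}: {bad_line.strip()!r}"

# Example with an unquoted label, one plausible cause of "Expecting value".
parsed, error = parse_assessment(
    '<review_assessment>{\n"Factuality": factual\n}</review_assessment>'
)
print(parsed, error)
```

Logging the offending line makes it easier to decide whether the prompt or the parser needs adjusting before spending more retries on the same row.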





In [3]:
df.to_json('/home/ali/Review_Quality_Benchmark/data/processed/merged_200_HA_llm.json', orient='records')
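Note that the input was read with `lines=True` (JSON Lines), whereas `to_json(..., orient='records')` without `lines=True` writes a single JSON array, so the saved file has to be read back without `lines=True`. If keeping the JSON Lines layout is preferred, a round trip could look like the sketch below. The toy frame and the temporary file name are placeholders, not the notebook's data.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the scored frame; column names mirror the notebook's.
scores = pd.DataFrame({"paper_id": [123, 0], "llm_Overall Quality": [60, 70]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "scores.jsonl")
    # lines=True requires orient="records" and writes one JSON object per line.
    scores.to_json(out_path, orient="records", lines=True)
    # Read back with the same setting used for the input file.
    reloaded = pd.read_json(out_path, lines=True)

print(reloaded)
```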

In [4]:
# Count the rows (across the whole frame) that have no null values in any "llm_" column
llm_cols = [col for col in df.columns if col.startswith("llm_")]
non_null_count = df[llm_cols].dropna().shape[0]
print(non_null_count)
print(df.shape)

239
(661, 58)
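Only 239 of the 661 rows are complete across every `llm_` column. One way to see which columns account for the gaps is to count nulls per column and list the incomplete rows. The sketch below uses a small made-up frame; on the real data the same two expressions would be applied to `df`.

```python
import pandas as pd

# Toy frame standing in for df; values are illustrative only.
toy = pd.DataFrame({
    "review_text": ["a", "b", "c"],
    "llm_Fairness": [4, pd.NA, 3],
    "llm_Politeness": ["polite", "polite", pd.NA],
})

llm_cols = [col for col in toy.columns if col.startswith("llm_")]

# Number of missing values in each llm_ column.
missing_per_column = toy[llm_cols].isna().sum()

# Index labels of rows with at least one missing llm_ value.
incomplete_rows = toy.index[toy[llm_cols].isna().any(axis=1)].tolist()

print(missing_per_column)
print(incomplete_rows)
```

If a single column turns out to be missing in most rows, that points to a field-name mismatch between the prompt and `llm_fields` rather than to failed requests.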


In [5]:
df

Unnamed: 0,paper_id,title,abstract,review_text,authors,reviewer,review_date,review_rating,review_confidence,review_soundness,...,llm_Fairness,llm_Actionability,llm_Constructiveness,llm_Relevance Alignment,llm_Clarity and Readability,llm_Usage of Technical Terms,llm_Factuality,llm_Overall Quality,llm_Sentiment Polarity,llm_Politeness
0,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,**Summary:** \nThis paper presents an open-sou...,"['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_EGJf,1701662567826,6.0,3.0,3.0,...,3,2,3,4,4,4,partially factual,60,neutral,polite
1,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,This paper proposes a comprehensive library fo...,"['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_DWom,1699636125239,6.0,3.0,3.0,...,4,3,4,5,4,3,factual,80,neutral,polite
2,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,"This author introduces LyCORIS, an open source...","['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_PnHf,1699636125143,6.0,4.0,4.0,...,4,3,4,5,4,4,factual,80,positive,polite
3,123,Navigating Text-To-Image Customization: From L...,Text-to-image generative models have garnered ...,"The authors propose LyCORIS, an open-source li...","['~SHIH-YING_YEH1', '~Yu-Guan_Hsieh1', '~Zhido...",Reviewer_ekPo,1699636125075,8.0,4.0,3.0,...,4,3,4,5,4,5,factual,80,neutral,polite
4,0,$\nu$-ensembles: Improving deep ensemble calib...,We present a method to improve the calibration...,"This paper introduces ν-ensembles, a novel dee...","['~Konstantinos_Pitas1', '~Julyan_Arbel1']",Reviewer_HFRa,1699636992453,3.0,4.0,2.0,...,4,2,3,4,4,3,factual,70,neutral,polite
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
656,118,Module Extraction for Efficient Object Query o...,The extraction of logically-independent fragme...,The submission addresses the problem of partit...,,Anonymous,03/May/2014,,,,...,3,1,2,4,3,4,unfactual,60,negative,neutral
657,61,EARTh: an Environmental Application Reference ...,The paper aims at providing a description of E...,This revision addresses my concerns. I am part...,,Natasha Noy,22/Jul/2013,,,,...,5,3,4,5,5,4,factual,90,positive,polite
658,76,Facilitating Data Discovery by Connecting Rela...,"In this study, we investigate two approaches t...",The paper presents and compares RDF/XML (in th...,,Anonymous,15/Jun/2013,,,,...,3,2,2,4,5,4,factual,60,neutral,polite
659,76,Facilitating Data Discovery by Connecting Rela...,"In this study, we investigate two approaches t...",This paper investigates two different approach...,,Ghislain Hachey,17/Jun/2013,,,,...,3,1,2,4,4,4,partially factual,60,neutral,polite
