This document outlines and explains what the various Python scripts do and how to run them. 

# Step 1: retrieve requirements
We start by running `llm-call_retrieve-requirements.py`.

### Data Overview

| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `.env` | API credentials and model settings |
| Input | `transcripts/` | JSON interview transcripts |
| Output | `results/` | Folder for generated CSV results |
| Output | `results/single_results.csv` | Requirements from single prompt workflow |
| Output | `results/meta_results.csv` | Requirements and analyses from meta prompt workflow |
| Output | `progress.log` | Execution log file |



In [None]:
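import importlib

# The file name contains hyphens, so a plain `from ... import` statement would be a syntax error.
main = importlib.import_module("llm-call_retrieve-requirements").main

main()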

### Terminal output

When we run `llm-call_retrieve-requirements.py`, we are first asked to select a mode:
```
Select an option:
[1] Run single prompt
[2] Run meta-prompt
[3] Run both
Enter choice (1, 2, or 3):
```

If everything is configured correctly, you’ll see a message like:
```
2025-10-21 12:00:05,123 - INFO - Azure OpenAI client created and credentials verified.
```

Or an error message like:
```
2025-10-21 12:00:05,456 - ERROR - Failed to create Azure OpenAI client. Check .env file and credentials. Error: ...
2025-10-21 12:00:05,457 - ERROR - Exiting due to client initialization failure.
```

Then you will see a message like (in case of `single` mode):
```
2025-10-21 12:00:06,000 - INFO - --- Starting Single-Prompt Workflow ---
2025-10-21 12:00:06,001 - INFO - Created new results file: results/single_results.csv
2025-10-21 12:00:06,002 - INFO - Found 3 new transcripts to process.
Single-Prompt Processing: 0%|          | 0/3 [00:00<?, ?it/s]
```

Then for each transcript:
```
2025-10-21 12:00:06,123 - INFO - Processing 3a6077fa-20e9-4eb9-b590-00e5dde74ca3.json.
2025-10-21 12:00:10,789 - INFO - Successfully processed and saved requirements for 3a6077fa-20e9-4eb9-b590-00e5dde74ca3.json.
2025-10-21 12:00:10,790 - INFO - Processing 3a6077fa-20e9-4eb9-b590-00e5dde74ca3.json...
...
2025-10-21 12:00:20,234 - INFO - Script finished.
```

If any file is invalid or empty, you’ll see:
```
2025-10-21 12:00:07,111 - WARNING - Transcript interview_003 is empty or invalid. Skipping.
```

If a model call fails:
```
2025-10-21 12:00:08,222 - ERROR - LLM API call failed: ...
2025-10-21 12:00:08,223 - WARNING - Skipping CSV entry for interview_003 due to processing error.
```

The same kinds of messages are shown for the `meta` mode. LLM call logs are also saved to the local file `progress.log`.





# Step 2: verify requirements and quotes output
In this step we verify the previously created `single_results.csv` and `meta_results.csv` files based on these criteria:

### Verification Criteria
- **Identifier (Col A):** Unique ID per entry  
- **Iterations (Col B):** Counts occurrences per identifier  
- **System Prompt Check (meta only, Col C):** Must contain “system role” (case insensitive)  
- **Analysis Length (Col D):** More than 20 characters unless value is “N/A”  
- **Requirement–Quote Pairs:**  
  - single_results.csv → start at B–C, then E–F, G–H…  
  - meta_results.csv → start at E–F, then G–H…  
  - Requirement valid if non-empty after normalization  
  - Quote valid if non-empty after normalization  
- **Strict Missing Quote Rule:** Requirement with no paired quote = failure  
- **Tail Allowance:** Empty trailing requirement/quote pairs are allowed if all later cells are empty  
- **Output:** Prints summary tables, totals, differences, and missing quote details to console only  
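
The pair rules above can be illustrated with a minimal sketch. It assumes that the cells of one CSV row have already been sliced so that they start at the first requirement column, and that "normalization" means collapsing whitespace; `normalize` and `check_pairs` are hypothetical names, not functions taken from `verify_csv_content.py`.

```python
from typing import List, Tuple


def normalize(cell: str) -> str:
    """Collapse whitespace and treat None as empty (assumed normalization)."""
    return " ".join((cell or "").split())


def check_pairs(cells: List[str]) -> Tuple[int, int, List[int]]:
    """Check consecutive (requirement, quote) cells of one row.

    Returns the number of requirements, the number of quotes, and the
    indices of pairs whose requirement has no quote.
    """
    values = [normalize(c) for c in cells]
    # Tail allowance: empty cells at the end of the row are ignored.
    while values and not values[-1]:
        values.pop()
    # Pad to an even length so a final requirement still forms a pair.
    if len(values) % 2:
        values.append("")
    reqs = quotes = 0
    missing = []
    for pair_index in range(len(values) // 2):
        req, quote = values[2 * pair_index], values[2 * pair_index + 1]
        reqs += bool(req)
        quotes += bool(quote)
        # Strict missing quote rule: a requirement without a quote fails.
        if req and not quote:
            missing.append(pair_index)
    return reqs, quotes, missing


row = ["Req 1", "Quote 1", "Req 2", "", "", ""]
print(check_pairs(row))  # (2, 1, [1])
```

In this example the second pair is reported because its quote is missing, while the empty third pair falls under the tail allowance.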

### Data Overview
| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `results/single_results.csv` | CSV file containing requirement–quote pairs from the single prompt workflow |
| Input | `results/meta_results.csv` | CSV file containing requirement–quote pairs and system prompt data from the meta prompt workflow |


from verify_csv_content import main

main()

### Terminal output
```
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
              NOTE: read the top of this script for details on the checks performed.
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>


Check for single_results.csv
Identifier                              Iterations  Analysis length    Req length    Quote length
------------------------------------  ------------  -----------------  ------------  --------------
5ef11a7f-45af-4e3a-8399-2cfb469c5253             1  CHECK              CHECK         CHECK
77b8b3ae-8c49-4ef3-9cc7-81965d1ab986             1  CHECK              CHECK         CHECK
46515d9e-58ad-487d-9eb9-23584a9c987b             1  CHECK              CHECK         CHECK
ba13038b-352a-4fd1-a2bc-78828412ab09             1  CHECK              CHECK         CHECK
5267f55a-9337-49b3-9747-d05c69c877f0             1  CHECK              CHECK         CHECK
447f8927-ecf4-4fe5-8396-f3ed2cf84f75             1  CHECK              CHECK         CHECK
3a6077fa-20e9-4eb9-b590-00e5dde74ca3             1  CHECK              CHECK         CHECK
d08c455a-35cc-43ce-aaf2-ddae7520e4ff             1  CHECK              CHECK         CHECK
ef90e0df-81db-44b1-a004-c3c5287eff60             1  CHECK              CHECK         CHECK
4fdb024f-c213-4091-9567-bcb80e822990             1  CHECK              CHECK         CHECK
5ef30849-40da-4787-aea5-6ce30d8316b0             1  CHECK              CHECK         CHECK
e1edb050-9f6a-489e-9bb5-250273d9bd1d             1  CHECK              CHECK         CHECK
1fa79326-df01-46c6-b09a-582a6f7396a9             1  CHECK              CHECK         CHECK
5012efc8-c30a-4922-b54e-d9fe8ef3356e             1  CHECK              CHECK         CHECK
ed345832-0cec-4695-8f89-7eeb230076bd             1  CHECK              CHECK         CHECK
3cca3e81-59a7-4543-97dc-5b4f945d3326             1  CHECK              CHECK         CHECK
62eae299-e773-4cb1-8873-915c1b853e58             1  CHECK              CHECK         CHECK
fdd53def-9dcc-4ccf-9890-ab3ebef859e1             1  CHECK              CHECK         CHECK
959a0116-b74b-40bb-93a5-2e302ebe719b             1  CHECK              CHECK         CHECK
55e944c3-0821-491c-be8c-daccf374e827             1  CHECK              CHECK         CHECK
9b79199f-6e38-4234-8be6-1bfa7068bfa1             1  CHECK              CHECK         CHECK

Total unique identifiers: 21
Total requirement entries: 534
Total quote entries: 534
Difference between requirement and quote entries: 0

Quote differences single_results.csv: None
____________________________________________________________________________________________________


Check for meta_results.csv
Identifier                              Iterations  System Prompt Check    Analysis length    Req length    Quote length
------------------------------------  ------------  ---------------------  -----------------  ------------  --------------
5ef11a7f-45af-4e3a-8399-2cfb469c5253             3  CHECK                  CHECK              CHECK         CHECK
77b8b3ae-8c49-4ef3-9cc7-81965d1ab986             3  CHECK                  CHECK              CHECK         CHECK
46515d9e-58ad-487d-9eb9-23584a9c987b             3  CHECK                  CHECK              CHECK         CHECK
ba13038b-352a-4fd1-a2bc-78828412ab09             3  CHECK                  CHECK              CHECK         CHECK
5267f55a-9337-49b3-9747-d05c69c877f0             3  CHECK                  CHECK              CHECK         CHECK
447f8927-ecf4-4fe5-8396-f3ed2cf84f75             3  CHECK                  CHECK              CHECK         CHECK
3a6077fa-20e9-4eb9-b590-00e5dde74ca3             3  CHECK                  CHECK              CHECK         CHECK
d08c455a-35cc-43ce-aaf2-ddae7520e4ff             3  CHECK                  CHECK              CHECK         CHECK
ef90e0df-81db-44b1-a004-c3c5287eff60             3  CHECK                  CHECK              CHECK         CHECK
4fdb024f-c213-4091-9567-bcb80e822990             3  CHECK                  CHECK              CHECK         CHECK
5ef30849-40da-4787-aea5-6ce30d8316b0             3  CHECK                  CHECK              CHECK         CHECK
e1edb050-9f6a-489e-9bb5-250273d9bd1d             3  CHECK                  CHECK              CHECK         CHECK
1fa79326-df01-46c6-b09a-582a6f7396a9             3  CHECK                  CHECK              CHECK         CHECK
5012efc8-c30a-4922-b54e-d9fe8ef3356e             3  CHECK                  CHECK              CHECK         CHECK
ed345832-0cec-4695-8f89-7eeb230076bd             3  CHECK                  CHECK              CHECK         CHECK
3cca3e81-59a7-4543-97dc-5b4f945d3326             3  CHECK                  CHECK              CHECK         CHECK
62eae299-e773-4cb1-8873-915c1b853e58             3  CHECK                  CHECK              CHECK         CHECK
fdd53def-9dcc-4ccf-9890-ab3ebef859e1             3  CHECK                  CHECK              CHECK         CHECK
959a0116-b74b-40bb-93a5-2e302ebe719b             3  CHECK                  CHECK              CHECK         CHECK
55e944c3-0821-491c-be8c-daccf374e827             3  CHECK                  CHECK              CHECK         CHECK
9b79199f-6e38-4234-8be6-1bfa7068bfa1             3  CHECK                  CHECK              CHECK         CHECK

Total unique identifiers: 21
Total requirement entries: 1605
Total quote entries: 1605
Difference between requirement and quote entries: 0

Quote differences meta_results.csv: None
```

Above is the terminal output from our experiment. Note that all checks are marked `CHECK`, indicating that the data can be used in the following steps.

# Step 3: Verify quotes traceability
In this step we check the quotes in `single_results.csv` and `meta_results.csv` and try to trace them back to their original transcript JSON files.

### Verification Criteria
- **Quote Verification Method:** Uses fuzzy matching to compare CSV quotes against transcript text  
- **Standard Quotes:** Matched using partial fuzzy ratio (threshold ≥ 80%)  
- **Ellipsis Quotes ("..."):** Split into parts and searched sequentially; each part must meet threshold  
- **Ellipsis Scoring:** Final score is the average of all parts’ similarity scores  
- **Failure Condition:** Any quote (or ellipsis part) scoring below 80% is reported as a failed match  
- **Transcript Handling:**  
  - Missing transcript → all quotes for that identifier marked as failed (`TRANSCRIPT_MISSING`)  
  - Invalid or unreadable JSON → all quotes for that identifier marked as failed (`ERROR_READING_JSON`)  
- **Output:** Console report listing failed quotes with identifier, row, column, similarity score, and quote text  
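
The matching rules above can be sketched as follows. This is an illustration only: instead of the fuzzy matching library used by `verify_quotes.py`, it swaps in a sliding-window comparison built on the standard library's `difflib`, so the scores will differ from the ones reported by the script. The names `best_window` and `verify_quote` are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 80.0  # percent, as stated in the criteria above


def best_window(needle: str, haystack: str):
    """Return (score 0-100, end index) of the best window of needle length."""
    needle_l, hay_l = needle.lower(), haystack.lower()
    if not needle_l or not hay_l:
        return 0.0, 0
    width = min(len(needle_l), len(hay_l))
    best_score, best_end = 0.0, 0
    for start in range(len(hay_l) - width + 1):
        window = hay_l[start:start + width]
        score = SequenceMatcher(None, needle_l, window, autojunk=False).ratio()
        if score > best_score:
            best_score, best_end = score, start + width
    return 100.0 * best_score, best_end


def verify_quote(quote: str, transcript: str):
    """Return (average score, passed); quotes with '...' are checked part by part."""
    parts = [p.strip() for p in quote.split("...") if p.strip()]
    if not parts:
        return 0.0, False
    scores, offset = [], 0
    for part in parts:
        score, end = best_window(part, transcript[offset:])
        scores.append(score)
        offset += end  # search the next part after the previous match
    average = sum(scores) / len(scores)
    return average, all(s >= THRESHOLD for s in scores)


transcript = (
    "Interviewer: Which payment methods do you need? "
    "Stakeholder: Paypal and credit card would be enough."
)
print(verify_quote("Which payment methods ... Stakeholder: Paypal", transcript))
```

The sliding window is quadratic in the worst case, which is acceptable for a sketch but slow on long transcripts; a dedicated fuzzy matching library is the better choice for real runs.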

### Data Overview
| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `results/single_results.csv` | Source of requirements and quotes to verify |
| Input | `results/meta_results.csv` | Source of meta prompt results and quotes |
| Input | `transcripts/` | Folder with original interview transcripts (JSON) |


In [None]:
from verify_quotes import main

main()

### Terminal output
```
Verification of Quotes for: single_results.csv

Result: Found 5 quotes that failed the 80% similarity threshold.

Identifier                              Row  Column    Match Score    Quote
------------------------------------  -----  --------  -------------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ba13038b-352a-4fd1-a2bc-78828412ab09      5  Q         66%            'Stakeholder: Dieses Jahr noch'
ba13038b-352a-4fd1-a2bc-78828412ab09      5  U         78%            'Stakeholder: Innovation klingt immer gut.'
3cca3e81-59a7-4543-97dc-5b4f945d3326     17  AI        50%            'Would you like real-time updates on equipment availability ... Stakeholder: Yes'
3cca3e81-59a7-4543-97dc-5b4f945d3326     17  AK        50%            'Would you like real-time updates on equipment availability or integration with the booking system ... Stakeholder: Yes'
9b79199f-6e38-4234-8be6-1bfa7068bfa1     22  O         57%            'Interviewer: Do you plan to handle the content creation and scheduling for these posts internally, or would you be interested in having features within the system to assist... Stakeholder: Internally'
____________________________________________________________________________________________________

Verification of Quotes for: meta_results.csv

Result: Found 34 quotes that failed the 80% similarity threshold.

Identifier                              Row  Column    Match Score    Quote
------------------------------------  -----  --------  -------------  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
77b8b3ae-8c49-4ef3-9cc7-81965d1ab986      6  F         77%            'Stakeholder: Can we talk in German?'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  N         76%            'Stakeholder: Wir bieten ein Resort Hotels'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  P         65%            'Stakeholder: Wir bieten ... Ski-Ausleihe ...'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  R         65%            'Stakeholder: Wir bieten ... Ski-Trainigings an'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  AD        66%            'Stakeholder: Dieses Jahr noch'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  AL        64%            'Stakeholder: Wir möchten komplette digitalisieren + eine zentrale datenbank\nStakeholder: komplette neue lösung. Ich möchte nicht mehr mit fax und telefon arbeiteb'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  AR        69%            'Stakeholder: Ja ich möchte ... Zahluingen ermöglichen ...'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  AT        69%            'Stakeholder: Ja ich möchte ... eine digitale verwaltung über alles inventar'
ba13038b-352a-4fd1-a2bc-78828412ab09     13  BD        78%            'Stakeholder: Vor-Ort-Training wäre uns am liebsten'
5267f55a-9337-49b3-9747-d05c69c877f0     14  V         76%            'Stakeholder: Email would be enough'
447f8927-ecf4-4fe5-8396-f3ed2cf84f75     19  AF        74%            'Stakeholder: Paypal'
5ef30849-40da-4787-aea5-6ce30d8316b0     32  AP        53%            'For customer feedback, would you prefer the system to automatically request feedback after appointments and compile reports for review? And regarding staff performance... Stakeholder: no'
5ef30849-40da-4787-aea5-6ce30d8316b0     34  T         77%            'Stakeholder: i want to do Fast Accounts'
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AD        78%            'Stakeholder: Yes but no loyalty programs'
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AJ        69%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) Keep track of supplies and notify her/him when supplies are running out and should be ordered Provide data in form of reports to run the business more efficiently\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n1. Booking System for Clients: Allow clients to seamlessly make appointments online, with automated confirmations and reminders.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AL        69%            "Interviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n1. Booking System for Clients: Allow clients to seamlessly make appointments online, with automated confirmations and reminders.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AN        69%            "Interviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n1. Booking System for Clients: Allow clients to seamlessly make appointments online, with automated confirmations and reminders.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AP        69%            "Interviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n1. Booking System for Clients: Allow clients to seamlessly make appointments online, with automated confirmations and reminders.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AR        72%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) ...\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n2. Business Record Management: Integrate with Fast Accounts to keep track of business records such as receipts, expenses, wages, and GST.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AT        72%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) ...\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n2. Business Record Management: Integrate with Fast Accounts to keep track of business records such as receipts, expenses, wages, and GST.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AV        72%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) ...\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n2. Business Record Management: Integrate with Fast Accounts to keep track of business records such as receipts, expenses, wages, and GST.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AX        72%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) ...\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n2. Business Record Management: Integrate with Fast Accounts to keep track of business records such as receipts, expenses, wages, and GST.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  AZ        69%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) Keep track of supplies and notify her/him when supplies are running out and should be ordered Provide data in form of reports to run the business more efficiently\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n3. Inventory Management: Track supplies and provide notifications when they are running low and need to be reordered.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  BB        69%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) Keep track of supplies and notify her/him when supplies are running out and should be ordered ...\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n3. Inventory Management: Track supplies and provide notifications when they are running low and need to be reordered.\n...\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  BD        76%            "Stakeholder: so does the system Offer a booking system for clients to make appointments Keep track of her/his business records (receipts, expenses, wages, GST, etc.) Keep track of supplies and notify her/him when supplies are running out and should be ordered Provide data in form of reports to run the business more efficiently\nInterviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n4. Data Reports for Business Efficiency: Provide detailed reports and analytics to help you run the business more efficiently, potentially including demand forecasting and staff scheduling recommendations.\nStakeholder: ok thats all i need"
5ef30849-40da-4787-aea5-6ce30d8316b0     34  BF        75%            "Interviewer: Yes, based on the requirements we've discussed, the system should offer the following capabilities:\n4. Data Reports for Business Efficiency: Provide detailed reports and analytics to help you run the business more efficiently, potentially including demand forecasting and staff scheduling recommendations.\nStakeholder: ok thats all i need"
3cca3e81-59a7-4543-97dc-5b4f945d3326     48  P         78%            'Stakeholder: I want a Social Media Connection\n...\nInterviewer: Understood, so the goal is to utilize social media to attract a larger customer base. Would you like specific social media platforms integrated for promotions, or are you aiming for general sharing and connectivity features across major platforms?\nStakeholder: Yes to the last point'
3cca3e81-59a7-4543-97dc-5b4f945d3326     48  V         79%            'Stakeholder: I want a centralized database'
3cca3e81-59a7-4543-97dc-5b4f945d3326     48  Z         72%            'Stakeholder: I also want a feature where customers get a forecast avojt the wether\nInterviewer: Would you prefer a daily overview, a week-long forecast, or real-time updates integrated into the booking process?\nStakeholder: Daily would be fine'
3cca3e81-59a7-4543-97dc-5b4f945d3326     48  AV        78%            'Stakeholder: I want a Social Media Connection\n...\nStakeholder: No it is mainly Abort advertising'
3cca3e81-59a7-4543-97dc-5b4f945d3326     48  BH        62%            'Stakeholder: I own 3 Resorts for skiing\nStakeholder: I want a centralized database'
3cca3e81-59a7-4543-97dc-5b4f945d3326     49  R         79%            'Stakeholder: I want a centralized database'
3cca3e81-59a7-4543-97dc-5b4f945d3326     49  AP        76%            'Stakeholder: I do everything manually'
9b79199f-6e38-4234-8be6-1bfa7068bfa1     63  AX        75%            'Stakeholder: I would use an SQL database'
```

Note that in our experiment several quotes could not be linked back automatically. Possible reasons are that they were too short, were summarized, or contain an ellipsis. Even though the LLM was instructed to be careful with this kind of output because it weakens traceability, it still produces it in some cases. The terminal output makes it possible to assess these cases manually. In our research, the 39 flagged quotes (5 in `single_results.csv` and 34 in `meta_results.csv`) could be traced back to their respective transcripts.

# Step 4: Limit amount of requirements per transcript
Even though the LLM in step 1 was instructed not to elicit more than 30 requirements per transcript, it sometimes still does so (LLMs are not good at counting). In this step we automatically remove the excess requirements.

### Data Overview
| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `results/single_results.csv` | Original CSV file to be trimmed up to the column `R30_QT` |
| Input | `results/meta_results.csv` | Original CSV file to be trimmed up to the column `R30_QT` |
| Output | `results/single_results.csv` | Trimmed version of the same file with columns beyond `R30_QT` removed |
| Output | `results/meta_results.csv` | Trimmed version of the same file with columns beyond `R30_QT` removed |

In [None]:
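import importlib

# The file name contains a hyphen, so a plain `from ... import` statement would be a syntax error.
main = importlib.import_module("format_limit-requirements").main

main()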
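
As a rough illustration of the trimming described in this step, the sketch below keeps every column up to and including `R30_QT`. It assumes that the column is identified by its name in the header row; the other column names in the sample (`ID`, `R30_REQ`, `R31_REQ`, `R31_QT`) and the function name `trim_columns` are made up for the example and may not match the real files or script.

```python
import csv
import io


def trim_columns(rows, last_column="R30_QT"):
    """Keep all columns up to and including `last_column` (found in the header)."""
    rows = list(rows)
    if not rows or last_column not in rows[0]:
        return rows  # nothing to trim
    keep = rows[0].index(last_column) + 1
    return [row[:keep] for row in rows]


# Hypothetical sample data; only the name R30_QT comes from the tables above.
sample = "ID,R30_REQ,R30_QT,R31_REQ,R31_QT\nabc,req,quote,extra,extra quote\n"
trimmed = trim_columns(csv.reader(io.StringIO(sample)))
print(trimmed)  # [['ID', 'R30_REQ', 'R30_QT'], ['abc', 'req', 'quote']]
```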

# Step 5: Match elicited quotes to the ground truth dataset
In this step we go through all the elicited quotes and use an LLM to try to link them to the ground truth requirements dataset.

### Data Overview
| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `.env` | Stores Azure OpenAI credentials and deployment settings |
| Input | `requirements_list.csv` | Master list of official requirements used for exact matching |
| Input | `scenarios_list.csv` | Maps interview IDs to their corresponding scenarios |
| Input | `results/single_results.csv` | Elicited requirements extracted from the single prompt workflow |
| Input | `results/meta_results.csv` | Elicited requirements extracted from the meta prompt workflow |
| Input | `transcripts/` | JSON interview transcripts used for hallucination and context checks |
| Output | `results/analysis_single.csv` | LLM-generated structured analysis for single prompt workflow |
| Output | `results/analysis_meta.csv` | LLM-generated structured analysis for meta prompt workflow |
| Output | `progress.log` | Log file recording progress, warnings, and errors |



### Terminal output

When we run `llm-call_match-requirements.py`, we are first asked to select which file to process:
```
Select an option:
[1] Run for single_results.csv
[2] Run for meta_results.csv
[3] Run for both
Enter choice (1, 2, or 3):
```

If everything is configured correctly, you’ll see a message like:
```
2025-10-21 12:00:05,123 - INFO - Azure OpenAI client created and credentials verified.
```

Or an error message like:
```
2025-10-21 12:00:05,456 - ERROR - Failed to create Azure OpenAI client. Check .env file and credentials. Error: ...
2025-10-21 12:00:05,457 - ERROR - LLM client could not be initialized. Exiting.
```

Then you will see a message like (in case of single_results.csv mode):
```
2025-10-21 12:00:06,000 - INFO - --- Starting processing for single_results.csv ---
2025-10-21 12:00:06,001 - INFO - Extracted requirements for 3 unique interviews/iterations.
2025-10-21 12:00:06,002 - INFO - Initial empty analysis file created: results/analysis_single.csv
Processing single_results.csv: 0%|          | 0/3 [00:00<?, ?it/s]
```

Then for each interview entry:
```
2025-10-21 12:00:06,123 - INFO - Saved results for 3a6077fa-20e9-4eb9-b590-00e5dde74ca3
2025-10-21 12:00:10,789 - INFO - Saved results for 5c6e9b22-1b45-42b3-973f-16b3f06a5fdd
2025-10-21 12:00:12,234 - INFO - Sorting final output file...
2025-10-21 12:00:12,235 - INFO - Successfully finished processing for single_results.csv
2025-10-21 12:00:12,236 - INFO - Script finished.
```

If you select meta_results.csv mode instead, the log will look similar but include iteration information:
```
2025-10-21 12:00:06,000 - INFO - --- Starting processing for meta_results.csv ---
2025-10-21 12:00:06,001 - INFO - Extracted requirements for 5 unique interviews/iterations.
2025-10-21 12:00:06,002 - INFO - Initial empty analysis file created: results/analysis_meta.csv
Processing meta_results.csv: 0%|          | 0/5 [00:00<?, ?it/s]
2025-10-21 12:00:08,456 - INFO - Saved results for 3a6077fa-20e9-4eb9-b590-00e5dde74ca3 - Iteration 1
2025-10-21 12:00:11,001 - INFO - Saved results for 3a6077fa-20e9-4eb9-b590-00e5dde74ca3 - Iteration 2
2025-10-21 12:00:14,234 - INFO - Sorting final output file...
2025-10-21 12:00:14,235 - INFO - Successfully finished processing for meta_results.csv
2025-10-21 12:00:14,236 - INFO - Script finished.
```

If foundational files (the ground truth requirements txt list and csv file) are missing or empty, you will see:
```
2025-10-21 12:00:04,500 - ERROR - Could not load foundational data. Exiting.
```

If a CSV file is missing, the log will contain:
```
2025-10-21 12:00:05,234 - WARNING - Input file not found, skipping: results/single_results.csv
```

If a transcript is missing or invalid:
```
2025-10-21 12:00:06,345 - WARNING - Transcript not found: transcripts/3a6077fa-20e9-4eb9-b590-00e5dde74ca3.json
or
2025-10-21 12:00:07,111 - WARNING - Could not parse JSON for transcript: transcripts/invalid_file.json
```

If the LLM call fails repeatedly:
```
2025-10-21 12:00:08,222 - WARNING - Error on attempt 1: ...
2025-10-21 12:00:09,223 - WARNING - Error on attempt 2: ...
2025-10-21 12:00:10,224 - ERROR - Failed to get valid analysis from LLM after 3 attempts.
```
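
The retry behaviour visible in this log could look roughly like the sketch below. It is an assumption about the general pattern, not the script's actual code: `call_with_retries`, its parameters, and the fixed delay are hypothetical, and the script may log or wait differently.

```python
import logging
import time


def call_with_retries(call, max_attempts=3, delay_seconds=1.0):
    """Run `call()` up to `max_attempts` times; return its result or None."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            logging.warning("Error on attempt %d: %s", attempt, exc)
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    logging.error(
        "Failed to get valid analysis from LLM after %d attempts.", max_attempts
    )
    return None
```

A caller would wrap the model request in a function without arguments, for example `call_with_retries(lambda: request_analysis(prompt))`, where `request_analysis` stands for whatever function performs the LLM call.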

LLM call logs are also saved to local file `progress.log`.


# Step 6: Human assessment of LLM-made matches
In this step a human researcher assesses the matches made in the previous step using a GUI tool. The tool shows a requirement from the ground truth dataset together with all the elicited requirements the LLM deemed a match. This step should be performed by at least 3 researchers.

**IMPORTANT!** After each human assessment the following should be done manually:
1. Create folder `researcher1` in folder `results`.
2. Move files `analysis_meta_human.csv`, `analysis_single_human.csv` and `human_verification_state.json` into this folder.

For researcher 2, name the folder created in step 1 `researcher2`, for researcher 3 `researcher3`, and so on.

If these steps are skipped, the data of the previous researcher WILL be overwritten. In addition, this folder structure is required by the next step.
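
The manual steps above could also be scripted. The following is a minimal sketch, not part of the provided scripts: `archive_assessment` is a hypothetical helper that assumes the three files are located directly in the `results` folder, as listed in the Data Overview below.

```python
import shutil
from pathlib import Path

FILES = [
    "analysis_meta_human.csv",
    "analysis_single_human.csv",
    "human_verification_state.json",
]


def archive_assessment(results_dir, researcher_number):
    """Move one researcher's output files into <results_dir>/researcher<N>/."""
    results = Path(results_dir)
    target = results / f"researcher{researcher_number}"
    # Fail if the folder already exists, so earlier data is never overwritten.
    target.mkdir(parents=True, exist_ok=False)
    moved = []
    for name in FILES:
        source = results / name
        if source.exists():
            shutil.move(str(source), str(target / name))
            moved.append(name)
    return moved
```

Calling `archive_assessment("results", 1)` after the first researcher has finished would create `results/researcher1` and move the three files into it; a second call with the same number raises `FileExistsError` instead of overwriting anything.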

### Data Overview
| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `requirements_list.csv` | Ground truth list of official requirements displayed for human reference |
| Input | `results/analysis_single.csv` | LLM analysis results for the single prompt workflow used for verification |
| Input | `results/analysis_meta.csv` | LLM analysis results for the meta prompt workflow used for verification |
| Input | `results/single_results.csv` | Original extracted requirements from the single prompt workflow |
| Input | `results/meta_results.csv` | Original extracted requirements from the meta prompt workflow |
| Output | `results/analysis_single_human.csv` | Human-validated version of the single analysis file |
| Output | `results/analysis_meta_human.csv` | Human-validated version of the meta analysis file |
| Output | `results/human_verification_state.json` | Stores user progress and decisions (Yes/No) between sessions |


In [None]:
from verify_human_GUI import main

main()

# Step 7: Combine human assessments
In this step we combine the assessments of the 3 (or more) researchers into 2 consensus CSV files.

### Data Overview
Note that the `*` in `researcher*` stands for an ascending researcher number (`researcher1`, `researcher2`, ...).

| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `results/researcher*/analysis_single_human.csv` | Individual human-validated analysis files for the single prompt workflow from multiple researchers |
| Input | `results/researcher*/analysis_meta_human.csv` | Individual human-validated analysis files for the meta prompt workflow from multiple researchers |
| Output | `results/combined_analysis_single_human.csv` | Consensus file combining all researcher single analyses using a two-thirds majority rule |
| Output | `results/combined_analysis_meta_human.csv` | Consensus file combining all researcher meta analyses using a two-thirds majority rule |


In [None]:
from combine_human_assessments import main
main()

### Terminal output

```
--- Starting analysis for: analysis_single_human.csv ---
Found 3 researcher files to process:
 - results/researcher1/analysis_single_human.csv
 - results/researcher2/analysis_single_human.csv
 - results/researcher3/analysis_single_human.csv
----------------------------------------
Using a 2/3 majority rule.
Agreement threshold set to 2 out of 3 researchers.

Processing complete!

============================================================
--- Summary for combined_analysis_single_human.csv ---
Combined analysis file created at: results/combined_analysis_single_human.csv
Total mismatches (cells with no majority): 9
Total individual entries removed (due to non-consensus): 76
Total individual entries kept (consensus): 312
============================================================

--- Starting analysis for: analysis_meta_human.csv ---
Found 3 researcher files to process:
 - results/researcher1/analysis_meta_human.csv
 - results/researcher2/analysis_meta_human.csv
 - results/researcher3/analysis_meta_human.csv
----------------------------------------
Using a 2/3 majority rule.
Agreement threshold set to 2 out of 3 researchers.

Processing complete!

============================================================
--- Summary for combined_analysis_meta_human.csv ---
Combined analysis file created at: results/combined_analysis_meta_human.csv
Total mismatches (cells with no majority): 8
Total individual entries removed (due to non-consensus): 88
Total individual entries kept (consensus): 1389
============================================================
```
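
The two-thirds majority rule reported in the log above can be sketched as follows. This is a minimal illustration, not the code of `combine_human_assessments.py`: the function names `agreement_threshold` and `consensus_value` are hypothetical, and the sketch assumes each cell holds a single value per researcher, which may differ from how the script splits cells into individual entries.

```python
from collections import Counter


def agreement_threshold(n_researchers: int) -> int:
    """Smallest number of researchers that forms a two-thirds majority.

    Integer arithmetic for ceil(2 * n / 3), e.g. 2 out of 3 researchers.
    """
    return (2 * n_researchers + 2) // 3


def consensus_value(values, threshold):
    """Return the value shared by at least `threshold` researchers, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= threshold else None


# Hypothetical assessments of one cell by three researchers.
threshold = agreement_threshold(3)
kept = consensus_value(["yes", "yes", "no"], threshold)
dropped = consensus_value(["a", "b", "c"], threshold)
print(threshold, kept, dropped)
```

With three researchers the threshold is 2, matching the `Agreement threshold set to 2 out of 3 researchers.` line in the log; a cell where all three disagree has no majority and would count as a mismatch.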

# Step 8: Create the confusion matrix
In this step we create the confusion matrix for both the LLM assessment and the LLM + human assessment.

### Data Overview
| Type | Folder / File | Purpose |
|------|----------------|----------|
| Input | `ground_truth/dataset_new.csv` | Ground truth dataset containing official requirement elicitation results |
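| Input | `results/analysis_single.csv` | LLM analysis file for the single prompt workflow |
| Input | `results/analysis_meta.csv` | LLM analysis file for the meta prompt workflow |
| Input | `results/combined_analysis_single_human.csv` | Consensus human-validated analysis file for the single prompt workflow |
| Input | `results/combined_analysis_meta_human.csv` | Consensus human-validated analysis file for the meta prompt workflow |
| Output | `results/confusion_analysis_single.txt` | Text report showing performance metrics for the LLM single analysis (TP, FP, FN, TN) |
| Output | `results/confusion_analysis_meta.txt` | Text report showing performance metrics for the LLM meta analysis per iteration |
| Output | `results/confusion_combined_analysis_single_human.txt` | Text report showing performance metrics for the LLM + human single analysis (TP, FP, FN, TN) |
| Output | `results/confusion_combined_analysis_meta_human.txt` | Text report showing performance metrics for the LLM + human meta analysis per iteration |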

### Explanation of confusion matrix
Note that in the table below, "human analysis" refers to whichever analysis file is being evaluated: either the LLM + human assessment or the LLM-only assessment (files `analysis_meta.csv` and `analysis_single.csv`).

| Term | Condition in the Code | Meaning | Example Scenario |
|------|------------------------|----------|------------------|
| **True Positive (TP)** | `if gt_is_elicited and human_is_elicited:` | The ground truth says the requirement was elicited, and the human analysis also marked it as elicited. | The ground truth has “R.SA.1 = yes” and the human filled in a value for R.SA.1. |
| **False Positive (FP)** | `elif not gt_is_elicited and human_is_elicited:` | The ground truth says the requirement was **not** elicited, but the human analysis incorrectly marked it as elicited. | The ground truth has “R.SA.1 = no” but the human filled in a value for R.SA.1. |
| **False Negative (FN)** | `elif gt_is_elicited and not human_is_elicited:` | The ground truth says the requirement was elicited, but the human analysis failed to identify it. | The ground truth has “R.SA.1 = yes” but the human left R.SA.1 blank. |
| **True Negative (TN)** | `elif not gt_is_elicited and not human_is_elicited:` | Both the ground truth and the human analysis agree that the requirement was **not** elicited. | The ground truth has “R.SA.1 = no” and the human left R.SA.1 blank. |
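
The four conditions in the table can be sketched as a small classification function. This is an illustration only: the flags `gt_is_elicited` and `human_is_elicited` are taken from the table, while the helper names `classify` and `count_outcomes` and the example data are hypothetical and not taken from `confusion_matrix.py`.

```python
from collections import Counter


def classify(gt_is_elicited: bool, human_is_elicited: bool) -> str:
    """Map one (ground truth, analysis) pair to a confusion-matrix cell."""
    if gt_is_elicited and human_is_elicited:
        return "TP"
    elif not gt_is_elicited and human_is_elicited:
        return "FP"
    elif gt_is_elicited and not human_is_elicited:
        return "FN"
    else:  # not gt_is_elicited and not human_is_elicited
        return "TN"


def count_outcomes(pairs):
    """Count TP/FP/FN/TN over an iterable of (gt, analysis) boolean pairs."""
    return Counter(classify(gt, human) for gt, human in pairs)


# Made-up example pairs: (ground truth elicited?, analysis value filled in?)
pairs = [(True, True), (False, True), (True, False), (False, False), (True, True)]
counts = count_outcomes(pairs)
print(dict(counts))
```

Summing these labels over every requirement and transcript would give totals of the kind reported as `Sum True Positives`, `Sum False Positives`, and so on in the terminal output.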



In [None]:
from confusion_matrix import main

main()

### Terminal output
```
Please select a file to analyze:
 1: analysis_single.csv
 2: combined_analysis_meta_human.csv
 3: analysis_meta.csv
 4: combined_analysis_single_human.csv
 0: All files

Enter the number of the file 1 to 4 or 0 for all: 0
Loading ground truth with robust method...
Loading file: dataset_new.csv
 -> No 'Iteration' column found. Using (ID, Scenario) key.
Ground truth loaded.


Selected file: results/analysis_single.csv

Loading analysis file with robust method...
Loading file: analysis_single.csv
 -> No 'Iteration' column found. Using (ID, Scenario) key.
Data loaded successfully.

Saving confusion matrix results to: results/confusion_analysis_single.txt

Single analysis file detected. Running combined analysis.
Comparing data...
Comparison complete.

--- Performance Metrics ---
Sum True Positives:  175
Sum False Positives: 7
Sum False Negatives: 312
Sum True Negatives:  52
-------------------------
Accuracy:  41.58%
Precision: 96.15%
Recall:    35.93%
F1 Score:  52.32%

============================================================


Selected file: results/combined_analysis_meta_human.csv

Loading analysis file with robust method...
Loading file: combined_analysis_meta_human.csv
 -> Found 'Iteration' column. Using (ID, Scenario, Iteration) key.
Data loaded successfully.

Saving confusion matrix results to: results/confusion_combined_analysis_meta_human.txt

Meta file detected. Running analysis for iterations 1, 2, and 3.
--- Comparing data for Iteration 1 ---
Comparison complete.

--- Performance Metrics (Iteration 1) ---
Sum True Positives:  120
Sum False Positives: 4
Sum False Negatives: 367
Sum True Negatives:  55
-------------------------
Accuracy:  32.05%
Precision: 96.77%
Recall:    24.64%
F1 Score:  39.28%

============================================================

--- Comparing data for Iteration 2 ---
Comparison complete.

--- Performance Metrics (Iteration 2) ---
Sum True Positives:  116
Sum False Positives: 5
Sum False Negatives: 371
Sum True Negatives:  54
-------------------------
Accuracy:  31.14%
Precision: 95.87%
Recall:    23.82%
F1 Score:  38.16%

============================================================

--- Comparing data for Iteration 3 ---
Comparison complete.

--- Performance Metrics (Iteration 3) ---
Sum True Positives:  116
Sum False Positives: 3
Sum False Negatives: 371
Sum True Negatives:  56
-------------------------
Accuracy:  31.50%
Precision: 97.48%
Recall:    23.82%
F1 Score:  38.28%

============================================================


Selected file: results/analysis_meta.csv

Loading analysis file with robust method...
Loading file: analysis_meta.csv
 -> Found 'Iteration' column. Using (ID, Scenario, Iteration) key.
Data loaded successfully.

Saving confusion matrix results to: results/confusion_analysis_meta.txt

Meta file detected. Running analysis for iterations 1, 2, and 3.
--- Comparing data for Iteration 1 ---
Comparison complete.

--- Performance Metrics (Iteration 1) ---
Sum True Positives:  132
Sum False Positives: 6
Sum False Negatives: 355
Sum True Negatives:  53
-------------------------
Accuracy:  33.88%
Precision: 95.65%
Recall:    27.10%
F1 Score:  42.24%

============================================================

--- Comparing data for Iteration 2 ---
Comparison complete.

--- Performance Metrics (Iteration 2) ---
Sum True Positives:  135
Sum False Positives: 7
Sum False Negatives: 352
Sum True Negatives:  52
-------------------------
Accuracy:  34.25%
Precision: 95.07%
Recall:    27.72%
F1 Score:  42.93%

============================================================

--- Comparing data for Iteration 3 ---
Comparison complete.

--- Performance Metrics (Iteration 3) ---
Sum True Positives:  134
Sum False Positives: 4
Sum False Negatives: 353
Sum True Negatives:  55
-------------------------
Accuracy:  34.62%
Precision: 97.10%
Recall:    27.52%
F1 Score:  42.88%

============================================================


Selected file: results/combined_analysis_single_human.csv

Loading analysis file with robust method...
Loading file: combined_analysis_single_human.csv
 -> No 'Iteration' column found. Using (ID, Scenario) key.
Data loaded successfully.

Saving confusion matrix results to: results/confusion_combined_analysis_single_human.txt

Single analysis file detected. Running combined analysis.
Comparing data...
Comparison complete.

--- Performance Metrics ---
Sum True Positives:  84
Sum False Positives: 2
Sum False Negatives: 403
Sum True Negatives:  57
-------------------------
Accuracy:  25.82%
Precision: 97.67%
Recall:    17.25%
F1 Score:  29.32%

============================================================
```
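
The reported percentages follow from the four sums using the standard definitions of accuracy, precision, recall and F1. The sketch below recomputes them for the first block of the log (`analysis_single.csv`); the function name `metrics` is hypothetical, and the zero-division guards are an assumption rather than a description of the script's behaviour.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix sums."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall, "F1 Score": f1}


# Sums taken from the analysis_single.csv block of the terminal output.
result = metrics(tp=175, fp=7, fn=312, tn=52)
for name, value in result.items():
    print(f"{name}: {value:.2%}")
```

For these inputs the sketch prints 41.58%, 96.15%, 35.93% and 52.32%, the same values as in the log. The pattern of very high precision with low recall holds across all four files: few requirements are marked as elicited when the ground truth says otherwise, but many elicited requirements are missed.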