### **1. Lists, Tuples, Sets, Dictionaries, and Frozensets**

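Python collections can be classified by **order** (whether elements have a position you can index) and **mutability** (whether you can change them after creation):

| Order | Mutable | Immutable |
| --- | --- | --- |
| Ordered | List | Tuple |
| Unordered | Set, Dictionary | Frozenset |

Note: since Python 3.7, dictionaries preserve insertion order, but their items are accessed by key rather than by position, so they are grouped with the unordered collections here.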
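
The classification above can be checked directly in code. The following is a minimal sketch, with all variable names chosen only for illustration: ordered types accept a positional index, sets do not, and the immutable types are hashable, so they can serve as dictionary keys.

```python
# Illustrative data (names are hypothetical, not from the notebook cells)
items_list = ["a", "b", "c"]
items_tuple = ("a", "b", "c")
items_set = {"a", "b", "c"}

# Ordered collections support positional indexing
print(items_list[0], items_tuple[0])

# Sets have no positions, so indexing raises TypeError
try:
    items_set[0]
except TypeError as err:
    print("Set indexing failed:", err)

# Order matters when comparing lists, but not when comparing sets
lists_equal = [1, 2] == [2, 1]   # False
sets_equal = {1, 2} == {2, 1}    # True
print(lists_equal, sets_equal)

# Immutable collections are hashable and can be used as dictionary keys
lookup = {items_tuple: "tuple key", frozenset(items_set): "frozenset key"}
print(lookup[("a", "b", "c")])
```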

### **2. Generic Python Explanation & Examples**

#### **2.1 Ordered Mutable – List**
- Can be changed after creation.
- Keeps elements in the order you add them.

In [36]:
# List
responses = ["Yes", "No", "Maybe"]

responses.append("Not Sure")

print("Responses", responses)

Responses ['Yes', 'No', 'Maybe', 'Not Sure']
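
Beyond `append`, a list can be changed at any position. The sketch below uses a small list of its own (the values are illustrative) to show insertion, item assignment, and removal.

```python
# A separate illustrative list, so this cell does not depend on earlier ones
answers = ["Yes", "No"]

answers.insert(1, "Not Sure")   # insert at position 1
answers[0] = "Definitely"       # replace the first element in place
removed = answers.pop()         # remove and return the last element

print("Answers:", answers)      # ['Definitely', 'Not Sure']
print("Removed:", removed)      # No
```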


#### **2.2 Ordered Immutable – Tuple**
- Cannot be changed after creation.
- Keeps elements in the order you define them.

In [37]:
# Tuple
coordinates = (10.5, 20.1)

# coordinates[0] = 11.5  # TypeError: 'tuple' object does not support item assignment

print(coordinates)


(10.5, 20.1)
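
To see the immutability at run time rather than in a comment, the assignment can be wrapped in `try`/`except`. This is a small sketch; the variable names `x`, `y`, and `updated` are introduced here for illustration.

```python
coordinates = (10.5, 20.1)

# Item assignment on a tuple raises TypeError
try:
    coordinates[0] = 11.5
except TypeError as err:
    print("Cannot modify tuple:", err)

# Unpacking reads the values by position
x, y = coordinates

# To "change" a tuple, build a new one instead
updated = (11.5, coordinates[1])
print(coordinates, updated)
```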


#### **2.3 Unordered Mutable – Set**
- Unique items only; duplicates are removed automatically.
- No guaranteed order, so the printed order may differ between runs.

In [38]:
# Set
models = {"gpt-5", "mistral", "deepseek-r1"}
models.add('local_llm')
print("Models: ", models)

Models:  {'gpt-5', 'mistral', 'local_llm', 'deepseek-r1'}


In [39]:
models.add("gpt-5")  # already present, so the set is unchanged
print("Models: ", models)

Models:  {'gpt-5', 'mistral', 'local_llm', 'deepseek-r1'}
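
Sets also support the usual set algebra, which is often the main reason to choose them. A minimal sketch, assuming a hypothetical `evaluated` set of models that have already been tested:

```python
models = {"gpt-5", "mistral", "deepseek-r1"}
evaluated = {"gpt-5", "mistral"}      # hypothetical: models already tested

pending = models - evaluated          # difference: in models but not evaluated
common = models & evaluated           # intersection: in both sets
all_models = models | {"local_llm"}   # union: everything from both sets

# sorted() gives a stable order for printing
print("Pending:", sorted(pending))    # ['deepseek-r1']
print("Common:", sorted(common))      # ['gpt-5', 'mistral']
print("Is 'mistral' known?", "mistral" in models)
```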


#### **2.4 Unordered Mutable – Dictionary**
- Key-value storage; keys must be unique.
- Items are accessed by key, not by position (insertion order is preserved since Python 3.7).

In [40]:
# Dictionary
metrics = {"faithfulness": 0.95, "relevance": 0.89}
print("Metrics: ", metrics)

# Update "relevance"
metrics["relevance"] = 0.91
print("Updated Metrics: ", metrics)

Metrics:  {'faithfulness': 0.95, 'relevance': 0.89}
Updated Metrics:  {'faithfulness': 0.95, 'relevance': 0.91}
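
A few more dictionary operations come up constantly: safe lookup with a default, adding a key, iterating over pairs, and deleting a key. The sketch below uses a `"bias"` key purely as an example.

```python
metrics = {"faithfulness": 0.95, "relevance": 0.89}

# .get() returns a default instead of raising KeyError for a missing key
bias = metrics.get("bias", 0.0)

# Assigning to a new key adds it
metrics["bias"] = 0.1

# .items() yields (key, value) pairs
for name, value in metrics.items():
    print(f"{name}: {value}")

# del removes a key
del metrics["bias"]
print("bias" in metrics)   # False
```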


#### **2.5 Unordered Immutable – Frozenset**
- A set that cannot be changed after creation.

In [41]:
allowed_metrics = frozenset(["faithfulness", "relevance", "bias"])

# Any attempt to modify 'allowed_metrics' fails, because frozenset has no mutating methods:
# allowed_metrics.add('hallucination')  # AttributeError: 'frozenset' object has no attribute 'add'

print("Allowed Metrics: ",allowed_metrics)

Allowed Metrics:  frozenset({'relevance', 'faithfulness', 'bias'})
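
A frozenset still supports read-only set operations; these return a new object and leave the original untouched. A short sketch, with `extended` as an illustrative name:

```python
allowed_metrics = frozenset(["faithfulness", "relevance", "bias"])

# Mutating methods such as add() do not exist on frozenset
try:
    allowed_metrics.add("hallucination")
except AttributeError as err:
    print("Cannot modify frozenset:", err)

# Union builds a new frozenset; the original keeps its three items
extended = allowed_metrics | {"hallucination"}

print(len(allowed_metrics), len(extended))
print("bias" in allowed_metrics)
```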


### **3. Explanation Specific to AI Evaluation/Testing**

#### **Lists**
- Store multiple model outputs or evaluation scores for a single prompt.

In [42]:
hallucination_scores = [0.05, 0.1, 0.0, 0.07]
print("Hallucination Scores: ", hallucination_scores)

Hallucination Scores:  [0.05, 0.1, 0.0, 0.07]
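
Once scores are in a list, they can be summarized with built-ins and the standard `statistics` module. In this sketch the `threshold` value of 0.06 is an assumption chosen only to illustrate filtering.

```python
import statistics

hallucination_scores = [0.05, 0.1, 0.0, 0.07]

average = statistics.mean(hallucination_scores)
worst = max(hallucination_scores)

# Hypothetical threshold: flag any score above it
threshold = 0.06
flagged = [score for score in hallucination_scores if score > threshold]

print("Average:", round(average, 3))
print("Worst:", worst)
print("Flagged:", flagged)
```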


#### **Tuples**
- Store fixed configuration values like thresholds `(min_score, max_score)`.

In [43]:
score_range = (0.8, 1.0)
print("Minimum Score: ", score_range[0])
print("Maximum Score: ", score_range[1])

Minimum Score:  0.8
Maximum Score:  1.0
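
Tuple unpacking makes such a threshold pair easier to use than indexing. A minimal sketch, where the helper `in_range` is a hypothetical name introduced for this example:

```python
score_range = (0.8, 1.0)

def in_range(score, bounds):
    """Return True if score lies within the inclusive (low, high) bounds."""
    low, high = bounds          # unpack the tuple by position
    return low <= score <= high

print(in_range(0.85, score_range))  # True
print(in_range(0.5, score_range))   # False
```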


#### **Sets**
- Store unique prompts tested to avoid duplicate evaluations.

In [44]:
test_prompts = {"What is AI?", "Define machine learning"}
print("Test Prompts: ", test_prompts)

Test Prompts:  {'What is AI?', 'Define machine learning'}
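
In practice, prompts usually arrive as a list that may contain repeats. The sketch below assumes a hypothetical `raw_prompts` list and uses a set to skip prompts that were already seen.

```python
# Hypothetical incoming prompts, with one repeat
raw_prompts = ["What is AI?", "Define machine learning", "What is AI?"]

test_prompts = set()
duplicates = []

for prompt in raw_prompts:
    if prompt in test_prompts:      # fast membership check on a set
        duplicates.append(prompt)
    else:
        test_prompts.add(prompt)

print("Unique prompts:", len(test_prompts))
print("Skipped duplicates:", duplicates)
```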


#### **Dictionaries**
- Map prompt IDs to multiple evaluation metrics.

In [45]:
import pprint

eval_results = {
    "TC_001" : {"faithfulness": 0.8, "bias": 0.2},
    "TC_002" : {"faithfulness": 0.9, "bias": 0.1},
    "TC_003" : {"faithfulness": 0.7, "bias": 0.5}
}

print(eval_results)
print("\n")
pprint.pprint(eval_results)

{'TC_001': {'faithfulness': 0.8, 'bias': 0.2}, 'TC_002': {'faithfulness': 0.9, 'bias': 0.1}, 'TC_003': {'faithfulness': 0.7, 'bias': 0.5}}


{'TC_001': {'bias': 0.2, 'faithfulness': 0.8},
 'TC_002': {'bias': 0.1, 'faithfulness': 0.9},
 'TC_003': {'bias': 0.5, 'faithfulness': 0.7}}
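
A nested dictionary like this can be filtered to find failing test cases. The pass criteria below (faithfulness of at least 0.8 and bias of at most 0.2) are assumptions made for the example, not values from the notebook.

```python
eval_results = {
    "TC_001": {"faithfulness": 0.8, "bias": 0.2},
    "TC_002": {"faithfulness": 0.9, "bias": 0.1},
    "TC_003": {"faithfulness": 0.7, "bias": 0.5},
}

# Hypothetical pass criteria
MIN_FAITHFULNESS = 0.8
MAX_BIAS = 0.2

failing = [
    test_id
    for test_id, metrics in eval_results.items()
    if metrics["faithfulness"] < MIN_FAITHFULNESS or metrics["bias"] > MAX_BIAS
]

print("Failing test cases:", failing)  # ['TC_003']
```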


#### **Frozensets**
- Store fixed allowed metric names to prevent accidental modification in large-scale evaluation pipelines.

In [46]:
required_metrics = frozenset(["faithfulness", "relevance"])
# required_metrics.add("bias")  # AttributeError: 'frozenset' object has no attribute 'add'

print("Required Metrics: ", required_metrics)

Required Metrics:  frozenset({'relevance', 'faithfulness'})
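
A fixed set of required names is handy for spotting metrics that a result failed to report. A minimal sketch, assuming a hypothetical `reported` result that lacks one metric:

```python
required_metrics = frozenset(["faithfulness", "relevance"])

# Hypothetical result that is missing "relevance"
reported = {"faithfulness": 0.9}

# difference() accepts any iterable; iterating a dict yields its keys
missing = required_metrics.difference(reported)

print("Missing metrics:", sorted(missing))  # ['relevance']
```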


### **4. AI Evaluation/Testing Exercise**

**Goal:**

Store evaluation results for multiple prompts, ensure no duplicate prompts, and check if all metrics are valid.

**Code-1:**

In [47]:
# Example scores for one question
metrics = {"faithfulness": 0.95, "relevance": 0.9}

# Allowed metric names
allowed_metrics = ["faithfulness", "relevance"]

# Check if all metrics are allowed
all_metrics_ok = True

for metric in metrics:  # loop through each metric name
    if metric not in allowed_metrics:
        all_metrics_ok = False

print("All metrics valid:", all_metrics_ok)


All metrics valid: True
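
The same check can be written without a loop by comparing sets. This is an alternative sketch, not a replacement for Code-1; the name `unexpected` is introduced here for illustration.

```python
metrics = {"faithfulness": 0.95, "relevance": 0.9}
allowed_metrics = ["faithfulness", "relevance"]

# set(metrics) contains the dictionary's keys
metric_names = set(metrics)

# <= tests whether every metric name is in the allowed set
all_metrics_ok = metric_names <= set(allowed_metrics)

# The difference lists any names that are not allowed
unexpected = metric_names - set(allowed_metrics)

print("All metrics valid:", all_metrics_ok)
print("Unexpected metrics:", unexpected)
```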


**Code-2:**

In [48]:
# Set: unique questions (no duplicates allowed)
questions = {"What is AI?", "Define machine learning"}

# Dictionary: scores for each question
scores = {
    "What is AI?": {"faithfulness": 0.95, "relevance": 0.9},
    "Define machine learning": {"faithfulness": 0.92, "relevance": 0.88}
}

# Frozenset: allowed metrics (like a set, but cannot be changed)
allowed_metrics = frozenset(["faithfulness", "relevance"])

# Tuple: example of fixed order data
example_tuple = ("AI", 2025)

# List: will store any invalid metric names found
invalid_metrics = []

# Loop through each question in the set
for question in questions:
    # Get the metrics dictionary for the current question
    metrics = scores[question]
    
    # Loop through each metric name (dictionary keys)
    for metric in metrics:
        if metric not in allowed_metrics:  # Compare with frozenset
            invalid_metrics.append(metric)  # Add to list if not allowed

# Output results
print("Questions (set):", questions)
print("Example tuple:", example_tuple)
print("Allowed metrics (frozenset):", allowed_metrics)
print("Invalid metrics found (list):", invalid_metrics)
print("All metrics valid:", len(invalid_metrics) == 0)


Questions (set): {'What is AI?', 'Define machine learning'}
Example tuple: ('AI', 2025)
Allowed metrics (frozenset): frozenset({'relevance', 'faithfulness'})
Invalid metrics found (list): []
All metrics valid: True
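
As a follow-up, the validation in Code-2 could be wrapped in a reusable function that reports which question has which invalid metric. This is one possible sketch: the function name `find_invalid_metrics` and the `"fluency"` metric are hypothetical, added so the output is not empty.

```python
def find_invalid_metrics(scores, allowed):
    """Map each question to its sorted invalid metric names (valid questions are omitted)."""
    problems = {}
    for question, metrics in scores.items():
        invalid = sorted(set(metrics) - allowed)
        if invalid:
            problems[question] = invalid
    return problems

allowed_metrics = frozenset(["faithfulness", "relevance"])

# Second question deliberately uses a metric name that is not allowed
scores = {
    "What is AI?": {"faithfulness": 0.95, "relevance": 0.9},
    "Define machine learning": {"faithfulness": 0.92, "fluency": 0.88},
}

problems = find_invalid_metrics(scores, allowed_metrics)
print("Problems:", problems)
print("All metrics valid:", not problems)
```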
