# Python Lists: Complete Guide for AI & Data Science

**Part of AI Trailblazer Python Foundations**

Lists are one of the most fundamental and versatile data structures in Python. They are essential for data manipulation, processing collections of data, and serving as the foundation for more advanced data structures used in AI and Data Science.

## Learning Objectives
By the end of this notebook, you will:
- Understand what lists are and when to use them
- Master list creation and manipulation
- Learn efficient list operations for data processing
- Apply lists to real-world AI/Data Science scenarios
- Understand performance implications

## 1. Introduction to Lists

A **list** is an ordered, mutable collection of items in Python. Lists can contain items of different data types and are defined using square brackets `[]`.

### Key Characteristics:
- **Ordered**: Items maintain their order
- **Mutable**: Can be modified after creation
- **Dynamic**: Can grow or shrink in size
- **Heterogeneous**: Can contain different data types
- **Allow Duplicates**: Same value can appear multiple times

In [None]:
# Simple list example
ml_metrics = ['accuracy', 'precision', 'recall', 'f1-score']
print("ML Metrics:", ml_metrics)
print("Type:", type(ml_metrics))
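
Each of the key characteristics listed above can be checked directly. The sketch below is only an illustration; the `run_log` list and its values are made up for this purpose.

```python
# Hypothetical mixed-type list used only to illustrate the five properties
run_log = ['epoch', 1, 0.25, True, 'epoch']

# Ordered: position is preserved, so index 0 is always the first item
first_item = run_log[0]

# Allow duplicates: 'epoch' appears twice
epoch_count = run_log.count('epoch')

# Heterogeneous: several distinct types live in the same list
type_names = [type(item).__name__ for item in run_log]

# Mutable and dynamic: change an item in place, then grow and shrink the list
run_log[1] = 2
run_log.append('done')
length_after_append = len(run_log)
run_log.pop()
length_after_pop = len(run_log)

print(first_item, epoch_count, type_names)
print(length_after_append, length_after_pop)
```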

## 2. Creating Lists

There are multiple ways to create lists in Python:

In [None]:
# Empty list
empty_list = []
empty_list_alt = list()

# List with initial values
model_scores = [0.85, 0.92, 0.78, 0.95]
print("Model Scores:", model_scores)

# Mixed data types
model_info = ['RandomForest', 0.89, True, 100]  # name, accuracy, is_trained, n_estimators
print("Model Info:", model_info)

# List from string
chars = list('Python')
print("Characters:", chars)

# List from range
epochs = list(range(1, 11))  # Training epochs 1 to 10
print("Epochs:", epochs)

# List with repeated elements
zeros = [0] * 5  # Initialize with zeros
print("Zeros:", zeros)
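
One caveat with the repetition form: `*` copies references, which is harmless for numbers but surprising for nested lists. A minimal sketch, with hypothetical variable names, of the pitfall and the usual workaround:

```python
# Repeating an outer list copies references to the SAME inner list
aliased_grid = [[0] * 3] * 2
aliased_grid[0][0] = 1          # shows up in both rows

# A comprehension builds a fresh inner list for each row
independent_grid = [[0] * 3 for _ in range(2)]
independent_grid[0][0] = 1      # changes only the first row

print("Aliased rows:    ", aliased_grid)
print("Independent rows:", independent_grid)
```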

## 3. Accessing List Elements

### 3.1 Indexing
Lists use zero-based indexing. Negative indices count from the end.

In [None]:
algorithms = ['KNN', 'SVM', 'Decision Tree', 'Random Forest', 'Neural Network']

# Positive indexing (from start)
print("First algorithm:", algorithms[0])
print("Third algorithm:", algorithms[2])

# Negative indexing (from end)
print("Last algorithm:", algorithms[-1])
print("Second to last:", algorithms[-2])

# Length of list
print("Total algorithms:", len(algorithms))

### 3.2 Slicing
Extract sublists using the syntax: `list[start:stop:step]`

In [None]:
data_points = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Basic slicing
print("First 3 elements:", data_points[:3])
print("Last 3 elements:", data_points[-3:])
print("Middle elements (index 3 to 6):", data_points[3:7])  # stop index 7 is excluded

# Step slicing
print("Every 2nd element:", data_points[::2])
print("Every 3rd element from index 1:", data_points[1::3])

# Reverse list
print("Reversed:", data_points[::-1])

# Copy entire list
copy_data = data_points[:]
print("Copy:", copy_data)
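
Slicing and indexing treat out-of-range positions differently, which is worth seeing once. The following sketch uses a made-up `readings` list:

```python
readings = [10, 20, 30, 40, 50]

# Slices are clamped to the list bounds instead of raising an error
beyond_end = readings[3:100]
empty_slice = readings[10:20]

# Indexing past the end raises IndexError
try:
    readings[10]
    index_error_raised = False
except IndexError:
    index_error_raised = True

# With a negative step the slice walks backwards from start towards stop
last_three_reversed = readings[-1:-4:-1]

print(beyond_end, empty_slice, index_error_raised, last_three_reversed)
```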

## 4. Modifying Lists

Lists are mutable, meaning we can change their content after creation.

In [None]:
# Update single element
accuracy_scores = [0.75, 0.82, 0.88, 0.91]
print("Original:", accuracy_scores)

accuracy_scores[1] = 0.85  # Update second score
print("After update:", accuracy_scores)

# Update multiple elements using slicing
accuracy_scores[2:4] = [0.90, 0.93]
print("After multiple updates:", accuracy_scores)
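
Slice assignment does not have to preserve the length of the list. As a small sketch (the `scores` values are illustrative only), a slice can be replaced by more or fewer items:

```python
scores = [0.75, 0.82, 0.88, 0.91]

# Replace two elements with three: the list grows by one
scores[1:3] = [0.80, 0.85, 0.90]
after_grow = scores.copy()

# Assigning an empty list to a slice deletes that range
scores[0:2] = []
after_shrink = scores.copy()

# Assigning to a zero-width slice inserts without replacing anything
scores[1:1] = [0.87]
after_insert = scores.copy()

print(after_grow)
print(after_shrink)
print(after_insert)
```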

## 5. List Methods

Python provides many built-in methods for list manipulation.

### 5.1 Adding Elements

In [None]:
# append() - Add single element to end
features = ['age', 'income', 'education']
features.append('experience')
print("After append:", features)

# extend() - Add multiple elements
features.extend(['location', 'gender'])
print("After extend:", features)

# insert() - Add element at specific position
features.insert(1, 'name')  # Insert at index 1
print("After insert:", features)

# Using + operator (creates new list)
more_features = features + ['occupation', 'zipcode']
print("Using + operator:", more_features)
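
A common source of confusion is passing a list to `append()` when `extend()` was intended. The sketch below, using hypothetical column names, contrasts the two:

```python
base = ['age', 'income']
new_columns = ['education', 'experience']

# append() adds its argument as ONE element, so the list ends up nested
appended = base.copy()
appended.append(new_columns)

# extend() adds each item of the iterable individually
extended = base.copy()
extended.extend(new_columns)

# extend() accepts any iterable; a string is iterated character by character
chars_added = base.copy()
chars_added.extend('ab')

print(appended)
print(extended)
print(chars_added)
```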

### 5.2 Removing Elements

In [None]:
# remove() - Remove first occurrence of value
labels = ['spam', 'ham', 'spam', 'ham', 'spam']
labels.remove('spam')  # Removes first 'spam'
print("After remove:", labels)

# pop() - Remove and return element at index (default: last)
last_label = labels.pop()
print("Popped element:", last_label)
print("After pop:", labels)

second_label = labels.pop(1)  # Remove at index 1
print("Popped at index 1:", second_label)
print("After pop(1):", labels)

# del - Delete element(s) or entire list
del labels[0]
print("After del:", labels)

# clear() - Remove all elements
labels.clear()
print("After clear:", labels)
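
`remove()` raises `ValueError` when the value is absent and only removes the first match. A minimal sketch of how one might handle both cases (the label values are illustrative):

```python
labels = ['spam', 'ham', 'spam']

# Guard with `in` so remove() is only called when the value exists
if 'eggs' in labels:
    labels.remove('eggs')

# Alternatively, catch the exception
try:
    labels.remove('eggs')
    removed = True
except ValueError:
    removed = False

# Remove ALL occurrences by building a new list
without_spam = [label for label in labels if label != 'spam']

# del with a slice removes a range in one step
del labels[0:2]

print(removed, without_spam, labels)
```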

### 5.3 Searching and Counting

In [None]:
predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# count() - Count occurrences
positive_count = predictions.count(1)
negative_count = predictions.count(0)
print(f"Positive predictions: {positive_count}")
print(f"Negative predictions: {negative_count}")

# index() - Find first occurrence index
first_negative = predictions.index(0)
print(f"First negative prediction at index: {first_negative}")

# in operator - Check membership
print("Is 1 in predictions?", 1 in predictions)
print("Is 2 in predictions?", 2 in predictions)
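
`index()` returns only the first match and raises `ValueError` if there is none. The sketch below shows one way to find every position and to search safely; using `-1` as a "not found" marker is an assumption made here for illustration, not a Python convention for lists.

```python
predictions = [1, 0, 1, 1, 0]

# All positions of a value, not just the first
negative_positions = [i for i, p in enumerate(predictions) if p == 0]

# Guard with `in` to avoid ValueError; -1 is a hypothetical "not found" marker
missing_value = 2
position = predictions.index(missing_value) if missing_value in predictions else -1

# index() also accepts a start position to continue searching
second_negative = predictions.index(0, negative_positions[0] + 1)

print(negative_positions, position, second_negative)
```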

### 5.4 Sorting and Reversing

In [None]:
loss_values = [0.85, 0.42, 0.68, 0.31, 0.55, 0.29]

# sort() - Sort in place (modifies original)
loss_values.sort()
print("Sorted (ascending):", loss_values)

loss_values.sort(reverse=True)
print("Sorted (descending):", loss_values)

# sorted() - Return sorted copy (original unchanged)
original_scores = [0.75, 0.95, 0.82, 0.88]
sorted_scores = sorted(original_scores)
print("Original:", original_scores)
print("Sorted copy:", sorted_scores)

# reverse() - Reverse in place
epochs = [1, 2, 3, 4, 5]
epochs.reverse()
print("Reversed epochs:", epochs)

# reversed() - Return reversed iterator
print("Using reversed():", list(reversed([1, 2, 3, 4, 5])))
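
Both `sort()` and `sorted()` accept a `key` function, which is useful for sorting records by one field. A short sketch with made-up (model name, accuracy) pairs:

```python
# Hypothetical (model name, accuracy) pairs
results = [('SVM', 0.88), ('KNN', 0.75), ('Random Forest', 0.91)]

# key= picks the value to sort by; here the accuracy in position 1
by_accuracy = sorted(results, key=lambda pair: pair[1], reverse=True)
best_model = by_accuracy[0][0]

# Sort strings by length instead of alphabetically (ties keep their order)
names = [name for name, _ in results]
by_length = sorted(names, key=len)

# sort() works in place and returns None, so do not assign its result
return_value = names.sort()

print(best_model, by_length, return_value, names)
```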

## 6. List Comprehensions

List comprehensions provide a concise way to create lists. For simple transformations and filters they are often more readable, and usually somewhat faster, than an equivalent `for` loop that calls `append()`.

**Syntax**: `[expression for item in iterable if condition]`

In [None]:
# Basic list comprehension
squares = [x**2 for x in range(1, 11)]
print("Squares:", squares)

# With condition
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0]
print("Even squares:", even_squares)

# Transform data
temperatures_celsius = [0, 10, 20, 30, 40]
temperatures_fahrenheit = [(9/5) * temp + 32 for temp in temperatures_celsius]
print("Celsius:", temperatures_celsius)
print("Fahrenheit:", temperatures_fahrenheit)

# String manipulation
words = ['machine', 'learning', 'artificial', 'intelligence']
uppercase_words = [word.upper() for word in words]
print("Uppercase:", uppercase_words)

# Filtering
scores = [0.45, 0.78, 0.92, 0.65, 0.88, 0.71]
high_scores = [score for score in scores if score > 0.7]
print("High scores (>0.7):", high_scores)

### Advanced List Comprehensions

In [None]:
# Nested list comprehension
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [num for row in matrix for num in row]
print("Flattened matrix:", flattened)

# If-else in comprehension
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labeled = ['even' if n % 2 == 0 else 'odd' for n in numbers]
print("Labeled:", labeled)

# Multiple conditions (chained `if` clauses behave like `and`)
divisible_by_2_and_3 = [x for x in range(1, 31) if x % 2 == 0 if x % 3 == 0]
print("Divisible by 2 and 3:", divisible_by_2_and_3)

# Create every feature-target combination (Cartesian product: 3 x 3 = 9 pairs)
features = ['x1', 'x2', 'x3']
targets = ['y1', 'y2', 'y3']
pairs = [(f, t) for f in features for t in targets]
print("Feature-Target pairs (first 5):", pairs[:5])
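
It can help to read a multi-clause comprehension as the nested loops it stands for. The sketch below, with small illustrative inputs, shows the loop equivalent of a flattening comprehension and contrasts two `for` clauses with `zip()`:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# The for-clauses read left to right, like nested for-loops
flattened = [num for row in matrix for num in row]

flattened_loop = []
for row in matrix:
    for num in row:
        flattened_loop.append(num)

# Two for-clauses give every combination (a Cartesian product) ...
features = ['x1', 'x2']
targets = ['y1', 'y2']
all_pairs = [(f, t) for f in features for t in targets]

# ... while zip() pairs items position by position
matched_pairs = list(zip(features, targets))

print(flattened == flattened_loop, len(all_pairs), matched_pairs)
```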

## 7. Common List Operations

### 7.1 Concatenation and Repetition

In [None]:
# Concatenation with +
train_data = [1, 2, 3, 4, 5]
test_data = [6, 7, 8, 9, 10]
all_data = train_data + test_data
print("Combined data:", all_data)

# Repetition with *
batch = [0] * 5
print("Batch:", batch)

# Repeat pattern
pattern = [1, 0] * 3
print("Pattern:", pattern)

### 7.2 Copying Lists

**Important**: Assignment (`=`) creates a reference, not a copy!

In [None]:
# Reference (NOT a copy)
original = [1, 2, 3, 4, 5]
reference = original
reference[0] = 999
print("Original after modifying reference:", original)  # Original is also changed!

# Shallow copy methods
original = [1, 2, 3, 4, 5]

# Method 1: Using slice
copy1 = original[:]

# Method 2: Using list()
copy2 = list(original)

# Method 3: Using copy()
copy3 = original.copy()

copy1[0] = 999
print("Original after modifying copy:", original)  # Original unchanged
print("Copy:", copy1)

# Deep copy (for nested lists)
import copy
nested_original = [[1, 2], [3, 4]]
shallow = nested_original[:]
deep = copy.deepcopy(nested_original)

shallow[0][0] = 999  # Modifies original too!
deep[1][0] = 888     # Doesn't modify original

print("Nested original:", nested_original)
print("Deep copy:", deep)
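
The difference between shallow and deep copies comes down to which objects are shared. A minimal sketch using the `is` operator to check object identity:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)

# The outer lists are distinct objects in both cases
outer_is_shared = shallow is nested

# A shallow copy reuses the inner lists; a deep copy duplicates them
inner_shared_shallow = shallow[0] is nested[0]
inner_shared_deep = deep[0] is nested[0]

# Rebinding a slot in the shallow copy does not touch the original
shallow[1] = [30, 40]

print(outer_is_shared, inner_shared_shallow, inner_shared_deep, nested)
```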

### 7.3 List Aggregation Functions

In [None]:
metrics = [0.75, 0.82, 0.88, 0.91, 0.79, 0.85]

# Basic statistics
print("Count:", len(metrics))
print("Sum:", sum(metrics))
print("Average:", sum(metrics) / len(metrics))
print("Min:", min(metrics))
print("Max:", max(metrics))

# All and any
high_scores = [0.85, 0.92, 0.88, 0.95]
print("All scores > 0.80?", all(score > 0.80 for score in high_scores))
print("Any score > 0.90?", any(score > 0.90 for score in high_scores))

## 8. Nested Lists

Lists can contain other lists, creating multi-dimensional structures (like matrices).

In [None]:
# 2D list (matrix representation)
confusion_matrix = [
    [50, 10],  # Actual Negative: [TN, FP]
    [5, 35]    # Actual Positive: [FN, TP]
]

print("Confusion Matrix:")
for row in confusion_matrix:
    print(row)

# Accessing nested elements
true_positives = confusion_matrix[1][1]
false_positives = confusion_matrix[0][1]
print(f"\nTrue Positives: {true_positives}")
print(f"False Positives: {false_positives}")

# Dataset structure
dataset = [
    ['Name', 'Age', 'Score'],
    ['Alice', 25, 0.92],
    ['Bob', 30, 0.85],
    ['Charlie', 28, 0.88]
]

print("\nDataset:")
for record in dataset:
    print(record)

# Creating matrix with comprehension
identity_matrix = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
print("\nIdentity Matrix:")
for row in identity_matrix:
    print(row)
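
Nested lists are stored row by row, so working with columns takes a little extra effort. The following sketch reuses the confusion matrix values from above to extract a column, transpose, and total the rows:

```python
confusion_matrix = [
    [50, 10],  # Actual Negative: [TN, FP]
    [5, 35],   # Actual Positive: [FN, TP]
]

# Extract a column by taking the same index from every row
predicted_positive = [row[1] for row in confusion_matrix]

# Transpose rows and columns with zip(*...)
transposed = [list(col) for col in zip(*confusion_matrix)]

# Row totals and grand total
row_totals = [sum(row) for row in confusion_matrix]
total = sum(row_totals)

print(predicted_positive, transposed, row_totals, total)
```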

## 9. Iterating Through Lists

There are several ways to loop through lists efficiently.

In [None]:
algorithms = ['Linear Regression', 'Logistic Regression', 'Decision Tree', 'Random Forest']

# Basic iteration
print("Method 1: Direct iteration")
for algo in algorithms:
    print(f"  - {algo}")

# With index using enumerate()
print("\nMethod 2: With index (enumerate)")
for idx, algo in enumerate(algorithms):
    print(f"  {idx + 1}. {algo}")

# Start enumerate at different number
print("\nMethod 3: Enumerate with start=1")
for idx, algo in enumerate(algorithms, start=1):
    print(f"  {idx}. {algo}")

# Iterate through multiple lists with zip()
print("\nMethod 4: Multiple lists (zip)")
accuracies = [0.75, 0.82, 0.88, 0.91]
for algo, acc in zip(algorithms, accuracies):
    print(f"  {algo}: {acc:.2f}")

# Iterate in reverse
print("\nMethod 5: Reverse iteration")
for algo in reversed(algorithms):
    print(f"  - {algo}")
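
One detail of `zip()` is easy to miss: it stops at the shortest input. The sketch below uses deliberately mismatched lists to show the truncation, and `itertools.zip_longest` as one way to keep every item:

```python
from itertools import zip_longest

algorithms = ['KNN', 'SVM', 'Decision Tree']
accuracies = [0.75, 0.82]  # one score is missing on purpose

# zip() stops at the shortest input, silently dropping 'Decision Tree'
paired = list(zip(algorithms, accuracies))

# zip_longest() keeps every item and fills the gaps with fillvalue
padded = list(zip_longest(algorithms, accuracies, fillvalue=None))

print(paired)
print(padded)
```

In Python 3.10 and later, `zip(..., strict=True)` raises `ValueError` on a length mismatch instead of truncating.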

## 10. Practical Examples for AI/Data Science

### Example 1: Data Preprocessing

In [None]:
# Clean and normalize data
raw_data = [100, 200, -1, 150, 999, 180, -1, 220]  # -1 represents missing values

# Remove missing values
clean_data = [x for x in raw_data if x != -1]
print("Clean data:", clean_data)

# Normalize to 0-1 range
min_val = min(clean_data)
max_val = max(clean_data)
normalized = [(x - min_val) / (max_val - min_val) for x in clean_data]
print("Normalized:", [f"{x:.3f}" for x in normalized])

# Replace outliers (values > 900) with the median
import statistics
threshold = 900
median = statistics.median(clean_data)  # averages the two middle values for even-length data
cleaned_outliers = [median if x > threshold else x for x in clean_data]
print("After outlier replacement:", cleaned_outliers)

### Example 2: Train-Test Split

In [None]:
# Simple train-test split
data = list(range(1, 101))  # 100 data points
split_ratio = 0.8
split_index = int(len(data) * split_ratio)

train_set = data[:split_index]
test_set = data[split_index:]

print(f"Total data points: {len(data)}")
print(f"Training set size: {len(train_set)}")
print(f"Test set size: {len(test_set)}")
print(f"First 5 training samples: {train_set[:5]}")
print(f"First 5 test samples: {test_set[:5]}")
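
The split above keeps the data in its original order, which can bias the result if the data is sorted. A minimal sketch of shuffling first, assuming a fixed seed of 42 chosen here only for reproducibility:

```python
import random

data = list(range(1, 101))

# A seeded Random instance makes the shuffle reproducible
rng = random.Random(42)
shuffled = data[:]          # copy so the original order is preserved
rng.shuffle(shuffled)

split_index = int(len(shuffled) * 0.8)
train_set = shuffled[:split_index]
test_set = shuffled[split_index:]

print(len(train_set), len(test_set))
```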

### Example 3: Batch Processing

In [None]:
# Create batches for training
def create_batches(data, batch_size):
    """Split data into batches"""
    batches = []
    for i in range(0, len(data), batch_size):
        batches.append(data[i:i + batch_size])
    return batches

samples = list(range(1, 26))  # 25 samples
batches = create_batches(samples, batch_size=8)

print(f"Total samples: {len(samples)}")
print("Batch size: 8")
print(f"Number of batches: {len(batches)}")
for i, batch in enumerate(batches, 1):
    print(f"Batch {i}: {batch}")
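
For large datasets it may be preferable to produce batches one at a time rather than building the full list of batches. The generator below is a sketch of that idea; `iter_batches` is a hypothetical name, not part of any library.

```python
def iter_batches(data, batch_size):
    """Yield successive batches; the last one may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

samples = list(range(1, 26))  # 25 samples
batch_sizes = [len(batch) for batch in iter_batches(samples, 8)]
print("Batch sizes:", batch_sizes)
```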

### Example 4: Calculating Metrics

In [None]:
# Calculate accuracy, precision, recall
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# True Positives, False Positives, True Negatives, False Negatives
tp = sum([1 for true, pred in zip(y_true, y_pred) if true == 1 and pred == 1])
fp = sum([1 for true, pred in zip(y_true, y_pred) if true == 0 and pred == 1])
tn = sum([1 for true, pred in zip(y_true, y_pred) if true == 0 and pred == 0])
fn = sum([1 for true, pred in zip(y_true, y_pred) if true == 1 and pred == 0])

# Calculate metrics
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print("Confusion Matrix:")
print(f"  TP: {tp}, FP: {fp}")
print(f"  FN: {fn}, TN: {tn}")
print("\nMetrics:")
print(f"  Accuracy:  {accuracy:.3f}")
print(f"  Precision: {precision:.3f}")
print(f"  Recall:    {recall:.3f}")
print(f"  F1-Score:  {f1_score:.3f}")

### Example 5: Feature Engineering

In [None]:
# Create polynomial features
x = [1, 2, 3, 4, 5]

# Create features: x, x^2, x^3
features = [[val, val**2, val**3] for val in x]

print("Original values:", x)
print("\nPolynomial features [x, x^2, x^3]:")
for original, engineered in zip(x, features):
    print(f"  x={original}: {engineered}")

# One-hot encoding simulation
categories = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']
unique_categories = sorted(set(categories))  # sorted so the column order is the same on every run
print(f"\nUnique categories: {unique_categories}")

one_hot = []
for category in categories:
    encoding = [1 if cat == category else 0 for cat in unique_categories]
    one_hot.append(encoding)

print("\nOne-hot encoding:")
for original, encoded in zip(categories, one_hot):
    print(f"  {original}: {encoded}")

## 11. Performance Considerations

Understanding performance characteristics helps write efficient code.

In [None]:
import time

# List vs Generator for large datasets
n = 100000

# Using list comprehension (creates entire list in memory)
start = time.perf_counter()  # perf_counter() is the high-resolution clock intended for timing
squares_list = [x**2 for x in range(n)]
list_time = time.perf_counter() - start

# Using generator (lazy evaluation)
start = time.perf_counter()
squares_gen = (x**2 for x in range(n))
gen_time = time.perf_counter() - start

print(f"List comprehension time: {list_time:.6f} seconds")
print(f"Generator expression time: {gen_time:.6f} seconds")
print("\nNote: Creating the generator is nearly instant because no squares are computed until it is consumed")
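
Timing alone hides the main trade-off, which is memory. The sketch below compares container sizes with `sys.getsizeof` and shows that a generator can be consumed only once; exact byte counts depend on the Python build, so none are asserted here.

```python
import sys

n = 100_000
squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

# getsizeof reports the container only, not the integers it refers to
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

# A generator can be consumed once; sum() walks it without storing items
total = sum(squares_gen)
leftover = list(squares_gen)   # already exhausted, so this is empty

print(f"List container: {list_bytes} bytes, generator: {gen_bytes} bytes")
print(f"Items left in generator after sum(): {len(leftover)}")
```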

In [None]:
# append vs extend vs list comprehension
import time

def test_append(n):
    result = []
    for i in range(n):
        result.append(i)
    return result

def test_extend(n):
    result = []
    result.extend(range(n))
    return result

def test_list_comp(n):
    return [i for i in range(n)]

n = 10000

start = time.perf_counter()
test_append(n)
append_time = time.perf_counter() - start

start = time.perf_counter()
test_extend(n)
extend_time = time.perf_counter() - start

start = time.perf_counter()
test_list_comp(n)
comp_time = time.perf_counter() - start

print(f"append() time:         {append_time:.6f} seconds")
print(f"extend() time:         {extend_time:.6f} seconds")
print(f"list comprehension:    {comp_time:.6f} seconds")
print("\nA list comprehension is typically faster than an append() loop; extend() with an existing iterable is often faster still")

### Time Complexity of Common Operations

| Operation | Time Complexity | Example |
|-----------|----------------|----------|
| Access by index | O(1) | `lst[i]` |
| Append | O(1) amortized | `lst.append(x)` |
| Pop last | O(1) | `lst.pop()` |
| Pop at index | O(n) | `lst.pop(i)` |
| Insert | O(n) | `lst.insert(i, x)` |
| Delete | O(n) | `del lst[i]` |
| Search | O(n) | `x in lst` |
| Slice | O(k) | `lst[i:j]` (k = j-i) |
| Sort | O(n log n) | `lst.sort()` |

**Key Takeaways:**
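- Use `append()` instead of `insert(0, x)` when order doesn't matter: appending is O(1) amortized, while inserting at the front shifts every element
- Prefer a list comprehension over an `append()` loop when building a list from an iterable
- For large datasets, consider other data structures: `collections.deque` for frequent insertions/deletions at either end, and `set` or `dict` for frequent membership tests and lookups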
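
As a small illustration of the alternatives named in the takeaways, the sketch below uses `collections.deque` for front-of-queue operations and a `set` for membership tests; the variable names and sizes are made up.

```python
from collections import deque

# deque supports O(1) appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
first = queue.popleft()

# set membership is O(1) on average, versus O(n) for a list
seen_ids = set(range(100_000))
is_known = 99_999 in seen_ids

print(first, list(queue), is_known)
```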

## 12. Best Practices

### DO's:
1. ✅ Use list comprehensions for readable and efficient code
2. ✅ Use `enumerate()` when you need both index and value
3. ✅ Use `zip()` to iterate through multiple lists together
4. ✅ Use slicing for efficient sublist operations
5. ✅ Use `in` operator for membership testing
6. ✅ Make copies when you need independent lists
7. ✅ Use `extend()` instead of `+` for adding multiple elements to existing list

### DON'Ts:
1. ❌ Don't modify a list while iterating over it
2. ❌ Don't use `+` operator in a loop (creates new list each time)
3. ❌ Don't use lists for lookups when dict/set would be faster
4. ❌ Don't forget that assignment creates reference, not copy
5. ❌ Don't use mutable default arguments in functions

In [None]:
# Example: Common pitfall - Modifying list while iterating
# WRONG WAY
numbers = [1, 2, 3, 4, 5, 6]
# Don't do this:
# for num in numbers:
#     if num % 2 == 0:
#         numbers.remove(num)  # Can skip elements!

# RIGHT WAY 1: Create new list
numbers = [1, 2, 3, 4, 5, 6]
odd_numbers = [num for num in numbers if num % 2 != 0]
print("Odd numbers (comprehension):", odd_numbers)

# RIGHT WAY 2: Iterate over copy
numbers = [1, 2, 3, 4, 5, 6]
for num in numbers[:]:
    if num % 2 == 0:
        numbers.remove(num)
print("After removing evens:", numbers)
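
To see why removing items during iteration is unsafe, it helps to run the problematic loop on a small input. In this sketch every value is even, yet two of them survive:

```python
numbers = [2, 4, 6, 8]

# Removing during iteration shifts later items left, so the loop skips them
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)

print("Evens left behind:", numbers)
```

After 2 is removed, 4 moves into index 0 while the loop advances to index 1, so 4 (and later 8) are never visited.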

In [None]:
# Example: Mutable default argument pitfall
# WRONG WAY
def add_item_wrong(item, item_list=[]):  # DON'T DO THIS!
    item_list.append(item)
    return item_list

# This will have unexpected behavior:
list1 = add_item_wrong('a')
list2 = add_item_wrong('b')  # list2 contains both 'a' and 'b'!
print("Wrong way - list1:", list1)
print("Wrong way - list2:", list2)

# RIGHT WAY
def add_item_right(item, item_list=None):
    if item_list is None:
        item_list = []
    item_list.append(item)
    return item_list

list3 = add_item_right('a')
list4 = add_item_right('b')
print("\nRight way - list3:", list3)
print("Right way - list4:", list4)

## 13. Summary

In this notebook, we covered:

1. **List Basics**: Creation, characteristics, and types
2. **Accessing Elements**: Indexing and slicing
3. **Modifying Lists**: Adding, removing, and updating elements
4. **List Methods**: append, extend, insert, remove, pop, sort, reverse, etc.
5. **List Comprehensions**: Concise and efficient list creation
6. **Common Operations**: Concatenation, copying, aggregation
7. **Nested Lists**: Multi-dimensional structures
8. **Iteration**: Multiple ways to loop through lists
9. **Practical Examples**: Real-world AI/ML applications
10. **Performance**: Time complexity and optimization tips
11. **Best Practices**: Do's and don'ts for clean code

### Key Takeaways:
- Lists are **ordered, mutable, and versatile**
- **List comprehensions** are powerful and Pythonic
- Understanding **performance characteristics** helps write efficient code
- **Copy vs Reference** is crucial to avoid bugs
- Lists are fundamental for data processing in AI/ML workflows

### Next Steps:
- Practice with real datasets
- Learn about NumPy arrays for numerical computing
- Explore Pandas for advanced data manipulation
- Study other data structures: tuples, sets, dictionaries

---


## Practice Exercises

Try these exercises to reinforce your learning:

1. **Data Cleaning**: Given a list with missing values (represented as None), create a new list with missing values replaced by the mean of non-missing values.

2. **K-Fold Split**: Write a function that splits a dataset into k equal-sized folds for cross-validation.

3. **Moving Average**: Calculate a moving average with window size n for a time series list.

4. **Flatten Nested List**: Write a function to flatten a nested list of arbitrary depth.

5. **Top-K Elements**: Find the top k largest elements from a list without using the `sorted()` function.

*Solutions can be found in the repository's exercises folder.*