# Collections & Data Structures

## Lists

**Lists** are ordered, changeable collections. They are perfect for keeping track of sample names, concentrations, or a list of sequences.

- **Creation**: Use square brackets `[]`.
- **Ordering**: Items stay in the order you put them in.
- **Flexibility**: You can add, remove, or change items.

In [1]:
# Creating a list of samples
samples = ["Control_1", "Control_2", "Mutant_1", "Mutant_2"]

# Adding a new sample to the end
samples.append("Mutant_3")

# Changing a specific entry (e.g., if you realized Mutant_1 was contaminated)
samples[2] = "Mutant_1_Changed"

print(f"Current Sample List: {samples}")
print(f"Total number of samples: {len(samples)}")

Current Sample List: ['Control_1', 'Control_2', 'Mutant_1_Changed', 'Mutant_2', 'Mutant_3']
Total number of samples: 5


### 🧪 Useful List Methods

| Method | Action | Python Syntax Example |
| :--- | :--- | :--- |
| **`.append()`** | Adds one item to the end | `samples.append("Mouse_5")` |
| **`.insert()`** | Adds an item at a specific index | `samples.insert(0, "Standard")` |
| **`.remove()`** | Removes the first matching item | `samples.remove("Degraded_DNA")` |
| **`.pop()`** | Removes and returns the item at an index (the last item by default) | `next_up = samples.pop(0)` |
| **`.sort()`** | Sorts the list in place | `samples.sort()` |
| **`.reverse()`** | Reverses the list order in place | `samples.reverse()` |
| **`.extend()`** | Appends every item from another list | `plate1.extend(plate2)` |
| **`.clear()`** | Removes all items | `samples.clear()` |
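To see several of these methods working together, here is a minimal sketch. The sample names are hypothetical and chosen only for illustration; the comments show what each step leaves in the list.

```python
# Hypothetical sample names, used only for illustration
samples = ["Mouse_3", "Mouse_1", "Degraded_DNA", "Mouse_2"]

samples.insert(0, "Standard")    # add an item at the front
samples.remove("Degraded_DNA")   # drop the first matching item
next_up = samples.pop(0)         # remove AND return the item at index 0
samples.sort()                   # sort alphabetically, in place

print(next_up)   # Standard
print(samples)   # ['Mouse_1', 'Mouse_2', 'Mouse_3']

samples.reverse()                # reverse the order, in place
print(samples)   # ['Mouse_3', 'Mouse_2', 'Mouse_1']
```

Note that `.sort()` and `.reverse()` change the list itself and return `None`, so writing `samples = samples.sort()` would lose the data.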

A very common mistake is trying to combine two lists using `.append()`.

If you use `listA.append(listB)`, you get a list inside a list (a nested list).

If you use `listA.extend(listB)`, you get one long, continuous list.

`.extend()` is almost always what you want when merging data from different experimental runs!
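The difference is easiest to see side by side. This is a small sketch with made-up readings (`run_1` and `run_2` are hypothetical names); `.copy()` is used so both methods start from the same data.

```python
# Hypothetical readings from two experimental runs
run_1 = [45.2, 102.8]
run_2 = [74.3, 115.9]

# .append() adds run_2 as ONE item, producing a nested list
nested = run_1.copy()
nested.append(run_2)
print(nested)        # [45.2, 102.8, [74.3, 115.9]]
print(len(nested))   # 3

# .extend() adds each item of run_2, producing one flat list
flat = run_1.copy()
flat.extend(run_2)
print(flat)          # [45.2, 102.8, 74.3, 115.9]
print(len(flat))     # 4
```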

In [2]:
# An example of using the .append() method
# Start with an empty list to store results
sample = []

sample.append(45.2)
sample.append(102.8)
sample.append(63.7)

print(f"Final sample list: {sample}")

Final sample list: [45.2, 102.8, 63.7]


In [3]:
# An example of using the .extend() method
samples = [45.2, 102.8, 63.7]

# Samples from Day 2
samples2 = [74.3, 115.9]

# Use .extend() to merge the second list to the first one
samples.extend(samples2)

print(f"Master Sample List: {samples}")

Master Sample List: [45.2, 102.8, 63.7, 74.3, 115.9]


### ➕ Merging with the Plus Operator

While **`.extend()`** modifies the original list, the **`+`** operator creates a brand new list by joining two existing ones. This is particularly useful if you want to keep your original data separate and create a "Master List" for your analysis.

In [4]:
# Separate lists for different experimental groups
control_group = ["Control_A", "Control_B"]
treatment_group = ["Drug_10uM", "Drug_50uM"]

# Create a NEW list by adding them together
all_experimental_samples = control_group + treatment_group

print("Original Controls:", control_group)
print("Original Treatments:", treatment_group)
print("Combined Dataset:", all_experimental_samples)

Original Controls: ['Control_A', 'Control_B']
Original Treatments: ['Drug_10uM', 'Drug_50uM']
Combined Dataset: ['Control_A', 'Control_B', 'Drug_10uM', 'Drug_50uM']


As with strings, the elements of a list are accessed by index:

In [6]:
# sample is [45.2, 102.8, 63.7] at this point, so index 1 and index -2 refer to the same element
print(f"The second element of a list: {sample[1]}")
print(f"Next to the last element of a list: {sample[-2]}")

The second element of a list: 102.8
Next to the last element of a list: 102.8
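Positive and negative indices are easier to tell apart on a longer list. The sketch below uses a hypothetical list named `readings`; indexing starts at 0, and negative indices count back from the end.

```python
# Hypothetical list of five readings
readings = [45.2, 102.8, 63.7, 74.3, 115.9]

print(readings[0])    # 45.2   (first element)
print(readings[1])    # 102.8  (second element)
print(readings[-1])   # 115.9  (last element)
print(readings[-2])   # 74.3   (next to the last element)
```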


You can delete elements of a list using `del`. **Note**: `del` is a statement, not a function, so it does **not** use parentheses:

In [5]:
del sample[0]
print(sample)

[102.8, 63.7]


---

## Dictionaries

We often work with measurements tied to specific genes, patients, or protein IDs. A **dictionary** is the natural data structure for this because it links a key (the unique identifier) to a value (the data). The same pattern fits data from different brain regions, cell types, or experimental channels: the key is the identifier and the value holds the measurement.

- **Keys**: Must be unique and immutable (Strings, Numbers, or Tuples of immutable items).
- **Values**: Can be anything (Numbers, Lists, even other Dictionaries).
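The key and value rules above can be sketched as follows. The names `well_od` and `replicates` and all numbers are hypothetical; the point is that a tuple works as a key, a list does not, and a value can itself be a list.

```python
# A tuple is immutable, so (row, column) pairs can serve as keys
well_od = {("A", 1): 0.42, ("A", 2): 0.57}
print(well_od[("A", 2)])   # 0.57

# Values can be any type, including lists
replicates = {"GAPDH": [150.2, 148.9, 151.0]}
print(replicates["GAPDH"][0])   # 150.2

# A list is mutable, so Python refuses to use it as a key
list_key_rejected = False
try:
    bad = {["A", 1]: 0.42}
except TypeError:
    list_key_rejected = True
print(list_key_rejected)   # True
```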

We might want to store:

- words and their definitions
- sample names and their coordinates
- patient IDs and their health measurements (blood pressure, temperature, etc.)
- protein sequence names and their sequences
- restriction enzyme names and their recognition motifs
- codons and their associated amino acid residues


In [None]:
# A partial codon table (mRNA codon -> amino acid)
genetic_code = {
    "AUG": "Methionine",
    "CUU": "Leucine",
    "UUU": "Phenylalanine",
    "GGC": "Glycine",
    "GCU": "Alanine"
}
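As one possible use of such a table, the sketch below looks up each codon of a short mRNA string. The sequence `mrna` is made up for illustration and contains only codons present in the partial table; a codon missing from the table would raise a `KeyError`.

```python
# Partial codon table, repeated here so the example runs on its own
genetic_code = {
    "AUG": "Methionine",
    "CUU": "Leucine",
    "UUU": "Phenylalanine",
    "GGC": "Glycine",
    "GCU": "Alanine",
}

mrna = "AUGCUUGGCGCU"   # hypothetical sequence, 4 codons long

residues = []
# Step through the sequence three letters at a time
for start in range(0, len(mrna), 3):
    codon = mrna[start:start + 3]
    residues.append(genetic_code[codon])   # look up the value by its key

print(residues)   # ['Methionine', 'Leucine', 'Glycine', 'Alanine']
```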

### 🔑 Dictionary Methods for Data Management

| Method | Action |  Syntax Example |
| :--- | :--- | :--- |
| **`.keys()`** | Returns all Keys |  `my_dict.keys()` |
| **`.values()`**| Returns all Values | `my_dict.values()` |
| **`.items()`** | Returns Key-Value pairs | `my_dict.items()` |
| **`.get()`** | Safe lookup |  `my_dict.get("Unit_99", "N/A")` |
| **`.update()`**| Merges dictionaries | `dict_1.update(dict_2)` |

In [7]:
# --- Setup: Gene Expression dataset  ---
gene_data = {
    "GAPDH": 150.2,
    "TP53": 12.8,
    "BRCA1": 45.1
}

# .keys() - Retrieving Identifiers
# Useful for: Getting a unique list of all genes detected in an assay.
genes_detected = gene_data.keys()
print(f"Genes analyzed: {list(genes_detected)}") 


Genes analyzed: ['GAPDH', 'TP53', 'BRCA1']


In [None]:

# .values() - Running Statistics
# Useful for: Calculating global metrics like the mean expression across a sample.
all_expression_values = gene_data.values()
mean_expression = sum(all_expression_values) / len(all_expression_values)
print(f"Mean Expression: {mean_expression:.2f} FPKM")
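The `.items()` method from the table is most often used in a loop, where each key-value pair is unpacked into two variables. A minimal sketch, with the dataset repeated so it runs on its own and a threshold of 40 chosen arbitrarily for illustration:

```python
gene_data = {"GAPDH": 150.2, "TP53": 12.8, "BRCA1": 45.1}

highly_expressed = []
# .items() yields (key, value) pairs in insertion order
for gene, expression in gene_data.items():
    print(f"{gene}: {expression}")
    if expression > 40:   # arbitrary cutoff, for illustration only
        highly_expressed.append(gene)

print(highly_expressed)   # ['GAPDH', 'BRCA1']
```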


In [9]:

# .get() - a safer approach to looking up a value
# Useful for: Searching for a gene in a large dataset where it might be missing.
# Logic: .get(key, "default_if_missing")
query_gene = "TP53"
result = gene_data.get(query_gene, "Gene not detected in this library")
print(f"Search result for {query_gene}: {result}")



Search result for TP53: 12.8
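The benefit of `.get()` only shows when the key is missing. The sketch below repeats the dataset and queries "MYC", which is not in it: square brackets raise a `KeyError`, while `.get()` returns the default (or `None` if no default is given).

```python
gene_data = {"GAPDH": 150.2, "TP53": 12.8, "BRCA1": 45.1}

# .get() with a default: no error, the default comes back
result = gene_data.get("MYC", "Gene not detected in this library")
print(result)                 # Gene not detected in this library

# .get() without a default returns None
print(gene_data.get("MYC"))   # None

# Square-bracket lookup raises KeyError for a missing key
try:
    gene_data["MYC"]
except KeyError as err:
    print(f"KeyError: {err}")   # KeyError: 'MYC'
```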


In [10]:

# .update() - Merging Experimental Batches
# Useful for: Combining results from two different sequencing runs.
additional_gene_data = {
    "MYC": 88.4,
    "Sox2": 110.1
}
gene_data.update(additional_gene_data)

print(f"Final Combined Dataset: {gene_data}")

Final Combined Dataset: {'GAPDH': 150.2, 'TP53': 12.8, 'BRCA1': 45.1, 'MYC': 88.4, 'Sox2': 110.1}
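One caveat worth knowing: when both dictionaries share a key, `.update()` silently overwrites the existing value with the new one. A small sketch with hypothetical values (`run_1` and `run_2` are illustrative names):

```python
run_1 = {"GAPDH": 150.2, "TP53": 12.8}
run_2 = {"TP53": 14.1, "MYC": 88.4}   # TP53 appears in both runs

run_1.update(run_2)

# TP53 now holds the value from run_2; the original 12.8 is gone
print(run_1)   # {'GAPDH': 150.2, 'TP53': 14.1, 'MYC': 88.4}
```

If both measurements need to be kept, storing a list of values per gene is one alternative.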


---

## Tuples 

**A Tuple** is an ordered collection of items, just like a list, but with one major difference: it is immutable, meaning its items cannot be added, removed, or changed after creation.

*Syntax:* Defined using parentheses `( )` instead of square brackets. A one-item tuple needs a trailing comma, e.g. `("DAPI",)`.

*Performance:* Typically slightly faster and more memory-efficient than lists.

*Note:* Perfect for data that should be "Read-Only".
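Immutability can be checked directly. In this sketch (the names and values are hypothetical), assigning to a tuple position raises a `TypeError`, and the last lines show why a one-item tuple needs a trailing comma.

```python
# Hypothetical fixed stage coordinates
coords = (12.5, 48.0)

assignment_rejected = False
try:
    coords[0] = 13.0   # tuples do not allow item assignment
except TypeError:
    assignment_rejected = True

print(assignment_rejected)   # True
print(coords)                # (12.5, 48.0)  -- unchanged

# Parentheses alone do not make a tuple; the comma does
single = ("DAPI",)
not_a_tuple = ("DAPI")
print(type(single).__name__)        # tuple
print(type(not_a_tuple).__name__)   # str
```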


### 🔒 Tuple Methods Reference

| Method | Action |Python Syntax Example |
| :--- | :--- | :--- | 
| **`.count()`** | Counts occurrences  | `protocol.count("Control")` |
| **`.index()`** | Finds the first position | `markers.index("GFP")` |

In [11]:
# A fixed experimental design: 3 Controls, 2 Treatments
experimental_design = ("Control", "Control", "Treatment_A", "Control", "Treatment_A")

# Count how many controls are in the design
control_count = experimental_design.count("Control")
print(f"Number of control replicates: {control_count}")

Number of control replicates: 3


In [12]:
# A tuple of markers used in an imaging panel
markers = ("DAPI", "GFP", "mCherry", "Cy5")

# Find where GFP is located in the sequence
gfp_pos = markers.index("GFP")
print(f"GFP is located at index: {gfp_pos}")

GFP is located at index: 1


---

### 📊 Summary of Python Collections

| Structure | Syntax | Mutable? | Best Used For... |
| :--- | :--- | :--- | :--- |
| **List** | `[ ]` | Yes | Ordered samples, list of sequences, ELISA readings. |
| **Dictionary** | `{ : }`| Yes | Gene IDs to Expression values, Codon Tables, Metadata. |
| **Tuple** | `( )` | No | Fixed coordinates, RGB color values, Constants. |