# Python Basics Tutorial for Researchers

Welcome! This tutorial covers Python fundamentals you need before learning about reproducible software with Pixi. By the end, you'll understand:
- Core Python syntax and data types
- How to write functions and control program flow
- File handling for data processing
- The foundations needed for reproducible research workflows

**Prerequisites:** None! We start from the beginning.

Let's get started by running the cells below. Each code cell shows examples you can modify and experiment with.

## 1. Variables and Data Types

In Python, a **variable** is a name that refers to a value. Python infers the data type from the value you assign, so you never declare a type explicitly.

**Basic data types:**
- `int`: Integers (whole numbers)
- `float`: Floating-point numbers (numbers with a decimal point)
- `str`: Text strings
- `bool`: True or False
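
Values read from a file or typed by a user arrive as text, so converting between these types comes up constantly in data work. The snippet below is a minimal sketch of the built-in conversion functions; the variable names and values are made up for illustration.

```python
# Hypothetical raw values, as they might arrive from a text file
raw_count = "42"
raw_ratio = "3.14"

count = int(raw_count)      # str -> int
ratio = float(raw_ratio)    # str -> float
label = str(count)          # int -> str

print(count + 1)            # arithmetic works once the value is a number
print(type(ratio))
print(label + " samples")   # concatenation needs strings on both sides
```

Calling `int()` on text that is not a whole number (for example `int("3.14")`) raises a `ValueError`, so convert with `float()` when decimals are possible.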

In [None]:
# Variables and Data Types

# Creating variables - no special syntax needed!
name = "Alice"
age = 30
height = 5.7
is_researcher = True

# Print them out
print(name)
print(age)
print(height)
print(is_researcher)

# Check the type of a variable
print(type(name))
print(type(age))
print(type(height))
print(type(is_researcher))

Alice
30
5.7
True
<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>


In [None]:
# Arithmetic operations
x = 10
y = 3

print(f"Addition: {x} + {y} = {x + y}")
print(f"Subtraction: {x} - {y} = {x - y}")
print(f"Multiplication: {x} * {y} = {x * y}")
print(f"Division: {x} / {y} = {x / y}")
print(f"Integer division: {x} // {y} = {x // y}")
print(f"Remainder: {x} % {y} = {x % y}")
print(f"Power: {x} ** {y} = {x ** y}")
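
One caveat matters for numerical work: `float` values are stored in binary, so some decimal fractions are only approximated. The sketch below shows the classic case and one common way to compare floats, using `math.isclose` from the standard library.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)               # False: 0.1 and 0.2 are not stored exactly
print(math.isclose(total, 0.3))   # True: compares within a small tolerance
print(round(total, 2))            # rounding is fine for display purposes
```

When comparing measured or computed values, a tolerance-based check like this is usually safer than `==`.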

In [None]:
# String operations
first_name = "Alice"
last_name = "Smith"

# String concatenation
full_name = first_name + " " + last_name
print(full_name)

# String formatting (f-strings - the modern way)
age = 30
message = f"{first_name} is {age} years old"
print(message)

# String methods
text = "hello world"
print(text.upper())
print(text.capitalize())
print(text.split())  # Split into list
print(len(text))  # Length of string
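
These string methods combine naturally when parsing a line of text data. The following is a small sketch with a made-up record; the field layout is an assumption chosen for illustration.

```python
# Hypothetical comma-separated record with stray whitespace and a newline
line = "  S001,25,85.2,good\n"

fields = line.strip().split(",")   # trim whitespace, then split on commas
print(fields)

sample_id = fields[0]
yield_value = float(fields[2])     # numeric fields still need converting
print(f"{sample_id}: {yield_value}")

rejoined = ";".join(fields)        # join() is the inverse of split()
print(rejoined)
```

For real CSV files, the `csv` module shown later in this tutorial handles quoting and embedded commas, which this manual approach does not.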

## 2. Lists and Dictionaries

**Lists** are ordered collections of items. Use square brackets `[]`.

**Dictionaries** are collections of key-value pairs, where each key maps to one value. Use curly braces `{}`.

In [None]:
# Lists - ordered collections

# Creating a list
fruits = ["apple", "banana", "cherry", "date"]
print(fruits)
print(len(fruits))

# Indexing (position starts at 0!)
print(fruits[0])      # First item
print(fruits[-1])     # Last item
print(fruits[-2])     # Second to last

# Slicing (get a range)
print(fruits[1:3])    # Items at index 1 and 2 (3 is excluded)
print(fruits[:2])     # First 2 items
print(fruits[2:])     # From index 2 to end

# Modifying lists
fruits.append("elderberry")    # Add to end
print(fruits)

fruits.insert(1, "blueberry")  # Insert at position
print(fruits)

fruits.remove("cherry")        # Remove specific item
print(fruits)

# More useful methods
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
print(f"Original: {numbers}")
print(f"Sorted: {sorted(numbers)}")
print(f"Reversed: {list(reversed(numbers))}")
print(f"Count of 1: {numbers.count(1)}")
print(f"Maximum: {max(numbers)}")
print(f"Sum: {sum(numbers)}")
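
Note that `sorted()` returns a new list and leaves the original alone, while the list method `.sort()` reorders the list in place. A short sketch of the difference, using made-up readings:

```python
readings = [3, 1, 4, 1, 5]   # hypothetical values

ordered = sorted(readings)   # returns a new sorted list
print(ordered)
print(readings)              # the original order is unchanged

readings.sort(reverse=True)  # sorts in place and returns None
print(readings)
```

Prefer `sorted()` when you still need the data in its original order, for example to match values back to sample IDs.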

In [None]:
# Dictionaries - key-value pairs

# Creating a dictionary
person = {
    "name": "Alice",
    "age": 30,
    "city": "Boston",
    "is_researcher": True
}
print(person)

# Accessing values by key
print(person["name"])
print(person["age"])

# Adding new key-value pairs
person["email"] = "alice@example.com"
print(person)

# Modifying values
person["age"] = 31
print(person)

# Removing key-value pairs
del person["email"]
print(person)

# Useful dictionary methods
print(person.keys())           # All keys
print(person.values())         # All values
print(person.items())          # Key-value pairs

# Checking if key exists
if "name" in person:
    print(f"Name: {person['name']}")

# Using get() method (safe way to access)
email = person.get("email", "not provided")  # Returns "not provided" if key doesn't exist
print(email)
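
A common use of `get()` is counting how often each value occurs without knowing the categories in advance. The sketch below uses a `for` loop (covered in Section 3) and a made-up list of labels.

```python
# Hypothetical quality labels
labels = ["good", "excellent", "good", "poor", "excellent", "good"]

counts = {}
for label in labels:
    # get() returns 0 the first time a label is seen
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

Because missing keys default to 0, an unexpected label such as `"poor"` is counted instead of raising a `KeyError`.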

## 3. Control Flow: if/else and Loops

**Conditionals** (if/elif/else) run a block of code only when its condition is true.

**Loops** repeat code multiple times.

In [None]:
# Conditional statements (if/else)

age = 25

if age < 18:
    print("You are a minor")
elif age < 65:
    print("You are an adult")
else:
    print("You are a senior")

# Comparison operators
x = 10
print(x > 5)       # Greater than
print(x == 10)     # Equal to
print(x != 5)      # Not equal to
print(x >= 10)     # Greater than or equal
print(x <= 20)     # Less than or equal

# Logical operators
age = 30
is_employed = True

if age > 25 and is_employed:
    print("You are an employed adult")

if age > 25 or age < 18:
    print("You are either older than 25 or younger than 18")

if not is_employed:
    print("Not employed")
else:
    print("Employed")

In [None]:
# For loops - iterate over a list

fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(f"I like {fruit}")

# Using range() to loop a specific number of times
for i in range(5):  # 0, 1, 2, 3, 4
    print(f"Count: {i}")

# Loop with indices
colors = ["red", "green", "blue"]
for index, color in enumerate(colors):
    print(f"{index}: {color}")

# Loop over dictionary
person = {"name": "Alice", "age": 30, "city": "Boston"}
for key, value in person.items():
    print(f"{key}: {value}")
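
Loops and conditionals are often combined to filter data. As a minimal sketch, the following collects every value above a threshold along with its position; the numbers and the threshold are illustrative assumptions.

```python
yields = [85.2, 88.5, 84.8, 91.2, 89.1]   # hypothetical measurements
threshold = 88.0                          # hypothetical cutoff

high_yields = []
for index, value in enumerate(yields):
    if value > threshold:
        high_yields.append((index, value))

print(high_yields)
```

Each entry in `high_yields` is a pair `(index, value)`, so you can trace a result back to its position in the original list.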

In [None]:
# While loops - repeat while condition is true

count = 0
while count < 5:
    print(f"Count is {count}")
    count += 1  # Increment count

print("Done!")

# Breaking out of a loop
for i in range(10):
    if i == 5:
        break  # Exit the loop
    print(i)

# Continue to skip to next iteration
for i in range(5):
    if i == 2:
        continue  # Skip this iteration
    print(i)
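
A `while` loop runs forever if its condition never becomes false, so it is worth adding a safety limit when the stopping point is not known in advance. This sketch halves a value until it falls below a tolerance; the starting value, tolerance, and step cap are all assumptions for illustration.

```python
value = 100.0
tolerance = 1.0
max_steps = 50   # safety cap so the loop cannot run forever

steps = 0
while value > tolerance and steps < max_steps:
    value /= 2
    steps += 1

print(f"Stopped after {steps} steps with value {value}")
```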

## 4. Functions

**Functions** are reusable blocks of code. A function can:
- Accept **parameters** (inputs)
- Perform an operation
- **Return** a result (output)

In [None]:
# Simple function

def greet():
    print("Hello, researcher!")

# Call the function
greet()

# Function with parameters
def greet_person(name, title="Dr."):
    print(f"Hello, {title} {name}!")

greet_person("Smith")
greet_person("Johnson", "Prof.")

# Function with return value
def add(a, b):
    return a + b

result = add(3, 5)
print(f"3 + 5 = {result}")

# Function that takes a list and returns its average
def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    return average

scores = [85, 90, 78, 92]
avg = calculate_average(scores)
print(f"Average score: {avg}")
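
Be aware that `calculate_average` divides by `len(numbers)`, so an empty list raises a `ZeroDivisionError`. One possible guard is sketched below; the function name `safe_average` and the choice to return `None` are assumptions, and raising a clearer error would be an equally reasonable design.

```python
def safe_average(numbers):
    """Return the mean of numbers, or None if the list is empty."""
    if not numbers:          # an empty list counts as False
        return None
    return sum(numbers) / len(numbers)

print(safe_average([85, 90, 78, 92]))
print(safe_average([]))
```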

In [None]:
# Function returning multiple values
def get_stats(numbers):
    minimum = min(numbers)
    maximum = max(numbers)
    average = sum(numbers) / len(numbers)
    return minimum, maximum, average

data = [10, 20, 30, 40, 50]
min_val, max_val, avg_val = get_stats(data)
print(f"Min: {min_val}, Max: {max_val}, Average: {avg_val}")

# Function with default parameters
def process_data(filename, verbose=True):
    if verbose:
        print(f"Processing {filename}...")
    return f"Results from {filename}"

print(process_data("data.txt"))
print(process_data("data.txt", verbose=False))

# Variable-length arguments (*args)
def sum_all(*numbers):
    return sum(numbers)

print(sum_all(1, 2, 3))
print(sum_all(10, 20, 30, 40, 50))

## 5. Working with Files

Reading and writing files is essential for data analysis. Python makes it simple with the `open()` function.

In [None]:
# Writing to a file
data_lines = [
    "Name,Age,Department\n",
    "Alice,30,Research\n",
    "Bob,28,Data Science\n",
    "Charlie,35,Engineering\n"
]

# Using with statement (automatically closes file)
with open("researchers.csv", "w") as file:
    for line in data_lines:
        file.write(line)

print("File written!")

# Reading the entire file
with open("researchers.csv", "r") as file:
    content = file.read()
    print("Full content:")
    print(content)

# Reading line by line
print("\nReading line by line:")
with open("researchers.csv", "r") as file:
    for line in file:
        print(line.strip())  # strip() removes surrounding whitespace, including the newline

# Reading all lines into a list
with open("researchers.csv", "r") as file:
    lines = file.readlines()
    print(f"\nTotal lines: {len(lines)}")
    print(f"First line: {lines[0]}")
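
Two details are easy to miss: mode `"w"` overwrites an existing file, while mode `"a"` appends to it, and passing `encoding="utf-8"` keeps results consistent across operating systems. The sketch below demonstrates both inside a temporary directory so it leaves no files behind; the file name `notes.txt` is made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "notes.txt"   # hypothetical file name
    print(path.exists())                 # the file does not exist yet

    with open(path, "w", encoding="utf-8") as file:
        file.write("first line\n")       # "w" creates or overwrites

    with open(path, "a", encoding="utf-8") as file:
        file.write("second line\n")      # "a" adds to the end

    with open(path, "r", encoding="utf-8") as file:
        lines = file.read().splitlines() # splitlines() drops the newlines

print(lines)
```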

In [None]:
# Parsing CSV data
import csv

# Writing CSV data properly
with open("data.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Experiment", "Result", "Date"])
    writer.writerow(["Exp-001", "Success", "2024-01-01"])
    writer.writerow(["Exp-002", "Failed", "2024-01-02"])
    writer.writerow(["Exp-003", "Success", "2024-01-03"])

# Reading CSV data (the csv docs recommend newline="" when reading too)
with open("data.csv", "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

# Reading CSV as dictionaries (with headers)
print("\nReading as dictionaries:")
with open("data.csv", "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(f"Experiment: {row['Experiment']}, Result: {row['Result']}")
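
Keep in mind that the `csv` module returns every field as a string, even when it looks like a number. The sketch below reads a small in-memory CSV with `io.StringIO` (so no file is needed) and converts one column before doing arithmetic; the column names and values are made up.

```python
import csv
import io

# Hypothetical CSV content held in memory instead of a file
csv_text = "sample_id,temperature\nS001,25\nS002,30\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(type(rows[0]["temperature"]))   # fields arrive as str

temperatures = []
for row in rows:
    temperatures.append(float(row["temperature"]))   # convert before math

mean_temperature = sum(temperatures) / len(temperatures)
print(mean_temperature)
```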

## 6. Combining Skills: Real-World Example

Let's analyze some research data by combining what we've learned.

In [None]:
# Real-world example: Analyze research results

import csv

# Create a sample dataset
results_data = [
    ["sample_id", "temperature", "yield", "quality"],
    ["S001", "25", "85.2", "good"],
    ["S002", "30", "88.5", "excellent"],
    ["S003", "25", "84.8", "good"],
    ["S004", "35", "91.2", "excellent"],
    ["S005", "30", "89.1", "excellent"],
]

# Write the data
with open("experiment_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(results_data)

print("Sample data created!\n")

# Now analyze the data
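def analyze_results(filename):
    """Summarize experiment results from a CSV file.

    Returns a dict with the average temperature, average yield,
    maximum yield, and the number of samples per quality label.
    """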
    
    temperatures = []
    yields = []
    quality_counts = {"good": 0, "excellent": 0}
    
    with open(filename, "r", newline="") as f:
        reader = csv.DictReader(f)
        
        for row in reader:
            # Convert to appropriate types
            temp = float(row["temperature"])
            yield_val = float(row["yield"])
            quality = row["quality"]
            
            temperatures.append(temp)
            yields.append(yield_val)
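            # get() avoids a KeyError if a label other than good/excellent appears
            quality_counts[quality] = quality_counts.get(quality, 0) + 1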
    
    # Calculate statistics
    avg_temp = sum(temperatures) / len(temperatures)
    avg_yield = sum(yields) / len(yields)
    max_yield = max(yields)
    
    return {
        "avg_temperature": avg_temp,
        "avg_yield": avg_yield,
        "max_yield": max_yield,
        "quality_counts": quality_counts
    }

# Run analysis
results = analyze_results("experiment_results.csv")

print("Analysis Results:")
print(f"  Average Temperature: {results['avg_temperature']:.1f}°C")
print(f"  Average Yield: {results['avg_yield']:.1f}%")
print(f"  Maximum Yield: {results['max_yield']:.1f}%")
print(f"  Good Quality Samples: {results['quality_counts']['good']}")
print(f"  Excellent Quality Samples: {results['quality_counts']['excellent']}")
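
A natural next question is how yield varies with temperature. As one possible extension (a sketch, not part of the analysis above), the code below groups yields by temperature in a dictionary and averages each group. The `(temperature, yield)` pairs repeat the sample dataset's values so the block runs on its own.

```python
# (temperature, yield) pairs copied from the sample dataset above
samples = [
    (25.0, 85.2),
    (30.0, 88.5),
    (25.0, 84.8),
    (35.0, 91.2),
    (30.0, 89.1),
]

# Group the yields under each temperature
yields_by_temp = {}
for temperature, yield_val in samples:
    if temperature not in yields_by_temp:
        yields_by_temp[temperature] = []
    yields_by_temp[temperature].append(yield_val)

# Average each group
average_by_temp = {}
for temperature, values in yields_by_temp.items():
    average_by_temp[temperature] = sum(values) / len(values)

for temperature in sorted(average_by_temp):
    print(f"{temperature:.0f}°C: {average_by_temp[temperature]:.1f}%")
```

With only one or two samples per temperature, these averages are illustrative rather than statistically meaningful.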

## 7. Key Takeaways for Reproducible Research

- ✅ **Variables & Data Types**: Store and work with different kinds of data
- ✅ **Lists & Dictionaries**: Organize collections of related data
- ✅ **Control Flow**: Automate decision-making and repetitive tasks
- ✅ **Functions**: Write reusable code for analyses
- ✅ **File I/O**: Read data from files and save results

These skills form the foundation for:
- Data analysis and visualization
- Building analysis pipelines
- Creating reproducible workflows with Pixi

---

## Next Steps

1. **Practice**: Try modifying the examples above. Change values, add new features.
2. **Terminal**: Open a terminal and practice Unix commands (see UNIX_TERMINAL_GUIDE.md)
3. **Git**: Initialize a repository and track your progress (see GIT_GUIDE.md)
4. **Pixi**: Once comfortable, move on to reproducible environments with Pixi

---

## Useful Resources

- **Python Official Docs**: https://docs.python.org/3/
- **Real Python**: https://realpython.com/ (excellent tutorials)
- **Stack Overflow**: https://stackoverflow.com/ (community Q&A)

**Happy learning!**