# Computational Methods HT2025 - Week 1. Getting Started.

Before this lab we expect you to have: 
- A Github account
- A working personal computer with Python installed. 
	- For Mac, it is pre-installed. 
	- For Windows, ...
- "Visual Studio Code" downloaded and installed

**Data**: 3 text files. 

**Claim**: Numeric data can be summarised with means; text data can still be summarised with counts. 

**Representation**: Tabular or printed summaries


# Exercise 0: Cloning a repository on GitHub 

- There is a repository on GitHub named: https://github.com/oxfordinternetinstitute/compmeth25
- You should open Visual Studio Code and select "Clone repository", then enter this URL. 
- This will be the repo that you will use for this assignment. 
- When you have to update this repository for next week we will show you how to do that. 

Visual Studio Code will ask you to choose the version of Python that you want to use. To manage this, you should create a virtual environment. This will be shown in class. It's not mandatory but highly recommended.
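
Once an interpreter has been selected, a quick way to confirm which Python is running, and whether it sits inside a virtual environment, is sketched below. This is only an illustrative check, not part of the assignment. The variable names are placeholders, and the check assumes the environment was created with the standard `venv` module.

```python
import sys

# Path of the Python interpreter that is running this code
interpreter_path = sys.executable

# Inside a virtual environment, sys.prefix points at the environment,
# while sys.base_prefix points at the Python installation it was created from
in_virtual_env = sys.prefix != sys.base_prefix

print(f"Interpreter: {interpreter_path}")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
print(f"Running inside a virtual environment: {in_virtual_env}")
```

If the last line prints `False`, the code is running on the system-wide Python rather than in a virtual environment.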

# Exercise 1. Read / Write text and your first NIAH

- NIAH stands for "Needle in a Haystack". 
- There is a list of words in a text file. You should be able to ask if the following words are present in the file: 
	- word_a -> will be present
	- word_b -> will not be present
	- word_c -> will be present, but with different capitalisation. 
- You are expected to write a python script that will:
	- Step 1. Load in the file as a single text blob (`file_text = filein.read()`).
	- Step 2. Split the text by newline character ( `file_list = file_text.split("\n")`).
	- Step 3. Check for the word (`print(word in file_list)` ).
	- Step 4. Handle the fact that word_c is capitalised differently in the file. 
	- Step 5. Create a new list with the absent word.
	- Step 6. Verify that the word was added to the new list.
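
The splitting and membership checks in the steps above can be tried on a small in-memory string before touching the real file. The sketch below uses made-up words (not the assignment's word list) to show that `in` on a list matches whole items exactly, including their capitalisation.

```python
# A made-up text blob standing in for the contents of a file
sample_text = "Apple\nbanana\nCherry\n"

# Splitting on the newline character gives one item per line;
# the trailing newline leaves an empty string as the last item
sample_list = sample_text.split("\n")
print(sample_list)

# `in` compares whole items exactly, so capitalisation matters
print("banana" in sample_list)
print("apple" in sample_list)

# Lowercasing each item first makes the comparison case-insensitive
sample_list_lower = []
for item in sample_list:
    sample_list_lower.append(item.lower())
print("apple" in sample_list_lower)
```

Note the empty string at the end of the list: it appears whenever the text ends with a newline.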
	

In [None]:
# Exercise 1: Needle in a Haystack (NIAH)
# This script reads a word list and checks if certain words are present
# Replace <...> with working code. 

# Define the words we want to search for
word_a = "dog"      # This word IS in the file
word_b = "unicorn"  # This word is NOT in the file
word_c = "tiger"    # This word is in the file, but capitalized as "Tiger"

# Step 1: Open and read the file
filein = open(<...>)
file_text = filein.<...>
filein.close()

# Step 2: Split the text into a list (one word per line)
file_list = <...>.split("\n")

# Step 3: Check if each word is in the list
print("Searching for words in the list...")

# Check for word_a (should be True)
print(f"Is '{word_a}' in the list?")
print(<...>)
print()

# Check for word_b (should be False)
print(f"Is '{word_b}' in the list?")
print(<...>)
print()

# Check for word_c (will be False because of case mismatch!)
print(f"Is '{word_c}' in the list?")
print(<...>)
print()

# Step 4: Handle the case sensitivity issue
# We convert both list and target to lowercase for comparison
print("Now checking with case-insensitive search...")

# Create a lowercase version of the list
file_list_lower = []
for word in <...>:
    file_list_lower.append(<...>)

# Check again for word_c
print(f"Is '{word_c}' in the list (case-insensitive)?")
print(word_c.lower() in <...>)
print()

# Step 5: Add the missing word to the file
print(f"Adding '{word_b}' to the word list...")

# Open a new file in write mode ("w" creates the file, or overwrites it if it exists)
fileout = open("word_list_new.txt", "w")
file_list = "\n".join(<...>)
<...> # append
fileout.write(<...>)
fileout.close()

print("Word added!")
print()

# Step 6: Verify the word was added
filein = open("<...>", "r")
file_text = filein.read()
filein.close()
file_list = file_text.split("\n")

print(f"Is '{word_b}' in the list now: {<...>}")


# Exercise 2. Calculating values and representing them 

- The file `example_numbers.txt` is a file with a list of numbers, one per line. You should be able to report some basic statistics about those numbers.
	- The average (mean)
	- The median 
	- The mode
- You are expected to write a python script that will:
	- load in the file as a single text blob (`file_text = filein.read()`)
	- split the text by newline character 
	- loop through each item and convert to a float value
	- Then build a printed report. The report should look like the following:
~~~
Report on numbers: 
Count:   xx
Mean:    xx.x
Median:  xx
Mode:    xx
~~~
To get all the numbers lined up you should use a tab character (`\t`), as in `print(f"Count:\t{val}")`. The `f` prefix means that this string will accept insertions. The `{}` marks where a variable is inserted. Inside the braces you can write `variable:format`: the part before the colon is the variable and the part after the colon is the formatting options. For the mean, you would want `0.1f` (one decimal place). So if you have:

`val = 3.1415`

Then

`print(f"Here is the value: {val:0.1f}")`

Should print:
~~~
Here is the value: 3.1
~~~
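
To see how the format specification and the tab character behave together, here is a small sketch with made-up values. The variable names are placeholders, not the ones required by the exercise.

```python
count_example = 12
mean_example = 3.14159

# The part after the colon is the format specification:
# 0.1f means a floating point number shown with one decimal place
mean_line = f"Mean:\t{mean_example:0.1f}"

# With no format specification the value is inserted as-is
count_line = f"Count:\t{count_example}"

print(count_line)
print(mean_line)
```

The tab moves each value to the next tab stop, which is what lines the columns up when the labels have similar lengths.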


In [None]:
# Exercise 2: Calculating values and representing them
# This script reads a list of numbers and calculates basic statistics
# Replace <...> with working code.

# Step 1: Open and read the file
filein = open(<...>, "r")
file_text = filein.<...>
filein.close()

# Step 2: Split the text into a list (one number per line)
file_list = file_text.<...>("\n")

# Step 3: Convert strings to numbers
# We need to loop through and convert each string to a float
numbers = []
for item in <...>:
    # Skip empty lines
    if item != "":
        numbers.append(<...>) # Here you will need to convert the numbers to 'float'

# Step 4: Calculate the count
count = <...>(numbers)

# Step 5: Calculate the mean (average)
# Mean = sum of all values / count of values
total = 0
for num in numbers:
    total = <...>
mean = <...>

# Step 6: Calculate the median (middle value)
# First we need to sort the list
sorted_numbers = sorted(<...>)

# Find the middle position
middle = count // 2 # Q: why did we use // instead of / for divide? 

# If count is odd, take the middle value
# If count is even, take average of two middle values
if count % 2 == 1: # Q: what does % mean here? 
    median = sorted_numbers[<...>]
else:
    median = (sorted_numbers[middle - 1] + sorted_numbers[<...>]) / 2

# Step 7: Calculate the mode (most frequent value)
# We will count how many times each number appears
counts = {}
for num in numbers:
    if num in counts:
        counts[num] = <...>
    else:
        counts[num] = <...>

# Find the number with the highest count
mode = None
highest_count = 0
for num in counts:
    if counts[num] <...> highest_count:
        mode = <...>
        highest_count = counts[num]

# Step 8: Print the report
print("Report on numbers:")
print(f"Count:\t{<...>}")
print(f"Mean:\t{<...>:0.1f}")
print(f"Median:\t{<...>}")
print(f"Mode:\t{<...>}")


# Expanding the above 

The above script had you do these operations 'by hand', meaning without using any additional methods or functions. However, we could also do this using built-in tools. Here are a few that work on a list:

| Function | What it does |
|----------|--------------|
| `statistics.mean()` | Average of all values |
| `statistics.median()` | Middle value |
| `statistics.mode()` | Most common value |
| `statistics.stdev()` | Sample standard deviation |
| `statistics.pstdev()` | Population standard deviation |
| `statistics.variance()` | Sample variance |
| `min()` | Smallest value (built-in, no import) |
| `max()` | Largest value (built-in, no import) |
| `sum()` | Sum of all values (built-in, no import) |
| `len()` | Count of values (built-in, no import) |
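
The table lists two standard deviations. As a rough illustration of how they differ, the sketch below applies both to a small made-up list: `pstdev` divides by the number of values, while `stdev` divides by one less than that, so it comes out slightly larger.

```python
import statistics

example_values = [2, 4, 4, 4, 5, 5, 7, 9]

# Built-in functions need no import
print(len(example_values), min(example_values), max(example_values), sum(example_values))

# Population standard deviation: treats the list as the whole population
population_sd = statistics.pstdev(example_values)

# Sample standard deviation: treats the list as a sample from a larger population
sample_sd = statistics.stdev(example_values)

print(f"pstdev: {population_sd:0.3f}")
print(f"stdev:  {sample_sd:0.3f}")
```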


In [None]:
import statistics

print(statistics.mean(numbers))
print(statistics.median(numbers))
print(statistics.mode(numbers))
print(statistics.stdev(numbers))

# Exercise 2a. Wrap the "Report on numbers" in a function. 

It is possible to complete the above task without using functions. But the next task would require you to use functions to avoid repetition. So review your code and consider how to write a `reportNumbers` function. It should take in a list of numbers and return a string that can then be printed. 
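
As a reminder of the general pattern (a function that takes a list and returns a string rather than printing it), here is a minimal sketch. `describe_range` is a hypothetical helper for illustration only; it is not the `reportNumbers` function you are asked to write.

```python
def describe_range(values):
    # Build the text inside the function...
    text = "Range of values:\n"
    text = text + f"Min:\t{min(values)}\n"
    text = text + f"Max:\t{max(values)}\n"
    # ...and hand it back to the caller instead of printing it here
    return text


# The caller decides what to do with the returned string
summary = describe_range([3.0, 1.5, 8.0])
print(summary)
```

Returning the string, rather than printing inside the function, means the same text can later be printed, combined with other reports, or written to a file.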

In [None]:
# Exercise 2a: Wrap the "Report on numbers" in a function
# This script defines a function to calculate and report statistics

import statistics

def reportNumbers(numbers):
    # Calculate statistics
    count = <...>
    mean = <...>
    median = <...>
    mode = <...>
    stdev = <...>
    
    # Build the report string
    report = "Report on numbers:\n"
    report = report + f"Count:\t{count}\n"
    report = report + f"Mean:<...>"
    report = report + f"Median:<...>"
    report = report + f"Mode:<...>"
    report = report + f"StDev:<...>"
    
    # Return the report string
    return report


# --- Main script ---

# Step 1: Open and read the file
filein = open(<...>)
file_text = filein.<...>
filein.close()

# Step 2: Split the text into a list
file_list = file_text.<...>("\n")

# Step 3: Convert strings to numbers
number_list = []
for item in file_list:
    if item != "":
        number_list.append(float(item))

# Step 4: Call the function and print the result
result = <...>(number_list)
print(result)


# Exercise 3. Challenge exercise: Reading a table of numbers

Next week we will be looking at DataFrames and operating on them as tables. However, we can also work with a simpler data structure, the table of "Comma-separated values" or `.csv`. For this exercise you should load the CSV using the `csv` module. At the top of your code write: `import csv`. Then in your code you can: 

- Read the CSV file. 
- Extract a column of data. 
- Create a report on numbers for each of the columns IF they are numeric. 
- Each column will have a header which will not be numeric. `csv.DictReader` can handle this: it uses the first row as the column names. 
- You will notice that some columns will have `none` or missing data for some of the rows.  
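
To see what `csv.DictReader` produces, the sketch below reads a tiny made-up table from a string instead of a file (`io.StringIO` makes a string behave like an open file). The column names and values are invented for illustration.

```python
import csv
import io

# A made-up CSV table: the first line is the header row
sample_csv = "name,height\nAda,170\nGrace,\nAlan,175\n"

reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)

# The header row becomes the column names, not a row of data
print(reader.fieldnames)

# Each row is a dictionary keyed by column name; every value is a string
print(rows[0])

# Pull out one column as a list; the missing height shows up as an empty string
heights = []
for row in rows:
    heights.append(row["height"])
print(heights)
```

Every value arrives as a string, so numeric columns still need converting with `float()` before any statistics can be calculated.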

After loading, the data will be stored as a dictionary with the column names as the keys and the list of values in each column as the values. You should:

- Write a for loop that will loop through the column names. 
- For each column print: 
~~~
Variable name: varname

Report on numbers: 
Count:   xx
Mean:    xx.x
Median:  xx
Mode:    xx
~~~

- If a column does not have numbers you should still be able to produce a report such as: 
~~~
Variable name: varname

Report on numbers: 
Count:   xx
Mean:    N/A
Median:  N/A
Mode:    xx
~~~
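
One way to decide whether a value is numeric is to attempt the conversion and catch the failure. The sketch below uses a hypothetical helper, `to_float_or_none`, on made-up values. Treating the text `none` as missing is an assumption about how the data marks gaps, so check it against the actual file.

```python
def to_float_or_none(text):
    # Treat None, empty strings, and the text "none" as missing (an assumption)
    if text is None or text.strip().lower() in ("", "none"):
        return None
    try:
        return float(text)
    except ValueError:
        # Not a number, for example a name or a category label
        return None


raw_values = ["1.5", "", "none", "abc", "4"]
converted = []
for value in raw_values:
    converted.append(to_float_or_none(value))
print(converted)
```

This helper returns `None` both for missing values and for text that is not a number, so it cannot tell the two apart on its own.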

Finally, save this output as "filename_report.txt". Example data is provided as "example_data_report EXAMPLE". 
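
Building an output filename from an input filename and writing text to it might look like the sketch below. It writes to a temporary folder so that it does not touch your project files. The filenames and report text are placeholders.

```python
import os
import tempfile

input_filename = "my_table.csv"  # placeholder name, not the assignment file

# Drop the ".csv" extension and add "_report.txt"
base_name = os.path.splitext(input_filename)[0]
output_filename = base_name + "_report.txt"

report_text = "Variable name: example\nCount:\t3\n"

# Write into a temporary folder that is removed afterwards
with tempfile.TemporaryDirectory() as temp_folder:
    output_path = os.path.join(temp_folder, output_filename)
    with open(output_path, "w") as fileout:
        fileout.write(report_text)

    # Read the file back to confirm what was written
    with open(output_path, "r") as filein:
        text_read_back = filein.read()

print(output_filename)
print(text_read_back == report_text)
```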

In [None]:
# Exercise 3: Challenge exercise - Reading a table of numbers
# This script reads a CSV file and generates a report for each column

import csv
import statistics

def reportNumbers(numbers, variable_name):
    # Calculate count (works for any list)
    count = len(numbers)
    
    # Try to calculate numeric statistics
    # If the data is not numeric, we will catch the error
    try:
        # Attempt to convert all values to floats
        numeric_values = []
        for item in numbers:
            # Skip empty or None values
            if item != "" and item is not None:
                numeric_values.append(float(item))
        
        # Update count to reflect only valid numeric values
        count = len(numeric_values)
        
        # Calculate statistics if we have numeric values
        if count > 0:
            mean = statistics.mean(numeric_values)
            median = statistics.median(numeric_values)
            mode = statistics.mode(numeric_values)
            is_numeric = True
        else:
            is_numeric = False
            
    except (ValueError, TypeError):
        # Data is not numeric
        is_numeric = False
        # For non-numeric, count non-empty values
        count = 0
        for item in numbers:
            if item != "" and item is not None:
                count = count + 1
    
    # Build the report string
    report = f"Variable name: {variable_name}\n"
    report = report + "Report on numbers:\n"
    report = report + f"Count:\t{count}\n"
    
    if is_numeric:
        report = report + f"Mean:\t{mean:0.1f}\n"
        report = report + f"Median:\t{median}\n"
        report = report + f"Mode:\t{mode}\n"
    else:
        report = report + "Mean:\tN/A\n"
        report = report + "Median:\tN/A\n"
        # For non-numeric, we can still find the mode (most common value)
        non_empty = []
        for item in numbers:
            if item != "" and item is not None:
                non_empty.append(item)
        if len(non_empty) > 0:
            mode = statistics.mode(non_empty)
            report = report + f"Mode:\t{mode}\n"
        else:
            report = report + "Mode:\tN/A\n"
    
    return report


# --- Main script ---

# Define the input filename
data_folder = "data"
filename = "example_data.csv"

# Step 1: Read the CSV file into a dictionary
# Each key is a column name, each value is a list of data
data = {}

# Notice the different way I opened the file: 
# this way does not require file.close()
with open(data_folder + "/" + filename, "r") as filein: 
    reader = csv.DictReader(filein)
    rows = list(reader)

    column_names = reader.fieldnames
    data = {}
    for col in <...>: # variable already mentioned
        data[col] = []
        for row in <...>: # variable already mentioned
            data[col].append(row[col])

# No filein.close() is needed here: the `with` block has already closed the file

# Step 2: Generate reports for each column
full_report = ""

for col in column_names:
    # Get the report for this column
    report = <...>(data[col], col)
    
    # Add to the full report with a blank line between columns
    <...> = full_report + <...> + "\n"

# Step 3: Print the report to the screen
print(full_report)

# Step 4: Save the report to a file
# Create the output filename by adding "_report.txt"
output_filename = "<...>"
output_folder = "output"

fileout = open(<...> + "/" + <...>, "w")
fileout.write(<...>)
fileout.close()

print(f"Report saved to: {output_filename}")


# AI Declaration 

The initial exercise was set up by myself. I used Claude Opus 4.5 to generate the word and number lists. Claude provided draft code when prompted to write as a beginning / intermediate coder. I modified the code, produced the <...>, and tweaked some syntax and comments as appropriate. 

Working code is provided in the /scripts_embargo folder which is not checked in to GitHub but will be moved to an answer folder before the next lecture.