#### Notebook 5: Lists and Data Analysis

**IB Computer Science Learning Outcome**

- B2.1.3 Describe how programs use common exception handling techniques (partially).
- B2.2.2 Construct programs that apply Python Lists (partially).

**Reference:**
Head First Python (3rd Edition), Chapter 2 (pp. 108–115)

---

#### Objectives:

- Create, index, and slice Python lists.
- Use list methods to add, remove, and sort data.
- Read and store text data as lists.
- Understand the usefulness of lists as dynamic data structures.

#### Lists in Python

A list is a collection of values that can be stored under one variable name. Python lists are *dynamic* data structures, meaning they can grow or shrink in size as needed. We have dealt with lists in the previous notebooks. 

Do you remember the `list` of filenames?

In [3]:
import os

swimdata = os.listdir("swimdata") # list of filenames

print(swimdata)

['Abi-10-100m-Back.txt', 'Abi-10-100m-Breast.txt', 'Abi-10-50m-Back.txt', 'Abi-10-50m-Breast.txt', 'Abi-10-50m-Free.txt', 'Ali-12-100m-Back.txt', 'Ali-12-100m-Free.txt', 'Alison-14-100m-Breast.txt', 'Alison-14-100m-Free.txt', 'Aurora-13-50m-Free.txt', 'Bill-18-100m-Back.txt', 'Bill-18-200m-Back.txt', 'Blake-15-100m-Back.txt', 'Blake-15-100m-Fly.txt', 'Blake-15-100m-Free.txt', 'Calvin-9-50m-Back.txt', 'Calvin-9-50m-Fly.txt', 'Calvin-9-50m-Free.txt', 'Carl-15-100m-Back.txt', 'Chris-17-100m-Back.txt', 'Chris-17-100m-Breast.txt', 'Darius-13-100m-Back.txt', 'Darius-13-100m-Breast.txt', 'Darius-13-100m-Fly.txt', 'Darius-13-200m-IM.txt', 'Dave-17-100m-Free.txt', 'Dave-17-200m-Back.txt', 'Elba-14-100m-Free.txt', 'Emma-13-100m-Breast.txt', 'Emma-13-100m-Free.txt', 'Erika-15-100m-Breast.txt', 'Erika-15-100m-Free.txt', 'Erika-15-200m-Breast.txt', 'Hannah-13-100m-Back.txt', 'Hannah-13-100m-Free.txt', 'Katie-9-100m-Back.txt', 'Katie-9-100m-Breast.txt', 'Katie-9-100m-Free.txt', 'Katie-9-50m-Back.txt

How about the `list` of lines when we read a particular text file?

In [4]:
filename = "Hannah-13-100m-Back.txt"
folder = "swimdata/"
filepath = folder + filename # we can join two string variables together using the plus operator

with open(filepath) as file:
    lines = file.readlines() # list of lines

print(lines)

['1:35.75,1:32.78,1:34.01,1:32.57\n']


#### List Items

List items are **ordered**, this means that the items have a determined sequence for example the first item in our `swimdata` is `'Abi-10-100m-Back.txt'`, the second item is `'Abi-10-100m-Breast.txt'` and so on. If we write Python code that adds a new filename to the list, it will be added at the end.  

List items are indexed, meaning they have a unique position, the first item has index `[0]`, the second item has index `[1]` etc.

```python
first_file = swimdata[0]    #'Abi-10-100m-Back.txt'
second_file = swimdata[1]   #'Abi-10-100m-Breast.txt'
third_file = swimdata[2]    #'Abi-10-50m-Free.txt'
```

A list is changeable, meaning that we can **change** items in a list after it has been created.

```python
swimdata[0] = swimdata[0].removesuffix(".txt")
```

**Note**: Since an item has a unique index, list items can have the same value.


In [7]:
# Can you understand what this line of code is doing?

lines[0].strip().split(",")

['1:35.75', '1:32.78', '1:34.01', '1:32.57']

#### Accessing a Range of List Items 

Aside from accessing list items by index number, you can access a *slice* of a list by specifying a range of indices:

```python
my_list[index_start:index_end]
```
**Note**: The resulting list will include the item at `index_start` but exclude the item `index_end`.


In [10]:
swimdata[15:18]

['Calvin-9-50m-Back.txt', 'Calvin-9-50m-Fly.txt', 'Calvin-9-50m-Free.txt']

In [11]:
swimdata[18]

'Carl-15-100m-Back.txt'

If you are slicing a list from the beginning, there is no need to specify `index_start` as `0`:

```python 
my_list[:index_end] # This is allowed
```

In [None]:
swimdata[:5]

['Abi-10-100m-Back.txt',
 'Abi-10-100m-Breast.txt',
 'Abi-10-50m-Back.txt',
 'Abi-10-50m-Breast.txt',
 'Abi-10-50m-Free.txt']

Similarly, if you are slicing a list from some point to the end, there is no need to specify `index_end`:

```python
my_list[index_start:] # This is allowed
```

In [16]:
swimdata[51:]

['Ruth-13-100m-Back.txt',
 'Ruth-13-100m-Free.txt',
 'Ruth-13-200m-Back.txt',
 'Ruth-13-200m-Free.txt',
 'Ruth-13-400m-Free.txt',
 'Tasmin-15-100m-Back.txt',
 'Tasmin-15-100m-Breast.txt',
 'Tasmin-15-100m-Free.txt',
 'Tasmin-15-200m-Breast.txt']

#### Accessing Items Using Negative Indices

You can specify a position from the end by using a *negative index*. The last item has index `[-1]`, the second to last has index `[-3]` and so on.

```python
last_item = swimdata[-1]            #'Tasmin-15-200m-Breast.txt'
second_last_item = swimdata[-2]     #'Tasmin-15-100m-Free.txt'
```

You can use negative indices when specifying a range:

In [None]:
swimdata[-4:] # last four items

['Tasmin-15-100m-Back.txt',
 'Tasmin-15-100m-Breast.txt',
 'Tasmin-15-100m-Free.txt',
 'Tasmin-15-200m-Breast.txt']

#### Activity — Products Analyser

In this activity, we will practice using lists to store and manipulate data from a file.

You are given a CSV file called `products-100.csv` (located in the folder `datablist`). Each line of the file contains information about a product in the following format:

```
Index,Name,Description,Brand,Category,Price,Currency,Stock,EAN,Color,Size,Availability,Internal ID
```

For example:

```
3,Smart Blender Cooker,No situation per.,"Lawson, Keller and Winters",Kitchen Appliances,227,USD,726,1282898648918,SlateGray,XS,in_stock,70
```

Your task is to:

- Read the CSV file into a list of lines.
- Create a new list to store the parsed product data. Each item should itself be a list containing the 13 fields
- Display sample of product data in a neat table format with headers: Product, Category, Price, Stock. Print the first 5 products and the last 5 products with a "..." in between.
- Write an algorithm that checks whether a particular product is in stock.

In [None]:
# Write code for this activity

import os

products_file = "products-100.csv"
folder = "datablist/"
filepath = folder + products_file # we can join two string variables together using the plus operator

products = [] # create a new list for our products

try:
    with open(filepath) as file: # reading a file can throw an exception (run-time error)
        lines = file.readlines() # list of lines
        # the code below only makes sense if we read the file
        lines.pop(0) # clean up my list of lines by removing the first item
        products_data = lines[1:] # using slicing to get the product records only (leave the headings out)
        for item in products_data:
            product = item.strip().split(',') # strip removes white spaces
            products.append(product) # we have notes on append after this activity
            
except: # except is short for exception
    print("We could not open your file:", filepath) # when the error occurs our program can handle it a bit nicely

# to display product data - check that products is not empty
if len(products) > 0:
    print(f"{'Product':<40} {'Price':<6} {'Stock':<6}")
    print("----------------------------------------------------------------")
    first_five = products[:6]
    for p in first_five:
        product_desc = p[1]
        price = p[5]
        stock = p[7]
        print(f"{product_desc:<40} {price:<6} {stock:<6}")

# the data in the file is unclean so finishing the activity was not possible

Product                                  Price  Stock 
----------------------------------------------------------------
Compact Printer Air Advanced Digital     Books & Stationery USD   
Tablet                                   502    81    
Smart Blender Cooker                     Kitchen Appliances USD   
Advanced Router Rechargeable             121    896   
Portable Mouse Monitor Phone             1      925   
Radio                                    Skincare USD   


#### Activity — Traversing a List

This is essentially the same as looping through a list which was covered extensively in Notebook 4. 

To practice, let us extract a list of time values from `lines` and convert them to hundreths or a second as the authors of the HFP book show us:

```python
time = "1:35.75"
minutes, rest = time.split(":")
seconds, hundreths = time.split(".")
converted = (int(minutes) * 60 * 100) + (int(seconds) * 100) + int((hundreths))
```

In [19]:
filename = "Hannah-13-100m-Back.txt"
folder = "swimdata/"
filepath = folder + filename # we can join two string variables together using the plus operator

with open(filepath) as file:
    lines = file.readlines() # list of lines

lines # Here is the list of lines we are working with

['1:35.75,1:32.78,1:34.01,1:32.57\n']

In [31]:
# Write code for the activity
times = []

process_times = lines[0].strip().split(',')

print(process_times)

for convert in process_times:
    minutes, rest = convert.split(":")
    seconds, hundreths = rest.split(".")
    converted = (int(minutes) * 60 * 100) + (int(seconds) * 100) + int((hundreths))
    times.append(converted)

print(times)

['1:35.75', '1:32.78', '1:34.01', '1:32.57']
[9575, 9278, 9401, 9257]


#### Dynamic Creation of Lists

In a lot of scenarios you may need to dynamically populate a list, inside a `for loop` for instance. 

You need to know how to create an empty list. 

We will create an empty list to dynamically store the converted times:

```python
converted_times = []    # new empty list
```

We obviously need to be able to add items. You might be tempted to use the positions e.g., "store this value at [0]" but this will not work on an empty list — it will give you a run-time error which we normally refer to as an **exception**, specifically `IndexError` exception.

In [20]:
converted_times = []            # Empty list

converted_times[0] = 10567      # We cannot use indices to add items to a list

IndexError: list assignment index out of range

However, a `list` is an object in Python which comes with comes with a lot of useful functions like `append`, `remove` and `sort`.

In [23]:
print(dir(converted_times))

help(converted_times.append)
help(converted_times.remove)

['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
Help on built-in function append:

append(object, /) method of builtins.list instance
    Append object to the end of the list.

Help on built-in function remove:

remove(value, /) method of builtins.list instance
    Remove first occurrence of value.
    
    Raises ValueError if the value is not present.



In [24]:
converted_times.append(10567)
converted_times

[10567]

In [25]:
converted_times.remove(10567)
converted_times

[]

#### Activity — Store Converted Times in a List and Sort Them

In [None]:
# Write code for the activity

# We already stored the converted times in a list in the previous activity

times.sort()
print(times)

[9257, 9278, 9401, 9575]


#### Challenge — Fastest Time Finder

You have learned how to extract data from a file and store the race times in a list.

Your task is to write a program that:
- Asks the user to enter the filename of any swimmer in the swimdata folder.
- Reads the file and stores all the race times in a list.
- Converts each time into hundredths of a second (e.g., "1:35.75" → 9575).
- Stores all converted values in a new list.
- Prints the fastest time in both formats: the original string (e.g., 1:32.78) and the converted integer (e.g., 9278).

**Expected Output Example**

```console
Enter a filename: Hannah-13-100m-Back.txt
Fastest time (string): 1:32.57
Fastest time (hundredths): 9257
```