#### Notebook 3: Exploring String Objects in Python

**IB Computer Science Learning Outcome:** 
B2.1.2 – Construct programs that can extract and manipulate substrings

**Reference:**
Head First Python (3rd Edition), Chapter 1 (pp. 52–77)

---

#### Objectives:
- Understand that strings are *objects* (complex variables) in Python
- Explore useful string methods for text manipulation
- Apply multiple assignment and function chaining
- Perform substring extraction

#### Notes on Strings and Objects in Python

**Recall -** A `str`, read as *string*, is a data type that represents textual data enclosed in *"quotation marks"*.

Example:
```python
my_str = "holiday_photo_2025.jpg"
```

Objects are *complex data types* that group data together with the behaviour used to work with it. In Python, *everything* is an object, including `str` values. This means that `my_str` not only stores the value we set, but also has *attributes* and *functions* (called *methods*) built in, which we can use to produce modified versions of it in interesting ways.

In [1]:
my_str = "holiday_photo_2025.jpg"


print(dir(my_str))  # display the attributes and functions of the my_str object

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


As you can see, there are many functions that we can apply to our string variable to manipulate it or generate variations of it.
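
The names wrapped in double underscores are special methods that Python mostly uses behind the scenes. As a small sketch, assuming we only care about the everyday methods, we could filter those out. The variable name `public_names` is made up for this illustration.

```python
my_str = "holiday_photo_2025.jpg"

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(my_str) if not name.startswith("_")]

print(type(my_str))                 # <class 'str'>
print("upper" in public_names)      # True
print("__add__" in public_names)    # False
```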

Let us try a few simple functions. You can probably guess what they do.

In [2]:
upper = my_str.upper()
lower = my_str.lower()

print("my_str:", my_str, "upper:", upper, "lower:", lower)

my_str: holiday_photo_2025.jpg upper: HOLIDAY_PHOTO_2025.JPG lower: holiday_photo_2025.jpg


> The values returned by the upper and lower methods are new string objects, one all UPPERCASE, while the other is all lowercase. Nothing happens to the original value, as confirmed by the output. (Head First Python, p. 57)
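
Because each method returns a new string, we can call another method directly on that result. This is known as *chaining*. The sketch below is one possible illustration; the variable name `shouted` is invented for this example.

```python
my_str = "holiday_photo_2025.jpg"

# upper() returns a new string, and replace() is then called on that result
shouted = my_str.upper().replace("_", " ")

print(shouted)  # HOLIDAY PHOTO 2025.JPG
print(my_str)   # holiday_photo_2025.jpg (the original is unchanged)
```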

#### Extracting Data from a Filename

Consider a long and complicated filename:

```python
filename = "Darius-13-100m-Fly.txt"
```

This filename holds four pieces of information about a swimmer: the name, age, length of swim, and type of stroke, all separated by a "-". Since we can see the filename in the program, we could simply copy the data by hand into four variables.

```python
swim_name = "Darius"
swim_age = "13"
swim_length = "100m"
swim_stroke = "Fly"
```
What if we don't know the name of the file at the time of writing our program? What if we want to be able to extract the data from any filename in the same format? There must be a better way.

There is, of course!

#### Split Function

```python
swim_data = filename.split("-")
```


In [2]:
filename = "Darius-13-100m-Fly.txt"

help(filename.split)  # Unsure what a function does or how to use it? Ask for help!

Help on built-in function split:

split(sep=None, maxsplit=-1) method of builtins.str instance
    Return a list of the substrings in the string, using sep as the separator string.
    
      sep
        The separator used to split the string.
    
        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits.
        -1 (the default value) means no limit.
    
    Splitting starts at the front of the string and works to the end.
    
    Note, str.split() is mainly useful for data that has been intentionally
    delimited.  With natural text that includes punctuation, consider using
    the regular expression module.



In [13]:
swim_data = filename.split("-")
print(swim_data)

['Darius', '13', '100m', 'Fly.txt']
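
The help text also mentions two things we have not tried yet: the `maxsplit` parameter and the default behaviour when no separator is given. A short sketch of both follows; the variable `sentence` is a made-up example.

```python
filename = "Darius-13-100m-Fly.txt"

# maxsplit=1 means only the first "-" is used to split
print(filename.split("-", 1))   # ['Darius', '13-100m-Fly.txt']

# With no separator, split() breaks on whitespace and ignores extra spaces
sentence = "  strings   are  objects "
print(sentence.split())         # ['strings', 'are', 'objects']
```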


The `split()` function broke the string into separate pieces wherever a "-" is found and returned them as a list, which we stored in one variable named `swim_data`. However, *multiple assignment* of variables is also possible. Since we know the structure of the filename, and therefore that `split()` will produce four pieces of data, we can do this:

```python
swim_name, swim_age, swim_length, swim_stroke = filename.split("-")
```

In [14]:
swim_name, swim_age, swim_length, swim_stroke = filename.split("-")
print(swim_name, swim_age, swim_length, swim_stroke)

Darius 13 100m Fly.txt
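
Multiple assignment only works when the number of variables matches the number of pieces. The sketch below shows what happens otherwise, using a hypothetical filename `short_name` that has only three parts. It relies on `try`/`except`, which this notebook has not covered, so treat it as a preview rather than something you need to write yet.

```python
short_name = "Darius-13-100m.txt"  # hypothetical filename with only three parts

parts = short_name.split("-")
print(len(parts))  # 3

unpack_failed = False
try:
    # Four names on the left, but only three values on the right
    swim_name, swim_age, swim_length, swim_stroke = parts
except ValueError as error:
    unpack_failed = True
    print("Could not unpack:", error)
```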


#### Removing Suffix

At the moment, `swim_stroke` includes the file extension ".txt", which we do not want. What if we could clean up the filename before extracting the data? It turns out that we can.

In [15]:
help(filename.removesuffix)

Help on built-in function removesuffix:

removesuffix(suffix, /) method of builtins.str instance
    Return a str with the given suffix string removed if present.
    
    If the string ends with the suffix string and that suffix is not empty,
    return string[:-len(suffix)]. Otherwise, return a copy of the original
    string.



In [3]:
filename_without_extension = filename.removesuffix(".txt")
print(filename_without_extension)

Darius-13-100m-Fly
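
Two details are worth checking for yourself. First, `removesuffix()` leaves the string unchanged when the suffix is not there. Second, it is easy to confuse with `rstrip()`, which removes any of the listed *characters* from the end rather than one exact suffix. The sketch below uses a made-up filename, `name`, to show the difference. Note that `removesuffix()` needs Python 3.9 or newer.

```python
name = "test.txt"  # hypothetical filename

print(name.removesuffix(".txt"))  # test
print(name.removesuffix(".jpg"))  # test.txt (suffix not present, so unchanged)
print(name.rstrip(".txt"))        # tes (strips any trailing '.', 't' or 'x')
```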


#### Rewrite the Program Activity

In a single cell, rewrite the entire program so that it correctly extracts the data from a filename.



In [None]:
# TODO: Adapt the code in this notebook to write your program

#### Challenge - Music Library Parser

Parse the list of music track names and display them in a neat way as though they are in your favourite music streaming program.

Complete the program using the string functions we have learned about. Remember, if we want to use another function but need more information, we can use `help`.

In [None]:
my_music_tracks = [
    "01-Imagine-John_Lennon-1971.mp3",
    "02-Hey_Jude-The_Beatles-1968.mp3",
    "03-Bohemian_Rhapsody-Queen-1975.mp3",
    "04-Hotel_California-Eagles-1976.mp3",
    "05-Smells_Like_Teen_Spirit-Nirvana-1991.mp3",
    "06-Billie_Jean-Michael_Jackson-1982.mp3",
    "07-Shape_of_You-Ed_Sheeran-2017.mp3",
    "08-Lose_Yourself-Eminem-2002.mp3",
    "09-Rolling_in_the_Deep-Adele-2011.mp3",
    "10-Despacito-Luis_Fonsi-2017.mp3"
]

# Display the neat table header
print(f"{'No.':<4} {'Title':<30} {'Artist':<20} {'Year':<5}")
print("-" * 65)

# Looping through each track
for my_music_track in my_music_tracks:
    # TODO: Use what we have learned to make this program work
    #       Hint: why not try replace() to remove the "_"?

    # Display the extracted data into a neat table row
    print(f"{number:<4} {title:<30} {artist:<20} {year:<5}")

#### Note on Substrings and Slicing in Python

A **substring** is simply a smaller string contained inside another string.
Python gives you two main ways to work with substrings:

1. String methods like `split()`, `replace()`, `removeprefix()`, `removesuffix()`. This is what we have used so far.

2. Slicing syntax:

```python
my_string[start:end]
```
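- start: index of the first character you want (0 if left out)
- end: index just after the last character you want (not included); if left out, the slice runs to the end of the string

Negative indices count from the end, so `-1` is the last character. A third value, as in `my_string[start:end:step]`, is optional and takes every `step`-th character.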

We can understand this better if we look at a concrete example.

In [16]:
word = "PYTHON"

print(word[0:3])    # 'PYT'   (characters 0, 1, 2)
print(word[:4])     # 'PYTH'  (start at beginning)
print(word[2:])     # 'THON'  (from index 2 to end)
print(word[-3:])    # 'HON'   (last 3 characters)
print(word[::2])    # 'PTO'   (every 2nd character)

PYT
PYTH
THON
HON
PTO
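
A few more cases are sketched below, reusing the same `word`. They are extra illustrations rather than part of the original examples: a slice in the middle of the string, negative values for both start and end, an end value past the last index, and a negative step.

```python
word = "PYTHON"

print(word[1:4])     # 'YTH'    (indices 1, 2, 3; length is 4 - 1 = 3)
print(word[-4:-1])   # 'THO'    (negative start and end)
print(word[2:100])   # 'THON'   (an end past the last index is allowed)
print(word[::-1])    # 'NOHTYP' (a step of -1 reverses the string)
print(len(word))     # 6
```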


#### Coding Exercises Activity

Complete the following coding exercises to get some practice.

**Exercise 1 - Last Name Extractor**

Write a program that stores your full name in a variable and uses slicing to extract your surname.

In [None]:
# TODO: Write code for exercise 1

**Exercise 2 - File Extension Grabber**

Write a program that extracts the file extension from a file name stored in a string variable.

In [None]:
# TODO: Write code for exercise 2

**Exercise 3 - Date Parser**

Write a program that extracts the day, month and year from a date string in this format: `YYYYMMDD`.

In [None]:
# TODO: Write code for exercise 3