# Class 5 - Strings

## Introduction

In this notebook, we will explore the string (`str`) data type, one of Python's most versatile and commonly used data types.

Strings are sequences of characters used to represent text. They are fundamental to almost every program you'll write, because they let you work with text data, from simple messages to complex document processing.

Python makes working with text particularly easy and intuitive compared to many other programming languages. By the end of this lesson, you'll be able to create, manipulate, and analyze text data in various ways.

## The `str` Data Type

In Python, text is represented using the `str` (string) data type. Strings are immutable sequences of Unicode characters, which means:
- They are ordered collections of characters
- Once created, they cannot be changed (though you can create new strings based on them)
- They can represent virtually any text in any language
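
As a quick illustration of the Unicode point, the sketch below stores greetings written in a few different scripts (the particular words are just examples) and prints each one with its length. Note that `len()` counts characters (Unicode code points), not bytes.

```python
# Strings can hold text in any script; len() counts characters, not bytes
greetings = ['hello', 'שלום', 'こんにちは', 'Привет']
for word in greetings:
    print(word, len(word))
```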

### String Literals

To create a string in Python, you enclose text in either single quotes (`'`) or double quotes (`"`). Both work exactly the same way, allowing you flexibility in your code.

In [None]:
# Creating a string with single quotes
'Hello world'

In [None]:
# Creating a string with double quotes
"Hello world"

**Why have two different quote styles?**

Having both single and double quotes is useful when your string itself contains quote characters:
- Use double quotes when your string contains single quotes
- Use single quotes when your string contains double quotes

In [None]:
# Using double quotes when the string contains a single quote (apostrophe)
print("I'm learning Python strings")

# Using single quotes when the string contains double quotes
print('She said, "Python is fun!"')
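
Choosing the outer quote style is not the only option. A backslash (`\`) placed before a quote character tells Python to treat that quote as part of the text rather than as the end of the string. A small sketch:

```python
# A backslash "escapes" a quote so it does not end the string
apostrophe = 'I\'m learning Python strings'
quoted = "She said, \"Python is fun!\""
print(apostrophe)
print(quoted)
```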

### Quick Exercise

Try creating a string that contains both single and double quotes. How would you do it?

In [None]:
# Your solution here
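# Example 1 (escape the inner quote with a backslash): print('He said, "I\'m learning Python!"')
# Example 2 (use triple quotes): print('''She said, "It's fun!"''')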

### String Type

We can verify that we're working with strings using the `type()` function, which tells us the data type of any Python object.

In [None]:
# Checking the type of a string literal
type('Hello world')

In [None]:
# Assigning a string to a variable and checking its type
s = 'Hello world'
type(s)

### Converting Other Types to Strings

We can convert other data types to strings using the `str()` function. This is useful when you need to combine numbers with text or display numeric values as text.

In [None]:
# Converting a number to a string
id_number = str(31234567)
type(id_number)

In [None]:
# Demonstrating that it's now a string: * repeats the text 3 times
# (if id_number were still an int, * would multiply it by 3 instead)
id_number * 3
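
The `str()` conversion matters most when combining numbers with text: `+` only concatenates two strings, so mixing a `str` and an `int` raises a `TypeError`. A minimal sketch (the variable names are illustrative):

```python
age = 25

# Mixing str and int with + fails
try:
    message = 'I am ' + age + ' years old'
except TypeError as e:
    print(f"Error: {e}")

# Converting the number to a string first works
message = 'I am ' + str(age) + ' years old'
print(message)
```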

## Multi-line Strings and Docstrings

Sometimes we need to work with text that spans multiple lines. Python provides several ways to handle this.

In [None]:
# A long string on a single line - can be hard to read in code
print('Some variables might be very very very long. You can write them down as a continuous sentence. However this might not be very readable in certain development environments.')

### Line Continuation with Backslash

You can use the backslash character (`\`) to indicate that a string continues on the next line. This is useful for breaking up long strings in your code while still treating them as a single line of text.

In [None]:
# Using backslash for line continuation
print('Some variables might be very very \
very long. You can write them down as a \
continuous sentence. However this might \
not be very readable in certain \
development environments.')
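
One caveat with backslash continuation inside a string: everything on the next line is kept, including any indentation, so indenting the continuation lines adds spaces to the text. Writing adjacent string literals inside parentheses avoids this, because Python joins neighboring literals automatically. A short comparison:

```python
# Indentation after a backslash becomes part of the string
indented = 'first part \
    second part'
print(repr(indented))

# Adjacent literals inside parentheses are joined at compile time,
# so the indentation in the source code is not included
joined = ('Some variables might be very very '
          'very long.')
print(joined)
```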

### Triple Quotes for Multi-line Strings

A more common approach is to use triple quotes (`'''` or `"""`) to create multi-line strings. These preserve the line breaks in your text and are especially useful for documentation.

In [None]:
# Using triple quotes for a multi-line string
print('''Some variables might be very very
very long. You can write them down as a


continuous sentence. However this might
not be very readable in certain
development environments.''')
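
To see exactly what a triple-quoted string stores, `repr()` shows each line break as the escape sequence `\n`, and the `splitlines()` method splits the text back into its lines. A small sketch:

```python
text = '''line one
line two
line three'''
print(repr(text))              # shows the \n characters explicitly
print(text.count('\n'))        # number of line breaks
print(len(text.splitlines()))  # number of lines
```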

### Docstrings

Triple-quoted strings are commonly used for documentation in Python. When placed at the beginning of a function, class, or module, they become "docstrings" that describe what the code does.

You can access these docstrings using the `help()` function.

In [None]:
# Use the help() function to view the docstring of the max() function
help(max)
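
Besides `help()`, a docstring is stored on the function object itself in the `__doc__` attribute, which returns it as a plain string. A sketch with a hypothetical `area` function:

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height

# help(area) shows the docstring formatted for reading;
# __doc__ gives the raw string
print(area.__doc__)
```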

### Quick Exercise

Create a function with a docstring that explains what the function does, then use `help()` to view your docstring.

In [None]:
# Your solution here
def greet(name):
    '''This function takes a name and returns a greeting message.
    
    Parameters:
    name (str): The name to include in the greeting
    
    Returns:
    str: A greeting message
    '''
    return f"Hello, {name}!"

# Now view the docstring
help(greet)

## Characteristics of Strings

Strings in Python have three key characteristics:

1. **Sequence of Characters**: A string is made up of individual characters.
2. **Ordered Collection**: The order of characters matters and is preserved.
3. **Fixed Length**: Each string has a specific length (number of characters).

These properties allow us to access individual characters, extract portions of strings, and perform various operations on them.

In [None]:
# Creating a string and checking its type
s = 'shalom'
type(s)

### String Length

We can find the length of a string (the number of characters it contains) using the `len()` function.

In [None]:
# Finding the length of a string
len(s)

In [None]:
# Another example with spaces
s = 'Hello World'
print(s)
print(len(s))  # Note: spaces count as characters!
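
Every character counts toward the length, including spaces, tabs (`\t`) and newlines (`\n`), and the empty string has length 0. The sketch below prints each sample with `repr()` so the invisible characters become visible:

```python
samples = ['', ' ', 'Hello World', 'tab\there', 'line\n']
for text in samples:
    print(repr(text), len(text))
```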

### String Indexing

Since strings are ordered sequences, we can access individual characters using their position or "index" in the string. In Python, indexing starts at 0, not 1.

Think of a string as a row of mailboxes, numbered starting from 0:

```
H  e  l  l  o     W  o  r  l  d
0  1  2  3  4  5  6  7  8  9  10
```

In [None]:
# Accessing characters by index
print(s[0])  # First character
print(s[1])  # Second character
print(s[-1]) # Last character (negative indices count from the end)
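
The last valid positive index is always `len(s) - 1`; asking for a position past the end raises an `IndexError`. A quick check:

```python
s = 'Hello World'
last_index = len(s) - 1
print(last_index, s[last_index])   # 10 d

try:
    s[len(s)]                      # one position past the end
except IndexError as e:
    print(f"Error: {e}")
```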

### Negative Indexing

Python allows negative indices, which count from the end of the string. This is often more convenient than calculating positions from the beginning:

```
H   e   l   l   o       W   o   r   l   d
0   1   2   3   4   5   6   7   8   9   10
-11 -10 -9  -8  -7  -6  -5  -4  -3  -2  -1
```

In [None]:
s = 'Indexing in Python'
s[-2]  # Second-to-last character
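
A negative index `-k` is shorthand for `len(s) - k`. The loop below checks that equivalence for every valid `k` on the example string:

```python
s = 'Indexing in Python'
# s[-k] and s[len(s) - k] refer to the same character for 1 <= k <= len(s)
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]
print(s[-2], s[len(s) - 2])   # o o
```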

### Finding the Last Character

There are two common ways to access the last character of a string:

In [None]:
s = 'abcd'
len(s)

How do we index the last character in a string if the length changes?

In [None]:
# Method 1: Calculate the last index
s = input()
last_index = len(s) - 1
print(s[last_index], last_index)

In [None]:
# Method 2: Use negative indexing (simpler and preferred)
print(s[-1])  # Last character
print(s[-2])  # Second-to-last character
print(s[-3])  # Third-to-last character

### Quick Exercise

Given a string containing your full name, write code to print your initials (first letter of each name).

In [None]:
# Your solution here
full_name = "John Adam Smith"
# Expected output: JAS

## String Slicing

Slicing allows us to extract a portion (substring) of a string. The syntax is:

```python
string[start:end:step]
```

Where:
- `start` is the index where the slice begins (inclusive)
- `end` is the index where the slice ends (exclusive)
- `step` is the stride between characters (optional, default is 1)

Think of it as specifying "from position X up to (but not including) position Y, taking every Z-th character."

In [None]:
# Basic substring extraction
x = 'keep it simple'
x[5:9]  # Characters at indices 5, 6, 7, 8

In [None]:
# Getting the length of the string
len(x)
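
Two practical consequences of the `start`/`end` rule: with the default step and both bounds inside the string, `x[start:end]` contains `end - start` characters; and, unlike indexing, slicing does not raise an error when a bound runs past the end of the string, it simply stops there. A short sketch:

```python
x = 'keep it simple'
piece = x[5:9]
print(repr(piece), len(piece))   # 'it s' 4  (9 - 5 characters)

# Out-of-range slice bounds are clamped instead of raising IndexError
print(x[5:100])        # it simple
print(repr(x[50:]))    # '' (empty string)
```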

### Omitting Slice Parameters

You can omit slice parameters to use their default values:
- Omitting `start` defaults to the beginning of the string (index 0)
- Omitting `end` defaults to the end of the string
- Omitting `step` defaults to 1 (every character)

In [None]:
# From index 5 to the end
print(x[5:len(x)])
print(x[5:])  # Equivalent, more concise

In [None]:
# From beginning to index 5 (not including 5)
print(x[:5])
print(x[0:5])  # Equivalent

In [None]:
# The entire string
x[:]

### Using Step in Slicing

The `step` parameter lets you take every nth character in the specified range.

In [None]:
# For reference, here's how range works with a step
list(range(2, 20, 3))  # From 2 to 20 (exclusive), step 3

In [None]:
# Every other character (step 2), from index 1 up to but not including index 10
x[1:10:2]

In [None]:
# Taking every third character from the entire string
x[::3]
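
A stepped slice picks the same positions that `range(start, end, step)` generates. Assuming the bounds stay within the string, the sketch below rebuilds `x[1:10:2]` from those indices (using `''.join()`, which is covered under String Methods) to show the correspondence:

```python
x = 'keep it simple'
indices = list(range(1, 10, 2))           # [1, 3, 5, 7, 9]
rebuilt = ''.join(x[i] for i in indices)  # pick those characters one by one
print(indices, repr(rebuilt), repr(x[1:10:2]))
```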

### String Indices Must Be Integers

Python requires string indices to be integers (or slices). Even a float with a whole-number value, such as `2.0`, raises a `TypeError`.

In [None]:
# This will cause an error
try:
    'hello'[2.0]
except TypeError as e:
    print(f"Error: {e}")

### Reversing a String

One of the most useful applications of the step parameter is reversing a string by using a negative step.

In [None]:
# Original string
s = 'abcd'
s

In [None]:
# Reversing a string with a negative step
s[::-1]

The `[::-1]` slice is a common Python idiom for reversing any sequence. It means "take all characters from beginning to end, in reverse order."
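
With a negative step the defaults flip: an omitted `start` means the last character, and an omitted `end` means "continue past the beginning". An explicit `end` is still excluded. A few cases on `'abcd'`:

```python
s = 'abcd'
print(s[::-1])    # dcba
print(s[3::-1])   # dcba  (explicit start at the last index)
print(s[3:0:-1])  # dcb   (end index 0 is excluded)
print(s[2::-1])   # cba
```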

In [None]:
# Original string remains unchanged (strings are immutable)
s

In [None]:
# Other examples of negative step
s[:-2]  # All characters except the last two
s[::-2]  # Every other character, in reverse

### Quick Exercise: Palindrome Checker

A palindrome is a word or phrase that reads the same backward as forward. Using string slicing, write code to check if a given string is a palindrome.

In [None]:
# Your solution here
def is_palindrome(text):
    # Remove spaces and convert to lowercase for a more flexible check
    clean_text = text.lower().replace(" ", "")
    return clean_text == clean_text[::-1]

# Test with some examples
test_strings = ["radar", "hello", "A man a plan a canal Panama"]
for s in test_strings:
    print(f"\"{s}\" is a palindrome: {is_palindrome(s)}")

## Combining Strings

Python provides several ways to combine or concatenate strings.

### String Concatenation with the `+` Operator

The most basic way to combine strings is using the `+` operator.

In [None]:
# Basic concatenation
s = 'Hello'
name = 'Danni'
x = s + ' ' + name
print(x)

In [None]:
# Direct concatenation of string literals
'Hello' + 'World'

### String Repetition with the `*` Operator

You can repeat a string multiple times using the `*` operator.

In [None]:
# Repeating a string
(s + ' ') * 10

### String Formatting

For more complex string combinations, especially when including variables or expressions, Python offers several formatting methods:

1. f-strings (Python 3.6+)
2. The `format()` method
3. The older `%` operator (similar to C's printf)

In [None]:
name = "Alice"
age = 25

# f-string (recommended for most cases)
print(f"Hello, {name}! You are {age} years old.")

# format() method
print("Hello, {}! You are {} years old.".format(name, age))

# % operator (older style)
print("Hello, %s! You are %d years old." % (name, age))
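
f-strings accept arbitrary expressions inside the braces, and an optional format specification after a colon controls how a value is displayed. A brief sketch (the `.2f` specification rounds a float to two decimal places; the variables are illustrative):

```python
name = "Alice"
age = 25
price = 3.14159

print(f"Next year {name} will be {age + 1}.")      # expressions are evaluated
print(f"{name.upper()} has {len(name)} letters.")  # method and function calls work too
print(f"Price: {price:.2f}")                        # Price: 3.14
```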

## Strings are Immutable

One of the most important characteristics of strings in Python is that they are **immutable**, meaning they cannot be changed after creation. Any operation that appears to modify a string actually creates a new string.

In [None]:
# Create a string
s = 'Hello'

In [None]:
# We can access individual characters
print(s)
s[0]

In [None]:
# But we cannot modify individual characters
try:
    s[0] = 'h'
except TypeError as e:
    print(f"Error: {e}")
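
Immutability does not stop a variable from being reassigned. An operator like `+=` builds a new string and rebinds the name to it, so any other variable that referred to the old string still sees the original text:

```python
s = 'Hello'
t = s        # t refers to the same string object as s
s += '!'     # creates a NEW string 'Hello!' and rebinds s to it
print(s)     # Hello!
print(t)     # Hello  (the original string was never modified)
```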

### Working with Immutable Strings

To "modify" a string, you need to create a new string with the desired changes.

In [None]:
# Start with a fresh string
s = 'hello'
s

In [None]:
# You can reassign the variable to a new string
s = input()
s

In [None]:
# To "change" the character at index 1, build a new string around it
s2 = s[0] + '#' + s[2:]
s2
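
The `s[0] + '#' + s[2:]` pattern generalizes to any position. Below is a hypothetical helper, `replace_char`, sketched under the assumption that `index` is a non-negative position inside the string:

```python
def replace_char(text, index, new_char):
    """Return a copy of text with the character at index replaced.

    Hypothetical helper; assumes 0 <= index < len(text).
    """
    return text[:index] + new_char + text[index + 1:]

print(replace_char('hello', 0, 'j'))  # jello
print(replace_char('hello', 4, '!'))  # hell!
```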

### Quick Exercise

Write code to "censor" a string by replacing vowels with asterisks (*). For example, "hello world" would become "h*ll* w*rld".

In [None]:
# Your solution here
def censor_vowels(text):
    result = ""
    vowels = "aeiouAEIOU"
    for char in text:
        if char in vowels:
            result += "*"
        else:
            result += char
    return result

# Test the function
original = "Hello World"
censored = censor_vowels(original)
print(f"Original: {original}")
print(f"Censored: {censored}")

## String Methods

Python strings come with many built-in methods that allow you to manipulate and analyze text. A method is a function that "belongs to" an object (here, a string) and is called with dot notation, as in `s.upper()`.

In [None]:
s = 'hello world'
s

### Case Conversion Methods

Python provides several methods to change the case of strings:

In [None]:
# Methods also work on slices: uppercase only the part from index 5 onward
print(s[0:5] + s[5:].upper())

In [None]:
# Using the upper() method
print(s.upper())

### Important: Methods Don't Modify the Original String

Because strings are immutable, string methods always return a new string. The original string remains unchanged.

In [None]:
# The original string is unchanged
s

In [None]:
# To keep the changes, assign the result back to a variable
s = s.upper()
print(s)

### Common String Methods

Here are some of the most useful string methods:

In [None]:
s = 'Keep it simple'
print(s)

# split() - Splits a string into a list of substrings
s.split()
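
By default `split()` breaks on any run of whitespace and ignores leading and trailing whitespace. With an explicit separator, it splits on exactly that string, so consecutive separators produce empty strings:

```python
default_split = '  Keep   it  simple '.split()
comma_split = 'a,b,,c'.split(',')
print(default_split)   # ['Keep', 'it', 'simple']
print(comma_split)     # ['a', 'b', '', 'c']
```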

In [None]:
# Case conversion methods
s = s.upper()
print(s)
print(s.lower())

In [None]:
# capitalize() - Uppercases the first character and lowercases the rest
'keep it SIMPLE'.capitalize()

In [None]:
# count() - Counts occurrences of a substring
s = 'ABBACTTGCCABCAB'
s.count('AB')  # Count non-overlapping occurrences of 'AB'

In [None]:
# count() with start and end parameters (interpreted like slice bounds, so end is excluded)
s.count('AB', 2, len(s)-3)  # Count occurrences between positions 2 and len(s)-3
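
Two details about `count()`: it counts non-overlapping matches, scanning left to right, and `s.count(sub, start, end)` gives the same result as counting in the slice `s[start:end]`. For example:

```python
overlap_demo = 'aaaa'.count('aa')   # 2, not 3: matches may not overlap
print(overlap_demo)

s = 'ABBACTTGCCABCAB'
in_range = s.count('AB', 2, len(s) - 3)
in_slice = s[2:len(s) - 3].count('AB')
print(in_range, in_slice)   # 1 1
```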

In [None]:
# replace() - Replaces occurrences of a substring
s = 'keep it simple'
s.replace('i', 'I')  # Replace all 'i' with 'I'

In [None]:
# replace() with count parameter
s.replace('e', 'E', 2)  # Replace only the first 2 occurrences of 'e'

In [None]:
# Original string remains unchanged
s

In [None]:
# find() - Returns the lowest index where a substring occurs, or -1 if it does not occur
s.find('p')  # First occurrence of 'p'

In [None]:
# rfind() - Returns the highest index of a substring (searching from the right)
s.rfind('p')  # Last occurrence of 'p'
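
Because `find()` and `rfind()` return `-1` when the substring is missing, check for that value before using the result as an index (`s[-1]` would silently give the last character). The `in` operator answers a simple yes/no question, and the related `index()` method raises a `ValueError` instead of returning `-1`:

```python
s = 'keep it simple'
position = s.find('z')
print(position)       # -1: 'z' does not occur in s
print('z' in s)       # False
try:
    s.index('z')
except ValueError as e:
    print(f"Error: {e}")
```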

### More Useful String Methods

Here are some additional string methods that are frequently used:

In [None]:
# strip() - Removes whitespace from beginning and end
"   hello world   ".strip()

In [None]:
# startswith() and endswith() - Check if string starts or ends with a substring
filename = "document.pdf"
print(f"Starts with 'doc': {filename.startswith('doc')}")
print(f"Ends with '.pdf': {filename.endswith('.pdf')}")

In [None]:
# join() - Joins a list of strings, using the string it is called on as the separator
words = ["Python", "is", "awesome"]
" ".join(words)

In [None]:
# Different separator
"-".join(words)

### Quick Exercise: String Methods

Write a function that takes a sentence and returns the same sentence with:
1. All words capitalized
2. Extra spaces removed
3. Periods replaced with exclamation marks

In [None]:
# Your solution here
def transform_sentence(sentence):
    # Remove extra spaces and split into words
    words = sentence.strip().split()
    # Capitalize each word and join back with single spaces
    capitalized = " ".join([word.capitalize() for word in words])
    # Replace periods with exclamation marks
    result = capitalized.replace(".", "!")
    return result

# Test the function
original = "   this is a  simple sentence.  with extra spaces.   "
transformed = transform_sentence(original)
print(f"Original: '{original}'")
print(f"Transformed: '{transformed}'")
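
A note on `join()`, which the solution above relies on: it accepts only strings, so numbers must be converted first (here with `str()` inside a generator expression). And as long as none of the pieces contain the separator, `split()` with the same separator undoes `join()`:

```python
numbers = [1, 2, 3]
# ", ".join(numbers) would raise TypeError because the items are ints
line = ", ".join(str(n) for n in numbers)
print(line)  # 1, 2, 3

words = ["Python", "is", "awesome"]
dashed = "-".join(words)
print(dashed.split("-") == words)  # True
```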

## String Documentation

Python provides comprehensive documentation for all string methods. You can access it in several ways:

In [None]:
# Using help() on the string type
help(str)

In [None]:
# Using help() on a specific string method
help(str.replace)

You can also find the complete documentation online at:
- [Python String Methods](https://docs.python.org/3/library/stdtypes.html#string-methods)

For a short video explanation of methods: https://youtu.be/dbU91k-C5aY

## Summary

In this lesson, we've covered the fundamental aspects of Python strings:

1. **Creating Strings**: Using single or double quotes
2. **Multi-line Strings**: Using triple quotes or line continuation
3. **String Properties**: Ordered, immutable sequences of characters
4. **String Operations**:
   - Indexing to access individual characters
   - Slicing to extract substrings
   - Concatenation to combine strings
   - Methods to transform and analyze strings

Strings are one of the most commonly used data types in Python and are essential for working with text data, file paths, user input, and much more. The skills you've learned in this lesson will be valuable throughout your Python programming journey.

## Practice Exercises

Here are a few exercises to practice what you've learned. Try to solve them on your own before looking at the solutions.

### Exercise 1: Name Formatter

Write a function that takes a full name (first and last name) and returns it in the format "Last, First". For example, "John Smith" should become "Smith, John".

In [None]:
# Your solution here


### Exercise 2: Email Validator

Write a function that checks if a string looks like a valid email address. For simplicity, consider an email valid if it contains an @ symbol with text before and after it, and ends with ".com", ".org", or ".edu".

In [None]:
# Your solution here


### Exercise 3: Word Counter

Write a function that counts the number of words in a sentence. For simplicity, assume words are separated by spaces.

In [None]:
# Your solution here
