# Strings and String Methods

- In this chapter, you will learn how to:
> - **Manipulate** strings with **string methods**,
> - Work with **user input**,
> - Deal with **strings of numbers**,
> - **Format strings** for printing.

## 1. What is a String?

### 1.1. The String Data Type

- Strings are one of the **fundamental Python data types**:
> - The term **data type** refers to the **kind** of data a value represents. Strings are used to **represent text**,
> - Strings are a **fundamental** data type because they **can’t be broken down into smaller values of a different type**,
> - There are also **compound** data types (also known as **data structures**), some of which **share similarities with strings**.

- The string data type has a **special abbreviated name in Python**: `str`. You can see this by using the `type()` function, which **determines the data type of a given value**:

In [1]:
type("GDSC - Getting Started with Python")

str

- Strings **have three properties** that you’ll explore in the coming sections:
> - Strings **contain characters**, which are individual letters or symbols, **surrounded by single, double or triple quotes**.
> - Strings **have a length**, which is the number of characters contained in the string,
> - Characters in a string appear in a **sequence**, meaning each character has a **numbered position** in the string.

- You can **create a string** by:
> - Simply typing the string explicitly (**string literal** - a string literal is a string value that is written explicitly in your code),
> - Using the `str()` function (**constructor**),
> - Capturing a string as **user input**.

In [2]:
year = "2023"  # String literal.
type(year)

str

In [3]:
year = str(2023)  # Constructor - Not a string literal.
type(year)

str

In [4]:
user_input = input("Name: ")  # Capture user input as a string.
type(user_input)

Name:  Guido Van Rossum


str

### 1.2. String Literals

- As you’ve already seen, you can create a string by **surrounding some text with quotation marks**.

In [5]:
course = 'GDSC - Getting Started with Python'
year = "2023"

- Either **single** quotes <`course`> or **double** quotes <`year`> can be used to create a string, **as long as both quotation marks are the same type**.
- The **quotes** surrounding a string are called **delimiters** because they **tell Python where a string begins and where it ends**.
- When one type of quotes is used as the delimiter, **the other type of quote can be used inside of the string**:
> - After Python reads the **first delimiter**, all of the characters after it are considered a part of the string until a **second matching delimiter** is read,
> - This is why you can use a **single quote in a string delimited by double quotes and vice versa**.

In [6]:
email = "I'm going to be a little late today"  # A Single quote inside two double quotes.
reply = 'He said, "It is ok!"'  # Double quotes inside two single quotes.

- If you use the **same type of quote** inside a string as its **delimiter**, you will get an **error**, because Python treats the inner quote as the end of the string and **doesn’t know how to interpret the rest of the line**:

In [7]:
email = 'I'm going to be a little late today'  # A Single quote inside two single quotes - raises an Error!

SyntaxError: unterminated string literal (detected at line 1) (896666338.py, line 1)

In [8]:
reply = 'He said, 'It is ok!''  # Single quotes inside two single quotes - also raises an Error!

SyntaxError: invalid syntax (2366019624.py, line 1)
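
- One way around these errors, besides switching delimiters, is to **escape** the inner quote with a backslash (`\'` or `\"`). The following is a minimal sketch that reuses the sentences from the cells above; the name `email_alt` is introduced here only for illustration:

```python
# Escape the inner quote with a backslash so it is not read as the closing delimiter.
email = 'I\'m going to be a little late today'
reply = 'He said, \'It is ok!\''

# Alternatively, switch to the other delimiter, as in the earlier cell.
email_alt = "I'm going to be a little late today"

print(email)
print(reply)
print(email == email_alt)  # Both spellings produce the same string.
```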

- Whether you prefer using **single** or **double** quotes, keep in mind that **there isn’t really a right or wrong choice!** The goal is to **be consistent**, because consistency helps make your code easier to read and understand.

- You can also **break a string up** across multiple lines into a **multiline string**. Suppose you need to fit the following **[text](https://peps.python.org/pep-0020/#the-zen-of-python)** into a **string literal**:

> Beautiful is better than ugly.<br>
Explicit is better than implicit.<br>
Simple is better than complex.<br>
Complex is better than complicated.<br>
Flat is better than nested.<br>
Sparse is better than dense.<br>
Readability counts.<br>
Special cases aren't special enough to break the rules.<br>
Although practicality beats purity.<br>
Errors should never pass silently.<br>
Unless explicitly silenced.<br>
In the face of ambiguity, refuse the temptation to guess.<br>
There should be one-- and preferably only one --obvious way to do it.<br>
Although that way may not be obvious at first unless you're Dutch.<br>
Now is better than never.<br>
Although never is often better than *right* now.<br>
If the implementation is hard to explain, it's a bad idea.<br>
If the implementation is easy to explain, it may be a good idea.<br>
Namespaces are one honking great idea -- let's do more of those!<br>

- This paragraph contains far **more than 79 characters**, so a line of code that holds it as a single string literal **violates [PEP 8](https://peps.python.org/pep-0008/)**. **What do you do?**
> - One way is to **break the string up** across multiple lines and put a **backslash** (`\`) at the end of all but the last line,
> - Or you can create a **multiline string** using triple quotes as delimiters (`"""` or `'''`).

In [9]:
# Use a backslash:
zen_of_python = "Beautiful is better than ugly.\
Explicit is better than implicit.\
Simple is better than complex.\
Complex is better than complicated.\
Flat is better than nested.\
Sparse is better than dense.\
Readability counts.\
Special cases aren't special enough to break the rules.\
Although practicality beats purity.\
Errors should never pass silently.\
Unless explicitly silenced.\
In the face of ambiguity, refuse the temptation to guess.\
There should be one-- and preferably only one --obvious way to do it.\
Although that way may not be obvious at first unless you're Dutch.\
Now is better than never.\
Although never is often better than *right* now.\
If the implementation is hard to explain, it's a bad idea.\
If the implementation is easy to explain, it may be a good idea.\
Namespaces are one honking great idea -- let's do more of those!"

In [10]:
print(zen_of_python)  # The output displayed on a single line.

Beautiful is better than ugly.Explicit is better than implicit.Simple is better than complex.Complex is better than complicated.Flat is better than nested.Sparse is better than dense.Readability counts.Special cases aren't special enough to break the rules.Although practicality beats purity.Errors should never pass silently.Unless explicitly silenced.In the face of ambiguity, refuse the temptation to guess.There should be one-- and preferably only one --obvious way to do it.Although that way may not be obvious at first unless you're Dutch.Now is better than never.Although never is often better than *right* now.If the implementation is hard to explain, it's a bad idea.If the implementation is easy to explain, it may be a good idea.Namespaces are one honking great idea -- let's do more of those!


In [11]:
# Use a multiline string
zen_of_python = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""

In [12]:
print(zen_of_python)  # Triple-quoted strings preserve whitespace.

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


- Triple-quoted strings **have a special purpose in Python**: they are used to **document code**:
> - You’ll often find them **at the top of a `.py` file** with a description of the code’s purpose,
> - They are also used to **document custom functions**.
- When used to document code, triple-quoted strings are called **docstrings**.
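
- As a rough illustration of both uses, the sketch below puts a docstring at the top of a (hypothetical) module and another inside a function. The function `greet` is an assumption made up for this example; the `__doc__` attribute is where Python stores a function's docstring:

```python
"""Hypothetical module docstring: utilities for greeting people."""


def greet(name):
    """Return a short greeting for the given name."""
    return "Hello, " + name + "!"


# A function's docstring is stored in its __doc__ attribute.
print(greet.__doc__)
print(greet("Guido"))
```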

- Strings can contain any valid **Unicode** character:

In [13]:
programming_language = "×Pýŧħøŋ×"
programming_language

'×Pýŧħøŋ×'

## 2. Concatenation, Indexing, and Slicing

### 2.1. String Concatenation

- Two strings can be combined, or **concatenated**, using the (`+`) **operator**:

In [14]:
first_name = "Guido"
last_name = "Van Rossum"
full_name = first_name + " " + last_name
full_name

'Guido Van Rossum'

### 2.2. Determine the Length of a String

- The **number of characters contained in a string**, including spaces, is called the **length** of the string.
- To determine a **string’s length**, you use Python’s built-in `len()` function:

In [15]:
letters = "abc"
len(letters)

3

### 2.3. String Indexing

- Each character in a string has a **numbered position** called an **index**.
- You can access the character at the **Nth** position by putting the number **N** in between **two square brackets** immediately after the string:

In [16]:
full_name

'Guido Van Rossum'

In [17]:
full_name[10]

'R'

- **Note that:**
> - In Python and most other programming languages **counting always starts at zero**,
> - **Whitespace** characters are counted,
> - **Forgetting that counting starts at zero** and using the index `1` returns the **second** character rather than the first, and using the string’s length as an index results in an **IndexError**.

| G | u | i | d | o |   | V | a | n |   | R | o | s | s | u | m |
| --- | --- | --- | --- | --- | --- | --- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |

In [18]:
nth = len(full_name)
full_name[nth]

IndexError: string index out of range

- Strings also support **negative indices**:

| G | u | i | d | o |   | V | a | n |   | R | o | s | s | u | m |
| --- | --- | --- | --- | --- | --- | --- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

In [None]:
full_name[-1]

'm'

- Just like positive indices, Python raises an **IndexError** if you try to access a negative index **less than** the index of the first character in the string:

In [19]:
full_name[-17]

IndexError: string index out of range
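
- To tie the two index tables together, here is a small sketch showing that the last valid positive index is `len(full_name) - 1` and that it refers to the same character as `-1`. The variable name `last_index` is introduced only for this example:

```python
full_name = "Guido Van Rossum"

# Valid positive indices run from 0 to len(full_name) - 1.
last_index = len(full_name) - 1
print(last_index)
print(full_name[last_index])

# The same characters can be reached through negative indices.
print(full_name[-1] == full_name[last_index])
print(full_name[-len(full_name)] == full_name[0])
```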

### 2.4. String Slicing

- Suppose you need the string containing **just the first name**. You could access each character by index and concatenate them, like this:

In [20]:
full_name[0] + full_name[1] + full_name[2] + full_name[3] + full_name[4]

'Guido'

- Fortunately, Python provides an **easier way** to do this. You can extract a portion of a string, called a **substring**, by inserting a **colon** (`:`) between two index numbers inside of square brackets, like this:

In [21]:
full_name[0:5]

'Guido'

- **Note that:**
> - The `[0:5]` part of `full_name[0:5]` is called a **slice**.
> - `full_name[0:5]` returns the first five characters, **starting** with the character with index `0` and **going up to, but not including**, the character with index `5`.

| G | u | i | d | o |   | V | a | n |   | R | o | s | s | u | m |
| --- | --- | --- | --- | --- | --- | --- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

- The **slice** `[:5]` is **equivalent to** the slice `[0:5]`:

In [22]:
full_name[:5]

'Guido'

In [23]:
full_name[0:5]

'Guido'

- Similarly, the **slice** `[6:]` is **equivalent to** the slice `[6:16]`:

In [24]:
full_name[6:]

'Van Rossum'

In [25]:
full_name[6:16]

'Van Rossum'

- If you **omit both the first and second numbers in a slice**, you get a string that starts with the character with **index** `0` and ends with the **last character**:

In [26]:
full_name[:]

'Guido Van Rossum'

- It’s important to **note that**, unlike string indexing, **Python won’t raise an IndexError** when you try to slice between boundaries **before or after** the beginning and ending boundaries of a string:

In [27]:
full_name[:20]

'Guido Van Rossum'

In [28]:
full_name[20:]

''

- You can use **negative numbers** in slices with the same rules:

| G | u | i | d | o |   | V | a | n |   | R | o | s | s | u | m |
| --- | --- | --- | --- | --- | --- | --- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

In [29]:
full_name[-16:-11]

'Guido'

- Notice, however, that the **right-most boundary does not have a negative index**. The logical choice for that boundary would seem to be the **number** `0`, but that **doesn’t work**:

In [30]:
full_name[-10:0]

''

- If you need to include the final character of a string in your slice, you can **omit the second number**:

In [31]:
full_name[-10:]

'Van Rossum'
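
- Because the stop index is excluded, two slices that share a boundary fit together without overlapping. The sketch below checks this on the same name; `split_at`, `first`, and `rest` are names chosen just for this example:

```python
full_name = "Guido Van Rossum"
split_at = 5

first = full_name[:split_at]   # Characters 0 up to, but not including, 5.
rest = full_name[split_at:]    # Characters 5 to the end.

print(repr(first))
print(repr(rest))
print(first + rest == full_name)

# For in-range boundaries, the slice length equals stop - start.
middle = full_name[6:9]
print(middle, len(middle) == 9 - 6)
```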

### 2.5. Strings are Immutable

- Strings are **immutable**, which means that you can’t change them once you’ve created them.
- **What happens when you appear to modify a string?** Python creates a **new string object** instead of changing the existing one, as the different `id()` values below show:

In [32]:
# Create a string and check its identity with id():
full_name = "Guido"
print(full_name)
print(id(full_name))

Guido
1776239461872


In [33]:
# Concatenate another string and check the identity again:
full_name += " Van Rossum"
print(full_name)
print(id(full_name))

Guido Van Rossum
1776239367136
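
- Assigning directly to a single character is not allowed at all. The sketch below, which starts from a deliberately misspelled name, shows the `TypeError` Python raises and the usual workaround of building a new string from slices:

```python
full_name = "guido Van Rossum"

try:
    full_name[0] = "G"  # Item assignment is not supported by str.
except TypeError as error:
    print("TypeError:", error)

# Build a new string instead and rebind the name to it.
full_name = "G" + full_name[1:]
print(full_name)
```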


## 3. Manipulate Strings with Methods

- Strings come bundled with special functions called **[string methods](https://www.w3schools.com/python/python_strings_methods.asp)** that can be used to **work with and manipulate** strings.
- There are numerous string methods available, but we’ll focus on some of the **most commonly used ones**.

In [34]:
full_name = "Guido Van Rossum"

### Converting String Case

- To convert a string to **all upper case letters**, you use the string’s `.upper()` method:

In [35]:
full_name.upper()

'GUIDO VAN ROSSUM'

- The **opposite** of the `.upper()` method is the `.lower()` method, which **converts every character in a string to lower case**:

In [36]:
full_name.lower()

'guido van rossum'

- The `.casefold()` method returns a string where **all the characters are lower case**.

In [37]:
full_name.casefold()

'guido van rossum'


- **What is the difference between `.lower()` and `.casefold()`?**
>- The `.casefold()` method is **similar to** the `.lower()` method, but it **is stronger and more aggressive**: it converts more characters to a lower-case form, so two strings that differ only in case are more likely to match after both are converted with `.casefold()`,
>- This makes `.casefold()` **more useful for caseless comparison**.

In [38]:
# Load str_01 & str_02 from case_insensitive_comparison_examples.py (Assets):
%run "../Assets/case_insensitive_comparison_examples.py"

In [39]:
str_01.lower() == str_02.lower()

False

In [40]:
str_01.casefold() == str_02.casefold()

True

In [41]:
print(str_01)
print(str_02)

Straße
strasse
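
- If the asset file is not at hand, the same comparison can be reproduced with two literals. This is a self-contained sketch; the names `german` and `plain` are assumptions used only here:

```python
german = "Straße"
plain = "strasse"

# .lower() keeps the German sharp s, while .casefold() expands it to "ss".
print(german.lower())
print(german.casefold())

print(german.lower() == plain.lower())
print(german.casefold() == plain.casefold())
```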


- The `.capitalize()` method returns a string where the **first character is upper case**, and the **rest is lower case**:

In [42]:
full_name_lower_cased = full_name.lower()
full_name_lower_cased

'guido van rossum'

In [43]:
full_name_lower_cased.capitalize()

'Guido van rossum'

- The `.title()` method returns a string where the **first character in every word is upper case**:

In [44]:
full_name_lower_cased.title()

'Guido Van Rossum'

### Removing Whitespace From a String

- **Whitespace** is **any character that is printed as blank space**. This includes things like **spaces** and **line feeds**:

In [45]:
full_name = " Guido Van Rossum"
full_name[0]

' '

- Sometimes you need to **remove whitespace from the beginning or end of a string**. This is especially useful when working with strings that come from **user input, where extra whitespace characters may have been introduced by accident**:

In [46]:
user_input = input("Full name: ")
user_input

Full name:       Guido Van Rossum


'     Guido Van Rossum'

- There are **three string methods** that you can use to **remove whitespace** from a string:
> 1. `.rstrip()`: **removes any whitespace at the end of the string**,
> 2. `.lstrip()`: **removes any whitespace at the start of the string**,
> 3. `.strip()`: **removes any whitespace at both ends**.

In [47]:
user_input.rstrip()

'     Guido Van Rossum'

In [48]:
user_input.lstrip()

'Guido Van Rossum'

In [49]:
user_input.strip()

'Guido Van Rossum'

- **None** of the `.rstrip()`, `.lstrip()`, and `.strip()` methods **remove whitespace from the middle of the string**.

In [50]:
user_input = input("Full name: ")
user_input

Full name:    Guido Van Rossum


'  Guido Van Rossum'

In [51]:
print(user_input.rstrip())
print(user_input.lstrip())
print(user_input.strip())

  Guido Van Rossum
Guido Van Rossum
Guido Van Rossum
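
- To make the point about the middle of the string visible, the sketch below uses a literal with extra inner spaces instead of `input()`. Collapsing the inner whitespace with `.split()` and `.join()` is one possible approach, shown here as an assumption rather than the only option:

```python
user_input = "   Guido    Van   Rossum  "

# .strip() only trims the ends; the inner runs of spaces survive.
stripped = user_input.strip()
print(repr(stripped))

# Splitting on any whitespace and re-joining with one space collapses them.
normalized = " ".join(user_input.split())
print(repr(normalized))
```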


### Determine if a String Starts or Ends with a Particular String

- When you work with text, sometimes you need to **determine if a given string starts with or ends with certain characters**. You can use two string methods to solve this problem:
> 1. `.startswith()`: **returns True if the string starts with the specified value, otherwise False**,
> 2. `.endswith()`: **returns True if the string ends with the specified value, otherwise False**.

In [52]:
user_input = input("File name: ")
user_input

File name:  GDSC - Getting Started with Python.py


'GDSC - Getting Started with Python.py'

In [53]:
user_input.endswith(".py")

True

In [54]:
user_input.startswith("GDSC")

True

- They are **both case-sensitive**.

In [55]:
user_input.endswith(".PY")

False

In [56]:
user_input.startswith("gdsc")

False
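
- When case should not matter, one common approach is to normalize the string first. The sketch below also passes a tuple of suffixes, which `.endswith()` and `.startswith()` accept; the file name and variable names are made up for this example:

```python
file_name = "GDSC - Getting Started with Python.PY"

# Normalize the case before checking the suffix or prefix.
has_py_suffix = file_name.lower().endswith(".py")
starts_with_gdsc = file_name.casefold().startswith("gdsc")

# A tuple lets you test several alternatives in one call.
is_python_file = file_name.lower().endswith((".py", ".pyw"))

print(has_py_suffix, starts_with_gdsc, is_python_file)
```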

### Splitting and Joining Strings

- If you need to **split a string into several parts**, you can use two string methods to solve this problem:
> - `.split()`: **splits a string into a list starting from the left**,
> - `.rsplit()`: **splits a string into a list starting from the right**.

In [57]:
grocery_list = input("Enter your order here: ")
grocery_list.split()

Enter your order here:  apples bananas strawberries


['apples', 'bananas', 'strawberries']

- You can specify the **separator** (the default separator is **any whitespace**). When **maxsplit** is specified, at most that many splits are made, so the list contains **at most maxsplit plus one** elements:

In [58]:
full_path = "H:/Programming Lab/GDSC - Getting Started with Python.py"

In [59]:
# Get the directory path:
full_path.rsplit("/", maxsplit=1)[0]

'H:/Programming Lab'

In [60]:
# Get the file name:
full_path.rsplit("/", maxsplit=1)[-1]

'GDSC - Getting Started with Python.py'

In [61]:
# Get the file extension:
full_path.rsplit(".")[-1]

'py'
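
- The difference between `.split()` and `.rsplit()` only shows up when **maxsplit** limits the number of splits. A small sketch on a made-up `record` string:

```python
record = "2023-01-15-final"

# Without maxsplit, both methods return the same list.
print(record.split("-"))
print(record.rsplit("-"))

# With maxsplit=1, the single split happens at the left or at the right.
from_left = record.split("-", maxsplit=1)
from_right = record.rsplit("-", maxsplit=1)
print(from_left)
print(from_right)
```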

- If you need to do the opposite, **joining all items in a list or a tuple into a string with a separator between them**, you can use the string method `.join()`, which **takes all items in an iterable and joins them into one string**:

In [62]:
dir_list = ["H:", "Programming Lab"]

In [63]:
# Rebuild the directory path:
directory = "/".join(dir_list)
directory

'H:/Programming Lab'
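
- Note that `.join()` expects every item to be a string. The sketch below, using a made-up list of numbers, shows the `TypeError` raised otherwise and one way to convert the items first:

```python
numbers = [2023, 1, 15]

try:
    "-".join(numbers)  # The items are integers, not strings.
except TypeError as error:
    print("TypeError:", error)

# Convert each item with str() before joining.
date = "-".join(str(n) for n in numbers)
print(date)
```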

## 4. Strings and Operators

### Arithmetic Operators

- The (`+`) operator **concatenates** two strings together:

In [64]:
n = "5"
n + n

'55'

- Strings can be **multiplied by a number** as long as that number is an **integer**:

In [65]:
asterisk = "*"
asterisk * 5

'*****'

- **What do you think happens if you use the (`*`) operator between two strings?**

In [66]:
asterisk * n

TypeError: can't multiply sequence by non-int of type 'str'

- **What do you think happens when you try to add a string and a number?**

In [67]:
asterisk + 5

TypeError: can only concatenate str (not "int") to str

- The `TypeError` errors you saw above highlight a common problem encountered when working with user input: **type mismatches** that occur when you use the input in an operation that requires a number and not a string.
- Let’s look at an **example**:
> Circle Circumference = $2 * PI * r$

In [68]:
# Get the radius from user input:
r = input("Radius: ")

# Calculate the result:
PI = 22/7
circumference = 2 * PI * r
circumference

Radius:  5


TypeError: can't multiply sequence by non-int of type 'float'

In [69]:
# Get the radius from user input & cast it into a float:
r = input("Radius: ")
r = float(r)

# Calculate the result:
PI = 22/7
circumference = 2 * PI * r
circumference

Radius:  5


31.428571428571427
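
- Calling `float()` on text that is not a number raises a `ValueError`, so user input is often validated before it is used. The helper `parse_radius` below is a hypothetical function written for this sketch, and it takes the text as an argument instead of calling `input()` so that it can be run without typing anything:

```python
def parse_radius(text):
    """Hypothetical helper: return text as a float, or None if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


print(parse_radius(" 5 "))
print(parse_radius("five"))
```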

- Strings in Python have a **built-in formatting operation** that can be accessed with the (`%`) operator. It’s a shortcut that lets you do **simple positional formatting** very easily:

In [70]:
name = "Moheb"
age = 36
"My name is %s, I am %d years old!" % (name, age)  # Old-style formatting.

'My name is Moheb, I am 36 years old!'

### Comparison Operators

In [71]:
a, b = "a", "b"
a > b

False

In [72]:
ord("a"), ord("b")

(97, 98)

In [73]:
course_01 = "GDSC - Getting Started with Python"
course_02 = "GDSC - Getting Started with Python"
course_01 == course_02

True
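
- The cells above suggest how ordering works: strings are compared **character by character** using each character's code point (the value returned by `ord()`). The sketch below illustrates a few consequences with made-up words, including that upper-case letters sort before lower-case ones:

```python
# The first differing character decides: ord("a") is 97, ord("b") is 98.
print("apple" < "banana")

# Upper-case letters have smaller code points than lower-case ones.
print(ord("Z"), ord("a"))
print("Zebra" < "apple")

# A string that is a prefix of another compares as smaller.
print("app" < "apple")

words = ["banana", "Zebra", "apple"]
default_order = sorted(words)
caseless_order = sorted(words, key=str.casefold)
print(default_order)
print(caseless_order)
```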

### Membership Operators

In [74]:
course = "GDSC - Getting Started with Python"
"Python" in course

True

### Identity Operators

In [75]:
course_01 = "GDSC - Getting Started with Python"
course_02 = "GDSC - Getting Started with Python"
course_01 is course_02

False

In [76]:
lang_01 = "python"
lang_02 = "python"
lang_01 is lang_02

True
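
- The differing results above come from the fact that `is` compares **object identity**, not contents, and whether two equal strings share one object is an implementation detail of the interpreter. As a hedged sketch: use `==` to compare text, and treat `sys.intern()` as the explicit way to obtain a shared object when you really need one. The variable `part_one` is introduced only to build a string at runtime:

```python
import sys

part_one = "Getting Started "
course_01 = part_one + "with Python"    # Built at runtime.
course_02 = "Getting Started with Python"

# Equality compares the characters and is the right tool for text.
print(course_01 == course_02)

# Interning both strings is guaranteed to yield the same object.
print(sys.intern(course_01) is sys.intern(course_02))
```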

## 5. “New Style” String Formatting

- This **new style** string formatting **gets rid of the (`%`) operator special syntax** and makes the syntax for string formatting more regular.
- Formatting is now handled **either by**:
> - **Calling** a `.format()` method on a string object,
> - Literal string **interpolation** (Python 3.6+).

In [77]:
name = "Moheb"
age = 36
"My name is {}, I am {} years old!".format(name, age)  # Calling .format() method.

'My name is Moheb, I am 36 years old!'

In [78]:
name = "Moheb"
age = 36
f"My name is {name}, I am {35 + 1} years old!"  # Literal string interpolation.

'My name is Moheb, I am 36 years old!'
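
- Both new-style approaches accept a **format specification** after a colon inside the braces, which controls things like decimal places, thousands separators, and alignment. The values below (`price`, `ratio`) are made up for this sketch:

```python
name = "Moheb"
price = 1234.5
ratio = 0.256

two_decimals = f"{price:.2f}"        # Fixed-point with two decimal places.
with_commas = f"{price:,.2f}"        # Adds a thousands separator.
as_percent = f"{ratio:.1%}"          # Multiplies by 100 and appends %.
right_aligned = f"{name:>10}"        # Pads on the left to a width of 10.
left_aligned = "{:<10}|".format(name)  # The same idea with .format().

print(two_decimals, with_commas, as_percent)
print(repr(right_aligned))
print(repr(left_aligned))
```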