# Manipulating Strings


<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/phonchi/nsysu-math106A/blob/master/static_files/presentations/06_Manipulating_string.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
  <td>
    <a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/phonchi/nsysu-math106A/blob/master/static_files/presentations/06_Manipulating_string.ipynb"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" /></a>
  </td>
</table>

Text is one of the most common forms of data your programs will handle. You already know how to concatenate two `string` values with the `+` operator, but you can do much more than that! You can extract substrings from a `string` just as you would from any other sequence, add or remove spacing, convert letters to lowercase or uppercase, and check that `strings` are formatted correctly!

## String

There are several ways to create a new `string`; the simplest is to enclose the characters in single or double quotes:

In [17]:
type(''), type("")

(str, str)

> One benefit of using double quotes is that the string can have a single quote character in it.

In [18]:
print("I'm fine")

I'm fine


A `string` is a **sequence that maps each index to a case-sensitive character, so it belongs to the sequence data types**. Anything that we can apply to a sequence can also be applied to a `string`. For instance, you can access the **items (characters)** one at a time with the bracket operator:

In [19]:
fruit = 'banana'
fruit[1]

'a'

So "b" is the 0th letter ("zero-th") of "banana", "a" is the 1th letter ("one-th"), and "n" is the 2th ("two-th") letter.

<center><img src="https://www.py4e.com/images/string.svg"></center>
<div align="center"> source: https://www.py4e.com/html3/06-strings </div>

`len()` can be used to return the number of characters in a `string`:

In [20]:
len(fruit)

6

We can use negative indices, which count backward from the end of the string.

In [21]:
fruit[-1], fruit[-2]

('a', 'n')

Slicing also works on `string` to extract a substring from the original string. Remember that we can slice sequences using `[start:stop:step]`. The operator `[start:stop]` returns the part of the string from the “start-th” character to the “stop-th” character, including the first but excluding the last with `step=1`. If we omit the first index (before the colon), the slice starts at the beginning of the `string`. If we omit the second index, the slice goes to the end of the `string`:

In [22]:
s = 'Cool-Python'

print(s[:5]) #same as s[0:5] 
print(s[5:]) #same as s[5:len(s)] 
print(s[::2]) #same as s[0:len(s):2]
print(s[::]) #same as s[:] and s[0:len(s):1] => copy the string
print(s[::-1]) #same as s[-1:-(len(s)+1):-1] => reverse the string

Cool-
Python
Co-yhn
Cool-Python
nohtyP-looC
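
As a small illustration of the slicing rules above, the following sketch pulls pieces out of a file name and uses a reversed slice to test for a palindrome. The file name and the helper `is_palindrome` are assumptions made up for this example, not part of the lecture material.

```python
# Hypothetical file name used only for illustration
filename = 'report_2023.csv'

stem = filename[:-4]        # everything except the last 4 characters
extension = filename[-3:]   # the last 3 characters
year = filename[7:11]       # characters at index 7, 8, 9 and 10

print(stem, extension, year)


def is_palindrome(word):
    """Return True if word reads the same forward and backward."""
    return word == word[::-1]


print(is_palindrome('level'), is_palindrome('banana'))
```

Negative indices are convenient here because the slice does not depend on the total length of the string.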


`Strings` are "immutable", which means that they cannot be modified:

In [23]:
s = "hello"
s[0] = 'y' 

TypeError: 'str' object does not support item assignment

The "object" in this case is the `string`, and the "item" is the character you tried to assign. The best you can do is create a new `string` that is a variation on the original:

In [24]:
print(id(s))
s = 'y' + s[1:len(s)]
print(id(s))
print(s)

1673653634096
1673687868400
yello
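
The same idea can be wrapped in a small helper. This is only a sketch: the function name `replace_char` is hypothetical, and it builds a new string from two slices rather than modifying the original.

```python
def replace_char(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    return text[:index] + new_char + text[index + 1:]


original = 'hello'
changed = replace_char(original, 0, 'y')

print(original)  # the original string is untouched
print(changed)
```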


A lot of computations involve processing a `string` one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. Traversing a `string` works just like the sequence traversal we saw before:

In [25]:
# Test if s contains 'o'
for char in s: # Retrieve item (character) one by one
    if char == 'o':
        print("There is an o")
        break

There is an o
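
A traversal can also accumulate a result instead of stopping at the first match. The sketch below counts how often a character occurs; the function name `count_char` is an assumption for this example, and the built-in `count()` string method is shown alongside for comparison.

```python
def count_char(text, target):
    """Count how many times target appears in text by visiting each character."""
    count = 0
    for char in text:
        if char == target:
            count += 1
    return count


print(count_char('banana', 'a'))
print('banana'.count('a'))  # the built-in method gives the same answer
```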


The `in` and `not in` operators can be used with `strings` just like with `list`. An expression with two strings joined using `in` or `not in` will evaluate to a Boolean `True` or `False`:

In [26]:
print('Hello' in 'Hello, World')
print('cats' not in 'cats and dogs')

True
False


### Escape Characters

An ***escape character*** lets you use characters that are otherwise impossible to put into a `string`. An escape character consists of a backslash (`\`) followed by the character you want to add to the string. (Despite consisting of two characters, it is commonly referred to as a singular escape character.) For example, the escape character for a single quote is `\'`. You can use this inside a string that begins and ends with single quotes:

In [33]:
spam = 'Say hi to Bob\'s mother.'
spam

"Say hi to Bob's mother."

Python knows that since the single quote in `Bob\'s` has a backslash, it is not a single quote meant to end the `string`. The escape characters `\'` and `\"` let you put single quotes and double quotes inside your strings, respectively.

<center>

| Escape character | Prints as            |
|------------------|----------------------|
| `\'`               | Single quote         |
| `\"`               | Double quote         |
| `\\`               | Backslash            |
| `\t`               | Tab                  |
| `\n`               | Newline (line break) |

</center>

In [37]:
print("Hello there!\nHow are you?\n\tI\'m doing fine.")

Hello there!
How are you?
	I'm doing fine.


While you can use the `\n` escape character to put a newline into a `string`, it is often easier to use ***multiline strings***. A multiline string in Python begins and ends with either three single quotes or three double quotes. Any quotes, tabs, or newlines in between the "triple quotes" are considered part of the `string`.

In [35]:
print('''Hello there,
How are you?
        I'm doing fine
''')

Hello there,
How are you?
        I'm doing fine



Notice that the single quote character in `I'm` does not need to be escaped. Escaping single and double quotes is optional in multiline strings.

#### Raw Strings

You can place an `r` before the beginning quotation mark of a `string` to make it a ***raw string***. **A raw string completely ignores all escape characters** and prints any backslash that appears in the `string`. 

In [43]:
print(r'That is Carol\'s cat.')

That is Carol\'s cat.


Because this is a raw string, Python considers the backslash as part of the `string` and not as the start of an escape character. Raw strings are helpful if you are typing strings that contain many backslashes, such as the `strings` used for Windows file paths like `r'C:\Users\Al\Desktop'`.
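
One way to see the difference is to compare lengths. In the sketch below (the variable names are chosen only for illustration), the same path is written once with escaped backslashes and once as a raw string.

```python
# In a normal string every backslash must itself be escaped
normal_path = 'C:\\Users\\Al\\Desktop'
# In a raw string the backslashes are taken literally
raw_path = r'C:\Users\Al\Desktop'

print(normal_path == raw_path)   # both spell the same characters

# '\n' is a single newline character; r'\n' is a backslash followed by n
print(len('\n'), len(r'\n'))
```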

### Putting Strings Inside Other Strings

Putting `strings` inside other `strings` is a common operation in programming. So far, we've been using the `+` operator and string concatenation to do this:

In [11]:
name = 'Al'
age = 33
language = 'Python'
print("\nHey! I'm " + name + ", " + str(age) + " years old and I love " + language + " Programming")


Hey! I'm Al, 33 years old and I love Python Programming


However, this requires a lot of tedious typing. A simpler approach is to use ***string interpolation***. The format operator, `%` allows us to construct `strings`, replacing parts of the `strings` with the data stored in variables. **When applied to integers, `%` is the modulus operator. But when the first operand is a `string`, `%` is the format operator.**

The first operand is the ***format string***, which contains one or more ***format specifiers*** that specify how the second operand is formatted. The result is a `string`. For example, the format specifier `%d` means that the second operand should be formatted as an integer ("d" stands for "decimal"). One benefit is that `str()` doesn’t have to be called to convert values to `strings`:

<center><img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*xL6ZLRAkizoZAb1z3nCtfA.png"></center>
<div align="center"> source: https://towardsdatascience.com/python-string-interpolation-829e14e1fc75 </div>

In [7]:
print("\nHey! I'm %s, %d years old and I love %s Programming" % (name, age, language))   # Like printf in C


Hey! I'm Al, 33 years old and I love Python Programming


We can have more control over the formatting, for instance:

<center><img src="https://drive.google.com/uc?id=1apa_s6B69AbXNFpxUTyXD14bxmSUQ3WI"></center>
<div align="center"> source: https://refactored.ai/microcourse/notebook?path=content%2F02-Python_for_Data_Scientists%2F03-Data_Structures_in_python%2F01-Basic_data_types_and_operators.ipynb </div>

In [47]:
a = 32
b = 32.145
print('a=%4d, b=%6.2f' % (a,b))

a=  32, b= 32.15


With the `%` operator, values are right-aligned within the field by default (a `-` flag, as in `%-4d`, left-aligns them). The numbers after `%` are the total field width and the precision, that is, the number of digits after the decimal point (separated by `.`). For values with fewer characters than the field width, the remaining character positions are filled with spaces. The `%f` specifier formats floating-point numbers; note that the value of `b` has been rounded to two decimal places.
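
To make the effect of the field width visible, the sketch below prints the same two values with and without the `-` flag. The values `item` and `price` are made up for this example.

```python
item = 'tea'
price = 3.5

# Default: padded on the left, so the value sits at the right of the field
right_aligned = '[%6s][%8.2f]' % (item, price)
# The - flag pads on the right instead
left_aligned = '[%-6s][%-8.2f]' % (item, price)

print(right_aligned)
print(left_aligned)
```

The square brackets are only there to show where each field begins and ends.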

Python 3.6 introduced ***f-strings*** (the `f` stands for format), which are similar to string interpolation except that braces are used instead of `%s`, with the expressions placed directly inside the braces. Just as raw strings have an `r` prefix, f-strings have an `f` prefix before the starting quotation mark. (Note that it is even possible to do inline arithmetic.)

In [13]:
print(f"\nHey! I'm {name}, {age+2} years old and I love {language} Programming")


Hey! I'm Al, 35 years old and I love Python Programming


Besides the field width, f-strings give us more control, such as specifying left, right, and center alignment with `<`, `>`, and `^`. Note that the format specifiers are now placed after the variable, separated by a colon:

In [50]:
print(f'[{a:<15d}]')
print(f'[{b:^9.2f}]')

[32             ]
[  32.15  ]


In addition, placing a `+` before the field width specifies that a positive number should be preceded by a `+`. A negative number always starts with a `-`. To fill the remaining characters of the field with 0s rather than spaces, place a `0` before the field width (and after the `+` if there is one):

In [51]:
print(f'[{a:+10d}]')
print(f'[{a:+010d}]')

[       +32]
[+000000032]


See https://docs.python.org/3/library/string.html#formatspec for more details.
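
The alignment and width specifiers can be combined to line up columns of a small table. The following is a minimal sketch with made-up shopping data; it also relies on the fact that a field width may itself be written in braces (here `{name_width}`), so the width can be computed at run time.

```python
# Hypothetical shopping data: (name, quantity, unit price)
items = [('apple', 3, 0.5), ('watermelon', 1, 4.25), ('kiwi', 12, 0.3)]

# Find the longest name so the first column is just wide enough
name_width = 0
for name, quantity, price in items:
    if len(name) > name_width:
        name_width = len(name)

rows = []
for name, quantity, price in items:
    total = quantity * price
    # Left-align the name, right-align the numbers
    rows.append(f'{name:<{name_width}} {quantity:>3d} {total:>8.2f}')

for row in rows:
    print(row)
```

Computing the width first keeps the columns aligned even if a longer name is added later.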

> Yet another option is the `format()` method; see https://realpython.com/python-string-formatting/#toc for more details.

### Exercise 1: Suppose we are designing a word game called "The Mysterious Island" and need to print the player's statistics each time the game begins. Complete the following function, which receives the variables from the game and uses f-strings to display the values so that they are right-aligned with each other:

```
Player1 Stats:
Health:     100/100
Experience:   0/150
Gold:   50.00/60.00


Player2 Stats:
Health:      60/100
Experience: 120/150
Gold:   40.00/60.00
```

Hint: You can first calculate the maximal width required for each row.


<center><img src="https://cdn.leonardo.ai/users/f26a2ba8-8273-45e9-8db9-958f83058486/generations/052c570f-1830-4947-81f0-7bc603a7e891/Leonardo_Creative_A_vibrant_detailed_illustration_of_a_myster_0.jpg"></center>

In [54]:
def print_stats(player_name, health, experience, gold):
    print(f"{player_name} Stats:")
    print(f"Health:____/100")
    print(f"Experience:______/150")
    print(f"Gold:_____/60.00")

In [55]:
game_title = "The Mysterious Island"

welcome_message = f'Welcome to "{game_title}" adventure!\n\n'
# 1. Print the welcome_message
print(welcome_message)

# 2. Use string and number formatting to print out the statistics
player_name = "Player1"
health = 100
experience = 0
gold = 50.000

print_stats(player_name, health, experience, gold)

print("\n")

player_name = "Player2"
health = 60
experience = 120
gold = 40.0

print_stats(player_name, health, experience, gold)

Welcome to "The Mysterious Island" adventure!


Player1 Stats:
Health:     100/100
Experience:   0/150
Gold:   50.00/60.00


Player2 Stats:
Health:      60/100
Experience: 120/150
Gold:   40.00/60.00


### String Methods

`Strings` are an example of Python objects. An object contains both data (the actual `string` itself) and methods, which are effectively functions that are built into the object and are available to any instance of the object.

Python has a function called `dir()`, which lists the methods available for an object. 

In [56]:
dir(s)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',


While the `dir()` function lists the methods, and you can use `help()` to get some simple documentation on a method, a better source of documentation for `string` methods is https://docs.python.org/3/library/stdtypes.html#string-methods.

#### The `upper()`, `lower()` Methods

The `upper()` and `lower()` string methods return a **new `string`** where all the letters in the original `string` have been converted to uppercase or lowercase:

In [57]:
spam = 'Hello, world!'
spam = spam.upper()
print(spam)
spam = spam.lower()
print(spam)

HELLO, WORLD!
hello, world!


Note that these methods do not change the `string` itself but return new `string` values. If you want to change the original `string`, you have to call `upper()` or `lower()` on the string and then assign the new string to the variable where the original was stored. This is why you must use `spam = spam.upper()` to change the string in `spam` instead of simply `spam.upper()`. (This is just like a variable `eggs` that contains the value 10: writing `eggs + 3` does not change the value of `eggs`, but `eggs = eggs + 3` does.) Strings, like integers, are immutable and cannot be modified in place.

The `upper()` and `lower()` methods are helpful if you need **to make a case-insensitive comparison**. For example, the strings `'great'` and `'GREat'` are not equal to each other. But in the following small program, it does not matter whether the user types `Great`, `GREAT`, or `grEAT`, because the `string` is first converted to lowercase.

In [59]:
print('How are you?')
feeling = input()
if feeling.lower() == 'great':
    print('I feel great too.')
else:
    print('I hope the rest of your day is good.')

How are you?
I feel great too.
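
The same comparison can be tried without typing any input. In this sketch the helper name `same_ignoring_case` and the sample value are assumptions made for illustration; it also shows that calling `lower()` without assigning the result leaves the original string unchanged.

```python
def same_ignoring_case(first, second):
    """Compare two strings after converting both to lowercase."""
    return first.lower() == second.lower()


answer = 'GREat'
answer.lower()   # returns a new string, which is discarded here
print(answer)    # the original value has not changed

print(answer == 'great')
print(same_ignoring_case(answer, 'great'))
```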


#### The `isX()` Methods

There are several other `string` methods that have names beginning with the word `is`. These methods return a Boolean value that describes the nature of the `string`. Here are some common `isX()` string methods:

- `isupper()/islower()` Returns `True` if the string has at least one letter and all the letters are uppercase or lowercase, respectively

- `isalpha()` Returns `True` if the string consists only of letters and isn't blank

- `isalnum()` Returns `True` if the string consists only of letters and numbers and is not blank

- `isdecimal()` Returns `True` if the string consists only of numeric characters and is not blank

- `isspace()` Returns `True` if the string consists only of spaces, tabs, and newlines and is not blank

- `istitle()` Returns `True` if the string consists only of words that begin with an uppercase letter followed by only lowercase letters

In [61]:
print('Hello, world!'.islower()) 
print('hello, world!'.islower())
print('hello'.isalpha())
print('hello123'.isalnum())
print('hello123'.isdecimal())
print(' '.isspace())
print('This Is Title Case'.istitle())

False
True
True
True
False
True
True


The `isX()` string methods are helpful when you need to validate user input. For example, the following program repeatedly asks users for their `age` and a `password` until they provide valid input:

In [62]:
while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal():
        break
    print('Please enter a number for your age.')

while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum():
        break
    print('Passwords can only have letters and numbers.')

Enter your age:
Please enter a number for your age.
Enter your age:
Please enter a number for your age.
Enter your age:
Select a new password (letters and numbers only):
Passwords can only have letters and numbers.
Select a new password (letters and numbers only):


In the first while loop, we ask the user for their age and store their input in `age`. If `age` is a valid (decimal) value, we break out of this first while loop and move on to the second, which asks for a `password`. Otherwise, we inform the user that they need to enter a number and again ask them to enter their `age`. In the second while loop, we ask for a `password`, store the user's input in `password`, and break out of the loop if the input is alphanumeric. If it isn't, we tell the user the `password` needs to be alphanumeric and again ask them to enter a password.
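
The two checks can also be written as small functions, which makes them easy to try on several inputs at once. This is only a sketch; the function names and the sample inputs are hypothetical.

```python
def is_valid_age(text):
    """Accept only non-empty strings made entirely of decimal digits."""
    return text.isdecimal()


def is_valid_password(text):
    """Accept only non-empty strings made of letters and digits."""
    return text.isalnum()


for candidate in ['42', 'forty two', '', '-5']:
    print(repr(candidate), is_valid_age(candidate))

for candidate in ['secr3t', 'secr3t!', '']:
    print(repr(candidate), is_valid_password(candidate))
```

Note that `'-5'` is rejected because the minus sign is not a decimal digit, and the empty string is rejected by both checks.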

#### The `startswith()` and `endswith()` Methods

The `startswith()` and `endswith()` methods return `True` if the `string` they are called on begins or ends (respectively) with the `string` passed to the method; otherwise, they return `False`:

In [14]:
print('Hello, world!'.startswith('Hello'))
print('abc123'.endswith('12'))

True
False
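
Both methods also accept a tuple of strings and return `True` if any one of them matches. The sketch below uses this to pick text files out of a made-up list of file names.

```python
# Hypothetical file names used only for illustration
filenames = ['notes.txt', 'photo.png', 'draft.md', 'README']

text_files = []
for name in filenames:
    # True if the name ends with '.txt' or with '.md'
    if name.endswith(('.txt', '.md')):
        text_files.append(name)

print(text_files)
```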


#### The `replace()` Method

The `replace()` method is like a “search and replace” operation in a word processor. It returns a new `string` in which every occurrence of the first argument is replaced by the second:

In [15]:
greet = 'Hello Bob'
nstr = greet.replace('Bob','Jane')
print(nstr)

Hello Jane
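
By default every occurrence is replaced; an optional third argument limits how many replacements are made. A short sketch with a made-up sentence:

```python
sentence = 'spam spam spam'

all_replaced = sentence.replace('spam', 'eggs')      # replace every match
first_only = sentence.replace('spam', 'eggs', 1)     # replace at most one match

print(all_replaced)
print(first_only)
print(sentence)   # the original string is unchanged
```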


#### The `join()` and `split()` Methods

The `join()` method is useful when you have a list of strings that need to be joined together into a single `string`. The `join()` method is called on a `string`, gets passed a list of strings, and returns a `string`. The returned `string` is the concatenation of each `string` in the passed-in list. 

In [24]:
print(', '.join(['cats', 'rats', 'bats']))     #Separated by comma
print(' '.join(['My', 'name', 'is', 'Simon'])) #Separated by white space

cats, rats, bats
My name is Simon


Notice that the string that `join()` is called on is inserted between each string of the list argument. For example, when `join(['cats', 'rats', 'bats'])` is called on the `', '` string, the returned string is `'cats, rats, bats'`.

The `split()` method does the opposite: It’s called on a string and returns a list of strings.

In [17]:
'My name is Simon'.split()

['My', 'name', 'is', 'Simon']

By default, the `string` 'My name is Simon' is split wherever whitespace characters such as the space, tab, or newline characters are found. These whitespace characters are not included in the strings in the returned list. You can pass a delimiter string to the `split()` method to specify a different string to split upon:

In [22]:
'cats, rats, bats'.split(',')

['cats', ' rats', ' bats']
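
Notice the leading spaces in `' rats'` and `' bats'`: only the comma was used as the delimiter. One way to avoid them, sketched below, is to split on the full separator `', '`. Joining the pieces with the same separator then gives back the original string. The variable names are chosen only for this example.

```python
record = 'cats, rats, bats'

by_comma = record.split(',')         # the spaces stay attached to the items
by_separator = record.split(', ')    # split on comma followed by space

print(by_comma)
print(by_separator)

# join() with the same separator reverses the split
rebuilt = ', '.join(by_separator)
print(rebuilt == record)
```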

A common use of `split()` is to split a multiline string along the newline characters:

In [25]:
spam = '''Dear Alice,
How have you been? I am fine.
There is a container in the fridge
that is labeled "Milk Experiment."

Please do not drink it.
Sincerely,
Bob'''

spam.split('\n')

['Dear Alice,',
 'How have you been? I am fine.',
 'There is a container in the fridge',
 'that is labeled "Milk Experiment."',
 '',
 'Please do not drink it.',
 'Sincerely,',
 'Bob']

Passing `split()` the argument `'\n'` lets us split the multiline string stored in `spam` along the newlines and return a list in which each item corresponds to one line of the `string`.

#### Removing Whitespace with the `strip()`, `lstrip()` and `rstrip()` Methods

Sometimes you may want to strip off whitespace characters (space, tab, and newline) from the left side, right side, or both sides of a string. The `strip()` string method will return a new string without any whitespace characters at the beginning or end. The `lstrip()` and `rstrip()` methods will remove whitespace characters from the left and right ends, respectively.

In [26]:
spam = '    Hello, World    '
spam.strip()

'Hello, World'

In [27]:
spam.lstrip()

'Hello, World    '

In [28]:
spam.rstrip()

'    Hello, World'
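
These methods also take an optional string argument listing the characters to remove instead of whitespace. The sketch below, using made-up sample strings, shows both forms.

```python
raw_line = '  \t Hello, World \n'
cleaned = raw_line.strip()          # removes spaces, tabs and newlines at both ends
print(repr(cleaned))

tagged = '***Important***'
untagged = tagged.strip('*')        # removes leading and trailing * characters
print(untagged)

left_only = 'xxhixx'.lstrip('x')    # only the left side is stripped
print(left_only)
```

Characters in the middle of the string are never removed; stripping stops at the first character that is not in the given set.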

### Exercise 2: When editing a Markdown document, you can create a bulleted list by putting each list item on its own line and placing a `-` in front. But say you have a really large list to which you want to add bullet points. You could just type a `-` at the beginning of each line, one by one. Or you could automate this task with a short Python program! For example, if we have the following text:

```
Lists of resources
Lists of books
Lists of videos
Lists of blogs
```

After running the program, the text should contain the following:

```
- Lists of resources
- Lists of books
- Lists of videos
- Lists of blogs
```

<center><img src="https://cdn.leonardo.ai/users/f26a2ba8-8273-45e9-8db9-958f83058486/generations/5b879f08-09e2-424e-ad34-4b395dbe81e0/Leonardo_Creative_editing_the_markdown_document_you_can_create_3.jpg"></center>

In [30]:
text = """Lists of resources
Lists of books
Lists of videos
Lists of blogs"""

# 1. Separate lines into list using string method.
lines = text.________

# 2. Add -
for i, line in enumerate(lines):    # loop through all indexes for "lines" list
    lines[i] = ________             # add - to each string in "lines" list

# 3. Use string method to concatenate list of strings back to string
text = _____________
print(text)

- Lists of resources
- Lists of books
- Lists of videos
- Lists of blogs


Text is a common form of data, and Python comes with many helpful string methods to process the text stored in `string`. You will make use of indexing, slicing, and string methods in almost every Python program you write. The programs you are writing now don’t seem too sophisticated—they don’t have graphical user interfaces with images and colorful text. So far, you’re displaying text with `print()` and letting the user enter text with `input()`. However, another way to manipulate large amounts of text is by reading and writing files directly off the hard drive. You’ll learn how to do this with Python later on.

That just about covers all the basic concepts of Python programming! You’ll continue to learn new concepts throughout the rest of this course, but you now know enough to start writing some useful programs that can automate tasks. If you’d like to see a collection of short, simple Python programs built from the basic concepts you’ve learned so far, check out https://github.com/asweigart/pythonstdiogames/. Try copying the source code for each program by hand and then make modifications to see how they affect the behavior of the program. Once you have an understanding of how the program works, try re-creating the program yourself from scratch. You don’t need to re-create the source code exactly; just focus on what the program does rather than how it does it.