<div class="pagebreak"></div>

# Strings
Programmers constantly deal with text: reading it, processing it, and writing it.  Think of the last few times that you used your computer or smartphone.  How much did you deal with text versus numbers?  User names, messages, stock symbols, and so on are all text.

To be an effective programmer, you must understand how to use, manipulate, and format strings.

In Python 3, strings are sequences of Unicode characters. Unicode is an international standard that provides a consistent encoding and representation of text.  In Python 2, the default `str` type was a sequence of bytes and the default encoding was ASCII. The ASCII standard defines 128 characters (only 95 of these are printable; the other 33 are control codes).  In most situations, this difference is insignificant because the first 128 Unicode code points (0 through 127) are the same characters as ASCII. However, if you ever deal with non-English characters, Unicode and the UTF-8 encoding used to store and transmit those characters become critical. By default, Python 3 source files are read as UTF-8.

- [Online Table of Unicode Characters](https://www.rapidtables.com/code/text/unicode-characters.html)
- [ASCII](https://www.rapidtables.com/code/text/ascii-table.html)  
- [Wikipedia: Unicode](https://en.wikipedia.org/wiki/Unicode)
- [Wikipedia: ASCII](https://en.wikipedia.org/wiki/ASCII)
- [Wikipedia: UTF-8](https://en.wikipedia.org/wiki/UTF-8)
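To see the ASCII overlap and the UTF-8 encoding in practice, here is a minimal sketch using the built-in `str.encode()` and `bytes.decode()` methods (the variable names are illustrative). An ASCII character encodes to a single byte, while the euro sign needs three bytes in UTF-8.

```python
# An ASCII character encodes to one byte in UTF-8, identical to its ASCII code
ascii_bytes = "A".encode("utf-8")
print(ascii_bytes)        # b'A'
print(len(ascii_bytes))   # 1

# A non-ASCII character needs several bytes in UTF-8
euro_bytes = "€".encode("utf-8")
print(euro_bytes)         # b'\xe2\x82\xac'
print(len(euro_bytes))    # 3

# Decoding the bytes gives back the original one-character string
print(euro_bytes.decode("utf-8"))  # €
```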

The Unicode 14.0 standard, published in September 2021, defines 144,697 characters.

Sample Unicode characters (notice in the next two code blocks that a Unicode character can be represented either by its actual symbol or by an escape sequence such as `\u20ac`):

In [None]:
print('\u20ac')  # euro
print('\u2603')  # snowman

We can also convert Unicode characters to and from numbers with the `ord()` and `chr()` built-in functions.

In [None]:
print('A to #:', ord('A'))
print('65 to Unicode:', chr(65))
print('Unicode to #:', ord('☃'))
print('9731 to Unicode:', chr(9731))
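The digits in a `\u` escape are the hexadecimal form of the character's code point, so `ord()`, `chr()`, and the built-in `hex()` tie the two notations together. A small sketch (variable names are illustrative):

```python
snowman = '☃'
code_point = ord(snowman)
print(code_point)               # 9731
print(hex(code_point))          # 0x2603, the same digits used in the escape \u2603
print(chr(0x2603) == snowman)   # True
print('\u2603' == snowman)      # True

# ord() works on one character at a time; a loop handles a whole string
print([ord(c) for c in "Hi"])   # [72, 105]
```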

Try calling `ord()` with more than one character, such as the string 'hello', to see what occurs.

In [None]:
# make the call and see the resulting error.

Like Java, strings in Python are immutable &#10142; you cannot change their value once defined; any alteration creates a new string object. However, in C and C++, string values can be altered.
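One quick way to observe immutability: a string method such as `upper()` returns a new string and leaves the original untouched, and "changing" a variable with `+` actually makes the name refer to a different string object. A minimal sketch with illustrative variable names:

```python
title = "tale of two cities"
shout = title.upper()       # upper() builds and returns a new string
print(title)                # tale of two cities  (unchanged)
print(shout)                # TALE OF TWO CITIES

alias = title               # both names refer to the same string object
title = title + "!"         # creates a new string; alias still refers to the old one
print(alias)                # tale of two cities
print(title)                # tale of two cities!
```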

## Creating Strings
You have already seen three different ways to create strings in these notebooks:
1. Literals
2. Converting from other types with `str()`
3. `input()`

For literals, you can use either single quotes or double quotes.  Base your choice on any project team standards and on convenience: if the string contains a single quote (an apostrophe), enclosing it in double quotes avoids escaping it, and vice versa.

In [None]:
print('Charles')
print("Dickens")
print("Charles Dickens's book, A Tale of Two Cities, is ... ")
print('Charles Dickens\'s book, A Tale of Two Cities, is ... ')

In the last example, notice that we escaped the ' in the middle of the string by using a backslash `\`. We can also escape double quotes with `\"`.

In [None]:
print("How do you pronounce the word \"tomato\"?")

Common escape sequences:

|Escape<br>Sequence|Result|
|:----|:---|
|`\n` | newline character |
|`\t` | tab  (used to align text) |
|` \\ ` | \ |
|`\'` | '|
|`\"` | " |

In [None]:
print("It was the best of times,\nit was the worst of times,\nit was the age of wisdom,\nit was the age of foolishness...")
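The other escape sequences from the table work the same way. In this sketch (variable names are illustrative), note that each escape sequence produces exactly one character in the resulting string.

```python
header = "Name\tCity"                    # \t inserts a tab character between the columns
path = "C:\\Users\\charles"              # each \\ produces a single backslash
quote = 'It\'s a far, far better thing'  # \' puts an apostrophe inside single quotes
print(header)
print(path)
print(quote)
print(len("\t"), len("\\"))              # 1 1: each escape sequence is one character
```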

We can also create string literals with three single quotes (`'''`) or three double quotes (`"""`) rather than just `'` or `"`.  While triple quotes can be used for short strings, they are most commonly used to create multiline strings.  We've already seen this with some of the docstrings in earlier notebooks.

In [None]:
# line continuation so we can start the string on the next line.
# Note: no characters, not even spaces or comments, can follow the "\" on a line
opening = \
"""It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness..."""
print(opening)
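The line breaks inside a triple-quoted string become ordinary `\n` characters, so a multiline literal is equal to a single-line literal that uses explicit escapes. A short check (the name `escaped` is illustrative):

```python
opening = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness..."""

# The same text written on one line with explicit \n escapes
escaped = "It was the best of times,\nit was the worst of times,\nit was the age of wisdom,\nit was the age of foolishness..."

print(opening == escaped)   # True
print(opening.count("\n"))  # 3: one newline per line break
```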

Triple quote strings are also convenient if you need to include single quotes or double quotes.

In [None]:
print("""John's book stated "Programming is fun!" """)  # the space matters: without it, four quotes in a row end the string early and cause a SyntaxError. Try removing the space

Use `str()` to convert another type into a string.

In the following block, the last line evaluates `b` rather than printing it, to show how the output differs from the previous line. If the last line of a code cell is an expression, Jupyter automatically displays its value (for a string, including the surrounding quotes).

In [None]:
print(type(1842))
x = str(1842)
print(type(x))
print(x)
print(type(None))
y = str(None)
print(type(y))
print(y)
b = str(True)
print(type(b))
print(b)
b

Using `input()` to get a string value from the user:

In [None]:
x = input("Enter something: ")
x

## Concatenation
We can concatenate (combine) two strings with `+`:

In [None]:
"Duck" + "Duck" + "Go"

We can also concatenate string literals (but not variables) by placing one immediately after another:

In [None]:
"test"'test'"""test"""

Python does not automatically insert spaces or other characters when concatenating strings.
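Because nothing is inserted for you, any spaces must be added explicitly. Two related points: side-by-side concatenation works only with literals (it is handy for splitting a long literal across lines inside parentheses), and `+` only joins strings to strings, so a number must first be converted with `str()`. A minimal sketch with illustrative names:

```python
# Spaces must be added explicitly
name = "Duck" + " " + "Duck" + " " + "Go"
print(name)        # Duck Duck Go

# Adjacent literals inside parentheses are joined into one string
message = ("It was the best of times, "
           "it was the worst of times.")
print(message)

year = 1859
try:
    label = "Published in " + year      # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)
label = "Published in " + str(year)     # convert the number first
print(label)       # Published in 1859
```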

## Duplication
We can make multiple copies of a string with `*` followed by an integer.

In [None]:
print("test" * 5)
a = 2
s = "test"
print(s * a)
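Duplication is useful for building simple separator lines, and the integer can appear on either side of `*`. Multiplying by zero or a negative number produces an empty string. A short sketch:

```python
print("-" * 20)          # a 20-character separator line
print(3 * "ab")          # ababab: the integer may come first
print(repr("ab" * 0))    # '' (repr() shows the quotes so the empty string is visible)
print(repr("ab" * -2))   # '' (negative counts also give an empty string)
```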

## Length
Use `len(s)` to get the length of a string `s`.

In [None]:
len("test"*5)
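`len()` counts characters, not keystrokes or bytes: an escape sequence such as `\n` is one character, a Unicode symbol is one character even if UTF-8 needs several bytes to store it, and the empty string has length 0. A quick check:

```python
print(len("\n"))     # 1: the escape sequence is a single newline character
print(len("☃"))      # 1: one Unicode character
print(len("€100"))   # 4
print(len(""))       # 0: the empty string
```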

## Accessing a Single Character
To get a single character, we can use square brackets with the character offset inside the bracket.  The offset starts at 0 (the leftmost character) and goes to the length of the string minus one.

In [None]:
digits = "0123456789"
print(digits[0])
print(digits[9])

You can also index characters with negative integers. For example, -1 specifies the rightmost character (the same as offset `len(digits) - 1`), and -2 specifies the character before it.

In [None]:
print(digits[-1])
print(digits[-2])
print(digits[-10])
print(digits[-len(digits)])

If you attempt to access a character at an offset outside the valid range, you will receive an `IndexError`:

In [None]:
print(digits[10])

Notice that we cannot use this access method to change characters in a string: strings are immutable, so the assignment raises a `TypeError`.

In [None]:
digits[5] = 'A'

## Slicing for Substrings
In addition to representing text, strings are the first example of sequences in these notebooks. (Lists and tuples are sequences as well.)

As mentioned above, strings are sequences of Unicode characters.  The indexing and slicing techniques used below also apply to lists and tuples.

We can extract parts of a string (a _substring_) by using a _slice_. A slice is a range of index numbers separated by a colon within square brackets.

| Slice | Description
|---:|:-----
|[:]| extracts the entire sequence from start to end
|[_start_:] | extracts the sequence from offset _start_ to the end
|[ : _end_ ]| extracts the sequence from the beginning up to, but not including, offset _end_ (that is, through _end_ - 1)
|[ _start_ : _end_ ] | extracts the sequence from offset _start_ up to, but not including, offset _end_
|[ _start_ : _end_ : _step_] | extracts the sequence from offset _start_ up to, but not including, offset _end_, taking every _step_-th character


The following image summarizes how we use indexes and slices:<br> 
![](images/string-slicing.png)

<sub><sup>Source: https://www.faceprep.in/python/string-slicing-in-python/</sup></sub>

In [None]:
opening_line = "It was the best of times"
print(len(opening_line))
print(opening_line[:])
print(opening_line[19:])
print(opening_line[:2])
print(opening_line[3:6])
print(opening_line[1:10:2])

Try repeating the above code by using 'abcdefghijklmnopqrstuvwxyz' as the value for `opening_line`.

As with accessing characters, we can also use negative offsets with slices.  We can also combine positive and negative values.

In [None]:
digits = "0123456789"
print(digits[-3:])
print(digits[:-6])
print(digits[-6:-3])
print(digits[-6:9])
print(digits[-10:-1:2])
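The step can also be negative, in which case the slice walks from right to left. Omitting both offsets with a step of `-1` is a common idiom for reversing a string. A short sketch:

```python
digits = "0123456789"
print(digits[::-1])     # 9876543210: the whole string, reversed
print(digits[::-2])     # 97531: every second character, starting from the right
print(digits[7:2:-1])   # 76543: start at offset 7, stop before offset 2
```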

The Python interpreter returns an empty string if you specify a slice where the start offset occurs after the end offset (with the default step of 1).

In [None]:
digits[5:1]

Unlike accessing a single character, a slice with an offset outside the valid range does not raise an `IndexError`; the offset is adjusted to the start or end of the string instead.

In [None]:
digits[0:15]

In [None]:
digits[-12:]
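Because strings are immutable, one common way to "change" a character is to build a new string from the slices around it. This sketch replaces the character at offset 5, the assignment that raised an error earlier (the name `position` is illustrative):

```python
digits = "0123456789"
position = 5
# Everything before the offset + the new character + everything after it
changed = digits[:position] + 'A' + digits[position + 1:]
print(changed)   # 01234A6789
print(digits)    # 0123456789 (the original is unchanged)
```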

## Splitting Strings
To split a string into multiple strings based on a separator within that string, call the `split()` method.  (Notice that we call `split()` a method rather than a function because it belongs to an object of type `str`.)  As the example below shows, `split(' ')` creates a list of the character sequences separated by a space.

In [None]:
opening_line = "It was the best of times."
word_list = opening_line.split(' ')
print(word_list)
print(type(word_list))

In [None]:
opening_line.split('e')      #splitting on the character e

In [None]:
opening_line.split('the')    #splitting on the string 'the'

Not specifying a separator will split the string based on whitespace, including spaces, tabs, and newlines.

In [None]:
s = "This is a\ntest\tof splitting whitespace."
print(s)
s.split()
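`split(' ')` and `split()` behave differently when separators repeat: splitting on a single space produces an empty string between consecutive spaces, while the no-argument form treats any run of whitespace as one separator. The optional second argument, `maxsplit`, limits how many splits occur. A sketch with illustrative strings:

```python
s = "two  spaces"
print(s.split(' '))          # ['two', '', 'spaces']: empty string between the two spaces
print(s.split())             # ['two', 'spaces']

record = "2022-05-02-python"
print(record.split('-', 1))  # ['2022', '05-02-python']: at most one split
```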

## Joining Strings
Use the `join()` method to combine a list of strings into a single string.  `join()` belongs to the `str` (string) class rather than the `list` class. From a design point of view, this makes sense: the string you call `join()` on is the separator placed between the combined strings.

In [None]:
split_list = ['This', 'is', 'a', 'test', 'of', 'joining', 'strings']
print("".join(split_list))
print(":".join(split_list))
print(", ".join(split_list))
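`join()` requires every item to be a string; joining numbers raises a `TypeError`. One common approach converts each item with `str()` first. Also, `split()` and `join()` with the same separator undo each other. A minimal sketch with illustrative data:

```python
numbers = [2022, 5, 2]
try:
    ", ".join(numbers)                     # the items are ints, not strings
except TypeError as err:
    print("TypeError:", err)

# Convert each number with str(), then join
print("-".join(str(n) for n in numbers))   # 2022-5-2

words = "It was the best of times".split(' ')
print(' '.join(words))                     # It was the best of times
```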

## Search and Replace
Python contains multiple ways to find if one string contains another.

First, you can use the `in` operator, which returns `True` or `False`.

In [None]:
line = "can we search for the specified value in the string?"
"value" in line

We can also use `find()` or `index()` to get the position where one string starts within another.  These two methods behave the same, except that `find()` returns -1 if the substring is not found, while `index()` raises an exception (`ValueError`). Because -1 is also a valid index in Python (the last character), the return value of `find()` must be checked before it is used; otherwise, a missing substring silently refers to the last character.  With `index()`, we can skip that check and rely on exception handling for the case where the substring does not exist.

Both methods also allow you to limit the search based on the starting and, possibly, ending positions.

In [None]:
help(line.index)
print(line.index("the"))
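The difference between `find()` and `index()` matters when the substring is missing. This sketch also passes the optional start position to continue searching after the first match:

```python
line = "can we search for the specified value in the string?"

print(line.find("the"))       # 18: offset of the first "the"
print(line.find("the", 19))   # 41: search starting at offset 19
print(line.find("xyz"))       # -1: not found (and -1 is also a valid index!)

try:
    line.index("xyz")
except ValueError as err:
    print("ValueError:", err)  # index() raises instead of returning -1
```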

To find the last occurrence of a substring within a string, you can use `rfind()` and `rindex()`.

In [None]:
print(line.rfind("the"))

The string class also offers a simple mechanism to search for values in the string and replace them with a different string, returning a new string.

<code>string.replace('<i>searchValue</i>','<i>replaceValue</i>')</code>

In [None]:
s = "hello, how are you?"
s.replace('o',"xxx")

In [None]:
s.replace('how','test')
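`replace()` also accepts an optional third argument that limits how many occurrences are replaced, counting from the left. As with every string method, the original string is unchanged:

```python
s = "hello, how are you?"
print(s.replace('o', '0'))      # hell0, h0w are y0u?  (all occurrences)
print(s.replace('o', '0', 1))   # hell0, how are you?  (only the first)
print(s)                        # hello, how are you?  (original unchanged)
```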

In [None]:
# demonstrating that s.replace() returns a new object reference/id
print(id(s))
t = s.replace(",", " -")
print(t)
print(id(t))

## Stripping Characters
We can also remove any whitespace (' ','\t','\n') from the start and/or end of a string.  Quite often, programmers will perform this task to "clean up" any text a user may enter.

- `string.strip()` removes whitespace from the start and end
- `string.lstrip()` removes whitespace from the start
- `string.rstrip()` removes whitespace from the end

In [None]:
s = "  Hitchhiker's Guide to the Galaxy  "
s

In [None]:
s.strip()

In [None]:
s.lstrip()

In [None]:
s.rstrip()

Python doesn't limit the stripping capabilities to whitespace. You can also pass a string argument; any character in that string is then removed from the start and end.

In [None]:
s.strip(' Hi')

Of course, you can use a single character as the argument.

In [None]:
s = '!!Are you crazy?!!'
s.strip('!')
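Keep in mind that the argument to `strip()` is treated as a set of characters, not as a prefix or suffix: characters are removed from each end until one not in the set is reached. To remove an exact prefix or suffix, Python 3.9 and later provide `removeprefix()` and `removesuffix()`. A sketch contrasting them, assuming Python 3.9+:

```python
name = "Hi Hilda"
print(name.strip("Hi "))              # lda: strips 'H', 'i', and ' ' in any order
print(name.removeprefix("Hi "))       # Hilda: removes the exact prefix once (Python 3.9+)

filename = "report.txt"
print(filename.removesuffix(".txt"))  # report
print(filename.strip(".txt"))         # repor: the final 't' of "report" is also in the set
```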

## Changing Case
Python also has several string methods to change the case of a string.

In [None]:
line = "a tale of two cities"

Capitalize the first character (the remaining characters become lowercase):

In [None]:
line.capitalize()

Capitalize the first letter of every word:

In [None]:
line.title()

Convert all characters to uppercase:

In [None]:
line.upper()

Convert all characters to lowercase:

In [None]:
'A Tale Of Two Cities'.lower()
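`capitalize()` and `title()` also lowercase the letters they do not capitalize, which matters when the input has mixed case. For case-insensitive comparisons, one approach is to normalize both strings first; `casefold()` is a more aggressive form of `lower()` intended for that purpose. A short sketch:

```python
mixed = "hELLO wORLD"
print(mixed.capitalize())   # Hello world
print(mixed.title())        # Hello World

# Case-insensitive comparison
print("Python".lower() == "PYTHON".lower())          # True
print("Straße".casefold() == "STRASSE".casefold())   # True: casefold() maps ß to ss
```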

## Aligning Strings
Python also offers methods to align a string within a field of a given width.  By default, the extra positions are filled with spaces; you can specify a different fill character in the optional second argument.

In [None]:
line = 'A tale of two cities'
line.center(50)   # center the string in a field 50 characters wide

In [None]:
line.center(50,'-')

In [None]:
line.ljust(50)    # left justify the string in a field 50 characters wide

In [None]:
line.rjust(50)    # right justify the string in a field 50 characters wide
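A few details of the alignment methods: the fill character must be a single character, and if the requested width is not larger than the string, the string is returned unchanged. Combining `ljust()` and `rjust()` is one way to line up simple columns of text. A sketch with made-up sample data:

```python
print("abc".center(9, '*'))    # ***abc***
print("abc".rjust(6, '0'))     # 000abc
print("A tale".center(3))      # A tale: width 3 is too small, so nothing is added

# Lining up two columns (item names and counts are sample data)
print("apples".ljust(10) + "3".rjust(5))
print("bananas".ljust(10) + "12".rjust(5))
```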

## String comparisons
Strings use the same comparison operators as numbers. Python compares strings in lexicographic (dictionary) order, character by character, using each character's Unicode code point. As a result, uppercase letters have smaller values than lowercase letters.

In [None]:
a = "alpha"
b = "alp" +"ha"
c = "test"

In [None]:
print("a <  b",a < b)
print("a <= b", a <= b)
print("a >= b", a >= b)
print("a >  b", a > b)
print("a == b", a == b)
print("a <  c", a < c)
print("a == c", a == c)
print("a >  c", a > c)
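Because the comparison works character by character on code points, a few results can be surprising: every uppercase ASCII letter sorts before every lowercase one, a string that is a prefix of another is smaller, and digits inside strings are compared as characters rather than as numbers. A short sketch:

```python
print("Zebra" < "apple")                  # True: 'Z' (90) comes before 'a' (97)
print("app" < "apple")                    # True: a prefix is smaller than the longer string
print("10" < "9")                         # True: '1' (49) comes before '9' (57)
print(10 < 9)                             # False: numbers compare by value
print("Zebra".lower() < "apple".lower())  # False: comparing case-insensitively
```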

## Exercises

1. How would you get more information about the `ord()` built-in function?
2. How would you get more information about the `center()` method that belongs to the `str` class?
3. Create a string literal for the value after the colon and assign it to the variable `test`: Let's go run now!
4. Create a multiline string literal for one of the verses of [Row, Row, Row Your Boat](https://en.wikipedia.org/wiki/Row,_Row,_Row_Your_Boat)
5. Create a variable called `excited` that has the following string repeated 10 times: "I'm so excited!\n"
6. How long is the string in the variable `excited`? Use a function to get the answer.
7. Using code, what is the first letter of `excited`?  The last letter? 
8. Get a string (a substring) from `excited` with the last 9 characters.
9. Get a substring from `excited` starting at the first position (0) and taking every 17th character (a step of 17). What is unusual about this result?
10. Split the following string by dashes:<br>```a = "20220502-https://python.org-200-127.0.0.1"```
11. You have just arrived at a dystopian society that does not believe in vowels.  Write a function to remove any vowels from a string and return the result.  For some reason, they don't mind y's.
12. Capitalize each word in the following string:<br>```book_title = 'the hitchhiker\'s guide to the galaxy'```
13. Given the following string, extract the portion after the colon and convert it to an integer variable.
```
line = 'Conference Attendance:14000'
```
Use `find()` and string slicing to extract the string.<br>
14. Repeat 13, but use `split()`.

If you'd like more practice, look at some of the string methods not covered in the exercises.  Try comparing different strings.