# CSE 101: Computer Science Principles
#### Stony Brook University
#### Kevin McDonnell (ktm@cs.stonybrook.edu)
## Module 7: Python Strings

A value's **data type** determines the kinds of operations that can be performed with the value. For example, with numerical values we can perform various arithmetical operations.

Strings support a different set of operations, which this module covers.

### Concatenation and Repetition

The `+` operator for strings joins two strings together in an operation called **concatenation** (verb form: *concatenate*).

In [0]:
first_name = 'Mickey'
last_name = 'Mouse'
full_name = first_name + ' ' + last_name
full_name

'Mickey Mouse'

The `*` operator for strings concatenates multiple copies of a string together:

In [0]:
chant = ('Go ' * 3) + 'Seawolves!'  # parentheses not required, but included for clarity
chant

'Go Go Go Seawolves!'
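A small sketch of one detail the examples above imply: `+` concatenates only when both operands are strings, while `*` pairs a string with an integer. The names `room_number`, `label` and `divider` are made up for this illustration, and `str` is used to convert the number to text first.

```python
room_number = 101                    # hypothetical value, just for illustration
label = 'Room ' + str(room_number)   # str() converts the integer so + can concatenate
divider = '-' * len(label)           # repeat '-' once for each character in label
print(label)
print(divider)
```

Writing `'Room ' + room_number` without the conversion would raise a `TypeError`, because Python does not mix strings and integers under `+`.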

### Strings as Collections

Unlike an integer, a string is a collection of individual entities (characters, in this case). We can use a special notation (syntax) to access the characters of the string.



In [0]:
from IPython.display import display, HTML
display(HTML('''<img src="https://www.cs.stonybrook.edu/~ktm/courses/cse101/colab_images/indexvalues.png">'''))

Each character in the string can be accessed ("indexed") using 0, 1, ... from left to right, or -1, -2, ... from right to left.

The string `'Luther College'` has 14 characters. The **length** of a string is the number of characters in the string. Note that the valid indices range from `0` to `13` (one less than 14) and `-1` through `-14`. Take careful note of the relationship between the ranges of valid indices and the string's length.

We can read characters from a string using square brackets.

In [0]:
school = 'Stony Brook University'
print(f'school[0] = {school[0]}')
print(f'school[1] = {school[1]}')
print(f'school[-1] = {school[-1]}')

school[0] = S
school[1] = t
school[-1] = y


Attempting to access a string using an invalid index raises an `IndexError` at run time (the line below is commented out for that reason):

In [0]:
#print(f'school[30] = {school[30]}')
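For readers who want to see the error without stopping the notebook, here is a minimal sketch that catches it. It assumes the `try`/`except` statement, which this module has not introduced, so treat it as a preview rather than required material. The name `message` is just an illustration.

```python
school = 'Stony Brook University'
try:
    print(school[30])            # index 30 is past the last valid index, 21
except IndexError as error:
    message = str(error)         # the text Python attaches to the error
    print(f'IndexError: {message}')
```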

The length of a string can be determined by calling the `len` function.

In [0]:
len(school)

22

In [0]:
print(f'Last character of school = {school[len(school)-1]}')

Last character of school = y
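The relationship between the two index ranges can be checked directly. In this sketch (the variable names `n`, `same_last` and `same_first` are only for illustration), index `-k` names the same character as index `len(school) - k`:

```python
school = 'Stony Brook University'
n = len(school)                            # 22

same_last = school[-1] == school[n - 1]    # -1 and n-1 both name the last character
same_first = school[-n] == school[0]       # -n and 0 both name the first character
print(same_last, same_first)
```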


The **empty string**, the string of no characters, is denoted `''`. It has zero length:

In [0]:
len('')

0

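Strings are **immutable** objects, meaning we can't change the contents of a string once it has been created. The assignment below is commented out because running it raises a `TypeError`.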

In [0]:
#school[0] = 's'

Python's **slicing** notation lets us extract a run of contiguous characters (a "substring") from a string. We give a starting index and an ending index, separated by a colon.

In [0]:
first_word = school[0:5]
first_word

'Stony'

Note that the character at the second index is not included in the substring. Another example:

In [0]:
letters = school[2:4]
letters

'on'

Slicing works with negative indices too:

In [0]:
last_word = school[-10:-1]
last_word

'Universit'

Oops! The second index is non-inclusive. Try again:

In [0]:
last_word = school[-10:0]
last_word

''

Oh no! Index `0` lies to the left of index `-10`, so the slice contains no characters.

In [0]:
last_word = school[-10:len(school)]
last_word

'University'

Seriously??

Extract all but the first and last characters:

In [0]:
letters = school[1:-1]
letters

'tony Brook Universit'

If we want to select all characters to the left or right of a particular index, we can omit the first or second index, respectively:

In [0]:
first_ten = school[:10]
last_ten = school[-10:]
first_ten, last_ten

('Stony Broo', 'University')
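One handy consequence of the end index being non-inclusive, sketched below with an arbitrary split point `i`: the slices `school[:i]` and `school[i:]` never overlap and never skip a character, so joining them gives back the original string.

```python
school = 'Stony Brook University'
i = 5                        # arbitrary split point chosen for illustration
left = school[:i]            # characters at indices 0 through i-1
right = school[i:]           # characters at index i through the end
rejoined = left + right
print(repr(left), repr(right))
print(rejoined == school)
```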

If a third value is supplied, this number is treated as the **step** size. For instance, a step size of 2 means "extract every 2nd character".

In [0]:
even_index_chars = school[0:len(school):2]
odd_index_chars = school[1:len(school):2]
even_index_chars, odd_index_chars

('SoyBokUiest', 'tn ro nvriy')

Of course, if we start at index `0`, we can omit it from the notation:

In [0]:
even_index_chars = school[:len(school):2]
even_index_chars

'SoyBokUiest'

In fact, if we want to extract characters to the end, we can omit the second index too:

In [0]:
even_index_chars = school[::2]
odd_index_chars = school[1::2]
even_index_chars, odd_index_chars

('SoyBokUiest', 'tn ro nvriy')

A negative step size causes the characters to be accessed in reverse order, from right to left:

In [0]:
reversed_school = school[::-1]  # avoid the name "reversed", which would shadow a built-in function
reversed_school

'ytisrevinU koorB ynotS'

Using a negative step size other than `-1` lets us skip over characters:

In [0]:
reversed_skipped = school[::-3]
reversed_skipped

'ysvUoBnS'
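As one possible use of the reversing slice, here is a sketch of a palindrome test: a word is a palindrome when it equals its own reverse. The example words and variable names are chosen only for illustration.

```python
word = 'racecar'
is_palindrome = word == word[::-1]     # compare the word with its reverse

mascot = 'seawolf'
mascot_is_palindrome = mascot == mascot[::-1]
print(is_palindrome, mascot_is_palindrome)
```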

Slicing can be combined with concatenation and repetition too, naturally:

In [0]:
chant = (school[:5] + '! ') * 3 + 'Go Seawolves!'
chant

'Stony! Stony! Stony! Go Seawolves!'
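Because strings are immutable, "changing" part of a string really means building a new one from slices of the old. The following is a sketch under that idea. The replacement word `'Creek'` and the variable names are invented for the example.

```python
school = 'Stony Brook University'

# We cannot assign to school[0], so build a new string from pieces instead.
lowercase_first = 's' + school[1:]

# Replace the middle word by joining the text on either side of it.
new_school = school[:6] + 'Creek' + school[11:]
print(lowercase_first)
print(new_school)
print(school)    # the original string is unchanged
```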

Your imagination is your only limitation in exploring the possibilities!

### String Methods

Python provides many other operations for working with strings in the form of **methods**. A method is a function that is called by providing the name of a string variable, the dot operator, followed by the name of the function: `variable_name.method_name()`.

Python has many [methods](https://docs.python.org/3/library/stdtypes.html#string-methods) for manipulating strings.

[`upper`](https://docs.python.org/3/library/stdtypes.html#str.upper) and [`lower`](https://docs.python.org/3/library/stdtypes.html#str.lower) make a copy of the original string in uppercase and lowercase, respectively. The original string remains unchanged. Non-letter characters are unaffected.

In [0]:
name = 'Ada Lovelace'  # Perhaps the first programmer?
uppercase_name = name.upper()
lowercase_name = name.lower()
name, uppercase_name, lowercase_name

('Ada Lovelace', 'ADA LOVELACE', 'ada lovelace')

The [`find`](https://docs.python.org/3/library/stdtypes.html#str.find) method returns the lowest index that matches a target substring. It returns `-1` if the target substring is not found.

In [0]:
name.find('Love')

4

In [0]:
name.find('love')

-1

To test if a substring is present in a string, we can use the `in` operator:

In [0]:
found = 'Love' in name
found

True
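Both `find` and `in` are case-sensitive, which is why `'love'` was not found earlier. One way to search without regard to case, sketched here, is to lowercase the string before searching. The variable names are illustrative.

```python
name = 'Ada Lovelace'

exact_index = name.find('love')              # case-sensitive: no match
relaxed_index = name.lower().find('love')    # search a lowercase copy instead
relaxed_found = 'love' in name.lower()
print(exact_index, relaxed_index, relaxed_found)
```

Since `lower` leaves the length and character positions unchanged in this example, the index found in the lowercase copy also applies to the original string.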

The [`replace`](https://docs.python.org/3/library/stdtypes.html#str.replace) method replaces all occurrences of one substring with another. 

In [0]:
rule = 'The first rule of Fight Club is: You do not talk about Fight Club.'
rule = rule.replace('Fight', 'Cuddle')  # replacing the original value of "rule"
rule

'The first rule of Cuddle Club is: You do not talk about Cuddle Club.'

In [0]:
rule = rule.replace('Cuddle', 'Book').replace('not ', '').replace('about', 'at')
rule

'The first rule of Book Club is: You do talk at Book Club.'
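Two details of `replace` are worth a short sketch: it returns a new string and leaves the original alone, and it accepts an optional third argument that limits how many occurrences are replaced. The string `motto` and the other names are made up for this example.

```python
motto = 'Go Go Go Seawolves!'

all_replaced = motto.replace('Go', 'Yay')       # every occurrence
first_only = motto.replace('Go', 'Yay', 1)      # at most one occurrence
print(all_replaced)
print(first_only)
print(motto)    # unchanged, because we never reassigned it
```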

The `count` method counts the number of non-overlapping occurrences of a substring in a string.

In [0]:
nucleotides = 'AATCCGCTAGATTTACAT'
at_count = nucleotides.count('AT')
at_count

3
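To see what "non-overlapping" means, consider this small sketch with an invented sequence. The substring `'AA'` begins at indices 0, 1 and 2 of `'AAAA'`, but once `count` matches a pair it resumes searching after that pair, so it reports only two.

```python
sequence = 'AAAA'
pair_count = sequence.count('AA')   # matches at indices 0 and 2 only
print(pair_count)
```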

The [`strip`](https://docs.python.org/3/library/stdtypes.html#str.strip) method returns a copy of a string with **whitespace** removed from both ends. If we pass it a string of characters, it instead removes any of those characters from both ends.

In [0]:
name = '    Stony Brook University  '
name.strip()  # by default, strip() removes whitespace

'Stony Brook University'

In [0]:
heading = '####*** Part 1 ***###'
heading.strip(' *#')

'Part 1'
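The argument to `strip` is treated as a set of characters rather than as a single substring, and only the ends of the string are affected. The sketch below also uses the related methods `lstrip` and `rstrip`, which trim only the left or right end. The variable names are illustrative.

```python
heading = '####*** Part 1 ***###'

left_clean = heading.lstrip(' *#')     # trim the left end only
right_clean = heading.rstrip(' *#')    # trim the right end only
inner_kept = '** a * b **'.strip(' *') # the interior ' * ' is left alone
print(repr(left_clean))
print(repr(right_clean))
print(repr(inner_kept))
```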

### Comparing Strings

As we have seen, strings can be compared using the `==` and `!=` operators. These comparisons are usually performed in if-statements:

In [0]:
standing = 'U3'
status = 'unknown'
if standing[0] == 'U':
    status = 'undergraduate'
elif standing[0] == 'G':
    status = 'graduate'
status

'undergraduate'

The operators `<`, `<=`, `>` and `>=` are also defined for strings and compare strings using [lexicographical ordering](https://en.wikipedia.org/wiki/Lexicographical_order), which is a generalization of alphabetical ordering.

Integer codes are used to represent characters in memory. For instance, `'A'` through `'Z'` have codes `65` through `90`; `'a'` through `'z'` have codes `97` through `122`. This leads to some curious results because the codes for the lowercase letters are higher than those for the uppercase letters.

In [0]:
name1 = 'Dave'
name2 = 'Susan'
name1 < name2, name1 > name2

(True, False)

In [0]:
name1 = 'dave'
name2 = 'Susan'
name1 < name2, name1 > name2

(False, True)
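The built-in functions `ord` and `chr` let us inspect these codes directly, which explains the two results above. This is only a sketch, and the variable names are chosen for illustration.

```python
code_D = ord('D')    # 68
code_S = ord('S')    # 83
code_d = ord('d')    # 100
print(code_D, code_S, code_d)

# 'Dave' < 'Susan' because 68 < 83, but 'dave' > 'Susan' because 100 > 83.
print(code_D < code_S, code_d > code_S)
print(chr(83))       # chr goes the other way, from code to character
```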

If this is not acceptable, convert the strings to lowercase first so that we get alphabetical ordering:

In [0]:
name1 = 'dave'
name2 = 'Susan'
name3 = name1.lower()
name4 = name2.lower()
name3 < name4, name3 > name4

(True, False)

### Example: Reformatting a Date String

Very often we have data in one format that we need to convert to another. Suppose we have a date in MM/DD/YYYY format, and we want to convert it to YYYY-MM-DD format. String slicing and concatenation will make this a breeze.

In [0]:
date = '07/04/2020'
month = date[:2]
day = date[3:5]
year = date[-4:]
new_date = year + '-' + month + '-' + day
new_date

'2020-07-04'

Here's another approach, using an f-string:

In [0]:
date = '07/04/2020'
month = date[:2]
day = date[3:5]
year = date[-4:]
new_date = f'{year}-{month}-{day}'
new_date

'2020-07-04'

There are yet other, shorter ways, but we haven't covered the relevant material yet.

### Example: Reformatting a Date String Revisited

What if the input date does not have two digits for the month and/or day, as in 7/4/2020? The code above won't work properly. Although there are easier ways of dealing with this, let's address this situation using slicing again.

The `find` method can take a second, *optional* argument that indicates at what index to start the search. We will use this feature to find the second slash in the input string.

In [0]:
date = '7/4/2020'
slash1_index = date.find('/')
slash2_index = date.find('/', slash1_index+1)
month = date[:slash1_index]
day = date[slash1_index+1:slash2_index]
year = date[slash2_index+1:]
new_date = f'{year}-{month}-{day}'
new_date

'2020-7-4'

Well, that almost worked. We really should insert a `'0'` in front of single-digit months or days.

In [0]:
date = '7/4/2020'
slash1_index = date.find('/')
slash2_index = date.find('/', slash1_index+1)
month = date[:slash1_index]
day = date[slash1_index+1:slash2_index]
year = date[slash2_index+1:]
if len(month) == 1:
    month = '0' + month
if len(day) == 1:
    day = '0' + day
new_date = f'{year}-{month}-{day}'
new_date

'2020-07-04'
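As a possible shortcut for the two `if` statements, the string method `zfill` pads a string on the left with zeros up to a given width. The sketch below wraps the same slicing logic in a function. The name `reformat_date` is hypothetical, and the code assumes the input always contains exactly two slashes.

```python
def reformat_date(date):
    """Convert M/D/YYYY or MM/DD/YYYY to YYYY-MM-DD (assumes two slashes)."""
    slash1_index = date.find('/')
    slash2_index = date.find('/', slash1_index + 1)
    month = date[:slash1_index].zfill(2)                  # '7' -> '07', '12' stays '12'
    day = date[slash1_index + 1:slash2_index].zfill(2)
    year = date[slash2_index + 1:]
    return f'{year}-{month}-{day}'

print(reformat_date('7/4/2020'))
print(reformat_date('12/25/2020'))
```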

In a future module we will explore a string method called `split`, which makes this problem a bit easier to solve. We will revisit the problem at that point.