## **Strings, Lists, and Dictionaries - Collections in Python**

Python strings: 
* https://www.w3schools.com/python/python_strings.asp
* https://www.programiz.com/python-programming/string
* https://www.tutorialspoint.com/python/python_strings.htm
* https://realpython.com/python-strings/

### 1. A String is a Collection of Characters

We have already met and worked with strings, which are a <b>collection</b> (an ordered sequence) of characters. Unlike languages such as C, C++, and Java, Python does not have a separate <code>character</code> data type. A single character (like <code>'c'</code>) is simply a string of length 1. <br>


Earlier, we learned that string literals can be written with either single or double quotes:

In [6]:
# code cell 1

print('This is a string.')
print("This is also a string.")
print("This is how to create a 'string within a string' in Python.")
print('You can also create a "string within a string" this way.')

This is a string.
This is also a string.
This is how to create a 'string within a string' in Python.
You can also create a "string within a string" this way.


However, if we want to write a quote character of the same kind as the one that delimits the string, it must be escaped (preceded by a backslash character <code>\</code>, with no space in between, as shown below). Escaping the other kind of quote is allowed but not required:

In [5]:
# code cell 2

print("This is how to print a double quote \" inside a string.")
print("This is how to print a single quote \' inside a string.")

This is how to print a double quote " inside a string.
This is how to print a single quote ' inside a string.
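
As a small illustration of the escaping rule, the sketch below builds the same text in different ways. The variable names are made up for this example, and triple-quoted strings are a Python feature not otherwise covered in this section.

```python
# Only the quote character that delimits the string needs a backslash.
escaped = "She said \"hi\" and left."        # double quotes escaped inside double quotes
alternated = 'She said "hi" and left.'       # no escape needed inside single quotes
triple = """She said "hi" and it's fine."""  # triple quotes allow both kinds unescaped

print(escaped)
print(alternated)
print(triple)
print(escaped == alternated)  # the two literals produce the same string
```

Choosing the delimiter that does not occur in the text usually avoids backslashes altogether.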


We have also learned about operators that work with strings: <code>+</code> concatenates (joins) strings, while <code>*</code> creates multiple copies of a string (but ONLY IF the other operand is an integer):

In [24]:
# code cell 3

a = "this is "
b = "a string "
print("a = " + a)
print("b = " + b)
print()

c = a + b
d = 3 * a
e = b * 2

print("c = a + b:", c)
print("d = 3 * a:", d)
print("e = b * 2:", e)

a = this is 
b = a string 

c = a + b: this is a string 
d = 3 * a: this is this is this is 
e = b * 2: a string a string 


Here's something new we can do with a string: we can reference individual characters in it using brackets (<code>[]</code>), as in the code cell below. Note that Python uses <b>zero-based indexing</b>, meaning that the first item in a collection has the index 0. In other words, the position of a character in a string is one greater than its index.
<div class="alert alert-block alert-info">
Run the following cell several times with numbers between 0 and the length of the string <code>s</code> minus one. Then, see what happens if you enter a number that is less than zero or greater than or equal to the length of the string <code>s</code>. <br>

Something interesting happens if you enter <code>-1</code>. Try changing the string <code>s</code> (make it longer, for instance), and see if you can figure out what's going on!
</div>

In [3]:
# code cell 4

s = "abcde"
n = len(s)
i = int(input("Enter a number between 0 and " + str(n-1) + ": "))

outStr = "Character " + s[i]
outStr += " is at index " + str(i)
if i >= 0:
    outStr += " (position " + str(i+1) + ")"
outStr += " in the string "
outStr += "\'" + s + "\'"

print(outStr)

Enter a number between 0 and 4: 3
Character d is at index 3 (position 4) in the string 'abcde'


We can also loop through each character in a string using a <code>for</code>-loop with the <code>in</code> keyword, as shown below. (On its own, <code>in</code> is also an operator: <code>'a' in s</code> tests whether <code>'a'</code> occurs in <code>s</code>.)

In [2]:
# code cell 5

s = "abcde"
i = 0
for x in s:
    outStr = "The character at index " + str(i) 
    outStr += " in string \'"
    outStr += s + "\'"
    outStr += " is " + x
    print(outStr)
    i += 1

The character at index 0 in string 'abcde' is a
The character at index 1 in string 'abcde' is b
The character at index 2 in string 'abcde' is c
The character at index 3 in string 'abcde' is d
The character at index 4 in string 'abcde' is e
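
The manual counter <code>i</code> in the loop above can also be supplied by the built-in <code>enumerate()</code> function. This is only a sketch of an alternative, not a replacement for the cell above; the name <code>pairs</code> is introduced here for illustration.

```python
s = "abcde"

# enumerate() yields (index, character) pairs, so no manual counter is needed
for i, x in enumerate(s):
    print("The character at index", i, "in string '" + s + "' is", x)

# Collect the pairs in a list to inspect them directly
pairs = list(enumerate(s))
print(pairs)
```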


#### Strings are immutable!

However, just because characters in a string can be referenced does not mean that they can be changed. In Python, strings are <b>immutable</b>; that is, characters in a string cannot be modified, appended to, or deleted from *in place*. 

In [4]:
# code cell 6

s = "abcde"
s[2] = "q"  # this will cause an error: 'str' object does not support item assignment
print(s)

TypeError: 'str' object does not support item assignment

But wait! We can append to a string using an augmented assignment operator such as <code>+=</code>. Doesn't that make strings mutable (changeable) after all? <br>

As it turns out, the answer is no. Like every other value in Python, a string is an object that resides at some location in memory. The built-in <code>id()</code> function returns an integer that identifies an object; in CPython, the standard Python implementation, this integer is the object's memory address. If we truly changed a string *in place*, it would have the same identity before and after the change. When we append another string to a string <code>s</code>, this is NOT the case, as seen in the following code cell. Instead of mutating the string <code>s</code> in place, Python creates a new string object at a different memory address and rebinds the name <code>s</code> to it.

In [1]:
# code cell 7

s = "abcde"
outStr = "String s = \'" + s + "\' has address id(s) = " + str(id(s))
print(outStr)

s += "fgh"
outStr = "String s += \'fgh\', which becomes s = \'" + s + "\', now has address id(s) = " + str(id(s))
print(outStr)

String s = 'abcde' has address id(s) = 2583339180656
String s += 'fgh', which becomes s = 'abcdefgh', now has address id(s) = 2583339180336
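
Since a string cannot be edited in place, the usual way to "change" a character is to build a new string. The sketch below shows two possible approaches; the names <code>t</code> and <code>u</code> are illustrative, and slicing (<code>s[start:stop]</code>) is used here without having been introduced in this section.

```python
s = "abcde"

# replace() returns a new string; the original is left untouched
t = s.replace("c", "q")

# slicing copies the parts of s before and after index 2, joined around "q"
u = s[:2] + "q" + s[3:]

print("s =", s)
print("t =", t)
print("u =", u)
```

Note that <code>replace()</code> substitutes every occurrence of the substring, while the slicing version targets one specific index.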


### 2. String Functions and Methods

* https://towardsdatascience.com/15-must-know-python-string-methods-64a4f554941b
* https://www.w3schools.com/python/python_ref_string.asp

Formatting output in strings:
* https://www.geeksforgeeks.org/string-formatting-in-python/
* https://realpython.com/python-formatted-output/

Here, we will make a distinction between built-in functions that operate on strings and string methods (functionality that belongs to string objects). Functions that operate on strings take a string as an input, and do something (print it, return information about the string, etc.), while string methods are functions (services) that string objects provide.

Commonly used built-in <b>string functions</b> (functions that operate on strings):

| function | description |
| --: | :-- |
| <code>len(s)</code> | returns the length (number of characters) of the string <code>s</code> |
| <code>float(s)</code> | returns the floating point number represented by the string <code>s</code> |
| <code>int(s)</code> | returns the base-10 integer represented by the string <code>s</code> |
| <code>bool(s)</code> | returns <code>False</code> if the string <code>s</code> is empty, and <code>True</code> otherwise (it does not interpret the text of <code>s</code>) |
| <code>print(s)</code> | prints the string <code>s</code> to the console |
| <code>type(s)</code> | returns the type of an object (in this case, the string <code>s</code>) |
| <code>id(s)</code> | returns the “identity” (in CPython, the memory address) of the string <code>s</code> |

The memory address of an object is an integer, which is guaranteed to be unique and constant for this object during its lifetime.

We have used almost all of these functions earlier:

In [24]:
# code cell 8

s = "the quick brown fox jumped over the lazy dog"

print('the string s = \"' + s + '\"')
print('the length of s is' , len(s))
print('the type of s is', type(s))
print('the memory location of s is', id(s))
print()


t = "1.2E-01"

print('the string t = \"' + t + '\"')
print('the length of t is', len(t))
print('the numerical value represented by t is', float(t))
print('the type of t is', type(t))
print('the type of the numerical value represented by t is', type(float(t)))
print()


u = "-451"

print('the string u = \"' + u + '\"')
print('the length of u is', len(u))
print('the numerical value represented by u is', int(u))
print('the type of u is', type(u))
print('the type of the numerical value represented by u is', type(int(u)))
print()


v = "False"

print('the string v = \"' + v + '\"')
print('the length of v is', len(v))
print('the Boolean value represented by v is', bool(v), ' <-- why is this?')
print('the type of v is', type(v))
print('the type of the numerical value represented by v is', type(bool(v)))
print()

the string s = "the quick brown fox jumped over the lazy dog"
the length of s is 44
the type of s is <class 'str'>
the memory location of s is 2583338307472

the string t = "1.2E-01"
the length of t is 7
the numerical value represented by t is 0.12
the type of t is <class 'str'>
the type of the numerical value represented by t is <class 'float'>

the string u = "-451"
the length of u is 4
the numerical value represented by u is -451
the type of u is <class 'str'>
the type of the numerical value represented by u is <class 'int'>

the string v = "False"
the length of v is 5
the Boolean value represented by v is True  <-- why is this?
the type of v is <class 'str'>
the type of the numerical value represented by v is <class 'bool'>
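
The last result above deserves a note: <code>bool()</code> does not read the text of a string, it only checks whether the string is empty. The sketch below demonstrates this and then shows one possible way to interpret the text itself. The helper <code>parse_bool</code> is hypothetical (it is not a Python built-in), and the decision to accept only the words "true" and "false" is an assumption made for this example.

```python
# bool() on a string only checks whether the string is empty
print(bool("False"))   # non-empty string
print(bool(""))        # empty string

# Hypothetical helper (not a built-in): interpret the text "True" or "False"
def parse_bool(text):
    cleaned = text.strip().lower()   # ignore surrounding spaces and letter case
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    raise ValueError("not a Boolean literal: " + repr(text))

v = "False"
print(parse_bool(v))
```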



There is a larger set of functions (more properly, _methods_) that string objects themselves provide. 

Commonly used <b>string methods</b> (functions provided by string objects):

| method | description |
| --: | :-- |
| <code>s.count()</code> | returns the number of times a specified substring occurs in the string <code>s</code> |
| <code>s.index()</code> | searches the string <code>s</code> for a specified substring and returns the index where it was first found (raises a <code>ValueError</code> if it is not found) |
| <code>s.lower()</code> | returns a copy of the string <code>s</code> converted to lower case |
| <code>s.upper()</code> | returns a copy of the string <code>s</code> converted to upper case |
| <code>s.isalnum()</code> | returns <code>True</code> if all characters in the string <code>s</code> are alphanumeric |
| <code>s.isalpha()</code> | returns <code>True</code> if all characters in the string <code>s</code> are alphabetic |
| <code>s.isascii()</code> | returns <code>True</code> if all characters in the string <code>s</code> are ASCII characters |
| <code>s.isdecimal()</code> | returns <code>True</code> if all characters in the string <code>s</code> are decimal characters |
| <code>s.isdigit()</code> | returns <code>True</code> if all characters in the string <code>s</code> are digits |
| <code>s.isnumeric()</code> | returns <code>True</code> if all characters in the string <code>s</code> are numeric |
| <code>s.find()</code> | searches the string <code>s</code> for a specified substring and returns the index where it was first found (returns <code>-1</code> if it is not found) |
| <code>s.replace()</code> | returns a copy of the string <code>s</code> in which each occurrence of a specified substring is replaced with another specified string |
| <code>s.format()</code> | inserts the specified values into the replacement fields (<code>{}</code>) of the string <code>s</code> and returns the result |
| <code>s.lstrip()</code> | returns a copy of the string <code>s</code> with leading whitespace removed |
| <code>s.rstrip()</code> | returns a copy of the string <code>s</code> with trailing whitespace removed |
| <code>s.strip()</code> | returns a copy of the string <code>s</code> with leading and trailing whitespace removed |
| <code>s.join()</code> | joins the strings in an iterable into a single string, using the string <code>s</code> as the separator |
| <code>s.split()</code> | splits the string <code>s</code> at the specified separator, and returns a list of the pieces |

Note the difference in syntax between using string <b>functions</b> and string <b>methods</b>: <br>

* string <b>functions</b> operate <b>on</b> a string <code>s</code>: <code>len(s)</code>, <code>type(s)</code>, <code>print(s)</code>, etc. <br>
* string <b>methods</b> are called <b>by</b> a string <code>s</code> using the <b>dot operator</b>: <code>s.count()</code>, <code>s.index()</code>, <code>s.isnumeric()</code>, etc.
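
Three of the methods listed in the table, <code>split()</code>, <code>join()</code>, and <code>format()</code>, can be sketched as follows. The variable names are chosen for this example only. Note in particular that <code>join()</code> is called on the <b>separator</b> string, not on the collection being joined.

```python
s = "the quick brown fox"

words = s.split(" ")          # split at each space, giving a list of substrings
rejoined = "-".join(words)    # the separator "-" is the string that calls join()
message = "{} has {} words".format(repr(s), len(words))  # fill each {} in order

print(words)
print(rejoined)
print(message)
```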


What are __[ASCII characters](https://www.w3schools.com/charsets/ref_html_ascii.asp)__?

In [65]:
# code cell 9

#    01234567890123456789012345678901234567890123456789
s = "    The quick brown fox jumped over THE lazy dog  "

print('the string s = \"' + s + '\"')
print('the length of s is', len(s))
print('the letter \"o\" occurs', s.count('o'), 'times in the string s')
print('a pair of space characters \"  \" occurs', s.count('  '), 'times in the string s')
print()

print('the substring \"THE\" occurs at index', s.find('THE'), 'in string s, using find()')
print('the substring \"the\" occurs at index', s.find('the'), 'in string s (-1 means \'not found\')')
print('the substring \"The\" occurs at index', s.index('The'), 'in string s, using index()')
# the following line will cause an error because 'the' is not found in string s
# print('the substring \"the\" occurs at position', s.index('the'), 'in string s')
print('replace substring \"THE\" with substring \"a\" in string s: ' + s.replace('THE', 'a'))
print()

print('convert string s to uppercase:', s.upper())
print('convert string s to lowercase:', s.lower())
print()

print('trim spaces from left side of string s: \"' + s.lstrip() + '\"')
print('trim spaces from right side of string s: \"' + s.rstrip() + '\"')
print('trim spaces from both sides of string s: \"' + s.strip() + '\"')
print()

print('all characters in string s are alphanumeric:', s.isalnum())
print('all characters in string s are alphabetic:', s.isalpha())
print('all characters in string s are ASCII:', s.isascii())
print()

t = "12345"

print('the string t = \"' + t + '\"')
print('the length of t is', len(t))
print('all characters in string t are alphanumeric:', t.isalnum())
print('all characters in string t are decimal:', t.isdecimal())
print('all characters in string t are digits:', t.isdigit())
print('all characters in string t are numeric:', t.isnumeric())
print()

the string s = "    The quick brown fox jumped over THE lazy dog  "
the length of s is 50
the letter "o" occurs 4 times in the string s
a pair of space characters "  " occurs 3 times in the string s

the substring "THE" occurs at index 36 in string s, using find()
the substring "the" occurs at index -1 in string s (-1 means 'not found')
the substring "The" occurs at index 4 in string s, using index()
replace substring "THE" with substring "a" in string s:     The quick brown fox jumped over a lazy dog  

convert string s to uppercase:     THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG  
convert string s to lowercase:     the quick brown fox jumped over the lazy dog  

trim spaces from left side of string s: "The quick brown fox jumped over THE lazy dog  "
trim spaces from right side of string s: "    The quick brown fox jumped over THE lazy dog"
trim spaces from both sides of string s: "The quick brown fox jumped over THE lazy dog"

all characters in string s are alphanumeric: False
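all characters in string s are alphabetic: False
all characters in string s are ASCII: True

the string t = "12345"
the length of t is 5
all characters in string t are alphanumeric: True
all characters in string t are decimal: True
all characters in string t are digits: True
all characters in string t are numeric: True
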

<div class="alert alert-block alert-info">
    
Now, it's your turn! <br>
    
1. Assign the string <code>"Supercalifragilisticexpialidocious"</code> to a variable, and print it out <br>
2. Use string function <code>len()</code> to find the length of the string, and print out that number <br>
3. Use string method <code>count()</code> to find the number of times the letter <code>i</code> appears in the string, and print out that number <br>
4. Use string method <code>index()</code> to find the index of the first occurrence of the letter <code>i</code> in the string, and print out that index <br>
5. Use Google to look up how to use string method <code>index()</code> to find the index of the next occurrence of the letter <code>i</code> in the string, and print out that index <br>
6. Create a loop (use either <code>for</code> or <code>while</code>) to find the indices of <b>all</b> occurrences of the letter <code>i</code> in the string, and print out those indices <br>
7. Use Google to look up a string method that will determine whether all letters in a string are lowercase, apply it to this string, and print out what that method returns <br>
8. From the string methods above, find one that will convert all letters in a string to lowercase, apply it to this string, assign what it returns to a variable, and print out that variable <br>
9. Repeat 6, but with the variable you created in 8. <br>

</div>

### 3. A List is a Collection of Objects
Programming languages like C, C++, C#, and Java all have an entity called an <b>array</b> that holds a collection of objects of the same data type. In Python, the closest built-in entity is called a <b>list</b>, and it may hold a collection of objects of any type at all. (That is, it can hold objects of different data types.) At this time, we will only consider lists containing objects of the same data type.<br>

A Python list is enclosed by square brackets, and its elements are delimited (set off from each other) with commas. An empty list has nothing between the opening and closing brackets.<br>
<pre>
list0 = []
list1 = [-1, 2, -3, 4, 5]
list2 = [0.707, -2.0E+06, 3.14]
list3 = ["apple", "banana", "cherry"]
list4 = [True, False, False, True]
</pre>

A list can be initialized with a specified number of copies of a value (a list can contain duplicate elements):
<pre>
list5 = [0]*5   # this assigns [0, 0, 0, 0, 0] to list5
</pre>

A list can be printed:
<pre>
list1 = [-1, 2, -3, 4, 5]
print(list1)    # this prints [-1, 2, -3, 4, 5]
</pre>

The number of elements of a list can be determined using <code>len()</code>:
<pre>
list1 = [-1, 2, -3, 4, 5]
N = len(list1)  # this assigns 5 to N
</pre>

The elements of a list can be referred to using an <b>index</b>: the first element of <code>list1</code> is <code>list1[0]</code>, the second element is <code>list1[1]</code>, and the last element can be referred to as either <code>list1[4]</code> or <code>list1[-1]</code>. <b>NOTE</b>: for a list of <code>N</code> elements, the first element is always <code>list1[0]</code>, while the last element is either <code>list1[N-1]</code> or <code>list1[-1]</code>.
<pre>
list1 = [-1, 2, -3, 4, 5]
first_elem = list1[0]    # this assigns -1 to first_elem
second_elem = list1[1]   # this assigns 2 to second_elem
last_elem = list1[4]     # this assigns 5 to last_elem
last_elem = list1[-1]    # this also assigns 5 to last_elem
</pre>

Elements of a list can be changed (that is, unlike strings, lists are <b>mutable</b>):
<pre>
list1 = [-1, 2, -3, 4, 5]
list1[2] = -42    # list1 is now [-1, 2, -42, 4, 5]
</pre>
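
To contrast this with strings, the sketch below applies the same <code>id()</code> check used earlier for string <code>s</code>. The names <code>id_before</code> and <code>id_after</code> are illustrative. Because the list is changed in place, its identity should be the same before and after the assignment.

```python
list1 = [-1, 2, -3, 4, 5]
id_before = id(list1)

list1[2] = -42          # modify one element in place
id_after = id(list1)

print(list1)
print("same object:", id_before == id_after)
```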

Elements can be inserted into or deleted from lists:
<pre>
list1 = [-1, 2, -3, 4, 5]
list1.insert(1, 0)    # insert the value 0 at index 1; list1 is now [-1, 0, 2, -3, 4, 5]
list1.pop(3)          # remove the element at index 3; list1 is now [-1, 0, 2, 4, 5]
</pre>

The order of elements in a list can be reversed:
<pre>
list1 = [-1, 2, -3, 4, 5]
list1.reverse()    # list1 is now [5, 4, -3, 2, -1]
</pre>

The elements of a list can be sorted in ascending or descending order:
<pre>
list1 = [-1, 2, -3, 4, 5]
list1.sort()    # list1 is now [-3, -1, 2, 4, 5]
list1.sort(reverse=True)    # list1 is now [5, 4, 2, -1, -3]
</pre>
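
The <code>sort()</code> method reorders the list in place. If the original order must be kept, the built-in <code>sorted()</code> function is one alternative: it returns a new sorted list instead. The sketch below compares the two; the names <code>ordered</code> and <code>result</code> are introduced for this example.

```python
list1 = [-1, 2, -3, 4, 5]

ordered = sorted(list1)   # builds a new list; list1 keeps its original order
print(ordered)
print(list1)

result = list1.sort()     # sorts list1 in place and returns None
print(result)
print(list1)
```

A common mistake is writing <code>list1 = list1.sort()</code>, which replaces the list with <code>None</code>.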

Elements can be cleared from a list:
<pre>
list1 = [-1, 2, -3, 4, 5]
list1.clear()    # list1 is now []
</pre>

Elements can be tacked on to the end of a list:
<pre>
list1 = [-1, 2, -3, 4, 5]
list1.append(-6)    # list1 is now [-1, 2, -3, 4, 5, -6]
</pre>

Two lists can be concatenated (like strings):
<pre>
list1 = [-1, 2, -3, 4, 5]
list2 = [-6, 7, -8]
list1.extend(list2)    # list1 is now [-1, 2, -3, 4, 5, -6, 7, -8]
</pre>
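
The <code>+</code> operator also works on lists, much as it does on strings. The difference from <code>extend()</code>, sketched below, is that <code>+</code> builds a new list and leaves both operands unchanged, while <code>extend()</code> modifies the list it is called on. The name <code>list3</code> is illustrative.

```python
list1 = [-1, 2, -3, 4, 5]
list2 = [-6, 7, -8]

list3 = list1 + list2     # new list; list1 and list2 are unchanged
print(list3)
print(list1)

list1.extend(list2)       # grows list1 in place; list2 is unchanged
print(list1)
print(list2)
```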

<div class="alert alert-block alert-info">
In the cell below, write code that does the following: <br>
    
1. Assign an empty list to <code>list1</code> <br>
2. Append the strings <code>"Europa"</code>, <code>"Ganymede"</code>, <code>"Io"</code>, and <code>"Callisto"</code> to the list <b>one at a time</b>, and print out the list after each addition <br>
3. Reverse the order of <code>list1</code>, and print it out <br>
4. Sort <code>list1</code>, and print it out <br>
5. Print out <code>list1[0]</code>, <code>list1[1]</code>, and <code>list1[-1]</code> <br>
6. Extend <code>list1</code> with <code>list2 = ["Himalia", "Amalthea", "Thebe"]</code> <br>
7. Print out the length of <code>list1</code> <br>
8. Remove the last element of <code>list1</code>, and print out <code>list1</code> <br>
9. Insert the element you removed at the beginning of <code>list1</code> <br>

Each time you run the cell, make sure the output is what you expect it to be.
</div>

lists: list functions

lists are mutable

elements are referred to by zero-based index (position)

elements can be inserted, deleted, appended, cleared, determine whether an element is in it, count how many times it occurs, searched, etc.

exercises: roll numDice dice game from prev notebook

use list to keep track of rolls

instead of using boolean, use list to store rolls and see how many 1's are in list

card game

introduce dictionaries: instead of referring to elements by their index (position), refer to values by the unique keys associated with each

dictionaries are mutable

dictionary functions

deep vs. shallow copy

exercises: card game II

teachers and students?

