# Data Structures in Python
----

We have already seen *built-in* numeric data types in Python: `int` and `float`. We have also seen `str` for strings, which is technically called a *text sequence type*. First, we will revisit the `str` data type. Then, we will explore a few more built-in data types that will prove very useful: `list`, `tuple`, `range`, and `dict`.

-----

## Sequence Types

### Strings - the `str` data type

The `str` data type is a special sequence type - a *text* sequence type. Recall that we create strings by providing a sequence of zero or more characters enclosed in either a pair of single quote characters, `'`, or a pair of double quote characters, `"`.

In [None]:
# Create a couple of strings, print them and their types out
string1 = "My first string!"
print("string1 =", string1)
print("   type =", type(string1))

string2 = "3000"
print("\nstring2 =", string2)
print("   type =", type(string2))

-----
We will often need to manipulate string data. Luckily, Python strings come with several useful operators and methods. We look at a few of them next.

-----

### Concatenate Strings

In [None]:
# Concatenate two strings together
# The + operator is "overloaded", so it works on strings in addition to numbers
string1 + string2

### Repeating Strings

In [None]:
# Repeating strings
# The * operator is also overloaded, allowing it to work on strings
string2 * 4

### Extracting Pieces of Strings

To extract characters from a string, you pass an *index* number inside of square brackets `[]`. Indexing starts at 0. So, to get the first character from the string `string1` you would issue the command `string1[0]`. 

In [None]:
# We may want to extract parts of a string
# Get the first 2 characters of string1
string1[0:2]

In [None]:
# Get the last character of string1
string1[-1]

In [None]:
# We can also check to see if a substring exists (or not) in the string
# We use the `in` operator
# Check to see if the word `first` is in string1
"first" in string1

In [None]:
# Check to see if `hello` is in string2
"hello" in string2

In [None]:
# Is hello *not in* string2?
"hello" not in string2

-----

### Other Useful String Functions and Methods

Let's explore a few of the most commonly used string manipulations that we will use.

### Length of Strings

In [None]:
# How long is a string?
# Use the len() function
print("length of string1 =", len(string1))
print("length of string2 =", len(string2))

### Changing the Case of Strings

In [None]:
# What if we want to convert everything to lower case or upper case
# Convert string1 to lower
print(string1.lower())

# Convert string1 to upper
print(string1.upper())

## NOTE: Neither of these modifies the original string1. Instead, each returns a new string.

-----

### Cleaning Strings

A common need when cleaning text data is to remove white space from the **beginning and end** of strings. We can use the `strip` method to accomplish this. White space includes space, tab, and newline characters.

In [None]:
# Create a string with spaces at beginning and end
badString = "     Spaces at the beginning and end.     "
print("before:|", badString, "|", sep="")
print("after :|", badString.strip(), "|", sep="")

In [None]:
# Create a string with spaces at beginning and end
# Add newline and tab characters and spaces in middle
# Will strip work?
badString2 = """     Spaces at the beginning and end.     Now let's try newline \n and
    even a tab \t or \t two to see what happens.\n"""
print("before:|", badString2, "|", sep="")
print("after :|", badString2.strip(), "|", sep="")

Well, that did not fully work. The `strip()` method only removes whitespace at the beginning and end of a string (including the trailing newline character, as we just saw); whitespace in the middle is left untouched. Luckily for us, there is another useful method that will help. Calling `split()` with no arguments breaks the string apart wherever there is a run of spaces, tabs, or newlines and puts the resulting words into a `list` (more on `list`s later). We can then reconstruct the string with single spaces between the words by using the `join()` method. Let's try it.


In [None]:
# Split it and look at the list
badStringList = badString2.split()
print(badStringList)

# Now, reconstruct with join
goodString = " ".join(badStringList)
print(goodString)

----

Similar to `strip`, there are also `lstrip` and `rstrip`, which strip off whitespace at the beginning (the left) and the end (the right) of a string, respectively.
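
As a minimal sketch of the difference between the three methods, the cell below uses a made-up string named `padded` (not part of the original examples) and `repr()` so that any remaining whitespace is visible.

```python
# Hypothetical sample string with whitespace on both sides
padded = "\t  padded text  \n"

left_stripped = padded.lstrip()   # removes leading whitespace only
right_stripped = padded.rstrip()  # removes trailing whitespace only
both_stripped = padded.strip()    # removes whitespace on both sides

# repr() shows tabs and newlines as \t and \n instead of rendering them
print("lstrip:", repr(left_stripped))
print("rstrip:", repr(right_stripped))
print("strip :", repr(both_stripped))
```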

We may also want to find substrings within a string. There are various methods for this task, similar to the `in` operator we saw earlier. We will look at `startswith`, `endswith`, `find`, and `replace`. The methods `startswith` and `endswith` do exactly what they say: they return a boolean indicating whether the string starts or ends with the specified substring. The `find` method searches for a substring within a string. If it is found, the *index* of the first occurrence is returned. If the substring is not found, it returns -1. You can use `replace` to replace one substring with another within a string. By default, it replaces *all* occurrences of the substring. An optional *count* argument allows you to specify the maximum number of replacements.

In [None]:
# Create a string
string3 = "I think text analysis is fun because text is where the hidden messages are."

# Does string3 start with "I"?
print(string3.startswith("I"))

# What about case? Test with "i"
print(string3.startswith("i"))

# Does string3 end with "text"?
print(string3.endswith("text"))

In [None]:
# Where does the first occurrence of "text" occur in string3?
string3.find("text")

In [None]:
# Replace the word "text" with "TEXT" for all occurrences
print(string3.replace("text", "TEXT"))

# Replace "text" with "TEXT" once - just the first occurrence
print(string3.replace("text", "TEXT", 1))
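
The cells above only search for substrings that are present. As a small sketch of the not-found case for `find`, the cell below uses a made-up string named `sentence`. Because -1 is also a valid index (the last character), it is usually safer to compare the result to -1 explicitly before using it.

```python
# Hypothetical sample string
sentence = "Text analysis is fun."

found_at = sentence.find("analysis")  # index of the first occurrence
missing = sentence.find("boring")     # substring is absent

print("analysis found at index:", found_at)
print("boring found at index  :", missing)

# Check for -1 explicitly before using the result as an index
if missing == -1:
    print("'boring' does not appear in the sentence")
```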

----

You may also want to count the number of occurrences of a substring within a string. You can use the `count` method.

*Thought Exercise:* Are there any other ways to accomplish the same task?

In [None]:
# How many times does "text" show up in string3?
# We are going to convert everything to lower case and then count
string3.lower().count("text")

----

### Raw and Formatted String Literals

As we saw in `badString2` above, we can include newline and tab characters within a string by using `\n` and `\t`, respectively. If you want to have the characters `\n` show up in the string instead of being replaced with a newline, then you have two options:

1. Escape it by adding an additional backslash: `\\n`.
2. Make the string a *raw string literal* by preceding the opening quotation mark with the letter `r`. 

In [None]:
# Create a string with a new line character in it
s1 = "String over \n two lines?"
print(s1)

# Create a string with escaped new line character
s2 = "String over \\n two lines?"
print(s2)

# Create a raw literal string
s3 = r"String over \n two lines?"
print(s3)

#### Formatted Literal Strings

To help with formatting when printing out strings, Python provides the concept of *formatted string literals*, also called f-strings. You can include the value of Python expressions inside a string by prefixing the string with an `f` or `F` and writing expressions as `{expression}`. 

In [None]:
# Create an f-string
myVar = 3*3
myFString = f"The value of myVar is {myVar}"
print(myFString)

myFString2 = f"The value of myVar with some formatting is {myVar:0.2f}"
print(myFString2)
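
The part after the colon inside the braces is a *format specification*. The sketch below shows a few common ones; the variable names and values are made up for illustration.

```python
# Hypothetical values used only to illustrate format specifications
revenue = 1234567.891
margin = 0.4567
label = "Revenue"

with_commas = f"{revenue:,.2f}"   # thousands separator, 2 decimal places
as_percent = f"{margin:.1%}"      # multiply by 100 and add a % sign
padded_label = f"{label:<10}|"    # left-align in a field 10 characters wide
right_aligned = f"{42:>6}"        # right-align in a field 6 characters wide

print(with_commas)
print(as_percent)
print(padded_label)
print(right_aligned)
```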

-----

<font color='red' size = '5'> Student Exercise </font>

In the **Code** cell below is a long string variable named `bfb2020AnnualReport`. It contains language from Brown-Forman's [website](https://www.brown-forman.com/investors/annual-report/) under 'Investors'. 

Complete the following tasks in the empty **Code** cell below the cell that contains `bfb2020AnnualReport`. Be sure to run that cell of code before trying your own code.

1. Convert the string to lower case and print it out.
2. Convert the string to upper case and print it out.
3. Count the number of times the substring "150" occurs.
4. Count the number of times the substring "we" occurs.
5. How many words are in the string?

-----

In [None]:
bfb2020AnnualReport = """As we mark our 150th year, we take time to pause and consider the many people,
places,and products of Brown-Forman, and our continuing ability to deliver on our corporate ambition so
aptly described as "Nothing Better In The Market."

This ambition has allowed us to successfully navigate through many industry, economic, and geopolitical
challenges, and changes over the span of 15 decades, from world wars and U.S. Prohibition to recessions
and global crises. Through it all, the promise first made by our founder, George Garvin Brown, and 
inscribed on every bottle of Old Forester since 1870, has guided our growth and performance.

When facing the unpredictable, our culture of collaboration and inclusion ensures that we endure. This 
holds true in the creative, resilient, and exemplary response of our employees to the COVID-19 pandemic. 
There is "Nothing Better in the Market" than the character of our people.

As our portfolio and geographies become more diverse, so do our employees, consumers, and communities. 
Our long tradition of being responsible in everything we do keeps us steadfast in our efforts to promote 
alcohol responsibility and advance environmental sustainability. We believe our continued, thoughtful, 
long-term perspective, coupled with our diverse, inclusive, and caring culture, serve us today and will 
serve the generations that follow. There is “Nothing Better in the Market” than our values and our culture.

Whether it’s a flavorful whiskey on the rocks, a favorite cocktail crafted with one of our many fine spirits,
or an expressive glass of wine, our products are present during some of life’s important moments. They are 
part of bringing people together in times of celebration, as well as quiet moments of reflection. There is 
"Nothing Better in the Market" than our brands.

When we look back at all that we have accomplished in the last 150 years and where we stand today, we take 
pride in both our performance and our potential. Our ambition for Brown-Forman is a journey we are always on,
constantly striving and continually discovering new ways to make ourselves, our brands, and our company even
better. There is "Nothing Better in the Market" than Brown-Forman— yesterday, today, tomorrow, and for 
generations to come."""

-----

## Other Sequence Types

In addition to the text sequence type of `str`, there are three others that are built-in sequence types: `list`, `tuple`, and `range`.

### The `list` Sequence Type

We have already seen a `list` above, when we used `str.split()` to break the string up into substrings using whitespace as a delimiter. So, what exactly is a `list`? A `list` is an ordered, *mutable* collection of objects. *Mutable* means you can make changes to it: adding, deleting, or changing the objects in the collection. In Python, a `list` can contain different data types.

#### Creating Lists

You create a list by enclosing data inside square brackets, `[]`, and separating each item with a comma. Let's create a few different lists.

In [None]:
# Create a list that only contains integers
intList = [2, 4, 6]
print(intList)
print(type(intList))

In [None]:
# Create a list that contains integers and floats
numList = [2, 4.4, 6, 8.8]
print(numList)
print(type(numList))

In [None]:
# You can put strings in the list too
strAndNumList = ["one", 2, 3.0]
print(strAndNumList)
print(type(strAndNumList))

In [None]:
# You can even create a list of lists
twoDList = [[1,1], [2,2], [3,3]]
print(twoDList)
print(type(twoDList))

#### Retrieving Elements of Lists

We've already seen how to retrieve characters out of a string. The process for a `list` is the same: we access an element of the list by typing the name of the list followed by the *index* of the element you want inside square brackets. **Indexing starts at 0.** For example, to retrieve the first element of the list `strAndNumList`, which is "one", we would use `strAndNumList[0]`. Let's try it.

In [None]:
# Get the first element of strAndNumList
print(strAndNumList[0])
print(type(strAndNumList[0]))

In [None]:
# To get the last element you can use the index -1
# This implies that we can count from either the beginning (0) or the end (-1)
strAndNumList[-1]

#### Slicing Lists

If we want more than a single element of a list, that is also quite easily done. The syntax is `listName[start:end:step]`, where `start` is the index of the first element we want to retrieve (inclusive lower bound), `end` is one more than the index of the last element we want to retrieve (exclusive upper bound), and `step` is the gap between indices (default gap is 1).

In [None]:
# Create a new list
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]
print(f"letters[0:2]   : {letters[0:2]}")
print(f"letters[2:2]   : {letters[2:2]}")
print(f"letters[:2]    : {letters[:2]}")
print(f"letters[4:]    : {letters[4:]}")
print(f"letters[0:8:2] : {letters[0:8:2]}")
print(f"letters[8:0:-2]: {letters[8:0:-2]}")
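
A few more slicing patterns come up often. The sketch below uses a made-up list named `digits`. Note that, unlike single-element indexing, a slice whose bounds fall outside the list does not raise an error; the bounds are simply clipped to the ends of the list.

```python
# Hypothetical list used only for illustration
digits = [0, 1, 2, 3, 4, 5]

reversed_digits = digits[::-1]  # omit start and end, step backwards
every_other = digits[1:5:2]     # indices 1 and 3
last_two = digits[-2:]          # from second-to-last through the end
clipped = digits[2:100]         # end is past the list, so it is clipped

print(reversed_digits)
print(every_other)
print(last_two)
print(clipped)
```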

#### Modifying List Elements

To modify a single element of a list, simply reference that index and assign a different value to it. For example, to change the letter "a" to "first" in the `letters` list from above, we would type `letters[0] = "first"`. To change multiple elements at once, you can assign the new values using a list slice.

In [None]:
# Change "a" to "first" in letters
letters[0] = "first"
print(letters)

In [None]:
# Now, change the last two elements using a list slice
# We are counting backwards now: start at -1, end at -3 (exclusive, remember), use step = -1
# Verify it's the two we want
print(letters[-1:-3:-1])

# now change them
letters[-1:-3:-1] = ["last", "second to last"]
print(letters)

#### Copying Lists

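If you try copying a list, say `myList`, to a second list called `yourList` with the command `yourList = myList`, you have not actually copied anything. You have simply created a new variable (or symbol, if you will) called `yourList` that points to the exact same data as `myList` in the underlying memory space; this is often called an *alias*. Therefore, when you make changes to `myList`, those changes will show up in `yourList` and vice versa. If you do **not** want this behavior, you need to create a separate copy by using the method `copy`. Note that `copy` makes a *shallow* copy: the new list is independent of the original, but any mutable objects stored inside it (such as nested lists) are still shared. For a fully independent *deep* copy of nested data, use `copy.deepcopy` from the standard library's `copy` module.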

In [None]:
# Create myList, print it out
myList = [1, 2, 3, 4, 5]
print(f"myList  : {myList}")

# Create yourList and print it out
yourList = myList
print(f"yourList: {yourList}")

# Now change the first element of myList
print("\nChanging the first element of myList to 999 ...")
myList[0] = 999
print(f"myList  : {myList}")
print(f"yourList: {yourList}")

# Try the other direction
print("\nChanging the last element of yourList to 999 ...")
yourList[-1] = 999
print(f"myList  : {myList}")
print(f"yourList: {yourList}")

In [None]:
# Let's make a separate copy instead (.copy() creates a shallow copy)
myNewList = [2, 4, 6, 8]
print(f"myNewList  : {myNewList}")

# Create yourNewList with .copy()
yourNewList = myNewList.copy()
print(f"yourNewList: {yourNewList}")

# Changing the first element of myNewList
print("\nChanging the first element of myNewList to -999 ...")
myNewList[0] = -999
print(f"myNewList  : {myNewList}")
print(f"yourNewList: {yourNewList}")

# Try the other direction
print("\nChanging the last element of yourNewList to -999 ...")
yourNewList[-1] = -999
print(f"myNewList  : {myNewList}")
print(f"yourNewList: {yourNewList}")
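
For lists that contain other lists, it helps to distinguish three cases: plain assignment (no copy at all), `list.copy()` (a shallow copy, in Python's own terminology), and `copy.deepcopy` (a fully independent copy). The sketch below uses made-up variable names to show how each one reacts to changes in the original.

```python
import copy

# Hypothetical nested list used only for illustration
nested = [[1, 2], [3, 4]]

alias = nested                 # same list object, nothing is copied
shallow = nested.copy()        # new outer list, inner lists are shared
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 999      # change made inside an inner list
nested.append([5, 6])   # change made to the outer list

print(f"nested : {nested}")
print(f"alias  : {alias}")     # sees both changes
print(f"shallow: {shallow}")   # sees only the inner-list change
print(f"deep   : {deep}")      # sees neither change
```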

#### Other List Operations

We've already seen how to find how many elements are in a `list` by using the function `len`. We can join two lists together by using the `+` operator. Similarly, we can use `*` to repeat a list *n* times, producing a new list made of back-to-back copies of the original. Other helpful methods include `append`, `insert`, `remove`, `sort`, and `reverse`, among others.

In [None]:
# Concatenate two lists
bigList = myNewList + yourNewList
print(bigList)

In [None]:
# Duplicate list
myNewList3Times = myNewList * 3
print(myNewList3Times)

In [None]:
# Append a new element to bigList
bigList.append("New Element")
print(bigList)
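
The remaining methods mentioned above work in a similar way. The sketch below uses a made-up list named `scores`. Each of these methods changes the list *in place* and returns `None`, so assigning the result of `sort()` to a variable is a common mistake.

```python
# Hypothetical list used only for illustration
scores = [70, 95, 82]

scores.append(88)      # add to the end      -> [70, 95, 82, 88]
scores.insert(1, 60)   # insert at index 1   -> [70, 60, 95, 82, 88]
scores.remove(95)      # remove first 95     -> [70, 60, 82, 88]
scores.reverse()       # reverse in place    -> [88, 82, 60, 70]
print(scores)

# sort() also works in place and returns None
result = scores.sort()
print(scores)
print(result)
```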

----

### Tuples

A `tuple` is a collection that is ordered and *immutable*. Creating a `tuple` is very similar to creating a `list` except you use parentheses `()` instead of square brackets, `[]`. The process of accessing elements of a `tuple` is identical to that of a `list`. You need to be aware of `tuple`s because some functions either return them or require them in various packages/modules that you will encounter. Let's try it.

In [None]:
# Create a tuple
t = (1, 2, 3)
print(t)
print(type(t))

In [None]:
# Get the first element of the tuple t
print(t[0])

# Try to change the first element
# This raises a TypeError because tuples are immutable
t[0] = 999
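
The assignment above stops the cell with a `TypeError`. As a hedged sketch, the cell below uses a made-up tuple named `point` and catches that error so the rest of the cell can run. It also shows two details that often surprise newcomers: unpacking a tuple into separate variables, and the trailing comma needed to create a tuple with a single element.

```python
# Hypothetical tuple used only for illustration
point = (3, 4)

try:
    point[0] = 999   # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Unpack the tuple into separate variables
x, y = point
print(x, y)

# A one-element tuple needs a trailing comma
single = (5,)
not_a_tuple = (5)   # just the integer 5 in parentheses
print(type(single), type(not_a_tuple))
```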

----

### Range

A `range` represents an immutable sequence of numbers and is commonly used for looping a specific number of times in a `for` loop. You call `range(stop)`, where `stop` represents the number of elements you want in the sequence. By default, the sequence starts at 0 and ends at `stop - 1`. You can change this behavior using the other constructor call, `range(start, stop[, step])`. The optional argument `step` defaults to 1.

In [None]:
# Call a few different ones to see how it works
print(range(10))

In [None]:
# Okay, that didn't tell me much
# Let's wrap it in a list and then print it out
print(list(range(10)))

# Change start to 1 ... notice stop is EXCLUSIVE
print(list(range(1, 10)))

# Count by 2s starting at 2 and going up to and INCLUDING 10
print(list(range(2, 11, 2)))
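
Since `range` is mostly used with `for` loops, here is a short preview sketch. The variable names are made up for illustration. It also shows that a negative `step` counts downward and that `len` works on a `range` without converting it to a list.

```python
# Sum the integers 1 through 5 with a for loop
total = 0
for i in range(1, 6):   # stop is exclusive, so this yields 1, 2, 3, 4, 5
    total += i
print("total =", total)

# A negative step counts down; stop is still exclusive
countdown = list(range(5, 0, -1))
print(countdown)

# len() works directly on a range: 0, 3, 6, 9
n_items = len(range(0, 10, 3))
print(n_items)
```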

-----

<font color='red' size = '5'> Student Exercise </font>

Complete the following tasks in the empty **Code** cell below.

1. Create a new `list` called `theList` that contains the odd numbers from 1 to 9.
    - Challenge: Can you complete this task using `range`?    
2. Print out the first 2 elements of `theList`.
3. Print out the last 2 elements of `theList`.
4. Make a copy of `theList`, call it `reversedList`, reverse the elements of it, and print it out.
    - Make sure you do **not** change the order of the original `theList`.
5. Print out the combined list of `theList` and `reversedList`.

-----

## Dictionaries

One of the most useful built-in data types in Python is the dictionary, or the `dict` type. A dictionary is a *mutable* collection of **key-value** pairs. As of Python 3.7, a dictionary preserves the order in which its keys were inserted, but its elements are looked up by key rather than by position. Instead of being indexed by a range of numbers (like a `list` or `tuple`), a dictionary is indexed by *keys*, which can be any immutable (more precisely, *hashable*) type. For example, strings and numbers can always be keys. You **cannot** use `list`s as keys since they can be modified in place. The values can be any valid data type.

It is best to think of dictionaries as *key:value* pairs with the requirement that keys are unique. To create a dictionary, you place comma-separated key:value pairs inside of curly braces, `{}`.

In [None]:
# Create an income statement dictionary
incomeStmt = {"Revenue": 100,
              "COGS": 52,
              "Gross Margin": 45,
              "SG&A": 40,
              "Net Income": 5}

print(incomeStmt)

In [None]:
# Retrieve an element using the key
incomeStmt["COGS"]
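
Looking up a key that is not in the dictionary raises a `KeyError`, and keys are case sensitive. As a minimal sketch, the cell below uses a made-up dictionary named `prices` to show the error and one common way to avoid it: the `get` method, which returns a default value when the key is missing.

```python
# Hypothetical dictionary used only for illustration
prices = {"apple": 1.25, "pear": 0.80}

try:
    prices["Apple"]   # wrong case, so the key does not exist
except KeyError as err:
    missing_key = err.args[0]
    print("KeyError for key:", missing_key)

# get() returns the value if the key exists, otherwise the default
fallback = prices.get("Apple", 0.0)
existing = prices.get("apple", 0.0)
print(fallback, existing)
```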

In [None]:
# To add a key-value to a dictionary, assign the value to a new key
# Add "Fiscal Year": 2018
incomeStmt["Fiscal Year"] = 2018
print(incomeStmt)

In [None]:
# To change a value, access it using the key and reassign
# Change the fiscal year to 1998
incomeStmt["Fiscal Year"] = 1998
print(incomeStmt)

In [None]:
# Get the keys of the dictionary as a list
list(incomeStmt)

In [None]:
# You can use the `in` operator to determine if the key exists
"COGS" in incomeStmt

In [None]:
"cogs" in incomeStmt

In [None]:
# To get all the items as an iterable object, you call .items()
incomeStmt.items()

In [None]:
# As a preview let's loop through the dictionary
for k, v in incomeStmt.items():
    print(f"key: {k:15}value: {v}")
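
The rule about which objects can serve as keys can be checked directly. In the sketch below, which uses a made-up dictionary named `quarterly_sales`, a `tuple` works as a key while a `list` raises a `TypeError`. It also shows that assigning to an existing key overwrites the value instead of adding a second entry, which is how keys stay unique.

```python
# Hypothetical dictionary keyed by (year, quarter) tuples
quarterly_sales = {("2020", "Q1"): 100, ("2020", "Q2"): 120}
q1_sales = quarterly_sales[("2020", "Q1")]
print("Q1 sales:", q1_sales)

# A list cannot be used as a key
try:
    invalid = {["2020", "Q1"]: 100}
except TypeError as err:
    list_key_error = str(err)
    print("TypeError:", list_key_error)

# Assigning to an existing key replaces its value
quarterly_sales[("2020", "Q1")] = 105
print(len(quarterly_sales), quarterly_sales[("2020", "Q1")])
```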

## Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. 

1. The official Python tutorial about [formatting strings][1].
2. A nice post about [f-strings][2].
3. The official Python tutorial about [data structures][3].


-----

[1]: https://docs.python.org/3/tutorial/inputoutput.html
[2]: https://realpython.com/python-f-strings/
[3]: https://docs.python.org/3/tutorial/datastructures.html

**&copy; 2021 - Present: Matthew D. Dean, Ph.D.   
Clinical Associate Professor of Business Analytics at William \& Mary.**