# Python for Digital Humanities

## Unit #7: Data Types

* Strings
* Lists
* Dictionaries
* Sets
* Tuples

<font color=blue>---------------------------------------------------------------</font>

In a previous unit, we talked a little bit about variables and how they can hold information.  We saw how a variable can hold an entire text from a file as a string.  In that case, the _data type_ of the variable was a string.  

There are many more data types in Python.  Each data type comes with its own set of tools that can be applied to it.  Think about it -- if your data type is _numeric_, you would want to be able to add, subtract, multiply, or divide its value.  But these operations may not make sense if your data type is a string.  For strings, we may want to concatenate two strings or convert a string to all lower case.
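
As a small illustration of this point, the sketch below applies the same `+` operator to numbers and to strings. The variable names and values are made up for this example.

```python
# Hypothetical values chosen only for illustration.
page_count = 120
extra_pages = 30
first_name = "Jane"
last_name = "Austen"

# With numbers, + performs arithmetic.
total_pages = page_count + extra_pages
print(total_pages)

# With strings, + joins (concatenates) the pieces.
full_name = first_name + " " + last_name
print(full_name)

# type() reports the data type of a value.
print(type(total_pages))
print(type(full_name))
```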

Other data types that we will study are lists, dictionaries, sets, and tuples.

## 7.1  Strings and String Methods

We know that we can have variables that hold a string.  Now let's see what we can do with strings.

#### Splitting strings on characters:
```
phone = "434-924-1812"
print(phone.split('-'))
```
#### Concatenating strings:
```
area_code = "(434)"
number = "924-1812"
phone = area_code + number
print(phone)
```
#### Counting occurrences of substrings
```
with open('austen-emma-excerpt.txt') as infile:
    text = infile.read()

print(text.count("her"))
```

#### Converting to all lower case
```
print(text.lower())
```

#### Performing consecutive operations
```
print(text.lower().count('her'))
```
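
One thing to keep in mind is that `count` looks for substrings, so "her" is also found inside words such as "there" or "other". The sketch below compares a substring count with a whole-word count. It uses a short made-up sentence in place of the excerpt file, and the punctuation stripped here is only a rough, assumed list.

```python
# A short made-up sentence stands in for the text read from the file.
sample = "Her mother was there with her; the other sister was not."

# Substring count: also matches "her" inside "mother", "there", and "other".
substring_hits = sample.lower().count("her")

# Whole-word count: lower-case the text, split on whitespace,
# and strip common punctuation from the ends of each word.
words = [word.strip(".,;:!?\"'") for word in sample.lower().split()]
word_hits = words.count("her")

print(substring_hits)
print(word_hits)
```

Which count is appropriate depends on the research question; for word frequencies, the whole-word version is usually closer to what we mean.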

Question: Can we change a letter in a string?

In [1]:
title = "Harry Potter and the Chamber of Secrets"
print(title[6])
title[6]='p'


P


TypeError: 'str' object does not support item assignment

There are two things to understand about what went wrong above.  

* First, strings are **immutable** -- once created, they cannot be changed in place.  But we can overwrite the variable with a new string that contains the changes we want.
* Second, instead of assigning to a single position (i.e., `title[6]='p'`), we need to 'replace' the letter and save the result back to the variable:




In [2]:
title = "Harry Potter and the Chamber of Secrets"
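title.replace('P', 'p')   # returns a new string, but we did not save it
print(title)              # so title is unchanged

title = title.replace('P', 'p')   # overwrite title with the new string
print(title)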



Harry Potter and the Chamber of Secrets
Harry potter and the Chamber of Secrets


We do need to be careful with the 'replace' method -- by default, it replaces every occurrence of the letter, not just the first one:

In [1]:
title = "Harry Potter and the Chamber of Secrets"

print(title.replace('e', 'a'))


Harry Pottar and tha Chambar of Sacrats
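
If only some occurrences should change, there are a couple of options. The sketch below shows the optional third argument of `replace`, which limits the number of replacements, and a way to change one known position by slicing. The variable names `first_only`, `position`, and `changed` are chosen only for this example.

```python
title = "Harry Potter and the Chamber of Secrets"

# The optional third argument limits how many occurrences are replaced.
first_only = title.replace('e', 'a', 1)
print(first_only)

# To change the character at one known position, build a new string
# from the slices before and after that position.
position = 6
changed = title[:position] + 'p' + title[position + 1:]
print(changed)
```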


## 7.2 Lists and List Methods

A list is a collection of ordered items. For example, it could be a collection of your favorite authors:

``` 
favorite_authors = ['Shakespeare', 'Dostoevsky', 'Twain']
```
Or, a list of groceries:
```
groceries = ['milk', 'bread', 'bananas', 'chocolate', 'more chocolate']
```
Or, your locker combination from high school:
```
locker_combo = [25, 36, 3]
```

The notation for a list is the items, separated by commas, and enclosed by square brackets:
``` 
list_name = [item1, item2, item3, item4, item5]
```

If the items are numeric, they do not need to be surrounded by quotes.  Strings do need the quotes.
We can have as many items as we need in a list (assuming the data can be held within memory).

One of the best features of lists is that we can have mixed data types (e.g., strings and numbers and other lists).
```
mylist = ['Tom Hanks', 64, 
           ['Apollo 13', 'Toy Story', 'Forrest Gump', 'A League of Their Own', 'Big'] ]
```

You can access the items by indexing into the list.  Note that Python starts counting at 0, so the first item has index 0.


In [2]:
mylist = ['Tom Hanks', 64, 
           ['Apollo 13', 'Toy Story', 'Forrest Gump', 'A League of Their Own', 'Big'] ]
name = mylist[0]
age = mylist[1]
movies = mylist[2]
print(f'{name} is {age} years old and starred in "{movies[0]}".')

Tom Hanks is 64 years old and starred in "Apollo 13".
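
Indexing has a few more forms that are worth knowing. As a brief sketch using the grocery list from earlier, negative indexes count from the end of the list, and a slice returns several items at once.

```python
groceries = ['milk', 'bread', 'bananas', 'chocolate', 'more chocolate']

first_item = groceries[0]      # index 0 is the first item
last_item = groceries[-1]      # negative indexes count from the end
first_two = groceries[0:2]     # a slice stops just before index 2
how_many = len(groceries)      # number of items in the list

print(first_item, last_item, first_two, how_many)
```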


#### Lists are mutable.
They can be changed or expanded.


In [3]:
favorite_authors = ['Shakespeare', 'Dostoevsky', 'Twain']
favorite_authors[0] = 'Austen'
print(favorite_authors)

favorite_authors.append('Moliere')
print(favorite_authors)

['Austen', 'Dostoevsky', 'Twain']
['Austen', 'Dostoevsky', 'Twain', 'Moliere']
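
Besides `append`, lists have other methods that change them in place. The sketch below shows three of them (`insert`, `remove`, and `sort`); the author 'Woolf' is added only for illustration.

```python
favorite_authors = ['Austen', 'Dostoevsky', 'Twain', 'Moliere']

favorite_authors.insert(1, 'Woolf')   # put 'Woolf' at index 1
favorite_authors.remove('Twain')      # delete the first 'Twain' found
favorite_authors.sort()               # sort alphabetically, in place
print(favorite_authors)
```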


#### Searching a list for a specific item
We can find an item in a list by identifying its index (i.e., its position in the list).  Remember, Python starts counting with 0.

In [4]:
print(favorite_authors.index('Twain'))

2


But if the item is not in the list (recall that we replaced 'Shakespeare' with 'Austen'), we will get an error message:

In [5]:
print(favorite_authors.index('Shakespeare'))

ValueError: 'Shakespeare' is not in list

Before searching for an item in a list, we can check to see if it exists by counting the number of occurrences:

In [6]:
print(favorite_authors.count('Shakespeare'))

0
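
Another common way to check is the `in` operator, which evaluates to `True` or `False`. The sketch below wraps that check in a small helper; the function name `find_position` is hypothetical and not part of Python itself.

```python
favorite_authors = ['Austen', 'Dostoevsky', 'Twain', 'Moliere']

def find_position(items, target):
    """Return the index of target in items, or None if it is absent."""
    if target in items:
        return items.index(target)
    return None

print(find_position(favorite_authors, 'Twain'))
print(find_position(favorite_authors, 'Shakespeare'))
```

Checking first avoids the `ValueError` that `index()` raises for a missing item.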


## 7.3 Dictionaries and Dictionary Methods

A `dictionary` in Python is another way to organize data within your code.

To understand how a dictionary works, think about a physical dictionary. Each entry has two parts:  the word that you want to look up and the definition of that word.  

But in programming, we want our dictionaries to be more general.  Instead of having a word and its definition, we want a `key` that we can search on and its `value` (any information that we want to associate with the key).

In programming, these are called `key:value` pairs.


The syntax for creating a dictionary is as follows:

```
variable = {key1:value1, key2:value2, key3:value3, key4:value4}
```
Notice the use of the curly brackets and the colons. The curly brackets let Python know that the variable will hold a dictionary.  The colons indicate the association between the keys and the values.  The key must be on the left of the colon and the value on the right.

We can include as many key:value pairs as we want.  

To access the information in a dictionary, we can use a key as an "index".  For example:


In [7]:
state_capitals = {"Virginia":"Richmond", 
                  "Florida":"Tallahassee", 
                  "Maryland":"Annapolis",
                  "California": "Sacramento"}

print("The capital of Florida is {:s}.".format(state_capitals["Florida"]))

The capital of Florida is Tallahassee.


We can start with an empty dictionary and add information as needed.


In [8]:
title_lengths = {}  #Empty dictionary
print(title_lengths)

title = "Harry Potter and the Chamber of Secrets"
title_lengths[title] = len(title)
print(title_lengths)
        
title = "The Sun Also Rises"
title_lengths[title] = len(title)
print(title_lengths)
         



{}
{'Harry Potter and the Chamber of Secrets': 39}
{'Harry Potter and the Chamber of Secrets': 39, 'The Sun Also Rises': 18}
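
When there are many entries to add, the same assignment can be placed inside a loop. This is a minimal sketch, assuming the titles are already stored in a list; the list `titles` and the extra title "Emma" are made up for this example.

```python
# Hypothetical list of titles to measure.
titles = ["Harry Potter and the Chamber of Secrets",
          "The Sun Also Rises",
          "Emma"]

title_lengths = {}                      # start with an empty dictionary
for title in titles:
    title_lengths[title] = len(title)   # key is the title, value is its length

print(title_lengths)
```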



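To see all of the keys in our dictionary, we can use the `keys()` method.  It returns a view of the keys (a `dict_keys` object), which we can convert to a list with `list()` if we need to index into it.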

```
print(title_lengths.keys())
```
Similarly, we can use the `values()` method to see all of the values.

```
print(title_lengths.values())
```

In [9]:
print(title_lengths.keys())
print(title_lengths.values())
some_key = list(title_lengths.keys())[1]
print(some_key)

dict_keys(['Harry Potter and the Chamber of Secrets', 'The Sun Also Rises'])
dict_values([39, 18])
The Sun Also Rises
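
Often we want the keys and the values together. One way to do that, sketched below, is the `items()` method, which yields each key with its value so that a `for` loop can unpack them. The list `lines` is just a convenience for this example.

```python
title_lengths = {'Harry Potter and the Chamber of Secrets': 39,
                 'The Sun Also Rises': 18}

lines = []
for title, length in title_lengths.items():
    # Each pass through the loop receives one key and its value.
    lines.append(f'"{title}" has {length} characters.')

for line in lines:
    print(line)
```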


<font color=blue>---------------------------------------------------------------</font>

### Activity:  Creating a Simple Dictionary

Write a small program that will create a dictionary called `birthdays` that holds the birthdates of your closest family or friends.  Then, have the program prompt the user to input a name and display the birthday associated with that name.

    
<font color=blue>---------------------------------------------------------------</font>

In [12]:
birthdays = { "Ryan":"19 Aug 1971", "Steve":"10 Jan 1968", "Kristi":"3 Sep 1975"}

name = input("Please enter a name: ")
print("The birthdate of {:s} is {:s}.".format(name, birthdays[name]))

Please enter a name: Ryan
The birthdate of Ryan is 19 Aug 1971.
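
Note that looking up a name that is not in the dictionary with `birthdays[name]` raises a `KeyError`. A possible way to handle that, sketched below, is the dictionary `get()` method, which returns `None` when the key is missing. The function name `describe_birthday` and the name "Alex" are hypothetical, and the names are passed in directly rather than read with `input()`.

```python
birthdays = {"Ryan": "19 Aug 1971", "Steve": "10 Jan 1968", "Kristi": "3 Sep 1975"}

def describe_birthday(name):
    """Return a sentence about name's birthday, even if name is unknown."""
    birthdate = birthdays.get(name)   # None when the key is missing
    if birthdate is None:
        return f"Sorry, I do not know the birthdate of {name}."
    return f"The birthdate of {name} is {birthdate}."

print(describe_birthday("Ryan"))
print(describe_birthday("Alex"))
```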


## 7.4 Sets

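A set is an unordered collection of unique items.  A set itself is **mutable** (items can be added or removed), but every item stored in a set must be an immutable (hashable) value such as a string, a number, or a tuple.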

In [13]:
Johns_hobbies = ["guitar", "welding", "cooking"]
Marys_hobbies = ["cooking", "car repair", "video games"]
Ians_hobbies = ["video games", "math", "welding"]

hobby_list = Johns_hobbies + Marys_hobbies + Ians_hobbies
print("hobby list:  ", hobby_list)
hobby_set = set(hobby_list)
print("hobby set:  ", hobby_set) 

hobby list:   ['guitar', 'welding', 'cooking', 'cooking', 'car repair', 'video games', 'video games', 'math', 'welding']
hobby set:   {'video games', 'guitar', 'cooking', 'math', 'welding', 'car repair'}


Because sets are _unordered_, you cannot index into them.

In [14]:
hobby_set[1]

TypeError: 'set' object is not subscriptable
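
What sets are good at is comparing collections. The sketch below uses set operators to find hobbies that two people share. The sets are written directly with curly brackets, and the variable names are chosen only for this example.

```python
johns_set = {"guitar", "welding", "cooking"}
marys_set = {"cooking", "car repair", "video games"}

shared = johns_set & marys_set         # intersection: in both sets
combined = johns_set | marys_set       # union: in either set
only_john = johns_set - marys_set      # difference: in John's set only
has_guitar = "guitar" in johns_set     # membership test

print(shared)
print(sorted(combined))                # sorted() returns an ordered list
print(only_john, has_guitar)
```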

## 7.5 Tuples and Tuple Methods

A tuple is a collection of ordered items and is **immutable**.   The notation for a tuple is the items, separated by commas, and enclosed by parentheses:
```
my_tuple = ('red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet')
```

The notation for tuples can be a little strange.  For example, a tuple with a single item still needs a comma:
```
no_color = ('black', )
```
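
To see why the comma matters, the short sketch below compares the two spellings with `type()`. Without the comma, the parentheses are treated as ordinary grouping and the result is just a string. The variable names are made up for this example.

```python
with_comma = ('black', )
without_comma = ('black')

print(type(with_comma))      # a tuple containing one item
print(type(without_comma))   # just a string
```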

Similar to lists, you can use `index()` and `count()` to find items in a tuple:

In [15]:
my_tuple = ('red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet')
print("Number of times 'black' is in the tuple:  ", my_tuple.count('black'))
print("Position of 'indigo' in the tuple:  ", my_tuple.index('indigo'))


Number of times 'black' is in the tuple:   0
Position of 'indigo' in the tuple:   5
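
The sketch below illustrates two further points about tuples: their items can be unpacked into separate variables, and trying to change an item raises a `TypeError`, just as it did for strings. The `try`/`except` block is used here only so that the example runs to the end; the color 'crimson' and the variable names are made up.

```python
my_tuple = ('red', 'orange', 'yellow')

# Unpacking: one variable per item, in order.
first, second, third = my_tuple
print(first, second, third)

# Tuples are immutable, so item assignment fails.
try:
    my_tuple[0] = 'crimson'
    error_message = None
except TypeError as error:
    error_message = str(error)

print(error_message)
```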
