# MSDS 631 - Lecture 2 (January 30, 2019)
### Modifying Data Structures, Built-in Functions/Methods

### More about Data Types

#### Strings
Several characters have special meanings in Python (and in many other languages), so they cannot always be typed directly inside quotes. The simplest example is printing 'He's pretty happy.' The apostrophe in the contraction "He's" is a single quote, so Python reads it as the end of the string. One alternative I gave you is to wrap the string in double quotes. When that is not an option, put a backslash (\) in front of the character to "escape" it: Python then ignores the character's special meaning and treats it as an ordinary string character.

In [1]:
#Case above
print('He's pretty happy')

SyntaxError: invalid syntax (<ipython-input-1-f2e4ae73f749>, line 2)

In [2]:
#Case above
print('He\'s pretty happy')

He's pretty happy
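
The quote is not the only character that can be escaped. As a small sketch (the variable names here are made up for illustration), these are a few other escape sequences you are likely to run into:

```python
# Common escape sequences inside an ordinary string literal.
apostrophe = 'He\'s pretty happy'      # \' is a literal single quote
two_lines = 'first line\nsecond line'  # \n is a newline
tabbed = 'name\tvalue'                 # \t is a tab
backslash = 'C:\\temp'                 # \\ is one literal backslash

print(apostrophe)
print(two_lines)
print(tabbed)
print(backslash)

# Double quotes around the string avoid the need to escape the apostrophe.
same_text = "He's pretty happy"
print(apostrophe == same_text)  # True
```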


### Postscript on Data Structures

Remember that every data structure has a length, measured as the number of directly accessible values in that container. A container nested inside another container counts as a single value; the outer container does not inherit the lengths of the objects it holds. In the example below, the list has four (4) values: three integers and one list. The internal list counts as one value, not three.

Remember that the choice to assign values to intermediate variables is a personal one, based on what you think makes the code easiest to understand for you and for other people. You want to make errors easy to find when you are hunting "bugs", and you want to reduce the chance of introducing bugs through code that is excessively complex and difficult to understand (think about the grocery bill example from Homework #1).

In [3]:
list1 = [1,2,3,[6,7,8]]
print(len(list1))

4


In [4]:
# Option 1: assign the internal list to its own variable, then index into that variable
internal_list = list1[3]
second_value_of_internal = internal_list[1] #Option 1
print(second_value_of_internal)

7


In [5]:
second_value_of_internal = list1[3][1] #Option 2
print(second_value_of_internal)

7
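
As a quick check of the length rule above, the same double-indexing idea lets you measure the internal list on its own. This is only a sketch that reuses the values of `list1` from the cells above:

```python
list1 = [1, 2, 3, [6, 7, 8]]

print(len(list1))      # 4  (the internal list counts as one value)
print(len(list1[3]))   # 3  (the internal list has its own length)
print(list1[3][-1])    # 8  (negative indexes work on the internal list too)
```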


## Modifying data structures
Last week we defined what the different Python data structures are. However, these "containers" do not have to be static. You can add and remove items as is required. Each type has a different way of achieving the desired effects. Below we'll dive into different things we can do to the structures to modify them.

### Modifying Lists
##### Adding data

In [6]:
# When we need to add to a list, we append to it.
fruit_list = ['apples', 'oranges', 'limes']
fruit_list.append('kumquats') # Appending is an "in-place" method, meaning that you don't have to redefine it
print(fruit_list)

['apples', 'oranges', 'limes', 'kumquats']


In [7]:
# If you try to store the output of the appended list into a new variable, you'll not get what you thought you'd get
bigger_list = fruit_list.append('bananas')
print(bigger_list)

None


In [8]:
# Even though nothing was assigned to bigger_list, fruit_list was still modified
print(fruit_list)

['apples', 'oranges', 'limes', 'kumquats', 'bananas']


In [13]:
# If you keep running the same cell, you'll end up with more items than you anticipated
# (the output below shows this cell after five runs)
fruit_list.append('grapes')
# fruit_list.append('strawberries')
print(fruit_list)

['apples', 'oranges', 'limes', 'kumquats', 'bananas', 'grapes', 'grapes', 'grapes', 'grapes', 'grapes']


In [14]:
# You can also combine the elements of lists
meat_list = ['turkey', 'bacon', 'bacon', 'bacon', 'bacon']
grocery_list = fruit_list + meat_list # Creates a new list; fruit_list.extend(meat_list) would instead add the items to fruit_list in-place
print(grocery_list)

['apples', 'oranges', 'limes', 'kumquats', 'bananas', 'grapes', 'grapes', 'grapes', 'grapes', 'grapes', 'turkey', 'bacon', 'bacon', 'bacon', 'bacon']
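
The difference between `+` and `extend` is easy to miss, so here is a minimal sketch using two short, made-up lists rather than the lists from the cells above:

```python
fruits = ['apples', 'oranges']
meats = ['turkey', 'bacon']

combined = fruits + meats      # builds a new list; fruits is unchanged
print(fruits)                  # ['apples', 'oranges']
print(combined)                # ['apples', 'oranges', 'turkey', 'bacon']

result = fruits.extend(meats)  # modifies fruits in place and returns None
print(result)                  # None
print(fruits)                  # ['apples', 'oranges', 'turkey', 'bacon']
```

Like `append`, `extend` returns `None`, so assigning its output to a variable has the same pitfall shown earlier.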


##### Removing items from lists

In [20]:
# Sometimes you want to retrieve an item from a list AND remove it
one_removed_item = grocery_list.pop(6)
print(one_removed_item)
print(grocery_list)



grapes
['apples', 'oranges', 'limes', 'kumquats', 'bananas', 'grapes', 'turkey', 'bacon', 'bacon', 'bacon']


In [116]:
one_specific_item = grocery_list.pop() # With no argument, pop() removes and returns the LAST item
print(one_specific_item)
print(grocery_list)

turkey
['apples', 'oranges']


In [26]:
nice_try = grocery_list.pop(2)
print(grocery_list)

['apples', 'oranges', 'turkey', 'bacon', 'bacon', 'bacon']
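
The outputs recorded above come from cells that were re-run several times and out of order, so they are hard to follow step by step. As a sketch on a fresh, made-up list (`basket` is not part of the lecture code), this is how `pop` behaves when each line runs exactly once:

```python
basket = ['apples', 'oranges', 'limes', 'kumquats']

last_item = basket.pop()     # no argument: removes and returns the last item
print(last_item)             # kumquats
print(basket)                # ['apples', 'oranges', 'limes']

first_item = basket.pop(0)   # with an index: removes and returns that item
print(first_item)            # apples
print(basket)                # ['oranges', 'limes']

# Popping an index that does not exist raises an IndexError.
try:
    basket.pop(10)
except IndexError as error:
    print('IndexError:', error)
```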


##### Changing items in a list

In [27]:
print(fruit_list)
fruit_list[0] = 'honeycrisp apples'
print(fruit_list)

['apples', 'oranges', 'limes', 'kumquats', 'bananas', 'grapes', 'grapes', 'grapes', 'grapes', 'grapes']
['honeycrisp apples', 'oranges', 'limes', 'kumquats', 'bananas', 'grapes', 'grapes', 'grapes', 'grapes', 'grapes']


In [28]:
# BE CAREFUL CHANGING ITEMS IN A LIST!!!
important_things = ['do taxes', 'pay rent', 'finish homework', 'call mom']
tasks_today = important_things
tasks_today[0] = 'eat ice cream'
print(important_things)

['eat ice cream', 'pay rent', 'finish homework', 'call mom']


Lists are what are called "mutable" objects in Python. In the simplest terms, this means that they can be modified after they are created. It also means that when you assign an existing list to a second variable name, you aren't creating a new object; the new name is just another "pointer" to the same location in the computer's memory. If you change the data through the second name, you also change what the original name sees, because both names refer to the same list. We will encounter this challenge frequently, so be aware of when you are dealing with mutable objects and of the ramifications of modifying them.

In [29]:
# Assigning b as a does not actually create a new place in memory
a = [1,2,3]
b = a
b.append(4)
print(a)

[1, 2, 3, 4]


In [32]:
# It is not the values of a that matter, but rather the way in which you create the assignment
a = [1,2,3]
b = a + [] # Concatenation always builds a new list, so b gets its own place in memory
b.append(4)
print(a)


[1, 2, 3]
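
Adding an empty list works, but there are more explicit ways to ask for a copy. The following is a minimal sketch (the names `alias`, `copy_1`, `copy_2` and `copy_3` are made up for illustration) that uses `is` to check whether two names refer to the same object:

```python
a = [1, 2, 3]
alias = a            # second name for the same list object
copy_1 = a.copy()    # new list with the same values
copy_2 = list(a)     # another way to build a new list
copy_3 = a[:]        # a full slice also creates a new list

print(alias is a)    # True
print(copy_1 is a)   # False

alias.append(4)      # changes the one list that both a and alias refer to
print(a)             # [1, 2, 3, 4]
print(copy_1)        # [1, 2, 3]
```

All three copies are "shallow": if the list contains other lists, those internal lists are still shared between the original and the copy.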



There are other methods to modify lists such as <strong>`insert`</strong> and <strong>`remove`</strong>. These are not typically used in data science, so I will not cover them.

### Modifying Sets
##### Adding data

In [31]:
# When we need to add a single item to a set, we use the add method.
fruit_set = {'apples', 'oranges', 'limes'}
fruit_set.add('kumquats') # add is an "in-place" method, meaning that you don't have to redefine the set
print(fruit_set) # The order of the set is not guaranteed

{'apples', 'limes', 'oranges', 'kumquats'}


In [40]:
# If you try to store the output of add in a new variable, you'll not get what you thought you'd get
bigger_set = fruit_set.add('bananas')
print(bigger_set)

# ('lemons') is just the string 'lemons', not a set, so the union below raises a TypeError
bigger_set = fruit_set | ('lemons')


None


TypeError: unsupported operand type(s) for |: 'set' and 'str'
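
The `|` operator only works between two sets. As a sketch of one way to get the intended result, wrap the single item in braces, or use `update` for the in-place version:

```python
fruit_set = {'apples', 'oranges', 'limes'}

# Wrapping the string in braces makes it a one-item set.
bigger_set = fruit_set | {'lemons'}   # new set; fruit_set is unchanged
print('lemons' in bigger_set)         # True
print('lemons' in fruit_set)          # False

# update() is the in-place counterpart and, like add(), returns None.
fruit_set.update({'lemons', 'bananas'})
print(len(fruit_set))                 # 5
```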

In [41]:
# Even though nothing was assigned to bigger_set, fruit_set was still modified
print(fruit_set)

{'limes', 'kumquats', 'bananas', 'apples', 'oranges'}


In [42]:
# You can also combine the elements of sets
meat_set = {'turkey', 'bacon', 'bacon', 'bacon', 'bacon'}
grocery_set = fruit_set | meat_set # Note how the 'bacon' was reduced to a single instance
print(grocery_set)

{'limes', 'kumquats', 'bananas', 'turkey', 'apples', 'bacon', 'oranges'}


##### Removing items from a set

In [43]:
# Sometimes you want to retrieve an item from a set AND remove it
one_removed_item = grocery_set.pop() # You won't know which item you'll get back and cannot specify which one either
print(one_removed_item)
print(grocery_set)

limes
{'kumquats', 'bananas', 'turkey', 'apples', 'bacon', 'oranges'}


In [49]:
# Unlike lists, you CAN "subtract" out items from a set
bacon_set = {'bacon'}
reduced_grocery_set = grocery_set - bacon_set # Items in bacon_set do not HAVE to be in grocery_set
print(reduced_grocery_set)



{'kumquats', 'bananas', 'turkey', 'apples', 'oranges'}
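
Subtraction builds a new set. If you would rather remove one specific item in place, sets also have `remove` and `discard` methods. This sketch uses a small, made-up set to show how the two differ:

```python
grocery_set = {'apples', 'bacon', 'turkey'}

grocery_set.remove('bacon')      # in place; the item must be present
grocery_set.discard('bacon')     # in place; silently does nothing if absent
print(grocery_set == {'apples', 'turkey'})  # True

try:
    grocery_set.remove('bacon')  # already gone, so this raises a KeyError
except KeyError:
    print('bacon was not in the set')
```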


##### Changing items in a set

In [52]:
# You cannot modify specific values of a set (since you cannot index a set)
# You still have challenges with sets being mutable
a = {1,2,3}
b = a
b.add(4)
print(a)
print(b)


{1, 2, 3, 4}
{1, 2, 3, 4}

### Modifying Dictionaries
##### Adding data

In [53]:
# We can add to either the keys OR the values
grocery_dict = {'fruit': ['honeycrisp apples', 'oranges', 'limes', 'lemons', 'kumquats', 'bananas', 'grapes'],
                'meat': ['turkey', 'bacon', 'bacon', 'bacon', 'bacon']}
grocery_dict['drinks'] = ['Coke', 'coffee', 'orange juice', 'milk'] # When adding new key, must also add some value
print(grocery_dict)

{'fruit': ['honeycrisp apples', 'oranges', 'limes', 'lemons', 'kumquats', 'bananas', 'grapes'], 'meat': ['turkey', 'bacon', 'bacon', 'bacon', 'bacon'], 'drinks': ['Coke', 'coffee', 'orange juice', 'milk']}


In [54]:
# We can also add a single item to one of the values
grocery_dict['drinks'].append('apple juice')
print(grocery_dict)

{'fruit': ['honeycrisp apples', 'oranges', 'limes', 'lemons', 'kumquats', 'bananas', 'grapes'], 'meat': ['turkey', 'bacon', 'bacon', 'bacon', 'bacon'], 'drinks': ['Coke', 'coffee', 'orange juice', 'milk', 'apple juice']}


In [65]:
# Remember that you can nest many layers of different types inside your dictionary
grocery_dict['budget'] = [{'fruit': 50, 'meat': 25, 'drinks': {'alcoholic': 10, 'non-alcoholic': 20}}, 10]
print(grocery_dict['budget'][0]['fruit'])
print(grocery_dict['budget'][0]['meat'])
print(grocery_dict['budget'][1])

50
25
10


In [66]:
print(grocery_dict['fruit'])
print(grocery_dict['budget'])

['honeycrisp apples', 'oranges', 'limes', 'lemons', 'kumquats', 'bananas', 'grapes']
[{'fruit': 50, 'meat': 25, 'drinks': {'alcoholic': 10, 'non-alcoholic': 20}}, 10]


In [67]:
# Remember, the values of the dictionary do not need to be of the same type
grocery_dict['budget'] = 100
grocery_dict

{'fruit': ['honeycrisp apples',
  'oranges',
  'limes',
  'lemons',
  'kumquats',
  'bananas',
  'grapes'],
 'meat': ['turkey', 'bacon', 'bacon', 'bacon', 'bacon'],
 'drinks': ['Coke', 'coffee', 'orange juice', 'milk', 'apple juice'],
 'budget': 100}

##### Removing data from a dictionary

In [68]:
# You can delete a key
del grocery_dict['meat'] # del is a statement, not a function, so no parentheses are needed
print(grocery_dict)

{'fruit': ['honeycrisp apples', 'oranges', 'limes', 'lemons', 'kumquats', 'bananas', 'grapes'], 'drinks': ['Coke', 'coffee', 'orange juice', 'milk', 'apple juice'], 'budget': 100}


In [69]:
# You cannot just delete a whole value, but you CAN remove elements from data structures within a value
removed_from_dict_value = grocery_dict['fruit'].pop()
print(removed_from_dict_value)
print(grocery_dict)

grapes
{'fruit': ['honeycrisp apples', 'oranges', 'limes', 'lemons', 'kumquats', 'bananas'], 'drinks': ['Coke', 'coffee', 'orange juice', 'milk', 'apple juice'], 'budget': 100}
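
Dictionaries also have their own `pop` method, which removes a key and hands back its value in one step. This is a minimal sketch on a small, made-up dictionary rather than the full `grocery_dict` above:

```python
grocery_dict = {'fruit': ['apples', 'limes'], 'budget': 100}

budget = grocery_dict.pop('budget')   # removes the key and returns its value
print(budget)                         # 100
print(grocery_dict)                   # {'fruit': ['apples', 'limes']}

# A default value avoids a KeyError when the key is missing.
missing = grocery_dict.pop('meat', None)
print(missing)                        # None
```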


##### Changing values in a dictionary

In [70]:
grocery_dict['budget']

100

In [71]:
# You can also modify any value in any valid way
print(grocery_dict['budget'])
grocery_dict['budget'] += 1 # Compound assignment: adds 1 to the current value and stores the result
print(grocery_dict['budget'])

100
101


In [74]:
groceryList = [1,2,3,45]
groceryList[1] += 1
print(groceryList)


[1, 3, 3, 45]


## Built-in Functions and Methods

Python has many capabilities built into the language. We will go into much greater detail about functions next week, but for now, just know that a function "does something" with what you "pass" it. A method is what you use when you ask an object to do something to itself, written with a dot after the object (for example, `fruit_list.append('kumquats')`).

In [81]:
# How long is a string?
len('my_list1') # "len" is a function; this is 8, but a notebook only displays the last expression in a cell
maisiesList = [1,2,3]
len(maisiesList)


3

In [82]:
# How long is a list?
my_list1 = [12, 39, 27, 57, 23, 95, 12, 63, 96, 54, 10, 38, 91, 36, 7]
len(my_list1) # "len" is a function

15

In [83]:
# Cast the range object as a list
list(range(5)) # "list" is a function (can also do for set or tuple)

[0, 1, 2, 3, 4]

In [101]:
# Find the minimum and maximum values of a list (note that I am nesting one function call inside another)
list_of_numbers = [15, 39, 27, 57, 26, 95, 12, 63, 96, 54, 10, 38, 91, 36, 7]
print(min(list_of_numbers)) # print and min are functions (can also do for set or tuple)
print(max(list_of_numbers)) # print and max are functions (can also do for set or tuple)
# Want to avoid doing too much within the arguments of a function; try creating new variables whenever possible
min_of_nums = min(list_of_numbers)
max_of_nums = max(list_of_numbers)
print(min_of_nums)
print(max_of_nums)

7
96
7
96


In [102]:
# Round a float value
pi_0 = round(3.1415926535) # Round float value to 0 decimals
pi_1 = round(3.1415926535, 1) # Round float value to 1 decimal
pi_9 = round(3.1415926535, 9) # Round float value to 9 decimals
print(pi_0)
print(pi_1)
print(pi_9)

3
3.1
3.141592654


In [103]:
list1 = [1.222, 2.222, 3.222]
roundedList1 = [round(elem) for elem in list1]
print(roundedList1)

[1, 2, 3]
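
One detail of `round` that often surprises people: in Python 3, a value exactly halfway between two integers is rounded to the nearest even integer, not always up. A short sketch:

```python
# Ties go to the nearest even integer.
ties = [round(0.5), round(1.5), round(2.5), round(3.5)]
print(ties)                        # [0, 2, 2, 4]

# With no second argument the result is an int; with one, it is a float.
whole = round(3.7)
two_places = round(3.14159, 2)
print(whole, type(whole))          # 4 <class 'int'>
print(two_places)                  # 3.14
```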


In [104]:
# Sum of list of numbers
sum_nums = sum(list_of_numbers) # "sum" is a function (can also do for set or tuple)
print(sum_nums)

666


In [105]:
# Sort values in a list
list_of_numbers.sort() # Note that this is an in-place method; do NOT need to re-assign output
print(list_of_numbers)

[7, 10, 12, 15, 26, 27, 36, 38, 39, 54, 57, 63, 91, 95, 96]


In [107]:
# Reverses the order of a list (NOT the same as sorting things in reverse order)
list_of_numbers = [12, 39, 27, 57, 23, 95, 12, 63, 96, 54, 10, 38, 91, 36, 7]
print(list_of_numbers)
list_of_numbers.reverse() # Note that this is an in-place method; do NOT need to re-assign output
print(list_of_numbers)
list_of_numbers.sort()


[12, 39, 27, 57, 23, 95, 12, 63, 96, 54, 10, 38, 91, 36, 7]
[7, 36, 91, 38, 10, 54, 96, 63, 12, 95, 23, 57, 27, 39, 12]
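
To make the difference between reversing and sorting in reverse concrete, and to contrast the `sort` method with the built-in `sorted` function, here is a small sketch on a shorter, made-up list:

```python
numbers = [12, 39, 27, 57, 23]

ascending = sorted(numbers)                 # function: returns a new list
descending = sorted(numbers, reverse=True)  # sorted from largest to smallest
print(numbers)      # [12, 39, 27, 57, 23]  (unchanged)
print(ascending)    # [12, 23, 27, 39, 57]
print(descending)   # [57, 39, 27, 23, 12]

numbers.reverse()   # method: flips the current order in place, no sorting
print(numbers)      # [23, 57, 27, 39, 12]
```

Because `sorted` returns a new list, you DO need to assign its output, unlike the in-place `sort` method.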


In [None]:
# Strings can cause unintuitive results
max(['1', '2', '10']) # Strings are compared character by character: the '1' at the start of '10' comes before '2', so '2' has the "larger" value

In [None]:
max(['1', '2', '10', 'a']) # Letters have larger values than digit characters
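
If the strings are meant to be numbers, one way to get the numeric answer is to convert them before (or while) comparing. This is a sketch under that assumption; it only works when every string can be converted with `int`:

```python
digit_strings = ['1', '2', '10']

print(max(digit_strings))            # 2   (compared character by character)
print(max(digit_strings, key=int))   # 10  (compared by numeric value)

as_numbers = [int(text) for text in digit_strings]
print(max(as_numbers))               # 10

# ord() shows the numeric code that string comparison uses for each character.
print(ord('9'), ord('A'), ord('a'))  # 57 65 97
```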