# Creating and working with collections of data

### Tuples

Tuples are a type of **collection** declared using parentheses. A collection is a group of variables or data. There are multiple ways to collect variables together in Python, and we will go through some of the basic ones here. Tuple values can be accessed via an integer index.

In [None]:
a_tuple_of_strings = ('Hi!', 'I work', 'at', 'CRC!')
print(a_tuple_of_strings)
print(a_tuple_of_strings[0]) # remember indexes start at zero in python
print(type(a_tuple_of_strings))

Let's try to change which company we work at

In [None]:
a_tuple_of_strings[3] = 'Chevron'

What happened? Maybe we can't escape that easy... (obligatory **MUHAHAHAHA!** here...)
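
The cell above stops with a `TypeError` because tuples do not support item assignment. As a small sketch (the `try`/`except` wrapper and the name `company_tuple` are illustrative additions, not part of the original notebook), the error can be caught so the rest of a script keeps running:

```python
# Illustrative copy of the tuple from above; the name is hypothetical.
company_tuple = ('Hi!', 'I work', 'at', 'CRC!')

try:
    company_tuple[3] = 'Chevron'  # tuples do not allow item assignment
except TypeError as error:
    error_message = str(error)
    print('Could not change the tuple:', error_message)

print(company_tuple)  # the tuple is unchanged
```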

Tuples can store more than just strings. Try creating a tuple that stores both numbers and strings below, then print out something from the tuple.

### Lists

Lists are another type of collection that are declared with square brackets []. Let's make one.

In [None]:
my_favorite_things = ['computers', 'cute babies', 'ice cream', 'the weather outside right now', 'kittens', 'puppies', 7]

Someone asks you to list some of your favorite things, so you search the list above in your brain.

In [None]:
# They seem like an animal person so you say:
print("I really like {}. Aren't they cute!".format(my_favorite_things[4]))

It turns out they are allergic to cats, they ask you what else you like.

In [None]:
# Let's try listing off a few things this time
print('I really like {}'.format(my_favorite_things[:3]))

Notice the ':' in the index for our list of favorite things? In python this is called **slicing** a collection.

In [None]:
# Let's try slicing our list of favorite things in other ways; the example above was a slice up to an index
print(my_favorite_things[1:3]) # we can slice various ranges
print(my_favorite_things[3:5])
print(my_favorite_things[6:]) # this is a slice from an index

You can also slice using negative numbers if you want.

In [None]:
print(my_favorite_things[:-3]) # we sliced off the last 3 entries
print(my_favorite_things[-3:]) # we sliced off all but the last 3 entries
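
As a rough summary of the slices above, `collection[start:stop]` keeps the items from `start` up to, but not including, `stop`, and leaving either side blank means "from the beginning" or "to the end". The sketch below uses a hypothetical list named `letters` so that it does not disturb `my_favorite_things`:

```python
# Hypothetical example list, used only to illustrate slicing
letters = ['a', 'b', 'c', 'd', 'e']

first_two = letters[:2]     # indexes 0 and 1; index 2 is excluded
middle = letters[1:4]       # indexes 1, 2 and 3
from_three = letters[3:]    # index 3 through the end
last_two = letters[-2:]     # negative indexes count from the end

print(first_two, middle, from_three, last_two)

# A slice builds a new list, so the original is left untouched
print(letters)
```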

Try to slice the list so that it includes only 'cute babies' and 'ice cream', using at least two different ways.

Ok so that is slicing, now your baby starts crying and continues to do so for an hour... You will always love your baby but it isn't among your favorite things at the moment... Let's update our list.

In [None]:
# the simplest way to remove cute babies from the list is the .remove() method
print(my_favorite_things)
my_favorite_things.remove('cute babies')
print(my_favorite_things)

The baby stops crying and is restored to his/her rightful place in your mental list of favorite things. 

In [None]:
# we can add things to a list using .append()
print(my_favorite_things)
my_favorite_things.append('cute babies')
print(my_favorite_things)

The baby once again wails their little heart out for all it's worth. This time a new favorite thing needs to replace cute babies: peace and quiet...

In [None]:
print(my_favorite_things)
my_favorite_things[6] = 'peace and quiet'
print(my_favorite_things)

Notice how this is different from tuples: when we tried to replace CRC with Chevron, we couldn't. In programmer-speak this is a difference in **mutability**. Tuples are immutable and can't be changed once they are defined. Lists are mutable and can be changed dynamically at any time.
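
One way to picture the difference, as a sketch with hypothetical names (`readings` and `point` are not part of the notebook): changing a list edits the existing object, while "changing" a tuple really means building a new tuple from pieces of the old one.

```python
# Hypothetical example data, used only to illustrate mutability
readings = [1, 2, 3]
readings[0] = 99            # lists can be edited in place
print(readings)

point = (1, 2, 3)
# Tuples cannot be edited, but a new tuple can be built from slices of the old one
new_point = (99,) + point[1:]
print(point)      # the original tuple is unchanged
print(new_point)
```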

We can also remove things from our list of favorite things using the 'del' statement. The baby once again stops crying and can be added to the list.

In [None]:
print(my_favorite_things)
del my_favorite_things[5]
my_favorite_things.append('cute babies')
print(my_favorite_things)

After all that screaming, peace and quiet stays on the list, but babies can have your favorite number's spot.

Change the list of your favorite things to reflect **your** favorite things using what you learned above.

### Dictionaries

Dictionaries are an important type of collection where data is retrieved from and added to the collection via 'keys' instead of indexes. In Python these collections are known as dictionaries, but in some programming languages they are known as 'hash maps' or just maps. Dictionaries work by converting the key to a 'hash', a number that acts as a shortcut to the location in memory where a piece of data is stored. When you are looking up a piece of data in a large collection, this saves a tremendous amount of time, because you can retrieve the data near instantly instead of having to search through the collection piece by piece until you locate it.

In [None]:
# You can define dictionaries in code like so with: {key : value,}
CRC_Roles = {'CEO' : 'Todd Stevens',
             'EVP Development': 'Shawn Kerns',
             'EVP Exploration Type_Stuff': 'Darren Williams',
             'Lowly Data Scientist' : 'Nathan Jones',}
# Values can be retrieved via the key
print(CRC_Roles['CEO'])
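
To make the hashing idea a little more concrete, here is a minimal sketch using Python's built-in `hash()` function. The dictionary `well_depths` and its values are made up for illustration. Keys need to be hashable, which is why an immutable tuple can be used as a key but a mutable list cannot.

```python
# Hypothetical example data, not from CRC's tables
well_depths = {'Well A': 5200, ('Field 1', 'Well B'): 6100}

# hash() turns a key into an integer that Python uses to find the value quickly
print(type(hash('Well A')))

# Checking for a key uses the hash, so it does not need to scan every entry
print('Well A' in well_depths)
print(well_depths[('Field 1', 'Well B')])

# A list is mutable, so it cannot be hashed or used as a key
try:
    well_depths[['Field 1', 'Well C']] = 7000
except TypeError as error:
    list_key_error = str(error)
    print('Lists cannot be keys:', list_key_error)
```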

You can assign new keys

In [None]:
CRC_Roles['Emperor-King of BDA'] = 'Mike Moustakis'
print(CRC_Roles)

Each key can only be assigned to one piece of data.

In [None]:
print(CRC_Roles['Lowly Data Scientist'])
CRC_Roles['Lowly Data Scientist'] = 'Eric Robinson'
print(CRC_Roles['Lowly Data Scientist'])

**BUT** that piece of data can be a list...

In [None]:
print(CRC_Roles['Lowly Data Scientist'])
CRC_Roles['Lowly Data Scientist'] = ['Nathan Jones' , 'Eric Robinson']
print(CRC_Roles['Lowly Data Scientist'])

You can pull values out of a list in a dictionary just as if it were a list

In [None]:
print(CRC_Roles['Lowly Data Scientist'][0])
print(CRC_Roles['Lowly Data Scientist'][1])

You can extract the keys available in the dictionary with the .keys() method

In [None]:
print(CRC_Roles.keys())


You can make the keys a list using list()

In [None]:
print(list(CRC_Roles.keys()))
print(list(CRC_Roles.keys())[0]) # You can chain the call to the list index with square brackets

You can do the same for the values in the dictionary.

In [None]:
print(list(CRC_Roles.values())) # Note the list within a list below
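
If you want each key together with its value, one option is the `.items()` method, which yields key/value pairs you can loop over. A small sketch, using a cut-down copy of the dictionary (the name `roles_example` is just for illustration):

```python
# Cut-down illustrative copy of the roles dictionary
roles_example = {'CEO': 'Todd Stevens',
                 'EVP Development': 'Shawn Kerns'}

role_lines = []
for role, person in roles_example.items():
    # Each pass through the loop gets one key and its matching value
    line = '{} is the {}'.format(person, role)
    role_lines.append(line)
    print(line)
```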

Add a few members of your team to the dictionary, print out three people's names with their role in the company

### Sets

Python sets are another useful type of data collection. They ensure that each unique data value appears only once in the collection. They also allow for fast set operations (intersections, unions, etc.). They can be created with {value} or from a collection with the set() function.

In [None]:
# Sets can be declared with curly braces
ways_to_name_perfs = {'perf', 'PERF', 'perforated', 'PERFORATED', "perf'd", "PERF'D", 'Perf', "lots 'O holes"}
print(ways_to_name_perfs)
print(type(ways_to_name_perfs))

In [None]:
# This dataset comes as a list
ways_to_name_perfs_dataset2 = ['perf', 'perf', 'PERF', "PERF'D", 'slotted']
# Let's convert it to a set
ways_to_name_perfs_dataset2 = set(ways_to_name_perfs_dataset2)
print(type(ways_to_name_perfs_dataset2))
print('Note that the duplicate perf is gone')
print(ways_to_name_perfs_dataset2)

Let's see what the two sets have in common

In [None]:
same_in_both = ways_to_name_perfs_dataset2.intersection(ways_to_name_perfs)
print(same_in_both)
# You can also do:
print(ways_to_name_perfs_dataset2 & ways_to_name_perfs)

Let's see what both sets together look like

In [None]:
all_combined = ways_to_name_perfs_dataset2.union(ways_to_name_perfs)
print(all_combined)
# You can also do:
print(ways_to_name_perfs_dataset2 | ways_to_name_perfs)

Things that are in ways_to_name_perfs_dataset2 but not in ways_to_name_perfs

In [None]:
difference = ways_to_name_perfs_dataset2.difference(ways_to_name_perfs)
print(difference)
# You can also do:
print(ways_to_name_perfs_dataset2 - ways_to_name_perfs)
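
The same pattern extends to a couple of other set operations. As a hedged sketch with two small made-up sets (`set_a` and `set_b` are illustrative names), `symmetric_difference()` gives the items that appear in exactly one of the sets, and `issubset()` checks whether one set sits entirely inside another:

```python
# Hypothetical example sets, used only for illustration
set_a = {'perf', 'PERF', 'slotted'}
set_b = {'perf', 'PERF', 'Perf'}

# Items that are in exactly one of the two sets
only_in_one = set_a.symmetric_difference(set_b)
print(only_in_one)
# You can also do:
print(set_a ^ set_b)

# Check whether every item of a small set is also in a bigger one
print({'perf'}.issubset(set_a))
```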

Full list of set operations here: https://docs.python.org/2/library/sets.html
        

After making a set, you can convert it back to a list using list()

In [None]:
all_ways_to_name_perfs = list(all_combined)
print(all_ways_to_name_perfs)
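
Sets do not keep their items in any particular order, so the list produced by `list()` may come out in a different order from one run to the next. If a predictable order matters, `sorted()` returns a sorted list directly. A minimal sketch with a hypothetical set named `perf_names`:

```python
# Hypothetical example set, used only for illustration
perf_names = {'perf', 'PERF', 'Perf', 'slotted'}

# sorted() accepts any collection and always returns a list
sorted_names = sorted(perf_names)
print(sorted_names)
print(type(sorted_names))
```

Note that the default sort puts uppercase letters ahead of lowercase ones.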

Try making a set of all the ways you have seen API numbers named in CRC's data tables. Afterwards, copy somebody else's list and make a list of the names you had in common and another of the names that your list had but theirs didn't.