# Other Data Structures

We've seen lists so far, but there are other collections of data in Python which you will find yourself using quite often. Here we'll see the three main ones: sets, dictionaries, and tuples.

## Sets

Sets come from mathematics, where they are used to track unique elements. From the Python perspective, a set is an unordered collection data type that is iterable, mutable and has **no duplicate** elements.

The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set: sets are hash-based, so a lookup does not need to scan every element.

In [1]:
# Same as {"a", "b", "c"} 
Set = set(["a", "b", "c"]) 
  
print("Set: ") 
print(Set) 
  
# Adding element to the set 
Set.add("d") 
print("\nSet after adding: ") 
print(Set) 

# Adding an element that already exists leaves the set unchanged
Set.add("a") 
print("\nSet after adding a duplicate: ") 
print(Set) 

Sets are listy and you can iterate over them, though the iteration order is not guaranteed:

In [3]:
for ele in Set:
    print(ele)

Set comprehensions are similar to list comprehensions. You can use them to build a set from any iterable. The only syntactic difference is that set comprehensions use curly brackets { }; because the result is a set, duplicates are dropped.

In [4]:
# Using Set comprehensions to create an output set which contains only the even numbers that are present in the input list.
  
input_list = [1, 2, 3, 4, 4, 5, 6, 6, 6, 7, 7] 
set_using_comp = {var for var in input_list if var % 2 == 0} 
  
print("Output Set using set comprehensions:", set_using_comp)

Membership checks are super fast:

In [16]:
'c' in Set
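
To get a feel for how much faster, here is a rough timing sketch using the standard `timeit` module. The data (`numbers_list`, `numbers_set`) and the sizes are made up for illustration, and the exact timings will vary from machine to machine.

```python
import timeit

# Hypothetical data: the same 100,000 integers stored as a list and as a set.
numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# Worst case for the list: the value we look for sits at the very end.
target = 99_999

# Both containers give the same answer...
found_in_list = target in numbers_list
found_in_set = target in numbers_set

# ...but the list is scanned element by element, while the set does a hash lookup.
list_seconds = timeit.timeit(lambda: target in numbers_list, number=100)
set_seconds = timeit.timeit(lambda: target in numbers_set, number=100)

print("list:", list_seconds, "seconds for 100 lookups")
print("set: ", set_seconds, "seconds for 100 lookups")
```

On most machines the set lookups should come out orders of magnitude faster, and the gap grows with the size of the collection.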

### A maths example

![inline](images/sets-and-venn-diagrams-figure-2.png)

In [5]:
ζ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
A = {1, 3, 5, 7, 9, 11}
B = {2, 3, 5, 7, 11}

In [6]:
A.union(B)

In [7]:
A.intersection(B)

In [9]:
# A's complement
ζ.difference(A)

In [10]:
A - B

In [11]:
A.symmetric_difference(B)
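
The method calls above also have operator forms, as `A - B` already hints. A small sketch, reusing the same `A` and `B` and using the name `universe` (chosen here for illustration) in place of ζ:

```python
universe = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}  # plays the role of ζ above
A = {1, 3, 5, 7, 9, 11}
B = {2, 3, 5, 7, 11}

union = A | B                  # same as A.union(B)
intersection = A & B           # same as A.intersection(B)
difference = A - B             # same as A.difference(B)
symmetric = A ^ B              # same as A.symmetric_difference(B)
complement_of_A = universe - A

print(union)
print(intersection)
print(difference)
print(symmetric)
print(complement_of_A)

# Subset test: every element of A is also in the universe.
print(A <= universe)
```

One difference worth knowing: the methods accept any iterable (for example `A.union([2, 4])`), while the operators require both sides to be sets.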

### An application

In [15]:
## Read a file, parse lines, and get all UNIQUE words

wordset = set() # start with an empty set; it will hold the unique words
# the with-block closes the file for us, even if reading fails
with open("data/Julius Caesar.txt") as fd:
    lines = fd.readlines()
# strip newline characters and other whitespace off the edges
cleaned_lines = [line.strip() for line in lines] 
# make a list of lists. 
# each inner list is the list of words on that line
list_of_lines_words = [line.split() for line in cleaned_lines]
# Take each list of words, and get all the words
for lines_words in list_of_lines_words:
    wordset.update(lines_words) # update the wordset using the new list.
unique_words = list(wordset)
unique_words[:100] # 100 of these words

## Dictionaries

A "bag" of **value**s, each with its own label, called a **key**. The 'most powerful' data collection type, and one I suspect you will find yourself using a lot!

A dictionary is similar to a list, and you can iterate over it but:
- items have no position you can look up; since Python 3.7 a dictionary does remember insertion order, so an `OrderedDict` is only needed for its extra order-aware features
- they aren't selected by an index such as 0 or 5. 

Instead, a unique 'key' is associated with each 'value'. The 'key' can be any **immutable (more precisely, hashable) data type**: boolean, float, int, string, or a tuple of such values (but it is often a string).

Dictionaries themselves are "Mutable" (the values can be changed).
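
A short sketch of what the key restriction means in practice. The `grid` dictionary and its contents are invented for illustration: a tuple works as a key, while a list is rejected because it is mutable and therefore unhashable.

```python
# Hypothetical example data: grid coordinates mapped to labels.
grid = {(0, 0): "origin", (1, 2): "treasure"}
print(grid[(1, 2)])

# A list is mutable and therefore unhashable, so it is rejected as a key.
try:
    grid[[3, 4]] = "trap"
    error_message = None
except TypeError as err:
    error_message = str(err)
print("TypeError:", error_message)

# Values, in contrast, can be changed freely: the dictionary is mutable.
grid[(0, 0)] = "start"
print(grid)
```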

In [17]:
# Creating a dictionary:
# 1. Using {}
empty_dict = {} 
print(type(empty_dict))
new_dict = { "day":5, "venue": "GJB", "event": "Python Carnival!" }
print(new_dict)

In [18]:
#2. Using dict()
purse = dict(type="wallet", material="leather")
purse

In [19]:
purse['make'] = "Versace"
purse

### Dictionary operations

In [20]:
# (a) Nested Dict
D = {'to': {'name': 'Alice', 'age': 18}}

In [21]:
# (b) Alternative construction techniques:

# (i) dict_var = dict([(key1, value1),(key2, value2), ...])	
D = dict([('name','Alice'),('age',18)])
print(D)

This is construction from a list of tuples. We'll see tuples soon.

In [22]:
# (ii) Creating dict from keys only
# dict_var = dict.fromkeys([key1, key2, ...])
D = dict.fromkeys(['name','age','place'])
D

Notice that the values have been assigned to a special object in Python, `None`, which is of type `NoneType`. It's specially created to handle the situation of missing values, and evaluates as falsy in conditionals.

In [24]:
if D['age']:
    print("Age is", D['age'])
else:
    print("Nothing specified")

It's often used thus, using the operator `not` in some flow:

In [25]:
if not D['age']:
    print("Age not given..ask!")
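
One caveat with the truthiness checks above: `None` is not the only falsy value. `0`, `""` and empty collections are falsy too, so a legitimate value can be mistaken for a missing one. A minimal sketch, assuming a made-up record whose age really is 0:

```python
# Hypothetical record: the age is known and happens to be 0 (a newborn).
D = dict.fromkeys(['name', 'age', 'place'])
D['age'] = 0

# Truthiness check: 0 is falsy, so this wrongly reports the age as missing.
missing_by_truthiness = not D['age']

# Identity check: only None counts as missing.
missing_by_identity = D['age'] is None
missing_place = D['place'] is None

print("not D['age']      ->", missing_by_truthiness)
print("D['age'] is None  ->", missing_by_identity)
print("D['place'] is None ->", missing_place)
```

When 0 or an empty string are valid values, comparing with `is None` is the safer test.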

In [26]:
# (c) Indexing by key dict_var['key']
print(D['age'])

# (d) Membership operation 'key' in dict_var
'place' in D

Dictionaries are listy:

In [31]:
for key in new_dict:
    print(key, ":", new_dict[key])

In [33]:
for key, value in new_dict.items():
    print(key, ";", value)

Some other useful methods:

1. All keys `dict_var.keys()`
2. All values `dict_var.values()`
3. All key + value tuples `dict_var.items()`  
4. Copy method `dict_var.copy()`
5. Remove all items `dict_var.clear()`
6. Merge in the items of another dict `dict_var1.update(dict_var2)`
7. Fetch by key, if absent default `dict_var.get(key, default)`
8. Remove by key, if absent default `dict_var.pop(key, default)`
9. Fetch by key, if absent set default `dict_var.setdefault(key, default)` 
10. deleting items by key `del dict_var[key]`
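
A quick tour of several of the methods listed above, reusing the `purse` dictionary from earlier. The extra keys and values (`colour`, `price`, `zip`, `canvas`) are invented for illustration.

```python
purse = dict(type="wallet", material="leather")

# get: fetch by key, falling back to a default WITHOUT modifying the dict
colour = purse.get("colour", "unknown")

# setdefault: fetch by key, inserting the default if the key is absent
make = purse.setdefault("make", "Versace")

# update: merge in items from another dict (existing keys are overwritten)
purse.update({"material": "canvas", "price": 100})

# pop: remove by key and return its value, or the default if the key is absent
price = purse.pop("price", None)
nothing = purse.pop("zip", None)

# copy: a shallow copy that is unaffected by later changes to purse
backup = purse.copy()

# del removes a single key; clear removes everything
del purse["make"]
purse.clear()

print(colour, make, price, nothing)
print(backup)
print(purse)
```

Note the difference between `get` and `setdefault`: after the `get` call `"colour"` is still not a key of `purse`, whereas `setdefault` added `"make"`.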

Dictionaries can be built using dictionary comprehensions, which look thus:
`output_dict = {key: value for (key, value) in iterable if (key, value satisfy this condition)}`

In [27]:
# Using Dictionary comprehensions to create an output dictionary which contains only the odd numbers that are present in the input list as keys and their cubes as values
  
input_list = [1,2,3,4,5,6,7] 
dict_using_comp = {var:var ** 3 for var in input_list if var % 2 != 0} 
  
print("Output Dictionary using dictionary comprehensions:", dict_using_comp)
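
A comprehension can also take its `(key, value)` pairs from an existing dictionary via `.items()`, which matches the general pattern given above more literally. A small sketch with made-up word counts:

```python
# Hypothetical word counts, in the spirit of the examples in this section.
counts = {"the": 50, "caesar": 12, "brutus": 9, "and": 40}

# Keep only the (key, value) pairs whose value satisfies the condition.
frequent = {word: n for (word, n) in counts.items() if n >= 12}

# Swap keys and values (safe here only because all the counts are distinct).
by_count = {n: word for (word, n) in counts.items()}

print(frequent)
print(by_count)
```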

## Tuples

They are a fast kind of sequence that functions much like a list: they have elements which are indexed starting at 0. They work much like lists, except that tuples can't be changed in place! This means they are immutable, and this guarantee gives them their speed.


BASIC PROPERTIES:

- Ordered collections of arbitrary objects
- Accessed by index
- Of the category "immutable sequence"
- Fixed-length, heterogeneous and arbitrarily nested

The fixed length is important for performance. Unlike lists, they cannot be grown or shrunk.

There are some ways to make tuples:

In [35]:
# CREATING TUPLEs
# (a) Using tuple()
x = tuple()
type(x)

In [36]:
# (b) Using only ()
t=() 
type(t) 

The above tuples are 0-length and not so useful. Because tuples are immutable, the following code will not work; it raises a `TypeError`.

In [37]:
t[0] = 5

You will usually see them defined thus:

In [39]:
#(c) Casual way! 
z = 1,2,3,4 # or z = (1, 2, 3, 4)
print(type(z)) # printed explicitly, since only a cell's last expression is displayed
print(z)

Of course, `z` is immutable:

In [40]:
z[2] = 3
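
The failing assignment can be caught to inspect the error. If you need a "changed" tuple, the usual workaround is to build a new one from slices. A minimal sketch (the replacement value 30 is arbitrary):

```python
z = (1, 2, 3, 4)

# Item assignment on a tuple raises TypeError.
try:
    z[2] = 30
    outcome = "assignment worked"
except TypeError as err:
    outcome = "TypeError: " + str(err)
print(outcome)

# Build a NEW tuple instead; note the trailing comma in the one-element tuple (30,).
z_new = z[:2] + (30,) + z[3:]
print(z)      # the original is untouched
print(z_new)
```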

### Tuple operations

In [41]:
# TUPLE LITERALS AND OPERATIONS
# (a) Nested tuples
T = ('Bob', ('Developer','Manager'))

# Print the message below using tuple, T? 
# Bob is a Developer
# your code here

In [42]:
# (b) Indexing and Slicing
T1 = (1, 2, 3, 4, 5)
print(T1[0:2])
print(T1[::-1])
print(T1[0], T1[2:4])

And tuples are, as you might expect, listy

In [44]:
# (c) Iteration and Membership
T1 = (1, 2, 3, 4, 5) 
for ele in T1:
    print(ele)
print(3 in T1) # membership test

Think of why the following works...

In [45]:
# TUPLE ASSIGNMENT
# Whenever we need to swap two variables, we use the conventional method: Using a temporary variable,
# temp = a 
# a = b 
# b = temp

#It is rather simple to perform swapping using tuple assignment (does not require 'temp' variable!)
A = (1, 2, 3) 
B = (4, 5, 6) 
A, B = B, A 
print (A)
print (B)
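
One way to see why the swap works is to split it into its two steps: the right-hand side is first packed into a tuple, and that tuple is then unpacked into the names on the left. A sketch, with variable names chosen for illustration:

```python
# Unpacking: each name on the left receives one element of the tuple.
point = (3, 4)
x, y = point

# The swap, spelled out in two steps.
a, b = 1, 2
packed = (b, a)   # step 1: the right-hand side is evaluated into a tuple
a, b = packed     # step 2: the tuple is unpacked into a and b

# A starred name collects whatever is left over, as a list.
first, *rest = (10, 20, 30, 40)

print(x, y)
print(a, b)
print(first, rest)
```

Because the whole right-hand side is evaluated before any name is rebound, no temporary variable is needed.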

## An application of dictionaries

Previously we had used a set to store the unique words in Julius Caesar. Now we'll count how often these words are used!

In [46]:
## Read a file, parse lines, and count how often each word occurs

worddict = dict() # make an empty dictionary mapping word -> count
# the with-block closes the file for us, even if reading fails
with open("data/Julius Caesar.txt") as fd:
    lines = fd.readlines()
# strip newline characters and other whitespace off the edges
cleaned_lines = [line.strip() for line in lines] 
# make a list of lists. 
# each inner list is the list of words on that line
list_of_lines_words = [line.split() for line in cleaned_lines]
# Take each list of words, and count each word
for lines_words in list_of_lines_words:
    for word in lines_words:
        if word not in worddict:
            worddict[word] = 1
        else:
            worddict[word] += 1

Now here is where the iterable nature of dictionaries can be used to our benefit. Iterating over `worddict` yields its keys (the words), so we sort them using the function `worddict.get` as the sort key, which provides the values (the counts).

In [50]:
topwords = sorted(worddict, key = worddict.get, reverse=True)

In [52]:
for word in topwords[:20]:
    print(word, worddict[word])
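
As an aside, the standard library's `collections.Counter` packages this count-and-sort pattern. The sketch below is an alternative to the approach above, not a replacement for it. It uses a short made-up sentence instead of the play so that it runs without the data file, and a compact `.get`-based variant of the counting loop.

```python
from collections import Counter

# Hypothetical stand-in for the play's text, so the sketch runs without the data file.
text = "to be or not to be that is the question to ask"
words = text.split()

# Manual counting: a compact variant of the if/else loop above.
worddict = {}
for word in words:
    worddict[word] = worddict.get(word, 0) + 1
topwords = sorted(worddict, key=worddict.get, reverse=True)

# The same counts with Counter, which is a dict subclass.
counter = Counter(words)
top_two = counter.most_common(2)

print(topwords[:2])
print(top_two)
```

`most_common(n)` returns a list of `(word, count)` tuples, so there is no need to look the counts up again.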

You can even make a hacky histogram for this by printing a '*' for every 10 occurrences:

In [60]:
for word in topwords[:20]:
    print(word+(20 - len(word))*' ', (worddict[word]//10)*'*')