## <p style="text-align: center;">COMP10001 Foundations of Computing<br>Semester 2, 2022<br>Tutorial Questions: Week 6</p>
### <p style="text-align: center;">Tutor: [Jiyu Chen](https://jiyuc.live)</p>

## Before the discussion questions, try Exercises 1–2 to revise last week’s material

In [11]:
# install requirements

! pip install nltk
import nltk
nltk.download('brown')
from nltk.corpus import brown
article = brown.words()[:300]

## Outline: Two new data structures -- dictionary & set
__________
### What data structures can we use to store songs for the backend of a music streaming app?

#### Sequences are suitable. We can use a tuple to store the details of each song, and collect all the songs in one big list.
__________
```
album_cover = ('Memories do not open', 'The Chainsmokers', '2017', '43:00')
song_a = ('The One', 'The Chainsmokers', '2017', '2:57')
song_b = ('Paris', 'The Chainsmokers', '2017', '3:44')
song_c = ('Something Just Like This', 'The Chainsmokers and Coldplay', '2017', '4:07')
# ... ...

my_album = [album_cover, song_a, song_b, song_c]

```
__________
#### What if we want to check whether *Last Day Alive (feat. Florida Georgia Line)* is in this album? We probably need a `for` loop.
_____
```
def search_song_by_name(name):
    for song in my_album:
        if song[0] == name:
            return song
    return False
    
result = search_song_by_name('Last Day Alive')
```
_____
#### Shortcoming: Inefficiency -- Time Complexity
What if we extend the database to include a very large number of songs, say 1 million? On average, a successful search will iterate through about half a million records per query, and a search for a missing song will iterate through all of them.


#### Solution: Group songs under keys, and use pointers to connect each key to a smaller group of songs. That way we only need to look up a key and iterate through its group, instead of the entire collection of song records. Suppose we group the songs by their first letter (analogous to a printed dictionary). We then only have to search the songs whose names start with that letter.
__________
```
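# song_lookup connects each key (the first letter of a title)
# to the songs filed under that letter:
#
# key -pointer-> value
#
# T -> The One
# P -> Paris
# S -> Something Just Like This
song_lookup = {
    'T': [song_a],
    'P': [song_b],
    'S': [song_c],
}

def search_song_by_name(name):
    key = name[0]

    # .get() gives an empty list when no song starts with this letter
    for grouped_song in song_lookup.get(key, []):
        if name == grouped_song[0]:  # the title is the first field of the tuple
            return grouped_song

    return False

result = search_song_by_name('Last Day Alive')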
```
__________

This lookup table, in which a **pointer** connects each **key** to its **value**, is called a `dictionary`.
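
Taking the idea one step further, a Python `dict` can use the whole song title as the key, so a search needs no explicit loop at all. The sketch below is only an illustration: the names `songs_by_title` and `find_song` are made up for this example, and the tuples repeat the song details from the album above.

```python
# Song details repeated from the album example above
song_a = ('The One', 'The Chainsmokers', '2017', '2:57')
song_b = ('Paris', 'The Chainsmokers', '2017', '3:44')
song_c = ('Something Just Like This', 'The Chainsmokers and Coldplay', '2017', '4:07')

# Hypothetical lookup table: key = song title, value = the song tuple
songs_by_title = {
    song_a[0]: song_a,
    song_b[0]: song_b,
    song_c[0]: song_c,
}

def find_song(name):
    # .get() returns None instead of raising KeyError when the key is missing
    return songs_by_title.get(name)

print(find_song('Paris'))           # ('Paris', 'The Chainsmokers', '2017', '3:44')
print(find_song('Last Day Alive'))  # None
```

Returning `None` for a missing song is a design choice made here for the sketch; the earlier functions return `False` instead.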


            



## 1. In what situations would we use a `dictionary`? How is it structured, and how do we add and delete items?

A dictionary holds relations (pointers) between `keys` and `values`. It’s useful for **counting frequencies** or storing information related to different objects in your code. Dictionaries are accessed in a similar way to sequences, by using index notation `d[key]`, except that we index with a key rather than a position.

`values` are retrieved by indexing with the associated key. `values` are added by indexing with a key and assigning to it: `d[key] = value`.

`values` are deleted with the `.pop(key)` method, which takes as an argument the key we wish to delete. 

Working with data stored in dictionaries is easy using the `.keys()` and `.values()` methods, which return a collection of the keys and values respectively; and `.items()` which returns a collection of tuples representing each entry in the dictionary, in `(key, value)` format.

An empty dictionary is declared with a pair of braces `{}` or `dict()`.

In [4]:
' '.join(article)

"The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place . The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted . The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. . `` Only a relative handful of such reports was received '' , the jury said , `` considering the widespread interest in the election , the number of voters and the size of this city '' . The jury said it did find that many of Georgia's registration and election laws `` are outmoded or inadequate and often ambiguous '' . It recommended that Fulton legislators act `` to have t

In [12]:
article

['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

In [13]:
# declaration
word_count = dict()
word_count = {}

In [14]:
# assign items -- i.e., create a pointer from key (indexing) to value (assignment)

for word in article:
    #print(word)
    if word not in word_count:  # if the word (key) does not exist in the dictionary
        word_count[word] = 1  # create a new key->occurrence record, the starting value is 1
    else:  # the word (key) already exists in the dictionary
        word_count[word] += 1  # access the occurrence(value) through key indexing, and increment by 1
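
The same counting pattern can be written more compactly with `dict.get(key, default)`, which returns `default` when the key is absent. This is a small sketch on a made-up word list; `words` and `counts` are illustrative names, not the Brown corpus sample used above.

```python
words = ['the', 'jury', 'said', 'the', 'election', 'the']  # illustrative sample

counts = {}
for word in words:
    # look up the current count (0 if the word is new), then store count + 1
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'the': 3, 'jury': 1, 'said': 1, 'election': 1}
```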

In [7]:
len(word_count)  # size of the dictionary

156

In [8]:
# collection of keys
word_count.keys()

dict_keys(['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', "Atlanta's", 'recent', 'primary', 'election', 'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities', 'took', 'place', '.', 'jury', 'further', 'in', 'term-end', 'presentments', 'the', 'City', 'Executive', 'Committee', ',', 'which', 'had', 'over-all', 'charge', 'deserves', 'praise', 'and', 'thanks', 'Atlanta', 'for', 'manner', 'was', 'conducted', 'September-October', 'term', 'been', 'charged', 'by', 'Superior', 'Court', 'Judge', 'Durwood', 'Pye', 'to', 'investigate', 'reports', 'possible', 'hard-fought', 'won', 'Mayor-nominate', 'Ivan', 'Allen', 'Jr.', 'Only', 'a', 'relative', 'handful', 'such', 'received', 'considering', 'widespread', 'interest', 'number', 'voters', 'size', 'this', 'city', 'it', 'did', 'find', 'many', "Georgia's", 'registration', 'laws', 'are', 'outmoded', 'or', 'inadequate', 'often', 'ambiguous', 'It', 'recommended', 'legislators', 'act', 'have', 't

In [9]:
# collection of values
word_count.values()

dict_values([6, 4, 2, 1, 1, 7, 1, 1, 1, 15, 1, 1, 2, 5, 1, 11, 1, 1, 11, 5, 1, 2, 1, 1, 10, 7, 1, 5, 1, 1, 19, 3, 1, 1, 9, 5, 2, 1, 1, 1, 1, 9, 1, 2, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 6, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 3, 3, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [10]:
# get items , i.e., full map between key -> value
word_count.items()

dict_items([('The', 6), ('Fulton', 4), ('County', 2), ('Grand', 1), ('Jury', 1), ('said', 7), ('Friday', 1), ('an', 1), ('investigation', 1), ('of', 15), ("Atlanta's", 1), ('recent', 1), ('primary', 2), ('election', 5), ('produced', 1), ('``', 11), ('no', 1), ('evidence', 1), ("''", 11), ('that', 5), ('any', 1), ('irregularities', 2), ('took', 1), ('place', 1), ('.', 10), ('jury', 7), ('further', 1), ('in', 5), ('term-end', 1), ('presentments', 1), ('the', 19), ('City', 3), ('Executive', 1), ('Committee', 1), (',', 9), ('which', 5), ('had', 2), ('over-all', 1), ('charge', 1), ('deserves', 1), ('praise', 1), ('and', 9), ('thanks', 1), ('Atlanta', 2), ('for', 1), ('manner', 1), ('was', 3), ('conducted', 1), ('September-October', 1), ('term', 1), ('been', 1), ('charged', 1), ('by', 2), ('Superior', 1), ('Court', 1), ('Judge', 1), ('Durwood', 1), ('Pye', 1), ('to', 6), ('investigate', 1), ('reports', 2), ('possible', 1), ('hard-fought', 1), ('won', 1), ('Mayor-nominate', 1), ('Ivan', 1), (

In [16]:
l1 = [1,2,3]
l1.pop(0)
print(l1)

[2, 3]


In [19]:
word_count['Fulton']

4

In [17]:
# delete item -- the frequency of occurrence of word `The`
word_count.pop('The')

6

In [18]:
# check the items again, you will notice 'The' and its associated value has been removed
word_count.items()

dict_items([('Fulton', 4), ('County', 2), ('Grand', 1), ('Jury', 1), ('said', 7), ('Friday', 1), ('an', 1), ('investigation', 1), ('of', 15), ("Atlanta's", 1), ('recent', 1), ('primary', 2), ('election', 5), ('produced', 1), ('``', 11), ('no', 1), ('evidence', 1), ("''", 11), ('that', 5), ('any', 1), ('irregularities', 2), ('took', 1), ('place', 1), ('.', 10), ('jury', 7), ('further', 1), ('in', 5), ('term-end', 1), ('presentments', 1), ('the', 19), ('City', 3), ('Executive', 1), ('Committee', 1), (',', 9), ('which', 5), ('had', 2), ('over-all', 1), ('charge', 1), ('deserves', 1), ('praise', 1), ('and', 9), ('thanks', 1), ('Atlanta', 2), ('for', 1), ('manner', 1), ('was', 3), ('conducted', 1), ('September-October', 1), ('term', 1), ('been', 1), ('charged', 1), ('by', 2), ('Superior', 1), ('Court', 1), ('Judge', 1), ('Durwood', 1), ('Pye', 1), ('to', 6), ('investigate', 1), ('reports', 2), ('possible', 1), ('hard-fought', 1), ('won', 1), ('Mayor-nominate', 1), ('Ivan', 1), ('Allen', 1),

## 2. What is the difference between using the `.pop()` method on a dictionary and using it on a list?

- On a *list*: `.pop()` called without an index argument removes the last item in the list. Called with an index, `.pop(index)` deletes the item at that index in the list. Both times it will return the object it has deleted from the list.

- On a *dictionary*: `.pop(key)` deletes the (key: value) pair associated with that key in the dictionary, returning the value it has removed. Without an argument, `.pop()` will **NOT** work (it raises a `TypeError`) because, unlike list items, dictionary entries are looked up by key rather than by position. Therefore, `.pop()` needs a key to know which value to delete.

In [8]:
# TODO
demo_list = ['b','c','a','a']
demo_dict = {'b':1,'c':2,'a':3}

In [9]:
demo_dict.pop('b')
demo_dict

{'c': 2, 'a': 3}
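
To see both behaviours side by side, here is a short sketch that reuses `demo_list` and `demo_dict` from the cell above. The variable names `last_item`, `first_item` and `value` are just for illustration.

```python
demo_list = ['b', 'c', 'a', 'a']
demo_dict = {'b': 1, 'c': 2, 'a': 3}

last_item = demo_list.pop()     # no argument: removes and returns the last item
first_item = demo_list.pop(0)   # index argument: removes and returns that item
print(last_item, first_item, demo_list)  # a b ['c', 'a']

value = demo_dict.pop('c')      # key argument: removes the pair, returns the value
print(value, demo_dict)         # 2 {'b': 1, 'a': 3}

try:
    demo_dict.pop()             # a dictionary's .pop() needs a key
except TypeError as err:
    print('TypeError:', err)

# Optional second argument: a default returned when the key is missing
print(demo_dict.pop('z', None))  # None
```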

## 3. In what situations would we use a `set`? How does it differ from other “containers” such as lists and dictionaries?

A set stores a collection of **unique** objects. Perhaps most naturally, we may use sets to store a mathematical set of numbers, but we may also store a mixture of other unique objects. Sets are useful when we want to **remove duplicates from some other sequence**, or combine collections with set operations.
Sets are somewhat like a dictionary without a value for each key: in both cases each entry is unique. A set also has **no concept of an ordering**. A list has an order and may have duplicates: both of these attributes are lost when converting to a set.


**No concept of ordering means** you cannot access items using position indexing or slicing in a `set`.

In [25]:
demo_list = ['b','c','a','a']
demo_set = set(demo_list)
print(demo_list, demo_set)

['b', 'c', 'a', 'a'] {'a', 'c', 'b'}


In [19]:
demo_list[0]

'b'

In [20]:
#TODO
demo_set[0]

TypeError: 'set' object is not subscriptable

In [26]:
demo_set.add('a')

## 4. What special operations can we perform on sets? How do we add and remove items from them?

The three main operations are 
- union: `s1 | s2` or `s1.union(s2)`; 
- intersection: `s1 & s2` or `s1.intersection(s2)`;
- difference: `s1 - s2` or `s1.difference(s2)`. Pay attention to which set is the minuend and which is the subtrahend: `s1 - s2` and `s2 - s1` generally give different results.

![](uH6cL.png)

In [27]:
s1 = {1,2,3}
s2 = {2,3,4}

(s1 - s2) | (s2 - s1)

{1, 4}

In [29]:
s2 - s1

{4}

- Adding an item is possible with the `.add(item)` method and removing an item is done with the `.remove(item)` method.

- Note that since a pair of empty braces `{}` denotes an empty dictionary, we use `set()` in order to create an empty set.
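
The two points above can be checked with a quick sketch. Note that `.discard(item)` is a real set method that the text above does not cover; it is included here only to contrast with `.remove(item)`, which raises a `KeyError` when the item is missing.

```python
empty_dict = {}       # empty braces create a dictionary...
empty_set = set()     # ...so an empty set needs set()
print(type(empty_dict), type(empty_set))  # <class 'dict'> <class 'set'>

s = {1, 2, 3}
s.add(2)          # adding an existing item leaves the set unchanged
s.remove(3)       # removes 3; would raise KeyError if 3 were missing
s.discard(99)     # like remove, but does nothing if the item is missing
print(s)          # {1, 2}

try:
    s.remove(99)
except KeyError:
    print('99 is not in the set')
```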

In [28]:
s1, s2 = set(), set()  # declare two empty sets


for i in [1,2,3,4]:
    s1.add(i)       # add item in set
    
    
for i in [3,4,5,6,7]:
    s2.add(i)
    
print(s1,s2)
s2.remove(7)        # remove 7 from s2
print("After removing 7: ",s2)


print("union of s1 and s2: ",s1|s2)
print("intersection of s1 and s2: ",s1&s2)
print("difference of s1 towards s2: ",s1-s2)
print("difference of s2 towards s1: ",s2-s1)

{1, 2, 3, 4} {3, 4, 5, 6, 7}
After removing 7:  {3, 4, 5, 6}
union of s1 and s2:  {1, 2, 3, 4, 5, 6}
intersection of s1 and s2:  {3, 4}
difference of s1 towards s2:  {1, 2}
difference of s2 towards s1:  {5, 6}


## Exercise 3

![](e01.png)

____

![](e02.png)


## 5. What is `None`? How is it used?

`None` is a special value in Python, notable for being what a function returns when no return value is specified. It can therefore be used to represent the absence of a result, perhaps as a third option alongside a True/False boolean result. `None` is the value you will find if you assign the “output” from many mutating methods such as `.append()`, which do not return anything. `None` has its own type (`NoneType`), so it does not compare equal to values of other built-in types such as `0`, `''` or `False`.

In [29]:
var = None

if var:
    print('if executed')
else:
    print('else executed')

else executed


In [32]:
l1 = [1,2, 3]
var = l1.append(4)
print(var)

None
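
One practical consequence: `None` is "falsy" in an `if` test, but so are `0`, `''` and `[]`, so a plain `if var:` cannot tell them apart. The sketch below uses a made-up function `greet` with no `return` statement, and tests for `None` specifically with `is None`.

```python
def greet(name):
    print('Hello,', name)   # prints, but has no return statement

result = greet('Ada')
print(result is None)       # True: a function without return gives back None

# All four values are falsy, but only the first one is None
for value in [None, 0, '', []]:
    print(repr(value), bool(value), value is None)
```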


## 6. What is the difference between `sorted()` and `.sort()` when applied to a list? What does it mean to edit an object “in-place”?

### A good way to consolidate the difference between mutating and non-mutating operations

Say we’re talking about a list `my_list`. Both `sorted(my_list)` and `my_list.sort()` give us the items of `my_list` in sorted order, but they do so in different ways.

`sorted(my_list)` will **return a new list** which contains the items of `my_list` in sorted order. `my_list` is left unchanged by this function.

`my_list.sort()`, on the other hand, will **mutate `my_list`**, changing the order of its items to sort it. Nothing is returned from this method (you get `None` if you try to assign its output) because it does its work directly on the list. The original order of items in `my_list` is overwritten.

Editing an object **in-place** means **mutating** it: editing it directly without creating a copy or returning a new object. It can be dangerous if you’re not sure you want your data to be changed, so be careful!

In [32]:
# randomly generate a list of integers
import random
nums = [random.randint(0, 20) for _ in range(10)]
print(nums)

[1, 14, 9, 17, 10, 4, 10, 18, 17, 5]


In [33]:
sorted_nums = sorted(nums)
print(sorted_nums)
print(nums)

[1, 4, 5, 9, 10, 10, 14, 17, 17, 18]
[1, 14, 9, 17, 10, 4, 10, 18, 17, 5]


In [34]:
dot_sort_nums = nums.sort()
print(dot_sort_nums)
print(nums)


None
[1, 4, 5, 9, 10, 10, 14, 17, 17, 18]
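
To illustrate why in-place editing can be dangerous, here is a small sketch with two made-up helper functions. Both return the smallest item, but only one of them leaves the caller's list alone.

```python
def smallest_in_place(nums):
    nums.sort()             # mutates the caller's list
    return nums[0]

def smallest_with_copy(nums):
    return sorted(nums)[0]  # sorted() builds a new list; nums is untouched

scores = [14, 9, 17]
print(smallest_with_copy(scores), scores)  # 9 [14, 9, 17]
print(smallest_in_place(scores), scores)   # 9 [9, 14, 17]
```

After the second call, the original order of `scores` is gone, even though the caller only asked for the smallest value.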


## Problems

![](p01.png)

In [35]:
def freq_counts(arg):
    # finish this block
    if not arg:  # corner
        return {}
    
    freq = dict() # {}
    for char in arg:
        if char not in freq:
            freq[char] = 1
        else:
            freq[char] += 1

    return freq
    
    # finish block above

freq_counts('booboo')
    

{'b': 2, 'o': 4}

![](p02.png)

In [36]:
def in_common(list1, list2):
    # finish this block
    
    s1, s2 = set(list1), set(list2)
    
    return list(s1 & s2)
    
    
    # finish block above

result = in_common([1,2,4], [3,4,5])
print(result)

[4]


![](p03.png)

In [1]:
def unique_values(d):
    # finish this block
    
    values = d.values()
    set_values = set(values)
    sorted_values = sorted(set_values)
    return sorted_values
    
    # finish block above
    

result = unique_values({'a': 1, 'b': 0, 'c': 0})
print(result)
    

[0, 1]


![](p04.png)

In [40]:
def above_thresh(text, char, threshold):
    # finish this block
    if char not in text:
        return None
    
    freq = freq_counts(text)
    
    
    return freq[char] > threshold  # the comparison already evaluates to True or False
    

    # finish block above
    
result = above_thresh('I like the letter e', 'e', 3)
print(result)

True
