# Data structures in Python: Set and Dictionary

**Welcome!** This notebook will teach you about the set and dictionary data structures in Python. By the end of this notebook, you'll know how to create them and how to add and remove elements. You will also see examples of how they can be useful when solving specific problems.

<hr>

## Set

In Python, a set is a collection of unique elements. A set is unordered and therefore does not support indexing.


To create a set, put all elements between a pair of curly brackets <code>{}</code>, separated by commas. Python will automatically remove duplicate items:

In [None]:
# create a set manually

a = {1, 1, 2, 3, 3, 4, 5}
type(a)

You can also use the <code>set()</code> function to create a set from a list-like object, such as a list or a string:

In [None]:
# create a set of unique characters from a string

set("hello, python!")
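
One detail worth checking when creating sets: an empty pair of curly brackets creates an empty dictionary, not an empty set. The short sketch below (variable names are chosen only for illustration) shows the difference and also builds a set from a list:

```python
# {} creates an empty dictionary, not an empty set
empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

# set() accepts any iterable, such as a list
unique_numbers = set([1, 1, 2, 3, 3])
print(unique_numbers == {1, 2, 3})  # True
```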

### Adding / Removing elements

To add an element to a set, or to remove an element from a set, we can use the <code><i>set</i>.add()</code> and <code><i>set</i>.remove()</code> methods. For example:

In [None]:
# create a set of unique characters from a string
a = set("hello")

# add a character to the set
a.add('x')

a

In [None]:
'x' in a

In [None]:
# remove a character from the set
a.remove('h')

a
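
Note that <code><i>set</i>.remove()</code> raises a <code>KeyError</code> when the element is not in the set. If that is not what we want, <code><i>set</i>.discard()</code> may be a better fit, since it silently does nothing in that case. A minimal sketch, using a variable name made up for this illustration:

```python
letters = set("hello")

# remove() raises KeyError if the element is missing
try:
    letters.remove("z")
except KeyError:
    print("'z' is not in the set")

# discard() does nothing if the element is missing
letters.discard("z")

# discard() removes the element if it is present
letters.discard("h")

print(letters == {"e", "l", "o"})  # True
```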

### Size of set

A set has a size (the number of elements it contains), and in Python we can use the <code>len()</code> function to obtain it.

In [None]:
len(set("hello, python!"))

However, a set has no order and does not support indexing. Trying to index a set raises a <code>TypeError</code>:

In [None]:
a = set("hello, python!")

a[0]

To visit each element of a set, we can either
- convert it to a list using the <code>list()</code> function, and then use indexing, or
- use a _for_ loop

In [None]:
a = set("hello, python!")
b = list(a)

b[0]

In [None]:
a = set("hello, python!")

for i in a:
    print(i)

**Has no order** does not mean that there is literally no order. Rather, the actual order of the elements depends on Python's internal algorithm and cannot be easily controlled by us. We should therefore expect an arbitrary order when creating a set from a list or a string.
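
When a predictable order is needed, one option is to pass the set to the built-in <code>sorted()</code> function, which returns a new sorted list. A small sketch (the variable names are only for illustration):

```python
chars = set("hello, python!")

# sorted() returns a new list in ascending order, regardless of
# how the set happens to store its elements internally
ordered = sorted(chars)

print(ordered)
# [' ', '!', ',', 'e', 'h', 'l', 'n', 'o', 'p', 't', 'y']
```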

### Set operations

Sets are useful tools when we want to find elements that satisfy certain criteria, for example the elements that two sets have in common.

#### Intersection

The elements shared by two or more sets are called the _intersection_.

<img src="https://upload.wikimedia.org/wikipedia/commons/9/99/Venn0001.svg" width="200"/>

In Python, we can compute the intersection of sets using the <code>&</code> operator, or using the <code><i>set</i>.intersection()</code> method.

In [None]:
# we have two burger providers

mcdonalds = {"big mac", "french fries", "coca-cola"}

burger_king = {"whopper", "french fries", "coca-cola"}

In [None]:
# intersection is what both providers serve

mcdonalds & burger_king

In [None]:
# intersection is what both providers serve

mcdonalds.intersection(burger_king)
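
Both forms also work with more than two sets. The sketch below adds a third menu, <code>wendys</code>, whose contents are made up purely for illustration:

```python
mcdonalds = {"big mac", "french fries", "coca-cola"}
burger_king = {"whopper", "french fries", "coca-cola"}

# hypothetical third menu, made up for illustration
wendys = {"baconator", "french fries", "frosty"}

# chain the & operator ...
common = mcdonalds & burger_king & wendys

# ... or pass several sets to intersection()
same = mcdonalds.intersection(burger_king, wendys)

print(common)          # {'french fries'}
print(common == same)  # True
```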

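#### Difference

The elements that belong to one set but not to the other are called the _difference_. The result depends on the order: the difference of A and B contains the elements of A that are not in B.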

<img src="https://upload.wikimedia.org/wikipedia/commons/2/23/Relative_compliment.svg" width="200"/>

In Python, we can compute the difference of sets using the <code><i>set</i>.difference()</code> method.

In [None]:
# what McDonald's serves but Burger King does not

mcdonalds.difference(burger_king)

In [None]:
# what Burger King serves but McDonald's does not

burger_king.difference(mcdonalds)
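
The difference can also be written with the <code>-</code> operator. If we want the elements that belong to exactly one of the two sets, regardless of which one, the <code>^</code> operator (the symmetric difference) is one way to get them. A brief sketch, with result names chosen for illustration:

```python
mcdonalds = {"big mac", "french fries", "coca-cola"}
burger_king = {"whopper", "french fries", "coca-cola"}

# the - operator is equivalent to difference()
only_mcdonalds = mcdonalds - burger_king
only_burger_king = burger_king - mcdonalds

# the ^ operator keeps elements found in exactly one of the sets
either_not_both = mcdonalds ^ burger_king

print(only_mcdonalds)    # {'big mac'}
print(only_burger_king)  # {'whopper'}
print(either_not_both == {"big mac", "whopper"})  # True
```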

#### Union

The set that contains all elements from all of the sets is called the _union_.

<img src="https://upload.wikimedia.org/wikipedia/commons/3/30/Venn0111.svg" width="200"/>

In Python, we can compute the union of sets using the <code><i>set</i>.union()</code> method.

In [None]:
# if the two providers become one, this is what they serve

burger_king.union(mcdonalds)
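
As with intersection, there is an operator form: <code>|</code> gives the same result as the union method. A minimal sketch, repeating the two menus so the cell runs on its own:

```python
mcdonalds = {"big mac", "french fries", "coca-cola"}
burger_king = {"whopper", "french fries", "coca-cola"}

# the | operator is equivalent to union()
merged = mcdonalds | burger_king

print(len(merged))  # 4
print(merged == burger_king.union(mcdonalds))  # True
```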

### Example

See how McDonald's and Burger King describe themselves, and which words the two descriptions have in common:

https://www.mcdonalds.com/us/en-us/about-us.html

https://www.bk.com/about-bk

In [None]:
mcdonalds = "Our story starts with one man. Back in 1954, a man named Ray Kroc discovered a small burger restaurant in California, and wrote the first page of our history. From humble beginnings as a small restaurant, we're proud to have become one of the world's leading food service brands with more than 36,000 restaurants in more than 100 countries.".lower()
mcdonalds

In [None]:
burger_king = "Great Food Comes First Every day, more than 11 million guests visit Burger King restaurants around the world. And they do so because our restaurants are known for serving high-quality, great-tasting, and affordable food. Founded in 1954, Burger King is the second largest fast food hamburger chain in the world. The original Home of the Whopper, our commitment to premium ingredients, signature recipes, and family-friendly dining experiences is what has defined our brand for more than 50 successful years.".lower()
burger_king

In [None]:
mcdonalds_words = set(mcdonalds.split(" "))
mcdonalds_words

In [None]:
burger_king_words = set(burger_king.split(" "))
burger_king_words

In [None]:
mcdonalds_words & burger_king_words
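
Splitting on spaces alone leaves punctuation attached to some words, so a word followed by a comma would not match the same word without one. One possible refinement is to strip punctuation from each token before building the set. The sketch below uses a hypothetical helper <code>to_words()</code> and two short made-up sentences rather than the texts above:

```python
import string

# short made-up sentences, used only for illustration
first = "We serve burgers, fries, and drinks."
second = "Our fries are famous. Our burgers are too!"


def to_words(text):
    """Lowercase the text, split on whitespace, strip punctuation."""
    words = set()
    for token in text.lower().split():
        word = token.strip(string.punctuation)
        if word:  # skip tokens that were only punctuation
            words.add(word)
    return words


shared = to_words(first) & to_words(second)
print(sorted(shared))  # ['burgers', 'fries']
```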

## Dictionary

A dictionary is a data type that we use when we want to associate one piece of data with another. For example, we can associate the names of restaurants with their founding years.

In Python, we can create a dictionary by putting key-value pairs between a pair of curly brackets <code>{}</code>, separated by commas. A key-value pair is written as a key and a value on either side of a colon <code>:</code>.

In [None]:
founding_year = {"mcdonald's": 1954, "burger king": 1954, "wendy's": 1969, "popeyes": 1972, "chick-fil-a":1946}

type(founding_year)

In [None]:
type_of_food = {"mcdonald's": "burger", "burger king": "burger", "wendy's": "burger", "popeyes":"chicken", "chick-fil-a":"chicken"}

type(type_of_food)

### Size of dictionary

A dictionary also has a size (the number of key-value pairs it contains), and in Python we can use the <code>len()</code> function to obtain it.

In [None]:
len(founding_year)

### Keys and values

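Unlike a list, whose elements are accessed by index, a dictionary gives access to its _values_ through their _keys_. Using something that is not a key, such as the number <code>3</code> in the last cell below, raises a <code>KeyError</code>.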

In [None]:
founding_year["mcdonald's"]

In [None]:
type_of_food["burger king"]

In [None]:
type_of_food[3]
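
When we are not sure whether a key exists, the <code><i>dict</i>.get()</code> method is one way to avoid the error: it returns <code>None</code>, or a default value that we supply, instead of raising. A small sketch with a shortened copy of the dictionary:

```python
founding_year = {"mcdonald's": 1954, "burger king": 1954}

# indexing with a missing key raises KeyError
try:
    founding_year["kfc"]
except KeyError as error:
    print("missing key:", error)

# get() returns None, or a default we choose, instead of raising
print(founding_year.get("kfc"))             # None
print(founding_year.get("kfc", "unknown"))  # unknown
print(founding_year.get("burger king"))     # 1954
```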

To get all the keys, we can either
- convert the dictionary into a list or set, using the <code>list()</code> or <code>set()</code> function, or
- use a _for_ loop

In [None]:
restaurants = list(founding_year)
restaurants

In [None]:
for name in type_of_food:
    print(name, "serves", type_of_food[name])

#### Check if a key exists

We can use the <code>in</code> operator to check if a key exists in a dictionary:

In [None]:
"mcdonald's" in founding_year

In [None]:
"kfc" in founding_year

### Adding / updating key-value pairs

Adding a key-value pair to a dictionary is simple: it uses the same syntax as assigning to a list index, with the key in place of the index. Assigning to a key that already exists updates its value. For example:

In [None]:
# adding a key-value pair

founding_year["kfc"] = 1930

founding_year["kfc"]

In [None]:
# adding a key-value pair

type_of_food["kfc"] = "burger"

type_of_food["kfc"]

In [None]:
# updating a key-value pair

type_of_food["kfc"] = "chicken"

type_of_food["kfc"]

### Removing key-value pairs

To remove a key-value pair, we can use the <code>del</code> keyword. After deletion, looking up the key raises a <code>KeyError</code>:

In [None]:
del founding_year["kfc"]
del type_of_food["kfc"]

In [None]:
founding_year["kfc"]

In [None]:
type_of_food["kfc"]
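
An alternative to <code>del</code> is the <code><i>dict</i>.pop()</code> method, which removes the pair and returns its value. With a second argument it returns that default instead of raising when the key is missing. A minimal sketch on a shortened dictionary (the result names are only for illustration):

```python
type_of_food = {"popeyes": "chicken", "kfc": "chicken"}

# pop() removes the pair and returns the value
removed = type_of_food.pop("kfc")

# the key is gone now, so the default (None) is returned
missing = type_of_food.pop("kfc", None)

print(removed)       # chicken
print(missing)       # None
print(type_of_food)  # {'popeyes': 'chicken'}
```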

### Get all key-value pairs

We can use the <code><i>dict</i>.items()</code> method to get all key-value pairs as a list-like object:

In [None]:
restaurants = founding_year.items()
restaurants
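
A common use of <code>items()</code> is to loop over keys and values at the same time. The sketch below uses a shortened copy of the dictionary and also collects the names founded before 1960, a cutoff chosen only for illustration:

```python
founding_year = {"mcdonald's": 1954, "burger king": 1954, "wendy's": 1969}

# each item is a (key, value) pair that can be unpacked in the loop
for name, year in founding_year.items():
    print(name, "was founded in", year)

# collect the names of restaurants founded before 1960
early = [name for name, year in founding_year.items() if year < 1960]
print(early)
```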

### Examples

This example demonstrates counting how many times each word, and then each character, appears in a text.

https://en.wikipedia.org/wiki/McDonald%27s

In [None]:
text = "McDonald's Corporation is an American multinational fast food chain, founded in 1940 as a restaurant operated by Richard and Maurice McDonald, in San Bernardino, California, United States. They rechristened their business as a hamburger stand, and later turned the company into a franchise, with the Golden Arches logo being introduced in 1953 at a location in Phoenix, Arizona. In 1955, Ray Kroc, a businessman, joined the company as a franchise agent and in 1961 bought out the McDonald brothers. Previously headquartered in Oak Brook, Illinois, it moved to nearby Chicago in June 2018. McDonald's is also a real estate company through its ownership of around 70% of restaurant buildings and 45% of the underlying land (which it leases to its franchisees). McDonald's is the world's largest fast food restaurant chain, serving over 69 million customers daily in over 100 countries in more than 40,000 outlets as of 2021. McDonald's is best known for its hamburgers, cheeseburgers and french fries, although their menu also includes other items like chicken, fish, fruit, and salads. Their best-selling licensed item are their french fries, followed by the Big Mac. The McDonald's Corporation revenues come from the rent, royalties, and fees paid by the franchisees, as well as sales in company-operated restaurants. McDonald's is the world's second-largest private employer with 1.7 million employees (behind Walmart with 2.3 million employees), the majority of whom work in the restaurant's franchises. As of 2022, McDonald's has the sixth-highest global brand valuation. McDonald's has been subject to criticism over the health effects of its products, its treatment of employees, and the gifting of free food by its Israeli franchises to the Israeli Defense Forces during the 2023 Israel–Hamas war, the latter of which triggered a social media-induced boycott."

# convert the text to lower case
text = text.lower()

# remove symbols
text = text.replace(",","")
text = text.replace(".","")
text = text.replace("(","")
text = text.replace(")","")

text

In [None]:
words = text.split(" ")
words

In [None]:
# create an empty dictionary
word_count = {}

for word in words:
    if word in word_count:
        # increment the count if the word is already in the dict
        word_count[word] = word_count[word] + 1
    else:
        # otherwise start the count for this word at 1
        word_count[word] = 1

word_count
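
Python's standard library also provides <code>collections.Counter</code>, a dictionary subclass designed for this kind of counting. As a sketch of the same idea, applied to a short made-up sentence instead of the full text:

```python
from collections import Counter

# a short made-up sentence, used only for illustration
words = "the big mac and the whopper and the fries".split(" ")

# Counter builds the word -> count mapping in one step
word_count = Counter(words)

print(word_count["the"])          # 3
print(word_count.most_common(2))  # [('the', 3), ('and', 2)]
```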

In [None]:
letters = list(text)
letters

In [None]:
# create an empty dictionary
letter_count = {}

for letter in letters:
    if letter in letter_count:
        # increment the count if the letter is already in the dict
        letter_count[letter] = letter_count[letter] + 1
    else:
        # otherwise start the count for this letter at 1
        letter_count[letter] = 1

letter_count
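
The counting loop can be shortened with <code><i>dict</i>.get()</code>, and the result can be ranked by count using <code>sorted()</code>. The sketch below does both on a short string, not the full text, so the output stays small:

```python
text = "hello, python!"

letter_count = {}
for letter in text:
    # get() returns 0 when the letter has not been seen yet
    letter_count[letter] = letter_count.get(letter, 0) + 1

# sort the (letter, count) pairs by count, largest first
ranked = sorted(letter_count.items(), key=lambda pair: pair[1], reverse=True)

print(ranked[:3])  # [('h', 2), ('l', 2), ('o', 2)]
```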