## Intro to Python programming: 
## Data structures basics - lists and sets

A data structure is a collection of data values organized in a format that is convenient to manipulate and manage. This lesson covers two fundamental data structures: lists and sets.

### 1. Lists
A list stores multiple items in a single variable. Lists are ordered and mutable, meaning their values can be updated after creation.

#### 1.1 Lists assignment
A list is defined with square brackets. It can be created with initial values or left empty.

In [1]:
# typical method to define a list with values
list_num = [1, 2, 3, 4, 5]
list_str = ['a', 'b', 'c', 'd', 'e']
list_bool = [True, True, False, False, True]
list_mixed = [1, 2, 'x', 'y', True]

In [2]:
# create an empty list
list_empty = []

Let's use the type() function to verify that the created variables are really lists.

In [3]:
print(type(list_num))
print(type(list_empty))

<class 'list'>
<class 'list'>
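
As a small sketch of another way to build lists, the built-in list() constructor turns any iterable into a list. The variable names below are illustrative and are not used elsewhere in this lesson.

```python
# list() with no argument gives an empty list, the same as []
empty_again = list()

# list() accepts any iterable, for example a range or a string
from_range = list(range(1, 6))   # the numbers 1 to 5
from_string = list('abc')        # one element per character

print(empty_again)
print(from_range)
print(from_string)

# isinstance() is another way to check the type
print(isinstance(from_range, list))
```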


#### 1.2 Indexing list
We can access each element of a list by its index. Indexing always starts at 0, so index 0 refers to the first element of the list.

In [4]:
# let's create a new list for evaluation
hundreds = [100, 200, 300, 400, 500, 600, 700, 800, 900]

In [5]:
hundreds[0] # the first element

100

In [6]:
hundreds[1] # the second element 

200

In [7]:
hundreds[4] # the 5th element

500

In [8]:
hundreds[-1] # the last element of the list

900
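
To make the negative index less mysterious, here is a minimal sketch showing that -1 is shorthand for the last position, and what happens when an index falls outside the list. It redefines the same hundreds list so it can run on its own.

```python
hundreds = [100, 200, 300, 400, 500, 600, 700, 800, 900]

# A negative index counts from the end: -1 is the last element, -2 the one before it
last = hundreds[-1]
same_last = hundreds[len(hundreds) - 1]
second_last = hundreds[-2]
print(last, same_last, second_last)

# An index outside the list raises IndexError
try:
    hundreds[9]
except IndexError as err:
    print('IndexError:', err)
```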

Conversely, we can get the index of an element from its value with the index() method. See the example below.

In [9]:
hundreds.index(700)

6
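
One thing to watch out for: index() raises a ValueError when the value is not in the list. The sketch below shows one possible way to guard against that with the in operator. The names target and position are made up for this example.

```python
hundreds = [100, 200, 300, 400, 500, 600, 700, 800, 900]

# Check membership first, then ask for the index
target = 1000
if target in hundreds:
    position = hundreds.index(target)
else:
    position = None
print(position)

# Without the check, index() raises ValueError for a missing value
try:
    hundreds.index(1000)
except ValueError as err:
    print('ValueError:', err)
```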

We can also slice a list to get only part of it.

In [10]:
hundreds[0:3] # display the 1st three elements

[100, 200, 300]

In [11]:
hundreds[:3] # display the 1st three elements also

[100, 200, 300]

In [12]:
hundreds[3:] # display everything from the 4th element onwards

[400, 500, 600, 700, 800, 900]

In [13]:
hundreds[3:7] # display everything from the 4th to the 7th element

[400, 500, 600, 700]

In [14]:
hundreds[3:7:2] # display everything from the 4th to the 7th element, with a stride of two

[400, 600]

In [15]:
hundreds[::-1] # display everything in reversed order

[900, 800, 700, 600, 500, 400, 300, 200, 100]

In [16]:
hundreds[::-2] # display everything in reversed order, with a stride of two

[900, 700, 500, 300, 100]
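
The slices above all follow the general form list[start:stop:step], where stop is excluded. Here is a short sketch of that rule, plus the fact that out-of-range slice bounds are quietly clamped instead of raising an error, which is why a start index of 10 also works on a 9-element list.

```python
hundreds = [100, 200, 300, 400, 500, 600, 700, 800, 900]

# start=1, stop=8, step=3 picks indexes 1, 4 and 7
print(hundreds[1:8:3])

# Slice bounds past the end are clamped, not an error
print(hundreds[5:100])
print(hundreds[100:])   # nothing there, so an empty list

# Leaving out start and stop with a negative step reverses the list
reversed_copy = hundreds[::-1]
print(reversed_copy == hundreds[10::-1])
```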

Nice to know: a slice can be stored in a new variable.

In [17]:
hundreds_sl = hundreds[3:7]
print(type(hundreds_sl))
print(hundreds_sl)

<class 'list'>
[400, 500, 600, 700]
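
A slice is a new list, not a window into the original. As a quick sketch, changing the sliced list leaves the original list untouched.

```python
hundreds = [100, 200, 300, 400, 500, 600, 700, 800, 900]
hundreds_sl = hundreds[3:7]

# Modify the slice only
hundreds_sl[0] = -1

print(hundreds_sl)   # the slice has changed
print(hundreds[3])   # the original still holds 400
```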


#### 1.3 Nested list
Lists are very versatile. A list can contain other lists (a nested list), and the inner values are accessed by chaining indexes.

In [18]:
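# Thai dish 'ผัดกระเพรา' (stir-fried holy basil) with 'หมูสับ' (minced pork) and 'ไข่ดาว' (fried egg),
# then a nested list: [chili, garlic] and [fish sauce, sugar, pork seasoning powder, oyster sauce, vegetable oil]
list_nested = ['ผัดกระเพรา', 'หมูสับ', 'ไข่ดาว', 
               [['พริก', 'กระเทียม'],
                ['น้ำปลา', 'น้ำตาล', 'อร่อยชัวร์หมู', 'น้ำมันหอย', 'น้ำมันพืช']]] 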

In [19]:
list_nested[0] # first element

'ผัดกระเพรา'

In [20]:
list_nested[-1] # last element

[['พริก', 'กระเทียม'],
 ['น้ำปลา', 'น้ำตาล', 'อร่อยชัวร์หมู', 'น้ำมันหอย', 'น้ำมันพืช']]

In [21]:
list_nested.index('หมูสับ') # index of 'หมูสับ' (minced pork)

1

In [22]:
list_nested[3][0] # first element of the inner list

['พริก', 'กระเทียม']

In [23]:
list_nested[3][0][0] # first element of the first innermost list

'พริก'
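
Chained indexes are read from left to right, one level at a time. The sketch below uses a small hypothetical nested list (recipe, not part of the lesson data) and also walks through every inner item with a loop. It uses append(), which is covered in the list operations section.

```python
# Hypothetical nested list: a dish name followed by a list of ingredient groups
recipe = ['fried rice', [['egg', 'rice'], ['soy sauce', 'oil', 'salt']]]

groups = recipe[1]        # step 1: the list of groups
print(len(groups))        # number of inner lists
print(groups[1][0])       # step 2 and 3: second group, then its first item

# Walk through every group and collect the items into one flat list
ingredients = []
for group in groups:
    for item in group:
        ingredients.append(item)
print(ingredients)
```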

Quick exercise: Walk through the nested list below, named ASIA, which is made up of regions and countries. 
Each sublist follows the same pattern:
- The first element of the sublist is the name of the region
- The remaining elements are names of countries/states

In [None]:
ASIA = [['SE Asia', 'Thailand', 'Malaysia', 'Singapore', 'Cambodia'],
        ['S Asia', 'India', 'Pakistan', 'Sri Lanka'],
        ['E Asia', 'S.Korea', 'N.Korea', 'Japan'],
        ['China', 'Taiwan', 'China Mainland', 'Hong Kong']]

1. What is the 2nd element of the list?

In [None]:
# answer


2. What is the region name of the 2nd element of the list? (Hint: it is the first item of that sublist.)

In [None]:
# answer


3. In the South Asia sublist, find the index of 'Pakistan'

In [None]:
# answer


4. Slice the list to get the South East Asia elements only. Then select only the country names and reverse their order. <br> <br>
You should get something like: <br><br>
['Cambodia', 'Singapore', 'Malaysia', 'Thailand']

In [None]:
# answer


#### 1.4 List operations
Let's use the list below for demonstration.

In [24]:
pasta = ['Macrorani', 'Penne', 'Ravioli', 'Spagthetti', 'Fusilli', 'Linguine']

How many elements are in the list?

In [25]:
len(pasta)

6

Replace element

In [26]:
pasta[0] = 'Macaroni'
pasta

['Macaroni', 'Penne', 'Ravioli', 'Spagthetti', 'Fusilli', 'Linguine']

Insert new element


In [27]:
pasta.insert(1,'Lasagna')
pasta

['Macaroni',
 'Lasagna',
 'Penne',
 'Ravioli',
 'Spagthetti',
 'Fusilli',
 'Linguine']

Copy a list

In [28]:
pasta_cpy = pasta.copy()
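
Why bother with copy() instead of a plain assignment? As a minimal sketch: assignment only gives the same list a second name, while copy() creates a separate list. The name alias is made up for this example.

```python
pasta = ['Macaroni', 'Penne', 'Ravioli']

alias = pasta               # a second name for the SAME list
pasta_cpy = pasta.copy()    # a new, independent list

pasta.append('Fusilli')

print(alias)       # follows the change made through pasta
print(pasta_cpy)   # still has the original three items
print(alias is pasta, pasta_cpy is pasta)
```

Note that copy() is a shallow copy: if the list contains nested lists, those inner lists are still shared between the original and the copy.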

Combine two lists together

In [29]:
sauce = ['Pesto' , 'Bolognese', 'Cabonara', 'Arrabiata']
pasta_cpy.extend(sauce)
pasta_cpy

['Macaroni',
 'Lasagna',
 'Penne',
 'Ravioli',
 'Spagthetti',
 'Fusilli',
 'Linguine',
 'Pesto',
 'Bolognese',
 'Cabonara',
 'Arrabiata']

In [30]:
# alternatively, concatenate with + to build a new combined list
pasta_cpy = pasta.copy()
pasta_cpy = pasta_cpy + sauce
pasta_cpy

['Macaroni',
 'Lasagna',
 'Penne',
 'Ravioli',
 'Spagthetti',
 'Fusilli',
 'Linguine',
 'Pesto',
 'Bolognese',
 'Cabonara',
 'Arrabiata']

In [31]:
# but you can't do this: lists do not support subtraction
pasta_cpy - sauce

TypeError: unsupported operand type(s) for -: 'list' and 'list'
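
Since lists have no minus operator, one possible workaround is to build a new list that keeps only the items not found in the other list. The sketch below uses a list comprehension. The list pasta_and_sauce is a shortened, made-up stand-in for the combined list.

```python
pasta_and_sauce = ['Penne', 'Pesto', 'Fusilli', 'Bolognese']
sauce = ['Pesto', 'Bolognese']

# Keep every item that does not appear in sauce
pasta_only = [item for item in pasta_and_sauce if item not in sauce]
print(pasta_only)
```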

Append an element to the current list

In [32]:
pasta.append('Angel hair')
pasta

['Macaroni',
 'Lasagna',
 'Penne',
 'Ravioli',
 'Spagthetti',
 'Fusilli',
 'Linguine',
 'Angel hair']
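
A common mix-up is append() versus extend(). As a quick sketch with made-up lists: append() adds its argument as one single element, even when that argument is a list, while extend() adds each item separately.

```python
appended = ['Penne', 'Fusilli']
appended.append(['Orzo', 'Farfalle'])   # the whole list becomes ONE element
print(appended)
print(len(appended))   # 3

extended = ['Penne', 'Fusilli']
extended.extend(['Orzo', 'Farfalle'])   # each item is added on its own
print(extended)
print(len(extended))   # 4
```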

Remove element by index

In [33]:
pasta.pop(3)
pasta

['Macaroni',
 'Lasagna',
 'Penne',
 'Spagthetti',
 'Fusilli',
 'Linguine',
 'Angel hair']

In [34]:
print(pasta.pop()) # pop() without an index removes and returns the last element
pasta

Angel hair


['Macaroni', 'Lasagna', 'Penne', 'Spagthetti', 'Fusilli', 'Linguine']

Delete an element by value or by index

In [35]:
# index - use del
del pasta[1]
pasta

['Macaroni', 'Penne', 'Spagthetti', 'Fusilli', 'Linguine']

In [36]:
# value - use remove (deletes the first matching element)
pasta.remove('Penne')
pasta

['Macaroni', 'Spagthetti', 'Fusilli', 'Linguine']
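
A few details about the removal methods are easy to miss, so here is a short sketch using a made-up orders list: remove() only deletes the first match, it raises ValueError when the value is missing, and pop() hands back the element it removed.

```python
orders = ['Penne', 'Fusilli', 'Penne']

# remove() deletes only the first matching element
orders.remove('Penne')
print(orders)   # ['Fusilli', 'Penne']

# Removing a value that is not in the list raises ValueError
try:
    orders.remove('Ravioli')
except ValueError as err:
    print('ValueError:', err)

# pop() returns the removed element, so it can be stored
removed = orders.pop(0)
print(removed)
print(orders)
```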

Sort a list

In [37]:
pasta.sort()
pasta

['Fusilli', 'Linguine', 'Macaroni', 'Spagthetti']

In [38]:
pasta.sort(reverse=True)
pasta

['Spagthetti', 'Macaroni', 'Linguine', 'Fusilli']
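
Note that sort() changes the list in place and returns None. When the original order should be kept, the built-in sorted() function returns a new sorted list instead. A minimal sketch with a made-up list:

```python
pasta = ['Ravioli', 'Penne', 'Linguine', 'Orzo']

# sorted() returns a new list and leaves the original alone
alphabetical = sorted(pasta)
print(alphabetical)
print(pasta)

# The key argument changes what is compared, here the length of each name
by_length = sorted(pasta, key=len)
print(by_length)

# sort() works in place, so its return value is None
result = pasta.sort()
print(result)
print(pasta)
```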

Clear all elements in the list: make an empty list

In [39]:
pasta.clear()
pasta

[]

Quick exercise: Working with a numeric list. <br> 
Use num_list for the following tasks.

In [40]:
num_list = [10, 11, 12, 13, 14, 21, 25, 99, 100, 1, 2]

1. Sort this list from the smallest to largest value

In [None]:
# Answer


2. Append the value 1.77 to the end of the list

In [None]:
# Answer


3. Insert the new value 3.14 as the third element of the list

In [None]:
# Answer


4. Remove the element whose value is 11 from the list

In [None]:
# Answer


5. Sort the list again from the largest to smallest value

In [None]:
# Answer


If you do everything correctly, your list should look like this: <br>
[100, 99, 25, 21, 14, 13, 12, 10, 3.14, 2, 1.77, 1]

### 2. Sets
Similar to lists, sets store multiple values in a single variable. However, a set is unordered, so its items cannot be indexed or reordered. A set also does not keep duplicate values. Items can still be added to or removed from a set.

#### 2.1 Defining a set

In [41]:
set_A = {'A', 'B' ,'C' ,'F' ,'G', 'X', 'Y'}
set_B = {'B', 'C', 'D', 'F'}

In [42]:
type(set_A), type(set_B)

(set, set)
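
Two small details about defining sets, shown as a sketch with made-up names: repeated values collapse into one, and empty curly braces create a dictionary rather than a set, so an empty set needs set().

```python
# Duplicates are dropped when the set is created
letters = {'A', 'B', 'A', 'C', 'B'}
print(len(letters))   # 3

# {} is an empty dict, set() is an empty set
empty_set = set()
empty_braces = {}
print(type(empty_set))
print(type(empty_braces))
```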

#### 2.2 Set operations
Because a set is unordered, it does not support indexing or slicing. Fortunately, sets still support several useful operations.

In [43]:
set_A[0] # We can't do this: a set has no positions to index

TypeError: 'set' object is not subscriptable
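
Even without indexing, we can still ask whether a value is in a set and loop over its items. The sketch below redefines set_A so it runs on its own, and uses sorted() only to get a predictable printing order.

```python
set_A = {'A', 'B', 'C', 'F', 'G', 'X', 'Y'}

# Membership tests work with the in operator
print('F' in set_A)
print('Q' in set_A)

# A set can be looped over; sorted() gives a stable order for display
for item in sorted(set_A):
    print(item)
```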

Get the number of items in the set

In [44]:
len(set_A)

7

Add/Remove item from a set

In [45]:
# Add
set_B.add('Z')
set_B

{'B', 'C', 'D', 'F', 'Z'}

In [46]:
# Remove
set_A.remove('Y')
set_A

{'A', 'B', 'C', 'F', 'G', 'X'}

In [47]:
len(set_A)

6
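
As a side note, remove() raises a KeyError when the item is not in the set, while discard() silently does nothing. The sketch below shows both, along with update() for adding several items at once. The set letters is made up for this example.

```python
letters = {'A', 'B', 'C'}

# discard() ignores a missing item
letters.discard('Z')

# remove() raises KeyError for a missing item
try:
    letters.remove('Z')
except KeyError as err:
    print('KeyError:', err)

# update() adds every item from an iterable; 'A' is already present so it is not duplicated
letters.update(['D', 'E', 'A'])
print(sorted(letters))
```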

Copy a set

In [48]:
set_A_cpy = set_A.copy()
set_B_cpy = set_B.copy()
print(set_A_cpy)
print(set_B_cpy)

{'X', 'B', 'A', 'F', 'G', 'C'}
{'Z', 'B', 'D', 'F', 'C'}


Union

In [49]:
set_merged = set_A_cpy.union(set_B_cpy)
set_merged

{'A', 'B', 'C', 'D', 'F', 'G', 'X', 'Z'}

Intersection: find common items

In [50]:
set_intersec = set_A_cpy.intersection(set_B_cpy)
set_intersec

{'B', 'C', 'F'}

Find the difference between two sets

In [51]:
set_diff = set_A_cpy.difference(set_B_cpy)
set_diff

{'A', 'G', 'X'}
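
The same operations can also be written with operators: | for union, & for intersection and - for difference. The sketch below redefines both sets so it runs on its own, and adds the symmetric difference (^), which is not covered above. Note that the difference depends on which set comes first.

```python
set_A = {'A', 'B', 'C', 'F', 'G', 'X'}
set_B = {'B', 'C', 'D', 'F', 'Z'}

# Operators give the same results as the methods
print(set_A | set_B == set_A.union(set_B))
print(set_A & set_B == set_A.intersection(set_B))
print(set_A - set_B == set_A.difference(set_B))

# Difference is not symmetric: B - A is not the same as A - B
print(sorted(set_B - set_A))

# Symmetric difference: items that are in exactly one of the two sets
sym_diff = set_A ^ set_B
print(sorted(sym_diff))
```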

#### 2.3 Converting list into set
There is a convenient way to do this with set(). The only limitations are that a set does not keep duplicate values from the list (if any) and does not preserve the original order.

In [52]:
product_orders = ['Computer', 'Computer', 'ipad' ,'Phone', 'Phone', 'ipad']
product_avail = ['Computer', 'Phone']

To convert, we can simply pass a list to set() like this:

In [53]:
orders_set = set(product_orders)
avail_set = set(product_avail)

In [54]:
print(orders_set)
print(avail_set)

{'Phone', 'Computer', 'ipad'}
{'Phone', 'Computer'}


Then, if we want to know which products are not available at the moment, we can take the difference between the two sets.

In [55]:
not_avail = orders_set.difference(avail_set)
not_avail

{'ipad'}
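
Converting to a set drops duplicates but also loses the order of the orders. If the order of first appearance matters, one possible approach is to combine a list with a helper set, as in this sketch. The names seen and unique_in_order are made up for this example.

```python
product_orders = ['Computer', 'Computer', 'ipad', 'Phone', 'Phone', 'ipad']

seen = set()            # products we have already met
unique_in_order = []    # unique products, in order of first appearance

for product in product_orders:
    if product not in seen:
        seen.add(product)
        unique_in_order.append(product)

print(unique_in_order)
```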

#### 2.4 Converting a set into a list
The conversion also works in the other direction with list(). Because a set is unordered, it has no sort() method, so converting to a list is one way to sort its items.

In [56]:
numbers = {1, 6566, 3424, 546, 453}

In [57]:
# convert to list
numbers_list = list(numbers)
numbers_list

[3424, 1, 546, 453, 6566]

In [58]:
numbers_list.sort()
numbers_list

[1, 453, 546, 3424, 6566]
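
As a shortcut, the built-in sorted() function accepts a set directly and returns a sorted list, so the conversion and the sorting happen in one step. A minimal sketch with the same numbers:

```python
numbers = {1, 6566, 3424, 546, 453}

# sorted() takes any iterable, including a set, and always returns a list
ascending = sorted(numbers)
descending = sorted(numbers, reverse=True)

print(ascending)
print(descending)
print(type(ascending))
```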