# Collections and Tuples

As part of this module we will get an overview of the collections and tuples that are built into Python.

* Overview of Collections and Tuples
* Tuples
* Collections - list
* Collections - set
* Collections - dict
* List of Tuples
* Using Data Structures

## Overview of Collections and Tuples
Let us understand the details of collections and tuples.
* A collection is typically used as a group of homogeneous elements, while a tuple is typically a group of heterogeneous elements. Python does not enforce this: a list can also hold elements of different types.
* A collection is like a spreadsheet or a table, while a tuple is like one row in it. We typically create a collection of objects or tuples.
* Python provides 3 built-in collection types.
  * list
  * set
  * dict
* Depending upon the characteristics of each collection type, we have different functions. We will see those details later.
* There are some functions which are applicable to all.
  * Getting a number of elements in a collection or a tuple - len
  * Getting the sum of all elements in a collection or a tuple of integers - sum
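As a minimal sketch of the common functions listed above, the snippet below calls `len` and `sum` on a list, a tuple, a set and a dict. The variable names and values are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical sample data used only for illustration
numbers_list = [10, 20, 30]
numbers_tuple = (10, 20, 30)
numbers_set = {10, 20, 30}
scores = {'a': 1, 'b': 2}

# len works on every collection type and on tuples
sizes = [len(numbers_list), len(numbers_tuple), len(numbers_set), len(scores)]

# sum needs numeric elements; for a dict we sum the values explicitly
totals = [sum(numbers_list), sum(numbers_tuple), sum(numbers_set), sum(scores.values())]

print(sizes)
print(totals)
```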


## Tuples
Now let us understand the definition and characteristics of a tuple.
* A tuple is like an object with unnamed attributes
* Values of attributes can be accessed only using positional notation
* It represents an individual row in a table or spreadsheet with multiple attributes
* We use () to represent tuples
* Tuples are immutable
* Very limited operations are available - e.g.: count, index
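The following sketch illustrates immutability and the limited operations mentioned above. The `order` tuple mirrors the sample data used in the tasks; `updated_order` is a hypothetical name introduced only for this example.

```python
order = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

# Item assignment is not allowed on a tuple
try:
    order[3] = 'COMPLETE'
except TypeError as err:
    error_message = str(err)

# To "change" a value we build a new tuple instead
updated_order = order[:3] + ('COMPLETE',)

# count is one of the few methods a tuple offers
closed_count = order.count('CLOSED')

print(error_message)
print(updated_order)
```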

### Tasks
Let us perform a few tasks related to tuples.

* Create 3 tuples with order_id, order_date, order_customer_id, order_status.

| order_id | order_date | order_customer_id | order_status |
| --- | --- | --- | --- |
| 1 | 2013-07-25 00:00:00.0 | 11599 | CLOSED |
| 2 | 2013-07-25 00:00:00.0 | 256 | PENDING_PAYMENT |
| 3 | 2013-07-25 00:00:00.0 | 12111 | COMPLETE |


In [4]:
order1 = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

In [5]:
order2 = (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')

In [6]:
order3 = (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

In [7]:
type(order1)

tuple

In [None]:
help(order1)

In [8]:
order3

(3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

In [9]:
order3.index?

Signature: order3.index(value, start=0, stop=9223372036854775807, /)
Docstring:
Return first index of value.

Raises ValueError if the value is not present.
Type:      builtin_function_or_method


In [10]:
order3.index(3) #index of value 3 in order3

0

In [11]:
order1[1]

'2013-07-25 00:00:00.0'

In [16]:
order1.index('2013-07-25 00:00:00.0') # pass the whole value to get its index

1

In [21]:
order3.index(12111) # 12111 is present in the tuple, so its index is returned

2

In [22]:
order1.index(11599) # pass the whole value to get its index

2

In [19]:
order2.index('PENDING') # a partial value does not match any element, so a ValueError is expected

ValueError: tuple.index(x): x not in tuple
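One way to avoid the `ValueError` is to test membership with `in` before calling `index`. The helper `position_of` below is a hypothetical name used only for this sketch; it is not a built-in function.

```python
order2 = (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')

def position_of(values, target):
    """Return the index of target in values, or -1 if it is absent."""
    return values.index(target) if target in values else -1

print(position_of(order2, 'PENDING'))          # partial value, not an element
print(position_of(order2, 'PENDING_PAYMENT'))  # exact value
```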

## Collections - list
Let us understand **list** in detail.
* Group of elements with index and length
* Elements can be added/inserted at a particular position
* We can access elements in a list by using an index in []
* There can be duplicates in a list
* APIs are available to add elements to the list, delete elements from the list and sort the list

### Tasks
Let us perform a few tasks to understand more about list operations.

* Create a list of employees. Make sure each item in the list is a tuple.

In [24]:
employees = [
    (1, 'Scott', 'Tiger', 1000.0, 'United States'),
    (2, "Henry", "Ford", 1250.0, "India"),
    (3, "Nick", "Junior", 750.0, "united KINGDOM"),
    (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
]

In [25]:
type(employees)

list

In [None]:
help(employees)

* Adding elements into list (append, insert)

In [27]:
employees.append?

Signature: employees.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method


In [28]:
employees.append((5, 'Donald', 'Duck', 1800.0, 'USA'))

In [29]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [16]:
employees.insert?

Signature: employees.insert(index, object, /)
Docstring: Insert object before index.
Type:      builtin_function_or_method


In [31]:
employees.insert(3, (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'))

In [32]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [33]:
employees[3]

(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')

* Deleting elements from list (pop, clear)

In [34]:
employees.pop?

Signature: employees.pop(index=-1, /)
Docstring:
Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.
Type:      builtin_function_or_method


In [35]:
employees.pop()

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [36]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA')]

In [37]:
employees.pop(3)

(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')
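The sketch below contrasts `pop` with `clear` on a small copy, so the `employees` list used in the notebook is left alone. The names `sample_employees` and `staff` are assumptions made for this illustration.

```python
sample_employees = [
    (1, 'Scott', 'Tiger', 1000.0, 'United States'),
    (2, 'Henry', 'Ford', 1250.0, 'India'),
]
staff = sample_employees.copy()  # work on a copy of the list

last = staff.pop()               # removes and returns the last element

try:
    staff.pop(10)                # there is no element at index 10
except IndexError as err:
    pop_error = str(err)

staff.clear()                    # removes every remaining element in place

print(last)
print(pop_error)
print(staff)
```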

In [38]:
employees.clear?

Signature: employees.clear()
Docstring: Remove all items from list.
Type:      builtin_function_or_method


* Checking how many times an element is repeated in list (count)

In [39]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA')]

In [59]:
l1 = [1, 'Hello']

In [62]:
type(l1)

list

In [60]:
type(l1[1])

str

In [61]:
type(l1[0])

int

In [41]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [42]:
type(l[0])

int

In [43]:
l.count?

Signature: l.count(value, /)
Docstring: Return number of occurrences of value.
Type:      builtin_function_or_method


In [44]:
l.count(4)

3

In [45]:
s = '1244573421'

In [46]:
s.count('4')

3

* Get the position of element (index)

In [47]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'United States'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA')]

In [48]:
employees.index((2, 'Henry', 'Ford', 1250.0, 'India'))

1

In [49]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [50]:
l.index?

Signature: l.index(value, start=0, stop=9223372036854775807, /)
Docstring:
Return first index of value.

Raises ValueError if the value is not present.
Type:      builtin_function_or_method


In [51]:
l.index(4)

2

In [52]:
l.index(4, 3)

3

In [53]:
l.index(4, 6)

7

In [54]:
l.index(4, 8)

ValueError: 4 is not in list

In [56]:
l.index(4, 5, 7) # the stop index is excluded, so only indexes 5 and 6 are searched

ValueError: 4 is not in list

In [58]:
l.index(4, 5, 8)

7
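Building on the `start` argument shown above, a minimal sketch for collecting every position of a value could look like this. The helper name `all_positions` is hypothetical.

```python
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

def all_positions(values, target):
    """Collect every index at which target occurs in values."""
    positions = []
    start = 0
    while True:
        try:
            found = values.index(target, start)
        except ValueError:
            break              # no further occurrence
        positions.append(found)
        start = found + 1      # continue searching after the match
    return positions

print(all_positions(l, 4))

# Searching with start and stop behaves like searching the slice l[5:8]
same_as_slice = l.index(4, 5, 8) == 5 + l[5:8].index(4)
```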

In [None]:
help(l)

* Accessing elements in a list using an index and a range of indexes (from the beginning). As a `str` is a sequence of characters, the same notation also works on strings.

In [53]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [63]:
l[0:3] # gives the sublist of elements at indexes 0, 1 and 2; the stop index 3 is excluded

[1, 2, 4]

In [64]:
l[:3] # same as above; the start index can be omitted because it defaults to 0

[1, 2, 4]

In [67]:
l[3:6] # starting from index 3, gives elements up to index 5 (stop - 1); the stop is excluded just as in l.index(4, 5, 7)

[4, 5, 7]

In [68]:
l[:6] # the start index is omitted, so the slice begins at index 0

[1, 2, 4, 4, 5, 7]

In [70]:
l[3:] # the stop index is omitted, so the slice runs through the last element

[4, 5, 7, 3, 4, 2, 1]
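The same index and range notation can be applied to a string. As a small sketch, the snippet below slices the order date string used earlier; the variable names are illustrative assumptions.

```python
order_date = '2013-07-25 00:00:00.0'

year = order_date[:4]        # characters at indexes 0 to 3
month = order_date[5:7]      # characters at indexes 5 and 6
day = order_date[8:10]       # characters at indexes 8 and 9
time_part = order_date[11:]  # everything from index 11 onwards

print(year, month, day, time_part)
```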

In [2]:
employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
            ]
employees.append((5, "Donald", "Duck", 1800.0, "USA"))
employees.insert(3, (6, "Mickey", "Mouse", 2000.0, "Disney Land"))

employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [72]:
employees[0]

(1, 'Scott', 'Tiger', 1000.0, 'united states')

In [73]:
employees[5]

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [74]:
employees[1:2]

[(2, 'Henry', 'Ford', 1250.0, 'India')]

In [75]:
employees[:3]

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]

In [76]:
employees[-3:]

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [77]:
employees[3:6]

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

* Accessing elements in list using index and range of index (from the end).

In [79]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [80]:
len(l)

10

In [81]:
l[-3:]

[4, 2, 1]

In [84]:
l[5: 8]

[7, 3, 4]

In [88]:
l[-5:-2] #get from the end

[7, 3, 4]
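A negative index counts back from the end, so `-k` refers to the same position as `len(l) - k`. The short sketch below checks that equivalence; it also shows that a slice whose stop lies beyond the end is simply cut off at the last element.

```python
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]
n = len(l)

last_matches = l[-1] == l[n - 1]
slice_matches = l[-5:-2] == l[n - 5:n - 2]

# A stop index beyond the end does not raise an error in a slice
clipped = l[8:100]

print(last_matches, slice_matches, clipped)
```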

* Sorting elements in the list (`sort` sorts the list in place; `sorted` returns a new sorted list and leaves the original unchanged)

In [89]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [90]:
l

[1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [91]:
l.sort?

Signature: l.sort(*, key=None, reverse=False)
Docstring: Stable sort *IN PLACE*.
Type:      builtin_function_or_method


In [95]:
l.sort() # *IN PLACE*: the list itself is updated with the sorted data

In [96]:
l

[1, 1, 2, 2, 3, 4, 4, 4, 5, 7]

In [92]:
l.sort(reverse=True)

In [90]:
l

[7, 5, 4, 4, 4, 3, 2, 2, 1, 1]

In [97]:
sorted?

Signature: sorted(iterable, /, *, key=None, reverse=False)
Docstring:
Return a new list containing all items from the iterable in ascending order.

A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
Type:      builtin_function_or_method


In [98]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [99]:
l

[1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [102]:
sorted(l) # returns a new list; it is not an in-place sort, so l is not updated

[1, 1, 2, 2, 3, 4, 4, 4, 5, 7]

In [103]:
l

[1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [104]:
sorted(l, reverse=True) # returns a new list in descending order; l is still not updated

[7, 5, 4, 4, 4, 3, 2, 2, 1, 1]

In [105]:
l

[1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [106]:
set(l)

{1, 2, 3, 4, 5, 7}

In [107]:
sorted(employees) # tuples are compared element by element, so this sorts by the first element (id) of each tuple

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [108]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [111]:
sorted(employees,key=lambda t: t[3]) #sort based on the value at index 3, that is, salary

[(3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [112]:
sorted(employees,key=lambda t: t[4]) #sort based on the value at index 4, that is, country

[(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states')]
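In the output above, 'USA' comes before 'united KINGDOM' because strings are compared by character code and uppercase letters sort before lowercase ones. A possible way to sort the countries without regard to case, sketched here as one option, is to lowercase the value inside the key function.

```python
employees = [
    (1, 'Scott', 'Tiger', 1000.0, 'united states'),
    (2, 'Henry', 'Ford', 1250.0, 'India'),
    (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
    (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
    (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
    (5, 'Donald', 'Duck', 1800.0, 'USA'),
]

# Compare lowercased country names so that case does not affect the order
by_country = sorted(employees, key=lambda t: t[4].lower())

for employee in by_country:
    print(employee)
```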

In [113]:
t = (1, 'Scott', 'Tiger', 1000.0, 'united states') #tuple being assigned to t

In [114]:
t[3]#t of 3 returns salary

1000.0

In [4]:
sorted(employees, key=lambda t: t[3], reverse=True)#sorting in descending order by salary

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]

In [5]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [6]:
employees.sort(key=lambda t: t[4]) # sorts in place, so the original list employees is updated

In [7]:
employees

[(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states')]

In [124]:
employees[1][2] # third value (index 2, the last name) of the tuple that is the second element of the list

'Mouse'

In [127]:
employees[1][3]

2000.0
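Besides positional access such as `employees[1][3]`, the fields of a tuple can be unpacked into named variables in one statement. The variable names below are assumptions chosen for readability.

```python
employee = (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')

# The number of names on the left must match the number of tuple elements
emp_id, first_name, last_name, salary, country = employee

full_name = first_name + ' ' + last_name
print(emp_id, full_name, salary, country)
```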

## Collections - set

Let us understand **set** in detail.
* Group of unique elements with no index and no guaranteed order
* Elements can be added, but not at a particular position
* We can check whether an element exists using the **in operator**
* There can be no duplicates in a set
* APIs are available to add elements to the set, delete elements from the set and perform set operations such as union, intersection etc.
* There is no API available in set to sort it. To sort the data, convert the set to a list or use the sorted function, which returns a list.

In [137]:
s = {1, 2, 2, 1, 2, 2, 3, 3, 3} # like a mathematical set, it keeps only 1, 2 and 3 however often they repeat; elements cannot be accessed by index

In [138]:
s

{1, 2, 3}

In [116]:
s[0]

TypeError: 'set' object is not subscriptable

### Exercises

We will see some basic set operations by using simple examples
* Create a set of 3 employees with ids 1, 2 and 3 using elements from **employees** list.

In [142]:
#to create an empty set do not use {} as these are default notation to create a dict

employees_set = {}
type(employees_set)

dict

In [141]:
#to create an empty set do not use {} as these are default notation to create a dict
#instead use a set constructor as below -

employees_set = set()
type(employees_set)

set

In [166]:
employees_set = {(1, "Scott", "Tiger", 1000.0, "united states"),
                 (2, "Henry", "Ford", 1250.0, "India"),
                 (3, "Nick", "Junior", 750.0, "united KINGDOM")
                }

In [167]:
type(employees_set)

set

In [168]:
employees_set?

Type:        set
String form: {(1, 'Scott', 'Tiger', 1000.0, 'united states'), (2, 'Henry', 'Ford', 1250.0, 'India'), (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')}
Length:      3
Docstring:  
set() -> new empty set object
set(iterable) -> new set object

Build an unordered collection of unique elements.


* Adding elements into set (add) - Add employees with ids 4, 5.

In [124]:
employees_set.add?

Docstring:
Add an element to a set.

This has no effect if the element is already present.
Type:      builtin_function_or_method


In [169]:
employees_set.add((4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'))

In [170]:
employees_set.add((5, 'Donald', 'Duck', 1800.0, 'USA'))

In [186]:
employees_set

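{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')}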

* Deleting elements from set (pop/remove, clear)

In [177]:
help(set.pop)

Help on method_descriptor:

pop(...)
    Remove and return an arbitrary set element.
    Raises KeyError if the set is empty.



In [153]:
employees_set.clear?

Docstring: Remove all elements from this set.
Type:      builtin_function_or_method


In [164]:
employees_set.clear # without parentheses this only references the method; the set is not cleared

<function set.clear>

In [185]:
employees_set.pop()

(1, 'Scott', 'Tiger', 1000.0, 'united states')

In [178]:
employees_set.remove?

Docstring:
Remove an element from a set; it must be a member.

If the element is not a member, raise a KeyError.
Type:      builtin_function_or_method


In [183]:
help(employees_set.remove)

Help on built-in function remove:

remove(...) method of builtins.set instance
    Remove an element from a set; it must be a member.
    
    If the element is not a member, raise a KeyError.



In [179]:
employees_set.remove((4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'))

In [136]:
employees_set.remove((5, 'Donald', 'Duck', 1800.0, 'USA'))
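As the docstring notes, `remove` raises a `KeyError` when the element is not a member. The built-in `discard` method is an alternative that does nothing in that case. The sketch below uses a separate set named `remaining` (a hypothetical name) to show the difference.

```python
remaining = {(2, 'Henry', 'Ford', 1250.0, 'India')}
missing = (5, 'Donald', 'Duck', 1800.0, 'USA')

remove_failed = False
try:
    remaining.remove(missing)   # not a member, so KeyError is raised
except KeyError:
    remove_failed = True

remaining.discard(missing)      # not a member, silently ignored

print(remove_failed, remaining)
```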

* Checking whether an element is present in a set using the `in` operator - check whether employees with ids 2 and 7 exist in the set.

In [None]:
# a set does not support the [] notation, as the error below shows
# instead we check whether an element is present in the set using the ** in operator ** as below

In [180]:
employees_set[(1, 'Scott', 'Tiger', 1000.0, 'united states')]

TypeError: 'set' object is not subscriptable

In [181]:
(2, 'Henry', 'Ford', 1250.0, 'India') in employees_set

True

In [182]:
(7, 'Bill', 'Gates', 5000.0, 'USA') in employees_set

False

* Set operations (union, intersection, difference etc) - Create a new set with **employee ids** 4, 5 and 6, then perform all 3 set operations on the set created in first step and this step.

In [190]:
employees_set1 = {(1, 'Scott', 'Tiger', 1000.0, 'united states'),
              (2,'Henry','Ford',1250.0,'India'),
              (3, "Nick", "Junior", 750.0, "united KINGDOM"),
              (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
              (5, 'Donald', 'Duck', 1800.0, 'USA'),
              (6,'Walt','Disney',900.0,'UK')}

In [191]:
employees_set1

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Walt', 'Disney', 900.0, 'UK')}

In [192]:
employees_set2 = {(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
              (5, 'Donald', 'Duck', 1800.0, 'USA'),
              (6,'Walt','Disney',900.0,'UK')}

In [193]:
employees_set2

{(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Walt', 'Disney', 900.0, 'UK')}

In [194]:
employees_set1.union?

Docstring:
Return the union of sets as a new set.

(i.e. all elements that are in either set.)
Type:      builtin_function_or_method


In [195]:
employees_set1.union(employees_set2)

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Walt', 'Disney', 900.0, 'UK')}

In [196]:
employees_set2.add((7,'Bill','Gates',5000.0,'USA'))

In [206]:
employees_set2

{(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Walt', 'Disney', 900.0, 'UK'),
 (7, 'Bill', 'Gates', 5000.0, 'USA')}

In [198]:
employees_set1.union(employees_set2)

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Walt', 'Disney', 900.0, 'UK'),
 (7, 'Bill', 'Gates', 5000.0, 'USA')}

In [208]:
employees_set2.issubset(employees_set1)

False

In [None]:
employees_set2.remove((7, 'Bill', 'Gates', 5000.0, 'USA'))

In [203]:
employees_set2.issubset(employees_set1)

True

In [207]:
employees_set2.intersection(employees_set1)

{(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Walt', 'Disney', 900.0, 'UK')}

In [205]:
employees_set2.add((7,'Bill','Gates',5000.0,'USA'))

In [209]:
employees_set1.difference(employees_set2) # set A - set B gives the elements of set A that are not in set B

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')}

In [211]:
employees_set2.difference(employees_set1) # set B - set A gives the elements of set B that are not in set A

{(7, 'Bill', 'Gates', 5000.0, 'USA')}
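The set methods used above also have operator forms. The sketch below uses two small sets of employee ids (illustrative values standing in for the full tuples) to show that `|`, `&` and `-` match `union`, `intersection` and `difference`, and adds `^` for the symmetric difference.

```python
ids_a = {1, 2, 3, 4, 5, 6}
ids_b = {4, 5, 6, 7}

union_ids = ids_a | ids_b         # same as ids_a.union(ids_b)
common_ids = ids_a & ids_b        # same as ids_a.intersection(ids_b)
only_in_a = ids_a - ids_b         # same as ids_a.difference(ids_b)
in_exactly_one = ids_a ^ ids_b    # same as ids_a.symmetric_difference(ids_b)

print(union_ids, common_ids, only_in_a, in_exactly_one)
```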

In [215]:
new = set(l) # create a set from a list by passing the list to the set constructor
# duplicate elements are eliminated on creation, but the order of the elements is not guaranteed

In [216]:
new

{1, 2, 3, 4, 5, 7}
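Because a set does not promise any order, two common follow-ups are sorting the unique values or keeping them in first-seen order. The sketch below shows one option for each; the second relies on dicts preserving insertion order, which holds for Python 3.7 and later. The name `numbers` is an assumption.

```python
numbers = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

# Unique values in ascending order: sorted returns a list
unique_sorted = sorted(set(numbers))

# Unique values in the order they first appear (Python 3.7+)
unique_in_order = list(dict.fromkeys(numbers))

print(unique_sorted)
print(unique_in_order)
```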

## Collections - dict
Let us understand **dict** in detail.
* Group of key value pairs
* Keys are unique
* Values need not be unique
* We can access values using keys
* APIs are available to add new key value pairs to a dict, update values based on keys in dict, extract keys as set from dict, extract values as list from dict, to check whether key exists in the dict etc

### Tasks
We will see some basic dict operations by using simple examples
* Adding elements to dict

In [222]:
db = {
    'host': 'dslab.itversity.com',
    'db_name': 'retail_db',
    'username': 'retail_fake',
    'username': 'retail_user',
    'username': 'user_retail',
    'password': 'itversity'
}

In [223]:
type(db)

dict

In [224]:
db # even though the username key was specified 3 times, only the last value was retained

{'host': 'dslab.itversity.com',
 'db_name': 'retail_db',
 'username': 'user_retail',
 'password': 'itversity'}
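Key value pairs can also be added or updated after the dict has been created. The sketch below uses a separate dict named `conn` so that `db` is left untouched; the `port` key and its value are hypothetical additions for illustration.

```python
conn = {'host': 'dslab.itversity.com', 'db_name': 'retail_db'}

conn['username'] = 'retail_user'   # new key: the pair is added
conn['username'] = 'user_retail'   # existing key: the value is replaced

# update adds or replaces several pairs at once
conn.update({'password': 'itversity', 'port': 3306})

print(conn)
```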

* Get all keys (keys)

In [225]:
db.keys()

dict_keys(['host', 'db_name', 'username', 'password'])

In [226]:
set(db.keys())

{'db_name', 'host', 'password', 'username'}

* Get all key value pairs (items)

In [227]:
db.items()#converts each key value pair into a tuple

dict_items([('host', 'dslab.itversity.com'), ('db_name', 'retail_db'), ('username', 'user_retail'), ('password', 'itversity')])

In [228]:
set(db.items())

{('db_name', 'retail_db'),
 ('host', 'dslab.itversity.com'),
 ('password', 'itversity'),
 ('username', 'user_retail')}

* Get only values (values)

In [229]:
db.values()

dict_values(['dslab.itversity.com', 'retail_db', 'user_retail', 'itversity'])

In [230]:
set(db.values()) # a set keeps only distinct values; here all values are already unique

{'dslab.itversity.com', 'itversity', 'retail_db', 'user_retail'}

In [231]:
#typically we create a list from dict values
list(db.values())

['dslab.itversity.com', 'retail_db', 'user_retail', 'itversity']

In [233]:
d = db.values()

In [None]:
# typing d. and pressing Tab exposes no list functions on the dict_values view, so we create a list instead

In [234]:
d = list(db.values())

In [None]:
d.insert # list functions are now exposed; d is a new list, so the underlying dict is not impacted

* Accessing values from dict

In [236]:
db['host']

'dslab.itversity.com'

In [164]:
db.get('host')

'dslab.itversity.com'

In [165]:
db['port']

KeyError: 'port'

In [166]:
db.get('port')

In [None]:
# the difference between [] and get in a dict: [] raises a KeyError if the key is not found, whereas get returns None
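`get` also accepts a second argument that is returned when the key is missing. In the sketch below, the dict name `settings` and the fallback values `3306` and `'localhost'` are assumptions used only for illustration.

```python
settings = {'host': 'dslab.itversity.com', 'db_name': 'retail_db'}

missing_port = settings.get('port')        # key absent, no default given
port = settings.get('port', 3306)          # key absent, default returned
host = settings.get('host', 'localhost')   # key present, default ignored

print(missing_port, port, host)
```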

In [237]:
'host' in db

True

In [238]:
'port' in db

False

In [242]:
'itversity' in db #searches in keys

False

In [243]:
'itversity' in db.values()

True

In [244]:
'itversity' in db.items()

False

In [245]:
'host' in db.items()

False
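Both checks against `db.items()` return False because each item is a (key, value) tuple, so a bare string never matches. A minimal sketch of a matching check, using a small stand-in dict named `sample_db`:

```python
sample_db = {'host': 'dslab.itversity.com', 'password': 'itversity'}

# Each element of items() is a (key, value) tuple
pair_found = ('host', 'dslab.itversity.com') in sample_db.items()
key_only = 'host' in sample_db.items()
wrong_value = ('host', 'itversity') in sample_db.items()

print(pair_found, key_only, wrong_value)
```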

* Removing elements from dict (clear, pop, popitem)

In [246]:
db.pop?

Docstring:
D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised
Type:      builtin_function_or_method


In [248]:
db.pop('password')

'itversity'

In [252]:
db

{'host': 'dslab.itversity.com', 'db_name': 'retail_db', 'username': 'user_retail'}

In [250]:
db.popitem?

Docstring:
D.popitem() -> (k, v), remove and return some (key, value) pair as a
2-tuple; but raise KeyError if D is empty.
Type:      builtin_function_or_method


In [251]:
db.popitem()

('username', 'user_retail')

## List of Tuples

We often create a collection (list) of tuples. Let us perform a few tasks related to a collection of tuples.
* Create 3 tuples with order_id, order_date, order_customer_id, order_status.

|order_id|order_date|order_customer_id|order_status|
|--------|----------|-----------------|------------|
|1|2013-07-25 00:00:00.0|11599|CLOSED|
|2|2013-07-25 00:00:00.0|256|PENDING_PAYMENT|
|3|2013-07-25 00:00:00.0|12111|COMPLETE|

* Create a list of the above 3 tuples named **orders**
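One possible solution to the task above is sketched here. It rebuilds the three tuples from the table and places them in a list; `customer_ids` is an extra, hypothetical variable that shows how a field can be collected from every tuple.

```python
order1 = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')
order2 = (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')
order3 = (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

orders = [order1, order2, order3]

# orders[1] is the second tuple; [3] picks its order_status
second_status = orders[1][3]

# Collect order_customer_id (index 2) from every tuple
customer_ids = [order[2] for order in orders]

print(len(orders), second_status, customer_ids)
```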

## Using Data Structures

Let us understand how to leverage the data structures for data processing.

* Read data from files using basic file I/O.

In [None]:
open?

In [253]:
orders_file = open('/Users/monikamendiratta/data/retail_db/orders/part-00000.csv')

In [254]:
type(orders_file)

_io.TextIOWrapper

In [255]:
orders_file.read?

Signature: orders_file.read(size=-1, /)
Docstring:
Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
Type:      builtin_function_or_method


In [256]:
orders_raw = orders_file.read()

In [257]:
type(orders_raw)

str

In [None]:
orders_raw
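The cells above leave the file open. A common alternative is the `with` statement, which closes the file automatically. So that the sketch runs anywhere, it first writes two sample records to a temporary file; the temporary path and the names `sample` and `sample_raw` are assumptions, not the notebook's actual data location.

```python
import os
import tempfile

sample = ('1,2013-07-25 00:00:00.0,11599,CLOSED\n'
          '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'part-00000.csv')

    # Create a small file to read back
    with open(path, 'w') as out_file:
        out_file.write(sample)

    # The file is closed as soon as the with block ends
    with open(path) as orders_file:
        sample_raw = orders_file.read()

    file_closed = orders_file.closed

print(type(sample_raw), file_closed)
```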

* Get data into collections.

In [260]:
orders_raw.split('\n')[:10]#list of strings

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [261]:
orders_raw.splitlines?

Signature: orders_raw.splitlines(keepends=False)
Docstring:
Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and
true.
Type:      builtin_function_or_method


In [262]:
orders_raw.splitlines()[:10]#list of strings

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
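The first ten elements look identical for both approaches, but the two methods differ when the text ends with a line break: `split('\n')` then produces a trailing empty string, while `splitlines()` does not. A small sketch with a hypothetical two-record string:

```python
raw = '1,CLOSED\n2,PENDING_PAYMENT\n'   # note the trailing line break

with_split = raw.split('\n')
with_splitlines = raw.splitlines()

print(with_split)
print(with_splitlines)
```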

* Convert each record into a tuple for better control.
* Process the data based on the problem statement using the APIs that are available on top of collections.

**We will understand these as part of subsequent modules.**

In [268]:
orders_tuple = tuple(orders_raw.splitlines()[:1]) # a 1-element tuple holding the whole first line as a single string

In [269]:
type(orders_tuple)

tuple

In [270]:
orders_tuple

('1,2013-07-25 00:00:00.0,11599,CLOSED',)
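The tuple above still holds the whole record as one string. To get one tuple element per field, each line can be split on the comma. The sketch below is one possible approach under the assumption that every record has exactly four comma-separated fields; `parse_order` and `sample_lines` are hypothetical names.

```python
sample_lines = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
]

def parse_order(line):
    """Split one record into a tuple, converting the ids to int."""
    order_id, order_date, customer_id, status = line.split(',')
    return (int(order_id), order_date, int(customer_id), status)

parsed_orders = [parse_order(line) for line in sample_lines]

for order in parsed_orders:
    print(order)
```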