# Overview of Collections - dict and tuple

* Overview of dict and tuple
* Common Operations
* Accessing Elements - tuples
* Accessing Elements - dict
* Manipulating dict
* Common Examples - dict
* List of tuples
* List of dicts

## Overview of dict and tuple

Having covered `list` and `set`, let us now get an overview of `dict` and `tuple`.
* `dict`
  * Group of heterogeneous elements.
  * Each element is a key-value pair.
  * All the keys in a `dict` are unique.
  * A `dict` can be created by enclosing elements in `{}`. The key and value in each element are separated by `:` - example `{1: 'a', 2: 'b', 3: 'c', 4: 'd'}`.
  * An empty `dict` can be initialized using `{}` or `dict()`.
* `tuple`
  * Group of heterogeneous elements.
  * We can access the elements in a `tuple` only by position (by using an index).
  * A `tuple` can be created by enclosing elements in `()` - example `(1, 2, 3, 4)`. An empty `tuple` can be initialized using `()` or `tuple()`.

In [1]:
t = (1, 'Scott', 'Tiger', 1000.0) # tuple

In [2]:
t

(1, 'Scott', 'Tiger', 1000.0)

In [3]:
t = ()

In [4]:
t

()

In [5]:
type(t)

tuple

In [6]:
t = tuple()

In [7]:
t

()

In [8]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0} # dict

In [9]:
d

{'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [10]:
type(d)

dict

In [11]:
d = dict() # Initializing empty dict

In [12]:
d

{}

In [13]:
d = {} # d will be of type dict

In [14]:
type(d)

dict
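
The point above that all keys in a `dict` are unique can be checked with a small sketch. The variable names `single` and `not_a_tuple` are only for illustration. It also shows one detail the examples above do not cover: a `tuple` with a single element needs a trailing comma.

```python
# Repeating a key in a dict literal keeps only the last value for that key.
d = {'id': 1, 'first_name': 'Scott', 'id': 2}
print(d)       # {'id': 2, 'first_name': 'Scott'}
print(len(d))  # 2

# A tuple with one element needs a trailing comma.
# Without the comma, the parentheses only group the expression.
single = (1,)
not_a_tuple = (1)
print(type(single))       # <class 'tuple'>
print(type(not_a_tuple))  # <class 'int'>
```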

## Common Operations

Some built-in functions and operators can be applied to all collections. Here we will see how they behave with `tuple` and `dict`.
* `in` - checks if an element exists in the `tuple` or if a key exists in the `dict`.
* `len` - gets the number of elements.
* `sorted` - sorts the data and returns a new `list` (the original collection is untouched). Typically, we assign the result to a new variable.
* `sum`, `min`, `max`, etc - aggregate operations. In case of `dict`, the operations are performed on the keys.
* There are more such functions.

In [23]:
t = (1, 2, 3, 4) # tuple

In [24]:
len(t)

4

In [25]:
sorted(t)

[1, 2, 3, 4]

In [26]:
sum(t)

10

In [27]:
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'} # dict

In [28]:
len(d)

4

In [29]:
sorted(d) # only sorts the keys

[1, 2, 3, 4]

In [147]:
sum(d) # applies only on keys

10
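
The cells above do not cover `in`, `min` and `max`. Here is a minimal sketch using the same `t` and `d`. Note that `in` checks only the keys of a `dict`; to look for a value we have to go through `values`.

```python
t = (1, 2, 3, 4)
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}

print(3 in t)             # True
print(3 in d)             # True, 3 is one of the keys
print('a' in d)           # False, 'a' is a value and not a key
print('a' in d.values())  # True

print(min(d), max(d))     # 1 4, computed on the keys

# sorted returns a new list and leaves the original untouched
values_desc = sorted(d.values(), reverse=True)
print(values_desc)        # ['d', 'c', 'b', 'a']
print(t)                  # (1, 2, 3, 4)
```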

## Accessing Elements - tuples

Let us see details related to operations on tuples. Unlike `list`, a `tuple` offers only a few methods.
* A `tuple` is immutable by definition, hence we cannot add elements to a tuple or delete elements from it.
* The only methods available are `count` and `index`.
* `count` gives the number of times an element occurs in a tuple.
* `index` returns the position of the first occurrence of an element in a tuple. It raises **ValueError** if the element is not present.

In [19]:
t = (1, 2, 3, 4, 4, 6, 1, 2, 3)

In [20]:
t.count?

Signature: t.count(value, /)
Docstring: Return number of occurrences of value.
Type:      builtin_function_or_method


In [21]:
t.count(4)

2

In [17]:
t.index?

Signature: t.index(value, start=0, stop=9223372036854775807, /)
Docstring:
Return first index of value.

Raises ValueError if the value is not present.
Type:      builtin_function_or_method


In [22]:
t.index(2)

1

In [26]:
t.index(2, 3)

7

In [28]:
t.index(2, 3, 5) #throws ValueError

ValueError: tuple.index(x): x not in tuple
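
Here is a small sketch of what immutability means in practice, along with one possible way to deal with the **ValueError** thrown by `index`. The helper `safe_index` is hypothetical and not part of Python; returning `-1` for a missing element is just a convention chosen for this example.

```python
t = (1, 2, 3, 4, 4, 6, 1, 2, 3)

# Positional access and slicing work, but assignment does not.
print(t[0], t[-1], t[1:3])  # 1 3 (2, 3)
try:
    t[0] = 10
except TypeError as e:
    immutable_error = str(e)
print(immutable_error)  # 'tuple' object does not support item assignment

# Hypothetical helper: returns -1 instead of raising ValueError
def safe_index(values, value, start=0, stop=None):
    stop = len(values) if stop is None else stop
    try:
        return values.index(value, start, stop)
    except ValueError:
        return -1

print(safe_index(t, 2, 3))     # 7
print(safe_index(t, 2, 3, 5))  # -1
```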

## Accessing Elements - dict

Let us see how we can access elements from the `dict`.
* We can access the value of a particular element in a `dict` by passing the key as `d[key]`. If the key does not exist, it will throw **KeyError**.
* `get` can also be used to access the value of a particular element in a `dict` by passing the key as an argument. However, if the key does not exist, it returns `None` instead of throwing an error.
* We can also pass a default value to `get`, which is returned when the key does not exist.
* We can get all the keys as a set-like view object by using `keys` and all the values as a view object by using `values`.
* We can also use `items` to get a set-like view of the key-value pairs in a `dict`. Each pair is a `tuple`.
* Let us see a few examples.

In [114]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [115]:
d['id']

1

In [116]:
d['first_name']

'Scott'

In [117]:
d['commission_pct'] # throws key error

KeyError: 'commission_pct'

In [118]:
d.get?

Signature: d.get(key, default=None, /)
Docstring: Return the value for key if key is in the dictionary, else default.
Type:      builtin_function_or_method


In [119]:
d.get('first_name')

'Scott'

In [120]:
d.get('commission_pct') # Returns None

In [121]:
d.get('first_name', 'Some First Name')

'Scott'

In [122]:
d.get('commission_pct', 0) 

0

In [123]:
d.keys?

Docstring: D.keys() -> a set-like object providing a view on D's keys
Type:      builtin_function_or_method


In [124]:
d.keys()

dict_keys(['id', 'first_name', 'last_name', 'amount'])

In [125]:
d.values?

Docstring: D.values() -> an object providing a view on D's values
Type:      builtin_function_or_method


In [126]:
d.values()

dict_values([1, 'Scott', 'Tiger', 1000.0])

In [127]:
d.items?

Docstring: D.items() -> a set-like object providing a view on D's items
Type:      builtin_function_or_method


In [128]:
d.items()

dict_items([('id', 1), ('first_name', 'Scott'), ('last_name', 'Tiger'), ('amount', 1000.0)])

In [130]:
list(d.items())[0]

('id', 1)

In [131]:
list(d.items())[1]

('first_name', 'Scott')

In [132]:
type(list(d.items())[1])

tuple
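
Since each item is a `tuple`, we can unpack it directly while looping. The sketch below also illustrates what "set-like view" means: the object returned by `keys` supports set operations and reflects later changes to the `dict`. The names `common` and `keys` are only for illustration.

```python
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

# Unpack each (key, value) tuple while iterating
for key, value in d.items():
    print(f'{key}: {value}')

# Set operations work on the keys view. The result is a set, so we sort it for display.
common = sorted(d.keys() & {'id', 'amount', 'commission_pct'})
print(common)  # ['amount', 'id']

# The view is live: keys added later are visible through it
keys = d.keys()
d['commission_pct'] = 10
print('commission_pct' in keys)  # True
```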

## Manipulating dict

Let us understand how we can manipulate a `dict`.
* We can add new key-value pairs to a `dict` by using typical assignment.
* We can also use assignment to update the value of an existing key in the `dict`.
* `setdefault` can be used to get the value from the `dict` by using a key. If the key does not exist, it inserts the key with the default value passed (`None` if no default is passed) and returns that value. If the key already exists, the `dict` is left unchanged.
* `update` can be used to merge a list of pairs (2-tuples) or another `dict` into the `dict`.
* Elements can be removed from the `dict` using methods like `pop` and `popitem`.
  * `pop` removes the element with the given key and returns its value.
  * `popitem` removes and returns one item from the `dict`. From Python 3.7 onwards, it is the most recently inserted item.

In [82]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [83]:
d['commission_pct'] = 10 # Adding Element

In [84]:
d['phoneNumbers'] = 1234567890

In [85]:
d

{'id': 1,
 'first_name': 'Scott',
 'last_name': 'Tiger',
 'amount': 1000.0,
 'commission_pct': 10,
 'phoneNumbers': 1234567890}

In [86]:
d['amount'] = 1500.0

In [87]:
d

{'id': 1,
 'first_name': 'Scott',
 'last_name': 'Tiger',
 'amount': 1500.0,
 'commission_pct': 10,
 'phoneNumbers': 1234567890}

In [107]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [102]:
d.setdefault?

Signature: d.setdefault(key, default=None, /)
Docstring:
Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.
Type:      builtin_function_or_method


In [110]:
d.setdefault('amount')

1000.0

In [111]:
d.setdefault('commission_pct')

In [112]:
d

{'id': 1,
 'first_name': 'Scott',
 'last_name': 'Tiger',
 'amount': 1000.0,
 'commission_pct': None}

In [113]:
del d['commission_pct'] # setdefault never overwrites an existing key, so remove it first
d.setdefault('commission_pct', 0)

0

In [106]:
d

{'id': 1,
 'first_name': 'Scott',
 'last_name': 'Tiger',
 'amount': 1000.0,
 'commission_pct': 0}
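
A common use of `setdefault` is grouping values under a key. The sketch below uses made-up `(order_id, status)` pairs and a hypothetical name `orders_by_status`. It also confirms that `setdefault` leaves an existing key untouched, even when its value is `None`.

```python
# Sample (order_id, status) pairs, made up for illustration
orders = [(1, 'CLOSED'), (2, 'PENDING_PAYMENT'), (3, 'COMPLETE'), (4, 'CLOSED')]

orders_by_status = {}
for order_id, status in orders:
    # Insert an empty list the first time a status is seen, then append to it
    orders_by_status.setdefault(status, []).append(order_id)

print(orders_by_status)
# {'CLOSED': [1, 4], 'PENDING_PAYMENT': [2], 'COMPLETE': [3]}

# An existing key is never overwritten
d = {'commission_pct': None}
print(d.setdefault('commission_pct', 0))  # None
print(d)                                  # {'commission_pct': None}
```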

In [96]:
d.update?

Docstring:
D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
In either case, this is followed by: for k in F:  D[k] = F[k]
Type:      builtin_function_or_method


In [97]:
d = {'id': 1}

In [98]:
d.update({'first_name': 'Donald', 'last_name': 'Duck'})

In [99]:
d

{'id': 1, 'first_name': 'Donald', 'last_name': 'Duck'}

In [100]:
d.update([('amount', 1000.0), ('commission_pct', 10)])

In [101]:
d

{'id': 1,
 'first_name': 'Donald',
 'last_name': 'Duck',
 'amount': 1000.0,
 'commission_pct': 10}
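
As the docstring indicates, `update` also accepts keyword arguments, and it overwrites the values of keys that already exist. Here is a minimal sketch. The name `merged` is only for illustration, and the `{**d, ...}` form is shown as one way to build a new `dict` without changing the original.

```python
d = {'id': 1, 'first_name': 'Donald', 'amount': 1000.0}

# Existing key amount is overwritten, new key last_name is added
d.update({'amount': 1500.0, 'last_name': 'Duck'})

# Keyword arguments are treated as key-value pairs with str keys
d.update(commission_pct=10)
print(d)
# {'id': 1, 'first_name': 'Donald', 'amount': 1500.0, 'last_name': 'Duck', 'commission_pct': 10}

# Build a new dict with one value replaced, leaving d as it is
merged = {**d, 'id': 2}
print(merged['id'], d['id'])  # 2 1
```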

In [82]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [83]:
d['commission_pct'] = 10 # Adding Element

In [84]:
d['phoneNumbers'] = 1234567890

In [86]:
d['amount'] = 1500.0 # Updating existing element

In [88]:
d.pop('phoneNumbers')

1234567890

In [89]:
d

{'id': 1,
 'first_name': 'Scott',
 'last_name': 'Tiger',
 'amount': 1500.0,
 'commission_pct': 10}

In [90]:
d.pop('phoneNumbers') # throws KeyError

KeyError: 'phoneNumbers'

In [91]:
d.pop('phoneNumbers', 'No such key exists')

'No such key exists'

In [92]:
d

{'id': 1,
 'first_name': 'Scott',
 'last_name': 'Tiger',
 'amount': 1500.0,
 'commission_pct': 10}

In [93]:
d.popitem?

Docstring:
D.popitem() -> (k, v), remove and return some (key, value) pair as a
2-tuple; but raise KeyError if D is empty.
Type:      builtin_function_or_method


In [94]:
d.popitem()

('commission_pct', 10)

In [95]:
d

{'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1500.0}
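
Here is a short sketch of the removal order of `popitem`, assuming Python 3.7 or later where items come out in last-in, first-out order. It also shows `del` as another way to remove a key, and what happens when `popitem` is called on an empty `dict`. The flag `empty_error_raised` is only for illustration.

```python
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger'}

# Items are removed starting from the most recently inserted one
print(d.popitem())  # ('last_name', 'Tiger')
print(d.popitem())  # ('first_name', 'Scott')

# del removes a key without returning its value
del d['id']
print(d)            # {}

# popitem on an empty dict raises KeyError
empty_error_raised = False
try:
    d.popitem()
except KeyError:
    empty_error_raised = True
print(empty_error_raised)  # True
```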

## Common Examples - dict

Let us see some common examples of creating a `dict`. If you are familiar with JSON, a `dict` is similar to a JSON object.
* A `dict` can have key-value pairs where the key is of any hashable type (such as `int` or `str`) and the value is of any type.
* However, typically we use attribute names as keys for a `dict`. They are typically of type `str`.
* The value can be of simple types such as `int`, `float`, `str` etc or it can be an object of some custom type.
* The value can also be of type `list` or a nested `dict`.
* An individual might have multiple phone numbers and hence we can define them as a `list`.
* An individual's address might have street, city, state and zip and hence we can define it as a nested `dict`.
* Let us see some examples.

In [133]:
# All attribute names are of type str and values are of type int, str or float
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [134]:
for key in d.keys():
    print(f'type of attribute name {key} is {type(key)}')

type of attribute name id is <class 'str'>
type of attribute name first_name is <class 'str'>
type of attribute name last_name is <class 'str'>
type of attribute name amount is <class 'str'>


In [137]:
for value in d.values():
    print(f'type of value {value} is {type(value)}')

type of value 1 is <class 'int'>
type of value Scott is <class 'str'>
type of value Tiger is <class 'str'>
type of value 1000.0 is <class 'float'>


In [138]:
# phone_numbers is of type list
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0, 'phone_numbers': [1234567890, 2345679180]}

In [139]:
for value in d.values():
    print(f'type of value {value} is {type(value)}')

type of value 1 is <class 'int'>
type of value Scott is <class 'str'>
type of value Tiger is <class 'str'>
type of value 1000.0 is <class 'float'>
type of value [1234567890, 2345679180] is <class 'list'>


In [140]:
d = {
    'id': 1, 
    'first_name': 'Scott', 
    'last_name': 'Tiger', 
    'amount': 1000.0, 
    'phone_numbers': [1234567890, 2345679180],
    'address': {'street': '1234 ABC Towers', 'city': 'Round Rock', 'state': 'Texas', 'zip': 78664}
}

In [141]:
d['address']

{'street': '1234 ABC Towers',
 'city': 'Round Rock',
 'state': 'Texas',
 'zip': 78664}

In [142]:
type(d['address'])

dict

In [143]:
for value in d.values():
    print(f'type of value {value} is {type(value)}')

type of value 1 is <class 'int'>
type of value Scott is <class 'str'>
type of value Tiger is <class 'str'>
type of value 1000.0 is <class 'float'>
type of value [1234567890, 2345679180] is <class 'list'>
type of value {'street': '1234 ABC Towers', 'city': 'Round Rock', 'state': 'Texas', 'zip': 78664} is <class 'dict'>
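
To read a value inside a nested `dict` or `list`, we chain the lookups. The sketch below also uses the standard `json` module to illustrate the similarity with JSON mentioned earlier. The round trip works here because all the keys are of type `str`; JSON object keys are always strings, so keys of other types would not come back unchanged.

```python
import json

d = {
    'id': 1,
    'first_name': 'Scott',
    'last_name': 'Tiger',
    'amount': 1000.0,
    'phone_numbers': [1234567890, 2345679180],
    'address': {'street': '1234 ABC Towers', 'city': 'Round Rock', 'state': 'Texas', 'zip': 78664}
}

# Chain the lookups to reach nested values
print(d['address']['city'])   # Round Rock
print(d['phone_numbers'][0])  # 1234567890

# get with a default avoids KeyError at each level
print(d.get('address', {}).get('country', 'unknown'))  # unknown

# Convert to a JSON string and back
as_json = json.dumps(d)
print(type(as_json))             # <class 'str'>
print(json.loads(as_json) == d)  # True
```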


## List of tuples
Let us see an example of how we can read data from a file into a **list of tuples**.
* When we read data from a file into a `list`, typically each element in the `list` will be of type `bytes` or `str`.
* We can convert each element into a `tuple` to simplify the processing.
* Once each element is converted to a `tuple`, we can access the fields in the `tuple` by position.
* Let us see an example that reads the data from a file into a **list of tuples** and accesses the order dates.

In [151]:
# Reading data from file into a list
path = '/Users/itversity/Research/data/retail_db/orders/part-00000'
# C:\\users\\itversity\\Research
orders_file = open(path)

In [152]:
orders_raw = orders_file.read()
orders_file.close() # Close the file once its contents are read

In [153]:
orders = orders_raw.splitlines()

In [154]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [155]:
len(orders) # same as number of records in the file

68883

In [200]:
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'

In [201]:
order.split(',')

['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']

In [203]:
tuple(order.split(','))

('1', '2013-07-25 00:00:00.0', '11599', 'CLOSED')

In [202]:
(*order.split(','),) # the * operator unpacks the elements of the list into a new tuple

('1', '2013-07-25 00:00:00.0', '11599', 'CLOSED')

In [157]:
order_tuples = [(*order.split(','),) for order in orders] 

In [158]:
order_tuples[0]

('1', '2013-07-25 00:00:00.0', '11599', 'CLOSED')

In [163]:
order_tuples[:3]

[('1', '2013-07-25 00:00:00.0', '11599', 'CLOSED'),
 ('2', '2013-07-25 00:00:00.0', '256', 'PENDING_PAYMENT'),
 ('3', '2013-07-25 00:00:00.0', '12111', 'COMPLETE')]

In [165]:
len(order_tuples)

68883

In [161]:
order_dates = [order[1] for order in order_tuples]

In [164]:
order_dates[:3]

['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0']

In [166]:
len(order_dates)

68883

In [167]:
# We can also change the data types of elements in the tuples
def get_order_details(order):
    order_details = order.split(',')
    return (int(order_details[0]), order_details[1], int(order_details[2]), order_details[3])

In [168]:
order_tuples = [get_order_details(order) for order in orders]

In [169]:
order_tuples[:3]

[(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
 (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
 (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')]

In [170]:
order_customer_ids = [order[2] for order in order_tuples]

In [171]:
order_customer_ids[:3]

[11599, 256, 12111]

In [172]:
type(order_customer_ids[0])

int
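
If the file is not available, the same steps can be tried on a few records held in memory. The sketch below uses the first four lines shown earlier as sample data and uses tuple unpacking instead of indexes, which some find easier to read. The names `closed_ids` and `by_customer` are only for illustration.

```python
# First four records from the output above, used as in-memory sample data
orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
    '4,2013-07-25 00:00:00.0,8827,CLOSED',
]

def get_order_details(order):
    # Unpack the four fields by position and convert the ids to int
    order_id, order_date, customer_id, status = order.split(',')
    return (int(order_id), order_date, int(customer_id), status)

order_tuples = [get_order_details(order) for order in orders]

# Unpack each tuple in the comprehension; _ marks fields we do not need
closed_ids = [order_id for order_id, _, _, status in order_tuples if status == 'CLOSED']
print(closed_ids)      # [1, 4]

# Sort by customer id, which is at position 2
by_customer = sorted(order_tuples, key=lambda order: order[2])
print(by_customer[0])  # (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')
```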

## List of dicts
Let us see an example of how we can read data from a file into a **list of dicts**.
* When we read data from a file into a `list`, typically each element in the `list` will be of type `bytes` or `str`.
* We can convert each element into a `dict` to simplify the processing.
* Once each element is converted to a `dict`, we can access the fields in the `dict` using attribute names.
* Let us see an example that reads the data from a file into a **list of dicts** and accesses the order dates.

In [151]:
# Reading data from file into a list
path = '/Users/itversity/Research/data/retail_db/orders/part-00000'
# C:\\users\\itversity\\Research
orders_file = open(path)

In [152]:
orders_raw = orders_file.read()
orders_file.close() # Close the file once its contents are read

In [153]:
orders = orders_raw.splitlines()

In [154]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [155]:
len(orders) # same as number of records in the file

68883

In [186]:
def get_order_dict(order):
    order_details = order.split(',')
    order_dict = {
        'order_id': int(order_details[0]),
        'order_date': order_details[1],
        'order_customer_id': int(order_details[2]),
        'order_status': order_details[3],
    }
    return order_dict

In [187]:
order_dicts = [get_order_dict(order) for order in orders]

In [188]:
order_dicts[0]

{'order_id': 1,
 'order_date': '2013-07-25 00:00:00.0',
 'order_customer_id': 11599,
 'order_status': 'CLOSED'}

In [189]:
order_dicts[:3]

[{'order_id': 1,
  'order_date': '2013-07-25 00:00:00.0',
  'order_customer_id': 11599,
  'order_status': 'CLOSED'},
 {'order_id': 2,
  'order_date': '2013-07-25 00:00:00.0',
  'order_customer_id': 256,
  'order_status': 'PENDING_PAYMENT'},
 {'order_id': 3,
  'order_date': '2013-07-25 00:00:00.0',
  'order_customer_id': 12111,
  'order_status': 'COMPLETE'}]

In [192]:
len(order_dicts)

68883

In [194]:
order_dates = [order['order_date'] for order in order_dicts]

In [195]:
order_dates[:3]

['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0']

In [196]:
len(order_dates)

68883

In [197]:
order_customer_ids = [order['order_customer_id'] for order in order_dicts]

In [198]:
order_customer_ids[:3]

[11599, 256, 12111]

In [199]:
type(order_customer_ids[0])

int
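
Once the records are in a **list of dicts**, we can filter and aggregate them by attribute name. The sketch below works on four sample records held in memory and builds each `dict` with `zip`, which is an alternative to spelling out every key by hand. The names `field_names` and `status_counts` are only for illustration.

```python
# First four records from the output above, used as in-memory sample data
orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
    '4,2013-07-25 00:00:00.0,8827,CLOSED',
]

field_names = ('order_id', 'order_date', 'order_customer_id', 'order_status')

def get_order_dict(order):
    # Pair each field name with the value at the same position
    order_dict = dict(zip(field_names, order.split(',')))
    order_dict['order_id'] = int(order_dict['order_id'])
    order_dict['order_customer_id'] = int(order_dict['order_customer_id'])
    return order_dict

order_dicts = [get_order_dict(order) for order in orders]
print(order_dicts[0]['order_status'])  # CLOSED

# Count orders per status using get with a default of 0
status_counts = {}
for order in order_dicts:
    status = order['order_status']
    status_counts[status] = status_counts.get(status, 0) + 1
print(status_counts)  # {'CLOSED': 2, 'PENDING_PAYMENT': 1, 'COMPLETE': 1}
```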