# Overview of Collections - dict and tuple
Let us get an overview of dict and tuple as part of the Python Collections.

* Overview of dict and tuple
* Common Operations
* Accessing Elements - tuples
* Accessing Elements - dict
* Manipulating dict
* Common Examples - dict
* List of tuples
* List of dicts

## Overview of dict and tuple

Now that we have covered `list` and `set`, let us get an overview of `dict` and `tuple`.
* `dict`
  * Group of heterogeneous elements.
  * Each element is a key-value pair.
  * All the keys in a `dict` are unique.
  * A `dict` can be created by enclosing elements in `{}`. The key and value in each element are separated by `:` - example `{1: 'a', 2: 'b', 3: 'c', 4: 'd'}`.
  * An empty `dict` can be initialized using `{}` or `dict()`.
* `tuple`
  * Group of heterogeneous elements.
  * We can access the elements in a `tuple` only by positional notation (by using an index).
  * A `tuple` can be created by enclosing elements in `()` - example `(1, 2, 3, 4)`. An empty `tuple` can be initialized using `()` or `tuple()`.

In [None]:
t = (1, 'Scott', 'Tiger', 1000.0) # tuple

In [None]:
t

In [None]:
t = ()

In [None]:
t

In [None]:
type(t)

In [None]:
t = tuple()

In [None]:
t

In [None]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0} # dict

In [None]:
d

In [None]:
type(d)

In [None]:
d = dict() # Initializing empty dict

In [None]:
d

In [None]:
d = {} # d will be of type dict

In [None]:
type(d)
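
Two details related to the points above are easy to miss: a one-element `tuple` needs a trailing comma, and repeating a key in a `dict` literal keeps only the last value. Here is a minimal sketch; the variable names `not_a_tuple` and `single` are illustrative and not part of the original notebook.

```python
# A one-element tuple needs a trailing comma; parentheses alone do not create a tuple
not_a_tuple = (1)
single = (1,)
print(type(not_a_tuple))  # <class 'int'>
print(type(single))       # <class 'tuple'>

# Keys are unique: when a key is repeated, the last value wins
d = {'id': 1, 'id': 2}
print(d)       # {'id': 2}
print(len(d))  # 1
```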

## Common Operations

Some functions can be applied to all collections. Here we will see how they behave with `tuple` and `dict`.
* `in` - checks if an element exists in the `tuple` or if a key exists in the `dict`.
* `len` - gets the number of elements.
* `sorted` - sorts the data and returns a new `list` (the original collection is untouched). Typically, we assign the result of sorting to a new variable.
* `sum`, `min`, `max`, etc - aggregate operations. In case of `dict`, the operations are performed on the keys.
* Other built-in functions that accept a collection behave in a similar way.

In [None]:
t = (1, 2, 3, 4) # tuple

In [None]:
len(t)

In [None]:
sorted(t)

In [None]:
sum(t)

In [None]:
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'} # dict

In [None]:
len(d)

In [None]:
sorted(d) # only sorts the keys

In [None]:
sum(d) # applies only on keys
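
The cells above do not cover `in`, `min` and `max`. The following sketch shows how they behave with `tuple` and `dict`, reusing the same sample data; `t_desc` is an illustrative name.

```python
t = (1, 2, 3, 4)
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}

# in checks elements for a tuple and keys for a dict
print(3 in t)             # True
print(3 in d)             # True, 3 is a key
print('a' in d)           # False, 'a' is a value and not a key
print('a' in d.values())  # True

# min and max on a dict are also applied on the keys
print(min(t), max(t))  # 1 4
print(min(d), max(d))  # 1 4

# sorted returns a new list and leaves the original tuple untouched
t_desc = sorted(t, reverse=True)
print(t_desc)  # [4, 3, 2, 1]
print(t)       # (1, 2, 3, 4)
```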

## Accessing Elements - tuples

Let us see details related to operations on tuples. Unlike `list`, a `tuple` supports only a limited set of methods.
* A `tuple` is immutable by definition, so we cannot add elements to it or delete elements from it.
* The only methods available are `count` and `index`.
* `count` gives the number of times an element occurs in a tuple.
* `index` returns the position of the first occurrence of an element in a tuple. It also accepts optional start and end positions and raises **ValueError** if the element is not found.

In [None]:
t = (1, 2, 3, 4, 4, 6, 1, 2, 3)

In [None]:
t.count?

In [None]:
t.count(4)

In [None]:
t.index?

In [None]:
t.index(2)

In [None]:
t.index(2, 3)

In [None]:
t.index(2, 3, 5) # throws ValueError as 2 does not occur at positions 3 and 4 (the end position is excluded)
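
To make the immutability and the **ValueError** behaviour concrete, here is a small sketch using the same tuple. Using `-1` as a "not found" marker is only an assumption made for this illustration.

```python
t = (1, 2, 3, 4, 4, 6, 1, 2, 3)

# Tuples are immutable: item assignment raises TypeError
try:
    t[0] = 100
except TypeError as e:
    print(f'TypeError: {e}')

# index raises ValueError when the element is not found in the given range
try:
    position = t.index(2, 3, 5)
except ValueError:
    position = -1  # hypothetical marker for "not found"
print(position)  # -1

# Positional access and slicing work the same way as with list
print(t[0])    # 1
print(t[-1])   # 3
print(t[2:5])  # (3, 4, 4)
```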

## Accessing Elements - dict

Let us see how we can access elements from the `dict`.
* We can access the value of a particular element in a `dict` by passing the key as `d[key]`. If the key does not exist, it will throw **KeyError**.
* `get` can also be used to access the value of a particular element in a `dict` by passing the key as an argument. However, if the key does not exist, it will return `None`.
* We can also pass a default value to `get`, which is returned when the key does not exist.
* We can get all the keys in the form of a set-like object by using `keys` and all the values in the form of a view object by using `values`. A view can be converted to a `list` when we need to access elements by position.
* We can also use `items` to get the contents of a `dict` as a set-like object of key-value pairs. Each pair will be a `tuple`.
* Let us see a few examples.

In [None]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [None]:
d['id']

In [None]:
d['first_name']

In [None]:
d['commission_pct'] # throws KeyError

In [None]:
d.get?

In [None]:
d.get('first_name')

In [None]:
d.get('commission_pct') # Returns None

In [None]:
d.get('first_name', 'Some First Name')

In [None]:
d.get('commission_pct', 0) 

In [None]:
d.keys?

In [None]:
d.keys()

In [None]:
d.values?

In [None]:
d.values()

In [None]:
d.items?

In [None]:
d.items()

In [None]:
list(d.items())[0]

In [None]:
list(d.items())[1]

In [None]:
type(list(d.items())[1])
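
Since `items` yields tuples and `keys` is set-like, both combine naturally with loops and set operations. A minimal sketch follows; `common_keys` and the set of keys being compared are illustrative.

```python
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

# items() yields (key, value) tuples, which can be unpacked in a for loop
for key, value in d.items():
    print(f'{key}: {value}')

# keys() is set-like, so set operations such as & work on it
common_keys = d.keys() & {'id', 'amount', 'commission_pct'}
print(sorted(common_keys))  # ['amount', 'id']

# Checking for a key with in avoids KeyError
commission_pct = d['commission_pct'] if 'commission_pct' in d else 0
print(commission_pct)  # 0
```

The last statement gives the same result as `d.get('commission_pct', 0)`.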

## Manipulating dict

Let us understand how we can manipulate a `dict`.
* We can add new key-value pairs to a `dict` by using typical assignment.
* We can also use assignment to update the value of an existing key in the `dict`.
* `setdefault` returns the value for a key if the key exists. If the key does not exist, it adds the key to the `dict` with the default value passed (`None` if no default is passed) and returns that default.
* `update` can be used to merge a list of pairs (2-tuples) or another `dict` into the `dict`.
* Elements can be removed from the `dict` using methods like `pop` and `popitem`.
  * `pop` removes the element for the given key and returns its value. It throws **KeyError** if the key does not exist and no default is passed.
  * `popitem` removes and returns one item from the `dict` (the last inserted pair as of Python 3.7).

In [None]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [None]:
d['commission_pct'] = 10 # Adding Element

In [None]:
d['phoneNumbers'] = 1234567890

In [None]:
d

In [None]:
d['amount'] = 1500.0

In [None]:
d

In [None]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [None]:
d.setdefault?

In [None]:
d.setdefault('amount')

In [None]:
d.setdefault('commission_pct')

In [None]:
d

In [None]:
d.setdefault('commission_pct', 0) # key already exists with value None, so the default 0 is not applied

In [None]:
d

In [None]:
d.update?

In [None]:
d = {'id': 1}

In [None]:
d.update({'first_name': 'Donald', 'last_name': 'Duck'})

In [None]:
d

In [None]:
d.update([('amount', 1000.0), ('commission_pct', 10)])

In [None]:
d

In [None]:
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [None]:
d['commission_pct'] = 10 # Adding Element

In [None]:
d['phoneNumbers'] = 1234567890

In [None]:
d.pop('phoneNumbers')

In [None]:
d

In [None]:
d.pop('phoneNumbers') # throws KeyError

In [None]:
d.pop('phoneNumbers', 'No such key exists')

In [None]:
d

In [None]:
d.popitem?

In [None]:
d.popitem()

In [None]:
d
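
The difference between an existing key and a missing key in `setdefault`, and the order in which `popitem` removes items, can be summarized in one short sketch. The smaller `dict` and the names `last_item` and `removed` are chosen only for illustration.

```python
d = {'id': 1, 'first_name': 'Scott'}

# setdefault returns the existing value and leaves the dict unchanged when the key exists
print(d.setdefault('first_name', 'Donald'))  # Scott

# When the key is missing, setdefault inserts it with the default and returns the default
print(d.setdefault('commission_pct', 0))  # 0
print(d)  # {'id': 1, 'first_name': 'Scott', 'commission_pct': 0}

# popitem removes and returns the most recently inserted pair (Python 3.7 or later)
last_item = d.popitem()
print(last_item)  # ('commission_pct', 0)

# pop with a default does not throw KeyError for a missing key
removed = d.pop('phoneNumbers', None)
print(removed)  # None
print(d)  # {'id': 1, 'first_name': 'Scott'}
```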

## Common Examples - dict

Let us see some common examples of creating a `dict`. If you are familiar with JSON, a `dict` is similar to a JSON object.
* A `dict` can have key-value pairs where the key is of any hashable type (such as `int` or `str`) and the value is of any type.
* However, we typically use attribute names as keys for a `dict`. They are typically of type `str`.
* The value can be of a simple type such as `int`, `float`, `str` etc or it can be an object of some custom type.
* The value can also be of type `list` or a nested `dict`.
* An individual might have multiple phone numbers and hence we can define them as a `list`.
* An individual's address might have street, city, state and zip and hence we can define it as a nested `dict`.
* Let us see some examples.

In [None]:
# All attribute names are of type str and values are of type int, str or float
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0}

In [None]:
for key in d.keys():
    print(f'type of attribute name {key} is {type(key)}')

In [None]:
for value in d.values():
    print(f'type of value {value} is {type(value)}')

In [None]:
# phone_numbers is of type list
d = {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger', 'amount': 1000.0, 'phone_numbers': [1234567890, 2345679180]}

In [None]:
for value in d.values():
    print(f'type of value {value} is {type(value)}')

In [None]:
d = {
    'id': 1, 
    'first_name': 'Scott', 
    'last_name': 'Tiger', 
    'amount': 1000.0, 
    'phone_numbers': [1234567890, 2345679180],
    'address': {'street': '1234 ABC Towers', 'city': 'Round Rock', 'state': 'Texas', 'zip': 78664}
}

In [None]:
d['address']

In [None]:
type(d['address'])

In [None]:
for value in d.values():
    print(f'type of value {value} is {type(value)}')
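
Values inside a nested `dict` or `list` are reached by chaining lookups, and the similarity to JSON can be checked with the standard `json` module. This is a sketch using a trimmed version of the `dict` above; the key `billing_address` is hypothetical and intentionally missing.

```python
import json

d = {
    'id': 1,
    'first_name': 'Scott',
    'phone_numbers': [1234567890, 2345679180],
    'address': {'street': '1234 ABC Towers', 'city': 'Round Rock', 'state': 'Texas', 'zip': 78664}
}

# Chain the lookups to reach values inside the nested dict or list
print(d['address']['city'])   # Round Rock
print(d['phone_numbers'][0])  # 1234567890

# get with an empty dict as default keeps chained lookups safe for missing keys
print(d.get('billing_address', {}).get('city'))  # None

# A dict with str keys and simple values converts to a JSON string and back
json_str = json.dumps(d)
print(type(json_str))             # <class 'str'>
print(json.loads(json_str) == d)  # True
```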

## List of tuples

Let us see an example of how we can read data from a file into a **list of tuples**.
* When we read data from a file into a `list`, each element in the `list` will typically be of type `bytes` or `str`.
* We can convert each element into a `tuple` to simplify the processing.
* Once each element is converted to a `tuple`, we can access its elements using positional notation.
* Let us see an example that reads the data from a file into a **list of tuples** and accesses the dates.

In [None]:
# Reading data from file into a list
path = '/Users/itversity/Research/data/retail_db/orders/part-00000'
# Windows style equivalent of the base directory: C:\\users\\itversity\\Research
orders_file = open(path)

In [None]:
orders_raw = orders_file.read()
orders_file.close() # release the file handle once the contents are read

In [None]:
orders = orders_raw.splitlines()

In [None]:
orders[:10]

In [None]:
len(orders) # same as number of records in the file

In [None]:
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'

In [None]:
order.split(',')

In [None]:
tuple(order.split(','))

In [None]:
(*order.split(','),) # unpacking the list with * inside parentheses also creates a tuple

In [None]:
order_tuples = [(*order.split(','),) for order in orders] 

In [None]:
order_tuples[0]

In [None]:
order_tuples[:3]

In [None]:
len(order_tuples)

In [None]:
order_dates = [order[1] for order in order_tuples]

In [None]:
order_dates[:3]

In [None]:
len(order_dates)

In [None]:
# We can also change the data types of elements in the tuples
def get_order_details(order):
    order_details = order.split(',')
    return (int(order_details[0]), order_details[1], int(order_details[2]), order_details[3])

In [None]:
order_tuples = [get_order_details(order) for order in orders]

In [None]:
order_tuples[:3]

In [None]:
order_customer_ids = [order[2] for order in order_tuples]

In [None]:
order_customer_ids[:3]

In [None]:
type(order_customer_ids[0])
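
Positional access such as `order[2]` works, but unpacking the tuple gives names to the positions and is often easier to read. The sketch below uses a few hypothetical sample records in the same comma separated layout instead of the actual file, so the values are illustrative only.

```python
# Hypothetical sample records in the same layout as the orders file
orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-26 00:00:00.0,12111,COMPLETE'
]

def get_order_details(order):
    # Unpack the four fields and convert the ids to int
    order_id, order_date, order_customer_id, order_status = order.split(',')
    return (int(order_id), order_date, int(order_customer_id), order_status)

order_tuples = [get_order_details(order) for order in orders]

# Unpacking each tuple in the for loop gives names to the positions
for order_id, order_date, order_customer_id, order_status in order_tuples:
    print(order_id, order_status)

# Use _ for positions that are not needed
closed_order_ids = [order_id for order_id, _, _, status in order_tuples if status == 'CLOSED']
print(closed_order_ids)  # [1]
```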

## List of dicts

Let us see an example of how we can read data from a file into a **list of dicts**.
* When we read data from a file into a `list`, each element in the `list` will typically be of type `bytes` or `str`.
* We can convert each element into a `dict` to simplify the processing.
* Once each element is converted to a `dict`, we can access its elements using the attribute name.
* Let us see an example that reads the data from a file into a **list of dicts** and accesses the dates.

In [None]:
# Reading data from file into a list
path = '/Users/itversity/Research/data/retail_db/orders/part-00000'
# Windows style equivalent of the base directory: C:\\users\\itversity\\Research
orders_file = open(path)

In [None]:
orders_raw = orders_file.read()
orders_file.close() # release the file handle once the contents are read

In [None]:
orders = orders_raw.splitlines()

In [None]:
orders[:10]

In [None]:
len(orders) # same as number of records in the file

In [None]:
def get_order_dict(order):
    order_details = order.split(',')
    order_dict = {
        'order_id': int(order_details[0]),
        'order_date': order_details[1],
        'order_customer_id': int(order_details[2]),
        'order_status': order_details[3],
    }
    return order_dict

In [None]:
order_dicts = [get_order_dict(order) for order in orders]

In [None]:
order_dicts[0]

In [None]:
order_dicts[:3]

In [None]:
len(order_dicts)

In [None]:
order_dates = [order['order_date'] for order in order_dicts]

In [None]:
order_dates[:3]

In [None]:
len(order_dates)

In [None]:
order_customer_ids = [order['order_customer_id'] for order in order_dicts]

In [None]:
order_customer_ids[:3]

In [None]:
type(order_customer_ids[0])
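
Once the data is a list of dicts, filtering, counting and sorting by attribute name become straightforward. The following sketch uses a small hypothetical sample in the shape returned by `get_order_dict` rather than the actual file contents.

```python
# Hypothetical sample in the shape returned by get_order_dict
order_dicts = [
    {'order_id': 1, 'order_date': '2013-07-25 00:00:00.0', 'order_customer_id': 11599, 'order_status': 'CLOSED'},
    {'order_id': 2, 'order_date': '2013-07-25 00:00:00.0', 'order_customer_id': 256, 'order_status': 'PENDING_PAYMENT'},
    {'order_id': 3, 'order_date': '2013-07-26 00:00:00.0', 'order_customer_id': 12111, 'order_status': 'COMPLETE'},
    {'order_id': 4, 'order_date': '2013-07-26 00:00:00.0', 'order_customer_id': 8827, 'order_status': 'CLOSED'}
]

# Filter using the attribute name instead of a position
closed_orders = [order for order in order_dicts if order['order_status'] == 'CLOSED']
print(len(closed_orders))  # 2

# Count orders by status using get with a default of 0
status_counts = {}
for order in order_dicts:
    status = order['order_status']
    status_counts[status] = status_counts.get(status, 0) + 1
print(status_counts)  # {'CLOSED': 2, 'PENDING_PAYMENT': 1, 'COMPLETE': 1}

# Sort by one of the attributes; sorted returns a new list
by_customer = sorted(order_dicts, key=lambda order: order['order_customer_id'])
print([order['order_id'] for order in by_customer])  # [2, 4, 1, 3]
```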