## List of dicts
Let us see how to read data from a file into a **list of dicts** using Python.

* When we read data from a file into a `list`, each element is typically a string (or `bytes` if the file is opened in binary mode).
* We can convert each element into a `dict` to simplify processing.
* Once each element is a `dict`, we can access its values by key, such as `order_date`.
* Let us read the data from a file into a **list of dicts** and access the order dates.
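
As a small illustration of the points above, the sketch below works on two sample lines taken from the orders data instead of reading the file. The helper `to_dict` and the `fields` list are names chosen here for illustration only, and all values stay strings in this version.

```python
# Two sample lines in the same comma-separated layout as the orders file
lines = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
]

# Field names used as dict keys (chosen for this illustration)
fields = ['order_id', 'order_date', 'order_customer_id', 'order_status']

def to_dict(line):
    # Pair each field name with the value at the same position
    return dict(zip(fields, line.split(',')))

records = [to_dict(line) for line in lines]

# Positional access on the raw string requires knowing the field's index
date_by_index = lines[0].split(',')[1]
# Key-based access on the dict names the field directly
date_by_key = records[0]['order_date']

print(date_by_index == date_by_key)
```

Both approaches return the same value; the dict version simply makes the intent visible in the code.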

In [2]:
# Open the orders file for reading in text mode
path = '/data/retail_db/orders/part-00000'
# On Windows, the path would look like:
# C:\\users\\itversity\\Research\\data\\retail_db\\orders\\part-00000
orders_file = open(path)

In [3]:
%%sh

wc -l /data/retail_db/orders/part-00000

68883 /data/retail_db/orders/part-00000


In [4]:
type(orders_file)

_io.TextIOWrapper

In [5]:
orders_raw = orders_file.read()
orders_file.close()  # release the file handle once the contents are in memory

In [6]:
orders = orders_raw.splitlines()
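
Reading the whole file with `read` and splitting it with `splitlines` is one option. A possible alternative, sketched below, is to iterate over the file object inside a `with` block, which closes the file automatically. The sketch writes a small temporary file so that it runs anywhere; the temporary file and the names `sample`, `tmp_path`, and `sample_orders` are assumptions made for this illustration.

```python
import os
import tempfile

# First three records of the orders file, used as sample content
sample = (
    '1,2013-07-25 00:00:00.0,11599,CLOSED\n'
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n'
    '3,2013-07-25 00:00:00.0,12111,COMPLETE\n'
)

# Write the sample to a temporary file (a stand-in for the real orders file)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

# The with block closes the file even if an error occurs while reading
with open(tmp_path) as sample_file:
    sample_orders = [line.rstrip('\n') for line in sample_file]

os.remove(tmp_path)

print(len(sample_orders))
print(sample_file.closed)
```

For a file of this size either approach works; iterating line by line avoids holding both the raw string and the list in memory at the same time.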

In [7]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [8]:
len(orders) # same as number of records in the file

68883

In [9]:
def get_order_dict(order):
    # Split the comma-separated record into its four fields
    order_details = order.split(',')
    # Convert the ids to int; keep the date and status as strings
    order_dict = {
        'order_id': int(order_details[0]),
        'order_date': order_details[1],
        'order_customer_id': int(order_details[2]),
        'order_status': order_details[3],
    }
    return order_dict

In [10]:
get_order_dict(orders[0])

{'order_id': 1,
 'order_date': '2013-07-25 00:00:00.0',
 'order_customer_id': 11599,
 'order_status': 'CLOSED'}

In [11]:
order_dicts = [get_order_dict(order) for order in orders]

In [12]:
type(order_dicts)

list

In [13]:
type(order_dicts[0])

dict

In [14]:
order_dicts[0]

{'order_id': 1,
 'order_date': '2013-07-25 00:00:00.0',
 'order_customer_id': 11599,
 'order_status': 'CLOSED'}

In [15]:
order_dicts[:3]

[{'order_id': 1,
  'order_date': '2013-07-25 00:00:00.0',
  'order_customer_id': 11599,
  'order_status': 'CLOSED'},
 {'order_id': 2,
  'order_date': '2013-07-25 00:00:00.0',
  'order_customer_id': 256,
  'order_status': 'PENDING_PAYMENT'},
 {'order_id': 3,
  'order_date': '2013-07-25 00:00:00.0',
  'order_customer_id': 12111,
  'order_status': 'COMPLETE'}]

In [16]:
len(order_dicts)

68883
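
Once the records are dicts, the keys can also be used to filter them. The following is a minimal sketch that uses the first three order dicts shown above as sample data; the names `sample_order_dicts` and `complete_orders` are introduced here for illustration.

```python
# The first three order dicts, copied as sample data
sample_order_dicts = [
    {'order_id': 1, 'order_date': '2013-07-25 00:00:00.0',
     'order_customer_id': 11599, 'order_status': 'CLOSED'},
    {'order_id': 2, 'order_date': '2013-07-25 00:00:00.0',
     'order_customer_id': 256, 'order_status': 'PENDING_PAYMENT'},
    {'order_id': 3, 'order_date': '2013-07-25 00:00:00.0',
     'order_customer_id': 12111, 'order_status': 'COMPLETE'},
]

# Keep only the orders whose status is COMPLETE
complete_orders = [
    order for order in sample_order_dicts
    if order['order_status'] == 'COMPLETE'
]

print(complete_orders)
```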

In [17]:
order_dates = [order['order_date'] for order in order_dicts]

In [18]:
order_dates[:3]

['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0']

In [19]:
len(order_dates)

68883

In [20]:
set(order_dates)

{'2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2013-07-27 00:00:00.0',
 '2013-07-28 00:00:00.0',
 '2013-07-29 00:00:00.0',
 '2013-07-30 00:00:00.0',
 '2013-07-31 00:00:00.0',
 '2013-08-01 00:00:00.0',
 '2013-08-02 00:00:00.0',
 '2013-08-03 00:00:00.0',
 '2013-08-04 00:00:00.0',
 '2013-08-05 00:00:00.0',
 '2013-08-06 00:00:00.0',
 '2013-08-07 00:00:00.0',
 '2013-08-08 00:00:00.0',
 '2013-08-09 00:00:00.0',
 '2013-08-10 00:00:00.0',
 '2013-08-11 00:00:00.0',
 '2013-08-12 00:00:00.0',
 '2013-08-13 00:00:00.0',
 '2013-08-14 00:00:00.0',
 '2013-08-15 00:00:00.0',
 '2013-08-16 00:00:00.0',
 '2013-08-17 00:00:00.0',
 '2013-08-18 00:00:00.0',
 '2013-08-19 00:00:00.0',
 '2013-08-20 00:00:00.0',
 '2013-08-21 00:00:00.0',
 '2013-08-22 00:00:00.0',
 '2013-08-23 00:00:00.0',
 '2013-08-24 00:00:00.0',
 '2013-08-25 00:00:00.0',
 '2013-08-26 00:00:00.0',
 '2013-08-27 00:00:00.0',
 '2013-08-28 00:00:00.0',
 '2013-08-29 00:00:00.0',
 '2013-08-30 00:00:00.0',
 '2013-08-31 00:00:00.0',
 '2013-09-01
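
A `set` does not guarantee any order when iterated. A minimal sketch of getting the distinct dates in sequence, and optionally parsing them with `datetime.strptime`, is shown below. It uses a few sample values in the same format as `order_dates`; the format string is an assumption based on the values shown above.

```python
from datetime import datetime

# Sample values in the same format as order_dates, deliberately unordered
sample_dates = [
    '2013-07-26 00:00:00.0',
    '2013-07-25 00:00:00.0',
    '2013-07-26 00:00:00.0',
    '2013-07-25 00:00:00.0',
]

# Strings in YYYY-MM-DD form sort chronologically, so sorted() is enough
unique_dates = sorted(set(sample_dates))
print(unique_dates)

# Parse into datetime objects; %f accepts the single trailing fractional digit
parsed_dates = [
    datetime.strptime(d, '%Y-%m-%d %H:%M:%S.%f') for d in unique_dates
]
print(parsed_dates[0].date())
```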

In [21]:
order_customer_ids = [order['order_customer_id'] for order in order_dicts]

In [22]:
order_customer_ids[:3]

[11599, 256, 12111]

In [23]:
type(order_customer_ids[0])

int
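
Converting `order_customer_id` to `int` in `get_order_dict` matters for comparisons and sorting. The sketch below uses the three customer ids shown above; `ids_as_int` and `ids_as_str` are names introduced for illustration.

```python
ids_as_int = [11599, 256, 12111]
ids_as_str = [str(i) for i in ids_as_int]

# Integers sort by numeric value
sorted_ints = sorted(ids_as_int)
print(sorted_ints)  # [256, 11599, 12111]

# Strings sort character by character, so '11599' comes before '256'
sorted_strs = sorted(ids_as_str)
print(sorted_strs)  # ['11599', '12111', '256']
```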

In [24]:
def get_order_dict(order):
    # Split the comma-separated record into its four fields
    order_details = order.split(',')
    # Convert the ids to int; keep the date and status as strings
    order_dict = {
        'order_id': int(order_details[0]),
        'order_date': order_details[1],
        'order_customer_id': int(order_details[2]),
        'order_status': order_details[3],
    }
    return order_dict

In [25]:
# Read the file, split it into lines, and convert each line into a dict
path = '/data/retail_db/orders/part-00000'
# On Windows, the path would look like:
# C:\\users\\itversity\\Research\\data\\retail_db\\orders\\part-00000
with open(path) as orders_file:  # the file is closed when the block ends
    orders_raw = orders_file.read()
orders = orders_raw.splitlines()
order_dicts = [get_order_dict(order) for order in orders]
order_dates = [order['order_date'] for order in order_dicts]

In [26]:
order_dates[:3]

['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0']

In [27]:
len(order_dates)

68883
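
As a possible alternative to splitting each line by hand, the standard library's `csv.DictReader` can build the dicts directly. This is a sketch rather than the approach used in this section: `io.StringIO` stands in for the open file, and the `fieldnames` list is supplied explicitly on the assumption that the data has no header row.

```python
import csv
import io

# Two sample records in the same layout as the orders file
sample = (
    '1,2013-07-25 00:00:00.0,11599,CLOSED\n'
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n'
)

fieldnames = ['order_id', 'order_date', 'order_customer_id', 'order_status']

# io.StringIO stands in for an open file object
reader = csv.DictReader(io.StringIO(sample), fieldnames=fieldnames)
csv_order_dicts = [dict(row) for row in reader]

# DictReader leaves every value as a string, so convert the ids explicitly
for row in csv_order_dicts:
    row['order_id'] = int(row['order_id'])
    row['order_customer_id'] = int(row['order_customer_id'])

print(csv_order_dicts[0])
```

The `csv` module also handles quoted fields that contain commas, which a plain `split(',')` would break apart.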