## Overview of Collections and Tuples

Let's quickly recap Collections and Tuples in Scala. We will primarily talk about the collections and tuples that come as part of the Scala standard library, such as `List`, `Set`, `Map` and tuples.

* Group of elements with length and index - `List`
* Group of unique elements - `Set`
* Group of key value pairs - `Map`
* While `List`, `Set` and `Map` contain groups of homogeneous elements, a tuple contains a group of heterogeneous elements.
* We can think of a `List`, `Set` or `Map` as a table in a database and a tuple as a row or record in that table.
* Typically we create a list of tuples or a set of tuples, and a `Map` is nothing but a collection of 2-element tuples in which the key is unique.
* We typically use Map Reduce APIs such as `map`, `filter` and `reduce` to process the data in collections. There are also pre-defined functions such as `size`, `sum`, `min`, `max` etc. for aggregating data in collections.
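The points above can be illustrated with a few standard-library values. This is a minimal REPL-style sketch. The values are made up for illustration and are not taken from the retail_db data set.

```scala
// Illustrative values only (not read from retail_db).
val statuses: List[String] = List("CLOSED", "COMPLETE", "CLOSED") // indexed, keeps duplicates
val uniqueStatuses: Set[String] = statuses.toSet                   // duplicates removed

// A tuple groups heterogeneous values, like one row of a table.
val order: (Int, String, Int, String) = (1, "2013-07-25 00:00:00.0", 11599, "CLOSED")

// A Map is a collection of 2-element tuples whose keys are unique.
val statusCounts: Map[String, Int] = Map("CLOSED" -> 2, "COMPLETE" -> 1)
val asPairs: List[(String, Int)] = statusCounts.toList
// Adding an existing key replaces its value instead of adding a second entry.
val updated: Map[String, Int] = statusCounts + ("CLOSED" -> 3)

// Pre-defined aggregations on a collection.
val quantities = List(1, 5, 1)

println(statuses(1))         // COMPLETE
println(uniqueStatuses.size) // 2
println(order._4)            // CLOSED
println(updated.size)        // 2
println(quantities.sum)      // 7
println(quantities.max)      // 5
```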

### Tasks

Let us perform a few tasks to recap Collections and Tuples, along with the Map Reduce APIs.

* Create a collection of orders by reading data from a file.

```scala
// Brings the ! and !! operators for running shell commands into scope
import scala.sys.process._
```

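```scala
// `.!` runs the command and returns its exit code
// (dot syntax avoids the postfix-operator feature warning)
"ls -ltr /data/retail_db/orders/part-00000".!
```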
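`scala.sys.process` offers two common ways to run a command: `!` returns the exit code, and `!!` returns the command's standard output as a `String`. The following minimal sketch assumes a Unix-like system with `echo` on the `PATH`.

```scala
import scala.sys.process._

// `.!` runs the command, lets its output go to stdout, and returns the exit code.
val exitCode: Int = Seq("echo", "hello").!

// `.!!` runs the command and returns its standard output as a String;
// it throws an exception if the exit code is non-zero.
val output: String = Seq("echo", "hello").!!

println(exitCode)    // 0
println(output.trim) // hello
```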

```scala
val ordersPath = "/data/retail_db/orders/part-00000"
```

```scala
import scala.io.Source
```

```scala
// getLines() returns an Iterator[String], which can be traversed only once
val orders = Source.fromFile(ordersPath).
    getLines()
```
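`getLines()` returns a one-pass `Iterator`, which is why each of the following tasks reads the file again. The sketch below shows one way to load the lines into a reusable `List` and close the file. It assumes Scala 2.13 or later (for `scala.util.Using`). It writes a few hypothetical sample rows in the orders layout to a temporary file so that it does not depend on `/data/retail_db`.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.util.Using

// Hypothetical sample rows in the orders layout
// (order_id,order_date,order_customer_id,order_status).
val sampleOrders = List(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-25 00:00:00.0,12111,COMPLETE"
)

// Write the sample to a temporary file so the sketch is self-contained.
val tmpPath: Path = Files.createTempFile("orders", ".csv")
Files.write(tmpPath, sampleOrders.mkString("\n").getBytes("UTF-8"))

// toList materializes the Iterator before Using.resource closes the file.
val ordersList: List[String] =
  Using.resource(Source.fromFile(tmpPath.toFile))(_.getLines().toList)
Files.deleteIfExists(tmpPath)

// A List can be traversed any number of times...
println(ordersList.size) // 3
println(ordersList.size) // 3

// ...whereas an Iterator is exhausted after its first traversal.
val once = ordersList.iterator
println(once.size) // 3
println(once.size) // 0
```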

* Get all unique order statuses. Make sure data is sorted in alphabetical order.

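```scala
val ordersPath = "/data/retail_db/orders/part-00000"

import scala.io.Source
val orders = Source.fromFile(ordersPath).
    getLines()

// order_status is the 4th field (index 3); toSet removes duplicates
// and sorted puts the statuses in alphabetical order
orders.
    map(order => order.split(",")(3)).
    toSet.
    toList.
    sorted.
    foreach(println)
```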
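The task above uses `toSet` to remove duplicates. `distinct` is an alternative that stays on the `List` and gives the same sorted result. The following minimal sketch uses hypothetical sample rows instead of the file.

```scala
// Hypothetical sample rows in the orders layout; illustrative only.
val sampleOrders = List(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-25 00:00:00.0,12111,COMPLETE",
  "4,2013-07-25 00:00:00.0,8827,CLOSED"
)

// Approach used in the task: a Set removes duplicates, then sort as a List.
val viaSet = sampleOrders.map(_.split(",")(3)).toSet.toList.sorted

// Alternative: distinct keeps the first occurrence of each status.
val viaDistinct = sampleOrders.map(_.split(",")(3)).distinct.sorted

println(viaSet)                // List(CLOSED, COMPLETE, PENDING_PAYMENT)
println(viaSet == viaDistinct) // true
```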

* Get count of all unique dates.

```scala
val ordersPath = "/data/retail_db/orders/part-00000"

import scala.io.Source
val orders = Source.fromFile(ordersPath).
    getLines()

// order_date is the 2nd field (index 1); the size of the Set
// is the number of unique dates
orders.
    map(order => order.split(",")(1)).
    toSet.
    size
```
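The count of unique dates extends naturally to a count of orders per date, which is a small map-reduce: key each row by its date, map it to `1`, and reduce by adding. The following minimal sketch assumes Scala 2.13 or later (for `groupMapReduce`) and uses hypothetical sample rows.

```scala
// Hypothetical sample rows; only order_date (index 1) matters here.
val sampleOrders = List(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-26 00:00:00.0,12111,COMPLETE",
  "4,2013-07-27 00:00:00.0,8827,CLOSED"
)

val dates = sampleOrders.map(_.split(",")(1))

// Number of unique dates.
val uniqueDateCount = dates.toSet.size

// Orders per date: key = date, value = 1 per row, reduce = addition.
val ordersPerDate: Map[String, Int] =
  dates.groupMapReduce(identity)(_ => 1)(_ + _)

println(uniqueDateCount) // 3
ordersPerDate.toList.sorted.foreach(println)
// (2013-07-25 00:00:00.0,2)
// (2013-07-26 00:00:00.0,1)
// (2013-07-27 00:00:00.0,1)
```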

* Sort the data in orders in ascending order by order_customer_id and then order_date.

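```scala
val ordersPath = "/data/retail_db/orders/part-00000"

import scala.io.Source
val orders = Source.fromFile(ordersPath).
    getLines()

// Sort key is the tuple (order_customer_id as Int, order_date as String);
// tuples compare element by element, so ties on customer id fall back to date
orders.
    toList.
    sortBy(k => {
        val a = k.split(",")
        (a(2).toInt, a(1))
    }).
    take(20).
    foreach(println)
```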
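Because a tuple behaves like a record, another option is to parse each line into a tuple once and then sort on its fields. This avoids splitting the same string repeatedly inside `sortBy`. The following minimal sketch uses hypothetical sample rows. It also shows why `toInt` matters: when compared as strings, `"12111"` sorts before `"256"`.

```scala
// Hypothetical sample rows (order_id,order_date,order_customer_id,order_status).
val sampleOrders = List(
  "1,2013-07-26 00:00:00.0,256,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-25 00:00:00.0,12111,COMPLETE"
)

// Parse each line once into a typed tuple (one "row").
val parsed: List[(Int, String, Int, String)] = sampleOrders.map { line =>
  val a = line.split(",")
  (a(0).toInt, a(1), a(2).toInt, a(3))
}

// Sort by customer id (numeric), then by date. The dates are in
// yyyy-MM-dd HH:mm:ss form, so string order matches chronological order.
val sortedOrders = parsed.sortBy(o => (o._3, o._2))

// For comparison: sorting the customer id as text gives a different order.
val sortedAsText = parsed.sortBy(o => (o._3.toString, o._2))

sortedOrders.foreach(println)
// (2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT)
// (1,2013-07-26 00:00:00.0,256,CLOSED)
// (3,2013-07-25 00:00:00.0,12111,COMPLETE)
```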

* Create a collection of order_items by reading data from a file.

```scala
val orderItemsPath = "/data/retail_db/order_items/part-00000"

import scala.io.Source
// toList materializes the lines so the collection can be reused
val orderItems = Source.fromFile(orderItemsPath).
    getLines().
    toList
orderItems.take(10).foreach(println)
```
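After loading, order items can be turned into a list of typed tuples so that later steps work with numeric fields. This is a minimal sketch that assumes the usual retail_db order_items column order. The sample rows are hypothetical, and the document itself only relies on index 1 (`order_item_order_id`) and index 4 (the subtotal).

```scala
// Hypothetical order_items rows, assuming the column order:
// order_item_id,order_item_order_id,order_item_product_id,
// order_item_quantity,order_item_subtotal,order_item_product_price
val sampleOrderItems = List(
  "1,1,957,1,299.98,299.98",
  "2,2,1073,1,199.99,199.99",
  "3,2,502,5,250.0,50.0",
  "4,2,403,1,129.99,129.99"
)

// Convert each line into a typed tuple (a "row" with Int/Double fields).
val orderItemTuples: List[(Int, Int, Int, Int, Double, Double)] =
  sampleOrderItems.map { line =>
    val a = line.split(",")
    (a(0).toInt, a(1).toInt, a(2).toInt, a(3).toInt, a(4).toDouble, a(5).toDouble)
  }

orderItemTuples.take(2).foreach(println)
```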

* Get revenue for a given order_item_order_id.

```scala
// Sum order_item_subtotal (index 4) over all items whose
// order_item_order_id (index 1) matches orderId
def getOrderRevenue(orderItems: List[String], orderId: Int): Float = {
    val orderItemsFiltered = orderItems.
        filter(orderItem => orderItem.split(",")(1).toInt == orderId)
    val orderItemsMap = orderItemsFiltered.
        map(orderItem => orderItem.split(",")(4).toFloat)
    orderItemsMap.sum
}
```
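The filter, map and sum steps for one order can be generalized to compute revenue for every order in a single pass, again in map-reduce style. This is a minimal sketch that assumes Scala 2.13 or later (for `groupMapReduce`). It uses hypothetical sample rows, where index 1 is `order_item_order_id` and index 4 is the subtotal. `orderRevenue` is a hypothetical variant of `getOrderRevenue` that uses `Double` to reduce rounding error compared with `Float`.

```scala
// Hypothetical order_items rows; index 1 = order id, index 4 = subtotal.
val sampleOrderItems = List(
  "1,1,957,1,299.98,299.98",
  "2,2,1073,1,199.99,199.99",
  "3,2,502,5,250.0,50.0",
  "4,2,403,1,129.99,129.99"
)

// Revenue for one order: filter by order id, map to subtotal, sum.
def orderRevenue(items: List[String], orderId: Int): Double =
  items.
    map(_.split(",")).
    filter(a => a(1).toInt == orderId).
    map(a => a(4).toDouble).
    sum

// Revenue for every order: key by order id, map to subtotal, reduce by adding.
val revenuePerOrder: Map[Int, Double] =
  sampleOrderItems.
    map(_.split(",")).
    groupMapReduce(a => a(1).toInt)(a => a(4).toDouble)(_ + _)

println(orderRevenue(sampleOrderItems, 2))
println(revenuePerOrder(1)) // 299.98
```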

```scala
val orderItemsPath = "/data/retail_db/order_items/part-00000"

import scala.io.Source
val orderItems = Source.fromFile(orderItemsPath).
    getLines().
    toList
```

```scala
println(getOrderRevenue(orderItems, 2))
```