# Collections and Tuples

As part of this module we will get an overview of collections and tuples that are part of the standard library of Python.

* Overview of Collections and Tuples
* Tuples
* Collections - list
* Collections - set
* Collections - dict
* List of Tuples
* Using Data Structures

## Overview of Collections and Tuples
Let us understand details about Collections and Tuples.
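* A collection is typically a group of homogeneous elements (for example, many employee records), while a tuple is typically a group of heterogeneous elements (for example, the id, name and salary of one employee). Python does not enforce this; it is a convention.
* A collection is like a spreadsheet or a table, while a tuple is like one row in it. We typically create a collection of objects or tuples.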
* Python provides 3 built-in collection types.
  * list
  * set
  * dict
* Depending on the characteristics of each collection type, different functions and methods are available. We will see those details later.
* Some functions are applicable to all of them.
  * Getting the number of elements in a collection or a tuple - len
  * Getting the sum of all elements in a collection or a tuple of numbers - sum
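Here is a minimal sketch of len and sum on each collection type. It reuses salary values from the employee examples in this module, and the variable names are only illustrative. Iterating over a dict yields its keys, so sum on a dict adds up the keys rather than the values.

```python
# Illustrative data: salaries from the employee examples in this module
salaries_list = [1000.0, 1250.0, 750.0]
salaries_tuple = (1000.0, 1250.0, 750.0)
salaries_set = {1000.0, 1250.0, 750.0}
salaries_by_id = {1: 1000.0, 2: 1250.0, 3: 750.0}

# len works on every collection type and on tuples
print(len(salaries_list), len(salaries_tuple), len(salaries_set), len(salaries_by_id))  # 3 3 3 3

# sum works on any collection of numbers
print(sum(salaries_list), sum(salaries_tuple), sum(salaries_set))  # 3000.0 3000.0 3000.0

# For a dict, sum iterates over the keys; use .values() to add up the values
print(sum(salaries_by_id), sum(salaries_by_id.values()))  # 6 3000.0
```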


## Tuples
Now let us understand the definition and characteristics of a tuple.
* A tuple is like an object with unnamed attributes.
* Values of attributes can be accessed only by position (for example, `t[0]`).
* It represents an individual row in a table or spreadsheet with multiple attributes.
* We use () to represent tuples.
* Tuples are immutable: once a tuple is created, its elements cannot be changed, added or removed.
* Very limited operations are available; the only methods are count and index.
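The characteristics above can be checked on a single employee tuple, using the same values as the tasks below. Rebuilding the tuple with slicing is just one way to get a modified copy.

```python
employee = (1, "Scott", "Tiger", 1000.0, "united states")

# Positional access: attributes have no names, only positions
print(employee[1])   # Scott
print(employee[-1])  # united states

# Tuples are immutable: item assignment raises TypeError
try:
    employee[3] = 1100.0
except TypeError as e:
    print("TypeError:", e)

# To "change" a value, build a new tuple instead
raised = employee[:3] + (1100.0,) + employee[4:]
print(raised)  # (1, 'Scott', 'Tiger', 1100.0, 'united states')

# The only tuple methods are count and index
print(employee.count("Scott"))  # 1
print(employee.index(1000.0))   # 3
```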

### Tasks
Let us perform a few tasks related to tuples.

* Create 3 tuples, each representing an employee with id, first name, last name, salary and country.

| id | first_name | last_name | salary | country |
| --- | --- | --- | --- | --- |
| 1 | Scott | Tiger | 1000.0 | united states |
| 2 | Henry | Ford | 1250.0 | India |
| 3 | Nick | Junior | 750.0 | united KINGDOM |


In [None]:
employee1 = (1, "Scott", "Tiger", 1000.0, "united states")
employee2 = (2, "Henry", "Ford", 1250.0, "India")
employee3 = (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')

In [None]:
type(employee1)

In [None]:
help(employee1)

## Collections - list
Let us understand **list** in detail.
* Ordered group of elements with index and length
* Elements can be added/inserted at a particular position
* We can access elements in a list by using an index in []
* There can be duplicates in a list
* APIs are available to add elements to the list, delete elements from the list and sort the list
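This small sketch shows these list characteristics on a hypothetical list of order statuses. Duplicates are kept, elements are read and replaced by index (lists are mutable, unlike tuples), and `remove` deletes the first element equal to a given value.

```python
# Hypothetical list of order statuses
statuses = ["CLOSED", "PENDING_PAYMENT", "COMPLETE", "CLOSED"]

print(len(statuses))        # 4 - the duplicate "CLOSED" is kept
print(statuses[2])          # COMPLETE - access by index
statuses[2] = "PROCESSING"  # lists are mutable, so an element can be replaced
statuses.remove("CLOSED")   # removes only the first matching element
print(statuses)             # ['PENDING_PAYMENT', 'PROCESSING', 'CLOSED']
```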

### Tasks
Let us perform a few tasks to understand more about list operations.

* Create a list of employees, where each item in the list is a tuple.

In [65]:
# Creating list
# We are creating list of tuples

employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
            ]

In [66]:
help(employees.append)

Help on built-in function append:

append(object, /) method of builtins.list instance
    Append object to the end of the list.



* Adding elements to a list (append, insert)

In [67]:
# Appending element to the list

employees.append((5, "Donald", "Duck", 1800.0, "USA"))

employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [68]:
# Getting help about insert

help(employees.insert)

Help on built-in function insert:

insert(index, object, /) method of builtins.list instance
    Insert object before index.



In [69]:
# Inserting element into the list

employees.insert(3, (6, "Mickey", "Mouse", 2000.0, "Disney Land"))

employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

* Deleting elements from a list (pop, clear)

In [70]:
employees.pop?

[0;31mSignature:[0m [0memployees[0m[0;34m.[0m[0mpop[0m[0;34m([0m[0mindex[0m[0;34m=[0m[0;34m-[0m[0;36m1[0m[0;34m,[0m [0;34m/[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.
[0;31mType:[0m      builtin_function_or_method


In [71]:
employees.pop(3)

(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')

In [72]:
employees.pop()

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [73]:
employees.clear?

[0;31mSignature:[0m [0memployees[0m[0;34m.[0m[0mclear[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m Remove all items from list.
[0;31mType:[0m      builtin_function_or_method


In [74]:
employees.clear()

In [75]:
employees

[]

In [76]:
employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
            ]
employees.append((5, "Donald", "Duck", 1800.0, "USA"))
employees.insert(3, (6, "Mickey", "Mouse", 2000.0, "Disney Land"))

employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

* Checking how many times an element appears in a list (count). Strings support count as well.

In [77]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [78]:
l.count(4)

3

In [79]:
s = '1244573421'

In [81]:
s.count('4')

3

* Getting the position of an element (index). The optional second argument is the position to start searching from.

In [82]:
# Getting position of the element in the list

employees.index((2, 'Henry', 'Ford', 1250.0, 'India'))

1

In [83]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [84]:
l.index?

[0;31mSignature:[0m [0ml[0m[0;34m.[0m[0mindex[0m[0;34m([0m[0mvalue[0m[0;34m,[0m [0mstart[0m[0;34m=[0m[0;36m0[0m[0;34m,[0m [0mstop[0m[0;34m=[0m[0;36m9223372036854775807[0m[0;34m,[0m [0;34m/[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Return first index of value.

Raises ValueError if the value is not present.
[0;31mType:[0m      builtin_function_or_method


In [85]:
l.index(4)

2

In [86]:
l.index(4, 3)

3

In [87]:
l.index(4, 4)

7
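As the docstring says, index raises ValueError when the value is missing. The sketch below guards against that with `in`. It also collects every position of a value with `enumerate`, which is an alternative to calling index repeatedly with a start position.

```python
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

# index raises ValueError when the value is not present
try:
    l.index(9)
except ValueError as e:
    print("ValueError:", e)

# Check membership first to avoid the exception
position = l.index(9) if 9 in l else -1
print(position)  # -1

# All positions of 4, instead of calling index(4, start) repeatedly
positions = [i for i, value in enumerate(l) if value == 4]
print(positions)  # [2, 3, 7]
```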

* Accessing elements in a list using an index or a range of indices (slicing), counting from the beginning. A `str` is also a sequence (of characters), so this is the same indexing and slicing we used earlier on strings.

In [88]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [89]:
l[0:3]

[1, 2, 4]

In [90]:
l[3:6]

[4, 5, 7]

In [91]:
l[:6]

[1, 2, 4, 4, 5, 7]

In [92]:
l[3:]

[4, 5, 7, 3, 4, 2, 1]
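The same slicing rules apply to strings and tuples, which are also sequences. This sketch uses the string `s` from the count example and a tuple with the same digits. The optional third value in a slice is the step, and slicing past the end does not raise an error.

```python
s = '1244573421'
t = (1, 2, 4, 4, 5, 7, 3, 4, 2, 1)
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

print(s[0:3])    # 124
print(t[3:6])    # (4, 5, 7)
print(s[::2])    # 14532 - every second character
print(l[::-1])   # [1, 2, 4, 3, 7, 5, 4, 4, 2, 1] - reversed copy
print(l[8:100])  # [2, 1] - out-of-range stop is clipped, no IndexError
```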

In [93]:
employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
            ]
employees.append((5, "Donald", "Duck", 1800.0, "USA"))
employees.insert(3, (6, "Mickey", "Mouse", 2000.0, "Disney Land"))

employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [94]:
# Accessing elements in the list

employees[0]

(1, 'Scott', 'Tiger', 1000.0, 'united states')

In [95]:
employees[5]

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [96]:
employees[1:2]

[(2, 'Henry', 'Ford', 1250.0, 'India')]

In [97]:
employees[:3]

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]

In [98]:
employees[-3:]

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

* Accessing elements in a list using negative indices and ranges, counting from the end (`-1` is the last element).

In [99]:
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

In [100]:
len(l)

10

In [101]:
l[-3:]

[4, 2, 1]

In [102]:
l[-5:-2]

[7, 3, 4]
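Negative indices count from the end, so `-1` is the last element. This quick check shows how the negative slices above map to positive positions.

```python
l = [1, 2, 4, 4, 5, 7, 3, 4, 2, 1]

print(l[-1])                      # 1 - the last element
# A negative index -n refers to position len(l) - n
print(l[-3:] == l[len(l) - 3:])   # True
print(l[-5:-2] == l[5:8])         # True - both are [7, 3, 4]
```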

* Sorting elements in the list (sort sorts the list in place; sorted returns a new sorted list and leaves the original unchanged)

In [103]:
# In place sorting - data in employees will be sorted
employees.sort()
employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [104]:
# Sorting and creating new list

employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA")
            ]

employees.append((5, "Donald", "Duck", 1800.0, "USA"))

employees.insert(3, (6, "Mickey", "Mouse", 2000.0, "Disney Land"))

# We will typically assign output of sorted to a new variable or object
# or return it
sorted(employees)

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [105]:
employees # employees is not updated

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')]

In [106]:
employees.sort()

In [107]:
employees # employees is sorted

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [108]:
sorted(employees, reverse=True)

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states')]

In [109]:
# Sort by salary, the element at index 3 of each tuple
sorted(employees, key=lambda k: k[3])

[(3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [110]:
sorted(employees, key=lambda k: k[3], reverse=True)

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]

In [111]:
employees

[(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')]

In [112]:
employees.sort(key=lambda k: k[3], reverse=True)

In [113]:
employees

[(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')]
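Two details of sorting are easy to trip over, shown here on the same six employees. First, `sort` returns `None`, so assigning its result loses the list. Second, strings compare by character code, so uppercase letters sort before lowercase ones. Normalizing case in the key (here with `str.lower`) gives an alphabetical order by country.

```python
employees = [(1, "Scott", "Tiger", 1000.0, "united states"),
             (2, "Henry", "Ford", 1250.0, "India"),
             (3, "Nick", "Junior", 750.0, "united KINGDOM"),
             (6, "Mickey", "Mouse", 2000.0, "Disney Land"),
             (4, "Bill", "Gomes", 1500.0, "AUSTRALIA"),
             (5, "Donald", "Duck", 1800.0, "USA")]

# sort works in place and returns None
result = employees.sort()
print(result)  # None

# Default string comparison: uppercase letters come before lowercase ones
by_country = sorted(employees, key=lambda k: k[4])
print([e[0] for e in by_country])  # [4, 6, 2, 5, 3, 1]

# Case-insensitive sort on country
by_country_ci = sorted(employees, key=lambda k: k[4].lower())
print([e[0] for e in by_country_ci])  # [4, 6, 2, 3, 1, 5]
```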

## Collections - set

Let us understand **set** in detail.
* Group of unique elements with no index; `len` still works on a set
* Elements can be added/inserted but not at a particular position
* We can check whether an element exists using the `in` operator
* There can be no duplicates in a set
* APIs are available to add elements to the set, delete elements from the set and perform set operations such as union, intersection etc.
* A set has no sort method. To sort the data, use the sorted function (which returns a list) or convert the set to a list first.
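This sketch illustrates these points with a set of employee ids, which stands in for the employee tuples. Duplicates collapse, `len` works, indexing does not, and sorting goes through `sorted` or a list.

```python
employee_ids = {3, 1, 2, 3, 1}   # duplicates are dropped on creation
print(len(employee_ids))          # 3

# Sets have no index, so subscripting raises TypeError
try:
    employee_ids[0]
except TypeError as e:
    print("TypeError:", e)

# sorted accepts a set and returns a new list
print(sorted(employee_ids))       # [1, 2, 3]

# Alternatively, convert to a list and sort it in place
ids_list = list(employee_ids)
ids_list.sort()
print(ids_list)                   # [1, 2, 3]
```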

### Tasks

Let us see some basic set operations using simple examples.
* Create a set of 3 employees with ids 1, 2 and 3 using elements from the **employees** list.

In [114]:
employees_set = {(1, "Scott", "Tiger", 1000.0, "united states"),
                 (2, "Henry", "Ford", 1250.0, "India"),
                 (3, "Nick", "Junior", 750.0, "united KINGDOM")
                }

In [115]:
type(employees_set)

set

In [116]:
employees_set?

[0;31mType:[0m        set
[0;31mString form:[0m {(3, 'Nick', 'Junior', 750.0, 'united KINGDOM'), (1, 'Scott', 'Tiger', 1000.0, 'united states'), (2, 'Henry', 'Ford', 1250.0, 'India')}
[0;31mLength:[0m      3
[0;31mDocstring:[0m  
set() -> new empty set object
set(iterable) -> new set object

Build an unordered collection of unique elements.


* Adding elements to a set (add) - add employees with ids 4 and 5.

In [118]:
employees_set.add((4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'))

In [119]:
employees_set.add((5, 'Donald', 'Duck', 1800.0, 'USA'))

In [120]:
employees_set

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')}

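* Deleting elements from a set (pop/remove, clear). pop removes an arbitrary element, so the element it returns can differ from run to run. If it returns a different employee than the one shown below, the later remove of employee 4 may raise KeyError.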

In [121]:
employees_set.pop?

[0;31mDocstring:[0m
Remove and return an arbitrary set element.
Raises KeyError if the set is empty.
[0;31mType:[0m      builtin_function_or_method


In [123]:
employees_set.pop()

(5, 'Donald', 'Duck', 1800.0, 'USA')

In [124]:
employees_set

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA')}

In [122]:
employees_set.remove?

[0;31mDocstring:[0m
Remove an element from a set; it must be a member.

If the element is not a member, raise a KeyError.
[0;31mType:[0m      builtin_function_or_method


In [125]:
employees_set.remove((4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'))

In [126]:
employees_set

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')}
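`clear` is listed above but not run here. This sketch demonstrates it and also shows `discard`. Like `remove`, `discard` deletes an element, but it does nothing instead of raising KeyError when the element is absent.

```python
employees_set = {(1, "Scott", "Tiger", 1000.0, "united states"),
                 (2, "Henry", "Ford", 1250.0, "India")}
missing = (7, "Henry", "Ford", 1250.0, "India")

employees_set.discard(missing)     # no error, set unchanged
print(len(employees_set))          # 2

try:
    employees_set.remove(missing)  # remove raises KeyError for a non-member
except KeyError:
    print("KeyError: element not in set")

employees_set.clear()              # remove all elements
print(employees_set)               # set()
```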

* Checking whether an element is present in a set using the `in` operator - check whether employees with ids 2 and 7 exist in the set.

In [128]:
(2, 'Henry', 'Ford', 1250.0, 'India') in employees_set

True

In [129]:
(7, 'Henry', 'Ford', 1250.0, 'India') in employees_set

False
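The checks above compare whole tuples, so they depend on every attribute matching. To check by employee id alone, as the task describes, one option is to build a set of ids first.

```python
employees_set = {(1, "Scott", "Tiger", 1000.0, "united states"),
                 (2, "Henry", "Ford", 1250.0, "India"),
                 (3, "Nick", "Junior", 750.0, "united KINGDOM")}

# Set comprehension collecting the first element (the id) of each tuple
employee_ids = {e[0] for e in employees_set}

print(2 in employee_ids)  # True
print(7 in employee_ids)  # False
# Whole-tuple lookup fails if any attribute differs, e.g. the salary
print((2, "Henry", "Ford", 1300.0, "India") in employees_set)  # False
```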

* Set operations (union, intersection, difference etc.) - create a set with employees 1 to 5 and another with employees 4, 5 and 6, then perform union, intersection and difference on the two sets.

In [131]:
employees_set1 = {(1, "Scott", "Tiger", 1000.0, "united states"),
                  (2, "Henry", "Ford", 1250.0, "India"),
                  (3, "Nick", "Junior", 750.0, "united KINGDOM"),
                  (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
                  (5, 'Donald', 'Duck', 1800.0, 'USA')
                 }

In [132]:
employees_set2 = {(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
                  (5, 'Donald', 'Duck', 1800.0, 'USA'),
                  (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')
                 }

In [133]:
employees_set1.union(employees_set2)

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM'),
 (4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA'),
 (6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')}

In [135]:
employees_set1.intersection(employees_set2)

{(4, 'Bill', 'Gomes', 1500.0, 'AUSTRALIA'),
 (5, 'Donald', 'Duck', 1800.0, 'USA')}

In [136]:
employees_set1.difference(employees_set2)

{(1, 'Scott', 'Tiger', 1000.0, 'united states'),
 (2, 'Henry', 'Ford', 1250.0, 'India'),
 (3, 'Nick', 'Junior', 750.0, 'united KINGDOM')}

In [137]:
employees_set2.difference(employees_set1)

{(6, 'Mickey', 'Mouse', 2000.0, 'Disney Land')}
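The same operations are also available as operators, and `^` gives the symmetric difference (elements in exactly one of the sets). For brevity, this sketch uses the employee ids instead of the full tuples. The results are wrapped in `sorted` because set display order is not guaranteed.

```python
employee_ids1 = {1, 2, 3, 4, 5}
employee_ids2 = {4, 5, 6}

print(sorted(employee_ids1 | employee_ids2))  # [1, 2, 3, 4, 5, 6] - union
print(sorted(employee_ids1 & employee_ids2))  # [4, 5] - intersection
print(sorted(employee_ids1 - employee_ids2))  # [1, 2, 3] - difference
print(sorted(employee_ids1 ^ employee_ids2))  # [1, 2, 3, 6] - symmetric difference
```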

## Collections - dict
Let us understand **dict** in detail.
* Group of key value pairs
* Keys are unique
* Values need not be unique
* We can access values using keys
* APIs are available to add new key value pairs to a dict, update values based on keys, get the keys (as a set-like view), get the values (as a view that can be converted to a list), check whether a key exists in the dict etc.

### Tasks
Let us see some basic dict operations using simple examples.
* Adding elements to dict
* Removing elements from dict (clear, pop, popitem)
* Get all keys (keys)
* Get all key value pairs (items)
* Get only values (values)


In [1]:
db = {
    "host": "dslab.itversity.com",
    "db_name": "retail_db",
    # Duplicate key: a dict keeps only the last value given for "username"
    "username": "retail_fake",
    "username": "retail_user",
    "password": "itversity"
}

In [2]:
type(db)

dict

In [3]:
db

{'host': 'dslab.itversity.com',
 'db_name': 'retail_db',
 'username': 'retail_user',
 'password': 'itversity'}

In [4]:
db["port"] = "3306"

In [5]:
db

{'host': 'dslab.itversity.com',
 'db_name': 'retail_db',
 'username': 'retail_user',
 'password': 'itversity',
 'port': '3306'}
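Assigning to a key adds the pair when the key is new and replaces the value when the key already exists. This sketch shows how to update values, including `update` for several pairs at once and `get` for reading a key that may be missing. The values come from the `db` example, with the password omitted.

```python
db = {"host": "dslab.itversity.com", "db_name": "retail_db"}

db["port"] = "3306"   # new key: adds a pair
db["port"] = "3307"   # existing key: replaces the value
print(db["port"])     # 3307

# update adds or overwrites several pairs at once
db.update({"username": "retail_user", "port": "3306"})
print(db["port"])     # 3306

# get returns None (or a supplied default) instead of raising KeyError
print(db.get("password"))             # None
print(db.get("password", "not set"))  # not set
```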

In [6]:
db.keys()

dict_keys(['host', 'db_name', 'username', 'password', 'port'])

In [7]:
db.values()

dict_values(['dslab.itversity.com', 'retail_db', 'retail_user', 'itversity', '3306'])

In [8]:
db.items() # returns a view of (key, value) tuples (pairs)

dict_items([('host', 'dslab.itversity.com'), ('db_name', 'retail_db'), ('username', 'retail_user'), ('password', 'itversity'), ('port', '3306')])

In [9]:
type(list(db.items())[0])

tuple

In [10]:
db['host']

'dslab.itversity.com'

In [11]:
'host' in db

True
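Removing elements (pop, popitem, clear) is listed in the tasks but not shown above. This sketch runs those operations on a copy of the same `db` dict. Note that `popitem` removes the most recently inserted pair, which is guaranteed from Python 3.7 onward.

```python
db = {
    "host": "dslab.itversity.com",
    "db_name": "retail_db",
    "username": "retail_user",
    "password": "itversity",
    "port": "3306"
}

port = db.pop("port")            # remove a key and return its value
print(port)                      # 3306
print(db.pop("missing", None))   # None - the default avoids KeyError

last = db.popitem()              # remove and return the last inserted pair
print(last)                      # ('password', 'itversity')

db.clear()                       # remove all pairs
print(db)                        # {}
```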

## List of Tuples

We often create a collection (list) of tuples. Let us perform a few tasks related to a collection of tuples.
* Create 3 tuples with order_id, order_date, order_customer_id, order_status.

|order_id|order_date|order_customer_id|order_status|
|--------|----------|-----------------|------------|
|1|2013-07-25 00:00:00.0|11599|CLOSED|
|2|2013-07-25 00:00:00.0|256|PENDING_PAYMENT|
|3|2013-07-25 00:00:00.0|12111|COMPLETE|

In [2]:
o1 = (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

In [3]:
o2 = (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT')

In [4]:
o3 = (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')

* Create a list of the above 3 tuples named **orders**

In [5]:
orders = [o1, o2, o3]

In [9]:
# Equivalent: define the list of tuples directly
orders = [(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
          (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
          (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')
         ]

In [6]:
orders[0]

(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

In [7]:
orders[1][2]

256
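Because each tuple has a fixed layout, a loop can unpack it into named variables. This sketch does that and then counts orders per status with a dict, combining the structures from this module.

```python
orders = [(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
          (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
          (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')]

# Unpack each tuple into named variables while iterating
for order_id, order_date, order_customer_id, order_status in orders:
    print(order_id, order_status)

# Count orders per status; get supplies 0 for a status seen for the first time
status_counts = {}
for order in orders:
    status = order[3]
    status_counts[status] = status_counts.get(status, 0) + 1
print(status_counts)  # {'CLOSED': 1, 'PENDING_PAYMENT': 1, 'COMPLETE': 1}
```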

## Using Data Structures

Let us understand how to leverage the data structures for data processing.

* Read data from files using basic file I/O.

In [146]:
orders_file = open('/data/retail_db/orders/part-00000')

In [147]:
type(orders_file)

_io.TextIOWrapper

In [148]:
orders_file.read?

[0;31mSignature:[0m [0morders_file[0m[0;34m.[0m[0mread[0m[0;34m([0m[0msize[0m[0;34m=[0m[0;34m-[0m[0;36m1[0m[0;34m,[0m [0;34m/[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
[0;31mType:[0m      builtin_function_or_method


In [149]:
orders_raw = orders_file.read()

In [150]:
type(orders_raw)

str
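The file opened above is never closed. A common alternative is a `with` block, which closes the file automatically when the block ends. Because `/data/retail_db/orders/part-00000` exists only in the lab environment, this sketch first writes two sample records (copied from the output further down) to a temporary file.

```python
import os
import tempfile

# Two sample records; assumption: the real file has the same comma-separated layout
sample = ("1,2013-07-25 00:00:00.0,11599,CLOSED\n"
          "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n")

# Write the sample to a temporary file so the example runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# `with` closes the file automatically, even if read() raises an error
with open(path) as orders_file:
    orders_raw = orders_file.read()

print(orders_file.closed)    # True
print(orders_raw == sample)  # True
os.remove(path)              # clean up the temporary file
```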

* Get data into collections.

In [151]:
str.split?

[0;31mSignature:[0m [0mstr[0m[0;34m.[0m[0msplit[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0msep[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m [0mmaxsplit[0m[0;34m=[0m[0;34m-[0m[0;36m1[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Return a list of the words in the string, using sep as the delimiter string.

sep
  The delimiter according which to split the string.
  None (the default value) means split according to any whitespace,
  and discard empty strings from the result.
maxsplit
  Maximum number of splits to do.
  -1 (the default value) means no limit.
[0;31mType:[0m      method_descriptor


In [152]:
str.splitlines?

[0;31mSignature:[0m [0mstr[0m[0;34m.[0m[0msplitlines[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0mkeepends[0m[0;34m=[0m[0;32mFalse[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and
true.
[0;31mType:[0m      method_descriptor


In [154]:
orders_raw.split('\n')[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [155]:
orders_raw.splitlines()[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
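Both calls look the same here, but they differ at the edges. If the text ends with a newline, which is common for files, `split('\n')` leaves an empty string at the end of the list, while `splitlines()` does not. `splitlines()` also handles `\r\n` line endings. This sketch uses a two-line sample to show the difference.

```python
orders_raw = ("1,2013-07-25 00:00:00.0,11599,CLOSED\n"
              "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n")

by_split = orders_raw.split('\n')
by_splitlines = orders_raw.splitlines()
print(len(by_split), repr(by_split[-1]))  # 3 ''
print(len(by_splitlines))                 # 2

# Windows-style line endings: split('\n') keeps the '\r'
print("a\r\nb".split('\n'))   # ['a\r', 'b']
print("a\r\nb".splitlines())  # ['a', 'b']
```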

* Convert each record into a tuple for better control.
* Process the data based on the problem statement using the APIs available on collections.

**We will understand these as part of subsequent modules.**
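As a preview of those steps, this sketch converts each line into a tuple and then filters on the status. It casts the id columns to `int`, which is an assumption about how the columns should be typed.

```python
orders_raw = ("1,2013-07-25 00:00:00.0,11599,CLOSED\n"
              "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n"
              "3,2013-07-25 00:00:00.0,12111,COMPLETE\n")

# Convert each comma-separated record into a typed tuple
orders = []
for line in orders_raw.splitlines():
    order_id, order_date, order_customer_id, order_status = line.split(',')
    orders.append((int(order_id), order_date, int(order_customer_id), order_status))

print(orders[0])  # (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')

# Example processing step: keep only completed orders
complete_orders = [o for o in orders if o[3] == 'COMPLETE']
print(complete_orders)  # [(3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE')]
```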