# Overview of Collections - list and set

Let us get an overview of list and set as part of the Python Collections.

* Overview of list and set
* Common Operations
* Accessing Elements from list
* Adding Elements to list
* Updating and Deleting Elements - list
* Other list operations
* Adding and Deleting elements - set
* Typical set operations
* Validating set
* list and set - Usage

## Overview of list and set

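Python has 4 main built-in types of collections: `list`, `set`, `dict` and `tuple`. Typically, `list` and `set` are used to hold homogeneous elements, while `dict` and `tuple` are used to hold heterogeneous elements. Python itself does not enforce this.
* Homogeneous means all the elements are of the same type.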
* Examples of collections with homogeneous elements.
  * Collection of employees - `list`
  * Collection of unique employees - `set`
  * Collection of integers - `list`
  * Collection of unique integers - `set`
* Based on the requirement, we should use the appropriate type of collection.
* `list`
  * Group of homogeneous elements.
  * There can be duplicates in the `list`.
  * `list` can be created by enclosing elements in `[]` - example `[1, 2, 3, 4]`.
  * Empty `list` can be initialized using `[]` or `list()`.
* `set`
  * Group of homogeneous elements.
  * No duplicates are allowed in the `set`. If you add the same element more than once, the repeated additions are ignored.
  * `set` can be created by enclosing elements in `{}` - example `{1, 2, 3, 4}`.
  * Empty `set` can be initialized using `set()`. We cannot initialize empty set using `{}` as it will be treated as empty `dict`.
* `list` and `set` are analogous to a table with rows and columns, while `dict` and `tuple` are analogous to a single row within a table.
* `list` can hold duplicate values while `set` can only hold unique values.
* If we want a row with column names we use `dict`; otherwise we use `tuple`.
* We will take a deeper look at all types of collections to get a better understanding of them.

In [154]:
l = [1, 2, 3, 3, 4, 4]

In [155]:
l

[1, 2, 3, 3, 4, 4]

In [156]:
l = []

In [157]:
l

[]

In [158]:
type(l)

list

In [159]:
l = list()

In [160]:
l

[]

In [161]:
s = {1, 2, 3, 3, 4, 4}

In [162]:
s

{1, 2, 3, 4}

In [163]:
type(s)

set

In [1]:
s = set() # Initializing empty set

In [165]:
s

set()

In [166]:
s = {} # s will be of type dict

In [167]:
type(s)

dict
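The table analogy above can be sketched roughly as follows. The employee records are made-up sample values used only for illustration: a `list` plays the role of the table, while each `tuple` or `dict` plays the role of a row.

```python
# Hypothetical sample data, used only to illustrate the analogy
# Each tuple is a row without column names
employees_as_tuples = [
    (1, 'Scott', 'Tiger'),
    (2, 'Donald', 'Duck'),
]

# Each dict is a row with column names
employees_as_dicts = [
    {'id': 1, 'first_name': 'Scott', 'last_name': 'Tiger'},
    {'id': 2, 'first_name': 'Donald', 'last_name': 'Duck'},
]

# Access the first name of the first row in both representations
first_from_tuple = employees_as_tuples[0][1]           # by position
first_from_dict = employees_as_dicts[0]['first_name']  # by column name
print(first_from_tuple, first_from_dict)
```

With `tuple` we have to remember the position of each column, while with `dict` we can refer to the column by name.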

## Common Operations

There are some functions which can be applied on all collections. Here we will see details related to `list` and `set`.
* `in` - check if element exists
* `len` - to get the number of elements.
* `sorted` - to sort the data (original collection will be untouched). Typically, we assign the result of sorting to a new collection.
* `sum`, `min`, `max`, etc. - aggregate functions applied over all the elements.
* There can be more such functions.

In [13]:
l = [1, 2, 3, 4] # list

In [14]:
1 in l

True

In [15]:
5 in l

False

In [16]:
len(l)

4

In [17]:
sorted(l)

[1, 2, 3, 4]

In [18]:
sum(l)

10

In [18]:
s = {1, 2, 3, 4} # set

In [19]:
1 in s

True

In [20]:
5 in s

False

In [20]:
len(s)

4

In [21]:
sorted(s)

[1, 2, 3, 4]

In [22]:
sum(s)

10
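As a small sketch of the functions listed above, `sorted` returns a new `list` and leaves the original collection untouched, and `min` and `max` work the same way on both `list` and `set`. The variable names here are illustrative.

```python
l = [3, 1, 4, 2]
s = {3, 1, 4, 2}

l_sorted = sorted(l)                     # new list, l is not modified
l_sorted_desc = sorted(l, reverse=True)  # descending order
s_sorted = sorted(s)                     # sorted returns a list even for a set

print(l)               # original order is retained
print(l_sorted)
print(l_sorted_desc)
print(type(s_sorted))
print(min(l), max(l), min(s), max(s))
```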

## Accessing Elements from list

Let us see how we can access elements from the `list`.
* We can access a particular element in a `list` by using its index, `l[index]`. The index starts at 0.
* We can also access a range of elements using the slice `l[start:stop]`. It returns the elements from index `start` up to, but not including, index `stop`.
* The index can be negative, in which case it counts from the end. We can get the last n elements by using `l[-n:]`.
* Let us see a few examples.

In [139]:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [140]:
l[0] # getting first element

1

In [141]:
l[2:4] # get 3rd and 4th elements (indexes 2 and 3)

[3, 4]

In [142]:
l[-1] # get last element

10

In [143]:
l[-4:] # get last 4 elements

[7, 8, 9, 10]

In [146]:
l[-5:-2] # get elements from 6th to 8th

[6, 7, 8]
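The following sketch shows a few more slicing variations on the same `list`. The optional third value (step) is not covered above and is included only as an extra illustration.

```python
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first_three = l[:3]    # start defaults to 0
middle = l[2:6]        # indexes 2, 3, 4 and 5
from_fourth = l[3:]    # stop defaults to the end of the list
every_second = l[::2]  # step of 2, starting from index 0

# When both indexes are in range, l[start:stop] has stop - start elements
assert len(middle) == 6 - 2

print(first_three)
print(middle)
print(from_fourth)
print(every_second)
```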

## Adding Elements to list

We can perform below operations to add elements to the list.
* `append` - to add an element at the end of the list.
* `insert` - to insert an element at the specified index. All the elements from that index onwards are moved to the right.
* `extend` - to extend the list by appending the elements from another list (or any iterable).
* We can also concatenate lists using `+`, which creates a new list.

In [62]:
l = [1, 2, 3, 4]

In [63]:
l.append?

Signature: l.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method


In [64]:
l.append(5)

In [65]:
l

[1, 2, 3, 4, 5]

In [66]:
l = l + [6]

In [67]:
l

[1, 2, 3, 4, 5, 6]

In [68]:
l = l + [7, 8, 9, 10]

In [69]:
l

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [70]:
l.insert?

Signature: l.insert(index, object, /)
Docstring: Insert object before index.
Type:      builtin_function_or_method


In [71]:
l.insert(3, 13)

In [72]:
l

[1, 2, 3, 13, 4, 5, 6, 7, 8, 9, 10]

In [73]:
l.extend?

Signature: l.extend(iterable, /)
Docstring: Extend list by appending elements from the iterable.
Type:      builtin_function_or_method


In [74]:
l.extend([11, 12])

In [75]:
l

[1, 2, 3, 13, 4, 5, 6, 7, 8, 9, 10, 11, 12]
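A common source of confusion is the difference between `append` and `extend` when the argument is itself a `list`. The sketch below, with illustrative variable names, contrasts the two and also shows that `+` leaves the original `list` unchanged.

```python
l1 = [1, 2, 3]
l1.append([4, 5])  # the whole list is added as a single element
# l1 is now [1, 2, 3, [4, 5]]

l2 = [1, 2, 3]
l2.extend([4, 5])  # each element is added individually
# l2 is now [1, 2, 3, 4, 5]

l3 = [1, 2, 3]
l4 = l3 + [4, 5]   # + builds a new list, l3 is unchanged

print(l1)
print(l2)
print(l3, l4)
```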

## Updating and Deleting Elements - list

Here is how we can update elements in the list as well as delete elements from the list.
* We can update an element by assigning a new value using its index.
* There are multiple functions to delete elements from a list.
  * `remove` - deletes the first occurrence of the element from the list.
  * `pop` - deletes and returns the element at the given index (the last element by default).
  * `clear` - deletes all the elements from the list.

In [76]:
l = [1, 2, 3, 4]

In [77]:
l[1] = 11

In [78]:
l

[1, 11, 3, 4]

In [97]:
l = [1, 2, 3, 4, 4, 6]

In [98]:
l.remove?

Signature: l.remove(value, /)
Docstring:
Remove first occurrence of value.

Raises ValueError if the value is not present.
Type:      builtin_function_or_method


In [99]:
l.remove(4)

In [100]:
l

[1, 2, 3, 4, 6]

In [101]:
l.pop?

Signature: l.pop(index=-1, /)
Docstring:
Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.
Type:      builtin_function_or_method


In [102]:
l.pop()

6

In [103]:
l

[1, 2, 3, 4]

In [104]:
l.pop(2)

3

In [105]:
l

[1, 2, 4]

In [113]:
l.clear()

In [114]:
l

[]
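As the docstring of `remove` states, it raises `ValueError` when the value is not present. One possible way to handle that is sketched below; `remove_if_present` is a hypothetical helper written for this example, not a built-in function.

```python
def remove_if_present(values, value):
    """Remove the first occurrence of value; return True if something was removed."""
    try:
        values.remove(value)
        return True
    except ValueError:
        # list.remove raises ValueError when the value is not in the list
        return False

l = [1, 2, 3, 4, 4, 6]
removed_four = remove_if_present(l, 4)  # removes only the first 4
removed_ten = remove_if_present(l, 10)  # 10 is not present, list is unchanged

print(l)
print(removed_four, removed_ten)
```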

## Other list operations

Here are some of the other list operations we frequently use.
* `count` - number of times an element is present in a list.
* `sort` - to sort the data within the list. Data in the list will be sorted in-place.
* `reverse` - to reverse the order of the elements in the list in-place.

In [122]:
s = 'asdfasfsafljojlsdfaljfasf'

In [123]:
l = list(s)

In [124]:
l.count?

Signature: l.count(value, /)
Docstring: Return number of occurrences of value.
Type:      builtin_function_or_method


In [125]:
l.count('a')

5

In [126]:
l.count('z')

0

In [127]:
l.sort?

Signature: l.sort(*, key=None, reverse=False)
Docstring: Stable sort *IN PLACE*.
Type:      builtin_function_or_method


In [128]:
l.sort()

In [129]:
l

['a',
 'a',
 'a',
 'a',
 'a',
 'd',
 'd',
 'f',
 'f',
 'f',
 'f',
 'f',
 'f',
 'j',
 'j',
 'j',
 'l',
 'l',
 'l',
 'o',
 's',
 's',
 's',
 's',
 's']

In [130]:
l.reverse?

Signature: l.reverse()
Docstring: Reverse *IN PLACE*.
Type:      builtin_function_or_method


In [131]:
l.reverse()

In [132]:
l

['s',
 's',
 's',
 's',
 's',
 'o',
 'l',
 'l',
 'l',
 'j',
 'j',
 'j',
 'f',
 'f',
 'f',
 'f',
 'f',
 'f',
 'd',
 'd',
 'a',
 'a',
 'a',
 'a',
 'a']
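The signature of `sort` shown above also accepts `key` and `reverse`. Here is a brief sketch of how they might be used; the list of words is made up for illustration.

```python
words = ['banana', 'apple', 'Cherry', 'date']

words.sort()                       # in-place; uppercase letters sort before lowercase
default_order = list(words)        # copy of the current state

words.sort(key=str.lower)          # compare the lowercase form of each word
case_insensitive = list(words)

words.sort(key=len, reverse=True)  # longest first; ties keep their previous order
by_length_desc = list(words)

print(default_order)
print(case_insensitive)
print(by_length_desc)
```

Since `sort` modifies the list in-place and returns `None`, the sketch copies the list with `list(words)` after each step to keep the intermediate results.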

## Adding and Deleting elements - set

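Let us see how we can add elements to the set and delete elements from it.
* We can add elements to a `set` using the following functions.
  * `add` - adds a single element to the set.
  * `update` - adds all the elements from one or more collections to the set in-place.
  * `union` - returns a new set with the elements from all the sets. The original set is untouched.
* We can delete elements from the `set` using different functions.
  * `pop` - removes and returns an arbitrary element.
  * `remove` - removes the given element. Raises `KeyError` if it does not exist.
  * `discard` - removes the given element if it exists. Does nothing otherwise.
  * `clear` - removes all the elements.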

In [168]:
s = {1, 2, 3, 3, 3, 4, 4}

In [170]:
s.add(5)

In [171]:
s

{1, 2, 3, 4, 5}

In [172]:
s.update?

Docstring: Update a set with the union of itself and others.
Type:      builtin_function_or_method


In [174]:
s.update({4, 5, 6, 7}) # Updates the set on which update is invoked

In [175]:
s

{1, 2, 3, 4, 5, 6, 7}

In [176]:
s.union?

Docstring:
Return the union of sets as a new set.

(i.e. all elements that are in either set.)
Type:      builtin_function_or_method


In [178]:
s = {1, 2, 3, 4, 5}
s.union({4, 5, 6, 7}) # Creates new set

{1, 2, 3, 4, 5, 6, 7}

In [179]:
s

{1, 2, 3, 4, 5}

In [181]:
s.pop?

Docstring:
Remove and return an arbitrary set element.
Raises KeyError if the set is empty.
Type:      builtin_function_or_method


In [182]:
s.pop()

1

In [183]:
s

{2, 3, 4, 5}

In [184]:
s.remove?

Docstring:
Remove an element from a set; it must be a member.

If the element is not a member, raise a KeyError.
Type:      builtin_function_or_method


In [185]:
s.remove(4)

In [186]:
s

{2, 3, 5}

In [187]:
s.remove(6) # 6 does not exist, throws KeyError

KeyError: 6

In [194]:
s = {1, 2, 3, 4, 5}

In [195]:
s.discard?

Docstring:
Remove an element from a set if it is a member.

If the element is not a member, do nothing.
Type:      builtin_function_or_method


In [196]:
s.discard(4)

In [197]:
s

{1, 2, 3, 5}

In [198]:
s.discard(6)

In [199]:
s

{1, 2, 3, 5}

In [190]:
s.clear?

Docstring: Remove all elements from this set.
Type:      builtin_function_or_method


In [191]:
s.clear()

In [192]:
s

set()
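The sketch below pulls together two points from this section: `update` accepts any iterables (not only sets), and `remove` raises `KeyError` for a missing element while `discard` does not. The flag `remove_failed` is just an illustrative variable.

```python
s = {1, 2, 3}

s.update([3, 4], (5, 6))  # one or more iterables; duplicates are ignored
# s is now {1, 2, 3, 4, 5, 6}

s.discard(10)             # 10 is not a member, nothing happens

try:
    s.remove(10)          # remove raises KeyError for a missing element
    remove_failed = False
except KeyError:
    remove_failed = True

print(s)
print(remove_failed)
```

If we are not sure whether the element exists and do not care either way, `discard` is usually the simpler choice.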

## Typical set operations

We typically perform below operations on set. These are typical mathematical set operations.
* `union` - get all unique elements from 2 or more sets.
* `intersection` - get common elements between 2 or more sets.
* `difference` - get elements that are in one set but not in the other set.

All the above functions generate a new set.

In [1]:
s1 = {1, 2, 3, 4}

In [2]:
s2 = {3, 4, 5, 6, 7}

In [3]:
s1.union(s2)

{1, 2, 3, 4, 5, 6, 7}

In [4]:
s1.intersection(s2)

{3, 4}

In [5]:
s1.difference(s2)

{1, 2}

In [6]:
s2.difference(s1)

{5, 6, 7}
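Python also provides operators for these operations, and `union` and `intersection` accept more than one argument. The following is a small sketch; `s3` is an extra set introduced only for this example.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6, 7}
s3 = {7, 8}

# Operator forms give the same results as the functions
assert s1 | s2 == s1.union(s2)
assert s1 & s2 == s1.intersection(s2)
assert s1 - s2 == s1.difference(s2)

all_elements = s1.union(s2, s3)          # elements from all three sets
common_to_all = s1.intersection(s2, s3)  # no element is in all three sets

print(all_elements)
print(common_to_all)
print(s1, s2)  # the original sets are unchanged
```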

## Validating set

Here are some of the operations that can be performed to validate sets.

* Checking if an element exists (using the `in` operator).
* `issubset` - check if the first set is a subset of the second set.
* `issuperset` - check if the first set is a superset of the second set.
* `isdisjoint` - check if 2 sets have no common elements.


In [9]:
s = {1, 2, 3, 3, 4, 4, 4, 5}

In [11]:
1 in s

True

In [23]:
s1 = {1, 2, 3}

In [24]:
s2 = {1, 2, 3, 4, 5}

In [25]:
s1.issubset?

Docstring: Report whether another set contains this set.
Type:      builtin_function_or_method


In [26]:
s1.issubset(s2)

True

In [27]:
s1.issuperset(s2)

False

In [28]:
s2.issuperset(s1)

True

In [31]:
s1.isdisjoint?

Docstring: Return True if two sets have a null intersection.
Type:      builtin_function_or_method


In [32]:
s1 = {1, 2, 3, 4}

In [35]:
s2 = {3, 4, 5, 6, 7}

In [37]:
s1.isdisjoint(s2)

False

In [38]:
s1 = {1, 2, 3, 4}

In [39]:
s2 = {5, 6, 7}

In [41]:
s1.isdisjoint(s2)

True
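One place where these checks can be handy is validating incoming values against a set of allowed values. The sketch below assumes a hypothetical set of order statuses; the names and values are made up for illustration.

```python
# Hypothetical set of allowed values
known_statuses = {'CLOSED', 'COMPLETE', 'PENDING_PAYMENT', 'PROCESSING'}

batch_statuses = {'CLOSED', 'COMPLETE'}
bad_batch_statuses = {'CLOSED', 'SHIPPED'}

batch_is_valid = batch_statuses.issubset(known_statuses)
bad_batch_is_valid = bad_batch_statuses.issubset(known_statuses)

# difference tells us which values caused the validation to fail
unknown = bad_batch_statuses.difference(known_statuses)

print(batch_is_valid, bad_batch_is_valid)
print(unknown)
```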

## list and set - Usage

Let us see some real world usage of list and set.
* list is used more often than set.
  * Reading data from file into a list
  * Reading data from a table into a list
* We can convert a list to set to perform these operations.
  * Get unique elements from the list
  * Perform set operations between 2 lists such as union, intersection, difference etc.
* We can convert a `set` to a `list` to perform these operations.
  * Reverse the collection
  * Append multiple collections to create new collections while retaining duplicates
* You will see some of these in action as we get into other related topics down the line.

In [2]:
# Reading data from file into a list
path = '/Users/itversity/Research/data/retail_db/orders/part-00000'
# C:\\users\\itversity\\Research
# Using with ensures that the file is closed once the data is read
with open(path) as orders_file:
    orders_raw = orders_file.read()

In [4]:
orders = orders_raw.splitlines()

In [5]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [14]:
len(orders) # same as number of records in the file

68883

In [6]:
# Get unique dates
dates = ['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']

In [7]:
dates

['2013-07-25 00:00:00.0',
 '2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2014-01-25 00:00:00.0']

In [8]:
len(dates)

4

In [9]:
set(dates)

{'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}

In [10]:
len(dates) # the original list is unchanged, set(dates) returned a new set

4

In [19]:
# Creating new collection retaining duplicates using 2 sets
s1 = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}

In [20]:
s2 = {'2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0'}

In [21]:
s1.union(s2)

{'2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2013-08-25 00:00:00.0',
 '2013-08-26 00:00:00.0',
 '2014-01-25 00:00:00.0'}

In [22]:
len(s1.union(s2))

5

In [26]:
s = list(s1) + list(s2)

In [28]:
s

['2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2014-01-25 00:00:00.0',
 '2013-08-25 00:00:00.0',
 '2013-08-26 00:00:00.0',
 '2014-01-25 00:00:00.0']

In [29]:
len(s)

6
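To tie the usage points together, here is a rough sketch that gets unique values from a `list` and performs set operations between 2 lists. The records are copied from the first few lines of the orders output above, and the date values are the ones used in the earlier cells; the variable names are illustrative.

```python
# First four records from the orders output shown earlier
orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
    '4,2013-07-25 00:00:00.0,8827,CLOSED',
]

# Collect the status (4th comma-separated field) of each record into a list
statuses = []
for order in orders:
    statuses.append(order.split(',')[3])

unique_statuses = set(statuses)  # the repeated CLOSED is dropped

# Set operations between 2 lists: convert to set first
dates1 = ['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0',
          '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']
dates2 = ['2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0',
          '2014-01-25 00:00:00.0']

common_dates = set(dates1).intersection(dates2)
only_in_dates1 = sorted(set(dates1).difference(dates2))  # sorted gives back a list

print(statuses)
print(unique_statuses)
print(common_dates)
print(only_in_dates1)
```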