# Python collections
Python has a number of built-in collection types to help you manage your data. We’ll look at four of them:

- List

- Tuple

- Set

- Dictionary

## List
Lists are the basic datatype for accessing data that is stored according to position. A list fits into the broader category of Sequence data types. A list literal is created using brackets: **[ ]**

In [1]:
new_list = ["a", "b", "c"]
new_list

['a', 'b', 'c']

You can also create a list using a function: list()

list([iterable])
- iterable (optional) - any iterable object, such as a sequence (string, tuple), a collection (set, dictionary), or an iterator

In [7]:
empty_list = list()  # create an empty list with the function
empty_list = []      # equivalent literal form

b=list('abcdefg')
b

['a', 'b', 'c', 'd', 'e', 'f', 'g']

**List indices start from 0, not 1**:

In [4]:
list1 = [5, 2, 3, 91]
print(list1[0])
print(list1[3])

5
91


Use list slicing:

**slice=list[start:stop:step]**

The slice range is half-open, so **stop is not included**.

In [6]:
x = [3, 1, 2, 5, 6, 1]
print(x[1:4])
print(x[1:5:2])

[1, 2, 5]
[1, 5]


In [7]:
# you can omit some of the arguments, but you still need the : so the interpreter knows what you mean
print(x[:4])
print(x[0::2])

[3, 1, 2, 5]
[3, 2, 6]


In [8]:
# use negative step-sizes to visit items in reverse:
print(x[::-1])

[1, 6, 5, 2, 1, 3]


Removing items from a list is easy, either by value or by position:

In [9]:
#remove() deletes by value.
y = [1, 2, 3, 31, 3, 7]
y.remove(3) #Delete the first element with this value
y

[1, 2, 31, 3, 7]

In [10]:
#by position
del y[1] #Delete the *second* element in the list
y

[1, 31, 3, 7]

Therefore, a list is a **mutable** collection of items, which means a list can be changed after it is created.

Many methods that operate on lists (including append(), extend(), remove(), sort(), and reverse()) perform their operations by modifying the list object in-place. 

These methods **do not return a useful value**: they return None. Writing y = x.reverse() therefore assigns None to y.

In [14]:
x = [1, 3, 2, 5, 4]
x.reverse()
print(x)

[4, 5, 2, 3, 1]


In [12]:
y=x.reverse()
print(y)
# there is no return value from reverse()

None


In [37]:
x.sort()
print(x)

[1, 2, 3, 4, 5]


In [16]:
x.index(5) #find the index of the value 5 in the list

4

In [31]:
x.append(6)
print(x)

[1, 2, 3, 4, 5, 6]


We don’t assign the results of these methods to another variable because they mutate the list object itself instead of returning a value
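When the original list should stay as it is, the built-ins sorted() and reversed() return new objects instead of mutating. The following is a minimal sketch of the contrast; the variable names are illustrative and not part of the cells above.

```python
# Illustrative names; not taken from the cells above.
scores = [1, 3, 2, 5, 4]

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(scores)

# reversed() returns a lazy iterator, so wrap it in list() to get a list.
backwards = list(reversed(scores))

# list.sort() mutates in place and returns None.
result = scores.sort()

print(ordered)    # [1, 2, 3, 4, 5]
print(backwards)  # [4, 5, 2, 3, 1]
print(result)     # None
print(scores)     # [1, 2, 3, 4, 5]
```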

In [32]:
#lists can contain values of multiple types
a=[1, 2, 3]
b=['z','v','d']
c=a+b # Concatenate two lists
print(c)

[1, 2, 3, 'z', 'v', 'd']


In real business use cases, though, we generally assume that a list contains a single, consistent type of data.

Useful list functions:
- **len()** : returns the number of items in the list
- **sum()** : adds up the items in a list (the items must be numbers)

In [40]:
a=[1, 2, 3]
len(a)

3

In [42]:
a=[1, 2, 3]
sum(a)

6

In [18]:
b=['z','v','d']
sum(b)

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [48]:
max(a)

3

In [50]:
min(b)

'd'

**Use range() to generate a list**
- If we give it a single value, that’s the stop value. It creates values, x, such that 0 ≤ x < stop.
- If we give it two values, those are the start and stop values. It creates values, x, such that start ≤ x < stop.
- If we give it three values, the third value is the step. It creates values x = start, start+step, start+2*step, ..., for as long as x < stop (with a positive step).

In [51]:
range(0,6)

range(0, 6)

In [52]:
list(range(0,6))

[0, 1, 2, 3, 4, 5]

In [53]:
list(range(2, 10))

[2, 3, 4, 5, 6, 7, 8, 9]

In [54]:
list(range(1, 23, 2))

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

In [55]:
sum(range(1, 11, 2))

25

The range() object is a lazy sequence: an iterable that can produce its values on demand.

However, **it doesn’t produce anything until we consume the values**. In these examples above we’re using list() and sum() to consume the values produced by the range() object.
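As a rough illustration of that laziness, a range object can answer questions about its contents without ever building a list. The name big is an assumption made for this sketch.

```python
# A range stores only start, stop, and step, however many values it covers.
big = range(0, 1_000_000_000, 2)

print(len(big))        # 500000000
print(big[10])         # 20
print(big[-1])         # 999999998
print(999_999 in big)  # False, because odd numbers are never produced
```

Calling list(big) would try to materialize half a billion integers, which is exactly the work the range object avoids until something consumes it.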

In [56]:
# We can also use a for statement to consume items from a range() object:
for i in range(1, 16):
    if i % 3 == 0 and i % 5 == 0:
        print("fizz", "buzz")
    elif i % 3 == 0:
        print("fizz")
    elif i % 5 == 0:
        print("buzz")
    else:
        print(i)

1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizz buzz


**List Copy:**

***Python Doesn’t Copy Objects! Python Makes Shared References!***

This is important because we’ll often write something that uses two references to a single object when we thought we were making a copy. We’ll forget that Python doesn’t make copies and write this:


In [58]:
my_list = [1, 1, 2, 3, 5, 8, 13]
y=my_list
# We think we’ve made a copy, but actually we didn't

In [59]:
#What we did was slap two sticky labels – my_list and y – on a single object - we can use id() to prove this:
# The built-in function id() returns the identity of an object as an integer. 
# This integer usually corresponds to the object’s location in memory
id(my_list)

4449174792

In [60]:
id(y)

4449174792

How will my_list change when we assign a value to y?

In [61]:
y[2]= 66
my_list

[1, 1, 66, 3, 5, 8, 13]

Therefore, when we updated the y variable, we were really updating the underlying list object that is shared by y and my_list. This behavior is true for all mutable data structures!
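The same sharing can be seen with another mutable type. This is a small sketch using a dictionary; the names settings and alias are illustrative.

```python
# Illustrative names; any mutable object behaves this way.
settings = {"theme": "dark"}
alias = settings            # a second name for the same dict, not a copy

alias["theme"] = "light"    # mutate through the alias

print(settings)             # {'theme': 'light'}
print(alias is settings)    # True
```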

**Make a copy of list**:

In [64]:
#1.Copy by slice
copy_of_list = my_list[:] #shallow copy

In [65]:
id(copy_of_list)

4449173768

In [70]:
#2. copy() method:
copy_of_list= my_list.copy() #shallow copy
id(copy_of_list)

4449184456

In [71]:
#3. the copy module:
from copy import copy
copy_of_list = copy(my_list)
id(copy_of_list)

4449218056

When we make a copy using one of the techniques above, we no longer have two references to the same underlying object. We have two list objects with two different internal identifiers, so changing the list referred to by copy_of_list has no effect on the list referred to by my_list.

In [73]:
copy_of_list[1]=100
my_list

[1, 1, 66, 3, 5, 8, 13]

In [74]:
copy_of_list

[1, 100, 66, 3, 5, 8, 13]

## Tuples

Tuples are sequences – like lists – except you can’t do anything to change them.

Tuples are the data structure of choice in situations where we can’t meaningfully remove an item or append an additional item

Also, tuples are a little bit **faster** to use than lists, so you can create them and access their members quickly.
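The "can’t change them" rule can be checked directly. In this sketch (names are illustrative), item assignment on a tuple raises a TypeError, and the usual workaround is to build a new tuple from the old values.

```python
point = (3, 4)

try:
    point[0] = 0            # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# To "change" a tuple, build a new one from the old values.
moved = (point[0] + 1, point[1])
print(point)                # (3, 4)
print(moved)                # (4, 4)
```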

In [77]:
# use () to create a tuple literal
some_place = (-37.234, 52.876)

In [76]:
# indexing and slicing work the same as for lists.
print("Lat = {0}".format(some_place[0]))  

Lat = -37.234


In [79]:
#We can use assignment to decompose a tuple, also.
lat, lon = some_place  #This works when some_place has exactly two items.

In [81]:
lat

-37.234

In [82]:
lon

52.876

In [88]:
a=(3.14,) # , to create a singleton tuple.
a

(3.14,)

In [84]:
b=() # empty tuple
b

()

In [98]:
# use tuple() to create a tuple object.
a=tuple('123')
a

('1', '2', '3')

In [99]:
c=tuple()
c

()

Copying a Tuple:

Since **a tuple is immutable**, we can’t make any internal state changes to the tuple that both variables share. There’s no strangeness in changing a and seeing the effect in b, because we can’t change a in the first place.
For the same reason, tuple objects have no copy() method.

In [90]:
a = (1,3,4)
b = a  

In [91]:
id(b)

4449141528

In [92]:
id(a)

4449141528

## Sets
Sets have no order or positions; items are either in the set or not in the set. Consequently, **a set can contain only one of any item.** 

We can surround items with {} to create a set literal

In [96]:
{1,1, 2, 3, 4, 5, 6,6}   #the duplicate items are collapsed into a single item

{1, 2, 3, 4, 5, 6}

In [97]:
# Build a set from another sequence, here a list:
x = set([1, 2, 2, 3, 4, 3])
print(x)

{1, 2, 3, 4}

*Note: we cannot create an empty set with {}. This creates an empty dictionary. We can only create an empty set with set()*

Sets are better whenever you only need to test for membership - **much faster than lists or tuples**:

In [100]:
x = set([1, 2, 3, 4, 2])
print(5 in x)

False


In [101]:
#get things in and out of sets
y = set([3])
y.add(2)
print(y)

{2, 3}


In [109]:
y.remove(2)
print(y)

{3}


Sets support the basic binary set operations:

- Union – anything in either set is contained in the union. The | operator does set union: a | b.
- Intersection – only items contained in both sets end up in the intersection. The & operator is the intersection of two sets: a & b.
- Difference – only items in the first set that are not in the second are in the difference. The - operator does difference: a - b.
- Symmetric Difference – only items that are in exactly one of the two sets are part of the result. The ^ operator does symmetric difference: a ^ b.

These operators create a new set object that contains the results of applying the operation to the two operand sets. This parallels the way ordinary integer arithmetic works, except that the operands are more complex data structures.
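The four operators can be tried side by side. This is a minimal sketch with two small example sets chosen for illustration; because sets are unordered, the printed order of items may differ from the comments.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)  # union: {1, 2, 3, 4, 5}
print(a & b)  # intersection: {3, 4}
print(a - b)  # difference: {1, 2}
print(a ^ b)  # symmetric difference: {1, 2, 5}

# The operands are left unchanged; each operator built a new set.
print(a, b)
```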

In [112]:
c=x-y
print(c)

{1, 2, 4}


Copying a set:

When we do set_a = set_b we’re going to put two names on a single underlying set object

In [116]:
set_a = {7,8,9}
set_b = set_a

In [122]:
id(set_b)

4447923784

In [123]:
id(set_a)

4447923784

When we change the underlying set, both variables will reflect the change

In [120]:
set_b |= {19, 1} # Adds elements from another set: set |= other
set_a

{1, 7, 8, 9, 19}

In [121]:
# the is operator shows us that the two variables, set_a and set_b, are two references to the same underlying object
set_a is set_b

True

In [None]:
#If you want to make a copy of a set, you must do something like this:

In [124]:
set_c = set_a.copy() #This creates a shallow copy of the original set.

In [126]:
set_c is set_b

False

**We can’t have a set where each item is a list.** Set items must be hashable, and mutable built-in types such as lists are not. The same restriction applies to dictionary keys.

In [127]:
{1, 2, 3}  # integers

{1, 2, 3}

In [128]:
{'a', 'b', 'c'}  # strings

{'a', 'b', 'c'}

In [129]:
{(1, 2), (3, 4)}  # tuples

{(1, 2), (3, 4)}

In [130]:
{[1, 2, 3], [1, 3, 4]}  # list doesn't work

TypeError: unhashable type: 'list'
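One common workaround, sketched here with illustrative names, is to convert each list to a tuple before adding it to the set. For set-like items, the built-in frozenset plays the same role as an immutable, hashable counterpart of set.

```python
# Convert each inner list to a tuple so that it can be hashed.
rows = [[1, 2, 3], [1, 3, 4], [1, 2, 3]]
unique_rows = set()
for row in rows:
    unique_rows.add(tuple(row))
print(unique_rows)          # two distinct tuples remain

# frozenset is immutable and hashable, so it can be a set item.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))          # 2, because {1, 2} and {2, 1} are equal
```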

## Dictionaries
Python dictionaries are used whenever you have a collection of objects that you want to refer to by a name (or key) rather than position. 

**Referring to objects by name is generally much better coding practice than referring to them by position, so we recommend this as a general practice whenever possible.** It certainly makes code much more readable!

Dictionaries are created with {}. The key and value must be separated with :

In [131]:
x = {"key": "value", "other-key": 123}

Accessing data is similar to lists and tuples: we put the key inside []:

In [132]:
print(x["key"])

value


In [134]:
print(x["other-key"])

123


The name you use to refer to a piece of data is the “key” and the data itself is called the “value”. 

Dictionaries are just a way of organizing a collection of key-value pairs. Clearly, **all keys must be distinct**

You can create dictionaries from a collection of key-value pairs using the **dict() function**

In [135]:
#1. Pass a list of pairs, written with []. Each (key, value) pair is a 2-tuple, written with ().
nums = dict([("dozen", 12), ("score", 20)])

In [136]:
nums

{'dozen': 12, 'score': 20}

In [137]:
nums['dozen']

12

In [139]:
#2. We can also create a dictionary using fancy name=value arguments for the dict() function. 
# This only works when the keys will be strings that have the same syntax rules as Python variable names
x = dict(key1= "value1", key2= "value2")
x

{'key1': 'value1', 'key2': 'value2'}

We can create an empty dictionary with {} and then use it to accumulate data

Adding new elements, updating and removing elements is quite easy:

In [140]:
x = {"key1": "value1"}
x["key2"] = "value2"  # creates new entry
x["key2"] = "value3"  # updates "key2" entry
print(x)

{'key1': 'value1', 'key2': 'value3'}


In [141]:
del x["key1"]
print(x)

{'key2': 'value3'}


Copy dictionaries with the **x.copy()** method, just as with lists and sets.
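A short sketch of that copy in action, using made-up keys and values: the copy is a separate dictionary object, so later changes to the original are not seen through it. Like the list and set versions, this is a shallow copy.

```python
prices = {"apple": 1.5, "pear": 2.0}
backup = prices.copy()      # new dict object with the same keys and values

prices["apple"] = 9.99      # change the original only

print(prices)               # {'apple': 9.99, 'pear': 2.0}
print(backup)               # {'apple': 1.5, 'pear': 2.0}
print(backup is prices)     # False
```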

Not every kind of object can be a key to a dictionary:

In [142]:
x = {}
x["key1"] = 1  # string is fine
x

{'key1': 1}

In [143]:
x[0] = 2  # int is good
x

{'key1': 1, 0: 2}

In [144]:
x[1.4] = 3  # so is float
x

{'key1': 1, 0: 2, 1.4: 3}

In [145]:
x[(1, 2, 3)] = 4  # tuple works...
x

{'key1': 1, 0: 2, 1.4: 3, (1, 2, 3): 4}

In [146]:
x[[1, 2, 3]] = 5   # a list doesn't work as a key
x 

TypeError: unhashable type: 'list'

## Mutability
- Tuple. Immutable. A lot like a number: there’s no internal state that can be changed. There’s no sort(), reverse(), append(), or extend(). Once a tuple has been created, that’s it.
- List. Mutable. Can be updated in-place. We can sort, reverse, append to, insert into, and remove from a list.

more details:
![Screen%20Shot%202019-07-21%20at%206.58.29%20PM.png](attachment:Screen%20Shot%202019-07-21%20at%206.58.29%20PM.png)


## Assignment vs Shallow Copy vs Deep Copy
Copying a list in Python might be trickier than you think. There are 3 ways you can do it: simply using the assignment operator (=), making a shallow copy and making a deep copy

1. Assignment with an = on lists does not make a copy. Instead, assignment makes the two variables point to the one list in memory

In [147]:
colors = ['red', 'blue', 'green']
b = colors

In [148]:
b.append('white')

In [149]:
b

['red', 'blue', 'green', 'white']

In [150]:
colors

['red', 'blue', 'green', 'white']

Before we discuss shallow copy and deep copy, keep in mind that:

The difference between shallow and deep copying is **only relevant for compound objects (e.g. a list of lists, or class instances)**

2. A shallow copy constructs a new compound object and then (to the extent possible) **inserts references into it to the objects found in the original**

Shallow copy is different from assignment in that it creates a new object. So, if you make changes to the new list, such as adding or removing items, it won’t affect the original list.

In [190]:
a = [[1, 2], [2, 4]]
b = a[:] ## shallow copy

In [191]:
b.append([3, 6])
b

[[1, 2], [2, 4], [3, 6]]

In [192]:
a

[[1, 2], [2, 4]]

If the original list is a compound object (e.g. a list of lists), the elements of the new list are references to the same objects as the elements of the original, which is why it is called a shallow copy. So, if you modify a mutable element such as a nested list, the change is also visible through the original list.

In [157]:
b[0].append(3) #b[0] refers to the original elements, so a will change

In [158]:
b

[[1, 2, 3], [2, 4], [3, 6]]

In [159]:
a

[[1, 2, 3], [2, 4]]

In [160]:
b[2].append(3) #b[2] doesn't refer to the original elements, so a won't change
b

[[1, 2, 3], [2, 4], [3, 6, 3]]

In [162]:
a

[[1, 2, 3], [2, 4]]

3. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original

The deep copy is different from a shallow copy in that the copied elements are new, independent objects rather than references to the original elements. **Therefore, no matter how you modify the deep copy, the changes will NOT be reflected in the original list**

In [187]:
#Creating a deep copy is slower, because you are making new copies for everything
import copy
a = [[1, 2], [2, 4]]
b = copy.deepcopy(a) ## deep copy


In [188]:
b[0].append(3)
b

[[1, 2, 3], [2, 4]]

In [189]:
a

[[1, 2], [2, 4]]

**For simple lists such as a list of integers:**
- Use assignment = if you want the new changes to affect the original list
- Use shallow copy [:] if you don’t want the new changes to affect the original list

**For compound objects (e.g. a list of lists):**
- Use assignment = if you want the new changes to affect the original list
- Use deep copy if you don’t want the new changes to affect the original list

Remember: deep copy makes sure that the newly copied object is not referenced to the original object in any way.
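The three options can be compared in one place. This is a minimal sketch with illustrative names, using the is operator to show which objects are shared at each level.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # assignment: same outer list
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # new outer list, new inner lists

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[0] is original[0])  # True
print(deep[0] is original[0])     # False

original[0].append(99)            # mutate a nested list
print(shallow[0])                 # [1, 2, 99]
print(deep[0])                    # [1, 2]
```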

# Iterating over Data with for Loops and Comprehensions
## The for Statement

In [195]:
list_of_numbers = [1,7,1,-5]
for number in list_of_numbers:
    print(number*4)

4
28
4
-20


This is the **for statement**, which iterates over an object containing multiple items, running the indented block of code on each item. In this example the object was a list, but **it can be any object that is an iterable**, which includes sets, tuples, dictionaries, files, and even the characters of a string

In [198]:
#Let's try something more difficult
list_of_numbers = [1,2,3,4,5,6,7,8,9]
sum_numbers = 0
even_numbers = []
sum_even_numbers = 0
for number in list_of_numbers:
    sum_numbers = sum_numbers + number
    if number % 2 == 0:
        even_numbers.append(number)
        sum_even_numbers = sum_even_numbers + number

In [199]:
print(sum_numbers)

45


In [200]:
print(even_numbers)

[2, 4, 6, 8]


In [201]:
print(sum_even_numbers)

20


The body of the for statement is four lines of code. It includes an aggregation, a filtering, and an aggregation of the filtered data. And the best part? The for statement doesn’t care how big the input list is.

This code will work to compute the sum of all the even numbers in any iterable containing integers.

This is sometimes called a **loop** because the indented body forms a kind of circular loop of code.
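To illustrate the "any iterable" point, the loop can be wrapped in a function and fed different kinds of input. The helper sum_of_evens is hypothetical, written only for this sketch.

```python
def sum_of_evens(numbers):
    """Return the sum of the even integers in any iterable (hypothetical helper)."""
    total = 0
    for number in numbers:
        if number % 2 == 0:
            total = total + number
    return total

print(sum_of_evens([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 20
print(sum_of_evens((10, 11, 12)))                 # 22
print(sum_of_evens({2, 4, 5}))                    # 6
print(sum_of_evens(range(1, 101)))                # 2550
```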

## The range() object and enumerate() Function

In any kind of rank-ordering problem, for example, we’ll need the index position of a value as well as the value itself. We need to know which item was first in the ordering.
You could compute the position the hard way using an extra variable:

In [202]:
list_of_numbers = [118, 110, 109, 101, 104]

In [203]:
position = 0

In [204]:
for number in list_of_numbers:
    print("Value at index {0:d} is {1:d}".format(position,number))
    position = position + 1

Value at index 0 is 118
Value at index 1 is 110
Value at index 2 is 109
Value at index 3 is 101
Value at index 4 is 104


We don’t need to do this manually, however. We have better tools. We can use a range() object with the len() function to generate index values. The len() function tells how many items are in a list object. The range() object generates values from 0 up to, but not including, the given stop value.

Here’s how we use range(len(x)) to generate index positions:

In [205]:
list_of_numbers = [118, 110, 109, 101, 104]
for i in range(len(list_of_numbers)):
    print("Value at index {0:d} is {1:d}".format(i,list_of_numbers[i]))

Value at index 0 is 118
Value at index 1 is 110
Value at index 2 is 109
Value at index 3 is 101
Value at index 4 is 104


We can combine both approaches using the **enumerate()** function. This function will yield an iterable sequence of values. Each of those values is a pair that contains the index position and the original value from the list

In [206]:
list_of_numbers = [118, 110, 109, 101, 104]
for i, number in enumerate(list_of_numbers):
    print("Value at index {0:d} is {1:d}".format(i,number))

Value at index 0 is 118
Value at index 1 is 110
Value at index 2 is 109
Value at index 3 is 101
Value at index 4 is 104
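For the rank-ordering case mentioned earlier, enumerate() also accepts a start argument, so the counter can begin at 1. The sketch below sorts a copy of the values in descending order and numbers them; the names rank and score are illustrative.

```python
scores = [118, 110, 109, 101, 104]

# sorted(..., reverse=True) returns a new list, highest value first.
# start=1 makes the counter begin at 1 instead of 0.
for rank, score in enumerate(sorted(scores, reverse=True), start=1):
    print("Rank {0:d}: {1:d}".format(rank, score))
```

This prints ranks 1 through 5 for the values 118, 110, 109, 104, and 101, and leaves the original list in its original order.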


In [207]:
fibonacci_numbers = [0,1]

In [208]:
fibonacci_numbers[-2]

0

Since enumerate() is so useful, why would you ever use range()? The primary case for range() is when you are creating a new list (or other iterable), rather than just operating on an existing list.

Here’s an example where we’re going to append 15 more Fibonacci numbers to the two starting values. We don’t have any existing list to loop over, so we use a range(15) object to make sure our statement is executed 15 times.

In [1]:
fibonacci_numbers = [0,1]
for _ in range(15):
    fibonacci_numbers.append(fibonacci_numbers[-1]+fibonacci_numbers[-2])
print(fibonacci_numbers)    

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]


### Mutable objects can be assigned, shallow-copied, and deep-copied.
### Immutable objects are effectively only shared: assignment gives another reference to the same object.

### Shallow copy and deep copy differ only for compound objects.

## Loop through Dictionaries


In [3]:
# avoid naming a variable dict: that would shadow the built-in dict()
day_counts = {'sunday':5598,
              'friday':7998, 'thursday':5813,
              'monday':5978}

for key in day_counts:
    print(key, day_counts[key])

sunday 5598
friday 7998
thursday 5813
monday 5978


In [4]:
dic = {'a':1,
       'b':2,
       'c': 3}

for key in dic:
    print(key, dic[key])

a 1
b 2
c 3
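Looping over a dictionary directly yields its keys. When both the key and the value are needed, the items() method avoids the extra lookup, and values() yields the values alone. A short sketch reusing the small dictionary above:

```python
dic = {'a': 1, 'b': 2, 'c': 3}

# items() yields (key, value) pairs, which unpack into two loop variables.
for key, value in dic.items():
    print(key, value)

# values() yields only the values.
total = 0
for value in dic.values():
    total = total + value
print(total)  # 6
```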


## Break and Continue

The break statement in Python terminates the current loop and resumes execution at the next statement.

The most common use for break is when some external condition is triggered requiring a hasty exit from a loop

In [8]:
for val in 'string':
    if val == 'i':
        break
    print(val) # be careful about the position of print(); its indentation changes the result

s
t
r


In [17]:
for val in 'string':
    if val == 'a':
        break
print(val) # runs once, after the loop; 'a' never matched, so val is the last character

g


The continue statement in Python returns the control to the beginning of the loop 

You can use continue statement to skip the rest of the indented statements inside the for statement

In [21]:
# What is the difference between continue and pass in Python?

# pass does nothing; execution carries on with the next statement
# continue jumps straight to the next iteration of the loop

In [22]:
for val in 'string':
    if val == 'i':
        continue
    print(val)

s
t
r
n
g


In [23]:
for val in 'string':
    if val == 'i':
        continue
print(val)

g
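The difference between pass and continue is easiest to see when the same loop is written both ways. In this sketch the two list names are illustrative.

```python
kept_with_pass = []
for val in 'string':
    if val == 'i':
        pass                 # does nothing; execution falls through
    kept_with_pass.append(val)

kept_with_continue = []
for val in 'string':
    if val == 'i':
        continue             # skips the append below for this item
    kept_with_continue.append(val)

print(kept_with_pass)        # ['s', 't', 'r', 'i', 'n', 'g']
print(kept_with_continue)    # ['s', 't', 'r', 'n', 'g']
```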


## List Comprehension

List comprehensions are used for creating new lists from another iterable


In [24]:
list_cube = [x**3 for x in range(10)]
print(list_cube)

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]


In [25]:
list_cube = []

for x in range(10):
    list_cube.append(x**3)
    
print(list_cube)

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]


Add an if to a list comprehension:

In [27]:
list_cube_even = [x**3 for x in range(10) if x**3 % 2 == 0]
list_cube_even

[0, 8, 64, 216, 512]

In [29]:
large_squares = [x**2 for x in range(20) if x**2 >= 100]
print(large_squares)

[100, 121, 144, 169, 196, 225, 256, 289, 324, 361]


## Dictionary and Set Comprehension

In [31]:
# Comprehension also works for dictionaries

three = {x : x**3 for x in range(10)}
print(three)

# Provide key and value in {}

{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125, 6: 216, 7: 343, 8: 512, 9: 729}


In [32]:
# Set comprehension

odds = {x for x in range(10) if x % 2 == 1}
print(odds)

{1, 3, 5, 7, 9}
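Comprehensions can draw from any iterable, including the (key, value) pairs of an existing dictionary. The sketch below uses an illustrative dictionary; the key-value swap assumes the values are unique and hashable.

```python
nums = {"dozen": 12, "score": 20, "gross": 144}

# Swap keys and values (assumes the values are unique and hashable).
by_value = {value: key for key, value in nums.items()}
print(by_value)   # {12: 'dozen', 20: 'score', 144: 'gross'}

# Keep only the entries whose value is at least 20.
large = {key: value for key, value in nums.items() if value >= 20}
print(large)      # {'score': 20, 'gross': 144}

# Set comprehension: the distinct lengths of the keys.
lengths = {len(key) for key in nums}
print(lengths)    # {5}
```

All three keys have five letters, so the set comprehension collapses them into a single item.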
