Book Reference: pages 51-59 of **Python for Data Analysis** by Wes McKinney

Data Structures and Sequences

I. Data Structures  
II. Create your own reusable Python functions  
III. Mechanics of Python file objects and interacting with your local hard drive 

In [1]:
# Tuple
# fixed-length
# immutable sequence of Python objects

tup = 4,5,6
nested_tup = ((1,2,3),(4,5,6))
print(tup)
print(nested_tup)

tup = tuple('string')
print(tup)


print("Concatenation: ")
print("method 1")
concatenation = (4, None, 'foo') + (6, 0) + ('bar',)
print(concatenation)

print("method 2")
concatenation = ('happy', 'girl') * 5
print(concatenation)

In [2]:
print("Unpacking tuples")
tup = (4, 5, 6)
a, b, c = tup
print(b)

tupNest = (3,2,43, (2,3,4,5))
a,b,c,(d,e,f,g) = tupNest
print(d)
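Unpacking is also commonly used when looping over a sequence of tuples. The sketch below assumes a small made-up list (`rows`) just to show the pattern.

```python
# Hypothetical data: each inner tuple has exactly three items.
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

totals = []
for a, b, c in rows:
    # Each tuple is unpacked into a, b, c on every iteration.
    totals.append(a + b + c)

print(totals)
```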

In [3]:
print("In other languages swapping is like this:")

a = 99
b = 1

tmp = a 
a = b
b = tmp
print("a: {0} b: {1}".format(a,b))

print("In python:")
a, b = b,a
print("a: {0} b: {1}".format(a,b))

In other languages swapping is like this:
a: 1 b: 99
In python:
a: 99 b: 1


#### Pluck a few elements from the beginning of a tuple using `*rest`

In [4]:
values = 1,2,3,4,5
a,b, *rest = values
a,b

(1, 2)

In [5]:
rest

[3, 4, 5]

In [6]:
# or
a,b, *_ = values
_

[3, 4, 5]

In [7]:
# by convention, `*_` collects values you want to discard (unwanted variables)

In [8]:
# List
# variable-length, and their contents can be modified in place
# frequently used in data processing as a way to materialize an 
# iterator or generator expression

gen = range(10)
gen

range(0, 10)

In [9]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [10]:
# adding and removing elements

b_list = ['food', 'heart', 'cloud']
b_list.append('dwarf')
b_list

['food', 'heart', 'cloud', 'dwarf']

In [11]:
b_list.insert(1, 'red')
b_list

['food', 'red', 'heart', 'cloud', 'dwarf']

### <span style="color:red">Warnings on List Insertion</span> 

- `insert` is computationally expensive compared with `append`, because every subsequent element has to be shifted to make room.
- If you need to insert at both the start and the end of a sequence, explore `collections.deque`, a double-ended queue.
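As a rough illustration of the `collections.deque` suggestion, the sketch below adds and removes items at both ends. The sample words are reused from the list cells above; nothing here is taken from the book itself.

```python
from collections import deque

dq = deque(['heart', 'cloud'])
dq.appendleft('food')   # add at the left end without shifting other items
dq.append('dwarf')      # add at the right end
first = dq.popleft()    # remove and return the leftmost item

print(first)
print(list(dq))
```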


In [12]:
# pop item by index
b_list.pop(2)

'heart'

In [13]:
b_list

['food', 'red', 'cloud', 'dwarf']

In [14]:
b_list.append('food')
b_list

['food', 'red', 'cloud', 'dwarf', 'food']

In [15]:
# remove by value
b_list.remove('food')  # removes only the first matching value from the list
b_list

['red', 'cloud', 'dwarf', 'food']

### <span style="color:red">Warnings on Lists</span> 

- If performance is not a concern, `append` and `remove` let you use a Python list as a perfectly suitable “multiset” data structure.
- Checking whether a list contains a value is a linear scan, whereas the same check on a dict or set takes roughly constant time because both are based on hash tables.
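A small sketch of both points, assuming a made-up list named `words`: the list keeps duplicates (multiset behaviour), while a set gives hash-based membership tests at the cost of dropping duplicates.

```python
words = ['food', 'red', 'cloud', 'dwarf', 'food']

# List as a simple multiset: duplicates are kept and can be counted.
food_count = words.count('food')

# Membership on a list checks elements one by one.
in_list = 'dwarf' in words

# A set uses a hash table for membership, but stores each value only once.
unique_words = set(words)
in_set = 'dwarf' in unique_words

print(food_count, in_list, in_set, len(unique_words))
```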

In [63]:
everything = ['Taylor', 22, None]
to_add = ['Swift', 1989, "December", 1, 2, 3]
long_list = [to_add] * 1000

In [64]:
# concatenating and combining lists
# method 1 (the PLUS method)


def method1(everything, long_list):
    for chunk in long_list: 
        everything = everything + chunk

%timeit method1(everything, long_list)

6.04 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [66]:
# method 2 (the extend method)
def method2(everything, long_list):
    for chunk in long_list:
        everything.extend(chunk)

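%timeit method2(everything, long_list)

(No timing is recorded for this call: the 23.7 ns ± 0.683 ns figure originally shown came from `%timeit method2` without the call parentheses, which times only a name lookup and not the function.)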


### <span style="color:red">Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a **large list**, is usually preferable.</span>
    
If you are only appending a small list, the addition method is fine too.
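The difference between the two methods can be seen without timing anything. In this sketch (variable names `base`, `alias`, and `combined` are illustrative), `+` produces a separate list while `extend` changes the existing one in place.

```python
base = ['Taylor', 22, None]
alias = base  # second name for the same list object

# Addition builds a brand-new list; base is left untouched.
combined = base + ['Swift', 1989]
plus_made_new_list = combined is not base
base_unchanged = base == ['Taylor', 22, None]

# extend grows the existing list, so the alias sees the change too.
base.extend(['Swift', 1989])
alias_sees_extend = alias == ['Taylor', 22, None, 'Swift', 1989]

print(plus_made_new_list, base_unchanged, alias_sees_extend)
```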

In [69]:
# Sorting a list
a = [10, 13, 2, 4,3,2, 88, 23]
a.sort()

In [70]:
a

[2, 2, 3, 4, 10, 13, 23, 88]

In [72]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
# note the extra `key` argument, which sorts the words according to their length

b

['He', 'saw', 'six', 'small', 'foxes']
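`list.sort` reorders the list in place. When the original order must be kept, the built-in `sorted` function returns a new list instead. A short sketch using the same words:

```python
words = ['saw', 'small', 'He', 'foxes', 'six']

# sorted returns a new list and leaves the original untouched.
by_length = sorted(words, key=len)

# reverse=True flips the order; ties keep their original relative order.
longest_first = sorted(words, key=len, reverse=True)

print(by_length)
print(longest_first)
print(words)
```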

## Binary search 

The built-in `bisect` module implements binary search and insertion into a sorted list. `bisect.bisect` finds the location where an element should be inserted to keep the list sorted, while `bisect.insort` actually inserts the element into that location:

In [81]:
import bisect
c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c,2) # returns index of the insert location

4

In [83]:
bisect.bisect(c, 5)

6

In [85]:
bisect.insort(c, 6) # insort actually inserts the element to the insert location
c

[1, 2, 2, 2, 3, 4, 6, 7]
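Two details are worth a quick sketch. First, `bisect.bisect` returns the position to the right of any equal values, while `bisect.bisect_left` returns the position to their left. Second, the module does not check that the list is sorted, so calling it on unsorted data raises no error but gives a result that should not be relied on. The `unsorted` list below is a made-up example.

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

left = bisect.bisect_left(c, 2)    # position before the run of 2s
right = bisect.bisect_right(c, 2)  # position after the run of 2s
default = bisect.bisect(c, 2)      # behaves like bisect_right

# No sortedness check is performed: this runs without error,
# but the returned index has no useful meaning for unsorted input.
unsorted = [7, 1, 4, 2]
position = bisect.bisect(unsorted, 3)

print(left, right, default)
```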

## Slicing

In [87]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5] # outputs seq with index 1 to 4 (5 is excluded)

[2, 3, 7, 5]

In [88]:
seq[3:4] = [6, 3]

In [90]:
seq # the number 7 in seq[3] is replaced by [6, 3]

[7, 2, 3, 6, 3, 5, 6, 0, 1]

In [91]:
seq[:5] # outputs seq with index 0 to 4 
# Outputs first 5 elements

[7, 2, 3, 6, 3]

In [92]:
seq[5:] # outputs everything from index 5 to the end (4 elements here)

[5, 6, 0, 1]

In [94]:
seq[-4:] # Outputs last 4 elements

[5, 6, 0, 1]

In [95]:
seq[-6:-2] # Outputs the last 6 elements except the final 2

[6, 3, 5, 6]

In [97]:
seq[::2] # number after the 2nd colon is the increment

[7, 3, 3, 6, 1]

In [98]:
seq[::-1] # reversing the list

[1, 0, 6, 5, 3, 6, 3, 2, 7]
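Two properties of slices that the cells above rely on can be checked directly: for in-range, non-negative indices the number of elements returned is `stop - start`, and a slice is a new list, so `seq[:]` makes a shallow copy. The names `piece` and `copy_of_seq` are illustrative.

```python
seq = [7, 2, 3, 6, 3, 5, 6, 0, 1]

start, stop = 1, 5
piece = seq[start:stop]          # the element at index stop is excluded
length_matches = len(piece) == stop - start

# A full slice creates a new list; changing it leaves seq untouched.
copy_of_seq = seq[:]
copy_of_seq[0] = 100

print(piece, length_matches)
print(seq[0], copy_of_seq[0])
```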

### Built-in Sequence Functions
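- `enumerate`
    - Yields `(index, value)` pairs, so a loop can track each item's position without a manual counter.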

In [100]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
    
mapping

{'bar': 1, 'baz': 2, 'foo': 0}
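As a small variation on the cell above, `enumerate` also accepts a `start` argument, and the same mapping can be built with a dict comprehension. This is only a sketch; the name `numbered` is made up for the example.

```python
some_list = ['foo', 'bar', 'baz']

# start=1 begins the count at 1 instead of 0.
numbered = [f"{i}. {v}" for i, v in enumerate(some_list, start=1)]

# Same value-to-index mapping as the loop version, written as a comprehension.
mapping = {v: i for i, v in enumerate(some_list)}

print(numbered)
print(mapping)
```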