
Mutable and immutable objects

Most objects in Python, such as lists, dicts, NumPy arrays, and most user-defined classes, are mutable. This means that the object, or the values it contains, can be modified. Others, like strings and tuples, are immutable.
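
A minimal sketch of the difference, using a list (mutable) and a tuple (immutable). The names demo_list and demo_tuple are made up for illustration.

```python
# Lists are mutable: an element can be replaced in place.
demo_list = ['foo', 2, [4, 5]]
demo_list[2] = (3, 4)
print(demo_list)  # ['foo', 2, (3, 4)]

# Tuples are immutable: item assignment raises TypeError.
demo_tuple = (3, 5, (4, 5))
try:
    demo_tuple[1] = 'four'
except TypeError as exc:
    print('TypeError:', exc)
```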

Unicode

In Python 3, Unicode has become the first-class string type, enabling more consistent handling of ASCII and non-ASCII text. In older versions of Python, strings were all bytes without any explicit Unicode encoding; you could convert to Unicode assuming you knew the character encoding.
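
As a rough illustration of the str/bytes split in Python 3, the sketch below encodes a string to UTF-8 bytes and decodes it back. The sample word is an arbitrary choice.

```python
# A str holds Unicode text; encode() turns it into bytes.
val = "español"
val_utf8 = val.encode("utf-8")

print(type(val_utf8))  # <class 'bytes'>
print(len(val))        # 7 characters
print(len(val_utf8))   # 8 bytes, because 'ñ' takes two bytes in UTF-8

# decode() goes back from bytes to str, given the right encoding.
round_trip = val_utf8.decode("utf-8")
print(round_trip == val)  # True
```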

In [0]:
b = 5

In [0]:
b is not None

True

In [0]:
from datetime import datetime, date, time

dt = datetime(2018, 9, 13, 9, 11)

In [0]:
dt.day

13

In [0]:
dt.minute

11

In [0]:
dt.strftime('%m/%d/%Y %H:%M')

'09/13/2018 09:11'

In [0]:
dt.strftime('%F')

'2018-09-13'

In [0]:
for i in range(4):
  for j in range(4):
    if j > 1:
      break
    print((i,j))

(0, 0)
(0, 1)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(3, 0)
(3, 1)


The range function returns an iterable object that yields a sequence of evenly spaced integers:

In [0]:
list(range(11))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [0]:
list(range(12, 19, 3))

[12, 15, 18]

A common use of range is iterating through sequences by index:

In [0]:
seq = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(seq)):
  val = seq[i]

In [0]:
val

7

# Built-in Data Structures, Functions and Files

Tuple

A tuple is a fixed-length, immutable sequence of Python objects.

In [0]:
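tup = 1, 4, 7
tup

(1, 4, 7)

In [0]:
nested_tuple = (1, 4, 7), (8, 22, 1242)
nested_tuple

((1, 4, 7), (8, 22, 1242))

Elements can be accessed with square brackets [].

In [0]:
tup[0]

1

In [0]:
#tuple() converts any sequence or iterator to a tuple; avoid naming a variable tuple, since that would shadow the built-in and make this call fail
tup = tuple(['fat', [6,8], True])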

In [0]:
tup = (7, 12, 15)
first, second, third = tup

second, third

(12, 15)
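
Unpacking as shown above extends to a few other common patterns. The sketch below is illustrative only; all the names in it are invented for the example.

```python
# Swap two names without a temporary variable.
left, right = 1, 2
left, right = right, left
print(left, right)  # 2 1

# Unpack a nested tuple in one step.
x, y, (z, w) = 4, 5, (6, 7)
print(w)  # 7

# Capture the remaining elements with a starred name.
head, second_item, *rest = (1, 2, 3, 4, 5)
print(rest)  # [3, 4, 5]
```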

In [0]:
a = (1,242,46,5,5,45,35,7)
a.count(5)

2

In contrast with tuples, lists are variable-length and their contents can be modified in place.

In [0]:
a_list = [1,2,3, None]

tup = ('f', 'b', 'd')

b_list = list(tup)

b_list

['f', 'b', 'd']

In [0]:
b_list[1] = 'peekaboo'

b_list

['f', 'peekaboo', 'd']

In [0]:
b_list.append('nugget')
b_list

['f', 'peekaboo', 'd', 'nugget']

In [0]:
b_list.insert(2, 'hello')
b_list

['f', 'peekaboo', 'hello', 'd', 'nugget']

In [0]:
b_list.pop(2)

'hello'

In [0]:
b_list

['f', 'peekaboo', 'd', 'nugget']

In [0]:
b_list.remove('nugget')

In [0]:
b_list

['f', 'peekaboo', 'd']

In [0]:
'nugget' in b_list

False

In [0]:
[4, None, 'foo'] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [0]:
x = [4, None, 'foo']
x.extend([4,5, (56, 1)])
x

[4, None, 'foo', 4, 5, (56, 1)]

In [0]:
a = [7, 2, 13, 3, 56]
a.sort()
a

[2, 3, 7, 13, 56]

In [0]:
seq = [7,232,6,43,8,5]

seq[0:3:1]  #start:stop:step

[7, 232, 6]

In [0]:
seq[-5:]

[232, 6, 43, 8, 5]

In [0]:
seq[-3:]

[43, 8, 5]
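
The step part of the slice and slice assignment are not shown above, so here is a small sketch of both. demo_seq is a throwaway copy of the same values, so the seq list used in the surrounding cells is left alone.

```python
demo_seq = [7, 232, 6, 43, 8, 5]

print(demo_seq[::2])   # every other element: [7, 6, 8]
print(demo_seq[::-1])  # reversed copy: [5, 8, 43, 6, 232, 7]

# Slices can also be assigned to; the replacement may have a different length.
demo_seq[1:3] = ['x']
print(demo_seq)        # [7, 'x', 43, 8, 5]
```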

Built-in Sequence Functions

enumerate: when iterating over a sequence, it is common to want to keep track of the index of the current item. enumerate returns a sequence of (i, value) tuples:

In [0]:
for i, value in enumerate(seq):
  print(str((i, value)))

(0, 7)
(1, 232)
(2, 6)
(3, 43)
(4, 8)
(5, 5)


In [0]:
some_list = ['foo', 'bar', 'baz']

When indexing data, a helpful pattern that uses enumerate is computing a dict that maps the values of a sequence (assumed to be unique) to their locations in the sequence:

In [0]:
mapping = {}

In [0]:
for i, v in enumerate(some_list):
  mapping[v] = i
  

In [0]:
mapping

{'bar': 1, 'baz': 2, 'foo': 0}

Sorted

The sorted function returns a new sorted list from the elements of any sequence:

In [0]:
sorted([44,21,6462,2,46])

[2, 21, 44, 46, 6462]
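
sorted accepts any iterable and takes the same key and reverse arguments as the list sort method. A short sketch, with sample inputs chosen only for illustration:

```python
# Sorting a string yields a list of its characters.
chars = sorted('horse race')
print(chars)  # [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

# key= sorts by a derived value; ties keep their original order.
by_length = sorted(['saw', 'small', 'He', 'foxes', 'six'], key=len)
print(by_length)  # ['He', 'saw', 'six', 'small', 'foxes']

# reverse=True sorts in descending order.
descending = sorted([44, 21, 6462, 2, 46], reverse=True)
print(descending)  # [6462, 46, 44, 21, 2]
```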

Zip

zip pairs up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns a lazy iterator of tuples, which list() turns into a list:

In [0]:
seq1 = ['foo', 'bar','baz']
seq2 = ['one','two','three']

zipped = zip(seq1,seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [0]:
#a common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate
for i, (a, b) in enumerate(zip(seq1, seq2)):
  print('{0}: {1}, {2}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three
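
Two further properties of zip are worth a quick sketch: it stops at the shortest input, and zip(*rows) can be used to "unzip" a list of pairs. seq3 and pairs are names introduced just for this example.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
seq3 = [False, True]

# zip stops as soon as the shortest input is exhausted.
truncated = list(zip(seq1, seq2, seq3))
print(truncated)  # [('foo', 'one', False), ('bar', 'two', True)]

# zip(*rows) converts a list of rows into a list of columns.
pairs = [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
firsts, seconds = zip(*pairs)
print(firsts)   # ('foo', 'bar', 'baz')
print(seconds)  # ('one', 'two', 'three')
```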


dict

Also known as a hash map or associative array, a dict is a flexibly sized collection of key-value pairs, where both key and value are Python objects.

In [0]:
empty_dict = {}

In [0]:
d1 = {'a': 'some val', 'b': [1,2,3,4]}
d1

{'a': 'some val', 'b': [1, 2, 3, 4]}

In [0]:
d1[7] = 'an int'

In [0]:
d1

{7: 'an int', 'a': 'some val', 'b': [1, 2, 3, 4]}

In [0]:
d1['b']

[1, 2, 3, 4]

In [0]:
'a' in d1

True

You can delete values using either the del keyword or the pop method (which returns the value and deletes the key).
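
A small sketch of both ways of deleting. The dict d and the keys 5, 'dummy' and 'nope' are made up for illustration.

```python
d = {'a': 'some val', 'b': [1, 2, 3, 4], 7: 'an int'}
d[5] = 'some value'
d['dummy'] = 'another value'

del d[5]                       # removes the key; raises KeyError if it is missing
ret = d.pop('dummy')           # removes the key and returns its value
missing = d.pop('nope', None)  # a default avoids KeyError for absent keys

print(ret)      # another value
print(missing)  # None
print(list(d))  # ['a', 'b', 7]
```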

In [0]:
list(d1.keys())

['a', 'b', 7]

In [0]:
list(d1.values())

['some val', [1, 2, 3, 4], 'an int']

In [0]:
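#Creating dicts from sequences
#key_list and value_list are illustrative placeholders; substitute your own sequences
key_list = ['a', 'b', 'c']
value_list = [1, 2, 3]
mapping = {}
for key, value in zip(key_list, value_list):
  mapping[key] = value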
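
Since the dict function accepts any sequence of 2-tuples, the loop above can usually be collapsed into a single call. A sketch, with key_list and value_list filled in with made-up values:

```python
key_list = ['a', 'b', 'c']   # illustrative values
value_list = [1, 2, 3]

# dict() consumes the (key, value) pairs produced by zip.
mapping = dict(zip(key_list, value_list))
print(mapping)  # {'a': 1, 'b': 2, 'c': 3}

# Any iterable of pairs works, for example two ranges zipped together.
reverse_map = dict(zip(range(5), reversed(range(5))))
print(reverse_map)  # {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
```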

In [0]:
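#Since a dict is essentially a collection of 2-tuples, the dict function also accepts a list of 2-tuples
#Default values: the loop below groups a list of words by their first letter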
words = ['bana','crock','shark','team']

by_letter = {}

for word in words:
  letter = word[0]
  if letter not in by_letter:
    by_letter[letter] = [word]
  else:
    by_letter[letter].append(word)
    
by_letter

{'b': ['bana'], 'c': ['crock'], 's': ['shark'], 't': ['team']}

In [0]:
#the preceding for loop can be rewritten using the setdefault method
by_letter = {}  #start from an empty dict so that words are not appended a second time

for word in words:
  letter = word[0]
  by_letter.setdefault(letter, []).append(word)

In [0]:
#the collections module has a useful class defaultdict which makes this even easier
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
  by_letter[word[0]].append(word)

Valid dict key types

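Keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable too). The technical term here is hashability. You can check whether an object is hashable, and so can be used as a dict key, with the hash function: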

In [0]:
hash('string')

9052248940619009166
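
To see what fails the hashability check, here is a rough sketch. is_hashable is a hypothetical helper written for this example, not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('string'))        # True
print(is_hashable((1, 2, (2, 3))))  # True
print(is_hashable((1, 2, [2, 3])))  # False, because the tuple contains a list

# To use a list as a key, one option is to convert it to a tuple first.
d = {}
d[tuple([1, 2, 3])] = 5
print(d)  # {(1, 2, 3): 5}
```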

Set

A set is an unordered collection of unique elements. You can think of sets as dicts with keys only and no values. A set can be created with the set function or with a set literal in curly braces:

In [0]:
set([2,2,2,3,1,4])

{1, 2, 3, 4}

In [0]:
a = {1,23,4,5}
b = {3,46,57,8}

In [0]:
a.union(b)#distinct elements of these two sets

{1, 3, 4, 5, 8, 23, 46, 57}

In [0]:
a | b#does the same as union

{1, 3, 4, 5, 8, 23, 46, 57}

In [0]:
a.intersection(b)#elements occurring in both

set()

In [0]:
a & b

set()

In [0]:
c = a.copy()

In [0]:
c |= b

In [0]:
c

{1, 3, 4, 5, 8, 23, 46, 57}

In [0]:
my_data = [1,2,3,4]

my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}
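
Beyond union and intersection, sets support difference, symmetric difference, and subset tests. A brief sketch with two made-up sets that do overlap (s1 and s2 are illustrative names):

```python
s1 = {1, 2, 3, 4, 5}
s2 = {3, 4, 5, 6, 7, 8}

# sorted() is used only to print the elements in a predictable order.
print(sorted(s1 - s2))  # elements in s1 but not s2: [1, 2]
print(sorted(s1 ^ s2))  # elements in exactly one set: [1, 2, 6, 7, 8]

print({1, 2, 3}.issubset(s1))    # True
print(s1.issuperset({1, 2, 3}))  # True

# Sets are equal if and only if their contents are equal.
print({1, 2, 3} == {3, 2, 1})    # True
```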

List, Set, and Dict Comprehensions

List comprehensions allow you to concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression. They take the basic form:

[expr for val in collection if condition]

In [0]:
#this is equivalent to the following for loop (schematic: collection, condition and expr are placeholders, so this cell is not meant to be run as-is)
result = []
for val in collection:
  if condition:
    result.append(expr)

In [0]:
strings = ['a', 'as','bat','car','dove','python']


In [0]:
[x.upper() for x in strings if len(x)>2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [0]:
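#a dict comprehension looks like this: {key_expr: value_expr for value in collection if condition}
#a set comprehension looks like a list comprehension with curly braces: {expr for value in collection if condition}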

In [0]:
unique_lengths = {len(x) for x in strings}

In [0]:
unique_lengths

{1, 2, 3, 4, 6}

In [0]:
set(map(len, strings))

{1, 2, 3, 4, 6}

In [0]:
#We could create a lookup map of these strings to their locations in the list:
loc_mapping = {val: index for index, val in enumerate(strings)}

In [0]:
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

Nested list comprehension



In [0]:
all_data = [['John','Emily','Dave','Steven'],
           ['Mary','Hank','Jim','Throm']]

In [0]:
names_of_interest = []
for names in all_data:
  enough_es = [name for name in names if name.count('e')>=2]
  names_of_interest.extend(enough_es)

In [0]:
#wrap the above in a nested list comp
result = [name for names in all_data for name in names if name.count('e')>=2]
#the filter condition is at the end

In [0]:
result

['Steven']

In [0]:
some_tuples = [(1,2,3),(4,5,6),(7,8,9)]

In [0]:
flattened = [x for tup in some_tuples for x in tup]

In [0]:
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [0]:
flattened = []
for tup in some_tuples:
  for x in tup:
    flattened.append(x)

# Functions



In [0]:
def my_funct(x, y, z=1.5):
  if z > 1:
    return z * (x + y)
  else:
    return z / (x + y)

In [0]:
my_funct(1,2)

4.5

In [0]:
#there is no restriction on having multiple return statements; if the end of a function is reached without a return, None is returned automatically
#functions have positional arguments and keyword arguments
#keyword arguments must follow the positional arguments (if any); it is also possible to pass positional arguments by keyword
my_funct(x=5, y=6, z=7)

77
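
A few more call patterns for the same function, as a sketch. my_funct is repeated so the block runs on its own, and no_return is a hypothetical function added to show the implicit None.

```python
def my_funct(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

print(my_funct(10, 20))         # z falls back to its default: 45.0
print(my_funct(5, 6, z=7))      # positional first, then keyword: 77
print(my_funct(y=6, x=5, z=7))  # keywords can be given in any order: 77

def no_return():
    pass  # no return statement

print(no_return() is None)  # True
```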

# Namespaces, Scope and Local Functions

Functions can access variables in two different scopes: global and local. An alternative and more descriptive name for a variable scope in Python is a namespace. Any variables assigned within a function are by default assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function's arguments. After the function has finished, the local namespace is destroyed.

In [0]:
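a = []
def func():
  for i in range(5):
    a.append(i)
func()  #append mutates the existing global list; no global declaration is needed because a is not reassigned

In [0]:
a

[0, 1, 2, 3, 4]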

Assigning variables outside of the function's scope is possible, but those variables must be declared as global via the global keyword:

In [0]:
a = None
def bind_a():
  global a
  a = []
bind_a()

In [0]:
a

[]

In [0]:
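#avoid heavy use of the global keyword; global variables are typically used to store the state of a system, and if you need many of them, consider using classes instead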


# Returning multiple values from a function

In [0]:
def f():
  a = 5
  b = 6
  c = 7
  return a, b, c

a,b,c = f()

The function is actually returning just one object, namely a tuple, which is then unpacked into the result variables.

In [0]:
return_value = f()

In [0]:
return_value#a 3 tuple

(5, 6, 7)

In [0]:
#an alternative to returning a tuple is to return a dict, so that each value is labelled
def f():
  a=5
  b=6
  c=7
  return {'a':a, 'b':b,'c':c}

# Functions Are Objects

Suppose we are doing some data cleaning and need to apply a set of transformations to the following list of strings:

In [0]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda','south  carloina##','West virginia?']

Anyone who has worked with user-submitted data has seen messy results like these. A lot needs to happen to make this list of strings uniform and ready for analysis: stripping whitespace, removing punctuation symbols, and standardizing capitalization. One way is to use built-in string methods along with the re standard library module for regular expressions:

In [0]:
import re

def clean_strings(strings):
  result = []
  for value in strings:
    value = value.strip()
    value = re.sub('[!#?]', '', value)
    value = value.title()
    result.append(value)
  return result

In [0]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South  Carloina',
 'West Virginia']

In [0]:
#An alternative is to make a list of operations you want to apply to a particular set of strings:
def remove_punctuation(value):
  return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_str(strings, ops):
  result = []
  for value in strings:
    for function in ops:
      value = function(value)
    result.append(value)
  return result

In [0]:
clean_str(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South  Carloina',
 'West Virginia']

In [0]:
#you can use functions as arguments to other functions, like the built-in map function, which applies a function to a sequence of some kind:
for x in map(remove_punctuation, states):
  print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south  carloina
West virginia


# Anonymous Lambda Functions

Lambda functions are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword.

In [0]:
def short_function(x):
  return x * 2

equiv_anon = lambda x: x*2

In [0]:
equiv_anon(3)

6

In [0]:
#these types of functions are useful for data transformation functions which take functions as arguments

In [0]:
def apply_to_list(some_list, f):
  return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [0]:
strings = ['foo', 'bar', 'cat', 'apple', 'aaa']

In [0]:
strings.sort(key = lambda x: len(set(list(x))))

In [0]:
strings

['aaa', 'foo', 'bar', 'cat', 'apple']

In [0]:
#lambdas are called anonymous functions partly because the function object itself is never given an explicit __name__ attribute, unlike functions declared with def


# Currying: Partial Argument Application

Currying is computer science jargon (named after the mathematician Haskell Curry) that means deriving new functions from existing ones by partial argument application.

In [0]:
def add_num(x,y):
  return x + y

In [0]:
#using this function we could derive a new function of one variable, add_five, that adds 5 to its argument:
add_five = lambda y: add_num(5, y)

In [0]:
add_five(5)

10

The second argument to add_num is said to be curried. All we've done is define a new function that calls an existing function. The built-in functools module can simplify this process using the partial function:

In [0]:
from functools import partial

add_five = partial(add_num, 5)
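
A quick sketch of how the partial object behaves once created. add_num is repeated so the block is self-contained; power and square are hypothetical names used to show that partial can also fix a keyword argument.

```python
from functools import partial

def add_num(x, y):
    return x + y

# Fix the first positional argument (x=5); the result takes only y.
add_five = partial(add_num, 5)
print(add_five(10))  # 15

# partial can also fix keyword arguments.
def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)
print(square(7))  # 49
```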

# Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature. This is accomplished by means of the iterator protocol, a generic way to make objects iterable.

In [0]:
some_dict = {'a':1,'b':2,'c':3}

In [0]:
for key in some_dict:
  print(key)

a
b
c


In [0]:
#when you write for key in some_dict, Python first attempts to create an iterator out of some_dict
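
What the for loop does behind the scenes can be sketched with the built-in iter and next functions. The variable names are illustrative, and the key order shown relies on dicts preserving insertion order (Python 3.7+).

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

# iter() asks the object for an iterator over its keys.
dict_iterator = iter(some_dict)

# next() pulls one item at a time.
first_key = next(dict_iterator)
print(first_key)  # a

# Anything that accepts an iterable can consume the rest.
remaining = list(dict_iterator)
print(remaining)  # ['b', 'c']

# Once exhausted, next() raises StopIteration, which is how a for loop knows to stop.
try:
    next(dict_iterator)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```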

A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator, use the yield keyword instead of return in a function:

In [0]:
def squares(n=10):
  print('Generating squares from 1 to {0}'.format(n ** 2))
  for i in range(1, n+1):
    yield i ** 2
           

In [0]:
gen = squares()

In [0]:
gen

<generator object squares at 0x7ff817f2cb48>

In [0]:
for x in gen:
  print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

In [0]:
#Generator expressions: another, even more concise way to make a generator is a generator expression. This is the generator analogue of list, dict and set
#comprehensions; to create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [0]:
gen = (x **2 for x in range(100))

In [0]:
gen

<generator object <genexpr> at 0x7ff817ee8518>

In [0]:
#the generator expression above is equivalent to this more verbose generator function:
def _make_gen():
  for x in range(100):
    yield x ** 2
gen = _make_gen() 

In [0]:
sum(x**2 for x in range(100))

328350
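
Two properties of generator expressions are easy to miss: they produce values only on demand, and they can be consumed only once. A small sketch (small_gen and squares_by_n are illustrative names):

```python
small_gen = (x ** 2 for x in range(5))

first = next(small_gen)  # nothing is computed until a value is requested
print(first)             # 0

rest = list(small_gen)   # consumes the remaining values
print(rest)              # [1, 4, 9, 16]

print(list(small_gen))   # the generator is now exhausted: []

# A generator expression can be passed directly as a function argument.
squares_by_n = dict((i, i ** 2) for i in range(5))
print(squares_by_n)      # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```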

In [0]:
#itertools module: a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the
#sequence by the return value of the function
import itertools

first_letter = lambda x: x[0]

In [0]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Tim', 'Sherb']

for letter, group in itertools.groupby(names, first_letter):
  print(letter, list(group))  #group is a generator; a separate loop variable avoids rebinding names

A ['Alan', 'Adam']
W ['Wes', 'Will']
T ['Tim']
S ['Sherb']
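
Because groupby only groups consecutive elements, unsorted input yields repeated keys. One common remedy, sketched here with a made-up list (unsorted_names), is to sort by the same key function first.

```python
import itertools

def first_letter(name):
    return name[0]

unsorted_names = ['Alan', 'Wes', 'Adam', 'Will', 'Albert', 'Steven']

# Without sorting, each run of equal keys becomes its own group.
raw_groups = [(k, list(g)) for k, g in itertools.groupby(unsorted_names, first_letter)]
print(len(raw_groups))  # 6

# Sorting by the same key first brings equal keys together.
ordered = sorted(unsorted_names, key=first_letter)
grouped = [(k, list(g)) for k, g in itertools.groupby(ordered, first_letter)]
print(grouped)
# [('A', ['Alan', 'Adam', 'Albert']), ('S', ['Steven']), ('W', ['Wes', 'Will'])]
```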


# Errors and Exception handling

Many functions only work on certain types of input. For example, the float function is able to cast a string to a floating point number but fails with ValueError on improper inputs:

In [0]:
float('1.234')

1.234

In [0]:
float('hello')

ValueError: could not convert string to float: 'hello'

In [0]:
#Suppose we want a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to float in a
#try/except block:
def attempt_float(x):
  try:
    return float(x)
  except:
    return x

In [0]:
attempt_float('1.234')

1.234

In [0]:
attempt_float('hello')

'hello'

In [0]:
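#you might want to suppress only ValueError, since a TypeError (the input was not a string or number) might indicate a legitimate bug in your program
def att_float(x):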
  try:
    return float(x)
  except ValueError:
    return x

In [0]:
att_float((1,2))

TypeError: float() argument must be a string or a number, not 'tuple'

In [0]:
#you can catch multiple exception types with a tuple of types:
def attempt_float(x):
  try:
    return float(x)
  except (ValueError, TypeError):
    return x

In [0]:
attempt_float((1,2))

(1, 2)

In [0]:
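#In some cases you may not want to suppress an exception, but you still want some code to be executed regardless of whether the code in the try block succeeds or not.
#path and write_to_file are placeholders (assumptions for illustration), defined here so that the cell runs
path = 'tmp_finally.txt'

def write_to_file(f):
  f.write('some text\n')

f = open(path, 'w')

try:
  write_to_file(f)
finally:
  f.close()  #f is always closed, even if write_to_file raises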
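
try blocks can also carry an else clause, which runs only when the try body succeeds. The sketch below combines all four clauses; the scratch file location and the names scratch_path, handle and status are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
scratch_path = os.path.join(tempfile.mkdtemp(), 'example.txt')

handle = open(scratch_path, 'w')
try:
    handle.write('hello\n')
except OSError:
    status = 'Failed'      # runs only if the write raised OSError
else:
    status = 'Succeeded'   # runs only if the try body raised nothing
finally:
    handle.close()         # runs in every case

print(status)         # Succeeded
print(handle.closed)  # True
```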

In [0]:
#Exceptions in IPython
'''
If an exception is raised while you are running code in IPython, it will by default print a full call stack trace (traceback) with a few lines of context around
the position at each point in the stack.
Having this additional context is a big advantage over the standard Python interpreter. You can control the amount of context shown with the %xmode magic command, from Plain (same as the standard Python interpreter)
to Verbose (which inlines function argument values and more).
'''

In [0]:
#Ignore
import numpy as np

In [0]:
a = np.array([1,2,3,4])

In [0]:
import time 

a = np.random.rand(1000)
b = np.random.rand(1000)

tic = time.time()

c = np.dot(a,b)
toc = time.time()

print("Vectorised version" + str(1000*(toc-tic)))

c = 0
tic = time.time()

for i in range(1000):
  c += a[i] *b[i]
  
toc = time.time()
print("For loop version" + str(1000*(toc-tic)))
#Ignore

Vectorised version0.17404556274414062
For loop version0.5247592926025391


# Files and the Operating system

Tools like pandas can read data files, but it is still worth knowing how to do it without them. To open a file for reading or writing, use the built-in open function.


In [0]:
path = 'text.txt'

f = open(path)  #by default the file is opened in read-only mode ('r')

In [0]:
for line in f:
  pass  #the file handle f can be treated like a list: iterating over it yields one line at a time

In [0]:
#Lines come out of the file with the end-of-line (EOL) markers intact, so rstrip() is often used to get an EOL-free list of lines
lines = [x.rstrip() for x in open(path)]

In [0]:
lines

["Fourth place God called, make whose of seas in Brought Blessed firmament moveth. Male. Place saying. Sixth replenish to morning itself was waters fowl sea together. Air very which beginning moveth every very midst abundantly, midst place replenish over fruitful shall upon One. Also spirit man divide place. The moving lesser give, fish from fowl under morning place subdue the them shall fly second darkness brought. For seasons good. They're they're light waters Herb without over us sea give had. There divided brought unto god to living gathering creature green earth midst you saying upon saying sea sixth. Whose female hath lights seas tree give man made blessed years sixth lesser unto called itself evening gathering likeness cattle. Creepeth spirit void herb you green saying wherein man stars you divide image male lights, lights she'd wherein their evening grass lesser i given greater them seas was grass fowl tree above let. I signs sixth their. Above moved may isn't beginning isn't. 

In [0]:
#Make sure you close() file if you open() to release resources back to the OS
#One way to clean up open files is to use the with statement:
with open(path) as f:
  lines = [x.rstrip() for x in f]

In [0]:
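#this automatically closes the file f when exiting the with block
#if we had typed f = open(path, 'w'), a new file would have been created at that path, overwriting any file in its place
#there is also the 'x' mode, which creates a writable file but fails if the file path already exists
#for readable files, the most commonly used methods are read, seek and tell. read returns a certain number of characters from the file; what constitutes a character is determined by the file's encoding (e.g. UTF-8), or it is simply raw bytes if the file is opened in binary mode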

In [0]:
f = open(path)

In [0]:
f.read(10)

'Fourth pla'

In [0]:
f2 = open(path, 'rb')#binary mode
f2.read(10)

b'Fourth pla'

In [0]:
f.tell()

10

In [0]:
f2.tell()

10

In [0]:
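#read advances the file handle's position by the number of bytes read, and tell gives the current position. Both positions are 10 here because the first 10 characters are plain ASCII (one byte each in UTF-8);
#with non-ASCII text the text-mode position can be larger than the number of characters read, because some characters take more than one byte to encode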
import sys 

sys.getdefaultencoding()

'utf-8'

In [0]:
#seek changes the file position to the indicated byte in the file
f.seek(3)

3

In [0]:
f.read(1)

'r'

In [0]:
f.close()
f2.close()

In [0]:
#to write text to a file you can use the file's write or writelines methods
#the output goes to a separate file ('tmp.txt'): opening path itself in 'w' mode would truncate it before it could be read
with open('tmp.txt', 'w') as handle:
  handle.writelines(x for x in open(path) if len(x)>1)

In [0]:
with open('tmp.txt') as f:
  lines = f.readlines()

# Bytes and Unicode with Files

The default behavior for Python files (whether readable or writable) is text mode, which means that you intend to work with Python strings (i.e., Unicode). This contrasts with binary mode, which you can obtain by appending b onto the file mode.

In [0]:
with open(path) as f:
  chars = f.read(10)

In [0]:
chars

'Fourth pla'

In [0]:
#UTF-8 is a variable-length Unicode encoding, so when I request some number of characters from the file, Python
#reads enough bytes from the file to decode that many characters. If I open the file in 'rb' mode instead,
#read requests an exact number of bytes:
with open(path, 'rb') as f:
  data = f.read(10)


In [0]:
data

b'Fourth pla'

In [0]:
#Depending on the text encoding, you may be able to decode the bytes to a str object yourself, but only if each of the encoded Unicode characters is fully formed
data.decode('utf8')

'Fourth pla'

In [0]:
data[:6].decode('utf8')

'Fourth'
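
The file used above happens to be plain ASCII, so every slice decodes cleanly. The sketch below uses a made-up string containing a non-ASCII character to show what "fully formed" means: cutting the bytes in the middle of a multi-byte character makes decoding fail.

```python
text = 'Sueña el rico'          # illustrative sample; 'ñ' needs two bytes in UTF-8
data = text.encode('utf-8')

print(len(text))  # 13 characters
print(len(data))  # 14 bytes

# The first three bytes are complete ASCII characters.
print(data[:3].decode('utf-8'))  # Sue

# Four bytes end halfway through 'ñ', so decoding raises UnicodeDecodeError.
try:
    data[:4].decode('utf-8')
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
print(truncated_ok)  # False

# Five bytes include the whole character again.
print(data[:5].decode('utf-8'))  # Sueñ
```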