<a href="https://colab.research.google.com/github/swethanarayanan/hello-world/blob/master/Hello_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Acknowledgements

This notebook for learning Python was created with the help of the following textbooks:

1.   Effective Python
2.   Think Stats - 2nd edition
3.   Python for data analysis - 2nd edition




### High level summary of best practices



1.   Always add a docstring to every function



### Main Data structures

1.   List  [1,2,3] | Mutable, variable length

*   A list can be used as a stack (x.pop() removes the last item) or as a queue (x.pop(0) removes the first item)
*   You can insert or remove items at the end of a list in constant time, O(1). Inserting or removing items at the head of a list takes linear time, O(n)
*   Ideal for LIFO queues (stacks)

2.   Integer, Float, Decimal, bool, bytes (raw 8-bit values), None (the unique instance of NoneType)

3.   Tuple (1,2,3) | Immutable

4.  Dictionary - key-value pairs, also known as a hash map

*   Dictionaries have a method called items that returns a sequence of tuples, where each tuple is a key-value pair

5.  String

6.  Matrix

7.  Under collections, deque - double-ended queue

*   The deque class from the collections module is a double-ended queue. It provides constant time operations for inserting or removing items from its beginning or end. This makes it ideal for first-in-first-out (FIFO) queues.

8.  Built-in module - datetime


Note : Lists and tuples are semantically similar 




In Python 3, there are two types that represent sequences of characters: bytes and str. Instances of bytes contain raw 8-bit values. Instances of str contain Unicode characters.

### Other Data Structures

1. Pandas Dataframe - designed for working with tabular or heterogeneous data
2. Pandas Series - a one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its index. It can also be used much like a dict.
3. NumPy ndarrays - best suited for homogeneous numerical data

### Python cool stuff



1.  List comprehensions

2.  Map reduce

List comprehensions are easier to read than map


### Python syntax nuances

1) 2 ** 2 - exponentiation (here, 2 squared)

2) 2 ^ 2 - bitwise XOR

3) repr vs str

str() is used for creating output for the end user, while repr() is mainly used for debugging and development. repr’s goal is to be unambiguous and str’s is to be readable. For example, str('5') gives 5 while repr('5') gives '5', which makes it clear that the value is a string and not a number. (In Python 3, str and repr give the same text for floats.)

4) Private attributes aren’t rigorously enforced by the Python compiler.

5) As you can see by now, Python statements also do not need to be terminated by semicolons. Semicolons can be used, however, to separate multiple statements on a single line. Putting multiple statements on one line is generally discouraged in Python as it often makes code less readable.

6) After b = a, both names refer to the same object, so a change made in place through a is also visible through b.

If we instead do b = list(a), b becomes a new list (a shallow copy).

7) Object references in Python have no type associated with them. Type information is stored inside the object itself. 

So you can do

a = 5
a = 'abc'

but you can't do a = 5 + 'abc' (it raises a TypeError)

So Python is a strongly typed language
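
Points 3, 6 and 7 above can be checked with a short sketch. The variable names are illustrative only and are not part of the original notes.

```python
# repr vs str: repr keeps the quotes, so a string is not mistaken for a number
text = '5'
print(str(text))   # 5
print(repr(text))  # '5'

# alias = a binds a second name to the same list; list(a) builds a new list
a = [1, 2, 3]
alias = a
copied = list(a)
a.append(4)
print(alias)   # [1, 2, 3, 4]
print(copied)  # [1, 2, 3]

# Strong typing: int and str are not silently converted to each other
try:
    5 + 'abc'
except TypeError as err:
    type_error_message = str(err)
    print('TypeError:', type_error_message)
```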

### Other Useful stuff

#### Version

In [0]:
import sys
print('Python: {}'.format(sys.version))

Python: 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]


In [0]:
a = [1,2,3]

#### Object introspection

Python int size can be arbitrarily large

In [0]:
a?

In [23]:
a = 5
isinstance(a, int)

True

In [26]:
type(a)

int

In [0]:
import sys
sys.maxsize

9223372036854775807

isinstance can accept a tuple of types if you want to check that an object’s type is among those present in the tuple:

In [0]:
isinstance(a, (int, float))

True

#### Attributes and methods

In other languages, accessing objects by name is often referred to as “reflection.”

In [0]:
a = 'hello world'
b = getattr(a, 'split') # similar to a.split()
b()

['hello', 'world']

#### Duck typing

Often you may not care about the type of an object but rather only whether it has certain methods or behavior. This is sometimes called “duck typing,” after the saying “If it walks like a duck and quacks like a duck, then it’s a duck.”

In [0]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

#### Mutable and immutable

Most objects in Python, such as lists, dicts, NumPy arrays, and most user-defined types (classes), are mutable. Others, like strings and tuples, are immutable:

In [0]:
a = 'this is a string'
a[1] = 'b'

TypeError: 'str' object does not support item assignment

#### %run Command
You can run any file as a Python program inside the environment of your IPython session using the %run command. 

#### Magic commands

IPython’s special commands (which are not built into Python itself) are known as “magic” commands. These are designed to facilitate common tasks and enable you to easily control the behavior of the IPython system. A magic command is any command prefixed by the percent symbol %

In [0]:
import numpy as np
%timeit np.random.randn(100, 100)

1000 loops, best of 3: 482 µs per loop


In [0]:
%pwd

'/content'

#### If else

In [0]:
x = 5
if x < 0:
    print('It\'s negative')
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')

Positive and larger than or equal to 5


#### Chaining comparisons

In [0]:
4 > 3 > 2 > 1

True

#### Break and pass

In [0]:
x = 0
if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')


#### Ternary operations in python

In [0]:
x = 5
'Non-negative' if x >= 0 else 'Negative'

'Non-negative'

#### Swapping

Swapping two variables needs no temporary variable in Python

In [0]:
b, a = a, b

#### _ (underscore) for unwanted variables
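
The notes give no example for this heading. A minimal sketch of the usual convention, assuming some made-up sample data: the name `_` is assigned like any other variable, but signals to the reader that the value is not needed.

```python
# Unpack a tuple but keep only the field that is needed
record = ('monty', 'python.org', 42)   # hypothetical sample data
uname, _, _ = record

# Repeat an action when the loop counter itself is not used
greetings = []
for _ in range(3):
    greetings.append('hello')

print(uname)      # monty
print(greetings)  # ['hello', 'hello', 'hello']
```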

### List

You can concatenate two lists in Python with +

In [0]:
a = [1] + [2,3]
a

[1, 2, 3]

In [0]:
numbers = [1,2,3]
for i in range(len(numbers)):
    print(i)
    numbers[i] = numbers[i] * 2
print(numbers)

0
1
2
[2, 4, 6]


In [0]:
result_list = [0] * 5
result_list

[0, 0, 0, 0, 0]

In [0]:
for i, bit in enumerate(result_list):
    print(str(bit) + "," + str(i))

0,0
0,1
0,2
0,3
0,4


In [0]:
#Store all three values of the list in 3 new variables
x= [1,2,3]
a,b,c = x
a

1

In [0]:
x.pop() #Delete the last element from list and print it (i.e pop)

3

In [0]:
x.append(5)
x

[1, 2, 5]

In [0]:
x = [1,2,3]
x.append(5)
x.pop(0)

1

In [0]:
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print('First four:', a[:4])
print('Last four: ', a[-4:])
print('Middle two:', a[3:-3])

First four: ['a', 'b', 'c', 'd']
Last four:  ['e', 'f', 'g', 'h']
Middle two: ['d', 'e']


In [0]:
a[:1] #evaluates to less than 1 index, hence only 0

['a']

When slicing from the start of a list, you should leave out the zero index to reduce visual noise.


In [0]:
assert a[:5] == a[0:5]


When slicing to the end of a list, you should leave out the final index because it’s redundant.

In [0]:
assert a[5:] == a[5:len(a)]

In [0]:
a[:-5] #the 5th element from the end is NOT included

['a', 'b', 'c']

#### Stride

a[start:end:stride]

In [0]:
a = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
odds = a[::2]
evens = a[1::2]
odds

['red', 'yellow', 'blue']

In [0]:
evens

['orange', 'green', 'purple']

#### Stride can also be used for reversing a string or a list

That works well for byte strings and ASCII characters, but it will break for Unicode characters encoded as UTF-8 byte strings.

In [0]:
a[::-1]

['purple', 'blue', 'green', 'yellow', 'orange', 'red']
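
The warning about UTF-8 byte strings can be illustrated with a small sketch. The sample word is only an assumption chosen because it contains one non-ASCII character.

```python
text = 'español'

# Reversing the str works: slicing operates on Unicode characters
reversed_text = text[::-1]
print(reversed_text)  # loñapse

# Reversing the UTF-8 bytes splits the two-byte sequence that encodes 'ñ'
reversed_bytes = text.encode('utf-8')[::-1]
try:
    reversed_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```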

#### Slicing is forgiving of out-of-bounds indices

In [0]:
a[:10] 

['red', 'orange', 'yellow', 'green', 'blue', 'purple']
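
This leniency applies to slices only. A short sketch of the difference between slicing and direct indexing, reusing the same list:

```python
a = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']

# Slices are clipped to the bounds of the list
whole = a[:10]
empty = a[10:]
print(len(whole))  # 6
print(empty)       # []

# Direct indexing past the end raises an exception
try:
    a[10]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)  # True
```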

#### Use zip to process iterators in parallel

In [0]:
names = ['Cecilia', 'Lise', 'Marie']
letters = [len(n) for n in names]

##### Not recommended

In [0]:
longest_name = None
max_letters = 0

for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        longest_name = names[i]
        max_letters = count

print(longest_name)

Cecilia


##### Recommended

In [0]:
longest_name = None
max_letters = 0
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count
print(longest_name)

Cecilia


#### Avoid else blocks after for and while

In [0]:
for i in range(3):
    print('Loop %d' % i)
else:
    print('Else block!')

Loop 0
Loop 1
Loop 2
Else block!
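
One likely reason for this advice is that the else block runs only when the loop finishes without hitting break, which is the opposite of what many readers expect. A sketch, using a hypothetical helper named `run_loop`:

```python
def run_loop(stop_at):
    """Return the messages produced by a for/else loop over range(3)."""
    messages = []
    for i in range(3):
        if i == stop_at:
            messages.append('break at %d' % i)
            break
        messages.append('Loop %d' % i)
    else:
        # Runs only when the loop finished without hitting break
        messages.append('Else block!')
    return messages

print(run_loop(stop_at=None))  # ['Loop 0', 'Loop 1', 'Loop 2', 'Else block!']
print(run_loop(stop_at=1))     # ['Loop 0', 'break at 1']
```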


insert is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element. If you need to insert elements at both the beginning and end of a sequence, you may wish to explore collections.deque, a double-ended queue, for this purpose.

In [0]:
b = ['foo', 'red', 'peekaboo', 'baz', 'dwarf']
b.insert(1, 'red')
b

['foo', 'red', 'red', 'peekaboo', 'baz', 'dwarf']

### Integer, Float

In Python 3, the / operator performs floating-point (true) division, and the // operator performs floor division (the quotient rounded down, without the remainder).

In [0]:
4 / 3

1.3333333333333333

In [0]:
4 // 3

1
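
The difference between flooring and truncating only shows up with negative numbers. A brief sketch, with values chosen for illustration:

```python
positive = 7 // 2
negative = -7 // 2
truncated = int(-7 / 2)

print(positive)   # 3
print(negative)   # -4, floor division rounds toward negative infinity
print(truncated)  # -3, int() truncates toward zero instead

# divmod returns the quotient and the remainder together
quotient, remainder = divmod(7, 2)
print(quotient, remainder)  # 3 1
```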

### Datetime

In [0]:
from datetime import datetime, date, time

In [0]:
dt = datetime(2011, 10, 29, 20, 30, 21)

In [0]:
dt.day

29

Date to string

In [0]:
dt.strftime('%m/%d/%Y %H:%M')

'10/29/2011 20:30'

String to date

In [0]:
datetime.strptime('20091031', '%Y%m%d')

datetime.datetime(2009, 10, 31, 0, 0)

### Pandas Data frame

In [0]:
import pandas as pd

# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)

# descriptions
print(dataset.describe())

       sepal-length  sepal-width  petal-length  petal-width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000


In [0]:
# Create data
data = {'score': [1,2,3,4,5]}

# Create dataframe
df = pd.DataFrame(data)

df

   score
0      1
1      2
2      3
3      4
4      5


### Tuple

In python, tuples are immutable but lists are mutable

Elements can be accessed with square brackets [] as with most other sequence types.

In [0]:
t1 = ('a', 'b', 'c', 'd', 'e')

t2 = (1,2)

In [0]:
t1[4]

'e'

Cool feature, tuple assignment

In [0]:
t1, t2 = t2, t1

In [0]:
t2

('a', 'b', 'c', 'd', 'e')

#### Number of occurrences of a value

In [0]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

#### Named tuple

In [0]:
import collections
Grade = collections.namedtuple('Grade', ('score', 'weight'))
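
The cell above only defines the type. A sketch of how such a named tuple might be used, with made-up field values:

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))

grade = Grade(score=90, weight=0.5)   # hypothetical values
print(grade.score)     # 90, access by field name
print(grade[1])        # 0.5, access by position
score, weight = grade  # unpacks like a plain tuple

# Like any tuple, a named tuple is immutable
try:
    grade.score = 95
    mutated = True
except AttributeError:
    mutated = False
print(mutated)  # False
```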

### String manipulation

Strings are a sequence of Unicode characters and therefore can be treated like other sequences, such as lists and tuples.

Slicing also works on strings, for example s[:3]

In [0]:
addr = 'monty@python.org'
uname, domain = addr.split('@')
uname

'monty'

In [0]:
#Reverse a string
a = "codementor"
print(a[::-1])

rotnemedoc


In [0]:
#Create a single string from all the elements in list above.
a = ["Code", "mentor", "Python", "Developer"]
print(" ".join(a))

Code mentor Python Developer


In [0]:
print("code"*4+' '+"mentor"*5)

codecodecodecode mentormentormentormentormentor


You can write string literals using either single quotes ' or double quotes ".

For multiline strings with line breaks, you can use triple quotes, either ''' or """



In [0]:
c = """
This is a longer string that
spans multiple lines
"""

In [0]:
c.count('\n')

3

Escape character

In [0]:
s = '12\\34'
print(s)

12\34


The r prefix creates a raw string, in which backslashes are treated literally

In [0]:
s = r'12\\34'
print(s)

12\\34


Concatenation

In [0]:
a = 'hello'
b = 'world'
a + b

'helloworld'

#### String templating or formatting


{0:.2f} means to format the first argument as a floating-point number with two decimal places.

{1:s} means to format the second argument as a string.

{2:d} means to format the third argument as an exact integer.


In [0]:
template = '{0:.2f} {1:s} are worth US${2:d}'

template.format(4.5560, 'Argentine Pesos', 1)

'4.56 Argentine Pesos are worth US$1'
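
Since Python 3.6, the same format specifiers can also be used inside an f-string. A sketch of the equivalent, where the variable names are assumptions made for illustration:

```python
amount = 4.5560
currency = 'Argentine Pesos'
dollars = 1

by_position = '{0:.2f} {1:s} are worth US${2:d}'.format(amount, currency, dollars)
by_fstring = f'{amount:.2f} {currency:s} are worth US${dollars:d}'

print(by_fstring)  # 4.56 Argentine Pesos are worth US$1
```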

### Dictionary

In [0]:
d = {'a':0, 'b':1, 'c':2}
t = d.items()
t

dict_items([('a', 0), ('b', 1), ('c', 2)])

#### DefaultDict over Dict

Dictionaries are useful for bookkeeping and tracking statistics. One problem with dictionaries is that you can’t assume any keys are already present. That makes it clumsy to do simple things like increment a counter stored in a dictionary.

In [0]:
stats = {}
key = 'my_counter'
if key not in stats: #we can do better
   stats[key] = 0
stats[key] += 1

The defaultdict class from the collections module simplifies this by automatically storing a default value when a key doesn’t exist. All you have to do is provide a function that will return the default value each time a key is missing. In this example, the int built-in function returns 0 (see Item 23: “Accept Functions for Simple Interfaces Instead of Classes” in Effective Python for another example). Now, incrementing a counter is simple.



In [0]:
from collections import defaultdict
stats = defaultdict(int)
stats['my_counter'] += 1
stats.items()

dict_items([('my_counter', 1)])
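
The default factory does not have to be int. As a further sketch, passing list groups values under a key without checking whether the key exists first. The word list is made up for illustration.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # hypothetical data

by_letter = defaultdict(list)   # list() returns [] for a missing key
for word in words:
    by_letter[word[0]].append(word)

print(by_letter['a'])  # ['apple', 'avocado']
print(by_letter['c'])  # ['cherry']
```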

### Matrix

In [0]:
#transpose a matrix
mat = [[1, 2, 3], [4, 5, 6]]
mat_t = zip(*mat)
# or: mat_t = [[mat[j][i] for j in range(len(mat))] for i in range(len(mat[0]))]
for row in mat_t: 
    print(row) 

(1, 4)
(2, 5)
(3, 6)


### List comprehensions

Python provides compact syntax for deriving one list from another. These expressions are called list comprehensions. 

Unless you’re applying a single-argument function, list comprehensions are clearer than the map built-in function for simple cases. map requires creating a lambda function for the computation, which is visually noisy.

In [0]:
def double_when_greater_than_five(numbers):
    return [num * 2 for num in numbers if num > 5]

In [0]:
double_when_greater_than_five([1,2,3,4,5,6,7])

[12, 14]

In [0]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = [x**2 for x in a]
print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


In [0]:
even_squares = [x**2 for x in a if x % 2 == 0]
even_squares

[4, 16, 36, 64, 100]

The filter built-in function can be used along with map to achieve the same outcome, but it is much harder to read.

#### Dict and set comprehensions

In [0]:
chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}
rank_dict = {rank: name for name, rank in chile_ranks.items()}
chile_len_set = {len(name) for name in rank_dict.values()}
print(rank_dict)
print(chile_len_set)

{1: 'ghost', 2: 'habanero', 3: 'cayenne'}
{8, 5, 7}


#### List comprehensions also support multiple levels of looping

Beyond basic usage (see Item 7: “Use List Comprehensions Instead of map and filter” in Effective Python), list comprehensions also support multiple levels of looping. For example, say you want to flatten a matrix (a list containing other lists) into one flat list of all cells. This can be done with a list comprehension that includes two for expressions. These expressions run in the order provided from left to right.



In [0]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Preserving the two-level structure

In [0]:
squared = [[x**2 for x in row] for row in matrix]
print(squared)

[[1, 4, 9], [16, 25, 36], [49, 64, 81]]


List comprehensions also support multiple if conditions. Multiple conditions at the same loop level are an implicit and expression. For example, say you want to filter a list of numbers to only even values greater than four. These two list comprehensions are equivalent.

In [0]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]
assert b == c

#### Avoid complicated list comprehensions

The rule of thumb is to avoid using more than two expressions in a list comprehension. This could be two conditions, two loops, or one condition and one loop. As soon as it gets more complicated than that, you should use normal if and for statements and write a helper function.

#### List comprehensions for large input

For example, say you want to read a file and return the number of characters on each line. Doing this with a list comprehension would require holding the length of every line of the file in memory. If the file is absolutely enormous or perhaps a never-ending network socket, list comprehensions are problematic.
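
A minimal sketch of the file case, assuming a small temporary file created only so that the example is self-contained. The list comprehension works here because the input is tiny; the whole list of lengths is built before anything else can happen.

```python
import os
import tempfile

# Write a small sample file (hypothetical content)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('hello\npython\n')
    path = f.name

# Every line length is computed and stored before the list is returned
with open(path) as handle:
    line_lengths = [len(line) for line in handle]

print(line_lengths)  # [6, 7], each count includes the trailing newline

os.remove(path)
```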

#### Generator expressions for large list comprehensions

In [0]:
it = [len(x) for x in ["hello", "python"]]
print(it)

[5, 6]


In [0]:
#Assume the list is LARGE
a = ["hello", "world"]
it = (len(x) for x in a)
print(it)

<generator object <genexpr> at 0x7f005999a2b0>


A list comprehension over a large input holds every result in memory at once. To solve this, Python provides generator expressions, a generalization of list comprehensions and generators. Generator expressions don’t materialize the whole output sequence when they’re run. Instead, generator expressions evaluate to an iterator that yields one item at a time from the expression.

A generator expression is created by putting list-comprehension-like syntax between () characters. Here, I use a generator expression that is equivalent to the code above. However, the generator expression immediately evaluates to an iterator and doesn’t make any forward progress.

The returned iterator can be advanced one step at a time to produce the next output from the generator expression as needed (using the next built-in function). Your code can consume as much of the generator expression as you want without risking a blowup in memory usage.

Chaining generators like this executes very quickly in Python. When you’re looking for a way to compose functionality that’s operating on a large stream of input, generator expressions are the best tool for the job. 

In [0]:
for i in range(len(a)):
  print(next(it))

5
5


Another powerful outcome of generator expressions is that they can be composed together. Here, I take the iterator returned by the generator expression above and use it as the input for another generator expression.

In [0]:
it = (len(x) for x in a)  # Recreate the iterator; the loop above exhausted it
roots = ((x, x**0.5) for x in it)
print(next(roots))

(5, 2.23606797749979)


Important catch when using generator expressions: the iterators they return are stateful, so you must be careful not to use them more than once.
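
A short sketch of what this statefulness means in practice: a second pass over the same iterator silently produces nothing.

```python
a = ['hello', 'world']
it = (len(x) for x in a)

first_pass = list(it)    # consumes every item
second_pass = list(it)   # the iterator is already exhausted

print(first_pass)   # [5, 5]
print(second_pass)  # []
```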

#### Prefer enumerate over range

In [0]:
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

In [0]:
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print('%d: %s' % (i + 1, flavor))

Enumerate wraps any iterator with a lazy generator. This generator yields pairs of the loop index and the next value from the iterator. The resulting code is much clearer.

In [0]:
for i,f in enumerate(flavor_list):
  print('%d: %s' % (i + 1, f))

1: vanilla
2: chocolate
3: pecan
4: strawberry


You can make this even shorter by specifying the number from which enumerate should begin counting (1 in this case).

In [0]:
for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


### Map, filter and reduce

In [0]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_squares = [x**2 for x in a if x % 2 == 0]
alt = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))
assert even_squares == list(alt)

In [0]:
arr = map(lambda x : 2*x, [1,2,3,4,5])

In [0]:
from functools import reduce
reduce(lambda x,y : x+ y, arr)

30

In [0]:
[1]*6

[1, 1, 1, 1, 1, 1]

In [0]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [0]:
import sys
print(sys.version_info)
print(sys.version)

sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]


### Use built in data structures and algorithms

When you’re implementing Python programs that handle a non-trivial amount of data, you’ll eventually see slowdowns caused by the algorithmic complexity of your code. This usually isn’t the result of Python’s speed as a language. The issue, more likely, is that you aren’t using the best algorithms and data structures for your problem.

#### Deque


In [0]:
from collections import deque
fifo = deque()
fifo.append(1)      # Producer
fifo.append(2) 
x = fifo.popleft()  # Consumer
x

1

#### Decimal

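Binary floating-point numbers cannot represent many decimal fractions exactly, which leads to rounding errors in calculations such as money amounts. One solution is to use the Decimal class from the decimal built-in module. By default, Decimal arithmetic carries 28 significant digits of precision.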

In [0]:
from decimal import Decimal
from decimal import ROUND_UP
rate = Decimal('0.05')
seconds = Decimal('5')
cost = rate * seconds / Decimal('60')
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(cost)
print(rounded)
print(round(cost, 2)) # round() rounds to the nearest value (ties go to even), not always down

0.004166666666666666666666666667
0.01
0.00


#### Fractions

For representing rational numbers with no limit to precision, consider using the Fraction class from the fractions built-in module.
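
A brief sketch of Fraction arithmetic, with values chosen for illustration, compared with the same sum in floating point:

```python
from fractions import Fraction

third = Fraction(1, 3)
total = third + third + third
print(total)       # 1
print(total == 1)  # True

# The same kind of sum with floats picks up a rounding error
float_equal = (0.1 + 0.2 == 0.3)
fraction_equal = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
print(float_equal)     # False
print(fraction_equal)  # True
```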



### What's a Python runtime
Your computer doesn't understand Python natively; it only understands machine code. To run Python code, you need some way to convert it into machine code. The programs, libraries, and configuration that allow you to do this are collectively known as the "Python runtime environment". There are multiple popular runtimes for Python: CPython, Jython, IronPython, PyPy, etc.

### Python style guide
Python Enhancement Proposal #8, otherwise known as PEP 8, is the style guide for how to format Python code. Follow the style guide here: https://www.python.org/dev/peps/pep-0008/

### Encoding and decoding in python
Encoding - converting a Unicode string (str) to bytes, for example with UTF-8

In Python 3, Unicode has become the first-class string type to enable more consistent handling of ASCII and non-ASCII text.

In [0]:
val = "español"

In [0]:
val

'español'

Convert Unicode string to UTF-8

In [0]:
val_utf8 = val.encode('utf-8')
val_utf8

b'espa\xc3\xb1ol'

In [0]:
type(val_utf8)

bytes

In [0]:
val_utf8.decode('utf-8')


'español'

In [0]:
a = b'this is bytes'
decoded = a.decode('utf8')
decoded

'this is bytes'

In [0]:
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str

In [0]:
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes

### Python's pithy syntax 
A pithy phrase or statement is brief but full of substance and meaning.

### Functions in python

#### Prefer exceptions over returning None

Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions.

##### Wrong way

In [0]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

x, y = 0, 5
result = divide(x, y)
if not result:
  print('Invalid inputs')  # This is wrong!

Invalid inputs


##### Slightly better way

In [0]:
def divide(a, b):
    try:
        return True, a // b
    except ZeroDivisionError:
        return False, None
      
success, result = divide(0, 5)
if not success:
    print('Invalid inputs')  
else:
    print(result)
success, result = divide(5, 0)
if not success:
    print('Invalid inputs')   
else:
    print(result)

0
Invalid inputs


##### Recommended way

In [0]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e
x, y = 5, 2
try:
    result = divide(x, y)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is %.1f' % result)        
x, y = 5, 0
try:
    result = divide(x, y)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is %.1f' % result)  

Result is 2.5
Invalid inputs


#### Closures and variable scopes

Say you want to sort a list of numbers but prioritize one group of numbers to come first. This pattern is useful when you’re rendering a user interface and want important messages or exceptional events to be displayed before everything else.

A common way to do this is to pass a helper function as the key argument to a list’s sort method. The helper’s return value will be used as the value for sorting each item in the list. The helper can check whether the given item is in the important group and can vary the sort key accordingly.

In [0]:
def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)

In [0]:
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)

[2, 3, 5, 7, 1, 4, 6, 8]


There are three reasons why this function operates as expected:

1) Python supports closures: functions that refer to variables from the scope in which they were defined. This is why the helper function is able to access the group argument to sort_priority.

2) Functions are first-class objects in Python, meaning you can refer to them directly, assign them to variables, pass them as arguments to other functions, compare them in expressions and if statements, etc. This is how the sort method can accept a closure function as the key argument.

3) Python has specific rules for comparing tuples. It first compares items in index zero, then index one, then index two, and so on. This is why the return value from the helper closure causes the sort order to have two distinct groups.
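
The tuple comparison rule in point 3 can be checked directly. A small sketch with keys shaped like the ones the helper returns:

```python
# Index zero decides first, so every (0, x) key sorts before every (1, x) key
assert (0, 7) < (1, 1)
# When index zero ties, index one decides
assert (0, 2) < (0, 3)

keys = [(1, 8), (0, 3), (1, 1), (0, 2)]
ordered = sorted(keys)
print(ordered)  # [(0, 2), (0, 3), (1, 1), (1, 8)]
```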




When you reference a variable in an expression, the Python interpreter will traverse the scope to resolve the reference in this order:

1. The current function’s scope

2. Any enclosing scopes (like other containing functions)

3. The scope of the module that contains the code (also called the global scope)

4. The built-in scope (that contains functions like len and str)

If none of these places have a defined variable with the referenced name, then a NameError exception is raised

In [0]:
def sort_priority(values, group):
    found = False
    def helper(x):
        if x in group:
            print(found)
            return (0, x)
        return (1, x)
    values.sort(key=helper)

In [0]:
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)

False
False
False
False
[2, 3, 5, 7, 1, 4, 6, 8]


Assigning a value to a variable works differently. If the variable is already defined in the current function scope, then it will just take on the new value. If the variable doesn’t exist in the current scope, then Python treats the assignment as a variable definition. The scope of the newly defined variable is the function that contains the assignment.
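
The assignment rule above can be sketched as follows. The function name `sort_priority2` is an assumption made for this illustration; it tries to record whether any item from the group was seen, but the assignment inside helper creates a new local variable instead of updating the outer one.

```python
def sort_priority2(numbers, group):
    found = False              # Scope: sort_priority2
    def helper(x):
        if x in group:
            found = True       # Scope: helper, a new local variable
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
found = sort_priority2(numbers, group)
print(found)    # False, although items from the group were seen
print(numbers)  # [2, 3, 5, 7, 1, 4, 6, 8]
```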

Encountering this problem is sometimes called the scoping bug because it can be so surprising to newbies. But this is the intended result. This behavior prevents local variables in a function from polluting the containing module. Otherwise, every assignment within a function would put garbage into the global module scope. Not only would that be noise, but the interplay of the resulting global variables could cause obscure bugs.

In Python 3, there is special syntax for getting data out of a closure. The nonlocal statement is used to indicate that scope traversal should happen upon assignment for a specific variable name. The only limit is that nonlocal won’t traverse up to the module-level scope (to avoid polluting globals).

In [0]:
def sort_priority3(numbers, group):
    found = False
    def helper(x):
        nonlocal found
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found, numbers

In [0]:
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
found, numbers = sort_priority3(numbers, group)
found

True

In [0]:
numbers

[2, 3, 5, 7, 1, 4, 6, 8]

However, much like the anti-pattern of global variables, I’d caution against using nonlocal for anything beyond simple functions. The side effects of nonlocal can be hard to follow. It’s especially hard to understand in long functions where the nonlocal statements and assignments to associated variables are far apart. When the state handling grows beyond a simple flag, a small helper class that holds the state is easier to follow:

In [0]:
class Sorter(object):
    def __init__(self, group):
        self.group = group
        self.found = False

    def __call__(self, x):
        if x in self.group:
            self.found = True
            return (0, x)
        return (1, x)

sorter = Sorter(group)
numbers.sort(key=sorter)
assert sorter.found is True

#### Take advantage of try/except/else/finally

Use try/except/else/finally when you want to do it all in one compound statement. For example, say you want to read a description of work to do from a file, process it, and then update the file in place. Here, the try block is used to read the file and process it. The except block is used to handle exceptions from the try block that are expected. The else block is used to update the file in place and to allow related exceptions to propagate up. The finally block cleans up the file handle.

In [0]:
import json

UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')   # May raise IOError
    try:
        data = handle.read()    # May raise UnicodeDecodeError
        op = json.loads(data)   # May raise ValueError
        value = (
            op['numerator'] /
            op['denominator'])  # May raise ZeroDivisionError
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)    # May raise IOError
        return value
    finally:
        handle.close()          # Always runs

### Memory Management in python

#### Use tracemalloc to understand memory usage and leaks

Memory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected.

In theory, this means that most Python programmers don’t have to worry about allocating or deallocating memory in their programs. It’s taken care of automatically by the language and the CPython runtime. However, in practice, programs eventually do run out of memory due to held references. Figuring out where your Python programs are using or leaking memory proves to be a challenge.

The first way to debug memory usage is to ask the gc built-in module to list every object currently known by the garbage collector. Although it’s quite a blunt tool, this approach does let you quickly get a sense of where your program’s memory is being used.

In [0]:
import gc
found_objects = gc.get_objects()
print('%d objects before' % len(found_objects))

128017 objects before


The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.
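
As a rough sketch of how tracemalloc can be used, the cell below takes a snapshot before and after an allocation and compares the two. The variable name waste and the allocation sizes are assumptions chosen only to produce a visible difference; a real investigation would snapshot around the suspect code instead.

```python
import tracemalloc

tracemalloc.start(10)                    # Keep up to 10 stack frames per allocation

before = tracemalloc.take_snapshot()
# Hypothetical workload: roughly 1 MB of small buffers that stay referenced
waste = [bytearray(1024) for _ in range(1000)]
after = tracemalloc.take_snapshot()

# Group the differences by source line, largest change first
stats = after.compare_to(before, 'lineno')
for stat in stats[:3]:
    print(stat)

tracemalloc.stop()
```

Each printed entry names the file and line responsible for the change in allocated memory, which is usually enough to locate the code that is holding on to objects.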

### Documentation in python


Each module should have a top-level docstring. This is a string literal that is the first statement in a source file. It should use three double quotes ("""). The goal of this docstring is to introduce the module and its contents.

The first line of the docstring should be a single sentence describing the module’s purpose. The paragraphs that follow should contain the details that all users of the module should know about its operation. The module docstring is also a jumping-off point where you can highlight important classes and functions found in the module.

#### Library documentation

In [0]:
#!/usr/bin/env python3
# words.py
"""Library for testing words for various linguistic patterns.

Testing how words relate to each other can be tricky sometimes!
This module provides easy ways to determine when words you've
found have special properties.

Available functions:
- palindrome: Determine if a word is a palindrome.
- check_anagram: Determine if two words are anagrams.
...
"""

# ...

"Library for testing words for various linguistic patterns.\n\nTesting how words relate to each other can be tricky sometimes!\nThis module provides easy ways to determine when words you've\nfound have special properties.\n\nAvailable functions:\n- palindrome: Determine if a word is a palindrome.\n- check_anagram: Determine if two words are anagrams.\n...\n"

#### Function documentation

In [0]:
def find_anagrams(word, dictionary):
    """Find all anagrams for a word.

    This function only runs as fast as the test for
    membership in the 'dictionary' container. It will
    be slow if the dictionary is a list and fast if
    it's a set.

    Args:
        word: String of the target word.
        dictionary: Container with all strings that
            are known to be actual words.

    Returns:
        List of anagrams that were found. Empty if
        none were found.
    """
    # ...

In [0]:
def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]

In [0]:
palindrome('nan')

True

In [0]:
print(repr(palindrome.__doc__))

'Return True if the given word is a palindrome.'


#### Class documentation

Each class should have a class-level docstring. This largely follows the same pattern as the module-level docstring. The first line is the single-sentence purpose of the class. Paragraphs that follow discuss important details of the class’s operation.

Important public attributes and methods of the class should be highlighted in the class-level docstring. It should also provide guidance to subclasses on how to properly interact with protected attributes (see Item 27: “Prefer Public Attributes Over Private Ones”) and the superclass’s methods.



In [0]:
class Player(object):
    """Represents a player of the game.

    Subclasses may override the 'tick' method to provide
    custom animations for the player's movement depending
    on their power level, etc.

    Public attributes:
    - power: Unused power-ups (float between 0 and 1).
    - coins: Coins found during the level (integer).
    """

    # ...

### Namespace in python
Approach 1 : The as clause can be used to rename anything you retrieve with the import statement, including entire modules. This makes it easy to access namespaced code and make its identity clear when you use it.

Approach 2 : Another approach for avoiding imported name conflicts is to always access names by their highest unique module name.

For example, suppose two packages, analysis and frontend, each contain a utils module that defines an inspect function. You’d first import analysis.utils and import frontend.utils. Then, you’d access the inspect functions with the full paths of analysis.utils.inspect and frontend.utils.inspect.

This approach allows you to avoid the as clause altogether. It also makes it abundantly clear to new readers of the code where each function is defined.

In [0]:
import numpy as np

The safest approach is to avoid import * in your code and explicitly import names with the from x import y style.
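
Both approaches can be sketched with two standard-library modules that happen to share a function name. The json and pickle modules are stand-ins chosen for illustration because each defines dumps; the aliases json_dumps and pickle_dumps are hypothetical names.

```python
import json
import pickle

# Approach 1: rename on import so the two dumps functions cannot collide
from json import dumps as json_dumps
from pickle import dumps as pickle_dumps

payload = {'a': 1}
text = json_dumps(payload)      # str produced by json
blob = pickle_dumps(payload)    # bytes produced by pickle

# Approach 2: always go through the module name, no alias needed
same_text = json.dumps(payload)
same_payload = pickle.loads(pickle.dumps(payload))

print(type(text).__name__, type(blob).__name__)
```

With the second approach the call site itself shows which module each function comes from.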

### Classes in python

In [0]:
class SimpleGradebook(object):
    def __init__(self):
        self._grades = {}

    def add_student(self, name):
        self._grades[name] = []

    def report_grade(self, name, score):
        self._grades[name].append(score)

    def average_grade(self, name):
        grades = self._grades[name]
        return sum(grades) / len(grades)

In [0]:
book = SimpleGradebook()
book.add_student('Isaac Newton')
book.report_grade('Isaac Newton', 90)
book.report_grade('Isaac Newton', 80)
# ...
print(book.average_grade('Isaac Newton'))

85.0


#### Use plain attributes instead of get and set methods; use property and setter only when required
In Python, you almost never need to implement explicit setter or getter methods. Instead, you should always start your implementations with simple public attributes.



In [0]:
# Not recommended

class OldResistor(object):
    def __init__(self, ohms):
        self._ohms = ohms

    def get_ohms(self):
        return self._ohms

    def set_ohms(self, ohms):
        self._ohms = ohms

In [0]:
# Recommended
class Resistor(object):
    def __init__(self, ohms):
        self.ohms = ohms
        self.voltage = 0
        self.current = 0

r1 = Resistor(50e3)
r1.ohms = 10e3
r1.ohms 

10000.0

#### Using @property decorator and its corresponding setter attribute

Later, if you decide you need special behavior when an attribute is set, you can migrate to the @property decorator and its corresponding setter attribute. Here, I define a new subclass of Resistor that lets me vary the current by assigning the voltage property. Note that in order to work properly the name of both the setter and getter methods must match the intended property name.

In [0]:
class VoltageResistance(Resistor):
  
    def __init__(self, ohms):
        super().__init__(ohms)
        self._voltage = 0
        
    @property
    def voltage(self):
        print("hello from voltage")
        return self._voltage

    @voltage.setter
    def voltage(self, voltage):
        self._voltage = voltage
        self.current = self._voltage / self.ohms

In [0]:
r2 = VoltageResistance(1e3)
print('Before: %5r amps' % r2.current)
r2.voltage = 10
print('After:  %5r V' % r2.voltage)
print('After:  %5r amps' % r2.current)

Before:     0 amps
hello from voltage
After:     10 V
After:   0.01 amps


##### Use setter for type checking before setting

Specifying a setter on a property also lets you perform type checking and validation on values passed to your class. 

In [0]:
class BoundedResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if ohms <= 0:
            raise ValueError('%f ohms must be > 0' % ohms)
        self._ohms = ohms
        
r3 = BoundedResistance(1e3)
r3.ohms = 0


ValueError: 0.000000 ohms must be > 0

In [0]:
BoundedResistance(-5)

ValueError: -5.000000 ohms must be > 0

This happens because BoundedResistance.__init__ calls Resistor.__init__, which assigns self.ohms = -5. That assignment causes the @ohms.setter method from BoundedResistance to be called, immediately running the validation code before object construction has completed.

##### You can even use @property to make attributes from parent classes immutable.

In [0]:
class FixedResistance(Resistor):
    # ...
    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if hasattr(self, '_ohms'):
            raise AttributeError("Can't set attribute")
        self._ohms = ohms
        
r4 = FixedResistance(1e3)
r4.ohms = 2e3

AttributeError: Can't set attribute

#### Prefer public attributes over private ones

In Python, there are only two types of attribute visibility for a class’s attributes: public and private.

Public attributes can be accessed by anyone using the dot operator on the object.

Private fields are specified by prefixing an attribute’s name with a double underscore. They can be accessed directly by methods of the containing class.

In [0]:
class MyObject():
    def __init__(self):
        self.public_field = 5
        self.__private_field = 10

    def get_private_field(self):
        return self.__private_field

In [0]:
foo = MyObject()
foo.public_field

5

In [0]:
# Looks like you can't access private fields
foo.__private_field

AttributeError: 'MyObject' object has no attribute '__private_field'

In [0]:
#BUT YOU CAN
foo._MyObject__private_field

10

In [0]:
foo.get_private_field()

10

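If you look in the object’s attribute dictionary, you’ll see that private attributes are actually stored under transformed names. Inside the class body, Python rewrites __private_field to _MyObject__private_field (name mangling), which is why the plain access above fails and the prefixed access succeeds.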

In [0]:
print(foo.__dict__)

{'public_field': 5, '_MyObject__private_field': 10}


##### Subclasses can't access the parent class's private fields

In [0]:
class MyParentObject(object):
    def __init__(self):
        self.__private_field = 71

class MyChildObject(MyParentObject):
    def get_private_field(self):
        return self.__private_field

baz = MyChildObject()
baz.get_private_field()

AttributeError: 'MyChildObject' object has no attribute '_MyChildObject__private_field'

##### But private variables can be accessed directly - wow, how is that?

In [0]:
#BUT YOU CAN!!
assert baz._MyParentObject__private_field == 71

##### And why?

Why doesn’t the syntax for private attributes actually enforce strict visibility? The simplest answer is one often-quoted motto of Python: “We are all consenting adults here.” Python programmers believe that the benefits of being open outweigh the downsides of being closed.

Beyond that, having the ability to hook language features like attribute access (see Item 32: “Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes”) enables you to mess around with the internals of objects whenever you wish. If you can do that, what is the value of Python trying to prevent private attribute access otherwise?

To minimize the damage of accessing internals unknowingly, Python programmers follow a naming convention defined in the style guide (see Item 2: “Follow the PEP 8 Style Guide”). Fields prefixed by a single underscore (like _protected_field) are protected, meaning external users of the class should proceed with caution.

However, many programmers who are new to Python use private fields to indicate an internal API that shouldn’t be accessed by subclasses or externally.

##### Summary

Private attributes aren’t rigorously enforced by the Python compiler.

Plan from the beginning to allow subclasses to do more with your internal APIs and attributes instead of locking them out by default.

Use documentation of protected fields to guide subclasses instead of trying to force access control with private attributes.

Only consider using private attributes to avoid naming conflicts with subclasses that are out of your control.
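
The last point can be illustrated with a small sketch. The class names ApiClass and Child are hypothetical: ApiClass stands for a class you publish, and Child for a subclass written by someone else who happens to pick a similar attribute name.

```python
class ApiClass:
    def __init__(self):
        # Name-mangled to _ApiClass__value, so subclasses cannot clobber it by accident
        self.__value = 5

    def get(self):
        return self.__value


class Child(ApiClass):
    def __init__(self):
        super().__init__()
        # The subclass author's own attribute; it does not collide with the parent's
        self._value = 'hello'


a = Child()
print(a.get(), a._value)
print(sorted(a.__dict__))
```

Had ApiClass used a protected _value instead, the assignment in Child would have silently overwritten it.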

#### Avoid dictionaries that contain dictionaries

At first, you didn’t know you’d need to support weighted grades, so the complexity of additional helper classes seemed unwarranted. Python’s built-in dictionary and tuple types made it easy to keep going, adding layer after layer to the internal bookkeeping. But you should avoid doing this for more than one level of nesting (i.e., avoid dictionaries that contain dictionaries). It makes your code hard to read by other programmers and sets you up for a maintenance nightmare.

As soon as you realize the bookkeeping is getting complicated, break it all out into classes. This lets you provide well-defined interfaces that better encapsulate your data. This also enables you to create a layer of abstraction between your interfaces and your concrete implementations.
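
One possible shape for such a refactoring is sketched below. The names Grade and Subject and the weighting scheme are assumptions made for illustration; the point is only that a small tuple type plus a class can replace a dictionary of dictionaries of tuples.

```python
from collections import namedtuple

# A lightweight immutable record instead of a bare (score, weight) tuple
Grade = namedtuple('Grade', ('score', 'weight'))


class Subject:
    """Holds the weighted grades for a single subject."""

    def __init__(self):
        self._grades = []

    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))

    def average_grade(self):
        total = sum(g.score * g.weight for g in self._grades)
        total_weight = sum(g.weight for g in self._grades)
        return total / total_weight


math = Subject()
math.report_grade(80, 0.10)
math.report_grade(90, 0.90)
print(math.average_grade())
```

Callers now work with report_grade and average_grade and never need to know how the grades are stored internally.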



### Python and multi threading

Python can be a challenging language for building highly concurrent, multithreaded applications, particularly applications with many CPU-bound threads. The reason for this is that it has what is known as the global interpreter lock (GIL), a mechanism that prevents the interpreter from executing more than one Python instruction at a time. The technical reasons for why the GIL exists are beyond the scope of these notes. While it is true that in many big data processing applications, a cluster of computers may be required to process a dataset in a reasonable amount of time, there are still situations where a single-process, multithreaded system is desirable.

This is not to say that Python cannot execute truly multithreaded, parallel code. Python C extensions that use native multithreading (in C or C++) can run code in parallel without being impacted by the GIL, so long as they do not need to regularly interact with Python objects.
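
Threads remain useful when most of the time is spent waiting rather than executing Python bytecode, because blocking calls such as time.sleep release the GIL. The sketch below is an illustration under that assumption; the function name blocking_task and the 0.1 second delay are hypothetical stand-ins for real I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(n):
    time.sleep(0.1)      # Stand-in for blocking I/O; the GIL is released while waiting
    return n * n


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # The four waits overlap, so the total time is close to one wait, not four
    results = list(pool.map(blocking_task, range(4)))
elapsed = time.perf_counter() - start

print(results, round(elapsed, 2))
```

If blocking_task were a CPU-bound pure-Python loop instead, the threads would take turns holding the GIL and the elapsed time would not improve.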

### Pandas Library

#### Series

In [0]:
import pandas as pd

In [0]:
from pandas import Series, DataFrame

In [7]:
obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

The string representation of a Series displayed interactively shows the index on the left and the values on the right. Since we did not specify an index for the data, a default one consisting of the integers 0 through N - 1 (where N is the length of the data) is created. 
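
The two halves of a Series can also be inspected separately through its index and values attributes. A short sketch, reusing the same data as the cell above:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

print(obj.index)     # The default integer index, 0 through N - 1
print(obj.values)    # The underlying data as a NumPy array

index_labels = list(obj.index)
data_values = obj.values.tolist()
```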

In [10]:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
obj2['c']

3

In [11]:
obj2[['c', 'a', 'd']]

c    3
a   -5
d    4
dtype: int64

Compared with NumPy arrays, you can use labels in the index when selecting single values or a set of values, as the two cells above show.

#### Filtering in Series

In [12]:
obj2[obj2 > 0]

d    4
b    7
c    3
dtype: int64

### Numpy library

#### Performance

The cells below compare the performance of a NumPy array of one million integers with that of the equivalent Python list.

NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.

In [0]:
import numpy as np

In [0]:
my_arr = np.arange(1000000)
my_list = list(range(1000000))

In [18]:
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: user 25.1 ms, sys: 2.53 ms, total: 27.6 ms
Wall time: 48.1 ms


In [19]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 820 ms, sys: 171 ms, total: 991 ms
Wall time: 1 s


#### ndarray
ndarray is an efficient multidimensional array that provides fast array-oriented arithmetic operations and flexible broadcasting capabilities.

It is designed for efficiency on large arrays of data.

In [21]:
data = np.random.randn(2, 3)
data

array([[ 0.94881979, -1.11976145,  1.35823368],
       [ 0.87496606,  0.67827602, -1.57684087]])

In [22]:
data * 10

array([[  9.48819789, -11.19761449,  13.58233675],
       [  8.74966064,   6.78276017, -15.76840873]])

In [27]:
data.shape

(2, 3)

In [28]:
data1 = [6, 7.5, 8, 0, 1]
arr = np.array(data1)
arr

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [30]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr = np.array(data2)
arr

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [31]:
arr.ndim

2

In [32]:
np.zeros((3,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [34]:
np.full((3,3),fill_value=4)

array([[4, 4, 4],
       [4, 4, 4],
       [4, 4, 4]])

#### Conversions

In [36]:
float_arr = arr.astype(np.float64)
float_arr

array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]])

#### Vectorization

NumPy provides mathematical functions for fast operations on entire arrays of data without having to write loops.
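
A small sketch of what this looks like in practice. The array contents are arbitrary; the loop version is included only to show that the vectorized expression gives the same result.

```python
import numpy as np

arr = np.arange(5)

squares = arr ** 2        # Element-wise arithmetic, no explicit loop
roots = np.sqrt(arr)      # A universal function applied to every element
total = arr.sum()         # A reduction over the whole array

# The same squares computed with an explicit Python loop, for comparison
looped = np.array([x ** 2 for x in arr])

print(squares, roots, total)
```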

NumPy also offers a C API for connecting it with libraries written in C, C++, or FORTRAN.