
# Efficient code 

*Definition*:

* **Minimal memory usage**
* **Minimal completion time**
* **Pythonic**
  * Following the idioms and best practices of Python - the *Zen of Python*  

In [None]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


# Examining runtime (`%timeit`)

`%timeit`

* Compare runtimes to choose the fastest solution
* **Magic command `%timeit`**
  * Magic commands are IPython enhancements on top of normal Python syntax
  * They start with the `%` character

* Set the number of runs with `-r` (e.g. `-r2`)
* Set the number of loops per run with `-n` (e.g. `-n10`)


E.g. run the code 2 times with 10 loops each; the output reports the time spent per loop:

`%timeit -r2 -n10 rand_nums = np.random.rand(1000)`


Notes:
  * Check logic without print statements
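
Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. This is a rough equivalent, not what `%timeit` does internally, and the timed statement (`sorted(range(1000))`) is an assumption chosen so the example needs no third-party packages.

```python
import timeit

# repeat=2 mirrors -r2 (number of runs); number=10 mirrors -n10 (loops per run).
run_totals = timeit.repeat(
    stmt="sorted(range(1000))",
    repeat=2,
    number=10,
)

# Each entry is the total time for one run of 10 loops, in seconds.
per_loop = [total / 10 for total in run_totals]
print(f"best of {len(run_totals)}: {min(per_loop) * 1e6:.2f} µs per loop")
```

Absolute numbers vary by machine, so only relative comparisons between alternatives are meaningful.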

## `%timeit` for single line of code

In [25]:
import numpy as np

%timeit rand_nums = np.random.rand(1000)

The slowest run took 11.94 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.12 µs per loop


## `%%timeit` for multiple lines of code

In [27]:
%%timeit
letters = ['a','b','c']

indexed_letters = enumerate(letters)

indexed_letters_list = list(indexed_letters)

The slowest run took 5.38 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 512 ns per loop


## `line_profiler` to profile a function line by line

* Detailed stats on the frequency and duration of function calls
* Line-by-line analysis
* Uses the package `line_profiler`

Steps:

1. Install line profiler with pip 
  * `pip install line_profiler`
2. Load line profiler into the session
  * `%load_ext line_profiler`
3. Magic command for analysis 
  * `%lprun -f function_name function_name_with_arguments`  
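
The steps above might look like this in practice. The body of `convert_units` and its arguments are hypothetical, made up only for illustration. The magic commands are shown as comments because they only run inside an IPython session with `line_profiler` installed.

```python
def convert_units(heights_cm, weights_kg):
    """Hypothetical function to profile: convert metric units to imperial."""
    heights_in = [h * 0.39370 for h in heights_cm]
    weights_lb = [w * 2.20462 for w in weights_kg]
    return heights_in, weights_lb


# In a notebook cell (after `pip install line_profiler`):
# %load_ext line_profiler
# %lprun -f convert_units convert_units([180.0, 165.0], [80.0, 60.0])

heights_in, weights_lb = convert_units([180.0, 165.0], [80.0, 60.0])
print(heights_in, weights_lb)
```

Note that `-f` takes the function name alone, while the final argument is the full call to execute.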

## Comparing formal and literal syntax for built-in types

In [28]:
%timeit formal_dict = dict()

The slowest run took 8.44 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 123 ns per loop


In [29]:
%timeit literal_dict = {}

10000000 loops, best of 5: 50.4 ns per loop


# Examining Memory usage 

Two ways:

1. `sys` : Quick and dirty way using a built-in module
  * `sys.getsizeof(nums_np)` 
  * Returns the size of an object in bytes
  * For a single object

2. Code profiler : `memory_profiler` 
  * Gathers detailed stats on memory used by a function, line by line.
  * **Can only be used on functions defined in physical files, so the function must be saved to a file and imported**
  * Steps :
    1. Import the function from its file 
      * `from hero_funcs import convert_units`
    2. Install `memory_profiler` with pip
    3. Load it into the current session with `%load_ext memory_profiler`
    4. Run `%mprun -f function_name function_name_with_arguments`  
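
A minimal sketch of the quick `sys.getsizeof` check, using only built-in objects. Exact byte counts depend on the Python version and platform, so none are shown. For a container, the result covers the container itself, not the objects it refers to.

```python
import sys

nums_list = list(range(1000))
nums_range = range(1000)

# A list stores one reference per element; a range stores only start, stop and step.
list_size = sys.getsizeof(nums_list)
range_size = sys.getsizeof(nums_range)

print(f"list : {list_size} bytes")
print(f"range: {range_size} bytes")
```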



# Python standard library 

* **Use built-in solution rather than developing new solution** 

## Built-in collection data structure types 



### list (store multiple items in a single variable)

Properties :

* **Ordered**
* **Mutable**
* **Allows duplicates**

#### Access, check, change 

**Access**

In [4]:
thislist = ["apple", "banana", "cherry","cherry", "orange", "kiwi", "melon", "mango"]

thislist[1]

'banana'

In [5]:
thislist[-1] # last element

'mango'

In [6]:
thislist[2:5] # range of elements

['cherry', 'cherry', 'orange']

In [7]:
thislist[:4] # first four elements

['apple', 'banana', 'cherry', 'cherry']

In [8]:
thislist[2:]

['cherry', 'cherry', 'orange', 'kiwi', 'melon', 'mango']

In [9]:
thislist[-4:-1]

['orange', 'kiwi', 'melon']

**Check**

In [11]:
"cherry" in thislist 

True

**Change**
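
Because lists are mutable, items can be reassigned in place by index or by slice. A short sketch, using a shortened version of the fruit list; the replacement values are arbitrary.

```python
thislist = ["apple", "banana", "cherry", "orange"]

# Replace a single item by index.
thislist[1] = "blackcurrant"

# Replace a range of items with a slice assignment.
thislist[2:4] = ["kiwi", "melon"]

print(thislist)  # ['apple', 'blackcurrant', 'kiwi', 'melon']
```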

### tuple

### `set` (Comparing objects multiple times and different ways)

* **Set Theory** 
  * Operations
    * `intersection()` : all elements in both sets 
    * `difference()` : all elements in one set but not in the other
    * `symmetric_difference()` : all elements in exactly one set
    * `union()` : all elements that are in either set
* Fast membership testing using `in`

#### `intersection` : Collect all items that occur in both lists

In [37]:
list_a = ['a', 'b', 'c']
list_b = ['b','a','z']

set_a = set(list_a)
set_b = set(list_b)

set_a.intersection(set_b)

{'a', 'b'}

#### `difference` : Collect all items that exist in one list but not in the other

In [38]:
set_a.difference(set_b)

{'c'}

#### `union` : Collect all unique items in two lists

In [39]:
set_a.union(set_b)

{'a', 'b', 'c', 'z'}
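
`symmetric_difference` is listed among the set operations but has no example of its own. A short sketch with the same two lists:

```python
list_a = ['a', 'b', 'c']
list_b = ['b', 'a', 'z']

set_a = set(list_a)
set_b = set(list_b)

# Items that appear in exactly one of the two sets.
only_in_one = set_a.symmetric_difference(set_b)
print(sorted(only_in_one))  # ['c', 'z']
```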

#### `in` : Membership testing is faster than with a list or tuple 

In [46]:
%timeit 'a' in ['a','b','c']

10000000 loops, best of 5: 35.4 ns per loop


In [47]:
%timeit 'a' in ('a','b','c')

10000000 loops, best of 5: 34.6 ns per loop


In [48]:
%timeit 'a' in {'a','b','c'}

10000000 loops, best of 5: 34.8 ns per loop
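
With three-element literals the timings are nearly identical, so the advantage of a set only shows on larger collections. A rough sketch using the standard-library `timeit` module; the collection size and the searched value are arbitrary assumptions, and absolute timings will vary by machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# A list is scanned element by element; a set uses a hash lookup.
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```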


#### `set` : Find all distinct elements
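
Converting a list to a set is a quick way to collect its distinct elements. A minimal sketch; the sample data is made up for illustration.

```python
poke_types = ['Grass', 'Fire', 'Grass', 'Water', 'Fire']

# Converting to a set drops duplicates; sets are unordered, so sort for display.
unique_types = set(poke_types)
print(sorted(unique_types))  # ['Fire', 'Grass', 'Water']
```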

### dict

## Final notes

## Built-in functions

* **Functions to handle built in data types** 

### range (create sequence with start and stop values)

`range`

* Parameters : (start, stop, step value)
* Return : range object (to be converted to list)

Notes:

* **The stop value is exclusive: the sequence ends just before it**

In [None]:
nums = range(0,11)
print("list of values using range : " + str(list(nums)))

print("list of values using range - unpacked : " + str([*(range(7))]))

print(str(list(range(2, 11, 2))))

list of values using range : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list of values using range - unpacked : [0, 1, 2, 3, 4, 5, 6]
[2, 4, 6, 8, 10]


### enumerate (create indexed list of objects)

`enumerate`

* Parameters : (iterable, optional start index)
* Return : enumerate object (to be converted to list)

In [15]:
letters = ['a','b','c']

indexed_letters = enumerate(letters)

indexed_letters_list = list(indexed_letters)
print(indexed_letters_list)

[(0, 'a'), (1, 'b'), (2, 'c')]
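
`enumerate` also accepts an optional `start` argument, which is handy when the index should begin at 1. A short sketch:

```python
letters = ['a', 'b', 'c']

# The optional start argument sets the first index (default 0).
indexed_from_one = list(enumerate(letters, start=1))
print(indexed_from_one)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```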


### map (applies function over an object)

`map`

* Parameters : (function, list object)  
* Return : map object (to be converted to list)

Notes:

* The function argument can be a built-in function such as `sum` or `round`, or an anonymous (`lambda`) function.

In [17]:
nums = [1.5, 2.0, 4.1]

rnd_nums = map(round, nums)

print(list(rnd_nums))

[2, 2, 4]


In [18]:
sqrd_nums = map(lambda x : x ** 2, nums)

print(list(sqrd_nums))

[2.25, 4.0, 16.81]


# NumPy Arrays

* NumPy (Numerical Python) is a package for scientific computing in Python.
* Provides efficient means of handling data

## NumPy arrays (fast, memory-efficient alternative to Python lists)

* **Properties**
  * **Homogeneity** : NumPy arrays are homogeneous (**elements must be of the same type** - int or float etc.)
  * **Broadcasting** : Perform a single operation on an entire collection of values (vectorized operations)
  * **Boolean indexing** : Perform a boolean/comparison operation and filter the entire NumPy array with the resulting boolean mask  

In [20]:
import numpy as np 

nums_np = np.array(range(5))

print(nums_np)

print(nums_np.dtype)

[0 1 2 3 4]
int64


### Broadcasting (single operation on entire array)

In [22]:
nums = np.array([-1, -2, 0, 1, 2])

nums ** 2 

array([1, 4, 0, 1, 4])

### Boolean indexing (single boolean comparison operation and filtering)

In [23]:
nums > 0

array([False, False, False,  True,  True])

In [24]:
nums[nums > 0]

array([1, 2])

# Combining, counting and iterating

## Combining objects using `zip`

Efficient way of combining two lists 

* Parameters : (list1, list2)
* Return : zip object (**an iterator of tuples pairing up the elements**; convert it to a list to see them) 

In [31]:
names = ['x', 'y', 'z']
hps = [1, 7, 9]

combined_zip = zip(names, hps)

In [33]:
list(combined_zip)

[('x', 1), ('y', 7), ('z', 9)]
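
A zip object is an iterator, so it can be consumed only once. The sketch below shows that behaviour and one common use, building a lookup dict; the name `hp_by_name` is made up for illustration.

```python
names = ['x', 'y', 'z']
hps = [1, 7, 9]

combined_zip = zip(names, hps)

first_pass = list(combined_zip)   # consumes the iterator
second_pass = list(combined_zip)  # nothing left to yield

# Build a fresh zip object to create a name -> hp lookup.
hp_by_name = dict(zip(names, hps))

print(first_pass)   # [('x', 1), ('y', 7), ('z', 9)]
print(second_pass)  # []
print(hp_by_name)   # {'x': 1, 'y': 7, 'z': 9}
```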

### Itertools module

* Part of the Python standard library, used for creating and using iterators 
* Notable 
  * Infinite iterators : `count`, `cycle`, `repeat`
  * Finite iterators : `accumulate`, `chain`, `zip_longest`, etc.
  * Combinatoric generators : `product`, `permutations`, `combinations` 

#### Combinatoric generators (Cartesian product, permutations and combinations)

##### Combinations (combinations of elements in a list)

In [36]:
poke_types = ['Bug', 'Fire', 'Ghost', 'Grass']
from itertools import combinations

combos_obj = combinations(poke_types, 2)

print(list(combos_obj))

[('Bug', 'Fire'), ('Bug', 'Ghost'), ('Bug', 'Grass'), ('Fire', 'Ghost'), ('Fire', 'Grass'), ('Ghost', 'Grass')]
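
The other two combinatoric generators work the same way. A brief sketch on a shortened list of types (an assumption to keep the output small):

```python
from itertools import permutations, product

poke_types = ['Bug', 'Fire', 'Ghost']

# permutations: ordered pairs without repeating an element -> 3 * 2 = 6 pairs
perms = list(permutations(poke_types, 2))

# product: Cartesian product of the list with itself -> 3 * 3 = 9 pairs
prods = list(product(poke_types, repeat=2))

print(len(perms), perms[:2])  # 6 [('Bug', 'Fire'), ('Bug', 'Ghost')]
print(len(prods), prods[:2])  # 9 [('Bug', 'Bug'), ('Bug', 'Fire')]
```

Unlike `combinations`, `permutations` treats `('Bug', 'Fire')` and `('Fire', 'Bug')` as different results, and `product` also pairs an element with itself.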


## Counting 

### `collections` module (alternatives to dict, list, set and tuple)

* Specialized container datatypes 

* e.g. 
  * `namedtuple` : tuple subclass with named fields
  * `deque` : list-like container with fast appends and pops on either end
  * `Counter` : dict for counting hashable objects
  * `OrderedDict` : dict that retains order of entries
  * `defaultdict` : dict that calls a factory function to supply missing values


In [35]:
poke_type = ['grass', 'grass', 'Fire', 'Fire', 'ground']

from collections import Counter

type_counts = Counter(poke_type)
print(type_counts)


Counter({'grass': 2, 'Fire': 2, 'ground': 1})
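
For comparison, here is a sketch of the manual loop that `Counter` replaces, using the same `poke_type` list. The name `manual_counts` is made up for illustration.

```python
from collections import Counter

poke_type = ['grass', 'grass', 'Fire', 'Fire', 'ground']

# Manual counting with a plain dict and a loop.
manual_counts = {}
for t in poke_type:
    manual_counts[t] = manual_counts.get(t, 0) + 1

# Counter gives the same counts without the explicit loop.
type_counts = Counter(poke_type)

print(dict(type_counts) == manual_counts)  # True
print(type_counts['ground'])               # 1
```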


# Eliminating loops 

Looping patterns :    

* `for` : iterate over sequence piece by piece 
* `while` : repeat loop as long as condition is met
* nested : one loop inside other 

**All loops are costly - "Flat is better than nested"**

## Use map function (or list comprehension)   

In [50]:
poke_stats = [[90, 10, 10], [24,25,29],[99,100,220]]

# list comprehension 
%timeit total_sum = [sum(row) for row in poke_stats]

# map function 
%timeit total_map = [*map(sum, poke_stats)]

The slowest run took 5.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 694 ns per loop
The slowest run took 6.24 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 567 ns per loop


## Using `combinations` from itertools to find combinations 
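
A sketch of the nested loop that `combinations` can replace, on a small list of types. Both versions produce the same pairs in the same order.

```python
from itertools import combinations

poke_types = ['Bug', 'Fire', 'Ghost', 'Grass']

# Nested-loop version: collect every unordered pair by hand.
combos_loop = []
for i, first in enumerate(poke_types):
    for second in poke_types[i + 1:]:
        combos_loop.append((first, second))

# itertools version: same pairs, same order, no explicit loops.
combos_itertools = [*combinations(poke_types, 2)]

print(combos_loop == combos_itertools)  # True
```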

## Using NumPy arrays broadcasting 

In [54]:
np.array(poke_stats).mean(axis=1) # axis=1: one mean per row; axis=0: one mean per column

array([ 36.66666667,  26.        , 139.66666667])