### Python built-in modules
- os
- sys
- itertools
- collections
- math

and others

### Built-in function: enumerate()
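Returns an iterator of `(index, item)` pairs for an iterable; wrap it in `list()` to see the pairs. The optional `start` argument sets the first index.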

In [1]:
letters = ['a', 'b', 'c', 'd' ]
indexed_letters = enumerate(letters)
indexed_letters_list = list(indexed_letters)
print(indexed_letters_list)

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]


In [2]:
letters = ['a', 'b', 'c', 'd' ]
indexed_letters2 = enumerate(letters, start=5)
indexed_letters2_list = list(indexed_letters2)
print(indexed_letters2_list)

[(5, 'a'), (6, 'b'), (7, 'c'), (8, 'd')]
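
In practice `enumerate()` is usually consumed directly in a `for` loop rather than converted to a list first. A minimal sketch of that pattern (the names `labelled`, `position` and `letter` are only illustrative):

```python
letters = ['a', 'b', 'c', 'd']

# enumerate() yields (index, item) pairs one at a time,
# so the pair can be unpacked right in the loop header.
labelled = []
for position, letter in enumerate(letters, start=1):
    labelled.append(f"{position}: {letter}")

print(labelled)
```

Unpacking the pair in the loop header avoids manual counters such as `i += 1`.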


### Built-in function: map()
Applies a function to every item of an iterable and returns a lazy iterator; wrap it in `list()` to see the results.

In [4]:
nums = [1.5, 2.3, 3.4, 4.6, 5.0]
rnd_nums = map(round, nums)
print(list(rnd_nums))

[2, 2, 3, 5, 5]


In [5]:
nums = [1, 2, 3, 4, 5]
sqrd_nums = map(lambda x: x ** 2, nums)
print(list(sqrd_nums))

[1, 4, 9, 16, 25]
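
Two details of `map()` are easy to miss: it accepts several iterables at once, and the iterator it returns can only be consumed once. A small sketch, with illustrative variable names:

```python
bases = [2, 3, 4]
exponents = [3, 2, 1]

# With several iterables, map() passes one item from each to the function,
# so this computes pow(2, 3), pow(3, 2), pow(4, 1).
powers = map(pow, bases, exponents)

powers_list = list(powers)   # consumes the iterator
leftover = list(powers)      # the iterator is now exhausted

print(powers_list)
print(leftover)
```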


### Asterisk (*) operator in Python

- \* and \** are operators that unpack values from iterable objects in Python
- \* can be used on any iterable
- \** can only be used on dictionaries (more generally, mappings)


In [7]:
# compare the output of this cell with the next one
my_list = [1, 2, 3, 4]
print(my_list)

[1, 2, 3, 4]


In [8]:
# unpacking: each item is passed to print() as a separate argument
print(*my_list)

1 2 3 4


In [21]:
my_list=[1,2,3,4,5,6]
a,*b,c=my_list 
print (" a: {}".format(a),"\n","b: {}".format(b),"\n","c: {}".format(c))

 a: 1 
 b: [2, 3, 4, 5] 
 c: 6


In [22]:
# the double asterisk is used for dictionaries only
my_first_dict={"A": 1,"B": 2}
my_second_dict={"C": 3, "D": 4}
my_merged_dict={**my_first_dict, **my_second_dict}
print(my_merged_dict)

{'A': 1, 'B': 2, 'C': 3, 'D': 4}


In [23]:
# single asterisk can be used to unpack strings
a = [*"Hooman Nouraei"]
print(a)

['H', 'o', 'o', 'm', 'a', 'n', ' ', 'N', 'o', 'u', 'r', 'a', 'e', 'i']


In [36]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

names_map  = [*map(str.upper, names)]

print(names_map)

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
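
The same operators also work in function definitions and calls. The sketch below uses a hypothetical helper, `describe`, to show `*` collecting and spreading positional arguments and `**` doing the same for keyword arguments:

```python
def describe(*args, **kwargs):
    """Collect positional arguments into a tuple and keyword arguments into a dict."""
    return args, kwargs

coords = [1, 2, 3]
options = {"sep": "-", "end": "!\n"}

# In a call, * spreads an iterable into positional arguments
# and ** spreads a dict into keyword arguments.
positional, keyword = describe(*coords, **options)

print(positional)
print(keyword)
print(*coords, **options)  # same as print(1, 2, 3, sep="-", end="!\n")
```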


## Zen of Python
The Zen of Python is a set of guiding principles for writing Python code, documented as PEP 20.

One little Easter egg in Python is the ability to print the Zen of Python by running `import this`. Let's take a look at the idioms listed in these guiding principles.

In [1]:
import this


The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Examining runtime by using magic commands

### Using %timeit

In [3]:
import numpy as np 

In [4]:
rand_nums = np.random.rand(1000)

In [5]:
# timing with %timeit
%timeit rand_nums = np.random.rand(1000)

11.2 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [6]:
# specifying the number of runs (-r) and the number of loops (-n) for %timeit
%timeit -r2 -n10 rand_nums=np.random.rand(1000)

The slowest run took 6.65 times longer than the fastest. This could mean that an intermediate result is being cached.
35.8 µs ± 26.4 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)


- `%%timeit` (two percent signs) times all the lines of code in a cell instead of a single statement
- the `-o` flag returns the timing result as an object that can be stored in a variable

In [7]:
# saving the timeit result in a variable using -o
times = %timeit -o rand_nums = np.random.rand(1000)

11.6 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [9]:
print(times)

11.6 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [10]:
times.timings

[7.0892559998901564e-06,
 1.1961980999913067e-05,
 1.2153002999839372e-05,
 1.4050523999612778e-05,
 1.1502897000173106e-05,
 1.070155400026124e-05,
 1.408470199967269e-05]

In [11]:
times.best 

7.0892559998901564e-06

In [12]:
times.worst

1.408470199967269e-05

### Example: formal names vs. literal syntax

Comparing the time needed to create a data structure using

- its formal name: list(), dict(), tuple()
- its literal syntax: [], {}, ()

In [18]:
f_times = %timeit -o formal_dict=dict()
l_times = %timeit -o literal_dict={}

diff = (f_times.average - l_times.average)*(10**9)
print ('l_time better than f_time by {} ns'.format(round(diff)))

77.4 ns ± 3.73 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
25.4 ns ± 0.286 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
l_time better than f_time by 52 ns
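
Outside IPython the magic commands are not available, but the standard library `timeit` module supports a similar comparison. The sketch below is one possible plain-Python equivalent; the loop and run counts are arbitrary choices, and the measured times will differ from machine to machine:

```python
import timeit

# repeat= plays the role of -r (number of runs),
# number= plays the role of -n (loops per run).
formal_runs = timeit.repeat("dict()", repeat=5, number=100_000)
literal_runs = timeit.repeat("{}", repeat=5, number=100_000)

# Each entry is the total time for one run, so divide by the loop count.
formal_best = min(formal_runs) / 100_000
literal_best = min(literal_runs) / 100_000

print(f"dict(): {formal_best * 1e9:.1f} ns per loop (best of 5)")
print(f"{{}}:     {literal_best * 1e9:.1f} ns per loop (best of 5)")
```

Taking the minimum of the runs mirrors `times.best` above.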


## Code Profiling
Code profiling gives detailed statistics on the frequency and duration of function calls. `%timeit` and `%%timeit` only report the total run time, whereas line_profiler reports the run time line by line. To install line_profiler, use

`conda install -c anaconda line_profiler`

In [70]:
def num_gen (x):
    x=x+1
    num_list=[*range(x)]
    return num_list

In [71]:
import line_profiler

In [74]:
# loading the line_profiler
%load_ext line_profiler

The line_profiler module is not an IPython extension.


In [84]:
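# %prun profiles per function call (cProfile); with line_profiler loaded, %lprun -f num_gen num_gen(500) gives per-line timings
%prun num_gen(500)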

 

### cProfile: another method of runtime calculation

In [76]:
# cProfile is recommended for most users; it’s a C extension with reasonable overhead that makes it suitable for profiling long-running programs.
import cProfile

In [77]:
cProfile.run('num_gen(500)')

         4 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <ipython-input-70-447281d439fe>:1(num_gen)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
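
For more control over the report, a `cProfile.Profile` object can be combined with `pstats` to sort and trim the output. A sketch under the assumption that `num_gen` is defined as above (it is repeated here so the block runs on its own):

```python
import cProfile
import io
import pstats

def num_gen(x):
    x = x + 1
    num_list = [*range(x)]
    return num_list

# Record only the calls made between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = num_gen(500)
profiler.disable()

# Write the report to a string buffer, sorted by cumulative time,
# and limit it to the first 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```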




## The collections module
Part of Python's Standard Library (built-in module)

Specialized container datatypes

Alternatives to general purpose dict, list, set, and tuple

Notable:

- namedtuple: tuple subclasses with named fields
- deque: list-like container with fast appends and pops
- Counter: dict for counting hashable objects
- OrderedDict: dict that retains order of entries
- defaultdict: dict that calls a factory function to supply missing values
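
A short sketch of four of these containers in use; the sample data and the `Point` type are made up for illustration:

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter: tally hashable items.
letter_counts = Counter("banana")
top_two = letter_counts.most_common(2)

# defaultdict: a missing key gets its value from the factory (here, list).
names_by_length = defaultdict(list)
for name in ["Jerry", "Kramer", "Elaine", "George", "Newman"]:
    names_by_length[len(name)].append(name)

# namedtuple: a tuple whose fields can also be read by name.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

# deque: fast appends and pops on both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
first = queue.popleft()

print(top_two)
print(dict(names_by_length))
print(p.x, p.y)
print(first, list(queue))
```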

## The itertools module

Part of Python's Standard Library (built-in module)

Functional tools for creating and using iterators

Notable:

- Infinite iterators: count, cycle, repeat
- Finite iterators: accumulate, chain, zip_longest, etc.
- Combination generators: product, permutations, combinations
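
One example from each group, as a minimal sketch. The inputs are arbitrary, and `islice` is used only to cut the infinite iterator short:

```python
from itertools import accumulate, chain, combinations, count, islice, product

# Infinite iterator: count() never stops, so take only the first 5 values.
first_evens = list(islice(count(start=0, step=2), 5))

# Finite iterators: running totals, and joining iterables end to end.
running_totals = list(accumulate([1, 2, 3, 4]))
merged = list(chain("ab", [1, 2]))

# Combination generators: unordered pairs, and a Cartesian product.
pairs = list(combinations(["a", "b", "c"], 2))
grid = list(product([0, 1], repeat=2))

print(first_evens)
print(running_totals)
print(merged)
print(pairs)
print(grid)
```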

## Set theory

Branch of Mathematics applied to collections of objects i.e., sets

Python has built-in set datatype with accompanying methods:

- intersection(): all elements that are in both sets
- difference(): all elements in one set but not the other
- symmetric_difference(): all elements in exactly one set
- union(): all elements that are in either set

Sets also offer fast membership testing: the `in` operator checks whether a value exists in a set much faster than in a list or tuple.
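
The four methods and the `in` operator can be tried on two small sets. This is only an illustration with made-up values; results are sorted before printing because sets are unordered:

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5}

in_both = set_a.intersection(set_b)
only_in_a = set_a.difference(set_b)
only_in_b = set_b.difference(set_a)
in_exactly_one = set_a.symmetric_difference(set_b)
in_either = set_a.union(set_b)

print(sorted(in_both))
print(sorted(only_in_a), sorted(only_in_b))
print(sorted(in_exactly_one))
print(sorted(in_either))

# Membership testing with the in operator.
print(3 in set_a, 5 in set_a)
```

Note that `difference()` depends on the order of the operands, while the other three do not.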

## Pandas dataframe iteration

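Iterating with .iterrows() is typically much faster than looking up each row with .iloc[] inside a loop. It yields (index, row) pairs, much as enumerate() does for lists.

Iterating with .itertuples() is usually even faster than .iterrows(), because it does not build a pandas Series for every row.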

In [None]:
# each row is a pandas Series, so cells are accessed by column label
# (calc_win_perc is assumed to be defined earlier)
win_perc_list = []
for i, row in baseball_df.iterrows():
    wins = row['W']
    games_played = row['G']
    win_perc = calc_win_perc(wins, games_played)
    win_perc_list.append(win_perc)

baseball_df['WP'] = win_perc_list

In [None]:
for row_namedtuple in team_wins_df.itertuples():
    print(row_namedtuple)
    
# output
'''
Pandas(Index=0, Team='ARI', Year=2012, W=81)
Pandas(Index=1, Team='ATL', Year=2012, W=94)
# The output of .itertuples() is a special type of tuple called a named tuple. It behaves like a tuple, but
the fields are also accessible by attribute lookup with dot notation. For example:

print (row_namedtuple.Index)
print (row_namedtuple.Team)
print (row_namedtuple.Year)

%%timeit
for row_tuple in team_wins_df.iterrows():
    print(row_tuple)

527 ms ± 41.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
vs.

%%timeit
for row_namedtuple in team_wins_df.itertuples():
    print(row_namedtuple)

7.48 ms ± 243 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

'''
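
The difference in how the two iterators expose a row can be seen on a tiny DataFrame. The sketch below rebuilds the two rows printed above as stand-in data and assumes pandas is installed:

```python
import pandas as pd

# Stand-in data mirroring the two rows shown in the output above.
team_wins_df = pd.DataFrame({
    "Team": ["ARI", "ATL"],
    "Year": [2012, 2012],
    "W": [81, 94],
})

# .iterrows() yields (index, Series); values are looked up by column label.
wins_from_iterrows = [row["W"] for _, row in team_wins_df.iterrows()]

# .itertuples() yields named tuples; values are looked up by attribute.
wins_from_itertuples = [row.W for row in team_wins_df.itertuples()]
teams = [row.Team for row in team_wins_df.itertuples()]

print(wins_from_iterrows)
print(wins_from_itertuples)
print(teams)
```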

## pandas .apply() method

Takes a function and applies it along an axis of a DataFrame. The axis argument sets the direction: 0 (the default) passes each column to the function; 1 passes each row.

Can be used with anonymous functions ( lambda functions)

Example:

`def calc_run_diff(runs_scored, runs_allowed):
    run_diff = runs_scored - runs_allowed
    return run_diff`
#### The lambda function acts as the mapping function: .apply() calls it once for each row of the DataFrame.

Remember that the axis must be set to 1 for the function to receive rows. The argument of the lambda is row.

`baseball_df.apply(lambda row: calc_run_diff(row['RS'], row['RA']),axis=1)`

# Power of vectorization 

## Broadcasting (vectorizing) is extremely efficient!

Calling .values on a column returns its data as a NumPy array, since pandas is built on NumPy arrays. The broadcasting approach is much faster than all the other approaches.

`wins_np = baseball_df['W'].values
 print(type(wins_np))`
 
 Example:
 
`run_diffs_np = baseball_df['RS'].values - baseball_df['RA'].values
 baseball_df['RD'] = run_diffs_np
 print(baseball_df)`
 
 Example of all three methods:
 
 `win_perc_preds_loop = []

#Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)

#Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball_df.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)

#Calculate the win percentage predictions using NumPy arrays
win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
baseball_df['WP_preds'] = win_perc_preds_np
print(baseball_df.head())`
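
A self-contained version of the three approaches can be run on a few made-up rows to confirm that they agree. The run totals below are hypothetical (only the column names `RS` and `RA` come from the notes), and the sketch uses the simpler `calc_run_diff` instead of `predict_win_perc`, which is not defined in these notes:

```python
import pandas as pd

# Hypothetical run totals for three teams.
baseball_df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"],
    "RS": [734, 700, 712],
    "RA": [688, 600, 705],
})

def calc_run_diff(runs_scored, runs_allowed):
    return runs_scored - runs_allowed

# 1) Loop over the rows with .itertuples().
run_diffs_loop = [calc_run_diff(row.RS, row.RA) for row in baseball_df.itertuples()]

# 2) Apply the function to each row with .apply() and axis=1.
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row["RS"], row["RA"]), axis=1
)

# 3) Vectorized: subtract the underlying NumPy arrays in one step.
run_diffs_np = baseball_df["RS"].values - baseball_df["RA"].values

baseball_df["RD"] = run_diffs_np
print(baseball_df)
```

All three produce the same numbers; on large DataFrames the vectorized version is the one expected to be fastest, which can be checked with `%timeit`.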