In [3]:
# importing libs
import pandas as pd
import numpy as np

## 1. Writing efficient Python code

### 1.1 Defining efficient 

* Minimal completion time (fast runtime)
* Minimal resource consumption (small memory footprint)

### 1.2 Defining Pythonic
* Focus on readability
* Using Python's constructs as intended

In [4]:
# Print the list created using the Non-Pythonic approach

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

i = 0
new_list = []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1

print("Print the list created using the Non-Pythonic approach")
print(new_list, "\n")

# Print the list created by looping over the contents of names
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print("Print the list created by looping over the contents of names")
print(better_list, "\n")

#The best Pythonic way of doing this is by using list comprehension.
best_list = [name for name in names if len(name) >= 6]
print("The best Pythonic way of doing this is by using list comprehension.")
print(best_list)


Print the list created using the Non-Pythonic approach
['Kramer', 'Elaine', 'George', 'Newman'] 

Print the list created by looping over the contents of names
['Kramer', 'Elaine', 'George', 'Newman'] 

The best Pythonic way of doing this is by using list comprehension.
['Kramer', 'Elaine', 'George', 'Newman']


In [5]:
# Zen of Python
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

### 1.3 Building with built-ins

* Built-in types
  * list, tuple, set, dict, and others.
* Built-in functions
  * print(), len(), range(), round(), enumerate(), map(), zip(), and others.
* Built-in modules
  * os, sys, itertools, collections, math, and others.

#### 1.3.1 Built-in functions
### range() 
```codeblock
  # Explicitly typing a list of numbers 
  nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

  # Using range() to create the same list
  # Signature: range(start, stop); stop is exclusive
  nums = range(0, 11)
  nums_list = list(nums)
  # output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

  # Using range() with a step value
  even_nums = range(2, 11, 2) 
  even_nums_list = list(even_nums) 
  # output: [2, 4, 6, 8, 10] 
```

### enumerate()

Creates an enumerate object that pairs each item with its index; wrap it in list() to see the pairs
```codeblock
  letters = ['a', 'b', 'c', 'd' ]
  indexed_letters = enumerate(letters)
  indexed_letters_list = list(indexed_letters)
  print(indexed_letters_list)
  # output: [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')] 
```

Can specify a start value 
```codeblock
  letters = ['a', 'b', 'c', 'd' ]
  indexed_letters2 = enumerate(letters, start=5)
  indexed_letters2_list = list(indexed_letters2)
  # output: [(5, 'a'), (6, 'b'), (7, 'c'), (8, 'd')] 
```

### map()

Applies a function to each element of an iterable
```codeblock
  nums = [1.5, 2.3, 3.4, 4.6, 5.0]
  rnd_nums = list(map(round, nums))
  # output: [2, 2, 3, 5, 5]
```
With a lambda (anonymous function)
```codeblock
  nums = [1, 2, 3, 4, 5]
  sqrd_nums = list(map(lambda x: x ** 2, nums))
  # output: [1, 4, 9, 16, 25]
```

In [6]:
# Example with range()

# Create a range object that goes from 0 to 5
nums = range(0, 6)
print(type(nums))

# Convert nums to a list
nums_list = list(nums)
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object with the star character (*)
nums_list2 = [*range(1, 12, 2)]
print(nums_list2)

<class 'range'>
[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]


In [7]:
# Example with enumerate()
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Rewrite the for loop to use enumerate
indexed_names = []
for i, name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)

# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i, name) for i,name in enumerate(names)]
print(indexed_names_comp)

# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, start=1)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


In [8]:
# Example with map()
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Use map to apply str.upper to each element in names
names_map = map(str.upper, names)

# Print the contents of names_map; list() consumes the map iterator
print(list(names_map))

# Unpack names_map into a list; the iterator is already exhausted, so this is empty
names_uppercase = [*names_map]

# Prints [] because names_map was consumed by list() above
print(names_uppercase)

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
[]
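
The empty list above comes from map() returning a one-shot iterator. A minimal sketch of one way around it, assuming the goal is to reuse the uppercase names: materialize the iterator once and work with the resulting list afterwards. The variable name leftover is introduced here only for illustration.

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# map() returns a lazy iterator that can be consumed only once
names_map = map(str.upper, names)
print(type(names_map))

# Materialize the iterator a single time and reuse the resulting list
names_uppercase = [*names_map]
print(names_uppercase)

# A second pass over the exhausted iterator yields nothing
leftover = list(names_map)
print(leftover)
```

Keeping the list (rather than the map object) around is the simplest option when the values are needed more than once.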


### 1.4 The power of NumPy arrays
Alternative to Python lists

```codeblock
    nums_list = list(range(5))  # output: [0, 1, 2, 3, 4]

    nums_np = np.array(range(5))  # output: array([0, 1, 2, 3, 4])
```

1. NumPy array homogeneity: every element has the same data type
2. NumPy array broadcasting 
   ```codeblock
   # Python lists don't support broadcasting
   nums = [-2, -1, 0, 1, 2]
   nums ** 2
   # output: TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

   # List comprehension (better option but not best)
   nums = [-2, -1, 0, 1, 2]
   sqrd_nums = [num ** 2 for num in nums]
   # output: [4, 1, 0, 1, 4]

   # NumPy array broadcasting for the win!
   nums_np = np.array([-2, -1, 0, 1, 2])
   nums_np ** 2
   # output: array([4, 1, 0, 1, 4])
   ```
3. Easy indexing
   ```codeblock
   # 2-D list: select the first column

   # With a list
   nums2 = [[1, 2, 3], [4, 5, 6]]
   [row[0] for row in nums2]
   # output: [1, 4]

   # With a NumPy array
   nums2_np = np.array(nums2)
   nums2_np[:, 0]
   # output: array([1, 4])
   ```
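
The homogeneity point in item 1 has no example above, so here is a small sketch of what it means in practice. It assumes a mixed list of ints and one float; the names mixed and mixed_list are illustrative only.

```python
import numpy as np

# Mixing ints and a float: NumPy upcasts every element to one float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
print(mixed)

# A plain list keeps each element's original type
mixed_list = [1, 2.5, 3]
print([type(x).__name__ for x in mixed_list])
```

Because the array holds a single type, NumPy can skip the per-element type checks that a list requires.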


## 2. Timing and profiling code


IPython extension packages used:
pip install line_profiler
pip install memory_profiler

### 2.1 %timeit
* %timeit - times a single line of code.
* %%timeit - times multiple lines (the whole cell).

Setting the number of runs (-r) and/or loops (-n).

Saving the output to a variable (-o).
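
The magics above only work inside IPython. As a rough sketch of the same idea in plain Python, the standard-library timeit module exposes comparable controls: repeat plays the role of -r and number the role of -n. The timed statement below is an arbitrary stand-in chosen for illustration.

```python
import timeit

# Rough stdlib analogue of `%timeit -r2 -n10 ...`:
# repeat=2 runs, each executing the statement number=10 times
totals = timeit.repeat(
    stmt="sum(range(1000))",
    repeat=2,
    number=10,
)

# Each entry is the total time of one run; divide by `number` for per-loop time
per_loop = [t / 10 for t in totals]
print("per-loop seconds:", per_loop)
print("best:", min(per_loop))
```

Unlike %timeit, this returns raw totals, so the per-loop figures and any summary statistics have to be computed by hand.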


In [9]:
!pip install line_profiler
!pip install memory_profiler

%load_ext line_profiler
%load_ext memory_profiler

Collecting line_profiler
  Downloading line_profiler-4.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (653 kB)
Installing collected packages: line_profiler
Successfully installed line_profiler-4.0.1
Collecting memory_profiler
  Downloading memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Installing collected packages: memory_profiler
Successfully installed memory_profiler-0.61.0

In [10]:
# %timeit
import numpy as np

%timeit randnums = np.random.rand(1000)
%timeit -r2 -n10 rand_nums = np.random.rand(1000) 

8.95 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
23.8 µs ± 16.3 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)


In [11]:
# %timeit Saving the output to a variable
times = %timeit -o rand_nums = np.random.rand(1000) 

print("timings:", times.timings)
print("best time:", times.best)
print("worst time:", times.worst)


9.29 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
timings: [1.1243194249999533e-05, 8.36301795000054e-06, 8.984463549999191e-06, 9.593133820000048e-06, 8.543068469999753e-06, 8.184759479999002e-06, 1.0111128830000098e-05]
best time: 8.184759479999002e-06
worst time: 1.1243194249999533e-05


In [12]:
# %timeit: difference between the average times

f_time = %timeit -o formal_dict = dict() 
l_time = %timeit -o literal_dict = {} 

diff = (f_time.average - l_time.average) * (10**9)
print('l_time better than f_time by {} ns'.format(diff))

77.4 ns ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
30.4 ns ± 3.31 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
l_time better than f_time by 47.0560757857129 ns


### 2.2 Code profiling: line_profiler and memory_profiler

* Magic command for line-by-line times
```codeblock
%lprun -f convert_units convert_units(heroes, hts, wts) 
```

* Detailed stats on memory consumption: the function needs to be defined in a separate file and imported
```codeblock
%mprun -f convert_units convert_units(heroes, hts, wts) 
```


In [14]:
%load_ext line_profiler

heroes = ['Batman', 'Superman', 'Wonder Woman']
hts = np.array([188.0, 191.0, 183.0])
wts = np.array([95.0, 101.0,  74.0])


def convert_units(heroes, heights, weights):
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]
    hero_data = {}
    for i, hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data


%lprun -f convert_units convert_units(heroes, hts, wts)


The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


Timer unit: 1e-09 s

Total time: 2.3001e-05 s
File: /tmp/ipykernel_8282/1994859564.py
Function: convert_units at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
     8                                           def convert_units(heroes, heights, weights):
     9         1      15501.0  15501.0     67.4      new_hts = [ht * 0.39370 for ht in heights]
    10         1       3000.0   3000.0     13.0      new_wts = [wt * 2.20462 for wt in weights]
    11         1        300.0    300.0      1.3      hero_data = {}
    12         3       2400.0    800.0     10.4      for i, hero in enumerate(heroes):
    13         3       1600.0    533.3      7.0          hero_data[hero] = (new_hts[i], new_wts[i])
    14                                           
    15         1        200.0    200.0      0.9      return hero_data
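
In the profile above, the two list comprehensions account for roughly 80% of the measured time (67.4% + 13.0%). One possible follow-up, sketched here under the assumption that heights and weights are always NumPy arrays, is to let broadcasting do the conversions. The function name convert_units_broadcast is hypothetical and not part of the notebook.

```python
import numpy as np

heroes = ['Batman', 'Superman', 'Wonder Woman']
hts = np.array([188.0, 191.0, 183.0])
wts = np.array([95.0, 101.0, 74.0])


def convert_units_broadcast(heroes, heights, weights):
    # Hypothetical variant of convert_units: the conversion factors are
    # applied to the whole arrays at once instead of element by element
    new_hts = heights * 0.39370
    new_wts = weights * 2.20462
    hero_data = {}
    for i, hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
    return hero_data


result = convert_units_broadcast(heroes, hts, wts)
print(result['Batman'])
```

Whether this is actually faster for three heroes is something to confirm with %lprun; the gain from broadcasting usually shows up as the arrays grow.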

In [15]:
%load_ext memory_profiler 

from hero_funcs import convert_units

%mprun -f convert_units convert_units(heroes, hts, wts) 

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler



Filename: /workspaces/data_enginer_python/4_Writing_Efficient_Python_Code/hero_funcs.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     1    107.5 MiB    107.5 MiB           1   def convert_units(heroes, heights, weights):
     2    107.5 MiB      0.0 MiB           6       new_hts = [ht * 0.39370 for ht in heights]
     3    107.5 MiB      0.0 MiB           6       new_wts = [wt * 2.20462 for wt in weights]
     4    107.5 MiB      0.0 MiB           1       hero_data = {}
     5    107.5 MiB      0.0 MiB           4       for i, hero in enumerate(heroes):
     6    107.5 MiB      0.0 MiB           3           hero_data[hero] = (new_hts[i], new_wts[i])
     7                                         
     8    107.5 MiB      0.0 MiB           1       return hero_data

## 3. Efficiently combining, counting, and iterating

### 3.1. Combining objects

In [None]:
names = ['Bulbasaur', 'Charmander', 'Squirtle']
hps = [45, 39, 44]

In [19]:
# Combining with enumerate

combined = []
for i,pokemon in enumerate(names):
    combined.append((pokemon, hps[i]))
print(combined)

[('Bulbasaur', 45), ('Charmander', 39), ('Squirtle', 44)]


In [20]:
# Combining with zip
combined_zip = [*zip(names, hps)]
print(combined_zip)

[('Bulbasaur', 45), ('Charmander', 39), ('Squirtle', 44)]
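
Two properties of zip are worth keeping in mind when combining objects this way. The sketch below reuses the same names and hps lists; the names hp_by_name and short are illustrative.

```python
names = ['Bulbasaur', 'Charmander', 'Squirtle']
hps = [45, 39, 44]

# zip pairs can feed dict() directly to build a name -> HP lookup
hp_by_name = dict(zip(names, hps))
print(hp_by_name)

# zip stops at the shortest input, so extra items are silently dropped
short = [*zip(names, [45, 39])]
print(short)
```

If the inputs might differ in length, it is safer to check the lengths first than to rely on the truncated result.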


### 3.2. Counting

In [36]:
# Counting primary_types with a loop (bad practice)

pokemon = pd.read_json("data/pokemon.json")

poke_types = pokemon["primary_types"].values

type_counts = {}
for poke_type in poke_types:
    if poke_type not in type_counts:
        type_counts[poke_type] = 1
    else:
        type_counts[poke_type] += 1
print(type_counts)


{'Grass': 64, 'Psychic': 46, 'Dark': 28, 'Bug': 65, 'Rock': 41, 'Steel': 21, 'Normal': 92, 'Water': 105, 'Dragon': 25, 'Electric': 40, 'Poison': 28, 'Fire': 48, 'Fairy': 17, 'Ice': 23, 'Ground': 30, 'Ghost': 20, 'Fighting': 25, 'Flying': 2}


In [37]:
# Counting primary_types without a loop
from collections import Counter

type_counts = Counter(poke_types) 
print(type_counts) 

Counter({'Water': 105, 'Normal': 92, 'Bug': 65, 'Grass': 64, 'Fire': 48, 'Psychic': 46, 'Rock': 41, 'Electric': 40, 'Ground': 30, 'Dark': 28, 'Poison': 28, 'Dragon': 25, 'Fighting': 25, 'Ice': 23, 'Steel': 21, 'Ghost': 20, 'Fairy': 17, 'Flying': 2})
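
Beyond replacing the counting loop, a Counter offers a few conveniences. The sketch below uses a small made-up sample rather than the pokemon dataset, so the counts are illustrative only.

```python
from collections import Counter

# Small illustrative sample, not the pokemon dataset used above
sample_types = ['Water', 'Grass', 'Water', 'Fire', 'Grass', 'Water']

type_counts = Counter(sample_types)

# most_common(n) returns the n highest counts as (value, count) pairs
top_two = type_counts.most_common(2)
print(top_two)

# Missing keys return 0 instead of raising KeyError
print(type_counts['Dragon'])
```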


### 3.3. Combinations

In [38]:
# Combinations with a loop (bad practice)

combos = []

poke_types = pokemon["primary_types"].values

for x in poke_types:
    for y in poke_types:
        if x == y:
            continue
        if ((x, y) not in combos) and ((y, x) not in combos):
            combos.append((x, y))
print(combos)

[('Grass', 'Psychic'), ('Grass', 'Dark'), ('Grass', 'Bug'), ('Grass', 'Rock'), ('Grass', 'Steel'), ('Grass', 'Normal'), ('Grass', 'Water'), ('Grass', 'Dragon'), ('Grass', 'Electric'), ('Grass', 'Poison'), ('Grass', 'Fire'), ('Grass', 'Fairy'), ('Grass', 'Ice'), ('Grass', 'Ground'), ('Grass', 'Ghost'), ('Grass', 'Fighting'), ('Grass', 'Flying'), ('Psychic', 'Dark'), ('Psychic', 'Bug'), ('Psychic', 'Rock'), ('Psychic', 'Steel'), ('Psychic', 'Normal'), ('Psychic', 'Water'), ('Psychic', 'Dragon'), ('Psychic', 'Electric'), ('Psychic', 'Poison'), ('Psychic', 'Fire'), ('Psychic', 'Fairy'), ('Psychic', 'Ice'), ('Psychic', 'Ground'), ('Psychic', 'Ghost'), ('Psychic', 'Fighting'), ('Psychic', 'Flying'), ('Dark', 'Bug'), ('Dark', 'Rock'), ('Dark', 'Steel'), ('Dark', 'Normal'), ('Dark', 'Water'), ('Dark', 'Dragon'), ('Dark', 'Electric'), ('Dark', 'Poison'), ('Dark', 'Fire'), ('Dark', 'Fairy'), ('Dark', 'Ice'), ('Dark', 'Ground'), ('Dark', 'Ghost'), ('Dark', 'Fighting'), ('Dark', 'Flying'), ('Bug',

In [41]:
# Combinations without a loop (best practice)

from itertools import combinations

# Note: poke_types still contains repeated values, so repeated pairs appear in the output
combos_obj = combinations(poke_types, 2)
print(type(combos_obj))

print([*combos_obj][:10])

<class 'itertools.combinations'>
[('Grass', 'Psychic'), ('Grass', 'Dark'), ('Grass', 'Bug'), ('Grass', 'Rock'), ('Grass', 'Steel'), ('Grass', 'Normal'), ('Grass', 'Psychic'), ('Grass', 'Water'), ('Grass', 'Dragon'), ('Grass', 'Rock')]
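
The output above repeats ('Grass', 'Psychic') because combinations() treats each position in its input as distinct. A minimal sketch of one way to get the same distinct pairs as the loop version, assuming the input is deduplicated first; the sample list stands in for poke_types and is not real data.

```python
from itertools import combinations

# Small illustrative sample with repeated values, standing in for poke_types
sample_types = ['Grass', 'Psychic', 'Grass', 'Dark']

# dict.fromkeys drops duplicates while keeping first-seen order
unique_types = list(dict.fromkeys(sample_types))

unique_combos = [*combinations(unique_types, 2)]
print(unique_combos)
```

Using set(sample_types) would also remove duplicates, but the order of the resulting pairs would then not be guaranteed.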


### 3.4. Combinations with sets

In [49]:
list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']  

set_a = set(list_a)
print("set_a: ", set_a) 

set_b = set(list_b)
print("set_b:", set_b) 

# Intersection: set operations are typically faster than comparing lists with loops
print("Set Intersection")
intersection = set_a.intersection(set_b)
print("intersection set_a & set_b:", intersection)

print("Set Difference")
difference = set_b.difference(set_a) 
print("Difference set_b - set_a:", difference)

print("Set Difference")
difference = set_a.difference(set_b) 
print("Difference set_a - set_b:", difference)

print("Set Symmetric Difference")
symmetric_difference = set_a.symmetric_difference(set_b)
print("symmetric_difference:", symmetric_difference)

print("Set Union")
union = set_a.union(set_b)
print("union:", union)

set_a:  {'Squirtle', 'Charmander', 'Bulbasaur'}
set_b: {'Squirtle', 'Caterpie', 'Pidgey'}
Set Intersection
intersection set_a & set_b: {'Squirtle'}
Set Difference
Difference set_b - set_a: {'Caterpie', 'Pidgey'}
Set Difference
Difference set_a - set_b: {'Charmander', 'Bulbasaur'}
Set Symmetric Difference
symmetric_difference: {'Bulbasaur', 'Charmander', 'Caterpie', 'Pidgey'}
Set Union
union: {'Charmander', 'Squirtle', 'Bulbasaur', 'Pidgey', 'Caterpie'}
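
The set methods above also have operator forms, which some readers find easier to scan. A short sketch using the same two sets, written as set literals here for brevity:

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

# Operator forms of the set methods used above
print(set_a & set_b)   # intersection
print(set_a - set_b)   # difference
print(set_a ^ set_b)   # symmetric difference
print(set_a | set_b)   # union

# Membership tests on a set use hashing rather than scanning every element
print('Pidgey' in set_b)
```

One difference to be aware of: the methods accept any iterable as an argument, while the operators require both operands to be sets.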
