## Writing Efficient Python Code

- How to write clean, fast, and efficient Python code
- How to profile your code for bottlenecks
- How to eliminate bottlenecks and bad design patterns

### Defining _efficient_
- Minimal completion time (fast runtime)
- Minimal resource consumption (small memory footprint)

### Defining _Pythonic_
- Focus on readability
- Using Python's constructs as intended

#### Contents:
- [Introduction to Pythonic Code](#introduction-to-pythonic-code)
- [The Python Standard Library](#the-python-standard-library)
- [The Power of Numpy Arrays](#the-power-of-numpy-arrays)
- [Timing and Profiling](#timing-and-profiling)
- [Examining Runtime](#examining-runtime)
- [Code Profiling for Runtime](#code-profiling-for-runtime)
- [Code Profiling for Memory Usage](#code-profiling-for-memory-usage)
- [Gaining Efficiencies](#gaining-efficiencies)
- [The itertools module](#the-itertools-module)
- [Set Theory](#set-theory)
- [Eliminating Loops](#eliminating-loops)
- [Writing better loops](#writing-better-loops)
- [Tricks](#tricks)

#### Introduction to Pythonic Code

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [2]:
numbers = [1, 2, 3, 4, 5]


In [3]:
# Non-Pythonic
doubled_numbers = []
for i in range(len(numbers)):
    doubled_numbers.append(numbers[i]*2)


In [4]:
# Pythonic
doubled_numbers = [x * 2 for x in numbers]


In [5]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
# Suppose you wanted to collect the names in the above list that have six letters or more.


In [6]:
# Non-Pythonic: loop with an index variable, then print new_list.
i = 0
new_list = []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)


['Kramer', 'Elaine', 'George', 'Newman']


In [7]:
# A more Pythonic approach would loop over the contents of names, rather than using an index variable.
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)


['Kramer', 'Elaine', 'George', 'Newman']


In [8]:
# The most Pythonic way is a list comprehension.
best_list = [name for name in names if len(name) >= 6]
print(best_list)


['Kramer', 'Elaine', 'George', 'Newman']


### The Python Standard Library
- Built-in types : list, tuple, set, dict, and others
- Built-in functions : print(), len(), range(), round(), enumerate(), map(), zip(), and others
- Built-in modules : os, sys, itertools, collections, math, and others


In [9]:
even_nums = range(2, 11, 2)
even_nums_list = list(even_nums)
print(even_nums_list)


[2, 4, 6, 8, 10]


In [10]:
letters = ["a", "b"]
indexed_letters = enumerate(letters)  # Lazy iterator of (index, item) pairs
print(indexed_letters)
print(list(indexed_letters))


<enumerate object at 0x7f41443acb00>
[(0, 'a'), (1, 'b')]


In [11]:
indexed_letters = enumerate(letters, start=5)
print(list(indexed_letters))


[(5, 'a'), (6, 'b')]
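In everyday code, enumerate() is usually consumed directly by a for loop, which replaces the `range(len(...))` pattern from the introduction. A minimal sketch (the three-letter list is made up for the example):

```python
letters = ["a", "b", "c"]

# Each iteration unpacks one (index, item) pair produced by enumerate()
labelled = []
for position, letter in enumerate(letters, start=1):
    labelled.append(f"{position}: {letter}")

print(labelled)  # ['1: a', '2: b', '3: c']
```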


In [12]:
nums = [1, 2, 3, 4, 5]
doubled_nums = map(lambda x: x * 2, nums)  # map() applies a function to each item, lazily
print(doubled_nums)
print(list(doubled_nums))


<map object at 0x7f41443f8e50>
[2, 4, 6, 8, 10]


### The Power of Numpy Arrays
<a id="heading_ID_The_Power_or_Numpy_Arrays"></a>

In [13]:
# Alternative to Python Lists
import numpy as np
nums_list = list(range(5))
print(nums_list)
nums_np = np.array(range(5))
print(nums_np)


[0, 1, 2, 3, 4]
[0 1 2 3 4]


In [14]:
# Numpy array homogeneity
nums_np_ints = np.array([1, 2, 3])
nums_np_ints.dtype


dtype('int64')

In [15]:
nums_np_floats = np.array([2, 3.5, 4])
nums_np_floats.dtype


dtype('float64')

In [16]:
# Numpy array broadcasting
# Python lists don't support broadcasting
nums = [1, 2, 3, 4, 5]
nums ** 2


TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [17]:
# For loop (inefficient option)
sqrd_nums = []
for num in nums:
    sqrd_nums.append(num**2)
print(sqrd_nums)


[1, 4, 9, 16, 25]


In [18]:
# List comprehension (better option but not best)
sqrd_nums = [num ** 2 for num in nums]
print(sqrd_nums)


[1, 4, 9, 16, 25]


In [19]:
# Numpy array broadcasting for the win!
nums_np = np.array([-2, -1, 0, 1, 2])
nums_np ** 2


array([4, 1, 0, 1, 4])

In [20]:
# Indexing
# Basic 2-D indexing (lists)
nums2 = [[1, 2, 3], [4, 5, 6]]
# Basic 2-D indexing (arrays)
nums2_np = np.array(nums2)
nums2[0][1], nums2_np[0, 1]


(2, 2)

In [21]:
[row[0] for row in nums2]


[1, 4]

In [22]:
nums2_np[:, 0]


array([1, 4])

In [23]:
# Boolean indexing
# No boolean indexing for lists
nums2_np > 0


array([[ True,  True,  True],
       [ True,  True,  True]])

In [24]:
nums2_np[nums2_np > 2]


array([3, 4, 5, 6])
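Broadcasting is not limited to scalars. When two arrays have compatible shapes, NumPy stretches the smaller one across the larger one without a Python loop. A small sketch with a 2 × 3 array like `nums2_np` (the offset values are arbitrary):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row_offsets = np.array([10, 20, 30])        # shape (3,)

# The (3,) array is applied to each of the 2 rows
shifted = matrix + row_offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# Reshape to (2, 1) to add one value per row instead
col_offsets = np.array([100, 200]).reshape(2, 1)
per_row = matrix + col_offsets
print(per_row)
# [[101 102 103]
#  [204 205 206]]
```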

### Timing and Profiling 


#### Examining Runtime
- Faster code == more efficient code

In [25]:
# Magic Commands: enhancements on top of normal Python syntax
# See all available magic commands with %lsmagic
%lsmagic


Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

In [26]:
import numpy as np
%timeit rand_nums = np.random.rand(1000)


5.44 µs ± 81.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [45]:
# Set the number of runs (-r) and/or loops per run (-n)
%timeit -r2 -n10 rand_nums = np.random.rand(1000)


The slowest run took 8.94 times longer than the fastest. This could mean that an intermediate result is being cached.
29.4 µs ± 23.5 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)


In [46]:
# Single line of code
# Line magic (%timeit)
%timeit nums = [x for x in range(10)]


368 ns ± 15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [47]:
%%timeit
# Multiple lines of code
# The cell magic (%%timeit) must be the first line of the IPython (Jupyter) cell
nums = []
for x in range(10):
    nums.append(x)


In [48]:
# -o returns a TimeitResult object that can be saved to a variable
times = %timeit -o rand_nums = np.random.rand(1000)


5.62 µs ± 100 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [33]:
times


<TimeitResult : 5.49 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)>

In [34]:
times.timings


[5.706382329808548e-06,
 5.565177199896425e-06,
 5.381958889774978e-06,
 5.378519409568981e-06,
 5.371884419582784e-06,
 5.506358309648931e-06,
 5.541280459146947e-06]

In [35]:
times.best


5.371884419582784e-06

In [36]:
times.worst


5.706382329808548e-06

In [38]:
f_time = %timeit -o formal_dict = dict()
l_time = %timeit -o literal_dict = {}
diff = (f_time.average - l_time.average) * (10**9)
print("l_time better than f_time by {} ns ".format(diff))


55.5 ns ± 1.53 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
16 ns ± 1.02 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
l_time better than f_time by 39.508602690184496 ns 
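Outside IPython, the standard-library timeit module gives the same kind of measurement. A sketch repeating the `dict()` versus `{}` comparison in plain Python; the loop and repeat counts are arbitrary choices, and keeping the fastest repeat is one common convention for reducing noise:

```python
import timeit

number = 200_000  # executions per measurement

# repeat() returns one total time (in seconds) per repeat
formal = min(timeit.repeat("dict()", number=number, repeat=5))
literal = min(timeit.repeat("{}", number=number, repeat=5))

# Convert total seconds into nanoseconds per execution
print(f"dict(): {formal / number * 1e9:.1f} ns per call")
print(f"{{}}:     {literal / number * 1e9:.1f} ns per call")
```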


#### Code Profiling for Runtime
- Detailed stats on frequency and duration of function calls
- Line-by-line analyses

In [39]:
%pip install line_profiler


Note: you may need to restart the kernel to use updated packages.


In [73]:
names = ["Kemal", "Talha", "Mahmut"]
hts = np.array([177.0, 174.0, 178.0])
wts = np.array([105.0, 86.0, 83.0])

def convert_units(names, heights, weights):
    
    new_hts = heights * 0.39370
    new_wts = weights * 2.20462

    name_data = {}

    for idx, name in enumerate(names):
        name_data[name] = (new_hts[idx], new_wts[idx])

    return name_data

convert_units(names, hts, wts)

{'Kemal': (69.6849, 231.4851),
 'Talha': (68.5038, 189.59732),
 'Mahmut': (70.0786, 182.98345999999998)}

In [74]:
%load_ext line_profiler


The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [75]:
# Magic command for line-by-line times
%lprun -f convert_units convert_units(names, hts, wts)  # -f: the function to profile line by line


Timer unit: 1e-09 s

Total time: 2.5487e-05 s
File: <ipython-input-73-df2c933999eb>
Function: convert_units at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
     5                                           def convert_units(names, heights, weights):
     6                                               
     7         1      19224.0  19224.0     75.4      new_hts = heights * 0.39370
     8         1       1452.0   1452.0      5.7      new_wts = weights * 2.20462
     9                                           
    10         1        169.0    169.0      0.7      name_data = {}
    11                                           
    12         3       1297.0    432.3      5.1      for idx, name in enumerate(names):
    13         3       3210.0   1070.0     12.6          name_data[name] = (new_hts[idx], new_wts[idx])
    14                                           
    15         1        135.0    135.0      0.5      return name_data
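`%lprun` times individual lines. For function-level statistics (call counts and cumulative time per function), the standard library's cProfile and pstats modules work in any Python script. A sketch that profiles a plain-list variant of `convert_units`; the rewrite without NumPy is an assumption made only to keep the example dependency-free:

```python
import cProfile
import io
import pstats


def convert_units(names, heights, weights):
    # Hypothetical plain-list variant: cm -> inches, kg -> pounds
    return {
        name: (height * 0.39370, weight * 2.20462)
        for name, height, weight in zip(names, heights, weights)
    }


names = ["Kemal", "Talha", "Mahmut"]
heights = [177.0, 174.0, 178.0]
weights = [105.0, 86.0, 83.0]

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):  # repeat so the timings are measurable
    convert_units(names, heights, weights)
profiler.disable()

# Sort by cumulative time and print the 5 most expensive entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```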

#### Code Profiling for Memory Usage
- Detailed stats on memory consumption
- Line-by-line analyses 

In [76]:
import sys
nums_list = [*range(10000)]
print(sys.getsizeof(nums_list), "bytes")


90104 bytes


In [78]:
import numpy as np
nums_np = np.array(range(10000))
print(sys.getsizeof(nums_np), "bytes")


80104 bytes
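The two `sys.getsizeof()` numbers are not a like-for-like comparison: for the list, `getsizeof()` counts only the list's internal array of pointers, not the int objects it points to, whereas the NumPy array's size includes the raw integers stored in its own buffer. The standard-library tracemalloc module can show the fuller cost of building the list. A sketch:

```python
import sys
import tracemalloc

tracemalloc.start()
nums_list = [*range(10000)]
# current: bytes still allocated since start(); peak: the maximum reached
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(sys.getsizeof(nums_list), "bytes reported by getsizeof (pointers only)")
print(current, "bytes traced while building the list (pointers + int objects)")
```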


In [79]:
%pip install memory_profiler


Note: you may need to restart the kernel to use updated packages.


In [80]:
%load_ext memory_profiler


The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler


In [92]:
# %mprun can only profile functions defined in a file, so import convert_units from a module
from Data.memory_profiling import convert_units
%mprun -f convert_units convert_units(names, hts, wts)




Filename: /home/kemal/Projelerim/Python_Developer_Roadmap/Data/memory_profiling.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     2     94.8 MiB     94.8 MiB           1   def convert_units(names, heights, weights):
     3                                             
     4     94.8 MiB      0.0 MiB           1       new_hts = heights * 0.39370
     5     94.8 MiB      0.0 MiB           1       new_wts = weights * 2.20462
     6                                         
     7     94.8 MiB      0.0 MiB           1       name_data = {}
     8                                         
     9     94.8 MiB      0.0 MiB           4       for idx, name in enumerate(names):
    10     94.8 MiB      0.0 MiB           3           name_data[name] = (new_hts[idx], new_wts[idx])
    11                                         
    12     94.8 MiB      0.0 MiB           1       return name_data

In [98]:
"""%mprun caveats 
Small memory allocations may show up as 0.0 MiB.
Memory usage is measured by querying the operating system.
Results may differ between platforms and runs, but you can still compare how much memory each line uses relative to the others.
"""


#### Gaining Efficiencies
- Efficiently combining, counting, and iterating

In [104]:
# zip returns a zip object that must be unpacked into a list and printed to see the contents
names = ['Bulbasaur', 'Charmander', 'Squirtle']
hps = [45, 39, 44]
combined_zip = zip(names,hps)
print(type(combined_zip))
combined_zip_list = [*combined_zip]
print(combined_zip_list)

<class 'zip'>
[('Bulbasaur', 45), ('Charmander', 39), ('Squirtle', 44)]
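The star operator also works in the other direction: passing the zipped pairs back through zip() with `*` splits them again. A short sketch using the same Pokémon data:

```python
names = ['Bulbasaur', 'Charmander', 'Squirtle']
hps = [45, 39, 44]
pairs = list(zip(names, hps))

# zip(*pairs) passes each (name, hp) tuple as a separate argument,
# so zip regroups them into one tuple of names and one tuple of hps
unzipped_names, unzipped_hps = zip(*pairs)
print(unzipped_names)  # ('Bulbasaur', 'Charmander', 'Squirtle')
print(unzipped_hps)    # (45, 39, 44)

# zip stops at the shortest input; extra items are silently dropped
short = list(zip(names, [45, 39]))
print(short)  # [('Bulbasaur', 45), ('Charmander', 39)]
```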


In [105]:
# The collections module
""" Part of Python's Standard Library (built-in module)
    Specialized container datatypes -> Alternative to general purpose dict, list, set, and tuple
    Notable:
        namedtuple : tuple subclasses with named fields
        deque : list-like container with fast appends and pops
        Counter : dict for counting hashable objects
        OrderedDict : dict that retains order of entries
        defaultdict : dict that calls a factory function to supply missing values """

" Part of Python's Standard Library (built-in module)\n    Specialized container datatypes -> Alternative to general purpose dict, list, set, and tuple\n    Notable:\n        namedtuple : tuple subclasses with named fields\n        deque : list-like container with fast appends and pops\n        Counter : dict for counting hashable objects\n        OrderedDict : dict that retains order of entries\n        defaultdict : dict that calls a factory function to supply missing values "

In [108]:
from collections import namedtuple

Point = namedtuple("Point", "x y")
issubclass(Point, tuple)

True

In [109]:
point = Point(2,4)  # Instantiate the new type
point

Point(x=2, y=4)

In [111]:
point.x, point.y

(2, 4)

In [112]:
point[0], point[1]

(2, 4)

In [114]:
# A generator expression for the field names
Point = namedtuple("Point", (field for field in "xy"))
Point(8,16)

Point(x=8, y=16)
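Named tuples also come with a few helper methods. Their names start with an underscore only to avoid clashing with field names; they are part of the public API. A sketch:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
point = Point(2, 4)

# Named tuples are immutable; _replace() returns a new instance instead
moved = point._replace(x=10)
print(moved)            # Point(x=10, y=4)

# _asdict() converts to a dict keyed by field name (a plain dict since Python 3.8)
print(point._asdict())  # {'x': 2, 'y': 4}

# _fields lists the field names
print(Point._fields)    # ('x', 'y')
```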

In [115]:
"""For example, Python provides a built-in function called divmod() that takes two numbers as arguments and 
returns a tuple with the quotient and remainder that result from the integer division of the input numbers:
"""
divmod(8,4)

(2, 0)

In [117]:
"""To remember the meaning of each number, you might need to read the documentation of divmod() 
because the numbers themselves don’t provide much information on their individual meaning. 
The function’s name doesn’t help very much either."""
# Define the named tuple type once, outside the function
DivMod = namedtuple("DivMod", ["quotient", "remainder"])

def custom_divmod(a, b):
    return DivMod(*divmod(a, b))

custom_divmod(8, 4)

DivMod(quotient=2, remainder=0)

In [118]:
# Reducing the number of arguments to functions
# Before: four separate parameters
# def create_user(db, username, client_name, plan):
#     db.add_user(username)
#     db.complete_user_profile(username, client_name, plan)
# After: the user fields travel together in one named tuple
User = namedtuple("User", "username client_name plan")
user = User("john", "John Doe", "Premium")

def create_user(db, user):
    db.add_user(user.username)
    db.complete_user_profile(
        user.username,
        user.client_name,
        user.plan
    )

In [123]:
# Reading tabular data from files and databases
import csv

with open("Data/employees.csv", "r", newline="") as csv_file:
    reader = csv.reader(csv_file)
    Employee = namedtuple("Employee", next(reader), rename=True)
    for row in reader:
        employee = Employee(*row)
        print(employee.name, employee.job, employee.email)

Linda Technical Lead linda@example.com
Joe Senior Web Developer joe@example.com
Lara Project Manager lara@example.com
David Data Analyst david@example.com
Jane Senior Python Developer jane@example.com


In [124]:
# namedtuple vs Data Class
# Data classes can be thought of as "mutable named tuples with defaults"
from dataclasses import dataclass

@dataclass
class Person:
    name:str
    age:int
    height:float
    weight:float
    country:str = "Turkiye"

Kemal = Person("Kemal",27, 1.78, 105)
Kemal

Person(name='Kemal', age=27, height=1.78, weight=105, country='Turkiye')

In [126]:
"""Mutability-wise, data classes are mutable by definition, so you can change the value of their attributes when needed. 
However, they have an ace up their sleeve. You can set the dataclass() decorator’s frozen argument to True and make them immutable"""
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    height: float
    weight: float
    country: str = "Turkiye"

Kemal = Person("Kemal", 27, 1.78, 105)  # re-create the instance from the frozen class
Kemal.name = "Alparslan"

FrozenInstanceError: cannot assign to field 'name'

In [127]:
""" Another subtle difference between named tuples and data classes is that the latter aren’t iterable by default. 
Stick to the Kemal example and try to iterate over its data:
"""
for field in Kemal:
    print(field)

TypeError: 'Person' object is not iterable
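If you do need to loop over a data class's values, the dataclasses module provides astuple() and fields(). A sketch reusing the Person fields from above (redefined here, frozen, so the snippet runs on its own):

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    height: float
    weight: float
    country: str = "Turkiye"


kemal = Person("Kemal", 27, 1.78, 105)

# astuple() converts the instance (recursively) into a plain tuple
for value in astuple(kemal):
    print(value)

# fields() returns Field objects, handy for name/value pairs
for field in fields(kemal):
    print(field.name, getattr(kemal, field.name))
```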

In [3]:
# Counter
from collections import Counter
poke_types = ['grass', 'dark', 'fire', 'fire']
type_counts = Counter(poke_types)
print(type_counts)

Counter({'fire': 2, 'grass': 1, 'dark': 1})


In [4]:
import collections

# initializing deque
de = collections.deque([1, 2, 3])
print("deque: ", de)

# using append() to insert element at right end
# inserts 4 at the end of deque
de.append(4)

# printing modified deque
print("\nThe deque after appending at right is : ")
print(de)

# using appendleft() to insert element at left end
# inserts 6 at the beginning of deque
de.appendleft(6)

# printing modified deque
print("\nThe deque after appending at left is : ")
print(de)


deque:  deque([1, 2, 3])

The deque after appending at right is : 
deque([1, 2, 3, 4])

The deque after appending at left is : 
deque([6, 1, 2, 3, 4])
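defaultdict, listed in the collections overview but not shown yet, removes the usual "is this key present yet?" check when grouping items. A sketch with made-up (name, type) pairs:

```python
from collections import defaultdict

pokemon = [('Bulbasaur', 'grass'), ('Charmander', 'fire'),
           ('Vulpix', 'fire'), ('Oddish', 'grass')]

# list() is called automatically the first time a missing key is accessed,
# so there is no need to check `if key not in by_type` before appending
by_type = defaultdict(list)
for name, poke_type in pokemon:
    by_type[poke_type].append(name)

print(dict(by_type))
# {'grass': ['Bulbasaur', 'Oddish'], 'fire': ['Charmander', 'Vulpix']}
```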


##### The itertools module 
- Part of Python's Standard Library (built-in module)
- Functional tools for creating and using iterators

- Notable:
    - Infinite iterators: count, cycle, repeat
    - Finite iterators: accumulate, chain, zip_longest, etc.
    - Combination generators: product, permutations, combinations

In [9]:
poke_types = ["Bug", 'Fire', 'Ghost', 'Grass', 'Water']
from itertools import combinations
combos_obj = combinations(poke_types, 2)
print(*combos_obj)

('Bug', 'Fire') ('Bug', 'Ghost') ('Bug', 'Grass') ('Bug', 'Water') ('Fire', 'Ghost') ('Fire', 'Grass') ('Fire', 'Water') ('Ghost', 'Grass') ('Ghost', 'Water') ('Grass', 'Water')
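The other iterator families listed above are just as lazy. A quick tour of a few of them (the inputs are arbitrary):

```python
from itertools import accumulate, chain, count, islice, product

# count() is infinite, so islice() takes just the first 3 values
counted = list(islice(count(10, 5), 3))
print(counted)      # [10, 15, 20]

# chain() iterates over several iterables as if they were one
chained = list(chain([1, 2], (3,), "ab"))
print(chained)      # [1, 2, 3, 'a', 'b']

# accumulate() yields running totals by default
running = list(accumulate([1, 2, 3, 4]))
print(running)      # [1, 3, 6, 10]

# product() is the Cartesian product, like nested for loops
pairs = list(product("ab", repeat=2))
print(pairs)        # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```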


##### Set Theory
- Branch of Mathematics applied to collections of objects

- Python has a built-in set datatype with accompanying methods:
    - intersection(): all elements that are in both sets
    - difference(): all elements in one set but not the other
    - symmetric_difference(): all elements in exactly one set
    - union(): all elements that are in either set

- Fast membership testing
    - Check whether a value exists in a collection
    - Using the in operator
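A sketch of the four set methods listed above, on two made-up lists; each method also has an operator form (`&`, `-`, `^`, `|`). Results are wrapped in sorted() because set iteration order is not guaranteed:

```python
set_a = set(['Bulbasaur', 'Charmander', 'Squirtle'])
set_b = set(['Caterpie', 'Pidgey', 'Squirtle'])

both = sorted(set_a.intersection(set_b))
only_a = sorted(set_a.difference(set_b))
exactly_one = sorted(set_a.symmetric_difference(set_b))
either = sorted(set_a.union(set_b))

print(both)         # ['Squirtle']
print(only_a)       # ['Bulbasaur', 'Charmander']
print(exactly_one)  # ['Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey']
print(either)       # ['Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey', 'Squirtle']

# The operator forms give the same results
assert set_a & set_b == set_a.intersection(set_b)
assert set_a ^ set_b == set_a.symmetric_difference(set_b)
```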

In [2]:
names_list = ['Kemal', 'Mahmut', 'Talha'] * 4 + ['Key']  # 13 items, 4 unique
names_tuple = tuple(names_list)
names_set = set(names_list)

In [3]:
%timeit 'Kemal' in names_list

36.2 ns ± 0.947 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [4]:
%timeit 'Kemal' in names_tuple

36.8 ns ± 0.551 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [5]:
%timeit 'Kemal' in names_set

34.3 ns ± 0.187 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
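In the timings above, 'Kemal' is the first element, so the list and tuple searches stop almost immediately and the three containers look similar. An `in` check on a list or tuple scans elements one by one, while a set uses a hash lookup, so the gap shows up when the value sits at the end of a large collection. A sketch using the standard timeit module (the collection size and repeat counts are arbitrary):

```python
import timeit

names_list = [f"name_{i}" for i in range(100_000)]
names_set = set(names_list)
target = "name_99999"  # last element: worst case for a linear scan

# timeit accepts a callable; each measurement runs it `number` times
list_time = min(timeit.repeat(lambda: target in names_list, number=100, repeat=3))
set_time = min(timeit.repeat(lambda: target in names_set, number=100, repeat=3))

print(f"list: {list_time / 100 * 1e6:.2f} µs per lookup")
print(f"set:  {set_time / 100 * 1e6:.2f} µs per lookup")
```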


In [6]:
# Uniques with sets
number_of_elements_in_list = len(names_list)
number_of_elements_in_tuple = len(names_tuple)
number_of_elements_in_set = len(names_set)
print(number_of_elements_in_list)
print(number_of_elements_in_tuple)
print(number_of_elements_in_set)

13
13
4


##### Eliminating Loops
- Fewer lines of code
- Better code readability
    - "Flat is better than nested"
- Efficiency gains

In [13]:
poke_stats = [
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75]
]
totals = [] 

In [14]:
%%timeit
# For loop approach
for row in poke_stats:
    totals.append(sum(row))

569 ns ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [15]:
%%timeit
# List comprehension
totals_list = [sum(row) for row in poke_stats]

499 ns ± 15.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [17]:
%%timeit 
# Built-in map() function
totals_map = [*map(sum,poke_stats)]

457 ns ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [25]:
# Eliminate loops with Numpy
import numpy as np
poke_stats = np.array([
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75]
])
avgs = []

In [26]:
%%timeit
for row in poke_stats:
    avg = np.mean(row)
    avgs.append(avg)

21.1 µs ± 241 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [27]:
avgs = []

In [28]:
%timeit avgs = poke_stats.mean(axis=1)

10 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


##### Writing better loops
- Understand what is being done with each loop iteration
- Move one-time calculations outside (above) the loop
- Use holistic conversions outside (below) the loop
- Anything that is done once should be outside the loop

In [29]:
import numpy as np
names = ['Absol', 'Aron', 'Jynx', 'Natu', 'Onix']
attacks = np.array([130, 70, 50, 50, 45])
# Calculate total average once (outside the loop)
total_attack_avg = attacks.mean()
for pokemon, attack in zip(names, attacks):
    if attack > total_attack_avg:
        print(
            "{}'s attack: {} > average: {}!"
            .format(pokemon, attack, total_attack_avg)
        )

Absol's attack: 130 > average: 69.0!
Aron's attack: 70 > average: 69.0!


In [30]:
names = ['Pikachu', 'Squirtle', 'Articuno']
legend_status = [False, False, True]
generations = [1,1,1]
poke_data_tuples = []
for poke_tuple in zip(names, legend_status, generations):
    poke_data_tuples.append(poke_tuple)
poke_data = [*map(list, poke_data_tuples)]   # Holistic conversion
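The loop above only collects the zip() tuples, so it can be dropped altogether: map() applies list() to each tuple directly. A sketch with the same data:

```python
names = ['Pikachu', 'Squirtle', 'Articuno']
legend_status = [False, False, True]
generations = [1, 1, 1]

# zip() pairs the items; map(list, ...) converts every tuple in one pass
poke_data = list(map(list, zip(names, legend_status, generations)))
print(poke_data)
# [['Pikachu', False, 1], ['Squirtle', False, 1], ['Articuno', True, 1]]
```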

### Tricks
- Use multiple assignments.
- Avoid global variables unless they are necessary.
- Concatenate strings with join().
- Use generators.
- Use `while True` for infinite loops (in Python 3 it is exactly as fast as `while 1`).
- Use speed up applications
- Use special libraries to process large datasets.

In [None]:
# Use multiple assignments
a, b, c, d = 2, 3, 5, 7
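Multiple assignment is built on tuple unpacking, which also gives a swap without a temporary variable and starred targets. A short sketch:

```python
a, b = 2, 3
a, b = b, a  # the right side is packed into a tuple first, then unpacked
print(a, b)  # 3 2

# A starred target collects whatever the other targets do not take
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)  # 1 [2, 3, 4] 5
```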


In [None]:
# Avoid global variables unless they are necessary
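Besides readability, there is a speed argument: in CPython, reading a local variable inside a function is typically cheaper than looking up a global name. A sketch comparing the two with timeit; the function and variable names are made up for this illustration, and the size of the gap depends on the Python version:

```python
import timeit

counter_limit = 100_000  # module-level (global) name


def loop_with_global():
    total = 0
    for _ in range(counter_limit):
        total += counter_limit  # global lookup on every iteration
    return total


def loop_with_local():
    limit = counter_limit  # copy the global into a local once
    total = 0
    for _ in range(limit):
        total += limit  # local lookup on every iteration
    return total


assert loop_with_global() == loop_with_local()
global_time = min(timeit.repeat(loop_with_global, number=10, repeat=3))
local_time = min(timeit.repeat(loop_with_local, number=10, repeat=3))
print(f"global: {global_time:.4f} s, local: {local_time:.4f} s")
```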


In [None]:
# Concatenate strings with join
concatenated_string = " ".join(["Programming", "is", "fun"])
print(concatenated_string)


Programming is fun
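Why join() rather than `+`? Strings are immutable, so each `+=` may have to copy the growing string, which can make a loop quadratic; join() computes the total length first and copies each piece once. (CPython sometimes resizes a string in place for `+=`, but that is an implementation detail.) A sketch checking that both approaches build the same string:

```python
words = ["Programming", "is", "fun"] * 1000

# Repeated += may create a new, longer string on every step
concatenated = ""
for word in words:
    concatenated += word + " "
concatenated = concatenated.rstrip()

# join() sizes the result once and copies each word a single time
joined = " ".join(words)

print(joined[:18])   # Programming is fun
print(len(joined))   # 18999
```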


In [None]:
# Use generators: if you have a large amount of data and only need each item once, a generator yields items one at a time instead of building the whole list, which saves memory.

"""It is fairly simple to create a generator in Python. It is as easy as defining a normal function, but with a yield statement instead of a return statement.

If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.

The difference is that while a return statement terminates a function entirely, yield statement pauses the function saving all its states and later continues from there on successive calls."""

# A simple generator function


def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n

    n += 1
    print('This is printed second')
    yield n

    n += 1
    print('This is printed at last')
    yield n


In [None]:
# It returns an object but does not start execution immediately.
a = my_gen()
a


<generator object my_gen at 0x7fd803368ac0>

In [None]:
# We can iterate through the items using next().
next(a)


This is printed first


1

In [None]:
# Once the function yields, the function is paused and control is transferred to the caller.
# Local variables and their states are remembered between successive calls.
next(a)


This is printed second


2

In [None]:
next(a)


This is printed at last


3

In [None]:
# Finally, when the function terminates, StopIteration is raised automatically on further calls.

next(a)


StopIteration: 

In [None]:
"""One interesting thing to note in the above example is that the value of variable n is remembered between each call.

Unlike normal functions, the local variables are not destroyed when the function yields. Furthermore, the generator object can be iterated only once.

To restart the process we need to create another generator object using something like a = my_gen()."""



In [None]:
# Python Generators with a Loop
# A for loop takes an iterator and iterates over it using next() function. It automatically ends when StopIteration is raised.
def rev_str(my_str):
    length = len(my_str)
    for i in range(length - 1, -1, -1):
        yield my_str[i]


# For loop to reverse the string
for char in rev_str("hello"):
    print(char)


o
l
l
e
h


In [None]:
# The major difference between a list comprehension and a generator expression is that a list comprehension produces the entire list while the generator expression produces one item at a time.
# Initialize the list
my_list = [1, 3, 6, 10]

# square each term using list comprehension
list_ = [x**2 for x in my_list]

# same thing can be done using a generator expression
# generator expressions are surrounded by parentheses ()
generator = (x**2 for x in my_list)

print(list_)
print(generator)


[1, 9, 36, 100]
<generator object <genexpr> at 0x7fd803343890>


In [None]:
for i in my_list:
    print(i)


1
3
6
10


In [None]:
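# Looping over the generator itself consumes its items
for i in generator:
    print(i)


1
9
36
100


In [None]:
# The loop above exhausted the generator, so list() gets no items
print("my_list:", my_list, "generator:", list(generator))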


my_list: [1, 3, 6, 10] generator: []


In [None]:
# Generators can be written more concisely than equivalent iterator classes.
# The following iterator class produces the powers of 2 from 2**0 up to 2**max.
class PowTwo:
    def __init__(self, max=0):
        self.n = 0
        self.max = max

    def __iter__(self):
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration

        result = 2 ** self.n
        self.n += 1
        return result


In [None]:
# The class above is verbose. Here is the same sequence as a generator function.
def PowTwoGen(max=0):
    n = 0
    while n <= max:  # <= so that 2**max is included, matching PowTwo
        yield 2 ** n
        n += 1


In [None]:
print(list(PowTwo(5)))


[1, 2, 4, 8, 16, 32]


In [None]:
print(list(PowTwoGen(5)))


[1, 2, 4, 8, 16, 32]


In [None]:
"""A normal function to return a sequence will create the entire sequence in memory before returning the result. This is an overkill, if the number of items in the sequence is very large.

Generator implementation of such sequences is memory friendly and is preferred since it only produces one item at a time."""



In [None]:
# Represent Infinite Stream
"""Generators are excellent for representing an infinite stream of data.
Infinite streams cannot be stored in memory, but since generators produce only one item at a time, they can represent one."""


def all_even():
    n = 0
    while True:
        yield n
        n += 2  # step by 2 so only even numbers are produced
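An infinite generator can never be passed to list() directly, because it would never finish. itertools.islice() takes a finite slice instead. A sketch that defines its own even-number generator so it runs on its own:

```python
from itertools import islice


def all_even():
    # Infinite generator of even numbers: 0, 2, 4, ...
    n = 0
    while True:
        yield n
        n += 2


# islice() stops pulling from the generator after 5 items
first_five = list(islice(all_even(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```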


In [None]:
# Pipelining generators: each generator consumes the output of the previous one
def fibonacci_numbers(nums):
    x, y = 0, 1
    for _ in range(nums):
        x, y = y, x+y
        yield x


def square(nums):
    for num in nums:
        yield num**2


print(sum(square(fibonacci_numbers(10))))


4895


In [None]:
# Use while True for infinite loops. In Python 3, while 1 and while True compile to the same instructions, so while 1 does not reduce runtime.
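One way to check whether `while 1` is really faster than `while True` is to compare the instructions each loop compiles to. A sketch using the standard dis module (the two function names are made up):

```python
import dis


def loop_true():
    while True:
        break


def loop_one():
    while 1:
        break


def instructions(func):
    # (operation, resolved argument) pairs, ignoring line numbers
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(func)]


same = instructions(loop_true) == instructions(loop_one)
print(same)
dis.dis(loop_one)  # the listing itself varies between Python versions
```

If the two instruction lists match, as we expect on CPython 3 where `True` is a keyword constant and the always-true test is optimized away, neither loop can be faster than the other. The `while 1` trick dates from Python 2, where `True` was an ordinary built-in name that had to be looked up on every iteration.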


In [None]:
# Use special libraries to process large datasets. Compiled C/C++ code runs much faster than pure Python,
# so many packages and modules are written in C/C++ and can be used from your Python program.
# NumPy, SciPy and pandas are three popular examples for processing large datasets.
