## Bite-Sized Python Recipes
_Disclaimer:_ This is a collection of small, useful functions I've found around the web, mainly on Stack Overflow or in the Python documentation. I intend to keep it up to date.

Jump to [Vol. 2](#vol2)

**Create a Dictionary From Two Lists:**

In [1]:
prod_id = [1, 2, 3]
prod_name = ['foo', 'bar', 'baz']
prod_dict = dict(zip(prod_id, prod_name))

prod_dict

{1: 'foo', 2: 'bar', 3: 'baz'}

**Remove Duplicates From a List and Keep the Order:**

In [2]:
from collections import OrderedDict

nums = [1, 2, 4, 3, 0, 4, 1, 2, 5]
list(OrderedDict.fromkeys(nums))

# As of Python 3.6 (for the CPython implementation) and
# as of 3.7 (across all implementations) dictionaries remember
# the order of items inserted. So, a better one is:
list(dict.fromkeys(nums))

[1, 2, 4, 3, 0, 5]
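
`dict.fromkeys` needs hashable items. For unhashable items, or when "duplicate" is defined by some key, a small generator with a `seen` set does the job. This is only a sketch: the `unique_by` name and its `key` parameter are my own, loosely modeled on the `unique_everseen` recipe in the `itertools` documentation.

```python
def unique_by(iterable, key=None):
    """Yield items in order, skipping those whose key has been seen already."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


words = ['Foo', 'bar', 'FOO', 'Baz', 'BAR']
print(list(unique_by(words, key=str.lower)))  # ['Foo', 'bar', 'Baz']
```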

**Create a Multi-Level Nested Dictionary:**

Create a dictionary as a value in a dictionary.  Essentially, it's a dictionary that goes multiple levels deep.

In [3]:
from collections import defaultdict

def multi_level_dict():
    """ Constructor for creating multi-level nested dictionary. """

    return defaultdict(multi_level_dict)

_Example 1:_

In [4]:
d = multi_level_dict()
d['a']['a']['y'] = 2
d['b']['c']['a'] = 5
d['x']['a'] = 6

d

defaultdict(<function __main__.multi_level_dict()>,
            {'a': defaultdict(<function __main__.multi_level_dict()>,
                         {'a': defaultdict(<function __main__.multi_level_dict()>,
                                      {'y': 2})}),
             'b': defaultdict(<function __main__.multi_level_dict()>,
                         {'c': defaultdict(<function __main__.multi_level_dict()>,
                                      {'a': 5})}),
             'x': defaultdict(<function __main__.multi_level_dict()>,
                         {'a': 6})})
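
Two things to keep in mind with this recipe: merely reading a missing key creates it, and the repr gets noisy. If either matters, the structure can be converted to plain dicts once it's built. A sketch of that conversion follows; `to_plain_dict` is a hypothetical helper name.

```python
from collections import defaultdict


def multi_level_dict():
    return defaultdict(multi_level_dict)


def to_plain_dict(d):
    """Recursively convert nested defaultdicts into regular dicts."""
    if isinstance(d, defaultdict):
        return {k: to_plain_dict(v) for k, v in d.items()}
    return d


d = multi_level_dict()
d['a']['a']['y'] = 2
d['x']['a'] = 6

d['oops']            # a lookup alone inserts the key 'oops'
print('oops' in d)   # True

del d['oops']
print(to_plain_dict(d))  # {'a': {'a': {'y': 2}}, 'x': {'a': 6}}
```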

_Example 2:_

A list of products is given, where each product needs to be delivered from its origin to its distribution center (DC), and then to its destination. Given this list, create a dictionary for the list of products that are shipped through each DC, coming from each origin and going to each destination.

In [5]:
import random
random.seed(20)

# Just creating arbitrary attributes for each Product instance
class Product:
    def __init__(self, id):
        self.id = id
        self.materials = random.sample('ABCD', 3)  # comprising materials
        self.origin = random.choice(('o1', 'o2'))
        self.destination = random.choice(('d1', 'd2', 'd3'))
        self.dc = random.choice(('dc1', 'dc2'))
        
    def __repr__(self):
        return f'P{str(self.id)}'


products = [Product(i) for i in range(20)]

# create the multi-level dictionary
def get_dc_origin_destination_products_dict(products):
    dc_od_products_dict = multi_level_dict()
    for p in products:
        dc_od_products_dict[p.dc][p.origin].setdefault(p.destination, []).append(p)
    return dc_od_products_dict


dc_od_orders_dict = get_dc_origin_destination_products_dict(products)
dc_od_orders_dict

defaultdict(<function __main__.multi_level_dict()>,
            {'dc1': defaultdict(<function __main__.multi_level_dict()>,
                         {'o2': defaultdict(<function __main__.multi_level_dict()>,
                                      {'d3': [P0, P15],
                                       'd1': [P2, P9, P14, P18],
                                       'd2': [P3, P13]}),
                          'o1': defaultdict(<function __main__.multi_level_dict()>,
                                      {'d1': [P1, P16],
                                       'd3': [P4, P6, P7, P11],
                                       'd2': [P17, P19]})}),
             'dc2': defaultdict(<function __main__.multi_level_dict()>,
                         {'o1': defaultdict(<function __main__.multi_level_dict()>,
                                      {'d1': [P5, P12], 'd3': [P10]}),
                          'o2': defaultdict(<function __main__.multi_level_dict()>,
                                      {'d1': [P8]})})})

**Return the Keys and Values From the Innermost Layer of a Nested Dict:**

In [6]:
from collections import abc

def nested_dict_iter(nested):
    """ Return the keys and values from the innermost layer of a nested dict. """

    for key, value in nested.items():
        # Check if value is a dictionary. abc.Mapping is used for generality
        if isinstance(value, abc.Mapping):
            yield from nested_dict_iter(value)
        else:
            yield key, value

_Example 1:_

In [7]:
d = {'a':{'a':{'y':2}},'b':{'c':{'a':5}},'x':{'a':6}}
list(nested_dict_iter(d))

[('y', 2), ('a', 5), ('a', 6)]

_Example 2:_ let's retrieve keys and values from our `dc_od_orders_dict` above.

In [8]:
list(nested_dict_iter(dc_od_orders_dict))

[('d3', [P0, P15]),
 ('d1', [P2, P9, P14, P18]),
 ('d2', [P3, P13]),
 ('d1', [P1, P16]),
 ('d3', [P4, P6, P7, P11]),
 ('d2', [P17, P19]),
 ('d1', [P5, P12]),
 ('d3', [P10]),
 ('d1', [P8])]
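
The innermost keys alone can be ambiguous, since the same key may appear under several parents. If the full path matters, a small variation can yield the whole chain of keys instead. A sketch, where `nested_dict_paths` and its `prefix` parameter are names I made up:

```python
from collections import abc


def nested_dict_paths(nested, prefix=()):
    """Yield (path, value) pairs, where path is the tuple of keys leading to value."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, abc.Mapping):
            yield from nested_dict_paths(value, path)
        else:
            yield path, value


d = {'a': {'a': {'y': 2}}, 'b': {'c': {'a': 5}}, 'x': {'a': 6}}
print(list(nested_dict_paths(d)))
# [(('a', 'a', 'y'), 2), (('b', 'c', 'a'), 5), (('x', 'a'), 6)]
```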

**The Intersection of Multiple Sets:**

In [9]:
def get_common_attr(attr, *args):
    """ intersection requires 'set' objects """
    
    return set.intersection(*[set(getattr(a, attr)) for a in args])

_Example:_ Find the common comprising materials, if any, among our first 5 `products`.

In [10]:
get_common_attr('materials', *products[:5])

{'B'}

**First Match:**

Find the first element, if any, from an iterable that matches a condition.

In [11]:
def first_match(iterable, check_condition, default_value=None):
    """ check_condition is a function. """
    
    return next((i for i in iterable if check_condition(i)), default_value)

Example:

In [12]:
nums = [1, 2, 4, 0, 5]
f1 = first_match(nums, lambda x: x > 3)
f2 = first_match(nums, lambda x: x > 9)
f3 = first_match(nums, lambda x: x > 9, 'no_match')
f1, f2, f3

(4, None, 'no_match')

**Powerset:**

The powerset of a set S is the set of all the subsets of S.

In [13]:
import itertools as it

def powerset(iterable):
    s = list(iterable)
    return it.chain.from_iterable(it.combinations(s, r)
                                  for r in range(len(s) + 1))

Example:

In [14]:
list(powerset([1,2,3]))

[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

**Timer Decorator:**

Shows the runtime of each class/method/function.

In [15]:
from time import time
from functools import wraps

def timeit(func):
    """
    :param func: Decorated function
    :return: Execution time for the decorated function
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time()
        result = func(*args, **kwargs)
        end = time()
        print(f'{func.__name__} executed in {end - start:.4f} seconds')
        # In case you use logging module:
        # logging.info(f'{func.__name__} executed in {end - start:.4f} seconds')
        return result

    return wrapper

_Example:_

In [16]:
import random

# An arbitrary function
@timeit
def sort_rnd_num():
    numbers = [random.randint(100, 200) for _ in range(100000)]
    numbers.sort()
    return numbers
    
numbers = sort_rnd_num()

sort_rnd_num executed in 0.2194 seconds
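
`time.time()` follows the wall clock, which can jump if the system clock is adjusted. For measuring durations, `time.perf_counter()` is usually the better choice. Here is a sketch of the same decorator with that one swap; the name `timeit_pc` is mine, chosen only to avoid clashing with the version above.

```python
from functools import wraps
from time import perf_counter


def timeit_pc(func):
    """Print the runtime of the decorated function, measured with perf_counter."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = perf_counter()
        result = func(*args, **kwargs)
        elapsed = perf_counter() - start
        print(f'{func.__name__} executed in {elapsed:.4f} seconds')
        return result

    return wrapper


@timeit_pc
def total(n):
    return sum(range(n))


print(total(1000))  # prints the timing line, then 499500
```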


**Calculate the Total Number of Lines in a File:**

In [17]:
def file_len(file_name, encoding='utf8'):
    with open(file_name, encoding=encoding) as f:
        i = -1
        for i, line in enumerate(f):
            pass
    return i + 1
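
The loop can also be written as a single generator expression. A sketch under the same assumptions (a text file with a known encoding); `count_lines` is a name I made up, and the temporary file is only there to make the snippet self-contained.

```python
import os
import tempfile


def count_lines(file_name, encoding='utf8'):
    """Count lines lazily, without loading the whole file into memory."""
    with open(file_name, encoding=encoding) as f:
        return sum(1 for _ in f)


# Quick check on a throwaway file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf8') as tmp:
    tmp.write('first\nsecond\nthird\n')

print(count_lines(tmp.name))  # 3
os.remove(tmp.name)
```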

Example: How many lines of code are there in the Python files of your current directory?

_Using `os` and `glob`:_

In [18]:
import os
import glob

path = os.path.abspath('')
files_list = glob.glob(path + '/*.ipynb')  # '/*.py' or '/*.ipynb' depending on what you have
print(sum(file_len(f) for f in files_list))

1011


_Using `pathlib`:_
Find out more about `pathlib` and its correspondence to `os` [here](https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module).

In [19]:
from pathlib import Path

p = Path()
path = p.resolve()  # similar to os.path.abspath()
print(sum(file_len(f) for f in path.glob('*.ipynb')))  # '*.py' or '*.ipynb' depending on what you have

1011


**Just For Fun! Creating Long Hashtags:**

In [20]:
s = "#this is how I create very long hashtags"
"".join(s.title().split())

'#ThisIsHowICreateVeryLongHashtags'

### Some mistakes to avoid:

Be careful not to mix up mutable and immutable objects!

**Initialize a dictionary with empty lists as values:**

In [21]:
nums = [1, 2, 3, 4]
# Create a dictionary with keys from the list.
# Let's implement the dictionary in two ways
d1 = {n: [] for n in nums}
d2 = dict.fromkeys(nums, [])
# d1 and d2 may look similar. But list is mutable.
d1[1].append(5)
d2[1].append(5)
# Let's see if d1 and d2 are similar
print(f'd1 = {d1} \nd2 = {d2}')

d1 = {1: [5], 2: [], 3: [], 4: []} 
d2 = {1: [5], 2: [5], 3: [5], 4: [5]}
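
The reason is that `dict.fromkeys` evaluates `[]` once and stores that same list object under every key, whereas the comprehension evaluates `[]` once per key. A quick identity check makes this visible:

```python
nums = [1, 2, 3, 4]
d1 = {n: [] for n in nums}
d2 = dict.fromkeys(nums, [])

# Each key in d1 has its own list; every key in d2 shares one list
print(d1[1] is d1[2])  # False
print(d2[1] is d2[2])  # True
print(len({id(v) for v in d2.values()}))  # 1

# fromkeys is fine with immutable defaults such as None or 0
counts = dict.fromkeys(nums, 0)
counts[1] += 1
print(counts)  # {1: 1, 2: 0, 3: 0, 4: 0}
```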


**Don't modify a list while iterating over it:**

This should be avoided with any mutable collection, not just lists.

_Example:_ Remove all numbers less than 5 from a list.

- Wrong Implementation: Remove the elements while iterating!

In [22]:
nums = [1, 2, 3, 5, 6, 7, 0, 1]

for ind, n in enumerate(nums):
    if n < 5:
        del(nums[ind])

# expected: nums = [5, 6, 7]
nums

[2, 5, 6, 7, 1]
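
To see why the result is off, it helps to record which elements the loop actually visits. Every deletion shifts the remaining items one position to the left while the loop index still moves forward, so the element right after each deleted one is never examined. A small trace (the `visited` list is my addition):

```python
nums = [1, 2, 3, 5, 6, 7, 0, 1]
visited = []

for ind, n in enumerate(nums):
    visited.append(n)
    if n < 5:
        del nums[ind]

print(visited)  # [1, 3, 6, 7, 0]
print(nums)     # [2, 5, 6, 7, 1]
```

The values 2, 5, and the final 1 are skipped entirely, which is why 2 and 1 survive even though they are less than 5.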

- Correct Implementation:
Use list comprehension to create a new list containing only the elements you want!

In [23]:
nums = [1, 2, 3, 5, 6, 7, 0, 1]
id(nums)  # before modification

2347411645384

In [24]:
nums = [n for n in nums if n >= 5]
id(nums)  # after modification

2347411752648

`id(nums)` is checked before and after to show that in fact, they are different lists. So, if the list is used in other places and it's important to mutate the existing list, rather than creating a new list with the same name, assign it to the slice:

In [25]:
nums = [1, 2, 3, 5, 6, 7, 0, 1]
id(nums)  # before modification

2347411753992

In [26]:
nums[:] = [n for n in nums if n >= 5]
id(nums)  # after modification

2347411753992
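
A sketch of why this matters when another name refers to the same list (`alias` is just an illustrative name):

```python
nums = [1, 2, 3, 5, 6, 7, 0, 1]
alias = nums  # a second reference, e.g. held by another object

# Rebinding: nums now names a new list, alias still sees the old one
nums = [n for n in nums if n >= 5]
print(alias)  # [1, 2, 3, 5, 6, 7, 0, 1]

# Slice assignment: the existing list is changed in place
nums = alias
nums[:] = [n for n in nums if n >= 5]
print(alias)          # [5, 6, 7]
print(alias is nums)  # True
```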

# ==========================================================
<a id='vol2'></a>
## Vol. 2
# ==========================================================

In [1]:
# We can print all the outputs not just the last one in a cell by adding the following snippet 
# at the top of the notebook. To revert to the original setting, you can uncomment the last line.
from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = "all"

# # To revert to the original setting
# InteractiveShell.ast_node_interactivity = "last_expr"

**Return the First _N_ Items of an Iterable:**

In [2]:
import itertools as it

def first_n(iterable, n):
    """ If n > len(iterable) then all the elements are returned. """
    
    return list(it.islice(iterable, n))

_Example:_

In [3]:
d1 = {3: 4, 6: 2, 0: 9, 9: 0, 1: 4}
first_n(d1.items(), 3)
first_n(d1, 10)

[(3, 4), (6, 2), (0, 9)]

[3, 6, 0, 9, 1]

**Check If All the Elements of an Iterable Are the Same:**

In [4]:
import itertools as it

def all_equal(iterable):
    """ Returns True if all the elements of iterable are equal to each other. """

    g = it.groupby(iterable)
    return next(g, True) and not next(g, False)

_Example:_

In [5]:
all_equal([1, 2, 3])
all_equal(((1, 0), (True, 0)))
all_equal([{1, 2}, {2, 1}])
all_equal(['4', '4', '4'])
all_equal(['4', '4', 4])
all_equal([False, 0])
all_equal([])
all_equal([{1:0, 3:4}, {True:False, 3:4}])

False

True

True

True

False

True

True

True

When you have a sequence, the following alternative is usually even faster. (Make sure you test it for yourself if you're working with very large sequences.)

In [6]:
def all_equal_seq(sequence):
    """ Only works on sequences. Returns True if the sequence is empty
    or all the elements are equal to each other. """

    return not sequence or sequence.count(sequence[0]) == len(sequence)

_Example 1:_

In [7]:
all_equal_seq([1, 2, 3])
all_equal_seq(((1, 0), (True, 0)))
all_equal_seq([{1, 2}, {2, 1}])
all_equal_seq(['4', '4', '4'])
all_equal_seq(['4', '4', 4])
all_equal_seq([False, 0])
all_equal_seq([])
all_equal_seq([{1:0, 3:4}, {True:False, 3:4}])

False

True

True

True

False

True

True

True

_Example 2:_ You have a list of trucks and you can check whether they are in the warehouse or en route. As the day progresses, the status of each truck changes.

In [8]:
import random
random.seed(500)

# Just creating an arbitrary class and attributes
class Truck:
    def __init__(self, id):
        self.id = id
        self.status = random.choice(('loading-unloading', 'en route'))  # random status
        
    def __repr__(self):
        return f'T{self.id}'


trucks = [Truck(i) for i in range(50)]

In the morning you checked and saw that the first truck is `en route`. You heard that three others have also left the warehouse. Let's verify this:

In [9]:
all_equal_seq([t.status for t in trucks[:4]])

True

**Sum an Iterable With `None`:**

When you have `numpy` arrays or `pandas` Series or DataFrame, the options are obvious: [`numpy.nansum`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.nansum.html) or [`pandas.DataFrame/Series.sum`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html). But what if you don't want or can't use those?

In [10]:
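def sum_with_none(iterable):
    """ Returns sum of elements in an iterable that contains None. """

    # Materialize first so that one-shot iterators (e.g. generators)
    # aren't exhausted by the string check below
    values = list(iterable)
    assert not any(isinstance(v, str) for v in values), 'string is not allowed!'

    return sum(filter(None, values))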

This works because [`filter`](https://docs.python.org/3/library/functions.html#filter) treats `None` as the identity function; i.e., all [falsy](https://stackoverflow.com/a/39984051) elements of iterable are removed.

_Example:_

In [11]:
seq1 = [None, 1, 2, 3, 4, 0, True, False, None]
# sum(seq1)  # --> TypeError
# list(filter(None, seq1))  # --> [1, 2, 3, 4, True]
sum_with_none(seq1)  # Remember True == 1

11
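
Dropping every falsy value is harmless for a sum, since zeros and `False` contribute nothing anyway. If you adapt the idea to something like an average, though, the zeros need to stay. A sketch that skips only `None`; the `sum_skip_none` name and its `start` parameter are my own:

```python
def sum_skip_none(iterable, start=0):
    """Sum the elements of iterable, skipping only those that are None."""
    return sum((v for v in iterable if v is not None), start)


seq1 = [None, 1, 2, 3, 4, 0, True, False, None]
print(sum_skip_none(seq1))  # 11

# Unlike filter(None, ...), zeros survive, which matters for an average
values = [v for v in [None, 0, 0, 6] if v is not None]
print(sum(values) / len(values))  # 2.0
```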

**Check if _N_ or Fewer Items are Truthy:**

In [12]:
def max_n_true(iterable, n):
    """ Returns True if at most `n` values are truthy. """
    
    return sum(map(bool, iterable)) <= n

_Example:_

In [13]:
seq2 = [None, 1, 2, 3, 4, 0, True, False, 'hi']
max_n_true(seq1, 5)
max_n_true(seq2, 5)  # It's now 6

True

False

**Check If Exactly One Element in an Iterable is True:**

In [15]:
def single_true(iterable):
    """ Returns True if only one element of iterable is truthy.
    It does it by making sure the iterator has any truthy value. Then, keeps looking
    from that point in the iterator to make sure there is no other truthy value. """

    i = iter(iterable)
    return any(i) and not any(i)
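
The trick relies on `any` short-circuiting: the first call stops right after the first truthy element, and the second call resumes from there on the same iterator. A step-by-step trace (only a sketch; the sample list is arbitrary):

```python
def single_true(iterable):
    i = iter(iterable)
    return any(i) and not any(i)


values = [0, None, 'x', 0, 0]
i = iter(values)
print(any(i))   # True: consumed 0, None, 'x' and stopped there
print(list(i))  # [0, 0]: the part a second any() would scan

# Without the shared iterator, both calls restart from the beginning,
# so the check is always False
print(any(values) and not any(values))  # False, although exactly one item is truthy
print(single_true(values))              # True
```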

_Example:_ Putting a couple of the above functions to use!

In [16]:
# Just creating an arbitrary class and attributes
class SampleGenerator:
    def __init__(self, id, method1=None, method2=None, method3=None,
                 condition1=False, condition2=False, condition3=False):
        """
        Assumptions:
        1) One and only one method can be active at a time.
        2) Conditions are optional, but if passed, at most one can be truthy.
        """
        
        # assumption 1
        assert single_true([method1, method2, method3]), "Exactly one method should be used"
        # assumption 2
        assert max_n_true([condition1, condition2, condition3], 1), "Maximum one condition can be active"

        self.id = id

To avoid clutter, all the cases that cause an `AssertionError` are commented out in the cell below. Uncomment and run them to see the errors for yourself!

In [17]:
sample1 = SampleGenerator(1, method1='active')  # Correct. So, no error should be thrown

# sample2 = SampleGenerator(2, condition2=True)  # no method is active
# sample3 = SampleGenerator(3, method2='active', method3='not-active')  # more than one method has truthy value
# sample4 = SampleGenerator(4, method3='do something', condition1=True, condition3=True)  # multiple condition
# sample5 = SampleGenerator(5)  # nothing passed

**Skip Redundant Headers When Writing to CSV:**

Suppose you need to run a series of simulations. At the end of each run (which may even take several hours), you record some basic statistics and want to create or update a single `results.csv` file that you use to track outcomes. If so, you probably want to skip writing the header to the file after the first time.

First, let's create some data to play with:

In [18]:
import pandas as pd
import random

# An arbitrary function
def gen_random_data():
    demands = [random.randint(100, 900) for _ in range(5)]
    costs = [random.randint(100, 500) for _ in range(5)]
    inventories = [random.randint(100, 1200) for _ in range(5)]
    data = {'demand': demands, 
            'cost': costs, 
            'inventory': inventories}

    return pd.DataFrame(data)

# Let's create a few df
df_list = [gen_random_data() for _ in range(3)]

In [19]:
from pprint import pprint
pprint(df_list)

[   demand  cost  inventory
0     821   385       1197
1     157   232        774
2     211   410        959
3     456   337        192
4     825   193        842,
    demand  cost  inventory
0     375   112        249
1     123   189       1075
2     137   432        583
3     130   301        593
4     130   410        652,
    demand  cost  inventory
0     724   189        108
1     496   396        397
2     392   352        264
3     586   382       1135
4     572   101       1080]


Now, let's assume that we need to write each of the dataframes in `df_list` to `orders.csv` as soon as it is created.

In [20]:
import os

# This is only for illustration. 
filename = 'orders.csv'
for df in df_list:
    df.to_csv(filename, index=False, mode='a', header=(not os.path.exists(filename)))

If you don't need to loop over similar dataframes one at a time, the alternative below is a concise way to write them to file:

In [21]:
pd.concat(df_list).to_csv('orders2.csv', index=False)
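
If pandas isn't available, the same header-once idea works with the standard library's `csv.DictWriter`. This is a sketch rather than a drop-in replacement: `append_rows`, the demo file name, and the sample rows are all hypothetical.

```python
import csv
import os
import tempfile


def append_rows(filename, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only for a new or empty file."""
    need_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)


# Two "runs" appending to the same throwaway file
path = os.path.join(tempfile.mkdtemp(), 'orders_demo.csv')
fields = ['demand', 'cost']
append_rows(path, [{'demand': 821, 'cost': 385}], fields)
append_rows(path, [{'demand': 157, 'cost': 232}], fields)

with open(path, newline='') as f:
    print(f.read().splitlines())  # ['demand,cost', '821,385', '157,232']
```

Checking the file size as well as its existence covers the case where an empty file was created ahead of time.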

**Convert a CSV File to Python Objects:**

Assume you need to create a collection of Python objects whose attributes come from the columns of a CSV file, with each row of the file becoming a new instance of the class. However, let's say that you don't know ahead of time what the CSV columns are, and thus you can't initialize the class with the desired attributes.

Below, you can see two ways to achieve that:

In [22]:
class MyClass1:
    def __init__(self, *args, **kwargs):
        for arg in args:
            setattr(self, arg, arg)

        for k, v in kwargs.items():
            setattr(self, k, v)


class MyClass2:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

In `MyClass1` we can pass both `args` and `kwargs`, while in `MyClass2` we took advantage of the special [`__dict__`](https://stackoverflow.com/a/19907498) attribute.

_Example:_ Let's convert our `orders.csv` file from the example above to objects using both implementations:

In [23]:
import csv

filename = 'orders.csv'
class1_list = []
class2_list = []

with open(filename, newline='') as f:  # newline='' is what the csv docs recommend
    reader = csv.DictReader(f)
    for row in reader:
        class1_list.append(MyClass1(**row))
        class2_list.append(MyClass2(**row))

In [24]:
# Let's check the attributes of the first row of each list
print(f'first row = {vars(class1_list[0])}')
print(f'first row = {vars(class2_list[0])}')

# Let's see all the attributes and data
[vars(c) for c in class1_list]

first row = {'demand': '821', 'cost': '385', 'inventory': '1197'}
first row = {'demand': '821', 'cost': '385', 'inventory': '1197'}


[{'demand': '821', 'cost': '385', 'inventory': '1197'},
 {'demand': '157', 'cost': '232', 'inventory': '774'},
 {'demand': '211', 'cost': '410', 'inventory': '959'},
 {'demand': '456', 'cost': '337', 'inventory': '192'},
 {'demand': '825', 'cost': '193', 'inventory': '842'},
 {'demand': '375', 'cost': '112', 'inventory': '249'},
 {'demand': '123', 'cost': '189', 'inventory': '1075'},
 {'demand': '137', 'cost': '432', 'inventory': '583'},
 {'demand': '130', 'cost': '301', 'inventory': '593'},
 {'demand': '130', 'cost': '410', 'inventory': '652'},
 {'demand': '724', 'cost': '189', 'inventory': '108'},
 {'demand': '496', 'cost': '396', 'inventory': '397'},
 {'demand': '392', 'cost': '352', 'inventory': '264'},
 {'demand': '586', 'cost': '382', 'inventory': '1135'},
 {'demand': '572', 'cost': '101', 'inventory': '1080'},
 {'demand': '821', 'cost': '385', 'inventory': '1197'},
 {'demand': '157', 'cost': '232', 'inventory': '774'},
 {'demand': '211', 'cost': '410', 'inventory': '959'},
 {'de