# Python tips and tricks

**Table of contents**<a id='toc0_'></a>    
- 1. [Organizing data: unpacking, zipping, and sorting](#toc1_)    
  - 1.1. [Unpacking iterables](#toc1_1_)    
  - 1.2. [Zipping iterables](#toc1_2_)    
  - 1.3. [Sorting and reversing data](#toc1_3_)    
  - 1.4. [Using `key` functions to customize sorting](#toc1_4_)    
- 2. [Getting rid of manual loops](#toc2_)    
  - 2.1. [Choosing the right method for iteration](#toc2_1_)    
  - 2.2. [Generator functions](#toc2_2_)    
  - 2.3. [`map()` function](#toc2_3_)    
  - 2.4. [`filter()` function](#toc2_4_)    
- 3. [Reducing memory usage](#toc3_)    
- 4. [Number precision](#toc4_)    
  - 4.1. [Rounding and formatting numbers](#toc4_1_)    
  - 4.2. [Rounding may not work as expected!](#toc4_2_)    
  - 4.3. [Underflow and overflow](#toc4_3_)    
- 5. [Miscellaneous](#toc5_)    
  - 5.1. [Anonymous function (aka lambda function)](#toc5_1_)    
  - 5.2. [Multiplication vs. addition](#toc5_2_)    
  - 5.3. [Using non-built-in data types](#toc5_3_)    
  - 5.4. [Iterating in custom order](#toc5_4_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [1]:
import collections
from timeit import default_timer as timer
# from time import time  # timeit is more accurate than time
import numpy as np
import sys
from pympler import asizeof

# 1. <a id='toc1_'></a>[Organizing data: unpacking, zipping, and sorting](#toc0_)

- When working with iterables and iterators, you can often write more concise code with functions that work with them directly (no manual loops needed).
- **iterable**: an object that can be iterated over (e.g. a list, tuple, string)
- **iterator**: an object that produces its next value each time you call `next()` on it
- You can restructure your data easily by **unpacking** and **zipping** iterables (see below)

## 1.1. <a id='toc1_1_'></a>[Unpacking iterables](#toc0_)

- Unpacking: to retrieve individual elements from an iterable
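- Plain assignment unpacks an iterable into separate names: `a, b, c = [1, 2, 3]`
    - `a, b, c = *[1, 2, 3]` is a syntax error: a starred expression cannot stand alone on the right-hand side
- Use the unpacking operator `*` to collect leftover elements or to pass elements as separate arguments:

    `first, *rest = [1, 2, 3]` or `print(*[1, 2, 3])`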
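
The forms of unpacking described above can be sketched as follows. This is a minimal illustration: the function `describe` and the sample values are made up for the example.

```python
# Plain assignment unpacks an iterable into separate names.
x, y, z = [1, 2, 3]

# A starred name collects whatever is left over into a list.
first, *rest = [1, 2, 3, 4]
head, *middle, tail = "abcde"

# In a call, * spreads an iterable into positional arguments
# and ** spreads a dict into keyword arguments.
def describe(name, age, city="unknown"):  # hypothetical helper
    return f"{name}, {age}, {city}"

args = ("Ada", 36)
kwargs = {"city": "London"}
summary = describe(*args, **kwargs)

print(x, y, z)
print(first, rest)
print(head, middle, tail)
print(summary)
```

The names here deliberately differ from `a`, `b`, and `c`, so the block does not overwrite the variables used in the cells that follow.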

In [2]:
a = [1,2,3]
b = [1,1,1]
c = (a, b)

print(c)

([1, 2, 3], [1, 1, 1])


Example use case:

In [3]:
# Unpack the tuple 'c' into its individual elements (a and b),
# and pass them as separate arguments to the function 'func'.

def func(x, y):
    return sum(x) + sum(y) 

print(func(*c))

9


## 1.2. <a id='toc1_2_'></a>[Zipping iterables](#toc0_)

- Zipping: to combine multiple iterables into a single *iterator* of tuples
- Syntax: `zip(iterable1, iterable2, ...)` or `zip(*list_of_iterables)`
- The resulting zip object is an *iterator* of tuples:
    - It generates tuples containing the elements from each iterable at the same index, e.g.:

      `zip([1, 2, 3], [4, 5, 6])` produces an iterator for `[(1, 4), (2, 5), (3, 6)]`

    - length of the zip object = length of the shortest iterable, e.g.:
  
      `zip([1, 2], [4, 5, 6])` produces an iterator for `[(1, 4), (2, 5)]`

- You can convert the zip object to a list or tuple if needed, e.g.:
    - `list(zip([1, 2, 3], [4, 5, 6]))` returns `[(1, 4), (2, 5), (3, 6)]`
    - `list()` iterates over the iterator and collects the results into a list
- Be aware that an iterator **can only be iterated over once!!**
    - If you need to use the zip object multiple times, you need to recreate it
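
As a small sketch of the two points above (a zip object is single-use, and `zip(*...)` accepts an unpacked collection of iterables), consider the following. The lists `names` and `values` are example data only.

```python
names = ["a", "b", "c"]
values = [1, 2, 3]

pairs = zip(names, values)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left, so this is empty

# zip(*...) "unzips": it transposes a sequence of tuples back into columns.
unzipped_names, unzipped_values = zip(*first_pass)

print(first_pass)
print(second_pass)
print(unzipped_names, unzipped_values)
```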

In [4]:
zipped = zip(a, b)
print(zipped)
print(list(zipped))  # convert zipped to a list

zipped = zip(a, b)  # note: zipped was already used above. We need to recreate it
print(tuple(zipped))  # convert zipped to a tuple

<zip object at 0x0000026A5E966140>
[(1, 1), (2, 1), (3, 1)]
((1, 1), (2, 1), (3, 1))


Example use case: Iterate over multiple lists simultaneously

In [5]:
for x, y in zip(a, b):
    print(x, y)

1 1
2 1
3 1


## 1.3. <a id='toc1_3_'></a>[Sorting and reversing data](#toc0_)

(This file covers lists, tuples and dictionaries only. For Numpy arrays, see the Numpy notes.)

Sort a list:

- `.sort()` method: sorts a list *in place* in ascending order
- `sorted()` function: returns a ***list*** in ascending order
    - Syntax: `sorted(iterable, key=None, reverse=False)` (also see the `key` functions section below)

Reverse a list:

- `.reverse()` method: reverses the list *in place*
- `list(reversed(my_list))`: The `reversed()` function returns a reversed ***iterator*, not a list**, so wrap it in `list()` if you need a list
- `sorted(my_list, reverse=True)` returns a new sorted list in descending order
- Slicing backward: `my_list[::-1]`


In [6]:
a = [5, 4, 2, 1, 3]
a.sort() # Sort the list *in place*
a.reverse()  # Reverse the list *in place*
print(a) # prints [5, 4, 3, 2, 1]

a = [5, 4, 2, 1, 3]
sorted_a = sorted(a) # returns [1, 2, 3, 4, 5]
reversed_a = sorted(a, reverse=True) # returns [5, 4, 3, 2, 1]
reversed_sorted_a = list(reversed(sorted_a)) # same output as above
print(sorted_a)
print(reversed_a)
print(reversed_sorted_a)

[5, 4, 3, 2, 1]
[1, 2, 3, 4, 5]
[5, 4, 3, 2, 1]
[5, 4, 3, 2, 1]


Sort or reverse a tuple:

- Tuples are immutable, so you cannot sort or reverse them in place using `.sort()` or `.reverse()`
- `sorted()` and `reversed()` return a ***list*** and an ***iterator***, respectively. You can convert the output back to a tuple after sorting:
    - `tuple(sorted(my_tuple))` returns a new sorted tuple
    - `tuple(reversed(my_tuple))` returns a new reversed tuple

In [7]:
a = (5, 4, 2, 1, 3)
sorted_a = tuple(sorted(a))  # returns (1, 2, 3, 4, 5)
print(sorted_a)

(1, 2, 3, 4, 5)


Sort a dictionary:

- `dict(sorted(my_dict.items()))` sorts a dictionary by its keys
- `dict(sorted(my_dict.items(), key=lambda item: item[1]))` sorts a dictionary by its values
- Note that `sorted(my_dict.items())` returns a list of key-value tuples, so you need to convert it back to a dictionary using `dict()`.
- Alternatively, use a dictionary comprehension with `sorted()`:
    - `{k: my_dict[k] for k in sorted(my_dict)}` sorts by keys
    - `{k: my_dict[k] for k in sorted(my_dict, key=my_dict.get)}` sorts by values (the key function receives a dictionary *key* here, so it has to look up the value)
    - Notice that `my_dict.items()` is not needed here because iterating over a dictionary yields its keys.
- You can reverse dictionary items using similar techniques.

**ATTENTION**:

- Dictionaries keep insertion order as a language guarantee only since Python 3.7. Treat them as *unordered* collections **before Python 3.7**.
- `pprint` (pretty print) sorts dictionaries by keys by default. Set `sort_dicts=False` to disable this. 

In [8]:
my_dict = {'b': 1, 'a': 3, 'c': 2}

# Sort by keys
sorted_by_keys = dict(sorted(my_dict.items()))
print(sorted_by_keys)

# Sort by keys (using a dictionary comprehension)
sorted_by_keys = {k: my_dict[k] for k in sorted(my_dict.keys())}
print(sorted_by_keys) # same output as above

{'a': 3, 'b': 1, 'c': 2}
{'a': 3, 'b': 1, 'c': 2}
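
Sorting by values, as described in the bullets above, could look like the sketch below. The dictionary `scores` is example data, and the variable names are chosen only for illustration.

```python
scores = {"b": 1, "a": 3, "c": 2}  # hypothetical example data

# Sort the (key, value) pairs by the value, then rebuild a dict.
by_value = dict(sorted(scores.items(), key=lambda item: item[1]))

# Same, in descending order.
by_value_desc = dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))

# Comprehension form: sort the keys, using each key's value as the sort key.
by_value_keys = {k: scores[k] for k in sorted(scores, key=scores.get)}

print(by_value)
print(by_value_desc)
print(by_value_keys)
```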


## 1.4. <a id='toc1_4_'></a>[Using `key` functions to customize sorting](#toc0_)

- By default, functions like `sorted()`, `max()`, `min()`, etc. compare the items of an iterable directly (`sorted()` returns them in ascending order).
- To change how items are compared, pass a **key function** as the `key` argument to these functions.
- A key function takes an item from the iterable and returns a value that will be used for sorting or comparison (instead of using the item itself).
    - Example: `sorted(iterable, key = lambda x: abs(x))` (sort by absolute value)

In [9]:
# Sort by absolute values
a = [1, -2, 3, -4]
print(sorted(a, key=lambda x: abs(x))) # prints [1, -2, 3, -4]

# Find the dict key with the maximum dict value
my_dict = {'b': 1, 'c': 2, 'a': 3}
print(max(my_dict)) # prints 'c' (iterating over a dict yields its keys; strings compare alphabetically)
print(max(my_dict, key=lambda x: my_dict[x])) # prints 'a' (compare by dict values instead)

[1, -2, 3, -4]
c
a


# 2. <a id='toc2_'></a>[Getting rid of manual loops](#toc0_)

- Ways to iterate over data without a manual loop:
    - list comprehensions: `[x for x in iterable]`
    - generator expressions: `(x for x in iterable)` (This creates an iterator that generates values on-the-fly)
    - generator functions: `def my_generator(): yield x` (This creates a function that returns an iterator)
    - `map()`: apply a function to each element of an iterable
    - `filter()`: filter elements of an iterable based on a condition
    - `itertools` module: provides functions for creating iterators for efficient looping 

## 2.1. <a id='toc2_1_'></a>[Choosing the right method for iteration](#toc0_)

- Some methods are more efficient than others, depending on the use case, e.g.:
    - Why do you want to iterate over the data? (e.g. to apply a function, filter elements, organize/sorting data)
    - Do you need to get the results of all iterations at once/ iterate over the entire data right away?
    - Do you need to iterate over the data multiple times or just once?
- Iterating over a fully materialized container (e.g. `for x in [a,b,c,d,e]:`) can be inefficient:
    - All items must already be in memory before the loop starts
    - If you also collect the results of every iteration (e.g. in a list), those are stored in memory too, which can be costly for large datasets
- If you don't need to iterate over all items right away (e.g. you only need the next item at a time), consider iterating over an iterator instead of an iterable:
    - No need to load all items into memory at once (the next item is generated on-the-fly)
    - You can create an iterator in several ways:
        - Using the `iter()` function on an iterable: `iter(iterable)`
        - Using functions that return iterators, e.g. `zip()`, `map()`, `filter()` (note that `range()` returns a lazy, reusable sequence rather than an iterator)
        - Using a generator (a special kind of iterator)
            - a **generator expression**: `(x for x in iterable)`
            - a **generator function**: `def my_generator(): yield x`
- If you need to iterate over the same iterator multiple times, consider creating a generator function so you can recreate the iterator easily when needed. (Remember that an iterator can only be iterated over once!)
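
The one-shot behaviour of iterators, and the generator-function workaround mentioned above, can be sketched like this. The helper name `squares_of` is made up for the example.

```python
numbers = [1, 2, 3]

# iter() turns an iterable into a one-shot iterator.
it = iter(numbers)
first = next(it)             # 1
remaining = list(it)         # [2, 3]
after_exhaustion = list(it)  # [] because the iterator is used up

# A generator expression is also one-shot ...
squares = (n * n for n in numbers)
total_first = sum(squares)   # 14
total_second = sum(squares)  # 0, nothing left

# ... whereas a generator function can simply be called again.
def squares_of(items):  # hypothetical helper
    for n in items:
        yield n * n

total_a = sum(squares_of(numbers))
total_b = sum(squares_of(numbers))

# range() is lazy but is not an iterator: it can be looped over repeatedly.
r = range(3)
range_twice = (list(r), list(r))
```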

## 2.2. <a id='toc2_2_'></a>[Generator functions](#toc0_)

- A generator function returns a generator object (an iterator)
- It uses the `yield` keyword (not `return`) to return values **one at a time**
- Call `next()` on the generator object to get the next value

In [10]:
def my_generator():
    yield 1
    yield 2
    yield 3

# Alternatively:
def my_generator():
    for i in range(1, 4):
        yield i

gen = my_generator()
print(next(gen))  # prints 1
print(next(gen))  # prints 2
print(next(gen))  # prints 3
print(next(gen, 'No more items'))  # prints 'No more items' since the generator is exhausted

1
2
3
No more items
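
Because values are produced one at a time, a generator function can even describe an endless sequence. The sketch below assumes a made-up generator `naturals` and uses `itertools.islice` to take only as many items as needed.

```python
from itertools import islice

def naturals(start=1):
    """Yield start, start + 1, start + 2, ... without end."""
    n = start
    while True:
        yield n
        n += 1

# Take only the first five values; the rest are never computed.
first_five = list(islice(naturals(), 5))

# Generators compose lazily: filter and transform, then take three results.
even_squares = (n * n for n in naturals() if n % 2 == 0)
first_three_even_squares = list(islice(even_squares, 3))

print(first_five)
print(first_three_even_squares)
```

Calling `list(naturals())` directly would never finish, which is why the example always limits the number of items it requests.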


## 2.3. <a id='toc2_3_'></a>[`map()` function](#toc0_)

- The `map()` function applies a function to each element of an iterable (or multiple iterables) and returns a map object (an iterator)
- Syntax: `map(function, iterable)`

In [11]:
def func(x):
    return x**2
a = [1, 2, 3]

print(list(map(func, a)))

# Multiple iterables with map function
def func(x, y):
    return x + y
a = [1, 2, 3]
b = [4, 5, 6]
print(list(map(func, a, b)))

[1, 4, 9]
[5, 7, 9]


`map()` vs list comprehensions:
- Creating a `map` object is fast because `map()` is lazy: it returns an iterator and calls the function only when items are requested.
- To get a list, you will need to use `list()`.
    - This means you will still need to iterate over all the items (`list()` iterates over the iterator).
    - So, if you need a list anyway, you might as well use a list comprehension (in the timings below, `list(map(func, a))` is slower than the comprehension).
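
One way to see that `map()` does no work until its items are requested is to record each call. This is a minimal sketch: the function `square` and the list `calls` exist only for the demonstration.

```python
calls = []

def square(x):
    calls.append(x)  # record every call so we can see when work happens
    return x * x

lazy = map(square, [1, 2, 3])
calls_after_creation = list(calls)  # still empty: nothing has been computed

first = next(lazy)                  # runs square(1) only
calls_after_next = list(calls)

rest = list(lazy)                   # runs square(2) and square(3)

print(calls_after_creation, calls_after_next, rest)
```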

In [12]:
# Compare the performance of list comprehensions and map function
def func(x):
    return x**2

start = timer()
for i in range(1000000):
    [x**2 for x in a] # returns a list
print(f"Time taken with list comprehension: {timer()-start:.3f} s")

start = timer()
for i in range(1000000):
    map(func, a) # returns a map object
print(f"Time taken with map(func, a): {timer()-start:.3f} s")

start = timer()
for i in range(1000000):
    list(map(func, a)) # returns a list
print(f"Time taken with list(map(func, a)): {timer()-start:.3f} s")


Time taken with list comprehension: 0.412 s
Time taken with map(func, a): 0.203 s
Time taken with list(map(func, a)): 0.659 s


## 2.4. <a id='toc2_4_'></a>[`filter()` function](#toc0_)

- The `filter()` function filters elements of an iterable based on a function and returns a filter object (an iterator)
- Syntax: `filter(function, iterable)`

In [13]:
def func(x): 
    return x % 2 == 0 # to filter even numbers
a = [1,2,3,4,5,6]

print(list(filter(func, a)))

[2, 4, 6]
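
Two related idioms, shown here as a sketch with example data: passing `None` as the function keeps only truthy items, and a list comprehension with an `if` clause gives the same result as `list(filter(...))`.

```python
values = [0, 1, "", "text", None, [], [0]]

# filter(None, ...) drops every falsy item (0, "", None, empty containers).
truthy = list(filter(None, values))

numbers = [1, 2, 3, 4, 5, 6]
evens_filter = list(filter(lambda n: n % 2 == 0, numbers))
evens_comprehension = [n for n in numbers if n % 2 == 0]

print(truthy)
print(evens_filter, evens_comprehension)
```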


`filter()` vs `np.where()`:

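- `filter()` works with both lists and numpy arrays.
- For very small inputs, `filter()` can be faster than `np.where()` **even for numpy arrays**, because the fixed overhead of each numpy call dominates (the benchmark below uses only 6 elements). For large arrays, vectorized numpy operations are typically much faster.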

In [14]:
def func(x): 
    return x % 2 == 0 # to filter even numbers

start = timer()
for i in range(1000000):
    list(filter(func, a))
print(f"Time taken with list(filter(func, a)): {timer()-start:.3f} s")

a = np.array(a)

start = timer()
for i in range(1000000):
    list(filter(func, a))
print(f"Time taken with list(filter(func, a)) on numpy array: {timer()-start:.3f} s")

start = timer()
for i in range(1000000):
    a[np.where(a%2==0)] # get a numpy array using the indices returned by np.where
print(f"Time taken with a[np.where(a%2==0)]: {timer()-start:.3f} s")

Time taken with list(filter(func, a)): 0.977 s
Time taken with list(filter(func, a)) on numpy array: 2.045 s
Time taken with a[np.where(a%2==0)]: 5.644 s


# 3. <a id='toc3_'></a>[Reducing memory usage](#toc0_)

- Use **Numpy arrays** instead of lists for numerical data.
- Use **iterators** instead of iterables when you don't need to iterate over all items right away. (See [Choosing the right method for iteration](#toc2_1_))

Example
- Numpy arrays are way more memory-efficient than lists, especially for large datasets:

In [15]:
a = [[np.random.rand() for _ in range(100)] for _ in range(10000)] # a list of lists (2D array)
a_numpy = np.array(a)

# memory usage (in MB)
a_numpy_size = sys.getsizeof(a_numpy) / (1024*1024)

# Note that sys.getsizeof() returns the size of the outer object, not the size
# of the elements inside it. To get the accurate size of the list 'a', we need to
# sum the sizes of all its elements at all levels (including sublists).
a_size = (sum([sys.getsizeof(x) for sublist in a for x in sublist])
            + sum([sys.getsizeof(sublist) for sublist in a])
            + sys.getsizeof(a)
        ) / (1024 * 1024)
# Alternatively, use pympler to get the size of 'a'
# from pympler import asizeof
# a_size = asizeof.asizeof(a) / (1024 * 1024)

print(f"Memory usage of a: {a_size:.2f} MB")
print(f"Memory usage of a_numpy: {a_numpy_size:.2f} MB")


Memory usage of a: 31.74 MB
Memory usage of a_numpy: 7.63 MB
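
The second recommendation (iterators instead of materialized lists) can be illustrated in the same spirit. This sketch uses only the standard library; the sizes depend on the Python version and platform, so no exact numbers are claimed.

```python
import sys

n = 100_000
as_list = [i * i for i in range(n)]       # all results stored at once
as_generator = (i * i for i in range(n))  # results produced on demand

# getsizeof() reports the container itself (not the items it refers to),
# which is already enough to show the difference.
list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)

print(f"list object:      {list_bytes} bytes")
print(f"generator object: {generator_bytes} bytes")

# Both give the same answer when consumed.
same_total = sum(as_list) == sum(as_generator)
print(same_total)
```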


# 4. <a id='toc4_'></a>[Number precision](#toc0_)

## 4.1. <a id='toc4_1_'></a>[Rounding and formatting numbers](#toc0_)

In [16]:
# --- Rounding numbers ---
print('----- Rounding numbers -----')

# Rounding to n decimal places
# round(number, ndigits)
print(round(1.2304, 3)) # round to 3 decimal places, prints 1.23

# n decimal places (keeps trailing zeros)
# f'{variable:format_specifier}'
# OR: format(variable, format_specifier)
print(f'{1.2304:.3f}') # fixed-point with 3 decimal places, prints 1.230
print(format(1.2304, '.3f')) # fixed-point with 3 decimal places, prints 1.230

# Rounding to the nearest ten, hundred, thousand, etc.
# round(number, -n)
print(round(12345, -3)) # rounds to the nearest 1000, prints 12000 (2 significant figures)

# --- Scientific notation ---
print('----- Scientific notation -----')

print(f'{1234567890:.2e}') # prints 1.23e+09
# print(format(1234567890, '.2e'))

# Scientific notation with a specific number of significant figures
print(f'{1234567890:.2g}') # prints 1.2e+09 (2 significant figures)
# print(format(1234567890, '.2g'))

----- Rounding numbers -----
1.23
1.230
1.230
12000
----- Scientific notation -----
1.23e+09
1.2e+09


## 4.2. <a id='toc4_2_'></a>[Rounding may not work as expected!](#toc0_)

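- Rounding may not work as expected because most decimal fractions cannot be represented exactly as binary floating-point numbers. For example,
    - `0.1 + 0.2` evaluates to `0.30000000000000004`, not `0.3` (although `round(0.1 + 0.2, 2)` does return `0.3`).
    - `round(1.005, 2)` does not return `1.01`, but rather `1.0`, because `1.005` is actually stored as a value slightly below 1.005.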

In [17]:
print(round(0.1 + 0.2, 2))
print(round(1.005, 2))

0.3
1.0
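
Two common workarounds, sketched with the standard library only: compare floats with a tolerance using `math.isclose()`, and use `decimal.Decimal` (constructed from a string) when exact decimal rounding matters. The example also shows that the built-in `round()` rounds exact ties to the nearest even digit.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

# Compare floats with a tolerance instead of ==.
exact_equal = (0.1 + 0.2 == 0.3)             # False
close_enough = math.isclose(0.1 + 0.2, 0.3)  # True

# A Decimal built from a *string* stores 1.005 exactly, so it rounds as expected.
rounded = Decimal("1.005").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Built-in round() sends exact ties to the nearest even integer.
ties = (round(0.5), round(1.5), round(2.5))

print(exact_equal, close_enough)
print(rounded)
print(ties)
```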


## 4.3. <a id='toc4_3_'></a>[Underflow and overflow](#toc0_)

- Underflow and overflow can occur when performing arithmetic operations that result in values that are too small or too large:
    - **Underflow**: `1e-400` is smaller in magnitude than the smallest representable float, so it becomes `0.0`.
    - **Overflow**: `1e400` is larger than the largest representable float, so it becomes `inf`.
 

In [18]:
print(1e-400) # prints 0.0 (underflow)
print(1e400) # prints inf (overflow)

0.0
inf
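
The actual limits can be inspected through `sys.float_info`. The sketch below also shows that Python integers are not affected, since they have arbitrary precision.

```python
import sys

largest = sys.float_info.max          # about 1.8e308
smallest_normal = sys.float_info.min  # about 2.2e-308 (smallest positive normal float)

overflowed = largest * 10             # too large for a float: becomes inf
underflowed = smallest_normal / 1e20  # too small for a float: becomes 0.0

# Integers grow as needed, so this value is exact.
big_int = 10 ** 400

print(largest, smallest_normal)
print(overflowed, underflowed)
print(len(str(big_int)))
```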


In some cases, multiplication can lead to representation issues, such as underflow (e.g. when multiplying very small numbers together).
- To avoid this, one can take the logarithm of the numbers *before* multiplying them so that each number is replaced by its logarithm and multiplication is replaced by addition of the logarithms. (`log(a * b * c * d) = log(a) + log(b) + log(c) + log(d)`)
- This results in a more stable computation (i.e. less prone to underflow or overflow issues) and allows a wider range of values to be represented.

In [19]:
a, b, c, d = 2e-100, 2e-100, 2e-100, 2e-100
p, q, r, s = np.log(a), np.log(b), np.log(c), np.log(d)
log_sum = p + q + r + s

print(f"Product of a, b, c, d: {a * b * c * d}")
print(f"Logarithm of product: {log_sum}")

# Note that the logarithm of each number must be taken before multiplying them
# np.log(a * b * c * d)  # This will return -inf due to underflow

Product of a, b, c, d: 0.0
Logarithm of product: -918.2614484753785
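
The same idea extends to a whole sequence of factors, for example a long list of small probabilities. This is a sketch with made-up values, using `math.prod()` (Python 3.8+) for the direct product.

```python
import math

probabilities = [1e-5] * 100  # hypothetical values

direct_product = math.prod(probabilities)              # underflows to 0.0
log_product = sum(math.log(p) for p in probabilities)  # log of the true product

# Logs still allow comparisons: a larger log means a larger product.
other = [2e-5] * 100
other_log_product = sum(math.log(p) for p in other)

print(direct_product)
print(log_product)
print(other_log_product > log_product)
```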


# 5. <a id='toc5_'></a>[Miscellaneous](#toc0_)

## 5.1. <a id='toc5_1_'></a>[Anonymous function (aka lambda function)](#toc0_)

- Key functions are often defined as **anonymous functions**
- Syntax: `lambda arguments: expression`
- Only one expression allowed
- It can be used directly or assigned to a variable

In [20]:
# lambda function with one argument
func = lambda x: x**2
print(func(3))

# lambda function with two arguments
func = lambda x, y: x + y
print(func(1, 2))

9
3


## 5.2. <a id='toc5_2_'></a>[Multiplication vs. addition](#toc0_)

- Multiplication is not necessarily slower than addition.
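- In CPython, the cost of either operation is dominated by interpreter overhead, so the difference is negligible. At the hardware level, floating-point addition is not trivial either, because the operands' exponents must be aligned before the significands are added.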

In [21]:
a, b, c, d = 0.25, 0.25, 0.25, 0.25

time_mul = 0
time_add = 0

n_iter = 1000000  # named n_iter so that the built-in iter() is not shadowed
for _ in range(n_iter):
    # multiplication
    start_mul = timer()
    product = a * b * c * d
    end_mul = timer()
    time_mul += end_mul - start_mul
    # summation
    start_add = timer()
    summed = a + b + c + d
    end_add = timer()
    time_add += end_add - start_add

print(f"Multiplication: {time_mul / n_iter * 1e6:.3f} microseconds")
print(f"Summation     : {time_add / n_iter * 1e6:.3f} microseconds")

Multiplication: 0.410 microseconds
Summation     : 0.403 microseconds
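
An alternative way to run this comparison is `timeit.timeit()`, which executes the statement in a tight loop and so avoids calling the timer around every single operation. This is only a sketch: the repetition count is arbitrary, and the timings vary between machines and runs.

```python
import timeit

# Variables the timed statements can see (same values as in the cell above).
values = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
n_runs = 200_000  # arbitrary repetition count

t_mul = timeit.timeit("a * b * c * d", globals=values, number=n_runs)
t_add = timeit.timeit("a + b + c + d", globals=values, number=n_runs)

print(f"Multiplication: {t_mul / n_runs * 1e6:.3f} microseconds")
print(f"Summation     : {t_add / n_runs * 1e6:.3f} microseconds")
```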


## 5.3. <a id='toc5_3_'></a>[Using non-built-in data types](#toc0_)

- The collections module provides alternatives to built-in data types like lists, tuples, and dictionaries. 
- It also provides additional data structures like `deque`, `defaultdict`, `Counter`, and `namedtuple`.

In [22]:
# The deque data structure
print('----- The deque data structure -----')

# Deque is a double-ended queue that supports adding and removing elements from
# both ends in O(1) time. (Lists are O(1) only at the end; at the beginning they are O(n).)
# Syntax: collections.deque(iterable)
# Example: Create a deque object from the list a

a = [1,2,3]
d = collections.deque(a)
print(d) # prints deque([1, 2, 3])

d.append(4) # Add elements to the end
print(d) # prints deque([1, 2, 3, 4])

d.appendleft(0) # Add elements to the beginning
print(d) # prints deque([0, 1, 2, 3, 4])

d.pop() # Remove elements from the end
print(d) # prints deque([0, 1, 2, 3])

d.popleft() # Remove elements from the beginning
print(d) # prints deque([1, 2, 3])

# Notes: deque vs lists
# - deque is much faster than lists for adding and removing elements at the beginning,
#   and about as fast at the end (see the timings below).
# - deque is much SLOWER than lists for accessing elements in the middle.

# --- list operations --- 

a = [1,2,3]
start = timer()
for i in range(1000000):
    a.append(1)
print('list append:', timer()-start)

start = timer()
for i in range(10000):
    a[500000]
print('list access:', timer()-start)

# --- deque operations --- 

a = [1,2,3]
d = collections.deque(a)
start = timer()
for i in range(1000000):
    d.append(1)
print('deque append:', timer()-start)

start = timer()
for i in range(10000):
    d[500000]
print('deque access:', timer()-start)



----- The deque data structure -----
deque([1, 2, 3])
deque([1, 2, 3, 4])
deque([0, 1, 2, 3, 4])
deque([0, 1, 2, 3])
deque([1, 2, 3])
list append: 0.105018500238657
list access: 0.000821900088340044
deque append: 0.09464980009943247
deque access: 1.2352163000032306
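
Two further `deque` features that may be useful are the `maxlen` argument (a fixed-size buffer that discards the oldest item) and `rotate()`. The sketch below uses made-up data.

```python
from collections import deque

# Keep only the 3 most recent readings.
recent = deque(maxlen=3)
for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)  # once full, the oldest item is dropped automatically

# rotate(n) shifts items to the right by n (negative n shifts left).
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)
rotated_right = list(ring)
ring.rotate(-2)
restored = list(ring)

print(recent)
print(rotated_right)
print(restored)
```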


In [23]:
# Counter
print('----- The Counter data structure -----')

# A dictionary subclass that counts the number of occurrences of each element in an
# iterable. It is usually faster and more concise than counting with a manual loop and a plain dictionary.
# Syntax: collections.Counter(iterable)
# Example: Create a Counter object from the list a

a = [1,2,3,1,2,3,1,2,3]
c = collections.Counter(a)
print(c) # prints Counter({1: 3, 2: 3, 3: 3})



----- The Counter data structure -----
Counter({1: 3, 2: 3, 3: 3})
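
`defaultdict`, mentioned above but not yet shown, creates a default value the first time a missing key is accessed, which makes grouping straightforward. The sketch below uses a made-up word list and also shows `Counter.most_common()`.

```python
from collections import Counter, defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry", "apple"]

# Group words by first letter; missing keys start out as an empty list.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# Count occurrences and ask for the most frequent item.
counts = Counter(words)
top = counts.most_common(1)

print(dict(by_letter))
print(top)
```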


In [24]:
# The namedtuple data structure
print('----- The namedtuple data structure -----')

# - A tuple with named fields. Think of it as a dictionary with predefined keys.
# - It is more memory-efficient than using a dictionary.
# Syntax: collections.namedtuple(typename, field_names)
# Example: Create a namedtuple object with the fields x and y

MyClass = collections.namedtuple('MyClass', ['x', 'y']) # MyClass is the typename
p = MyClass(1, 2) # each value is assigned to each of the predefined fields (x, y)
# You can access the fields by their names or their indices
print(p.x, p.y) # prints 1 2
print(p[0], p[1]) # prints 1 2

# Note: Unlike a regular class instance, a namedtuple is immutable: you cannot change its fields after creation.
# p.x = 3  # this raises AttributeError

# namedtuple is more memory-efficient than a dictionary.
# Accessing fields in a namedtuple is slightly slower (but not significantly).

# namedtuple vs dictionary memory usage
p = MyClass(1, 2)
print('namedtuple memory usage:', asizeof.asizeof(p)) # prints 120
point_dict = {'x':1, 'y':2}  # not named 'dict', which would shadow the built-in type
print('dict memory usage:', asizeof.asizeof(point_dict)) # prints 360

# Time efficiency of namedtuple vs dictionary (accessing fields)
# Accessing fields in a namedtuple 
start = timer()
for i in range(10000000):
    p.x
print('namedtuple access time:', timer() - start)
# Accessing fields in a dictionary
start = timer()
for i in range(10000000):
    point_dict['x']
print('dict access time:', timer() - start)


----- The namedtuple data structure -----
1 2
1 2
namedtuple memory usage: 120
dict memory usage: 360
namedtuple access time: 1.0628104996867478
dict access time: 0.9576869001612067
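
Since a namedtuple cannot be modified in place, the usual pattern is to build a modified copy with `_replace()`. The sketch below, which uses a hypothetical `Point` type, also shows `_asdict()` and ordinary tuple unpacking.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical type name
p = Point(1, 2)

moved = p._replace(x=10)  # returns a new Point; p itself is unchanged
as_dict = p._asdict()     # field names mapped to values
x, y = p                  # unpacks like a regular tuple

# Direct assignment is rejected because namedtuples are immutable.
try:
    p.x = 3
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(moved, p)
print(as_dict)
print(x, y, assignment_failed)
```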


## 5.4. <a id='toc5_4_'></a>[Iterating in custom order](#toc0_)



In [25]:
a = [5, 4, 2, 1, 3]

# Iterate in reverse order
for x in reversed(a):
    print(x, end=' ')  # prints 3 1 2 4 5

print()  # new line

# Iterate in ascending order
for x in sorted(a):
    print(x, end=' ')  # prints 1 2 3 4 5

print()  # new line

# Iterate in custom order
for x in sorted(a, key=lambda x: -x):  # sort by negative value
    print(x, end=' ')  # prints 5 4 3 2 1

3 1 2 4 5 
1 2 3 4 5 
5 4 3 2 1 
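
A key function can also return a tuple to sort by several criteria at once, and `reversed()` works on a `range` when the original index is needed. This is a sketch with a made-up word list.

```python
words = ["pear", "fig", "banana", "kiwi", "apple"]

# Shortest first; ties broken alphabetically (tuples compare element by element).
by_length = sorted(words, key=lambda w: (len(w), w))

# Walk the list backwards while keeping track of the original index.
indexed_reverse = [(i, words[i]) for i in reversed(range(len(words)))]

for w in by_length:
    print(w, end=' ')
print()
print(indexed_reverse)
```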