# Tips & Advice for Python Programing in Financial Accounting Research
This notebook contains some tricks I have learned while coding computational methods in Financial Accounting Research. The first section focuses on tricks to make your code shorter and more readable, while the second section focuses on how to make your code faster. Keep in mind that a faster code is not necessarily easy to read. The recommendation is to use readable code as much as possible except in the blocks that take too long to compute.

The content is strongly influenced by other people's work. See the list below:
[Python Engineer](https://www.youtube.com/watch?v=8OKTAedgFYg)
[Corey Schafer](https://www.youtube.com/watch?v=C-gEQdGVXbk)
[mCoding](https://www.youtube.com/watch?v=m_a0fN48Alw)

# 1. Better code
## 1.1 Iterate with Enumerate instead of Range(len())

In [1]:
data=[1,2,3,-4]
#classic approach
for i in range(len(data)):
    if data[i]<0:
            data[i]=0
print(data)

#enumerate: capture the index, the values, and the length of the list.
for ind, value in enumerate(data):
    if value<0:
        data[ind]=0
print(data)

[1, 2, 3, 0]
[1, 2, 3, 0]


## 1.2 Iterate in multiple lists with ZIP
Iterate in two or more lists of the same size. Useful when computing expected values. Is this faster relative to a matrix multiplication? We will see it later.

In [2]:
probabilities=[0.1,0.2,0.3,0.4]
values=[1,2,3,4]
multiplications=[]
#by using zip() we avoid doing coding a doble iteration.
for prob, value in zip(probabilities,values):
    multiplications.append(prob*value)
print(f" the expected value is {sum(multiplications)}")

 the expected value is 3.0


## 1.2 Use list comprehension instead of loops

In [3]:
squares=[]
for i in range(10):
    squares.append(i*i)
print(squares)

squares2=[i*i for i in range(10)]  # the same result but in just one line
print(squares2)


[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


## 1.3 Use ternary conditionals instead of the classic "if-else" block

In [4]:
condition=False

#classic approach
if condition==False:
    x=1
else:
    x=0
print(x)

# ternary approach
x=1 if condition==False else 0
print(x)

1
1


## 1.4 Save memory with generators or iter lists
If the use of memory is a concern, **do not** use lists.

In [5]:
import sys
my_list =[i for i in range(100000)] #list comprehension uses a lot of memory
print(sum(my_list))
print(sys.getsizeof(my_list),"bytes")

my_iter =iter(range(100000)) # Iterators barely use memory
print(sum(my_iter))
print(sys.getsizeof(my_iter),"bytes")

my_iter_list =iter([i for i in range(100000)]) # List Iterators barely use memory
print(sum(my_iter_list))
print(sys.getsizeof(my_iter_list),"bytes")

my_gen =(i for i in range(100000)) #Generators also less  memory
print(sum(my_gen))
print(sys.getsizeof(my_gen),"bytes")

4999950000
824456 bytes
4999950000
48 bytes
4999950000
48 bytes
4999950000
112 bytes


## 1.5 Easy to count (million)

In [6]:
num1=100_000_000     # use _ to visualize numbers better.
num2=2_000_000_000
total=num1+num2
print(f"{total:,}")  # also the output

2,100,000,000


## 1.6 Unpacking values from tuples

In [7]:
parameters=(1,2,3,4,5,6,7)

α,β,γ,δ,η,ρ,τ=parameters  #full unpacking
print(α)
print(β)

a,b,*c = parameters       #partial unpacking
print(a)
print(b)
print(c)

x,y,*_,z =parameters       #partial unpacking
print(x)
print(y)
print(z)

1
2
1
2
[3, 4, 5, 6, 7]
1
2
7


# 2. Speed up your code
## 2.1 Comparing iterators, list comprehension, and generators.
If the speed is a concern, **do** use lists.

In [8]:
import time

n = 10000000

# iterator on a range: Fast.
iter_ = iter(range(n)) 

# list comprehension: Fastest, altough it saves all the elements in the memory.
list_comp = [i for i in range(n)] 

# iterator on a list comprehension:  Fast.
iter_list_comp = iter([i for i in range(n)])

# generator: super slow, because they yield one result at a time (using less memory).
gene = (i for i in range(n))

for xs in [iter_, list_comp, iter_list_comp, gene]:
    start = time.time()
    sum(xs)
    end = time.time()
    print(f"type: {type(xs)}, Time: {(end-start):.3f} [sec]")

type: <class 'range_iterator'>, Time: 0.130 [sec]
type: <class 'list'>, Time: 0.043 [sec]
type: <class 'list_iterator'>, Time: 0.132 [sec]
type: <class 'generator'>, Time: 0.333 [sec]


## 2.2 Use explicit static definition
Defining the variable type speeds up the code since the compiler does not have to interpret. The speed gain is noticeable just the first time you run it. Anycase, it is good advice to use explicit definition since it helps identify bugs.

In [19]:
import time
import numpy.typing as npt
import numpy as np

def supersum(x:int,y:float,w:bool, arr: npt.ArrayLike)->np.ndarray:
    z=sum(i for i in range(x))
    if w==True:
        z=z/y*arr
    return z 
def simplesum(x,y,w,arr):
    z=sum(i for i in range(x))
    if w==True:
        z=z/y*arr
    return z

nparray=np.ones((1000,500000))

start = time.time()
zz=supersum(1000000,33,True,nparray)
end = time.time()
print(f"static definition method. Time:{(end-start):.5f}[sec]")

start = time.time()
zz=simplesum(1000000,33,True,nparray)
end = time.time()
print(f"normal method. Time:{(end-start):.5f}[sec]")

static definition method. Time:0.61471[sec]
normal method. Time:0.62027[sec]


## 2.3 Iteration in two arrays: comparing 3 alternatives to compute expectations
if speed is a concern, use matrix operations and avoid to use zip()

In [10]:
def double_iteration(values: npt.ArrayLike,probabilities: npt.ArrayLike)->int:
    mult1=np.empty(len(probabilities))
    for ind,prob in enumerate(probabilities):
        mult1[ind]=prob*values[ind]
    expectation=np.sum(mult1)
    return expectation

def zip_iteration(values: npt.ArrayLike,probabilities: npt.ArrayLike)->int:
    mult1=np.empty(len(probabilities))
    for value,prob in zip(values,probabilities):
        np.append(mult1,prob*value)
    expectation=np.sum(mult1)
    return expectation

def matrix_operation(values: npt.ArrayLike,probabilities: npt.ArrayLike)->int:
    expectation=values.T@probabilities #np.dot() takes almost the same time
    return expectation

In [11]:
probabilities=np.ones(1000)*1/1000
values=np.linspace(0,100,1000)

# first method (the second place)
start = time.time()
mat1=double_iteration(values,probabilities)
end = time.time()
print(f"double_iteration: Result: {mat1:.2f}, Time: {(end-start):.5f}[sec]")

# second method (the slowest)
start = time.time()
mat2=zip_iteration(values,probabilities)
end = time.time()
print(f"zip_iteration: Result: {mat2:.2f}, Time: {(end-start):.5f}[sec]")

# third method (the fastest!)
start = time.time()
mat3=matrix_operation(values,probabilities)
end = time.time()
print(f"matrix_operation: Result: {mat3:.2f}, Time: {(end-start):.5f}[sec]")


double_iteration: Result: 50.00, Time: 0.00026[sec]
zip_iteration: Result: 49.42, Time: 0.00602[sec]
matrix_operation: Result: 50.00, Time: 0.00149[sec]


## 2.4 What is taking so long? Profiling
We need to identify the functions/operations that are taking more time. This allows us to focalized the effort to improve the few lines of code with more impact on the total time ("tottime").

In [22]:
import cProfile
import pstats

with cProfile.Profile() as pr:
    #functions defined previously
    supersum(1000000,33,True,nparray)  
    zip_iteration(values,probabilities)
    gene = sum((i*i*i for i in range(10000)))
    
stats=pstats.Stats(pr)
stats.sort_stats(pstats.SortKey.TIME)
stats.print_stats()

         1025019 function calls (1023019 primitive calls) in 0.712 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.577    0.577    0.705    0.705 /tmp/ipykernel_223994/3952873066.py:5(supersum)
  1000001    0.069    0.000    0.069    0.000 /tmp/ipykernel_223994/3952873066.py:6(<genexpr>)
        2    0.059    0.030    0.129    0.064 {built-in method builtins.sum}
3001/1001    0.002    0.000    0.005    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
    10001    0.001    0.000    0.001    0.000 /tmp/ipykernel_223994/2850951957.py:8(<genexpr>)
     1000    0.001    0.000    0.004    0.000 /home/marcelo/anaconda3/lib/python3.8/site-packages/numpy/lib/function_base.py:4762(append)
        1    0.001    0.001    0.006    0.006 /tmp/ipykernel_223994/1934078803.py:8(zip_iteration)
     1000    0.000    0.000    0.001    0.000 /home/marcelo/anaconda3/lib/python3.8/site-packages/numpy/co

<pstats.Stats at 0x7f09e8760bb0>

the results indicate that the total time was 0.712 seconds. It also says that the function supersum is taking 0.577 sec. It also indicates that the generator is taking 0.069 sec and the respective sum 0.059 sec.