# Profiling and discussion

## Lorenz 96

In [1]:
# Add profiling code here
import cProfile
import automata
import numpy as np

def run_lorenz96():
    initial_state = np.random.random(16)
    result = automata.lorenz96(initial_state, 1000)

cProfile.run('run_lorenz96()')

         45008 function calls (42008 primitive calls) in 0.031 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.031    0.031 313247980.py:6(run_lorenz96)
        1    0.000    0.000    0.031    0.031 <string>:1(<module>)
        1    0.006    0.006    0.031    0.031 automata.py:10(lorenz96)
     3000    0.000    0.000    0.000    0.000 multiarray.py:85(empty_like)
     6000    0.000    0.000    0.000    0.000 numeric.py:1125(_roll_dispatcher)
6000/3000    0.019    0.000    0.025    0.000 numeric.py:1129(roll)
     3000    0.000    0.000    0.000    0.000 numeric.py:1216(<dictcomp>)
     3000    0.002    0.000    0.004    0.000 numeric.py:1330(normalize_axis_tuple)
     3000    0.001    0.000    0.001    0.000 numeric.py:1380(<listcomp>)
     3000    0.000    0.000    0.000    0.000 {built-in method _operator.index}
        1    0.000    0.000    0.031    0.031 {built-in method builtins.exec}


In [2]:
initial_state = np.random.rand(64)

def lorenz96(initial_state, nsteps, constants=(1/101, 100, 8)):

    alpha, beta, gamma = constants
    N = len(initial_state)
    
    # Create an array to store the time series of cell values
    time_series = []
    current_state = np.array(initial_state)

    for _ in range(nsteps):
        new_state = np.zeros(N)
        for i in range(N):
            new_state[i] = alpha * (beta * current_state[i] + (current_state[(i - 2) % N] - current_state[(i + 1) % N]) * current_state[(i - 1) % N] + gamma)
        current_state = new_state.copy()
        time_series.append(current_state.copy())

    return time_series[-1]

%timeit lorenz96(initial_state, 100)

def lorenz96_vectorized(initial_state, nsteps, constants=(1/101, 100, 8)):
    alpha, beta, gamma = constants
    # Work on a copy so the caller's array is left untouched
    current_state = initial_state.copy()

    for _ in range(nsteps):
        # np.roll applies the periodic boundary: rolling by k puts x[i - k] at index i
        x_minus_2 = np.roll(current_state, 2)
        x_minus_1 = np.roll(current_state, 1)
        x_i = current_state
        x_plus_1 = np.roll(current_state, -1)

        # Update every cell at once instead of looping over i
        current_state = alpha * (beta * x_i + (x_minus_2 - x_plus_1) * x_minus_1 + gamma)

    return current_state

%timeit lorenz96_vectorized(initial_state, 100)

2.56 ms ± 2.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.25 ms ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


#### Add discussion here (lorenz96)

The original implementation from ChatGPT is correct in terms of its logic and is a straightforward representation of the Lorenz '96 equations. It was easy to understand and modular.

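The initial code from ChatGPT uses nested `for` loops to traverse the array. This is simple to implement, but the run time grows with the array size because every cell is updated by interpreted Python code. In this task we use NumPy's `roll` function to build the shifted neighbor arrays, which replaces the inner loop over cells. For a 64-element state and 100 steps, the timings above show the vectorized version running about twice as fast (1.25 ms versus 2.56 ms), and the gap should widen for larger arrays.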

NumPy's vectorized operations run in compiled C code, so they are much faster than the equivalent native Python loops. By avoiding explicit loops and using built-in NumPy functions, we benefit from those optimizations. Reducing array copies and memory operations inside the iteration also tends to improve performance.
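As a quick check of the reasoning above, the following sketch compares one loop-based update with one `np.roll`-based update of the same state. The helper names `step_loop` and `step_roll` are made up for this illustration and are not part of `automata`; the constants are the defaults used in the cell above.

```python
import numpy as np


def step_loop(x, alpha=1/101, beta=100, gamma=8):
    """One update step using explicit modular indexing (illustrative helper)."""
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        out[i] = alpha * (beta * x[i]
                          + (x[(i - 2) % n] - x[(i + 1) % n]) * x[(i - 1) % n]
                          + gamma)
    return out


def step_roll(x, alpha=1/101, beta=100, gamma=8):
    """The same update step, with np.roll supplying the shifted neighbors."""
    # np.roll(x, k)[i] == x[(i - k) % n], so a shift of 2 gives x[i - 2]
    return alpha * (beta * x + (np.roll(x, 2) - np.roll(x, -1)) * np.roll(x, 1) + gamma)


rng = np.random.default_rng(0)  # fixed seed so the comparison is repeatable
x = rng.random(64)
print(np.allclose(step_loop(x), step_roll(x)))
```

If the two helpers agree, the vectorized version can replace the loop without changing the results, and only the run time differs.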

We could perhaps speed the code up further by applying Python decorators to specific functions, for example a just-in-time compilation decorator such as Numba's `@njit`.

## Game of Life

In [3]:
# Add profiling code here
import cProfile
import automata
import numpy as np

def run_life():
    initial_state = np.random.random((16, 16))>0.6
    result = automata.life(initial_state, 100)

cProfile.run('run_life()')

         421207 function calls in 0.094 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.094    0.094 1748497511.py:6(run_life)
        1    0.000    0.000    0.094    0.094 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 automata.py:120(<listcomp>)
     1600    0.001    0.000    0.001    0.000 automata.py:122(<listcomp>)
        1    0.008    0.008    0.094    0.094 automata.py:47(life)
    25600    0.073    0.000    0.085    0.000 automata.py:66(count_neighbors_2d)
        1    0.000    0.000    0.094    0.094 {built-in method builtins.exec}
   394000    0.012    0.000    0.012    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'random' of 'numpy.random.mtrand.RandomState' objects}




In [4]:
initial_state = np.random.random((10, 10))>0.6
%timeit automata.life(initial_state, 10)

#print(initial_state)
#print(automata.life(initial_state, 1))

1.57 ms ± 7.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


#### Add discussion here (life)

Compared with the previous task, the Game of Life implementation showed more of GPT's strengths but also exposed some problems. An LLM has a strong grasp of human language, but it also has limitations. It handles a single rule well; however, when faced with a more complex (or unclear) statement, it carries on with the logic it has already set up and does not ask for clarification of the part that caused the confusion.

GPT completes the task by expanding it sequentially, step by step. The resulting code is readable but lengthy, and its efficiency is poor on larger problems. For example, with a 1024 × 1024 initial matrix, a single iteration takes up to 20 s with the current code, so running many iterations would be impractical.
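To see where the time goes on larger inputs, it can help to sort the profile by cumulative time instead of the default ordering by name. The sketch below uses the standard `cProfile` and `pstats` modules; `slow_neighbor_sum` is a made-up stand-in workload, not the `automata.life` function.

```python
import cProfile
import io
import pstats


def slow_neighbor_sum(n):
    """Hypothetical stand-in workload: pure-Python nested loops."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += (i + j) % 2
    return total


# Collect the profile for a single call
profiler = cProfile.Profile()
profiler.enable()
slow_neighbor_sum(200)
profiler.disable()

# Sort by cumulative time and keep only the top five entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorted this way, the most expensive call chains appear at the top of the report, which makes a hot spot such as a per-cell neighbor count easier to spot.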

In this task, GPT stored the survival and birth conditions in arrays, which should be faster than testing them with multiple `if` statements. However, because of the complexity of the task, we defined two different neighbor structures (2D and 3D) in the same function and handled both cases there, so every call carries code that it may not need. The profile above shows the cost: `count_neighbors_2d` is called 25,600 times (once per cell per step for a 16 × 16 grid over 100 steps) and accounts for 0.085 s of the 0.094 s total. The nesting of multiple conditionals and loops makes the code slow.
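The idea of storing the rules in arrays can be sketched as a lookup table indexed by the neighbor count. This is only an illustration: it assumes the standard Conway rules (birth on 3 neighbors, survival on 2 or 3), and the names `birth`, `survive` and `apply_rules` are hypothetical rather than taken from `automata`.

```python
import numpy as np

# Assumed rule tables for a 2D grid: index = number of live neighbors (0..8)
birth = np.zeros(9, dtype=bool)
birth[3] = True            # a dead cell with 3 live neighbors becomes alive
survive = np.zeros(9, dtype=bool)
survive[[2, 3]] = True     # a live cell with 2 or 3 live neighbors stays alive


def apply_rules(alive, neighbors):
    """Return the next state from the current state and the neighbor counts."""
    # Live cells look up the survival table, dead cells the birth table
    return np.where(alive, survive[neighbors], birth[neighbors])


alive = np.array([True, True, False, False])
neighbors = np.array([2, 4, 3, 2])
print(apply_rules(alive, neighbors))
```

Because the lookup is a single indexing operation over the whole grid, no per-cell `if` statement is needed.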

To optimize this part of the code, we can try third-party libraries in place of the multiple loops. More importantly, we can split the work into separate functions for the different argument types, in the spirit of function overloading (Python does not overload functions by argument type natively, so the 2D or 3D variant would be selected once per call). Each call would then run fewer statements that it does not need.
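One possible direction, sketched below, is to count neighbors for the whole grid at once with `np.roll`, looping only over the neighbor offsets rather than over the cells. The function name `count_neighbors` is made up for this example, and the sketch assumes periodic boundaries, which may not match the boundary handling in `automata.life`. Because the offsets are generated from `grid.ndim`, the same function covers both the 2D and the 3D case.

```python
import itertools

import numpy as np


def count_neighbors(grid):
    """Count live neighbors of every cell, assuming periodic boundaries."""
    grid = np.asarray(grid, dtype=int)
    axes = tuple(range(grid.ndim))
    total = np.zeros_like(grid)
    # 8 offsets in 2D, 26 in 3D; the all-zero offset is the cell itself
    for offset in itertools.product((-1, 0, 1), repeat=grid.ndim):
        if any(offset):
            total += np.roll(grid, offset, axis=axes)
    return total


# A horizontal blinker on a 5 x 5 grid
blinker = np.zeros((5, 5), dtype=bool)
blinker[2, 1:4] = True
counts = count_neighbors(blinker)
print(counts)

# A single live cell in a 4 x 4 x 4 grid
cube = np.zeros((4, 4, 4), dtype=bool)
cube[1, 1, 1] = True
cube_counts = count_neighbors(cube)
```

The Python-level loop now runs 8 or 26 times per step regardless of the grid size, so the per-cell work is done inside NumPy.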