### MY470 Computer Programming
# Order of Growth
### Week 9 Lab

- The list is not exhaustive.
- log: every time the input doubles (base 2) or grows by a factor of 10 (base 10), the run time increases by one step
    - regardless of where you start
- linear: each extra element adds a constant amount of time, so n extra elements need n extra time units
- **log-linear: sorting**
- polynomial: inefficient for large data; if quadratic, doubling the input quadruples the run time, **typically with nested loops**
- exponential: infeasible for anything but small inputs


- We always keep the dominant term: polynomial dominates logarithmic, so it is **important to remember the order in which the classes come**.
- Benchmarking works for any piece of code: record the time, run the code, and record the time again afterwards.
    - To make benchmarking more reliable, run the code multiple times if possible.
- Execution time also depends on other processes running at the same time.
- The `timeit` module does this for you. The catch is that you pass the code as strings, including a setup string that varies depending on what we're doing. It also accepts a callable such as a lambda.

- `sum()` is also O(n): it has to touch every element once.

![Big-O Comparison](figs/big-o-table.png "Big-O Comparison")

**What is the usual big O of a while loop?**

In [None]:
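def contains_repeated(L, e):
    # Hypothetical wrapper so the snippet runs: `return` is only valid inside a function.
    # The outer loop repeats a constant 100 times, independent of len(L),
    # so the worst case is 100 * len(L) steps -> O(n), where n = len(L).
    for i in range(100):
        for e1 in L:
            if e1 == e:
                return True
    return False  # only give up after all 100 passes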

In [None]:
# (1) Simple loop with a fixed upper bound
n = 16  # example input size
i = 0
while i < n:
    i += 1

# Analysis: The loop runs as long as i < n. Each time through the loop, i is incremented by 1, so the loop will execute n times.
# Time complexity: Since the loop executes n times and the body of the loop runs in constant time, the overall time complexity is O(n).

In [None]:
# (2) Loop whose counter doubles on each iteration
n = 16  # example input size
i = 1
while i < n: # every time the size of the input doubles, the algorithm performs one additional step
    i *= 2
# Analysis: In this case, i is doubled after each iteration. The loop will run as long as i is less than n, 
# and since i is doubling, the number of iterations is proportional to the logarithm of n.
# Time complexity: The loop will execute approximately log(n) times, so the time complexity is O(log n).

In [None]:
# (3) Nested loops
n = 16  # example input size
i = 0
while i < n:
    j = 0
    while j < n:
        j += 1
    i += 1
# Analysis: Here, you have a nested while loop, with both the outer and inner loops running n times.
# Time complexity: Since the inner loop runs n times for each iteration of the outer loop, the total number of iterations is n * n = n^2, 
# so the time complexity is O(n^2).

In [None]:
# (4) Loop whose counter grows as i = 2i + 1
n = 16  # example input size
i = 0
while i < n:
    i += i + 1

# Analysis: Here, i grows exponentially, as it's incremented by i + 1 on each iteration. The value of i increases very quickly (e.g., 1, 3, 7, 15,...).
# Time complexity: The number of iterations is proportional to the logarithm of n, because i roughly doubles on each iteration
# (i + 1 doubles exactly: 1, 2, 4, 8, 16, ...). So the time complexity is O(log n).

Summary: The key to determining the Big-O of a while loop is to understand how the loop’s condition changes with each iteration and how many iterations are needed before the condition is no longer met.
- O(1): If the loop runs a constant number of times (e.g., a fixed number of iterations).
- O(n): If the loop runs n times.
- O(log n): If the loop's condition changes in a way that cuts the problem size down exponentially (e.g., doubling a variable each time).
- O(n^2): If there are nested loops that both run n times.
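To check the summary empirically, here is a minimal sketch that counts how many times each loop body runs for a given n. The helper `count_iterations` is hypothetical, and so is the fixed bound of 10 used for the constant case; both are for illustration only.

```python
def count_iterations(n):
    """Count loop-body executions for the while-loop patterns above.

    Hypothetical helper for illustration; returns a dict of step counts.
    """
    counts = {}

    # O(1): a fixed number of iterations (10), independent of n
    i, steps = 0, 0
    while i < 10:
        i += 1
        steps += 1
    counts["constant"] = steps

    # O(n): counter increases by 1
    i, steps = 0, 0
    while i < n:
        i += 1
        steps += 1
    counts["linear"] = steps

    # O(log n): counter doubles
    i, steps = 1, 0
    while i < n:
        i *= 2
        steps += 1
    counts["logarithmic"] = steps

    # O(n^2): nested loops that both run n times
    i, steps = 0, 0
    while i < n:
        j = 0
        while j < n:
            j += 1
            steps += 1
        i += 1
    counts["quadratic"] = steps

    return counts


for n in (16, 32, 64):
    print(n, count_iterations(n))
```

Each time n doubles, the counts change in characteristic ways:
- The logarithmic count grows by one (4, 5, 6).
- The linear count doubles.
- The quadratic count quadruples (256, 1024, 4096).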

## Understanding O(log n)

- The logarithmic time complexity, O(log n), describes **an algorithm whose execution time grows much more slowly as the input size (n) increases.** 
- O(log n) happens **when the problem size is halved (or reduced by some constant factor) at each step.** 
- Essentially, **with each step, you "reduce the problem" significantly**, and this leads to fewer total steps as n increases.

### Example 1: Doubling or Halving the Problem
Let’s look at an example where we reduce the problem size by half in each step. Imagine you have a number, and at each step, you divide it by 2. The question is: how many times do you need to divide by 2 until you get to 1?

In [None]:
n = 16
steps = 0
while n > 1:
    n //= 2  # Integer division by 2
    steps += 1
print(steps)

# How the loop runs:
# Step 1: n = 16, divide by 2 → n = 8, steps = 1
# Step 2: n = 8, divide by 2 → n = 4, steps = 2
# Step 3: n = 4, divide by 2 → n = 2, steps = 3
# Step 4: n = 2, divide by 2 → n = 1, steps = 4

# This took 4 steps. If n started at 32, it would have taken 5 steps. If n was 64, it would have taken 6 steps, and so on.
# As you can see, doubling the size of n only increases the number of steps by 1.


### Why log n and not just n?
- If an algorithm has a time complexity of O(n), that means that the time it takes to run increases directly with the size of the input. For example, if n = 10, it takes 10 steps; if n = 1000, it takes 1000 steps.
- **But O(log n) means the time doesn't grow linearly. Instead, it grows much slower.** So **if you were to double the input size, the number of steps would increase only slightly.** If you think of this in terms of halving the problem (like in the example above), you can see that the number of operations required to solve a large problem is much smaller than for linear growth.
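A quick way to see the gap is to compare n with the number of halving steps from Example 1 for a few input sizes. This is a minimal sketch; `halving_steps` is a hypothetical helper that reuses the loop from Example 1.

```python
def halving_steps(n):
    """Number of times n can be integer-divided by 2 before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for n in (10, 1_000, 1_000_000):
    # Linear: about n steps; logarithmic: about log2(n) steps
    print(f"n = {n:>9,}: linear ~ {n:>9,} steps, halving = {halving_steps(n)} steps")
```

The halving counts come out as 3, 9 and 19, while the linear counts grow to a million.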

### Example 2: Binary Search (A Classic Example of O(log n))
One of the best-known examples of an algorithm with O(log n) complexity is binary search. Binary search is an algorithm used to find an element in a sorted list (or array) by repeatedly dividing the search space in half.

In [None]:
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2  # Find middle index
        if arr[mid] == target:
            return mid  # Found the target
        elif arr[mid] < target:
            left = mid + 1  # Narrow down to the right half
        else:
            right = mid - 1  # Narrow down to the left half
    return -1  # Target not found

# How it works:
# Initial state: You start with the entire array (let’s say it has n elements).
# Step 1: You check the middle element. If it’s the target, you're done. If it's not, you either search the left half or the right half.
# Step 2: You now have half the array to search through. The next time you check the middle, it’s again halving the remaining search space.
# Step 3: Repeat this process: every time, you're cutting the number of elements you have to look at by half.

# Why this is O(log n):
# If the list has 16 elements, the search goes like this:
# First check: mid = 7 (the middle of indices 0..15).
# Second check: mid = 3 or 11 (the middle of the remaining half).
# Third check: mid = 1, 5, 9, or 13.
# And so on, until you find the target or the search space is empty.

- If you have an array of size n, binary search needs at most about log₂(n) + 1 comparisons to either find the target or determine that it is not in the array.
- So, if you have 1000 elements, it will take roughly log₂(1000) ≈ 10 steps in the worst case.
- **Compare that to a linear search, which would take 1000 steps in the worst case.**
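To compare the two claims directly, here is a minimal sketch that counts loop iterations for the `binary_search` logic above and for a plain linear search. Both run over a sorted list of 1000 integers. The two counting functions are hypothetical variants written for illustration.

```python
def binary_search_steps(arr, target):
    """Same logic as binary_search above, but returns (index, iterations)."""
    left, right = 0, len(arr) - 1
    iterations = 0
    while left <= right:
        iterations += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid, iterations
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1, iterations


def linear_search_steps(arr, target):
    """Check every element in turn; returns (index, iterations)."""
    iterations = 0
    for i, value in enumerate(arr):
        iterations += 1
        if value == target:
            return i, iterations
    return -1, iterations


arr = list(range(1000))
# Worst case over every target in the list, plus two that are missing (-1 and 1000)
targets = list(range(-1, 1001))
worst_binary = max(binary_search_steps(arr, t)[1] for t in targets)
worst_linear = max(linear_search_steps(arr, t)[1] for t in targets)
print("binary search, worst case:", worst_binary)
print("linear search, worst case:", worst_linear)
```

This should report 10 iterations for binary search against 1000 for linear search.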

## General rule for O(log n)

- The key characteristic of O(log n) time complexity is that **with each iteration, you reduce the problem size significantly**, **typically by half, which leads to a very slow increase in the number of steps required as the input size increases**

- To put it simply:
    - If you double n, you only increase the number of steps by 1.
    - If you double n again, you increase the steps by just another 1.
- This makes O(log n) very efficient for large input sizes compared to linear time O(n) or quadratic time O(n²).

### Another example of O(log n): Computing Powers (Exponentiation by Squaring)
- Exponentiation by squaring is an efficient algorithm to calculate large powers of a number in logarithmic time.

In [None]:
def power(x, n):
    result = 1
    while n > 0:
        if n % 2 == 1:  # If n is odd
            result *= x
        x *= x  # Square the base
        n //= 2  # Divide n by 2 (integer division)
    return result

# Example usage
print(power(2, 10))  # Output: 1024 (2^10)

# Explanation: In this example, the exponent n is halved in each iteration (n //= 2), and the base x is squared.
# This reduces the number of multiplications needed: the time complexity is O(log n)
# because the number of times the loop runs is proportional to the logarithm of the exponent n
# (the naive approach, x * x * ... * x, needs n - 1 multiplications).

# Exponentiation by Squaring: The exponent n is halved in each step, reducing the number of multiplications.
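To see the logarithmic count, here is a minimal sketch that instruments the `power` loop above and compares its iteration count with the n - 1 multiplications of the naive approach. `power_with_count` is a hypothetical variant. It treats each multiplication as O(1), which is only an approximation for Python's arbitrarily large integers.

```python
def power_with_count(x, n):
    """Exponentiation by squaring; returns (x**n, loop_iterations)."""
    result = 1
    iterations = 0
    while n > 0:
        iterations += 1
        if n % 2 == 1:  # odd exponent: fold the current base into the result
            result *= x
        x *= x          # square the base
        n //= 2         # halve the exponent
    return result, iterations


for n in (10, 100, 1000):
    value, iterations = power_with_count(3, n)
    assert value == 3 ** n  # sanity check against Python's built-in power
    print(f"n = {n:>4}: {iterations} loop iterations vs {n - 1} naive multiplications")
```

The loop runs once per binary digit of n: 4, 7 and 10 iterations for n = 10, 100 and 1000.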

## Understanding O(n log n)

- The O(n log n) time complexity is the product of O(n) and O(log n).
- It arises in algorithms that **involve iterating through a collection of elements (O(n))** and, **at each step, performing an operation that takes logarithmic time (O(log n)).**

### General Intuition
- (1): O(n) means an operation that grows linearly with the size of the input: if you have an array of size n, processing each element once takes n operations.
- (2): O(log n) means an operation that grows logarithmically, where each step reduces the problem size significantly (usually by half).
- ==> When you multiply these two together, you get O(n log n), which indicates that you perform a linear amount of work (O(n)) and, for each piece of work, you do something logarithmic (O(log n)).

### Summary of O(n log n)
- (1) O(n log n) arises when an algorithm processes each element once (O(n)) 
- (2) and **performs some logarithmic operation on each element (e.g., splitting, merging, partitioning, or rebalancing).**
- The main idea behind O(n log n) is that you're combining a linear process with a logarithmic one. This is common in divide-and-conquer algorithms, where the input is recursively divided into smaller parts (log n levels of division) and each level is processed in linear time (n).

### Example 1: Merge Sort (Divide and Conquer)

- (1): Divide: The array is repeatedly divided in half until each subarray has only one element.
- (2): Conquer: These subarrays are then merged back together in sorted order.
- ==> Because each division halves the array, there are O(log n) levels of recursion. At each level, merging the subarrays back together takes linear work (O(n)) in total.

In [None]:
# Merge Sort Code Example
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    # Find the middle of the array
    mid = len(arr) // 2
    # Recursively split the array into two halves
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    
    return merge(left, right)

# Divide: In each recursive call, the array is divided into two parts. There are log n levels of division, as each division halves the array size.

def merge(left, right):
    sorted_arr = []
    i = j = 0
    # Merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            sorted_arr.append(left[i])
            i += 1
        else:
            sorted_arr.append(right[j])
            j += 1
    # Add any remaining elements
    sorted_arr.extend(left[i:])
    sorted_arr.extend(right[j:])
    return sorted_arr

# Merge: At each level, the merging step requires linear time (O(n)) to combine the subarrays.

# Example usage
arr = [38, 27, 43, 3, 9, 82, 10]
sorted_arr = merge_sort(arr)
print(sorted_arr)

# Thus, the total time complexity is O(n log n), where:
# n is the total number of elements being sorted.
# log n is the number of levels of recursion (because the array is halved each time).
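To check the n log n count, here is a minimal sketch of the same merge sort that also counts how many elements all the merge steps write in total. The function name `merge_sort_counting` is hypothetical, and it inlines the `merge` logic above so the block is self-contained. When n is a power of two, every level of recursion writes all n elements and there are log₂ n levels, so the total should be exactly n · log₂ n.

```python
import random


def merge_sort_counting(arr):
    """Return (sorted_list, writes): writes counts elements appended by all merges."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, writes_left = merge_sort_counting(arr[:mid])
    right, writes_right = merge_sort_counting(arr[mid:])

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    # This merge wrote len(merged) elements: linear work at this node
    return merged, writes_left + writes_right + len(merged)


for n in (8, 64, 1024):
    data = random.sample(range(10 * n), n)
    result, writes = merge_sort_counting(data)
    assert result == sorted(data)
    levels = n.bit_length() - 1  # log2(n) for powers of two
    print(f"n = {n:>4}: {writes} writes, n * log2(n) = {n * levels}")
```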


### Example 2: Quick Sort (Another Divide and Conquer Algorithm)

Quick Sort is another famous sorting algorithm that also has an average time complexity of O(n log n). It works by:

- Choosing a pivot: Select a pivot element from the array.
- Partitioning: Rearrange the array so that elements smaller than the pivot come before it, and elements larger than the pivot come after it.
- Recursively sorting: Recursively apply the same strategy to the subarrays on either side of the pivot.

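On average, Quick Sort divides the problem into two roughly equal parts. This leads to about log n levels of recursive calls, and at each level it processes all n elements (O(n) work per level). If the pivot is consistently bad (e.g. always the smallest element), the splits are lopsided and the worst case is O(n²).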

In [None]:
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# Example usage
arr = [38, 27, 43, 3, 9, 82, 10]
sorted_arr = quick_sort(arr)
print(sorted_arr)
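The "on average" matters: the depth of the recursion depends on how evenly the pivot splits the list. Here is a minimal sketch that measures recursion depth in two cases. The first is the middle-element pivot used above. The second, a hypothetical contrast, is a first-element pivot on already-sorted input, which is a classic bad case. `quick_sort_depth` and its `pivot_index` parameter are illustrative and not part of the original code.

```python
def quick_sort_depth(arr, pivot_index=lambda a: len(a) // 2):
    """Return the recursion depth of the list-based quick_sort above.

    pivot_index picks the pivot position (middle element by default).
    """
    if len(arr) <= 1:
        return 0
    pivot = arr[pivot_index(arr)]
    left = [x for x in arr if x < pivot]
    right = [x for x in arr if x > pivot]
    return 1 + max(quick_sort_depth(left, pivot_index),
                   quick_sort_depth(right, pivot_index))


sorted_input = list(range(200))
# Balanced splits: depth about log2(n)
print("middle pivot:", quick_sort_depth(sorted_input))
# Lopsided splits: depth about n
print("first pivot: ", quick_sort_depth(sorted_input, lambda a: 0))
```

On 200 sorted elements this gives a depth of 7 with the middle pivot and 199 with the first-element pivot. Each level still does O(n) work, so the lopsided case degrades to O(n²).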


## Runtime: Benchmarking

Use the `time` module:

1. Save the time immediately before the code
2. Save the time immediately after the code
3. Take the difference: (2) minus (1)

In [None]:
import time

ls = list(range(100000))

start = time.time()
ls.count(99999)
end = time.time()
print(end - start)

## Benchmarking: Repeat to Time More Accurately

* Execution time can be affected by other processes running simultaneously
* Execution time can depend on the order of execution (randomize execution order)

In [None]:
# Do it yourself
ls = list(range(100000))

res = 0
for i in range(100):
    start = time.time()
    ls.count(99999)
    end = time.time()
    res += end - start
print(res / 100)

# Use a module
from timeit import timeit 
timeit(stmt='ls.count(99999)', setup='ls = list(range(100000))', number=100) / 100

- Benchmarking helps find inefficiencies in code.
- It is comparative: comparing one version of the code with another tells you which one is faster, but only for your setup. The results won't transfer to another machine, compiler, or programming language.
- Averaging over repeated runs gives more consistent results than a single measurement.
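The standard library supports two refinements:
- `time.perf_counter()` is a higher-resolution clock intended for measuring short durations.
- `timeit.repeat` runs several independent trials, so you can take the minimum, which is less affected by other processes.

Here is a minimal sketch comparing `ls.count` with a hand-written loop. `count_loop` is a hypothetical helper, and the timings will differ from machine to machine.

```python
import time
from timeit import repeat

ls = list(range(100_000))


def count_loop(values, target):
    """Hand-written equivalent of values.count(target)."""
    total = 0
    for x in values:
        if x == target:
            total += 1
    return total


assert count_loop(ls, 99999) == ls.count(99999)  # same answer either way

# One-off measurement with the high-resolution clock
start = time.perf_counter()
ls.count(99999)
single = time.perf_counter() - start

# 5 trials of 20 calls each; keep the fastest trial and divide by calls per trial
builtin = min(repeat(lambda: ls.count(99999), number=20, repeat=5)) / 20
manual = min(repeat(lambda: count_loop(ls, 99999), number=20, repeat=5)) / 20

print(f"single ls.count call:  {single:.6f} s")
print(f"ls.count, best of 5:   {builtin:.6f} s per call")
print(f"count_loop, best of 5: {manual:.6f} s per call")
```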

In [None]:
### R code ###

require(microbenchmark)

ls <- seq(0, 99999)
microbenchmark(sum(ls == 99999))

# Unit: microseconds
#             expr     min      lq     mean  median       uq      max neval
# sum(ls == 99999) 368.309 416.865 684.3047 559.569 706.2215 3955.864   100

- `microbenchmark` package in R.
- Define the sequence as before (here `seq` is inclusive). `ls == 99999` creates a vector of TRUE and FALSE values telling us whether each number is 99999: FALSE, FALSE, FALSE, ... until the last element, which is TRUE. Then we sum it.
-> This gives the number of times the value appears in our vector, which is the same as the `count` method. This is the R way to do it.
- The unit is microseconds, while in Python we get seconds by default. To compare, multiply the Python estimate in seconds by 1,000,000. The run discussed in class averaged about 227 microseconds (the output above is from a different run). That is roughly the same order as Python, with R faster.
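For a like-for-like comparison, here is a minimal Python sketch of R's `sum(ls == 99999)` approach: compare every element with the target and sum the booleans (`True` counts as 1). It converts `timeit`'s seconds to microseconds so the figure can be read against `microbenchmark`'s output. Treating a generator expression as the closest pure-Python analogue is an assumption, and the numbers depend on the machine.

```python
from timeit import timeit

ls = list(range(100_000))  # 0, 1, ..., 99999, like seq(0, 99999) in R

# Sum the booleans: same answer as ls.count(99999)
assert sum(x == 99999 for x in ls) == ls.count(99999) == 1

runs = 20
seconds = timeit(lambda: sum(x == 99999 for x in ls), number=runs) / runs
microseconds = seconds * 1_000_000  # 1 second = 1,000,000 microseconds
print(f"{microseconds:.1f} microseconds per call")
```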

## Runtime: Order of Growth

* Consider the worst-case scenario
* Look at:
    * Function and method calls 
    * Recursive calls
    * Loops
* **(1) Keep the term with the largest growth rate**
* **(2) Drop any constants from the remaining term**

- String methods from week 2:
    - `replace()` is a loop: the function has to go through every single character looking for the substring (in the worst case it is not there at all), and replaces it wherever it exists.
    - `rstrip()` is not a full loop: it starts at the end (or at the beginning, for `lstrip()`) and stops as soon as it encounters a character that is not one of those being stripped.
- Both are still O(n). The worst case for stripping is a string in which every character has to be stripped (e.g. a string made only of newline characters), so we have to touch everything, just as with `replace()`. In actual benchmarks, though, `rstrip()` is more efficient than `replace()` most of the time.
- Recursive calls: a function calling itself.
- Loops: a simple loop over the input is usually O(n). Exercise 2, for example, has two simple loops.


**Exercise 1**: The following functions show the average number of operations required to perform some algorithm on a list of length $n$. Give the Big-O notation for the time complexity of each algorithm:

a) $4n^2 + 2n + 2$

b) $n + \log n$

c) $n \log n$

d) 3

- a) O(n²): only keep the dominant term and get rid of constants.
- b) O(n): going back to the table, n dominates log n, so we drop log n and end up with linear.
- c) O(n log n): this is log-linear. **We can get rid of added terms, but not of multiplied ones**, so this is its own class.
- d) O(1).
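Why is it justified to drop lower-order terms and constants for (a) and (b)? It is enough to bound each expression by a constant multiple of its dominant term for n ≥ 1, using n ≤ n² and log n ≤ n:

$$
\begin{aligned}
\text{(a)}\quad 4n^2 + 2n + 2 &\le 4n^2 + 2n^2 + 2n^2 = 8n^2 &&\Rightarrow\; O(n^2)\\
\text{(b)}\quad n + \log n &\le n + n = 2n &&\Rightarrow\; O(n)
\end{aligned}
$$

(c) cannot be simplified this way: n log n is not bounded by any constant multiple of n, because log n keeps growing.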

- The quiz will look like this.
- Look through the code, give the order of growth (for time complexity), and explain the logic in one sentence. You can count the steps, or make a narrative statement ("there is a loop, and the number of steps in the loop depends on the length of the input"), so that you are not guessing randomly.
- You don't have to write any code in the assignment.

Exercise 2 - My answer
Two loops iterate over the length of the list, and both touch each item. That gives 2·O(n), which simplifies to O(n): the order of growth is linear.
- Written with l for the length of the list, the count is O(2l). In Big-O notation we drop the constant factor 2, so the total time complexity is O(l), the same linear answer.

ChatGPT: 
- Both loops are independent and occur sequentially:
Independent:
- The two loops do not depend on each other in terms of the operations they perform. The first loop sums the elements, and the second loop calculates the product of the elements. The two operations are separate in terms of logic; one doesn't affect the other.
- Specifically, there is **no data or condition that connects the first loop to the second loop. The first loop completes before the second loop starts.**

Sequentially:
- The loops are executed one after the other, not concurrently or in parallel.
- In terms of time complexity, this means that the execution time of the second loop adds to the execution time of the first loop. **The loops don't run simultaneously, so we add the time taken by each loop.**

Her solution: O(n), where n is len(ls). The fact that there are two loops over the list is irrelevant, as we ignore constants.

In [4]:
# Exercise 2: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def sum_product(ls):
    summ, product = 0, 1
    for i in range(len(ls)): # O(l), where l is the length of the list
        summ += ls[i] # this is O(1), and we do it l times
        # each operation inside the loop (summ += ls[i]) is constant time O(1), so the total time for this loop is O(l).
    for j in range(len(ls)): # O(l): this loop also runs len(ls) times
        product *= ls[j] # O(1) for each iteration
        # each operation inside the loop (product *= ls[j]) is also constant time O(1), so the total time for this loop is O(l).
    return summ, product

# Since both loops are independent and occur sequentially, the total is O(l) + O(l) = O(2l), which simplifies to O(l).




# Your answer (notes from class):
# Two loops over a list; given the input, the runtime might depend on the length of the list (there is no docstring).
# The first loop adds up the elements. This could be concatenation for other types, but we initialise with an int,
# so we assume a list of numeric types, and each sum or product step is constant.
# The second loop does the same for the product. Counting basic steps: two steps to initialise, then n for each loop,
# where n is the length of the list: 2 + 2n. In Big-O notation this is O(n). MAKE SURE YOU SPECIFY WHAT n IS: here n is the length of the list.
# Ignore the size of the numeric types.
# Model answer: the time complexity of the function is O(n), where n is the length of the list ls.
# The function iterates over the list twice, but we ignore constants, hence it is O(n). No mathematical proof is expected,
# just intuition, in ONE OR TWO SENTENCES.
# WE EXPECT THREE ELEMENTS: THE CORRECT ANSWER, A DEFINITION OF WHAT n IS, AND AN EXPLANATION.

Exercise 3 - ChatGPT: 
- The second loop (for j in lb) iterates over the list lb, which has a length of m. **So, for each iteration of the outer loop, the inner loop runs O(m) times.**
- **Since the outer loop runs O(l) times and for each iteration of the outer loop the inner loop runs O(m) times, the total number of operations will be the product of O(l) and O(m).** -> O(l*m)

Her solution:
- Solution: O(ab), where a is len(la) and b is len(lb). The function has two different inputs and its runtime depends on the length of both.

In [None]:
# Exercise 3: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def combine(la, lb):
    for i in la: # O(l) where l is length of la
        # The first loop (for i in la) iterates over the list la, which has a length of l. Therefore, this loop runs O(l) times.
        for j in lb: # O(m) where m is length of lb
        # The second loop (for j in lb) iterates over the list lb, which has a length of m. So, for each iteration of the outer loop, the inner loop runs O(m) times.
            if i < j: # O(1)
                print(i, '-', j) # O(1)
            # Inside the inner loop, the operations if i < j and print(i, '-', j) each take constant time, O(1).
                # ==> O(l*m)
combine("abc", "def")  # prints each pair with i < j; combine() itself returns None

- Two inputs and a nested loop. All the function does is print, so its space complexity is O(1), but in terms of time complexity we are iterating over both inputs.
- Having two inputs is important: we have no information on how their sizes relate, so we assume they are independent and call them a and b.
- The answer is O(ab), where a is the length of la and b is the length of lb. With two inputs, check whether the runtime depends on both (sometimes it does not). Here we iterate over the first input and, for each of its elements, over the second, so if we double either input, the runtime doubles.
- Model answer: the function has two different inputs, and for each element in the first we iterate over each element in the second.
- It mentions the nested loops and the two inputs, and that is why the answer is a product.
- Use a single letter for the size of each input; here we call them a and b.
- We cannot call this quadratic: it depends on the product of the sizes of two inputs, and there is no particular name for it.
- The classes we covered are simplifications, so you don't have to categorise everything you see. Runtimes can be any kind of expression, like this product, which is not linear. Don't feel obliged to fit everything into one of the basic classes.
- It is linear in the size of each input separately (holding the other fixed).
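Counting the inner-loop iterations makes the O(ab) answer concrete. This is a minimal sketch; `combine_steps` is a hypothetical variant of `combine` that counts instead of printing.

```python
def combine_steps(la, lb):
    """Count how many times the inner body of combine() runs."""
    steps = 0
    for i in la:
        for j in lb:
            steps += 1  # one comparison (i < j) per pair
    return steps


print(combine_steps("abc", "defg"))      # 12 = 3 * 4 pairs
print(combine_steps("abcabc", "defg"))   # 24: doubling a doubles the work
print(combine_steps("abc", "defgdefg"))  # 24: doubling b doubles the work too
```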

Exercise 4 - Her solution (I did not get this right: I thought it was O(1))
Solution: **The runtime is the number of digits in the number.**
A number with d digits is less than 10^d. If n ≈ 10^d,
then d ≈ log n (base 10). Hence, the runtime is O(log n).

- I thought each term in the function is O(1). Is that true? Each individual statement is O(1), but the loop body runs once per digit, so the total is O(log n).
- But we have a while loop here; are while loops different from for loops? Not in how we analyse them: in both cases, count how many times the body runs.

In [8]:
# Exercise 4: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def sum_digits(n):
    """Take positive integer n and sum its digits."""
    summ = 0 # This is a simple assignment and takes constant time: O(1).
    while n > 0:
        # In each iteration, n is reduced by a factor of 10 (because of n //= 10),
        # so the number of iterations equals the number of digits in n.
        # If n has d digits, the loop runs d times, where d = O(log n)
        # (the number of digits in n grows logarithmically with n).
        summ += n % 10 # O(1)
        # This extracts the last digit of n and adds it to summ.
        # The modulo operation (n % 10) and the addition are both constant-time operations: O(1).
        n //= 10 # O(1)
        # Integer division drops the last digit; unlike int(n / 10), it stays exact for very large n.
    return summ # O(1)

sum_digits(51)


6

- The function takes an integer n and sums its digits. The key is that at every step we scale n down by a factor of 10 (we divide it by 10). A 5-digit number takes 5 steps (apart from the first assignment), and a 10-digit number takes 10 steps.
- Logarithmic complexity: if we multiply the number by 10, we get one extra step.
- The runtime is O(log n), where n is the integer itself. Here it is clear what n is, so we don't need to specify it.
- The logic: the runtime depends on the number of digits in n.
- A number with d digits is at most 10^d - 1: a 2-digit number is at most 99, and 10^2 is 100.
- If n is roughly 10^d, then d = log n. No mathematical explanation is needed, but the point is that we iterate d times, hence log n times: every time we scale the number up by 10, we add one extra step.

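Here there is a loop, but it scales n differently: to go from 9 to 99 (one extra step), we have to multiply by roughly 10.
- Watch out for whether a loop iterates over each element or skips, and how. Skipping every other element does not change the order of growth (that is just a constant factor of 2), but scaling by a constant factor at every step does.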
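Here is a minimal check of the digits argument: count the loop iterations of Exercise 4's algorithm and compare them with the number of digits. `sum_digits_steps` is a hypothetical instrumented variant. It uses integer division (`//`), which gives the same result as `int(n / 10)` for small n but stays exact for very large integers.

```python
def sum_digits_steps(n):
    """Sum the digits of positive integer n; return (digit_sum, iterations)."""
    summ, iterations = 0, 0
    while n > 0:
        summ += n % 10
        n //= 10
        iterations += 1
    return summ, iterations


for n in (9, 99, 100, 999_999, 10**12):
    digit_sum, iterations = sum_digits_steps(n)
    print(f"n = {n}: {iterations} iterations, {len(str(n))} digits")
```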

If there is a loop, the first intuition is linear, but, as here, it can be something else. Loops can also be nested or depend on two different inputs. In that case, express the runtime as a function of both inputs rather than calling it quadratic.

Exercise 5 - Her solution
Solution: 
- The runtime is O(x^2). The code runs a total of:
x + (x-1) + (x-2) + ... + 1 + 0 steps. 
- The sum of the first x positive integers can be expressed as x * (x + 1) / 2, so O(x^2).
- If you don't know the formula, you just need to notice that there are two nested loops that both depend linearly on size of x.

ChatGPT - Exercise 5
- Outer loop (for i in range(x)):
    - This loop iterates x times because i ranges from 0 to x-1. So, the outer loop runs in O(x) time.
- Inner loop (for j in reversed(range(i, x))):
    - The inner loop depends on i. For each value of i, the inner loop iterates over j from i to x-1.
    - The length of the range for j is x - i. Therefore, for each iteration of i, the inner loop runs O(x - i) times.
- Inside the inner loop (a = a + i + j):
    - The operation inside the inner loop is a constant-time operation, O(1).
- Now, let's compute the total number of operations by summing the iterations of the inner loop for each iteration of the outer loop:
    - For i = 0, the inner loop runs x times (because reversed(range(0, x)) has x elements).
    - For i = 1, the inner loop runs x - 1 times.
    - The total number of iterations is the sum of these values: (x(x+1)) / 2 -> O(x**2)
- The total time complexity of this code is O(x^2) because the number of operations is proportional to the sum of the first x integers, which scales quadratically with x.


**Why is creating the `reversed()` iterator O(1) and not O(n)?**
When you call reversed(), Python doesn't need to iterate through the sequence. Instead, it simply creates an iterator object that will know how to access each element in reverse order when you start iterating over it. It does not copy or create a new list of the reversed elements.

The iterator itself is just a mechanism that knows the current position in the sequence and can give you the next element when you request it.
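A quick way to convince yourself that `reversed()` does no up-front work: reversing a `range` with a trillion elements returns immediately, because only an iterator object is created. The elements are produced one at a time, on demand.

```python
big = range(10**12)

r = reversed(big)  # returns instantly: no list of 10**12 elements is built
print(type(r))     # an iterator object, not a list
print(next(r))     # 999999999999: elements are produced lazily, last one first
print(next(r))     # 999999999998

# For lists, reversed() also returns an iterator rather than a reversed copy
it = reversed([1, 2, 3])
print(list(it))    # [3, 2, 1]: iterating over it is what costs O(n)
```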

In [None]:
# Exercise 5: Give the order of growth for the code 
# and explain your reasoning in a couple of sentences.

x = 10  # example input size (any non-negative integer)

a = 0
for i in range(x):                   # O(x): loop over i from 0 to x-1
    for j in reversed(range(i, x)):  # O(x - i): loop over j from x-1 down to i;
                                     # creating the reversed() iterator itself is O(1)
        a = a + i + j                # O(1): constant-time operation
# Total: x + (x-1) + ... + 1 = x(x+1)/2 iterations -> O(x**2), the dominant term.
# Why not O(x) * O(x*x) = O(x**3)? reversed() does not add a loop of its own:
# it just hands out the same x - i elements in the opposite order.


- Two loops: the outer loop iterates over range(x); the inner loop goes from the current i to the end, in reversed order. Both depend on x, the only unknown.
- O(x^2): it is clear that x plays the role of n, since the input is clear.
- If x is 10, then on the first pass i is 0 and the inner loop runs 10 times. On the next pass it runs 9 times, and so on.
- The sum of the first n positive integers is n(n + 1)/2.
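To confirm the x(x+1)/2 count, here is a minimal sketch that runs the loop from Exercise 5 with a counter instead of `a = a + i + j`. `exercise5_steps` is a hypothetical helper.

```python
def exercise5_steps(x):
    """Count inner-loop iterations of the Exercise 5 code for input x."""
    steps = 0
    for i in range(x):
        for j in reversed(range(i, x)):
            steps += 1
    return steps


for x in (10, 20, 40):
    print(x, exercise5_steps(x), x * (x + 1) // 2)
```

Doubling x roughly quadruples the count (55, 210, 820), the signature of O(x²).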

In [None]:
# Exercise 6: Give the order of growth for the function 
# and explain your reasoning in a couple of sentences.

def factorial(n):
    """Takes non-negative integer n and returns the factorial n!,
    where n! = n * (n-1) * (n-2) ... * 2 * 1
    """
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
        
# My first guess was that recursion is usually exponential, but this function makes only one recursive call per level, so it is linear (see below).

- Runtime of a recursive function: there is no loop, but the function calls itself, and recursion means we keep making calls until we reach the base case (n == 0). There is only one call to itself each time, so given n, the number of calls is roughly n (exactly n + 1, counting factorial(0)). If we double the input, we get twice as many calls, so it is linear in this case: O(n).
- Just looking at loops is not enough: with recursion you need to know what is happening. LOOK AT RECURSIVE FUNCTIONS AND HOW THEY WORK (SEE EARLIER WEEKS).
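Counting calls makes the difference between linear and exponential recursion visible. This is a minimal sketch, and the call counters are a hypothetical addition. The naive Fibonacci function is included only as a contrast: it makes two recursive calls per level, which is what makes it exponential, whereas `factorial` makes one.

```python
def factorial_calls(n):
    """Return (n!, number_of_calls) for the recursive factorial above."""
    if n == 0:
        return 1, 1
    value, calls = factorial_calls(n - 1)
    return n * value, calls + 1


def fib_calls(n):
    """Naive recursive Fibonacci; returns (fib(n), number_of_calls)."""
    if n < 2:
        return n, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, calls_a + calls_b + 1


for n in (10, 20):
    print(n, "factorial calls:", factorial_calls(n)[1], "fib calls:", fib_calls(n)[1])
```

Going from n = 10 to n = 20, factorial goes from 11 to 21 calls, while naive Fibonacci goes from 177 to 21,891.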

Exercise 7

My Question to ChatGPT: What is the time complexity of list()?
In Python, the time complexity of list() depends on the context in which it is used:
- (1) Creating an empty list
    - list() (i.e., list() with no arguments) creates an empty list.
    - Time complexity: O(1) (constant time), since it just creates an empty list object.
- (2) Converting an iterable (e.g., tuple, set, or generator) to a list:
    - list(iterable) converts an iterable into a list.
    - **Time complexity: O(n), where n is the number of elements in the iterable. This is because each element in the iterable needs to be copied into the new list.**
- In summary: if you're calling list() with an iterable, it's O(n), where n is the number of elements in the iterable.

What is the time complexity of set()?
The time complexity of set() in Python depends on how it is used:
- (1) Creating an empty set:
    - set() creates an empty set.
    - Time complexity: O(1) (constant time), because it just initializes an empty set object.
- (2) Creating a set from an iterable:
    - set(iterable) creates a set from an existing iterable (e.g., list, tuple, string, etc.).
    - Time complexity: O(n), where n is the number of elements in the iterable. This is because each element must be hashed and inserted into the set, which takes constant time on average for each element, assuming the hash function is efficient.
    - Building a set from an iterable: When you create a set from an iterable (e.g., set([1, 2, 3])), Python will iterate over each element of the iterable and attempt to insert it into the set. The overall time complexity is O(n), where n is the number of elements in the iterable being passed to set(). This is because each insertion is O(1) on average, and there are n elements to insert.
- In summary: set(iterable), where iterable has n elements, is O(n).

ChatGPT's solution for Exercise 7


In [None]:
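data = [("1", "2"), ("2", "3"), ("1", "3")]  # hypothetical example edge list so the cell runs

lst = []
for i, j in data:                 # O(n) where n is the length of data
    lst.append(int(i))            # O(1) for each append, O(n) in total for all
    unique_authors = list(set(lst))  # O(k) where k is the length of lst (up to n)
    unique_authors.sort()          # O(k log k) for sorting, where k is the length of unique_authors
# Total: n iterations, each doing up to O(n + n log n) work -> O(n^2 log n)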


Exercise 7 - Her solution:
Solution: 
- The complexity of the code is O(n^2 log n), where n is the length of coauthors. 
- The code calls the set function and sorts the list n times, which results in n * (n + n log n) steps, since sorting is on the order of n log n (assuming the worst-case scenario where each edge introduces a unique author).
- However, we worry only about the dominant term, so this gives us O(n^2 log n).
- If we un-indent the last statement (the sort), we reduce the complexity to O(n^2).
- If we also move the set transformation outside of the loop, the complexity is further reduced to O(n log n), dictated by the sorting.
- We can further reduce the actual runtime of the code by replacing the loop with a list comprehension.

lst = [int(i) for i, j in coauthors] 
unique_authors = list(set(lst))
unique_authors.sort()
So, the overall time complexity is the sum of the individual complexities: O(m) + O(n) + O(n log n), where m is the length of coauthors and n is the number of unique authors.
Since n ≤ m, we can simplify the total time complexity to O(m + n log n).
In the worst case, where all elements in coauthors are unique, n = m, and the overall complexity becomes O(m log m).
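Here is a minimal side-by-side sketch of the student's version and the rewrite above, wrapped in functions so they can be checked against each other. The function names and the example edge list are hypothetical. Like the original code, both only collect authors from the first position of each edge (`i`); `j` is unused.

```python
def unique_authors_slow(data):
    """Original structure: set(), list() and sort() run on every iteration -> O(n^2 log n)."""
    lst = []
    unique_authors = []  # initialised here so empty input does not raise an error
    for i, j in data:
        lst.append(int(i))
        unique_authors = list(set(lst))
        unique_authors.sort()
    return unique_authors


def unique_authors_fast(data):
    """Rewrite: build the list once, then deduplicate and sort once -> O(n log n)."""
    lst = [int(i) for i, j in data]
    unique_authors = list(set(lst))
    unique_authors.sort()
    return unique_authors


edges = [("3", "1"), ("1", "2"), ("3", "2"), ("2", "1")]  # hypothetical edge list
print(unique_authors_slow(edges))  # [1, 2, 3]
print(unique_authors_fast(edges))  # [1, 2, 3]
```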

In [None]:
# Exercise 7: This is code submitted by a student for Problem 2 
# in Problem Set 1. Given an edge list of coauthors in data, 
# the task was to create a sorted list of all unique authors. 
# What is the order of growth of this code? What is wrong here? 
# How would you rewrite the code to make it more efficient?

data = [("1", "2"), ("2", "3"), ("1", "3")]  # hypothetical example edge list so the cell runs

lst = [] # O(1)
for i, j in data: # O(n), where n is the length of data; unpacking two variables per item is still one pass
    lst.append(int(i)) # appending to a list is O(1)
    unique_authors = list(set(lst)) # set() hashes every item of lst (O(n)), then list() copies them back (O(n))
    # The two passes happen one after the other, so this line is O(n) + O(n) = O(n), not O(n**2);
    # but it runs inside the loop, n times, which makes it O(n**2) overall.
    unique_authors.sort() # sorting is log-linear, O(n log n), and it also runs n times
    # Total: n * (1 + 2n + n log n) steps -> O(n**2 log n)

- Let's assume the length of data is n. First we iterate over all of data, so n steps.
- Appending to a list is constant time (it does not touch anything before it), so we can ignore it.
- `list(set(lst))` goes over the whole list, hashing every element to make a set out of it, and then transforms the set back into a list. That is two passes, touching every element twice. Inside the loop, this is n squared so far.
- Sorting the list is n log n, log-linear.

Number of steps:
T(n) = n * (1 + 2n + n log n), where 1 is the constant append and 2n the two operations that depend on the length of the list.
- Extremely inefficient.
- Move the last two statements out of the loop: we don't need to keep re-indexing.
- Un-indenting only the sort gives O(n^2), since n squared dominates n log n.
- Moving both out gives O(n log n), since n log n dominates n.
- Indentation can change the order of growth.
- Sorting is n log n.
- `list()` is O(n), so together with `set()` there are two loops. `set()` goes over every element, and then the elements have to be combined into a new data structure, which is another loop.
- No benchmarking in the test.

In [7]:
# Exercise 8: Compare the execution time for loops 
# between R and Python using Exercise 4.


- To time the function we defined above, we have to tell Python where the function is: import it from `__main__` (in `timeit`'s setup).
- The code converts the result into microseconds.
- Using a for loop, Python was faster by far, by about an order of magnitude, but there are better ways to do this.
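Here is a minimal sketch of the Python side of Exercise 8. The notes import the function from `__main__` in `timeit`'s setup string. Passing `globals=globals()` (available since Python 3.5) is an equivalent way to let `timeit` see the function. The digit-summing function is repeated here (with integer division) so the block is self-contained, and the number of runs is an arbitrary choice.

```python
from timeit import timeit


def sum_digits(n):
    """Take positive integer n and sum its digits (Exercise 4)."""
    summ = 0
    while n > 0:
        summ += n % 10
        n //= 10
    return summ


runs = 10_000
seconds = timeit("sum_digits(123456789)", globals=globals(), number=runs)
microseconds = seconds / runs * 1_000_000  # microseconds per call, as in R
print(f"{microseconds:.3f} microseconds per call")
```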

In [None]:
# Exercise 9: Create a function to multiply each element of a 
# vector `v` by a scalar `m` in R with and without a for-loop
# and compare their execution time.

### R code ###
multiply <- function (v, m) {
  # Write with a for-loop
}

multiply2 <- function(v, m) {
  # Write without a for-loop
}
 