
# ⏱️ 08 - Timing and Performance

In data science, performance matters: two pieces of code that produce the same result can differ in runtime by orders of magnitude.  
Jupyter/IPython provides magic commands that make runtime easy to measure.

In this notebook you will learn:
- `%time` and `%timeit` (line magics) for single statements
- `%%time` and `%%timeit` (cell magics) for entire cells
- Comparing loops, list comprehensions, and NumPy
- Why performance awareness is important


## 1. `%time`

In [1]:
%time sum(range(1_000_000))

CPU times: user 23.8 ms, sys: 17 µs, total: 23.8 ms
Wall time: 23.6 ms


499999500000
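
`%time` reports two different clocks: CPU time (processor time used by the process, split into user and system) and wall time (elapsed real time). As a rough sketch of the same measurement outside IPython, the standard-library functions `time.process_time()` and `time.perf_counter()` can be used; the variable names here are illustrative only.

```python
import time

# Start both clocks: perf_counter tracks elapsed (wall) time,
# process_time tracks CPU time used by this process (user + system).
wall_start = time.perf_counter()
cpu_start = time.process_time()

result = sum(range(1_000_000))

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time:  {cpu_elapsed * 1e3:.2f} ms")
print(f"Wall time: {wall_elapsed * 1e3:.2f} ms")
print(result)
```

Wall time can exceed CPU time when the process waits (for example on disk or network), which is why `%time` shows both.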

✅ **Your Turn**: Use `%time` to measure how long it takes to sort a list of 1 million random numbers.

In [5]:
import random
rannums = [random.randint(1, 10000) for _ in range(1_000_000)]
%time rannums.sort()

CPU times: user 488 ms, sys: 1.34 ms, total: 489 ms
Wall time: 513 ms


## 2. `%timeit`

In [6]:
numbers = list(range(1_000))
%timeit [x**2 for x in numbers]

72.9 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
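
The output above reads as follows: the statement was executed 10000 times per run, the run was repeated 7 times, and the reported figure is the mean per-loop time with its standard deviation across runs. A minimal sketch of a similar measurement with the standard-library `timeit` module is shown below. The loop count is fixed by hand here (an assumption; `%timeit` chooses it automatically), and the variable names are illustrative.

```python
import statistics
import timeit

numbers = list(range(1_000))

loops = 1_000   # executions per run, chosen by hand for this sketch
runs = 7        # number of runs, matching the 7 runs in the output above

# Each entry is the total time for `loops` executions of the statement.
totals = timeit.repeat(
    stmt="[x**2 for x in numbers]",
    globals={"numbers": numbers},
    number=loops,
    repeat=runs,
)

per_loop = [t / loops for t in totals]
mean_us = statistics.mean(per_loop) * 1e6
std_us = statistics.pstdev(per_loop) * 1e6
print(f"{mean_us:.1f} µs ± {std_us:.1f} µs per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```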


✅ **Your Turn**: Compare `%timeit` results for a list comprehension vs. a `for` loop that builds the same list.

In [8]:
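numbers = list(range(10_000))
# List comprehension: builds a new list of results
%timeit [x**2 for x in numbers]
# Bare for loop: computes x**2 and discards it, so no list is built
# (not a like-for-like comparison with the comprehension above)
%timeit for x in numbers: x**2
%timeit [x**3 for x in numbers]
%timeit for x in numbers: x**3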


784 µs ± 251 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
498 µs ± 9.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.11 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
739 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
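
The bare `for` loop in the cell above evaluates `x**2` and throws the result away, so it does less work than the list comprehension and the two timings are not directly comparable. A like-for-like version might look like the sketch below, where both approaches build the same list. The function names `squares_loop` and `squares_comprehension` are hypothetical, and the standard-library `timeit` module stands in for `%timeit` so the code also runs outside a notebook.

```python
import timeit

numbers = list(range(10_000))

def squares_loop(values):
    """Build the list of squares with an explicit for loop and append."""
    out = []
    for x in values:
        out.append(x**2)
    return out

def squares_comprehension(values):
    """Build the same list with a list comprehension."""
    return [x**2 for x in values]

# Both approaches must produce identical results before timing is meaningful.
assert squares_loop(numbers) == squares_comprehension(numbers)

for func in (squares_loop, squares_comprehension):
    seconds = timeit.timeit(lambda: func(numbers), number=200)
    print(f"{func.__name__}: {seconds / 200 * 1e6:.0f} µs per call")
```

Inside a notebook, the same functions can be timed with `%timeit squares_loop(numbers)` and `%timeit squares_comprehension(numbers)`.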


## 3. `%%time` for a Whole Cell

In [None]:
%%time
total = 0
for i in range(1_000_000):
    total += i
total
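
`%%timeit` is the cell-level counterpart of `%timeit`: it runs the whole cell repeatedly instead of once. Outside IPython, a comparable measurement can be sketched with `timeit.timeit`, which accepts a multi-line statement as a string. The run count of 5 is an arbitrary choice for this sketch.

```python
import timeit

# The multi-line snippet to time, written as a string.
cell = """
total = 0
for i in range(1_000_000):
    total += i
"""

runs = 5  # arbitrary; more runs give a steadier average
seconds = timeit.timeit(stmt=cell, number=runs)
print(f"{seconds / runs * 1e3:.1f} ms per run (average of {runs} runs)")
```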

✅ **Your Turn**: Wrap a longer multi-line operation with `%%time` to measure its runtime.

In [11]:
%%time
total = 0
for i in range(1000):
    total += i^2  # caution: ^ is bitwise XOR, not a power; squaring would be i**2
print(total)



499500
CPU times: user 0 ns, sys: 377 µs, total: 377 µs
Wall time: 476 µs
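
A note on the `i^2` in the cell above: in Python, `^` is the bitwise XOR operator, so `i^2` flips one bit of `i` rather than squaring it. That is why the printed total, 499500, equals the plain sum of 0 through 999. The short sketch below contrasts the two operators.

```python
# ^ is bitwise XOR; ** is exponentiation.
print(3 ^ 2)    # → 1  (0b11 XOR 0b10 = 0b01)
print(3 ** 2)   # → 9

xor_total = sum(i ^ 2 for i in range(1000))
square_total = sum(i ** 2 for i in range(1000))

print(xor_total)     # → 499500, the same as sum(range(1000))
print(square_total)  # → 332833500
```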


## 4. Comparing Loops vs. NumPy

In [12]:
import numpy as np

numbers = np.arange(1_000_000)

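# Python-level iteration (list comprehension over the array, one element at a time)
%timeit [x**2 for x in numbers]

# NumPy vectorized (the loop runs inside NumPy's compiled code)
%timeit numbers**2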

254 ms ± 70 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.14 ms ± 53.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


✅ **Your Turn**: Try squaring numbers with a Python loop, list comprehension, and NumPy array. Compare times.

In [25]:

import numpy as np

numbers = np.arange(1000)

# List comprehension, result discarded
# (a true Python loop would append to a list inside a for statement)
%timeit [x**2 for x in numbers]

# List comprehension, result assigned
%timeit squares = [x**2 for x in numbers]

# NumPy vectorized
%timeit squared = numbers**2




155 µs ± 40.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
155 µs ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.82 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
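
The first two timings above are nearly identical because both lines run the same list comprehension. A three-way comparison that includes an explicit loop might look like the following sketch. The function names are hypothetical, the pure-Python versions iterate over a plain list (an assumption made here so they are not penalized for element-by-element access to a NumPy array), and `timeit` from the standard library stands in for `%timeit`.

```python
import timeit

import numpy as np

values = list(range(1_000))
array = np.arange(1_000, dtype=np.int64)

def with_loop():
    """Explicit for loop that appends each square to a list."""
    out = []
    for x in values:
        out.append(x**2)
    return out

def with_comprehension():
    """List comprehension over the same Python list."""
    return [x**2 for x in values]

def with_numpy():
    """Vectorized squaring of the whole array at once."""
    return array**2

# Check that all three agree before comparing speed.
assert with_loop() == with_comprehension() == with_numpy().tolist()

for func in (with_loop, with_comprehension, with_numpy):
    seconds = timeit.timeit(func, number=1_000)
    print(f"{func.__name__}: {seconds / 1_000 * 1e6:.2f} µs per call")
```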


## 5. Why This Matters
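- Performance differences grow with data size: in Section 4, squaring 1,000,000 numbers took about 254 ms with a list comprehension and about 1.14 ms with NumPy.
- Vectorized operations (NumPy, pandas) are usually much faster than element-by-element Python code.
- `%timeit` is your friend when deciding how to implement something.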


---
### Summary
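- `%time` times a single run of a statement; `%timeit` repeats it and reports the mean and standard deviation.
- `%%time` and `%%timeit` do the same for whole cells.
- Explicit loops that build a list are typically slower than list comprehensions, and both are far slower than NumPy vectorized operations on numeric arrays.
- Always measure performance before optimizing.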
