# NumPy

`NumPy` [(Numerical Python)](https://numpy.org/) is a core library for scientific computing in Python. It provides the `ndarray` type: an efficient, multi-dimensional, homogeneous array. `NumPy` arrays are much faster and more compact than Python lists, and they support “vectorised” operations—applying an operation to an entire array without writing explicit loops.

The content of this section is derived from the book *Python Data Science Handbook*. Details of the book can be found in the further reading section.


In [1]:
# install NumPy
!python -m pip install numpy



In [2]:
# load the library
import numpy as np

## Creating arrays

You can create NumPy arrays in several ways:

1. **From a Python list or nested list**:


In [3]:
A = np.array([1, 2, 3])
print("1D array:", A)

B = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:", B)

1D array: [1 2 3]
2D array: [[1 2 3]
 [4 5 6]]


2. **With built-in initialisers**:

In [4]:
print("Zeros:\n", np.zeros((2, 3)))
print("Ones:\n", np.ones((2, 3)))
print("Empty:\n", np.empty((2, 3)))  # uninitialised memory: contents are arbitrary
print("Range:\n", np.arange(0, 10, 2))
print("Linspace:\n", np.linspace(0, 1, 5))

Zeros:
 [[0. 0. 0.]
 [0. 0. 0.]]
Ones:
 [[1. 1. 1.]
 [1. 1. 1.]]
Empty:
 [[1. 1. 1.]
 [1. 1. 1.]]
Range:
 [0 2 4 6 8]
Linspace:
 [0.   0.25 0.5  0.75 1.  ]
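
Two details of these initialisers are easy to trip over. `np.arange` excludes its stop value, while `np.linspace` includes it by default. The sketch below illustrates this and also uses `np.full`, `np.eye`, and the `dtype` argument, none of which are introduced in the text above:

```python
import numpy as np

# arange: half-open interval [start, stop) with a fixed step
r = np.arange(0, 10, 2)
print(r)                       # the stop value 10 is not included

# linspace: closed interval [start, stop] with a fixed number of points
lin = np.linspace(0, 10, 6)
print(lin)                     # both 0 and 10 are included

# full: constant-filled array; eye: identity matrix
sevens = np.full((2, 3), 7)
identity = np.eye(3)
print(sevens)
print(identity)

# Most initialisers accept an explicit dtype
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.dtype)
```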


## Inspecting arrays

Useful attributes:
- `shape`: dimensions of the array
- `dtype`: data type of elements
- `ndim`: number of dimensions
- `size`: total number of elements


In [5]:
C = np.array([[1.0, 2.0], [3.0, 4.0]])
print("Shape:", C.shape)
print("Data type:", C.dtype)
print("Dimensions:", C.ndim)
print("Size:", C.size)

Shape: (2, 2)
Data type: float64
Dimensions: 2
Size: 4
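
The `dtype` is usually inferred from the input when the array is created. As a small sketch, the following shows that inference and one way to convert afterwards. It relies on `astype`, `itemsize`, and `nbytes`, which are standard `ndarray` members but are not covered in the list above:

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # one float -> the whole array becomes float64
print(ints.dtype, mixed.dtype)

# astype returns a new array; float -> int conversion truncates toward zero
truncated = mixed.astype(np.int32)
print(truncated)

# itemsize is bytes per element; nbytes is itemsize * size
small = np.zeros(1000, dtype=np.float32)
print(small.itemsize, small.nbytes)
```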


## Indexing and slicing

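Array indexing works as it does for lists (zero-based, with negative indices counting from the end), extended to multiple dimensions: give one index per axis, separated by commas.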

In [6]:
M = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

print("Element at row 1, col 2:", M[1, 2])   # 60
print("First row:", M[0])
print("Last column:", M[:, -1])              # all rows, last column

Element at row 1, col 2: 60
First row: [10 20 30]
Last column: [30 60 90]


Slicing syntax is `[start:stop:step]` for each axis:

In [7]:
print("Middle block:\n", M[0:2, 1:3])   # rows 0–1, cols 1–2
print("Every other element in row 2:", M[2, ::2])

Middle block:
 [[20 30]
 [50 60]]
Every other element in row 2: [70 90]
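
Any of `start`, `stop`, and `step` may be omitted or negative. A few further slicing patterns, sketched on the same 3×3 matrix (the variable names are only illustrative):

```python
import numpy as np

M = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

first_two_rows = M[:2]       # omitted start defaults to 0
last_two_cols = M[:, -2:]    # a negative start counts from the end
reversed_rows = M[::-1]      # a negative step walks the axis backwards
corners = M[::2, ::2]        # step 2 on both axes picks the four corners

print(first_two_rows)
print(last_two_cols)
print(reversed_rows)
print(corners)
```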


### Exercise 1
Create a 3×3 array with values from 1 to 9.  
- Print the second row.  
- Print the diagonal (top-left to bottom-right).

In [8]:
arr = np.arange(1, 10).reshape(3, 3)
print("Array:\n", arr)

print("Second row:", arr[1])
print("Diagonal:", arr.diagonal())


Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Second row: [4 5 6]
Diagonal: [1 5 9]


## Views vs copies

Important: **basic slices are views**, not copies. Modifying a slice changes the original. (Indexing with a boolean mask or a list of indices returns a copy instead.)


In [9]:
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

Y = X[0:2, 1:3]   # view
Y[0, 0] = 99
print("Modified view:\n", Y)
print("Original X also changed:\n", X)

Modified view:
 [[99  3]
 [ 6  7]]
Original X also changed:
 [[ 1 99  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


To avoid this behaviour, we can make an explicit copy:

In [10]:
Z = X[0:2, 1:3].copy()
Z[0, 0] = -1
print("Copied block:\n", Z)
print("Original X remains unchanged:\n", X)

Copied block:
 [[-1  3]
 [ 6  7]]
Original X remains unchanged:
 [[ 1 99  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
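
When it is unclear whether an indexing expression produced a view or a copy, `np.shares_memory` can check. A minimal sketch comparing a basic slice with integer-array and boolean-mask indexing:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)

basic = X[0:2, 1:3]     # basic slice -> view
fancy = X[[0, 1]]       # integer-array index -> copy
masked = X[X > 5]       # boolean mask -> copy

print(np.shares_memory(X, basic))    # True
print(np.shares_memory(X, fancy))    # False
print(np.shares_memory(X, masked))   # False

# Writing to the copy leaves X untouched
fancy[0, 0] = -1
print(X[0, 0])                       # 0
```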


## Array operations

`NumPy` supports **elementwise** arithmetic and universal functions (ufuncs).

In [11]:
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print("Addition:", a + b)
print("Multiplication:", a * b)
print("Sine of a:", np.sin(a))
print("Dot product:", np.dot(a, b))

Addition: [11 22 33]
Multiplication: [10 40 90]
Sine of a: [0.84147098 0.90929743 0.14112001]
Dot product: 140
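
For 2D arrays it is worth keeping elementwise multiplication (`*`) apart from the matrix product. The sketch below uses the `@` operator and two more ufuncs (`np.sqrt`, `np.maximum`), which are not shown above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B     # multiplies matching entries
matmul = A @ B          # matrix product, equivalent to np.matmul(A, B)
print(elementwise)
print(matmul)

# Other ufuncs also act element by element
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
larger = np.maximum(A, B)
print(roots)
print(larger)
```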


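Broadcasting allows operations on arrays of different shapes, provided the shapes are compatible: comparing dimensions from the last axis backwards, each pair must either be equal or contain a 1.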

In [12]:
x = np.array([1, 2, 3])
M = np.array([[10], [20], [30]])
print("Broadcasted addition:", M + x)

Broadcasted addition: [[11 12 13]
 [21 22 23]
 [31 32 33]]
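
A common use of broadcasting is subtracting per-column or per-row statistics from a table. The following sketch (with made-up data) shows both cases, plus a shape combination that fails:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])               # shape (2, 3)

col_means = data.mean(axis=0)                    # shape (3,)
centred = data - col_means                       # (2, 3) - (3,)   -> (2, 3)

row_means = data.mean(axis=1, keepdims=True)     # shape (2, 1)
row_centred = data - row_means                   # (2, 3) - (2, 1) -> (2, 3)

print(centred)
print(row_centred)

# Trailing dimensions 3 and 2 do not match and neither is 1
try:
    data + np.array([1.0, 2.0])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print("Incompatible shapes raised an error:", broadcast_failed)
```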


### Exercise 2 
Create an array `x = np.linspace(0, 2*np.pi, 100)`.  
- Compute `sin(x)` and `cos(x)` using `NumPy`.  
- Verify the trigonometric identity `sin²(x) + cos²(x) ≈ 1` by checking the maximum difference.


In [13]:
x = np.linspace(0, 2*np.pi, 100)
sin_x = np.sin(x)
cos_x = np.cos(x)

identity = sin_x**2 + cos_x**2
max_diff = np.max(np.abs(identity - 1))
print("Max difference from 1:", max_diff)

Max difference from 1: 2.220446049250313e-16


## Concatenation, stacking, and splitting

`NumPy` provides functions to join and split arrays.

### Concatenation
Use `np.concatenate` to join arrays along an existing axis.

In [14]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Concatenate 1D:", np.concatenate([a, b]))

Concatenate 1D: [1 2 3 4 5 6]


In [15]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6]])
print("Concatenate along axis 0:", np.concatenate([A, B], axis=0))

Concatenate along axis 0: [[1 2]
 [3 4]
 [5 6]]


### Stacking
Use `np.vstack` and `np.hstack` for vertical and horizontal stacking.  `np.stack` can join along a **new axis**.


In [16]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Vertical stack:", np.vstack([a, b]))
print("Horizontal stack:", np.hstack([a, b]))

print("Stack along new axis:", np.stack([a, b], axis=1))


Vertical stack: [[1 2 3]
 [4 5 6]]
Horizontal stack: [1 2 3 4 5 6]
Stack along new axis: [[1 4]
 [2 5]
 [3 6]]
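
The `axis` argument of `np.stack` decides where the new dimension goes. A short sketch comparing the result shapes; `np.column_stack` is an additional helper not mentioned above:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.stack([a, b], axis=0)        # shape (2, 3): inputs become rows
cols = np.stack([a, b], axis=1)        # shape (3, 2): inputs become columns
also_cols = np.column_stack([a, b])    # 1D inputs become columns

print(rows.shape, cols.shape, also_cols.shape)
```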


### Splitting
Use `np.split`, `np.hsplit`, or `np.vsplit` to divide arrays into parts.

In [17]:
X = np.arange(16).reshape(4, 4)
print("Original array:", X)

Original array: [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [18]:
# Split into 2 equal parts along axis 1 (columns)
left, right = np.hsplit(X, 2)
print("Left half:", left)
print("Right half:", right)

Left half: [[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
Right half: [[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


In [19]:
# Split into 2 equal parts along axis 0 (rows)
top, bottom = np.vsplit(X, 2)
print("Top half:", top)
print("Bottom half:", bottom)

Top half: [[0 1 2 3]
 [4 5 6 7]]
Bottom half: [[ 8  9 10 11]
 [12 13 14 15]]
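
Splits do not have to be equal. As a sketch, `np.split` also accepts a list of split points, and `np.array_split` (not covered above) tolerates a length that does not divide evenly:

```python
import numpy as np

X = np.arange(16).reshape(4, 4)

# Split points [1, 3] give rows [0:1], [1:3], and [3:]
first, middle, last = np.split(X, [1, 3], axis=0)
print(first.shape, middle.shape, last.shape)

# np.split with an integer requires an even division
try:
    np.split(np.arange(10), 3)
    uneven_ok = True
except ValueError:
    uneven_ok = False
print("np.split accepted an uneven split:", uneven_ok)

# np.array_split distributes the remainder over the leading parts
parts = np.array_split(np.arange(10), 3)
print([p.tolist() for p in parts])
```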


### Exercise 3

You have two arrays representing scores from two classes:

In [20]:
class1 = np.array([70, 85, 90])
class2 = np.array([60, 75, 80])
print(class1)
print(class2)

[70 85 90]
[60 75 80]


Combine both classes into a single 1D array `all_scores`. Create a 2×3 array `score_matrix` where each row is a class. Split `score_matrix` into three separate arrays, one for each student column. Print `all_scores`, `score_matrix`, and the three column arrays.

In [21]:
all_scores = np.concatenate([class1, class2])
print("All scores:", all_scores)

score_matrix = np.vstack([class1, class2])
print("Score matrix:\n", score_matrix)

col1, col2, col3 = np.hsplit(score_matrix, 3)
print("Column 1:", col1)
print("Column 2:", col2)
print("Column 3:", col3)

All scores: [70 85 90 60 75 80]
Score matrix:
 [[70 85 90]
 [60 75 80]]
Column 1: [[70]
 [60]]
Column 2: [[85]
 [75]]
Column 3: [[90]
 [80]]


## Random numbers

`NumPy` provides its own random number generation through the `np.random` module.

In [22]:
print("Random floats:", np.random.rand(3))             # uniform in [0,1)
print("Random integers:", np.random.randint(0, 10, 5)) # integers 0–9
print("Normal distribution:", np.random.randn(3))      # mean 0, std 1

Random floats: [0.65389233 0.0294315  0.33874915]
Random integers: [3 0 6 0 0]
Normal distribution: [-0.33812248 -0.7906545   1.08028461]
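
The functions above belong to NumPy's legacy random interface. For new code, the NumPy documentation recommends creating a `Generator` with `np.random.default_rng`. A minimal sketch of the equivalent calls, with the seed value chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeded so results are reproducible

floats = rng.random(3)                              # uniform in [0, 1)
ints = rng.integers(0, 10, size=5)                  # integers 0 to 9 (upper bound excluded)
normals = rng.normal(loc=0.0, scale=1.0, size=3)    # mean 0, std 1

print("Random floats:", floats)
print("Random integers:", ints)
print("Normal distribution:", normals)

# A second generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(floats, rng_again.random(3))
print("Reproducible:", same)
```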


### Exercise 4
Generate a 5×5 array of random numbers between 0 and 1.  
- Replace all values less than 0.5 with 0.  
- Replace all values greater than or equal to 0.5 with 1.


In [23]:
arr = np.random.rand(5, 5)
print("Original:", arr)

arr[arr < 0.5] = 0
arr[arr >= 0.5] = 1
print("Thresholded:", arr)

Original: [[0.30977988 0.50265953 0.44080221 0.65164302 0.86351012]
 [0.75629922 0.53363754 0.39906198 0.49223782 0.253072  ]
 [0.65073602 0.88636971 0.85468003 0.75679927 0.46768967]
 [0.83297621 0.62027096 0.92466821 0.90281452 0.46342752]
 [0.61819883 0.12268472 0.43484076 0.07884936 0.38377981]]
Thresholded: [[0. 1. 0. 1. 1.]
 [1. 1. 0. 0. 0.]
 [1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 0.]
 [1. 0. 0. 0. 0.]]
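
The solution above modifies `arr` in place in two passes. As an alternative sketch, `np.where` builds the thresholded array in one step and leaves the original untouched (this example uses its own generator and seed, so the numbers differ from the output above):

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((5, 5))

# np.where(condition, value_if_true, value_if_false) returns a new array
thresholded = np.where(arr < 0.5, 0.0, 1.0)

# Equivalent: cast the boolean mask itself to float
from_mask = (arr >= 0.5).astype(float)

print("Thresholded:\n", thresholded)
```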


## Why use `NumPy` instead of Python lists?

Python lists are flexible but not optimised for numerical computation.  
`NumPy` arrays offer several key advantages:

1. **Efficiency**  
   - `NumPy` arrays store elements in contiguous memory, while lists are collections of object references.  
   - This makes array operations much faster and more memory-efficient.

2. **Homogeneity**  
   - Lists can mix types, which slows operations.  
   - `NumPy` arrays are homogeneous (all elements share the same type), enabling vectorised computation.

3. **Vectorisation**  
   - Lists require explicit loops to process elementwise operations.  
   - `NumPy` allows operations to be applied to entire arrays without loops.

4. **Functionality**  
   - `NumPy` provides advanced features: broadcasting, ufuncs, linear algebra, random number generation, and more.  
   - These are difficult or impossible to achieve efficiently with lists.
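
The difference in vectorisation shows up even in the basic operators: on lists, `+` and `*` concatenate and repeat, while on arrays they act on every element. A small illustrative sketch:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

list_sum = values + values     # list + list concatenates
list_twice = values * 2        # list * int repeats the list

arr_sum = arr + arr            # array + array adds elementwise
arr_twice = arr * 2            # array * int scales each element

# Elementwise work on a list needs an explicit loop or comprehension
doubled_list = [v * 2 for v in values]

print(list_sum, list_twice)
print(arr_sum, arr_twice)
print(doubled_list)
```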

In [24]:
import numpy as np
import time

# Compare squaring numbers in a list vs NumPy array
nums = list(range(10_000_000))  # 10 million
arr = np.arange(10_000_000)     # 10 million

In [25]:
# Python list
start = time.time()
squared_list = [x**2 for x in nums]
end = time.time()
print("Python list time:", round(end - start, 4), "seconds")

Python list time: 0.3694 seconds


In [26]:
# NumPy array
start = time.time()
squared_array = arr**2
end = time.time()
print("NumPy array time:", round(end - start, 4), "seconds")

NumPy array time: 0.0161 seconds
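
A single `time.time()` measurement can be noisy. As a sketch of a more robust comparison, the standard-library `timeit` module repeats each measurement; the smaller array size and the repeat counts here are arbitrary choices to keep the run short, and the timings will vary by machine:

```python
import timeit
import numpy as np

n = 100_000
nums = list(range(n))
arr = np.arange(n, dtype=np.int64)   # explicit 64-bit integers so the squares cannot overflow

# Take the best of 3 repeats, each running the statement 10 times
list_time = min(timeit.repeat(lambda: [x**2 for x in nums], number=10, repeat=3))
array_time = min(timeit.repeat(lambda: arr**2, number=10, repeat=3))

print("Python list time:", round(list_time, 4), "seconds")
print("NumPy array time:", round(array_time, 4), "seconds")

# Both approaches compute the same values
results_match = np.array_equal(arr**2, np.array([x**2 for x in nums], dtype=np.int64))
print("Results match:", results_match)
```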


## `NumPy` basic types

`NumPy` offers many more basic numeric types than base Python. Most of them map directly to the C types they are built on, so their sizes can depend on the platform.

| NumPy type | C type | Description |
|---|---|---|
| `numpy.bool_` | `bool` | Boolean (True or False) stored as a byte |
| `numpy.byte` | `signed char` | Platform-defined |
| `numpy.ubyte` | `unsigned char` | Platform-defined |
| `numpy.short` | `short` | Platform-defined |
| `numpy.ushort` | `unsigned short` | Platform-defined |
| `numpy.intc` | `int` | Platform-defined |
| `numpy.uintc` | `unsigned int` | Platform-defined |
| `numpy.int_` | `long` | Platform-defined |
| `numpy.uint` | `unsigned long` | Platform-defined |
| `numpy.longlong` | `long long` | Platform-defined |
| `numpy.ulonglong` | `unsigned long long` | Platform-defined |
| `numpy.half` / `numpy.float16` | | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `numpy.single` | `float` | Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa |
| `numpy.double` | `double` | Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa |
| `numpy.longdouble` | `long double` | Platform-defined extended-precision float |
| `numpy.csingle` | `float complex` | Complex number, represented by two single-precision floats (real and imaginary components) |
| `numpy.cdouble` | `double complex` | Complex number, represented by two double-precision floats (real and imaginary components) |
| `numpy.clongdouble` | `long double complex` | Complex number, represented by two extended-precision floats (real and imaginary components) |
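
The bit layouts in the table can be checked at runtime. The sketch below uses `np.finfo` and `np.iinfo`, which report the limits of floating-point and integer types, and shows that fixed-width integers wrap around on overflow in array arithmetic:

```python
import numpy as np

# Total bits, exponent bits, and mantissa bits for each float type
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.nexp, info.nmant)

# Smallest and largest values an 8-bit signed integer can hold
i8 = np.iinfo(np.int8)
print(i8.min, i8.max)

# Adding 1 to the maximum wraps around to the minimum
wrapped = np.array([127], dtype=np.int8) + np.array([1], dtype=np.int8)
print(wrapped)
```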

## Summary

- `NumPy` arrays (`ndarray`) are efficient, homogeneous, multi-dimensional arrays.  
- Create arrays from lists, or with constructors like `zeros`, `ones`, `arange`, `linspace`.  
- Inspect arrays with `.shape`, `.dtype`, `.ndim`, `.size`.  
- Index and slice arrays similarly to lists, but slicing returns **views**, not copies.  
- Use `.copy()` when you need independence.  
- Perform fast elementwise operations with arithmetic operators and ufuncs, and dot products with `np.dot`.  
- Broadcasting allows operations on arrays of different shapes.  
- Use `np.random` for random numbers.  
- Vectorised `NumPy` code is typically much faster than equivalent Python loops.


### Exercise 5

You are tasked with analysing daily temperatures (in °C) for two cities over a week.  

1. Create two 1D arrays `city1` and `city2`, each containing 7 random integers between 15 and 30 (inclusive).  
2. Combine the two arrays into a single 2×7 array `temps` where each row corresponds to a city.  
3. Slice `temps` to get the temperatures of both cities for the first 4 days. Call this `first_days`.  
4. Compute the **mean**, **max**, and **min** temperatures for each city.  
5. Compute the **temperature difference between the two cities** for each day.  
6. Vertically stack a new row for a third city with temperatures `[20, 22, 21, 23, 19, 18, 21]` to create `all_cities`.  

In [27]:
np.random.seed(0)  # for reproducibility
city1 = np.random.randint(15, 31, 7)
city2 = np.random.randint(15, 31, 7)

temps = np.vstack([city1, city2])
print("Temps array:\n", temps)

first_days = temps[:, :4]
print("First 4 days:\n", first_days)

mean_temps = temps.mean(axis=1)
max_temps = temps.max(axis=1)
min_temps = temps.min(axis=1)
print("Mean temperatures:", mean_temps)
print("Max temperatures:", max_temps)
print("Min temperatures:", min_temps)

diff = temps[0, :] - temps[1, :]
print("Temperature difference (city1 - city2):", diff)

city3 = np.array([20, 22, 21, 23, 19, 18, 21])
all_cities = np.vstack([temps, city3])
print("All cities array:\n", all_cities)


Temps array:
 [[27 30 20 15 18 26 18]
 [22 24 18 20 17 19 22]]
First 4 days:
 [[27 30 20 15]
 [22 24 18 20]]
Mean temperatures: [22.         20.28571429]
Max temperatures: [30 24]
Min temperatures: [15 17]
Temperature difference (city1 - city2): [ 5  6  2 -5  1  7 -4]
All cities array:
 [[27 30 20 15 18 26 18]
 [22 24 18 20 17 19 22]
 [20 22 21 23 19 18 21]]
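
The `axis` argument controls the direction of an aggregation: `axis=1` reduces across columns and gives one value per row (city), while `axis=0` reduces across rows and gives one value per column (day). A short sketch using the `temps` values printed above; `argmax` is an extra method not required by the exercise:

```python
import numpy as np

temps = np.array([[27, 30, 20, 15, 18, 26, 18],
                  [22, 24, 18, 20, 17, 19, 22]])

per_city_mean = temps.mean(axis=1)    # one value per city
per_day_mean = temps.mean(axis=0)     # one value per day

hottest_day = temps.argmax(axis=1)    # index of each city's warmest day
warmer_city = temps.argmax(axis=0)    # 0 or 1: which city was warmer on each day

print("Mean per city:", per_city_mean)
print("Mean per day:", per_day_mean)
print("Warmest day index per city:", hottest_day)
print("Warmer city per day:", warmer_city)
```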
