# Introduction to NumPy


## What is NumPy?

- NumPy is the fundamental package for scientific computing in Python.
- It is a Python library that provides a multidimensional array object.
- At the core of the NumPy package is the `ndarray` object.

There are several important differences between NumPy arrays and the standard Python sequences:
- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
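A minimal sketch of the first two points, using only the standard `np.append` and `np.array` functions: appending does not grow the existing array but returns a new one, and mixed numeric inputs are coerced to one common dtype, so every element has the same size.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append does not resize `a` in place; it allocates and returns a new array
b = np.append(a, 4)
print(a)       # [1 2 3]  (unchanged)
print(b)       # [1 2 3 4]
print(b is a)  # False

# All elements share one dtype: the ints 1 and 3 are stored as floats here
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
print(mixed.itemsize)  # 8 bytes per element
```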

## Understanding Data Types in Python

```c
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}

/* C code */
int x = 4;
x = "four";  // FAILS: x was declared as an int, and C types are fixed at compile time
```

```python
# Python code
x = 4
x = "four"
x = True
x = 56.6
```
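In Python the name `x` is only a reference; the type belongs to the object it currently points at. A small sketch that asks for `type(x)` after each rebinding and, via `sys.getsizeof`, hints at the per-object overhead that a plain C `int` does not have (the exact byte count depends on the interpreter build):

```python
import sys

# Record the type of the object that `x` refers to after each rebinding
types_seen = []
for value in (4, "four", True, 56.6):
    x = value
    types_seen.append(type(x).__name__)

print(types_seen)  # ['int', 'str', 'bool', 'float']

# Each value is a full Python object (header + payload), not a bare C int
print(sys.getsizeof(4))  # typically 28 bytes on 64-bit CPython
```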

```python
l1 = [4, 5.6, True, "srere"]
```
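A list can hold this mix because it stores references to independent objects. Passing the same values to `np.array` shows the contrast: NumPy must pick a single dtype, and with a string present it converts every element to a string. A sketch:

```python
import numpy as np

l1 = [4, 5.6, True, "srere"]

# A list keeps each element as its own object with its own type
print([type(v).__name__ for v in l1])  # ['int', 'float', 'bool', 'str']

# NumPy needs one dtype for all elements, so everything becomes a string here
arr = np.array(l1)
print(arr.dtype.kind)  # U  (Unicode string)
print(arr.tolist())    # ['4', '5.6', 'True', 'srere']
```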

```python
import array

array.array("i", [1, 2, 3, 4, 5])
```

```
array('i', [1, 2, 3, 4, 5])
```
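The standard-library `array` module already gives a fixed element type: the type code `"i"` means every element is stored as a C `int`. A short sketch of what that implies (the item size is platform-dependent, usually 4 bytes):

```python
import array

nums = array.array("i", [1, 2, 3, 4, 5])

# Every element is stored as a C int; itemsize is bytes per element (usually 4)
print(nums.typecode, nums.itemsize)

# A value of another type is rejected instead of being stored as-is
rejected = False
try:
    nums.append(4.5)
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

print(nums.tolist())  # [1, 2, 3, 4, 5]
```

Unlike NumPy, however, `array.array` offers no vectorized arithmetic, which is what the next section is about.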

## NumPy Speed

```python
a = list(range(100))
b = list(range(200,300))

c = []
for i in range(len(a)):
    c.append(a[i] * b[i])

print(c[:10])
```

```
[0, 201, 404, 609, 816, 1025, 1236, 1449, 1664, 1881]
```
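The index-based loop above can be written more idiomatically with `zip` and a list comprehension. It reads better, but it still performs one Python-level multiplication per element, so it is not vectorization in the NumPy sense. A sketch:

```python
a = list(range(100))
b = list(range(200, 300))

# Pair up elements directly instead of indexing with range(len(a))
c = [x * y for x, y in zip(a, b)]

print(c[:10])  # [0, 201, 404, 609, 816, 1025, 1236, 1449, 1664, 1881]
```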


```c
for (int i = 0; i < rows; i++) {
  c[i] = a[i]*b[i];
}
```

```c
for (int i = 0; i < rows; i++) {
  for (int j = 0; j < columns; j++) {
    c[i][j] = a[i][j]*b[i][j];
  }
}
```

```python
import numpy as np

a = np.arange(100)
b = np.arange(200,300)

c = a * b

print(c[:10])
```

```
[   0  201  404  609  816 1025 1236 1449 1664 1881]
```
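The same `*` operator also replaces the nested C loop shown earlier: for two arrays of the same shape it multiplies element by element in any number of dimensions. A small sketch with made-up 2 x 3 inputs standing in for `a[rows][columns]` and `b[rows][columns]`:

```python
import numpy as np

# Hypothetical 2 x 3 inputs (rows = 2, columns = 3)
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

# Same result as the nested C loop: c[i][j] = a[i][j] * b[i][j]
c = a * b
print(c)
# [[ 10  40  90]
#  [160 250 360]]
```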


## Example: Data analysis in pure Python

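```python
import csv

dataset_path = "data/f500_small.csv"

# newline="" is the mode the csv module expects for file objects
with open(dataset_path, "r", newline="") as f:
    f500_small = list(csv.reader(f))

# Skip the header row and add up the third column (index 2)
print(sum(int(row[2]) for row in f500_small[1:]))
```

```
4305395
```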
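For comparison, a sketch of the same sum with NumPy doing the reduction. The real `f500_small.csv` is not reproduced here, so the example reads a tiny hypothetical CSV from memory; the header and values are made up. The `csv` module still does the parsing (it handles quoted fields), and NumPy takes over once the column is numeric:

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for data/f500_small.csv: made-up header and values
sample = io.StringIO(
    "company,rank,revenues\n"
    "Company A,1,100\n"
    "Company B,2,250\n"
    "Company C,3,75\n"
)
rows = list(csv.reader(sample))

# Pure Python: convert and add one value at a time (header row skipped)
total_py = sum(int(row[2]) for row in rows[1:])

# NumPy: build an integer array from the column once, then reduce it in one call
column = np.array([int(row[2]) for row in rows[1:]])
total_np = int(column.sum())

print(total_py, total_np)  # 425 425
```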


## How Vectorization Makes Code Faster

```python
%%timeit -n 3 -r 1
# Native Python: element-wise sum with an explicit loop
size = 5_000_000
list1 = [i for i in range(size)]
list2 = [i for i in range(size)]

sums = []

for el1, el2 in zip(list1, list2):
    row_sum = el1 + el2
    sums.append(row_sum)
```

```
3.98 s ± 0 ns per loop (mean ± std. dev. of 1 run, 3 loops each)
```


```python
%%timeit -n 10 -r 1
# NumPy: vectorized operations
import numpy as np

size = 5_000_000  # same size as the native Python example

# NumPy: create the arrays
array1 = np.arange(size)
array2 = np.arange(size)

# One vectorized addition replaces the explicit Python loop
sums = array1 + array2
```

```
144 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
```
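`%%timeit` is an IPython/Jupyter magic, so it is not available in a plain `.py` script. A sketch of a similar comparison with the standard `timeit` module follows; it assumes a smaller size (1,000,000) to keep the run short, builds the inputs outside the timer so only the addition is measured (unlike the cells above, which also time input creation), and checks that both versions agree. The printed timings will vary by machine.

```python
import timeit

import numpy as np

size = 1_000_000  # smaller than the notebook's 5_000_000 to keep the run short

list1 = list(range(size))
list2 = list(range(size))
array1 = np.arange(size)
array2 = np.arange(size)


def python_sum():
    # One interpreted addition per element
    return [x + y for x, y in zip(list1, list2)]


def numpy_sum():
    # A single vectorized call; the loop runs in compiled code
    return array1 + array2


# Both approaches must agree before their speed is compared
assert python_sum()[:5] == numpy_sum()[:5].tolist()

t_py = timeit.timeit(python_sum, number=3) / 3
t_np = timeit.timeit(numpy_sum, number=3) / 3
print(f"pure Python: {t_py * 1000:.1f} ms per loop")
print(f"NumPy:       {t_np * 1000:.1f} ms per loop")
```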


## NumPy library

```python
import numpy as np
```

```python
np.__version__
```

```
'1.26.3'
```

## Introduction to Ndarrays

```python
a = np.array([1,3,4,5,6,7,8])
a
```

```
array([1, 3, 4, 5, 6, 7, 8])
```

```python
print(type(a))
```

```
<class 'numpy.ndarray'>
```


```python
b = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(b)
```

```
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```


```python
a.ndim, b.ndim  # ndarray.ndim: the number of axes (dimensions) of the array.
```

```
(1, 2)
```

```python
a.shape, b.shape  # ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.
```

```
((7,), (3, 3))
```

```python
a.size, b.size  # ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.
```

```
(7, 9)
```
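The three attributes above are tied together: `ndim` is the length of the `shape` tuple, and `size` is the product of its entries. A quick check of both relations on the same two arrays (`math.prod` needs Python 3.8 or newer):

```python
import math

import numpy as np

a = np.array([1, 3, 4, 5, 6, 7, 8])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

for arr in (a, b):
    # ndim is the length of the shape tuple
    assert arr.ndim == len(arr.shape)
    # size is the product of the entries of shape
    assert arr.size == math.prod(arr.shape)

print(math.prod(b.shape))  # 9
```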

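```python
# Note: the default integer dtype is platform-dependent (int32 on Windows with NumPy 1.x, int64 on 64-bit Linux/macOS).
a.dtype, b.dtype  # ndarray.dtype: an object describing the type of the elements in the array.
```

```
(dtype('int32'), dtype('int32'))
```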

```python
a.itemsize, b.itemsize  # ndarray.itemsize: the size in bytes of each element of the array.
```

```
(4, 4)
```
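Because the default integer dtype depends on the platform, the element size can be pinned by passing `dtype` explicitly. The total memory taken by the elements, `ndarray.nbytes`, is then simply `size * itemsize`. A sketch:

```python
import numpy as np

# Request 64-bit integers explicitly instead of relying on the platform default
c = np.array([1, 3, 4, 5, 6, 7, 8], dtype=np.int64)
print(c.dtype, c.itemsize)  # int64 8

# Total bytes occupied by the elements: size * itemsize = 7 * 8
print(c.nbytes)  # 56
assert c.nbytes == c.size * c.itemsize
```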

```python
a.data  # ndarray.data: the buffer containing the actual elements of the array.
```

```
<memory at 0x000001935191FB80>
```
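`a.data` is a Python `memoryview` over the array's raw element buffer; the address shown in the output is specific to that machine and session. A sketch of inspecting the buffer, assuming an `int32` array so that each element takes 4 bytes:

```python
import numpy as np

a = np.array([1, 3, 4, 5, 6, 7, 8], dtype=np.int32)

buf = a.data  # memoryview over the raw element buffer
print(type(buf).__name__)  # memoryview
print(buf.shape)           # (7,)
print(buf.nbytes)          # 28  (7 elements * 4 bytes)

# The bytes behind the memoryview are exactly the array's contents
print(buf.tobytes() == a.tobytes())  # True
```

In everyday code the buffer is rarely accessed directly; indexing and the array attributes above are the usual interface.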