# NumPy

NumPy is one of the most widely used libraries for scientific computing in Python. Even if you don't use it directly, many other libraries rely on it behind the scenes. It offers fast multidimensional arrays along with tools to work with them efficiently.

# Python Lists vs NumPy Arrays

Python lists are flexible: they can store different types of data in the same container.
For example:

In [1]:
import numpy as np
example_list = [1, 1.5, "abc", False, [1.5, True],[2, "python"],[3, [False, "python"]]]
example_list

[1, 1.5, 'abc', False, [1.5, True], [2, 'python'], [3, [False, 'python']]]

This flexibility comes at a cost. If we want to apply an operation like "add 1" to every element, we must process each item one by one. Adding 1 works for numbers (and for booleans, which Python treats as integers), but raises a `TypeError` for strings and nested lists.
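As a minimal sketch of this limitation (the variable names are illustrative and not from the original notebook), the loop below has to guard each addition against a `TypeError`, while the equivalent operation on a homogeneous NumPy array is a single expression:

```python
import numpy as np

mixed = [1, 1.5, "abc", False]

# With a plain list, every element is handled individually,
# and items that do not support "+ 1" must be caught explicitly.
results = []
for item in mixed:
    try:
        results.append(item + 1)
    except TypeError:
        results.append(None)  # placeholder for items that cannot be incremented

print(results)

# With a NumPy array of numbers, the addition applies to all elements at once.
numbers = np.array([1, 2, 3])
plus_one = numbers + 1
print(plus_one)
```

Here `None` is only a placeholder chosen for the sketch; real code would decide how to treat non-numeric items.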

In scientific computing we often work with huge collections of numerical values, and we need fast operations on all elements at once. This is where **NumPy** arrays excel.

## What is a NumPy Array?

A NumPy array is a grid of values, but unlike lists, all elements have the same data type.
Arrays are stored efficiently in memory and allow vectorized operations, meaning operations apply to entire groups of values at once.

A **NumPy** array has three key attributes:

- **dtype**: the data type. An array always contains a single type (arrays are homogeneous).
- **shape**: the dimensions of the array, e.g. `(3,2)`, `(3,4,500)`, or `()`.
- **data**: the raw data stored in memory. This can be passed to C or Fortran code for efficient calculations.
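The attributes listed above can be inspected directly. The following is a small sketch with made-up example arrays; `nbytes` is used here as a simple way to see how much memory the raw data occupies:

```python
import numpy as np

grid = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(grid.dtype)    # float64
print(grid.shape)    # (2, 3)
print(grid.nbytes)   # 48 (6 elements x 8 bytes each)

# A zero-dimensional array has the empty shape ().
scalar = np.array(5.0)
print(scalar.shape)  # ()

# Homogeneity: mixing an int and a float yields a single common dtype.
promoted = np.array([1, 2.5])
print(promoted.dtype)  # float64
```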


## Performance Test: Python vs NumPy

To compare the performance of pure Python lists with NumPy, consider the following code. First we make a list `a` of the numbers from 0 to 99999 and a list `b` of the same length filled with zeros.


In [2]:
a = list(range(100000))
b = [ 0 ] * 100000

The following cell fills the list `b` with the squares of the values in `a`, and the `%%timeit` cell magic measures the running time.

In [3]:
%%timeit
for i in range(len(a)):
  b[i] = a[i]**2

27 ms ± 38.1 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Now let's compare with NumPy. Create an array:

In [4]:
import numpy as np
a = np.arange(10000)

and then calculate the square of each element.

In [5]:
%%timeit
b = a ** 2

6.39 μs ± 8.82 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


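We see that working with traditional Python lists is much slower than working with NumPy arrays. Note that the array here has 10,000 elements while the list has 100,000, so the two timings are not directly comparable; even allowing for that factor of ten, the NumPy version is roughly several hundred times faster.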
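Outside a notebook, where `%%timeit` is not available, a similar comparison can be sketched with the standard-library `timeit` module. The function names below are hypothetical, both containers are given the same length so the timings are comparable, and `dtype=np.int64` is set explicitly (an assumption of this sketch) so the squares do not overflow on platforms where the default integer is 32-bit. Actual timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)


def square_with_loop():
    """Square every element of the list with an explicit Python loop."""
    out = [0] * n
    for i in range(n):
        out[i] = values_list[i] ** 2
    return out


def square_with_numpy():
    """Square every element of the array with one vectorized expression."""
    return values_array ** 2


repeats = 10
loop_time = timeit.timeit(square_with_loop, number=repeats) / repeats
numpy_time = timeit.timeit(square_with_numpy, number=repeats) / repeats

print(f"list loop: {loop_time * 1e3:.3f} ms per run")
print(f"NumPy:     {numpy_time * 1e3:.3f} ms per run")
```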

# Creating arrays
There are different ways to create arrays:

- `numpy.array()`
- `numpy.zeros()`
- `numpy.ones()`
- `numpy.full()`
- `numpy.arange()`
- `numpy.linspace()`

# Array attributes
Useful attributes and information from an array:

- `numpy.ndarray.shape` → array dimensions
- `numpy.ndarray.size` → total number of elements

Here `ndarray` is NumPy's n-dimensional array type.


In [6]:
a = np.array([1,2,3])               # 1-dimensional array (rank 1)
b = np.array([[1,2,3],[4,5,6]])     # 2-dimensional array (rank 2)

b.shape                             # the shape (rows,columns)
b.size                              # number of elements

6

In [7]:
np.zeros((2, 3))             # 2x3 array with all elements 0
np.ones((1,2))               # 1x2 array with all elements 1
np.full((2,2),7)             # 2x2 array with all elements 7
np.eye(2)                    # 2x2 identity matrix

np.arange(10)                # Evenly spaced values in an interval
np.linspace(0,9,10)          # same values as above, but as floats; see exercise

c = np.ones((3,3))
d = np.ones((3, 2), 'bool')  # 3x2 boolean array
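A notebook cell displays only its last expression, so the results of the creation functions above are not visible. As a small sketch (the dictionary and its labels are illustrative), the shape and dtype of each result can be printed explicitly:

```python
import numpy as np

examples = {
    "zeros": np.zeros((2, 3)),
    "ones": np.ones((1, 2)),
    "full": np.full((2, 2), 7),
    "eye": np.eye(2),
    "bool ones": np.ones((3, 2), dtype=bool),
}

# Print one summary line per array instead of relying on cell output.
for name, arr in examples.items():
    print(f"{name:10s} shape={arr.shape} dtype={arr.dtype}")
```

The exact integer dtype reported for `np.full((2, 2), 7)` can differ between platforms and NumPy versions.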

Arrays can also be stored to and loaded from a `.npy` file using:

- `numpy.save()`
- `numpy.load()`


In [8]:
np.save('x.npy', a)           # save the array a to a .npy file
x = np.load('x.npy')          # load an array from a .npy file and store it in variable x
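To check that a save/load round trip preserves the array, one possible sketch is shown below. It writes to a temporary directory so that no files are left behind; the file name `example.npy` and the variable names are hypothetical.

```python
import os
import tempfile

import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")  # hypothetical file name
    np.save(path, original)
    restored = np.load(path)  # read fully into memory before the directory is removed

# The .npy format stores values, shape, and dtype together.
same_values = np.array_equal(original, restored)
same_shape = original.shape == restored.shape
same_dtype = original.dtype == restored.dtype
print(same_values, same_shape, same_dtype)
```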

On many occasions (especially when something behaves differently than expected), it is useful to check and control the data type of an array using:

- `numpy.ndarray.dtype`
- `numpy.ndarray.astype()`


In [9]:
d.dtype                    # datatype of the array

dtype('bool')

In [10]:
d.astype('int')            # change datatype from boolean to integer

array([[1, 1],
       [1, 1],
       [1, 1]])

In the last example, using `.astype('int')`, NumPy makes a **copy** of the array and **allocates** new memory. By default this happens even when the target `dtype` is **identical** to the original; passing `copy=False` lets NumPy return the original array in that case.  
Understanding and minimizing copies is one of the most important practices for performance.
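Whether two arrays share memory can be checked with `np.shares_memory`. The sketch below uses illustrative variable names and relies on the `copy` parameter of `astype`, which defaults to `True`:

```python
import numpy as np

flags = np.ones((3, 2), dtype=bool)

# Converting to a different dtype allocates a new array.
as_int = flags.astype(int)
print(np.shares_memory(flags, as_int))           # False

# With the default copy=True, a copy is made even for the same dtype.
same_dtype_copy = flags.astype(bool)
print(np.shares_memory(flags, same_dtype_copy))  # False

# With copy=False and an unchanged dtype, the input array itself is returned.
no_copy = flags.astype(bool, copy=False)
print(no_copy is flags)                          # True
```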


<div style="border:2px solid #2AA198;background:#DFF7E3;
            padding:15px;border-radius:8px;">
<h3>🟩 Exercises: NumPy-1</h3>

1. **Datatypes**: Try `np.arange(10)` and `np.linspace(0,9,10)`.  
   What is the difference? Can you make one behave like the other?

2. **Datatypes**: Create a 3×2 array of floats (`numpy.random.random()`)  
   and convert it to integers using `.astype(int)`. How does it change?

3. **Reshape**: Create a 3×2 integer array (range 0–10) and change its shape  
   using `.reshape()`. Which shapes are not allowed?

4. **NumPy I/O**: Save the array using `numpy.save()` and load it back with `numpy.load()`.
</div>

<br>

<details>
<summary><strong>👇 Click to show solutions</strong></summary>

```python
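# 1. Difference between arange & linspace
np.arange(10)                     # → integers 0..9, step 1
np.linspace(0,9,10)               # → 10 floats evenly spaced between 0 and 9 (inclusive)
np.linspace(0,9,10).astype(int)   # → same values and integer type as np.arange(10)

# 2. Create random array & convert type
arr = np.random.random((3,2))     # floats in [0, 1)
arr_int = arr.astype(int)         # fractional part is truncated, so every element becomes 0

# 3. Reshape example
x = np.random.randint(0,10,(3,2))
x.reshape(6,)   # works: 6 elements fit a length-6 array
# x.reshape(4,2) # ❌ fails: size mismatch (8 slots requested, array has 6 elements)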

# 4. Save & load
np.save("data.npy", x)
loaded = np.load("data.npy")
