# Numpy Introduction

Numpy is a Python package that provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. Its core multi-dimensional array object is what most scientific computing tasks in Python are built around. Even if you never use Numpy directly, many of the scientific libraries in Python are built on top of it. For example, the Pandas package, which you will meet in most data science projects, depends on Numpy. The internal memory layout of a Numpy array is very similar to that of a C or Fortran array.

Matrices and vectors are central to most scientific problems, so scientific programs spend much of their time working with them. You could use ordinary Python lists, but they are not efficient enough for heavy numerical work. That's where Numpy arrays shine and easily outperform Python lists. In machine learning problems, you will mostly be dealing with numbers, and those numbers are usually stored in multi-dimensional arrays. Numpy provides fast and efficient numerical computation on these arrays.

Python is great for writing concise code to model your data. However, when it comes to speed, it's usually better if your data is processed closer to the CPU. This is where Numpy comes in. When you store a number in a variable, behind the scenes it is stored at a memory address in the RAM of your machine. In Python, a number is stored as an `int` object, which means that along with the value you are also storing object metadata such as its type and reference count. A Numpy array instead stores raw values of a single type next to each other, which keeps the data compact and gives faster access to memory.
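
As a rough illustration of the storage difference described above, the sketch below compares the size of a single Python `int` object with the size of a single element of a Numpy array. The variable names are made up for this example, and the exact size of a Python `int` is an implementation detail of the interpreter, so no specific number is assumed for it.

```python
import sys

import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Each Python int is a full object; getsizeof reports its size in bytes.
per_int_object = sys.getsizeof(py_list[-1])

# Each element of an int64 array is a raw 8-byte value.
per_array_element = np_arr.itemsize

print(per_int_object, per_array_element)
```

On a typical CPython build the first number is several times larger than the second, and the list additionally stores a pointer to every object.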

## Features of Numpy

- Numpy works close to the hardware for higher efficiency.
- It's written largely in C and uses vectorization, which replaces explicit Python loops with operations that run in compiled code.
- Numpy is widely used in scientific computations.
- Numpy is free and open source.
- The basic data structure in Numpy is the Numpy array.
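
To make the vectorization point in the list above concrete, here is a minimal sketch that squares a few numbers twice: once with an explicit Python loop and once with a single array expression. The variable names are chosen only for this illustration.

```python
import numpy as np

values = np.arange(1, 6)   # array([1, 2, 3, 4, 5])

# Explicit Python loop: one interpreted multiplication per element.
squares_loop = [v * v for v in values.tolist()]

# Vectorized: a single expression; the loop runs inside Numpy's compiled code.
squares_vec = values * values

print(squares_loop)   # [1, 4, 9, 16, 25]
print(squares_vec)    # [ 1  4  9 16 25]
```

Both versions compute the same result; the vectorized form is shorter and, on large arrays, usually much faster.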

## Installing Numpy

To work with scientific packages, you can set up a Python virtual environment and install the required packages in it. Once your virtual environment is set up, you can install Numpy using the command below.

```bash
pip install numpy
```

If you're using `conda`, you can use the command below.

```bash
conda install numpy
```

## Using Numpy

To use the Numpy package in your code, you first need to import it. The convention in the scientific community is to import it as `np`, so that you can type the shorthand `np` instead of `numpy` every time. You can import Numpy using the statement below.

```python
import numpy as np
```

In the next lesson, you will learn how to work with Numpy arrays.

# Creating Numpy Arrays

Numpy's basic data structure is called `ndarray`, which is short for n-dimensional array.

There are four main ways to create Numpy arrays.

1. Create a Numpy array from Python sequences.
2. Use Numpy array creation functions such as `arange`, `ones`, `zeros`, etc.
3. Use special package functions, such as those in `random`, to create an `ndarray`.
4. Create an array by loading data from files.

In order to work with `numpy`, you first import `numpy` as `np`.


In [1]:
import numpy as np

## 1. Array using Python sequences

You can easily construct an `ndarray` by passing any Python list to the `array` function defined in `numpy`.


In [2]:
a = np.array([1,2,3])
a

array([1, 2, 3])

Internally, Numpy stores the data in `ndarray` format, which is not the same as a Python list. It is a much more compact and memory-efficient form of storage.

You can also create Numpy arrays of other types using the same syntax.

In [3]:
float_arr = np.array([.1, .2, .3])

If you check the type of this array `a`, you will notice that it's a `numpy.ndarray` type.

In [4]:
type(a)

numpy.ndarray

You can also create a numpy array using tuples. You simply pass a tuple instead of a list in this case.

In [5]:
b = np.array((1,2,3))
type(b)

numpy.ndarray

If you want to create a multi-dimensional array, you can use Python nested lists as shown below.

In [6]:
c = np.array([[1,2,3], [4,5,6]])
print(c)

[[1 2 3]
 [4 5 6]]


Again this is of type `numpy.ndarray`.
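
The element type of an array built from a sequence is inferred from the values, and it can also be set explicitly with the `dtype` argument of `np.array`. The short sketch below shows both cases; the variable names are only illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])                         # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])                      # int and float -> float64
explicit = np.array([1, 2, 3], dtype=np.float32)   # type requested explicitly

print(ints.dtype.kind)   # i
print(mixed.dtype)       # float64
print(explicit.dtype)    # float32
```

The exact integer dtype chosen for `ints` depends on the platform, which is why only its kind (`i` for signed integer) is printed here.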

## 2. Using Numpy array creation functions

Numpy provides several useful functions for creating arrays. These are called array creation functions.

You can use `arange` to create an array of evenly spaced values. It is similar to Python's built-in `range` function.

In [7]:
a = np.arange(5)
print(a)

[0 1 2 3 4]


You can also pass `start`, `stop` and `step` arguments to this function. As with `range`, the `stop` value itself is not included.

In [8]:
b = np.arange(1, 10, 2)
print(b)

[1 3 5 7 9]


You can use `ones` and `zeros` to create arrays filled with ones or zeros. You specify the shape of the array with the `shape` argument, and every item is set to 1 or 0. By default the items are floats, which is why they print as `1.` and `0.`.

In [10]:
zeros = np.zeros((3,4))
print(zeros)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [11]:
ones = np.ones((2, 3))
print(ones)

[[1. 1. 1.]
 [1. 1. 1.]]
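
A few closely related creation functions follow the same pattern. The sketch below, with illustrative variable names, requests integer zeros through the `dtype` argument, fills an array with an arbitrary constant using `np.full`, and copies the shape and type of an existing array with `np.ones_like`.

```python
import numpy as np

int_zeros = np.zeros((2, 3), dtype=int)     # integer zeros instead of floats
sevens = np.full((2, 2), 7)                 # every element set to 7

template = np.array([[1.5, 2.5], [3.5, 4.5]])
ones_like = np.ones_like(template)          # same shape and dtype as template

print(int_zeros)
print(sevens)
print(ones_like)
```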


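The `np.linspace` function creates an array of evenly spaced values over a specified interval. Its third argument is the number of values to generate, and the end of the interval is included by default.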

In [12]:
linspace = np.linspace(0, 10, 5)
print(linspace)

[ 0.   2.5  5.   7.5 10. ]
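
Since `arange` and `linspace` both produce evenly spaced values, it can help to see them side by side. In this sketch (variable names are illustrative), `arange` is given a step and `linspace` is given a count; the `endpoint` and `retstep` keyword arguments of `linspace` are also shown.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
by_step = np.arange(0, 1, 0.25)

# linspace: you choose the number of values; the stop value is included by default.
by_count = np.linspace(0, 1, 5)

# endpoint=False leaves the stop value out; retstep=True also returns the spacing.
half_open, step = np.linspace(0, 1, 4, endpoint=False, retstep=True)

print(by_step)     # [0.   0.25 0.5  0.75]
print(by_count)    # [0.   0.25 0.5  0.75 1.  ]
print(half_open, step)
```

When the number of points matters more than the exact step, `linspace` is usually the safer choice for floating-point intervals.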


You can use the `repeat` function to create an array by repeating each element of an existing array a given number of times.

In [13]:
a = np.array([1,2,3])
repeated = np.repeat(a, 3)
print(repeated)

[1 1 1 2 2 2 3 3 3]
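
Note that `repeat` works element by element. If you want the whole array repeated end to end, `np.tile` does that instead. The sketch below contrasts the two and also shows the `axis` argument of `repeat` on a 2D array; the names `m` and `rows_repeated` are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

print(np.repeat(a, 2))   # [1 1 2 2 3 3]
print(np.tile(a, 2))     # [1 2 3 1 2 3]

# With axis=0, whole rows of a 2D array are repeated.
m = np.array([[1, 2], [3, 4]])
rows_repeated = np.repeat(m, 2, axis=0)
print(rows_repeated)
```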


The `np.eye(n, m)` function creates a 2D array with `n` rows and `m` columns that has ones on the diagonal and zeros elsewhere. If you omit `m`, the result is the square `n` x `n` identity matrix.

In [14]:
eye = np.eye(3)
print(eye)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
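
The following sketch shows the non-square case and the optional `k` argument of `np.eye`, which shifts the diagonal, next to `np.identity`, which always returns a square matrix. Variable names are illustrative.

```python
import numpy as np

rect = np.eye(2, 3)      # 2 rows, 3 columns
upper = np.eye(3, k=1)   # ones on the first diagonal above the main one
ident = np.identity(3)   # always square

print(rect)
print(upper)
```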


## 3. Using Special package functions

There are other functions in the `numpy` package which can be used to create an `ndarray`, such as `logspace` and the functions in the `random` module.

The `random.rand` function creates a Numpy array of values drawn from a uniform distribution between 0 and 1, where 0 is included and 1 is not.

In [15]:
a = np.random.rand(4)
print(a)

[0.53826777 0.13952984 0.62274587 0.88301904]


If you want to create a random array drawn from the standard normal distribution (mean 0, standard deviation 1), you can use the `random.randn` function.

In [16]:
a = np.random.randn(3)
print(a)

[-0.5727829  -0.51042528 -0.44357847]


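You can also generate an array of random integers using the `random.randint` function. The lower bound is included and the upper bound is excluded, so the call below draws integers from 1 to 9.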

In [17]:
a = np.random.randint(1, 10, (2, 3))
print(a)

[[3 7 5]
 [3 8 2]]
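
The random values above change on every run. If you need reproducible results, one option is Numpy's `Generator` interface, created with `np.random.default_rng` and a seed. The sketch below draws the same three kinds of values as the cells above; the seed value 42 and the variable names are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random(4)                        # floats in [0, 1)
normal = rng.standard_normal(3)                # mean 0, standard deviation 1
integers = rng.integers(1, 10, size=(2, 3))    # 1 <= value < 10

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
same_uniform = rng_again.random(4)

print(np.array_equal(uniform, same_uniform))   # True
```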


## 4. By reading data from file

You can also create an array by loading data from a file. This is useful when you want to work with data stored in external files, and it is a common way to work with `numpy`. For example, you could load image data from a file and work with it using `numpy` functions.

Assume you have a file named `data.txt` with the content below.

```plaintext
1,2,3
4,5,6
```

You can create a Numpy array from this file using the `loadtxt` function. By default, `loadtxt` reads the values as floats.

```python
a = np.loadtxt('data.txt', delimiter=',')
print(a)
```

```plaintext
[[1. 2. 3.]
 [4. 5. 6.]]
```

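You can also save a Numpy array to a file using the `savetxt` function, passing the file name and the array as arguments. By default, `savetxt` writes space-separated values in scientific notation; pass `delimiter=','` if you want the file to stay comma-separated.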

```python
np.savetxt('data.txt', a)
```
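
To see saving and loading work together, here is a small round-trip sketch. It writes to a temporary directory so that it does not touch any real file; the file name `data.csv` and the variable names are hypothetical and used only for illustration.

```python
import os
import tempfile

import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")   # hypothetical file name

    # fmt="%d" writes plain integers; delimiter="," keeps the file comma-separated.
    np.savetxt(path, original, fmt="%d", delimiter=",")

    with open(path) as f:
        text = f.read()

    # dtype=int overrides loadtxt's default of float.
    restored = np.loadtxt(path, delimiter=",", dtype=int)

print(text)
print(restored)
```

Passing the same `delimiter` to both functions is what makes the saved file readable again.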

# Numpy Array Attributes

Numpy arrays have several attributes that provide information about the array. The most useful ones are demonstrated below.

## 1. `shape`
You can get the shape of an array using the `shape` attribute. It is a tuple containing the length of the array along each dimension.

In [18]:
a = np.array([1,2,3])
a.shape # (3,)

(3,)

In [19]:
a = np.array([[1,2,3], [4,5,6]])
a.shape # (2, 3)

(2, 3)

## 2. `dtype`

The `dtype` attribute gives the data type of the array's elements. The default integer type depends on the platform; on this machine it is `int64`.

In [20]:
a = np.array([1,2,3])
a.dtype # dtype('int64')

dtype('int64')

In [21]:
b = np.array([True, False, True])
b.dtype # dtype('bool')

dtype('bool')

## 3. `size`

The `size` attribute is used to get the total number of elements in the array.

In [22]:
a = np.array([[1,2,3], [4,5,6]])
a.size # 6

6

## 4. `ndim`

The `ndim` attribute is used to get the number of dimensions of the array. For example, for 2D array, `ndim` will be 2 and for 3D array, `ndim` will be 3.

In [23]:
a = np.array([[1,2,3], [4,5,6]])
a.ndim # 2

2

## 5. `itemsize`

The `itemsize` attribute gives the size of each element of the array in bytes. As you saw above, the default integer type on this machine is `int64`, which takes 8 bytes.

In [24]:
a = np.array([1,2,3])
a.itemsize # 8

8
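
Because `itemsize` follows from the `dtype`, the same values take up different amounts of memory depending on the type you choose. The sketch below builds the same three-element array with a few dtypes and records the element size of each; the `sizes` dictionary is just a helper for this example.

```python
import numpy as np

sizes = {}
for dtype in (np.int8, np.int32, np.float64, np.complex128):
    arr = np.array([1, 2, 3], dtype=dtype)
    sizes[arr.dtype.name] = arr.itemsize
    print(arr.dtype.name, arr.itemsize)
```

This prints 1, 4, 8 and 16 bytes respectively.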

## 6. `nbytes`

The `nbytes` attribute gives the total size of the array's data in bytes. It is the product of `itemsize` and `size`.

In [25]:
a = np.array([[1,2,3], [4,5,6]])
a.nbytes # 48

48
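
The attributes covered so far are tied together by simple arithmetic. As a quick check, the sketch below uses a 3D array of `float32` values (an arbitrary choice for this example) and prints each attribute.

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.float32)

print(a.shape)      # (2, 3, 4)
print(a.ndim)       # 3
print(a.size)       # 24
print(a.itemsize)   # 4
print(a.nbytes)     # 96
```

Here `ndim` is the length of `shape`, `size` is the product of the entries in `shape` (2 * 3 * 4 = 24), and `nbytes` is `size * itemsize` (24 * 4 = 96).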

## 7. `data`

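The `data` attribute is a Python buffer object (a `memoryview`) that points to the start of the array's data. The address shown in the output identifies the buffer object itself, not the data; the address of the data is available as `a.ctypes.data`.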

In [26]:
a = np.array([1,2,3])
a.data

<memory at 0x7a2c2c30b700>
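
The sketch below looks a little closer at where an array's data lives. It reads the buffer through `data`, gets the integer address of the first element from `ctypes.data`, and shows that a slice is a view into the same memory. The variable names are illustrative, and the `dtype` is fixed to `int64` so that the byte counts do not depend on the platform.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

buf = a.data                # memoryview over the array's bytes
address = a.ctypes.data     # integer address of the first element

# A slice is a view: it starts one element (8 bytes) further into the same buffer.
view = a[1:]
offset = view.ctypes.data - address

print(type(buf).__name__)           # memoryview
print(buf.nbytes)                   # 24
print(offset)                       # 8
print(np.shares_memory(a, view))    # True
```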

# Numpy Array Indexing