# Lecture 4: Basics of [NumPy](https://numpy.org/doc/stable/) & [`ndarrays`](https://numpy.org/doc/stable/reference/arrays.ndarray.html)

* Pandas → NumPy
* NumPy → Pandas
* Indexing & Selection

## Setup

Imports:

```python
import pandas as pd
import numpy as np
```

The basis for all examples is the `DataFrame` we created last lecture:

```python
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
df
```

|   | col0 | col1 | col2 |
|---|------|------|------|
| a | 0 | 5 | 10 |
| b | 1 | 6 | 11 |
| c | 2 | 7 | 12 |
| d | 3 | 8 | 13 |
| e | 4 | 9 | 14 |


## Pandas → NumPy

Since Pandas uses NumPy as its "engine" under the hood, going from Pandas to NumPy is as easy as accessing the `values` property. Every `DataFrame` and `Series` has this property; as the name suggests, it holds the underlying data.

### `DataFrame` → `ndarray`

If we access a `DataFrame`'s `values` property, we get a 2-dimensional `ndarray` containing all the data that is in the `DataFrame`.

```python
df.values
```

```
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]], dtype=int64)
```

Of course, the row and column labels are lost, because an `ndarray` has no labels, only integer positions. Recall: **Pandas = NumPy + indices**.
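The pandas documentation recommends `DataFrame.to_numpy()` over the `values` property; for a frame with a single dtype like `df`, both return the same array. The minimal sketch below rebuilds `df` so it runs on its own and shows that the labels stay on the `DataFrame` (`df.index`, `df.columns`) rather than in the array. The names `arr_values` and `arr_numpy` are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the lecture's example DataFrame
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])

arr_values = df.values      # property used in this lecture
arr_numpy = df.to_numpy()   # method recommended by the pandas docs

print(np.array_equal(arr_values, arr_numpy))  # True
print(arr_numpy.shape)                        # (5, 3)

# The labels are not part of the array; they live on the DataFrame
print(list(df.index))    # ['a', 'b', 'c', 'd', 'e']
print(list(df.columns))  # ['col0', 'col1', 'col2']
```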

### `Series` → `ndarray`

Let's use `df`'s first column as an example of a `Series`:

```python
S = df['col0']
S
```

```
a    0
b    1
c    2
d    3
e    4
Name: col0, dtype: int64
```

If we access a `Series`' `values` property, we get a 1-dimensional `ndarray` containing all the data that is in the `Series`.

```python
S.values
```

```
array([0, 1, 2, 3, 4], dtype=int64)
```

Again, the index is lost, but the data is the same.
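For `Series`, the pandas documentation likewise recommends `to_numpy()` over `values`. A short sketch (rebuilding `df` and `S` so it is self-contained) showing that the array is 1-dimensional and that position `2` in the array corresponds to label `'c'` in the `Series`:

```python
import pandas as pd

# Rebuild the lecture's DataFrame and take its first column
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
S = df['col0']

arr = S.to_numpy()    # same data as S.values
print(arr.ndim)       # 1
print(arr.tolist())   # [0, 1, 2, 3, 4]

# The array only knows positions; the Series maps labels to those positions
print(arr[2] == S.loc['c'])  # True
print(list(S.index))         # ['a', 'b', 'c', 'd', 'e']
```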

## NumPy → Pandas

Going from NumPy to Pandas is just as easy. We can simply pass an `ndarray` to a `DataFrame`'s or `Series`' constructor.

### `ndarray` → `DataFrame`

We use `df.values` to create a **N**um**P**y **a**rray (`npa`):

```python
npa = df.values
npa
```

```
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]], dtype=int64)
```

To create a new `DataFrame` with the same data, we can simply call `DataFrame(npa)`:

```python
pd.DataFrame(npa)
```

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 5 | 10 |
| 1 | 1 | 6 | 11 |
| 2 | 2 | 7 | 12 |
| 3 | 3 | 8 | 13 |
| 4 | 4 | 9 | 14 |


Since we don't pass explicit `index` or `columns` arguments, Pandas falls back to its default integer labels (`0`, `1`, `2`, and so on) for both rows and columns.
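Passing the labels back through the constructor's `index` and `columns` parameters restores the original frame, which is the "Pandas = NumPy + indices" idea in reverse. A minimal sketch; `df_plain` and `df_restored` are illustrative names:

```python
import pandas as pd

# Rebuild the lecture's DataFrame and strip it down to an ndarray
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
npa = df.values

# Without labels: default integer labels for rows and columns
df_plain = pd.DataFrame(npa)
print(list(df_plain.index))    # [0, 1, 2, 3, 4]
print(list(df_plain.columns))  # [0, 1, 2]

# Re-attach the original labels
df_restored = pd.DataFrame(npa, index=df.index, columns=df.columns)
print(df_restored.equals(df))  # True
```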

### `ndarray` → `Series`

The same works for `Series`.

```python
npa = S.values
npa
```

```
array([0, 1, 2, 3, 4], dtype=int64)
```

```python
pd.Series(npa)
```

```
0    0
1    1
2    2
3    3
4    4
dtype: int64
```
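To get the original `Series` back, the constructor accepts the labels via `index` and the column name via `name`. A short sketch; note that `Series.equals` compares index and values but not the `name`, so the name is checked separately here. `S_restored` is an illustrative name.

```python
import pandas as pd

# Rebuild the lecture's DataFrame and its first column
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
S = df['col0']
npa = S.values

# Re-attach the labels and the name that were lost in the ndarray
S_restored = pd.Series(npa, index=S.index, name=S.name)
print(S_restored.equals(S))  # True (index and values match)
print(S_restored.name)       # col0
```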

## Indexing & Selection

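Pandas' indexing syntax is modeled on NumPy's, so it works largely the same way. The main difference is that an `ndarray` only understands integer positions (like Pandas' `iloc`), not labels (like `loc`).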

### 1-dimensional

For 1-dimensional `ndarray`s, indexing works like positional indexing on a `Series`.

```python
npa = S.values
npa
```

```
array([0, 1, 2, 3, 4], dtype=int64)
```

Select a single element:

```python
npa[0]
```

```
0
```
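Slices (`start:stop`) and negative positions work on 1-dimensional arrays just as they do on Python lists, and `iloc` is the positional counterpart on a `Series`. A small sketch, rebuilding `S` so it runs on its own:

```python
import pandas as pd

# Rebuild the lecture's DataFrame, its first column, and the ndarray
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
S = df['col0']
npa = S.values

print(npa[-1])    # 4, negative positions count from the end
print(npa[1:3])   # [1 2], the stop position 3 is excluded

# Positional slicing on the Series gives the same values, plus labels
print(S.iloc[1:3].tolist())        # [1, 2]
print(S.iloc[1:3].index.tolist())  # ['b', 'c']
```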

### 2-dimensional

```python
npa = df.values
npa
```

```
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]], dtype=int64)
```

Select a single row: `ndarray[row]`.

```python
npa[0]
```

```
array([ 0,  5, 10], dtype=int64)
```
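The `DataFrame` counterpart of `npa[0]` is `df.iloc[0]`. It returns the same values, but as a `Series` whose index is the column names and whose name is the row label. A sketch, rebuilding `df` for self-containment:

```python
import numpy as np
import pandas as pd

# Rebuild the lecture's DataFrame and its ndarray
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
npa = df.values

row_np = npa[0]        # plain 1-D ndarray, no labels
row_pd = df.iloc[0]    # Series: values of row 'a', indexed by column names

print(row_pd.name)            # a
print(row_pd.index.tolist())  # ['col0', 'col1', 'col2']
print(np.array_equal(row_np, row_pd.to_numpy()))  # True
```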

Select a single column: `ndarray[:, col]`.

```python
npa[:, 0]
```

```
array([0, 1, 2, 3, 4], dtype=int64)
```

The colon selects all rows (a slice over the whole first axis). The same syntax works for `DataFrame`s via `loc`:

```python
df.loc[:, 'col0']
```

```
a    0
b    1
c    2
d    3
e    4
Name: col0, dtype: int64
```
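Since `npa[:, 0]` selects by position, its closest `DataFrame` equivalent is `df.iloc[:, 0]`; `df.loc[:, 'col0']` selects the same column by label. A sketch comparing all three (variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the lecture's DataFrame and its ndarray
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
npa = df.values

col_np = npa[:, 0]            # all rows, column position 0
col_iloc = df.iloc[:, 0]      # same positions, returns a labeled Series
col_loc = df.loc[:, 'col0']   # same column, selected by label

print(col_iloc.equals(col_loc))                     # True
print(np.array_equal(col_np, col_iloc.to_numpy()))  # True
```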

Select a single element: `ndarray[row, col]`.

```python
npa[0, 0]
```

```
0
```
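Slices can be combined on both axes to cut out a sub-block. A sketch comparing the NumPy version with `iloc`, which uses the same positions, and `loc`, where label slices include their end point:

```python
import numpy as np
import pandas as pd

# Rebuild the lecture's DataFrame and its ndarray
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
npa = df.values

block_np = npa[1:3, 0:2]   # rows 1-2, columns 0-1 (stop positions excluded)
print(block_np)
# [[1 6]
#  [2 7]]

block_pd = df.iloc[1:3, 0:2]  # same positions, keeps the labels
print(block_pd.index.tolist())    # ['b', 'c']
print(block_pd.columns.tolist())  # ['col0', 'col1']

# Label slices with loc include the end label
print(df.loc['b':'c', 'col0':'col1'].equals(block_pd))  # True
```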

Boolean masking also works the same way as it does for `DataFrame`s: indexing with a boolean array keeps only the elements (or rows) where the mask is `True`.
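A minimal sketch of boolean masking on `npa`, next to the equivalent `DataFrame` filter. A mask over the first axis keeps whole rows; a mask over the whole 2-D array returns the matching elements as a flat 1-D array. The name `mask` is illustrative.

```python
import pandas as pd

# Rebuild the lecture's DataFrame and its ndarray
df = pd.DataFrame(data=list(zip(range(5), range(5, 10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
npa = df.values

mask = npa[:, 0] > 2   # one boolean per row, based on column 0
print(mask)            # [False False False  True  True]
print(npa[mask])
# [[ 3  8 13]
#  [ 4  9 14]]

# The DataFrame version selects the same rows and keeps their labels
print(df[df['col0'] > 2].index.tolist())  # ['d', 'e']

# Masking the whole 2-D array returns a flat 1-D array of matches
print(npa[npa > 11])   # [12 13 14]
```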

© 2023 Philipp Cornelius