# Conversion to & from Numpy and Pandas
By the end of this lecture you will be able to:
- convert between Polars and Numpy
- convert between Polars and Pandas

Key functionality in this notebook requires that your Pandas version is 1.5+, Polars is 0.16.4+ and PyArrow is 11+.

Use `pl.show_versions()` to check your installation.

In [None]:
import polars as pl
import numpy as np
import pandas as pd

In [None]:
csv_file = "../data/titanic.csv"

In [None]:
df = pl.read_csv(csv_file)
df.head(3)

## Convert a `DataFrame` to Numpy

To convert a `DataFrame` to Numpy use the `to_numpy` method. This clones (copies) the data.

In [None]:
arr = df.to_numpy()
arr

This conversion turns each row into a Numpy `ndarray` and vertically stacks these row-arrays.

As the `DataFrame` has a mix of types the Numpy array has an `object` dtype.

If the columns have uniform numeric dtype then the Numpy array has the corresponding dtype.

In this example we use `select` to choose the 64-bit floating point columns only for conversion to Numpy...

> We cover `select` in more detail in the Section on Selecting columns and transforming dataframes.

In [None]:
floats_array = (
    df
    .select(
        pl.col(pl.Float64)
    )
    .to_numpy()
)
floats_array

... and we get a float Numpy array

In [None]:
floats_array.dtype

## Convert Numpy to a `DataFrame`

We can create a Polars `DataFrame` from a Numpy array

In [None]:
rand_array = np.random.standard_normal((5,3))
(
    pl.DataFrame(
        rand_array,
    )
)

We can optionally pass a list of column names to `pl.DataFrame` if we want to specify these.

If we have a **1D** Numpy array we can create a Polars `Series` or `DataFrame` with zero-copy. We start by creating a 1D array of all ones

In [None]:
arr = np.ones(10)
arr.shape

We can then create a `Series` or `DataFrame` with *zero-copy*

In [None]:
# zero copy series conversion
pl.Series("a", arr)

# zero copy DataFrame conversion
pl.DataFrame(
    {
       "a": arr,
    }
)

Zero-copy means that the data (the array of ones) is stored in only one place in memory. Numpy and Polars both read from that same location, whether through the original Numpy array or through the Polars `Series` or `DataFrame`.

## Convert a `Series` to Numpy
Converting a `Series` to Numpy has more options than converting an entire `DataFrame`.

To do a simple conversion where the data is cloned use `to_numpy` on the `Series`

In [None]:
df['Age'].head().to_numpy()

### Convert a `Series` to Numpy with zero-copy
In some cases we can convert a `Series` to Numpy without copying ("zero-copy"). 

Zero-copy is only possible if the `Series` has no `null` or `NaN` values, as is the case for the `Survived` column. To ensure that conversion to Numpy happens with zero-copy, and to raise an `Exception` if a copy would be needed, we pass `allow_copy=False`

In [None]:
arr = (
    df['Survived']
    .head()
    .to_numpy(allow_copy=False)
)
arr

If we try this zero-copy approach with the `Age` column, which contains a `null` value, we get an `Exception`

In [None]:
arr = (
    df['Age']
    .head()
    .to_numpy(allow_copy=False)
)
arr

With zero-copy conversion the Numpy array is read-only so we cannot change the values in the Numpy array.

In the following example we get an `Exception` when we try to change the values after a zero-copy operation on the `Survived` column

In [None]:
arr = (
    df['Survived']
    .head()
    .to_numpy(allow_copy=False)
)
arr[0] = 100

## Convert a `DataFrame` to Pandas

### Convert to a Numpy-backed Pandas DataFrame
Pandas has historically used Numpy arrays to represent its data in memory.

To convert a Polars `DataFrame` to a Pandas `DataFrame` backed by Numpy arrays use the `to_pandas` method. This clones the data, as calling `to_numpy` on a `DataFrame` does above.

> This conversion to Pandas requires that you have `PyArrow` installed with `pip` or `conda`.

In [None]:
(
    df
    .to_pandas()
    .head(2)
)

### Convert to a PyArrow-backed Pandas `DataFrame`
Since Pandas release 1.5.0 and Polars release 0.16.4 you can have a Pandas `DataFrame` backed by an Arrow Table. You can create a Pandas `DataFrame` that references the same Arrow Table as your Polars `DataFrame`. This means that you can use (some) Pandas code on your data without copying the data.

In [None]:
(
    df
    .to_pandas(use_pyarrow_extension_array=True)
    .head(2)
)

The advantage of using the pyarrow extension array is that creating the Pandas `DataFrame` is very cheap as it does not require copying data. 

If there is a function you want from Pandas you can do a quick transformation to Pandas, apply the function and revert back to Polars. This works in eager mode only of course.

This PyArrow conversion is a new feature in both libraries, so there may be bugs with trickier features such as categorical or nested columns.

Note that when you do **not** use the PyArrow extension approach the dtypes of the columns in Pandas are the standard Pandas dtypes. When you do use the PyArrow extension approach the dtypes of the columns in Pandas are PyArrow dtypes

In [None]:
# Without PyArrow dtypes
df.to_pandas(use_pyarrow_extension_array=False).dtypes

In [None]:
# With PyArrow dtypes
df.to_pandas(use_pyarrow_extension_array=True).dtypes

### Calling `pd.DataFrame` on a Polars `DataFrame`
With an up-to-date version of Pandas you can call `pd.DataFrame` on a Polars `DataFrame`

In [None]:
df_pandas = (
    pd.DataFrame(df)
    .head()
)

In [None]:
df_pandas



### Conversion from Pandas to Polars
You can convert from Pandas to Polars by calling `pl.DataFrame` on the Pandas `DataFrame`

In [None]:
(
    pl.DataFrame(
        df.to_pandas()
    )
    .head(3)
)

Or by calling `pl.from_pandas` on the Pandas `DataFrame`

In [None]:
(
    pl.from_pandas(
        df.to_pandas()
    ).head(3)
)

Both approaches are equivalent.

## Convert a `Series` to Pandas
You can convert a `Series` to Pandas with the `to_pandas` method, which clones the data

In [None]:
(
    df['Age']
    .to_pandas()
    .head()
)

Or you can again use the PyArrow extension type in Pandas for a zero-copy conversion

In [None]:
(
    df['Age']
    .to_pandas(use_pyarrow_extension_array=True)
    .head()
)

## Exercises

No exercises for this lecture!