# Conversion to & from Numpy and Pandas
By the end of this lecture you will be able to:
- convert between Polars and Numpy
- convert between Polars and Pandas

Key functionality in this notebook requires Pandas 2.0 or later (automated testing is carried out with the latest version of Pandas on PyPI).

Use `pl.show_versions()` to check your installation.

In [35]:
import polars as pl
import numpy as np
import pandas as pd




In [46]:
csv_file = "../Files/Sample_Superstore.csv"

In [47]:
df = pl.read_csv(csv_file)
df.head(3)

Row_ID,Order_ID,Order_Date,Ship Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""11/8/2016""","""11/11/2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
3,"""CA-2016-138688""","""6/12/2016""","""6/16/2016""","""Second Class""","""DV-13045""","""Darrin Van Huff""","""Corporate""","""United States""","""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714


## Convert a `DataFrame` to Numpy

To convert a `DataFrame` to Numpy use the `to_numpy` method. This clones (copies) the data.

In [38]:
arr = df.to_numpy()
arr

array([[1, 'CA-2016-152156', '11/8/2016', ..., 2, 0.0, 41.9136],
       [2, 'CA-2016-152156', '11/8/2016', ..., 3, 0.0, 219.582],
       [3, 'CA-2016-138688', '6/12/2016', ..., 2, 0.0, 6.8714],
       ...,
       [9992, 'CA-2017-121258', '2/26/2017', ..., 2, 0.2, 19.3932],
       [9993, 'CA-2017-121258', '2/26/2017', ..., 4, 0.0, 13.32],
       [9994, 'CA-2017-119914', '5/4/2017', ..., 2, 0.0, 72.948]],
      dtype=object)

This conversion turns each row into a Numpy `ndarray` and vertically stacks these row-arrays.

As the `DataFrame` has a mix of types the Numpy array has an `object` dtype.

If the columns have uniform numeric dtype then the Numpy array has the corresponding dtype.

In this example we use `select` to choose the 64-bit floating point columns only for conversion to Numpy...

> We cover `select` in more detail in the Section on Selecting columns and transforming dataframes.

In [39]:
floats_array = (
    df
    .select(
        pl.col(pl.Float64)
    )
    .to_numpy()
)
floats_array

array([[2.61960e+02, 0.00000e+00, 4.19136e+01],
       [7.31940e+02, 0.00000e+00, 2.19582e+02],
       [1.46200e+01, 0.00000e+00, 6.87140e+00],
       ...,
       [2.58576e+02, 2.00000e-01, 1.93932e+01],
       [2.96000e+01, 0.00000e+00, 1.33200e+01],
       [2.43160e+02, 0.00000e+00, 7.29480e+01]])

... and we get a float Numpy array

In [40]:
type(floats_array)

numpy.ndarray

In [20]:
floats_array.dtype

dtype('float64')

The Polars sequence dtypes `pl.List` and `pl.Array` are common ways to store sequences that might be passed to Numpy. We learn more about these in Section 4 of the course.

## Convert Numpy to a `DataFrame`

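We can create a Polars `DataFrame` from a Numpy array. Here we first convert the array to a list of lists with `tolist`. Note that Polars treats each inner list as a column unless told otherwise, so the result below has 9,994 columns and 3 rows: it is the transpose of `floats_array`.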

In [45]:

data_list = floats_array.tolist()

# Create a Polars DataFrame from the list of lists
convert_df = pl.DataFrame(data_list)

In [42]:
convert_df

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9,column_10,column_11,column_12,column_13,column_14,column_15,column_16,column_17,column_18,column_19,column_20,column_21,column_22,column_23,column_24,column_25,column_26,column_27,column_28,column_29,column_30,column_31,column_32,column_33,column_34,column_35,column_36,…,column_9957,column_9958,column_9959,column_9960,column_9961,column_9962,column_9963,column_9964,column_9965,column_9966,column_9967,column_9968,column_9969,column_9970,column_9971,column_9972,column_9973,column_9974,column_9975,column_9976,column_9977,column_9978,column_9979,column_9980,column_9981,column_9982,column_9983,column_9984,column_9985,column_9986,column_9987,column_9988,column_9989,column_9990,column_9991,column_9992,column_9993
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,…,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
261.96,731.94,14.62,957.5775,22.368,48.86,7.28,907.152,18.504,114.9,1706.184,911.424,15.552,407.976,68.81,2.544,665.88,55.5,8.56,213.48,22.72,19.46,60.34,71.372,1044.63,11.648,90.57,3083.43,9.618,124.2,3.264,86.304,6.858,15.76,29.472,1097.544,190.92,…,223.92,7.3,9.344,18.0,65.584,383.4656,10.368,13.4,4.98,109.69,40.2,735.98,22.75,119.56,140.75,99.568,271.96,18.69,13.36,249.584,13.86,13.376,437.472,85.98,16.52,35.56,97.98,31.5,55.6,36.24,79.99,206.1,25.248,91.96,258.576,29.6,243.16
0.0,0.0,0.0,0.45,0.2,0.0,0.0,0.2,0.2,0.0,0.2,0.2,0.2,0.2,0.8,0.8,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.3,0.0,0.2,0.0,0.5,0.7,0.2,0.2,0.2,0.7,0.2,0.2,0.2,0.6,…,0.0,0.0,0.2,0.0,0.2,0.32,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.2,0.0,0.2,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0
41.9136,219.582,6.8714,-383.031,2.5164,14.1694,1.9656,90.7152,5.7825,34.47,85.3092,68.3568,5.4432,132.5922,-123.858,-3.816,13.3176,9.99,2.4824,16.011,7.384,5.0596,15.6884,-1.0196,240.2649,4.2224,11.7741,-1665.0522,-7.0532,15.525,1.1016,9.7092,-5.715,3.546,9.9468,123.4737,-147.963,…,109.7208,2.19,1.8688,3.24,23.7742,-67.6704,3.6288,6.432,2.3406,51.5543,18.09,331.191,6.5975,54.9976,42.225,33.6042,27.196,5.2332,6.4128,31.198,0.0,4.6816,153.1152,22.3548,5.369,16.7132,27.4344,15.12,16.124,15.2208,28.7964,55.647,4.1028,15.6332,19.3932,13.32,72.948


In [48]:
type(convert_df)

polars.dataframe.frame.DataFrame

## Convert a `Series` to Numpy
Converting a `Series` to Numpy has more options than converting an entire `DataFrame`.

To do a simple conversion where the data is cloned use `to_numpy` on the `Series`

In [49]:
(
    df['Profit']
    .head()
    .to_numpy()
)

array([  41.9136,  219.582 ,    6.8714, -383.031 ,    2.5164,   14.1694,
          1.9656,   90.7152,    5.7825,   34.47  ])

The result is a one-dimensional Numpy array holding the first ten `Profit` values, because `head` on a `Series` returns ten rows by default.

### Convert a `Series` to Numpy with zero-copy
In some cases we can convert a `Series` to Numpy without copying ("zero-copy"). 

Zero-copy is only possible if the `Series` contains no `null` values and has a dtype that Numpy can read directly (such as an integer or float dtype), as is the case for the `Profit` column. If we want to ensure that conversion to Numpy happens with zero-copy, and raise an `Exception` if a copy is needed, we use the `allow_copy` argument

In [50]:
arr = (
    df['Profit']
    .head()
    .to_numpy(allow_copy=False)
)
arr

array([  41.9136,  219.582 ,    6.8714, -383.031 ,    2.5164,   14.1694,
          1.9656,   90.7152,    5.7825,   34.47  ])