# Pandas support

Pandas is convenient for working with numerical data, so pint provides `PintArray` to allow quantities to be used with pandas. A `PintArray` is a pandas `ExtensionArray`, which allows pandas to recognise the Quantity and store it in a DataFrame or Series. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.api.extensions.ExtensionArray.html for details.

In [1]:
!pip list

Package              Version                  Location                                          
-------------------- ------------------------ --------------------------------------------------
appdirs              1.4.3                    
appnope              0.1.0                    
asn1crypto           0.24.0                   
atomicwrites         1.1.5                    
attrs                18.1.0                   
Automat              0.0.0                    
backcall             0.1.0                    
bleach               2.1.4                    
certifi              2018.8.24                
cffi                 1.11.5                   
chardet              3.0.4                    
constantly           15.1.0                   
coverage             4.5.1                    
coveralls            1.3.0                    
cryptography         2.3.1                    
cryptography-vectors 2.3.1                    
Cython               0.28.5         

In [2]:
import pandas as pd 
import pint
import numpy as np

from pint.pandas_interface import PintArray

In [3]:
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

In [4]:
df = pd.DataFrame({
    "torque": PintArray(Q_([1, 2, 2, 3], "lbf ft")),
    "angular_velocity": PintArray(Q_([1000, 2000, 2000, 3000], "rpm")),
})
df

Unnamed: 0,torque,angular_velocity
0,1,1000
1,2,2000
2,2,2000
3,3,3000


In [5]:
df['power'] = df['torque'] * df['angular_velocity']
df  # it is unclear why a warning appears here, since the units are preserved (see the cells below)



Unnamed: 0,torque,angular_velocity,power
0,1,1000,1000
1,2,2000,4000
2,2,2000,4000
3,3,3000,9000


In [6]:
df.power.values.data

In [9]:
df.torque.values.data

In [10]:
df.angular_velocity.values.data

In [11]:
df.power.values.data.to("kW")
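
The outputs of the cells above are not shown, so it may help to reproduce the last conversion by hand. The following is a minimal sketch of the arithmetic only, not of how pint performs the conversion. The constants are the standard definitions of the pound-force and the foot, one revolution is taken as 2π radians, and the variable names are illustrative.

```python
import math

# Exact definitions: 1 lbf = 4.4482216152605 N, 1 ft = 0.3048 m
LBF_FT_IN_NEWTON_METRE = 4.4482216152605 * 0.3048
# 1 rpm = 2*pi radians per 60 seconds
RPM_IN_RAD_PER_SECOND = 2 * math.pi / 60

torque_lbf_ft = [1, 2, 2, 3]
speed_rpm = [1000, 2000, 2000, 3000]

# power [W] = torque [N m] * angular velocity [rad/s]; divide by 1000 for kW
power_kw = [
    t * LBF_FT_IN_NEWTON_METRE * w * RPM_IN_RAD_PER_SECOND / 1000
    for t, w in zip(torque_lbf_ft, speed_rpm)
]
print([round(p, 4) for p in power_kw])
```

The first row should come out at roughly 0.142 kW, and the remaining rows scale with the product of torque and speed.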

That works, but building each PintArray by hand is more effort than reading data from a file. DataFrame accessors are provided to make it easy to get to PintArrays. Let's start by reading a file which has units as a level in the column MultiIndex:

In [15]:
df = pd.read_csv("pint_test_data.csv", header=[0, 1])
df

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,rpm,kW,N m,bar,l/min,kW
0,1000,,10,1000,10,
1,1100,,10,1000000000000,10,
2,1200,,10,1000,10,
3,1200,,10,1000,10,
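
To make the two-row header layout concrete, here is a small sketch using only the standard library. The sample text is a hypothetical stand-in for `pint_test_data.csv`, reconstructed from the first two rows of the table above; the real file may differ (for example, it may contain an index column).

```python
import csv
import io

# Hypothetical stand-in for pint_test_data.csv, reconstructed from the table above
sample = """speed,mech power,torque,rail pressure,fuel flow rate,fluid power
rpm,kW,N m,bar,l/min,kW
1000,,10,1000,10,
1100,,10,1000000000000,10,
"""

rows = list(csv.reader(io.StringIO(sample)))
# First row: column names; second row: units; the rest: data
names, units, data = rows[0], rows[1], rows[2:]

# Pair each column name with the unit found directly beneath it
unit_of = dict(zip(names, units))
print(unit_of["torque"])          # N m
print(unit_of["fuel flow rate"])  # l/min
```

Passing `header=[0, 1]` to `pd.read_csv` keeps both rows as levels of the column index, so each column label carries its unit alongside its name.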


Then use the `quantify` method of the DataFrame's `pint` accessor to convert the columns from NumPy arrays to PintArrays, taking the units from the bottom level of the column MultiIndex.

In [16]:
df_ = df.pint.quantify(ureg, level=-1)
df_

Unnamed: 0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
0,1000,,10,1000.0,10,
1,1100,,10,1000000000000.0,10,
2,1200,,10,1000.0,10,
3,1200,,10,1000.0,10,


Operations between PintArrays (the columns in the DataFrame) are unit aware.

In [17]:
df_['mech power'] = df_.speed * df_.torque
df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
df_  # again, it is unclear why a warning appears, since everything seems to work



Unnamed: 0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
0,1000,10000,10,1000.0,10,10000.0
1,1100,11000,10,1000000000000.0,10,10000000000000.0
2,1200,12000,10,1000.0,10,10000.0
3,1200,12000,10,1000.0,10,10000.0


We can verify that the units of the columns have been multiplied as expected:

In [18]:
df_.pint.dequantify()

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,revolutions_per_minute,meter * newton * revolutions_per_minute,meter * newton,bar,liter / minute,bar * liter / minute
0,1000.0,10000.0,10.0,1000.0,10.0,10000.0
1,1100.0,11000.0,10.0,1000000000000.0,10.0,10000000000000.0
2,1200.0,12000.0,10.0,1000.0,10.0,10000.0
3,1200.0,12000.0,10.0,1000.0,10.0,10000.0


We can convert the power columns to a more conventional unit:

In [19]:
df_['fluid power'] = df_['fluid power'].pint.to("kW")
df_['mech power'] = df_['mech power'].pint.to("kW")
df_.pint.dequantify()

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,revolutions_per_minute,kilowatt,meter * newton,bar,liter / minute,kilowatt
0,1000.0,1.047198,10.0,1000.0,10.0,16.66667
1,1100.0,1.151917,10.0,1000000000000.0,10.0,16666670000.0
2,1200.0,1.256637,10.0,1000.0,10.0,16.66667
3,1200.0,1.256637,10.0,1000.0,10.0,16.66667
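
The kilowatt values in row 0 can be checked by hand. This sketch uses only the standard conversion factors (one revolution as 2π radians, 1 l = 10⁻³ m³, 1 bar = 10⁵ Pa); the variable names are illustrative and the code does not use pint.

```python
import math

# Row 0 of the table: 1000 rpm, 10 N m, 10 l/min, 1000 bar
speed_rad_s = 1000 * 2 * math.pi / 60   # rpm -> rad/s
torque_n_m = 10
flow_m3_s = 10 * 1e-3 / 60              # l/min -> m^3/s
pressure_pa = 1000 * 1e5                # bar -> Pa

# Both products are in watts; divide by 1000 for kilowatts
mech_power_kw = speed_rad_s * torque_n_m / 1000
fluid_power_kw = flow_m3_s * pressure_pa / 1000

print(round(mech_power_kw, 6))   # 1.047198
print(round(fluid_power_kw, 5))  # 16.66667
```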


Or convert all columns to base units:

In [20]:
df_.pint.to_base_units().pint.dequantify()

Unnamed: 0_level_0,speed,mech power,torque,rail pressure,fuel flow rate,fluid power
Unnamed: 0_level_1,radian / second,kilogram * meter ** 2 / second ** 3,kilogram * meter ** 2 / second ** 2,kilogram / meter / second ** 2,meter ** 3 / second,kilogram * meter ** 2 / second ** 3
0,104.719755,1047.197551,10.0,100000000.0,0.000167,16666.67
1,115.191731,1151.917306,10.0,1e+17,0.000167,16666670000000.0
2,125.663706,1256.637061,10.0,100000000.0,0.000167,16666.67
3,125.663706,1256.637061,10.0,100000000.0,0.000167,16666.67
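
Both power columns end up with the same base units because multiplying quantities adds the exponents of their base units. The sketch below illustrates that bookkeeping with plain dictionaries. It is not how pint represents units internally, and `multiply` is a hypothetical helper.

```python
from collections import Counter

def multiply(dim_a, dim_b):
    """Multiply two dimensions by adding the exponents of each base unit."""
    result = Counter(dim_a)
    for unit, exponent in dim_b.items():
        result[unit] += exponent
    # Drop base units whose exponents cancelled out
    return {unit: exp for unit, exp in result.items() if exp != 0}

pressure = {"kilogram": 1, "meter": -1, "second": -2}
flow_rate = {"meter": 3, "second": -1}
torque = {"kilogram": 1, "meter": 2, "second": -2}
# Radian is treated as dimensionless here, matching the mech power header above
angular_speed = {"second": -1}

fluid_power = multiply(pressure, flow_rate)
mech_power = multiply(torque, angular_speed)
print(fluid_power == mech_power)  # True
```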


## Comments

What follows is a short discussion of pint's `PintArray` object.

It is first useful to distinguish between three different things:

1. A scalar value

In [29]:
Q_(123,"m")

2. A 1d array or list

In [30]:
Q_([1, 2, 3], "m")

3. A 2d+ array or list

In [31]:
Q_([[1, 2], [3, 4]], "m")

The first, a single scalar value, is not intended to be stored in the PintArray as it's not an array, and should raise an error (TODO). The scalar Quantity is the scalar form of the PintArray, and is returned by operations that use `__getitem__`, e.g. indexing. A PintArray can be created from a list of scalar Quantities using `PintArray._from_sequence`.

The second, a 1d array or list, is what a PintArray is designed to hold; it is stored in the `PintArray.data` attribute.

The third, 2d+ arrays or lists, are beyond the capabilities of ExtensionArrays which are limited to 1d arrays, so cannot be stored in the array, and should raise an error (TODO).
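
The three cases can be told apart by nesting depth. The guard below is a hypothetical pure-Python sketch of the kind of check the TODOs describe. `nesting_depth` and `require_1d` are illustrative names, not part of pint, and the sketch makes no claim about how or whether PintArray currently validates its input.

```python
def nesting_depth(values):
    """Return 0 for a scalar, 1 for a flat list, 2 for a list of lists, and so on."""
    if isinstance(values, (list, tuple)):
        # An empty sequence still counts as one level of nesting
        return 1 + max((nesting_depth(v) for v in values), default=0)
    return 0

def require_1d(values):
    """Hypothetical guard: accept only flat sequences, reject scalars and 2d+ input."""
    depth = nesting_depth(values)
    if depth != 1:
        raise ValueError(f"expected a 1d sequence, got {depth}d input")
    return list(values)

print(require_1d([1, 2, 3]))
for bad in (123, [[1, 2], [3, 4]]):
    try:
        require_1d(bad)
    except ValueError as err:
        print(err)
```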

Most operations on the PintArray act on the Quantity stored in `PintArray.data`, so will behave similarly to operations on a Quantity, with some caveats:

1. An operation that would return a 1d Quantity will return a PintArray containing the Quantity. This allows pandas to assign the result to a Series.
2. Arithmetic and comparison operations are limited to scalars and sequences of the same length as the stored Quantity. This ensures results are the same length as the stored Quantity, so can be added to the same DataFrame.
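
The two caveats can be illustrated with a toy class. `ToyArray` is a hypothetical stand-in written for this example; it is not PintArray and only mimics the two behaviours described above (results are re-wrapped in the array type, and operands must be scalars or arrays of equal length).

```python
from numbers import Number

class ToyArray:
    """Hypothetical stand-in for a 1d unit-aware array; not pint's PintArray."""

    def __init__(self, magnitudes, unit):
        self.magnitudes = list(magnitudes)
        self.unit = unit

    def __len__(self):
        return len(self.magnitudes)

    def __mul__(self, other):
        if isinstance(other, Number):
            # A scalar is applied to every element
            values = [m * other for m in self.magnitudes]
            unit = self.unit
        elif isinstance(other, ToyArray):
            # Caveat 2: array operands must match in length
            if len(other) != len(self):
                raise ValueError("operands must have the same length")
            values = [a * b for a, b in zip(self.magnitudes, other.magnitudes)]
            unit = f"{self.unit} * {other.unit}"
        else:
            return NotImplemented
        # Caveat 1: the result is wrapped in the array type again
        return ToyArray(values, unit)

speed = ToyArray([1000, 1100, 1200, 1200], "rpm")
torque = ToyArray([10, 10, 10, 10], "N m")
power = speed * torque
print(power.magnitudes, power.unit)
```

Because the result always has the same length as the operands, it can be assigned back as a new column next to them, which is the property the second caveat is meant to guarantee.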