# Pandas support

<div class="alert alert-warning">

**Warning:** pandas support is currently experimental, don't expect everything to work.

</div>

It is convenient to use the Pandas package when dealing with numerical data, so the pint-pandas package provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise Quantities and store them in Pandas DataFrames and Series.

## Installation


Pandas support is provided by the `pint-pandas` package. To install it use either:
```
python -m pip install pint-pandas
```
Or:
```
conda install -c conda-forge pint-pandas
```

## Basic example

This example shows the simplest way to use pandas with pint, along with the underlying objects. It's slightly fiddly because the data is not read from a file; a more typical use case is shown in Reading from csv.

First some imports. The `pint_pandas` import is needed even though the name isn't used directly below: importing it registers the `pint[...]` dtype with pandas.

```python
import pandas as pd
import pint
import pint_pandas
```

Next, we create a DataFrame with PintArrays as columns.

```python
df = pd.DataFrame({
    "torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
df
```

|   | torque | angular_velocity |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |


Operations with columns are unit-aware, so they behave as we would intuitively expect.

```python
df['power'] = df['torque'] * df['angular_velocity']
df
```

|   | torque | angular_velocity | power |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 2 | 2 | 4 |
| 2 | 2 | 2 | 4 |
| 3 | 3 | 3 | 9 |


We can see the columns' units in the `dtypes` attribute:

```python
df.dtypes
```

```
torque                                       pint[foot * force_pound]
angular_velocity                         pint[revolutions_per_minute]
power               pint[foot * force_pound * revolutions_per_minute]
dtype: object
```

Each column can be accessed as a Pandas Series:

```python
df.power
```

```
0    1
1    4
2    4
3    9
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute]
```

The Series contains a PintArray:

```python
df.power.values
```

```
<PintArray>
[1, 4, 4, 9]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute]
```

The PintArray contains a Quantity:

```python
df.power.values.quantity
```

| | |
|---|---|
| Magnitude | [1 4 4 9] |
| Units | foot force_pound revolutions_per_minute |


Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.

```python
df.power.pint.units
```

```python
df.power.pint.to("kW").values
```

```
<PintArray>
[0.00014198092353610379,  0.0005679236941444151,  0.0005679236941444151,
   0.001277828311824934]
Length: 4, dtype: pint[kilowatt]
```

## Reading from csv

Reading from files is by far the more common way to use pandas. To support this, DataFrame accessors are provided that make it easy to convert columns to and from PintArrays.

```python
import pandas as pd
import pint
import pint_pandas
import io
```

Here are the contents of the CSV file:

```python
test_data = '''ShaftSpeedIndex,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
pump,,A,B,C,A,B,C,A,B,C
ShaftSpeed,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
FlowRate,m^3 h^-1,8.72,9.28,9.31,11.61,12.78,13.51,18.32,17.90,19.23
DifferentialPressure,kPa,162.03,144.16,136.47,286.86,241.41,204.21,533.17,526.74,440.76
ShaftPower,kW,1.32,1.23,1.18,3.09,2.78,2.50,8.59,8.51,7.61
Efficiency,dimensionless,30.60,31.16,30.70,30.72,31.83,31.81,32.52,31.67,32.05'''
```

Let's read that into a DataFrame. Here `io.StringIO` stands in for a file on disk; the commented-out line shows the usual way of reading from a CSV file path.

```python
df = pd.read_csv(io.StringIO(test_data), header=[0, 1], index_col=[0, 1]).T
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1], index_col=[0, 1]).T
df
```

| ShaftSpeedIndex | pump | ShaftSpeed | FlowRate | DifferentialPressure | ShaftPower | Efficiency |
|---|---|---|---|---|---|---|
| | | `rpm` | `m^3 h^-1` | `kPa` | `kW` | `dimensionless` |
| 1200 | A | 1200.0 | 8.72 | 162.03 | 1.32 | 30.6 |
| 1200 | B | 1200.0 | 9.28 | 144.16 | 1.23 | 31.16 |
| 1200 | C | 1200.0 | 9.31 | 136.47 | 1.18 | 30.7 |
| 1600 | A | 1600.0 | 11.61 | 286.86 | 3.09 | 30.72 |
| 1600 | B | 1600.0 | 12.78 | 241.41 | 2.78 | 31.83 |
| 1600 | C | 1600.0 | 13.51 | 204.21 | 2.5 | 31.81 |
| 2300 | A | 2300.0 | 18.32 | 533.17 | 8.59 | 32.52 |
| 2300 | B | 2300.0 | 17.9 | 526.74 | 8.51 | 31.67 |
| 2300 | C | 2300.0 | 19.23 | 440.76 | 7.61 | 32.05 |


Next, use the DataFrame's `pint` accessor's `quantify` method to convert the columns from `np.ndarray`s to PintArrays, taking the units from the bottom column level. Before doing so, note that the columns still hold plain floats:

```python
df.dtypes
```

```
ShaftSpeed            rpm              float64
FlowRate              m^3 h^-1         float64
DifferentialPressure  kPa              float64
ShaftPower            kW               float64
Efficiency            dimensionless    float64
dtype: object
```

```python
df_ = df.pint.quantify(level=-1)
df_
```

| ShaftSpeedIndex | pump | ShaftSpeed | FlowRate | DifferentialPressure | ShaftPower | Efficiency |
|---|---|---|---|---|---|---|
| 1200 | A | 1200.0 | 8.72 | 162.03 | 1.32 | 30.6 |
| 1200 | B | 1200.0 | 9.28 | 144.16 | 1.23 | 31.16 |
| 1200 | C | 1200.0 | 9.31 | 136.47 | 1.18 | 30.7 |
| 1600 | A | 1600.0 | 11.61 | 286.86 | 3.09 | 30.72 |
| 1600 | B | 1600.0 | 12.78 | 241.41 | 2.78 | 31.83 |
| 1600 | C | 1600.0 | 13.51 | 204.21 | 2.5 | 31.81 |
| 2300 | A | 2300.0 | 18.32 | 533.17 | 8.59 | 32.52 |
| 2300 | B | 2300.0 | 17.9 | 526.74 | 8.51 | 31.67 |
| 2300 | C | 2300.0 | 19.23 | 440.76 | 7.61 | 32.05 |


Let's confirm the units have been parsed correctly:

```python
df_.dtypes
```

```
ShaftSpeed                    pint[revolutions_per_minute]
FlowRate                pint[meter ** 3 / planck_constant]
DifferentialPressure                      pint[kilopascal]
ShaftPower                                  pint[kilowatt]
Efficiency                             pint[dimensionless]
dtype: object
```

Here the `h` in `m^3 h^-1` has been parsed as the Planck constant. Let's change the unit to hours:

```python
# Keep the magnitudes, but reinterpret them as cubic metres per hour
df_['FlowRate'] = pint_pandas.PintArray(df_['FlowRate'].values.quantity.m, dtype="pint[m^3/hr]")
df_.dtypes
```

```
ShaftSpeed              pint[revolutions_per_minute]
FlowRate                     pint[meter ** 3 / hour]
DifferentialPressure                pint[kilopascal]
ShaftPower                            pint[kilowatt]
Efficiency                       pint[dimensionless]
dtype: object
```

As before, operations between DataFrame columns are unit-aware:

```python
df_.ShaftPower / df_.ShaftSpeed
```

```
ShaftSpeedIndex  pump
1200             A                      0.0011
                 B                    0.001025
                 C       0.0009833333333333332
1600             A       0.0019312499999999998
                 B                   0.0017375
                 C                   0.0015625
2300             A        0.003734782608695652
                 B       0.0036999999999999997
                 C       0.0033086956521739133
dtype: pint[kilowatt / revolutions_per_minute]
```

```python
df_['ShaftTorque'] = df_.ShaftPower / df_.ShaftSpeed
df_['FluidPower'] = df_['FlowRate'] * df_['DifferentialPressure']
df_
```

| ShaftSpeedIndex | pump | ShaftSpeed | FlowRate | DifferentialPressure | ShaftPower | Efficiency | ShaftTorque | FluidPower |
|---|---|---|---|---|---|---|---|---|
| 1200 | A | 1200.0 | 8.72 | 162.03 | 1.32 | 30.6 | 0.0011 | 1412.9016 |
| 1200 | B | 1200.0 | 9.28 | 144.16 | 1.23 | 31.16 | 0.001025 | 1337.8048 |
| 1200 | C | 1200.0 | 9.31 | 136.47 | 1.18 | 30.7 | 0.0009833333333333 | 1270.5357 |
| 1600 | A | 1600.0 | 11.61 | 286.86 | 3.09 | 30.72 | 0.0019312499999999 | 3330.4446 |
| 1600 | B | 1600.0 | 12.78 | 241.41 | 2.78 | 31.83 | 0.0017375 | 3085.2198 |
| 1600 | C | 1600.0 | 13.51 | 204.21 | 2.5 | 31.81 | 0.0015625 | 2758.8771 |
| 2300 | A | 2300.0 | 18.32 | 533.17 | 8.59 | 32.52 | 0.0037347826086956 | 9767.6744 |
| 2300 | B | 2300.0 | 17.9 | 526.74 | 8.51 | 31.67 | 0.0036999999999999 | 9428.646 |
| 2300 | C | 2300.0 | 19.23 | 440.76 | 7.61 | 32.05 | 0.0033086956521739 | 8475.8148 |


The DataFrame's `pint.dequantify` method then allows us to retrieve the units information as a header row once again.

```python
df_.pint.dequantify()
```

| ShaftSpeedIndex | pump | ShaftSpeed | FlowRate | DifferentialPressure | ShaftPower | Efficiency | ShaftTorque | FluidPower |
|---|---|---|---|---|---|---|---|---|
| | unit | `revolutions_per_minute` | `meter ** 3 / hour` | `kilopascal` | `kilowatt` | `dimensionless` | `kilowatt / revolutions_per_minute` | `kilopascal * meter ** 3 / hour` |
| 1200 | A | 1200.0 | 8.72 | 162.03 | 1.32 | 30.6 | 0.0011 | 1412.9016 |
| 1200 | B | 1200.0 | 9.28 | 144.16 | 1.23 | 31.16 | 0.001025 | 1337.8048 |
| 1200 | C | 1200.0 | 9.31 | 136.47 | 1.18 | 30.7 | 0.000983 | 1270.5357 |
| 1600 | A | 1600.0 | 11.61 | 286.86 | 3.09 | 30.72 | 0.001931 | 3330.4446 |
| 1600 | B | 1600.0 | 12.78 | 241.41 | 2.78 | 31.83 | 0.001737 | 3085.2198 |
| 1600 | C | 1600.0 | 13.51 | 204.21 | 2.5 | 31.81 | 0.001563 | 2758.8771 |
| 2300 | A | 2300.0 | 18.32 | 533.17 | 8.59 | 32.52 | 0.003735 | 9767.6744 |
| 2300 | B | 2300.0 | 17.9 | 526.74 | 8.51 | 31.67 | 0.0037 | 9428.646 |
| 2300 | C | 2300.0 | 19.23 | 440.76 | 7.61 | 32.05 | 0.003309 | 8475.8148 |


This makes some powerful workflows straightforward. For example, we can change the units of individual columns:

```python
df_['FluidPower'] = df_['FluidPower'].pint.to("kW")
df_['FlowRate'] = df_['FlowRate'].pint.to("L/s")
df_['ShaftTorque'] = df_['ShaftTorque'].pint.to("N m")
df_.pint.dequantify()
```

| ShaftSpeedIndex | pump | ShaftSpeed | FlowRate | DifferentialPressure | ShaftPower | Efficiency | ShaftTorque | FluidPower |
|---|---|---|---|---|---|---|---|---|
| | unit | `revolutions_per_minute` | `liter / second` | `kilopascal` | `kilowatt` | `dimensionless` | `meter * newton` | `kilowatt` |
| 1200 | A | 1200.0 | 2.422222 | 162.03 | 1.32 | 30.6 | 10.504226 | 0.392473 |
| 1200 | B | 1200.0 | 2.577778 | 144.16 | 1.23 | 31.16 | 9.788029 | 0.371612 |
| 1200 | C | 1200.0 | 2.586111 | 136.47 | 1.18 | 30.7 | 9.390142 | 0.352927 |
| 1600 | A | 1600.0 | 3.225 | 286.86 | 3.09 | 30.72 | 18.442079 | 0.925123 |
| 1600 | B | 1600.0 | 3.55 | 241.41 | 2.78 | 31.83 | 16.591903 | 0.857005 |
| 1600 | C | 1600.0 | 3.752778 | 204.21 | 2.5 | 31.81 | 14.920776 | 0.766355 |
| 2300 | A | 2300.0 | 5.088889 | 533.17 | 8.59 | 32.52 | 35.664547 | 2.713243 |
| 2300 | B | 2300.0 | 4.972222 | 526.74 | 8.51 | 31.67 | 35.332397 | 2.619068 |
| 2300 | C | 2300.0 | 5.341667 | 440.76 | 7.61 | 32.05 | 31.595716 | 2.354393 |
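As a sanity check on the first row (pump A at 1200 rpm), the converted torque, flow rate and fluid power can be reproduced by hand from $P = \tau\omega$ with $\omega = 2\pi n/60$ for a shaft speed $n$ in rpm, and from $P_\text{fluid} = Q\,\Delta p$:

$$
\tau = \frac{P}{\omega} = \frac{1320\ \text{W}}{2\pi \times 1200/60\ \text{s}^{-1}} = \frac{33}{\pi}\ \text{N m} \approx 10.504\ \text{N m}
$$

$$
Q = \frac{8.72\ \text{m}^3}{3600\ \text{s}} \approx 2.4222\ \text{L/s},
\qquad
P_\text{fluid} = Q\,\Delta p = \frac{8.72 \times 162.03}{3600}\ \text{kW} \approx 0.3925\ \text{kW}
$$

Both agree with the `ShaftTorque`, `FlowRate` and `FluidPower` values in the table above (using kPa · m³/s = kW).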


The units are harder to read than they need to be, so let's change Pint's default format for displaying units:

```python
pint_pandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()
```

We can also convert the units of the entire table at once, for example to base units:

```python
df_.pint.to_base_units().pint.dequantify()
```

## Advanced example

This example shows alternative ways of using pint with pandas, along with some other features.

Start with the same imports:

```python
import pandas as pd
import pint
import pint_pandas
```

We'll use a shorthand for PintArray:

```python
PA_ = pint_pandas.PintArray
```

Then set up a unit registry and a Quantity shorthand:

```python
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity
```

Operations between PintArrays that use different unit registries will not work. To prevent this, we can change the unit registry that is used when creating new PintArrays:

```python
pint_pandas.PintType.ureg = ureg
```

Here are the possible ways to create a PintArray.

Note that `pint[unit]` must be used for the Series constructor, whereas the PintArray constructor also accepts a bare unit string or a Unit object.

```python
df = pd.DataFrame({
    "length": pd.Series([1, 2], dtype="pint[m]"),
    "width": PA_([2, 3], dtype="pint[m]"),
    "distance": PA_([2, 3], dtype="m"),
    "height": PA_([2, 3], dtype=ureg.m),
    "depth": PA_.from_1darray_quantity(Q_([2, 3], ureg.m)),
})
df
```

```python
df.length.values.units
```