In [3]:
from physipandas import QuantityDtype
from physipy import m

# Writing an extension Dtype for pandas

We subclass `ExtensionDtype` (public import path: `pandas.api.extensions.ExtensionDtype`), defined in https://github.com/pandas-dev/pandas/blob/06d230151e6f18fdb8139d09abf539867a8cd481/pandas/core/dtypes/base.py#L39
Its docstring describes most of the required internals.

In [4]:
qd = QuantityDtype()

The interface includes the following abstract methods that must be implemented by subclasses:

 - `type`
 - `name`
 - `construct_array_type`
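
As a hedged illustration of this interface, here is a minimal sketch that does not depend on physipy. `ToyQuantity`, `ToyQuantityArray` and `ToyQuantityDtype` are hypothetical names, not the physipandas classes, and the array class is an empty placeholder rather than a working `ExtensionArray`.

```python
from pandas.api.extensions import ExtensionArray, ExtensionDtype


class ToyQuantity:
    """Hypothetical scalar type standing in for physipy's Quantity."""

    def __init__(self, value, unit=""):
        self.value = value
        self.unit = unit


class ToyQuantityArray(ExtensionArray):
    """Hypothetical placeholder: a usable array must implement the full
    ExtensionArray interface (_from_sequence, __getitem__, __len__, ...)."""


class ToyQuantityDtype(ExtensionDtype):
    @property
    def type(self):
        # Scalar type of the elements held by the array.
        return ToyQuantity

    @property
    def name(self):
        # String identifier; ExtensionDtype.__str__ returns this value.
        return "toyquantity"

    @classmethod
    def construct_array_type(cls):
        # Array class that pandas uses to hold data of this dtype.
        return ToyQuantityArray


dtype = ToyQuantityDtype()
print(dtype.type.__name__, dtype.name, dtype.construct_array_type().__name__)
# ToyQuantity toyquantity ToyQuantityArray
```

`construct_array_type` is a classmethod, so, as in the notebook cells, it can be called on the class as well as on an instance.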


In [5]:
print(qd.type) # test OK
print(qd.name)
print(qd.construct_array_type())

<class 'physipy.quantity.quantity.Quantity'>
physipy[]
<class 'physipandas.extension.QuantityArray'>


The following attributes and methods influence the behavior of the dtype in pandas operations:
 - [X] : _is_numeric : returns True for now, but should it, given that a quantity is not a plain number?
 - [X] : _is_boolean : returns False, inherited from ExtensionDtype
 - [ ] : _get_common_dtype : not addressed yet
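
A minimal sketch of how these hooks might be overridden, assuming a hypothetical `UnitDtype` whose unit is a plain string rather than a physipy quantity. `_is_numeric`, `_is_boolean` and `_get_common_dtype` are private pandas hooks whose behaviour may change between pandas versions, and the concatenation rule below (same unit, otherwise no common dtype) is only one possible choice.

```python
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import is_numeric_dtype


class UnitDtype(ExtensionDtype):
    """Hypothetical dtype parametrized by a unit string (physipandas uses a quantity)."""

    _metadata = ("unit",)

    def __init__(self, unit=""):
        self.unit = unit

    @property
    def type(self):
        return float  # placeholder scalar type for this sketch

    @property
    def name(self):
        return f"physipy[{self.unit}]"

    @classmethod
    def construct_array_type(cls):
        raise NotImplementedError("array type omitted in this sketch")

    @property
    def _is_numeric(self):
        # Makes pandas.api.types.is_numeric_dtype report True for this dtype.
        return True

    # _is_boolean is not overridden, so it keeps the ExtensionDtype default (False).

    def _get_common_dtype(self, dtypes):
        # Private hook used e.g. by concat: keep this dtype only if every dtype
        # involved is a UnitDtype with the same unit; None means "no common dtype".
        if all(isinstance(d, UnitDtype) and d.unit == self.unit for d in dtypes):
            return self
        return None


meters = UnitDtype("m")
print(meters._is_numeric, meters._is_boolean, is_numeric_dtype(meters))  # True False True
print(meters._get_common_dtype([meters, UnitDtype("m")]))  # physipy[m]
print(meters._get_common_dtype([meters, UnitDtype("s")]))  # None
```

When `_get_common_dtype` returns `None`, pandas generally falls back to `object` dtype, which seems a reasonable outcome when mixing incompatible units.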

In [7]:
print(qd._is_numeric)
print(qd._is_boolean)
#print(qd._get_common_dtype())

True
False


The `na_value` class attribute can be used to set the default NA value for this type; `numpy.nan` is used by default.
 - [X] : we override this with `na_value = Quantity(np.nan, Dimension(None))`
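
A sketch of such an override, assuming a hypothetical `ToyQuantity` scalar class standing in for physipy's `Quantity`: declaring `na_value` as a plain class attribute is enough to shadow the property defined on the base class.

```python
import math

from pandas.api.extensions import ExtensionDtype


class ToyQuantity:
    """Hypothetical stand-in for physipy's Quantity (value + unit string)."""

    def __init__(self, value, unit=""):
        self.value = value
        self.unit = unit


class ToyQuantityDtype(ExtensionDtype):
    # Class attribute shadowing ExtensionDtype.na_value (which returns np.nan):
    # a unitless NaN quantity, analogous to Quantity(np.nan, Dimension(None)).
    na_value = ToyQuantity(float("nan"))

    @property
    def type(self):
        return ToyQuantity

    @property
    def name(self):
        return "toyquantity"

    @classmethod
    def construct_array_type(cls):
        raise NotImplementedError("array type omitted in this sketch")


na = ToyQuantityDtype().na_value
print(type(na).__name__, math.isnan(na.value))  # ToyQuantity True
```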

In [9]:
print(qd.na_value, type(qd.na_value))

nan <class 'physipy.quantity.quantity.Quantity'>


ExtensionDtypes are required to be hashable. The base class provides a default implementation,
which relies on the `_metadata` class attribute. `_metadata` should be a tuple containing the 
strings that define your data type. For example, with `PeriodDtype` that’s the `freq` attribute.
If you have a parametrized dtype, you should set the `_metadata` class property.
Ideally, the attributes in `_metadata` will match the parameters to your `ExtensionDtype.__init__` 
(if any). If any of the attributes in `_metadata` don’t implement the standard `__eq__` or `__hash__`, 
the default implementations here will not work.
 - [X] : _metadata : QuantityDtype instances are parametrized by a physical quantity (their `unit` attribute),
so the hash of the dtype relies on the hash of that quantity.
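
To illustrate how the default equality and hashing use `_metadata`, here is a sketch with a hypothetical `UnitDtype` that stores its unit as a string (physipandas stores a quantity instead, whose own `__eq__` and `__hash__` then have to be well behaved). Two dtypes with the same unit compare equal and hash identically, so they collapse into one element of a set.

```python
from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    # The default __eq__/__hash__ of ExtensionDtype read the attributes named here.
    _metadata = ("unit",)

    def __init__(self, unit=""):
        self.unit = unit  # hypothetical: a plain string instead of a physipy quantity

    @property
    def type(self):
        return float  # placeholder scalar type for this sketch

    @property
    def name(self):
        return f"physipy[{self.unit}]"

    @classmethod
    def construct_array_type(cls):
        raise NotImplementedError("array type omitted in this sketch")


a, b, c = UnitDtype("m"), UnitDtype("m"), UnitDtype("s")
print(a == b, hash(a) == hash(b), a == c)  # True True False
print(len({a, b, c}))  # 2
```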

In [10]:
print(qd._metadata) # tuple of strings of attributes for hash

('unit',)


In [11]:
print(hash(qd))

3187536572


Methods
 - [X] : construct_array_type() : Return the array type associated with this dtype : QuantityArray
 - [X] : construct_from_string(string) : Construct this type from a string. See [the doc of ExtensionDtype.construct_from_string](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionDtype.construct_from_string.html#pandas.api.extensions.ExtensionDtype.construct_from_string).
 
```
Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[H] (where H means hourly frequency).
```
For this, we parse strings of the form `physipy[m]` (here, meters).
 - [X] : is_dtype(dtype) : Check if we match ‘dtype’. For now we use the default behaviour given [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionDtype.is_dtype.html#pandas.api.extensions.ExtensionDtype.is_dtype).
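
A possible implementation of `construct_from_string` for the `physipy[...]` syntax, sketched with a hypothetical `UnitDtype` and a regular expression. The pattern and the choice to keep the unit as a raw string are assumptions (physipandas turns it into a physipy quantity). The inherited `is_dtype` then accepts such strings, because it calls `construct_from_string` and treats a `TypeError` as "no match".

```python
import re

from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    _metadata = ("unit",)
    # Hypothetical pattern for strings such as "physipy[m]" or "physipy[m**2]".
    _unit_pattern = re.compile(r"^physipy\[(?P<unit>[^\]]*)\]$")

    def __init__(self, unit=""):
        self.unit = unit  # kept as a raw string here; physipandas builds a quantity

    @property
    def type(self):
        return float  # placeholder scalar type for this sketch

    @property
    def name(self):
        return f"physipy[{self.unit}]"

    @classmethod
    def construct_array_type(cls):
        raise NotImplementedError("array type omitted in this sketch")

    @classmethod
    def construct_from_string(cls, string):
        # pandas expects a TypeError (not ValueError) when the string does not match.
        if not isinstance(string, str):
            raise TypeError(f"'construct_from_string' expects a string, got {type(string)}")
        match = cls._unit_pattern.match(string)
        if match is None:
            raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")
        return cls(match.group("unit"))


print(UnitDtype.construct_from_string("physipy[m**2]"))  # physipy[m**2]
print(UnitDtype.is_dtype("physipy[m]"), UnitDtype.is_dtype("float64"), UnitDtype.is_dtype(1.0))
# True False False
print(UnitDtype("m") == "physipy[m]")  # True
```

Raising `TypeError` rather than `ValueError` on a mismatch matters here, since both the default `is_dtype` and the default `__eq__` only catch `TypeError`. A non-string, non-dtype object such as a plain quantity is rejected by `is_dtype` without reaching `construct_from_string`.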

In [14]:
print(QuantityDtype.construct_array_type())
print(QuantityDtype().construct_array_type())
print(QuantityDtype('physipy[m]').construct_array_type())

<class 'physipandas.extension.QuantityArray'>
<class 'physipandas.extension.QuantityArray'>
<class 'physipandas.extension.QuantityArray'>


In [17]:
print(QuantityDtype.construct_from_string("physipy[m]"))
print(qd.construct_from_string("physipy[m]"))
print(qd.construct_from_string("physipy[m**2]"))

physipy[m]
physipy[m]
physipy[m**2]


In [18]:
print(qd.is_dtype(QuantityDtype()))
print(qd.is_dtype(QuantityDtype(m)))
print(qd.is_dtype(m))

True
True
False


In [19]:
print(qd.names)
print(qd.type)

None
<class 'physipy.quantity.quantity.Quantity'>
