In [4]:
from physipandas import QuantityDtype
from physipy import m

# Introduction to QuantityDtype
`QuantityDtype` is a parametrizable dtype. It can be created in the following ways:

In [6]:
print(QuantityDtype()) # from nothing: dimension-less quantity dtype
print(QuantityDtype("physipy[]"))  # from a string formatted as 'physipy[X]', 
print(QuantityDtype("physipy[m]")) # where X is the str representation of a unit in physipy.units
print(QuantityDtype(m)) # from a Quantity, in which case only the associated dimension will be used
print(QuantityDtype(2.345*m)) # from a Quantity, in which case only the associated dimension will be used

physipy[]
physipy[]
physipy[m]
physipy[m]
physipy[m]


Behind the scenes, a `QuantityDtype` just stores a `.unit` attribute that is a `Quantity` object, but only the dimension of that quantity is used:

In [9]:
QuantityDtype(2.345*m).unit

AttributeError: 'dict' object has no attribute 'dimension'

<Quantity : 2.345 m, symbol=m*UndefinedSymbol>

In [1]:
# on its own, this works
from physipy import m
print(2.345*m)
2.345*m
# but once the lines below have been executed, displaying 2.345*m fails
from physipandas import QuantityDtype
from physipy import m
print(QuantityDtype()) # from nothing: dimension-less quantity dtype
print(QuantityDtype("physipy[]"))  # from a string formatted as 'physipy[X]', 
print(QuantityDtype("physipy[m]")) # where X is the str representation of a unit in physipy.units
print(QuantityDtype(m)) # from a Quantity, in which case only the associated dimension will be used
print(QuantityDtype(2.345*m)) # from a Quantity, in which case only the associated dimension will be used
print(2.345*m)
2.345*m


2.345 m
physipy[]
physipy[]
physipy[m]
physipy[m]
physipy[m]
2.345 m


AttributeError: 'dict' object has no attribute 'dimension'

<Quantity : 2.345 m, symbol=m*UndefinedSymbol>

# Writing an extension Dtype for pandas

## Subclass from ExtensionDtype
We subclass from the pandas `ExtensionDtype` base class, defined at https://github.com/pandas-dev/pandas/blob/06d230151e6f18fdb8139d09abf539867a8cd481/pandas/core/dtypes/base.py#L39.  
The base class describes most of the required internals.

In [4]:
qd = QuantityDtype()

## Mandatory methods
The interface includes the following abstract methods that must be implemented by subclasses:
 - `type`: the scalar type for the array. `ExtensionArray[item]` is expected to return an instance of `ExtensionDtype.type` for a scalar `item`, hence we use `Quantity`; for example, `QuantityArray(..)[3] -> Quantity`.
 - `name`: the string displayed below the content of `df["quanti"].values`, or returned by `df["quanti"].dtype`; for example, we use `physipy[m]` when the dimension of the dtype is `Dimension('L')`.
 - `construct_array_type`: returns the array type associated with this dtype, in our case `QuantityArray`.

In [20]:
print(qd.type)
print(qd.name)
print(qd.construct_array_type())

<class 'physipy.quantity.quantity.Quantity'>
physipy[]
<class 'physipandas.extension.QuantityArray'>


## Optional methods

The following attributes and methods can be overridden, and influence the behavior of the dtype in pandas operations:
 - [X] : `_is_numeric` : returns True for now, but should it, since a quantity is not a plain number?
 - [ ] : `_is_boolean` : returns False by inheritance of ExtensionDtype
 - [ ] : `_get_common_dtype(dtypes)`

In [25]:
print(qd._is_numeric)
print(qd._is_boolean)
print(qd._get_common_dtype([qd, float]))

True
False
None
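
The `None` printed last is consistent with `QuantityDtype` inheriting `_get_common_dtype` unchanged. As far as I understand the pandas default, it returns the dtype itself only when every entry of the list is the same dtype, and `None` otherwise. The snippet below is a plain-Python paraphrase of that rule, not code from pandas or physipandas: `default_common_dtype` is a hypothetical name, and strings stand in for real dtype objects.

```python
def default_common_dtype(own_dtype, dtypes):
    """Paraphrase of the assumed base-class rule for finding a common dtype.

    Hypothetical helper: returns `own_dtype` when all candidates are
    identical, and None when they differ.
    """
    if len(set(dtypes)) == 1:
        # Every candidate is the same dtype, so that dtype is the common one.
        return own_dtype
    # Mixed candidates: no common dtype is proposed.
    return None


# Strings are stand-ins for dtype objects in this sketch.
same = default_common_dtype("physipy[m]", ["physipy[m]", "physipy[m]"])
mixed = default_common_dtype("physipy[m]", ["physipy[m]", "float64"])
print(same)
print(mixed)
```

Overriding `_get_common_dtype` in `QuantityDtype` would be the place to decide, for instance, whether a dimensionless quantity dtype and `float` should share a common dtype.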


The `na_value` class attribute can be used to set the default NA value for this type. `numpy.nan` is used by default.
 - [X] : we override this with `na_value = Quantity(np.nan, self.dimension)`

In [29]:
print(qd.na_value, type(qd.na_value))
qdtype_m = QuantityDtype(m)
print(qdtype_m.na_value, type(qdtype_m.na_value))

nan <class 'physipy.quantity.quantity.Quantity'>
nan m <class 'physipy.quantity.quantity.Quantity'>


## Hashability
ExtensionDtypes are required to be hashable. The base class provides a default implementation, which relies on the `_metadata` class attribute. `_metadata` should be a tuple containing the strings that define your data type. For example, with `PeriodDtype` that’s the `freq` attribute.
If you have a parametrized dtype you should set the `_metadata` class property. Ideally, the attributes in `_metadata` will match the parameters to your `ExtensionDtype.__init__` (if any). If any of the attributes in `_metadata` don’t implement the standard `__eq__` or `__hash__`, the default implementations here will not work.
- [X] : `_metadata` : a `QuantityDtype` is parametrized by a physical quantity, so we rely on the hash of that quantity to hash the dtype.

In [31]:
print(qd._metadata) # tuple of the attribute names used for hashing and equality
print(hash(qd))

('unit',)
3187536572
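
To make the role of `_metadata` concrete, here is a minimal plain-Python sketch of how a default `__eq__` and `__hash__` can be derived from it. It only mirrors the idea described above and is not the pandas source: `SketchDtype` is a hypothetical class, and a string stands in for the `Quantity` stored in `.unit`.

```python
class SketchDtype:
    """Plain-Python stand-in for a parametrized dtype (hypothetical class)."""

    # Names of the attributes that define the identity of the dtype.
    _metadata = ("unit",)

    def __init__(self, unit=""):
        # In QuantityDtype this would be a Quantity; a string is used here.
        self.unit = unit

    def _identity(self):
        # Collect the value of every attribute listed in _metadata.
        return tuple(getattr(self, attr) for attr in self._metadata)

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented
        return self._identity() == other._identity()

    def __hash__(self):
        # Only works if every attribute in _metadata is itself hashable.
        return hash(self._identity())


metre_a = SketchDtype("m")
metre_b = SketchDtype("m")
second = SketchDtype("s")

print(metre_a == metre_b)              # equal parameters, equal dtypes
print(metre_a == second)               # different parameters
print({metre_a: "length"}[metre_b])    # equal dtypes hash to the same key
```

The sketch also shows why the attributes named in `_metadata` must implement `__eq__` and `__hash__` themselves: both methods simply delegate to them.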


 - [X] : `construct_from_string(string)` : Construct this type from a string. See [the doc of ExtensionDtype.construct_from_string](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionDtype.construct_from_string.html#pandas.api.extensions.ExtensionDtype.construct_from_string): this is useful mainly for data types that accept parameters. For example, a `period` dtype accepts a frequency parameter that can be set as `period[H]` (where H means hourly frequency). In our case we parse strings of the form `physipy[m]` for meter.
 - [ ] : `is_dtype(dtype)` : Check if we match ‘dtype’. For now we use the default behaviour given [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionDtype.is_dtype.html#pandas.api.extensions.ExtensionDtype.is_dtype).

In [33]:
print(QuantityDtype.construct_array_type())
print(QuantityDtype().construct_array_type())
print(QuantityDtype('physipy[m]').construct_array_type()) # should this return a dtyped array ?

<class 'physipandas.extension.QuantityArray'>
<class 'physipandas.extension.QuantityArray'>
<class 'physipandas.extension.QuantityArray'>


In [35]:
print(QuantityDtype.construct_from_string("physipy[m]"))
print(qd.construct_from_string("physipy[m]"))
print(qd.construct_from_string("physipy[m**2]"))

physipy[m]
physipy[m]
physipy[m**2]
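
The string handling behind `construct_from_string` can be pictured with a small parser for the `physipy[X]` pattern. This is only a sketch under assumptions: `parse_unit_string` and its regular expression are hypothetical and not taken from physipandas, and the step that turns the extracted text into a physipy unit is left out. Raising `TypeError` on a non-matching string follows the convention documented for `ExtensionDtype.construct_from_string`.

```python
import re

# Hypothetical pattern: the literal prefix "physipy", then the unit in brackets.
_DTYPE_PATTERN = re.compile(r"^physipy\[(?P<unit>.*)\]$")


def parse_unit_string(string):
    """Return the unit text X from a string of the form 'physipy[X]'.

    Raises TypeError when the string does not follow that form.
    """
    if not isinstance(string, str):
        raise TypeError(f"expected a str, got {type(string).__name__}")
    match = _DTYPE_PATTERN.match(string)
    if match is None:
        raise TypeError(f"cannot construct a dtype from {string!r}")
    return match.group("unit")


print(repr(parse_unit_string("physipy[]")))      # empty unit text
print(repr(parse_unit_string("physipy[m]")))
print(repr(parse_unit_string("physipy[m**2]")))

try:
    parse_unit_string("float64")
except TypeError as error:
    print("rejected:", error)
```

An empty unit text would correspond to the dimensionless dtype printed as `physipy[]` earlier.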


In [36]:
print(qd.is_dtype(QuantityDtype()))
print(qd.is_dtype(QuantityDtype(m)))
print(qd.is_dtype(m))

True
True
False
