# physipandas series accessor

In [2]:
# create series
import pandas as pd
from physipy import m
import numpy as np
from physipandas import QuantityDtype, QuantityArray

c = pd.Series(QuantityArray(np.arange(10)*m), 
              dtype=QuantityDtype(m))
c

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: physipy[m]

Even without an accessor, the data can already be reached at three levels:
 - as a pandas.Series
 - as a physipandas.QuantityArray using `c.values`
 - as a physipy.Quantity using `c.values.quantity`

In [3]:
print(type(c))
print(type(c.values))
print(type(c.values.quantity))

<class 'pandas.core.series.Series'>
<class 'physipandas.extension.QuantityArray'>
<class 'physipy.quantity.quantity.Quantity'>


The `physipy` accessor of the Series gives access to further attributes and methods:

In [4]:
print(f"Physipy dimension : {c.physipy.dimension}")
print(f"Physipy SI unitary quantity : {c.physipy._SI_unitary_quantity}")

Physipy dimension : L
Physipy SI unitary quantity : 1 m


For now the accessor does not do much, but many more attributes and methods could be delegated through it; see the delegated methods and attributes in pint-pandas for an example.
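As a rough illustration of what delegation could look like, the sketch below forwards two attributes of the backing array through an accessor. It uses a plain NumPy-backed Series so that it runs with pandas alone. The accessor name `delegate_demo` and the helper `_delegated_property` are hypothetical and are not part of physipandas or pint-pandas.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import register_series_accessor


def _delegated_property(name):
    """Build a read-only property that forwards `name` to the backing array.

    Hypothetical helper, written only for this sketch.
    """
    def getter(self):
        return getattr(self._array, name)
    return property(getter, doc=f"Forwarded from the backing array: {name}")


# "delegate_demo" is a hypothetical accessor name used only for illustration.
@register_series_accessor("delegate_demo")
class DelegateDemoAccessor:
    def __init__(self, series):
        # keep a reference to the array that backs the Series
        self._array = series.values

    # Each forwarded attribute is listed explicitly, so the public
    # surface of the accessor stays easy to read.
    nbytes = _delegated_property("nbytes")
    itemsize = _delegated_property("itemsize")


s = pd.Series(np.arange(10, dtype="int64"))
print(s.delegate_demo.itemsize)  # bytes per element of the backing array
print(s.delegate_demo.nbytes)    # total bytes of the backing array
```

For a Series backed by a `QuantityArray`, the same pattern could presumably forward attributes such as `dimension`, but this is an assumption and not a description of how physipandas is implemented.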

For comparison, here is the built-in categorical extension with its `cat` accessor:

In [5]:
cs = pd.Series(["a", "b", "c"], dtype=pd.CategoricalDtype(["a", "b", "c"]))
print(cs)
print(type(cs.values))
print(type(cs.cat.codes))

0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']
<class 'pandas.core.arrays.categorical.Categorical'>
<class 'pandas.core.series.Series'>


# Resources on Series accessors

- pandas string accessor definition: https://github.com/pandas-dev/pandas/blob/ad190575aa75962d2d0eade2de81a5fe5a2e285b/pandas/core/strings/accessor.py#L143
- pint-pandas documentation: https://pint.readthedocs.io/en/0.10.1/pint-pandas.html
- pint-pandas source code: https://github.com/hgrecco/pint-pandas/blob/cf527e48557a1e028c6f2d4e628aa7a6cd1b30d4/pint_pandas/pint_array.py#L851
- cyberpandas source code: https://github.com/ContinuumIO/cyberpandas/blob/dbce13f94a75145a59d7a7981a8a07571a2e5eb6/cyberpandas/ip_array.py#L667
- pdvega source code: https://github.com/altair-viz/pdvega/blob/e3f1fc9730f8cd9ad70e7ba0f0a557f41279839a/pdvega/_core.py#L58
- Real Python intro to the standard accessors: https://realpython.com/python-pandas-tricks/#3-take-advantage-of-accessor-methods
- List of custom accessors: https://pandas.pydata.org/docs/ecosystem.html#accessors
- pandas has built-in accessors for various dtypes, such as `dt` for datetimes and `str` for strings. See the intro at: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#accessors
- pandas intro to registering accessors: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#registering-custom-accessors
- Doc for `pandas.api.extensions.register_series_accessor`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.register_series_accessor.html#pandas.api.extensions.register_series_accessor. It is used as follows:

```python
from pandas.api.extensions import register_series_accessor

@register_series_accessor("physipy")
class PhysipySeriesAccessor(object):
    #...
```
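The class body is elided in the snippet above. As a minimal sketch of what a registered accessor can contain, and assuming nothing about the actual `PhysipySeriesAccessor`, the following registers a hypothetical `demo_stats` accessor with one property and one method:

```python
import pandas as pd
from pandas.api.extensions import register_series_accessor


# "demo_stats" is a hypothetical accessor name, not part of physipandas.
@register_series_accessor("demo_stats")
class DemoStatsAccessor:
    def __init__(self, series):
        # pandas passes in the Series the accessor was reached from
        self._series = series

    @property
    def span(self):
        """Difference between the largest and the smallest value."""
        return self._series.max() - self._series.min()

    def centered(self):
        """Return a new Series with the mean subtracted."""
        return self._series - self._series.mean()


s = pd.Series([1.0, 2.0, 4.0])
print(s.demo_stats.span)        # property: attribute-style access
print(s.demo_stats.centered())  # method: returns a regular Series
```

The accessor receives the Series in `__init__` and can expose properties or methods computed from it.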

You can see the available accessors using:


In [6]:
print(pd.Series._accessors)

{'cat', 'dt', 'str', 'physipy', 'sparse'}


# Registering custom accessors

Accessors add features and methods to a regular DataFrame. They do not subclass or wrap `DataFrame`: the objects remain plain DataFrames.

The pandas documentation describes this in its section on registering custom accessors: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#registering-custom-accessors


 - https://towardsdatascience.com/pandas-dtype-specific-operations-accessors-c749bafb30a4
 - https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dt-accessor
 - https://towardsdatascience.com/ready-the-easy-way-to-extend-pandas-api-dcf4f6612615
 - https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling
 - https://pandas.pydata.org/pandas-docs/stable/reference/series.html#accessors
 - https://realpython.com/python-pandas-tricks/#3-take-advantage-of-accessor-methods
 - https://github.com/pandas-dev/pandas/blob/3e4839301fc2927646889b194c9eb41c62b76bda/pandas/core/arrays/categorical.py#L2356
 - https://github.com/pandas-dev/pandas/blob/3e4839301fc2927646889b194c9eb41c62b76bda/pandas/core/strings.py#L1766
 - https://github.com/hgrecco/pint-pandas/blob/master/pint_pandas/pint_array.py


In [7]:
import pandas as pd
import numpy as np

@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        self._validate(pandas_obj)
        self._obj = pandas_obj

    @staticmethod
    def _validate(obj):
        # verify there is a column latitude and a column longitude
        if "latitude" not in obj.columns or "longitude" not in obj.columns:
            raise AttributeError("Must have 'latitude' and 'longitude'.")

    @property
    def center(self):
        # return the geographic center point of this DataFrame
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))

    def plot(self):
        # placeholder: a real implementation would plot this DataFrame's
        # data on a map, e.g., using Cartopy; here it only prints the center
        print(self.center)

In [8]:
ds = pd.DataFrame(
    {"longitude": np.linspace(0, 10), "latitude": np.linspace(0, 20)}
)
print(ds.geo.center)

ds.geo.plot()
# prints the center (stand-in for plotting the data on a map)


(5.0, 10.0)
(5.0, 10.0)
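Two properties of this pattern can be checked directly: the DataFrame keeps its type, and the validation in `__init__` runs when the accessor is reached. The sketch below is a trimmed variant of `GeoAccessor` registered under the hypothetical name `geo_check`, so that it does not override the `geo` accessor defined above.

```python
import numpy as np
import pandas as pd


# "geo_check" is a hypothetical name, chosen to avoid re-registering "geo".
@pd.api.extensions.register_dataframe_accessor("geo_check")
class GeoCheckAccessor:
    def __init__(self, pandas_obj):
        # same validation rule as GeoAccessor._validate
        if "latitude" not in pandas_obj.columns or "longitude" not in pandas_obj.columns:
            raise AttributeError("Must have 'latitude' and 'longitude'.")
        self._obj = pandas_obj


good = pd.DataFrame(
    {"longitude": np.linspace(0, 10), "latitude": np.linspace(0, 20)}
)
bad = pd.DataFrame({"x": [1, 2, 3]})

# The accessor only holds a reference: the DataFrame is neither
# subclassed nor wrapped.
print(type(good) is pd.DataFrame)
print(good.geo_check._obj is good)

# Reaching the accessor on a DataFrame without the required columns fails.
try:
    bad.geo_check
except AttributeError as e:
    print("Raised", e)
```

Raising `AttributeError` from the validation also means that `hasattr(bad, "geo_check")` is false, which matches how a missing attribute would behave.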


# Physipy series accessor

In [9]:
import pandas as pd
import numpy as np
from physipy import m
from physipandas import QuantityDtype, QuantityArray

c = pd.Series(QuantityArray(np.arange(10)*m), 
              dtype=QuantityDtype(m))
print(type(c))

print("-------- Use the physipy accessor")
try:
    print(c.dimension)
except Exception as e:
    print("Raised ", e)
    print(c.physipy.dimension)
    
try:
    print(c._SI_unitary_quantity)
except Exception as e:
    print("Raised ", e)
    print(c.physipy._SI_unitary_quantity)
    
try:
    print(c.mean())
except Exception as e:
    print("Raised ", e)
    print(c.physipy.values.mean())
    
c.physipy.values.is_length()

<class 'pandas.core.series.Series'>
-------- Use the physipy accessor
Raised  'Series' object has no attribute 'dimension'
L
Raised  'Series' object has no attribute '_SI_unitary_quantity'
1 m
4.5 m


True

In [10]:
type(c.physipy.values) is type(np.arange(10)*m)

True

In [11]:
print(c.values, type(c), type(c.values))

<QuantityArray>
[0 m, 1 m, 2 m, 3 m, 4 m, 5 m, 6 m, 7 m, 8 m, 9 m]
Length: 10, dtype: physipy[m] <class 'pandas.core.series.Series'> <class 'physipandas.extension.QuantityArray'>


In [12]:
print(len(c))
print(c.shape)
print(c.ndim)

10
(10,)
1


In [13]:
arr = pd.Series(np.arange(10))
print(len(arr))
print(arr.shape)
print(arr.ndim)

10
(10,)
1
