a category is an abstract structure consisting of objects and morphisms (relationships between objects)
a category C can thus be defined by:
    - Obj(C): a class (collection) of objects
    - Hom(A, B): for any pair of objects A and B in Obj(C), a set of morphisms,
            where each morphism f in Hom(A, B) is a rule of assignment (process or transformation) from A to B (f: A -> B); A is the source or domain, B is the target or codomain

identity morphism (id_A: A -> A): every object A in a category C has an identity morphism that acts as a do-nothing operation, so that for any morphism f: A -> B
                                (f o id_A) = f;  (id_B o f) = f

composition: if f and g are morphisms with f: A -> B and g: B -> C, then their composite (g o f): A -> C is also a morphism
associativity: whenever the composites are defined, the way compositions are grouped does not matter; (f o g) o h = f o (g o h)
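
The identity and associativity laws above can be checked numerically on ordinary Python functions. This is a minimal sketch, not a proof: `compose`, `identity`, `double`, `increment`, and `square` are hypothetical names chosen for illustration, and the laws are only tested on a handful of sample integers.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Return g o f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))


def identity(x):
    """Identity morphism: returns its input unchanged."""
    return x


# Hypothetical example morphisms on integers.
def double(x: int) -> int:
    return 2 * x


def increment(x: int) -> int:
    return x + 1


def square(x: int) -> int:
    return x * x


samples = range(-5, 6)

# Identity laws: (f o id) = f and (id o f) = f.
identity_laws_hold = all(
    compose(double, identity)(x) == double(x) == compose(identity, double)(x)
    for x in samples
)

# Associativity: (f o g) o h = f o (g o h).
left = compose(compose(double, increment), square)
right = compose(double, compose(increment, square))
associativity_holds = all(left(x) == right(x) for x in samples)
```

Both groupings compute `double(increment(square(x)))`; only the placement of the parentheses differs.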

category of sets:
    - objects are sets
    - morphisms are functions between sets
    - the identity morphism on a set is its identity function (the output equals the input)
    - composition is ordinary function composition
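
For finite sets, the morphisms in Hom(A, B) can be listed explicitly. The sketch below assumes a function is encoded as a dict from elements of the source to elements of the target; `hom`, `compose`, and `identity` are hypothetical helpers written for this example, not part of any library.

```python
from itertools import product


def hom(source, target):
    """All functions source -> target, each encoded as a dict."""
    source = sorted(source)
    target = sorted(target)
    return [
        dict(zip(source, images))
        for images in product(target, repeat=len(source))
    ]


def compose(g, f):
    """g o f for dict-encoded functions: x -> g[f[x]]."""
    return {x: g[f[x]] for x in f}


def identity(s):
    """Identity function on the set s."""
    return {x: x for x in s}


A = {1, 2}
B = {"a", "b", "c"}

hom_AB = hom(A, B)
n_morphisms = len(hom_AB)  # |B| ** |A| = 3 ** 2 functions

# One particular morphism f: A -> B and the identity laws for it.
f = {1: "a", 2: "c"}
identity_laws_hold = (
    compose(f, identity(A)) == f and compose(identity(B), f) == f
)
```

The count of morphisms grows as |B| ** |A|, which is why this enumeration is only practical for very small sets.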

in the context of data:
    - an object can be regarded as an entity that holds data, and morphisms as operations on it (filtering, scaling, feature extraction)
    - associativity of composition ensures that, when applying a series of transformations, the way the operations are grouped doesn't matter (the order of the operations themselves still does)
    - breaking a pipeline into composable pieces helps to clearly identify how each operation transforms the data
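
The difference between regrouping and reordering can be seen on a toy pipeline. This sketch uses plain Python lists in place of a real dataset so that it runs without extra dependencies; `keep_positive`, `scale_by_two`, and `add_one` are hypothetical operations standing in for filtering and scaling steps.

```python
from typing import Callable, List

Data = List[float]
Morphism = Callable[[Data], Data]


def compose(g: Morphism, f: Morphism) -> Morphism:
    """Return g o f: apply f first, then g."""
    return lambda xs: g(f(xs))


def keep_positive(xs: Data) -> Data:
    """Filtering step: drop non-positive values."""
    return [x for x in xs if x > 0]


def scale_by_two(xs: Data) -> Data:
    """Scaling step: multiply every value by 2."""
    return [2 * x for x in xs]


def add_one(xs: Data) -> Data:
    """Shift step: add 1 to every value."""
    return [x + 1 for x in xs]


data = [-1.0, 0.5, 2.0]

# Same order of operations (filter, scale, shift), two different groupings.
grouped_left = compose(compose(add_one, scale_by_two), keep_positive)
grouped_right = compose(add_one, compose(scale_by_two, keep_positive))

# Different order of operations (filter, shift, scale): not the same pipeline.
reordered = compose(scale_by_two, compose(add_one, keep_positive))
```

Here `grouped_left(data)` and `grouped_right(data)` agree, while `reordered(data)` gives a different result, because associativity allows regrouping but says nothing about swapping steps.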
    

In [16]:
# sample implementation
import numpy as np
from sklearn.datasets import load_iris 
from functools import reduce
from typing import Callable

In [17]:
iris_data = load_iris().data

# a morphism that scales the entire dataset
scale_data: Callable[[np.ndarray, float], np.ndarray] = lambda x, y: x*y # x: data, y: scaling factor

# a morphism to add constant value
constant_add: Callable[[np.ndarray, float], np.ndarray] = lambda x, y: x + y # x: data, y: constant to add

# identity morphism
identity_morph: Callable[[np.ndarray], np.ndarray] = lambda x: x # returns the same input 

# compose morphisms f, g; f: A -> B, g: B -> C, (g o f)(x) = g(f(x))
compose_morphism = lambda f, g: lambda x: g(f(x)) 


# compose a sequence of morphisms/functions, applied left to right:
# compose_all(f, g, h)(x) == h(g(f(x))); with no arguments it returns the identity
compose_all = lambda *funcs: reduce(
    lambda f, g: lambda x: g(f(x)),
    funcs,
    identity_morph,
)

scaled_iris_data = scale_data(iris_data, 2)



In [None]:
# verify that the identity morphism applied to the data returns a result that is elementwise equal to the original data
# np.allclose: compares two arrays or array-like objects and returns True if they are elementwise equal within a tolerance
assert np.allclose(identity_morph(iris_data), iris_data)
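
`compose_all` calls every function with a single argument, so two-argument morphisms such as `scale_data` and `constant_add` need their second parameter fixed before they can be chained. One possible approach is `functools.partial`. The sketch below is an assumption-laden stand-in: it redefines the morphisms over plain lists, with hypothetical parameter names `factor` and `constant`, so that it runs without NumPy or scikit-learn.

```python
from functools import partial, reduce


# List-based stand-ins for the notebook's array morphisms (illustrative only).
def scale_data(xs, factor):
    return [x * factor for x in xs]


def constant_add(xs, constant):
    return [x + constant for x in xs]


def identity_morph(x):
    return x


def compose_all(*funcs):
    """Compose unary functions left to right, starting from the identity."""
    return reduce(lambda f, g: lambda x: g(f(x)), funcs, identity_morph)


# Fix the second argument so each morphism becomes unary.
double = partial(scale_data, factor=2)
shift = partial(constant_add, constant=1)

pipeline = compose_all(double, shift)  # scale first, then add
result = pipeline([1.0, 2.0, 3.0])

# With no functions, compose_all returns the identity morphism.
unchanged = compose_all()([1.0, 2.0, 3.0])
```

With the notebook's own lambdas, whose parameters are named `x` and `y`, the equivalent call would be `partial(scale_data, y=2)`.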