## Verbal Theory
<span style="font-variant: small-caps;">Categorization</span> (Informal)

*Input:* A set of objects.

*Output:* A categorization of the objects such that within-group similarity and between-group dissimilarity are maximized.


## Formal Model

Assumptions:
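- dissimilarity is not (necessarily) the inverse of similarity, so it is given as a separate function
- (dis)similarity is binary, i.e., defined between pairs of objects
- within-category similarity and between-category dissimilarity cannot always both be maximized, so we allow a trade-off and maximize their sum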

<span style="font-variant: small-caps;">Categorization</span>

*Input:* A set of objects $O$, a similarity function $sim: O\times O \rightarrow \mathbb{R}_{\geq 0}$, and a dissimilarity function $dis: O\times O \rightarrow \mathbb{R}_{\geq 0}$.

*Output:* A partition $C$ of $O$ that maximizes the sum of within-category similarities and between-category dissimilarities, where the latter are summed over pairs of distinct categories:

$$\left(\sum_{c\in C} sim(c)\right) + \left(\sum_{\substack{c, d \in C\\ c \neq d}} dis(c, d)\right)$$

Here, the within-category similarity of $c\in C$ is defined as:
$$sim(c) = \sum_{o_i, o_j \in c}sim(o_i, o_j)$$
and the between-category dissimilarity of two categories $c$ and $d$ is defined as:
$$dis(c, d) = \sum_{\substack{o_c\in c\\ o_d \in d}} dis(o_c, o_d)$$
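
To make the objective concrete, here is a worked computation on a small hypothetical example (not from the original notebook): $O = \{a, b, c\}$ with $sim(a,b) = sim(b,a) = 0.75$, $sim(o,o) = 1$ for every object, all other similarities $0$, and $dis = 1 - sim$. The between-category sum is read as ranging over ordered pairs of distinct categories ($c \neq d$). For $C = \{\{a,b\},\{c\}\}$:

$$\begin{aligned}
sim(\{a,b\}) &= 1 + 0.75 + 0.75 + 1 = 3.5, \qquad sim(\{c\}) = 1,\\
dis(\{a,b\},\{c\}) &= dis(a,c) + dis(b,c) = 1 + 1 = 2, \qquad dis(\{c\},\{a,b\}) = 2,\\
\text{score}(C) &= (3.5 + 1) + (2 + 2) = 8.5.
\end{aligned}$$

Of the five partitions of this $O$, this one scores highest; for comparison, $\{\{a,b,c\}\}$ scores $4.5$ and $\{\{a\},\{b\},\{c\}\}$ scores $7.5$. Note that if pairs with $c = d$ were included, every ordered object pair $(o, o')$ would fall in exactly one $(c, d)$, so the dissimilarity term would equal $\sum_{o, o' \in O} dis(o, o')$ for every partition and could not discriminate between them.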

A partition $C$ of $O$ is a set of non-empty subsets of $O$ (the categories) such that every element of $O$ belongs to at least one category:
$$\forall_{o \in O}\exists_{c\in C}\, o \in c$$
and no object belongs to two (or more) categories:
$$\forall_{c, d \in C}\, c \neq d \rightarrow c \cap d = \varnothing$$
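
The partition conditions can be checked directly. The following is a minimal Scala sketch with the hypothetical name `isPartition` (it is not part of mathlib). It reads disjointness as applying only to distinct categories $c \neq d$, since $c \cap c = c$ is never empty for a non-empty category, and it additionally assumes, as in the usual definition of a partition, that every category is a non-empty subset of $O$.

```scala
// Hypothetical helper: checks whether `cats` is a partition of `o`.
def isPartition[A](o: Set[A], cats: Set[Set[A]]): Boolean = {
  // Every object belongs to at least one category.
  val covers = o.forall(x => cats.exists(c => c.contains(x)))
  // Distinct categories share no object (the case c == d is exempt).
  val disjoint = cats.forall(c => cats.forall(d => c == d || c.intersect(d).isEmpty))
  // Assumption: categories are non-empty subsets of O.
  val wellFormed = cats.forall(c => c.nonEmpty && c.subsetOf(o))
  covers && disjoint && wellFormed
}

// Usage
val objs = Set("Orange", "Apple", "Tomato")
val ok        = isPartition(objs, Set(Set("Orange", "Apple"), Set("Tomato")))          // true
val overlap   = isPartition(objs, Set(Set("Orange", "Apple"), Set("Apple", "Tomato"))) // false: Apple is in two categories
val uncovered = isPartition(objs, Set(Set("Orange"), Set("Apple")))                    // false: Tomato is in no category
```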

In [None]:
import $ivy.`com.markblokpoel::mathlib:0.8.1`
import mathlib.set.SetTheory._

In [None]:
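// Objects are represented as Strings

def categorization(
    objects: Set[String],
    sim: (String, String) => Double,
    dis: (String, String) => Double
): Set[Set[String]] = {
    // Within-category similarity. Convert to a List before mapping so that
    // equal similarity values are not merged by Set semantics.
    def simCat(c: Set[String]): Double =
        c.pairs.toList.map(sim.tupled).sum

    // Dissimilarity between two categories: sum over all o_c in c, o_d in d.
    def disCat(cd: (Set[String], Set[String])): Double = {
        val (c, d) = cd
        (c x d).toList.map(dis.tupled).sum
    }

    // Objective: within-category similarity plus dissimilarity between
    // distinct categories (pairs with c == d are excluded).
    def score(partition: Set[Set[String]]): Double =
        partition.toList.map(simCat _).sum +
        (partition x partition).toList.filter { case (c, d) => c != d }.map(disCat _).sum

    // Exhaustive search: score every partition, return one maximizer at random.
    objects.allPartitions
        .argMax(score).random.get
}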
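
The call `objects.allPartitions` makes the model an exhaustive search over all partitions. The sketch below shows one standard recursive way to enumerate them in plain Scala; the name `partitions` is hypothetical, and this is assumed, not confirmed, to yield the same candidates as mathlib's `allPartitions`, whose implementation is not shown here. Each object either opens a new category or joins one of the categories already formed for the remaining objects, which produces every partition exactly once.

```scala
// Hypothetical helper: enumerates every partition of `xs` exactly once.
def partitions[A](xs: List[A]): List[List[Set[A]]] = xs match {
  case Nil => List(Nil) // the empty set has exactly one partition: no categories
  case x :: rest =>
    partitions(rest).flatMap { p =>
      val alone  = Set(x) :: p                                       // x opens a new category
      val joined = p.indices.toList.map(i => p.updated(i, p(i) + x)) // x joins category i
      alone :: joined
    }
}

// Number of candidate partitions for n = 1..5 objects.
val counts = (1 to 5).map(n => partitions((1 to n).toList).size)
// counts == Seq(1, 2, 5, 15, 52)
```

These counts are the Bell numbers, so the five objects in the example give 52 candidate partitions. Bell numbers grow faster than exponentially, so this brute-force search is only practical for small object sets.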

In [None]:
val t1 = "Orange"
val t2 = "Apple"
val t3 = "Tangerine"
val t4 = "Tomato"
val t5 = "Bean"

val things = Set(t1, t2, t3, t4, t5)


implicit class ImplMap[A, B](map: Map[A, B]) {
    def function: A => B = (a: A) => {
        map(a)
    }
}


val simMap = Map(
    (t1, t1) -> 1.0,
    (t1, t2) -> .6,
    (t1, t3) -> .8,
    (t1, t4) -> .3,
    (t1, t5) -> .2,
    (t2, t1) -> .6,
    (t2, t2) -> 1.0,
    (t2, t3) -> .5,
    (t2, t4) -> .4,
    (t2, t5) -> .4,
    (t3, t1) -> .8,
    (t3, t2) -> .5,
    (t3, t3) -> 1.0,
    (t3, t4) -> .5,
    (t3, t5) -> .2,
    (t4, t1) -> .3,
    (t4, t2) -> .4,
    (t4, t3) -> .5,
    (t4, t4) -> 1.0,
    (t4, t5) -> .7,
    (t5, t1) -> .2,
    (t5, t2) -> .4,
    (t5, t3) -> .2,
    (t5, t4) -> .7,
    (t5, t5) -> 1.0
)

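// Look up the similarity of a pair in the table above
def sim(a: String, b: String): Double = simMap((a, b))

// In this example dissimilarity is taken to be 1 - similarity, although the
// model allows it to be specified independently
def dis(a: String, b: String): Double = 1 - sim(a, b)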

categorization(things, sim, dis)
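
The final step of `categorization`, `.argMax(score).random.get`, presumably collects every highest-scoring partition and then returns one of them at random. The sketch below mimics that behaviour without mathlib on hypothetical toy data: three objects `a`, `b`, `c`, where `a` and `b` have similarity 0.75, all other distinct pairs have similarity 0, and `dis = 1 - sim`. The helper names `argMaxAll` and `pickRandom` are hypothetical, not mathlib's API, and the five candidate partitions are listed by hand.

```scala
import scala.util.Random

// Hypothetical helper (not mathlib's API): every element of xs whose score
// equals the maximum score; ties are all kept.
def argMaxAll[A](xs: Set[A], f: A => Double): Set[A] =
  if (xs.isEmpty) Set.empty[A]
  else {
    val best = xs.iterator.map(f).reduce((a, b) => math.max(a, b))
    xs.filter(x => f(x) == best)
  }

// Hypothetical helper: one element chosen uniformly at random, None if empty.
def pickRandom[A](xs: Set[A], rng: Random): Option[A] =
  if (xs.isEmpty) None else Some(xs.toVector(rng.nextInt(xs.size)))

// Hypothetical toy data: "a" and "b" are fairly similar, "c" is unlike both.
val simMap: Map[(String, String), Double] = Map(("a", "b") -> 0.75, ("b", "a") -> 0.75)
val sim: (String, String) => Double =
  (x, y) => if (x == y) 1.0 else simMap.getOrElse((x, y), 0.0)
val dis: (String, String) => Double = (x, y) => 1 - sim(x, y)

// Objective: within-category similarity plus dissimilarity between
// ordered pairs of distinct categories.
val score: Set[Set[String]] => Double = p => {
  val cats    = p.toList
  val within  = (for (c <- cats; x <- c.toList; y <- c.toList) yield sim(x, y)).sum
  val between = (for (c <- cats; d <- cats if c != d; x <- c.toList; y <- d.toList) yield dis(x, y)).sum
  within + between
}

// The five partitions of {a, b, c}, listed by hand.
val candidates: Set[Set[Set[String]]] = Set(
  Set(Set("a", "b", "c")),
  Set(Set("a", "b"), Set("c")),
  Set(Set("a", "c"), Set("b")),
  Set(Set("b", "c"), Set("a")),
  Set(Set("a"), Set("b"), Set("c"))
)

val winners = argMaxAll(candidates, score) // Set(Set(Set("a", "b"), Set("c")))
val chosen  = pickRandom(winners, new Random(42))
```

Because ties are kept, the random pick only matters when several partitions share the top score; for instance, `argMaxAll(Set(1, 2, 3, 4), (x: Int) => -math.abs(x - 2.5))` returns `Set(2, 3)`.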