# Getting started with catfish-sim

Written using version `0.2.0`.

In this bottom-up tutorial, you will learn the basics of using catfish-sim to simulate an online dating environment.

You can install the package using `pip install catfish-sim`.

## Creating an agent

Before we create an agent, we need to understand the preference, attribute, and strategy concepts, which make up a significant portion of an agent.

### Preference

In catfish-sim, agents have preference objects that define their attribute-specific preferences. Different preference classes are available depending on how the relevant attribute is used. For categorical attributes, an example [CategoricalPreference](../catfish_sim.html#catfish_sim.compatibility.CategoricalPreference) object is shown below:

In [1]:
from catfish_sim.compatibility import CategoricalPreference

cat_pref = CategoricalPreference(
    preferred_values=["a", "b"],
    allowed_values=["a", "b", "c", "d"],
    preferred_score=1.25,
    nonpreferred_score=0.75,
    compatibility_weight=1,
)

print(cat_pref)

CategoricalPreference(
	preferred_values=['a', 'b'], 
	preferred_score=1.25, 
	nonpreferred_score=0.75
)


This preference means that the agent holding it obtains an attribute-specific compatibility score of $1.25$ (`preferred_score`) when the candidate's attribute value is "a" or "b", since those are the preferred values. For any other value, the compatibility score is $0.75$ (`nonpreferred_score`). We can check these scores by calling `evaluate_attribute` with the attribute value to be judged by the preference.

In [2]:
print("Compatibility with a:", cat_pref.evaluate_attribute("a"))
print("Compatibility with c:", cat_pref.evaluate_attribute("c"))

Compatibility with a: 1.25
Compatibility with c: 0.75


Note that categories do not need to be strings. Other types, such as booleans or integers, also work as long as they can be compared with the candidate's attribute value.
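To make the categorical rule concrete, here is a minimal standalone sketch of the scoring described above. It is an illustration, not catfish-sim's implementation. The function name `categorical_score` is hypothetical, and the sketch assumes membership is checked with ordinary Python equality (`in`). That assumption is why non-string categories such as booleans behave the same way.

```python
def categorical_score(value, preferred_values, preferred_score=1.25, nonpreferred_score=0.75):
    """Hypothetical sketch of categorical preference scoring (not the library's code)."""
    # A candidate value earns preferred_score if it equals any preferred category.
    if value in preferred_values:
        return preferred_score
    # Every other category earns nonpreferred_score.
    return nonpreferred_score


print(categorical_score("a", ["a", "b"]))  # 1.25
print(categorical_score("c", ["a", "b"]))  # 0.75
# Non-string categories work the same way, e.g. a boolean "smoker" attribute.
print(categorical_score(False, [False]))  # 1.25
```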

For numerical attributes where the agent has a continuous preference range, an example [NumericalPreference](../catfish_sim.html#catfish_sim.compatibility.NumericalPreference) object is shown below:

In [3]:
from catfish_sim.compatibility import NumericalPreference

num_pref = NumericalPreference(
    preferred_range=[2, 4],
    allowed_range=[1, 10],
    preferred_score=1.25,
    nonpreferred_score=0.75,
    distance_sensitive=False,
    compatibility_weight=1,
    compatibility_fn=None,
)

print(num_pref)

NumericalPreference(
	preferred_range=[2, 4], 
	preferred_score=1.25, 
	nonpreferred_score=0.75, 
	distance_sensitive=False, 
	compatibility_weight=1, 
	compatibility_fn=None
)


Since `distance_sensitive` is set to `False`, any value between 2 and 4 yields a compatibility score of $1.25$, while any value outside this range yields $0.75$:


In [4]:
print(num_pref.evaluate_attribute(2.5))
print(num_pref.evaluate_attribute(1.5))

1.25
0.75


You can also enable distance-sensitive evaluation. In that case, a value outside the preferred range is mapped to a compatibility score between `nonpreferred_score` and $1$, based on the difference between the evaluated value and the closest preferred value:

In [5]:
num_pref = NumericalPreference(
    preferred_range=[2, 4],
    allowed_range=[1, 10],
    preferred_score=1.25,
    nonpreferred_score=0.75,
    distance_sensitive=True,  # Distance-sensitive calculation.
    compatibility_weight=1,
    compatibility_fn=None,  # None makes the preference use the default scaling.
)

print(num_pref)
print(num_pref.evaluate_attribute(2.5))
print(num_pref.evaluate_attribute(1.5))
print(num_pref.evaluate_attribute(10))

NumericalPreference(
	preferred_range=[2, 4], 
	preferred_score=1.25, 
	nonpreferred_score=0.75, 
	distance_sensitive=True, 
	compatibility_weight=1, 
	compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0F4BAB640>
)
1.25
0.9861111111111112
0.8333333333333334


Here, the difference between $1.5$ and its closest preferred value ($2$) is smaller than the difference between $10$ and its closest preferred value ($4$), so $1.5$ yields a higher compatibility score. You can also pass a custom function as `compatibility_fn` to specify how compatibility is calculated; the function takes the evaluated value as input.
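The printed scores above can be reproduced by a linear scaling that divides the distance to the closest preferred value by the width of `allowed_range`. For $1.5$ this gives $0.75 + 0.25 \times (1 - 0.5/9) \approx 0.986$, and for $10$ it gives $0.75 + 0.25 \times (1 - 6/9) \approx 0.833$. The formula is inferred from those two outputs and is not a documented catfish-sim formula. The sketch also assumes the bounds of `preferred_range` are inclusive. The function name `numerical_score` is hypothetical.

```python
def numerical_score(value, preferred_range, allowed_range, preferred_score=1.25,
                    nonpreferred_score=0.75, distance_sensitive=False):
    """Hypothetical sketch of NumericalPreference scoring, inferred from the outputs above."""
    low, high = preferred_range
    # Values inside the preferred range (inclusive bounds assumed) get the full score.
    if low <= value <= high:
        return preferred_score
    if not distance_sensitive:
        return nonpreferred_score
    # Distance to the closest preferred value.
    distance = low - value if value < low else value - high
    # Assumed normalization: the width of the allowed range.
    max_distance = allowed_range[1] - allowed_range[0]
    # Linear scaling: 1 at the range edge, nonpreferred_score at max_distance.
    return nonpreferred_score + (1 - nonpreferred_score) * (1 - distance / max_distance)


print(numerical_score(1.5, [2, 4], [1, 10]))  # 0.75
print(numerical_score(1.5, [2, 4], [1, 10], distance_sensitive=True))  # approximately 0.986
print(numerical_score(10, [2, 4], [1, 10], distance_sensitive=True))  # approximately 0.833
```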

If you need to manually specify many different compatibility scores for different attribute values, you can use [DictBasedPreference](../catfish_sim.html#catfish_sim.compatibility.DictBasedPreference) which directly uses the provided dictionary.

In [6]:
from catfish_sim.compatibility import DictBasedPreference

dict_pref = DictBasedPreference(
    compatibility_dict={"a": 1.25, "b": 1, "c": 0.9, "d": 0.75},
    default_value=1,
    compatibility_weight=1,
)

print(dict_pref)
print(dict_pref.evaluate_attribute("c"))
# "e" is not included in the dictionary, so the default value is returned.
print(dict_pref.evaluate_attribute("e"))

DictPreference(
	compatibility_dict={'a': 1.25, 'b': 1, 'c': 0.9, 'd': 0.75}, 
	default_value=1, 
	compatibility_weight=1)
0.9
1


A preference's `compatibility_weight` is used when calculating a candidate's overall compatibility score, which is the weighted average of all attribute compatibilities. This lets the preferences that matter more to the agent dominate the less important ones.
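The weighted average described above can be sketched as follows. This is the plain weighted arithmetic mean, not the code of catfish-sim's `CompatibilityCalculator`. The function name `overall_compatibility` is hypothetical, and the sketch assumes all weights are positive.

```python
import math


def overall_compatibility(scores_and_weights):
    """Weighted average of (compatibility score, compatibility_weight) pairs."""
    total_weight = sum(weight for _, weight in scores_and_weights)
    weighted_sum = sum(score * weight for score, weight in scores_and_weights)
    return weighted_sum / total_weight


# Height matters three times as much as age (weights 1.5 vs 0.5).
print(overall_compatibility([(1.25, 1.5), (0.75, 0.5)]))  # 1.125
# A single -inf score with a positive weight drags the whole average to -inf.
print(overall_compatibility([(-math.inf, 1), (1.25, 1)]))  # -inf
```

The second call shows why a `-math.inf` score can act as a deal-breaker under this formula: no combination of finite scores can outweigh it.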

In our model, preference acts as a multiplier that can enhance or diminish the effect of the candidate's attractiveness. For this reason, full compatibility is given a value of $1.25$, while total incompatibility is given a value of $0.75$. However, you may want a different calculation and therefore different compatibility values.

You can write a custom preference class and implement the `evaluate_attribute` method that takes the candidate value and returns the compatibility score.
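As an illustration, here is a hypothetical custom preference whose score decays exponentially with the distance from a single ideal value. No built-in class shown here offers this behavior. The class name and its parameters (`ideal`, `scale`) are made up for this sketch, which only mirrors the `evaluate_attribute` interface described above. A real catfish-sim preference would presumably subclass `catfish_sim.compatibility.Preference`, whose constructor arguments are not covered in this tutorial.

```python
import math


class ExponentialDecayPreference:
    """Hypothetical custom preference: score decays with distance from an ideal value."""

    def __init__(self, ideal, scale, preferred_score=1.25, nonpreferred_score=0.75,
                 compatibility_weight=1):
        if scale <= 0:
            raise ValueError("scale must be positive")
        self.ideal = ideal
        self.scale = scale
        self.preferred_score = preferred_score
        self.nonpreferred_score = nonpreferred_score
        self.compatibility_weight = compatibility_weight

    def evaluate_attribute(self, value):
        # preferred_score at the ideal value, approaching nonpreferred_score far away.
        decay = math.exp(-abs(value - self.ideal) / self.scale)
        return self.nonpreferred_score + (self.preferred_score - self.nonpreferred_score) * decay


height_pref = ExponentialDecayPreference(ideal=180, scale=10)
print(height_pref.evaluate_attribute(180))  # 1.25
print(height_pref.evaluate_attribute(190))  # about 0.934
```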

For definite deal-breakers, `-math.inf` can be used as the compatibility score. This is especially useful for preventing impossible recommendations, where the candidate would never like the judging agent. An example is shown below (more details are given in the matcher section):

In [7]:
import math

CategoricalPreference(
    preferred_values=["Female"],
    allowed_values=["Male", "Female"],
    preferred_score=1,
    nonpreferred_score=-math.inf,  # Absolutely does not want any non-female candidate.
)

CategoricalPreference(
	preferred_values=['Female'], 
	preferred_score=1, 
	nonpreferred_score=-inf
)

### Attribute

Agents can have an arbitrary number of attributes that are used to calculate compatibility. An [Attribute](../catfish_sim.html#catfish_sim.compatibility.Attribute) object has an attribute name, an attribute value, and a [Preference](../catfish_sim.html#catfish_sim.compatibility.Preference) object tied to that attribute. For example, a heterosexual male agent who only prefers female candidates can be given the following gender attribute:

In [8]:
from catfish_sim.compatibility import Attribute

gender_attr = Attribute(
    name="Gender",
    value="Male",
    preference=CategoricalPreference(
        preferred_values=["Female"],
        allowed_values=["Male", "Female"],
        preferred_score=1,
        nonpreferred_score=-math.inf,
        compatibility_weight=1,
    ),
)

print(gender_attr)

Attribute(name=Gender, value=Male, preference=CategoricalPreference(
	preferred_values=['Female'], 
	preferred_score=1, 
	nonpreferred_score=-inf
))


An important detail is that each agent must have a gender attribute, which affects various things such as how its attributes are sampled and how it derives utility.

### Strategy

An agent uses a strategy object to decide whether to like or pass a candidate. Currently, the following strategy classes extend the base [Strategy](../catfish_sim.html#catfish_sim.strategies.Strategy) class:
*   [WeightedMinimal](../catfish_sim.html#catfish_sim.strategies.WeightedMinimal): Likes a candidate if the product of the candidate's attractiveness and the overall (weighted average) compatibility is equal to or greater than the agent's estimated attractiveness.
*   [Adventurous](../catfish_sim.html#catfish_sim.strategies.Adventurous): Randomly likes or passes the candidate.
*   [PhysicalHomophiliac](../catfish_sim.html#catfish_sim.strategies.PhysicalHomophiliac): Likes a candidate if the candidate's attractiveness is within the specified range of the agent's own estimated attractiveness.
*   [SocialClimber](../catfish_sim.html#catfish_sim.strategies.SocialClimber): Likes a candidate whose attractiveness is greater than the agent's own estimated attractiveness.
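The decision rules in the list above can be written as small standalone functions. These are simplified sketches, not the library's classes. The function names are hypothetical, the like probability of `Adventurous` is assumed to be 0.5, and the "specified range" of `PhysicalHomophiliac` is assumed to be a symmetric tolerance of ±0.5.

```python
import random


def weighted_minimal_likes(candidate_attractiveness, compatibility, estimated_attractiveness):
    # Like when attractiveness scaled by overall compatibility reaches the agent's own estimate.
    return candidate_attractiveness * compatibility >= estimated_attractiveness


def adventurous_likes(like_probability=0.5, rng=random):
    # Random like/pass; the 0.5 default probability is an assumption.
    return rng.random() < like_probability


def physical_homophiliac_likes(candidate_attractiveness, estimated_attractiveness, tolerance=0.5):
    # Like when the candidate is within +/- tolerance of the agent's estimate (tolerance assumed).
    return abs(candidate_attractiveness - estimated_attractiveness) <= tolerance


def social_climber_likes(candidate_attractiveness, estimated_attractiveness):
    # Like only candidates rated above the agent's own estimate.
    return candidate_attractiveness > estimated_attractiveness


print(weighted_minimal_likes(3.0, 1.125, 3.2))  # True, since 3.375 >= 3.2
print(weighted_minimal_likes(3.0, 0.75, 3.2))  # False, since 2.25 < 3.2
```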

You can check the documentation for details and write your own strategy class as well. Note that all strategy classes extend the base `Strategy` class and implement the following methods:
*   `is_interested`: This method decides whether an agent will like the candidate.
*   `match_hook`: This hook function is called when there is a match. 
*   `new_round_hook`: This hook function is called when a new round starts.

The hook functions are optional. Strategies that can make use of additional information, such as matches or round changes, can implement them.
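Here is a sketch of how a custom strategy might use both hooks. It keeps a `WeightedMinimal`-style rule but lowers its bar after each round that ended without a match. The class is standalone and hypothetical; it is not a real `Strategy` subclass. Only the method names come from the text above. Their arguments, along with `step`, `floor`, and `bar`, are assumptions made for illustration.

```python
class AdaptiveThresholdStrategy:
    """Hypothetical custom strategy skeleton (standalone, not a catfish-sim subclass)."""

    def __init__(self, step=0.1, floor=0.5):
        self.step = step    # How much the bar drops after a round without matches.
        self.floor = floor  # The bar never drops below this fraction.
        self.bar = 1.0      # Required fraction of the agent's own estimated attractiveness.
        self.rounds_started = 0
        self.matched_last_round = False

    def is_interested(self, candidate_attractiveness, compatibility, estimated_attractiveness):
        # WeightedMinimal-style rule, scaled by the adaptive bar.
        return candidate_attractiveness * compatibility >= self.bar * estimated_attractiveness

    def match_hook(self):
        # Remember that the current round produced at least one match.
        self.matched_last_round = True

    def new_round_hook(self):
        # Lower the bar only if a previous round exists and it produced no match.
        if self.rounds_started > 0 and not self.matched_last_round:
            self.bar = max(self.floor, self.bar - self.step)
        self.rounds_started += 1
        self.matched_last_round = False
```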

Let us create a `WeightedMinimal` strategy object:

In [9]:
from catfish_sim.strategies import WeightedMinimal

strat = WeightedMinimal()

Note that these strategies cannot function without an agent object, as agents pass their information to the `is_interested` method of their strategy object.

### Agent

Now that we know how to create preference, attribute, and strategy objects, we can create an [Agent](../catfish_sim.html#catfish_sim.agents.Agent) object. Each agent represents an online dating user in catfish-sim, and has the following attributes:

*   `reported_attributes`: A dictionary of `Attribute` objects. These attributes can be seen by the matchmaking algorithm and other agents. Reported attributes' preference objects are reported preferences.
*   `hidden_attributes`: A dictionary of `Attribute` objects. These attributes are only known to the agent itself, and they ultimately override the reported ones (for example, when evaluating a candidate or calculating utility). Since the preferences of hidden attributes are also hidden, hidden attributes can also be used to give the agent hidden preferences for candidate evaluation. Even if all attributes and preferences of the agent are reported truthfully, `hidden_attributes` must still be provided, as attribute-related matters are handled through hidden attributes, which are guaranteed to be truthful.
*   `like_allowance`: A liking budget that limits the number of candidates an agent can like in a round. At the start of each new round, the allowance resets to this value.
*   `strategy`: A `Strategy` object that is used to evaluate a candidate.
*   `compatibility_calculator`: A `CompatibilityCalculator` object that is used to evaluate the weighted average compatibility of a candidate. You will most likely use the default class rather than writing your own.
*   `attractiveness`: Average perceived attractiveness of the agent, between 1.0 and 5.0, known to every other agent but the agent itself. If `None`, this value is sampled based on the agent's gender attribute in `hidden_attributes`. 
*   `estimated_attractiveness`: The self-estimated attractiveness of the agent between 1.0 and 5.0. The agent uses this value to make decisions. If `None`, this value is sampled based on the agent's `attractiveness` attribute.

Let us create our first agent who:
*   Is a 30-year-old female with a height of 165cm.
*   Strictly prefers males. 
*   Prefers candidates who are between 175 and 190 cm tall. This is the most important attribute in their evaluation.
*   Prefers candidates who are between 28 and 45 years old. This is the least important attribute in their evaluation.
*   Uses the `WeightedMinimal` strategy.
*   Can like 100 agents per round.

In [10]:
from catfish_sim.agents import Agent
from catfish_sim.compatibility import CompatibilityCalculator
from catfish_sim.strategies import WeightedMinimal
import copy

attributes = {
    "Gender": Attribute(
        name="Gender",
        value="Female",  # Agent's gender
        preference=CategoricalPreference(
            preferred_values=["Male"],
            allowed_values=["Male", "Female"],
            preferred_score=1,
            nonpreferred_score=-math.inf,
            compatibility_weight=1,
        ),
    ),
    "Age": Attribute(
        name="Age",
        value=30,  # Agent's age
        preference=NumericalPreference(
            preferred_range=[28, 45],
            allowed_range=[18, 100],  # Depends on your/modeled population's range.
            preferred_score=1.25,
            nonpreferred_score=0.75,
            distance_sensitive=True,
            compatibility_weight=0.5,  # Less important.
        ),
    ),
    "Height": Attribute(
        name="Height",
        value=165,  # Agent's height
        preference=NumericalPreference(
            preferred_range=[175, 190],
            allowed_range=[110, 250],  # Depends on your/modeled population's range.
            preferred_score=1.25,
            nonpreferred_score=0.75,
            distance_sensitive=True,
            compatibility_weight=1.5,  # More important.
        ),
    ),
}

an_agent = Agent(
    id=0,
    reported_attributes=attributes,
    hidden_attributes=copy.deepcopy(attributes),  # The agent is truthful.
    like_allowance=100,
    strategy=WeightedMinimal(),
    compatibility_calculator=CompatibilityCalculator(),
    attractiveness=None,  # Automatically sampled based on gender.
    estimated_attractiveness=None,  # Automatically sampled based on attractiveness.
)

### Sampling a population

You may want to sample an entire population of agents rather than specifying attributes and preferences by hand. In that case, you can use the provided helper functions, which are based on population data and the methods explained in our [study](https://dl.acm.org/doi/10.5555/3635637.3662956). See our paper and the utility functions' documentation for more information.

Let us create 1000 agents with gender, age, height, and body-mass index (BMI) attributes. You can use the following code block as a starting point and make changes as you like.

In [11]:
from catfish_sim import utils


def create_random_agent(agent_id, like_allowance):
    # Based on gender distribution on Tinder.
    gender = utils.get_random_gender()

    # Based on LLCP2022 dataset, we used age group IDs (1: 18-24, 2: 25-29, 3: 30-34,
    # 4: 35-39, 5: 40-44, 6: 45-49, 7: 50-54, 8: 55-59, 9: 60-64, 10: 65-69, 11: 70-74,
    # 12: 75-79, 13: 80+) as ordinal age values.
    age = utils.sample_age_from_sex(gender)
    preferred_age_range = utils.sample_age_preference(gender, age)

    # We rounded height values for easier analysis.
    height = round(utils.sample_height_from_sex_age(gender, age))
    preferred_height_range = utils.get_height_preference(gender, height)

    # Based on LLCP2022 dataset, we used BMI group IDs (1: Underweight, 2: Normal
    # weight, 3: Overweight, 4: Obesity) as ordinal values.
    bmi = utils.sample_bmi_from_sex_age(gender, age)
    preferred_bmi_range = utils.get_bmi_preference(gender, bmi)

    # You can use different compatibility weights based on the gender as follows (or
    # completely randomize it).
    if gender == "Male":
        # Males care more about weight compatibility
        weight_preferred_score = 1.25
        weight_importance = 1.5
        height_preferred_score = 1.25
        height_importance = 1
        age_preferred_score = 1.25
        age_importance = 1
    else:  # Female
        # Females care more about height compatibility
        weight_preferred_score = 1.25
        weight_importance = 1
        height_preferred_score = 1.25
        height_importance = 1.5
        age_preferred_score = 1.25
        age_importance = 1

    reported_attributes = {
        "Gender": Attribute(
            name="Gender",
            value=gender,
            preference=CategoricalPreference(
                preferred_values=[("Female" if gender == "Male" else "Male")],
                allowed_values=["Male", "Female"],
                preferred_score=1,
                nonpreferred_score=-math.inf,
            ),
        ),
        "Age": Attribute(
            name="Age",
            value=age,
            preference=NumericalPreference(
                preferred_range=preferred_age_range,
                allowed_range=utils.LLCP2022_AGE_GROUP_RANGE,  # Based on LLCP2022.
                preferred_score=age_preferred_score,
                nonpreferred_score=0.25,
                distance_sensitive=True,
                compatibility_weight=age_importance,
            ),
        ),
        "Height": Attribute(
            name="Height",
            value=height,
            preference=NumericalPreference(
                preferred_range=preferred_height_range,
                allowed_range=utils.LLCP2022_HEIGHT_RANGE,  # Based on LLCP2022.
                preferred_score=height_preferred_score,
                nonpreferred_score=0.25,
                distance_sensitive=True,
                compatibility_weight=height_importance,
            ),
        ),
        "BMI": Attribute(
            name="BMI",
            value=bmi,
            preference=NumericalPreference(
                preferred_range=preferred_bmi_range,
                allowed_range=utils.LLCP2022_BMI_GROUP_RANGE,  # Based on LLCP2022.
                preferred_score=weight_preferred_score,
                nonpreferred_score=0.25,
                distance_sensitive=True,
                compatibility_weight=weight_importance,
            ),
        ),
    }

    hidden_attributes = copy.deepcopy(reported_attributes)

    agent = Agent(
        id=agent_id,
        reported_attributes=reported_attributes,
        hidden_attributes=hidden_attributes,
        like_allowance=like_allowance,
        strategy=WeightedMinimal(),
        compatibility_calculator=CompatibilityCalculator(),
    )

    return agent


n_agents = 1000
like_allowance = 100
dating_agents = [None] * n_agents

for i in range(n_agents):
    # Note that agent IDs must correspond to their IDs in the agent list.
    agent = create_random_agent(agent_id=i, like_allowance=like_allowance)
    dating_agents[i] = agent
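
In the sampled population, ages are ordinal LLCP2022 group IDs rather than years, so age preference ranges are also expressed in group IDs. If you want to build agents by hand on the same scale, a helper like the one below converts an age in years into the group IDs listed in the comments above. `llcp_age_group` is a hypothetical helper written for this tutorial and is not part of `catfish_sim.utils`.

```python
def llcp_age_group(age_years):
    """Map an age in years to the ordinal LLCP2022 age group IDs used above (hypothetical helper)."""
    if age_years < 18:
        raise ValueError("the age groups start at 18")
    if age_years <= 24:
        return 1  # 18-24
    if age_years >= 80:
        return 13  # 80+
    # Groups 2-12 are five-year bins starting at 25: 25-29 -> 2, 30-34 -> 3, ...
    return (age_years - 25) // 5 + 2


print(llcp_age_group(30))  # 3, i.e. the 30-34 group
```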

## Matchmaking

### Matcher

Now that we have our agents ready, we can create the matchmaking system (also referred to as "matcher") that will recommend agents to each other. There are different kinds of matchers:
*   [RandomAgentMatcher](../catfish_sim.html#catfish_sim.matchers.RandomAgentMatcher): Makes random recommendations.
*   [PreferentialAgentMatcher](../catfish_sim.html#catfish_sim.matchers.PreferentialAgentMatcher): Sorts and recommends agents based on their compatibility, which is calculated using reported attributes and preferences.
*   [RankedAgentMatcher](../catfish_sim.html#catfish_sim.matchers.RankedAgentMatcher): Uses an Elo-like rating system based on agents being liked or passed by other agents, and makes recommendations based on these ratings.

Different matchers have different parameters and additional details that are not mentioned here. Please check the documentation for more information.
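To give an intuition for an "Elo-like" rating, the sketch below applies the standard Elo update to a single like or pass. It treats a like as a "win" for the candidate and a pass as a "loss". The mapping, the K-factor of 32, and the 400-point scale are chess conventions chosen for illustration. They are not `RankedAgentMatcher`'s actual parameters, and the function names are hypothetical.

```python
def elo_expected(rating_a, rating_b):
    # Standard Elo expected score of A against B.
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_on_swipe(judge_rating, candidate_rating, liked, k=32):
    """Hypothetical mapping: a like is a 'win' for the candidate, a pass is a 'loss'."""
    expected = elo_expected(candidate_rating, judge_rating)
    outcome = 1.0 if liked else 0.0
    return candidate_rating + k * (outcome - expected)


print(update_on_swipe(1500, 1500, liked=True))   # 1516.0
print(update_on_swipe(1500, 1500, liked=False))  # 1484.0
```

Under this scheme, a like from a highly rated judge raises the candidate's rating more than a like from a low-rated one.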

Let us create a `PreferentialAgentMatcher` object with the agent population we have just created:

In [12]:
from catfish_sim.matchers import PreferentialAgentMatcher

matcher = PreferentialAgentMatcher(
    agents=dating_agents,
    recommendation_limit=200,  # See below for more information.
    compatibility_calculator=CompatibilityCalculator(),
    judger_weight=0.99,  # See below for more information.
    logging=True,  # This logs agent states after each round ends.
    recalculate=False,  # See below for more information.
)

This matcher's `recommendation_limit` is set to 200, which means each agent receives at most 200 candidates per round. `recommendation_limit` should be equal to or greater than the agents' `like_allowance`; otherwise, agents cannot fully use their like budget.

`judger_weight` is used to calculate the weighted average compatibility between two agents. If it is set to $1$, the compatibility-based sorting only considers the perspective of the judging agent who evaluates the candidates. Otherwise, the evaluated candidates' perspectives are also considered, with a weight of `1 - judger_weight`. A `judger_weight` smaller than $1$ is useful for preventing impossible recommendations, where the judged agent is already known to have a deal-breaker that would prevent a match with the judging agent. For example, a *reportedly* heterosexual male with a `-math.inf` compatibility value for male candidates would never be shown to a homosexual male agent, even though the judging agent (the homosexual male) might like the candidate.
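The weighting described above can be sketched as a convex combination of the two perspectives, followed by a descending sort. The function names are hypothetical, and this is not the matcher's code. Note the special case for `judger_weight == 1`: in Python, `0 * -math.inf` is `nan`, so a literal formula would otherwise break the sort.

```python
import math


def pair_compatibility(judge_view, candidate_view, judger_weight=0.99):
    # judge_view: how compatible the candidate looks to the judging agent.
    # candidate_view: how compatible the judging agent looks to the candidate.
    if judger_weight == 1:
        return judge_view  # Avoids 0 * -inf == nan.
    return judger_weight * judge_view + (1 - judger_weight) * candidate_view


def rank_candidates(judge_scores, candidate_scores, judger_weight=0.99):
    """Return candidate IDs sorted from most to least compatible."""
    combined = {
        cid: pair_compatibility(judge_scores[cid], candidate_scores[cid], judger_weight)
        for cid in judge_scores
    }
    return sorted(combined, key=combined.get, reverse=True)


judge = {1: 1.1, 2: 1.2, 3: 0.9}
# Candidate 2 reports a deal-breaker (-inf) against the judging agent.
candidates = {1: 1.0, 2: -math.inf, 3: 1.0}
print(rank_candidates(judge, candidates))                     # [1, 3, 2]
print(rank_candidates(judge, candidates, judger_weight=1.0))  # [2, 1, 3]
```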

`recalculate` toggles recalculating compatibilities, and therefore recommendation priorities, in each round. It must be set to `True` if agents can change their attributes or preferences during the simulation. However, this is computationally expensive. If you know when agents change their attributes or preferences, and they do not change them every round, a better approach is to set this to `False` and call `PreferentialAgentMatcher.generate_recommendation_priorities()` before the recommendations that follow such changes.

### Running a simulation

We can now run our simulation in a loop:

In [13]:
n_rounds = 10

for i in range(n_rounds):  # You can use the tqdm package here to track the progress.
    matcher.run_new_round()

Once your simulation is complete (or during the simulation), you can retrieve agent objects' attributes either using your agent list or `matcher.agents`. Let us retrieve them for an agent:

In [14]:
reported_agent = matcher.agents[10]

print("Agent's reported attributes:", reported_agent.reported_attributes)
print("Agent's attractiveness:", reported_agent.attractiveness)
print("Agent's estimated attractiveness:", reported_agent.estimated_attractiveness)
print("Agent's match count:", reported_agent.match_count)
print("Agent's happiness:", reported_agent.happiness)

Agent's reported attributes: {'Gender': Attribute(name=Gender, value=Female, preference=CategoricalPreference(
	preferred_values=['Male'], 
	preferred_score=1, 
	nonpreferred_score=-inf
)), 'Age': Attribute(name=Age, value=5, preference=NumericalPreference(
	preferred_range=[4, 6], 
	preferred_score=1.25, 
	nonpreferred_score=0.25, 
	distance_sensitive=True, 
	compatibility_weight=1, 
	compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0B5677BE0>
)), 'Height': Attribute(name=Height, value=167, preference=NumericalPreference(
	preferred_range=[172, 217], 
	preferred_score=1.25, 
	nonpreferred_score=0.25, 
	distance_sensitive=True, 
	compatibility_weight=1.5, 
	compatibility_fn=<function Preference.__init__.<locals>.<lambda> at 0x000001E0B5677910>
)), 'BMI': Attribute(name=BMI, value=3, preference=NumericalPreference(
	preferred_range=[2, 4], 
	preferred_score=1.25, 
	nonpreferred_score=0.25, 
	distance_sensitive=True, 
	compatibility_weight=1, 
	compatibility_f

You can put all agents' attributes and results into a pandas DataFrame for analysis. Also, if logging was enabled before the simulation, it is possible to retrieve the past states of an agent as follows:

In [15]:
# Retrieves all rounds and variables (match_count and happiness):
print(reported_agent.get_logs())

# Retrieves the happiness value at the end of the third round. The log ID is 1-based
# and corresponds to the round number.
print(reported_agent.get_logs(log_id=3, variables=["happiness"]))

{'match_count': {1: 13, 2: 15, 3: 17, 4: 18, 5: 18, 6: 18, 7: 18, 8: 18, 9: 18, 10: 18}, 'happiness': {1: 47.582190551992035, 2: 54.635216687050644, 3: 61.53679236021338, 4: 64.89399280009835, 5: 64.89399280009835, 6: 64.89399280009835, 7: 64.89399280009835, 8: 64.89399280009835, 9: 64.89399280009835, 10: 64.89399280009835}}
{'happiness': 61.53679236021338}
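Logged values can be post-processed with plain Python. The sketch below converts the `match_count` log printed above into the number of new matches gained in each round. It assumes `match_count` is cumulative, as its non-decreasing values suggest. The helper name `new_matches_per_round` is hypothetical.

```python
# Match counts per round, copied from the get_logs() output above.
match_count_log = {1: 13, 2: 15, 3: 17, 4: 18, 5: 18, 6: 18, 7: 18, 8: 18, 9: 18, 10: 18}


def new_matches_per_round(cumulative):
    """Turn a {round: cumulative_count} log into {round: matches gained that round}."""
    gained = {}
    previous = 0
    for round_id in sorted(cumulative):
        gained[round_id] = cumulative[round_id] - previous
        previous = cumulative[round_id]
    return gained


print(new_matches_per_round(match_count_log))
```

If you prefer pandas, `pandas.DataFrame(reported_agent.get_logs())` produces a table indexed by round number, with `match_count` and `happiness` columns.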


### Behind the curtain

When you run a simulation round as shown above using `PreferentialAgentMatcher.run_new_round()`, the most important operations that take place in the background are as follows:

*   `PreferentialAgentMatcher` increments its round counter and prepares itself for a new round.
*   If recalculation is enabled, `PreferentialAgentMatcher` recalculates the compatibilities and therefore recommendation priorities for agents.
*   For each `Agent` object in the matcher:
    *   `Agent` is informed about the new round.
    *   Fresh candidates that were not previously evaluated by the agent are identified.
    *   These new candidates' public details (ID, attractiveness, reported attributes) are provided to the agent.
    *   Agent uses their strategy object to like or pass each candidate in the order they were provided. These likes and passes are recorded.
*   Round likes and passes are processed. New reciprocal likes are detected (an agent may like another agent and be liked back many rounds later), and the agents are informed.
*   If logging is enabled, all agents' states are logged.
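The steps above can be condensed into a simplified, self-contained round. This is a hypothetical sketch under several assumptions, not catfish-sim's implementation. The matcher's recommendations are given as a plain dict, and the agent's strategy is a `decide` function. Once the like budget is spent, the remaining candidates are left unevaluated so they can be shown again later. Logging and round counters are omitted.

```python
from collections import defaultdict


def run_round(agent_ids, recommendations, decide, like_allowance, evaluated, likes, matches):
    """One simplified round (hypothetical sketch of the steps listed above).

    recommendations: {agent_id: ordered list of candidate IDs}.
    decide(agent_id, candidate_id) -> bool: stands in for the agent's strategy.
    evaluated, likes, matches: state shared across rounds, updated in place.
    Returns the set of matches created in this round.
    """
    for agent_id in agent_ids:
        budget = like_allowance  # The like budget resets every round.
        # Only fresh candidates that the agent has not evaluated before.
        fresh = [c for c in recommendations.get(agent_id, []) if c not in evaluated[agent_id]]
        for candidate_id in fresh:  # Candidates are judged in the order provided.
            if budget == 0:
                break
            evaluated[agent_id].add(candidate_id)
            if decide(agent_id, candidate_id):
                likes.add((agent_id, candidate_id))
                budget -= 1
    # Detect reciprocal likes that were not matched before (a like may be returned later).
    new_matches = set()
    for liker, liked in likes:
        pair = frozenset((liker, liked))
        if (liked, liker) in likes and pair not in matches:
            new_matches.add(pair)
    matches |= new_matches
    return new_matches


def decide(agent_id, candidate_id):
    # Toy strategy: agent 2 passes everyone, all other agents like everyone they see.
    return agent_id != 2


evaluated, likes, matches = defaultdict(set), set(), set()
recs = {0: [1, 2], 1: [0], 2: [0]}
print(run_round([0, 1, 2], recs, decide, 100, evaluated, likes, matches))  # {frozenset({0, 1})}
print(run_round([0, 1, 2], recs, decide, 100, evaluated, likes, matches))  # set()
```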