## Favorable Investment Detection
In the [previous article](https://diogenesanalytics.com/blog/2024/05/19/optimal-options), the following equation was defined and used to calculate the *net expected utility*:

$$
E(n) = (1 - p^n)U - nC
$$

In the article, a lottery example was used to demonstrate the *equation* and show objectively which scenario was more advantageous. But this required a fairly *intensive computation*, all to show that the second scenario (i.e. the *"real"* lottery) was completely **unfavorable**. Is there a better way to determine if some *investment* is favorable *before* working out the *optimal options* number $n$?

## The Ratio
Consider the following problem: you have a series of investments with known $U$, $C$, and $p$ values. Which ones deserve a closer look to find the *optimal options number* $n$? When this collection of investments is *small* you can evaluate the original *net expected utility* equation for each one, but what if you have a *non-trivial* amount of data? Consider the following collection of investments that require some *high-level* decision on which are more favorable:

In [None]:
# libs
import random
from typing import Generator
from typing import Tuple

# seed rng
random.seed(42)

# select random number from partitions
def gen_partition_ranges(start: float, end: float, partitions: int) -> Generator[Tuple[float, float], None, None]:
    """Generate sub-ranges from partitions of a given range."""
    # calculate partition sizes with a shifted distribution
    partition_sizes = [((i + 1) ** 2) for i in range(partitions)]
    total_partition_size = sum(partition_sizes)

    # adjust partition sizes to fit the range
    normal_partition_sizes = [size / total_partition_size * (end - start) for size in partition_sizes]

    # begin iterating over partitions
    for partition, partition_size in enumerate(normal_partition_sizes):
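        # calculate partition bounds; note the offset is not a running sum of the
        # earlier partition sizes, so ranges are not contiguous and later ones
        # can extend past `end`
        partition_start = start + partition * partition_size
        partition_end = partition_start + partition_size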

        # generate i-th partition range
        yield partition_start, partition_end

def gen_random_partition(start: float, end: float, partitions: int) -> Generator[float, None, None]:
    """Generate random numbers from partitions of a given range."""
    # loop over partitions
    for partition_start, partition_end in gen_partition_ranges(start, end, partitions):
        # get random float within the partition
        yield random.uniform(partition_start, partition_end)

# func for U/C/p pairs
def gen_investments(partitions: int) -> Tuple[int, int, float]:
    """Generate a single (U, C, p) triple."""
    # get random partition samples for U/C
    uc_random_samples = list(gen_random_partition(1, 1e7, partitions))
    
    # loop until U > C
    while True:        
        # generate U and C as positive non-zero integers
        U = int(random.choice(uc_random_samples))
        C = int(random.choice(uc_random_samples))
        
        # termination condition
        if U > C:
            break

    # get random partition samples for p
    p_random_samples = list(gen_random_partition(1e-6, 1 - 1e-6, partitions))
    
    # generate 0 < p < 1
    p = random.choice(p_random_samples)

    # samples from later partitions can exceed 1; invert those to bring p back below 1
    p = p if p < 1 else 1 / p

    # return the (U, C, p) triple
    return U, C, p

In [None]:
# get libs
import pandas as pd

# generate example investments
example_investments = [gen_investments(1000) for _ in range(20)]

# create table
example_investments_table = pd.DataFrame(example_investments, columns=["U", "C", "p"])

# set index name
example_investments_table.index.name = "id"

# display
print(example_investments_table.to_string(index=False))

Looking at the above list of *investment data*, can you tell which are more favorable? Should we go through the process of evaluating the *net expected utility* equation for each set of $U$, $C$, and $p$ values? No, we should not. Instead we should apply the following equation:

$$
\frac{(1 - p)U}{C}
$$

The above equation is a type of [risk/reward ratio](https://www.investopedia.com/terms/r/riskrewardratio.asp) (in this case *reward*/*risk*), which we will call the *value/cost ratio*. It can be used to get a sense of when a *potential investment* is favorable (or not).

## Return of the Lottery
Now let us apply the previous *value/cost ratio* to the lottery data from the [previous article](https://diogenesanalytics.com/blog/2024/05/19/optimal-options):

In [None]:
# value/cost ratio func
def value_cost_ratio(U: float, C: float, p: float) -> float:
    """Calculates the value/cost ratio."""
    return ((1 - p) * U) / C

In [None]:
# get plotting libs
import matplotlib.pyplot as plt

# set the style to a dark theme
plt.style.use("dark_background")

# match website background
plt.rcParams["figure.facecolor"] = "#181818"
plt.rcParams["axes.facecolor"] = "#181818"
plt.rcParams["axes.edgecolor"] = "#181818"

# calculate p_l
p_lose = 1.0 - (1.0 / (2.55 * 1e6))

# calculate heights for bar graph
heights = [
    value_cost_ratio(20 * 1e6, 2, p_lose),
    value_cost_ratio(20 * 1e6, 100, p_lose)
]

# plot bar data
plt.bar(["Ideal", "Real"], heights, color=plt.cm.inferno(0.50), alpha=0.50)

# plot threshold
plt.axhline(y=1, color=plt.cm.magma(0.90), linestyle="--", label="Advantage Threshold")

# add titles and labels
plt.ylabel("Value/Cost Ratio")
plt.suptitle(
    "Figure 1. Lottery Scenarios Value/Cost Ratios", y=0.0001, fontsize=10
)

# displaying the plot
plt.legend()
plt.show()

From *figure 1* it seems like the *threshold* for an investment to become *favorable* is that the *value/cost* ratio must exceed $1$:

$$
\frac{(1 - p)U}{C} > 1
$$

This makes sense, because the boundary case (a ratio of exactly $1$) corresponds to the following equation:

$$
(1 - p)U = C
$$

Which is to say that the *value* and *cost* terms are *equal*, so the *net expected utility* for a single option ($n = 1$) is:

$$
E(1) = (1 - p)U - C = 0
$$

So in this case $E(1) = 0$, i.e. you will not be **winning**... but at least you will not be **losing** (i.e. $E(1) < 0$).
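
The threshold also says something about every $n$, not only a single option. As a short sketch that uses nothing beyond the *net expected utility* equation $E(n) = (1 - p^n)U - nC$ (and assumes $U > 0$, $C > 0$, and $0 < p < 1$), the gain from adding one more option is:

$$
E(n + 1) - E(n) = (p^n - p^{n+1})U - C = p^n(1 - p)U - C
$$

Because $0 < p < 1$, the factor $p^n$ shrinks as $n$ grows, so this gain is largest at $n = 0$, where it equals $(1 - p)U - C$. If the ratio is below $1$ then that first gain is already negative, every later gain is even more negative, and since $E(0) = 0$ we get $E(n) < 0$ for every $n \geq 1$.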

## Application
Finally we can apply the *value/cost* ratio to our *example investment data* and see which are favorable:

In [None]:
# add new column for value/cost ratio
example_investments_table["vc_ratio"] = example_investments_table.apply(
    lambda row: value_cost_ratio(row["U"], row["C"], row["p"]),
    axis=1
)

# plot bar data
plt.bar(
    example_investments_table.index.values,
    example_investments_table["vc_ratio"].values,
    color=plt.cm.inferno(0.50),
    alpha=0.50
)

# setup x/y scales
plt.xticks(
    ticks=example_investments_table.index.values,
    labels=example_investments_table.index.astype(int),
)
plt.yscale("log")

# plot threshold
plt.axhline(y=1, color=plt.cm.magma(0.90), linestyle="--", label="Advantage Threshold")

# add titles and labels
plt.xlabel("Investment ID")
plt.ylabel("Value/Cost Ratio (log scale)")
plt.suptitle(
    "Figure 2. Synthetic Investment Data Value/Cost Ratios", y=0.0001, fontsize=10
)

# displaying the plot
plt.legend()
plt.show()

Looking at *figure 2* we can **easily** discern which investments are *not favorable*, which are *barely favorable*, which are a *little favorable*, and finally which investments are *reasonably* and *significantly favorable*. But let us now actually apply the original *net expected utility* equation and *reaffirm* that our method works by providing some *additional evidence*.

## Bonus Round
Now let us see how accurate our little *value/cost* ratio actually is. We will plot the *optimal options* of the investment data, by grouping them based on their *value/cost* ratio ($\text{vcr}$) as follows:
+ `not favorable` ($\text{vcr} < 1$)
+ `barely favorable` ($1 < \text{vcr} < 10$)
+ `little favorable` ($10 < \text{vcr} < 100$)
+ `reasonably favorable` ($100 < \text{vcr} < 5000$)
+ `significantly favorable` ($\text{vcr} > 5000$)
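
As a minimal sketch, the grouping above can also be written as a lookup on the list of thresholds. The names `THRESHOLDS`, `LABELS`, and `favorability_group` are hypothetical, and the handling of a ratio that lands exactly on a boundary (which the list above leaves open) is an assumption: here it falls into the lower group.

```python
from bisect import bisect_left

# thresholds separating the five groups listed above
THRESHOLDS = [1, 10, 100, 5000]

# group labels, ordered from lowest to highest ratio
LABELS = [
    "not favorable",
    "barely favorable",
    "little favorable",
    "reasonably favorable",
    "significantly favorable",
]


def favorability_group(vcr: float) -> str:
    """Return the group label for a value/cost ratio."""
    # bisect_left counts the thresholds strictly below vcr, so a ratio
    # exactly on a boundary stays in the lower group (an assumption)
    return LABELS[bisect_left(THRESHOLDS, vcr)]


# hypothetical ratios, one per group
for ratio in (0.5, 3.0, 42.0, 250.0, 9000.0):
    print(ratio, favorability_group(ratio))
```

Swapping `bisect_left` for `bisect_right` would push boundary values into the higher group instead.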

In [None]:
# setup E(n) function
def expected_utility(u: float, c: float, p: float, n: int) -> float:
    """Calculates the expected utility."""
    return ((1 - p**n) * u) - (n * c)

In [None]:
# pointer to long example investment table name
eit = example_investments_table

# create dictionary for favorability groups
favorable_groups = {
    "Not Favorable": eit["vc_ratio"] < 1,
    "Barely Favorable": (eit["vc_ratio"] > 1) & (eit["vc_ratio"] < 10),
    "Little Favorable": (eit["vc_ratio"] > 10) & (eit["vc_ratio"] < 100),
    "Reasonably Favorable": (eit["vc_ratio"] > 100) & (eit["vc_ratio"] < 5000),
    "Significantly Favorable": eit["vc_ratio"] > 5000                             
}

# loop through favorable groups
for fig_num, (key, mask) in enumerate(favorable_groups.items()):
    # get sub data frame
    sub_data_frame = eit[mask]
    
    # loop over rows
    for idx, row in sub_data_frame.iterrows():
        # get values
        U = row["U"]
        C = row["C"]
        p = row["p"]
        
        # generate x/y pairs
        x_values, y_values = zip(*((n, expected_utility(U, C, p, n)) for n in range(1, 11)))
    
        # create the line plot
        plt.plot(x_values, y_values, label=idx)

    # add titles and labels
    plt.xlabel("Number of Options (n)")
    plt.ylabel("Expected Utility (E)")
    
    # set title
    plt.suptitle(
        f"Figure {fig_num + 3}. Optimal Number for {key!r} Investments", y=0.0001, fontsize=10
    )

    # now show
    plt.legend(title="id#")
    plt.show()

The results are quite interesting (see the data table below for *investment id* lookup). In *figure 3* we see basically what we expected (i.e. nothing favorable). In *figure 4* there is something interesting happening with the *investment ids* $1$, $2$, and $6$. And finally in *figures* *5*, *6*, and *7* we see what we would expect (several *favorable investment options*). These figures also reveal the limits of the *value/cost* ratio: the ratio alone is not enough to **completely** filter *investment options*, unless these options have a $\text{vcr} < 1$ (i.e. *not favorable*). If the $\text{vcr} > 1$, then the particular investment *could be favorable*. However, the *net expected utility* equation is still needed to determine whether $E(n)$ increases or decreases as $n$ grows and, if it increases, what the optimal number of options (e.g. the tickets in the lottery) will be.
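
One way to make that last step concrete is sketched below. It relies on the fact that, for $E(n) = (1 - p^n)U - nC$, the gain from adding one more option is $E(n + 1) - E(n) = p^n(1 - p)U - C$, which shrinks as $n$ grows, so we can keep adding options until the gain is no longer positive. The function name `optimal_options` and the `max_n` safety cap are hypothetical, and the example values are made up for illustration.

```python
def expected_utility(u: float, c: float, p: float, n: int) -> float:
    """Net expected utility E(n) = (1 - p^n)U - nC."""
    return ((1 - p**n) * u) - (n * c)


def optimal_options(u: float, c: float, p: float, max_n: int = 10_000) -> int:
    """Return the n >= 0 that maximizes E(n), searching up to max_n."""
    n = 0
    # gain from going from n to n + 1 options; it decreases with n,
    # so the first non-positive gain marks the optimum
    while n < max_n and (p**n) * (1 - p) * u - c > 0:
        n += 1
    return n


# hypothetical investment with a value/cost ratio of 10
best_n = optimal_options(u=1000, c=50, p=0.5)
print(best_n, expected_utility(1000, 50, 0.5, best_n))  # 4 737.5

# hypothetical investment with a value/cost ratio below 1
print(optimal_options(u=1000, c=600, p=0.5))  # 0
```

The loop takes one step per option, so if it stops at `max_n` the cap was reached and the true optimum may be larger.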

In [None]:
# show final table of example investment data
print(example_investments_table.to_string())

## Moral
The power of numbers lies not just in their *objectivity*, but also in their ability to *obfuscate* the truth. As paradoxical as this may sound, what else could be said in regard to the *example investment data*? It is no simple task to look at columns of numbers and determine, by first impression alone, which will be *more favorable* in their *net expected utility*. It is only through a more *sharpened application* of mathematics (and hence the mind) that we can *extract* from this infinitude of numbers that *mysterious* and *obscure* truth that we desire above all: which investments are *favorable*?