<a href="https://colab.research.google.com/github/pmontman/tmp_choicemodels/blob/main/nb/WK_02_utility.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Utility and Random Utility Models


---
---


Our starting point is that we want to do quantitative analysis of a decision-making process, even though the outcomes of decisions are almost never numerical.

**Why quantitative?**
 * Because it seems to work well in other domains: physics, engineering, etc.
   * It allows for greater precision and understanding, and we can make rigorous predictions (quantitative is good).
 * Probabilistic on top of quantitative might be helpful, because individual choices seem quite difficult to pinpoint exactly.

We will use a recurring example during this lecture to illustrate the points:
Imagine that we own a business, such as a coffee shop, and we sell three products: latte, long black and espresso *(you can ask for other examples in class, for clarification)*.

Quantitative analysis might answer very interesting questions about our business, or provide a reasonably good picture of hypothetical scenarios. Here are some examples where having a quantitative model might be helpful. All of these situations involve people making choices:

* Maybe we want to understand the population of an area, in case we want to open another store there. Knowing about the demographics in an area and how they relate to the choices they make for buying coffee can help us decide whether to open a store there or not.

 **A sketch here of a table to capture the effects of the demographics**


* Maybe we want to know the effect that the pricing of our products has on the sales.

 **A sketch of what happens when we add details to the chart.**

* Maybe there is a combination of factors: What is the effect of pricing AND waiting time? (This is when advanced quantitative analysis starts to become important.)

* Maybe we want to offer a new product, say an extra roasted coffee or some exotic type of latte, say with Himalayan goat milk. It would be nice to have an approximation of the demand for the product, so we know whether it will be profitable.

* Maybe we want to know the demand for the next day better so we do not waste resources (logistics, storage, maybe fewer baristas are needed). For example, rainy days might attract fewer customers, or preferences might shift toward hot beverages rather than cold ones.

* Maybe we want to recommend products to our customers, and we want to do a good job when recommending.



For any of the above problems, we might want to run an experiment and 'eyeball' the results.
But we might also want to do something more structured and efficient.

---
---

# Formalizing the problem


So what can we do? The first step is to lay down some definitions to clarify what we are talking about and communicate the ideas better:

* **Alternatives** are the possible options that we choose from (in the previous example these were latte, long black and espresso). More elaborate examples might include combinations with sizes and extras.
* **Choice set** is the set of all alternatives that are considered when the individuals are presented with a choice.

Each alternative has some properties (price of the coffee, amount of milk, roast).
 * The properties of the alternative are called **attributes**.

The **individuals** that make the decision also have properties or variables (age, income, gender).
 * We will use **socio-economic characteristics** or **characteristics** to refer to the properties of the individuals.

 * **Preferences** of the individuals for specific alternatives are what drive decision making. For example, we might prefer long black to latte.

We will try to follow these conventions. However, it is possible that *attributes* and *characteristics* will be used interchangeably because they are very common words (we shouldn't, but it's human nature).

---
---

# Analyzing choices

We can think of quantitative analysis as finding numerical relationships between the properties of the **choice situation** (alternatives offered, individuals, context) and the **choice outcome**.

Therefore our task as analysts is to find which properties (attributes, characteristics) influence the choice and to find a good mathematical approximation of the relationship: a function mapping from the properties to the choice outcome.

 - We might think that the choice of coffee depends on attributes such as sourness, sweetness, the calorie content, even the time it takes to prepare. Other attributes can be less physical, such as the origin (fair trade and so on).
 - Some attributes are very relevant, such as price.

As a general guideline, we should think about a model that:


1. **Captures the past choices** It should at least give a reasonable explanation of the choices we have observed until now.
2. **Captures future choices** It should be able to predict. We need to consider what type of information is available for prediction. For example, if we include a history of purchases of each client, the model will not be suitable for predicting choices of new clients for whom we have no information. Another source of information is attributes and characteristics that are likely to change in the future, such as pricing or income.
3. **Accommodates alternatives that might not be available at the time of modelling**.
For example, if we only sell long black and latte initially, we can get away with not measuring attributes of these alternatives, just the characteristics of the individuals. This model will not be helpful if we introduce a new product, such as espresso. On the other hand, we can measure the roast level, intensity, sweetness and so on, so we can get a better understanding of what will happen when we introduce the espresso.

We, the analysts, have to identify all possible "sources of preferences", all information that might affect the decisions. We can do this by thinking about the problem ourselves (applying our knowledge), interviewing decision makers and so on.
**Later in the unit we will see automatic techniques that might assist in this process, commonly associated with Artificial Intelligence or Machine Learning. These will be useful when there are large quantities of data available.**





---
---

# Utility

...So where do we start?

We have already established intuitively that the decisions depend on the attributes of the alternatives and the characteristics of the individuals, and proposed some general guidelines on what properties we should be measuring. However, this is not enough: we need to say something **more specific, more useful**. If what we say is quantitative and can be analyzed through the same mathematical tools that we use to analyze everything else (i.e. statistical models), even better.

The main issue is that we lack a way of measuring preferences in a numerical way.
Assigning numbers to preferences is a good way to start, so then we can follow with the mathematical tools at our disposal.
We want to be able to measure the influence of different properties and **compare them.** For example, both the price of a beverage and its sweetness
affect the preference of a given individual towards that beverage.
Modifying the price might make this person change their decision. Making the beverage a bit sweeter might also affect the decision. The observation is that both price and sweetness are affecting 'the same thing', so they can be compared in some form and they can be interchanged. We can say something like: "lowering the price by 1 dollar has the same influence on preference as adding an extra gram of sugar".
There is a philosophical realization:
**To compare two things, we need to measure them in the same units of measure.**
Therefore we are transforming units such as dollars or grams into the same 'unit of measure'.
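
To make the trade-off concrete, here is a minimal sketch that assumes a linear score with made-up coefficients. The names `BETA_PRICE`, `BETA_SUGAR` and `preference_score` are hypothetical and chosen only for illustration. With these particular numbers, one dollar less and one gram of sugar more move the score by the same amount.

```python
# Hypothetical coefficients (assumptions, not estimates):
# score lost per dollar and score gained per gram of sugar.
BETA_PRICE = -0.5
BETA_SUGAR = 0.5

def preference_score(price, sugar_grams):
    """Linear score (an assumption): dollars and grams are converted to one common unit."""
    return BETA_PRICE * price + BETA_SUGAR * sugar_grams

base = preference_score(price=4.0, sugar_grams=2.0)
cheaper = preference_score(price=3.0, sugar_grams=2.0)  # lower the price by 1 dollar
sweeter = preference_score(price=4.0, sugar_grams=3.0)  # add 1 gram of sugar

# Both changes shift the score by the same amount under these coefficients.
print(cheaper - base, sweeter - base)
```

With different coefficients the exchange rate between dollars and grams would change, which is exactly the kind of quantity a model can help us estimate.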


 A crude metaphor is the concept of 'energy' in physical systems,
 how it allows us to compare things like moving objects at some velocity, but also calories in food. These different properties are transformed into a single unit of measure, e.g. joules. Completely different situations can be compared in terms of the energy, and this can be used to understand things such as how long we will be able to run when eating a banana for fuel.

Economics uses the concept of 'utility', the idea that we can assign a number to the 'satisfaction', 'happiness' or 'pleasure' that we get from each of the alternatives that we have to choose from. Imagine that the utility is measured in 'utils' and different alternatives such as coffee or a massage, are transformed to utils and then they can be compared.

In the example of coffee, when presented with a choice between latte, long black or espresso, we assign a number, the utility, to each of the alternatives:
 * a number for latte, the utility or satisfaction that I derive from getting the latte.
 * a number for long black.
 * a number for espresso.

These three numbers can be different, or they can coincide. For example, the same number for a long black and an espresso means that I derive the same pleasure from getting a long black as from an espresso.
Each one of us might assign numbers in a different way (maybe I assign a higher number to long black than to latte, others can do the opposite). Moreover, the utility that we assign might depend on the context (after a meal or first thing in the morning, the weather) and change over time (we can become allergic to milk). There are also properties of the alternatives, e.g. the price, that we take into account.

---
---

# Utility functions

The **Utility function** is the process that assigns that number to each possible alternative. Basic economic theory assumes that individuals act to **maximize the utility**: they will choose the alternative with the highest number.
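
As a minimal sketch of utility maximization, assume we already know the utilities of one individual. The numbers and the helper `choose` below are made up for illustration.

```python
# Hypothetical utilities (in 'utils') for one individual; the numbers are made up.
utilities = {"latte": 2.0, "long black": 3.5, "espresso": 1.0}

def choose(utilities):
    """Return the alternative with the highest utility."""
    return max(utilities, key=utilities.get)

chosen = choose(utilities)
print(chosen)
```

Note that `max` returns the first maximal key when two alternatives tie, so this sketch breaks indifference arbitrarily.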

Now we have a target to aim for: modelling the utility for each of the choice situations.
#### **Choice modelling, from the economic point of view, is about finding utility functions**


We will also see that the concept of utility is not really needed, it is a derived concept, so we will criticize it a bit. A common criticism is that the concept is circular: it cannot be disproved, because utility cannot be measured. It is not saying much, it adds no new information. For example, we can say that all objects have this property: 'shoeness', and if an object has a certain amount of 'shoeness' then we will call it a shoe. Does this idea help when we want to identify if an item is a shoe or not? A defense against this criticism is that what people pay for goods reveals the utility that they assign to those goods.

 We could go into philosophical discussions on this but it is not the purpose of this course.

**The important point is that choices can be predicted as a function of the properties of the alternatives and the individual making the decision.**


**In this unit** we will use the concept of utility as a 'secondary' mechanism, mostly to communicate properties of the models that we use to predict the choices.

By thinking about properties of these utility functions, we can make progress in our understanding of decision making. One example is the function type that we choose, such as a linear function. Another example is a high-level property: the relative utility (the ranking, which one we prefer) between two alternatives does not depend on the presence of a third. By assuming that these properties appear in the choice situation that we want to model, we automatically get some models that we can apply to the data.

---
---



# Axiomatic basis of choice models: The mathematical formalism

We will impose some useful properties on the choice, some axioms.
*What are axioms?*

 Choice set must have three characteristics ([from Kenneth Train](https://eml.berkeley.edu/books/choice2nd/Ch02_p9-33.pdf)):
  * Alternatives must be **mutually exclusive**. We cannot choose two alternatives at the same time. That being said, we can define one alternative as 'A only', another as 'both A and B' and another as 'B only'. But we must choose exactly one.
  * The choice set must be **exhaustive**. All options must be captured. This can mean including 'none of the other alternatives' as an alternative.
  In practice this means that we have to be careful when defining the choice set. If we do not include all the options, we must be aware of the limitations of our model for representing a 'reality' that has options such as 'A and B' or 'none of the other'. In terms of demand, for marketing, the inclusion of a 'no choice' alternative is needed.
  * **Finite**. This is why we call it discrete choice modelling: we limit ourselves to finite choice sets. Note that the number of alternatives can be very large and still be finite, such as the number of atoms in the universe.

Choice set $X = \{a, b, c, d, ...\}$

(think of $a, b, c, ...$ as the alternatives, such as latte, long black, etc. just to write it in a more compact form)

An individual "weakly prefers" $a$ to $b$ if for that individual the alternative $a$ is at least as good as alternative $b$. We denote "weakly prefers" by the symbol $\succcurlyeq$. So $a \succcurlyeq b$ means that the individual weakly prefers $a$ over $b$.
*(The weakly prefers might seem a bit strange, it is there to capture cases such as the preference between two identical items, for example $a \succcurlyeq a$)*

**Completeness Axiom** For every pair of alternatives in $X$, say $x,y \in X$, either $x \succcurlyeq y$, $y \succcurlyeq x$, or both. This means that the individual must always have 'an opinion' about the choice presented; they cannot 'opt out'. There is no 'I do not know how to choose'. Note that the opinion can be 'I do not care between $x$ and $y$'.

**Transitivity Axiom** For every triple $x, y, z$ in $X$, IF $x \succcurlyeq y$ AND $y \succcurlyeq z$ THEN $x \succcurlyeq z$. This is one we have known since primary school. An example is the relationship 'older than'.
A non-transitive relationship is 'beats' in the Rock, Paper, Scissors game.


 Transitivity seems a reasonable thing to assume. What do you think: [is it a reasonable property, are preferences always transitive?](https://doi.org/10.2307/2938263) There seems to be some [evidence against it](https://doi.org/10.1016/j.heliyon.2020.e03459).
The debate about what axioms or properties to assume for preferences is still ongoing. Remember that this part is not the focus of the unit (though we need to know about it).

[An example taken from Simon Board](http://www.econ.ucla.edu/sboard/teaching/econ11_09/econ11_09_lecture2.pdf):

1. Suppose that, given any two cars, the agent prefers the faster one.

 These preferences are
complete: given any two cars $x$ and $y$, then either $x$ is faster, $y$ is faster or they have the same
speed.

 These preferences are also transitive: if $x$ is faster than $y$ and $y$ is faster than $z$, then $x$
is faster than $z$.

2. Suppose that, given any two cars, the agent prefers $x$ to $y$ if it is both faster and bigger.

 These preferences are transitive: if $x$ is faster and bigger than $y$ and $y$ is faster and bigger than
$z$, then $x$ is faster and bigger than $z$.

 However, these preferences are not complete: an SUV
 is bigger and slower than a BMW, so it is unclear which the agent prefers. The  completeness
 axiom says these preferences are unreasonable: after examining the SUV and  BMW, the agent
 will have a preference between the two.


3. Suppose that the agent prefers a BMW over a Prius because it is faster, an SUV over a
BMW because it is bigger, and a Prius over an SUV, because it is more environmentally friendly.

 In this case, the agent’s preferences cycle and are therefore intransitive. The transitivity axiom
says these preferences are unreasonable: if environmental concerns are so important to the
agent, then she should also take them into account when choosing between the Prius and
BMW, and the BMW and the SUV.
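
The three car examples can be checked mechanically. The sketch below is only illustrative: the speed and size numbers are made up, and `is_complete` and `is_transitive` are hypothetical helpers that test the two axioms by brute force over a finite choice set.

```python
from itertools import product

def is_complete(X, prefers):
    """Every pair must be comparable in at least one direction."""
    return all(prefers(x, y) or prefers(y, x) for x, y in product(X, repeat=2))

def is_transitive(X, prefers):
    """Whenever x >= y and y >= z hold, x >= z must hold too."""
    return all(
        prefers(x, z)
        for x, y, z in product(X, repeat=3)
        if prefers(x, y) and prefers(y, z)
    )

cars = ["BMW", "Prius", "SUV"]
speed = {"BMW": 250, "Prius": 180, "SUV": 200}  # made-up numbers
size = {"BMW": 2, "Prius": 1, "SUV": 3}         # made-up numbers

def faster(x, y):             # example 1: prefer the faster car
    return speed[x] >= speed[y]

def faster_and_bigger(x, y):  # example 2: prefer only if both faster and bigger
    return speed[x] >= speed[y] and size[x] >= size[y]

cycle = {("BMW", "Prius"), ("SUV", "BMW"), ("Prius", "SUV")}

def cyclic(x, y):             # example 3: the preference cycle
    return x == y or (x, y) in cycle

results = {
    rel.__name__: (is_complete(cars, rel), is_transitive(cars, rel))
    for rel in (faster, faster_and_bigger, cyclic)
}
print(results)  # each value is (complete, transitive)
```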


---
---


## Utility functions emerge from the axioms

A utility function is a mapping from any alternative within the choice set $X$ to a real number, $u : X \to \mathbb{R}$.
 For example $u(a)$ is the 'utility', the number of utils, of alternative $a$ in the choice set $X$.

Utility functions transform alternatives to numbers, and we say that the utility function $u$ **represents** the relationship $\succcurlyeq$ if
$u(x) \geq u(y)$ if and only if $x \succcurlyeq y$. The $\geq$ symbol means
the familiar 'greater than or equal to' that we all understand for numbers.

 So this means that we can represent the preferences of the individual by assigning numbers to the alternatives and then comparing those numbers.

 The axioms come into play because with them we can show that, whatever the preferences are, we can always find a utility function that transforms the alternatives into numbers so that the $\geq$ relationship can be used to compare them. So with these axioms we can prove the Utility Representation Theorem using very fundamental mathematics.

**Utility Representation Theorem**

 Suppose the agent’s preferences, $\succcurlyeq$, are complete and transitive, and that $X$ is finite. Then there exists a utility function $u : X \to \mathbb{R}$ which represents $\succcurlyeq$.
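
One standard way to see why the theorem holds on a finite set is to count, for each alternative, how many alternatives it is weakly preferred to. The sketch below applies that counting construction to a hypothetical preference relation (the contents of `pairs` and the helper names are made up for illustration) and checks that the resulting numbers represent $\succcurlyeq$.

```python
from itertools import product

X = ["latte", "long black", "espresso"]

# Hypothetical preferences: long black and espresso are tied, and both beat latte.
pairs = {
    ("long black", "espresso"), ("espresso", "long black"),
    ("long black", "latte"), ("espresso", "latte"),
}

def weakly_prefers(x, y):
    """x is at least as good as y (every alternative is as good as itself)."""
    return x == y or (x, y) in pairs

def build_utility(X, prefers):
    """u(x) = number of alternatives that x is weakly preferred to."""
    return {x: sum(prefers(x, y) for y in X) for x in X}

u = build_utility(X, weakly_prefers)

# u represents the relation if u(x) >= u(y) exactly when x is weakly preferred to y.
represents = all(
    (u[x] >= u[y]) == weakly_prefers(x, y) for x, y in product(X, repeat=2)
)
print(u, represents)
```

The construction relies on both axioms: if the relation were incomplete or cyclic, the final check could fail.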

So now we know that we can in fact treat preferences as numbers, which will allow us to apply all the mathematical tools to analyze them. An important
question pops up: how do we find these functions from the data? Can we measure utility? *Part of the impact of Choice Modelling is that it links this axiomatic theory of decision making to data, so the utilities can be recovered. Before discrete choice modelling, there was no rigorous way of obtaining utility functions; economists used to just make "very educated guesses".*

Later, we will see that we can add more axioms to the mix and further simplify the problem in useful ways that allow us to derive very nice mathematical expressions for the utility functions.


---
---




# Random Utility Models (Level 1)

We will motivate and develop the concept of uncertainty in the utility functions which will be the base for all statistical models in the unit. We will do the derivation step-by-step, adding some complexity until we reach the complete formulation of the random utility model.

We have seen that if the axioms for utility representation hold, we can indeed assign a number that indicates the preference for each of the alternatives, the utility function. Then decision makers choose the alternative that gives them maximum utility.
We can represent this mathematically:

Let $J$ represent the number of alternatives in our experiment, the size
of our choice set $X$. For example if $X = \{\text{latte}, \text{long black}, \text{espresso}\}$ then $J = 3$.
The utility that option $j$ receives is represented as $U_j$.

In this first setting, our universe only considers one specific decision maker,
we will later expand the notation to many decision makers. Since we are considering only one individual, we can, for the time being, ignore the characteristics of the individual.
Therefore, what we say is that the utility of alternative $j$ for this individual depends on the attributes of the alternative $j$. We will denote the
attributes of alternative $j$ by $a_j$.

Because utility is a number and the attributes of the alternative can be measured as numbers, we make the natural connection and say that the relationship between the attributes and the utility can be represented mathematically by some **function of the attributes**.
$$U_j = f_j(a_j)$$
This is the beginning of the road towards the full modelling of choice.
We are saying in mathematical form: "The utility for each alternative depends on the values of its attributes following some function. The utility of each alternative might depend on a different function (or they might be the same)". So far, I would say we have not done anything sketchy or unreasonable, would you agree? At this point, this is also not very useful in practice, **but it gives us a target to aim for: finding the functions $f_j$**. If we knew the functions, we could perfectly predict what the individual will choose. As a reminder, which attributes we consider also depends on our judgement of what variables we think influence the decision.
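
As a minimal sketch of $U_j = f_j(a_j)$ for a single decision maker, assume we somehow knew the functions. Everything below is hypothetical: the attribute values, the attribute names `price` and `milk_ml`, and the shapes of the functions in `f` are made up only to show the structure.

```python
# Hypothetical attributes a_j of each alternative.
attributes = {
    "latte": {"price": 4.5, "milk_ml": 150},
    "long black": {"price": 4.0, "milk_ml": 0},
    "espresso": {"price": 3.5, "milk_ml": 0},
}

# One made-up function f_j per alternative (they are allowed to differ).
f = {
    "latte": lambda a: 6.0 - 1.0 * a["price"] + 0.01 * a["milk_ml"],
    "long black": lambda a: 6.5 - 1.0 * a["price"],
    "espresso": lambda a: 5.0 - 1.0 * a["price"],
}

# U_j = f_j(a_j): the utility of each alternative for this individual.
U = {j: f[j](attributes[j]) for j in attributes}

# With known functions and no randomness, the prediction is deterministic.
predicted = max(U, key=U.get)
print(U, predicted)
```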

---
---



# Random Utility Models (Level 2)

 In practice, we are interested in modelling the decision making of many individuals in a population. For example, in the coffee setting, we are not interested in one specific customer, but in 'all potential customers'. What we consider 'all potential customers' is also a matter of discussion: we could define them as the people in our district, the people in our city or even the whole of Earth's population. What we define as our target population depends on our interests as analysts, and also on our skill, because we could be modelling the wrong population. For example, we could take data from Australia and wrongly assume that it is representative of the tastes of the rest of the world.

 We can expand our mathematical notation in two ways:
  1. Consider each of the decision makers.
   Let's call the number of decision makers $N$ and identify a specific decision maker by $n$. The notation ends up being:

  $$ U_{nj} = f_{nj}(a_j)$$
  Now we have a function for each alternative and each decision maker.

  2. This way of modelling the utility is technically correct, but it is also
   not very useful.
   
    Can you think of some problems or limitations with that view?
    <details>
  <summary>Spoiler warning (click here) </summary>
  The main problem is that having to find a function for each individual is not feasible: we would need data from all individuals. Most of the time, the individual whose choice we want to predict is a new one that we have not observed. If we had data for all individuals, there would also be nothing to do as modellers. Exceptions are predicting for the same individuals under different circumstances, such as on different days; for that problem, this definition could work.
  </details>

   For example, if we have data on all of our potential clients, we already know what their preferences are, so why bother finding a function? So we need to make **stronger assumptions**. By stronger, we mean more restrictive, and therefore at greater risk of failing.
   So what we will do is consider the characteristics of the decision makers. We can denote them by $c$. For example, we can measure 'age', 'income' and 'gender'. We denote the characteristics of the decision maker $n$ by $c_n$.
   The stronger assumption comes from assuming now that all decision makers have the same function for each of the alternatives.
   $$U_{nj} = f_j(a_j, c_n)$$
   See how this is very different from point 1. We now have to estimate $J$ functions, not $J\times N$. If a new individual comes along, whom we have not observed but whose characteristics we know, we could predict their preferences with $f_j$. The problem is, again, which characteristics we consider.
   If we think carefully, we can even spot **one oversight in my exposition**.

   Can you think what it is?
<details>
  <summary>Spoiler warning (click here) </summary>
 Technically speaking, if we consider that we can observe the characteristics of the individuals perfectly, we would not be making stronger assumptions compared to point 1. For example, one such characteristic would be the 'id of the person'. If we can find a function that maps the id of the person and the attributes of the alternatives to the utility, then we also have a perfect mapping to each individual. In practice, this is not realistic, so we usually say that the notation in point 2 is more restrictive than point 1. This might seem too philosophical, but it pays to think about what we measure and the limitations of our models. Everything will make more sense when we add some form of error or randomness in our model, as we will see in Level 3.
</details>
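
To illustrate the shared functions $f_j(a_j, c_n)$ from point 2, here is a minimal sketch with only two alternatives. All names and numbers are hypothetical: `f_latte`, `f_long_black`, the single characteristic `age` and the coefficients are assumptions made up for the example.

```python
# Hypothetical attributes a_j of the alternatives.
attributes = {"latte": {"price": 4.5}, "long black": {"price": 4.0}}

# One made-up function per alternative, shared by ALL decision makers.
def f_latte(a, c):
    return 5.0 - 0.5 * a["price"] - 0.03 * c["age"]

def f_long_black(a, c):
    return 3.0 - 0.5 * a["price"] + 0.02 * c["age"]

f = {"latte": f_latte, "long black": f_long_black}

def predict_choice(c):
    """U_nj = f_j(a_j, c_n); return the utility-maximizing alternative."""
    U = {j: f[j](attributes[j], c) for j in attributes}
    return max(U, key=U.get)

# Two observed individuals and one new individual we have never seen before.
young = {"age": 20}
older = {"age": 60}
newcomer = {"age": 30}

choices = [predict_choice(c) for c in (young, older, newcomer)]
print(choices)
```

Only $J = 2$ functions are needed here, and the newcomer gets a prediction purely from their characteristics.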

---
---


# Random Utility Models (Level 3)

Randomness.

 We have hinted at a very reasonable limitation: in practice, it is very difficult to find a function that matches the utility and our measured variables (attributes and characteristics) perfectly. As a result, the matching will not be exact, there will be errors, and our carefully crafted story breaks down. A possible exit from this dilemma is to assume that this error is in fact a random component of the utility function, so we introduce statistics and probability into the mix.

 This random component can come from many sources:
  * We might be measuring with noise (we do not know the exact values of the variables, imagine the amount of sugar that goes into the coffee),
  * We are not capturing all relevant information (all relevant attributes and characteristics in our choice setting),
  *  Or maybe we do not know how to find the function, we get the wrong formula that matches the attributes and characteristics to the utility.

A way to express the random component mathematically is:

$$U_{nj} = f_j(a_j, c_n) + \varepsilon_{nj}$$

So now every source of problems is 'absorbed' in the $\varepsilon_{nj}$. The notation indicates that there is a random component per alternative and per decision maker.

In books and papers on choice modelling, there is a common representation of this decomposition that simplifies the notation. The part of the utility that is modelled by the function $f_j$ is simply denoted as $V_{nj}$. This part $V_{nj}$ is commonly referred to as the 'observed component' of the utility. The term $\varepsilon_{nj}$ is termed the 'unobserved part' of the utility.
$$U_{nj} = V_{nj} + \varepsilon_{nj}$$
This notation is less cumbersome when we want to discuss something at a very general level.
This decomposition is called the **Random Utility Model (RUM)**; it is a fundamental concept in choice modelling.

Can you think of a probability distribution that could describe the random error well?
<details>
 <summary>Spoiler warning (click here) </summary>
  Well, the normal distribution of course!
  We will see that in fact it is not the most commonly used, though it is very appropriate.
</details>
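
To get a feel for the decomposition $U_{nj} = V_{nj} + \varepsilon_{nj}$, we can simulate it. The sketch below is only illustrative: the observed components in `V` are made up, and the errors are assumed to be independent normal draws, which is one possible choice rather than the one used by any specific model.

```python
import random

random.seed(0)  # fixed seed so that the sketch is reproducible

# Hypothetical observed components V_nj for one decision maker.
V = {"latte": 1.0, "long black": 0.5, "espresso": 0.0}

def simulate_choice(V, sigma=1.0):
    """Draw one epsilon per alternative and return the utility-maximizing alternative."""
    U = {j: v + random.gauss(0.0, sigma) for j, v in V.items()}
    return max(U, key=U.get)

n_draws = 10_000
counts = {j: 0 for j in V}
for _ in range(n_draws):
    counts[simulate_choice(V)] += 1

# Share of simulated choice situations in which each alternative was chosen.
shares = {j: counts[j] / n_draws for j in V}
print(shares)
```

Even though latte has the largest observed component, it is not chosen in every draw: the random component sometimes pushes another alternative to the top.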

---
---

# Brief historical background for RUM

Historically, this decomposition comes from psychology, from attempts to understand the subjective perception of stimuli. Humans perceive a stimulus at its actual level plus a random factor. For example, we ask an individual to decide which of two sounds is louder. We know perfectly well the loudness of each sound: we have a physical definition, we can even control it with a computer. But even then, the individual sometimes fails at identifying the louder sound. The idea then is that the perceived level is the actual level of the stimulus plus a random component.

* [An example of detecting audio differences](https://www.audiocheck.net/blindtests_index.php)
* [(FOR FUN) An example with audio quality differences](https://www.npr.org/sections/therecord/2015/06/02/411473508/how-well-can-you-hear-audio-quality)
* [(FOR FUN) Another more sophisticated for audio quality, including the Spotify example](http://abx.digitalfeed.net/list.html)



The idea was later extended to utility by economists, and it ended up defining the random utility model. Basically, the intensity of the stimulus is the utility, and then there is a random component. In choice modelling, this metaphor is helpful, but it is not that appropriate because we lack a proper measurement of the level of the stimulus. Since we do not know how to measure utility, we, the analysts, define the level of the stimulus according to our subjective idea by choosing the function $f$ and the variables that we measure. So use this metaphor if you find it helpful, but please be aware of the limitations.

It is very important to know that **what we are showing is just a decomposition, it is not something fundamental**. For example, depending on the function that we choose for the model, we could get different values for $V_{nj}$ and $\varepsilon_{nj}$. There are many ways to decompose the utility, all arriving at the same total. In other words, so far this decomposition is so general that it is not saying anything too useful. When we, the analysts, impose some restrictions on the functions or on the random component (such as a specific distribution), *then* we would be saying something. **In fact the 'name of the game' in choice modelling is making these decisions: selecting restrictions on the functions and random components gets us to any of the major models that we are going to study in this course, and the ones that are most applied in practice.**

This creates an important change in point of view: we will not be able to predict which alternative will be chosen; instead, we will give a probability of choosing each alternative.

What do you think now, do you think choices are random?

 ---
 ---

# Detailed notation

Some researchers even consider a more specific notation, introducing the context or 'situation' explicitly into the mix. For example, when buying coffee we could be buying for personal consumption or for our boss/supervisor.
The notation expands to (adding the subindex $s$):
$$ U_{nsj} = V_{nsj} + \varepsilon_{nsj} $$

Others consider that the attributes of the alternative change depending on the individuals. For example, when choosing the mode of transport, the attribute 'time spent in transit' varies between bus and car, but will also vary between individuals, because they have different origin and destination points, different driving skills and so on.
$$ f(a_{nj}, c_n) $$
The message is the same, we can think of representing every situation by  introducing them into either the attributes or characteristics, we will get to the same conclusions.  As long as everybody knows what we are talking about we will be fine. We opt for the simple notation.

**Can you think of how to solve these two examples (coffee for supervisor, time in transit) in our simple notation?**

<details>
 <summary>Spoiler warning (click here) </summary>
  We can add a characteristic to the individual, the variable 'for the supervisor'. In the transit example, we can have origin and destination locations as characteristics of the individual, and speed as an attribute of the alternative.
</details>

---
---


# Ambiguity: Alternatives vs attributes

Perhaps you have noticed that there is a form of ambiguity in representing the choice set, how can we define which are the alternatives in a particular situation?

When we can change the attribute values of the alternatives, we have many options available. For example, we can represent our alternatives in the coffee example by some attributes such as 'amount of milk', 'amount of water', 'roastness', 'amount of sugar' and so on. By combining these correctly we can get to any of the alternatives in the 'latte', 'espresso', 'long black' choice set. We could think that a long black is just an espresso with more water (and pardon me if this is an aberration for the connoisseur, I hope the point is clear). Then what should we do?

I have no good guideline for that, other than that it will depend on the context. In the coffee example, the traditional or natural way is to separate the types of coffee into different alternatives (we talk about latte, long black and espresso, not about coffee with $x$ ml of milk, $y$ ml of water). In discrete choice experiments, the researcher defines which are the alternatives, thinking about how they will be presented to the participants. There are technical issues with the number of alternatives: if everything is an alternative, the number of alternatives can get too big, complicating the interpretation and predictive capabilities of the model. For example, when we have many options for our coffee (size, sugar, extra flavours on top, ice), the number of alternatives grows exponentially, and we can soon end up with more alternatives than observations in our experiment. Modelling everything as an alternative will make interpretation difficult. For example, to recover the effect of size, we will not get a simple mathematical relationship, we will get a different utility function per size.
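
A quick count shows how fast the choice set grows when every combination is treated as its own alternative. The option lists below are hypothetical; only the multiplication matters.

```python
from itertools import product

# Hypothetical options offered by the coffee shop.
types = ["latte", "long black", "espresso"]
sizes = ["small", "medium", "large"]
sugar_spoons = [0, 1, 2, 3]
flavours = ["none", "vanilla", "caramel"]
ice = [False, True]

# Every combination becomes its own alternative.
alternatives = list(product(types, sizes, sugar_spoons, flavours, ice))
n_alternatives = len(alternatives)
print(n_alternatives)  # 3 * 3 * 4 * 3 * 2 = 216
```

Each additional option multiplies the count again, so a handful of extras is enough to exceed the number of observations in a small experiment.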

---
---

# Choice Probabilities

We observe only the result of the choice, not the utility. Moreover, under the Random Utility Model, the random or 'unobserved' component will make the results of that choice also random. This means that we will be talking about choice probabilities rather than choices per se. We will talk about the 'most likely' alternative, but in most situations we cannot talk about the alternative that is going to be chosen with certainty.

Remember that under our framework of utility maximization, decision makers choose the alternative that maximizes the utility. If we could perfectly predict the utility, then we could predict which alternative is going to be chosen.

Imagine that we want to predict the choice between 'latte' and 'long black' for an individual walking into our shop. Our model specifies an 'observed component' of the utility of 10 for latte and 8 for long black. The unobserved component is a random variable; let's say that it is like rolling a die for each of the alternatives. Then the utility for latte will be 10 + 1d6, and the utility for long black will be 8 + 1d6. There is a chance that the individual will choose long black over latte, even though the observed part of its utility is smaller. Out of the 36 equally likely pairs of rolls, latte has the strictly higher utility in 26 (approximately 72.2%), long black in 6 (16.7%), and in the remaining 4 (11.1%) the utilities tie, so the individual is indifferent. We can split the indifference equally between the two alternatives to arrive at 77.8% for latte and 22.2% for long black.
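
The dice example is small enough to enumerate exactly. The following sketch goes through all 36 pairs of rolls and uses exact fractions; the variable names are only illustrative.

```python
from fractions import Fraction
from itertools import product

V = {"latte": 10, "long black": 8}  # observed components from the example

counts = {"latte": 0, "long black": 0, "tie": 0}
for d_latte, d_black in product(range(1, 7), repeat=2):
    u_latte = V["latte"] + d_latte
    u_black = V["long black"] + d_black
    if u_latte > u_black:
        counts["latte"] += 1
    elif u_black > u_latte:
        counts["long black"] += 1
    else:
        counts["tie"] += 1

# Exact probabilities over the 36 equally likely outcomes.
probs = {k: Fraction(v, 36) for k, v in counts.items()}

# Split the ties equally between the two alternatives.
p_latte = probs["latte"] + probs["tie"] / 2
p_black = probs["long black"] + probs["tie"] / 2
print(counts, float(p_latte), float(p_black))
```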

When talking about choice probabilities we come full circle to our initial statement about utility. We do not need utility to apply a statistical method to predict probabilities of events.

Can you think about statistical methods for doing that?
<details>
 <summary>Spoiler warning (click here) </summary>
  Logistic regression, classification.
</details>


That being said, thinking about utility can be helpful in several ways. It gives us a rigorous frame of understanding, and it identifies the two components of the utility, so we can impose restrictions on them and derive models. There is usually an equivalence between a random utility model and a purely probabilistic one. We will see specific examples when we start with the more fundamental models for choice modelling.




# Summary

1. Choices depend on attributes of the alternatives and characteristics of the individuals.
2. We lack a way of doing numerical analysis on choice.
3. We can assume some very basic truths about the decision-making process and show that there is always a map from preferences among alternatives to numeric values assigned to these alternatives, the utility function. Individuals choose the alternative that maximizes their utility. Therefore utility determines the outcome of the choice.
4. Utility functions allow us to do quantitative analysis of choice, and link it to basic economic theory.
5. There are realistic limitations on how well we can map utility, so we introduce a random component in the utility.
6. A consequence is that we now go from choices to choice probabilities.
7. We will begin with the basic choice models in the next lecture.
