# Cookie factory

You are the manager of a company that produces cookies, and you want to introduce a new product. Your R&D department has proposed and developed the following two alternatives:

1. Unicorn cookies (UC)
2. Vanilla-chip cookies (VC).

As part of your market research, you want to predict whether certain customers are likely to buy one of the new products. To that end, you have already collected data from a large number of test subjects.
Specifically, you asked them to fill out a questionnaire with the following questions:


1. How old are you? (variable $age$)
2. What do you think is the most fascinating: Rainbows, Black holes or Cats? (variable $preferences$)
3. How much money do you spend on cookies per month? (variable $money$)
4. Which of our cookies would you buy? (variable $product$)  
   *Note*: The variable $product$ can also take on the value "No product" (NP).

You can find the data in *cookie-factory.csv*.

## Imports and data

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import scipy.stats as dists

In [None]:
data = pd.read_csv("cookie-factory.csv")
data.head()

## Data visualization

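Nothing to do here. The cell below plots each feature against the chosen product, with a small random jitter added to the categorical axes so that overlapping points remain visible.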

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(20, 6))

# Encode the product labels as numbers and add horizontal jitter
x = data["product"].map({"No product": 0, "Unicorn": 1, "Vanilla": 2}).astype(float)
x += np.random.uniform(-0.2, 0.2, len(data))

axs[0].scatter(x, data["age"], alpha=0.2)
axs[0].set_title("Age")
axs[0].set_xticks([0, 1, 2])
axs[0].set_xticklabels(["No product", "Unicorn", "Vanilla"])

# Encode the preference labels as numbers and add vertical jitter
prefs = data["preferences"].map({"Rainbows": 0, "Black holes": 1, "Cats": 2}).astype(float)
prefs += np.random.uniform(-0.2, 0.2, len(data))
axs[1].scatter(x, prefs, alpha=0.2)
axs[1].set_title("Preferences")
axs[1].set_xticks([0, 1, 2])
axs[1].set_xticklabels(["No product", "Unicorn", "Vanilla"])
axs[1].set_yticks([0, 1, 2])
axs[1].set_yticklabels(["Rainbows", "Black holes", "Cats"])


axs[2].scatter(x, data["money"], alpha=0.2)
axs[2].set_xticks([0, 1, 2])
axs[2].set_title("Money")
axs[2].set_xticklabels(["No product", "Unicorn", "Vanilla"]);


## a)

For each of the questions 1-4, decide
- whether the answers are continuous or discrete outcomes,
- which range the outcomes could have,
- to which scale of measurement (nominal, ordinal, interval, ratio) the outcomes belong.

## b)

To infer which products new customers are likely to buy, you set up a probabilistic model.
You assume that the answers to questions 1-3 are conditionally independent given $product$ (Naive Bayes) and model the dependencies as follows:
$$
\begin{aligned}
f(age, preferences, money, product) = {} & \mathbb{P}(age ~\vert~ product) \cdot \mathbb{P}(preferences ~\vert~ product) \\
& \cdot f_{money}(money ~\vert~ product) \cdot \mathbb{P}(product)
\end{aligned}
$$

Estimate the parameters of your categorical prior by using maximum likelihood:
$$
\mathbb{P}(product = UC) = p_{UC} \qquad \mathbb{P}(product = VC) = p_{VC} \qquad \mathbb{P}(product = NP) = p_{NP}
$$

*Hint*: The maximum likelihood estimate of the parameters for categorically distributed variables is simply the fraction of samples from a category.
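
As a minimal sketch of the hint above, the relative frequencies can be computed with `value_counts(normalize=True)`. The data frame `toy` is a small made-up sample used only for illustration; it is not the contents of *cookie-factory.csv*.

```python
import pandas as pd

# Hypothetical toy sample, NOT the contents of cookie-factory.csv
toy = pd.DataFrame(
    {"product": ["Unicorn", "Vanilla", "No product", "Unicorn", "Unicorn"]}
)

# MLE of a categorical distribution: the relative frequency of each category
prior = toy["product"].value_counts(normalize=True)
print(prior)
```

The same call on the `product` column of the real data should give the three prior parameters, assuming the column contains only the three product labels.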

In [None]:
# TODO

## c)

Based on your observations in a), you decide to model the likelihoods as follows:

1. $age$ follows a Poisson distribution where the parameter $\lambda_{product}$ depends on the product the customers would buy ($\lambda_{product} = \lambda_{UC}$, $\lambda_{product} = \lambda_{VC}$, or $\lambda_{product} = \lambda_{NP}$):
    $$
        \mathbb{P}(age = k \vert product) = \frac{\lambda_{product}^k}{k!} e^{-\lambda_{product}}
    $$

2. $preferences$ follows a Categorical distribution where the parameters depend on the product the customers would buy.

3. $money$ follows an exponential distribution where the parameter $\eta_{product}$ depends on the product the customers would buy ($\eta_{product} = \eta_{UC}$, $\eta_{product} = \eta_{VC}$ or $\eta_{product} = \eta_{NP}$):
    $$
        f_{money}(m \vert product) = \begin{cases}
            \eta_{product} \cdot e^{-\eta_{product} \cdot m} & m \geq 0 \\
            0 & \text{else}
        \end{cases}
    $$


Intuitively, your model describes the profile ($age$, $preferences$, $money$) of a customer if you already know which product they would buy ($product$).
        
Using the data, derive maximum likelihood estimates for all parameters.

*Hint*: The maximum likelihood estimate of the parameters for Poisson distributed variables is simply the sample mean: $\bar{x}$.  
*Hint*: The maximum likelihood estimate of the parameters for exponentially distributed variables is the inverse of their sample mean: $\bar{x}^{-1}$.  
*Hint*: The maximum likelihood estimate of the parameters for categorically distributed variables is simply the fraction of samples from a category.
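
The three hints can be sketched together by grouping on the product. The example below uses a made-up data frame `toy` (not the real data), and the names `lam`, `eta` and `pref` are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical toy sample, NOT the contents of cookie-factory.csv
toy = pd.DataFrame({
    "product": ["Unicorn", "Unicorn", "Vanilla", "Vanilla", "No product", "No product"],
    "age": [10, 14, 40, 50, 30, 34],
    "preferences": ["Rainbows", "Cats", "Black holes", "Black holes", "Cats", "Rainbows"],
    "money": [2.0, 6.0, 10.0, 30.0, 1.0, 3.0],
})

by_product = toy.groupby("product")

# Poisson MLE: the per-product sample mean of age
lam = by_product["age"].mean()

# Exponential MLE: the inverse of the per-product sample mean of money
eta = 1.0 / by_product["money"].mean()

# Categorical MLE: per-product relative frequency of each preference
# (rows: product, columns: preference; each row sums to 1)
pref = pd.crosstab(toy["product"], toy["preferences"], normalize="index")

print(lam, eta, pref, sep="\n\n")
```

Note that a preference that never occurs for some product receives probability 0 under this estimate, which in turn forces the posterior of that product to 0 for any customer with that preference.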

Age

In [None]:
# TODO

Preferences

In [None]:
# TODO

Money

In [None]:
# TODO

## d) + e)

You now have access to a joint density over your data:
$$
\begin{aligned}
f(age, preferences, money, product) = {} & \mathbb{P}(age ~\vert~ product) \cdot \mathbb{P}(preferences ~\vert~ product) \\
& \cdot f_{money}(money ~\vert~ product) \cdot \mathbb{P}(product)
\end{aligned}
$$
	
With the fitted model, predict the (posterior) probability
$$
	\mathbb{P}(product ~\vert~ age, preferences, money)
$$
that the customers below buy a unicorn cookie, a vanilla-chip cookie or no cookie at all:

| Customer  | $age$ | $preferences$  | $money$   |
| --------- | -----:| ------------- | ---------:|
| Anna      | 81    | Cats          | 53.10 €   |
| Ben       | 15    | Rainbows      | 2.30 €    |
| Caroline  | 42    | Black holes   | 10.25 €   |
| ???       | ??    | Rainbows      | ??        |
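
The posterior follows from the joint density by Bayes' rule: divide the joint by its sum over the three products $c \in \{UC, VC, NP\}$,

$$
\mathbb{P}(product = c ~\vert~ age, preferences, money) = \frac{f(age, preferences, money, c)}{\sum_{c'} f(age, preferences, money, c')}
$$

For the last customer, one reading of the "??" entries (an assumption here) is that $age$ and $money$ are simply unobserved. Marginalizing them out of the joint, the Poisson term sums to 1 and the exponential term integrates to 1 for every product, so only the observed answer remains:

$$
\mathbb{P}(product = c ~\vert~ preferences) = \frac{\mathbb{P}(preferences ~\vert~ product = c) \cdot \mathbb{P}(product = c)}{\sum_{c'} \mathbb{P}(preferences ~\vert~ product = c') \cdot \mathbb{P}(product = c')}
$$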

Helpful distributions:
- [Poisson](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html#scipy.stats.poisson)
- [Exponential](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html#scipy.stats.expon)
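
A minimal sketch of the posterior computation with these two distributions is shown below. All parameter values and the example customer are made up for illustration; they are not fitted to *cookie-factory.csv* and do not correspond to any row of the table above.

```python
import numpy as np
import scipy.stats as dists

products = ["No product", "Unicorn", "Vanilla"]

# Hypothetical parameter values for illustration only (NOT fitted to the data)
prior = np.array([0.5, 0.3, 0.2])    # P(product)
lam = np.array([35.0, 12.0, 45.0])   # Poisson rates for age
eta = np.array([0.5, 0.25, 0.05])    # exponential rates for money
p_cats = np.array([0.4, 0.2, 0.4])   # P(preferences = Cats | product)

# Hypothetical customer who answered "Cats"
age, money = 40, 12.0

# Unnormalized posterior: likelihood terms times prior, one entry per product.
# scipy parameterizes the exponential by scale = 1 / rate.
joint = (
    dists.poisson.pmf(age, lam)
    * p_cats
    * dists.expon.pdf(money, scale=1.0 / eta)
    * prior
)

# Normalize so that the three probabilities sum to 1
posterior = joint / joint.sum()
print(dict(zip(products, posterior.round(4))))
```

If the products of small probabilities underflow, the same computation can be done in log space with `logpmf` and `logpdf`, subtracting the maximum before exponentiating.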

### Anna

In [None]:
# No product
# TODO

# Unicorn cookie
# TODO

# Vanilla chip cookie
# TODO

# Normalize
# TODO

### Ben

In [None]:
# No product
# TODO

# Unicorn cookie
# TODO

# Vanilla chip cookie
# TODO

# Normalize
# TODO

### Caroline

In [None]:
# No product
# TODO

# Unicorn cookie
# TODO

# Vanilla chip cookie
# TODO

# Normalize
# TODO

### Unknown customer

In [None]:
# No product
# TODO

# Unicorn cookie
# TODO

# Vanilla chip cookie
# TODO

# Normalize
# TODO