In [1]:
#Import libraries for plotting, sampling
import numpy as np
import matplotlib.pyplot as plt

## Binance Take Home Test - Question 1

*Detection of fake commodity: There are a total of 100,000 pairs of shoes of a well-known brand on an e-commerce platform.
The official guide price of these shoes is \$48 ~ \$68. Now we randomly sample 1,000 pairs from them for inspection, where 100 pairs of shoes with quality or counterfeit problems are found. The price distribution of the 100 pairs of counterfeit shoes is mainly concentrated on two price points, with 60 pairs for \$30 and 40 pairs for \$50. Please give the price point at which you can most likely buy qualified shoes at the lowest price. Prerequisite judgments can be made based on personal understanding of real life.*
<br/><br/>

It is given in the problem statement that the price distribution of real and fake shoes differs. By inferring how likely it is that an observed price has been drawn from a particular one of these distributions, it is possible to calculate the probability that the advertised pair of shoes is real. Let the variable $Z$ indicate whether a particular pair of shoes is real or fake: <br/>

$$
Z = 
     \begin{cases}
     1 &\quad\text{Real}\\
     0 &\quad\text{Fake}
     \end{cases}
$$


<br/><br/>
This variable cannot be observed directly, so the aim is to infer the probability that the latent variable $Z=1$, conditioned on the observed price $x$:

$$
p(Z=1|x)
$$

which can be rewritten in terms of the conditional distribution of $x$, and the marginal distribution of $Z$ - distributions that can be estimated from the observations in the problem statement - using Bayes' rule:

$$
\begin{aligned}
p(Z=1|x) &= \frac{p(x|Z=1)p(Z=1)}{p(x)}\\ \\
         &= \frac{p(x|Z=1)p(Z=1)}{\sum_{z\in\{0,1\}}p(x|Z=z)p(Z=z)}
\end{aligned}
$$
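
As a minimal sketch of the formula above, the posterior can be computed from the two likelihoods and the prior. The function name `posterior_real` and the input values are placeholders chosen for illustration; they are not estimates from the problem.

```python
def posterior_real(lik_real: float, lik_fake: float, prior_real: float) -> float:
    """Return p(Z=1 | x) from the likelihoods p(x | Z=z) and the prior p(Z=1)."""
    if not 0.0 <= prior_real <= 1.0:
        raise ValueError("prior_real must lie in [0, 1]")
    numerator = lik_real * prior_real
    # p(x): sum of p(x | Z=z) p(Z=z) over z in {0, 1}
    evidence = numerator + lik_fake * (1.0 - prior_real)
    if evidence == 0.0:
        raise ValueError("p(x) is zero, so the posterior is undefined")
    return numerator / evidence

# Placeholder inputs, chosen only to show the arithmetic
example = posterior_real(lik_real=0.02, lik_fake=0.03, prior_real=0.9)
print(round(example, 4))
```

Even when the fake likelihood is larger than the real one at a given price, a strong prior towards real shoes can keep the posterior well above 0.5.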

<br/>

---

#### **Marginal distribution of Z (real/fake)**

It was observed in the problem statement that, in a sample of 1,000 pairs of shoes, 100 pairs were counterfeit or of poor quality. Assuming that the sampling was unbiased, I adopt the following marginal distribution for Z:

$$
p(Z=z) =
     \begin{cases}
     0.9 &\quad z=1\\
     0.1 &\quad z=0
     \end{cases}
$$
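
The arithmetic behind these values is a simple ratio of counts. A short sketch, with illustrative variable names that are not part of the original analysis:

```python
# Counts taken from the problem statement
n_inspected = 1000   # pairs sampled for inspection
n_fake = 100         # pairs found counterfeit or defective

# Empirical estimate of the marginal, assuming the sample is unbiased
p_fake = n_fake / n_inspected
prior = {1: 1.0 - p_fake, 0: p_fake}   # keyed by z (1 = real, 0 = fake)
print(prior)
```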

---

#### **Conditional price distributions**

I will assume that the conditional price distributions - the price distribution given that a pair of shoes is either real or fake - are Gaussian (or a mixture with Gaussian components). Gaussian distributions are common in real life, and sampling from them is straightforward in Python. $\mathcal{N}(\mu,\sigma^2)$ denotes a Gaussian/normal distribution with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$).

##### **Real shoes**
I will take the guide price of the shoes (\$48 ~ \$68) to correspond to the $\pm 2\sigma$ range, with the mean in the centre (\$58), so that ~95\% of real shoe prices lie in this range. The range is \$20 wide and spans $4\sigma$, which gives $\sigma = 5$. The true price distribution cannot be exactly Gaussian, since shoe prices cannot be negative, but this should be a good approximation in the feasible price range, given that the mean is ~12$\sigma$ above the \$0 point.

$$
X|Z=1 \sim \mathcal{N}(58, 5^2)
$$
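
Both claims about this distribution (roughly 95% coverage of the guide range, and negligible mass on negative prices) can be checked numerically. The sketch below uses only the standard library; `normal_cdf` is a helper introduced here for illustration.

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

mu, sigma = 58.0, 5.0

# Fraction of real-shoe prices inside the guide range $48 to $68 (mu +/- 2 sigma)
coverage = normal_cdf(68.0, mu, sigma) - normal_cdf(48.0, mu, sigma)

# Probability mass the Gaussian places on negative prices
negative_mass = normal_cdf(0.0, mu, sigma)

print(f"coverage of [48, 68]: {coverage:.4f}")
print(f"mass below $0: {negative_mass:.2e}")
```

The mass below \$0 is many orders of magnitude smaller than anything that could affect the analysis, which supports treating the Gaussian as an adequate approximation here.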

##### **Fake shoes**
For the price distribution of fake shoes, I assume a Gaussian mixture distribution with component means at the two observed price points of \$30 and \$50. It is unrealistic to assume all fake shoes are priced at exactly these values; otherwise it would be straightforward to avoid fake shoes by simply buying at any other price point. I will therefore assume an equal standard deviation for each component, with the same value as the standard deviation of the real shoe price. The components are weighted according to the quantities observed at each price point: 60 pairs at \$30 and 40 pairs at \$50.

$$
X|Z=0 \sim 0.6\mathcal{N}(30, 5^2) + 0.4\mathcal{N}(50, 5^2)
$$
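
One way to build intuition for this mixture is to sample from it: pick a component according to its weight, then draw from that component's Gaussian. The sketch below uses only the standard library and a fixed seed; `sample_fake_price` is a hypothetical helper, not part of the analysis that follows.

```python
import random

def sample_fake_price(rng: random.Random, sigma: float = 5.0) -> float:
    """Draw one price from 0.6*N(30, sigma^2) + 0.4*N(50, sigma^2)."""
    # Choose a component with probability equal to its weight, then sample from it
    mu = 30.0 if rng.random() < 0.6 else 50.0
    return rng.gauss(mu, sigma)

rng = random.Random(0)   # fixed seed so the draw is repeatable
samples = [sample_fake_price(rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
share_below_40 = sum(s < 40.0 for s in samples) / len(samples)

# The mixture mean is 0.6*30 + 0.4*50 = 38
print(f"sample mean: {sample_mean:.2f}")
print(f"share of draws below $40: {share_below_40:.3f}")
```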

Both conditional price distributions are plotted below.

In [4]:
#Functions to evaluate normal distribution pdfs
def _standard_normal_pdf(z: float) -> float:
    """
    Evaluates the standard normal distribution at z
    (i.e. mean 0, std 1)
    """
    return 1/np.sqrt(2*np.pi) * np.exp(-z**2/2)

def normal_pdf(x: float, mu: float, sigma:float) -> float:
    """
    Evaluates a normal distribution with specified mean and
    standard deviation at x
    """
    if sigma <= 0:
        raise ValueError("Standard deviation must be > 0")
    
    return 1/sigma * _standard_normal_pdf((x-mu)/sigma)

In [8]:
#Functions to evaluate conditional pdfs
def real_conditional_pdf(x: float) -> float:
    """
    Evaluates the conditional pdf for real shoes at price x
    """
    return normal_pdf(x, 58, 5)

def fake_conditional_pdf(x: float, sigma: float = 5) -> float:
    """
    Evaluates the conditional pdf for fake shoes at price x: a mixture of
    two Gaussian components (means 30 and 50) sharing standard deviation sigma
    """
    return 0.6*normal_pdf(x, 30, sigma) + 0.4*normal_pdf(x, 50, sigma)

In [5]:
x_samples = np.linspace(10, 100, 1000)

pdf_real = 