# Probability and Stochastic Processes 2024-25: **Problem Set 1**

Provide a solution to the following exercises.

Some rules:

- You should **use the libraries that we have seen** during our tutorials. If you want to provide another solution (which uses different libraries or concepts that we have not seen together), you can do so, but you will need to ALSO write down a solution that uses the concepts seen in class. Providing more than one solution will not earn extra points.

- You might need some formulas (theorems, definitions, properties...) from our lectures. Whenever your solution relies on a formula that we have seen during a lecture, write a comment citing, if available, its number in the book (for example, if you are using the theorem of continuity of probability measure, write down "by the thm of continuity of probability measure 1.54").

- **Comment** the code, explaining the steps that you followed, using `#` comments in the code cell. Code comments are for short explanations. If you want to write a "text-heavy" answer, then I suggest you use a markdown cell.

- The output of your code should solely be the answers to the questions (this means that you should not print extra things or plots). In general, **be tidy**!

- Before you submit the notebook, be sure to **compile it** (i.e., you should run all the code before downloading your notebook, so that I can see your output and do not have to re-run it).

- When you are simulating replicas of an experiment, you can use 1000 replicas.

In [21]:
num_experiment = 1000

It might come in handy to use a `while` loop. A `while` statement executes its body repeatedly as long as the condition written right after `while` is true (alternatively, you can use the `break` statement to exit a `for` loop; we saw it in the first weeks). For an example, see below:

In [22]:
from scipy import stats

def assign_grade(initial_grade, p):
    grade = initial_grade
    reply = 0  # 0 = wrong reply, 1 = correct reply
    num_wrong_ans = 0
    while (reply == 0) and (num_wrong_ans < 3):  # Python repeats this part while the last reply is wrong and there have been fewer than 3 wrong replies
        question = stats.randint(0, 30).rvs()  # generate a random question
        reply = int(stats.bernoulli(p).rvs())  # reply of the student (1 = correct, with probability p); rvs() returns a NumPy integer, so compare with 0, not with `is False`
        grade -= (1 - reply) * (initial_grade - 18) / 3  # a wrong reply lowers the grade by a third of (initial_grade - 18)
        num_wrong_ans += 1 - reply  # count only the wrong replies
    return grade, num_wrong_ans

print(assign_grade(28, 0.1))
print(assign_grade(28, 0.9))
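The same grading rule (keep asking until a correct reply or three wrong replies; each wrong reply lowers the grade by a third of the gap between the initial grade and 18) can also be written with a `for` loop and `break`, the alternative mentioned above. This is a minimal sketch; the name `assign_grade_for` and the `max_wrong` parameter are hypothetical, introduced only for illustration.

```python
from scipy import stats

def assign_grade_for(initial_grade, p, max_wrong=3):
    """Hypothetical for/break version of the grading rule."""
    grade = initial_grade
    num_wrong_ans = 0
    for _ in range(max_wrong):  # at most max_wrong questions can be answered wrongly
        reply = int(stats.bernoulli(p).rvs())  # 1 = correct reply, with probability p
        if reply == 1:
            break  # a correct reply ends the questioning
        grade -= (initial_grade - 18) / max_wrong  # each wrong reply lowers the grade by (initial_grade - 18) / max_wrong
        num_wrong_ans += 1
    return grade, num_wrong_ans

print(assign_grade_for(28, 0.1))  # output is random
```

The `for` version makes the maximum number of questions explicit in `range(max_wrong)`, while the `while` version keeps the stopping rule in a single condition; both describe the same experiment.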


# 1 - Minimum grade of the exam [full marks: 10]

The 40 students enrolled in PSP 2024 will take the exam in January. Each of them will receive a grade between 0 (I really hope nobody gets this grade) and 30.
Let's assume that the grades are independent and identically distributed as a Truncated Normal (look this up!) with mean 23, variance 5 and support [0, 30] (therefore, we assume that the grades take continuous values).
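A sketch of how such a random variable can be built with `scipy.stats.truncnorm`, assuming that 23 and 5 are the mean and variance of the parent (untruncated) normal, which is how scipy parametrizes it. Note that `truncnorm` expects the truncation bounds in standardized units, i.e. `(bound - loc) / scale`. With this reading, the mean and variance of the truncated grade are very slightly different from 23 and 5, because the interval [0, 30] cuts off part of the upper tail (30 is about 3.1 standard deviations above 23).

```python
import numpy as np
from scipy import stats

mu, sigma2 = 23, 5    # assumed: mean and variance of the parent (untruncated) normal
lower, upper = 0, 30  # support of the grades
sigma = np.sqrt(sigma2)

# truncnorm takes the bounds in standardized units: (bound - loc) / scale
a, b = (lower - mu) / sigma, (upper - mu) / sigma
grade = stats.truncnorm(a, b, loc=mu, scale=sigma)

x = np.linspace(lower, upper, 301)
pdf_values = grade.pdf(x)  # PDF evaluated on a grid, ready to be plotted
cdf_values = grade.cdf(x)  # CDF evaluated on the same grid
```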

1. Plot the PDF and the CDF of the grade of one of the students, using the methods for random variables (i.e., do not simulate). [max 4 points]

2. Design an experiment to estimate the cumulative distribution function of the minimum of the grades. Plot this empirical CDF against the theoretical CDF. [3]

3. What is the probability that the minimum of the grades is less than 18? And between 18 and 20? Find both the empirical (using the simulations of point 2) and theoretical values of these probabilities. [3]

In [None]:
# Start writing your answer here...

# 2 - Calciatori Panini and the best sister award [full marks: 10]
Back in the day, when Francesca was young, she and her brother collected football stickers to attach to their albums (for reference, these are called the "Calciatori Panini Album"). Francesca knew that her brother and 2 of his friends were obsessed with the player Alberto Gilardino. Wishing to be a very nice sister, she decided to look for 3 Gilardino stickers to give to her brother and his friends. She knew that the album had space for 300 players, that the stickers were sold in packets of 6, and that each packet cost 0.25 euros. Each player had the same probability of appearing on a sticker, and the same player could appear more than once in a packet.
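One replica of the experiment can be sketched with a `while` loop, as suggested at the beginning of the notebook. This is a minimal sketch under stated assumptions: each sticker is Gilardino with probability `p = 1/300`, independently of the other stickers, so the number of Gilardino stickers in a packet of 6 is Binomial(6, p); the price is assumed to be 0.25 euros per packet; the function name `packets_for_one_gift` is hypothetical.

```python
from scipy import stats

def packets_for_one_gift(p=1/300, stickers_needed=3, packet_size=6):
    """Hypothetical helper: simulate ONE replica, buying packets until enough Gilardino stickers are found."""
    packet = stats.binom(packet_size, p)  # number of Gilardino stickers in one packet (assumed independent stickers)
    found = 0    # Gilardino stickers collected so far
    packets = 0  # packets bought so far
    while found < stickers_needed:
        found += packet.rvs()  # open a new packet
        packets += 1
    return packets

packets = packets_for_one_gift()
cost = 0.25 * packets  # euros, assuming 0.25 euros per packet
```

Repeating this function over many replicas gives a sample of the number of packets, from which averages, medians and empirical distributions can be estimated.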

Propose a simulation experiment that empirically answers the following questions:

1. What is the average number of packets she has to buy to find the desired number of Gilardino stickers, and so be elected the "most lovely sister of the year"? What is the average cost? And the median cost? [4]

2. Unfortunately, the mean does not tell us the full picture. Plot an estimate of the probability mass function of the number of packets she will have to buy for her gift. [2]

3. Plot the distribution function of her costs. What is the probability that she will end up paying more than 80 euros to fulfill her dream? [2]

4. After having done these calculations, little Francesca is a bit concerned about her expenses. How large should the probability of finding Gilardino on a sticker be for her median expense not to exceed 4 euros? To answer this, consider probabilities ranging from 1/100 to 1/10 in steps of 0.01. Once you have computed the median for each of these probabilities $p$, plot $p$ against the corresponding median and draw a horizontal line to identify the value 4 for the median. [2]

In [None]:
# Start writing your answer here...

# 3 - Let's keep this bank from going bankrupt [full marks: 10]

You are running a small bank that has granted 2000 loans to its customers. You are concerned about the probability of these loans defaulting in the coming year, so you decide to use your PSP knowledge to figure out how likely it is that, under "normal" market conditions, too many loans default.

1. To start with, you assume that all loans have the same probability 0.005 of defaulting in the coming year and that they behave independently of each other. What is the expected number of defaulting loans? What is the probability that more than 1% of the loans default? To solve this, you can either simulate things numerically or give the exact answer using the methods that we have seen for random variables. [5]

2. Now that you have established a baseline probability, you know that reality is a bit more complicated, so you want to compute the same two quantities (the expected value and the probability that more than 1% of the loans default) for the 2000 loans, but this time you assume that 100 of them have probability of default 0.01, 900 have probability of default 0.008, and 1000 have probability of default 0.005. You can run experiments to answer the question, but full marks will be given to an answer that uses the methods that we have seen for random variables and the distributions/theorems that we have seen during the course. [3]

3. Again, we spice it up to make it more realistic. This time, the second and third groups of loans are as in point 2 (i.e., the groups of 900 and 1000 loans), but the first group of 100 loans behaves slightly differently: the first 2 loans of group 1 have probability 0.1 of defaulting and, if these first 2 loans default, then all the other 98 will default too, while if the first 2 loans do not default, then the others default independently with probability 0.01. Compute the expected number of defaulting loans and the probability that more than 1% of the loans default. For this answer, you can use a sampling method. [2]
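For the baseline of point 1, a sketch using the methods for random variables: if the 2000 loans default independently, each with probability 0.005, the number of defaults is Binomial(2000, 0.005), and "more than 1% of the loans" means more than 20 defaults, i.e. $P(X > 20) = 1 - P(X \le 20)$, which is what `sf` computes. The variable names are illustrative.

```python
from scipy import stats

n_loans = 2000
p_default = 0.005
threshold = 0.01 * n_loans  # 1% of the loans = 20 loans

# independent loans with equal default probability: number of defaults ~ Binomial(n, p)
defaults = stats.binom(n_loans, p_default)

expected_defaults = defaults.mean()           # n * p
prob_more_than_1pct = defaults.sf(threshold)  # P(X > 20) = 1 - P(X <= 20)
```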

In [None]:
# Start writing your answer here...