<a href="https://colab.research.google.com/github/ksawesome/Blog/blob/main/Lab4_Random_Variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Use `torch.distributions` in all the questions below

## Imports

    import torch
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    import random
    from collections import Counter
    import ipywidgets as widgets
    from ipywidgets import interact

# 1. **Random Variables**

#### Q1. (**Mandatory**)
Write a Python function that simulates rolling two six-sided dice and returns their sum. Use this function to perform 10,000 simulations. Calculate and print:

a) The experimental probability of rolling a sum of 7 <br/>
b) The experimental probability of rolling a sum of 2 or 12<br/>

Compare these experimental probabilities with the theoretical probabilities. What conclusions can you draw about the accuracy of the simulation?
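
As a point of comparison for Q1, the sketch below enumerates all 36 outcomes to get the exact probabilities and then runs the 10,000-roll simulation. It is a plain-Python illustration using only the standard library (the lab itself asks for `torch.distributions`); the helper names `roll_two_dice` and `theoretical_sum_pmf` and the fixed seed are assumptions made for this example.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product


def roll_two_dice(rng: random.Random) -> int:
    """Return the sum of two independent fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def theoretical_sum_pmf() -> dict:
    """Exact PMF of the sum, by enumerating all 36 equally likely outcomes."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in counts.items()}


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
n_trials = 10_000
sums = Counter(roll_two_dice(rng) for _ in range(n_trials))

pmf = theoretical_sum_pmf()
p7_exp = sums[7] / n_trials
p2_or_12_exp = (sums[2] + sums[12]) / n_trials

print(f"P(sum = 7):       experimental {p7_exp:.4f}, theoretical {float(pmf[7]):.4f}")
print(f"P(sum = 2 or 12): experimental {p2_or_12_exp:.4f}, "
      f"theoretical {float(pmf[2] + pmf[12]):.4f}")
```

The theoretical values are 1/6 for a sum of 7 and 1/18 for a sum of 2 or 12; with 10,000 rolls the experimental values typically agree to about two decimal places.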

#### Q2. (Optional)
Write a function coin_toss(n) that simulates flipping a coin n times, returning a list of 'H' (Heads) and 'T' (Tails). Then, write a function count_heads(flips) that takes the list of flips and returns the number of heads (this is your random variable, X). Finally, simulate 10 coin tosses and print the outcome of the coin tosses and the number of heads.


#### Q3. (Optional)
#### Coin Flip Simulation and Inverse Mapping Check

Using the `coin_toss(n)` and `count_heads(flips)` functions from Q2, simulate **n = 4** coin flips.  

Then, given a value **a**, write a function `check_inverse_mapping(flips, a)` that checks whether the number of heads in the simulated coin flips equals **a**. In other words, verify if:  

$$
X(\xi) = a
$$

where:  
- **$X$** represents the function that counts the number of heads in the sequence.  
- **$\xi$** is the outcome of the 4 coin flips.  

The function should print whether **$X(\xi) = a$** is **True or False**.  
Finally, test the function with **a = 2**.  


# 2. **Probability Mass Function**

#### Q1. (**Mandatory**)
Consider a discrete random variable X with possible values {1, 2, 3, 4, 5}. You are given a function f(x) defined as:

$f(x) = c  (x^2 - 6x + 10)$ for x in {1, 2, 3, 4, 5}

where c is a constant.

Write a Python function is_valid_pmf(f) that:

* Determines if f(x) can be a valid PMF for some value of c.

* If valid, calculates and returns the value of c that makes f(x) a legitimate PMF. If not valid, returns None and prints why it is not valid.

Your function should check both properties of a valid PMF:

* Non-negativity for all x

* Sum of probabilities equals 1

Test your function with the given f(x) and explain your results.
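
One possible shape for the check in Q1 is sketched below. It uses exact fractions so that the sum-to-one test is not affected by rounding. The extra `support` argument is an assumption of this sketch (the question's signature takes only the function), and `g` stands for the part of f(x) without the constant c.

```python
from fractions import Fraction


def is_valid_pmf(g, support):
    """Return the constant c that makes c * g(x) a PMF on `support`, else None.

    `g` is the unnormalised shape (here x**2 - 6*x + 10). The `support`
    argument is an addition for this sketch.
    """
    values = [Fraction(g(x)) for x in support]
    total = sum(values)
    if total == 0:
        print("Not valid: the values sum to zero, so no c gives a total of 1.")
        return None
    c = 1 / total  # this choice forces the probabilities to sum to 1
    if any(c * v < 0 for v in values):
        print("Not valid: c * g(x) is negative for at least one x.")
        return None
    return c


c = is_valid_pmf(lambda x: x**2 - 6 * x + 10, range(1, 6))
print(c)  # 1/15
```

Here the values of x² − 6x + 10 on {1, 2, 3, 4, 5} are 5, 2, 1, 2, 5, which are all positive and sum to 15, so c = 1/15.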

#### Q2. (Optional)
Simulate flipping a coin 100 times and calculate the empirical PMF for the number of heads (0 to 100).

#### Q3. (Optional)

A small café records the number of customers visiting per day for a month (30 days). The recorded data is as follows:

    customer_counts = [12, 15, 14, 10, 18, 16, 14, 12, 15, 11, 17, 14, 13, 19, 12, 18, 14, 10, 16, 15, 14, 13, 17, 12, 18, 16, 14, 11, 19, 13]
Compute the PMF for the given data and plot it using Matplotlib. Identify the most common number of daily customers and compare the distribution.

# 3. **Cumulative Distribution Function**

#### Q1. (**Mandatory**)

Consider a discrete random variable $X$ with possible values $x \in \{1,2,3,4,5\}$ and corresponding probability mass function (PMF):
$$ P(X=x) = \frac{x}{15}, \quad x \in \{1,2,3,4,5\} $$

where $P(X=x)$ represents the PMF of $X$.

The Cumulative Distribution Function (CDF) is defined as:
$$ F_X(x) = P(X \le x) = \sum_{k \le x} P(X=k) $$


Your Task:
1. Compute the PMF values for each $x \in \{1,2,3,4,5\}$.
2. Compute the CDF $F_X(x)$ for each $x$.
3. Verify the relationship between PMF and CDF:
   - $P(X=x) = F_X(x) - F_X(x-1)$ for all $x \ge 2$.
   - For $x = 1$, $P(X=1) = F_X(1)$.
4. Print and compare the computed PMF values with the differences in CDF values to confirm the relationship.

#### Q2. (Optional)

  Write a Python function to compute the **CDF** for a given **discrete probability distribution**. Your function should:  

1. Take as input:  
   - A **list or NumPy array** of possible values of a discrete random variable.  
   - A corresponding **list or NumPy array** of probability mass function (PMF) values.  

2. Return:  
   - A list of CDF values corresponding to the input values.  

Additionally, perform the following:  

- Compute and plot the **CDF** for a **biased 6-sided die**, where the probability distribution is:  

| X   | 3   | 1   | 6   | 2   | 5   | 4   |
|-----|-----|-----|-----|-----|-----|-----|
| P(X)| 0.15| 0.05| 0.25| 0.10| 0.25| 0.20|


- Plot the **CDF as a step function** using **Matplotlib**.  
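
A minimal sketch of the CDF computation for Q2 follows, written with the standard library only. Because the table lists the faces out of order, this version sorts the (value, probability) pairs before accumulating and returns the sorted values alongside the CDF; returning both is a design choice of this sketch, not something the question prescribes.

```python
from itertools import accumulate


def compute_cdf(values, pmf):
    """Return (sorted_values, cdf) for a discrete distribution.

    The (value, probability) pairs are sorted by value first, because a
    running sum only equals P(X <= x) when the values are in ascending order.
    """
    pairs = sorted(zip(values, pmf))
    xs = [x for x, _ in pairs]
    cdf = list(accumulate(p for _, p in pairs))
    return xs, cdf


# Biased die from the table above, in the same (unsorted) column order.
values = [3, 1, 6, 2, 5, 4]
probs = [0.15, 0.05, 0.25, 0.10, 0.25, 0.20]

xs, cdf = compute_cdf(values, probs)
for x, F in zip(xs, cdf):
    print(f"F({x}) = {F:.2f}")
```

The resulting pairs can be passed to Matplotlib's `plt.step(xs, cdf, where="post")` to draw the step function.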

#### Q3. (Optional)

Define a function pdf(x) representing a normal distribution.

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

Now calculate the CDF of this function and plot it along with the curve for the above normal distribution.

# 4. **Expectation**

#### Q1. (**Mandatory**)
Simulate a game where you flip a coin three times. The reward system is as follows:

* \$8 for exactly 3 heads,
* \$1 for exactly 2 heads,
* \$0 otherwise.

The cost to play the game is \$1.5. Run the simulation for 10,000 trials, compute the average net gain, and compare it with the theoretical expected net gain.
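
The sketch below shows one way to obtain both numbers for the coin game: the exact expected net gain by enumerating the 8 equally likely flip sequences, and a Monte Carlo estimate from 10,000 plays. It uses only the standard library (not `torch.distributions`), and the names `reward` and `COST` as well as the seed are assumptions for illustration.

```python
import random
from itertools import product

COST = 1.5


def reward(num_heads: int) -> float:
    """Payout for one game of three flips."""
    if num_heads == 3:
        return 8.0
    if num_heads == 2:
        return 1.0
    return 0.0


# Exact expectation: all 8 head/tail sequences are equally likely.
outcomes = list(product([0, 1], repeat=3))
theoretical_net = sum(reward(sum(o)) for o in outcomes) / len(outcomes) - COST

# Monte Carlo estimate with a fixed (arbitrary) seed.
rng = random.Random(0)
n_trials = 10_000
total = 0.0
for _ in range(n_trials):
    heads = sum(rng.random() < 0.5 for _ in range(3))
    total += reward(heads) - COST
simulated_net = total / n_trials

print(f"theoretical: {theoretical_net:.3f}, simulated: {simulated_net:.3f}")
```

The exact value is (8 · 1/8 + 1 · 3/8) − 1.5 = −0.125, so the game loses money on average; the simulated figure should land close to that.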


#### Q2. (Optional)
Write a Python function that computes the expected value of a discrete random variable given its probability mass function (PMF). The function should take a dictionary where keys represent outcomes and values represent their corresponding probabilities.

Example:
```python
pmf = {1: 0.2, 2: 0.3, 3: 0.5}
expected_value = compute_expected_value(pmf)
print(expected_value)  # Expected Output: 2.3

```
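
A minimal sketch of `compute_expected_value`, assuming the dictionary layout from the example above. The check that the probabilities sum to 1 is an addition of this sketch rather than a stated requirement.

```python
import math


def compute_expected_value(pmf: dict) -> float:
    """E[X] = sum of x * P(X = x) over all outcomes."""
    total_prob = sum(pmf.values())
    if not math.isclose(total_prob, 1.0):
        raise ValueError(f"probabilities sum to {total_prob}, expected 1")
    return sum(x * p for x, p in pmf.items())


pmf = {1: 0.2, 2: 0.3, 3: 0.5}
print(compute_expected_value(pmf))  # approximately 2.3
```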




#### Q3. (Optional)

Given a discrete random variable with its probability mass function (PMF), compute the expected value of a function $g(x)$. The function $g(x)$ should be provided as input.

Example:
```python
pmf = {1: 0.2, 2: 0.3, 3: 0.5}
g = lambda x: x**2
expected_value = compute_expected_value_of_function(pmf, g)
print(expected_value)
```


# 5. **Moments and Variance**

#### Q1. (**Mandatory**)
Write a function to compute the variance of a discrete random variable given its probability mass function.

Example:
```python
pmf = {1: 0.2, 2: 0.3, 3: 0.5}
variance = compute_variance(pmf)
print(variance)
```


#### Q2. (Optional)
Write a function to compute the k-th moment of a discrete random variable given its probability mass function.

Example:
```python
pmf = {1: 0.2, 2: 0.3, 3: 0.5}
k = 3
moment = compute_moment(pmf, k)
print(moment)
```
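
The k-th moment and the variance are closely linked, since Var(X) = E[X²] − (E[X])². The sketch below implements `compute_moment` and then expresses `compute_variance` through it; this is one possible arrangement, assuming the dictionary-based PMF from the examples.

```python
def compute_moment(pmf: dict, k: int) -> float:
    """k-th raw moment E[X^k] = sum of x**k * P(X = x)."""
    return sum((x ** k) * p for x, p in pmf.items())


def compute_variance(pmf: dict) -> float:
    """Var(X) = E[X^2] - (E[X])^2, written in terms of raw moments."""
    return compute_moment(pmf, 2) - compute_moment(pmf, 1) ** 2


pmf = {1: 0.2, 2: 0.3, 3: 0.5}
print(compute_moment(pmf, 3))   # approximately 16.1
print(compute_variance(pmf))    # approximately 0.61
```

For this PMF, E[X] = 2.3 and E[X²] = 5.9, which gives a variance of 5.9 − 5.29 = 0.61.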


#### Q3. (Optional)
Simulate and estimate the variance of a given function $f(X)$ when $X$ is uniformly distributed over $[a, b]$.

Example:
```python
import numpy as np
f = lambda x: x**2
a, b, N = 0, 1, 10000
variance = estimate_variance(f, a, b, N)
print(variance)
```


# 6. **Bernoulli Random Variables**

#### Q1. (**Mandatory**)
Write a function that simulates a Bernoulli-distributed random variable with a given probability $p$.

Example:
```python
p = 0.7
outcome = bernoulli_trial(p)
print(outcome)  # Expected Output: 0 or 1
```


#### Q2. (Optional)
Given $p$, the probability of success in a Bernoulli distribution, compute its expectation and variance.

Example:
```python
p = 0.5
expectation, variance = bernoulli_stats(p)
print(expectation, variance)
```


#### Q3. (Optional)
Write a function that simulates `N` Bernoulli trials and computes the proportion of successes.

Example:
```python
p = 0.3
N = 1000
proportion = simulate_bernoulli(p, N)
print(proportion)
```
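
The three Bernoulli functions named in this section can be sketched together as below. This is a standard-library illustration; the optional `rng` parameter is an assumption added so the runs can be seeded. In the lab's own setting, a draw from `torch.distributions.Bernoulli(probs=p)` would play the role of `bernoulli_trial`.

```python
import random


def bernoulli_trial(p: float, rng=random) -> int:
    """Return 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0


def bernoulli_stats(p: float):
    """Return (E[X], Var(X)) = (p, p * (1 - p))."""
    return p, p * (1 - p)


def simulate_bernoulli(p: float, n: int, rng=random) -> float:
    """Run n independent trials and return the fraction of successes."""
    return sum(bernoulli_trial(p, rng) for _ in range(n)) / n


rng = random.Random(0)  # fixed (arbitrary) seed
print(bernoulli_stats(0.5))                # (0.5, 0.25)
print(simulate_bernoulli(0.3, 1000, rng))  # close to 0.3
```

As N grows, the proportion returned by `simulate_bernoulli` settles near p, which is the expectation reported by `bernoulli_stats`.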


# 7. **Binomial Distribution**

#### Q1. (**Mandatory**)
Write a Python function that simulates a Binomial random variable. The function should take three inputs:
- `n` (number of trials)
- `p` (probability of success)
- `size` (number of random values to generate)

Plot a histogram of the generated values for `n=10`, `p=0.5`, and `size=1000`.


#### Q2. (Optional)
Write a Python program that:
1. Computes the probability mass function (PMF) of a Binomial distribution.
2. Plots the PMF for `n=20` and `p=0.3`.
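
The Binomial PMF is P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ. A short sketch of the computation for `n=20` and `p=0.3` follows, using `math.comb` from the standard library; the function name `binomial_pmf` is an assumption, and the plotting step is left out.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mode = max(range(n + 1), key=lambda k: pmf[k])
print(f"sum of PMF = {sum(pmf):.6f}, most likely k = {mode}")
```

The list `pmf` can be handed to `plt.bar(range(n + 1), pmf)` for the plot; the peak sits at k = 6, next to the mean np = 6.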


#### Q3. (Optional)
Simulate a fair coin being flipped 100 times and count the number of heads. Repeat this experiment 1000 times and visualize the distribution using a histogram. Explain how this relates to the Binomial distribution.

# 8. **Geometric Distribution**

#### Q1. (**Mandatory**)
#### Simulating Geometric Distribution
Write a Python function that generates random numbers from a Geometric distribution. The function should take:
- `p` (probability of success)
- `size` (number of values to generate)

Plot a histogram for `p=0.3` and `size=1000`.

#### Q2. (Optional)
#### Expected Number of Trials
Create an interactive Python program that uses `ipywidgets` and `matplotlib` to visualize the probability mass function (PMF) of the Geometric distribution. Users can adjust the probability of success $p$ and the maximum number of trials $k$ to display.

#### Q3. (Optional)
#### Estimating `p` from Sample Data
Generate a dataset of 500 Geometric random variables with `p=0.4`. Then, estimate `p` from the data using the formula:

$$\hat{p} = \frac{1}{\text{sample mean}}$$

Compare the estimated value with the true `p=0.4` and compute the percentage error.
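
The estimator p̂ = 1 / (sample mean) assumes the Geometric variable counts the number of trials up to and including the first success, so its support starts at 1 and its mean is 1/p. The sketch below follows that convention with the standard library. Note that `torch.distributions.Geometric` counts the failures before the first success (support starting at 0), so with samples from it the matching estimate would be 1 / (1 + sample mean). The helper name and seed are assumptions.

```python
import random


def geometric_trials(p: float, rng: random.Random) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # failure, so try again
        trials += 1
    return trials


rng = random.Random(0)  # fixed (arbitrary) seed
true_p = 0.4
samples = [geometric_trials(true_p, rng) for _ in range(500)]

sample_mean = sum(samples) / len(samples)
p_hat = 1 / sample_mean
pct_error = abs(p_hat - true_p) / true_p * 100
print(f"p_hat = {p_hat:.4f}, percentage error = {pct_error:.2f}%")
```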

# 9. **Poisson Distribution**

#### Q1. (**Mandatory**)
Write a Python function that generates Poisson random variables. The function should take:
- `λ` (expected number of events per interval)
- `size` (number of values to generate)

Plot a histogram for `λ=5` and `size=1000`.

#### Q2. (Optional)
Generate 1000 samples from a Binomial distribution with `n=100` and `p=0.05`. Then, generate 1000 samples from a Poisson distribution with `λ=n*p=5`. Compare their histograms and explain why the Poisson distribution approximates the Binomial distribution for large `n` and small `p`.

#### Q3. (Optional)
A call center receives calls at an average rate of 3 calls per minute. Assume the number of calls follows a Poisson distribution.
1. Compute the probability of receiving exactly 5 calls in a minute.
2. Compute the probability of receiving at most 5 calls.
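
For the call-center question, both probabilities follow directly from the Poisson PMF P(X = k) = e^(−λ) λᵏ / k!. The sketch below evaluates them with the standard library; the function names are assumptions for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = exp(-lam) * lam**k / k! for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), summing the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 3  # average calls per minute
print(f"P(X = 5)  = {poisson_pmf(5, lam):.4f}")
print(f"P(X <= 5) = {poisson_cdf(5, lam):.4f}")
```

With λ = 3 this gives roughly 0.1008 for exactly 5 calls and roughly 0.9161 for at most 5 calls.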

# 10. **Probability Density Function**

#### Q1. (**Mandatory**)
Implement a piecewise PDF defined as:

$$
f(x) =
\begin{cases}
x & \text{if } 0 \leq x \leq 1 \\
2 - x & \text{if } 1 < x \leq 2 \\
0 & \text{otherwise}
\end{cases}
$$

Compute P(0.5≤X≤1.5) analytically and verify using numerical integration.

Plot the PDF and highlight the area under the curve for the computed probability.

#### Q2. (Optional)
Write a Python program to estimate the probability density function of a given dataset using a histogram.

* Generate or load a dataset (e.g., from a normal distribution with mean = 0 and standard deviation = 1, sample size = 1000).
* Compute the histogram with an appropriate number of bins.
* Normalize the histogram so that it represents a probability density function and plot the estimated PDF.

#### Q3. (Optional)
Consider a random variable X with a mixed distribution, defined as:

Discrete part (PMF): P(X=0)=0.2.

Continuous part (PDF): f(x)=0.4 for 1≤x≤3.

X takes no values outside {0}∪[1,3].

* a) Confirm the total probability is 1.
* b) Compute P(1.5≤X≤2.5).

# 11. **Expectation (Continuous)**

#### Q1. (**Mandatory**)
Implement a function to compute E[X] for a continuous random variable with a piecewise probability density function (PDF).

```python
def expectation_piecewise_pdf(intervals: list, funcs: list) -> float:
    """
    Compute E[X] for a piecewise PDF defined over intervals.

    Args:
        intervals: List of tuples defining intervals, e.g., [(0, 1), (1, 2)]
        funcs: List of functions corresponding to each interval's PDF

    Example:
        def f1(x): return x      # For 0 <= x <= 1
        def f2(x): return 2 - x  # For 1 < x <= 2
        expectation_piecewise_pdf([(0, 1), (1, 2)], [f1, f2])  # ≈ 1.0
    """
    raise NotImplementedError  # to be filled in as part of the exercise
```
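
One way to fill in this signature is to integrate x · f(x) over each interval and add the pieces. The sketch below does so with a hand-written composite Simpson's rule so that it needs no third-party packages; the `simpson` helper and its default of 200 subintervals are assumptions of this example.

```python
def simpson(func, a: float, b: float, n: int = 200) -> float:
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def expectation_piecewise_pdf(intervals: list, funcs: list) -> float:
    """E[X] = sum over pieces of the integral of x * f_i(x) on that piece."""
    return sum(
        simpson(lambda x, f=f: x * f(x), lo, hi)
        for (lo, hi), f in zip(intervals, funcs)
    )


def f1(x): return x        # for 0 <= x <= 1
def f2(x): return 2 - x    # for 1 < x <= 2

result = expectation_piecewise_pdf([(0, 1), (1, 2)], [f1, f2])
print(result)  # approximately 1.0
```

The two pieces contribute 1/3 and 2/3, giving E[X] = 1. If SciPy is available, `scipy.integrate.quad` could replace the `simpson` helper.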



#### Q2. (Optional)
Compute E[g(X)] for a continuous random variable X with PDF f(x) and transformation g, using numerical integration.
```python
def expectation_transform(g, pdf, lower: float, upper: float) -> float:
    """
    Compute E[g(X)] where X has the given PDF.

    Args:
        g: Function to transform X (e.g., lambda x: x**2)
        pdf: Probability density function of X
        lower: Lower bound of X's support
        upper: Upper bound of X's support

    Example:
        f = lambda x: 2*x  # PDF f(x) = 2x on [0, 1]
        expectation_transform(lambda x: x, f, 0, 1)  # ≈ 2/3
    """
    raise NotImplementedError  # to be filled in as part of the exercise
```
  


# 12. **Cumulative Distribution Function for Continuous Random Variables**

#### Q1. (**Mandatory**)
Write a function to compute and plot the Cumulative Distribution Function (CDF) for a Uniform continuous random variable $X \sim U(a,b)$, given the probability density function (PDF):


$$ f_X(x)= \begin{cases}
\frac{1}{b-a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
$$

Your function should:

1. Take a and b (where $a < b$) as inputs.
2. Compute the CDF $F_X(x)$.
3. Generate an array of $x$-values and compute $F_X(x)$.
4. Plot the CDF curve using Matplotlib.

#### Q2. (Optional)
A continuous random variable $X$ follows an Exponential distribution with a given rate parameter $\lambda$. The probability density function (PDF) of $X$ is:
$$ f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0 $$

Your task is to derive and compute the CDF $F_X(x)$ of the given exponential distribution for a given $\lambda$.

Instructions:
1. Write a function that takes $x$ and $\lambda$ as input and returns the cumulative distribution function (CDF) $F_X(x)$.
2. Compute the CDF for different values of $x$ in the range [0, 5].
3. Plot the CDF curve for a given $\lambda$.

#### Q3. (Optional)
A continuous random variable $X$ follows an Exponential distribution with a rate parameter $\lambda$. The PDF of $X$ is given as:
$$ f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0 $$

Your task is to compute the probability $P(1 \le X \le 3)$ using two different approaches:

(a) PDF Approach:
Compute the probability using numerical integration of the PDF over the interval [1,3].

(b) CDF Approach:
Compute the probability using the CDF formula.
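
Integrating the PDF from 0 to x gives the CDF F_X(x) = 1 − e^(−λx) for x ≥ 0, so the CDF approach reduces to F_X(3) − F_X(1) = e^(−λ) − e^(−3λ). The sketch below compares that with a midpoint-rule integration of the PDF. The value λ = 1 is an illustrative assumption (the question leaves λ open), and the helper names are hypothetical.

```python
import math


def exp_pdf(x: float, lam: float) -> float:
    """f_X(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x: float, lam: float) -> float:
    """F_X(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def midpoint_integral(func, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


lam = 1.0  # illustrative value; the question leaves lambda unspecified
p_pdf = midpoint_integral(lambda x: exp_pdf(x, lam), 1, 3)
p_cdf = exp_cdf(3, lam) - exp_cdf(1, lam)
print(f"PDF approach: {p_pdf:.6f}, CDF approach: {p_cdf:.6f}")
```

Both approaches should agree to several decimal places; for λ = 1 the probability is about 0.3181.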



# 13. **Mean, Mode, and Median**  
*(Refer to https://probability4datascience.com/slides/Slide_4_04.pdf)*



#### Q1. (Optional)
#### Compute Summary Statistics:
   Using a numerical dataset, write a Python snippet to compute the mean, median, and mode using built-in libraries.  



#### Q2. (**Mandatory**)
#### Visualize Distribution with Markers:  
   Generate a sample (e.g., from a normal distribution) and plot its histogram. Then mark the mean (red), median (green), and an approximate mode (blue, as the bin with maximum count) on the plot.  
   



#### Q3. (Optional)
#### Sample vs. Theoretical Comparison:
   Generate 1000 samples from a known distribution (e.g., Uniform over [0, 10]) and compute the sample mean, median, and mode. Compare these with the theoretical values (mean = 5, median = 5).  



# 14. **Uniform Random Variables**  
*(Refer to https://probability4datascience.com/slides/Slide_4_05.pdf)*


#### Q1. (**Mandatory**)
#### Simulate and Plot PDF & CDF:  
   Simulate a Uniform random variable over [2, 5] and plot its PDF and CDF using matplotlib, matching the formulas in the slide.  
  


#### Q2. (Optional)
#### Compute Theoretical Moments:
   Calculate the theoretical mean and variance for a Uniform(a, b) random variable with a = 2 and b = 5 using the formulas from the slide.  
  

#### Q3. (Optional)
#### Probability Calculation Using CDF:  
   For X ∼ Uniform(2, 5), compute P[3 ≤ X ≤ 4] using the CDF approach described in the slide.  


# 15. **Exponential Random Variables**  
*(Refer to https://probability4datascience.com/slides/Slide_4_06.pdf)*


#### Q1. (**Mandatory**)
#### Simulate and Overlay Theoretical PDF:  
   Simulate 1000 samples from an Exponential(λ = 0.5) distribution, then plot a histogram of the samples with the theoretical PDF overlay.  
  


#### Q2. (Optional)
#### Theoretical Mean and Variance:  
   Using the formulas from the slide, compute the theoretical mean and variance for an Exponential distribution with λ = 0.5.  
   


#### Q3. (Optional)
#### Compute Tail Probability:  
   For an Exponential(λ = 0.5) random variable, calculate P[T > 3] using the survival function (i.e. 1 – CDF) as derived in the slide.  
  

# 16. **Gaussian Random Variables**

#### Q1. (**Mandatory**)
The PDF of a Gaussian random variable $ X \sim \mathcal{N}(\mu, \sigma^2) $ is given by:

 $$
 f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
 $$
1. Using `torch.distributions`, generate 100 random samples from a Gaussian distribution with a mean of 3 and a standard deviation of 2.
2. Plot a histogram of the samples and calculate the sample mean and variance of the distribution.
3. Repeat this process for sample sizes of 10000 and 1000000.

Observe how the shape of the histogram changes as the sample size increases. What patterns or trends do you notice in the calculated sample mean and variance?

#### Q2. (Optional)
Let $ X \sim \mathcal{N}(0,1) $ be a standard Gaussian random variable. Define a new random
variable $ Y = 4X + 10 $. What is the distribution of $ Y $? Verify your answer by plotting a histogram of the random variable $Y$. You can use `torch.distributions` to generate samples of the Gaussian random variable.

#### Q3. (Optional)

Let $X \sim \mathcal{N}(\mu_1, \sigma_1^2)$ with $\mu_1 = 3$ and $\sigma_1 = 2$, and let
$Y \sim \mathcal{N}(\mu_2, \sigma_2^2)$ with $\mu_2 = -1$ and $\sigma_2 = 1$, where $X$ and $Y$ are independent. Define a new random variable
$Z = X + Y$.
1. Determine the theoretical mean and variance of Z. Write this in a markdown cell.
2. Verify the theoretical mean and variance that you get by generating 10,000 samples for 𝑋 and 𝑌 using `torch.distributions` and computing their sum to get 𝑍. Print the empirical mean and variance of 𝑍.
3. Plot histogram of the distribution of random variable 𝑍.
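
For a sum of independent Gaussians, the means add and the variances add, so here E[Z] = 3 + (−1) = 2 and Var(Z) = 2² + 1² = 5. The sketch below checks this empirically using the standard library's `random.gauss` as a stand-in for `torch.distributions.Normal` (whose `scale` argument is likewise a standard deviation, not a variance). The seed is an arbitrary assumption.

```python
import random
import statistics

rng = random.Random(0)  # fixed (arbitrary) seed
n = 10_000

mu1, sigma1 = 3.0, 2.0
mu2, sigma2 = -1.0, 1.0

# Draw X and Y independently, then add them sample by sample.
z = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

theory_mean = mu1 + mu2              # 2
theory_var = sigma1**2 + sigma2**2   # 5, valid because X and Y are independent

print(f"mean: empirical {statistics.fmean(z):.3f}, theoretical {theory_mean}")
print(f"var:  empirical {statistics.variance(z):.3f}, theoretical {theory_var}")
```

A common slip is to add the standard deviations (2 + 1 = 3) instead of the variances; the empirical variance near 5 rules that out.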


# 17. **Transformation of Random Variables**

#### Q1. (**Mandatory**)
Write a function that performs the following tasks:

1. Generate 10,000 samples from a uniform distribution in the range [0, 1].
2. Apply a linear transformation Y = 2X + 3 to the generated samples.
3. Plot the histogram of the original samples and the transformed samples side by side.
4. Calculate and print the mean and variance of both the original and transformed distributions.

#### Q2. (Optional)
Let X be a continuous random variable following a Uniform(0,1) distribution. Define a new random variable: Y=−ln(X)

(a) Generate 10,000 samples of **X** and compute **Y**.

(b) Plot the histogram of **Y**, and compare it with the probability density function (PDF) of an exponential distribution with parameter λ=1, since P(Y>y)=P(X<e^{−y})=e^{−y} for y≥0.

#### Q3. (Optional)
#### Combined Transformation Involving a Logarithmic Function

Let $X \sim \mathcal{N}(0,1)$ and define a new random variable $Z = \ln(|X| + 1)$.

1. Use `torch.distributions` to simulate 10,000 samples of $X$, compute $Z$, and estimate its empirical distribution.
2. Plot the histogram of $Z$ and compare it with the histogram of $X$.


# 18. **Generating Random Numbers**

#### Q1. (**Mandatory**)

(a) Generate 10,000 samples from a uniform distribution U(0,1).

(b) Transform these samples to generate exponential random variables with parameter λ = 2.

(c) Plot a histogram of the generated exponential random variables.

#### Q2. (Optional)
Given an exponential distribution with rate parameter λ=3, write a Python function that:

(a) Generates a single exponential random variable using inverse transform sampling.

(b) Uses the transformation $X = -\frac{1}{\lambda}\ln(1-U)$, where U is a uniform random number between 0 and 1, and returns the generated exponential random variable.

#### Q3. (Optional)
Write a Python function that:

(a) Generates N random samples from a uniform distribution U(0,1) and transforms the samples into exponential random variables with rate λ=5 using inverse transform sampling.

(b) Applies a nonlinear transformation Z = X² to each generated exponential random variable X.

(c) Calculates and returns the following:

* (i) The proportion of values in Z that exceed a threshold t (provided as a function parameter).

* (ii) The sample mean and sample variance of both X and Z.
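
Inverse transform sampling works because the exponential CDF F(x) = 1 − e^(−λx) can be solved for x, giving x = −ln(1 − u)/λ for a uniform u. The sketch below applies this to Q3 with the standard library only; the function names, the dictionary return format, the seed, and the threshold t = 0.04 in the usage line are all assumptions for illustration.

```python
import math
import random
import statistics


def exponential_inverse_transform(lam: float, rng: random.Random) -> float:
    """Draw X ~ Exponential(lam) as X = -(1/lam) * ln(1 - U), U ~ U(0, 1)."""
    u = rng.random()              # u is in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / lam


def squared_exponential_stats(n: int, t: float, lam: float = 5.0, seed: int = 0):
    """Sample X, form Z = X**2, and summarise both (seed is an arbitrary default)."""
    rng = random.Random(seed)
    x = [exponential_inverse_transform(lam, rng) for _ in range(n)]
    z = [v * v for v in x]
    return {
        "prop_z_gt_t": sum(v > t for v in z) / n,
        "x_mean": statistics.fmean(x),
        "x_var": statistics.variance(x),
        "z_mean": statistics.fmean(z),
        "z_var": statistics.variance(z),
    }


stats = squared_exponential_stats(n=10_000, t=0.04)
print(stats)
```

As a sanity check for λ = 5: E[X] = 1/λ = 0.2, E[Z] = E[X²] = 2/λ² = 0.08, and P(Z > 0.04) = P(X > 0.2) = e^(−1) ≈ 0.368, so the printed values should sit near these.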