# Statistics Advanced - Assignment

---


## Question 1
**What is a random variable in probability theory?**

**Answer:**
A random variable is a function that assigns a numerical value to each outcome of a random experiment. It can be discrete (countably many possible values) or continuous (any value in an interval). Examples: the number of heads in 3 coin tosses (discrete); the height of a person (continuous).
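To make the "function on outcomes" idea concrete, here is a minimal sketch of the coin-toss example using only the standard library. The names `outcomes`, `num_heads`, and `pmf` are illustrative choices, not part of the assignment.

```python
from collections import Counter
from itertools import product

# Sample space for 3 coin tosses: all 8 equally likely sequences of H and T.
outcomes = list(product("HT", repeat=3))


def num_heads(outcome):
    """The random variable X: maps one outcome to a number (its count of heads)."""
    return outcome.count("H")


# Distribution of X: count how many outcomes map to each value, then normalize.
counts = Counter(num_heads(o) for o in outcomes)
pmf = {k: counts[k] / len(outcomes) for k in sorted(counts)}

for k, prob in pmf.items():
    print(f"P(X = {k}) = {prob}")
```

The outcomes themselves are sequences such as `('H', 'T', 'H')`; the random variable is what turns each one into a number.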


## Question 2
**What are the types of random variables?**

**Answer:**
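1. Discrete random variables: take countably many values and are described by a probability mass function (PMF).
2. Continuous random variables: take values over intervals and are described by a probability density function (PDF).
3. Mixed random variables: combine discrete and continuous components.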
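The mixed case is the least familiar of the three, so a small simulation may help. The sketch below assumes a hypothetical waiting time that is exactly 0 with probability 0.3 and exponentially distributed otherwise; the function name and parameter values are made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible


def draw_wait_time(p_zero=0.3, rate=1.0):
    """Hypothetical mixed variable: exactly 0 with probability p_zero,
    otherwise an exponential waiting time with the given rate."""
    if rng.random() < p_zero:
        return 0.0
    return rng.expovariate(rate)


samples = [draw_wait_time() for _ in range(10_000)]

# Discrete component: a positive fraction of draws sit on the single point 0.
frac_zero = sum(1 for x in samples if x == 0.0) / len(samples)

# Continuous component: the remaining draws spread over the interval (0, inf).
positive = [x for x in samples if x > 0.0]
mean_positive = sum(positive) / len(positive)

print(f"Fraction exactly 0 (point mass): {frac_zero:.3f}")
print(f"Mean of the positive part:       {mean_positive:.3f}")
```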


## Question 3
**Explain the difference between discrete and continuous distributions.**

**Answer:**
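Discrete distributions assign probability to individual points, and those probabilities sum to 1. Continuous distributions are described by a probability density function (PDF): any single point has probability 0, and the probability of an interval is the integral of the PDF over that interval.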
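As a quick check of both statements, the sketch below sums a fair-die PMF and computes an interval probability for a standard normal in two ways: from the CDF and from a numerical integral of the PDF. The die and the interval [-1, 1] are illustrative choices.

```python
from statistics import NormalDist

# Discrete: a fair die puts probability 1/6 on each point, and the PMF sums to 1.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
pmf_total = sum(die_pmf.values())

# Continuous: P(-1 <= Z <= 1) for a standard normal is the area under the PDF.
Z = NormalDist(mu=0, sigma=1)
p_exact = Z.cdf(1) - Z.cdf(-1)

# The same area approximated by a midpoint Riemann sum of the PDF.
steps = 1000
width = 2 / steps
p_riemann = sum(Z.pdf(-1 + (i + 0.5) * width) * width for i in range(steps))

print(f"Sum of die PMF       = {pmf_total:.4f}")
print(f"P(-1 <= Z <= 1), CDF = {p_exact:.4f}")
print(f"P(-1 <= Z <= 1), sum = {p_riemann:.4f}")
```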


## Question 4
**What is a binomial distribution, and how is it used in probability?**

**Answer:**
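The binomial distribution models the number of successes in $n$ independent Bernoulli trials, each with success probability $p$. It is used whenever a fixed number of independent yes/no trials is counted, such as defective items in a batch.
PMF: $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$. Mean $= np$, variance $= np(1-p)$.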
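The PMF and the two moment formulas can be checked numerically. This is a small sketch with illustrative values $n = 10$ and $p = 0.3$; `binom_pmf` is a helper defined here, not a library function.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative values

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the PMF.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"Sum of PMF = {sum(pmf):.4f}")
print(f"Mean       = {mean:.4f}  (formula np      = {n * p:.4f})")
print(f"Variance   = {variance:.4f}  (formula np(1-p) = {n * p * (1 - p):.4f})")
```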


## Question 5
**What is the standard normal distribution, and why is it important?**

**Answer:**
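The standard normal is $N(0,1)$, with PDF $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. Any normal variable $X \sim N(\mu, \sigma^2)$ can be standardized with $Z = (X - \mu)/\sigma$, so one table or function covers every normal distribution. It also appears throughout inference because of the CLT.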
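Standardizing in practice is a one-line computation. The sketch below assumes illustrative parameters (mean 50, standard deviation 5) and checks that the probability obtained through the z-score matches the one computed directly.

```python
from statistics import NormalDist

mu, sigma = 50, 5  # illustrative parameters
x = 60

# Standardize: how many standard deviations x lies above the mean.
z = (x - mu) / sigma

# P(X <= 60) computed through the standard normal and directly.
p_via_z = NormalDist().cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z}")
print(f"P(Z <= z)  = {p_via_z:.4f}")
print(f"P(X <= 60) = {p_direct:.4f}")
```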


## Question 6
**What is the Central Limit Theorem (CLT), and why is it critical in statistics?**

**Answer:**
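CLT: for independent, identically distributed observations with mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of the sample mean approaches $N(\mu, \sigma^2/n)$ as the sample size $n$ increases, regardless of the shape of the original distribution. This justifies normal-based inference for large samples.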
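A simulation makes the theorem easier to believe. The sketch below draws repeated samples from an exponential distribution, which is strongly skewed, and checks that the sample means behave like a normal variable with standard deviation $\sigma/\sqrt{n}$. The sample size, repetition count, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50                  # size of each sample
reps = 2000             # number of repeated samples

# Exponential(1) has mean 1 and standard deviation 1, and is far from normal.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_se = 1.0 / sqrt(n)

# For a normal distribution, about 95% of values fall within 1.96 SDs of the mean.
frac_within = sum(
    1 for m in sample_means if abs(m - 1.0) <= 1.96 * theoretical_se
) / reps

print(f"Mean of sample means = {mean_of_means:.4f}  (theory: 1.0)")
print(f"SD of sample means   = {sd_of_means:.4f}  (theory: {theoretical_se:.4f})")
print(f"Fraction within 1.96 SE of the true mean = {frac_within:.3f}")
```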


## Question 7
**What is the significance of confidence intervals in statistical analysis?**

**Answer:**
A confidence interval gives a range of plausible values for a population parameter with an associated confidence level (e.g., 95%). It communicates uncertainty around an estimate.
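The confidence level describes the procedure rather than any single interval: over many repeated samples, about 95% of the intervals built this way should contain the true parameter. The sketch below illustrates this under assumed settings (a normal population with mean 50 and standard deviation 5, samples of size 100, and the normal critical value 1.96).

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(1)      # fixed seed so the run is reproducible
TRUE_MEAN, TRUE_SD = 50, 5  # assumed population parameters
n, reps, z = 100, 1000, 1.96


def ci_covers_truth():
    """Draw one sample, build a 95% CI for the mean, report if it contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    center = fmean(sample)
    half_width = z * stdev(sample) / sqrt(n)
    return center - half_width <= TRUE_MEAN <= center + half_width


coverage = sum(ci_covers_truth() for _ in range(reps)) / reps
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```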


## Question 8
**What is the concept of expected value in a probability distribution?**

**Answer:**
Expected value is the long-run average of a random variable: $E[X] = \sum_x x \, P(X=x)$ for a discrete variable, and $E[X] = \int x \, f(x) \, dx$ for a continuous one.
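Both formulas can be evaluated directly. The sketch below uses a fair die for the discrete case and a Uniform(0, 2) density for the continuous case; both distributions are illustrative choices, and the integral is approximated with a midpoint Riemann sum.

```python
from fractions import Fraction

# Discrete: E[X] = sum of x * P(X = x) for a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected_die = sum(x * p for x, p in die_pmf.items())


# Continuous: E[X] = integral of x * f(x) dx for Uniform(0, 2), where f(x) = 1/2.
def uniform_pdf(x, a=0.0, b=2.0):
    return 1 / (b - a) if a <= x <= b else 0.0


steps = 10_000
width = 2.0 / steps
expected_uniform = sum(
    (i + 0.5) * width * uniform_pdf((i + 0.5) * width) * width
    for i in range(steps)
)

print(f"E[die]          = {expected_die} = {float(expected_die)}")
print(f"E[Uniform(0,2)] = {expected_uniform:.4f}")
```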


## Question 9
**Write Python code to generate 1000 random numbers from a normal distribution with mean = 50 and standard deviation = 5, compute the sample mean and standard deviation, and draw a histogram.**

**Answer:**


```python
# Question 9
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # fixed seed so the results are reproducible
samples = np.random.normal(loc=50, scale=5, size=1000)
mean_samples = np.mean(samples)
std_samples = np.std(samples, ddof=1)  # ddof=1 gives the sample standard deviation

print(f'Mean of generated samples = {mean_samples:.4f}')
print(f'Sample standard deviation = {std_samples:.4f}')

plt.figure(figsize=(7,4))
plt.hist(samples, bins=30, edgecolor='black')
plt.title('Histogram of 1000 samples from N(50, 5^2)')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.show()
```


## Question 10
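**Daily sales data (20 observations) are provided. (a) Explain how to apply the CLT to construct a 95% confidence interval for the mean. (b) Write Python code to compute the mean and the 95% CI and draw a simple trend plot.**

**Answer:**
(a) By the CLT, the sample mean $\bar{x}$ is approximately normal with standard error $s/\sqrt{n}$, where $s$ is the sample standard deviation. An approximate 95% CI is therefore $\bar{x} \pm 1.96 \, s/\sqrt{n}$. With only $n = 20$ observations, this is a rough approximation.

(b) The code below computes the mean and the interval and plots the series.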


```python
# Question 10
import math

import numpy as np
import matplotlib.pyplot as plt

daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
               235, 260, 245, 250, 225, 270, 265, 255, 250, 260]

n = len(daily_sales)
mean_sales = np.mean(daily_sales)
s = np.std(daily_sales, ddof=1)  # sample standard deviation
z = 1.96  # normal critical value for a 95% CI (CLT approximation)
se = s / math.sqrt(n)  # standard error of the mean
ci_lower = mean_sales - z * se
ci_upper = mean_sales + z * se

print('n =', n)
print(f'Sample mean = {mean_sales:.4f}')
print(f'Sample std dev = {s:.4f}')
print(f'95% CI for mean = ({ci_lower:.4f}, {ci_upper:.4f})')

plt.figure(figsize=(8,3))
plt.plot(daily_sales, marker='o', linestyle='-')
plt.title('Daily sales (20 observations)')
plt.xlabel('Day index')
plt.ylabel('Sales')
plt.grid(alpha=0.5)
plt.show()
```
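With only 20 observations, the normal critical value 1.96 tends to understate the uncertainty. A common alternative is Student's t distribution with $n - 1 = 19$ degrees of freedom. The sketch below compares the two intervals using only the standard library. The t critical value 2.093 is copied from a standard t table rather than computed, which is an assumption of this example.

```python
from math import sqrt
from statistics import fmean, stdev

daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
               235, 260, 245, 250, 225, 270, 265, 255, 250, 260]

n = len(daily_sales)
mean_sales = fmean(daily_sales)
se = stdev(daily_sales) / sqrt(n)  # stdev() is the sample SD (n - 1 divisor)

Z_CRIT = 1.96   # normal critical value for 95%
T_CRIT = 2.093  # t critical value for 95% with 19 df, hardcoded from a t table

z_ci = (mean_sales - Z_CRIT * se, mean_sales + Z_CRIT * se)
t_ci = (mean_sales - T_CRIT * se, mean_sales + T_CRIT * se)

print(f"Sample mean        = {mean_sales:.2f}")
print(f"Standard error     = {se:.4f}")
print(f"95% CI (normal, z) = ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"95% CI (t, 19 df)  = ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
```

The t-based interval is slightly wider, which reflects the extra uncertainty from estimating the standard deviation with a small sample.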


---

*End of solutions.*