# Theory Questions

1. What is a random variable in probability theory?
- A random variable is a function that assigns a numerical value to each outcome of a random experiment.
  Example: Rolling a die → Random variable X = number showing (1 to 6).

2. What are the types of random variables?
- Discrete – Takes countable values (e.g., number of heads in 5 coin tosses).
- Continuous – Takes any value within a range (e.g., height, weight, temperature).

3. What is the difference between discrete and continuous distributions?
- Discrete Distribution: Probability is assigned to exact values.
- Continuous Distribution: Probability is described over intervals using a density function; individual values have zero probability.

4. What are probability distribution functions (PMF and PDF)?
- For discrete variables: the Probability Mass Function (PMF) gives P(X = x).
- For continuous variables: the Probability Density Function (PDF) is a curve showing the relative likelihood of values; the area under the curve over an interval gives the probability of that interval.

5. How do cumulative distribution functions (CDF) differ from PDFs?
- The CDF gives the probability that a random variable is less than or equal to a value: P(X ≤ x).
  In continuous cases it is the integral (area under the curve) of the PDF up to x; in discrete cases it is the running sum of the PMF.
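
As a rough numerical check of this relationship, the sketch below integrates the standard normal PDF with a simple midpoint rule and compares the result with the CDF. The helper name `area_under_pdf`, the lower cutoff of -8, and the step count are assumptions chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_under_pdf(upper, lower=-8.0, steps=100_000):
    """Approximate the integral of the PDF from `lower` to `upper` (midpoint rule)."""
    width = (upper - lower) / steps
    total = sum(std_normal.pdf(lower + (i + 0.5) * width) for i in range(steps))
    return total * width

x = 1.0
area = area_under_pdf(x)
print(f"Area under PDF up to {x}: {area:.6f}")
print(f"CDF at {x}:               {std_normal.cdf(x):.6f}")
```

The two printed numbers should agree to several decimal places, since the probability mass below -8 is negligible.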



6. What is a discrete uniform distribution?
- All outcomes are equally likely.
Example: Fair die → P(X = x) = 1/6 for x = 1 to 6.

7. What are the key properties of a Bernoulli distribution?

- Two outcomes: success (1) and failure (0).
- One trial only.
- Mean = p, Variance = p(1 – p).
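
The mean and variance formulas can be checked directly from the definition of expectation. This is a minimal sketch; the value p = 0.3 is an assumed example, not something fixed by the question.

```python
p = 0.3  # hypothetical success probability
pmf = {0: 1 - p, 1: p}  # P(X = 0) and P(X = 1)

# Mean and variance computed from the definition, not from the shortcut formulas
mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

print(f"Mean: {mean:.2f} (formula p = {p:.2f})")
print(f"Variance: {variance:.2f} (formula p(1 - p) = {p * (1 - p):.2f})")
```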

8. What is the binomial distribution, and how is it used in probability?

- Describes the number of successes in n independent Bernoulli trials that share the same success probability.

  Parameters: n (trials), p (success probability).

  Used in scenarios like "What’s the probability of getting 3 heads in 5 coin tosses?"
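
The coin-toss question can be answered with the binomial formula P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). The sketch below assumes a fair coin (p = 0.5); `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 5 tosses of a fair coin
prob = binomial_pmf(3, 5, 0.5)
print(f"P(X = 3) = {prob}")  # 10 * 0.5**5 = 0.3125
```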

9. What is the Poisson distribution and where is it applied?

- Models the number of events occurring in a fixed interval (time, space) when events happen independently and at a constant average rate.
  Example: Number of calls at a call center per hour.



10. What is a continuous uniform distribution?

- Every value in an interval [a, b] has the same density, so sub-intervals of equal length are equally likely.
  PDF: f(x) = 1 / (b - a) for a ≤ x ≤ b, and 0 elsewhere.

11. What are the characteristics of a normal distribution?

- Bell-shaped, symmetric around the mean.

  Defined by mean (μ) and standard deviation (σ).

  Follows the empirical rule:

  68% within 1σ,

  95% within 2σ,

  99.7% within 3σ.
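
The empirical-rule percentages are rounded values of P(|Z| < k) for a standard normal variable. A short sketch using the error function, where `prob_within` is an illustrative helper name:

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k}σ: {prob_within(k):.4f}")
```

The printed values (roughly 0.6827, 0.9545, 0.9973) show that 68%, 95%, and 99.7% are approximations.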

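12. What is the standard normal distribution, and why is it important?

- A normal distribution with μ = 0, σ = 1.

  Any normal variable can be converted to it with a Z-score, which makes it possible to compare values from different normal distributions and to compute probabilities from a single reference distribution.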



13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?
- For large n, the sampling distribution of the mean of independent, identically distributed observations approaches a normal distribution, regardless of the shape of the population distribution (provided its variance is finite).
- It enables inference about the mean using the normal model.

14. How does the Central Limit Theorem relate to the normal distribution?
- It justifies using the normal distribution for hypothesis testing and confidence intervals when sample sizes are large.

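15. What is the application of Z statistics in hypothesis testing?
- Used to compare a sample statistic (such as the sample mean) to a hypothesized population parameter, typically when the population standard deviation is known or the sample is large.
  Helps determine how extreme a sample result is under the null hypothesis.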

16.  How do you calculate a Z-score, and what does it represent?

- Formula: Z = (X - μ) / σ
It tells how many standard deviations a value X is from the mean μ.

17. What are point estimates and interval estimates in statistics?

- Point estimate: A single best guess (e.g., sample mean).
- Interval estimate: A range (like a confidence interval) where the true parameter likely lies.

18. What is the significance of confidence intervals in statistical analysis?

- Express uncertainty in estimates.
  Example: A 95% CI means that if we repeated the study many times, about 95% of the intervals built this way would contain the true value.
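
The repeated-study interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is normal with a known standard deviation, and the true mean, sample size, and number of repetitions are hypothetical values chosen for the example.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population and study design
TRUE_MEAN, SIGMA, N, REPEATS = 50.0, 8.0, 40, 2000

z_crit = NormalDist().inv_cdf(0.975)    # about 1.96 for a 95% interval
half_width = z_crit * SIGMA / N ** 0.5  # same width every time because sigma is known

hits = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = mean(sample)
    if m - half_width <= TRUE_MEAN <= m + half_width:
        hits += 1

coverage = hits / REPEATS
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure, not one interval.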

19.  How are Z-scores used to compare different distributions?

- Convert values from different distributions to a standard normal scale, allowing for direct comparison.
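
As an illustration, the sketch below compares two exam scores from classes with different means and spreads. All of the numbers are hypothetical, and `z_score` is an illustrative helper name.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / std_dev

# Hypothetical exam results: score, class mean, class standard deviation
math_z = z_score(85, 75, 10)
physics_z = z_score(80, 70, 5)

print(f"Math Z-score: {math_z:.2f}")
print(f"Physics Z-score: {physics_z:.2f}")
```

Although the raw math score is higher, the physics score sits further above its class mean in standard-deviation units, so it is the stronger relative result.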

20. What are the assumptions for applying the Central Limit Theorem?

- Independence of samples.

- Identically distributed variables with finite variance.

- Large enough sample size (n ≥ 30 is a rule of thumb, more if skewed).

21. What is the concept of expected value in a probability distribution?

- Weighted average of all possible values of a random variable, with the probabilities as weights.
  Formula: E(X) = Σ [x × P(x)] for discrete; E(X) = ∫ x·f(x) dx for continuous.
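
For the discrete formula, a fair six-sided die gives a small worked example. The sketch uses exact fractions to avoid rounding; `die_pmf` is an illustrative name.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# E(X) = sum of x * P(x) over all faces
expected = sum(face * prob for face, prob in die_pmf.items())
print(f"E(X) = {expected} = {float(expected)}")  # 21/6 = 7/2 = 3.5
```

The expected value 3.5 is not itself a possible outcome; it is the long-run average over many rolls.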

22.  How does a probability distribution relate to the expected outcome of a random variable?

- The shape and probabilities in a distribution determine the expected (mean) value of the random variable.



# Practical Questions

1. Generate a random variable and display its value

In [None]:
import random

# Generate a random variable (example: integer from 1 to 6)
random_var = random.randint(1, 6)
print(f"Random variable value: {random_var}")


2. Discrete uniform distribution and plot PMF

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Discrete uniform distribution from 1 to 6
values = np.arange(1, 7)
probabilities = np.full_like(values, 1/len(values), dtype=np.float64)

# Plot PMF
plt.stem(values, probabilities)
plt.title("PMF of Discrete Uniform Distribution (1-6)")
plt.xlabel("Value")
plt.ylabel("Probability")
plt.grid(True)
plt.show()

3.  PDF of a Bernoulli distribution

In [None]:
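def bernoulli_pmf(x, p):
    """Return P(X = x) for a Bernoulli(p) variable.

    Bernoulli is discrete, so its "PDF" is a probability mass function (PMF).
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0  # any other value has zero probability

# Example usage
print(f"P(X=0): {bernoulli_pmf(0, 0.3)}")
print(f"P(X=1): {bernoulli_pmf(1, 0.3)}")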


4. Simulate binomial distribution (n=10, p=0.5) and plot

In [None]:
import numpy as np  # needed for np.arange in the histogram bins
from numpy.random import binomial
import matplotlib.pyplot as plt

n, p = 10, 0.5
samples = binomial(n, p, size=1000)

plt.hist(samples, bins=np.arange(n+2)-0.5, density=True, edgecolor='black')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of successes")
plt.ylabel("Relative frequency")  # density=True normalizes the bars
plt.grid(True)
plt.show()


5. Create and visualize a Poisson distribution

In [None]:
from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np

mu = 3
x = np.arange(0, 15)
pmf = poisson.pmf(x, mu)

plt.stem(x, pmf)
plt.title("Poisson Distribution (μ=3)")
plt.xlabel("k")
plt.ylabel("P(X=k)")
plt.grid(True)
plt.show()

6. Calculate and plot CDF of discrete uniform distribution

In [None]:
from scipy.stats import randint

a, b = 1, 6
x = np.arange(a, b+1)
cdf = randint.cdf(x, a, b+1)

plt.step(x, cdf, where='post')  # the CDF jumps at each value and stays flat until the next
plt.title("CDF of Discrete Uniform Distribution (1-6)")
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.grid(True)
plt.show()


7. Generate a continuous uniform distribution and visualize it

In [None]:
from scipy.stats import uniform

a, b = 0, 10
x = np.linspace(a, b, 1000)
pdf = uniform.pdf(x, loc=a, scale=b-a)

plt.plot(x, pdf)
plt.title("PDF of Continuous Uniform Distribution (0-10)")
plt.xlabel("x")
plt.ylabel("Density")
plt.grid(True)
plt.show()


8. Simulate normal distribution and plot histogram

In [None]:
mu, sigma = 0, 1
samples = np.random.normal(mu, sigma, 1000)

plt.hist(samples, bins=30, density=True, alpha=0.7, edgecolor='black')
plt.title("Normal Distribution Histogram")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid(True)
plt.show()


9. Calculate and plot Z-score

In [None]:
from scipy.stats import zscore

data = np.random.normal(loc=50, scale=10, size=100)
z_scores = zscore(data)

plt.plot(z_scores, marker='o', linestyle='None')
plt.axhline(0, color='red', linestyle='--')
plt.title("Z-scores of Data")
plt.xlabel("Index")
plt.ylabel("Z-score")
plt.grid(True)
plt.show()


10.  Central Limit Theorem (CLT) simulation from a non-normal distribution

In [None]:
# Simulate CLT using an exponential distribution
sample_means = []
for _ in range(1000):
    sample = np.random.exponential(scale=2.0, size=30)
    sample_means.append(np.mean(sample))

plt.hist(sample_means, bins=30, density=True, alpha=0.7, edgecolor='black')
plt.title("Central Limit Theorem (Sample Means from Exponential Distribution)")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.grid(True)
plt.show()


11. Simulate multiple samples from a normal distribution to verify CLT

In [None]:
import numpy as np
import matplotlib.pyplot as plt

means = []
for _ in range(1000):
    sample = np.random.normal(loc=5, scale=2, size=30)
    means.append(np.mean(sample))

plt.hist(means, bins=30, density=True, edgecolor='black')
plt.title("Central Limit Theorem Verification")
plt.xlabel("Sample Means")
plt.ylabel("Density")
plt.grid(True)
plt.show()


12.  Plot the standard normal distribution

In [None]:
from scipy.stats import norm

x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x, loc=0, scale=1)

plt.plot(x, pdf)
plt.title("Standard Normal Distribution (μ=0, σ=1)")
plt.xlabel("Z-score")
plt.ylabel("Density")
plt.grid(True)
plt.show()


13. Calculate probabilities using the binomial distribution

In [None]:
from scipy.stats import binom

n, p = 10, 0.5
x = np.arange(0, n+1)
pmf = binom.pmf(x, n, p)

for i in range(len(x)):
    print(f"P(X={x[i]}) = {pmf[i]:.4f}")


14. Calculate Z-score and compare to standard normal

In [None]:
value = 75
mean = 70
std_dev = 10

z = (value - mean) / std_dev
print(f"Z-score: {z:.2f}")

# Probability to the left of the Z-score
prob = norm.cdf(z)
print(f"P(X ≤ {value}) = {prob:.4f}")


15. Hypothesis testing using Z-statistics

In [None]:
# H0: μ = 100, H1: μ ≠ 100
sample = np.random.normal(105, 10, 30)
sample_mean = np.mean(sample)
population_mean = 100
std_dev = 10
n = len(sample)

z = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
p_value = 2 * norm.sf(abs(z))  # two-sided p-value; sf(z) = 1 - cdf(z)

print(f"Z = {z:.3f}, p-value = {p_value:.4f}")


16. Confidence interval for a dataset

In [None]:
from scipy.stats import norm

data = np.random.normal(50, 8, 100)
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))

conf_int = norm.interval(0.95, loc=mean, scale=std_err)
print(f"95% Confidence Interval: {conf_int}")


17. CI from normal data and interpretation

In [None]:
data = np.random.normal(70, 12, 60)
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))

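ci = norm.interval(0.95, loc=mean, scale=std_err)
print(f"95% CI for the mean: {ci}")
# Interpretation: intervals built this way from repeated samples would
# contain the true mean (70 in this simulation) about 95% of the time.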


18. Visualize the PDF of a normal distribution

In [None]:
mu, sigma = 50, 10
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)
pdf = norm.pdf(x, mu, sigma)

plt.plot(x, pdf)
plt.title("Normal Distribution PDF")
plt.xlabel("x")
plt.ylabel("Density")
plt.grid(True)
plt.show()


19.  CDF of a Poisson distribution

In [None]:
from scipy.stats import poisson

mu = 4
x = np.arange(0, 15)
cdf = poisson.cdf(x, mu)

plt.step(x, cdf, where='post')  # the CDF jumps at each integer and stays flat until the next
plt.title("Poisson CDF (μ=4)")
plt.xlabel("x")
plt.ylabel("Cumulative Probability")
plt.grid(True)
plt.show()


20. Continuous uniform variable and expected value

In [None]:
from scipy.stats import uniform

a, b = 5, 15
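samples = uniform.rvs(loc=a, scale=b-a, size=1000)
expected = np.mean(samples)   # sample mean estimates E(X)
theoretical = (a + b) / 2     # E(X) for a continuous uniform on [a, b]

print(f"Expected value from simulation: {expected:.2f}")
print(f"Theoretical expected value: {theoretical:.2f}")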


21. Compare standard deviations and visualize

In [None]:
data1 = np.random.normal(50, 5, 100)
data2 = np.random.normal(50, 15, 100)

std1 = np.std(data1, ddof=1)  # ddof=1 gives the sample standard deviation
std2 = np.std(data2, ddof=1)

plt.boxplot([data1, data2])
plt.xticks([1, 2], ["Low SD", "High SD"])  # boxes sit at positions 1 and 2 by default
plt.title("Comparison of Standard Deviations")
plt.ylabel("Values")
plt.grid(True)
plt.show()

print(f"Standard Deviation 1: {std1:.2f}, Standard Deviation 2: {std2:.2f}")


22. Range and IQR of normal data

In [None]:
data = np.random.normal(100, 20, 1000)

data_range = np.max(data) - np.min(data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"Range: {data_range:.2f}")
print(f"IQR: {iqr:.2f}")


23. Z-score normalization and plot

In [None]:
data = np.random.normal(60, 10, 100)
z_scores = zscore(data)

plt.plot(data, label="Original")
plt.plot(z_scores, label="Z-score Normalized")
plt.legend()
plt.title("Z-score Normalization")
plt.grid(True)
plt.show()


24.  Skewness and kurtosis of normal data

In [None]:
from scipy.stats import skew, kurtosis

data = np.random.normal(0, 1, 1000)

print(f"Skewness: {skew(data):.2f}")
print(f"Kurtosis: {kurtosis(data):.2f}")  # excess kurtosis (normal = 0)
