# Theoretical Questions

What is a random variable in probability theory?
- A random variable is a function that assigns a real number to each possible outcome of a random experiment. Its value is uncertain until the experiment is run, which is why it is often described informally as the numerical outcome of a random phenomenon.
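To make the "function on outcomes" idea concrete, here is a minimal sketch based on two fair coin flips. The names `sample_space`, `num_heads`, and `distribution` are illustrative choices, not part of any library.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: tuples such as ('H', 'T').
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """Random variable X: maps an outcome to the number of heads in it."""
    return outcome.count("H")

# Each outcome is equally likely, so it contributes 1/4 to the value X maps it to.
distribution = {}
for outcome in sample_space:
    x = num_heads(outcome)
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(sample_space))

for x in sorted(distribution):
    print(f"P(X={x}) = {distribution[x]}")
```

The random variable itself is just `num_heads`; the probabilities come from the underlying experiment, not from the function.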

What are the types of random variables?
- There are two main types of random variables:

   - Discrete Random Variable: A variable whose possible values are finite or countably infinite (e.g., number of heads in coin flips, number of cars passing a point).

   - Continuous Random Variable: A variable whose possible values can take any value within a given range (e.g., height, weight, temperature).

What is the difference between discrete and continuous distributions?
- The core difference lies in the nature of the outcomes:

   - Discrete Distributions: Deal with discrete random variables. Probabilities are assigned to specific, distinct values.

   - Continuous Distributions: Deal with continuous random variables. Probabilities are assigned to ranges of values, and the probability of any single exact value is zero.

What are probability distribution functions (PDF)?
- For a continuous random variable, PDF stands for Probability Density Function. It gives the relative likelihood (density) of the variable near a given value, and the probability of a range is the area under the PDF over that range. For a discrete random variable, the counterpart is the Probability Mass Function (PMF), which gives the probability of each specific outcome.

How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?
- CDF vs. PDF:

   - Probability Density Function (PDF), or PMF for discrete variables: Gives the probability of a specific value (discrete) or the probability density at a specific point (continuous). A density is not itself a probability.

   - Cumulative Distribution Function (CDF): Gives the probability that a random variable takes a value less than or equal to a given value. It is the running sum of the PMF (discrete) or the integral of the PDF (continuous) up to that point.
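The contrast between point values and cumulative values can be sketched with the standard library alone. The fair die and the standard normal below are assumed examples, and `die_pmf` and `die_cdf` are hypothetical helper names.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: PMF of a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def die_cdf(x):
    """P(X <= x): add up the PMF over every face not exceeding x."""
    return sum(prob for face, prob in die_pmf.items() if face <= x)

# Continuous case: standard normal distribution.
z = NormalDist(mu=0, sigma=1)
density_at_0 = z.pdf(0)                   # a density, not a probability
prob_within_1 = z.cdf(1) - z.cdf(-1)      # probability of the range [-1, 1]

print("P(die <= 3) =", die_cdf(3))
print("Normal density at 0:", round(density_at_0, 4))
print("P(-1 <= Z <= 1):", round(prob_within_1, 4))
```

Probabilities for a continuous variable always come from differences of the CDF over a range, never from the PDF value at a single point.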


What is a discrete uniform distribution?
- A discrete uniform distribution is a probability distribution over a finite set of outcomes in which every outcome is equally likely. For example, when rolling a fair six-sided die, each face has probability 1/6.

What are the key properties of a Bernoulli distribution?
- The Bernoulli distribution models a single trial with two possible outcomes: success (with probability p) or failure (with probability 1−p). Key properties:

   - Only two outcomes: 0 (failure) or 1 (success).
   - Parameter: p (probability of success).
   - Expected Value: E[X]=p.
   - Variance: Var(X)=p(1−p).
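The mean and variance formulas can be checked directly from the definition of expectation. This is a small sketch with an assumed value p = 0.3; the variable names are illustrative.

```python
p = 0.3  # assumed probability of success

# Bernoulli PMF: P(X=0) = 1 - p, P(X=1) = p.
pmf = {0: 1 - p, 1: p}

# E[X] is the probability-weighted sum of the outcomes.
mean = sum(k * prob for k, prob in pmf.items())

# Var(X) is the probability-weighted squared distance from the mean.
variance = sum((k - mean) ** 2 * prob for k, prob in pmf.items())

print("E[X]   =", mean, "vs p        =", p)
print("Var(X) =", variance, "vs p(1 - p) =", p * (1 - p))
```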

What is the binomial distribution, and how is it used in probability?
- The binomial distribution describes the number of successes in a fixed number n of independent Bernoulli trials that all share the same success probability p. It's used when you have a set number of attempts, and each attempt has only two outcomes. For example, the number of heads in 10 coin flips.
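For the coin-flip example, the probability of exactly k successes is C(n, k) · p^k · (1 − p)^(n − k). The sketch below evaluates that formula with the standard library; `binom_pmf` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # ten fair coin flips
p_five_heads = binom_pmf(5, n, p)
total = sum(binom_pmf(k, n, p) for k in range(n + 1))

print(f"P(exactly 5 heads in 10 flips) = {p_five_heads:.4f}")
print(f"Sum of the PMF over k = 0..{n}: {total:.4f}")
```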

What is the Poisson distribution and where is it applied?
- The Poisson distribution models the number of events occurring in a fixed interval of time or space, given a constant average rate of occurrence and that these events happen independently. It's applied in scenarios like counting the number of calls a call center receives in an hour, or the number of defects per square meter of fabric.
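For a rate of λ events per interval, the Poisson probability of seeing exactly k events is e^(−λ) · λ^k / k!. The sketch below applies this to the call-center example with an assumed rate of 3 calls per hour; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) when events occur independently at average rate lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3  # assumed average number of calls per hour

p_no_calls = poisson_pmf(0, lam)
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))

print(f"P(no calls in an hour)      = {p_no_calls:.4f}")
print(f"P(at most 2 calls in an hour) = {p_at_most_two:.4f}")
```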

What is a continuous uniform distribution?
- A continuous uniform distribution assigns equal probability density to all values within a given interval. For example, a random number generator producing values between 0 and 1.

What are the characteristics of a normal distribution?
- The normal distribution (also known as the Gaussian distribution) is a symmetrical, bell-shaped probability distribution.

- Key characteristics:

   - Symmetrical around its mean.
   - Mean, median, and mode are equal.
   - Tails extend to infinity in both directions and approach, but never touch, the x-axis.
   - Defined by two parameters: mean (μ) and standard deviation (σ).
   - The Empirical Rule states that approximately 68% of data falls within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
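The 68/95/99.7 figures can be reproduced from the normal CDF. This is a minimal sketch using `statistics.NormalDist`; the parameters μ = 50 and σ = 5 are arbitrary assumptions, since the percentages are the same for any normal distribution.

```python
from statistics import NormalDist

mu, sigma = 50, 5  # arbitrary example parameters
dist = NormalDist(mu, sigma)

# Probability mass within k standard deviations of the mean.
coverage = {
    k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    for k in (1, 2, 3)
}

for k, prob in coverage.items():
    print(f"Within {k} standard deviation(s): {prob:.4f}")
```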

What is the standard normal distribution, and why is it important?
- The standard normal distribution is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. It's important because any normal distribution can be transformed into a standard normal distribution using a Z-score, allowing for standardized comparisons and probability calculations.
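The transformation can be illustrated with one number. In this sketch a value from an assumed Normal(100, 15) distribution is standardized, and the probability computed on the original scale is compared with the one read from the standard normal.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # assumed parameters of the original distribution
x = 130

# Standardize: how many standard deviations x lies above the mean.
z = (x - mu) / sigma

p_original_scale = NormalDist(mu, sigma).cdf(x)  # P(X <= 130)
p_standard_scale = NormalDist(0, 1).cdf(z)       # P(Z <= z)

print("z =", z)
print("P(X <= 130) on the original scale:", round(p_original_scale, 5))
print("P(Z <= z) on the standard scale:  ", round(p_standard_scale, 5))
```

Both probabilities agree, which is why a single standard normal table (or function) serves every normal distribution.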

What is the Central Limit Theorem (CLT), and why is it critical in statistics?
- The Central Limit Theorem (CLT) states that, given a sufficiently large sample size from any population with a finite mean and variance, the sampling distribution of the sample mean will be approximately normally distributed. It's critical in statistics because it allows us to make inferences about population parameters using sample data, even if the population distribution isn't normal.

How does the Central Limit Theorem relate to the normal distribution?
- The Central Limit Theorem establishes that the sampling distribution of the sample mean tends toward a normal distribution as the sample size increases, regardless of the original population's distribution. This means we can use normal distribution properties for hypothesis testing and confidence intervals even when dealing with non-normal data, as long as the sample is large enough.
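A quick simulation can make this tangible. The sketch below draws repeated samples from a strongly skewed exponential population (mean 1, standard deviation 1) and checks that the sample means behave like a normal distribution with standard deviation 1/√n. The sample size, trial count, and seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
sample_size, trials = 50, 2000   # assumed simulation settings

def sample_mean(n):
    """Mean of n draws from an exponential population with rate 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

means = [sample_mean(sample_size) for _ in range(trials)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = 1 / sample_size**0.5  # population sd divided by sqrt(n)

# For a normal distribution, about 68% of values lie within 1 sd of the mean.
share_within_1sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / trials

print("Mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:  ", round(sd_of_means, 3), "expected", round(expected_sd, 3))
print("Share within 1 SD:   ", round(share_within_1sd, 3))
```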

What is the application of Z statistics in hypothesis testing?
- Z-statistics are used in hypothesis testing to measure how many standard errors an observed sample mean (or proportion) lies from the value stated in the null hypothesis. The calculated Z-statistic is compared to critical values from the standard normal distribution (or converted to a p-value) to decide whether to reject or fail to reject the null hypothesis.

How do you calculate a Z-score, and what does it represent?
- A Z-score (or standard score) measures how many standard deviations an individual data point is from the mean of its distribution. It's calculated as:

Z=(X−μ)/σ

Where:

- X = individual data point
- μ = population mean
- σ = population standard deviation

A Z-score represents the standardized position of a data point. A positive Z-score means the data point is above the mean, a negative Z-score means it's below the mean, and a Z-score of 0 means it's equal to the mean.

What are point estimates and interval estimates in statistics?
- Point Estimate: A single value used to estimate a population parameter (e.g., sample mean as an estimate of population mean).

- Interval Estimate: A range of values used to estimate a population parameter (e.g., a confidence interval). It provides a measure of the precision and uncertainty of the estimate.

What is the significance of confidence intervals in statistical analysis?
- Confidence intervals provide a range of plausible values for the true population parameter at a stated confidence level. A 95% confidence level means that, across repeated samples, about 95% of the intervals constructed this way would contain the true parameter. They quantify the uncertainty associated with a sample estimate and are crucial for making reliable inferences about populations.

What is the relationship between a Z-score and a confidence interval?
- Z-scores are integral to constructing confidence intervals. For a given confidence level (e.g., 95%), a corresponding Z-score (critical value) is used to define the boundaries of the interval around the point estimate. The Z-score tells us how many standard errors to extend from the sample mean to capture the true population mean with the desired confidence.
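The link between confidence level and critical value can be worked through numerically. In this sketch the summary statistics (`xbar`, `s`, `n`) are made-up illustrative numbers, and `z_critical` is a hypothetical helper built on `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_critical(confidence):
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_critical(level):.3f}")

# Illustrative (made-up) summary statistics for a sample.
xbar, s, n = 50.0, 5.0, 100

z95 = z_critical(0.95)
margin = z95 * s / n**0.5           # critical value times the standard error
ci = (xbar - margin, xbar + margin)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

A higher confidence level gives a larger critical value and therefore a wider interval for the same data.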

How are Z-scores used to compare different distributions?
- Z-scores allow for the comparison of data points from different distributions by standardizing them. By converting raw scores into Z-scores, you can see how many standard deviations each score is from its respective mean, making it possible to compare their relative positions, even if the original scales are different. For example, comparing a student's score in a math test to their score in a history test, where the tests might have different means and standard deviations.
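The test-score comparison can be worked through with concrete numbers. All scores, means, and standard deviations below are made-up values for illustration, and `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Made-up results for one student on two differently scaled tests.
math_z = z_score(85, mean=70, sd=10)
history_z = z_score(90, mean=80, sd=8)

print("Math z-score:   ", math_z)
print("History z-score:", history_z)

better = "math" if math_z > history_z else "history"
print("Stronger relative performance:", better)
```

Although the raw history score is higher, the math score sits further above its class average once both are expressed in standard deviations.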

What are the assumptions for applying the Central Limit Theorem?
- Key assumptions for applying the Central Limit Theorem:

   - Random Sampling: Samples must be randomly selected.
   - Independence: Observations within samples must be independent.
   - Sample Size: The sample size must be sufficiently large (n≥30 is a common rule of thumb; heavily skewed populations may need more).
   - Finite Variance: The population must have a finite mean and variance.

What is the concept of expected value in a probability distribution?
- The expected value (E[X]) of a random variable is the weighted average of all possible values the variable can take, with the weights being their respective probabilities. It represents the long-run average outcome if the experiment were repeated many times. It's a measure of the central tendency of a probability distribution.
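Both readings of expected value, the weighted average and the long-run average, can be shown with a fair die. This is a small sketch; the number of simulated rolls and the seed are arbitrary assumptions.

```python
import random
from fractions import Fraction

# Exact expected value: each face weighted by its probability of 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected_value = sum(face * prob for face, prob in pmf.items())

# Long-run average: the mean of many simulated rolls approaches E[X].
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
simulated_mean = sum(rolls) / len(rolls)

print("Exact E[X]:", expected_value, "=", float(expected_value))
print("Simulated mean over 100,000 rolls:", round(simulated_mean, 3))
```

Note that the expected value (3.5) is not itself a possible roll; it describes the average over many repetitions.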

How does a probability distribution relate to the expected outcome of a random variable?
- A probability distribution provides a complete picture of all possible outcomes of a random variable and their associated probabilities. The expected outcome (expected value) is a single, summary statistic derived from this distribution. It represents the theoretical average outcome of the random variable based on its probability distribution, essentially a long-term prediction of what to expect from the random process.


# Practical Questions

In [None]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# 1. Generate a random variable and display its value
rv = np.random.random()  # uniform on [0,1)
print("Random variable:", rv)

In [None]:
# 2. Discrete uniform distribution PMF and plot
from scipy.stats import randint
low, high = 1, 7  # faces 1..6 of a die; scipy's randint excludes the upper bound
du = randint(low, high)
x = np.arange(low, high)
pmf = du.pmf(x)
plt.bar(x, pmf, alpha=0.6)
plt.title("Discrete Uniform PMF")
plt.xlabel("x")
plt.ylabel("P(X=x)")
plt.show()

In [None]:
# 3. Bernoulli PDF (PMF) function
def bernoulli_pmf(p):
    def pmf(k):
        return stats.bernoulli.pmf(k, p)
    return pmf

p = 0.3
b_pmf = bernoulli_pmf(p)
print(f"Bernoulli PMF p={p}: P(0)={b_pmf(0):.2f}, P(1)={b_pmf(1):.2f}")

In [None]:
# 4. Simulate a binomial (n=10, p=0.5), histogram
n, p = 10, 0.5
data_bin = np.random.binomial(n, p, size=10000)
plt.hist(data_bin, bins=range(n+2), align='left', rwidth=0.8, alpha=0.6)
plt.title("Binomial (n=10, p=0.5)")
plt.xlabel("Number of successes")
plt.ylabel("Count")
plt.show()

In [None]:
# 5. Poisson distribution and visualization
mu = 3
data_pois = stats.poisson.rvs(mu, size=5000)
plt.hist(data_pois, bins=range(0, max(data_pois)+2), align='left', alpha=0.6)
plt.title("Poisson (μ=3)")
plt.xlabel("k")
plt.ylabel("Count")
plt.show()

# PMF visualization
kmax = np.arange(0, mu+10)
pmf_pois = stats.poisson.pmf(kmax, mu)
plt.scatter(kmax, pmf_pois, color='red')
plt.vlines(kmax, 0, pmf_pois, alpha=0.6)
plt.title("Poisson PMF")
plt.xlabel("k")
plt.ylabel("P(X=k)")
plt.show()

In [None]:
# 6. CDF of discrete uniform and plot
cdf = du.cdf(x)
plt.step(x, cdf, where='post')  # the CDF jumps at each x and stays flat until the next value
plt.title("Discrete Uniform CDF")
plt.xlabel("x")
plt.ylabel("F(x)")
plt.show()

In [None]:
# 7. Continuous uniform via NumPy and plot PDF histogram
cu = np.random.uniform(0, 5, size=100000)
plt.hist(cu, bins=50, density=True, alpha=0.6)
plt.title("Continuous Uniform [0,5]")
plt.xlabel("x")
plt.ylabel("Density")
plt.show()

In [None]:
# 8. Simulate normal data and histogram
norm_data = np.random.normal(loc=0, scale=1, size=10000)
plt.hist(norm_data, bins=50, density=True, alpha=0.6)
x = np.linspace(-4, 4, 200)
plt.plot(x, stats.norm.pdf(x, 0, 1), 'r--')
plt.title("Normal Distribution Sample + PDF")
plt.show()

In [None]:
# 9. Z-score function + plot
def z_scores(arr):
    mu, sigma = np.mean(arr), np.std(arr, ddof=0)
    return (arr - mu) / sigma

zs = z_scores(norm_data)
plt.hist(zs, bins=50, density=True, alpha=0.6)
plt.title("Z-Scores Histogram")
plt.xlabel("Z")
plt.ylabel("Density")
plt.show()

In [None]:
# 10. CLT demonstration with non-normal (exponential)
def simulate_clt(dist, args, sample_size=30, trials=5000):
    means = [np.mean(dist(*args, size=sample_size)) for _ in range(trials)]
    plt.hist(means, bins=30, density=True, alpha=0.6)
    mu, sigma = np.mean(means), np.std(means)
    x = np.linspace(mu-4*sigma, mu+4*sigma, 200)
    plt.plot(x, stats.norm.pdf(x, mu, sigma), 'r--')
    plt.title(f"CLT: {dist.__name__}, n={sample_size}")
    plt.show()

simulate_clt(np.random.exponential, (1,), sample_size=50)

In [None]:
# 11. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem
def simulate_clt_normal(mu=5, sigma=2, sample_size=30, trials=5000):
    samples = [np.mean(np.random.normal(mu, sigma, sample_size)) for _ in range(trials)]
    plt.hist(samples, bins=30, density=True, alpha=0.6)
    x = np.linspace(mu-4*sigma/np.sqrt(sample_size), mu+4*sigma/np.sqrt(sample_size), 200)
    plt.plot(x, stats.norm.pdf(x, mu, sigma/np.sqrt(sample_size)), 'r--')
    plt.title("CLT (normal parent) – distribution of sample means")
    plt.show()

simulate_clt_normal()

In [None]:
# 12. Function to calculate & plot standard normal distribution (mean=0, std=1)
def plot_standard_normal():
    x = np.linspace(-4, 4, 400)
    y = stats.norm.pdf(x, 0, 1)
    plt.plot(x, y)
    plt.title("Standard Normal PDF (μ=0, σ=1)")
    plt.show()

plot_standard_normal()

In [None]:
# 13. Generate random variables & calculate binomial probabilities
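n, p = 10, 0.5
draws = np.random.binomial(n, p, size=5)  # a few simulated binomial outcomes
print("Simulated binomial draws:", draws)
k = np.arange(0, n+1)
pmf_binom = stats.binom.pmf(k, n, p)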
print("Binomial PMF (n=10, p=0.5):")
for ki, pki in zip(k, pmf_binom):
    print(f"P(X={ki}) = {pki:.4f}")

In [None]:
# 14. Z‑score calculation and comparison to standard normal
def z_score(x, data):
    return (x - np.mean(data)) / np.std(data, ddof=0)

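data = np.random.normal(0, 1, 1000)
z = z_score(1.5, data)
print("Z-score of 1.5 in sample:", z)
# Compare with the standard normal: share of the distribution at or below this Z
print("Standard normal P(Z ≤ z):", stats.norm.cdf(z))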

In [None]:
# 15. Hypothesis testing using Z-statistics
def z_test(sample, mu0=0):
    # The sample standard deviation (ddof=1) stands in for the unknown population σ
    xbar, sigma, n = np.mean(sample), np.std(sample, ddof=1), len(sample)
    z = (xbar - mu0) / (sigma/np.sqrt(n))
    pval = 2 * stats.norm.sf(abs(z))  # two-sided p-value; sf(z) = 1 - cdf(z)
    return z, pval

sample = np.random.normal(5, 1, size=100)
print("Z-test (H₀: μ=5):", z_test(sample, mu0=5))

In [None]:
# 16. Create a confidence interval for a dataset
def conf_int(sample, alpha=0.05):
    xbar, s, n = np.mean(sample), np.std(sample, ddof=1), len(sample)
    margin = stats.norm.ppf(1 - alpha/2) * (s / np.sqrt(n))
    return xbar - margin, xbar + margin

sample2 = np.random.normal(0, 1, 100)
print("95% CI for sample2 mean:", conf_int(sample2))

In [None]:
# 17. Generate data from normal distribution & interpret its CI
data_norm = np.random.normal(50, 5, 200)
ci2 = conf_int(data_norm)
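print(f"Sample mean={np.mean(data_norm):.2f}, 95% CI=({ci2[0]:.2f}, {ci2[1]:.2f})")
# Interpretation: about 95% of intervals built this way would contain the true mean (50 here)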

In [None]:
# 18. PDF of a normal distribution (general μ, σ)
def plot_pdf(mu=10, sigma=2):
    x = np.linspace(mu - 4*sigma, mu + 4*sigma, 400)
    plt.plot(x, stats.norm.pdf(x, mu, sigma))
    plt.title(f"Normal PDF (μ={mu}, σ={sigma})")
    plt.show()

plot_pdf(10, 2)

In [None]:
# 19. CDF of a Poisson distribution
pois = stats.poisson(mu=3)
print("Poisson CDF P(X ≤ k):")
for k in range(10):
    print(f"k={k}: {pois.cdf(k):.4f}")

In [None]:
# 20. Simulate a continuous uniform distribution & compute expected value
uni = np.random.uniform(0, 5, 100000)
print("Continuous uniform [0,5] mean ≈", np.mean(uni), "(theoretical (a+b)/2 = 2.5)")

In [None]:
# 21. Compare standard deviations of two datasets & visualize
a = np.random.normal(0, 1, 500)
b = np.random.normal(0, 2, 500)
print("STD(a) ≈", np.std(a, ddof=1), "STD(b) ≈", np.std(b, ddof=1))
plt.boxplot([a, b])
plt.xticks([1, 2], ['a', 'b'])  # label the boxes without the labels= keyword, which newer Matplotlib deprecates
plt.title("Std dev comparison")
plt.show()

In [None]:
# 22. Range and interquartile range (IQR) of normal dataset
d = np.random.normal(5, 2, 1000)
print("Range:", np.ptp(d), "IQR:", np.percentile(d,75) - np.percentile(d,25))

In [None]:
# 23. Z‑score normalization & visualization
normed = (d - np.mean(d)) / np.std(d, ddof=0)
plt.hist([d, normed], bins=30, label=['Original','Normalized'], alpha=0.6)
plt.legend()
plt.title("Before vs after Z‑score normalization")
plt.show()

In [None]:
# 24. Skewness and kurtosis of dataset
sk = stats.skew(d, bias=False)
ku = stats.kurtosis(d, fisher=True, bias=False)
print("Skewness:", sk, "Excess Kurtosis:", ku)