**THEORETICAL QUESTIONS**

1. What is a random variable in probability theory?
A random variable is a function that assigns a real number to each outcome in the sample space of a random phenomenon.

2. What are the types of random variables?
Discrete Random Variable: Takes countable values (e.g., 0, 1, 2,...).

Continuous Random Variable: Takes any value in an interval (e.g., height, weight).

3. What is the difference between discrete and continuous distributions?
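A discrete distribution assigns probabilities to countable outcomes through a probability mass function (PMF).

A continuous distribution spreads probability over a continuous range through a probability density function (PDF); any single point has probability zero.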

4. What are probability distribution functions (PDF)?
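A probability distribution function describes how probability is spread over the values of a random variable. For a discrete variable it is the probability mass function (PMF), which gives P(X = x). For a continuous variable it is the probability density function (PDF), which gives a density rather than a probability; probabilities come from integrating the density over an interval.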

5. How do cumulative distribution functions (CDF) differ from PDF?
CDF gives the probability that a random variable is less than or equal to a certain value.

PDF gives the density at a specific value.
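
As a rough illustration of the distinction, the sketch below evaluates both functions for a standard normal variable. It uses Python's built-in `statistics.NormalDist` rather than SciPy only to stay dependency-free; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal variable (mean 0, standard deviation 1), chosen for illustration
std_normal = NormalDist(mu=0.0, sigma=1.0)

# PDF: height of the density curve at a point; not itself a probability
density_at_zero = std_normal.pdf(0.0)

# CDF: accumulated probability P(X <= x)
prob_below_zero = std_normal.cdf(0.0)

# The probability of an interval is a difference of two CDF values
prob_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(f"pdf(0) = {density_at_zero:.4f}")
print(f"cdf(0) = {prob_below_zero:.4f}")
print(f"P(-1 <= X <= 1) = {prob_within_one_sd:.4f}")
```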

6. What is a discrete uniform distribution?
It assigns equal probabilities to a finite set of outcomes (e.g., rolling a fair die).

7. What are the key properties of a Bernoulli distribution?
Only two outcomes: success (1) and failure (0).

Defined by a single parameter p, the probability of success.

8. What is the binomial distribution, and how is it used in probability?
It models the number of successes in n independent Bernoulli trials, each with probability p of success.
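
The probability of exactly k successes is P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). A minimal sketch of that formula using only the standard library follows; the function name `binomial_pmf` and the values n = 10, p = 0.5 are chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X={k}) = {prob:.4f}")

# The probabilities over all possible counts should add up to 1
print(f"Total probability: {sum(pmf):.4f}")
```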

9. What is the Poisson distribution and where is it applied?
It models the number of events occurring in a fixed interval of time/space, given a known average rate and independent occurrences.
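
For an average rate λ per interval, the probability of observing exactly k events is P(X = k) = λ^k · e^(−λ) / k!. The sketch below evaluates that formula directly; the function name `poisson_pmf` and the rate λ = 3 are assumptions made for the example.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the average event count."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # illustrative average rate per interval
for k in range(6):
    print(f"P(X={k}) = {poisson_pmf(k, lam):.4f}")

# Probability of at least one event is the complement of zero events
prob_at_least_one = 1 - poisson_pmf(0, lam)
print(f"P(X >= 1) = {prob_at_least_one:.4f}")
```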

10. What is a continuous uniform distribution?
It assigns equal probability density to all values in a continuous interval [a, b].
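
On [a, b] the constant density is 1 / (b − a), and the probability of a sub-interval is its length divided by (b − a). The sketch below, with hypothetical helper names and an interval picked so the density exceeds 1, shows that a density value is not a probability.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]; the interval is clipped to [a, b]."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)

a, b = 0.0, 0.5  # illustrative interval of width 0.5

density = uniform_pdf(0.25, a, b)          # constant height 1 / (b - a)
prob = uniform_interval_prob(0.1, 0.2, a, b)  # sub-interval length / (b - a)

print(f"Density on [{a}, {b}]: {density}")
print(f"P(0.1 <= X <= 0.2) = {prob:.2f}")
```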

11. What are the characteristics of a normal distribution?
Symmetrical bell-shaped curve.

Mean = median = mode.

Defined by mean μ and standard deviation σ.

12. What is the standard normal distribution, and why is it important?
A normal distribution with mean 0 and standard deviation 1. It simplifies calculations using Z-scores and is used in statistical inference.

13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?
The CLT states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the observations are independent and the population variance is finite. It underpins hypothesis testing and confidence intervals.

14. How does the Central Limit Theorem relate to the normal distribution?
It justifies using the normal distribution for inference about sample means, even when the population is not normally distributed.

15. What is the application of Z statistics in hypothesis testing?
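A Z-statistic measures how many standard errors a sample statistic (such as the sample mean) lies from the value stated in the null hypothesis. It is compared with the standard normal distribution to obtain a p-value and decide whether to reject the null hypothesis.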

16. How do you calculate a Z-score, and what does it represent?
$$Z = \frac{X - \mu}{\sigma}$$

It shows how far a data point is from the mean in terms of standard deviations.

17. What are point estimates and interval estimates in statistics?
Point estimate: A single value estimate of a parameter (e.g., sample mean).

Interval estimate: A range (like a confidence interval) that likely contains the parameter.

18. What is the significance of confidence intervals in statistical analysis?
They give a range of plausible values for the true population parameter at a stated confidence level; a narrower interval indicates a more precise estimate.

19. What is the relationship between a Z-score and a confidence interval?
Z-scores determine the critical values used to construct confidence intervals (e.g., Z = 1.96 for 95%).
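
The critical value for a two-sided interval is the standard normal quantile at 1 − α/2, where α is one minus the confidence level. A small sketch of that lookup using the standard library's `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical Z value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # inv_cdf is the quantile function of the standard normal distribution
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z = {z_critical(level):.3f}")
```

With a known population standard deviation σ and sample size n, the interval is then the sample mean plus or minus `z_critical(level) * σ / sqrt(n)`.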

20. How are Z-scores used to compare different distributions?
They standardize different datasets to a common scale, allowing direct comparisons regardless of units or means.
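
As a sketch of such a comparison, the example below standardizes two exam scores that come from different scales. All numbers (scores, means, standard deviations) are made up for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Hypothetical exam results on two different scales
math_z = z_score(82, mu=70, sigma=8)
physics_z = z_score(75, mu=60, sigma=12)

print(f"Math Z-score:    {math_z:.2f}")
print(f"Physics Z-score: {physics_z:.2f}")

# The larger Z-score marks the stronger result relative to its own group
better = "math" if math_z > physics_z else "physics"
print(f"Relatively stronger result: {better}")
```

Here the raw physics lead over its mean (15 points) is larger than the math lead (12 points), yet the math score is further from its mean in standard-deviation units.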

21. What are the assumptions for applying the Central Limit Theorem?
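Observations are independent and drawn from the same distribution.

Sample size is sufficiently large (n ≥ 30 is a common rule of thumb; heavily skewed populations may need more).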

Population has a finite variance.

22. What is the concept of expected value in a probability distribution?
It is the long-run average value of a random variable over many repetitions of the experiment.

23. How does a probability distribution relate to the expected outcome of a random variable?
The expected value is the weighted average of all possible values that a random variable can take, with each value weighted by its probability under the distribution.
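
For a discrete variable this weighted average is E[X] = Σ x · P(X = x). A minimal sketch for a fair six-sided die, using exact fractions so the result is not blurred by rounding; the variable names are illustrative.

```python
from fractions import Fraction

# Fair die: outcomes 1..6, each with probability 1/6
outcomes = range(1, 7)
probabilities = [Fraction(1, 6)] * 6

# Weighted average of the outcomes, weights given by the distribution
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))

print(f"E[X] = {expected_value} = {float(expected_value)}")
```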



**PRACTICAL QUESTIONS**

1. Write a Python program to generate a random variable and display its value

In [None]:
import numpy as np

random_variable = np.random.rand()
print(f"Random variable: {random_variable}")


2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF)

In [None]:
import matplotlib.pyplot as plt
from scipy.stats import randint
import numpy as np

x = np.arange(1, 7)
pmf = randint.pmf(x, 1, 7)

plt.stem(x, pmf)  # use_line_collection was removed in recent Matplotlib releases
plt.title("PMF of Discrete Uniform Distribution (1 to 6)")
plt.xlabel("Outcome")
plt.ylabel("Probability")
plt.grid()
plt.show()


3. Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution

In [None]:
from scipy.stats import bernoulli

def bernoulli_pdf(p):
    x = [0, 1]
    pdf = bernoulli.pmf(x, p)
    for xi, pi in zip(x, pdf):
        print(f"P(X={xi}) = {pi}")

bernoulli_pdf(0.3)


4. Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram

In [None]:
from scipy.stats import binom
import matplotlib.pyplot as plt
import numpy as np

n, p = 10, 0.5
data = binom.rvs(n, p, size=1000)

plt.hist(data, bins=np.arange(-0.5, 11.5, 1), edgecolor='black', color='skyblue')
plt.title("Binomial Distribution Histogram (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()


5. Create a Poisson distribution and visualize it using Python

In [None]:
from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np

mu = 3
x = np.arange(0, 10)
pmf = poisson.pmf(x, mu)

plt.stem(x, pmf, basefmt=" ")  # use_line_collection was removed in recent Matplotlib releases
plt.title("Poisson Distribution (mu=3)")
plt.xlabel("k")
plt.ylabel("P(X=k)")
plt.grid()
plt.show()


6. Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution

In [None]:
from scipy.stats import randint
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 7)
cdf = randint.cdf(x, 1, 7)

plt.step(x, cdf, where='post')  # a discrete CDF jumps at each outcome, then stays flat
plt.title("CDF of Discrete Uniform Distribution")
plt.xlabel("x")
plt.ylabel("Cumulative Probability")
plt.grid()
plt.show()


7. Generate a continuous uniform distribution using NumPy and visualize it

In [None]:
import numpy as np
import matplotlib.pyplot as plt

data = np.random.uniform(0, 1, 1000)

plt.hist(data, bins=30, color='orange', edgecolor='black', density=True)
plt.title("Histogram of Continuous Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid()
plt.show()


8. Simulate data from a normal distribution and plot its histogram

In [None]:
import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=0, scale=1, size=1000)

plt.hist(data, bins=30, color='lightgreen', edgecolor='black', density=True)
plt.title("Histogram of Normal Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid()
plt.show()


9. Write a Python function to calculate Z-scores from a dataset and plot them

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

data = np.random.normal(50, 10, 100)
z_scores = zscore(data)

plt.plot(z_scores, marker='o', linestyle='', color='blue')
plt.axhline(y=0, color='red', linestyle='--')
plt.title("Z-scores of Dataset")
plt.xlabel("Index")
plt.ylabel("Z-score")
plt.grid()
plt.show()


10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt

means = []

# Non-normal population: exponential
for _ in range(1000):
    sample = np.random.exponential(scale=2.0, size=30)
    means.append(np.mean(sample))

plt.hist(means, bins=30, color='purple', edgecolor='black')
plt.title("CLT: Sampling Distribution of Mean from Exponential Distribution")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.grid()
plt.show()


15. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem

In [None]:
import numpy as np
import matplotlib.pyplot as plt

means = []
for _ in range(1000):
    sample = np.random.normal(loc=50, scale=10, size=30)
    means.append(np.mean(sample))

plt.hist(means, bins=30, color='skyblue', edgecolor='black')
plt.title("Sampling Distribution of the Mean (CLT)")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.show()


16. Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-4, 4, 1000)
y = norm.pdf(x, 0, 1)

plt.plot(x, y)
plt.title("Standard Normal Distribution")
plt.xlabel("Z")
plt.ylabel("Density")
plt.grid()
plt.show()


17. Generate random variables and calculate their corresponding probabilities using the binomial distribution

In [1]:
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
x = np.arange(0, n+1)    # every possible number of successes
pmf = binom.pmf(x, n, p)

for val, prob in zip(x, pmf):
    print(f"P(X={val}) = {prob:.4f}")


18. Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution

In [None]:
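from scipy.stats import norm

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

data_point = 75
mean = 70
std_dev = 5
z = z_score(data_point, mean, std_dev)
print(f"Z-score: {z:.2f}")

# Compare with the standard normal: share of values expected at or below this point
percentile = norm.cdf(z)
print(f"P(Z <= {z:.2f}) = {percentile:.4f}")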


19. Implement hypothesis testing using Z-statistics for a sample dataset

In [None]:
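import numpy as np
from scipy.stats import norm

sample_mean = 52
pop_mean = 50   # value under the null hypothesis
pop_std = 5     # population standard deviation, assumed known
n = 30

# Distance of the sample mean from the null value, in standard errors
z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))
# One-tailed (upper) p-value: P(Z >= z) under the null hypothesis
p_value = 1 - norm.cdf(z)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}")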


20. Create a confidence interval for a dataset using Python and interpret the result

In [None]:
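import numpy as np
import scipy.stats as stats

data = np.random.normal(100, 15, 50)
mean = np.mean(data)
sem = stats.sem(data)  # standard error of the mean
# t-based interval, since the population standard deviation is estimated from the sample
ci = stats.t.interval(0.95, len(data)-1, loc=mean, scale=sem)
print(f"95% Confidence Interval: {ci}")
# Interpretation: about 95% of intervals built this way would contain the true mean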


21. Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean

In [None]:
import numpy as np
import scipy.stats as stats

data = np.random.normal(60, 10, 100)
mean = np.mean(data)
sem = stats.sem(data)
ci = stats.t.interval(0.95, df=len(data)-1, loc=mean, scale=sem)
print(f"Mean: {mean:.2f}, 95% CI: {ci}")


22. Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, sigma = 0, 1
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x, mu, sigma)

plt.plot(x, y)
plt.title("PDF of Normal Distribution")
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.grid()
plt.show()


23. Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution

In [None]:
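import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

mu = 3
x = np.arange(0, 10)
cdf = poisson.cdf(x, mu)  # cdf[k] = P(X <= k)

# where='post' holds each value until the next integer, matching a discrete CDF
plt.step(x, cdf, where='post')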
plt.title("CDF of Poisson Distribution (μ=3)")
plt.xlabel("x")
plt.ylabel("CDF")
plt.grid()
plt.show()


24. Simulate a random variable using a continuous uniform distribution and calculate its expected value

In [None]:
import numpy as np

data = np.random.uniform(10, 50, 1000)
# The sample mean estimates E[X]; the theoretical value is (10 + 50) / 2 = 30
expected_value = np.mean(data)
print(f"Expected Value: {expected_value:.2f}")


25. Write a Python program to compare the standard deviations of two datasets and visualize the difference


In [None]:
import numpy as np
import matplotlib.pyplot as plt

data1 = np.random.normal(50, 5, 1000)
data2 = np.random.normal(50, 10, 1000)

std1, std2 = np.std(data1), np.std(data2)
print(f"Standard Deviation - Data1: {std1:.2f}, Data2: {std2:.2f}")

plt.hist(data1, alpha=0.5, label='Std Dev 5')
plt.hist(data2, alpha=0.5, label='Std Dev 10')
plt.legend()
plt.title("Comparison of Standard Deviations")
plt.show()


26. Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution

In [None]:
import numpy as np

data = np.random.normal(100, 20, 1000)
range_val = np.max(data) - np.min(data)
iqr = np.percentile(data, 75) - np.percentile(data, 25)
print(f"Range: {range_val:.2f}, IQR: {iqr:.2f}")


27. Implement Z-score normalization on a dataset and visualize its transformation

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

data = np.random.normal(60, 15, 1000).reshape(-1, 1)
scaler = StandardScaler()
normalized_data = scaler.fit_transform(data)

plt.hist(normalized_data, bins=30)
plt.title("Z-score Normalized Data")
plt.show()


28. Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution

In [None]:
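import numpy as np
from scipy.stats import skew, kurtosis

data = np.random.normal(0, 1, 1000)
# Both values should be close to 0 for normal data:
# scipy's kurtosis() returns excess kurtosis (Fisher definition) by default
print(f"Skewness: {skew(data):.2f}")
print(f"Kurtosis: {kurtosis(data):.2f}")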
