Question 1: What is a random variable in probability theory?

    A random variable in probability theory is a quantity whose value is determined by the outcome of a random experiment.

    Formally, it is a function that assigns a numerical value to each possible outcome of a random process.

For example, if you toss a coin:

    Define a random variable X = number of heads.

    If the coin shows Head → X = 1

    If the coin shows Tail → X = 0

    In short: A random variable is a way to express outcomes of chance events using numbers.
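
The coin example above can be sketched in Python. This is only an illustration: the function name X and the fixed seed are choices made here, not part of the original answer.

```python
import random

def X(outcome: str) -> int:
    """Random variable: map a coin outcome to the number of heads (1 or 0)."""
    return 1 if outcome == "Head" else 0

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
tosses = [rng.choice(["Head", "Tail"]) for _ in range(10)]
values = [X(t) for t in tosses]

print(tosses)  # the raw outcomes of the experiment
print(values)  # the same outcomes expressed as numbers
```

The experiment produces labels (Head, Tail); the random variable is the rule that turns those labels into numbers.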

Question 2: What are the types of random variables?

    There are two main types of random variables:

Discrete Random Variable

    Takes only specific, countable values.

    Example: Number of heads in 3 coin tosses → {0, 1, 2, 3}

Continuous Random Variable

    Can take any value within a range (uncountably many values).

    Example: The time taken to run 100 meters → could be 12.5 sec, 12.56 sec, 12.567 sec, etc.

In short:

    Discrete = countable outcomes

    Continuous = measured outcomes (any value in a range)

Question 3: Explain the difference between discrete and continuous distributions.

Discrete Distribution

    Related to a discrete random variable.

    Probabilities are assigned to specific values.

    Example: Probability of getting 0, 1, 2, or 3 heads in 3 coin tosses.

Continuous Distribution

    Related to a continuous random variable.

    Probabilities are assigned over an interval of values, not a single point (the probability of any single exact value is 0).

    Example: Probability that a person’s height is between 160 cm and 170 cm.

Key point:

    Discrete → countable values, described by a probability mass function (PMF).

    Continuous → uncountable values, described by a probability density function (PDF).
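
A short sketch of the PMF/PDF contrast using scipy.stats. The height parameters (mean 165 cm, standard deviation 7 cm) are assumed purely for illustration; they do not come from the answer above.

```python
from scipy import stats

# Discrete: number of heads in 3 fair coin tosses ~ Binomial(n=3, p=0.5).
# The PMF gives a probability for each individual value 0, 1, 2, 3.
pmf = {k: stats.binom.pmf(k, 3, 0.5) for k in range(4)}
print(pmf)

# Continuous: heights assumed Normal(mean=165, sd=7) -- hypothetical values.
# A probability comes from an interval, computed as a difference of CDF values.
height = stats.norm(loc=165, scale=7)
p_interval = height.cdf(170) - height.cdf(160)
print("P(160 <= height <= 170):", p_interval)
```

The PMF values sum to 1, while for the continuous case only the area under the density over an interval is a probability.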

Question 4: What is a binomial distribution, and how is it used in probability?

    A binomial distribution is a probability distribution that describes the number of successes when we repeat the same experiment a fixed number of times, and each experiment has only two possible outcomes: success or failure.

Conditions for binomial distribution:

    Fixed number of trials (n).

    Each trial has two outcomes (success or failure).

    Probability of success (p) is the same for each trial.

    Trials are independent.

Example:

    Tossing a coin 5 times.

    Define success = getting a Head.

    Probability of exactly 3 Heads in 5 tosses follows the binomial distribution.

Uses in probability:

    To calculate the probability of getting a certain number of successes in repeated experiments.

    Example: Probability of exactly 7 boys in a family of 10 children, if the chance of a boy is 0.5.
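
Both examples above can be checked with a few lines of Python. The sketch uses scipy.stats.binom.pmf and, as a cross-check, the formula C(n, k) × p^k × (1 - p)^(n - k) written out by hand; the helper name binomial_pmf is introduced here for illustration.

```python
from math import comb
from scipy import stats

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials), written out by hand."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 tosses of a fair coin
p_3_heads = stats.binom.pmf(3, 5, 0.5)
print(p_3_heads, binomial_pmf(3, 5, 0.5))

# Exactly 7 boys among 10 children, assuming P(boy) = 0.5 and independent births
p_7_boys = stats.binom.pmf(7, 10, 0.5)
print(p_7_boys, binomial_pmf(7, 10, 0.5))
```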

Question 5: What is the standard normal distribution, and why is it important?

Standard Normal Distribution

    The standard normal distribution is a special case of the normal (bell-shaped) distribution. It has the following features:

    Mean = 0

    Standard deviation = 1

    The curve is symmetrical around 0

Importance:

    The standard normal distribution is very important because it allows us to simplify calculations. Any normal distribution can be converted into the standard normal distribution using the z-score formula:

    z = (x - mean) / standard deviation

    By converting values into z-scores, we can use standard normal tables (Z-tables) to quickly find probabilities.

Example:

    If student heights follow a normal distribution, we can convert the heights into z-scores and then find the probability of a student’s height being above or below a certain value using the standard normal distribution.
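
A minimal sketch of the z-score step. The mean (165 cm), standard deviation (7 cm), and cutoff (172 cm) are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
from scipy import stats

mean_height = 165.0  # assumed population mean, in cm
sd_height = 7.0      # assumed population standard deviation, in cm
x = 172.0            # height of interest, in cm

# z = (x - mean) / standard deviation
z = (x - mean_height) / sd_height

# Probabilities from the standard normal distribution (replaces a Z-table lookup)
p_below = stats.norm.cdf(z)  # P(height <= x)
p_above = stats.norm.sf(z)   # P(height > x), equal to 1 - cdf(z)

print("z-score:", z)
print("P(below):", p_below)
print("P(above):", p_above)
```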

Question 6: What is the Central Limit Theorem (CLT), and why is it critical in statistics?

    The Central Limit Theorem (CLT) states that when we draw independent random samples of a sufficiently large size n from any population with a finite mean and variance, the distribution of the sample mean approaches a normal distribution (bell curve) as n grows, regardless of the original population’s shape.

It is critical in statistics because it:

    Allows us to use normal probability theory for sample means even when the population is not normal.

    Forms the basis for many statistical methods like confidence intervals and hypothesis testing.

    Helps make predictions and inferences about populations using sample data.

    In short: The CLT makes normal distribution tools widely applicable in real-world problems.
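
The CLT can be illustrated with a small simulation. The exponential population (mean 2), the sample size of 50, and the 10,000 repetitions are all assumptions chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) for repeatability

pop_mean = 2.0        # an exponential distribution with scale 2 has mean 2 and sd 2
sample_size = 50      # observations per sample
num_samples = 10_000  # how many samples we draw

# Each row is one sample from a strongly right-skewed population
samples = rng.exponential(scale=pop_mean, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

# CLT prediction: the sample means are centred on the population mean
# with spread sd / sqrt(n)
predicted_sd = pop_mean / np.sqrt(sample_size)

print("Mean of sample means:", sample_means.mean())
print("SD of sample means:  ", sample_means.std(ddof=1))
print("Predicted SD:        ", predicted_sd)
```

A histogram of sample_means looks close to a bell curve even though the individual observations are skewed.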

Question 7: What is the significance of confidence intervals in statistical analysis?

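    A confidence interval (CI) gives a range of values within which the true population parameter (like mean or proportion) is likely to lie, with a certain level of confidence (e.g., 95%). A 95% confidence level means that if we repeated the sampling many times, about 95% of the intervals built this way would contain the true parameter.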

Significance:

    It shows the precision of an estimate.

    Helps measure the uncertainty in sample data.

    Provides more information than just a single point estimate.

    Widely used in research, surveys, and experiments to make reliable conclusions.

    In short: Confidence intervals tell us how precisely a sample estimate pins down the population parameter.
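
One way to see what "95% confidence" means is to simulate repeated sampling and count how often the interval covers the true mean. The population (normal, mean 100, standard deviation 15), the sample size, and the number of repetitions are assumptions made for this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed (arbitrary) for repeatability

true_mean, true_sd = 100.0, 15.0  # hypothetical population
n, reps = 30, 2000                # sample size and number of repeated samples

samples = rng.normal(true_mean, true_sd, size=(reps, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Two-sided 95% critical value of the t-distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = means - t_crit * std_errs
upper = means + t_crit * std_errs

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print("Coverage:", coverage)
```

The printed coverage should land near 0.95, which is the long-run property the confidence level describes.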

Question 8: What is the concept of expected value in a probability distribution?

    The expected value (EV) is the long-run average outcome of a random variable when an experiment is repeated many times. It is calculated as the sum of each possible outcome multiplied by its probability.

Formula:

    Expected Value = Σ [ (value of outcome) × (probability of outcome) ]

Example:

    For a fair die: (1+2+3+4+5+6)/6 = 3.5.

    This means the average of many rolls will approach 3.5, even though no single roll can show 3.5.

    In short: The expected value is the average result we expect in the long run.
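
The die example can be reproduced in a few lines. This sketch applies the sum-of-value-times-probability formula directly and then compares it with a simulated long-run average; the seed and the number of rolls are arbitrary choices.

```python
import numpy as np

outcomes = np.arange(1, 7)   # faces of a fair die: 1..6
probs = np.full(6, 1 / 6)    # each face has probability 1/6

# Expected Value = sum of (value of outcome) x (probability of outcome)
expected_value = np.sum(outcomes * probs)

# Long-run average from simulated rolls (upper bound 7 is exclusive)
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
simulated_average = rolls.mean()

print("Expected value:   ", expected_value)
print("Simulated average:", simulated_average)
```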

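Question 9: Write a Python program that generates 1000 random numbers from a normal distribution with mean = 50 and standard deviation = 5, computes their mean and standard deviation, and draws a histogram.

In [None]: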
import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 random numbers from a normal distribution
data = np.random.normal(loc=50, scale=5, size=1000)

# Compute the mean and the sample standard deviation (ddof=1 divides by n - 1)
mean = np.mean(data)
std_dev = np.std(data, ddof=1)

print("Mean:", mean)
print("Standard Deviation:", std_dev)

# Draw histogram
plt.hist(data, bins=30, edgecolor='black')
plt.title("Histogram of Normally Distributed Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


This program:

    Generates 1000 random numbers with mean = 50 and standard deviation = 5.

    Prints the calculated mean and standard deviation.

    Plots a histogram to visualize the distribution.

Question 10: You are working as a data analyst for a retail company. The company has
collected daily sales data for 2 years and wants you to identify the overall sales trend.

daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
235, 260, 245, 250, 225, 270, 265, 255, 250, 260]

● Explain how you would apply the Central Limit Theorem to estimate the average sales
with a 95% confidence interval.

● Write the Python code to compute the mean sales and its confidence interval.

    The Central Limit Theorem (CLT) states that the mean of a sufficiently large random sample is approximately normally distributed, even if the original data is not.
    Here, the daily sales data is our sample. By applying the CLT, we can treat the sample mean as approximately normal and construct a 95% confidence interval for the true mean daily sales as mean ± critical value × standard error. Because the sample is small (n = 20) and the population standard deviation is unknown, the code uses the t-distribution with n - 1 degrees of freedom rather than the normal distribution.

In [None]:
import numpy as np
import scipy.stats as stats

# Sales data
daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
               235, 260, 245, 250, 225, 270, 265, 255, 250, 260]

# Calculate mean and standard error
mean_sales = np.mean(daily_sales)
std_err = stats.sem(daily_sales)

# 95% confidence interval using t-distribution
confidence_interval = stats.t.interval(0.95, len(daily_sales)-1, loc=mean_sales, scale=std_err)

print("Mean Sales:", mean_sales)
print("95% Confidence Interval:", confidence_interval)


Output (example):

    Mean Sales ≈ 248.25

    95% Confidence Interval ≈ (240.2, 256.3)

    This tells us that we are 95% confident the true average daily sales lie between those two values.
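
For readers who want to see what the library call does, the same interval can be built step by step. This is a sketch of the standard t-interval formula (mean ± t critical value × standard error), using only the data given in the question.

```python
import math
import statistics
from scipy import stats

daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
               235, 260, 245, 250, 225, 270, 265, 255, 250, 260]

n = len(daily_sales)
mean_sales = statistics.mean(daily_sales)
sample_sd = statistics.stdev(daily_sales)  # sample standard deviation (divides by n - 1)
std_err = sample_sd / math.sqrt(n)         # standard error of the mean

# Two-sided 95% critical value from the t-distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * std_err

ci = (mean_sales - margin, mean_sales + margin)
print("Mean Sales:", mean_sales)
print("Margin of error:", margin)
print("95% Confidence Interval:", ci)
```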