### Assignment Solutions: Statistics Basics

---

#### **1. Explain the different types of data (qualitative and quantitative) and provide examples of each. Discuss nominal, ordinal, interval, and ratio scales.**

**Solution:**

- **Qualitative Data (Categorical Data):**
  - **Nominal Scale:** Data is categorized without any order or ranking.
    - Example: Gender (Male, Female), Blood Type (A, B, AB, O).
  - **Ordinal Scale:** Data is categorized with a specific order or ranking.
    - Example: Education Level (High School, Bachelor’s, Master’s, PhD), Customer Satisfaction (Poor, Fair, Good, Excellent).

- **Quantitative Data (Numerical Data):**
  - **Interval Scale:** Data has a specific order, and the difference between values is meaningful, but there is no true zero point.
    - Example: Temperature in Celsius (20°C, 30°C), IQ Scores.
  - **Ratio Scale:** Data has a specific order, the difference between values is meaningful, and there is a true zero point.
    - Example: Height (0 cm, 150 cm, 180 cm), Weight (0 kg, 50 kg, 100 kg).

---

#### **2. What are the measures of central tendency, and when should you use each? Discuss the mean, median, and mode with examples and situations where each is appropriate.**

**Solution:**

- **Mean:** The average of all data points. It is sensitive to outliers.
  - Example: The mean of `[2, 3, 5, 7, 11]` is `(2+3+5+7+11)/5 = 5.6`.
  - Use when data is symmetrically distributed without outliers.

- **Median:** The middle value when data is ordered (the average of the two middle values when the count is even). It is robust to outliers: a few extreme values barely move it.
  - Example: The median of `[2, 3, 5, 7, 11]` is `5`.
  - Use when data is skewed or has outliers.

- **Mode:** The most frequently occurring value in the dataset.
  - Example: The mode of `[2, 3, 3, 5, 7]` is `3`.
  - Use for categorical data or when identifying the most common value.
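
The three measures above can be checked with Python's standard `statistics` module. This is a minimal sketch; the extra value `100` is an assumed outlier added only to show how differently the mean and median react.

```python
from statistics import mean, median, mode

data = [2, 3, 5, 7, 11]

print(mean(data))             # 5.6
print(median(data))           # 5
print(mode([2, 3, 3, 5, 7]))  # 3

# Append one extreme value (an assumption for illustration, not source data).
with_outlier = data + [100]

print(round(mean(with_outlier), 2))  # 21.33 -> the mean jumps
print(median(with_outlier))          # 6.0   -> the median barely moves
```

The median shifts from 5 to 6 while the mean roughly quadruples, which is why the median is preferred for skewed data.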

---

#### **3. Explain the concept of dispersion. How do variance and standard deviation measure the spread of data?**

**Solution:**

- **Dispersion** refers to how spread out the data points are in a dataset.
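- **Variance (σ²):** Measures the average squared deviation from the mean. The formula below is the population variance; the sample variance divides by `N - 1` instead of `N`.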
  - Formula: 
    ```
    σ² = Σ(xᵢ - μ)² / N
    ```
  - Example: For `[2, 4, 6]`, the mean is `4`. Variance = `[(2-4)² + (4-4)² + (6-4)²]/3 = 8/3 ≈ 2.67`.
- **Standard Deviation (σ):** The square root of variance, providing a measure of spread in the same units as the data.
  - Example: Standard deviation = `√2.67 ≈ 1.63`.
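
As a quick check of the worked example, the sketch below uses `statistics.pvariance` and `statistics.pstdev`, which divide by N as in the formula above. The sample versions (`variance`, `stdev`) divide by N - 1 and are shown only for contrast.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 6]

pop_var = pvariance(data)  # population variance: divides by N
pop_sd = pstdev(data)      # square root of the population variance

print(round(pop_var, 2))   # 2.67
print(round(pop_sd, 2))    # 1.63

# Sample statistics divide by N - 1, so they come out larger on the same data.
sample_var = variance(data)  # equals 4
sample_sd = stdev(data)      # equals 2
```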

---

#### **4. What is a box plot, and what can it tell you about the distribution of data?**

**Solution:**

- A **box plot** is a graphical representation of data that shows the distribution based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
- It can tell you:
  - The **median** (middle line).
  - The **interquartile range (IQR)** (Q3 - Q1).
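  - **Outliers** (points below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`, usually drawn as individual markers beyond the whiskers).
  - The **skewness** of the data (the median sits off-center in the box, or one whisker is noticeably longer than the other).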
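
The five-number summary behind a box plot can be computed directly. The sketch below is one possible approach: the dataset is made up, `five_number_summary` is a hypothetical helper name, and `method="inclusive"` is just one of several quartile conventions (others give slightly different Q1 and Q3).

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    ordered = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

summary = five_number_summary([2, 4, 4, 5, 7, 9, 12])
print(summary)  # (2, 4.0, 5.0, 8.0, 12)
```

Here the median (5) sits closer to Q1 (4) than to Q3 (8), and the upper whisker is longer, which is the pattern a box plot shows for right-skewed data.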

---

#### **5. Discuss the role of random sampling in making inferences about populations.**

**Solution:**

- **Random sampling** (in its simplest form, simple random sampling) gives every member of the population an equal chance of being selected, which reduces selection bias.
- It allows us to make **inferences** about the population based on sample data.
- Example: If you randomly sample 100 students from a school to estimate the average height, the sample mean can be used to estimate the population mean.
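
The student-height example can be simulated. This is a sketch under stated assumptions: the population of 1,000 heights is synthetic (drawn from a normal distribution with mean 165 cm and standard deviation 10 cm), and the seed is fixed only to make the run repeatable.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 1,000 students.
population = [rng.gauss(165, 10) for _ in range(1000)]

# Simple random sample without replacement:
# every student is equally likely to be chosen.
sample = rng.sample(population, 100)

print(f"population mean: {mean(population):.1f} cm")
print(f"sample mean:     {mean(sample):.1f} cm")
```

The two means should land close together; rerunning with a different seed shows how much the sample mean varies from one random sample to the next.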

---

#### **6. Explain the concept of skewness and its types. How does skewness affect the interpretation of data?**

**Solution:**

- **Skewness** measures the asymmetry of the data distribution.
  - **Positive Skew (right-skewed):** The tail on the right side is longer. Typically, Mean > Median.
    - Example: Income distribution (few high-income individuals).
  - **Negative Skew (left-skewed):** The tail on the left side is longer. Typically, Mean < Median.
    - Example: Age at retirement (most retire around the same age, few retire early).
- Skewness affects the interpretation of data by influencing the mean and median. In skewed data, the median is often a better measure of central tendency.
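
To make the direction of skew concrete, the sketch below computes the moment-based skewness coefficient `g1 = m3 / m2^1.5` by hand. The income figures are invented for illustration, and `skewness` is a hypothetical helper rather than a standard-library function.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # population variance
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up incomes (in thousands): one very high earner creates a right tail.
incomes = [30, 32, 35, 38, 40, 45, 250]

print(round(mean(incomes), 1))  # 67.1
print(median(incomes))          # 38
print(skewness(incomes) > 0)    # True -> positive (right) skew
```

The mean (about 67) is well above the median (38), so the median describes a typical income here far better than the mean does.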

---

#### **7. What is the interquartile range (IQR), and how is it used to detect outliers?**

**Solution:**

- **IQR** is the range between the first quartile (Q1) and the third quartile (Q3): `IQR = Q3 - Q1`.
- **Outliers** are data points that fall below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`.
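  - Example: For data `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`, taking Q1 and Q3 as the medians of the lower and upper halves gives Q1 = 3, Q3 = 8, IQR = 5. Outliers are below `3 - 7.5 = -4.5` or above `8 + 7.5 = 15.5`. No outliers in this case. (Other quartile conventions give slightly different Q1 and Q3.)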
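
The 1.5 × IQR rule can be sketched in a few lines. The helper names below are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces Q1 = 3 and Q3 = 8 for the values 1 to 10; library defaults often use an interpolating convention instead.

```python
from statistics import median

def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (needs >= 2 values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # the middle element is skipped when n is odd
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the k * IQR rule."""
    q1, q3 = quartiles_median_of_halves(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)

data = list(range(1, 11))
print(iqr_outliers(data))         # ([], (-4.5, 15.5))
print(iqr_outliers(data + [30]))  # ([30], (-6.0, 18.0))
```

Adding the value 30 changes the quartiles slightly (Q3 becomes 9), but 30 still falls above the upper fence and is flagged.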

---

#### **8. Discuss the conditions under which the binomial distribution is used.**

**Solution:**

- The **binomial distribution** is used when:
  1. There are a fixed number of trials (n).
  2. Each trial has only two outcomes: success or failure.
  3. The probability of success (p) is constant for each trial.
  4. Trials are independent.
- Example: Tossing a coin 10 times and counting the number of heads.
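
When the four conditions hold, the probability of exactly k successes is `C(n, k) * p^k * (1 - p)^(n - k)`. The sketch below applies this to the coin example, assuming a fair coin (p = 0.5); `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The probabilities over k = 0..n sum to 1 (up to rounding).
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```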

---

#### **9. Explain the properties of the normal distribution and the empirical rule (68-95-99.7 rule).**

**Solution:**

- **Normal Distribution:**
  - Symmetric, bell-shaped curve.
  - Mean = Median = Mode.
  - Defined by mean (μ) and standard deviation (σ).
- **Empirical Rule:**
  - About 68% of data falls within `±1σ` of the mean.
  - About 95% of data falls within `±2σ` of the mean.
  - About 99.7% of data falls within `±3σ` of the mean.
  - Example: If mean = 100, σ = 15, then about 68% of data falls between 85 and 115.
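
The three percentages can be verified from the normal CDF. The sketch below uses `statistics.NormalDist` with the mean of 100 and σ of 15 from the example; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

scores = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    print(k, round(share_within(scores, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The exact shares are slightly different from the rounded 68-95-99.7 figures, which is why the rule is stated as an approximation.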

---

#### **10. Provide a real-life example of a Poisson process and calculate the probability for a specific event.**

**Solution:**

- **Poisson Process:** Models the number of events occurring in a fixed interval of time or space, assuming events occur independently and at a constant average rate.
  - Example: Number of emails received in an hour.
  - Formula: 
    ```
    P(X = k) = (λᵏ * e^(-λ)) / k!
    ```
    where λ is the average rate.
  - Example Calculation: If λ = 5 emails/hour, the probability of receiving exactly 3 emails is:
    ```
    P(X = 3) = (5³ * e^(-5)) / 3! = (125 * 0.006738) / 6 ≈ 0.1404
    ```
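
The same calculation can be reproduced in code, which avoids rounding `e^(-5)` by hand. This is a minimal sketch; `poisson_pmf` is an illustrative helper name, not a library function.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the mean count per interval."""
    return lam**k * exp(-lam) / factorial(k)

# Exactly 3 emails in an hour when the average is 5 per hour.
print(round(poisson_pmf(3, 5), 4))  # 0.1404

# Summing the PMF gives cumulative probabilities, e.g. P(X <= 3).
at_most_three = sum(poisson_pmf(k, 5) for k in range(4))
```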

---

#### **11. Explain what a random variable is and differentiate between discrete and continuous random variables.**

**Solution:**

- **Random Variable:** A variable whose possible values are outcomes of a random phenomenon.
  - **Discrete Random Variable:** Takes on a countable number of distinct values.
    - Example: Number of heads in 10 coin tosses.
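  - **Continuous Random Variable:** Takes on any value within an interval, so its possible values are uncountably infinite. Probabilities are assigned to ranges of values rather than to single points.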
    - Example: Height of students in a class.

---

#### **12. Provide an example dataset, calculate both covariance and correlation, and interpret the results.**

**Solution:**

- **Example Dataset:**
  - X = `[1, 2, 3, 4, 5]`
  - Y = `[2, 4, 5, 4, 5]`

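- **Covariance** (population form, dividing by n):
  ```
  Cov(X, Y) = Σ(xᵢ - x̄)(yᵢ - ȳ) / n
  ```
  - Calculation (x̄ = 3, ȳ = 4):
    ```
    Cov(X, Y) = [(1-3)(2-4) + (2-3)(4-4) + (3-3)(5-4) + (4-3)(4-4) + (5-3)(5-4)] / 5
              = (4 + 0 + 0 + 0 + 2) / 5 = 1.2
    ```
  - Interpretation: Positive covariance indicates that X and Y tend to move in the same direction. Its magnitude depends on the units of X and Y, so it is hard to interpret on its own.

- **Correlation:**
  ```
  r = Cov(X, Y) / (σₓ * σᵧ)
  ```
  - Calculation (population standard deviations: σₓ = √2 ≈ 1.414, σᵧ = √1.2 ≈ 1.095):
    ```
    r = 1.2 / (1.414 * 1.095) ≈ 0.77
    ```
  - Interpretation: A correlation of about 0.77 indicates a fairly strong positive linear relationship between X and Y.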
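
The definitions above can be checked numerically. The sketch below implements the population (divide-by-n) covariance by hand so it matches the formula as written; `covariance` and `correlation` are illustrative helper names. Using n - 1 instead would change the covariance but not the correlation, because the factor cancels.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean product of deviations (divides by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson r: covariance scaled by both standard deviations."""
    # covariance(v, v) is the population variance of v.
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

print(covariance(X, Y))             # 1.2
print(round(correlation(X, Y), 4))  # 0.7746
```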

--- 