#### Q1. What are the three measures of central tendency?

Ans.

1. Mean
2. Median
3. Mode

---

#### Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

Ans.

1. Mean (Arithmetic Average)
- Definition:
  - The mean is the sum of all data points divided by the number of data points.
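  - Formula: $\bar{x} = \dfrac{\text{Sum of all data points}}{\text{Number of data points}} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the $n$ data points.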
 
- Key Features:
  - Sensitive to Outliers: Outliers can skew the mean, making it unrepresentative of the dataset's central value.
  - Usage:
    - Best for symmetrically distributed data without extreme values.
    - Common in fields like finance, economics, and education.
  - Example: For {10, 15, 20}, the mean is (10 + 15 + 20) / 3 = 45 / 3 = 15.

2. Median
- Definition:
  - The median is the middle value of a dataset when arranged in ascending order. For a dataset with an even number of values, it is the average of the two middle values.
- Key Features:
  - Resistant to Outliers: The median depends only on the order of the values, not on how extreme the largest or smallest values are, making it a robust measure of central tendency for skewed data.
  - Usage:
    - Suitable for skewed distributions or datasets with outliers.
    - Frequently used for income, property prices, and other data with a long tail.
  - Example: For {10, 15, 100}, the median is 15, regardless of how large the outlier 100 is.

3. Mode
- Definition:
  - The mode is the most frequently occurring value(s) in a dataset. A dataset can have:
    - No mode (if all values are unique).
    - One mode (unimodal).
    - More than one mode (bimodal or multimodal).
- Key Features:
  - Resistant to Outliers: An extreme value does not change the mode unless it occurs often enough to become the most common value.
  - Usage:
    - Most useful for categorical data to identify the most common category or preference.
    - In numerical data, it highlights the most frequent value(s).
  - Example: For {10, 15, 15, 20}, the mode is 15.
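The three definitions above can be checked with Python's standard `statistics` module. This is a small sketch using the example datasets from each definition, plus two extra hypothetical datasets (an even-sized one for the median rule and a bimodal one for `multimode`).

```python
import statistics

# Example datasets from the definitions above
mean_example = [10, 15, 20]
median_example = [10, 15, 100]
mode_example = [10, 15, 15, 20]

mean_value = statistics.mean(mean_example)        # (10 + 15 + 20) / 3
median_value = statistics.median(median_example)  # middle value once sorted
mode_value = statistics.mode(mode_example)        # most frequent value

# Hypothetical extra cases:
# with an even number of values, the median averages the two middle values
even_median = statistics.median([10, 15, 20, 25])
# multimode lists every value tied for the highest count (bimodal data here)
modes = statistics.multimode([1, 1, 2, 2, 3])

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Median (even count):", even_median)
print("Modes:", modes)
```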

---

#### Q3. Calculate the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

Ans.

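```python
import statistics

import numpy as np
from scipy import stats

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

mean_height = np.mean(data)
median_height = np.median(data)
mode_height = stats.mode(data)          # on a tie, reports only the smallest most-frequent value
all_modes = statistics.multimode(data)  # lists every value that has the highest count

print("Mean:", mean_height)
print("Median:", median_height)
print("Mode:", mode_height)
print("All modes:", all_modes)
```

```
Mean: 177.01875
Median: 177.0
Mode: ModeResult(mode=177.0, count=3)
All modes: [178, 177]
```

The data is bimodal: 177 and 178 each occur three times, so `stats.mode` on its own reports only 177.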


---

#### Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

Ans.

```python
import numpy as np

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# np.std uses ddof=0 by default, i.e. the population standard deviation (divides by n)
std_dev = np.std(data)
print("Standard Deviation:", std_dev)
```

```
Standard Deviation: 1.7885814036548633
```
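`np.std` divides by n by default, which gives the population standard deviation. If the heights are treated as a sample from a larger population, dividing by n − 1 (Bessel's correction, `ddof=1`) is the usual choice. The sketch below computes both directly from the definition on the same data and compares them with NumPy; `pop_sd` and `sample_sd` are illustrative names. For this data the sample standard deviation comes out slightly larger, about 1.847.

```python
import math

import numpy as np

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]

# Population SD: divide the sum of squared deviations by n (np.std default, ddof=0)
pop_sd = math.sqrt(sum(squared_devs) / n)
# Sample SD: divide by n - 1 (Bessel's correction, np.std(..., ddof=1))
sample_sd = math.sqrt(sum(squared_devs) / (n - 1))

print(f"Population SD: {pop_sd:.4f}")
print(f"Sample SD:     {sample_sd:.4f}")
print("Matches NumPy:", math.isclose(pop_sd, np.std(data)), math.isclose(sample_sd, np.std(data, ddof=1)))
```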


---

#### Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Ans.

1. Range
- Definition:
  - The range is the difference between the maximum and minimum values in the dataset.
  - Formula: Range = Max value − Min value
- Use:
  - Describes the overall spread of the data.
  - Limitation: Sensitive to outliers, as it only considers the extreme values.

2. Variance
- Definition:
  - Variance measures the average of the squared deviations from the mean, which indicates how far the data points typically lie from the mean.
  - Formula (population): $\sigma^2 = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the mean of the $n$ data points $x_i$. The sample variance divides by $n - 1$ instead of $n$.
- Use:
  - Describes how spread out the data is around the mean.
  - Limitation: Because it is expressed in squared units of the original data, variance is not directly interpretable in the context of the original data.

3. Standard Deviation
- Definition:
  - Standard deviation is the square root of the variance, which brings the measure of spread back to the same units as the original data, making it more interpretable.
  - Formula: $\text{Standard Deviation} = \sqrt{\text{Variance}}$.
- Use:
  - Describes the typical deviation from the mean.
  - Interpretability: More intuitive than variance because it uses the same units as the data.

**Example:**

```python
import numpy as np

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

range_data = np.max(data) - np.min(data)
variance_data = np.var(data)       # population variance (ddof=0 by default)
std_deviation_data = np.std(data)  # population standard deviation (ddof=0 by default)

print(f"Range: {range_data}")
print(f"Variance: {variance_data}")
print(f"Standard Deviation: {std_deviation_data}")
```

```
Range: 7.5
Variance: 3.199023437500001
Standard Deviation: 1.7885814036548633
```


---

#### Q6. What is a Venn diagram?

Ans.

A Venn diagram is a graphical representation used to show the relationships between different sets or groups of items. It visually illustrates how sets intersect, overlap, or remain distinct. The diagram typically consists of overlapping circles (or other shapes), where each circle represents a set, and the area where circles overlap represents the elements that are common to those sets.

Key Features of a Venn Diagram:
- Sets: Each circle or shape in the diagram represents a set or category.
- Overlap: The region where sets overlap indicates elements that are common to both sets.
- Non-overlapping regions: The parts of the circles that do not overlap represent elements that belong exclusively to one set.
- Universal Set: Sometimes, a rectangle surrounding the circles represents the universal set, which contains all possible elements under consideration.
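Each region of a two-set Venn diagram corresponds to a set operation. The sketch below uses small hypothetical sets `A` and `B` inside a hypothetical universal set U = {1, ..., 10} and computes the four regions with Python's built-in `set` operators.

```python
# Hypothetical sets for illustration
universal = set(range(1, 11))  # the rectangle: U = {1, ..., 10}
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

only_A = A - B                  # inside circle A, outside circle B
only_B = B - A                  # inside circle B, outside circle A
both = A & B                    # the overlapping region
outside = universal - (A | B)   # inside the rectangle, outside both circles

print("Only A:", sorted(only_A))
print("Only B:", sorted(only_B))
print("Both:", sorted(both))
print("Neither:", sorted(outside))
```

The four regions never overlap and together cover the whole universal set, which is exactly what the drawing shows.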

---

#### Q7. For the two given sets A = {2, 3, 4, 5, 6, 7} and B = {0, 2, 6, 8, 10}, find:
(i) A ∩ B  
(ii) A ∪ B

Ans.

(i) A ∩ B = {2, 6}  
(ii) A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
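Both answers can be checked with Python's built-in `set` type, where `&` is intersection and `|` is union:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # elements in both sets (same as A.intersection(B))
union = A | B         # elements in either set (same as A.union(B))

print("A intersect B:", sorted(intersection))  # [2, 6]
print("A union B:", sorted(union))             # [0, 2, 3, 4, 5, 6, 7, 8, 10]
```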

---

#### Q8. What do you understand about skewness in data?

Ans.

Skewness in data refers to the asymmetry in the distribution of data values. It provides insight into the shape of the dataset and indicates whether the data is symmetrical or if it is skewed to one side. Skewness helps us understand the direction in which the data is stretched or compressed.

**Types of Skewness:**
1. Right Skew (Positive Skew)
- Definition: A distribution is positively skewed if the right tail (larger values) is longer or fatter than the left tail (smaller values).
- Characteristics:
  - The mean is usually greater than the median.
  - Most of the data is clustered on the left side, and the tail on the right side stretches out.
  - Positive skew often occurs with data that has a natural lower bound but no upper bound (e.g., income).
- Example: The income distribution in a population where a few people earn extremely high amounts.  

2. Left Skew (Negative Skew)
- Definition: A distribution is negatively skewed if the left tail (smaller values) is longer or fatter than the right tail (larger values).
- Characteristics:
  - The mean is usually less than the median.
  - Most of the data is clustered on the right side, and the tail on the left side stretches out.
  - Negative skew often occurs when values pile up near a natural upper limit, with a few values trailing far below it (e.g., age at retirement).
- Example: Age at retirement, where most people retire around 60 but a few retire much earlier.
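Skewness is commonly quantified with the Fisher-Pearson coefficient, the third central moment divided by the second central moment raised to the power 3/2; its sign gives the direction of the skew. The sketch below implements that formula by hand and compares it with `scipy.stats.skew`, whose default (`bias=True`) computes the same quantity. The datasets are hypothetical: one with a long right tail and its mirror image.

```python
import numpy as np
from scipy import stats


def fisher_pearson_skew(values):
    """Biased (population) Fisher-Pearson skewness: m3 / m2**1.5."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # second central moment
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical data with a long right tail
left_skewed = [-v for v in right_skewed]  # mirror image: long left tail

for name, values in [("right", right_skewed), ("left", left_skewed)]:
    print(
        f"{name}: skew={fisher_pearson_skew(values):.3f}, "
        f"scipy={stats.skew(values):.3f}, "
        f"mean={np.mean(values):.2f}, median={np.median(values):.2f}"
    )
```

A positive value indicates right skew and a negative value left skew; in both cases the mean sits on the tail side of the median.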

---

#### Q9. If data is right-skewed, what is the position of the median with respect to the mean?

Ans.

If the data is right-skewed (positively skewed), the mean will be greater than the median.

- Explanation:
In a right-skewed distribution, the tail on the right side of the distribution is longer or fatter than the left side. This causes the mean to be pulled in the direction of the larger values (i.e., toward the tail on the right), which makes the mean greater than the median.

- General Relationship: Mean > Median in a right-skewed distribution.
This is because the mean is more sensitive to extreme values (outliers) in the tail, while the median is less affected by them, as it is the middle value in the ordered dataset. Therefore, in a right-skewed distribution, the median will be positioned to the left of the mean.

- Example:
Consider the dataset of exam scores: Scores = {20, 25, 30, 35, 40, 45, 100}
  - Mean: (20 + 25 + 30 + 35 + 40 + 45 + 100) / 7 = 295 / 7 ≈ 42.14
  - Median: 35 (the 4th of the 7 sorted values)

Here, the mean (42.14) is greater than the median (35), which reflects the right skewness in the data caused by the extreme value (100).
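A quick check of the example's arithmetic with Python's standard `statistics` module:

```python
import statistics

scores = [20, 25, 30, 35, 40, 45, 100]

mean_score = statistics.mean(scores)      # 295 / 7
median_score = statistics.median(scores)  # 4th of the 7 sorted values

print(f"Mean: {mean_score:.2f}")  # 42.14
print(f"Median: {median_score}")  # 35
print("Mean > Median:", mean_score > median_score)
```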

---

#### Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

Ans.

Covariance and correlation are both measures used to describe the relationship between two variables. However, they differ in terms of scale, interpretation, and use.

**Covariance:**
- Definition: Covariance measures the degree to which two variables change together. It provides information about the direction of the relationship.
- Range: No fixed range. The value can be positive, negative, or zero.
  - Positive covariance: Variables tend to increase or decrease together.
  - Negative covariance: One variable increases as the other decreases.
  - Zero covariance: No linear relationship.
- Units: Depends on the units of the variables; it's not standardized.

**Correlation:**
- Definition: Correlation standardizes covariance to provide a dimensionless measure of the strength and direction of the linear relationship between two variables.
- Range: Always between −1 and 1 (inclusive):
  - 1: Perfect positive linear relationship.
  - 0: No linear relationship.
  - −1: Perfect negative linear relationship.
- Units: None; it is standardized and dimensionless.
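A minimal sketch with hypothetical paired data (hours studied vs. exam score) shows the practical difference: rescaling one variable changes the covariance but leaves the correlation unchanged. This is why correlation, not covariance, is used to compare the strength of relationships between variables measured in different units. `np.cov` returns the sample covariance matrix (dividing by n − 1) and `np.corrcoef` returns the Pearson correlation matrix; the off-diagonal entry `[0, 1]` is the value for the pair.

```python
import numpy as np

# Hypothetical paired data: hours studied vs. exam score
hours = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([52, 55, 61, 64, 70], dtype=float)

cov_hours = np.cov(hours, scores)[0, 1]        # sample covariance, in hours x points
corr_hours = np.corrcoef(hours, scores)[0, 1]  # Pearson correlation, unitless

# Express study time in minutes instead of hours
minutes = hours * 60
cov_minutes = np.cov(minutes, scores)[0, 1]        # scales by 60 with the units
corr_minutes = np.corrcoef(minutes, scores)[0, 1]  # unchanged by rescaling

print(f"Covariance (hours):    {cov_hours:.2f}")
print(f"Covariance (minutes):  {cov_minutes:.2f}")
print(f"Correlation (hours):   {corr_hours:.3f}")
print(f"Correlation (minutes): {corr_minutes:.3f}")
```

Here the covariance grows from 11.25 to 675 (a factor of 60), while the correlation stays at about 0.993.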

---

#### Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

Ans.

Formula of the sample mean:  
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, i.e. (sum of all values) / (number of observations).

```python
d = [60, 72, 65, 70, 68]

sample_mean = sum(d) / len(d)
print(f"The sample mean is: {sample_mean}")
```

```
The sample mean is: 67.0
```


---

#### Q12. For normally distributed data, what is the relationship between its measures of central tendency?

Ans.

For a normal distribution, the three measures of central tendency (mean, median, and mode) are equal. This relationship occurs because the normal distribution is symmetric about its center.  

**Key Points:**  
1. Mean:
The arithmetic average of the data.
In a normal distribution, the mean lies at the center of symmetry.

2. Median:
The middle value when the data is arranged in ascending order.
In a symmetric distribution like the normal distribution, the median divides the data into two equal halves.

3. Mode:
The value that appears most frequently in the dataset.
For a normal distribution, the mode corresponds to the peak of the curve, which aligns with the mean and median.

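**Relationship:**
For a normal distribution: Mean = Median = Mode. In a finite sample drawn from a normal distribution, the three values are close to one another but rarely exactly equal.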
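A simulation makes the "approximately equal" behaviour of real samples visible. This is a sketch under assumed parameters: hypothetical heights drawn from a normal distribution with mean 170 and standard deviation 7, a fixed random seed, and the centre of the tallest bin of a 30-bin histogram standing in for the mode (with continuous data almost every value is unique, so counting exact repeats is not meaningful).

```python
import numpy as np

# Hypothetical heights drawn from a normal distribution (mean 170, SD 7)
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=170, scale=7, size=100_000)

sample_mean = sample.mean()
sample_median = np.median(sample)

# Estimate the mode of continuous data as the centre of the tallest histogram bin
counts, edges = np.histogram(sample, bins=30)
peak = np.argmax(counts)
mode_estimate = (edges[peak] + edges[peak + 1]) / 2

print(f"Mean:   {sample_mean:.2f}")
print(f"Median: {sample_median:.2f}")
print(f"Mode (histogram peak): {mode_estimate:.2f}")
```

The mean and median land very close to 170; the mode estimate is coarser because it can only take bin-centre values, so its precision depends on the chosen bin width.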


---

#### Q13. How is covariance different from correlation?

Ans.

1. Covariance:  
- Positive covariance: Variables tend to move in the same direction.
- Negative covariance: Variables tend to move in opposite directions.
- Magnitude depends on the scale of the variables.

2. Correlation:  
- A standardized version of covariance, making it easier to compare relationships across different datasets.
- Values close to 1 or −1: Strong linear relationship.
- Value close to 0: Weak or no linear relationship.

---

#### Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Ans.

**Effects on Measures of Central Tendency:**  
1. Mean:  
  - The mean is highly affected by outliers because every data point contributes to it. A single outlier can pull the mean toward the extreme value.  

2. Median:  
  - The median is resistant to outliers because it depends only on the position of the middle value(s), not on the magnitude of the extreme values.  

3. Mode:  
  - The mode is typically unaffected by outliers unless the outlier value is repeated frequently.  

**Effects on Measures of Dispersion:**  
1. Range:  
  - The range is highly sensitive to outliers as it is determined by the minimum and maximum values in the dataset.

2. Variance and Standard Deviation:  
  - Both are significantly affected by outliers since they depend on squared deviations from the mean, amplifying the impact of extreme values.

3. Interquartile Range (IQR):  
  - IQR is resistant to outliers because it considers only the middle 50% of the data.
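For an example, the sketch below takes a small hypothetical dataset, appends one extreme value (100), and recomputes each measure. The `summarize` helper is a hypothetical name for illustration. The IQR uses `np.percentile`'s default linear interpolation; other quartile conventions give slightly different values.

```python
import statistics

import numpy as np

# Hypothetical data, with and without one extreme value
clean = [10, 12, 12, 13, 14]
with_outlier = clean + [100]


def summarize(values):
    """Compute central-tendency and dispersion measures for a list of numbers."""
    arr = np.array(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])  # default linear interpolation
    return {
        "mean": arr.mean(),
        "median": np.median(arr),
        "mode": statistics.mode(values),
        "range": arr.max() - arr.min(),
        "std": arr.std(),  # population SD (ddof=0)
        "iqr": q3 - q1,
    }


before = summarize(clean)
after = summarize(with_outlier)
for key in before:
    print(f"{key:>6}: {before[key]:8.2f} -> {after[key]:8.2f}")
```

The mean jumps from 12.2 to about 26.8 and the range from 4 to 90, while the median moves only from 12 to 12.5, the mode stays at 12, and the IQR grows modestly from 1 to 1.75.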