**Q1. What are the three measures of central tendency?**

The three measures of central tendency are:
1. Mean: The sum of all values divided by the number of values in a dataset.
2. Median: The middle value of a dataset when arranged in ascending order.
3. Mode: The value that appears most frequently in a dataset.

**Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?**

**Difference between Mean, Median, and Mode:**

- **Mean:**
    - Sum of all values divided by the number of values.
    - Sensitive to outliers.
    - Best measure when data is roughly symmetric with no extreme outliers (for example, normally distributed data).
- **Median:**
    - Middle value when data is arranged in ascending order.
    - Robust to outliers: a few extreme values barely move it.
    - Best measure when data is skewed.
- **Mode:**
    - Value that appears most frequently.
    - Not affected by outliers.
    - Best measure when data has multiple modes or is categorical.

**Measuring Central Tendency:**

- **Mean:**
    - Provides the average value of the dataset.
    - Useful for comparing datasets with similar distributions.
- **Median:**
    - Provides the value that divides the dataset into two equal halves.
    - Useful for finding the typical value when data is skewed.
- **Mode:**
    - Provides the most common value in the dataset.
    - Useful for identifying the most frequent category or value.
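A minimal sketch of how the three measures behave on the same data, using Python's standard `statistics` module. The `salaries` and `colors` lists are hypothetical values chosen for illustration, not data from this notebook.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 35, 35, 38, 40, 250]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle of the sorted values
mode_salary = statistics.mode(salaries)      # most frequent value

# The mode also works for categorical data, where mean and median are undefined.
colors = ["red", "blue", "blue", "green"]
mode_color = statistics.mode(colors)

print("Mean:", mean_salary)
print("Median:", median_salary)
print("Mode:", mode_salary)
print("Most common color:", mode_color)
```

Here the single large salary drags the mean well above the median, while the median and mode stay at a typical value.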

**Q3. Compute the three measures of central tendency for the given height data: [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]**

In [3]:
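import numpy as np
from scipy import stats

# Height data from the question
data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

mean = np.mean(data)      # arithmetic average
median = np.median(data)  # middle value of the sorted data
mode = stats.mode(data)   # most frequent value; on a tie only one value is reported

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)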

Mean: 177.01875
Median: 177.0
Mode: ModeResult(mode=177.0, count=3)
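Note that in this data both 177 and 178 occur three times, so the dataset is bimodal and a single reported mode hides the tie. A small sketch using the standard library's `statistics.multimode` and `collections.Counter` to list every mode:

```python
from collections import Counter
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
        178.9, 176.2, 177, 172.5, 178, 176.5]

counts = Counter(data)              # frequency of each distinct value
modes = statistics.multimode(data)  # all values that share the highest frequency

print("Modes:", sorted(modes))
print("Count of 177:", counts[177])
print("Count of 178:", counts[178])
```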


**Q4. Find the standard deviation for the given data: [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]**

In [4]:
data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]
# np.std uses ddof=0 by default, i.e. the population standard deviation (divide by n)
std_dev = np.std(data)
print("Standard Deviation:", std_dev)

Standard Deviation: 1.7885814036548633
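The value above is the population standard deviation (divide by n). If the heights are treated as a sample from a larger population, the sample standard deviation (divide by n - 1) is usually reported instead. A short sketch with the standard `statistics` module; the NumPy equivalent of the sample version is `np.std(data, ddof=1)`.

```python
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
        178.9, 176.2, 177, 172.5, 178, 176.5]

population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1, so slightly larger

print("Population standard deviation:", population_sd)
print("Sample standard deviation:", sample_sd)
```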


**Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.**

**Measures of Dispersion:**

- **Range:**
    - Difference between the maximum and minimum values.
    - Simple to calculate.
    - Sensitive to outliers.
- **Variance:**
    - Average of the squared differences between each data point and the mean.
    - Measures how spread out the data is.
    - Expressed in squared units, so harder to interpret directly.
- **Standard Deviation:**
    - Square root of the variance.
    - Measures how much the data typically deviates from the mean.
    - Expressed in the same units as the data, so easier to interpret than variance.

**Example:**

Consider two datasets:

- Dataset A: [170, 172, 174, 176, 178]
- Dataset B: [160, 170, 180, 190, 200]

**Range:**

- Dataset A: 8 (178 - 170)
- Dataset B: 40 (200 - 160)

**Standard Deviation (population, dividing by n):**

- Dataset A: 2.83
- Dataset B: 14.14

**Interpretation:**

- Dataset A has a smaller range and standard deviation, indicating that the data is more tightly clustered around the mean.
- Dataset B has a larger range and standard deviation, indicating that the data is more spread out.

**Conclusion:**

Measures of dispersion provide valuable information about the spread of a dataset. They can be used to compare different datasets, identify outliers, and make inferences about the population from which the data was drawn.
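The example can be checked with a short sketch. The helper `describe_spread` is a hypothetical name introduced here; it reports both the population (divide by n) and sample (divide by n - 1) standard deviation, since the two conventions give different numbers on small datasets.

```python
import statistics

def describe_spread(values):
    """Return range, population variance, and both standard deviations."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),    # divide by n
        "population_sd": statistics.pstdev(values),  # sqrt of the above
        "sample_sd": statistics.stdev(values),       # divide by n - 1
    }

dataset_a = [170, 172, 174, 176, 178]
dataset_b = [160, 170, 180, 190, 200]

spread_a = describe_spread(dataset_a)
spread_b = describe_spread(dataset_b)

for name, spread in (("A", spread_a), ("B", spread_b)):
    print("Dataset", name, spread)
```

Whichever convention is chosen, it should be applied to both datasets so the comparison is like for like.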


**Q6. What is a Venn diagram?**


A Venn diagram is a graphical representation of the logical relationships between different sets. It uses overlapping circles or other shapes to illustrate the elements that belong to each set and the elements that are common to multiple sets.

Venn diagrams are commonly used in mathematics, logic, probability, and statistics to visually depict set operations such as union, intersection, and complement. They are also used in other fields such as computer science, engineering, and business to represent relationships between different groups or categories.
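As an illustration, the regions of a two-circle Venn diagram map directly onto Python set operations. The names below (`python_users`, `sql_users`, `everyone`) are hypothetical example sets, not data from this notebook.

```python
# Hypothetical example sets
python_users = {"Ana", "Ben", "Cara", "Dev"}
sql_users = {"Cara", "Dev", "Eli"}
everyone = python_users | sql_users | {"Fay"}  # assumed universal set

only_python = python_users - sql_users         # left crescent
both = python_users & sql_users                # overlapping region (intersection)
only_sql = sql_users - python_users            # right crescent
neither = everyone - (python_users | sql_users)  # outside both circles (complement of the union)

print("Only Python:", only_python)
print("Both:", both)
print("Only SQL:", only_sql)
print("Neither:", neither)
```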


**Q7. For the two given sets A = (2,3,4,5,6,7) & B = (0,2,6,8,10), find:**  
**(i) A ∩ B**  
**(ii) A ∪ B**

In [6]:

A = set([2,3,4,5,6,7])
B = set([0,2,6,8,10])

# (i) A intersect B
intersection = A & B
print("Intersection of sets A and B:", intersection)

# (ii) A ⋃ B
union = A | B
print("Union of sets A and B:", union)

Intersection of sets A and B: {2, 6}
Union of sets A and B: {0, 2, 3, 4, 5, 6, 7, 8, 10}


**Q8. What do you understand about skewness in data?**

Skewness is a measure of the asymmetry of a distribution. It indicates whether, and how strongly, one tail of the distribution is longer than the other.

- **Positive Skewness:**
    - Distribution is skewed to the right (longer right tail).
    - Mean is typically greater than the median.
    - Majority of the data is clustered on the left side of the distribution.
- **Negative Skewness:**
    - Distribution is skewed to the left (longer left tail).
    - Mean is typically less than the median.
    - Majority of the data is clustered on the right side of the distribution.
- **Zero Skewness (Symmetric):**
    - Distribution is symmetrical.
    - Mean is equal to the median.
    - Data is evenly distributed on both sides of the center.

Skewness can be measured using various methods, such as:

- **Pearson's Skewness Coefficient:**
    - Compares the mean with the mode (first coefficient) or the median (second coefficient), scaled by the standard deviation.
    - Positive values indicate positive skewness, negative values indicate negative skewness, and values close to zero indicate symmetry.
- **Bowley's Skewness Coefficient:**
    - Based on the quartiles rather than the mean, so it is less sensitive to outliers than Pearson's coefficient.
- **Fisher-Pearson Standardized Skewness Coefficient:**
    - The third central moment divided by the cube of the standard deviation, which makes it unitless.

Skewness is an important concept in statistics as it provides insights into the shape of a distribution. It can be used to:

- Flag long tails and possible outliers.
- Compare different distributions.
- Make inferences about the underlying population.
- Select appropriate statistical methods.
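A minimal sketch of the three coefficients in plain Python, assuming the usual textbook definitions: Fisher-Pearson as the third central moment over the cubed population standard deviation, Pearson's second coefficient as 3 × (mean − median) / standard deviation, and Bowley's as (Q3 + Q1 − 2·Q2) / (Q3 − Q1). The function names and the two sample lists are hypothetical, and the quartiles use the default method of `statistics.quantiles`; other quartile conventions give slightly different Bowley values.

```python
import statistics

def fisher_pearson_skew(values):
    """Third central moment divided by the cubed population standard deviation."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def pearson_median_skew(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    spread = statistics.pstdev(values)
    return 3 * (statistics.fmean(values) - statistics.median(values)) / spread

def bowley_skew(values):
    """Quartile-based skewness: (Q3 + Q1 - 2 * Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # long right tail
symmetric = [1, 2, 3, 4, 5]

for name, sample in (("right-skewed", right_skewed), ("symmetric", symmetric)):
    print(name,
          fisher_pearson_skew(sample),
          pearson_median_skew(sample),
          bowley_skew(sample))
```

All three coefficients come out positive for the right-skewed list and zero for the symmetric one, although their magnitudes are not directly comparable with each other.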

**Q9. If data is right-skewed, where does the median lie relative to the mean?**


In a right-skewed distribution, the median is typically located to the left of the mean. This is because the majority of the data is clustered on the left side of the distribution, resulting in a longer tail on the right side. As a result, the mean, which is influenced by the extreme values in the tail, is pulled towards the right, while the median, which is less affected by outliers, remains closer to the center of the distribution.
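This can be checked with a small simulation. The sketch below draws from an exponential distribution, which is one convenient example of a right-skewed distribution; the seed, rate, and sample size are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Exponential samples have a long right tail, i.e. they are right-skewed.
sample = [random.expovariate(1.0) for _ in range(10_000)]

sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)

print("Mean:", sample_mean)
print("Median:", sample_median)
print("Median is left of the mean:", sample_median < sample_mean)
```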



**Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?**

**Covariance:**
- Measures how two variables vary together (the direction of their linear relationship).
- Can be positive, negative, or zero.
- Positive covariance indicates a positive linear relationship (as one variable increases, the other tends to increase).
- Negative covariance indicates a negative linear relationship (as one variable increases, the other tends to decrease).
- Zero covariance indicates no linear relationship.
- Sensitive to the units of measurement.

**Correlation:**
- Measures the strength and direction of the linear relationship between two variables.
- Standardized version of covariance, ranging from -1 to 1.
- 1 indicates a perfect positive linear relationship.
- -1 indicates a perfect negative linear relationship.
- 0 indicates no linear relationship.
- Unitless, making it comparable across different variables with different units.

**Statistical Analysis:**
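- Covariance is used to:
    - Identify the direction of the linear relationship between two variables.
    - Provide the raw quantity from which correlation is computed; its magnitude alone does not indicate strength, because it depends on the units of measurement.
    - Complement a scatter plot when exploring how two variables move together.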
- Correlation is used to:
    - Determine the strength and direction of the linear relationship between two variables.
    - Compare the strength of the linear relationship between different pairs of variables.
    - Judge how useful one variable is likely to be for predicting another (for example, in linear regression).
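The unit sensitivity of covariance, and the lack of it in correlation, can be seen in a short sketch. The `covariance` and `correlation` helpers and the height and weight values are hypothetical, written out by hand (population form, dividing by n) rather than taken from a library.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

heights_cm = [160, 165, 170, 175, 180]     # hypothetical heights in centimeters
weights_kg = [55, 60, 66, 70, 77]          # hypothetical weights in kilograms
heights_m = [h / 100 for h in heights_cm]  # same heights expressed in meters

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)

print("Covariance (cm, kg):", cov_cm)
print("Covariance (m, kg):", cov_m)
print("Correlation (cm, kg):", corr_cm)
print("Correlation (m, kg):", corr_m)
```

Changing centimeters to meters shrinks the covariance by a factor of 100, while the correlation stays the same.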


**Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.**

In [7]:
# Formula for calculating the sample mean:
#   sample_mean = (x1 + x2 + ... + xn) / n

# Example calculation:

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# Sum of the observations divided by the number of observations
sample_mean = sum(data) / len(data)

print("Sample mean:", sample_mean)

Sample mean: 177.01875
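Written in mathematical notation, with n observations x_1, ..., x_n, the same calculation for this dataset (n = 16, sum of values 2832.3) is:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{2832.3}{16} = 177.01875
```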


**Q12. For normally distributed data, what is the relationship between its measures of central tendency?**

In a normal distribution, the mean, median, and mode are all equal. This is because the distribution is symmetrical, with the data evenly distributed on both sides of the mean. As a result, the mean, which is the sum of all values divided by the number of values, the median, which is the middle value when the data is arranged in ascending order, and the mode, which is the most frequently occurring value, all coincide at the same point.

This relationship is useful because, for data that is close to normal, any of the three measures describes the typical value. For example, if a researcher is interested in the typical height of a population, the mean, median, and mode should give approximately the same result. In a finite sample they will rarely match exactly.
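A quick simulation illustrates the point. The sketch assumes heights drawn from a normal distribution with mean 170 and standard deviation 7 (made-up parameters), and approximates the mode by rounding each height to the nearest centimeter, since raw continuous values almost never repeat.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical heights: normal with mean 170 cm and standard deviation 7 cm.
heights = [random.gauss(170, 7) for _ in range(100_000)]

sample_mean = statistics.fmean(heights)
sample_median = statistics.median(heights)
# Approximate mode: the most common height after rounding to whole centimeters.
approx_mode = statistics.mode(round(h) for h in heights)

print("Mean:", sample_mean)
print("Median:", sample_median)
print("Approximate mode:", approx_mode)
```

The three values land close to 170 and to each other, but they are not identical, which is the expected behavior for a sample.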


**Q13. How is covariance different from correlation?**


**Meaning:**   
Covariance indicates the extent to which two variables change together. If the covariance is positive, the variables tend to increase or decrease together, whereas if it’s negative, one variable tends to increase when the other decreases. On the other hand, correlation not only assesses the direction of the relationship (like covariance) but also its strength. It measures how closely the two variables move in relation to each other.

**Range of Values:**  
Covariance can take any value from negative infinity to positive infinity, making its interpretation difficult. Correlation, however, is standardized and always falls between -1 (perfect negative correlation) and +1 (perfect positive correlation), making it easier to interpret.

**Units:**    
Covariance is sensitive to the units of the variables. Its unit is the product of the units of the two variables. Correlation, however, is dimensionless and does not have units. This makes correlation easier to compare across datasets, because it is unaffected by changes of scale or units.

**Standardization:**   
Covariance is unstandardized, meaning it depends on the scale of the variables. Correlation, on the other hand, is a standardized measure, meaning it’s independent of the scale of the variables.
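The standardization can be made explicit with the usual formulas, written here in the population form (dividing by n); the sample form divides by n - 1 instead, and the correlation is the same under either convention as long as it is applied consistently:

```latex
\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
```

Dividing by the two standard deviations cancels the units of X and Y, which is why r always lies between -1 and +1.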

**Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.**

**Outliers:**
Outliers are extreme values that are significantly different from the rest of the data. They can have a significant impact on measures of central tendency and dispersion.

**Measures of Central Tendency:**
- **Mean:** Outliers can significantly affect the mean, as they are included in the sum of all values. A single outlier can pull the mean towards its extreme value.
- **Median:** Outliers have much less impact on the median, as it depends only on the middle of the sorted data. Adding an outlier can still shift the median slightly, because it changes which values sit in the middle.
- **Mode:** Outliers usually have no effect on the mode, as an extreme value rarely repeats often enough to become the most frequent value.

**Measures of Dispersion:**
- **Range:** Outliers can significantly affect the range, as they increase the difference between the maximum and minimum values.
- **Variance:** Outliers can significantly affect the variance, as they increase the squared differences between each data point and the mean.
- **Standard Deviation:** Outliers can significantly affect the standard deviation, as it is the square root of the variance.

**Example:**
Consider the following two datasets:

- Dataset A: [170, 172, 174, 176, 178]
- Dataset B: [170, 172, 174, 176, 178, 1000]

**Measures of Central Tendency:**
- **Mean:**
    - Dataset A: 174
    - Dataset B: 311.67
- **Median:**
    - Dataset A: 174
    - Dataset B: 175
- **Mode:**
    - Dataset A: no unique mode (every value appears once)
    - Dataset B: no unique mode (every value appears once)

**Measures of Dispersion:**
- **Range:**
    - Dataset A: 8
    - Dataset B: 830
- **Variance (population):**
    - Dataset A: 8
    - Dataset B: 94767.22
- **Standard Deviation (population):**
    - Dataset A: 2.83
    - Dataset B: 307.84

As observed, the outlier in Dataset B drastically changes the mean, range, variance, and standard deviation, while the median shifts only from 174 to 175 and the mode is unaffected. This demonstrates the importance of considering outliers when interpreting measures of central tendency and dispersion.
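The comparison can be reproduced with a short sketch using the standard `statistics` module. The `summarize` helper is a hypothetical name, and the variance and standard deviation use the population form (divide by n).

```python
import statistics

def summarize(values):
    """Central tendency and dispersion for one dataset (population formulas)."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # every value if none repeats
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }

dataset_a = [170, 172, 174, 176, 178]
dataset_b = dataset_a + [1000]  # same data plus one outlier

summary_a = summarize(dataset_a)
summary_b = summarize(dataset_b)

for key in summary_a:
    print(f"{key}: A = {summary_a[key]}, B = {summary_b[key]}")
```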
