**Q1. What are the three measures of central tendency?**

The three measures of central tendency are:

1. Mean: The average value of a dataset, calculated by summing all values and dividing by the total number of observations.

2. Median: The middle value of a dataset when the values are arranged in ascending or descending order (or the average of the two middle values when the number of observations is even).

3. Mode: The most frequently occurring value in a dataset.

**Q2. What is the difference between the mean, median, and mode? How are they used to measure the
central tendency of a dataset?**

The mean, median, and mode are all measures of central tendency, but they differ in how they represent the typical or central value of a dataset:

1. Mean:
   - The mean is the average value of a dataset.
   - It is calculated by summing all values and dividing by the total number of observations.
   - The mean is sensitive to extreme values (outliers) in the dataset, as it considers all values equally.
   - It is most representative when the data is roughly symmetric (for example, normally distributed) and free of strong outliers.

2. Median:
   - The median is the middle value of a dataset when the values are arranged in ascending or descending order; with an even number of observations, it is the average of the two middle values.
   - It is less affected by extreme values compared to the mean, making it more robust in the presence of outliers.
   - The median is especially useful when the data is skewed or when there are outliers that could significantly influence the mean.

3. Mode:
   - The mode is the most frequently occurring value in a dataset.
   - Unlike the mean and median, the mode can be applied to nominal or categorical data.
   - It is useful for identifying the most common value or category in a dataset.
   - In some cases, a dataset may have multiple modes (bimodal or multimodal), indicating more than one frequently occurring value.
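
A small sketch with Python's standard `statistics` module illustrates these differences. The salary figures and colour labels are made-up illustration data, not part of the question.

```python
import statistics as st

# Hypothetical salaries (in thousands); the last value is an outlier
salaries = [30, 32, 35, 38, 40, 200]

mean_salary = st.mean(salaries)      # pulled upward by the outlier
median_salary = st.median(salaries)  # stays near the bulk of the data

print("Mean:", mean_salary)
print("Median:", median_salary)

# The mode also works on categorical data, where mean and median are undefined
colours = ["red", "blue", "red", "green", "red", "blue"]
most_common_colour = st.mode(colours)
print("Mode:", most_common_colour)
```

Here the single large salary moves the mean well above the median, while the mode simply reports the most frequent label.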


**Q3. Calculate the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]**

In [3]:
import statistics as st

# Height data from the question
data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

mean = st.mean(data)
print("Mean is", mean)

median = st.median(data)
print("Median is", median)

# st.mode returns a single value: the first of the most frequent values encountered
mode = st.mode(data)
print("Mode is", mode)

Mean is 177.01875
Median is 177.0
Mode is 178
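
`statistics.mode` reports only one value, but in this dataset 177 and 178 each occur three times, so the data is bimodal. A short sketch using `statistics.multimode` (available from Python 3.8) lists every tied mode:

```python
import statistics as st

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
        178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns every value tied for the highest frequency,
# in the order each one first appears in the data
modes = st.multimode(data)
print("Modes:", modes)
```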


**Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]**

In [4]:
import numpy as np

dt = np.array([178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5])

# np.std uses ddof=0 by default, i.e. the population standard deviation
print("Standard deviation of the data is", np.std(dt))

Standard deviation of the data is 1.7885814036548633
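
The figure above is the population standard deviation, because `np.std` divides by n unless `ddof` is changed. If the heights are treated as a sample from a larger group, the divisor becomes n - 1, which `np.std(dt, ddof=1)` would give. The question does not say which is intended, so the sketch below shows both, using the standard `statistics` module:

```python
import statistics as st

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
        178.9, 176.2, 177, 172.5, 178, 176.5]

population_sd = st.pstdev(data)  # divides by n, matching np.std's default
sample_sd = st.stdev(data)       # divides by n - 1

print("Population standard deviation:", population_sd)
print("Sample standard deviation:", sample_sd)
```

The sample value is always slightly larger than the population value for the same data, and the gap shrinks as the number of observations grows.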


**Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe
the spread of a dataset? Provide an example.**

Measures of dispersion, including range, variance, and standard deviation, are used to quantify the spread or variability of a dataset. They provide valuable insights into how the individual data points are distributed around the central tendency measures, such as the mean or median. Here's how each measure is used and an example illustrating their application:

1. Range:
   - The range is the simplest measure of dispersion and represents the difference between the maximum and minimum values in a dataset.
   - It provides a rough estimate of the spread of data but is sensitive to outliers.
   - The larger the range, the more spread out the data points are.

   Example: Suppose you have a dataset representing the heights (in inches) of students in a class:
   Heights = [60, 62, 65, 68, 70]
   The range would be: Range = Max(Heights) - Min(Heights) = 70 - 60 = 10 inches.

2. Variance:
   - Variance measures the average squared deviation of each data point from the mean of the dataset.
   - It provides a more precise measure of spread than the range and considers all data points in the calculation.
   - Larger variance values indicate greater variability in the dataset.

   Example continuation: Using the same heights dataset,
   Heights = [60, 62, 65, 68, 70]
   First, calculate the mean height: Mean = (60 + 62 + 65 + 68 + 70) / 5 = 325 / 5 = 65 inches.
   Next, calculate the variance:
   Variance = [(60 - 65)^2 + (62 - 65)^2 + (65 - 65)^2 + (68 - 65)^2 + (70 - 65)^2] / 5
            = [(25) + (9) + (0) + (9) + (25)] / 5
            = 68 / 5
            = 13.6 square inches.

3. Standard Deviation:
   - The standard deviation is the square root of the variance.
   - It represents the typical distance of data points from the mean and is expressed in the same units as the original data.
   - It is a widely used measure of dispersion due to its interpretability and ease of comparison.

   Continuing the example: Calculate the standard deviation by taking the square root of the variance:
   Standard Deviation = √(13.6) ≈ 3.69 inches.
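
The worked example above can be checked with a few lines of Python. This sketch assumes the five heights are the whole population, so it uses `statistics.pvariance` and `statistics.pstdev`, which divide by n as the hand calculation does; `statistics.variance` and `statistics.stdev` would divide by n - 1 instead.

```python
import statistics as st

heights = [60, 62, 65, 68, 70]  # inches

data_range = max(heights) - min(heights)
variance = st.pvariance(heights)  # population variance (divides by n)
std_dev = st.pstdev(heights)      # square root of the population variance

print("Range:", data_range)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```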


**Q6. What is a Venn diagram?**

A Venn diagram is a visual representation of the relationships between different sets or groups of objects. It consists of overlapping circles or shapes, with each representing a set, and the overlapping areas indicating common elements shared between the sets. Venn diagrams are widely used to illustrate logical relationships, intersections, and differences between sets in a clear and intuitive manner.

**Q7. For the two given sets A = {2,3,4,5,6,7} & B = {0,2,6,8,10}, find:
(i) A ∩ B
(ii) A ⋃ B**

(i) A ∩ B = {2, 6}

(ii) A ⋃ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
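
The same operations can be checked with Python's built-in `set` type, where `&` is intersection and `|` is union. A minimal sketch:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # elements that belong to both sets
union = A | B         # elements that belong to at least one set

print("Intersection:", sorted(intersection))
print("Union:", sorted(union))
```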

**Q8. What do you understand about skewness in data?**

Skewness in data refers to the asymmetry or lack of symmetry in the distribution of values within a dataset. It indicates whether the data is concentrated more on one side of the distribution compared to the other. A distribution falls into one of three cases:

1. Positive Skewness (Right Skewness):
   - In a positively skewed distribution, the tail of the distribution extends towards the right, meaning that the majority of the data is concentrated on the left side, and there are few extreme values on the right side.
   - The mean is typically greater than the median in a positively skewed distribution, as the presence of outliers on the right side pulls the mean towards higher values.
   - Positive skewness is also known as right skewness.

2. Negative Skewness (Left Skewness):
   - In a negatively skewed distribution, the tail of the distribution extends towards the left, indicating that the majority of the data is concentrated on the right side, with few extreme values on the left side.
   - The mean is typically less than the median in a negatively skewed distribution, as the presence of outliers on the left side pulls the mean towards lower values.
   - Negative skewness is also known as left skewness.

3. Symmetric Distribution:
   - In a symmetric distribution, the data is evenly distributed on both sides of the mean, with the tails of the distribution extending equally in both directions.
   - The mean and median are approximately equal in a symmetric distribution.
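
The three cases can be illustrated numerically. The sketch below uses one common definition of skewness (the third central moment divided by the variance raised to the power 1.5); the helper name `skewness` and the three small samples are made up for illustration.

```python
import statistics as st

def skewness(values):
    """Population skewness: third central moment / variance ** 1.5."""
    mean = st.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples chosen to show each case
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
left_skewed = [1, 7, 8, 8, 8, 9, 9, 10]
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, sample in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(name,
          "skewness:", round(skewness(sample), 2),
          "mean:", st.fmean(sample),
          "median:", st.median(sample))
```

A positive result goes with a mean above the median, a negative result with a mean below the median, and a result of zero with the two coinciding.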



**Q9. If data is right-skewed, what will be the position of the median with respect to the mean?**

The mean will be greater than the median.
This occurs because in a right-skewed distribution the tail extends towards the higher values (right side). The few extreme values in that tail pull the mean upward, while the median, which depends only on the middle position of the ordered data, is largely unaffected. As a result, the median lies to the left of (below) the mean.

**Q10. Explain the difference between covariance and correlation. How are these measures used in
statistical analysis?**

Covariance and correlation are both measures of the relationship between two variables, but they differ in scale and interpretation:

1. Covariance:
   - Covariance measures the degree to which two variables change together.
   - It indicates the direction of the linear relationship between variables (positive, negative, or no relationship) and the magnitude of their joint variability.
   - Covariance can take on any value, positive, negative, or zero, depending on the relationship between the variables.
   - However, the scale of covariance depends on the scales of the variables being measured, making it difficult to interpret the strength of the relationship.

2. Correlation:
   - Correlation is a standardized measure of the linear relationship between two variables.
   - It ranges from -1 to 1, where:
     - Correlation of +1 indicates a perfect positive linear relationship,
     - Correlation of -1 indicates a perfect negative linear relationship,
     - Correlation of 0 indicates no linear relationship.
   - Unlike covariance, correlation is unitless and allows for easier interpretation of the strength and direction of the relationship between variables.
   - Correlation also helps in assessing the strength of the relationship relative to other relationships.

In statistical analysis:

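- Covariance is used to determine the direction of the linear relationship between two variables, that is, whether they tend to increase together or move in opposite directions. Its magnitude depends on the units of the variables, so the strength of the relationship is hard to judge from covariance alone.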

- Correlation is used to quantify the strength and direction of the linear relationship between variables on a standardized scale. It provides a clearer understanding of the relationship between variables and allows for comparison across different datasets.

Both covariance and correlation are valuable tools in statistical analysis for identifying patterns, making predictions, and understanding the associations between variables in datasets. They are commonly used in fields such as finance, economics, social sciences, and machine learning.
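
The scale dependence of covariance, and the lack of it in correlation, can be seen in a short sketch. The helper functions and the study-hours data are hypothetical and written out by hand to show the formulas; NumPy offers `np.cov` and `np.corrcoef` for the same purpose.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: hours studied and exam scores (percent)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

# The same scores expressed as fractions of 1 instead of percentages
scores_fraction = [s / 100 for s in scores]

print("Covariance (percent):", covariance(hours, scores))
print("Covariance (fraction):", covariance(hours, scores_fraction))
print("Correlation (percent):", round(correlation(hours, scores), 4))
print("Correlation (fraction):", round(correlation(hours, scores_fraction), 4))
```

Changing the unit of the scores shrinks the covariance by a factor of 100 but leaves the correlation unchanged, which is why correlation is the easier quantity to compare across datasets.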

**Q11. What is the formula for calculating the sample mean? Provide an example calculation for a
dataset.**

The formula for calculating the sample mean (average) is:

\[ \text{Sample Mean} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Where:
- \( x_i \) represents each individual value in the dataset.
- \( n \) is the total number of observations in the dataset.

Here's an example calculation for a dataset:

Suppose we have the following dataset representing the scores of students in a class:
\[ \text{Scores} = \{85, 90, 88, 92, 87\} \]

To calculate the sample mean:
\[ \text{Sample Mean} = \frac{85 + 90 + 88 + 92 + 87}{5} \]
\[ \text{Sample Mean} = \frac{442}{5} \]
\[ \text{Sample Mean} = 88.4 \]

So, the sample mean of the dataset is \( 88.4 \). This indicates that, on average, the students scored \( 88.4 \) in the class.

In [5]:
# Calculate Sample Mean
# Example dataset representing the scores of students in a class
scores = [85, 90, 88, 92, 87]

# Calculate the sample mean
sample_mean = sum(scores) / len(scores)

# Print the sample mean
print("Sample Mean:", sample_mean)


Sample Mean: 88.4


**Q12. For normally distributed data, what is the relationship between its measures of central tendency?**

In a normal distribution:
- The mean, median, and mode are all equal.
- They are located at the center of the distribution.
- This holds true for any normal distribution.
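
This equality is exact for the theoretical distribution; in a finite sample the mean and median are only approximately equal. The sketch below simulates draws from a normal distribution with an assumed mean of 50 and standard deviation of 10 (both values are arbitrary choices for illustration) and compares the two measures. The mode is left out because a continuous sample rarely repeats values.

```python
import random
import statistics as st

rng = random.Random(0)  # fixed seed so the run is repeatable

# 100,000 simulated draws from a normal distribution (mean 50, standard deviation 10)
sample = [rng.gauss(50, 10) for _ in range(100_000)]

sample_mean = st.fmean(sample)
sample_median = st.median(sample)

print("Mean:", round(sample_mean, 2))
print("Median:", round(sample_median, 2))
```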

**Q13. How is covariance different from correlation?**

Covariance measures the degree to which two variables change together. It indicates the direction of the linear relationship between variables (positive, negative, or no relationship) and the magnitude of their joint variability. However, the scale of covariance depends on the scales of the variables being measured, making it difficult to interpret the strength of the relationship.

Correlation, on the other hand, is a standardized measure of the linear relationship between two variables. It ranges from -1 to 1, where +1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. Unlike covariance, correlation is unitless and allows for easier interpretation of the strength and direction of the relationship between variables. It also helps in assessing the strength of the relationship relative to other relationships.

**Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.**

Outliers can significantly affect measures of central tendency and dispersion:

1. Measures of Central Tendency:
   - Mean: Outliers can heavily influence the mean because it takes into account every value in the dataset. A single extreme value can pull the mean towards it, making it unrepresentative of the majority of the data.
   - Median: Outliers have minimal effect on the median, as it represents the middle value when the dataset is ordered. Because it depends on the position of values rather than their size, it provides a more robust measure of central tendency in the presence of outliers.
   - Mode: Outliers generally have little to no effect on the mode, as it represents the most frequently occurring value. Unless the outlier occurs frequently enough to create a new mode, it typically doesn't impact the mode.

2. Measures of Dispersion:
   - Range: Outliers can significantly affect the range, especially if they are extreme values. The range is the difference between the maximum and minimum values in the dataset, so extreme outliers can widen the range.
   - Variance and Standard Deviation: Outliers can inflate the variance and standard deviation, as these measures consider the deviation of each data point from the mean. Since outliers can be far from the mean, they contribute to larger deviations and increase the variability in the dataset.

Example:
Consider a dataset representing the ages of people in a neighborhood:
\[ \{20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 150\} \]

In this dataset, 150 is an outlier as it's significantly larger than the rest of the data. Its presence affects the measures of central tendency and dispersion as follows:
- Mean: The outlier raises the mean age from 40 (without it) to 45.
- Median: The median barely moves, from 40 to 41, so it stays close to the middle of the data.
- Range: The range widens from 40 (60 - 20) to 130 (150 - 20).
- Variance and Standard Deviation: Both grow sharply; the population standard deviation roughly doubles, from about 12.1 to about 25.8.
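
The effect can be reproduced by computing each measure with and without the outlier. This sketch uses the standard `statistics` module and the population standard deviation (`pstdev`), which is an assumption since the example does not say whether the ages are a sample.

```python
import statistics as st

ages = list(range(20, 61, 2)) + [150]  # the dataset above, including the outlier
ages_without = ages[:-1]               # the same data with 150 removed

for label, values in [("With outlier", ages), ("Without outlier", ages_without)]:
    print(label)
    print("  mean:", st.mean(values))
    print("  median:", st.median(values))
    print("  range:", max(values) - min(values))
    print("  population std dev:", round(st.pstdev(values), 2))
```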