### Q1. What are the three measures of central tendency? 

The three measures of central tendency are:

__1. Mean:__ The arithmetic mean is the most commonly used measure of central tendency. It is calculated by adding up all the values in a set of data and then dividing by the number of values. The mean is sensitive to outliers, meaning that extreme values can significantly affect the value of the mean.

__2. Median:__ The median is the middle value in a set of data when the values are arranged in order. If there is an even number of values, the median is the average of the two middle values. The median is less sensitive to outliers than the mean and is often used when there are extreme values in the data.

__3. Mode:__ The mode is the value that occurs most frequently in a set of data. There can be more than one mode or no mode at all. The mode is often used with categorical or nominal data where values cannot be added or averaged.

---

### Q2. What is the difference between the mean, median, and mode? How are they used to measure the  central tendency of a dataset? 

The mean, median, and mode are all measures of central tendency, but they differ in how they represent the center of a dataset.

The mean is calculated by adding up all the values in a dataset and dividing by the number of values. It represents the average value of the dataset and is the most commonly used measure of central tendency. The mean is sensitive to extreme values, or outliers, which can greatly affect its value.

The median is the middle value in a dataset when the values are arranged in order. It represents the value that splits the dataset into two equal halves. The median is less sensitive to outliers than the mean and is often used when the dataset has extreme values.

The mode is the value that appears most frequently in a dataset. It represents the most common value or values in the dataset. The mode is often used with categorical or nominal data where values cannot be added or averaged.

To choose the most appropriate measure of central tendency for a dataset, we need to consider the type of data, the distribution of the data, and the presence of outliers. If the data is normally distributed with no outliers, the mean is the most appropriate measure. If the data is skewed or has outliers, the median may be a better representation of the center of the dataset. If the data is categorical, the mode is the most appropriate measure.
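As a minimal sketch of these guidelines, the snippet below uses Python's standard `statistics` module on two made-up samples (the `incomes` and `colors` lists are hypothetical illustration data, not taken from any real dataset). It shows the median staying close to the bulk of skewed numeric data while the mean is pulled toward the extreme value, and the mode summarizing categorical data for which a mean is undefined.

```python
import statistics

# Hypothetical skewed numeric data: one extreme value (250).
incomes = [30, 32, 35, 38, 40, 250]

mean_income = statistics.mean(incomes)      # pulled upward by 250
median_income = statistics.median(incomes)  # average of the two middle values, 35 and 38

print(round(mean_income, 2))  # 70.83
print(median_income)          # 36.5

# Hypothetical categorical data: only the mode is meaningful here.
colors = ["red", "blue", "red", "green", "red", "blue"]
mode_color = statistics.mode(colors)
print(mode_color)             # red
```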

---

### Q3. Measure the three measures of central tendency for the given height data:  [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5] 


```python
import numpy as np
from scipy import stats

height = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# Mean
mean = np.mean(height)
print(mean)

# Median
median = np.median(height)
print(median)

# Mode
mode = stats.mode(height)
print(mode)
```

Output:

```
177.01875
177.0
ModeResult(mode=177.0, count=3)
```
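One caveat is worth checking here: in this data both 177 and 178 occur three times, and `scipy.stats.mode` reports only the smallest of the tied values. A minimal sketch using the standard library's `statistics.multimode` (available from Python 3.8), which lists every tied value, might look like this:

```python
import statistics

height = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns all values that share the highest count,
# in the order they first appear in the data.
modes = statistics.multimode(height)
print(modes)  # [178, 177]
```

So the height data is bimodal, which the single value 177.0 does not reveal on its own.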


---

### Q4. Find the standard deviation for the given data: 
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5] 
 

```python
import numpy as np

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# np.std uses the population formula (ddof=0) by default
np.std(data)
```

Output:

```
1.7885814036548633
```
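Note that `np.std` defaults to the population formula (dividing by n). If the heights are treated as a sample from a larger population, the sample formula (dividing by n - 1) is the usual choice. The sketch below computes both with the standard library's `statistics.pstdev` and `statistics.stdev`; with NumPy, the sample version would be `np.std(data, ddof=1)`.

```python
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1

print(population_sd)
print(sample_sd)  # slightly larger than the population value
```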

---

### Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe  the spread of a dataset? Provide an example. 


Measures of dispersion, such as range, variance, and standard deviation, are used to describe the spread or variability of a dataset.

The range is the difference between the highest and lowest values in a dataset. It gives a rough idea of the spread of the data, but it can be affected by outliers.

The variance is the average of the squared differences between each value and the mean of the dataset. It measures how far the data is spread out from the mean. The variance can be influenced by extreme values and is expressed in squared units.

The standard deviation is the square root of the variance. It is more commonly reported than the variance because it is expressed in the same units as the original data. The standard deviation indicates how far the data is spread out from the mean and, like the variance, is influenced by extreme values.

For example, consider the following dataset of test scores: 75, 80, 85, 90, 95.

The mean of this dataset is (75+80+85+90+95)/5 = 85.

The range of the dataset is 95 - 75 = 20.

The (population) variance of the dataset is ((75-85)^2 + (80-85)^2 + (85-85)^2 + (90-85)^2 + (95-85)^2)/5 = (100 + 25 + 0 + 25 + 100)/5 = 250/5 = 50.

The standard deviation of the dataset is the square root of the variance, which is √50 ≈ 7.07.

The range suggests that the scores are somewhat spread out, but it doesn't give a precise measure of how spread out they are. The variance and standard deviation provide a more precise measure of the spread, indicating that the scores typically deviate from the mean of 85 by about 7.07 points.
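The test-score example can be checked with a short sketch using Python's standard `statistics` module. It uses the population formulas (dividing by n), matching the definition of variance given in this answer; the sample variance (dividing by n - 1) is an addition shown only for comparison.

```python
import statistics

scores = [75, 80, 85, 90, 95]

score_range = max(scores) - min(scores)
pop_variance = statistics.pvariance(scores)    # mean squared deviation, divides by n
pop_sd = statistics.pstdev(scores)             # square root of the population variance
sample_variance = statistics.variance(scores)  # divides by n - 1 instead

print(score_range)       # 20
print(pop_variance)      # 50
print(round(pop_sd, 2))  # 7.07
print(sample_variance)   # 62.5
```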

---

### Q6. What is a Venn diagram? 


A Venn diagram is a visual representation of the relationships between different sets of items or concepts. It consists of overlapping circles or other shapes, where each circle represents a set and the overlap between circles represents the intersection of those sets. The areas outside the circles represent the items or concepts that are not part of any of the sets.

Venn diagrams can be used to illustrate relationships between different groups, such as the overlap between different demographic groups, or the similarities and differences between different types of animals or plants. They can also be used to solve problems related to sets and their intersections, such as calculating the probability of two events occurring simultaneously. Venn diagrams are a popular tool in mathematics, logic, and statistics.
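The regions of a two-set Venn diagram map directly onto set operations. As an illustrative sketch (the `universe`, `likes_tea`, and `likes_coffee` sets are made-up survey data), the snippet below computes each region and uses the overlap to find the probability that a randomly chosen person belongs to both sets:

```python
# Hypothetical survey of 10 people, identified by number.
universe = set(range(1, 11))
likes_tea = {1, 2, 3, 4, 5}
likes_coffee = {4, 5, 6, 7}

only_tea = likes_tea - likes_coffee              # left circle without the overlap
only_coffee = likes_coffee - likes_tea           # right circle without the overlap
both = likes_tea & likes_coffee                  # overlap of the two circles
neither = universe - (likes_tea | likes_coffee)  # area outside both circles

# Probability that a randomly chosen person likes both drinks.
p_both = len(both) / len(universe)

print(only_tea, only_coffee, both, neither)
print(p_both)  # 0.2
```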

---

### Q7. For the two given sets A = {2,3,4,5,6,7} and B = {0,2,6,8,10}, find: 
(i) A ∩ B 
(ii) A ∪ B 


```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

union = A.union(B)
print(union)

intersection = A.intersection(B)
print(intersection)
```

Output:

```
{0, 2, 3, 4, 5, 6, 7, 8, 10}
{2, 6}
```


---

### Q8. What do you understand about skewness in data? 


Skewness is a measure of the degree of asymmetry of a probability distribution. In other words, it measures how lopsided a distribution is.

A distribution can be positively skewed, negatively skewed, or have no skewness (i.e., be symmetric).

A distribution is positively skewed (right-skewed) if it has a long tail on the right-hand side, meaning that the right side extends farther out than the left. In this case, the mean is typically greater than the median, and the mode (the most common value) is typically less than the median.

A distribution is negatively skewed (left-skewed) if it has a long tail on the left-hand side, meaning that the left side extends farther out than the right. In this case, the mean is typically less than the median, and the mode is typically greater than the median.

A distribution has no skewness (i.e., is symmetric) if it is evenly distributed around its mean, with no long tails on either side.
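To make the idea concrete, here is a minimal sketch that computes one common skewness coefficient (the population Fisher-Pearson coefficient: the mean cubed deviation divided by the cubed standard deviation). The helper name `skewness` and the three small samples are made up for illustration; libraries such as SciPy offer their own implementations.

```python
import statistics

def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # long tail on the right
left_skewed = [-x for x in right_skewed]         # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

print(skewness(right_skewed) > 0)   # True: positive skew
print(skewness(left_skewed) < 0)    # True: negative skew
print(skewness(symmetric))          # 0.0: no skew

# In the right-skewed sample the mean (4.7) exceeds the median (3).
print(statistics.fmean(right_skewed), statistics.median(right_skewed))
```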

---

### Q9. If a data is right skewed then what will be the position of median with respect to mean? 


If the data is right-skewed, the median will typically be less than the mean. A right-skewed distribution usually contains a few very high values in its right-hand tail, and these pull the mean upward. The median depends only on the ordering of the values, so it is barely affected by those extremes. For this reason, the median is usually the more representative measure of central tendency for right-skewed data.

---

### Q10. Explain the difference between covariance and correlation. How are these measures used in  statistical analysis? 


Covariance and correlation are both measures of the relationship between two variables, but they differ in their interpretation and scale.

Covariance measures the extent to which two variables vary together. It is a measure of the joint variability of two variables, and it can take on positive or negative values. A positive covariance indicates that the two variables tend to increase or decrease together, while a negative covariance indicates that the two variables tend to move in opposite directions. However, the magnitude of covariance is not standardized, which means it can be difficult to compare covariances across different datasets or variable scales.

Correlation, on the other hand, measures the strength and direction of the linear relationship between two variables. It is a standardized measure that ranges from -1 to +1, with 0 indicating no linear relationship, -1 indicating a perfect negative linear relationship, and +1 indicating a perfect positive linear relationship. Correlation is useful because it is not affected by differences in the scales of the two variables, and it provides a standardized measure of the strength and direction of the relationship.
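The difference in scale can be seen in a small sketch. The helper functions `covariance` and `correlation` and the height and weight values below are hypothetical, written out by hand for illustration: changing the unit of one variable changes the covariance but leaves the correlation untouched.

```python
import statistics

def covariance(xs, ys):
    """Population covariance: mean product of the deviations from each mean."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Hypothetical heights (metres) and weights (kg) of five people.
heights_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weights_kg = [55, 60, 68, 72, 80]
heights_cm = [h * 100 for h in heights_m]  # same heights, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)    # the second is about 100 times the first
print(corr_m, corr_cm)  # both are about the same, close to +1
```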

---

### Q11. What is the formula for calculating the sample mean? Provide an example calculation for a  dataset. 

The sample mean is a measure of central tendency that represents the average value of a dataset. It is calculated by summing all the values in the dataset and dividing by the number of observations. The formula for calculating the sample mean is as follows:

Sample Mean = (sum of all values) / (number of observations)

Here is an example calculation of the sample mean for a dataset:

Suppose we have the following dataset:

4, 8, 2, 6, 9, 1, 5, 3

To calculate the sample mean, we first add up all the values in the dataset:

4 + 8 + 2 + 6 + 9 + 1 + 5 + 3 = 38

Next, we divide the sum by the number of observations (which in this case is 8):

Sample Mean = 38 / 8 = 4.75

Therefore, the sample mean of this dataset is 4.75.
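In symbols, writing the observations as $x_1, \dots, x_n$ and the sample mean as $\bar{x}$ (notation introduced here for illustration), the formula and the calculation for this dataset read:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad\Rightarrow\qquad
\bar{x} = \frac{4 + 8 + 2 + 6 + 9 + 1 + 5 + 3}{8} = \frac{38}{8} = 4.75
```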

---

### Q12. For normally distributed data, what is the relationship between its measures of central tendency?


For normally distributed data, the three measures of central tendency (mean, median, and mode) are equal. This is because a normal distribution is symmetric around its mean and has a single peak, so the median and the mode fall at the same point as the mean.

This property makes the distribution easier to describe and analyze, since any of the three measures summarizes its center equally well. In a finite sample drawn from a normal distribution, the three values are only approximately equal.
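A quick simulation can illustrate this. The sketch below draws a large sample from a hypothetical normal distribution (mean 170, standard deviation 5, both chosen arbitrarily) and compares the three measures. Because the data is continuous, the mode is approximated by the most populated 1-unit bin, which is a simplification rather than an exact mode.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical normal population: mean 170, standard deviation 5.
sample = [rng.gauss(170, 5) for _ in range(100_000)]

mean = statistics.fmean(sample)
median = statistics.median(sample)

# Approximate the mode as the centre of the most populated 1-unit bin.
bins = Counter(round(x) for x in sample)
modal_bin = bins.most_common(1)[0][0]

# All three values should land close to 170.
print(round(mean, 2), round(median, 2), modal_bin)
```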

---

### Q13. How is covariance different from correlation? 


Covariance and correlation are both measures of the relationship between two variables, but they have some important differences.

Covariance measures the degree to which two variables vary together. Specifically, covariance measures the expected value of the product of the deviations of two variables from their respective means. A positive covariance indicates that when one variable is above its mean, the other variable tends to be above its mean as well, while a negative covariance indicates that when one variable is above its mean, the other variable tends to be below its mean. However, the magnitude of the covariance depends on the units of the variables, which can make it difficult to compare covariances across different datasets.

Correlation, on the other hand, is a standardized measure of the linear relationship between two variables that ranges from -1 to 1. Correlation is calculated by dividing the covariance of the two variables by the product of their standard deviations. Because correlation is standardized, it is not affected by differences in the units of the variables, making it easier to compare across datasets. A correlation of 1 indicates a perfect positive linear relationship between the variables, a correlation of -1 indicates a perfect negative linear relationship, and a correlation of 0 indicates no linear relationship (the variables may still be related in a nonlinear way).
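A correlation of 0 rules out only a linear relationship. As a minimal sketch (the helper `correlation` and the data are made up for illustration), the snippet below applies the covariance-over-standard-deviations formula to a perfectly linear pair and to a pair where y is fully determined by x but not linearly:

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

x = [-2, -1, 0, 1, 2]
y_linear = [3 * v + 1 for v in x]  # perfect positive linear relationship
y_quadratic = [v ** 2 for v in x]  # y depends entirely on x, but not linearly

r_linear = correlation(x, y_linear)
r_quadratic = correlation(x, y_quadratic)

print(round(r_linear, 6))     # 1.0
print(round(r_quadratic, 6))  # 0.0
```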

---

### Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Outliers can have a significant impact on measures of central tendency and dispersion. Central tendency refers to the center of a distribution, while dispersion refers to the spread or variability of the data. Outliers are values that are significantly different from the rest of the data and can skew the measures of central tendency and dispersion in different ways.

For example, consider a dataset of exam scores:

| Exam Score | Frequency |
| --- | --- |
| 80 | 3 |
| 85 | 7 |
| 90 | 12 |
| 95 | 5 |
| 100 | 3 |

The mean score for this dataset is about 89.67 (2690/30), the median score is 90, and the mode is 90. However, if a single outlier score of 50 is added, the mean decreases to about 88.39 (2740/31), while the median and mode remain at 90. This is because the mean is sensitive to extreme values, while the median and mode are largely unaffected by a single one. In this case, one outlier pulls the mean down by more than a point without moving the median or the mode.

Similarly, outliers can affect measures of dispersion, such as the range and standard deviation. The range is the difference between the highest and lowest values in a dataset, while the standard deviation measures the typical distance of the data points from the mean. An outlier with a very high or low value increases the range, and it also increases the standard deviation because the data points are more spread out from the mean.
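The exam-score example can be reproduced with a short sketch using Python's standard `statistics` module. The frequency table is expanded into a flat list of scores, and each measure is computed with and without the outlier of 50; the variable names are chosen here for illustration.

```python
import statistics

# Frequency table from the example: score -> number of students.
frequencies = {80: 3, 85: 7, 90: 12, 95: 5, 100: 3}
scores = [score for score, count in frequencies.items() for _ in range(count)]
with_outlier = scores + [50]

def summarize(data):
    """Return mean, median, mode, range, and population standard deviation."""
    return {
        "mean": round(statistics.fmean(data), 2),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "sd": round(statistics.pstdev(data), 2),
    }

before = summarize(scores)
after = summarize(with_outlier)

print(before)  # mean 89.67, median 90, mode 90, range 20
print(after)   # mean 88.39, median 90, mode 90, range 50, larger sd
```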

---