### Q1. What are the three measures of central tendency?

The three measures of central tendency are the mean, median, and mode.

### Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?


The mean is the sum of all values in a dataset divided by the total number of values. It is sensitive to outliers: a single extreme value can pull the mean far away from the bulk of the data. The mean is usually preferred when the data are roughly symmetric (for example, approximately normally distributed) and contain few outliers.

The median is the middle value in a dataset when the values are arranged in numerical order (the average of the two middle values when the count is even). It is robust to outliers, meaning that extreme values have little influence on it. The median is often preferred when there are outliers or when the data are skewed.

The mode is the value that occurs most frequently in a dataset. It is useful for describing the most common value or category in the data. The mode is often used to measure central tendency when dealing with categorical or nominal data.

The choice of measure of central tendency depends on the type of data being analyzed and the research question being asked. If the data is normally distributed, the mean is often the best measure of central tendency. However, if there are outliers, the median may be a better choice. If the data is categorical or nominal, the mode is the most appropriate measure of central tendency.
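The following sketch uses Python's built-in `statistics` module on small hypothetical datasets (the `incomes` and `colors` values are made up for illustration). It shows how one outlier moves the mean but not the median or mode, and how the mode still applies to categorical data where a mean or median makes no sense.

```python
import statistics

# Hypothetical incomes (in thousands) with one extreme value
incomes = [30, 32, 35, 35, 40, 200]
print('mean', statistics.mean(incomes))      # pulled upward by the outlier 200
print('median', statistics.median(incomes))  # average of the two middle values
print('mode', statistics.mode(incomes))      # most frequent value

# The mode also works for categorical data
colors = ['red', 'blue', 'red', 'green']
print('mode', statistics.mode(colors))
```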

### Q3. Calculate the three measures of central tendency for the given height data:
### [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

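```python
import numpy as np
import scipy.stats as stat

data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]
print('mean', np.mean(data))
print('median', np.median(data))
# scipy.stats.mode reports a single mode: the smallest of any tied values
print(stat.mode(data))
```

```
mean 177.01875
median 177.0
ModeResult(mode=array([177.]), count=array([3]))
```

Note that 178 also occurs three times, so this dataset is bimodal (177 and 178). `scipy.stats.mode` shows only 177 because it returns the smallest of the tied values.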
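To list every most-frequent value instead of a single one, Python's standard library provides `statistics.multimode` (Python 3.8+). Applied to the height data, it should report both tied values, in the order they first appear.

```python
import statistics

data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

# multimode returns every value tied for the highest count
modes = statistics.multimode(data)
print('modes', modes)  # modes [178, 177]
```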


### Q4. Find the standard deviation for the given data:
### [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]


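```python
import numpy as np

data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]
# np.std uses ddof=0 by default, i.e. the population standard deviation
print('standard deviation:', np.std(data))
```

```
standard deviation: 1.7885814036548633
```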
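`np.std` divides by n by default (`ddof=0`), which gives the population standard deviation. If the 16 heights are treated as a sample from a larger population, the sample standard deviation, which divides by n - 1, is usually reported instead (`np.std(data, ddof=1)` in NumPy). The sketch below computes both with Python's standard `statistics` module; the sample value comes out to about 1.847.

```python
import statistics

data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

pop_sd = statistics.pstdev(data)    # divides by n, like np.std(data)
sample_sd = statistics.stdev(data)  # divides by n - 1, like np.std(data, ddof=1)

print('population SD:', pop_sd)
print('sample SD:', sample_sd)
```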


### Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Measures of dispersion, such as range, variance, and standard deviation, are used to describe the spread or variability of a dataset.

The range is the difference between the highest and lowest values in a dataset. It is a simple measure of spread, but because it depends only on the two most extreme values, a single outlier can change it dramatically.

The variance is the average of the squared differences between each value and the mean of the dataset (for a sample, the sum of squared differences is usually divided by n - 1 instead of n). It measures how far the data deviate from the mean. A high variance indicates that the data are spread over a wide range, while a low variance indicates that they are tightly clustered around the mean.

The standard deviation is the square root of the variance, so it is expressed in the same units as the data. It can be read as a typical distance of a data point from the mean. A high standard deviation indicates that the data are spread out, while a low standard deviation indicates that they are tightly clustered around the mean.
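As an example, the sketch below compares two hypothetical datasets that share the same mean of 50 but differ in spread. It uses the population formulas (`statistics.pvariance` and `statistics.pstdev`, which divide by n). The wide dataset has a range of 40, a variance of 200, and a standard deviation of about 14.14, against 4, 2, and about 1.41 for the narrow one.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, values in [('narrow', narrow), ('wide', wide)]:
    data_range = max(values) - min(values)
    variance = statistics.pvariance(values)  # population variance (divide by n)
    sd = statistics.pstdev(values)           # square root of the variance
    print(name, 'range', data_range, 'variance', variance, 'sd', round(sd, 3))
```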

### Q6. What is a Venn diagram?

A Venn diagram is a visual representation of the relationships between different sets or groups of data. It consists of overlapping circles or other shapes, where each shape represents a set, and the overlap represents the elements that belong to both sets.

### Q7. For the two given sets A = {2,3,4,5,6,7} and B = {0,2,6,8,10}, find:
### (i) A ∩ B
### (ii) A ⋃ B

```python
A = {2,3,4,5,6,7}
B = {0,2,6,8,10}

print('A ∩ B =', A.intersection(B))
print('A ⋃ B =', A.union(B))
```

```
A ∩ B = {2, 6}
A ⋃ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
```
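The same sets can be split into the three regions of a two-set Venn diagram using set difference and intersection. This sketch also checks the inclusion-exclusion identity |A ⋃ B| = |A| + |B| - |A ∩ B|, which here gives 6 + 5 - 2 = 9.

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

only_A = A - B  # region of the Venn diagram inside A only
only_B = B - A  # region inside B only
both = A & B    # overlapping region

print('A - B =', only_A)
print('B - A =', only_B)

# Inclusion-exclusion: |A ⋃ B| = |A| + |B| - |A ∩ B|
assert len(A | B) == len(A) + len(B) - len(both)
```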


### Q8. What do you understand about skewness in data?

Skewness refers to the extent to which a probability distribution or dataset deviates from symmetry. A symmetric distribution, such as the bell-shaped normal distribution, looks the same on both sides of its center, and its mean and median coincide. A skewed distribution is asymmetric: one of its tails is longer or heavier than the other.

Skewness can be positive or negative. A positive (right) skew indicates that the tail of the distribution is longer on the right side than the left, while a negative (left) skew indicates that the tail is longer on the left side. Skewness matters for statistical analysis and modeling because many tests and regression models assume normally distributed data or errors, and a strong skew can make their results unreliable. It is therefore important to detect skewness and account for it, for example by reporting the median instead of the mean or by transforming the data (such as taking logarithms) before modeling.
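Skewness is commonly quantified by the Fisher-Pearson coefficient: the third central moment divided by the second central moment raised to the power 1.5. The helper below is a hypothetical sketch of that formula using population moments; assuming SciPy's default `bias=True`, it should agree with `scipy.stats.skew`. The sample lists are made up to show a symmetric, a right-tailed, and a left-tailed case.

```python
def skewness(values):
    """Fisher-Pearson coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))           # symmetric: zero
print(skewness([1, 1, 1, 2, 2, 3, 10]))    # long right tail: positive
print(skewness([-10, 1, 2, 2, 3, 3, 3]))   # long left tail: negative
```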

### Q9. If data is right-skewed, where is the median positioned with respect to the mean?

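In a right-skewed (positively skewed) distribution, the long right tail pulls the mean toward the larger values, so the median usually lies to the left of the mean:

median < mean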
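A small hypothetical right-skewed dataset illustrates this: one large value (20) stretches the right tail, giving a mean of 4.75 while the median stays at 3.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is large
values = [1, 2, 2, 3, 3, 3, 4, 20]
mean = statistics.mean(values)
median = statistics.median(values)
print('mean', mean, 'median', median)
```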

### Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

Covariance: Covariance measures how two variables change together, that is, the extent to which changes in one variable are associated with changes in another. It is positive when the variables tend to move in the same direction and negative when they move in opposite directions. Its magnitude, however, depends on the units of the variables, so covariance alone does not indicate how strong the relationship is.

Correlation: Correlation is a standardized version of covariance: the covariance divided by the product of the two variables' standard deviations. It measures the strength and direction of the linear relationship between two variables and ranges from -1 to +1, with 0 indicating no linear relationship. A positive correlation means the variables tend to increase together, while a negative correlation means one tends to decrease as the other increases.

Both measures are used to identify patterns, trends, and dependencies in data, and, through methods such as linear regression, to predict or estimate the value of one variable from the value of another.
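The two definitions can be written out directly. This is a minimal sketch using the sample (n - 1) formulas; the helper names `covariance` and `correlation` and the `hours`/`scores` data are hypothetical, chosen only for illustration. For this data the covariance is 11.25 and the correlation is close to +1.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of (x - mean_x) * (y - mean_y), divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print('covariance', covariance(hours, scores))
print('correlation', correlation(hours, scores))
```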

### Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

The formula for calculating the sample mean is:

Sample Mean = (Sum of all values in the sample) / (Number of values in the sample)

Here's an example calculation for a dataset:

Suppose we have a sample of 5 test scores: 70, 80, 85, 90, 95. Their sum is 420, so the sample mean is 420 / 5 = 84.

```python
scores = [70, 80, 85, 90, 95]
mean = sum(scores) / len(scores)
print(mean)
```

```
84.0
```


### Q12. For normally distributed data, what is the relationship between the measures of central tendency?

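For a normal distribution, which is symmetric and unimodal, all three measures coincide at the center of the distribution:

mean = median = mode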

### Q13. How is covariance different from correlation?

Covariance measures the degree to which two variables vary together. It is a measure of how much two variables move in relation to each other. A positive covariance means that the variables tend to move together, while a negative covariance means that they tend to move in opposite directions. However, the magnitude of the covariance is not standardized, so it is difficult to compare the strength of the relationship between different pairs of variables.

Correlation, on the other hand, is a standardized measure of the relationship between two variables. It measures both the strength and direction of the linear relationship, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation). A correlation of 0 indicates no linear relationship, although the variables may still be related in a non-linear way.

Correlation is often preferred over covariance because, being standardized, it allows easy comparison between different pairs of variables. It is also unaffected by the units of measurement: rescaling a variable by a positive constant (for example, converting metres to centimetres) changes the covariance but leaves the correlation unchanged, which makes correlation easier to interpret.

### Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Outliers are extreme values in a dataset that differ significantly from the other values in the dataset. Outliers can have a significant impact on measures of central tendency and dispersion.

Measures of central tendency, such as the mean or median, describe the typical or central value of a dataset. Outliers can have a large impact on the mean, because every value contributes to it in proportion to its size. For example, consider a dataset of salaries at a company: 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 1,000,000. The mean salary is 1,450,000 / 7 ≈ 207,143, which is higher than six of the seven salaries. This is because the outlier value of 1,000,000 is far larger than the other values in the dataset.

The median is much less sensitive to outliers than the mean because it depends only on the middle of the ordered values. In the example above, the median salary is 80,000, which better represents the typical salary; it would stay at 80,000 even if the 1,000,000 salary were replaced by any value of at least 80,000.

Measures of dispersion, such as the range, variance, or standard deviation, describe how spread out the values in a dataset are. Outliers can inflate these measures considerably. For example, consider the heights of a group of people: 5'2", 5'4", 5'6", 5'8", 6'0", 6'2", 7'0" (62, 64, 66, 68, 72, 74, and 84 inches). The range of this dataset is 84 - 62 = 22 inches. Without the outlier of 7'0" (84 inches), the range would be only 74 - 62 = 12 inches, so a single extreme value nearly doubles the range.
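The salary example can be checked with a short sketch using Python's `statistics` module. It compares the mean, median, and population standard deviation with and without the 1,000,000 outlier: dropping it brings the mean down from about 207,143 to 75,000, while the median only shifts from 80,000 to 75,000.

```python
import statistics

salaries = [50_000, 60_000, 70_000, 80_000, 90_000, 100_000, 1_000_000]
without_outlier = salaries[:-1]  # drop the 1,000,000 salary

for label, values in [('with outlier', salaries), ('without outlier', without_outlier)]:
    print(label,
          'mean', round(statistics.mean(values)),
          'median', statistics.median(values),
          'sd', round(statistics.pstdev(values)))  # population standard deviation
```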