# Q1. What are the three measures of central tendency?

The three measures of central tendency are mean, median, and mode.

# Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

Mean, median, and mode are the three measures of central tendency. They are used to describe the central location of a data set.

* The mean, also known as the arithmetic mean, is the sum of all the values in a dataset divided by the number of values. It is the most commonly used measure of central tendency. The mean is sensitive to extreme values, and a single outlier can have a significant impact on its value.

* The median is the middle value of a dataset when the values are arranged in order from lowest to highest; with an even number of values, it is the average of the two middle values. It divides the ordered dataset into two halves. The median is preferred when the dataset contains extreme values that would distort the mean.

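* The mode is the most frequently occurring value in a dataset. A dataset can have more than one mode, or no meaningful mode if every value occurs only once. It is used for categorical data, when the dataset contains many repeated values, or when it is necessary to describe the most typical value.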

All three measures of central tendency have their own strengths and weaknesses. The mean is sensitive to extreme values but can be used for parametric statistical analysis. The median is less affected by extreme values but may not be appropriate for parametric analysis. The mode is the only one of the three that applies to categorical data and is useful when values repeat often, but it may not be unique and may not be representative of the center of the dataset.
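The sensitivity of each measure to an extreme value can be seen in a minimal sketch. The numbers below are hypothetical and chosen only for illustration; the example uses Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical values for illustration; they do not come from the questions.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]  # same data plus one extreme value

print(mean(values), median(values), mode(values))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))
```

Adding the single value 100 moves the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5 and the mode stays at 3.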

# Q3. Compute the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [4]:
# The three measures of central tendency are mean, median, and mode.

In [1]:
data = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [12]:
import numpy as np
from scipy import stats

mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data)

print(mean)
print(median)
print(mode)

177.01875
177.0
ModeResult(mode=array([177.]), count=array([3]))
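Note that 177 and 178 each occur three times in this data, so there are two modes, while the `ModeResult` above reports only one of the tied values. As a cross-check, here is a small sketch using `statistics.multimode` from the standard library (Python 3.8+), which returns every value tied for the highest count.

```python
from statistics import multimode

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
           178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns all values that share the highest frequency.
modes = multimode(heights)
print(sorted(modes))  # [177, 178]
```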

# Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [14]:
data1 = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [15]:
standard_deviation = np.std(data1)  # population standard deviation (divides by n)
print(standard_deviation)

1.7885814036548633
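The value above is the population standard deviation, which divides by n. If the heights are treated as a sample from a larger population, the sample standard deviation (dividing by n - 1) is usually reported instead; with NumPy that is `np.std(data1, ddof=1)`. The sketch below shows both using the standard `statistics` module, chosen here only so the example has no third-party dependency.

```python
from statistics import pstdev, stdev

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
           178.9, 176.2, 177, 172.5, 178, 176.5]

population_sd = pstdev(heights)  # divides by n, like np.std(data1)
sample_sd = stdev(heights)       # divides by n - 1, like np.std(data1, ddof=1)

print(round(population_sd, 4))
print(round(sample_sd, 4))
```

The sample standard deviation is always slightly larger than the population one for the same data.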


# Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Measures of dispersion are used to describe the spread or variability of a dataset. Here are some common measures of dispersion and how they are used:

* Range: The range is the difference between the largest and smallest values in a dataset. It is a simple measure of dispersion but can be misleading if there are extreme values or outliers in the dataset. For example, if we have the following dataset of daily high temperatures for a city: [25, 27, 28, 30, 32, 34, 36], the range would be 11 (36-25).

* Variance: Variance measures how far the values in a dataset are from the mean. A higher variance indicates that the data points are more spread out. The population variance is the average of the squared differences from the mean (the sample variance divides by n - 1 instead of n). For example, if we have the following dataset of ages for a group of people: [25, 27, 28, 30, 32], the mean would be 28.4. The population variance would be calculated as: ((25-28.4)^2 + (27-28.4)^2 + (28-28.4)^2 + (30-28.4)^2 + (32-28.4)^2)/5 = 29.2/5 = 5.84.

* Standard deviation: The standard deviation is the square root of the variance. It is easier to interpret than the variance because it is in the same unit as the data. A higher standard deviation indicates that the data points are more spread out. For the same dataset of ages, the standard deviation is sqrt(5.84) ≈ 2.42.

Together, measures of dispersion and central tendency help provide a more complete picture of a dataset, including how spread out the values are and where the majority of the values lie.
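The three measures can be computed for the ages dataset with a short sketch. It uses the population formulas (dividing by n) from the standard `statistics` module; that choice is an assumption about which variant is wanted.

```python
from statistics import pstdev, pvariance

ages = [25, 27, 28, 30, 32]

data_range = max(ages) - min(ages)  # largest minus smallest value
variance = pvariance(ages)          # mean of squared deviations from the mean
std_dev = pstdev(ages)              # square root of the variance

print(data_range)         # 7
print(round(variance, 2)) # 5.84
print(round(std_dev, 2))  # 2.42
```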

# Q6. What is a Venn diagram?

A Venn diagram is a graphical representation of the relationships between sets or groups. It consists of overlapping circles (or other shapes) that represent different groups or sets, with the overlap indicating the elements or characteristics that are common to both groups. Venn diagrams are often used in mathematics, statistics, logic, and other fields to visually illustrate the relationships between different groups or sets of data.

# Q7. For the two given sets A = (2,3,4,5,6,7) & B = (0,2,6,8,10). Find:
(i) A ⋂ B
(ii) A ⋃ B

In [21]:
A = (2,3,4,5,6,7)
B = (0,2,6,8,10)

A = set(A)
B = set(B)

intersection = A.intersection(B)
union = A.union(B)

# Sets are unordered, so sort before printing to get a stable result.
print("The intersection of A and B is", tuple(sorted(intersection)))
print("The union of A and B is", tuple(sorted(union)))

The intersection of A and B is (2, 6)
The union of A and B is (0, 2, 3, 4, 5, 6, 7, 8, 10)


## (i) A ⋂ B
The intersection of A and B is (2, 6)

## (ii) A ⋃ B
The union of A and B is (0, 2, 3, 4, 5, 6, 7, 8, 10)
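A two-circle Venn diagram of A and B has three regions: elements only in A, elements in both, and elements only in B. As a rough sketch, these regions can be listed with Python's set operators (`-` for difference, `&` for intersection).

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

only_a = A - B  # left part of the diagram
both = A & B    # overlapping part
only_b = B - A  # right part of the diagram

print(sorted(only_a))  # [3, 4, 5, 7]
print(sorted(both))    # [2, 6]
print(sorted(only_b))  # [0, 8, 10]
```

Together the three regions make up the union of A and B.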

# Q8. What do you understand about skewness in data?

Skewness is a measure of the asymmetry of the probability distribution of a random variable. In simpler terms, it describes the degree to which a dataset is not symmetrical. A perfectly symmetrical dataset has zero skewness. A dataset skewed to the left (longer left tail) has negative skewness, and one skewed to the right (longer right tail) has positive skewness. Skewness pulls the mean toward the longer tail, so it is important to consider when choosing summary statistics and when using methods that assume a symmetric distribution.
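One common way to quantify skewness is the population (Fisher-Pearson) coefficient, the mean of the cubed standardized deviations. The sketch below is one possible implementation; the helper name `skewness` and the three small datasets are hypothetical and used only for illustration.

```python
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: mean of cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)

# Hypothetical datasets for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 20]         # long tail to the right
left_skewed = [-20, -3, -3, -3, -2, -2, -1]   # long tail to the left

print(skewness(symmetric))     # approximately 0
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```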

# Q9. If data is right skewed, what will be the position of the median with respect to the mean?

If the data is right skewed, the median will typically be less than the mean, because the long right tail pulls the mean upward.
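A quick check on a small right-skewed dataset illustrates the ordering. The waiting times below are hypothetical values, not data from the questions.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, a few are large.
waiting_times = [1, 2, 2, 3, 3, 4, 5, 9, 15, 30]

print(mean(waiting_times))    # 7.4
print(median(waiting_times))  # 3.5
```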

# Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

Covariance and correlation are two statistical measures that are used to quantify the relationship between two variables.

* Covariance is a measure of how two variables vary together. Its sign gives the direction of the linear relationship: a positive covariance indicates that the two variables tend to increase together, while a negative covariance indicates that one tends to decrease as the other increases. A covariance of zero indicates no linear relationship, although the variables may still be related in a nonlinear way. The magnitude of the covariance depends on the units of the variables, so it does not by itself indicate the strength of the relationship.

* Correlation, on the other hand, is a standardized measure of the relationship between two variables. It measures the strength and direction of the linear relationship between two variables on a scale ranging from -1 to +1. A correlation of +1 indicates a perfect positive linear relationship, a correlation of -1 indicates a perfect negative linear relationship, and a correlation of 0 indicates no linear relationship. Correlation is not influenced by the scale of the variables and is therefore a more useful measure of the strength of the relationship between the variables.

Both covariance and correlation are used in statistical analysis to study the relationship between two variables. They are commonly used in regression analysis, where the relationship between a dependent variable and one or more independent variables is studied. Correlation is also used to assess the strength of association between two variables in bivariate analysis.
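The scale dependence of covariance can be illustrated with a small sketch. The height and weight values are hypothetical, and `covariance` and `correlation` are illustrative helpers written for this example (sample versions, dividing by n - 1), not library functions.

```python
from math import sqrt
from statistics import mean

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: the same heights in metres and in centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50, 58, 65, 72, 85]

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```

Changing the unit from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.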

# Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

The formula for calculating the sample mean is:

sample mean = (sum of all values in the dataset) / (number of values in the dataset)

For example, consider the following dataset:

3, 6, 9, 12, 15

To calculate the sample mean, first add up all the values:

3 + 6 + 9 + 12 + 15 = 45

There are 5 values in the dataset, so divide the sum by 5:

sample mean = 45 / 5 = 9

Therefore, the sample mean of this dataset is 9.
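In standard notation, with the symbols $\bar{x}$ for the sample mean, $n$ for the number of values, and $x_i$ for the individual values, the same calculation reads:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{3 + 6 + 9 + 12 + 15}{5} = \frac{45}{5} = 9$$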

# Q12. For normally distributed data, what is the relationship between its measures of central tendency?

For a normal distribution, the mean, median, and mode are equal. The normal distribution is a symmetric, bell-shaped curve with a single peak, so all three measures coincide at its center.
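This can be checked approximately by simulation. The sketch below draws a sample from a normal distribution with an assumed mean of 170 and standard deviation of 10 (both arbitrary) and compares the sample mean and median. The mode is not computed, because in a continuous sample almost every value is unique.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(170, 10) for _ in range(10_000)]

sample_mean = mean(sample)
sample_median = median(sample)

# Both values should be close to 170 and close to each other.
print(round(sample_mean, 2), round(sample_median, 2))
```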

# Q13. How is covariance different from correlation?

Covariance and correlation are two measures of the relationship between two variables in a dataset.

* Covariance measures how much two variables change together. If two variables have a positive covariance, they tend to increase or decrease together. If they have a negative covariance, they tend to move in opposite directions. Covariance therefore gives the direction of the relationship, but its magnitude depends on the units of the variables, so it does not indicate the strength of the relationship.

* On the other hand, correlation measures both the strength and direction of the linear relationship between two variables. Correlation is a standardized form of covariance, which scales the covariance by the product of the standard deviations of the two variables. Correlation values range between -1 and 1, where a value of 1 means a perfect positive linear relationship, a value of -1 means a perfect negative linear relationship, and a value of 0 means no linear relationship.

In summary, covariance and correlation both measure the relationship between two variables, but covariance indicates only its direction, while correlation, being unit-free and bounded between -1 and 1, also indicates its strength.
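Written as formulas for a sample of $n$ pairs $(x_i, y_i)$, with $s_x$ and $s_y$ denoting the sample standard deviations (notation assumed here, not taken from the question), the relationship is:

$$\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}$$

Dividing by $s_x s_y$ cancels the units, which is why $r_{xy}$ always lies between -1 and 1.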

# Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Outliers can significantly affect measures of central tendency and dispersion. The mean can be heavily influenced by outliers, which pull it toward the extreme end of the distribution. The median, on the other hand, is robust to outliers because it depends only on the middle of the ordered data.

Similarly, outliers affect measures of dispersion such as the range, variance, and standard deviation. Because the variance and standard deviation are based on squared deviations from the mean, a single outlier can inflate them, suggesting more variability than is typical of the rest of the data.

For example, consider a dataset of the salaries of employees in a company:
[25,000, 30,000, 32,000, 33,000, 35,000, 37,000, 39,000, 40,000, 42,000, 45,000, 100,000]
Here, the outlier value of 100,000 pulls the mean up to about 41,636, compared with 35,800 without it, while the median only moves from 36,000 to 37,000. Similarly, the outlier inflates the standard deviation, suggesting more variability than is typical of the rest of the data.
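The effect can be verified with a short sketch that computes each measure with and without the outlier. It uses the population standard deviation from the standard `statistics` module, which is an assumption about the variant intended.

```python
from statistics import mean, median, pstdev

salaries = [25_000, 30_000, 32_000, 33_000, 35_000, 37_000,
            39_000, 40_000, 42_000, 45_000, 100_000]
without_outlier = salaries[:-1]  # drop the 100,000 salary

for label, data in [("with outlier", salaries), ("without outlier", without_outlier)]:
    print(label, round(mean(data), 2), median(data), round(pstdev(data), 2))
```

The mean and standard deviation change substantially when the outlier is removed, while the median barely moves.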