### Q1. What are the three measures of central tendency?

The three measures of central tendency are:

Mean: The arithmetic average of a set of values. It is calculated by adding up all the values and dividing by the number of values.

Median: The middle value of a dataset when it is sorted in ascending or descending order. If there is an even number of values, the median is the average of the two middle values.

Mode: The value that appears most frequently in a dataset.

### Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

Mean: Uses every value in the calculation, so it is sensitive to extreme values (outliers) and is pulled toward the tail of a skewed distribution.

Median: Depends only on the order of the values, so it is largely unaffected by extreme values and is suitable for skewed distributions.

Mode: Represents the most frequent value(s); a dataset can have no mode, one mode, or multiple modes.

Each of these measures summarizes the "center" or "typical" value of a dataset. Which one to use depends on the nature of the data: the mean suits roughly symmetric numeric data, the median suits skewed data or data with outliers, and the mode is the only one of the three that applies to categorical data.

### Q3. Calculate the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [2]:
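from collections import Counter

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# Mean: sum of the values divided by how many there are
mean_height = sum(data) / len(data)

# Median: middle of the sorted data; the two indices coincide when the
# count is odd and select the two middle values when it is even
sorted_data = sorted(data)
n = len(sorted_data)
median_height = (sorted_data[n // 2] + sorted_data[(n - 1) // 2]) / 2

# Mode: most_common(1) returns a single value, so a tie is resolved in
# favor of the value that appears first in the data
mode_height = Counter(data).most_common(1)[0][0]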

print("Mean:", mean_height)
print("Median:", median_height)
print("Mode:", mode_height)


Mean: 177.01875
Median: 177.0
Mode: 178
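In this dataset both 177 and 178 occur three times, so the data is bimodal, and `Counter.most_common(1)` reports only the value it encountered first. A minimal sketch using `statistics.multimode` (available from Python 3.8) lists every tied value instead:

```python
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns all values tied for the highest frequency,
# in the order they first appear in the data
modes = statistics.multimode(data)
print("Modes:", modes)  # Modes: [178, 177]
```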


### Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [3]:
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# statistics.stdev computes the sample standard deviation (divides by n - 1)
std_dev = statistics.stdev(data)
print("Standard Deviation:", std_dev)

Standard Deviation: 1.8472389305844188
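The question does not say whether the 16 heights are a sample or the whole population. `statistics.stdev` assumes a sample; if the data is instead treated as a complete population (an assumption, not something the question states), `statistics.pstdev` divides by n and gives a slightly smaller value. A short comparison:

```python
import math
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print("Sample standard deviation:", sample_sd)
print("Population standard deviation:", population_sd)

# The two differ only by the factor sqrt((n - 1) / n)
n = len(data)
print("Ratio check:", math.isclose(population_sd, sample_sd * math.sqrt((n - 1) / n)))
```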


### Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

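Range: The difference between the maximum and minimum values. A larger range indicates greater spread, but because it uses only the two extreme values, a single outlier can inflate it.

Variance and Standard Deviation: Measure how far individual data points deviate from the mean. Variance is the average of the squared deviations from the mean, and the standard deviation is its square root, expressed in the same units as the data. A higher variance or standard deviation indicates greater dispersion.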

In [4]:
import statistics  # needed if this cell is run on its own

data = [10, 12, 15, 18, 20]

range_val = max(data) - min(data)
variance_val = statistics.variance(data)
std_dev_val = statistics.stdev(data)

print("Range:", range_val)
print("Variance:", variance_val)
print("Standard Deviation:", std_dev_val)

Range: 10
Variance: 17
Standard Deviation: 4.123105625617661
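The printed values can be checked by hand. The working below uses the sample formulas, which divide by n - 1 as `statistics.variance` and `statistics.stdev` do:

$$\bar{x} = \frac{10 + 12 + 15 + 18 + 20}{5} = 15$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{25 + 9 + 0 + 9 + 25}{4} = 17$$

$$s = \sqrt{17} \approx 4.123$$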


### Q6. What is a Venn diagram?

A Venn diagram is a graphical representation of the relationships between sets. It uses circles to represent sets, and the overlap between circles shows the common elements shared between sets. Venn diagrams are often used in set theory and probability to visually represent the intersections and unions of sets.

### Q7. For the two given sets A = {2, 3, 4, 5, 6, 7} and B = {0, 2, 6, 8, 10}, find:
(i) A ∩ B
(ii) A ∪ B

(i) Intersection (A ∩ B): The common elements between sets A and B.
A ∩ B = {2, 6}

(ii) Union (A ∪ B): All unique elements from sets A and B.
A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
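The same results can be checked with Python's built-in `set` type, where `&` is intersection and `|` is union. This is a small verification sketch, not part of the original answer:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # same as A.intersection(B)
union = A | B         # same as A.union(B)

# Sets are unordered, so sort before printing for a stable display
print("A ∩ B =", sorted(intersection))  # A ∩ B = [2, 6]
print("A ∪ B =", sorted(union))         # A ∪ B = [0, 2, 3, 4, 5, 6, 7, 8, 10]
```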

### Q8. What do you understand about skewness in data?


Skewness measures the asymmetry of a probability distribution. In a dataset:

Positive Skewness (Right Skew): The right tail is longer, and the majority of data is concentrated on the left.

Negative Skewness (Left Skew): The left tail is longer, and the majority of data is concentrated on the right.
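One common way to put a number on skewness is the moment-based coefficient: the third central moment divided by the 1.5 power of the second. The sketch below assumes that definition; the function name `skewness` and the three small datasets are illustrative, not taken from the questions above.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 15]     # long tail of high values
left_skewed = [16 - x for x in right_skewed]     # mirror image: long tail of low values
symmetric = [1, 2, 3, 4, 5]

print("Right-skewed:", skewness(right_skewed))   # positive
print("Left-skewed:", skewness(left_skewed))     # negative
print("Symmetric:", skewness(symmetric))         # zero
```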

### Q9. If data is right-skewed, where does the median lie relative to the mean?

In a right-skewed distribution (positively skewed), the tail on the right side is longer. In such cases:

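The median is typically less than the mean. The mean is pulled to the right by the longer tail of higher values, while the median depends only on the ordering of the values and stays near the bulk of the data.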
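A quick illustration with a made-up right-skewed dataset (hypothetical incomes in thousands, where one high earner forms the right tail):

```python
import statistics

# Hypothetical incomes in thousands; 120 is the long right tail
incomes = [20, 22, 25, 27, 30, 32, 35, 120]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print("Mean:", mean_income)      # Mean: 38.875
print("Median:", median_income)  # Median: 28.5
```

The single large value raises the mean above seven of the eight observations, while the median stays between the two middle values.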

### Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

Covariance: Measures how two variables change together. Positive covariance indicates a positive relationship, while negative covariance indicates a negative relationship. However, the magnitude of covariance is not standardized, making it difficult to compare the strength of relationships.

Correlation: Standardized measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship.

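Both measures describe how two variables move together. Covariance is used where the original units matter, for example in the covariance matrix behind portfolio variance or principal component analysis; correlation is preferred for comparing the strength of relationships across pairs of variables measured on different scales.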
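The effect of standardization can be seen by changing the units of one variable. The sketch below uses hand-written sample formulas and hypothetical study-time data; the function names and numbers are illustrative assumptions.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of paired deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

hours = [1, 2, 3, 4, 5]            # hypothetical study time in hours
scores = [52, 58, 61, 70, 74]      # hypothetical exam scores
minutes = [h * 60 for h in hours]  # same study time, different unit

cov_hours = sample_covariance(hours, scores)
cov_minutes = sample_covariance(minutes, scores)
r_hours = pearson_correlation(hours, scores)
r_minutes = pearson_correlation(minutes, scores)

print("Covariance (hours):", cov_hours)      # 14.0
print("Covariance (minutes):", cov_minutes)  # 840.0
print("Correlation (hours):", r_hours)
print("Correlation (minutes):", r_minutes)
```

Switching from hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged, which is why correlation is the easier of the two to compare across datasets.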

### Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

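The sample mean $\bar{x}$ is the sum of the observations divided by the number of observations $n$:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example, for Data = [10, 15, 20, 25, 30]:

$$\bar{x} = \frac{10 + 15 + 20 + 25 + 30}{5} = \frac{100}{5} = 20$$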

### Q12. For normally distributed data, what is the relationship between its measures of central tendency?

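In a normal distribution, the mean (μ), the median, and the mode are all equal. This follows from the distribution being symmetric with a single peak: the peak (mode) sits at the center of symmetry, which is also the point that splits the data in half (median) and the balance point (mean). Real data that is only approximately normal will show these three values close together rather than exactly equal.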
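This can be checked with `statistics.NormalDist` (Python 3.8+), which exposes `mean`, `median`, and `mode` as properties. The parameters `mu=170` and `sigma=5` are arbitrary illustrative values, and the seed is fixed only to make the sample reproducible.

```python
import statistics

# A theoretical normal distribution with illustrative parameters
heights = statistics.NormalDist(mu=170, sigma=5)
print(heights.mean, heights.median, heights.mode)  # 170.0 170.0 170.0

# A finite sample is only approximately normal, so its mean and
# median are close to each other but usually not identical
sample = heights.samples(10_000, seed=42)
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
print("Sample mean:", sample_mean)
print("Sample median:", sample_median)
```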

### Q13. How is covariance different from correlation?

Covariance: Measures the degree to which two variables change together. It is not standardized and can range from negative infinity to positive infinity. The units of covariance are the product of the units of the two variables.

Correlation: Standardized measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, with 1 indicating a perfect positive linear relationship, -1 indicating a perfect negative linear relationship, and 0 indicating no linear relationship. Correlation is dimensionless.
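In symbols, correlation is covariance rescaled by the two standard deviations. The sample versions are written below; the answer above does not say whether sample or population formulas are intended, so the n - 1 divisor is an assumption:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}$$

Because the units of $\operatorname{cov}(X, Y)$ are the product of the units of $X$ and $Y$, and $s_X s_Y$ carries the same units, they cancel in the ratio. That is why $r_{XY}$ is dimensionless and bounded between -1 and 1.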

### Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Outliers can have a substantial impact on measures of central tendency and dispersion:

Central Tendency:
Mean: Outliers can heavily influence the mean, pulling it towards extreme values. For instance, in the dataset [10, 12, 14, 15, 100], the mean is 30.2, higher than four of the five values, because of the outlier 100.
Median: The median is less affected by outliers since it depends only on the order of the values. In the same dataset, the median is 14, which stays close to the typical values.

Dispersion:
Range: Outliers can substantially increase the range. In the dataset [5, 8, 10, 12, 100], the range is 100 - 5 = 95; without the outlier it would be 12 - 5 = 7.
Variance and Standard Deviation: Outliers increase the variability of the data, leading to a higher variance and standard deviation. In the same dataset [5, 8, 10, 12, 100], the sample standard deviation is about 40.9, compared with about 3.0 for [5, 8, 10, 12].

It is crucial to be aware of outliers and consider their impact when interpreting statistical measures, as they can distort the overall characteristics of the data. Outlier detection and treatment methods are often employed to mitigate their influence.
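The effect can be seen side by side by computing the same summary with and without the outlier. This sketch reuses the dataset [10, 12, 14, 15, 100] from the central tendency example; the helper name `summarize` is illustrative.

```python
import statistics

def summarize(values):
    """Return the central tendency and dispersion measures discussed above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

with_outlier = summarize([10, 12, 14, 15, 100])
without_outlier = summarize([10, 12, 14, 15])

for name in ("mean", "median", "range", "stdev"):
    print(f"{name:>6}: with outlier = {with_outlier[name]:.2f}, "
          f"without = {without_outlier[name]:.2f}")
```

The median moves by only one unit (14 versus 13), while the mean, range, and standard deviation all change by a large factor.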