## Q1. What are the three measures of central tendency?

The three measures of central tendency are:
1. Mode: The most frequent value.
2. Median: The middle number in an ordered dataset.
3. Mean: The sum of all values divided by the total number of values.

---
## Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?


|Mean|Median|Mode|
|-|-|-|
|The arithmetic average of the given observations is called the mean.|The middle value of the observations, once they are sorted, is called the median.|The most frequently occurring value in the observations is called the mode.|
|When the data are roughly symmetric (for example, normally distributed), the mean is widely preferred.|When the distribution is skewed or contains outliers, the median is the best representative.|When the data are nominal (categorical), the mode is preferred.|


1. Mode: The mode can be used for any level of measurement, but it is most meaningful for nominal and ordinal levels.
2. Median: The median can only be used on data that can be ordered – that is, from ordinal, interval and ratio levels of measurement.
3. Mean: The mean can only be used on interval and ratio levels of measurement because it requires equal spacing between adjacent values or scores in the scale.

---
## Q3. Compute the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [2]:
import numpy as np
from scipy import stats


heights_list = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

print(f'Mean is: {np.mean(heights_list)}')
print(f"Median is: {np.median(heights_list)}")
print(f"Mode is: {stats.mode(heights_list)}")


Mean is: 177.01875
Median is: 177.0
Mode is: ModeResult(mode=177.0, count=3)
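
The output above reports a single mode, but in this data 177 and 178 each occur three times, so the data is bimodal and `stats.mode` shows only one of the tied values. A minimal sketch of one way to list every mode, assuming the standard library's `statistics.multimode` (Python 3.8+) is acceptable in place of SciPy:

```python
from statistics import multimode

heights_list = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
                178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modes = multimode(heights_list)
print(f"Modes are: {modes}")
```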


---
## Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [3]:
print(f"Standard Deviation is: {np.std(heights_list)}")

Standard Deviation is: 1.7885814036548633
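
By default `np.std` divides by n, so the value above is the population standard deviation. If the heights are treated as a sample from a larger population, the usual choice is to divide by n - 1 instead (`np.std(heights_list, ddof=1)` in NumPy). The sketch below compares the two using only the standard library; which one the question intends is an assumption the reader has to settle.

```python
from statistics import pstdev, stdev

heights_list = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
                178.9, 176.2, 177, 172.5, 178, 176.5]

population_sd = pstdev(heights_list)  # divides by n, like np.std's default
sample_sd = stdev(heights_list)       # divides by n - 1 (Bessel's correction)

print(f"Population standard deviation: {population_sd}")
print(f"Sample standard deviation: {sample_sd}")
```

The sample value is always slightly larger than the population value, and the gap shrinks as n grows.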


---
## Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Measures of central tendency alone do not describe a dataset adequately. To describe the data, one also needs to know how much the values vary around the center.

1. Range: The range is the difference between the largest and the smallest observation in the data.
2. Variance: The average squared deviation from the mean of the given data set is known as the variance. It measures how widely the data are spread about the mean.
3. Standard Deviation: The square root of the variance, that is, the square root of the sum of squared deviations from the mean divided by the number of observations. It measures the spread about the mean in the same units as the data.

Example:

Range:
- observations  = 20, 24, 31, 17, 45, 39, 51, 61
- largest value = 61
- smallest value = 17
- range = 61 - 17 = 44

Variance:
- observations  = 20, 24, 31, 17, 45, 39, 51, 61
- step-1: subtract the mean (36) from each individual value
- = (20 - 36), (24 - 36), ....
- step-2: square the above values
- = 256, 144, ....
- step-3: add the squared values
- = 256 + 144 + .... = 1726
- step-4: n = 8, variance = total/8 = 1726/8 = 215.75

Standard Deviation:
- observations  = 20, 24, 31, 17, 45, 39, 51, 61
- square root of variance = √215.75 ≈ 14.69
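
The worked example above can be checked with a short script. This is a sketch that assumes the population formulas (dividing by n), as the manual steps do:

```python
from statistics import mean, pvariance, pstdev

observations = [20, 24, 31, 17, 45, 39, 51, 61]

data_range = max(observations) - min(observations)
variance = pvariance(observations)  # population variance: divide by n
std_dev = pstdev(observations)      # square root of the population variance

print(f"Mean: {mean(observations)}")
print(f"Range: {data_range}")
print(f"Variance: {variance}")
print(f"Standard deviation: {std_dev:.2f}")
```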


---
## Q6. What is a Venn diagram?

Venn Diagram:
- A Venn diagram is an illustration that uses circles to show the relationships among things or finite groups of things.
- Circles that overlap share the elements in the overlapping region, while circles that do not overlap have nothing in common.
- Venn diagrams help to visually represent the similarities and differences between two or more sets.

---
## Q7. For the two given sets A = (2,3,4,5,6,7) & B = (0,2,6,8,10). Find:
1. (i) A ∩ B
2. (ii) A ⋃ B

In [8]:
A = [2,3,4,5,6,7]
B = [0,2,6,8,10]

print(f"1. A ∩ B = {np.intersect1d(A,B)}")
print(f"2. A ⋃ B = {np.union1d(A,B)}")

1. A ∩ B = [2 6]
2. A ⋃ B = [ 0  2  3  4  5  6  7  8 10]
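
The same result can be obtained without NumPy. A minimal sketch using Python's built-in `set` type, where `&` is intersection and `|` is union:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # elements present in both sets
union = A | B         # elements present in at least one set

# Sets are unordered, so sort before printing for a stable display.
print(f"A ∩ B = {sorted(intersection)}")
print(f"A ∪ B = {sorted(union)}")
```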


---
## Q8. What do you understand about skewness in data?

Skewness:
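- Skewness is a measure of the asymmetry of a distribution about its mean. A symmetric distribution, such as the normal distribution, has zero skewness.
- The sign of the skewness tells us the direction of the longer tail, and therefore of the extreme values: positive for a tail to the right, negative for a tail to the left.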
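
To make the sign convention concrete, here is a sketch that computes the population (Fisher-Pearson) skewness coefficient, m3 / m2^1.5, by hand. The helper `skewness` and the three small datasets are hypothetical and chosen only for illustration.

```python
from statistics import mean

def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]  # hypothetical data, long right tail
left_tailed = [-x for x in right_tailed]  # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

print(f"Right-tailed: {skewness(right_tailed):.3f}")  # positive
print(f"Left-tailed:  {skewness(left_tailed):.3f}")   # negative
print(f"Symmetric:    {skewness(symmetric):.3f}")     # zero
```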

---
## Q9. If data is right-skewed, where does the median lie relative to the mean?


- If the distribution of data is skewed to the right, the median is often less than the mean.
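
A small illustration with hypothetical numbers: one large value in the right tail pulls the mean up, while the median stays near the bulk of the data.

```python
from statistics import mean, median

# Hypothetical right-skewed data (for example, incomes in thousands).
incomes = [30, 32, 35, 38, 40, 42, 45, 250]

data_mean = mean(incomes)
data_median = median(incomes)

print(f"Mean:   {data_mean}")
print(f"Median: {data_median}")
```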

---
## Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?


|Covariance|Correlation|
|:-:|:-:|
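|Covariance indicates how two random variables vary together. Its sign gives the direction of the linear relationship, but its magnitude depends on the units of the variables, so a larger number does not by itself mean a stronger relationship.|Correlation is a standardized statistical measure that indicates how strongly, and in which direction, two variables are linearly related.|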
|The value of covariance lies in the range of -∞ and +∞.|Correlation is limited to values between the range -1 and +1|
|Change in scale affects covariance|Change in scale does not affect the correlation|
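
The effect of scale can be seen in a short sketch. The helpers `covariance` and `correlation` and the height/weight values are hypothetical, written out by hand so the formulas stay visible; the sample (n - 1) form of covariance is assumed.

```python
from math import sqrt
from statistics import mean

def covariance(xs, ys):
    """Sample covariance: sum of products of deviations / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

heights_cm = [150, 160, 170, 180, 190]     # hypothetical data
weights_kg = [52, 58, 69, 77, 90]
heights_m = [h / 100 for h in heights_cm]  # same heights in metres

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)

print(f"Covariance (cm):  {cov_cm:.3f}")
print(f"Covariance (m):   {cov_m:.3f}")
print(f"Correlation (cm): {corr_cm:.4f}")
print(f"Correlation (m):  {corr_m:.4f}")
```

Changing the unit from centimetres to metres divides the covariance by 100 but leaves the correlation unchanged.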


---
## Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

Sample Mean:
- The sample mean is an average value found in a sample.
- The sample mean formula is:
  > x̄ = ( Σ xi ) / n

- x̄ stands for the “sample mean”
- Σ is summation notation, which means “add up”
- xi stands for each of the x-values in the sample
- n means “the number of items in the sample”

Example:
- sample = 12,13,14,15,16,17
- step-1: add all numbers:
  - 12 + 13 + 14 + 15 + 16 + 17 = 87
- step-2: number of items in the data set = 6
- step-3: divide the sum by the number of items: 87/6 = 14.5
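
The same calculation as a sketch in Python, with each variable mapped to a symbol in the formula:

```python
sample = [12, 13, 14, 15, 16, 17]

total = sum(sample)      # Σ xi
n = len(sample)          # number of items in the sample
sample_mean = total / n  # x̄

print(f"Sum = {total}, n = {n}, sample mean = {sample_mean}")
```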

---
## Q12. For normally distributed data, what is the relationship between its measures of central tendency?


- In a normal distribution the mean, median and mode are exactly the same, because the distribution is symmetric with a single peak.

---
## Q13. How is covariance different from correlation?


- Covariance indicates the direction of the linear relationship between variables. 
- Correlation measures both the strength and direction of the linear relationship between two variables.

---
## Q14. How do outliers affect measures of central tendency and dispersion? Provide an example

Outlier:
- An extreme value in a set of data which is much higher or lower than the other numbers.

Its effect on central tendency:
1. Mean: Outliers can pull the mean towards their extreme values. If there are outliers with large values, the mean can be inflated, while outliers with small values can lower the mean.
2. Median: The median is much less affected by outliers than the mean. It depends only on the middle of the ordered values, so a few extreme values shift it by at most a position or two.
3. Mode: The mode is the most frequent value. Outliers do not directly affect the mode unless the extreme value itself occurs often enough to become the most frequent one.

Its effect on dispersion:
1. Range: The range is the difference between the maximum and minimum values. Outliers can significantly increase the range if they are far from the other values.
2. Variance: Outliers can increase the variance sharply, because their large deviations from the mean are squared.
3. Standard Deviation: Outliers increase the standard deviation for the same reason, since it is the square root of the variance.
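
Example: the sketch below compares a small dataset before and after a single extreme value is added. The scores and the helper `summarize` are hypothetical, and the population standard deviation is assumed.

```python
from statistics import mean, median, mode, pstdev

scores = [10, 12, 12, 13, 14, 15, 16]  # hypothetical data
with_outlier = scores + [100]          # same data plus one extreme value

def summarize(values):
    """Return the central tendency and dispersion measures discussed above."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "std": pstdev(values),
    }

before = summarize(scores)
after = summarize(with_outlier)

for name in before:
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

The mean, range and standard deviation change substantially, the median moves only from 13 to 13.5, and the mode stays at 12.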
