# Statistics Placement Assignment

Q-2. Consider a dataset containing the heights (in centimeters) of 1000 individuals. The mean height is 170 cm with a standard deviation of 10 cm. The dataset is approximately normally distributed, and its skewness is approximately zero. Based on this information, answer the following questions:

a. What percentage of individuals in the dataset have heights between 160 cm and 180 cm?

b. If we randomly select 100 individuals from the dataset, what is the probability that their average height is greater than 175 cm?

c. Assuming the dataset follows a normal distribution, what is the z-score corresponding to a height of 185 cm?

d. We know that 5% of the dataset has heights below a certain value. What is the approximate height corresponding to this threshold?

e. Calculate the coefficient of variation (CV) for the dataset.

f. Calculate the skewness of the dataset and interpret the result.

**ANS** = 
```
a. To determine the percentage of individuals in the dataset with heights between 160 cm and 180 cm, we need to calculate the z-scores corresponding to these heights and find the area under the normal distribution curve between those z-scores.

First, let's calculate the z-scores:
For 160 cm:
z = (160 - mean) / standard deviation = (160 - 170) / 10 = -1

For 180 cm:
z = (180 - mean) / standard deviation = (180 - 170) / 10 = 1

Next, we need to find the area under the normal distribution curve between -1 and 1. We can use a standard normal distribution table or statistical software to find this value.

The area between -1 and 1 under the standard normal distribution curve is approximately 0.6827, which means that approximately 68.27% of individuals in the dataset have heights between 160 cm and 180 cm. This matches the empirical (68-95-99.7) rule for one standard deviation on either side of the mean.

b. To find the probability that the average height of 100 randomly selected individuals from the dataset is greater than 175 cm, we can use the Central Limit Theorem. According to the theorem, when the sample size is large enough, the distribution of sample means approaches a normal distribution with the same mean as the population and a standard deviation equal to the population standard deviation divided by the square root of the sample size.

In this case, the population mean is 170 cm, the population standard deviation is 10 cm, and the sample size is 100. So, the standard deviation of the sample mean is 10 / √100 = 1 cm.

Now, we can calculate the z-score for a sample mean of 175 cm:
z = (175 - mean) / (standard deviation / √sample size) = (175 - 170) / (10 / √100) = 5 / 1 = 5

To find the probability that the sample mean is greater than 175 cm, we can look up the area to the right of z = 5 in the standard normal distribution table or use statistical software. The probability is approximately 2.87 × 10^-7, which is effectively 0.

c. To calculate the z-score corresponding to a height of 185 cm, we can use the formula:
z = (x - mean) / standard deviation

Substituting the values, we get:
z = (185 - 170) / 10 = 1.5

So, the z-score corresponding to a height of 185 cm is 1.5.

d. To find the approximate height corresponding to the threshold where 5% of the dataset has heights below that value, we need to find the z-score that corresponds to a cumulative probability of 0.05.

Using the standard normal distribution table or statistical software, we find that the z-score corresponding to a cumulative probability of 0.05 is approximately -1.645.

Now, we can calculate the height using the formula:
x = mean + (z * standard deviation)
x = 170 + (-1.645 * 10) = 153.55

So, about 5% of the dataset has heights below approximately 153.55 cm.

e. The coefficient of variation (CV) is a measure of relative variability and is calculated as the ratio of the standard deviation to the mean, expressed as a percentage.

CV = (standard deviation / mean) * 100

In this case, the standard deviation is 10 cm, and the mean is 170 cm.

CV = (10 / 170) * 100 = 5.88%

Therefore, the coefficient of variation for the dataset is approximately 5.88%.

f. Skewness measures the asymmetry of a distribution. A skewness of zero indicates a distribution that is symmetric about its mean; a positive value indicates a longer right tail, and a negative value a longer left tail.

The raw heights are not provided, so the skewness cannot be computed directly here; the problem states that it is approximately zero. This is consistent with the stated approximately normal distribution, since a normal distribution is symmetric and has zero skewness. Note that zero skewness alone does not imply normality. In practical terms, the heights are spread symmetrically around the mean of 170 cm, without any significant skew to the left or right.
```
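
Part f asks for a calculation, but the raw heights are not included here. As a minimal sketch of how the skewness could be computed if they were available, the block below uses simulated heights as a stand-in (an assumption, not the real dataset) and a hypothetical helper `sample_skewness` that implements the moment-based formula m3 / m2^1.5.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Assumption: simulated stand-in for the 1000 real heights, which are not given.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(1000)]

skew = sample_skewness(heights)
print(f"Sample skewness: {skew:.3f}")
```

For data drawn from a normal distribution, the result should be close to zero, with some sampling noise; a clearly positive or negative value would point to a right or left skew.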

```python
import scipy.stats as stats

# Given values
mean_height = 170  # population mean (cm)
std_dev = 10       # population standard deviation (cm)

# a. Percentage of individuals with heights between 160 cm and 180 cm
z1 = (160 - mean_height) / std_dev
z2 = (180 - mean_height) / std_dev
percentage_between = (stats.norm.cdf(z2) - stats.norm.cdf(z1)) * 100

print(f"Percentage of individuals with heights between 160 cm and 180 cm: {percentage_between:.2f}%")

# b. Probability that the average height of 100 individuals is greater than 175 cm
sample_size = 100
sample_mean_std_dev = std_dev / (sample_size ** 0.5)  # standard error of the mean
z_score_mean = (175 - mean_height) / sample_mean_std_dev
# sf (survival function) equals 1 - cdf but stays accurate in the far tail
probability_greater = stats.norm.sf(z_score_mean)

# Scientific notation, since the value is too small to show with four decimals
print(f"Probability that the average height of 100 individuals is greater than 175 cm: {probability_greater:.2e}")

# c. Z-score corresponding to a height of 185 cm
height = 185
z_score = (height - mean_height) / std_dev

print(f"Z-score corresponding to a height of 185 cm: {z_score:.2f}")

# d. Height below which 5% of the dataset falls (the 5th percentile)
percentile = 0.05
threshold = stats.norm.ppf(percentile, mean_height, std_dev)

print(f"Approximate height corresponding to the 5th percentile: {threshold:.2f}")

# e. Coefficient of variation (CV) for the dataset
cv = (std_dev / mean_height) * 100

print(f"Coefficient of variation (CV) for the dataset: {cv:.2f}%")

# f. Skewness of the dataset
# Not computed: the raw heights are unavailable, so the stated value is used.
skewness = 0

print(f"Skewness of the dataset: {skewness}")
```

```
Percentage of individuals with heights between 160 cm and 180 cm: 68.27%
Probability that the average height of 100 individuals is greater than 175 cm: 2.87e-07
Z-score corresponding to a height of 185 cm: 1.50
Approximate height corresponding to the 5th percentile: 153.55
Coefficient of variation (CV) for the dataset: 5.88%
Skewness of the dataset: 0
```
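
The standard error of 1 cm used in part b can be sanity-checked by simulation. The sketch below is illustrative only: it assumes an idealized normal population with mean 170 cm and standard deviation 10 cm rather than the actual 1000 records, and `simulate_sample_means` is a hypothetical helper name. It draws many samples of 100 heights and looks at how the sample means are spread.

```python
import random
import statistics


def simulate_sample_means(n_samples, sample_size, mean, std_dev, seed=0):
    """Draw n_samples samples of the given size and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mean, std_dev) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Assumption: an idealized normal population stands in for the real dataset.
means = simulate_sample_means(n_samples=2000, sample_size=100, mean=170, std_dev=10)

print(f"Mean of sample means: {statistics.fmean(means):.2f}")
print(f"Std dev of sample means: {statistics.stdev(means):.2f}")
print(f"Largest sample mean: {max(means):.2f}")
```

The spread of the simulated means should be close to 10 / √100 = 1 cm. A sample mean of 175 cm would sit five standard errors above 170 cm, which is why the probability in part b is so small.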
