### Q-2. Consider a dataset containing the heights (in centimeters) of 1000 individuals. The mean height is 170 cm with a standard deviation of 10 cm. The dataset is approximately normally distributed, and its skewness is approximately zero. Based on this information, answer the following questions: 
#### a. What percentage of individuals in the dataset have heights between 160 cm and 180 cm? 
#### b. If we randomly select 100 individuals from the dataset, what is the probability that their average height is greater than 175 cm? 
#### c. Assuming the dataset follows a normal distribution, what is the z-score corresponding to a height of 185 cm? 
#### d. We know that 5% of the dataset has heights below a certain value. What is the approximate height corresponding to this threshold? 
#### e. Calculate the coefficient of variation (CV) for the dataset. 
#### f. Calculate the skewness of the dataset and interpret the result.


***Ans a. To determine the percentage of individuals in the dataset with heights between 160 cm and 180 cm, we can use the properties of the normal distribution.***

***Since the mean height is 170 cm and the standard deviation is 10 cm, we can calculate the z-scores for both heights using the formula: z = (x - μ) / σ, where x is the height, μ is the mean, and σ is the standard deviation.***

For 160 cm: z1 = (160 - 170) / 10 = -1

For 180 cm: z2 = (180 - 170) / 10 = 1

***Using a standard normal distribution table or statistical software, we can find the area under the curve between z = -1 and z = 1, which represents the proportion of individuals with heights between 160 cm and 180 cm.***

***The area between z = -1 and z = 1 is approximately 0.6827, or 68.27%. Therefore, approximately 68.27% of individuals in the dataset (about 683 of the 1000) have heights between 160 cm and 180 cm.***

In [8]:
import scipy.stats as stats

# Given values
mean_height = 170
std_dev = 10

# a. Percentage of individuals with heights between 160 cm and 180 cm
z1 = (160 - mean_height) / std_dev
z2 = (180 - mean_height) / std_dev
percentage_between = stats.norm.cdf(z2) - stats.norm.cdf(z1)
percentage_between *= 100

print(f"Percentage of individuals with heights between 160 cm and 180 cm: {percentage_between:.2f}%")

Percentage of individuals with heights between 160 cm and 180 cm: 68.27%
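
As a cross-check, the same percentage can be obtained without computing z-scores by hand, by passing the mean and standard deviation to the distribution directly. The helper name `percent_between` below is hypothetical (it is not part of the original notebook), and the sketch assumes the heights are exactly normal.

```python
from scipy import stats


def percent_between(low, high, mean, std):
    """Percentage of a normal(mean, std) population lying between low and high."""
    dist = stats.norm(loc=mean, scale=std)  # frozen normal distribution
    return 100 * (dist.cdf(high) - dist.cdf(low))


# Within one standard deviation of the mean (the case asked in part a)
within_1_sd = percent_between(160, 180, mean=170, std=10)
# Within two standard deviations, for comparison with the empirical rule
within_2_sd = percent_between(150, 190, mean=170, std=10)

print(f"Within 1 SD (160-180 cm): {within_1_sd:.2f}%")
print(f"Within 2 SD (150-190 cm): {within_2_sd:.2f}%")
```

These two values are the first two bands of the 68-95-99.7 empirical rule for normal distributions.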


***Ans b. To calculate the probability that the average height of 100 randomly selected individuals is greater than 175 cm, we use the sampling distribution of the sample mean. By the Central Limit Theorem, as the sample size grows, the distribution of sample means approaches a normal distribution with a mean equal to the population mean and a standard deviation (the standard error) equal to the population standard deviation divided by the square root of the sample size. Because the heights are themselves approximately normal, this approximation is already very good for n = 100.***

***The standard deviation of the sample means is calculated as σ / sqrt(n), where σ is the population standard deviation and n is the sample size.***

***In this case, the population standard deviation is 10 cm, and the sample size is 100.***

***Standard deviation of the sample means = 10 / sqrt(100) = 10 / 10 = 1***

***Now, we can calculate the z-score for a sample mean of 175 cm using the formula: z = (x̄ - μ) / (σ / sqrt(n)), where x̄ is the sample mean, μ is the population mean, and σ / sqrt(n) is the standard deviation of the sample means calculated above.***

z = (175 - 170) / 1 = 5

***Using a standard normal distribution table or statistical software, we can find the area to the right of z = 5, which represents the probability that the average height is greater than 175 cm.***

***The area to the right of z = 5 is approximately 2.87 × 10⁻⁷, which is effectively 0. Therefore, the probability that the average height of a randomly selected group of 100 individuals is greater than 175 cm is nearly 0.***

In [10]:
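# b. Probability that the average height of 100 individuals is greater than 175 cm
sample_size = 100
sample_mean_std_dev = std_dev / (sample_size ** 0.5)  # standard error of the mean
z_score = (175 - mean_height) / sample_mean_std_dev
# sf (survival function) gives the upper-tail area directly and is more
# accurate than 1 - cdf when the tail probability is very small
probability_greater = stats.norm.sf(z_score)

# Scientific notation keeps the tiny probability from being rounded to 0.0000
print(f"Probability that the average height of 100 individuals is greater than 175 cm: {probability_greater:.2e}")

Probability that the average height of 100 individuals is greater than 175 cm: 2.87e-07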
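
One refinement worth noting: the 100 individuals are drawn from a finite dataset of 1000, and if they are drawn without replacement (an assumption, since the question does not say), the standard error is slightly smaller than σ / sqrt(n). The sketch below applies the standard finite population correction, sqrt((N - n) / (N - 1)), and compares both versions. The variable names are illustrative and not from the original notebook.

```python
import math
from scipy import stats

population_size = 1000  # N, from the problem statement
sample_size = 100       # n
mean_height = 170
std_dev = 10

# Standard error without and with the finite population correction (FPC)
se_plain = std_dev / math.sqrt(sample_size)
fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
se_fpc = se_plain * fpc

for label, se in [("without FPC", se_plain), ("with FPC", se_fpc)]:
    z = (175 - mean_height) / se
    p = stats.norm.sf(z)  # upper-tail probability P(Z > z)
    print(f"{label}: SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

The correction makes the z-score slightly larger and the tail probability even smaller, so the conclusion that the probability is nearly 0 does not change.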


***Ans c. To calculate the z-score corresponding to a height of 185 cm, we can use the formula: z = (x - μ) / σ, where x is the height, μ is the mean, and σ is the standard deviation.***

z = (185 - 170) / 10 = 15 / 10 = 1.5

***Therefore, the z-score corresponding to a height of 185 cm is 1.5.***

In [11]:
# c. Z-score corresponding to a height of 185 cm
height = 185
z_score = (height - mean_height) / std_dev

print(f"Z-score corresponding to a height of 185 cm: {z_score:.2f}")

Z-score corresponding to a height of 185 cm: 1.50


***Ans d. To find the approximate height corresponding to the threshold where 5% of the dataset has heights below that value, we need to find the z-score that corresponds to the 5th percentile.***

***Using a standard normal distribution table or statistical software, we find that the z-score for the 5th percentile is approximately -1.645.***

***Now, we can calculate the height corresponding to this z-score using the formula: x = μ + z * σ, where x is the height, μ is the mean, z is the z-score, and σ is the standard deviation.***

x = 170 + (-1.645) * 10 = 170 - 16.45 = 153.55

***Therefore, approximately 5% of individuals in the dataset have heights below 153.55 cm.***

In [13]:
# d. Approximate height corresponding to the threshold where 5% of the dataset has heights below that value
percentile = 0.05
threshold = stats.norm.ppf(percentile, mean_height, std_dev)

print(f"Approximate height corresponding to the 5th percentile: {threshold:.2f}")

Approximate height corresponding to the 5th percentile: 153.55
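
The threshold can be sanity-checked with a round trip: convert the 5th-percentile z-score to a height, then confirm that the normal CDF at that height returns 0.05. This is a minimal sketch assuming an exactly normal distribution; the names `z_5` and `fraction_below` are illustrative.

```python
from scipy import stats

mean_height, std_dev = 170, 10

# z-score below which 5% of a standard normal distribution lies
z_5 = stats.norm.ppf(0.05)

# Convert the z-score back to a height: x = mu + z * sigma
threshold = mean_height + z_5 * std_dev

# Round trip: fraction of the population below the threshold
fraction_below = stats.norm.cdf(threshold, loc=mean_height, scale=std_dev)

print(f"z-score of the 5th percentile: {z_5:.3f}")
print(f"Height threshold: {threshold:.2f} cm")
print(f"Fraction below threshold: {fraction_below:.2f}")
```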


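***Ans e. The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean: CV = (σ / μ) * 100%, where σ is the standard deviation and μ is the mean.***

***In this case, σ = 10 cm and μ = 170 cm.***

CV = (10 / 170) * 100% ≈ 5.88%

***Therefore, the coefficient of variation is approximately 5.88%; that is, the standard deviation is about 5.88% of the mean height.***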

In [14]:
# e. Coefficient of variation (CV) for the dataset
cv = (std_dev / mean_height) * 100

print(f"Coefficient of variation (CV) for the dataset: {cv:.2f}%")

Coefficient of variation (CV) for the dataset: 5.88%


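***Ans f. Skewness cannot be computed from the mean and standard deviation alone; it requires the individual observations. The problem statement, however, gives the skewness as approximately zero, so we use that value.***

***A skewness of approximately zero means the distribution is symmetric about its mean: the left and right tails are balanced, and the mean, median, and mode nearly coincide at about 170 cm. This is consistent with the dataset being approximately normally distributed.***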

In [15]:
# f. Skewness of the dataset
skewness = 0  # Given that the skewness is approximately zero for the dataset

print(f"Skewness of the dataset: {skewness}")

Skewness of the dataset: 0
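
If the 1000 individual heights were available, the skewness could be computed from them rather than taken as given. The sketch below shows how that might look with `scipy.stats.skew`. The raw data are not part of this problem, so the `heights` array is a synthetic stand-in drawn from a normal distribution with the stated mean and standard deviation; the seed is arbitrary.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the real dataset (assumption: the raw heights are
# not provided, so we simulate 1000 values with mean 170 and SD 10)
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=170, scale=10, size=1000)

# Sample skewness: near 0 for symmetric data, positive for a longer
# right tail, negative for a longer left tail
sample_skewness = stats.skew(heights)

print(f"Sample skewness of simulated heights: {sample_skewness:.3f}")
```

For a sample of this size from a normal distribution, the computed value should fall close to zero but will not be exactly zero because of sampling variation.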
