# Descriptive statistics solutions

### Exercise 1

In [1]:
import numpy as np

np.random.seed(42)

normal = np.random.normal(size = 100)
chi = np.random.chisquare(3, 100)

print(f"Normal vector: {normal}")
print(f"Chi-square vector: {chi}")

Normal vector: [ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
  1.57921282  0.76743473 -0.46947439  0.54256004 -0.46341769 -0.46572975
  0.24196227 -1.91328024 -1.72491783 -0.56228753 -1.01283112  0.31424733
 -0.90802408 -1.4123037   1.46564877 -0.2257763   0.0675282  -1.42474819
 -0.54438272  0.11092259 -1.15099358  0.37569802 -0.60063869 -0.29169375
 -0.60170661  1.85227818 -0.01349722 -1.05771093  0.82254491 -1.22084365
  0.2088636  -1.95967012 -1.32818605  0.19686124  0.73846658  0.17136828
 -0.11564828 -0.3011037  -1.47852199 -0.71984421 -0.46063877  1.05712223
  0.34361829 -1.76304016  0.32408397 -0.38508228 -0.676922    0.61167629
  1.03099952  0.93128012 -0.83921752 -0.30921238  0.33126343  0.97554513
 -0.47917424 -0.18565898 -1.10633497 -1.19620662  0.81252582  1.35624003
 -0.07201012  1.0035329   0.36163603 -0.64511975  0.36139561  1.53803657
 -0.03582604  1.56464366 -2.6197451   0.8219025   0.08704707 -0.29900735
  0.09176078 -1.98756891 -0.21967189

#### Measures of central tendency

##### Mean

In [2]:
import statistics as stats

print(f"Normal mean: {stats.mean(normal)}")
print(f"Chi mean: {stats.mean(chi)}")

Normal mean: -0.10384651739409385
Chi mean: 2.9380795335328225


##### Median

In [21]:
print(f"Normal median: {stats.median(normal)}")
print(f"Chi median: {stats.median(chi)}")

Normal median: -0.1269562917797126
Chi median: 2.4636148965577283


##### Mode

In continuous data almost every value is unique, so the mode is usually reported as the "modal class": the interval (histogram bin) with the highest frequency. `stats.mode` does not bin the data. When no value repeats it returns the first value it encounters (Python 3.8 and later), so the results below are simply the first element of each vector, not a meaningful mode.

In [22]:
print(f"Normal mode: {stats.mode(normal)}")
print(f"Chi mode: {stats.mode(chi)}")

Normal mode: 0.4967141530112327
Chi mode: 0.4168513022813494
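
The modal class mentioned above can be approximated with a histogram. The following is a minimal sketch, not part of the original solution: the helper name `modal_class` and the choice of 10 bins are assumptions, and the resulting interval depends on the bin count.

```python
import numpy as np

# Recreate the two vectors with the same seed and call order as the first cell
np.random.seed(42)
normal = np.random.normal(size=100)
chi = np.random.chisquare(3, 100)


def modal_class(values, bins=10):
    """Return the (left, right) edges of the histogram bin with the highest count.

    Hypothetical helper for illustration; `bins` is an arbitrary choice.
    """
    counts, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(counts))  # index of the most populated bin
    return float(edges[i]), float(edges[i + 1])


print(f"Normal modal class: {modal_class(normal)}")
print(f"Chi modal class: {modal_class(chi)}")
```

A different number of bins can move the modal class, so it is worth trying a few values before reporting one.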


#### Measures of dispersion

##### Range

In [23]:
range_normal = max(normal) - min(normal)
range_chi = max(chi) - min(chi)
print(f"Normal range: {range_normal}")
print(f"Chi range: {range_chi}")

Normal range: 4.472023288598682
Chi range: 12.592089274962756


##### Variance and standard deviation

In [24]:
var_normal = stats.variance(normal)
std_normal = stats.stdev(normal)
var_chi = stats.variance(chi)
std_chi = stats.stdev(chi)

print(f"Normal variance: {var_normal} and std: {std_normal}")
print(f"Chi variance: {var_chi} and std: {std_chi}")

Normal variance: 0.82476989363016 and std: 0.9081684280078007
Chi variance: 5.87576054587392 and std: 2.4239968122656266
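
`stats.variance` and `stats.stdev` compute the sample versions, dividing by n - 1, whereas `np.var` and `np.std` divide by n unless `ddof=1` is passed. The short sketch below illustrates the difference on a small made-up list (the values are an assumption chosen for easy arithmetic, not data from the exercise).

```python
import statistics as stats

import numpy as np

# Hypothetical sample chosen for illustration: mean 5, sum of squared deviations 32
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = stats.variance(sample)       # divides by n - 1
population_var = stats.pvariance(sample)  # divides by n

print(f"Sample variance:     {sample_var} (NumPy, ddof=1: {np.var(sample, ddof=1)})")
print(f"Population variance: {population_var} (NumPy default:  {np.var(sample)})")
```

With 100 observations the two estimates differ by only about 1%, but the gap matters for small samples.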


#### Shape measures

##### Skewness

In [25]:
from scipy.stats import skew

skew_normal = skew(normal)
skew_chi = skew(chi)

print(f"Normal skewness: {skew_normal}")
print(f"Chi skewness: {skew_chi}")

Normal skewness: -0.17526772024433726
Chi skewness: 1.6683703423622345


##### Kurtosis

In [26]:
from scipy.stats import kurtosis

kurt_normal = kurtosis(normal)
kurt_chi = kurtosis(chi)

print(f"Normal kurtosis: {kurt_normal}")
print(f"Chi kurtosis: {kurt_chi}")

Normal kurtosis: -0.1554047077420817
Chi kurtosis: 3.620577909892315
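
With their default arguments, `skew` and `kurtosis` use the biased central moments, and `kurtosis` returns the excess (Fisher) kurtosis, that is, the fourth standardized moment minus 3. The sketch below recomputes both by hand as a cross-check; `central_moment` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Recreate the vectors with the same seed and call order as the first cell
np.random.seed(42)
normal = np.random.normal(size=100)
chi = np.random.chisquare(3, 100)


def central_moment(x, k):
    """k-th biased central moment: the mean of (x - mean(x)) ** k."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** k)


m2, m3, m4 = (central_moment(chi, k) for k in (2, 3, 4))
manual_skew = m3 / m2 ** 1.5            # third standardized moment
manual_excess_kurt = m4 / m2 ** 2 - 3.0  # fourth standardized moment minus 3

print(f"Manual skewness: {manual_skew}, scipy: {skew(chi)}")
print(f"Manual excess kurtosis: {manual_excess_kurt}, scipy: {kurtosis(chi)}")
```

Passing `fisher=False` to `kurtosis` returns the Pearson definition instead, for which a normal distribution scores 3 rather than 0.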


As can be seen, each sample reproduces the characteristic features of the distribution it was drawn from.

- The vector drawn from a normal distribution has a skewness close to 0 (about -0.18), and its mean (about -0.10) and median (about -0.13) nearly coincide near 0, as expected for a symmetric distribution. Its kurtosis is also close to 0 (about -0.16). Because `kurtosis` returns the excess kurtosis by default, which is 0 for a normal distribution, the sample is consistent with normality.
- The vector drawn from a chi-square distribution has a clearly positive skewness (about 1.67): most of the data lies on the left and a long tail extends to the right, which matches the shape of a chi-square distribution with 3 degrees of freedom. Consistent with this right skew, the mean (about 2.94) is larger than the median (about 2.46), and the excess kurtosis (about 3.62) indicates heavier tails than a normal distribution.
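
One way to judge how close these sample statistics are to the underlying distributions is to compare them with the theoretical moments that `scipy.stats` exposes through the `stats` method of each distribution. This is a sketch under the assumption that the samples are regenerated exactly as in the first cell; `sample_moments` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import chi2, kurtosis, norm, skew

# Recreate the vectors with the same seed and call order as the first cell
np.random.seed(42)
normal = np.random.normal(size=100)
chi = np.random.chisquare(3, 100)

# Theoretical mean, variance, skewness and excess kurtosis
norm_theory = [float(m) for m in norm.stats(moments="mvsk")]
chi_theory = [float(m) for m in chi2.stats(3, moments="mvsk")]


def sample_moments(x):
    """Sample mean, sample variance (n - 1), skewness and excess kurtosis."""
    return [float(np.mean(x)), float(np.var(x, ddof=1)),
            float(skew(x)), float(kurtosis(x))]


names = ["mean", "variance", "skewness", "excess kurtosis"]
for label, sample, theory in [("Normal", normal, norm_theory),
                              ("Chi-square", chi, chi_theory)]:
    for name, observed, expected in zip(names, sample_moments(sample), theory):
        print(f"{label} {name}: sample = {observed:.3f}, theory = {expected:.3f}")
```

For a chi-square distribution with k degrees of freedom the theoretical skewness is sqrt(8/k) and the excess kurtosis is 12/k, so with k = 3 they are roughly 1.63 and exactly 4.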

### Exercise 2

In [27]:
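import math


def avg_calc(data):
    """Return the arithmetic mean of data."""
    n = len(data)

    # An empty sequence has no mean; fail loudly instead of indexing data[0]
    if n == 0:
        raise ValueError("avg_calc requires at least one data point")

    return sum(float(d) for d in data) / n


def sd_calc(data):
    """Return the sample standard deviation (n - 1 in the denominator)."""
    n = len(data)

    # With fewer than two points the sample standard deviation is undefined;
    # 0.0 is returned by convention
    if n <= 1:
        return 0.0

    mean = avg_calc(data)
    squared_deviations = sum((float(d) - mean) ** 2 for d in data)

    return math.sqrt(squared_deviations / (n - 1))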


data = [4, 2, 5, 8, 6]
print(f"Sample Data: {data}")
print(f"Standard Deviation: {sd_calc(data)}")

Sample Data: [4, 2, 5, 8, 6]
Standard Deviation: 2.23606797749979
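
As a sanity check, the same sample standard deviation can be obtained from the formula directly and from library functions. This is a small sketch for verification, not part of the exercise: the mean is 5, the squared deviations sum to 20, and 20 / (5 - 1) = 5, so the result should be the square root of 5.

```python
import math
import statistics as stats

import numpy as np

data = [4, 2, 5, 8, 6]

# Sample standard deviation from the definition (n - 1 in the denominator)
mean = sum(data) / len(data)
manual_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))

library_sd = stats.stdev(data)   # sample standard deviation
numpy_sd = np.std(data, ddof=1)  # ddof=1 switches NumPy to the n - 1 denominator

print(f"Manual: {manual_sd}")
print(f"statistics.stdev: {library_sd}")
print(f"np.std(ddof=1): {numpy_sd}")
```

All three values should agree up to floating-point rounding.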
