# Position and dispersion measurements
- Descriptive statistics: describe and summarize a set of data.

- Position measures: mean (arithmetic, weighted, geometric, harmonic and quadratic), median and mode.
- Quartiles and percentiles.

- Dispersion measures: variance, standard deviation and coefficient of variation.

## Position Measures

## Arithmetic mean, mode and median

In [3]:
#Imports
import pandas as pd
import numpy as np
import statistics
from scipy import stats
import math

import warnings
warnings.filterwarnings("ignore")

In [2]:
data = np.array([160, 165, 167, 164, 160, 166, 160, 161, 150, 152, 173, 160, 155,
                  164, 168, 162, 161, 168, 163, 156, 155, 169, 151, 170, 164,
                  155, 152, 163, 160, 155, 157, 156, 158, 158, 161, 154, 161, 156, 172, 153])

Arithmetic mean
![geo.png](attachment:geo.png)

In [3]:
#Arithmetic mean with numpy

data.mean()

160.375

In [4]:
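#With statistics (the 160 below is truncated: data is a NumPy integer array,
#and the result is cast back to that integer type)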

statistics.mean(data)

160
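
The arithmetic mean is the sum of the values divided by how many there are. The following is a minimal sketch of that calculation by hand, assuming the same `data` array as above. It also passes a plain Python list to `statistics.mean`, which seems to be enough to avoid the integer truncation seen in the previous cell; the names `manual_mean` and `list_mean` are only illustrative.

```python
import statistics
import numpy as np

data = np.array([160, 165, 167, 164, 160, 166, 160, 161, 150, 152, 173, 160, 155,
                 164, 168, 162, 161, 168, 163, 156, 155, 169, 151, 170, 164,
                 155, 152, 163, 160, 155, 157, 156, 158, 158, 161, 154, 161, 156, 172, 153])

# Definition: sum of the values divided by the number of values
manual_mean = data.sum() / len(data)

# Converting to a list of Python ints lets statistics.mean return a float
list_mean = statistics.mean(data.tolist())

print(manual_mean, list_mean)  # 160.375 160.375
```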

Mode

In [5]:
#With statistics

statistics.mode(data)

160

In [6]:
#With stats

stats.mode(data)

ModeResult(mode=array([160]), count=array([5]))

Median

In [7]:
np.median(data)

160.0

In [8]:
statistics.median(data)

160.0
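
The median is the middle value of the sorted data; with an even number of values it is the average of the two middle ones. A small sketch of that rule, assuming the same `data` array (the name `manual_median` is illustrative):

```python
import numpy as np

data = np.array([160, 165, 167, 164, 160, 166, 160, 161, 150, 152, 173, 160, 155,
                 164, 168, 162, 161, 168, 163, 156, 155, 169, 151, 170, 164,
                 155, 152, 163, 160, 155, 157, 156, 158, 158, 161, 154, 161, 156, 172, 153])

sorted_data = np.sort(data)
n = len(sorted_data)

if n % 2 == 1:
    # Odd count: the single middle value
    manual_median = float(sorted_data[n // 2])
else:
    # Even count: average of the two middle values
    manual_median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2

print(manual_median)  # 160.0
```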

## Weighted mean
![weighted%20mean.png](attachment:weighted%20mean.png)

In [11]:
grades = np.array([9, 8 , 7, 3])
weights = np.array([1, 2, 3, 4])

In [14]:
#way one
w_mean = (grades * weights).sum() / weights.sum()
w_mean

5.8

In [15]:
#way two
np.average(grades, weights = weights)

5.8
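
Written out for the grades and weights above, the calculation is:

$$
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}
          = \frac{9 \cdot 1 + 8 \cdot 2 + 7 \cdot 3 + 3 \cdot 4}{1 + 2 + 3 + 4}
          = \frac{58}{10} = 5.8
$$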

## Geometric, harmonic and quadratic mean

Geometric mean
![geometric.webp](attachment:geometric.webp)

In [16]:
from scipy.stats.mstats import gmean

In [17]:
gmean(data)

160.26958390038902

Harmonic
![harmonic.png](attachment:harmonic.png)

In [18]:
from scipy.stats.mstats import hmean

In [19]:
hmean(data)

160.16471947994674
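
Both means can also be computed directly from their definitions, which may help show what the SciPy functions are doing. This is only a sketch with NumPy, assuming the same `data` array and strictly positive values: the geometric mean is taken as the exponential of the mean of the logarithms, and the harmonic mean as the count divided by the sum of reciprocals.

```python
import numpy as np

data = np.array([160, 165, 167, 164, 160, 166, 160, 161, 150, 152, 173, 160, 155,
                 164, 168, 162, 161, 168, 163, 156, 155, 169, 151, 170, 164,
                 155, 152, 163, 160, 155, 157, 156, 158, 158, 161, 154, 161, 156, 172, 153])

# Geometric mean: n-th root of the product, computed through logarithms
geometric = np.exp(np.log(data).mean())

# Harmonic mean: n divided by the sum of the reciprocals
harmonic = len(data) / (1.0 / data).sum()

print(geometric, harmonic)
```

For positive data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean; the three values computed for `data` follow that order.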

Quadratic mean
- Also known as the root mean square (RMS); the closely related root mean squared error (RMSE) is used to evaluate machine learning models.
![quadratic.png](attachment:quadratic.png)

In [20]:
def quadratic_mean(data):
    return math.sqrt(sum(n * n for n in data) / len(data))

In [21]:
quadratic_mean(data)

160.48091786876097
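
As an alternative to the pure-Python loop, the same quantity can be written with vectorized NumPy operations. A short sketch, assuming the same `data` array (the name `rms` is illustrative):

```python
import numpy as np

data = np.array([160, 165, 167, 164, 160, 166, 160, 161, 150, 152, 173, 160, 155,
                 164, 168, 162, 161, 168, 163, 156, 155, 169, 151, 170, 164,
                 155, 152, 163, 160, 155, 157, 156, 158, 158, 161, 154, 161, 156, 172, 153])

# Square every value, average the squares, then take the square root
rms = np.sqrt(np.mean(data.astype(float) ** 2))

print(rms)
```

The quadratic mean is always at least as large as the arithmetic mean, and the gap grows as the data become more spread out.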

## Quartiles
![quartilesss.png](attachment:quartilesss.png)

In [23]:
data2 = [150, 151, 152, 152, 153, 154, 155, 155, 155]

Numpy

In [26]:
q1 = np.quantile(data2, 0.25)
q2 = np.quantile(data2, 0.5)
q3 = np.quantile(data2, 0.75)
print(f'Q1: {q1}, Q2: {q2}, Q3: {q3}')

Q1: 152.0, Q2: 153.0, Q3: 155.0


In [27]:
np.median(data2)

153.0
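
By default `np.quantile` interpolates linearly between the two nearest sorted values. The sketch below reproduces that rule by hand for `data2`; `quantile_linear` is a hypothetical helper written for illustration, not a NumPy function.

```python
import math
import numpy as np

data2 = [150, 151, 152, 152, 153, 154, 155, 155, 155]

def quantile_linear(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

for q in (0.25, 0.5, 0.75):
    print(q, quantile_linear(data2, q), np.quantile(data2, q))
```

With nine values the positions for Q1, Q2 and Q3 are 2, 4 and 6, so no interpolation is needed here and the quartiles are simply the third, fifth and seventh sorted values.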

Scipy

In [28]:
stats.scoreatpercentile(data2, 25), stats.scoreatpercentile(data2, 50), stats.scoreatpercentile(data2, 75)

(152.0, 153.0, 155.0)

Pandas

In [29]:
dframe = pd.DataFrame(data2)

In [30]:
dframe.quantile([0.25, 0.50, 0.75])

| | 0 |
|---|---|
| 0.25 | 152.0 |
| 0.5 | 153.0 |
| 0.75 | 155.0 |


In [33]:
dframe.describe()

| | 0 |
|---|---|
| count | 9.0 |
| mean | 153.0 |
| std | 1.870829 |
| min | 150.0 |
| 25% | 152.0 |
| 50% | 153.0 |
| 75% | 155.0 |
| max | 155.0 |


## Dispersion Measures

![varian%20and%20stand.png](attachment:varian%20and%20stand.png)

In [34]:
base = np.array([150, 151, 152, 152, 153, 154, 155, 155, 155])

Variance
- The greater the variance, the more the elements are spread out around the mean.

In [35]:
np.var(base)

3.111111111111111

In [36]:
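#Sample variance (n - 1 denominator); the 3 below is 3.5 truncated,
#because base is a NumPy integer array
statistics.variance(base)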

3

In [37]:
from scipy import ndimage
ndimage.variance(base)

3.111111111111111
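
The different results above come from two definitions of variance. `np.var` divides the sum of squared deviations by n (population variance), while `statistics.variance` divides by n - 1 (sample variance). A minimal sketch of both, assuming the same `base` array; the variable names are illustrative.

```python
import numpy as np

base = np.array([150, 151, 152, 152, 153, 154, 155, 155, 155])

# Sum of squared deviations from the mean
deviations = base - base.mean()
sum_sq = (deviations ** 2).sum()

population_variance = sum_sq / len(base)        # divide by n
sample_variance = sum_sq / (len(base) - 1)      # divide by n - 1

print(population_variance, np.var(base))        # both about 3.111
print(sample_variance, np.var(base, ddof=1))    # both 3.5
```

The `ddof` argument of `np.var` sets how much is subtracted from n in the denominator, so `ddof=1` gives the sample variance.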

Standard deviation
- The square root of the variance; it shows how far the values typically are from the mean (the "expected value"), in the same units as the data.

In [38]:
np.std(base)

1.7638342073763937

In [39]:
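#Sample standard deviation; the value below is sqrt(3), taken from the
#truncated variance above, while the untruncated value is about 1.8708
statistics.stdev(base)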

1.7320508075688772
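
To compare the population and sample standard deviations side by side, the `ddof` argument of `np.std` can be used. This sketch assumes the same `base` array and also checks the result against pandas, whose `std` uses the sample (n - 1) definition by default.

```python
import numpy as np
import pandas as pd

base = np.array([150, 151, 152, 152, 153, 154, 155, 155, 155])

population_std = np.std(base)          # square root of the population variance
sample_std = np.std(base, ddof=1)      # square root of the sample variance
pandas_std = pd.Series(base).std()     # pandas defaults to ddof=1

print(population_std, sample_std, pandas_std)
```

The sample value is the one reported as `std` by `describe()`.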

Coefficient of variation: the standard deviation divided by the mean, usually expressed as a percentage.
![cv1.webp](attachment:cv1.webp)

In [40]:
cv = np.std(base) / np.mean(base) * 100
cv

1.1528328152786886

In [41]:
#with scipy

stats.variation(base) * 100

1.1528328152786886

## Absolute and relative data
- Absolute data: data collected directly from the source without any manipulation other than summing or sorting.

- Relative data: data that has been transformed to make it easier to understand and compare, such as percentages, indexes, coefficients and rates.

- Indexes: ratio between two quantities, summarizing in a single number the general behavior of a variable.
- Coefficient: ratio between the number of occurrences and the total number.
- Rates: a coefficient multiplied by a power of 10 (10, 100, 1000 and so on).
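
The relation between a coefficient and a rate can be sketched in a few lines. The numbers are the enrollment totals used later in this section (190 students in March, 175 in November); the variable names are illustrative.

```python
total = 190              # students enrolled in March
occurrences = 190 - 175  # students who left by November

# Coefficient: occurrences divided by the total
coefficient = occurrences / total

# Rates: the same coefficient scaled by a power of 10
rate_per_100 = coefficient * 100
rate_per_1000 = coefficient * 1000

print(round(coefficient, 6), round(rate_per_100, 6), round(rate_per_1000, 5))
```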

In [1]:
#Number of employees hired 
data3 = {'Job':['Database_Maneger', 'Programmer', 'Computer_Networ_Architect'],
        'new_jersey': [97350, 82080, 112840],
        'florida': [77140, 71540, 62310]}

In [4]:
df = pd.DataFrame(data3)

In [5]:
type(df)

pandas.core.frame.DataFrame

In [6]:
df

| | Job | new_jersey | florida |
|---|---|---|---|
| 0 | Database_Maneger | 97350 | 77140 |
| 1 | Programmer | 82080 | 71540 |
| 2 | Computer_Networ_Architect | 112840 | 62310 |


In [7]:
df['new_jersey'].sum()

292270

In [8]:
df['florida'].sum()

210990

In [9]:
#Calculating the percentage

df['%_new_jersey'] = (df['new_jersey'] / df['new_jersey'].sum()) * 100
df['%_florida'] = (df['florida'] / df['florida'].sum()) * 100

In [10]:
df

| | Job | new_jersey | florida | %_new_jersey | %_florida |
|---|---|---|---|---|---|
| 0 | Database_Maneger | 97350 | 77140 | 33.308242 | 36.560974 |
| 1 | Programmer | 82080 | 71540 | 28.083621 | 33.90682 |
| 2 | Computer_Networ_Architect | 112840 | 62310 | 38.608136 | 29.532205 |


In [11]:
df4 = pd.DataFrame({'Gratuation year': ['1º', '2º', '3º', '4º', 'total'],
          'March enrollment':[70, 50, 47, 23, 190],
          'November enrollment': [65, 48, 40, 22,175]})

In [12]:
type(df4)

pandas.core.frame.DataFrame

In [14]:
df4['Dropout Rate'] = (df4['March enrollment'] - df4['November enrollment']) / df4['March enrollment'] * 100

In [15]:
df4

| | Gratuation year | March enrollment | November enrollment | Dropout Rate |
|---|---|---|---|---|
| 0 | 1º | 70 | 65 | 7.142857 |
| 1 | 2º | 50 | 48 | 4.0 |
| 2 | 3º | 47 | 40 | 14.893617 |
| 3 | 4º | 23 | 22 | 4.347826 |
| 4 | total | 190 | 175 | 7.894737 |
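
The rate in the `total` row is computed from the total enrollments, so it is not the simple average of the four yearly rates. It is instead their weighted mean, with the March enrollments as weights. A short sketch of that check, re-entering the four yearly rows as arrays (variable names are illustrative):

```python
import numpy as np

march = np.array([70, 50, 47, 23])
november = np.array([65, 48, 40, 22])

# Dropout rate per year, in percent
rates = (march - november) / march * 100

simple_average = rates.mean()                        # treats every year equally
weighted_average = np.average(rates, weights=march)  # weights by class size
overall_rate = (march.sum() - november.sum()) / march.sum() * 100

print(simple_average, weighted_average, overall_rate)
```

The weighted average matches the overall rate, while the simple average gives a lower figure because it gives the small fourth-year class the same weight as the large first-year class.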
