## Selected Statistics on The Cast of The Brady Bunch

Mean age = 9.86 years. Median age = 10 years. Modal age = 8 years. Based on the data given, I would choose the mean as the best representation of central tendency, because the distribution is roughly symmetric (the mean and median differ by only 0.14 years) and contains no extreme values that would pull the mean away from the bulk of the data.
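As a quick cross-check that does not depend on pandas, the three measures can be recomputed with Python's standard-library `statistics` module. This is a minimal sketch: the `ages` list is simply the data from the DataFrame below, and `multimode` (Python 3.8+) is used so that any tie for the mode would be reported rather than hidden.

```python
import statistics

# Ages from the DataFrame below (Greg, Marcia, Peter, Jan, Bobby, Cindy, Oliver)
ages = [14, 12, 11, 10, 8, 6, 8]

mean_age = statistics.mean(ages)         # arithmetic mean
median_age = statistics.median(ages)     # middle value of the sorted list
modal_ages = statistics.multimode(ages)  # every value tied for the highest count

print(round(mean_age, 2), median_age, modal_ages)
```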

In [1]:
import pandas as pd
import numpy as np

In [55]:
people = pd.DataFrame()
people['name'] = ['Greg', 'Marcia', 'Peter', 'Jan', 'Bobby', 'Cindy', 'Oliver']
people['age'] = [14, 12, 11, 10, 8, 6, 8]

In [56]:
print(people)

     name  age
0    Greg   14
1  Marcia   12
2   Peter   11
3     Jan   10
4   Bobby    8
5   Cindy    6
6  Oliver    8


In [57]:
people.describe()

             age
count   7.000000
mean    9.857143
std     2.734262
min     6.000000
25%     8.000000
50%    10.000000
75%    11.500000
max    14.000000


In [58]:
# numeric_only=True skips the text 'name' column (required in pandas >= 2.0)
people.median(numeric_only=True)

age    10.0
dtype: float64

In [59]:
people.mode()

     name  age
0   Bobby  8.0
1   Cindy  NaN
2    Greg  NaN
3     Jan  NaN
4  Marcia  NaN
5  Oliver  NaN
6   Peter  NaN


In [60]:
# sample variance (pandas default ddof=1)
people.var(numeric_only=True)

age    7.47619
dtype: float64

In [62]:
# sample standard deviation: square root of the sample variance
people.var(numeric_only=True) ** 0.5

age    2.734262
dtype: float64

In [63]:
people.count()

name    7
age     7
dtype: int64

In [65]:
# standard error of the mean: sample standard deviation / sqrt(n)
((people['age'].var())**0.5)/((people['age'].count())**0.5)

1.0334540197243192

In [66]:
#standard error by other method
np.std(people['age'], ddof=1)/np.sqrt(len(people['age']))

1.0334540197243192
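Both standard-error cells divide the sample standard deviation (ddof=1) by √n. A sketch of the same arithmetic in plain Python makes the two possible divisors explicit; `ss` and the other variable names are illustrative and not taken from the notebook.

```python
import math

# Ages from the original DataFrame
ages = [14, 12, 11, 10, 8, 6, 8]
n = len(ages)
mean = sum(ages) / n

# Sum of squared deviations from the mean
ss = sum((a - mean) ** 2 for a in ages)

sample_var = ss / (n - 1)   # ddof=1: the default for pandas .var() and .std()
population_var = ss / n     # ddof=0: the default for np.var() and np.std()

# Standard error of the mean from the sample standard deviation
sample_se = math.sqrt(sample_var) / math.sqrt(n)

print(round(sample_var, 6), round(population_var, 6), round(sample_se, 6))
```

This should reproduce the 7.47619 variance and 1.03345 standard error above; the population variance is smaller because it divides by n rather than n - 1.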

In [67]:
people

     name  age
0    Greg   14
1  Marcia   12
2   Peter   11
3     Jan   10
4   Bobby    8
5   Cindy    6
6  Oliver    8


## Cindy Has a Birthday - What Changes?

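Cindy's age changes from 6 to 7. This raises the mean (from 9.86 to 10) and lowers the standard deviation, variance and standard error, because the change brings that one value closer to the original mean. The mode (8) and the median (10) remain unchanged. Note that the variance, standard deviation and standard error computed below are population statistics (ddof=0), whereas the first section used sample statistics (ddof=1, the pandas default), so the two sets of figures are not directly comparable. On the same sample basis the variance falls from 7.48 to 6.33: the decrease is real, but smaller than the raw numbers suggest.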
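To compare like with like, the before-and-after figures can be computed with the same divisor. The sketch below assumes a small hypothetical helper, `spread`, written in plain Python rather than pandas; its `ddof` argument plays the same role as the pandas and numpy parameter of that name.

```python
import math


def spread(ages, ddof=1):
    """Return (mean, variance, standard deviation, standard error) for a list of ages.

    ddof=1 gives sample statistics (the pandas default);
    ddof=0 gives population statistics (the numpy default).
    """
    n = len(ages)
    mean = sum(ages) / n
    var = sum((a - mean) ** 2 for a in ages) / (n - ddof)
    sd = math.sqrt(var)
    return mean, var, sd, sd / math.sqrt(n)


before = [14, 12, 11, 10, 8, 6, 8]  # original ages
after = [14, 12, 11, 10, 8, 7, 8]   # Cindy is now 7

for ddof in (1, 0):
    b, a = spread(before, ddof), spread(after, ddof)
    print(f"ddof={ddof}: var {b[1]:.4f} -> {a[1]:.4f}, SE {b[3]:.4f} -> {a[3]:.4f}")
```

Whichever divisor is chosen, every measure of spread falls; the point is only that both columns must use the same one.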

In [97]:
people.loc[5, 'age'] = 7  # Cindy (row 5) turns 7
people

     name  age
0    Greg   14
1  Marcia   12
2   Peter   11
3     Jan   10
4   Bobby    8
5   Cindy    7
6  Oliver    8


In [98]:
people.mean(numeric_only=True)

age    10.0
dtype: float64

In [99]:
people.mode()

     name  age
0   Bobby  8.0
1   Cindy  NaN
2    Greg  NaN
3     Jan  NaN
4  Marcia  NaN
5  Oliver  NaN
6   Peter  NaN


In [100]:
people.median(numeric_only=True)

age    10.0
dtype: float64

In [101]:
# population variance (ddof=0) via the shortcut mean(x^2) - mean(x)^2
v2 = sum(people.age**2)/len(people) - (people.mean(numeric_only=True))**2
print(v2)

age    5.428571
dtype: float64


In [102]:
# population variance by another method, as a check
# (people.var() would instead give the sample variance, ddof=1)
np.var(people.age)

5.428571428571429
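The `v2` cell relies on the shortcut form of the population variance. The identity behind it, and its link to the sample variance used in the first section, are shown here with the post-birthday ages (n = 7, sum of squares 738, mean 10):

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
         = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2
         = \frac{738}{7} - 10^2 = \frac{38}{7} \approx 5.428571,
\qquad
s^2 = \frac{n}{n-1}\,\sigma^2 = \frac{7}{6}\cdot\frac{38}{7} = \frac{38}{6} \approx 6.333333
$$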

In [103]:
# population standard deviation (square root of v2)
sd2 = v2**0.5
print(sd2)

age    2.329929
dtype: float64


In [104]:
# population standard deviation using numpy (np.std defaults to ddof=0)
np.std(people['age'])

2.32992949004287

In [96]:
# standard error from the population SD (the first section used the sample SD)
print(sd2/(len(people)**0.5))

age    0.880631
dtype: float64


## Remove Oliver (age 8) and add Jessica (age 1)

Replacing Oliver (8) with Jessica (1) considerably increases the variance, standard deviation and standard error. The mean drops from 10 to 9 because the new entrant's age is far below the previous mean, while the median stays at 10. At this point the median may be the better choice for representing central tendency: the single low value skews the distribution and pulls the mean down, but leaves the median unaffected. There is no modal value, since every age now occurs exactly once.
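The different sensitivity of the mean and the median to one extreme value can be seen directly. This is a minimal sketch using the standard-library `statistics` module; `summary` is a hypothetical dictionary introduced only for this illustration.

```python
import statistics

# Ages before and after Oliver (8) is replaced by Jessica (1)
with_oliver = [14, 12, 11, 10, 8, 7, 8]
with_jessica = [14, 12, 11, 10, 8, 7, 1]

summary = {}
for label, ages in (("with Oliver", with_oliver), ("with Jessica", with_jessica)):
    mean_age = statistics.mean(ages)
    median_age = statistics.median(ages)
    summary[label] = (mean_age, median_age)
    # A mean well below the median points to a long lower tail (left skew)
    print(f"{label}: mean={mean_age}, median={median_age}")
```

Changing a single value moves the mean by a full year, while the median does not move at all.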


In [106]:
people.loc[6] = ['Jessica', 1]  # replace Oliver's row (row 6) with Jessica, age 1

In [107]:
people

      name  age
0     Greg   14
1   Marcia   12
2    Peter   11
3      Jan   10
4    Bobby    8
5    Cindy    7
6  Jessica    1


In [108]:
people.mean(numeric_only=True)

age    9.0
dtype: float64

In [110]:
people['age'].mode()

0     1
1     7
2     8
3    10
4    11
5    12
6    14
dtype: int64
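When every value occurs once, `Series.mode()` returns all of them, as above, which is why there is no meaningful modal value here. One way to test for this explicitly is to look at the largest frequency reported by `value_counts()`; the `has_mode` flag is a hypothetical helper variable, not part of the notebook.

```python
import pandas as pd

ages = pd.Series([14, 12, 11, 10, 8, 7, 1], name="age")

# value_counts() tallies each distinct age; a mode exists only if some age repeats
counts = ages.value_counts()
has_mode = bool(counts.max() > 1)

print(has_mode)
```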

In [111]:
people['age'].median()

10.0

In [118]:
# population variance (ddof=0) via the shortcut mean(x^2) - mean(x)^2
v3 = sum(people['age']**2)/len(people) - people.mean(numeric_only=True)**2
print(v3)

age    15.428571
dtype: float64


In [119]:
# sample variance (pandas default ddof=1); not an anomaly: 15.428571 * 7/6 = 18.0
people['age'].var()

18.0

In [120]:
people.var(numeric_only=True)

age    18.0
dtype: float64

In [121]:
np.var(people.age)

15.428571428571429
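The three variance cells above disagree only because of their divisor, not because of an error: pandas' `Series.var()` defaults to `ddof=1` (sample variance, dividing by n - 1), while `np.var()` defaults to `ddof=0` (population variance, dividing by n). A short sketch, reusing the ages at this stage of the notebook, confirms the relationship.

```python
import numpy as np
import pandas as pd

ages = pd.Series([14, 12, 11, 10, 8, 7, 1], name="age")
n = len(ages)

pandas_var = ages.var()   # pandas default ddof=1 (sample variance)
numpy_var = np.var(ages)  # numpy default ddof=0 (population variance)

# Bessel's correction links the two: sample = population * n / (n - 1)
assert np.isclose(pandas_var, numpy_var * n / (n - 1))

# Passing ddof explicitly makes the two libraries agree
assert np.isclose(ages.var(ddof=0), np.var(ages))
assert np.isclose(np.var(ages, ddof=1), ages.var())

print(pandas_var, numpy_var)
```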

In [123]:
# population standard deviation (square root of v3)
sd3 = v3**0.5
print(sd3)

age    3.927922
dtype: float64


In [124]:
#standard deviation by other method
np.std(people.age)

3.927922024247863

In [126]:
# standard error from the population SD (the first section used the sample SD)
sd3/len(people)**0.5

age    1.484615
dtype: float64
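For a standard error on the same basis as the first section, pandas provides `Series.sem()`, which uses `ddof=1` by default. The sketch below compares it with the manual formula and with the population-based value computed in the cell above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([14, 12, 11, 10, 8, 7, 1], name="age")

# Series.sem() uses ddof=1 by default, matching the first section's method
sample_se = ages.sem()

# Same quantity by hand: sample standard deviation over sqrt(n)
manual_se = ages.std() / np.sqrt(ages.count())

# Population-based version, as computed in the cell above
population_se = np.std(ages) / np.sqrt(len(ages))

print(round(sample_se, 6), round(manual_se, 6), round(population_se, 6))
```

With only seven observations the choice of divisor matters noticeably: the sample-based standard error is about 1.60, against 1.48 from the population formula.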