## What is Central Tendency?

Imagine a set of numbers, such as the ages of people in a group. Central tendency is a way to find a "typical" or "middle" age that represents the group. There are three main measures of central tendency: mean, median, and mode.

- **Mean:** If you calculate the mean (average) age, it's like saying, "On average, people in this group are around this age."

- **Median:** If you find the median age, it's like saying, "Half of the people in this group are younger than this age, and half are older."

- **Mode:** If you find the mode, it's like saying, "The most common age in this group is this."

So, central tendency gives you a single number that represents where the data is centered. The three measures find that "middle" in different ways: as the average, as the midpoint, or as the most common value.

## Mean

   - The mean is the average of all values in a dataset.
   - It is calculated by adding up all the values and dividing by the number of values.
   - The mean is sensitive to outliers (extreme values): a single very large or very small value can shift it noticeably.
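
To make the point about outliers concrete, here is a minimal sketch. The ages are made-up illustrative values, and the variable names are assumptions chosen for this example only.

```python
import statistics

# Hypothetical ages; 95 is added as an illustrative extreme value.
ages = [22, 25, 27, 30, 31]
with_outlier = ages + [95]

mean_without = statistics.mean(ages)        # 135 / 5
mean_with = statistics.mean(with_outlier)   # 230 / 6

print("Mean without outlier:", mean_without)
print("Mean with outlier:", round(mean_with, 2))
```

With the extreme value included, the mean ends up larger than every age in the original group, so it no longer describes a "typical" member well.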

In [1]:
# In Python, mean can be calculated using the 'statistics' module or using the 'numpy' library for numerical operations.

# Using statistics module
import statistics

data = [12, 14, 15, 17, 19]

mean_value = statistics.mean(data)
print("Mean using statistics module:", mean_value)

# Using numpy
import numpy as np

mean_np = np.mean(data)
print("Mean using numpy:", mean_np)

Mean using statistics module: 15.4
Mean using numpy: 15.4


When the data contains missing values (represented by 'np.nan'), the 'numpy' library provides the 'numpy.nanmean()' function. This function calculates the mean while ignoring the missing values.

In [2]:
import numpy as np

data = [2, 4, np.nan, 5, 6]
print("data:",data)

mean_value = np.nanmean(data)
print("Mean with NA:", mean_value)

data: [2, 4, nan, 5, 6]
Mean with NA: 4.25


This function only works when missing values are represented by 'np.nan'. If your data uses another representation for missing values, preprocess it accordingly before calculating the mean.
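
One possible way to do that preprocessing is sketched below. It assumes, purely for illustration, that missing entries appear as `None` or the string `"NA"`; your data may use different markers.

```python
import numpy as np

# Hypothetical raw data: missing entries appear as None or the string "NA".
raw = [2, 4, "NA", 5, None, 6]
missing_markers = ("NA", None)  # assumption: adjust to match your data

# Replace each marker with np.nan and convert everything else to float.
cleaned = np.array([np.nan if x in missing_markers else float(x) for x in raw])

print("Cleaned:", cleaned)
print("Mean ignoring missing values:", np.nanmean(cleaned))
```

After the conversion, 'numpy.nanmean()' treats both kinds of marker the same way, because they have all become 'np.nan'.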

## Median
   - The median is the middle value when the data is sorted in ascending or descending order.
   - If there is an even number of observations, the median is the average of the two middle values.
   - The median is less affected by extreme values than the mean, making it a more robust measure in the presence of outliers.
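
The last two points can be sketched together with a small example. The ages are made-up illustrative values.

```python
import statistics

# Hypothetical ages; 95 is added as an illustrative extreme value.
ages = [22, 25, 27, 30, 31]
with_outlier = ages + [95]  # six values, so the two middle values are averaged

median_without = statistics.median(ages)
median_with = statistics.median(with_outlier)  # average of 27 and 30

print("Median without outlier:", median_without)
print("Median with outlier:", median_with)
```

Adding the extreme value moves the median only slightly, because only the position of the values in the sorted order matters, not how large the largest one is.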

In [3]:
# Median calculation is similar to calculating the mean

# Using statistics module
import statistics

data = [12, 14, 15, 17, 19]

median_value = statistics.median(data)
print("Median using statistics module:", median_value)

Median using statistics module: 15


In [4]:
# Using numpy
import numpy as np

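# Generate 10 random integers from 11 to 29 (the upper bound 30 is excluded).
# No seed is set, so the values differ on each run.
data = np.random.randint(11,30,10)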
print("Data:", data)

median_np = np.median(data)
print("Median of data:", median_np)

Data: [24 22 22 17 18 29 23 18 29 25]
Median of data: 22.5


As with the mean, when the data contains missing values (represented by 'np.nan'), you can use the 'numpy.nanmedian()' function from the 'numpy' library. This function calculates the median while ignoring the missing values.
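
A short sketch of this, using the same kind of data as the 'numpy.nanmean()' example, and comparing against 'numpy.median()' to show why the NaN-aware version is needed:

```python
import numpy as np

data = [2, 4, np.nan, 5, 6]

# np.median propagates the missing value, so the result is nan.
plain_median = np.median(data)

# np.nanmedian ignores the nan and uses the remaining values 2, 4, 5, 6.
median_value = np.nanmedian(data)

print("Median without handling NaN:", plain_median)
print("Median ignoring NaN:", median_value)
```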



## Mode
   - The mode is the value that occurs most frequently in a dataset.
   - A dataset may have no mode, one mode (unimodal), or multiple modes (multimodal).
   - Unlike mean and median, the mode can be applied to categorical data as well.

In [5]:
# Using the statistics Module

from statistics import mode

my_list = [2, 4, 3, 5, 4, 6, 4, 7]

most_common = mode(my_list)
print("Mode:",most_common)

Mode: 4


In [6]:
# Finding the mode of a list that contains categorical data

mode(["red", "blue", "blue", "red", "green", "red", "red"])

'red'

### Handling Multiple Modes

In the 'statistics' module, the 'multimode()' function returns a list of the most frequently occurring values, in the order they were first encountered in the data. It returns more than one value if there are multiple modes, or an empty list if the data is empty.

In [7]:
from statistics import multimode

data = (2, 6, 4, 3, 4, 6, 5, 4, 6, 4, 7, 6)

modes = multimode(data)
print("Modes:", modes)

Modes: [6, 4]


In [8]:
# Empty data
multimode('')

[]
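
It can help to compare 'mode()' and 'multimode()' on the same tied data. The sketch below assumes Python 3.8 or later, where 'mode()' returns the first mode encountered when there is a tie; the list is a made-up example.

```python
from statistics import StatisticsError, mode, multimode

# Hypothetical data in which 6 and 4 both occur twice.
tied = [6, 4, 4, 6, 5]

first_mode = mode(tied)       # Python 3.8+: the first mode encountered
all_modes = multimode(tied)   # every value that shares the highest count

print("mode():", first_mode)
print("multimode():", all_modes)

# Unlike multimode(), mode() raises an error on empty data.
try:
    mode([])
    empty_raised = False
except StatisticsError:
    empty_raised = True

print("mode([]) raised StatisticsError:", empty_raised)
```

When every value occurs once, 'multimode()' returns all of them, which is one way to recognise a dataset with no meaningful mode.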

## Finding Mean, Median, and Mode Using pandas

In [9]:
# Import Pandas Library
import pandas as pd

# Let's create a DataFrame

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12]})
df

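|   | player | game1 | game2 | game3 |
|---|--------|-------|-------|-------|
| 0 | A | 18 | 5 | 11 |
| 1 | B | 22 | 7 | 8 |
| 2 | C | 19 | 7 | 10 |
| 3 | D | 14 | 9 | 6 |
| 4 | E | 14 | 12 | 6 |
| 5 | F | 11 | 9 | 5 |
| 6 | G | 20 | 9 | 9 |
| 7 | H | 28 | 4 | 12 |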


In [10]:
# Mean of each column (axis=0 is the default)
df.mean(numeric_only=True)

game1    18.250
game2     7.750
game3     8.375
dtype: float64

In [11]:
# Mean of each row (axis=1 averages across the columns)
df.mean(axis=1, numeric_only=True)

0    11.333333
1    12.333333
2    12.000000
3     9.666667
4    10.666667
5     8.333333
6    12.666667
7    14.666667
dtype: float64

In [12]:
# Mean of a series in the DataFrame
df['game1'].mean()

18.25

In [13]:
# Median
df.median(numeric_only=True)

game1    18.5
game2     8.0
game3     8.5
dtype: float64

In [14]:
df['game1'].median()

18.5

In [15]:
# Mode
df.mode(numeric_only=True)

|   | game1 | game2 | game3 |
|---|-------|-------|-------|
| 0 | 14 | 9 | 6 |


In [16]:
df['game2'].mode()

0    9
Name: game2, dtype: int64
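
The pandas 'mode()' method returns a Series or DataFrame rather than a single number because a column can have several modes. The following sketch uses a small made-up DataFrame (the column names 'game_a' and 'game_b' are hypothetical) to show what happens with ties.

```python
import pandas as pd

# Hypothetical scores: game_a has two modes (1 and 2), game_b has one (5).
scores = pd.DataFrame({"game_a": [1, 1, 2, 2, 3],
                       "game_b": [5, 5, 5, 6, 7]})

# Series.mode() returns every mode, sorted in ascending order.
tied_modes = scores["game_a"].mode()
print(tied_modes.tolist())

# DataFrame.mode() has one row per mode; columns with fewer modes are padded with NaN.
mode_table = scores.mode()
print(mode_table)
```

To get a single value from a column, one option is to take the first entry, for example `scores["game_b"].mode()[0]`.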

## Finding Mean, Median, and Mode in Python Without Libraries

In [17]:
# Define the dataset
data = [12, 14, 13, 15, 14, 16, 16, 17]
data

[12, 14, 13, 15, 14, 16, 16, 17]

In [18]:
# Calculate Mean

mean_value = sum(data) / len(data)
print("Mean:", mean_value)

Mean: 14.625


In [19]:
# Calculate Median

sorted_data = sorted(data)
n = len(sorted_data)

if n % 2 == 0:
    median_value = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2
else:
    median_value = sorted_data[n // 2]
    
print("Median:", median_value)


Median: 14.5


In [20]:
# Calculate Mode

counts = {}
for value in data:
    counts[value] = counts.get(value, 0) + 1

max_count = max(counts.values())
modes = [key for key, value in counts.items() if value == max_count]

print("Mode:", modes)


Mode: [14, 16]
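
If these calculations are needed more than once, the same logic can be wrapped in small functions. This is only a sketch: the function names 'simple_mean', 'simple_median', and 'simple_modes' are hypothetical, and returning an empty list for empty input in 'simple_modes' is a design choice made here, not something the steps above require.

```python
def simple_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def simple_median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]


def simple_modes(values):
    """All values that share the highest count, in first-seen order."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    if not counts:  # assumption: empty input yields an empty list
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


data = [12, 14, 13, 15, 14, 16, 16, 17]
print("Mean:", simple_mean(data))
print("Median:", simple_median(data))
print("Mode:", simple_modes(data))
```

Note that 'simple_mean' and 'simple_median' still fail on an empty list, so callers would need to check for that case themselves.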
