Common Statistical Functions in NumPy

1. Mean – np.mean()

Definition: The arithmetic average of a set of numbers.

In [22]:
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

mean = np.mean(data)
print("Mean:", mean)


Mean: 54.0


Explanation:

* Adds up all elements in data and divides by the number of elements.
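The same calculation can be written out by hand as a quick check. This is a minimal sketch; the name `manual_mean` is illustrative and not part of the original notebook.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

# Sum of all elements divided by the number of elements
manual_mean = data.sum() / data.size

print("Manual mean:", manual_mean)
print("Matches np.mean:", np.isclose(manual_mean, np.mean(data)))
```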

2. Median – np.median()

Definition: The middle value in a sorted array. If the array has an even number of elements, np.median() returns the average of the two middle values.

In [23]:
median = np.median(data)
print("Median:", median)

Median: 55.0


Explanation:

* Sorted data: [10, 20, 20, 40, 50, 60, 70, 80, 90, 100]

* Median = average of 5th (50) and 6th (60) elements
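The steps above can be sketched manually as follows. The variable names (`sorted_data`, `mid`, `manual_median`) are illustrative, and the odd-length branch is included only for completeness.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

sorted_data = np.sort(data)   # the median is defined on sorted values
n = sorted_data.size
mid = n // 2

if n % 2 == 0:
    # Even length: average the two middle values (5th and 6th here)
    manual_median = (sorted_data[mid - 1] + sorted_data[mid]) / 2
else:
    # Odd length: take the single middle value
    manual_median = float(sorted_data[mid])

print("Manual median:", manual_median)
```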

3. Mode – scipy.stats.mode() (NumPy does not directly support mode)

Definition: The value that appears most frequently in the dataset.

In [24]:
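from scipy import stats

# keepdims=True (available in SciPy 1.9+) keeps the results as 1-element arrays, so [0] indexing works
mode_result = stats.mode(data, keepdims=True)
print("Mode:", mode_result.mode[0], "Count:", mode_result.count[0])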


Mode: 20 Count: 2


Explanation:

* 20 appears twice, more often than any other value, so the mode is 20 with a count of 2.
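If SciPy is not available, the mode can also be found with NumPy alone. The following is one possible sketch using np.unique(); the names `numpy_mode` and `numpy_count` are illustrative. Because np.unique() returns sorted values and np.argmax() returns the first maximum, ties resolve to the smallest value.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

# Unique values (sorted) and how often each one occurs
values, counts = np.unique(data, return_counts=True)

# Index of the highest count; on ties this is the smallest value
idx = np.argmax(counts)
numpy_mode = values[idx]
numpy_count = counts[idx]

print("Mode:", numpy_mode, "Count:", numpy_count)
```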

4. Standard Deviation – np.std()

Definition: Measures how spread out the values are around the mean. By default, np.std() computes the population standard deviation (ddof=0).

In [25]:
std_dev = np.std(data)
print("Standard Deviation:", std_dev)

Standard Deviation: 29.732137494637012


Explanation:
 
* A higher standard deviation means the data is more spread out.
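When the data is a sample rather than a full population, the sample standard deviation (dividing by n - 1 instead of n) is often preferred. A minimal sketch of the difference, using the `ddof` parameter of np.std(); the variable names are illustrative:

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

# Population standard deviation: divides by n (NumPy's default, ddof=0)
population_std = np.std(data)

# Sample standard deviation: divides by n - 1 (ddof=1)
sample_std = np.std(data, ddof=1)

print("Population std:", population_std)
print("Sample std:", sample_std)
```

The sample value is always slightly larger, and the gap shrinks as the number of elements grows.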

5. Variance – np.var()

Definition: The average of the squared deviations from the mean. It equals the square of the standard deviation.

In [26]:
variance = np.var(data)
print("Variance:", variance)

Variance: 884.0


Explanation:

* Variance = std_dev²

* It shows how far the numbers are spread out from the mean.
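To make the relationship concrete, the variance can be computed step by step and compared with np.var() and np.std(). This is a sketch with illustrative names (`deviations`, `manual_var`), using the population form (dividing by n) that np.var() uses by default.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

# Distance of each value from the mean
deviations = data - data.mean()

# Population variance: mean of the squared deviations
manual_var = np.mean(deviations ** 2)

print("Manual variance:", manual_var)
print("Matches np.var:", np.isclose(manual_var, np.var(data)))
print("Equals std squared:", np.isclose(manual_var, np.std(data) ** 2))
```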

6. Minimum and Maximum – np.min(), np.max()

Definition: Returns the smallest and largest values in the array.

In [27]:
minimum = np.min(data)
maximum = np.max(data)
print("Min:", minimum, "Max:", maximum)

Min: 10 Max: 100


Explanation:

* Useful for checking data ranges and outliers.

7. Percentile – np.percentile()

Definition: Shows the value below which a given percentage of data falls.

In [28]:
q25 = np.percentile(data, 25)  # 25th percentile (1st Quartile)
q75 = np.percentile(data, 75)  # 75th percentile (3rd Quartile)
print("25th Percentile (Q1):", q25)
print("75th Percentile (Q3):", q75)

25th Percentile (Q1): 25.0
75th Percentile (Q3): 77.5


Explanation:

* Helps in understanding distribution and outliers (used in box plots).

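* The 25th percentile is the value at or below which roughly 25% of the data falls. By default, np.percentile() interpolates linearly between neighboring values, which is why Q1 is 25.0 even though 25 does not appear in the array.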
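As an illustration of how quartiles relate to outliers in box plots, the sketch below computes the interquartile range (IQR) and flags values beyond 1.5 × IQR from the quartiles. The 1.5 multiplier is a common convention assumed here, not something the notebook specifies, and the variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1   # spread of the middle 50% of the data

# Conventional box-plot fences (assumed 1.5 * IQR rule)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are treated as potential outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

For this dataset no value falls outside the fences, so the outlier array is empty.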

8. Range – np.ptp()

Definition: The difference between the maximum and minimum values.

In [29]:
data_range = np.ptp(data)  # Peak-to-peak
print("Range:", data_range)

Range: 90


Explanation:

* Gives you a quick idea of the spread of values.

* Range = max - min
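The formula above can be checked directly against np.ptp(). A small sketch, with `manual_range` as an illustrative name:

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

# Range computed directly from the maximum and minimum
manual_range = data.max() - data.min()

print("Manual range:", manual_range)
print("Matches np.ptp:", manual_range == np.ptp(data))
```

Note that the function form np.ptp(data) is used here; the array method data.ptp() was removed in NumPy 2.0.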

9. Skewness & Kurtosis (using SciPy)

In [30]:
from scipy.stats import skew, kurtosis

skewness = skew(data)
kurt = kurtosis(data)
print("Skewness:", skewness)
print("Kurtosis:", kurt)


Skewness: -0.0027393914687632638
Kurtosis: -1.3184824225548208


Explanation:

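Skewness: Measures asymmetry.

* >0: right-skewed (tail on right)

* <0: left-skewed (tail on left)

* Here the skewness is about -0.003, so the data is nearly symmetric.


Kurtosis: Measures peakedness and tail weight. By default, scipy.stats.kurtosis() returns excess kurtosis, so a normal distribution scores 0.

* >0: sharper peak (leptokurtic)

* <0: flatter peak (platykurtic)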
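Both measures can be reproduced from central moments using NumPy alone. The sketch below assumes the biased (population) moment definitions and the excess-kurtosis convention, which are the SciPy defaults; the names `m2`, `m3`, `m4`, `manual_skew`, and `manual_kurt` are illustrative.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

deviations = data - data.mean()

# Central moments: averages of powers of the deviations
m2 = np.mean(deviations ** 2)   # variance
m3 = np.mean(deviations ** 3)
m4 = np.mean(deviations ** 4)

# Skewness: third moment scaled by the variance to the power 1.5
manual_skew = m3 / m2 ** 1.5

# Excess kurtosis: scaled fourth moment minus 3 (the normal-distribution value)
manual_kurt = m4 / m2 ** 2 - 3

print("Skewness:", manual_skew)
print("Kurtosis:", manual_kurt)
```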

10. Cumulative Sum – np.cumsum()

Definition: The running total of elements in the array.

In [31]:
cumulative = np.cumsum(data)

print("Cumulative Sum:", cumulative)

Cumulative Sum: [ 10  30  50  90 140 200 270 350 440 540]


Explanation:

* Useful in time series or progressive totals.

* Each element is the sum of all values up to and including that position. For example, the third value is 50 = 10 + 20 + 20.
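A running total can be reversed with np.diff(), which is handy when only cumulative figures are available. This is a minimal sketch; `recovered` is an illustrative name, and `prepend=0` supplies the starting total so the first element is preserved.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50, 60, 70, 80, 90, 100])

cumulative = np.cumsum(data)

# Differences between consecutive totals give back the original values
recovered = np.diff(cumulative, prepend=0)

print("Recovered:", recovered)
print("Last total equals sum:", cumulative[-1] == data.sum())
```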

| Function       | Description                 | Syntax                   |
| -------------- | --------------------------- | ------------------------ |
| Mean           | Average of elements         | `np.mean(data)`          |
| Median         | Middle value                | `np.median(data)`        |
| Mode           | Most frequent value         | `stats.mode(data)`       |
| Std Deviation  | Spread from mean            | `np.std(data)`           |
| Variance       | Square of std deviation     | `np.var(data)`           |
| Min & Max      | Smallest and largest values | `np.min(data), np.max(data)` |
| Percentiles    | Value at given percentile   | `np.percentile(data, p)` |
| Range          | Max - Min                   | `np.ptp(data)`           |
| Skewness       | Asymmetry of distribution   | `skew(data)`             |
| Kurtosis       | Peakedness of distribution  | `kurtosis(data)`         |
| Cumulative Sum | Running total               | `np.cumsum(data)`        |
