<a href="https://colab.research.google.com/github/ranjithdurgunala/ML-LAB-2025-2026/blob/main/Statistical_Measures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Statistical measures**

Statistical measures summarize and describe a data set. The two main families are measures of central tendency (mean, median, mode) and measures of dispersion (variance, standard deviation).

**Central Tendency Measures**

**Mean** is the arithmetic average, found by summing all data values and dividing by the number of items. It gives a sense of the "typical" value in a dataset, but is sensitive to extreme values (outliers).

**Median** is the middle value when the dataset is ordered by size. If there is an even number of items, it's the average of the two middle ones. The median is much less affected by outliers or skewed data.

**Mode** is the value that occurs most frequently within the dataset. It's most useful with categorical or discrete data but can be used for numeric values as well.

These measures help identify a single representative value that reflects the "center" of the data.
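To see what each measure does, here is a minimal from-scratch sketch in plain Python. The helper names mean_of, median_of and mode_of are hypothetical and written only for illustration; the statistics module used in the next code cell provides the real functions. Replacing the largest value with an outlier shows the difference in sensitivity: the mean jumps from 5.0 to 15.125, while the median stays at 4.5.

```python
from collections import Counter


def mean_of(values):
    # Sum of the values divided by how many there are
    return sum(values) / len(values)


def median_of(values):
    # Middle value of the sorted data; average of the two middle values if n is even
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode_of(values):
    # Most frequent value; on a tie, Counter.most_common keeps first-seen order
    return Counter(values).most_common(1)[0][0]


data = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]  # 9 replaced by an extreme value

print(mean_of(data), median_of(data), mode_of(data))
print(mean_of(with_outlier), median_of(with_outlier))
```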

**Measures of Dispersion**

**Variance** measures how far the values in a data set are spread out from the mean: for a population, it is the average of the squared deviations from the mean. A high variance means the data is widely spread, while a low variance means the values are clustered closely around the mean.

**Standard Deviation** is the square root of the variance, so it is expressed in the same units as the data. It gives a direct sense of how far a typical value lies from the mean (strictly, it is the root-mean-square deviation, not the average absolute distance).

Both variance and standard deviation help describe the "spread" or "variability" of the data, indicating how much individual values tend to differ from the typical value.
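In formula form, the population variance is σ² = Σ(xᵢ − μ)² / n and the standard deviation is σ = √σ². When the data is a sample drawn from a larger population, the sample variance divides by n − 1 instead. The sketch below computes both by hand for the same data as the next cell and compares them with NumPy (via its ddof argument) and with the standard library's statistics module. The population values (4.0 and 2.0) match what np.var and np.std report by default; the sample variance is 32/7 ≈ 4.571.

```python
import math
import statistics

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n
squared_deviations = [(x - mu) ** 2 for x in data]

pop_var = sum(squared_deviations) / n           # population variance: divide by n
sample_var = sum(squared_deviations) / (n - 1)  # sample variance: divide by n - 1

print("Population:", pop_var, math.sqrt(pop_var))
print("Sample:", sample_var, math.sqrt(sample_var))

# np.var defaults to ddof=0 (population); ddof=1 gives the sample version
print("NumPy:", np.var(data), np.var(data, ddof=1))
# statistics.pvariance is the population version, statistics.variance the sample one
print("statistics:", statistics.pvariance(data), statistics.variance(data))
```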

In [1]:
import numpy as np
from statistics import mean, median, mode

# Example data
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central Tendency Measures
print("Mean:", mean(data))         # Mean
print("Median:", median(data))     # Median
print("Mode:", mode(data))         # Mode

# Measures of Dispersion
print("Variance:", np.var(data))            # Variance (Population)
print("Standard Deviation:", np.std(data))  # Standard Deviation (Population)

Mean: 5
Median: 4.5
Mode: 4
Variance: 4.0
Standard Deviation: 2.0


**Math library**

The math module is part of Python's standard library. It provides mathematical functions and constants for real (non-complex) numbers that go beyond the basic arithmetic operators, and it is widely used in science, engineering, data analysis, and everyday programming.

**1. Number Functions**

math.ceil(x): Returns the smallest integer greater than or equal to x.

math.floor(x): Returns the largest integer less than or equal to x.

math.trunc(x): Removes the fractional part (rounding toward zero) and returns the integer part.

In [None]:
import math
print(math.ceil(2.3))
print(math.floor(2.7))
print(math.trunc(8.9))

3
2
8
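The three functions only differ for values with a fractional part, and the difference is clearest for negative numbers: floor always rounds down, ceil always rounds up, and trunc rounds toward zero. A quick sketch:

```python
import math

for x in (2.7, -2.7):
    # ceil rounds up, floor rounds down, trunc rounds toward zero
    print(x, math.ceil(x), math.floor(x), math.trunc(x))
```

For -2.7 this gives -2, -3 and -2, so trunc and floor disagree once the sign is negative.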


**2. Power and Logarithmic Functions**

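math.pow(x, y): Returns x raised to the power y, always as a float.

math.sqrt(x): Returns the square root of x.

math.exp(x): Returns e raised to the power x (the exponential of x).

math.log(x, base): Returns the logarithm of x to the given base. The base argument is optional; without it, math.log returns the natural logarithm (base e).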

In [None]:
print(math.pow(2, 3))
print(math.sqrt(16))
print(math.exp(2))
print(math.log(100, 10))

8.0
4.0
7.38905609893065
2.0
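A few details are easy to miss here. math.pow always returns a float, whereas the ** operator keeps integer results as int. With no base, math.log returns the natural logarithm. With an explicit base, CPython computes math.log(x, base) as a ratio of natural logarithms, so the result can be off in the last bit; math.log10 and math.log2 are generally the more accurate choice for those bases. A short sketch:

```python
import math

print(2 ** 3, type(2 ** 3).__name__)                  # ** keeps an int result as int
print(math.pow(2, 3), type(math.pow(2, 3)).__name__)  # math.pow always returns a float
print(math.log(math.e))                               # no base: natural logarithm
print(math.log(1000, 10))                             # log(x) / log(base): may be slightly below 3
print(math.log10(1000), math.log2(8))                 # dedicated base-10 and base-2 functions
```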


**3. Trigonometric Functions**

math.sin(x), math.cos(x), math.tan(x): Trigonometric functions (input in radians).

math.radians(x): Converts degrees to radians.

math.degrees(x): Converts radians to degrees.


In [None]:
print(math.sin(math.pi/2))
print(math.cos(0))
print(math.tan(math.pi/4))

print(math.radians(90))

print(math.degrees(math.pi))

1.0
1.0
0.9999999999999999
1.5707963267948966
180.0
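The 0.9999999999999999 printed for math.tan(math.pi/4) is ordinary floating-point rounding: math.pi is only the closest float to π. Comparing such results with == is therefore fragile. math.isclose compares within a relative tolerance, but it needs an explicit abs_tol when the expected value is 0, because no relative tolerance can match zero. The sketch below also shows that angles in degrees must go through math.radians first.

```python
import math

t = math.tan(math.pi / 4)
print(t == 1.0)                 # may be False because of rounding
print(math.isclose(t, 1.0))     # default rel_tol=1e-09 treats them as equal

s = math.sin(math.pi)           # a tiny number, not exactly 0.0
print(math.isclose(s, 0.0))                  # relative tolerance alone cannot match 0
print(math.isclose(s, 0.0, abs_tol=1e-12))   # an absolute tolerance can

# Degrees must be converted to radians before calling sin/cos/tan
print(math.isclose(math.sin(math.radians(30)), 0.5))
```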


**4. Special Functions**

math.factorial(x): Returns the factorial of x (x!); x must be a non-negative integer.

math.gcd(a, b): Returns the greatest common divisor of the integers a and b.

math.fabs(x): Returns the absolute value of x as a float.

In [None]:
print(math.factorial(5))
print(math.gcd(20, 8))
print(math.fabs(-7.5))

120
4
7.5
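Two practical notes, as a small sketch. First, math.fabs always returns a float, while the built-in abs keeps the input type. Second, the least common multiple can be derived from the gcd with lcm(a, b) = a·b / gcd(a, b) for positive integers (Python 3.9+ also provides math.lcm directly). Since Python 3.9, math.gcd accepts more than two arguments, and math.factorial raises ValueError for negative input.

```python
import math

print(abs(-7), math.fabs(-7))   # abs keeps int; fabs always returns float

a, b = 20, 8
g = math.gcd(a, b)
lcm = a * b // g                # lcm(a, b) = a*b / gcd(a, b) for positive a, b
print(g, lcm)

print(math.gcd(20, 8, 12))      # several arguments: Python 3.9+

try:
    math.factorial(-1)
except ValueError as exc:       # negative input is rejected
    print("ValueError:", exc)
```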


**5. Constants**

math.pi: The value of π.

math.e: The value of Euler’s number e.


In [None]:
print(math.pi)
print(math.e)

3.141592653589793
2.718281828459045


**SciPy**

SciPy (short for Scientific Python) is a scientific computing library built on top of NumPy.

It adds functions for optimization, statistics, signal processing, sparse matrices and graphs, physical constants, and more.

Like NumPy, SciPy is open source, so it can be used freely.

Travis Oliphant, who also created NumPy, was one of SciPy's original authors.
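As a sketch of the statistics side, scipy.stats.describe summarizes the same data used in the statistical measures section in a single call. Note that its variance uses ddof=1 (the sample variance) by default, unlike np.var, which defaults to the population variance.

```python
from scipy import stats

data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = stats.describe(data)   # DescribeResult(nobs, minmax, mean, variance, skewness, kurtosis)

print(summary.nobs, summary.minmax, summary.mean)
print(summary.variance)                        # sample variance (ddof=1), i.e. 32/7
print(stats.describe(data, ddof=0).variance)   # population variance, like np.var(data)
```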

In [2]:
#Print the value of pi, then the SI (decimal) prefixes as multipliers:

from scipy import constants

print(constants.pi)

print(constants.yotta)    #1e+24
print(constants.zetta)    #1e+21
print(constants.exa)      #1e+18
print(constants.peta)     #1000000000000000.0
print(constants.tera)     #1000000000000.0
print(constants.giga)     #1000000000.0
print(constants.mega)     #1000000.0
print(constants.kilo)     #1000.0
print(constants.hecto)    #100.0
print(constants.deka)     #10.0
print(constants.deci)     #0.1
print(constants.centi)    #0.01
print(constants.milli)    #0.001
print(constants.micro)    #1e-06
print(constants.nano)     #1e-09
print(constants.pico)     #1e-12
print(constants.femto)    #1e-15
print(constants.atto)     #1e-18
print(constants.zepto)    #1e-21

3.141592653589793
1e+24
1e+21
1e+18
1000000000000000.0
1000000000000.0
1000000000.0
1000000.0
1000.0
100.0
10.0
0.1
0.01
0.001
1e-06
1e-09
1e-12
1e-15
1e-18
1e-21


In [3]:
#Binary Prefixes:
#Each returns its multiplier as a power of 1024 (e.g. kibi = 2**10 = 1024), as used for sizes such as KiB and MiB

print(constants.kibi)    #1024
print(constants.mebi)    #1048576
print(constants.gibi)    #1073741824
print(constants.tebi)    #1099511627776
print(constants.pebi)    #1125899906842624
print(constants.exbi)    #1152921504606846976
print(constants.zebi)    #1180591620717411303424
print(constants.yobi)    #1208925819614629174706176

1024
1048576
1073741824
1099511627776
1125899906842624
1152921504606846976
1180591620717411303424
1208925819614629174706176
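The decimal and binary prefixes drift further apart as they grow: giga is 10⁹, while gibi is 2³⁰. A sketch of a common use, converting a drive size advertised in decimal gigabytes into binary gibibytes (the 500 GB figure is just an example):

```python
from scipy import constants

size_bytes = 500 * constants.giga          # "500 GB" as 500 * 10**9 bytes
size_gib = size_bytes / constants.gibi     # the same size expressed in GiB (2**30 bytes)
print(round(size_gib, 2))

print(constants.kibi / constants.kilo)     # 1024 / 1000
```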


In [4]:
#Connected Components
#Find all of the connected components with the connected_components() method.
#It returns the number of components and an array with the component label of each element.
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.sparse import csr_matrix

arr = np.array([
  [0, 1, 2],
  [1, 0, 0],
  [2, 0, 0]
])

newarr = csr_matrix(arr)

print(connected_components(newarr))

(1, array([0, 0, 0], dtype=int32))
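In the example above every element is reachable from every other, so there is a single component. A sketch with two separate pieces (an assumed example graph with edges 0-1 and 2-3 only) shows how the label array groups elements; directed=False treats each edge as undirected.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical graph: 0-1 are linked, 2-3 are linked, no edge between the pairs
arr = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

n_components, labels = connected_components(csr_matrix(arr), directed=False)
print(n_components, labels)   # elements with the same label belong to the same component
```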


**Dijkstra**

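Use the dijkstra() method to find the shortest paths in a graph from one element to the others. Dijkstra's algorithm assumes that all edge weights are non-negative.

It takes the following arguments:

return_predecessors: boolean (True to also return the predecessor array, from which the full path of traversal can be reconstructed; otherwise False).

indices: index of the element to start from; only paths from that element are computed and returned.

limit: maximum path weight to consider; elements farther away than this are reported with distance inf.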

In [5]:
#Find the shortest paths from element 0 to every other element:

import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.sparse import csr_matrix

arr = np.array([
  [0, 1, 2],
  [1, 0, 0],
  [2, 0, 0]
])

newarr = csr_matrix(arr)

print(dijkstra(newarr, return_predecessors=True, indices=0))

(array([0., 1., 2.]), array([-9999,     0,     0], dtype=int32))
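The predecessor array is what lets you recover an actual path: walk back from the target through the predecessors until reaching -9999, which marks the starting element. The sketch below uses a hypothetical graph where going 0 → 1 → 2 (cost 2) is cheaper than the direct edge 0 → 2 (cost 5). reconstruct_path is an illustrative helper, not a SciPy function, and it assumes the target is reachable. The last line shows limit: elements farther away than the limit come back as inf.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Hypothetical weighted graph: 0-1 (weight 1), 1-2 (weight 1), 0-2 (weight 5)
arr = np.array([
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
])
graph = csr_matrix(arr)

dist, pred = dijkstra(graph, return_predecessors=True, indices=0)


def reconstruct_path(predecessors, target):
    """Follow predecessors back from target; -9999 marks 'no predecessor'.
    Assumes target is reachable (finite distance)."""
    path = []
    node = target
    while node != -9999:
        path.append(int(node))
        node = predecessors[node]
    return path[::-1]


print(dist)                        # shortest distances from element 0
print(reconstruct_path(pred, 2))   # element sequence from 0 to 2

# With limit=1.5, elements farther than 1.5 from element 0 are reported as inf
print(dijkstra(graph, indices=0, limit=1.5))
```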


In [6]:
#Floyd Warshall
#Use the floyd_warshall() method to find the shortest paths between all pairs of elements:

import numpy as np
from scipy.sparse.csgraph import floyd_warshall
from scipy.sparse import csr_matrix

arr = np.array([
  [0, 1, 2],
  [1, 0, 0],
  [2, 0, 0]
])

newarr = csr_matrix(arr)

print(floyd_warshall(newarr, return_predecessors=True))

(array([[0., 1., 2.],
       [1., 0., 3.],
       [2., 3., 0.]]), array([[-9999,     0,     0],
       [    1, -9999,     0],
       [    2,     0, -9999]], dtype=int32))
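The idea behind Floyd Warshall fits in a few lines: for each element k in turn, allow paths to pass through k and keep whichever is shorter, dist[i, j] = min(dist[i, j], dist[i, k] + dist[k, j]). The minimal NumPy sketch below (floyd_warshall_dense is a hypothetical helper that returns distances only, no predecessors) treats 0 off the diagonal as "no edge", like the csgraph examples, and reproduces SciPy's distance matrix for the same graph.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import floyd_warshall


def floyd_warshall_dense(adj):
    """Teaching sketch: all-pairs shortest distances on a dense adjacency matrix.
    Assumes 0 off the diagonal means 'no edge'."""
    n = adj.shape[0]
    dist = np.where(adj != 0, adj, np.inf).astype(float)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Allow paths through k: dist[i, j] = min(dist[i, j], dist[i, k] + dist[k, j])
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0],
])

mine = floyd_warshall_dense(arr)
print(mine)
print(np.allclose(mine, floyd_warshall(csr_matrix(arr))))
```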


In [7]:
#Bellman Ford
#Like floyd_warshall(), the bellman_ford() method can find shortest paths between all pairs of elements, and it also handles negative weights (but not negative cycles).

#Find the shortest paths from element 0 in a graph that has a negative weight:

import numpy as np
from scipy.sparse.csgraph import bellman_ford
from scipy.sparse import csr_matrix

arr = np.array([
  [0, -1, 2],
  [1, 0, 0],
  [2, 0, 0]
])

newarr = csr_matrix(arr)

print(bellman_ford(newarr, return_predecessors=True, indices=0))

(array([ 0., -1.,  2.]), array([-9999,     0,     0], dtype=int32))
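Negative weights are allowed, but a negative cycle (a loop whose total weight is below zero) makes shortest paths undefined, because each extra trip around the loop lowers the cost again. In that case bellman_ford() raises NegativeCycleError. In the sketch below, the edge between 0 and 1 has weight -1 in both directions (an assumed example), so the loop 0 → 1 → 0 has total weight -2.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import NegativeCycleError, bellman_ford

arr = np.array([
    [0, -1, 2],
    [-1, 0, 0],
    [2, 0, 0],
])

try:
    bellman_ford(csr_matrix(arr), indices=0)
    detected = False
except NegativeCycleError as exc:
    # Raised because the 0 -> 1 -> 0 loop has negative total weight
    detected = True
    print("Negative cycle:", exc)

print(detected)
```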
