<a href="https://colab.research.google.com/github/lucianoayres/tdd-python-statistics-standard-deviation/blob/main/Algorithm_TDD_Calculate_Standard_Deviation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Standard Deviation

The standard deviation is a statistical measure of how widely the values in a dataset are spread around their mean. A small standard deviation means the values cluster close to the mean. A large one means they are scattered farther from it.
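As a small illustration, two datasets can share the same mean while having very different spreads. The two lists below are made up for this example. The sketch uses the standard library's `statistics.pstdev`, which computes the population standard deviation:

```python
import statistics

tight = [4, 5, 6]   # values close to the mean
wide = [0, 5, 10]   # same mean, values far from it

print(statistics.mean(tight), statistics.mean(wide))  # both means are 5
print(statistics.pstdev(tight))  # about 0.816
print(statistics.pstdev(wide))   # about 4.082
```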

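To calculate the standard deviation of a list of n numbers, follow these steps:

1. Calculate the mean of the numbers.
2. Subtract the mean from each number to find its difference from the mean.
3. Square each difference.
4. Calculate the mean of these squares (their sum divided by n). This value is called the variance.
5. Take the square root of the variance.

Dividing by n in step 4 gives the *population* standard deviation, which is what this notebook implements. The *sample* standard deviation, used when the data is a sample drawn from a larger population, divides the sum of squares by n - 1 instead.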
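The same steps can be written as formulas. Here $x_1, \dots, x_n$ are the numbers, $\bar{x}$ is their mean, $\sigma$ is the population standard deviation (divisor $n$) and $s$ is the sample standard deviation (divisor $n - 1$); the symbol names are chosen for illustration:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```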

Here's a simple example:

Let's say you have the following set of numbers: 2, 4, 4, 4, 5, 5, 7, 9.

Step 1: Calculate the mean.
Mean = (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5.

Step 2: Subtract the mean from each number.
Differences: (-3, -1, -1, -1, 0, 0, 2, 4).

Step 3: Square each difference.
Squares: (9, 1, 1, 1, 0, 0, 4, 16).

Step 4: Calculate the mean of these squares.
Mean of squares = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4.

Step 5: Take the square root of this mean.
Standard deviation = √4 = 2.

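Therefore, the standard deviation of this set of numbers is 2. Roughly speaking, the values lie about 2 units from the mean. Strictly, 2 is the root-mean-square distance from the mean; the plain average distance here is (3 + 1 + 1 + 1 + 0 + 0 + 2 + 4) / 8 = 1.5.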
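A quick way to double-check the arithmetic of the worked example is to replay the five steps in plain Python. This is a standalone sketch using only the standard library. It also computes the mean absolute deviation (the plain average of the distances from the mean) to show that the standard deviation is a root-mean-square distance rather than a simple average:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)              # Step 1: 40 / 8 = 5.0
differences = [x - mean for x in data]    # Step 2: -3.0, -1.0, ..., 4.0
squares = [d ** 2 for d in differences]   # Step 3: 9.0, 1.0, ..., 16.0
variance = sum(squares) / len(squares)    # Step 4: 32 / 8 = 4.0
std_dev = math.sqrt(variance)             # Step 5: sqrt(4.0) = 2.0

# For comparison: the plain average distance from the mean is 12 / 8 = 1.5
mean_abs_dev = sum(abs(d) for d in differences) / len(differences)

print(std_dev, mean_abs_dev)  # 2.0 1.5
```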

In [2]:
# calculate mean function
def calculate_mean(number_list):
  list_length = len(number_list)
  # Return 0 for an empty list to avoid dividing by zero
  if list_length == 0:
    return 0
  return sum(number_list) / list_length
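For comparison, the standard library's `statistics.mean` returns the same value for non-empty input but raises `statistics.StatisticsError` on an empty list, whereas `calculate_mean` returns 0. Returning 0 keeps the later helpers simple, at the cost of hiding that the mean of no data is undefined. The sketch below repeats `calculate_mean` so it runs on its own:

```python
import statistics

def calculate_mean(number_list):
    # Copy of the notebook's helper: 0 for an empty list, else sum / length
    list_length = len(number_list)
    if list_length == 0:
        return 0
    return sum(number_list) / list_length

values = [1, 2, 3, 4, 5]
print(calculate_mean(values), statistics.mean(values))  # both equal 3

print(calculate_mean([]))  # 0

# statistics.mean refuses empty input instead of returning a default
try:
    statistics.mean([])
    empty_mean_raised = False
except statistics.StatisticsError:
    empty_mean_raised = True
print(empty_mean_raised)  # True
```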

In [3]:
# test cases
import unittest

class TestCalculateMean(unittest.TestCase):
  def test_empty_list(self):
    self.assertEqual(calculate_mean([]), 0)

  def test_single_element(self):
    self.assertEqual(calculate_mean([5]), 5.0)

  def test_positive_numbers(self):
    self.assertEqual(calculate_mean([1, 2, 3, 4, 5]), 3.0)

  def test_negative_numbers(self):
    self.assertEqual(calculate_mean([-1, -2, -3, -4, -5]), -3.0)

  def test_mixed_numbers(self):
    self.assertEqual(calculate_mean([-1, 0, 1]), 0.0)

  def test_float_numbers(self):
    self.assertAlmostEqual(calculate_mean([1.5, 2.5, 3.5]), 2.5)

if __name__ == '__main__':
    unittest.main(argv=[''], exit=False)

......
----------------------------------------------------------------------
Ran 6 tests in 0.018s

OK
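`test_float_numbers` uses `assertAlmostEqual` rather than `assertEqual`. The values in that test happen to be exactly representable in binary, but in general floating-point rounding can make a mathematically exact mean differ in the last digits. A small standalone illustration, with input values chosen for this example only:

```python
import unittest

def calculate_mean(number_list):
    # Copy of the notebook's helper so this cell runs on its own
    list_length = len(number_list)
    if list_length == 0:
        return 0
    return sum(number_list) / list_length

print(0.1 + 0.2 == 0.3)  # False: neither side is exactly 0.3 in binary

class TestFloatMean(unittest.TestCase):
    def test_tenths(self):
        # assertAlmostEqual rounds the difference to 7 places by default,
        # so tiny rounding errors do not fail the test
        self.assertAlmostEqual(calculate_mean([0.1, 0.2, 0.3]), 0.2)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFloatMean)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```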


In [4]:
# subtract number from list function
def subtract_number_from_list(number, number_list):
  # Subtract `number` from every element; an empty list gives an empty list
  return [x - number for x in number_list]
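In the standard deviation calculation, `subtract_number_from_list` is called with the list's own mean, producing the differences from Step 2. A useful sanity check is that these differences always sum to zero (up to floating-point rounding), since the mean is exactly the value that balances them. A minimal sketch using the worked example's data and one list from the later tests:

```python
import math

def subtract_number_from_list(number, number_list):
    # Same behavior as the notebook helper: subtract `number` from each element
    return [x - number for x in number_list]

data = [2, 4, 4, 4, 5, 5, 7, 9]
deviations = subtract_number_from_list(sum(data) / len(data), data)
print(deviations)       # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0

# When the mean is not exact in binary, compare against zero with a tolerance
mixed = [1, 2.5, -3, 4.2, 5]
mixed_deviations = subtract_number_from_list(sum(mixed) / len(mixed), mixed)
print(math.isclose(sum(mixed_deviations), 0.0, abs_tol=1e-9))  # True
```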

In [5]:
# test cases
import unittest

class TestSubtractNumberFromList(unittest.TestCase):
  def test_empty_list(self):
    self.assertEqual(subtract_number_from_list(1,[]),[])

  def test_single_element(self):
    self.assertEqual(subtract_number_from_list(1,[5]), [4])

  def test_negative_numbers(self):
    self.assertEqual(subtract_number_from_list(1,[-1, -2, -3, -4, -5]), [-2, -3, -4, -5, -6])

  def test_mixed_numbers(self):
    self.assertEqual(subtract_number_from_list(1,[-1, 0, 1]), [-2, -1, 0])

  def test_float_numbers(self):
    self.assertEqual(subtract_number_from_list(1,[1.5, 2.5, 3.5]), [0.5, 1.5, 2.5])

if __name__ == '__main__':
    unittest.main(argv=[''], exit=False)

...........
----------------------------------------------------------------------
Ran 11 tests in 0.024s

OK


In [6]:
# square numbers function
def square_numbers(number_list):
  # Square every element; an empty list gives an empty list
  return [x ** 2 for x in number_list]
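Unlike the other helpers, `square_numbers` has no test cell in this notebook. In the same TDD style, a test class might look like the sketch below; the class name and test cases are illustrative, not part of the original notebook. It repeats the helper so it runs on its own, and runs the suite with `unittest.TextTestRunner` instead of `unittest.main`:

```python
import unittest

def square_numbers(number_list):
    # Same behavior as the notebook helper: square each element
    return [x ** 2 for x in number_list]

class TestSquareNumbers(unittest.TestCase):
    def test_empty_list(self):
        self.assertEqual(square_numbers([]), [])

    def test_single_element(self):
        self.assertEqual(square_numbers([3]), [9])

    def test_negative_numbers(self):
        # Squaring removes the sign, so every term in Step 3 is non-negative
        self.assertEqual(square_numbers([-1, -2, -3]), [1, 4, 9])

    def test_float_numbers(self):
        # 0.5 and 1.5 are exact in binary, so exact equality is safe here
        self.assertEqual(square_numbers([0.5, 1.5]), [0.25, 2.25])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSquareNumbers)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```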

In [1]:
import math

# calculate standard deviation function (population: step 4 divides by n)
def calculate_standard_deviation(number_list):
  if len(number_list) == 0:
    return 0
  mean = calculate_mean(number_list)                                      # step 1
  list_subtracted_by_mean = subtract_number_from_list(mean, number_list)  # step 2
  list_square = square_numbers(list_subtracted_by_mean)                   # step 3
  square_mean = calculate_mean(list_square)                               # step 4 (variance)
  return math.sqrt(square_mean)                                           # step 5
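Because step 4 divides by the number of values n, `calculate_standard_deviation` computes the *population* standard deviation, the same quantity as `statistics.pstdev`. A variant that can also produce the *sample* standard deviation (divisor n - 1, like `statistics.stdev`) might look like the sketch below. The function name `standard_deviation` and its `sample` parameter are hypothetical, not part of the notebook:

```python
import math
import statistics

def standard_deviation(number_list, sample=False):
    """Population SD by default; sample SD (divisor n - 1) when sample=True.

    Hypothetical helper for illustration; it follows the same five steps.
    """
    n = len(number_list)
    if n == 0:
        return 0  # same convention as calculate_standard_deviation
    if sample and n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    mean = sum(number_list) / n
    sum_of_squares = sum((x - mean) ** 2 for x in number_list)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_of_squares / divisor)

data = [-2, 4, -4, 4, 5.5, -5, 5.5, -7, -9]
print(standard_deviation(data), statistics.pstdev(data))              # population
print(standard_deviation(data, sample=True), statistics.stdev(data))  # sample
```

Folding steps 2 to 4 into one generator expression gives the same result as the notebook's chain of helpers; the separate helpers are still handy because each step can be tested on its own.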

In [14]:
# test cases
class TestCalculateStandardDeviation(unittest.TestCase):
    def test_empty_list(self):
        self.assertEqual(calculate_standard_deviation([]), 0)

    def test_positive_numbers(self):
        self.assertEqual(calculate_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]), 2)

    def test_negative_numbers(self):
        self.assertAlmostEqual(calculate_standard_deviation([-2, -4, -4, -4, -5, -5, -7, -9]), 2)

    def test_mixed_numbers(self):
        self.assertAlmostEqual(calculate_standard_deviation([-2, 4, -4, 4, 5.5, -5, 5.5, -7, -9]), 5.705139008920915)

    def test_single_number(self):
        self.assertEqual(calculate_standard_deviation([5]), 0)

    def test_repeated_numbers(self):
        self.assertEqual(calculate_standard_deviation([2, 2, 2, 2]), 0)

    def test_large_number_of_elements(self):
        self.assertAlmostEqual(calculate_standard_deviation(list(range(1, 1001))), 288.67499025753324)

    def test_identical_numbers(self):
        self.assertEqual(calculate_standard_deviation([1, 1, 1, 1, 1]), 0)

    def test_mixed_types(self):
        self.assertAlmostEqual(calculate_standard_deviation([1, 2.5, -3, 4.2, 5]), 3.165122430491434)

if __name__ == '__main__':
    unittest.main(argv=[''], exit=False)

.........FF.........
FAIL: test_mixed_numbers (__main__.TestCalculateStandardDeviation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-14-e2949fbd44ba>", line 13, in test_mixed_numbers
    self.assertAlmostEqual(calculate_standard_deviation([-2, 4, -4, 4, 5.5, -5, 5.5, -7, -9]), 5.705139008920915)
AssertionError: 5.3788566410931695 != 5.705139008920915 within 7 places (0.3262823678277451 difference)

FAIL: test_mixed_types (__main__.TestCalculateStandardDeviation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-14-e2949fbd44ba>", line 28, in test_mixed_types
    self.assertAlmostEqual(calculate_standard_deviation([1, 2.5, -3, 4.2, 5]), 3.165122430491434)
AssertionError: 2.8309715646752793 != 3.165122430491434 within 7 places (0.3341508658161545 difference)

---------------------------------------------------------------------

In [13]:
# check where the expected values of the two failing tests come from
import statistics

mixed_numbers = [-2, 4, -4, 4, 5.5, -5, 5.5, -7, -9]
mixed_types = [1, 2.5, -3, 4.2, 5]

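# statistics.stdev is the *sample* standard deviation (divides by n - 1),
# whereas calculate_standard_deviation divides by n (population);
# statistics.pstdev is the function that matches calculate_standard_deviation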
std_dev_mixed_numbers = statistics.stdev(mixed_numbers)
std_dev_mixed_types = statistics.stdev(mixed_types)

print("Expected standard deviation for test_mixed_numbers:", std_dev_mixed_numbers)
print("Expected standard deviation for test_mixed_types:", std_dev_mixed_types)

Expected standard deviation for test_mixed_numbers: 5.705139008920915
Expected standard deviation for test_mixed_types: 3.165122430491434
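The printed values are *sample* standard deviations, because `statistics.stdev` divides by n - 1, while `calculate_standard_deviation` divides by n. The two failures are therefore a mismatch between the test expectations and the formula, and the actual values in the tracebacks (5.3788566410931695 and 2.8309715646752793) are consistent with the population formula described at the top of the notebook. A sketch of those two tests with population expectations, using `statistics.pstdev` as the reference, might look like this (the class name is illustrative, and the calculation is condensed into one function so the cell runs on its own):

```python
import math
import statistics
import unittest

def calculate_standard_deviation(number_list):
    # Condensed copy of the notebook's pipeline: population SD (divides by n)
    if len(number_list) == 0:
        return 0
    mean = sum(number_list) / len(number_list)
    squares = [(x - mean) ** 2 for x in number_list]
    return math.sqrt(sum(squares) / len(squares))

class TestPopulationStandardDeviation(unittest.TestCase):
    def test_mixed_numbers(self):
        data = [-2, 4, -4, 4, 5.5, -5, 5.5, -7, -9]
        self.assertAlmostEqual(calculate_standard_deviation(data), statistics.pstdev(data))

    def test_mixed_types(self):
        data = [1, 2.5, -3, 4.2, 5]
        self.assertAlmostEqual(calculate_standard_deviation(data), statistics.pstdev(data))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPopulationStandardDeviation)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```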
