## Introduction

This notebook covers statistics, with a primary focus on the branch known as descriptive statistics.

Descriptive statistics summarizes and describes data using measures of central tendency, dispersion, and the shape of the data.

<b>Measure of Central Tendency:-</b> A measure of central tendency is a single value that represents the center of a dataset, the point around which the observations tend to cluster. The three common measures of central tendency are the mean, median, and mode.

<b>Measure of Dispersion:-</b> A measure of dispersion is a value that describes how far the data spreads out from its center, that is, how much the observations vary. Two commonly used measures of dispersion are the variance and the standard deviation.
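As a rough illustration of these two measures, the sketch below uses Python's built-in `statistics` module on a small made-up list. The data and variable names are assumptions chosen for the example, not part of the notebook's dataset. The module distinguishes population formulas (`pvariance`, `pstdev`) from sample formulas (`variance`, `stdev`), which divide by `n - 1` instead of `n`.

```python
import statistics

# Illustrative data (an assumption for this example); its mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population formulas divide the sum of squared deviations by n.
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample formulas divide by n - 1 instead.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(population_variance)  # 4
print(population_std)       # 2.0
print(sample_variance > population_variance)  # True
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data itself.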

## Required Dependencies

In [1]:
import statistics
from typing import Sequence

In [2]:
from warnings import filterwarnings
filterwarnings("ignore")

In [3]:
import numpy as np
import pandas as pd

In [4]:
from matplotlib import pyplot as plt
import seaborn as sns

## Data

Data can be described as individual units of information, stored in either structured or unstructured form. Data organized into rows and columns, such as a CSV file or an Excel spreadsheet, is structured data. Data with no predefined organization, such as free text, images, or audio, is unstructured data.
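The difference can be sketched with the standard library alone. The CSV content, column names, and sentence below are hypothetical, made up for illustration: structured data can be queried by field name, while unstructured text offers only generic operations until some structure is extracted from it.

```python
import csv
import io

# Hypothetical structured data: every row has the same named fields.
structured_text = "customer,apples,amount_paid\nAna,3,1.20\nBen,5,5.50\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Fields are addressable by name, so a column can be pulled out directly.
apples = [int(row["apples"]) for row in rows]
print(apples)  # [3, 5]

# Hypothetical unstructured data: the same facts as free text.
unstructured_text = "Ana bought three apples; Ben took five and paid $5.50."

# Without named fields, only generic text operations are available.
word_count = len(unstructured_text.split())
print(word_count)  # 10
```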

In the world of data, two main types can be distinguished: qualitative data and quantitative data.

### Qualitative Data

Qualitative data is the categorical information in a dataset, and it comes in two types: ordinal and nominal. Ordinal data has a meaningful order among its categories, such as T-shirt sizes, where Small < Medium < Large. Nominal data has categories with no inherent order, such as countries (e.g., U.S.A., Canada, England, Scotland).
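A minimal sketch of how this distinction shows up in code, using only the standard library. The `SIZE_RANK` mapping and the sample lists are assumptions made for the example. Ordinal categories can be sorted and compared once their order is spelled out, whereas nominal categories support only equality checks and frequency counts.

```python
from collections import Counter

# Ordinal: the order Small < Medium < Large is stated explicitly as a rank.
SIZE_RANK = {"Small": 0, "Medium": 1, "Large": 2}
shirt_sizes = ["Large", "Small", "Medium", "Small"]

sorted_sizes = sorted(shirt_sizes, key=SIZE_RANK.__getitem__)
largest_size = max(shirt_sizes, key=SIZE_RANK.__getitem__)
print(sorted_sizes)  # ['Small', 'Small', 'Medium', 'Large']
print(largest_size)  # Large

# Nominal: no ranking exists, so counting occurrences is the natural summary.
countries = ["Canada", "U.S.A", "Canada", "Scotland"]
country_counts = Counter(countries)
print(country_counts.most_common(1))  # [('Canada', 2)]
```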

### Quantitative Data

Quantitative data is the numerical information in a dataset. This notebook classifies it by numeric type: integers and floating-point numbers. Floating-point values carry a fractional part, such as 1.2, 5.5, and 3.2; an amount paid in dollars and cents is one example. Integers are whole numbers, such as the number of apples purchased by different customers (3, 3, 5, and 1), since the store does not sell fractional amounts of apples like 4.5 or 3.2. In statistical terms, these two kinds correspond roughly to discrete and continuous data.
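The two kinds of values map directly onto Python's `int` and `float` types. The short sketch below reuses the numbers from the paragraph above; the variable names are assumptions introduced for the example.

```python
# Whole-number counts: apples purchased by four customers.
apples_purchased = [3, 3, 5, 1]

# Values with a fractional part: amounts paid in dollars and cents.
amounts_paid = [1.2, 5.5, 3.2]

print(all(isinstance(x, int) for x in apples_purchased))  # True
print(all(isinstance(x, float) for x in amounts_paid))    # True

# Totals keep the type of their inputs.
total_apples = sum(apples_purchased)
total_paid = round(sum(amounts_paid), 2)  # rounded to cents
print(total_apples)  # 12
print(total_paid)    # 9.9
```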

## Measure of Central Tendency

### Mean Calculation Algorithm and Code

In mathematical terms, the mean is defined as the sum of all observations divided by the total number of observations.
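Written as a formula, with $x_1, \dots, x_n$ denoting the $n$ observations (notation introduced here for illustration):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For example, the observations 2, 4, and 9 give $\bar{x} = (2 + 4 + 9)/3 = 15/3 = 5$.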

#### Mean using a plain Python function

In [5]:
np.random.seed(52)
random_values = np.random.randint(low=1, high=10, size=10)
random_values

array([6, 8, 7, 8, 1, 6, 4, 4, 2, 4])

In [6]:
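def calculate_mean(num_sequence: Sequence) -> float:
    """Return the arithmetic mean, summing the values in an explicit loop."""
    number_of_values = len(num_sequence)
    if number_of_values == 0:
        # Guard against division by zero on empty input.
        raise ValueError("mean requires at least one value")

    # Accumulate the total one observation at a time.
    sum_of_values = 0
    for value in num_sequence:
        sum_of_values += value

    return sum_of_values / number_of_values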

In [7]:
calculate_mean(num_sequence=random_values)

5.0

#### Mean using NumPy arrays

In [8]:
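def calculate_mean_using_np_array(num_sequence: Sequence) -> float:
    """Return the arithmetic mean computed by NumPy."""
    # Convert the input to an array, then let np.mean do the summing and dividing.
    num_sequence = np.array(num_sequence)
    mean_value = np.mean(num_sequence)
    return mean_value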

In [9]:
calculate_mean_using_np_array(num_sequence=random_values)

5.0

#### Mean using statistics module

In [10]:
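def calculate_mean_using_statistics(num_sequence: Sequence) -> float:
    """Return the arithmetic mean computed by the standard-library statistics module."""
    # statistics.mean keeps the numeric type of the input where it can, so
    # integer data with a whole-number mean comes back as 5 rather than 5.0.
    mean_value = statistics.mean(num_sequence)
    return mean_value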

In [11]:
calculate_mean_using_statistics(num_sequence=random_values)

5

### Median
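The median is the middle value of the sorted data; when the number of observations is even, it is the average of the two middle values. The sketch below is one possible implementation written in the style of the mean functions above. The function name `calculate_median` is an assumption, and the list reuses the ten values generated earlier in the notebook. The result is cross-checked against `statistics.median`.

```python
import statistics
from typing import Sequence


def calculate_median(num_sequence: Sequence) -> float:
    """Return the middle value of the sorted data (hypothetical helper)."""
    if len(num_sequence) == 0:
        raise ValueError("median requires at least one value")

    ordered = sorted(num_sequence)
    middle = len(ordered) // 2

    # Odd count: the single middle element is the median.
    if len(ordered) % 2 == 1:
        return ordered[middle]

    # Even count: average the two elements around the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


values = [6, 8, 7, 8, 1, 6, 4, 4, 2, 4]
print(calculate_median(values))   # 5.0
print(statistics.median(values))  # 5.0
```

Sorted, the values are 1, 2, 4, 4, 4, 6, 6, 7, 8, 8, so the two middle values are 4 and 6 and the median is 5.0.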

### Mode
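The mode is the value that occurs most often in the data, and a dataset can have more than one. The sketch below is one possible implementation; the function name `calculate_modes` is an assumption, and the list reuses the ten values generated earlier in the notebook. It returns every value tied for the highest count and is cross-checked against `statistics.multimode`.

```python
import statistics
from collections import Counter
from typing import List, Sequence


def calculate_modes(num_sequence: Sequence) -> List:
    """Return all values tied for the highest frequency (hypothetical helper)."""
    if len(num_sequence) == 0:
        raise ValueError("mode requires at least one value")

    # Count how many times each distinct value occurs.
    counts = Counter(num_sequence)
    highest_count = max(counts.values())

    # Keep every value that reaches the highest count, in first-seen order.
    return [value for value, count in counts.items() if count == highest_count]


values = [6, 8, 7, 8, 1, 6, 4, 4, 2, 4]
print(calculate_modes(values))       # [4]
print(statistics.multimode(values))  # [4]
```

Here 4 appears three times, more than any other value, so it is the only mode.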