**Central Tendency**

The goal is to calculate and display the central tendency measures (Mean, Median, Mode) of a given dataset. These measures help summarize the data by representing a "central" value around which other data points cluster.


**Key Objectives:**

*   Understand and calculate the mean, median, and mode for a dataset.
*   Allow users to input data manually or load it from a file.
*   Display appropriate error messages for invalid data or file formats.
*   Improve statistical literacy: by understanding and using the central tendency measures, users can improve their grasp of basic statistics and of how different measures suit different data types.
*   Apply central tendency in data analysis: users see how these measures can be applied in real-world data analysis to summarize and interpret datasets efficiently.

 **Theory:**

**Central Tendency:** Central tendency refers to the statistical measures that identify the center of a data distribution. These measures aim to summarize the entire dataset by providing a single representative value. The three most commonly used measures of central tendency are mean, median, and mode.

**Mean (Arithmetic Average):**

**Definition:** The mean is the sum of all data points divided by the number of data points.

**Formula:**

$$\text{Mean} = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $x_i$ are the individual data points and $N$ is the number of data points.

**Usefulness:** The mean uses every value in the dataset, so it reflects the data as a whole. The flip side is that extreme values (outliers) can significantly influence it. For example, in a dataset of salaries, a few extremely high salaries could distort the mean, making it appear higher than what most people in the dataset earn.

**Limitations:** When data has outliers or is skewed, the mean might not be a good representation of the "central" value. For example, the average income in a region might not reflect the typical income of most individuals if there are a few very wealthy people.
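
The effect of an outlier on the mean can be illustrated with a minimal sketch. The helper name `arithmetic_mean` and the salary figures are hypothetical, chosen only to show the formula at work; they are not part of the module described later.

```python
def arithmetic_mean(values):
    """Return sum(x_i) / N for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical salaries (in thousands); the last value added is an outlier.
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]

mean_typical = arithmetic_mean(typical)
mean_with_outlier = arithmetic_mean(with_outlier)

print(mean_typical)       # 35.0
print(mean_with_outlier)  # larger than every salary in `typical`
```

A single extreme value pulls the mean above all five of the original salaries, which is why the mean alone can misrepresent the typical value in such data.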









**Median (Middle Value):**

**Definition:** The median is the middle value in a dataset when it is ordered (from smallest to largest). If there is an even number of data points, the median is the average of the two middle values.

**Formula:** With the data sorted in ascending order and positions counted from 1:

*   If $N$ is odd, the median is the value at position $\frac{N+1}{2}$.
*   If $N$ is even, the median is the average of the values at positions $\frac{N}{2}$ and $\frac{N}{2} + 1$.


**Usefulness:** The median is robust to outliers because it depends only on the middle value(s) of the ordered data. This makes it a better measure of central tendency when the dataset is skewed or contains extreme values.

**Limitations:** The median does not account for the magnitude of values on either side, meaning it doesn't reflect the overall distribution of the data as well as the mean.
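
The positional rule for the median can be written out directly. The sketch below is illustrative only: `median_by_position` is a hypothetical helper, and the sample values are made up. It converts the 1-based positions from the definition into Python's 0-based indices.

```python
def median_by_position(values):
    """Return the median using the odd/even positional rule."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # 1-based position (N + 1) / 2 is 0-based index n // 2.
        return ordered[n // 2]
    # 1-based positions N / 2 and N / 2 + 1 are 0-based n // 2 - 1 and n // 2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_result = median_by_position([7, 1, 5])
even_result = median_by_position([7, 1, 5, 3])
outlier_result = median_by_position([30, 32, 35, 38, 40, 400])

print(odd_result)      # 5
print(even_result)     # 4.0
print(outlier_result)  # 36.5
```

In the last call the extreme value 400 barely moves the result, which illustrates the robustness to outliers described above.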

**Mode (Most Frequent Value):**

**Definition:** The mode is the value that appears most frequently in the dataset. A dataset can have one mode (unimodal), more than one mode (bimodal or multimodal), or no mode at all if all values occur with the same frequency.

**Usefulness:** The mode is useful in identifying the most common value in categorical data or when determining the most frequent event in a series of observations. For example, in a survey, the mode might represent the most common response or the most frequent category.

**Limitations:** The mode can be misleading if the data set has multiple values occurring with the same frequency (multimodal), or no value occurs more frequently than others.
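
For categorical data, one way to find the mode (or modes) is to count occurrences. The following is a sketch under assumptions: `most_frequent` is a hypothetical helper built on the standard library's `collections.Counter`, and the survey responses are invented for illustration.

```python
from collections import Counter


def most_frequent(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


responses = ["yes", "no", "yes", "maybe", "yes", "no"]

print(most_frequent(responses))        # ['yes']
print(most_frequent([1, 1, 2, 2, 3]))  # [1, 2]
print(most_frequent([4, 5, 6]))        # [4, 5, 6]
```

Returning a list makes the unimodal, multimodal, and "every value ties" cases explicit, so the caller can decide how to report them.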


**Comparison of Measures of Central Tendency:** Each measure has its strengths and weaknesses, making them appropriate for different types of data:


*   Mean is most useful for data that is symmetrically distributed without significant outliers.

*   Median is preferred for skewed distributions or when outliers are present, as it is not influenced by extreme values.
*   Mode is best for categorical data or when identifying the most frequent event is crucial.








**Choosing the Right Measure:** The choice of measure depends on the nature of the data and the analysis goals. For normally distributed data (bell curve), the mean is often the most informative measure. In cases of skewed data or ordinal data, the median is usually more representative. The mode, on the other hand, is useful when identifying the most frequent category or value is important, especially in non-numeric data.

 **Code Implementation**

 The implementation of the module calculates the mean, median, and mode for the dataset provided by the user. It can handle manual data entry as well as data loading from files with different formats. Here’s the detailed explanation of each part of the code:

 **Step 1: Importing necessary libraries**

 We import the libraries needed to compute the measures. **numpy** provides efficient mean and median calculations, and **statistics** provides the **mode** function and its **StatisticsError** exception. **os** provides a way to interact with the operating system. It is not used in the current version of the code, but it can be useful for handling file paths or checking that a file exists if you extend the code further (for example, to save results or perform more complex file operations).

In [1]:
import numpy as np
from statistics import mode, StatisticsError
import os

**Step 2: Function to Calculate Central Tendency**

This function takes a list of numerical data as input and calculates three measures of central tendency: **mean, median, and mode.**

**1. mean_value = np.mean(data) :**
*   This line uses the **numpy.mean()** function to compute the mean of the dataset.
*  The **mean** is calculated as the sum of all values divided by the number of values in the dataset.




**2. median_value = np.median(data) :**


*   The **numpy.median()** function is used to calculate the **median**.
*   The **median** is the middle value when the data is sorted in ascending order. If the number of data points is odd, it picks the middle value. If even, it takes the average of the two middle values.


**3. mode_value = mode(data) :**


*   The **mode()** function is used to calculate the mode, which is the most frequent value in the dataset.

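*   On Python versions before 3.8, **mode()** raises a **StatisticsError** when more than one value ties for most frequent. From Python 3.8 onward it returns the first mode encountered instead, and raises **StatisticsError** only for empty data. In either case, the error is caught by the inner **try-except** block.
*   When the error is caught, we assign the string **"No unique mode"** to the **mode_value** variable.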
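
If the "No unique mode" fallback should apply whenever values tie, regardless of Python version, one option is `statistics.multimode` (available from Python 3.8). This is a sketch of an alternative, not the code used in this module; `unique_mode_or_message` is a hypothetical name.

```python
from statistics import multimode


def unique_mode_or_message(data):
    """Return the single mode, or a message when there is not exactly one."""
    modes = multimode(data)  # list of all values tied for most frequent
    if len(modes) == 1:
        return modes[0]
    return "No unique mode"


print(unique_mode_or_message([1, 2, 2, 3]))  # 2
print(unique_mode_or_message([1, 1, 2, 2]))  # No unique mode
print(unique_mode_or_message([]))            # No unique mode
```

Because `multimode` returns an empty list for empty input rather than raising, no exception handling is needed in this variant.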









**4. Displaying Results :**


*  After calculating the mean, median, and mode, the program prints out the results.
*   The results are printed with formatted strings to clearly show each central tendency measure.





**5. Error Handling :**

*   The outer try-except block handles any other unexpected errors, such as non-numeric data that NumPy cannot average. Note that an empty list does not raise an error at this point: NumPy returns nan (with a warning) for the mean and median.
*   Any error encountered is caught by **except Exception as e**, and a message with the error details is printed.







In [3]:
def calculate_central_tendency(data):
    try:
        # Calculate Mean
        mean_value = np.mean(data)

        # Calculate Median
        median_value = np.median(data)

        # Calculate Mode
        try:
            mode_value = mode(data)
        except StatisticsError:
            mode_value = "No unique mode"

        # Display Results
        print("\nCentral Tendency Measures:")
        print(f"Mean: {mean_value}")
        print(f"Median: {median_value}")
        print(f"Mode: {mode_value}")

    except Exception as e:
        print(f"Error: {e}")
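
Because the function above prints its results, they cannot easily be reused or checked by other code. A possible variant, sketched here under assumptions, returns the values instead. The name `central_tendency` is hypothetical, and the sketch swaps in the standard library's `statistics` module in place of NumPy so that it runs without extra packages.

```python
from statistics import mean, median, multimode


def central_tendency(data):
    """Return the mean, median, and modes of a non-empty numeric sequence."""
    if len(data) == 0:
        raise ValueError("data must contain at least one value")
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),  # every value tied for most frequent
    }


result = central_tendency([2.0, 4.0, 4.0, 5.0, 10.0])
print(result)  # {'mean': 5.0, 'median': 4.0, 'modes': [4.0]}
```

Separating calculation from display in this way leaves the caller free to print, log, or test the result.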


**Step 3 :  Main Program Block**




1.  if __name__ == "__main__" : This is the standard Python way of checking whether the script is being run directly or imported as a module into another script. When the program is run directly, the code inside this block executes.
2.  **User Input Prompt :** The program prints a simple menu with two options:
    *   Option 1 allows the user to enter data manually.
    *   Option 2 allows the user to load data from a file (.txt, .csv, or .xlsx).
3.  **choice = input("Choose an option (1 or 2): ") :** This line prompts the user to choose an option (1 or 2) and stores the response, as a string, in the **choice** variable.


















In [5]:
if __name__ == "__main__":
    print("Central Tendency Calculator")
    print("1. Enter data manually")
    print("2. Load data from a file")

    choice = input("Choose an option (1 or 2): ")


Central Tendency Calculator
1. Enter data manually
2. Load data from a file
Choose an option (1 or 2): 2


**Step 4 : Data Entry via User Input or File**

1.  **If the User Chooses Option 1 (Enter Data Manually)**


*   **Manual Data Entry :**

*   If the user chooses option 1, they are prompted to enter a series of numbers separated by spaces.

*   The **input()** function reads the user's input as a string, and **.replace(',', ' ')** replaces any commas with spaces, so comma-separated and space-separated input are handled the same way.

*   The **.split()** method splits the string into a list of values.
*  The **map(float, ...)** function converts each string value to a float.


*   The **list()** function converts the result into a list of numbers, which is then passed to the **calculate_central_tendency(data)** function.



*   **Error Handling:**


*   If the user enters any non-numeric value (e.g., text or symbols), **float()** raises a **ValueError**. In that case, the program prints a message asking the user to enter valid numbers.
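
The parsing chain described above can be isolated in a small function so that it can be tried without typing at a prompt. This is a minimal sketch; `parse_numbers` is a hypothetical helper name, not part of the original program.

```python
def parse_numbers(text):
    """Convert a string of comma- or space-separated numbers to floats."""
    # Commas become spaces, then split() breaks on any run of whitespace.
    return [float(token) for token in text.replace(",", " ").split()]


print(parse_numbers("1, 2 3,4"))  # [1.0, 2.0, 3.0, 4.0]

try:
    parse_numbers("1 two 3")
except ValueError:
    print("Please enter valid numbers.")
```

An empty string yields an empty list rather than an error, so a caller may want to check for that case separately.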











In [9]:
if choice == "1":
    try:
        data = list(map(float, input("Enter numbers separated by spaces: ").replace(',', ' ').split()))
        calculate_central_tendency(data)
    except ValueError:
        print("Please enter valid numbers.")



**2. If the User Chooses Option 2 (Load Data from a File)**



*   **File Loading :**


*   If the user chooses option 2, they are prompted to provide the path to the file. The program checks the file extension to determine how to read it.

* If the file is **.txt**, the program reads the contents, splits them into numbers, and converts them into a list of floats.
*   If the file is **.csv**, it uses the **csv** module to read the contents, flattening any nested lists of values into a single list and converting them to floats.


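*   If the file is **.xlsx**, it uses the **pandas** library to read the Excel file and convert it into a flat list of floats. Reading .xlsx files with pandas requires the **openpyxl** package to be installed. Because the file is read with **header=None**, every cell, including the first row, is treated as data.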


*   **Error Handling :**

*   If the file format is not supported, the program prints an error message and does not attempt any calculation.
*   The program also handles **FileNotFoundError** if the file is not found at the given path and **ValueError** if the file contains invalid data.
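
The extension check can also be expressed as a lookup table that maps each suffix to a loader function. The sketch below is one possible arrangement, not the module's actual code: `load_txt`, `load_csv`, `LOADERS`, and `load_numbers` are hypothetical names, and only the .txt and .csv cases are covered so that the example depends solely on the standard library.

```python
import csv
import tempfile
from pathlib import Path


def load_txt(path):
    """Read whitespace-separated numbers from a text file."""
    return [float(token) for token in path.read_text().split()]


def load_csv(path):
    """Read every cell of a CSV file into one flat list of floats."""
    with path.open(newline="") as handle:
        return [float(cell) for row in csv.reader(handle) for cell in row]


# Hypothetical dispatch table: file suffix -> loader function.
LOADERS = {".txt": load_txt, ".csv": load_csv}


def load_numbers(file_path):
    """Pick a loader from the file extension (case-insensitive)."""
    path = Path(file_path)
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file format: {path.suffix!r}")
    return loader(path)


# Demonstration with a temporary file, so no real data file is assumed.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample.csv"
    sample.write_text("1,2\n3,4\n")
    print(load_numbers(sample))  # [1.0, 2.0, 3.0, 4.0]
```

Supporting a new format then means adding one function and one table entry, rather than another branch in an if/elif chain.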








In [10]:
if choice == "2":
    file_path = input("Enter the file path (.txt, .csv, or .xlsx file): ")
    data = None  # Stays None if the file format is unsupported
    try:
        if file_path.endswith(".txt"):
            # Whitespace-separated numbers
            with open(file_path, "r") as file:
                data = list(map(float, file.read().split()))
        elif file_path.endswith(".csv"):
            import csv
            # newline="" is recommended by the csv module documentation
            with open(file_path, "r", newline="") as file:
                reader = csv.reader(file)
                # Flatten all rows into a single list of floats
                data = [float(value) for row in reader for value in row]
        elif file_path.endswith(".xlsx"):
            import pandas as pd
            # header=None treats every cell, including the first row, as data
            df = pd.read_excel(file_path, header=None)
            data = df.to_numpy().flatten().astype(float).tolist()
        else:
            print("Unsupported file format. Please provide a .txt, .csv, or .xlsx file.")
        # Only compute when a supported file was read successfully
        if data is not None:
            calculate_central_tendency(data)
    except FileNotFoundError:
        print("File not found. Please check the file path.")
    except ValueError:
        print("File contains invalid data. Ensure it has only numbers.")


Enter the file path (.txt, .csv, or .xlsx file): data.csv

Central Tendency Measures:
Mean: 4.57
Median: 4.5
Mode: 4.5


**Conclusion :**

This program efficiently computes the **mean, median, and mode** for a dataset, either entered manually or loaded from a file. It also includes error handling to ensure that invalid data or file formats don't cause the program to crash.