### Requirement FR1 - Develop a function to find the arithmetic mean

```python
input_data = [29, 17, 28, 6, 14, 7, 4, 27, 21, 15, 10, 16, 24, 26, 3, 11, 13, 8, 23, 9, 0, 22, 12, 2, 18, 19, 5, 1, 20, 25]

def calc_mean(values_list):
    """
    Calculates the mean of a given array or list.

    @parameter values_list - is a list of integer/float values.

    @return - the mean of the array or list
    """
    sum_of_numbers = sum(values_list)
    cnt_of_numbers = len(values_list)

    mean = sum_of_numbers / cnt_of_numbers
    return float(mean)

mean = calc_mean(input_data)
mean
```

14.5

### Requirement FR2 - Develop a function to find the standard deviation


```python
def sqrt(n):
    """
    Calculates the square root of a given value.

    @parameter n - any non-negative integer/float value

    @return - the square root of n
    """
    return n ** 0.5

def calc_standard_deviation(mean, values_list):
    """
    Calculates the population standard deviation
    of a list or array.

    @parameter mean - the mean of values_list (from calc_mean)
    @parameter values_list - a list or array of values

    @return - the population standard deviation of values_list
    """
    # Squared difference of each value from the mean.
    sq_diffs = [(val - mean) ** 2 for val in values_list]
    # Population variance: divide by n, not n - 1.
    variance = sum(sq_diffs) / len(values_list)
    # The standard deviation is the square root of the variance.
    standard_deviation = sqrt(variance)
    return standard_deviation

stdv = calc_standard_deviation(mean, input_data)
stdv
```

8.65544144839919

### Requirement FR3 - Develop a function to find the median 

```python
def median(values_list):
    """
    Finds the median of a list of values.

    @parameter values_list - a non-empty list of values

    @return - the median of the given list of values
    """
    # Sort a copy so the caller's list is not modified.
    sorted_values = sorted(values_list)
    n = len(sorted_values)
    mid = n // 2

    if n % 2 == 0:  # even length: average the two middle values
        return (sorted_values[mid - 1] + sorted_values[mid]) / 2
    else:  # odd length: take the single middle value
        return sorted_values[mid]

median(input_data)
```

14.5

### Requirement FR4 - Develop a function to find the skewness

```python
def skewness(values_list):
    """
    Finds the skewness of a list of values using Pearson's second
    skewness coefficient and the statistics from FR1 to FR3.

    @parameter values_list - a list of values that are not all equal

    @return - the skewness of the given list of values
    """
    mean_val = calc_mean(values_list)
    median_val = median(values_list)
    # Use this list's own mean, not the global `mean` computed in FR1.
    std_val = calc_standard_deviation(mean_val, values_list)
    # skewness = 3 * (mean - median) / standard deviation
    return 3 * (mean_val - median_val) / std_val

skewness(input_data)
```

0.0

### Requirement FR5 - Develop a function to read a single column from a CSV file

```python
def read_csv_column(file_name, column_index):
    """
    Reads a single specified column of data from a CSV file.

    @parameter file_name - name of the file to read
    @parameter column_index - zero-based index of the column to read

    @return - the target column as a list of strings (header first),
              'Index out of range' if column_index is invalid,
              or None if the file does not exist
    """
    try:
        with open(file_name, 'r') as f:
            lines = f.readlines()

            # Check that column_index is in the range 0 to n-1.
            no_cols = len(lines[0].split(','))
            if 0 <= column_index < no_cols:
                # Note: split(',') does not handle quoted fields that contain commas.
                # rstrip('\n') removes the line ending from the last column.
                target_column = [row.split(',')[column_index].rstrip('\n') for row in lines]
                return target_column

            # column_index is not in range, so return a message to the user.
            return 'Index out of range'
    except FileNotFoundError:
        print(f'File "{file_name}" not found')

read_csv_column('task1.csv', 1)
```

File "task1.csv" not found


### Requirement FR6 - Develop a function to read CSV data from a file into memory

```python
def read_csv_into_memory(file_name):
    """
    Reads CSV data from a file into memory.

    @parameter file_name - name of the file to read

    @return - the CSV data as a dict mapping each column header
              to a list of float values
    """
    with open(file_name, 'r') as f:
        # Only the header line is needed here, to count the columns.
        no_cols = len(f.readline().split(','))

    data_dict = {}

    # Read each column in turn using read_csv_column() from FR5.
    for index in range(no_cols):
        target_col = read_csv_column(file_name, index)

        # The first value is the header, used as the dictionary key;
        # the remaining values are converted to floats.
        data_dict[target_col[0]] = [float(val) for val in target_col[1:]]
    return data_dict

read_csv_into_memory('task1.csv')
```

FileNotFoundError: [Errno 2] No such file or directory: 'task1.csv'

### Requirement FR7 - Develop a function to generate a set of statistics for a given data file

```python
# Helper function
def statistics_summary(data, display=True):
    """
    Prints or returns the full set of summary statistics for a list.

    @parameter data - a list of values
    @parameter display - if True (the default), print the statistics
                         as a table; if False, return them as a list

    @return - None if display is True, otherwise
              [mean, standard deviation, median, skewness]
    """
    mean      = calc_mean(data)
    std_dv    = calc_standard_deviation(mean, data)
    med       = median(data)
    skw       = skewness(data)

    if display:
        print("{0:^10s}: {1:^25s}".format('Metric', 'Value'))
        print("=" * 37)
        print("{0:^10s}: {1:^25n}".format('Mean', mean))
        print("{0:^10s}: {1:^25n}".format('StDev', std_dv))
        print("{0:^10s}: {1:^25n}".format('Median', med))
        print("{0:^10s}: {1:^25n}".format('Skewness', skw))
    else:
        return [mean, std_dv, med, skw]

def describe(file_name):
    """
    Generates a set of statistics for a given data file.

    @parameter file_name - name of the file to read

    @return - a dict mapping each column name to
              [mean, stdev, median, skewness], plus a 'Stats'
              entry naming each statistic
    """
    # Read all data from the CSV file as a dict of columns.
    data_dict = read_csv_into_memory(file_name)
    # Start with the row labels; each column's statistics are added below.
    statistics_dict = {
        "Stats": ["Mean", "Stdev", "Median", "Skewness"]
    }

    for column, values in data_dict.items():
        statistics_dict[column] = statistics_summary(values, display=False)

    return statistics_dict

describe('task1.csv')
```

### Requirement FR8 - Develop a function to print a custom table

```python
# Helper functions
def transpose_dict(data_dict):
    """
    Swaps rows and columns of a describe()-style dict.

    @parameter data_dict - dict with a 'Stats' key plus one key per column

    @return - dict with one key per statistic, plus a 'Columns' key
              listing the column names in order
    """
    cols = [key for key in data_dict if key != 'Stats']
    new_dict = {}
    for ind, metric in enumerate(data_dict['Stats']):
        new_dict[metric] = [data_dict[col][ind] for col in cols]
    new_dict['Columns'] = cols
    return new_dict


def get_col_name_by_index(file_name, index):
    """
    Returns the header name of the column at the given index.
    """
    with open(file_name, 'r') as f:
        header = f.readline()
    return header.split(',')[index].rstrip('\n')


def summary_table(file_name, border_char='*', columns=None):
    """
    Prints a custom table of the summary statistics from describe().

    @parameter file_name - name of the file to read
    @parameter border_char - character to use for the table borders
    @parameter columns - list of indexes representing the columns to show

    @return - None (the table is printed), or a message if no columns are given
    """
    c = border_char
    # Check that at least one column was specified
    # (None instead of a mutable [] default avoids state shared between calls).
    if not columns:
        return 'No columns specified'
    full_summary = describe(file_name)

    # Keep only the requested columns, in the order provided.
    summary_dict = {}
    for ind in columns:
        col = get_col_name_by_index(file_name, ind)
        summary_dict[col] = full_summary[col]
    summary_dict['Stats'] = full_summary['Stats']
    summary_dict_t = transpose_dict(summary_dict)
    col_names = summary_dict_t['Columns']

    # Each cell is 4 characters wider than its column name.
    col_widths = [len(col) for col in col_names]

    # Build and format the column-name header once.
    col_str = ''
    for i in range(len(col_names)):
        col_str += "{" + f"{i}:^{col_widths[i] + 4}s" + "}" + f" {c} "
    col_str = col_str.format(*col_names)

    all_rows = []
    full_width = 0
    for key, vals in summary_dict_t.items():
        row_str = ''
        for i in range(len(col_names)):
            # Choose the format code and alignment from the value's type.
            if isinstance(vals[i], int):
                d_type, pos = 'n', '<'
            elif isinstance(vals[i], float):
                d_type, pos = '.2f', '<'
            else:
                d_type, pos = 's', '^'
            row_str += "{" + f"{i}:{pos}{col_widths[i] + 4}{d_type}" + "}" + f" {c} "

        a = row_str.format(*vals)
        full_width = len(a) - 1
        all_rows.append(a)

    print(f'           {c * (full_width + 2)}')
    print(f'           {c}', col_str)
    print(f'{c * (full_width + 13)}')
    print(f'{c} Mean     {c}', all_rows[0])
    print(f'{c} Stdev    {c}', all_rows[1])
    print(f'{c} Median   {c}', all_rows[2])
    print(f'{c} Skewness {c}', all_rows[3])
    print(f'{c * (full_width + 13)}', end='')

summary_table('task1.csv', '*', [6, 3, 4, 2, 1])
```

# Process Development Report for Programming Task 1


### Introduction
From 15 February 2022 to 12 May 2022 we used built-in Python functions to process the given CSV file, task1.csv, and a list of values, producing statistical values such as the mean, median, standard deviation and skewness, and generating summary statistics.

### Challenges  
The task was not always easy: over a brief period of the Programming for Data Science course we had to learn, and become accustomed to, modern tools such as Python, a popular data science language, to "get our feet wet", and to use GitLab for coursework submission.
Some parts of the task, such as FR5 to FR7, were difficult at first, but I worked them out after careful research and reading the Python documentation and lecture materials.

### How I carried out the task
For FR1, I could have added code to check whether the list was empty and return a prompt, or I could have used `statistics.fmean`, which was added to the Python standard library in Python 3.8 (`statistics.mean` has been available since Python 3.4).
My solution divides the sum of the list by its length, calculated using `len`. In Python 3 the `/` operator always returns a float, so the final `float(mean)` call is only a safeguard; in Python 2, `/` between two integers performs floor division, so the sum would have to be converted to a float before dividing.
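
A minimal sketch of the empty-list check mentioned above, assuming that raising an error is acceptable instead of returning a prompt. The name `safe_mean` is hypothetical; the standard library's `statistics.fmean` and `statistics.mean` already raise `statistics.StatisticsError` for empty input.

```python
import statistics


def safe_mean(values):
    """Arithmetic mean that rejects an empty list instead of dividing by zero."""
    if len(values) == 0:
        # Without this check, sum([]) / len([]) raises ZeroDivisionError.
        raise ValueError("safe_mean() requires at least one value")
    return sum(values) / len(values)  # true division: always a float in Python 3


data = list(range(30))  # the integers 0 to 29, the same values as input_data
print(safe_mean(data))         # 14.5
print(statistics.fmean(data))  # 14.5 (Python 3.8+)
print(statistics.mean(data))   # 14.5 (Python 3.4+)
```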

For FR2, my solution computes the population standard deviation, since the whole list was available; the sample standard deviation (dividing by n - 1 instead of n) could also have been computed. The solution first takes the mean from FR1, then calculates the squared deviations from the mean, their average (the variance) and finally the standard deviation as the square root of the variance.
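
The difference between the two estimators can be sketched as follows; the function `std_dev` and its `sample` flag are hypothetical names used only for illustration, and the results are checked against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics


def std_dev(values, sample=False):
    """Population (divide by n) or sample (divide by n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_diffs = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n  # Bessel's correction for a sample
    return math.sqrt(sum_sq_diffs / divisor)


data = list(range(30))
print(std_dev(data))               # population: about 8.6554, as in FR2
print(std_dev(data, sample=True))  # sample: sqrt(77.5), about 8.8034
print(statistics.pstdev(data), statistics.stdev(data))  # the same two values
```

For the 30 values 0 to 29 the sum of squared deviations is 2247.5, so the population variance is 2247.5 / 30 and the sample variance is 2247.5 / 29 = 77.5.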

For FR3, to find the median of a set of numbers, the numbers first had to be sorted in ascending order. If the count is even, the two middle numbers are averaged to get the median; if it is odd, the middle number is the median.

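
The odd and even cases can be checked against the standard library's `statistics` module; this sketch uses small illustrative lists rather than the coursework data.

```python
import statistics

# Odd count: the single middle value of the sorted data.
odd = [7, 1, 3]
# Even count: the mean of the two middle values of the sorted data.
even = [7, 1, 3, 10]

print(sorted(odd), statistics.median(odd))    # [1, 3, 7] 3
print(sorted(even), statistics.median(even))  # [1, 3, 7, 10] 5.0
# median_low / median_high return one of the two middle values instead.
print(statistics.median_low(even), statistics.median_high(even))  # 3 7
```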
For FR4, FR1 to FR3 had to be computed first, and their functions are called from the skewness function, which uses Pearson's second skewness coefficient:

$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$

For the given list the output is a skewness of zero (0), meaning the mean equals the median and the distribution is symmetric; a normal distribution also has a skewness of zero ('1.3.5.11. Measures of Skewness and Kurtosis', 2022), although zero skewness alone does not show that data are normally distributed.

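
A short sketch of the same formula applied to a symmetric list and to a list with a long right tail, so the sign of the result can be seen. `pearson_median_skewness` is a hypothetical name, and it uses the standard library instead of the FR1 to FR3 functions; raising an error for zero spread is an assumption, since the formula is undefined there.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / population std dev."""
    mean = statistics.fmean(values)
    med = statistics.median(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mean - med) / std


print(pearson_median_skewness(list(range(30))))         # 0.0: symmetric data
print(pearson_median_skewness([1, 2, 2, 3, 3, 3, 20]))  # positive: long right tail
```

A positive result means the mean is pulled above the median by large values; a negative result means the opposite.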
For FR5, the function reads the CSV file and returns the requested column of data (a series), as per Appendix 1.

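
The FR5 function splits each line on commas, which breaks on quoted fields such as `"Smith, J"`. A sketch of the same idea using the standard `csv` module, assuming the caller passes an already-open file object; `read_column_from` is a hypothetical name, and raising `IndexError` replaces the original's returned message.

```python
import csv
import io


def read_column_from(file_obj, column_index):
    """Return one column (header first) from CSV data using the csv module."""
    column = []
    for row in csv.reader(file_obj):
        if not 0 <= column_index < len(row):
            raise IndexError(f"column_index {column_index} is out of range for row {row}")
        column.append(row[column_index])
    return column


# A quoted field containing a comma: split(',') would cut it in two.
sample = io.StringIO('id,name,score\n1,"Smith, J",7.5\n2,Lee,9\n')
print(read_column_from(sample, 1))  # ['name', 'Smith, J', 'Lee']
```

For a real file, the `csv` documentation recommends opening it with `newline=''`, e.g. `with open('task1.csv', newline='') as f: read_column_from(f, 1)`.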
For FR6, the function uses the FR5 function to read in all the contents of the file and outputs them as a dictionary.

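
Because the FR6 function calls the FR5 function once per column, the file is reopened and reread for every column. A sketch that reads the data in a single pass with `csv.DictReader`, assuming every value except the header is numeric; `read_csv_as_columns` is a hypothetical name.

```python
import csv
import io


def read_csv_as_columns(file_obj):
    """Read CSV data into {header: [float, ...]}, passing over the data only once."""
    reader = csv.DictReader(file_obj)
    # Accessing fieldnames reads the header row.
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(float(row[name]))
    return columns


sample = io.StringIO("a,b\n1,2.5\n3,4\n")
print(read_csv_as_columns(sample))  # {'a': [1.0, 3.0], 'b': [2.5, 4.0]}
```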
For FR7, using the statistics from FR1 to FR4, the function returns a dictionary containing all calculated statistics for each column, plus an extra entry giving the name of each statistic. An illustration of this is given in Appendix 3.

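
The shape of that dictionary can be sketched with the standard library's `statistics` functions in place of FR1 to FR4. `describe_columns` is a hypothetical name that takes already-loaded columns instead of a file name, and returning NaN for a column with zero spread (where the FR4 formula divides by zero) is an assumption.

```python
import statistics


def describe_columns(columns):
    """Build {'Stats': [names], column: [mean, stdev, median, skewness], ...}."""
    result = {"Stats": ["Mean", "Stdev", "Median", "Skewness"]}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        med = statistics.median(values)
        # Pearson's second coefficient; NaN when every value is the same.
        skew = 3 * (mean - med) / std if std else float("nan")
        result[name] = [mean, std, med, skew]
    return result


stats = describe_columns({"a": [1.0, 2.0, 3.0], "b": [1.0, 1.0, 4.0]})
print(stats["Stats"])  # ['Mean', 'Stdev', 'Median', 'Skewness']
print(stats["b"])      # mean 2.0, stdev about 1.414, median 1.0, positive skewness
```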
For FR8, the function prints a custom table of the statistics generated in FR7. It also offers customisable options, including which border character to use and which columns to output. An illustration of this is given in Appendix 4.

(‘statistics — Mathematical statistics functions — Python 3.10.4 documentation’, 2019)
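
A compact sketch of a bordered table built with format-spec width and alignment, assuming a `describe()`-style dict as input. Unlike FR8, this hypothetical `print_summary_table` selects columns by name rather than by index and prints four decimal places; cells are at least 10 characters wide, so much larger values would break the alignment.

```python
def print_summary_table(stats, columns, border_char="*"):
    """Print the chosen columns of a describe()-style dict as a bordered table."""
    c = border_char
    label_width = max(len(label) for label in stats["Stats"])
    # Each cell is at least 10 characters wide so 4-decimal values line up.
    widths = [max(len(col), 10) for col in columns]

    header_cells = f" {c} ".join(f"{col:^{w}}" for col, w in zip(columns, widths))
    header = f"{c} {'':<{label_width}} {c} {header_cells} {c}"
    rule = c * len(header)

    print(rule)
    print(header)
    print(rule)
    for i, label in enumerate(stats["Stats"]):
        cells = f" {c} ".join(f"{stats[col][i]:>{w}.4f}" for col, w in zip(columns, widths))
        print(f"{c} {label:<{label_width}} {c} {cells} {c}")
    print(rule)


stats = {"Stats": ["Mean", "Stdev"], "a": [2.0, 0.8165], "b": [2.0, 1.4142]}
print_summary_table(stats, ["b", "a"], border_char="#")
```

Because every row is built from the same widths, the border rule can simply be the length of the header line.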

### Learning Experience
Among the learning outcomes, the task showed me that most tasks can be accomplished with built-in Python functions, methods and libraries. It gave me a deeper understanding of core Python itself and of developing my own algorithm that solves a given problem well and efficiently.
Next time I will make fuller use of the resources available online and the references brought to my attention during the coursework, and become familiar with them as early as possible so that using them becomes second nature.

### References
‘1.3.5.11. Measures of Skewness and Kurtosis’ (2022) nist.gov [online]. Available from: https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm#:~:text=The%20skewness%20for%20a%20normal,data%20that%20are%20skewed%20right. [Accessed 15 May 2022].

Kumar, M. (2021) Zenesys [online]. Available from: https://www.zenesys.com/blog/coding-standards-and-best-practices-for-python-code-quality [Accessed 7 April 2022].

‘statistics — Mathematical statistics functions — Python 3.10.4 documentation’ (2019) python.org [online]. Available from: https://docs.python.org/3/library/statistics.html#statistics.mean [Accessed 7 April 2022].




