In [8]:
import numpy as np
from collections.abc import MutableSequence
import pandas as pd
from abc import ABC, abstractmethod

import math

# Assignment #2 - With Bonus Stats!!

## Overview

The end goal is to create a special data structure: a list of numbers plus some extra math, along with the code needed to use and test it. Each of these lists, here called a calculationList, has two main parts: a list of numbers and a threshold value. Each type of list works differently, but the basic logic is the same. The threshold is a limit on whatever calculation the list type is built around, so for a stdList the threshold applies to the standard deviation, for a meanList it applies to the mean, and so on. The calculation list should have a prune() method that removes values from the list until the relevant value is below the threshold. Each type of calculation list has its own way of deciding what to remove, because we want to remove the most "important" values first. For example, if the standard deviation is greater than the threshold, and one value is 3 standard deviations away from the mean while another is 10 standard deviations away, we want to remove the second value first, as it has the most impact.
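As a rough illustration of the pruning idea for the standard deviation case, here is a minimal sketch. The helper name `prune_by_std` is hypothetical, and it assumes the population standard deviation and that "most impactful" means "farthest from the mean"; both are ambiguities you would need to settle in your own design.

```python
import statistics

def prune_by_std(values, threshold):
    """Hypothetical helper: drop the value farthest from the mean
    until the population standard deviation is no longer above the threshold."""
    values = list(values)  # work on a copy so the caller's list is untouched
    while len(values) > 1 and statistics.pstdev(values) > threshold:
        mean = statistics.mean(values)
        # Assumption: the most impactful value is the one farthest from the mean.
        farthest = max(values, key=lambda v: abs(v - mean))
        values.remove(farthest)
    return values

pruned = prune_by_std([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2.5)
print(pruned, statistics.pstdev(pruned))
```

Recomputing the mean on every pass matters here, since removing one value shifts which remaining value is the farthest out.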

<b>Note: please let me know if the premise isn't clear. You should have to sort out some ambiguities as you develop, but the goal should be clear.</b>

### Classes to Create

A calculationList class made up of a list of float numbers plus a few additions. This class will inherit from two things: the MutableSequence class and the ABC class. MutableSequence lets us use the list methods, and ABC lets us declare abstract methods.
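To show what inheriting from MutableSequence involves, here is a minimal sketch of a list-backed sequence. The class name `FloatList` is made up for illustration and is not part of the assignment; the point is that once the five required methods exist, methods such as `append` and `remove` come for free as mixins.

```python
from collections.abc import MutableSequence

class FloatList(MutableSequence):
    """Hypothetical minimal sequence that stores its items in a plain list."""

    def __init__(self, iterable=()):
        self.elements = [float(v) for v in iterable]

    # The five methods below are the ones MutableSequence requires.
    def __getitem__(self, index):
        return self.elements[index]

    def __setitem__(self, index, value):
        self.elements[index] = value

    def __delitem__(self, index):
        del self.elements[index]

    def __len__(self):
        return len(self.elements)

    def insert(self, index, value):
        self.elements.insert(index, float(value))

nums = FloatList([1, 2, 3])
nums.append(4)    # mixin method, built on insert()
nums.remove(2.0)  # mixin method, built on __getitem__ and __delitem__
print(list(nums))
```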

The calculation list will be a base class that will not be implemented directly. You will need to create some subclasses that then inherit from the calculationList class. These subclasses will be the following:
<ul>
<li> stdList - this will be a calculationList that will prune values based on the standard deviation of the list. </li>
<li> meanList - this will be a calculationList that will prune values based on the mean of the list. </li>
<li> sumList - this will be a calculationList that will prune values based on the sum of the list. </li>
</ul>

Each of these classes should add only what it needs to make its unique functionality work; anything common to all of them belongs in the calculationList class. The top-level calcList class is similar to the example ListBasedSet class here: https://python.readthedocs.io/en/latest/library/collections.abc.html The other classes should be children of that class, each adding its own unique parts. One note: there may be erroneous values in the input data, so there should be some error checking to deal with broken inputs - <b>if a row has erroneous data, that row should be skipped entirely.</b>
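One possible way to handle the "skip the whole row" rule is sketched below. The helper `parse_values` and the sample rows are hypothetical, and the sketch assumes that non-numeric text, empty cells, and NaN or infinite values all count as erroneous; the real input file may call for different checks.

```python
import math

def parse_values(raw_values):
    """Hypothetical helper: return the row as floats, or None if any entry is bad."""
    parsed = []
    for raw in raw_values:
        try:
            number = float(raw)
        except (TypeError, ValueError):
            return None  # non-numeric or empty entry: reject the whole row
        if not math.isfinite(number):
            return None  # assumption: NaN and infinity are treated as erroneous
        parsed.append(number)
    return parsed

rows = [["1.5", "2", "3"], ["4", "oops", "6"], ["7", "", "9"]]
parsed_rows = [parse_values(row) for row in rows]
clean = [row for row in parsed_rows if row is not None]
print(clean)
```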

#### Example Results

Here are a few screenshots of the processing logic of the calculation lists:

![Calculation List Example](example_results.png "Calculation List Example")

We can also look at the inputs and outputs of the calculation lists to see some of the details:

![Input and Output Example](input_output.png "Input and Output Example")

Please check with me if the idea and the goal are not clear.

## Deliverables

For this assignment, please submit the following:
<ul>
<li> The notebook file containing your code. </li>
<li> The CSV output file, <b>generated from a test file that I'll post before the due date.</b> This file will be in the same format as the test data, but the values will be different. </li>
</ul>

## Grading

The grading for this will be broken out as follows, and will lean heavily on things working correctly.
<ul>
<li> 75% - Functionality. If yours works, this is the baseline. If it fails, I may decrease this, depending on what I can visually spot in code. </li>
<li> 25% - Code clarity and formatting. </li>
</ul>

### Notes and Hints

I will put any update notes, responses to common questions, and relevant hints in a list in the README file. Please don't edit that file, so that you can pull it to get new material without merge conflicts.

In [10]:
class calcList(MutableSequence, ABC):
    """
    A class used to represent a list of values that can be pruned based on a threshold value. This class must be extended to be used, with the child class adding the logic to implement the calculation used for the pruning.

    Attributes
    ----------
    elements : list
        a list of values that can be pruned
    _name : str
        the name of the list
    _threshold : float
        the threshold value used to prune the list
    _trim : int
        the number of decimal places to trim the output to

    Methods
    -------
    csv_output(self)
        returns a string representation of the list in the format: name,length,threshold,value. Used to create the csv output to be written to file
    value(self)
        returns the value of the list based on the calculation used - i.e. a meanList would return the mean of the list, an stdList would return the standard deviation of the list...
    prune(self)
        prunes the list based on the threshold value. The logic for the pruning is implemented in the child class and should remove the "most impactful" value from the list until the threshold is met. 
    isPruned(self)
        returns True if the list is pruned, False otherwise
    returnType(self)
        returns the type of the list
    setThreshold(self, threshold)
        sets the threshold value to threshold
    getThreshold(self)
        returns the threshold value
    """

    ## Note: other things will be needed depending on how you implement your work. 
    ## As long as you make things work and meet anything explicitly stated in the assignment, you can add whatever you want.
    def __init__(self, name, threshold, iterable, trim=3):
        pass

    # Loading Data into Lists
    # You should write a function to read data, and generate the lists.
    # This could be a static method in here, but you can do it other ways. 
    

    # These methods must be implemented in the child classes
    # There may be other methods you want to add as well
    @abstractmethod
    def value(self):
        pass
    @abstractmethod
    def prune(self):
        pass
    @abstractmethod
    def isPruned(self):
        pass
    @abstractmethod
    def returnType(self):
        pass
    def setThreshold(self, threshold):
        self._threshold = threshold
    def getThreshold(self):
        return self._threshold

class stdList(calcList):
    pass
class meanList(calcList):
    pass
class sumList(calcList):
    pass

### Simple Unit Tests

These are some simple tests that you can use to check, if you want. Please feel free to change, remove, or add to these as you see fit.

In [11]:
calc = stdList("test", 2, [1,2,3,4,5,6,7,8,9,10])
print(calc)
calc.prune()
print(calc)

TypeError: Can't instantiate abstract class stdList with abstract methods __delitem__, __getitem__, __len__, __setitem__, insert, isPruned, prune, returnType, value

In [5]:
calc2 = meanList("test2", 4, [1,2,3,4,5,6,7,8,9,10])
print(calc2)
calc2.prune()
print(calc2)

TypeError: Can't instantiate abstract class meanList with abstract methods __delitem__, __getitem__, __len__, __setitem__, insert, isPruned, prune, returnType, value

In [6]:
calc3 = sumList("test3", 45, [1,2,3,4,5,6,7,8,9,10])
print(calc3)
calc3.prune()
print(calc3)

TypeError: Can't instantiate abstract class sumList with abstract methods __delitem__, __getitem__, __len__, __setitem__, insert, isPruned, prune, returnType, value

### Load Data and Test

The functions below make up a simple test harness for your code: it takes a response file and an expected file and scores one against the other. You are given half of what it needs here, the expected results, and will need to write the rest of the code to generate your own results and pass them in to run the test.

This function can likely be wrapped in another, one that calls your code to generate that input to check against. This isn't required, but will likely make things easier to call and test repeatedly. You'd have to do everything required to get the "response" argument, which is the CSV file of your answers. 
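For reference, a sketch of writing results in a shape the harness can read is shown below. The helper `write_results` is hypothetical and uses the standard `csv` module; the column names follow the sample output in this notebook, and the one data row reuses the first sample values. Your own pipeline may build the file differently, for example with pandas.

```python
import csv
import io

def write_results(rows, handle):
    """Hypothetical helper: write (name, length, threshold, value) rows as CSV."""
    writer = csv.writer(handle)
    # The harness looks up the "Name" and "Value" columns by default.
    writer.writerow(["Name", "Length", "Threshold", "Value"])
    for name, length, threshold, value in rows:
        writer.writerow([name, length, threshold, value])

buffer = io.StringIO()  # stands in for an open file such as output.csv
write_results([("List_0", 5, 30.19605769, 15.001)], buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

Writing to a file handle rather than a path keeps the helper easy to test in memory before pointing it at a real file.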

In [12]:
def testHarness(response, expected, response_col="Value", expected_col="Value", match_thresh=.03, exp_name="Name", resp_name="Name"):
    '''Runs a test of the response file against the expected file. Returns a tuple of the number of correct and incorrect responses.'''
    resp = pd.read_csv(response)
    exp = pd.read_csv(expected)
    
    correct = 0
    incorrect = 0
    
    i = 0
    while i < len(resp):
        exp_val = exp.iloc[i][expected_col]
        resp_val = resp.iloc[i][response_col]
        
        if toleranceMatch(exp_val, resp_val, match_thresh) and (exp.iloc[i][exp_name] == resp.iloc[i][resp_name]):
            correct += 1
        else:
            incorrect += 1
        i += 1
    
    return (correct, incorrect)
    

def toleranceMatch(val1, val2, percent_tolerance):
    '''Returns True if val1 and val2 are within percent_tolerance of each other, False otherwise.'''
    if val1 == val2:
        return True
    if val1 == 0:
        # A relative difference against zero is undefined, so only an exact match passes.
        return False
    # Divide by the magnitude of val1 so that a negative val1 does not match automatically.
    return abs(val1 - val2) / abs(val1) <= percent_tolerance

In [1052]:
# Sample execution - you can change this to test your code
# The functions here are things I made to both:
# - read data from disk, and create a list of the calculation lists.
# - process those lists to get actual outputs. 
#outputs = processCalculationLists(calculationListLoader("inputs.csv"), output_file="output.csv")
#outputs.head()

Unnamed: 0,Name,Length,Threshold,Value
0,List_0,5,30.19605769,15.001
1,List_1,9,37.83299154,29.911
2,List_2,44,42.90158192,42.091
3,List_3,2,-1428.622569,0.0
4,List_4,8,13.75396487,12.75


In [1053]:
tests = testHarness("output.csv", "output.csv")
tests

(1000, 1)