In [None]:
import pandas as pd
 
d = {'Sun Hours': [4.5,4.0,5.1,5],
     'Max Temp': [19.6,19.1,19.6,20.0],
     'Min Temp': [12.7,12.5,13.3,12.1],
     'Rain (mm)': [82,109,65,76],
     'Rain Days': [13,20,10,9.7]}
Aug_df = pd.DataFrame(d, index = ['Clare', 'Galway','Dublin', 'Wexford'])
Aug_df

In [None]:
Aug_df.sort_values('Rain Days')

In [None]:
Aug_df.max()

In [None]:
Aug_df.min()

# Design decisions MySeries

The main design decision for the MySeries class was to convert every supported type of input to a dictionary, so that the rest of the code works on one standard form. If no index is passed and the data is not a dictionary, a numeric index (0, 1, 2, ...) is created for the MySeries instance.

<b>Requirement 2 min, max and mean:</b> It was decided that mean returns None when there are no values it can be computed from, because strings do not have a mean and cannot be passed to Python's built-in sum function. The built-in min and max functions do work on strings, so for a series of strings they are allowed to return the alphabetical minimum and maximum.
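The behaviour described above can be illustrated outside the class with a minimal sketch. The helper name `safe_mean` is hypothetical and is not part of MySeries; it only mirrors the idea of returning None when nothing numeric is available.

```python
def safe_mean(values):
    """Return the mean of the int/float values, or None if there are none."""
    # Hypothetical helper: keep only values a mean can be computed from.
    nums = [v for v in values if type(v) in (int, float)]
    if not nums:
        return None  # e.g. a series of strings has no mean
    return sum(nums) / len(nums)


words = ["abba", "aardvark", "zebra"]
# min and max compare strings alphabetically; the mean is undefined.
print(min(words), max(words), safe_mean(words))  # aardvark zebra None
print(safe_mean([4, 5, 6]))                      # 5.0
```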

<b>Requirement 8 constructor error checking:</b>
There are a number of checks made on the s_dict and index values passed, to ensure that the data provided is consistent and can be constructed into a data frame as per requirement 8. The index is checked to be either a list or a tuple. These were chosen because any other type would cause a failure further on, and the check prevents the user from creating a series with invalid data. A TypeError is raised to tell the user that the data is invalid, and the instance is not created. The index values are also checked for uniqueness; if they are not unique, a ValueError is raised and construction of the instance is abandoned.
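As a rough sketch, the two index checks can be written as a standalone function. The name `validate_index` is hypothetical; in the notebook the same tests live inside the MySeries constructor.

```python
def validate_index(index):
    """Return the index unchanged if it is acceptable, otherwise raise."""
    # Only lists and tuples are accepted as an index.
    if type(index) not in (list, tuple):
        raise TypeError("Incorrect index type used for index")
    # A set drops duplicates, so a shorter set means repeated labels.
    if len(set(index)) != len(index):
        raise ValueError("Index values must be unique")
    return index


for candidate in (["a", "b", "c"], "abc", ["a", "a", "b"]):
    try:
        validate_index(candidate)
        print(candidate, "-> accepted")
    except (TypeError, ValueError) as err:
        print(candidate, "->", type(err).__name__, err)
```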

For s_dict, the constructor first checks that the passed data is a dictionary, a list or a tuple; any other type causes a TypeError to be raised. This error handling was chosen to prevent the creation of the class instance when the condition is not met. We also check that the passed index has the same length as the data. Finally we check that the data passed is of a consistent type: a series with more than one data type is not allowed, except that floats and ints may be used together. This ensures that min, max and mean will work and that we have consistent data. For example, a mixed series of int and str would cause problems later, since Python cannot compare an int with a str when computing min or max.
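The type-consistency rule can be sketched as a small standalone predicate. The name `is_consistent` is hypothetical; the class itself raises a ValueError rather than returning a boolean.

```python
def is_consistent(values):
    """True if neighbouring values share a type, treating int and float as one."""
    numeric = (int, float)
    # Compare each value with the one after it.
    for a, b in zip(values, values[1:]):
        both_numeric = type(a) in numeric and type(b) in numeric
        if type(a) != type(b) and not both_numeric:
            return False
    return True


print(is_consistent([1, 2.5, 3]))       # True: int and float may be mixed
print(is_consistent(["a", "b"]))        # True: all strings
print(is_consistent([1, "two", 3]))     # False: int mixed with str
```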

In [None]:
class MySeries():
    def __init__(self, s_dict, **kwargs):
        # raise an error if the user tries to use
        # a data structure that is not recognised by the class MySeries
        if type(s_dict) not in (list, tuple, dict): raise TypeError("Incorrect series type")
        if "index" in kwargs:
            if type(kwargs["index"]) not in (tuple, list): raise TypeError("Incorrect index type used for index")
            if len(set(kwargs["index"])) != len(kwargs["index"]): raise ValueError("Index values must be unique")
            self.index = kwargs["index"]
        elif isinstance(s_dict, dict):
            self.index = list(s_dict.keys())
        else:
            self.index = [i for i in range(len(s_dict))]

        if len(self.index) != len(s_dict):
            raise IndexError("Index and series do not have same number of args") # don't want to create series if there
            # is a different number of values and indices
            
        if isinstance(s_dict, dict):
            self.s_dict = s_dict
        elif isinstance(s_dict, list) or isinstance(s_dict, tuple):
            self.s_dict = dict(zip(self.index, s_dict))
        
        self.values = list(self.s_dict.values())
        for i in range(len(self.values) - 1): # as per requirement 8, make sure that the data is of a consistent type
            # for example we do not want a series with type str and int together as this will break min, max, mean etc.
            if type(self.values[i]) != type(self.values[i+1]) \
            and not ((type(self.values[i]) in (float, int)) and (type(self.values[i + 1]) in (float, int))):
                # the only mixed type we will allow is int and float together
                raise ValueError ("Data must be of consistent type")
        
            
    def item_at_ind(self, key):
        return self.s_dict[key]

    def min(self):
        return min(self.values)

    def max(self):
        return max(self.values)


    def mean(self):
        # can only sum ints or floats
        nums = [i for i in self.s_dict.values() if type(i) in (float, int)] # can't get mean of a string
        if len(nums) > 0:
            return sum(nums) / len(nums)
        return None

    def print(self):
        for i, j in self.s_dict.items():
            print("{0:5} {1:5}".format(i, j))

In [None]:
ms3 = MySeries([1,2,1], index = ['a','b','c'])
ms3.s_dict

In [None]:
ms4 = MySeries([4,5,6])
ms4.s_dict

In [None]:
d = {'b': 1, 'a': 0, 'c': 2}
s2 = MySeries(d)
s2.s_dict

In [None]:
print("min:", ms4.min(), "max:", ms4.max(), "mean:", ms4.mean())

In [None]:
ms5 = MySeries(["abba","aardvark","zebra"], index = ['a','b','c'])
print("min:", ms5.min(), "max:", ms5.max(), "mean:", ms5.mean())

In [None]:
ms3 = MySeries([1,2,1], index = ['a','b','c'])
ms3.print()

In [None]:
ms3.item_at_ind('c')

# Design decisions MyDataFrame

<b>Requirement 4 MyDataFrame:</b> The columns are stored in a dictionary in which each key is a column header and each value is an instance of the MySeries class holding that column's data, as per the given instructions. The index passed into MyDataFrame is stored as an attribute <i>index</i>, which allows each MySeries instance to use its entries as identifiers for the rows. This also makes sorting easier, as rearranging the <i>index</i> list changes the order in which the MyDataFrame instance is printed. When no index is passed to MyDataFrame, one is created inside the class with numeric values representing the rows. This makes printing easier, as the rows can be accessed by the names in this list and each value can be fetched from the MySeries instances and printed.

<b>Requirement 6 Sorting:</b> Rather than sorting the entire data frame, we sort only the index, which determines the order in which the rows are output. This allows us to sort in place, as no new copy of the data frame is created. According to the Wikipedia article on in-place algorithms, https://en.wikipedia.org/wiki/In-place_algorithm, bubble sort is an in-place algorithm; it was used to sort the index attribute, which in turn sorts the printed output. Since the columns are stored as MySeries instances, the columns themselves do not need to be sorted: their values are held in dictionaries and are accessed by key rather than by position, so their order does not matter.
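The idea of sorting only the row labels can be sketched on plain Python objects. The function name `sort_index_by` is hypothetical, and a plain dictionary stands in for the MySeries column that the class would use.

```python
def sort_index_by(index, column):
    """Bubble-sort the list `index` in place by the value `column` holds for each label."""
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(index) - 1):
            # Swap neighbouring labels whose column values are out of order.
            if column[index[i]] > column[index[i + 1]]:
                index[i], index[i + 1] = index[i + 1], index[i]
                swapped = True


rain = {"Clare": 82, "Galway": 109, "Dublin": 65, "Wexford": 76}
rows = ["Clare", "Galway", "Dublin", "Wexford"]
sort_index_by(rows, rain)   # the dictionary itself is never reordered
print(rows)                 # ['Dublin', 'Wexford', 'Clare', 'Galway']
```

Only the list of labels is rearranged, so every other column follows automatically when the rows are printed in index order.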

<b>Requirement 7 mean, max and min:</b> Columns of string values are left out of the mean output, as in the films example; this is detected by checking whether the mean method of MySeries returned None.

<b>Requirement 8 constructor error checking:</b> Most of the actual data checks are implemented in MySeries; these are not overridden, as they already provide all the necessary checks that the data is consistent. The consistency check made in the MyDataFrame class itself is that every column passed has the same number of values. We do not want a situation where one column has more rows than another, as this would crash the printing process later on, so an exception is raised when the user tries to create an instance of the MyDataFrame class with inconsistent column lengths. Index values are not checked here, as they are passed directly to the MySeries constructor, which handles any inconsistencies. The first check in the MyDataFrame constructor is that the data is a dictionary, as described in requirement 4; an attempt to use another type raises an error and the instance is not created.
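One way to express the column-length check on its own is shown below. This is only a sketch: the helper name `column_length` is hypothetical, and collecting the lengths in a set is an assumption of this example rather than a description of the class.

```python
def column_length(columns):
    """Return the shared length of all columns, or raise if they differ."""
    # A set keeps one entry per distinct length.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("Please ensure all columns have equal length")
    return lengths.pop() if lengths else 0


good = {"Max Temp": [19.6, 19.1, 19.6, 20.0], "Rain (mm)": [82, 109, 65, 76]}
bad = {"Max Temp": [19.6, 19.1], "Rain (mm)": [82, 109, 65]}

print(column_length(good))  # 4
try:
    column_length(bad)
except ValueError as err:
    print("rejected:", err)
```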

In [None]:
class MyDataFrame(MySeries):
    def __init__(self, d, **kwargs):
        if not isinstance(d, dict): raise TypeError("Incorrect type for data")
        # error checking to ensure that the data used is in the dictionary form as stated in 4, implemented as part of 8
        # error checking that the number of values used in the cols is consistent
        lengths = {len(j) for j in d.values()}
        if len(lengths) > 1:
            raise ValueError("Please ensure all columns have equal length")
        self.collength = lengths.pop() if lengths else 0
        self.cols = {}
        if "index" in kwargs:
            self.index = kwargs["index"]
        else:
            self.index = [i for i in range(self.collength)]
        # error checking for the index is implemented in MySeries so not implemented again here
        # will fail if the index does not match the number of data values
        for i, j in d.items():       
            self.cols[i] = MySeries(j, index=self.index)
        # consistency of the data is implemented in MySeries so not implemented here again
    
    def print(self):
        print(" " * 15, end="")
        for key in self.cols.keys():
            print("{:>15}".format(key), end="")
            # < and > align used here and in the following comes from
            # https://stackoverflow.com/questions/8234445/format-output-string-right-alignment
        print()
        for i in self.index:
            print("{:<15}".format(i), end="")
            for myseries in list(self.cols.values()):
                print("{:>15}".format(myseries.item_at_ind(i)), end="")
            print()
            
    def sort_values(self, col): 
        # this implementation of bubble sort is based on pseudocode from 
        # https://en.wikipedia.org/wiki/Bubble_sort
        sort = True
        while sort:
            sort = False
            for i in range(len(self.index) - 1):
                if self.cols[col].s_dict[self.index[i]] > self.cols[col].s_dict[self.index[i + 1]]:
                    self.index[i], self.index[i + 1] = self.index[i + 1], self.index[i] 
                    sort = True

    def mean(self):
        for col in self.cols:
            m = self.cols[col].mean()
            if m is not None: print("{0:15} {1:10.2f}".format(col, m))
            
    def max(self):
        for col in self.cols:
            if type(self.cols[col].max()) == str:
                print("{0:15} {1:>10}".format(col, self.cols[col].max()))
            else:
                print("{0:15} {1:10.2f}".format(col, self.cols[col].max()))
            
    def min(self):
        for col in self.cols:
            if type(self.cols[col].min()) == str:
                print("{0:15} {1:>10}".format(col, self.cols[col].min()))
            else:
                print("{0:15} {1:10.2f}".format(col, self.cols[col].min()))
            
        

In [None]:
d = {'Sun Hours': [4.5,4.0,5.1,5],
     'Max Temp': [19.6,19.1,19.6,20.0],
     'Min Temp': [12.7,12.5,13.3,12.1],
     'Rain (mm)': [82,109,65,76],
     'Rain Days': [13,20,10,9.7]}
df1 = MyDataFrame(d)
df2 = MyDataFrame(d, index = ['Clare', 'Galway','Dublin', 
  'Wexford'])

In [None]:
df2.print()

In [None]:
df2.sort_values('Rain (mm)')
df2.print()

In [None]:
df2.mean()

In [None]:
df2.max()

In [None]:
df2.min()

In [None]:
films = {'Rank': [112,62,41,172,230,176],
        'Release Year': [1973,1980,1960,2015,1976,1996],
        'IMDB Rating': [8.3,8.4,8.5,8.1,8.1,8.1],
        'Time (minutes)': [129,146,109,118,120,98],
        'Main Genre': ['Comedy','Horror','Horror','Drama','Drama','Drama']}
f_names = ['Sting','Shining', 'Psycho','Room','Rocky','Fargo']

films_df =  MyDataFrame(films, index = f_names) 
films_df.print()

In [None]:
films_df.mean()

In [None]:
films_df.sort_values('Release Year')
films_df.print()

In [None]:
films_df.sort_values('Main Genre')
films_df.print()