# Exxcellent solutions programming challenge

## challenge-weatherdata
First we inspect which tasks the challenge consists of. We will design the code so that it is also usable for the subsequent football challenge.
1. Read the .csv file into an internal format
2. Access and compare internal values
3. Print results

Specifically, in both examples the goal is to find the minimal difference between two column entries.

### python
First I choose a language I am accustomed to, since this is a small task that I should be able to program in one go. The idea is to create a working example and think of some test cases.

The design goals are:
- robustness & correctness
- readability & maintainability
- clean software design & architecture

These goals take priority over speed, which is why I chose to work with the pandas library, which is widely used and well maintained. This removes the need to write a parser for our .csv files from scratch; CSV is, after all, a reasonably standardized format.

First we import the needed modules and check that the data file can be read from the given path.

In [5]:
import numpy  as np
import pandas as pd

relative = 'src/main/resources/de/exxcellent/challenge/'
fn       = 'weather.csv' #football.csv

df = pd.read_csv(relative+fn)
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 14 columns):
Day           30 non-null int64
MxT           30 non-null int64
MnT           30 non-null int64
AvT           30 non-null int64
AvDP          30 non-null float64
1HrP TPcpn    30 non-null int64
PDir          30 non-null int64
AvSp          30 non-null float64
Dir           30 non-null int64
MxS           30 non-null int64
SkyC          30 non-null float64
MxR           30 non-null int64
Mn            30 non-null int64
R AvSLP       30 non-null float64
dtypes: float64(4), int64(10)
memory usage: 3.4 KB
None


This worked quite well, considering that the file format could have been based on different delimiters etc. 
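As a minimal sketch of the two points above (checking the path and handling a different delimiter), one could wrap the read step as follows. The helper name `load_table` and the semicolon-separated sample text are assumptions made for illustration; they are not part of the challenge files.

```python
import io
from pathlib import Path

import pandas as pd


def load_table(path, sep=","):
    """Read a delimited text file into a DataFrame, failing early if it is missing.

    Hypothetical helper, not part of the original notebook.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such data file: {path}")
    return pd.read_csv(path, sep=sep)


# read_csv also accepts file-like objects, which makes the delimiter easy to try out.
# The sample text below is made up and only mimics the weather column names.
semicolon_text = "Day;MxT;MnT\n1;88;59\n2;79;63\n"
df_demo = pd.read_csv(io.StringIO(semicolon_text), sep=";")
print(df_demo.columns.tolist())
```

With the default `sep=","` the same text would be read as a single column, so the delimiter has to be known or passed in explicitly.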

We see that this specific data file contains 14 columns. For our general purpose, however, we won't care about the specific number of columns and rows. We now use some of the built-in pandas functions to create a new column 'difftest' on the fly and find the row with the lowest difference between, in our case, 'MxT' and 'MnT'.

As we do not know how this format might change in the future, it is much more sensible to access the columns by their names rather than by their positional indices.
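The following sketch illustrates why name-based access is more robust. It uses a small made-up table and a hypothetical reordered layout; neither is taken from the challenge data.

```python
import pandas as pd

# Toy data that only mimics three of the weather columns.
original = pd.DataFrame({"Day": [1, 2], "MxT": [88, 79], "MnT": [59, 63]})

# Hypothetical future layout: same data, but the columns arrive in another order.
reordered = original[["MnT", "Day", "MxT"]]

# Name-based access gives the same spread for both layouts.
spread_original = original["MxT"] - original["MnT"]
spread_reordered = reordered["MxT"] - reordered["MnT"]
print(spread_original.equals(spread_reordered))

# Position-based access silently picks a different column after the reordering.
print(original.iloc[:, 1].name, reordered.iloc[:, 1].name)
```

The positional lookup does not raise an error in the reordered case; it simply returns the wrong column, which is the harder kind of bug to notice.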

In [45]:
#Testing, create a custom column test
df['difftest'] = df['MxT']-df['MnT']
print(df.reset_index()[['MxT', 'MnT', 'difftest']],'\n')

print( 'Row id of minimal temperature spread:')
(df['MxT'] - df['MnT']).idxmin()

    MxT  MnT  difftest
0    88   59        29
1    79   63        16
2    77   55        22
3    77   59        18
4    90   66        24
5    81   61        20
6    73   57        16
7    75   54        21
8    86   32        54
9    84   64        20
10   91   59        32
11   88   73        15
12   70   59        11
13   61   59         2
14   64   55         9
15   79   59        20
16   81   57        24
17   82   52        30
18   81   61        20
19   84   57        27
20   86   59        27
21   90   64        26
22   90   68        22
23   90   77        13
24   90   72        18
25   97   64        33
26   91   72        19
27   84   68        16
28   88   66        22
29   90   45        45 

Row id of minimal temperature spread:


13

Because the file is so small we can see by eye that row 13 holds the entries with the minimal temperature difference. As we do not want the row index but the day, we can rewrite the lookup:

In [59]:
print( 'Day of minimal temperature spread is:')
df.iloc[df['difftest'].idxmin()]['Day']  # idxmin must be called, hence the parentheses

print(type(df.iloc[(df['difftest']).idxmin()]))

Day of minimal temperature spread is:
<class 'pandas.core.series.Series'>


df.iloc[df['difftest'].idxmin()] is itself a pandas Series object, which is why we can access its day via ['Day']. Be aware of the capitalization of the column name, which I forgot in my first attempt.
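One detail worth keeping in mind: `idxmin()` returns an index label, not a position. With the default RangeIndex both coincide, so `iloc` works here, but `loc` is the accessor that matches labels in general. The sketch below uses made-up data with a non-default index to show the difference, and also how all rows sharing the minimum could be collected in case of ties.

```python
import pandas as pd

# Toy data with a non-default index, so labels and positions differ.
df_demo = pd.DataFrame(
    {"Day": [1, 2, 3], "MxT": [88, 79, 77], "MnT": [59, 63, 75]},
    index=[10, 20, 30],
)

spread = df_demo["MxT"] - df_demo["MnT"]
label = spread.idxmin()           # index label of the first minimum
day = df_demo.loc[label, "Day"]   # label-based lookup of a single cell
print(label, day)

# All rows that share the minimal spread, in case of ties.
tied_days = df_demo.loc[spread == spread.min(), "Day"].tolist()
print(tied_days)
```

Looking up a single cell with `loc[label, column]` also keeps the column's integer dtype, whereas selecting a whole row first upcasts mixed integer and float columns to float.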

However, this is still beginner-level exploration. How do we now turn it into a reusable function so that we can easily investigate other datasets as well?

In [80]:
class table_dataset():

    def __init__(self,filename,
                 filepath='src/main/resources/de/exxcellent/challenge/'):
     
        self.filename = filename
        self.filepath = filepath
        self.setpath()
        self.loadfile()
        
    def setpath(self):
        ''' Combines paths, could also be written as a decorator.
        '''

        path = ''
        for string in [self.filepath,self.filename]:
            if string is None:
                string = ''
            path += string

        self.path = path
        return path
    
    
    def loadfile(self, format='.csv'):
        self.df = pd.read_csv(self.path)
    
    def findmin(self, ident,colA,colB):
        ''' Input:
                ident: identifier that is returned
                 colA: Minuend 
                 colB: Subtrahend
            Return:
                The identifier that exists in the row with the minimum difference.
        
            Here we prefer to compare values on the fly. 
            We could of course also modify the table and provide an additional attribute like 
            "difference" ... because we do not know if this would overwrite an already existing
            attribute we refrain from doing that 
        '''
        
        try:
            # idxmin must be called; it returns the row label of the first minimum
            return self.df.iloc[(self.df[colA] - self.df[colB]).idxmin()][ident]
        except KeyError:
            # An unknown column name ends up here; df.info() lists the available columns
            print('__ Issues in table_dataset::findmin', self.df.info())
        
        

First I created the Python class and then some test examples.



In [84]:
def test(verbose=False):
    test1 = table_dataset('weather.csv').findmin('Day','MxT','MnT')
    test2 = table_dataset('football.csv').findmin('Team','Goals','Goals Allowed')
    test3 = table_dataset('football.csv').findmin('Team','Goals Allowed','Goals ')  
    # test3 Will print __ Issues in table_dataset::findmin None
    
    if verbose:
        print(test1)
        print(test2)
        print('Test completed')
    
test(True)    

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 8 columns):
Team             20 non-null object
Games            20 non-null int64
Wins             20 non-null int64
Losses           20 non-null int64
Draws            20 non-null int64
Goals            20 non-null int64
Goals Allowed    20 non-null int64
Points           20 non-null int64
dtypes: int64(7), object(1)
memory usage: 1.3+ KB
__ Issues in table_dataset::findmin None
14.0
Leicester
Test completed


Looks fine.
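The checks above rely on reading the printed output. As a sketch of how they could be made self-verifying, the following uses assertions on a small in-memory table. The function `find_min_spread` and the sample rows are hypothetical and not taken from weather.csv or football.csv.

```python
import io

import pandas as pd


def find_min_spread(df, ident, col_a, col_b):
    """Return the `ident` value of the row where col_a - col_b is smallest."""
    spread = df[col_a] - df[col_b]
    return df.loc[spread.idxmin(), ident]


# Made-up sample data that only mimics the football column names.
sample = io.StringIO("Team,Goals,Goals Allowed\nA,10,4\nB,7,6\nC,3,9\n")
df_sample = pd.read_csv(sample)

# The differences are 6, 1 and -6, so team C has the minimal (signed) difference.
assert find_min_spread(df_sample, "Team", "Goals", "Goals Allowed") == "C"

# A misspelled column name surfaces as a KeyError instead of a printed message.
try:
    find_min_spread(df_sample, "Team", "Goals Allowed", "Goals ")
except KeyError:
    print("KeyError raised as expected")
```

Letting the exception propagate is a design choice: the caller can then decide how to react, rather than receiving `None` without noticing.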

In this particular example we didn't have to refactor our code, as we realized pretty much from the start that we could wrap everything in one class, provided that it offers a way to set the path to read from and a way to choose the columns whose difference is to be minimized.

We could add some additional functionality, like checking whether the order of minuend and subtrahend is correct, but because this requires additional assumptions, we leave it as is.
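One possible way to sidestep the ordering question, assuming that "difference" is meant as an absolute distance, is to take the absolute value before searching for the minimum. This is only a sketch under that assumption; the function name `find_min_abs_spread` and the sample rows are made up for illustration.

```python
import pandas as pd


def find_min_abs_spread(df, ident, col_a, col_b):
    """Return the `ident` value of the row where |col_a - col_b| is smallest."""
    spread = (df[col_a] - df[col_b]).abs()
    return df.loc[spread.idxmin(), ident]


# Made-up sample data that only mimics the football column names.
df_sample = pd.DataFrame(
    {"Team": ["A", "B", "C"], "Goals": [10, 7, 3], "Goals Allowed": [4, 6, 9]}
)

# With the absolute value, swapping minuend and subtrahend gives the same answer.
forward = find_min_abs_spread(df_sample, "Team", "Goals", "Goals Allowed")
backward = find_min_abs_spread(df_sample, "Team", "Goals Allowed", "Goals")
print(forward, backward)
```

Whether the signed or the absolute difference is the right quantity depends on the task statement, so this remains an assumption rather than a fix.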