# Documentation and Testing

## Example 1

```python
def remove_extremes(ls: list, key=None) -> list:
    """
    Remove the extreme elements of a list, i.e. one minimum and one maximum.
    If the list has fewer than three elements, it is returned sorted
    with nothing removed.
    A key function can be provided for a custom sort order.

    Parameters:
        ls:  list of elements; the elements must be comparable with each other
        key: function to customize the sort order, as in `sorted`

    Returns:
        res: sorted list of the elements without the extremes
    """
    ls_sorted = sorted(ls, key=key)
    if len(ls_sorted) < 3:
        # Removing both extremes would leave nothing, so keep all elements
        return ls_sorted
    res = ls_sorted[1:-1]  # drop the first (minimum) and last (maximum) element
    return res
```

### Show documentation

![Output of help(remove_extremes)](help_remove_extremes.png)

```python
help(remove_extremes)
```

```
Help on function remove_extremes in module __main__:

remove_extremes(ls: list, key=None) -> list
    Remove the extreme elements of a list, i.e. one minimum and one maximum.
    If the list has fewer than three elements, it is returned sorted
    with nothing removed.
    A key function can be provided for a custom sort order.

    Parameters:
        ls:  list of elements; the elements must be comparable with each other
        key: function to customize the sort order, as in `sorted`

    Returns:
        res: sorted list of the elements without the extremes
```
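A docstring is shown by `help()`, and it can also hold usage examples that double as tests. The sketch below uses the standard-library `doctest` module. The function is a self-contained copy of `remove_extremes` with the same behaviour, and its `>>>` examples are taken from the test cases in this notebook. `doctest.testmod()` runs every example in the module and prints nothing when all of them pass.

```python
import doctest


def remove_extremes(ls: list, key=None) -> list:
    """
    Remove one minimum and one maximum element; return the rest sorted.

    Lists with fewer than three elements are returned sorted, with nothing removed.

    >>> remove_extremes([13, 345, 7, 89, 2, 0, 34, 103])
    [2, 7, 13, 34, 89, 103]
    >>> remove_extremes([78, 67])
    [67, 78]
    >>> remove_extremes(['zz', 'zzzzzz', 'z', 'zzz', 'zzzz'], key=len)
    ['zz', 'zzz', 'zzzz']
    """
    ls_sorted = sorted(ls, key=key)
    if len(ls_sorted) < 3:
        return ls_sorted
    return ls_sorted[1:-1]


if __name__ == "__main__":
    # Runs every >>> example found in this module's docstrings;
    # prints only the examples whose output does not match.
    doctest.testmod()
```

The docstring examples go stale as soon as the behaviour changes, so running them regularly is what makes the documentation trustworthy.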



### Test

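The simplest approach is a plain `assert` statement. It raises an `AssertionError` as soon as its condition is false, and the optional message after the comma becomes the error text. Python skips `assert` statements when it runs with the `-O` flag, so they suit notebooks and test code rather than validating input in production.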

```python
x1 = [13, 345, 7, 89, 2, 0, 34, 103]
x2 = [78, 67]
x3 = ['Hello', 'Zoo', 'Zeppelin', 'Albert', 'Bravo', 'Gustave']
x4 = ['zz', 'zzzzzz', 'z', 'zzz', 'zzzz']

assert remove_extremes(x1) == [2, 7, 13, 34, 89, 103], 'Test 1: "List of integers" failed'
assert remove_extremes(x2) == [67, 78], 'Test 2: "Only two elements" failed'
assert remove_extremes(x3) == ['Bravo', 'Gustave', 'Hello', 'Zeppelin'], 'Test 3: "List of strings" failed'
assert remove_extremes(x4, key=lambda s: len(s)) == ['zz', 'zzz', 'zzzz'], 'Test 4: "Custom sort key" failed'
# Shorter version of Test 4: pass len itself instead of wrapping it in a lambda
assert remove_extremes(x4, key=len) == ['zz', 'zzz', 'zzzz'], 'Test 4b: "Custom sort key" failed'
```
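Bare `assert` statements stop at the first failure. The same four checks can be written as a `unittest.TestCase` from the standard library, which runs every case and reports each failure by method name. This is one possible arrangement. The class and method names are illustrative, and the function is repeated so the block runs on its own.

```python
import unittest


def remove_extremes(ls: list, key=None) -> list:
    """Return ls sorted, without one minimum and one maximum (if it has >= 3 elements)."""
    ls_sorted = sorted(ls, key=key)
    return ls_sorted if len(ls_sorted) < 3 else ls_sorted[1:-1]


class TestRemoveExtremes(unittest.TestCase):
    def test_list_of_integers(self):
        x1 = [13, 345, 7, 89, 2, 0, 34, 103]
        self.assertEqual(remove_extremes(x1), [2, 7, 13, 34, 89, 103])

    def test_only_two_elements(self):
        self.assertEqual(remove_extremes([78, 67]), [67, 78])

    def test_list_of_strings(self):
        x3 = ['Hello', 'Zoo', 'Zeppelin', 'Albert', 'Bravo', 'Gustave']
        self.assertEqual(remove_extremes(x3), ['Bravo', 'Gustave', 'Hello', 'Zeppelin'])

    def test_custom_key(self):
        x4 = ['zz', 'zzzzzz', 'z', 'zzz', 'zzzz']
        self.assertEqual(remove_extremes(x4, key=len), ['zz', 'zzz', 'zzzz'])


if __name__ == "__main__":
    # A dummy argv keeps unittest from parsing Jupyter's command-line
    # arguments, and exit=False keeps it from stopping the kernel.
    unittest.main(argv=["ignored"], exit=False)
```

Outside a notebook, the same test class usually lives in its own `test_*.py` file and runs with `python -m unittest`.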

```
! ls ../data
```

```
100_BT_Records.csv      customers.csv           orders.csv
100_CC_Records.csv      employees.csv           payments.csv
100_Records.csv         mysqlsampledatabase.sql productlines.csv
100_Sales_Records.csv   offices.csv             products.csv
README.md               orderdetails.csv
```


```python
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported explicitly here

employees_df = pd.read_csv('../data/employees.csv')
print(f"Number of records: {employees_df.shape[0]:,}")
display(employees_df.dtypes)
display(employees_df.head())
```

```
Number of records: 23
```

```
employeeNumber      int64
lastName           object
firstName          object
extension          object
email              object
officeCode          int64
reportsTo         float64
jobTitle           object
dtype: object
```

| | employeeNumber | lastName | firstName | extension | email | officeCode | reportsTo | jobTitle |
|---|---|---|---|---|---|---|---|---|
| 0 | 1002 | Murphy | Diane | x5800 | dmurphy@classicmodelcars.com | 1 | | President |
| 1 | 1056 | Patterson | Mary | x4611 | mpatterso@classicmodelcars.com | 1 | 1002.0 | VP Sales |
| 2 | 1076 | Firrelli | Jeff | x9273 | jfirrelli@classicmodelcars.com | 1 | 1002.0 | VP Marketing |
| 3 | 1088 | Patterson | William | x4871 | wpatterson@classicmodelcars.com | 6 | 1056.0 | Sales Manager (APAC) |
| 4 | 1102 | Bondur | Gerard | x5408 | gbondur@classicmodelcars.com | 4 | 1056.0 | Sale Manager (EMEA) |
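The same `assert` style can check the data itself. The sketch below rebuilds the five rows shown above in memory under the hypothetical name `sample_df`, so it runs without `../data/employees.csv` and does not overwrite `employees_df`. It checks two assumptions about the table: `employeeNumber` is a unique key, and every non-missing `reportsTo` points to an existing employee. The President has no manager, and that missing value is also why `reportsTo` has dtype `float64`: by default pandas stores an integer column that contains `NaN` as floats.

```python
import pandas as pd

# Hypothetical in-memory copy of the five employees shown above
# (only the columns these checks need).
sample_df = pd.DataFrame({
    "employeeNumber": [1002, 1056, 1076, 1088, 1102],
    "lastName": ["Murphy", "Patterson", "Firrelli", "Patterson", "Bondur"],
    "reportsTo": [float("nan"), 1002, 1002, 1056, 1056],
})

# Primary key: no employee number may appear twice.
assert sample_df["employeeNumber"].is_unique, "Duplicate employeeNumber"

# Self-reference: every manager must be an employee. The President has no
# manager (NaN), so missing values are dropped before the check.
managers = sample_df["reportsTo"].dropna().astype("int64")
assert managers.isin(sample_df["employeeNumber"]).all(), "Unknown manager in reportsTo"

# NaN forces the integer IDs to float64; the nullable "Int64" dtype keeps integers.
print(sample_df["reportsTo"].dtype)                   # float64
print(sample_df["reportsTo"].astype("Int64").dtype)  # Int64
```

With the real file, the same nullable type can be requested at load time with `pd.read_csv('../data/employees.csv', dtype={'reportsTo': 'Int64'})`.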


```python
offices_df = pd.read_csv('../data/offices.csv')
print(f"Number of records: {offices_df.shape[0]:,}")
display(offices_df.dtypes)
display(offices_df.head())
```

```
Number of records: 7
```

```
officeCode       int64
city            object
phone           object
addressLine1    object
addressLine2    object
state           object
country         object
postalCode      object
territory       object
dtype: object
```

| | officeCode | city | phone | addressLine1 | addressLine2 | state | country | postalCode | territory |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Francisco | +1 650 219 4782 | 100 Market Street | Suite 300 | CA | USA | 94080 | |
| 1 | 2 | Boston | +1 215 837 0825 | 1550 Court Place | Suite 102 | MA | USA | 02107 | |
| 2 | 3 | NYC | +1 212 555 3000 | 523 East 53rd Street | apt. 5A | NY | USA | 10022 | |
| 3 | 4 | Paris | +33 14 723 4404 | 43 Rue Jouffroy D'abbans | | | France | 75017 | EMEA |
| 4 | 5 | Tokyo | +81 33 224 5000 | 4-1 Kioicho | | Chiyoda-Ku | Japan | 102-8578 | Japan |
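`postalCode` is read as `object`, so Boston's `02107` keeps its leading zero. That only happens because other codes in the column, such as `102-8578`, are not numeric. The sketch below uses a hypothetical two-row CSV in which every code looks like a number. It shows what pandas type inference does in that case, and how pinning the column type with the `dtype` argument of `read_csv` avoids it.

```python
import io

import pandas as pd

# Hypothetical CSV with only US postal codes, all of them digits.
csv_text = "officeCode,city,postalCode\n1,San Francisco,94080\n2,Boston,02107\n"

# Type inference turns the column into int64 and silently drops the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["postalCode"].tolist())  # [94080, 2107]

# Declaring the column as text keeps the codes exactly as written.
pinned = pd.read_csv(io.StringIO(csv_text), dtype={"postalCode": str})
print(pinned["postalCode"].tolist())    # ['94080', '02107']
```

Passing the same `dtype={'postalCode': str}` to `pd.read_csv('../data/offices.csv', ...)` makes the column type independent of which rows happen to be in the file.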


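Let's join the two tables on `officeCode`, so that each employee row is extended with the details of its office. `DataFrame.join` matches the `on` column of the left table against the index of the right table, which is why `offices_df` is first indexed by `officeCode`. By default it performs a left join.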

```python
joined_df = employees_df.join(offices_df.set_index('officeCode'), on='officeCode')
joined_df.head(3).T
```

| | 0 | 1 | 2 |
|---|---|---|---|
| employeeNumber | 1002 | 1056 | 1076 |
| lastName | Murphy | Patterson | Firrelli |
| firstName | Diane | Mary | Jeff |
| extension | x5800 | x4611 | x9273 |
| email | dmurphy@classicmodelcars.com | mpatterso@classicmodelcars.com | jfirrelli@classicmodelcars.com |
| officeCode | 1 | 1 | 1 |
| reportsTo | | 1002 | 1002 |
| jobTitle | President | VP Sales | VP Marketing |
| city | San Francisco | San Francisco | San Francisco |
| phone | +1 650 219 4782 | +1 650 219 4782 | +1 650 219 4782 |


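Let's assume every employee is assigned to exactly one office. The left join keeps every row of `employees_df`, and each `officeCode` should appear only once in `offices_df`. The joined table should therefore have exactly as many rows as `employees_df`. A duplicated office code would repeat every employee of that office and break this equality.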

```python
assert employees_df.shape[0] == joined_df.shape[0], "Mismatched number of rows"
```
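The sketch below shows what the row-count assertion protects against. It builds two hypothetical mini tables in which office code 1 is listed twice. The left join then duplicates every employee of that office, so the row count grows. `DataFrame.merge` can also enforce the key assumption directly: `validate="many_to_one"` raises `pandas.errors.MergeError` when the keys on the right are not unique. The helper name `checked_merge` is illustrative.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical mini tables: office 1 is (wrongly) listed twice.
employees = pd.DataFrame({"employeeNumber": [1002, 1056, 1088],
                          "officeCode": [1, 1, 6]})
offices = pd.DataFrame({"officeCode": [1, 1, 6],
                        "label": ["Office 1", "Office 1 (duplicate)", "Office 6"]})

# Same left join as in the notebook: both employees of office 1 match two rows.
joined = employees.join(offices.set_index("officeCode"), on="officeCode")
print(employees.shape[0], joined.shape[0])  # 3 5 -> the row-count assert would fail


def checked_merge(left, right):
    """Left-join on officeCode, raising MergeError if right-hand keys repeat."""
    return left.merge(right, on="officeCode", how="left", validate="many_to_one")


try:
    checked_merge(employees, offices)
except MergeError as err:
    print("rejected:", err)

# With unique office codes the join keeps exactly one row per employee.
clean = checked_merge(employees, offices.drop_duplicates(subset="officeCode"))
print(clean.shape[0])  # 3
```

The row-count assertion only detects the problem after the join. `validate` stops it at the join itself and names the cause.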

```python
print(employees_df.dtypes)
```

```
employeeNumber      int64
lastName           object
firstName          object
extension          object
email              object
officeCode          int64
reportsTo         float64
jobTitle           object
dtype: object
```
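Printing `dtypes` shows the schema but does not check it. The sketch below turns the listing above into a test by comparing it against an expected mapping. `EXPECTED_EMPLOYEE_DTYPES` and `check_schema` are hypothetical names. The expected strings are copied from the output above, so they reflect the pandas version that produced it; newer pandas releases may report text columns differently. The demo runs on a one-row frame built from the first employee, so it does not need the CSV file.

```python
import pandas as pd

# Expected schema, copied from the dtypes printed above.
EXPECTED_EMPLOYEE_DTYPES = {
    "employeeNumber": "int64",
    "lastName": "object",
    "firstName": "object",
    "extension": "object",
    "email": "object",
    "officeCode": "int64",
    "reportsTo": "float64",
    "jobTitle": "object",
}


def check_schema(df: pd.DataFrame, expected: dict) -> None:
    """Fail with an AssertionError if columns or dtypes differ from `expected`."""
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    assert list(actual) == list(expected), f"Columns differ: {list(actual)}"
    wrong = {col: actual[col] for col in expected if actual[col] != expected[col]}
    assert not wrong, f"Unexpected dtypes: {wrong}"


# Demo: the first employee only, cast to the expected types so no CSV is needed.
first_row = pd.DataFrame(
    [[1002, "Murphy", "Diane", "x5800", "dmurphy@classicmodelcars.com", 1, float("nan"), "President"]],
    columns=list(EXPECTED_EMPLOYEE_DTYPES),
).astype(EXPECTED_EMPLOYEE_DTYPES)
check_schema(first_row, EXPECTED_EMPLOYEE_DTYPES)  # passes silently

# A column whose type drifts (here officeCode read as float) is reported.
try:
    check_schema(first_row.astype({"officeCode": "float64"}), EXPECTED_EMPLOYEE_DTYPES)
except AssertionError as err:
    print(err)  # Unexpected dtypes: {'officeCode': 'float64'}
```

In the notebook itself, the call would be `check_schema(employees_df, EXPECTED_EMPLOYEE_DTYPES)` right after loading the file.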
