## Let's get the quartiles! I'll try a few different approaches.


### `1` - This approach is quite bulky, but I tried to make the code as self-explanatory as possible.
(But I think it's freaking overkill... There are simplified versions below. :))

```python
# Helper functions: not strictly necessary, they may add readability (or confusion)
def convert_to_element_number(index):
    return index + 1  # 0-based index -> 1-based element number

def convert_to_index(element_number):
    return element_number - 1  # 1-based element number -> 0-based index

def is_even(number):
    return number % 2 == 0
```

The function `get_median(sorted_dataset)` does exactly what its name says. :) But it requires a sorted dataset...<br>(And the way we coded it is kind of overkill, with so many variables trying to explain sooo much.)

```python
def get_median(sorted_dataset):

    length_of_the_dataset = len(sorted_dataset)
    # an empty dataset has no median
    assert length_of_the_dataset > 0, "the dataset must not be empty"

    if is_even(length_of_the_dataset):
        # 1-based element numbers of the two central values
        nr_of_first_number = length_of_the_dataset // 2
        nr_of_second_number = length_of_the_dataset // 2 + 1

        index_of_first_number = convert_to_index(nr_of_first_number)
        index_of_second_number = convert_to_index(nr_of_second_number)

        first_number = sorted_dataset[index_of_first_number]
        second_number = sorted_dataset[index_of_second_number]

        median = (first_number + second_number) / 2

    else:
        # 1-based element number of the single central value
        nr_of_central_value = length_of_the_dataset // 2 + 1
        index_of_central_value = convert_to_index(nr_of_central_value)
        central_value = sorted_dataset[index_of_central_value]

        median = central_value

    return median
```
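All that element-number bookkeeping boils down to one rule: for `n` sorted values, the median is at 0-based index `n // 2` when `n` is odd, and it is the mean of indices `n // 2 - 1` and `n // 2` when `n` is even. Here is a compact sketch of that rule, cross-checked against the standard library's `statistics.median`, which uses the same convention. `compact_median` is a hypothetical name and is not part of the notebook:

```python
import statistics

def compact_median(sorted_values):
    """Median of an already sorted, non-empty list (hypothetical helper)."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("the dataset must not be empty")
    middle = n // 2
    if n % 2 == 0:
        # even length: mean of the two central values (indices middle-1 and middle)
        return (sorted_values[middle - 1] + sorted_values[middle]) / 2
    # odd length: the single central value
    return sorted_values[middle]

# the same answers as the standard library on a few sorted samples
for sample in ([5], [1, 3], [1, 2, 3], [1, 1, 2, 2, 3, 4], [2, 4, 4, 5, 7, 9]):
    assert compact_median(sample) == statistics.median(sample)

print(compact_median([1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10]))  # 4
```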

The function `divide_by_median(sorted_dataset)` splits the dataset at its median, but again the dataset has to be sorted... We don't love "buts"...<br>(We're not gonna use it, but it's one possible way to divide the dataset into subsets.)

```python
def divide_by_median(sorted_dataset):

    length_of_the_dataset = len(sorted_dataset)

    median = get_median(sorted_dataset)

    data_subset_1 = []  # values <= median
    data_subset_2 = []  # values > median

    for index in range(0, length_of_the_dataset):

        if sorted_dataset[index] <= median:
            data_subset_1.append(sorted_dataset[index])

        else:
            data_subset_2.append(sorted_dataset[index])

    return data_subset_1, data_subset_2
```
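Because the comparison is `<=`, the median itself (and every value equal to it) always lands in the first subset. A minimal sketch of the same split written with list comprehensions makes this visible on small examples. `split_at` is a hypothetical name:

```python
def split_at(values, threshold):
    """Split values into (<= threshold, > threshold), preserving their order."""
    lower = [v for v in values if v <= threshold]
    upper = [v for v in values if v > threshold]
    return lower, upper

# the median of [1, 2, 3, 4, 5] is 3, and it goes to the first subset only
print(split_at([1, 2, 3, 4, 5], 3))  # ([1, 2, 3], [4, 5])
# values tied with the median also all go to the first subset
print(split_at([1, 3, 3, 3, 5], 3))  # ([1, 3, 3, 3], [5])
```

So for odd-length data the two subsets never have the same size.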

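We will use the function `divide_sorted_set_by_number(sorted_dataset, number)` instead, since it's more versatile: it can split the dataset at any value, not just at the median.<br>But here we also have a "but" - the dataset has to be sorted! (Strictly speaking, the split itself works on any list, but the subsets keep the order of the input, and `get_median` needs them sorted.)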

```python
def divide_sorted_set_by_number(sorted_dataset, number):

    length_of_the_dataset = len(sorted_dataset)

    data_subset_1 = []  # values <= number
    data_subset_2 = []  # values > number

    for index in range(0, length_of_the_dataset):

        if sorted_dataset[index] <= number:
            data_subset_1.append(sorted_dataset[index])

        else:
            data_subset_2.append(sorted_dataset[index])

    return data_subset_1, data_subset_2
```
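Since `divide_sorted_set_by_number` already requires sorted input, it could take advantage of that instead of visiting every element: a binary search finds the split point in O(log n) comparisons. Here is a sketch using the standard library's `bisect.bisect_right`, which returns the index just past the last element that is `<=` the given number. `divide_with_bisect` is a hypothetical name:

```python
from bisect import bisect_right

def divide_with_bisect(sorted_dataset, number):
    """Split a sorted list into (values <= number, values > number)."""
    # index of the first element that is strictly greater than `number`
    split = bisect_right(sorted_dataset, number)
    return sorted_dataset[:split], sorted_dataset[split:]

print(divide_with_bisect([1, 1, 2, 3, 4, 4, 5], 4))  # ([1, 1, 2, 3, 4, 4], [5])
```

The slices still copy the elements, so the whole split remains O(n), but the comparisons drop to O(log n). The trade-off is that, unlike the loop, it silently returns a wrong split when the input is not sorted.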

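Putting it all together, here is the actual algorithm built from the functions defined above. (It needs at least one value greater than the median: for a dataset like `[3, 3, 3]` the second subset is empty, and `get_median` fails on it.)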

```python
def get_quartiles(numerical_dataset):

    assert len(numerical_dataset) > 0, "the dataset must not be empty"

    sorted_set = sorted(numerical_dataset)

    quartile_2 = get_median(sorted_set)
    subset_1, subset_2 = divide_sorted_set_by_number(sorted_set, quartile_2)

    quartile_1 = get_median(subset_1)
    quartile_3 = get_median(subset_2)

    return quartile_1, quartile_2, quartile_3
```
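Quartiles have several competing definitions, and this "split at the median" method is only one of them. The standard library's `statistics.quantiles` (Python 3.8+) uses an interpolating convention by default (`method='exclusive'`). Here is a sketch that compares the two. `split_quartiles` is a hypothetical compact restatement of `get_quartiles`, not the notebook's own code:

```python
import statistics

def _median(sorted_values):
    """Median of a sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("cannot take the median of an empty list")
    middle = len(sorted_values) // 2
    if len(sorted_values) % 2 == 0:
        return (sorted_values[middle - 1] + sorted_values[middle]) / 2
    return sorted_values[middle]

def split_quartiles(dataset):
    """Same rule as get_quartiles: Q2 is the median, values <= Q2 form the lower half."""
    ordered = sorted(dataset)
    q2 = _median(ordered)
    lower = [v for v in ordered if v <= q2]
    upper = [v for v in ordered if v > q2]
    return _median(lower), q2, _median(upper)

sample = [1, 7, 8, 2, 3, 6, 9, 4, 5, 10, 1, 2, 1, 4, 5]
print(split_quartiles(sample))            # (2.0, 4, 7)
print(statistics.quantiles(sample, n=4))  # [2.0, 4.0, 7.0]

small = [1, 2, 3, 4, 5]
print(split_quartiles(small))             # (2, 3, 4.5)
print(statistics.quantiles(small, n=4))   # [1.5, 3.0, 4.5]
```

The two agree on the notebook's sample but not on `[1, 2, 3, 4, 5]`: putting the median into the lower half makes Q1 come out as 2 instead of 1.5. Neither answer is "wrong". They follow different conventions, so it's worth stating which one you use.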

```python
get_quartiles([1,7,8,2,3,6,9,4,5,10,1,2,1,4,5])
```

```
(2.0, 4, 7)
```

#### Profit!

### `2` - Now let's do all of this in a more fashionable way. Fewer lines, more profit. :]

This is a cool way to look at the problem - every call computes both the median and the two subsets.<br>But it's not the best solution in terms of performance: every call sorts its input and builds both subsets, even when we only need the median.<br>(You'll see what I mean in the function `quartiles(dataset)`.)

```python
def get_median_and_divide(dataset):

    ds_length = len(dataset)
    assert ds_length > 0, "the dataset must not be empty"

    # sorted() returns a new list, so the caller's list is left untouched
    dataset = sorted(dataset)

    middle = ds_length // 2
    median = dataset[middle]

    if is_even(ds_length):
        median = (median + dataset[middle - 1]) / 2

    lower = [number for number in dataset if number <= median]
    upper = [number for number in dataset if number > median]
    return lower, upper, median
```

In the function below you can notice that we're ignoring two of the values returned by `get_median_and_divide(dataset)`.<br>We assign them to the throwaway name `_`. So we're basically performing some redundant sorts and list comprehensions.

```python
def quartiles(dataset):

    assert len(dataset) > 0, "the dataset must not be empty"

    subset_1, subset_2, quartile_2 = get_median_and_divide(dataset)

    # only the medians of the subsets are needed; their own subsets are discarded
    _, _, quartile_1 = get_median_and_divide(subset_1)
    _, _, quartile_3 = get_median_and_divide(subset_2)

    return quartile_1, quartile_2, quartile_3
```

```python
quartiles([1,7,8,2,3,6,9,4,5,10,1,2,1,4,5])
```

```
(2.0, 4, 7)
```

#### Profit!

### `3` - We can also speed this up a bit by splitting the function we used before into 2 separate functions.
(We'll use the helper function defined before, `is_even(number)`, to improve readability.)

```python
def median(dataset):

    ds_length = len(dataset)
    assert ds_length > 0, "the dataset must not be empty"

    sorted_dataset = sorted(dataset)  # (**) notice this for later reference

    middle = ds_length // 2
    middle_value = sorted_dataset[middle]

    # even length: average the two central values; odd length: take the central one
    return (middle_value + sorted_dataset[middle - 1]) / 2 if is_even(ds_length) else middle_value
```

`divide(dataset, division_value)` doesn't care whether the given dataset is sorted, so we can safely pass it an unsorted one.

```python
def divide(dataset, division_value):

    lower = [number for number in dataset if number <= division_value]
    upper = [number for number in dataset if number > division_value]
    return lower, upper
```

And since our `median(dataset)` function sorts the data (**) before picking the middle value, everything works fine here:

```python
def quartiles_(dataset):

    assert len(dataset) > 0, "the dataset must not be empty"

    quartile_2 = median(dataset)
    subset_1, subset_2 = divide(dataset, quartile_2)

    return median(subset_1), quartile_2, median(subset_2)
```

```python
quartiles_([1,7,8,2,3,6,9,4,5,10,1,2,1,4,5])
```

```
(2.0, 4, 7)
```
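All three approaches should produce the same quartiles, so it pays to test the building blocks. Here is a minimal sketch of a randomized check against the standard library's `statistics.median`. `median_under_test` and `check_against_stdlib` are hypothetical names. The former follows the same even/odd rule as `get_median`, and you could swap in any of the notebook's median functions. A swapped even/odd branch, for example, would very likely be caught within the first few trials:

```python
import random
import statistics

def median_under_test(dataset):
    """Median of a non-empty dataset; works on a sorted copy, input left untouched."""
    if not dataset:
        raise ValueError("the dataset must not be empty")
    ordered = sorted(dataset)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 0:
        # even length: mean of the two central values
        return (ordered[middle - 1] + ordered[middle]) / 2
    # odd length: the central value
    return ordered[middle]

def check_against_stdlib(trials=1000, seed=0):
    rng = random.Random(seed)  # fixed seed, so the check is reproducible
    for _ in range(trials):
        # random lengths 1..15 cover both even and odd cases, with many ties
        data = [rng.randint(0, 20) for _ in range(rng.randint(1, 15))]
        assert median_under_test(data) == statistics.median(data), data
    return True

check_against_stdlib()
```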

## So there are many ways of achieving the same task. Pick your favourite! :D