# Quartiles
A quartile is a type of quantile that divides the ordered data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic.
- The first quartile $(Q_1)$ is defined as the middle number between the smallest number ($minimum$) and the median of the data set. It is also known as the $lower$ quartile or $25^{th}$ percentile, as $25\%$ of the data lies below this point.
- The second quartile $(Q_2)$ is the median of the whole data set; thus $50\%$ of the data lies below this point.
- The third quartile $(Q_3)$ is the middle value between the median and the highest value ($maximum$) of the data set. It is also known as the $upper$ quartile or $75^{th}$ percentile, as $75\%$ of the data lies below this point.

$$minimum-----Q_1-----Q_2-----Q_3-----maximum$$

Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a $\text{five-number summary}$ of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartiles indicates how large the spread is and whether the dataset is $skewed$ toward one side. Because quartiles divide the number of data points evenly rather than the range of values, the distance between adjacent quartiles is generally not the same (i.e., $Q_3-Q_2$ need not equal $Q_2-Q_1$). The distance between the upper and lower quartiles, $Q_3-Q_1$, is known as the $\textbf{interquartile range (IQR)}$. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle $50\%$ of the data and the outer data points.
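As a minimal sketch of the five-number summary described above, the following uses only Python's standard library. It assumes the median-of-halves convention (when the number of points is odd, the overall median is left out of both halves) and at least two data points. The helper name `five_number_summary` and the sample data are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves convention."""
    values = sorted(data)             # quartiles require ordered data
    n = len(values)
    lower = values[: n // 2]          # points before the midpoint
    upper = values[(n + 1) // 2 :]    # points after the midpoint (middle point skipped if n is odd)
    return values[0], median(lower), median(values), median(upper), values[-1]

sample = [5, 7, 1, 4, 2, 9, 10]       # illustrative data
minimum, q1, q2, q3, maximum = five_number_summary(sample)
print("Five-number summary:", minimum, q1, q2, q3, maximum)

# Unequal gaps between adjacent quartiles hint at skew
print("Q2 - Q1 =", q2 - q1, "| Q3 - Q2 =", q3 - q2)
```

Comparing `Q2 - Q1` with `Q3 - Q2` is a quick way to see whether one side of the median is more spread out than the other.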

In descriptive statistics, the $\textbf{interquartile range (IQR)}$, also called the $midspread$, $middle\;50\%$, or $H\text{-}spread$, is a measure of $statistical\;dispersion$ equal to the difference between the $75^{th}$ and $25^{th}\;percentiles$:

$$IQR=Q_3-Q_1$$
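The formula above can be sketched in a few lines. The example also flags potential outliers with the $1.5 \times IQR$ rule, which is a common convention assumed here rather than something defined in this text. The function name `iqr_and_fences`, the multiplier `k`, and the sample data are illustrative.

```python
from statistics import median

def iqr_and_fences(data, k=1.5):
    """Return (Q1, Q3, IQR, lower fence, upper fence) from median-of-halves quartiles."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[: n // 2])          # median of the lower half
    q3 = median(values[(n + 1) // 2 :])    # median of the upper half
    iqr = q3 - q1
    # Assumed convention: points beyond k * IQR from the quartiles are flagged
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr

sample = [1, 2, 4, 5, 7, 9, 10, 40]        # illustrative data with one extreme value
q1, q3, iqr, low, high = iqr_and_fences(sample)
outliers = [x for x in sample if x < low or x > high]
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: [{low}, {high}], flagged points: {outliers}")
```

Because the IQR depends only on the middle $50\%$ of the data, the extreme value in the sample does not inflate it the way it would inflate the full range.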

<p align="center">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Boxplot_vs_PDF.svg/640px-Boxplot_vs_PDF.svg.png?1626778057933">
</p>


|Symbol|Names|Definition|
|:---:|:---:|:---:|
|$Q_1$|$25^{th}\;percentile$|splits off the lowest $25\%$ data from the highest $75\%$|
|$Q_2$|$50^{th}\;percentile$|splits dataset in half|
|$Q_3$|$75^{th}\;percentile$|splits off the highest $25\%$ data from the lowest $75\%$|
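The percentile view in the table can be checked with Python's standard library (`statistics.quantiles`, available from Python 3.8). This is only a sketch on illustrative data. It also shows that the exact cut points depend on the interpolation convention, so different tools may report slightly different quartiles for the same small dataset.

```python
from statistics import quantiles

sample = sorted([5, 7, 1, 4, 2, 9, 10])   # illustrative data

# n=4 asks for the three cut points that split the data into quarters.
# "exclusive" treats the data as a sample from a larger population;
# "inclusive" treats the data as the whole population.
exclusive = quantiles(sample, n=4, method="exclusive")
inclusive = quantiles(sample, n=4, method="inclusive")

for name, cuts in (("exclusive", exclusive), ("inclusive", inclusive)):
    # Fraction of points strictly below each cut (roughly 25%, 50%, 75%)
    below = [sum(x < c for x in sample) / len(sample) for c in cuts]
    print(name, cuts, [round(b, 2) for b in below])
```

With only seven points the fractions cannot land exactly on $25\%$, $50\%$, and $75\%$, which is why the definition above speaks of parts of "more-or-less equal size".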

In [1]:
import numpy as np

def quartiles(array):
    """Return (lower half, Q1, sorted array, Q2, upper half, Q3, IQR).

    Q1 and Q3 are the medians of the lower and upper halves. For an odd
    number of elements, the overall median is excluded from both halves.
    """
    print(f"The original array is {array}") # Comment this out for large datasets

    # Sort in ascending order (on a copy, so the caller's list is not modified)
    array = sorted(array)
    n = len(array)

    # Lower half of array
    array1 = array[:n // 2]

    # Upper half of array (skips the middle element when n is odd)
    array2 = array[(n + 1) // 2:]

    # Quartile values
    Q1 = np.median(array1)
    Q2 = np.median(array)
    Q3 = np.median(array2)
    IQR = Q3 - Q1

    # Alternatively, if you don't need the values afterwards, uncomment
    # the lines below to print everything from inside the function.
    # print(f"The sorted array is {array}")
    # print(f"The lower half consists of {array1}, and its Median: Q1 = {Q1}.")
    # print(f"The median of entire array {array} is Q2 = {Q2}.")
    # print(f"The upper half consists of {array2}, and its Median: Q3 = {Q3}.")
    # print(f"The interquartile range, IQR = {IQR}")

    return array1, Q1, array, Q2, array2, Q3, IQR



Testing the function on arrays with an odd and an even number of elements:

In [2]:
# Odd number of elements in array
array = [5,7,1,4,2,9,10]
array1,Q1,array,Q2,array2,Q3,IQR = quartiles(array)
print(f"The sorted array is {array}")
print(f"The lower half consists of {array1}, and its Median: Q1 = {Q1}.")
print(f"The median of entire array {array} is Q2 = {Q2}.")
print(f"The upper half consists of {array2}, and its Median: Q3 = {Q3}.")
print(f"The interquartile range, IQR = {IQR}")

The original array is [5, 7, 1, 4, 2, 9, 10]
The sorted array is [1, 2, 4, 5, 7, 9, 10]
The lower half consists of [1, 2, 4], and its Median: Q1 = 2.0.
The median of entire array [1, 2, 4, 5, 7, 9, 10] is Q2 = 5.0.
The upper half consists of [7, 9, 10], and its Median: Q3 = 9.0.
The interquartile range, IQR = 7.0


In [3]:
#  Even number of elements in array
a = [3,5,7,1,4,2,9,10]
array1,Q1,array,Q2,array2,Q3,IQR = quartiles(a)
print(f"The sorted array is {array}")
print(f"The lower half consists of {array1}, and its Median: Q1 = {Q1}.")
print(f"The median of entire array {array} is Q2 = {Q2}.")
print(f"The upper half consists of {array2}, and its Median: Q3 = {Q3}.")
print(f"The interquartile range, IQR = {IQR}")

The original array is [3, 5, 7, 1, 4, 2, 9, 10]
The sorted array is [1, 2, 3, 4, 5, 7, 9, 10]
The lower half consists of [1, 2, 3, 4], and its Median: Q1 = 2.5.
The median of entire array [1, 2, 3, 4, 5, 7, 9, 10] is Q2 = 4.5.
The upper half consists of [5, 7, 9, 10], and its Median: Q3 = 8.0.
The interquartile range, IQR = 5.5


In [4]:
# Test with different array
b = [3,7,8,5,12,14,21,13,18]
array1,Q1,array,Q2,array2,Q3,IQR = quartiles(b)
print(f"The sorted array is {array}")
print(f"The lower half consists of {array1}, and its Median: Q1 = {Q1}.")
print(f"The median of entire array {array} is Q2 = {Q2}.")
print(f"The upper half consists of {array2}, and its Median: Q3 = {Q3}.")
print(f"The interquartile range, IQR = {IQR}")

The original array is [3, 7, 8, 5, 12, 14, 21, 13, 18]
The sorted array is [3, 5, 7, 8, 12, 13, 14, 18, 21]
The lower half consists of [3, 5, 7, 8], and its Median: Q1 = 6.0.
The median of entire array [3, 5, 7, 8, 12, 13, 14, 18, 21] is Q2 = 12.0.
The upper half consists of [13, 14, 18, 21], and its Median: Q3 = 16.0.
The interquartile range, IQR = 10.0
