### Calculating Median Stack

Write a function called list_stats that takes a list of numbers and returns a tuple of the median and mean of the list (in this order).

The function should work on lists with even or odd numbers of elements and handle the case of a one-element list.

Your solution cannot use the built-in statistics module.

In [1]:
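def list_stats(values):
  N = len(values)
  if N == 0:
    return None   # median and mean are undefined for an empty list

  # mean
  mean = sum(values)/N

  # median
  ordered = sorted(values)   # sort a copy so the caller's list is not modified
  mid = N // 2               # index of the upper middle element
  if N % 2 == 0:             # even length: average the two middle values
    median = (ordered[mid - 1] + ordered[mid])/2
  else:                      # odd length: take the middle value
    median = ordered[mid]

  return median, mean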


# You can use this to test your function.
# Any code inside this `if` statement will be ignored by the automarker.
if __name__ == '__main__':
  # Run your function with the first example in the question.
  m = list_stats([1.3, 2.4, 20.6, 0.95, 3.1, 2.7])
  print(m)

  # Run your function with the second example in the question
  m = list_stats([1.5])
  print(m)

(2.55, 5.175)
(1.5, 1.5)
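
The statistics module is off limits inside the solution, but it is still handy for checking one. The sketch below restates the function compactly so that the block runs on its own, then compares it with statistics.median and statistics.mean on random lists of odd and even length. The seed and the list lengths are arbitrary choices made for illustration.

```python
import math
import random
import statistics

def list_stats(values):
    """Compact restatement of the solution; sorts a copy of the input."""
    n = len(values)
    if n == 0:
        return None
    ordered = sorted(values)
    mid = n // 2
    if n % 2 == 0:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    else:
        median = ordered[mid]
    return median, sum(values) / n

random.seed(0)  # arbitrary seed, only to make the check repeatable
for n in (1, 2, 5, 10, 101):
    data = [random.random() for _ in range(n)]
    median, mean = list_stats(data)
    # isclose allows for tiny floating-point differences in the mean
    assert math.isclose(median, statistics.median(data))
    assert math.isclose(mean, statistics.mean(data))
print('list_stats agrees with the statistics module')
```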


Write a time_stat function to time our statistic implementations.

time_stat should take three arguments: the func function we're timing, the size of the random array to test, and the number of experiments to perform. It should return the average running time for the func function.

We have provided a skeleton time_stat function to show you how func should be called. You should add timing code to this function.

The time for creating new random arrays for each experiment should not be included in the running time.

In [2]:
import numpy as np
import statistics
import time

def time_stat(func, size, ntrials):
  total = 0
  for i in range(ntrials):
    # generate a new random array for each trial; this is not timed
    data = np.random.rand(size)

    # time only the call to func
    start = time.perf_counter()
    res = func(data)
    total += time.perf_counter() - start

  # return the average run time
  return total/ntrials

if __name__ == '__main__':
  print('{:.6f}s for statistics.mean'.format(time_stat(statistics.mean, 10**5, 10)))
  print('{:.6f}s for np.mean'.format(time_stat(np.mean, 10**5, 1000)))


0.283361s for statistics.mean
0.000101s for np.mean
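
time_stat passes func a NumPy array. To time functions that are written for plain Python lists, one option is a variant that builds a list instead. The sketch below uses only the standard library; the name time_stat_list is hypothetical and not part of the exercise. As in time_stat, the data is generated before the timer starts, so setup is excluded from the measurement.

```python
import random
import statistics
import time

def time_stat_list(func, size, ntrials):
    """Hypothetical variant of time_stat that feeds func a plain Python list."""
    total = 0.0
    for _ in range(ntrials):
        # setup: build a fresh random list, outside the timed region
        data = [random.random() for _ in range(size)]
        start = time.perf_counter()
        func(data)
        total += time.perf_counter() - start
    # average running time per trial, in seconds
    return total / ntrials

if __name__ == '__main__':
    print('{:.6f}s for statistics.median'.format(time_stat_list(statistics.median, 10**4, 10)))
    print('{:.6f}s for sorted'.format(time_stat_list(sorted, 10**4, 10)))
```

The printed timings depend on the machine, so no expected output is shown.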


Write a median_fits function which takes a list of FITS filenames, loads them into a NumPy array, and calculates the median image (where each pixel is the median of that pixel over every FITS file).

Your function should return a tuple of the median NumPy array, the time it took the function to run, and the amount of memory (in kB) used to store all the FITS files in the NumPy array in memory.

The running time should include loading the FITS files and calculating the median.

In [5]:
import numpy as np   #for median
from astropy.io import fits  #for loading fits file
import time          #for calculating time

def median_fits(filenames):
  start = time.perf_counter()  #start timer
  
  #read all files and store in list
  fits_list = []
  for file in filenames:
    hdulist = fits.open(file)
    fits_list.append(hdulist[0].data)
    hdulist.close()
    
  # Stack image arrays in 3D array for median calculation
  fits_stack = np.dstack(fits_list)
  median = np.median(fits_stack, axis=2)

  # stop the timer once loading and the median calculation are done
  total_time = time.perf_counter() - start

  # memory used by the stacked data, converted from bytes to kB
  memory = fits_stack.nbytes/1024

  return median, total_time, memory
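
The stacking step can be tried without any FITS files. In the sketch below, three small synthetic 2x2 arrays (invented for illustration) stand in for the hdulist[0].data images. It shows the shape produced by np.dstack, the per-pixel median along the third axis, and the memory calculation.

```python
import numpy as np

# Synthetic stand-ins for three FITS images; 10.0 and 40.0 play the role of outliers
images = [
    np.array([[1.0, 10.0], [3.0, 4.0]]),
    np.array([[5.0, 2.0], [7.0, 40.0]]),
    np.array([[9.0, 6.0], [11.0, 8.0]]),
]

stack = np.dstack(images)           # images are stacked along a new third axis
median = np.median(stack, axis=2)   # median over the three images at each pixel
memory_kb = stack.nbytes / 1024     # 12 float64 values * 8 bytes = 96 bytes

print(stack.shape)   # (2, 2, 3)
print(median)        # pixel values 5, 6, 7, 8
print(memory_kb)     # 0.09375
```

The outlier values do not appear in the result, which is the usual motivation for taking a median rather than a mean across a stack of images.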

Let's implement the binapprox algorithm to calculate the median of a list of numbers. This algorithm is quite complex, so we'll break it down into manageable parts.

Your task is to write two functions:

1. median_bins to calculate the mean, standard deviation and the bins (steps 1-6 on the previous slide).
2. median_approx, which calls median_bins and then calculates the approximated median (steps 7-8).

We will test each function separately.

median_bins(values, B) :
This function takes a list of values and the number of bins, B, and returns the mean μ and standard deviation σ of the values, the number of values smaller than μ−σ, and a NumPy array with B elements containing the bin counts.

median_approx(values, B) :
This function takes the same input as median_bins. It should return the approximate median using median_bins to calculate the bins. Using the same data as above, it should work like this:

In [11]:
# Write your median_bins and median_approx functions here.
import numpy as np

def median_bins(values, B):
  mean = np.mean(values)
  std = np.std(values)
  
  #creating bins
  left_bin = 0    #stores values less than (mean - std)
  bins = np.zeros(B)   #setting bins with counts as zero
  bin_width = 2*std/B
   
  #Bin values
  for value in values:
    if value < mean - std:
      left_bin += 1
    elif value < mean + std:
      # index of the bin that contains this value
      bin_index = int((value - (mean - std)) / bin_width)
      bins[bin_index] += 1
    # values >= mean + std are ignored

  return mean, std, left_bin, bins

def median_approx(values, B):
  mean, std, left_bin, bins = median_bins(values, B)
  
  #position of middle element
  N = len(values)
  mid = (N + 1)/2
  
  count = left_bin
  for b, bincount in enumerate(bins):
    count += bincount
    
    # Stop when the cumulative count reaches the midpoint
    if count >= mid:
      break
  
  bin_width = 2*std/B
  median = mean - std + bin_width*(b + 0.5)
  return median

print(median_bins([1, 1, 3, 2, 2, 6], 3))
print(median_approx([1, 1, 3, 2, 2, 6], 3))

print(median_bins([1, 5, 7, 7, 3, 6, 1, 1], 4))
print(median_approx([1, 5, 7, 7, 3, 6, 1, 1], 4))


(2.5, 1.707825127659933, 0, array([2., 3., 0.]))
2.5
(3.875, 2.521780125229002, 3, array([0., 1., 1., 1.]))
4.50544503130725
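
To see how close the approximation gets, the result can be compared with the exact median. The sketch below is a pure-Python restatement of the two functions above, so it runs without NumPy. The name binapprox_median and the clamp on the bin index are additions made for this illustration, and the code assumes the standard deviation is non-zero.

```python
import math
import statistics

def binapprox_median(values, B):
    """Pure-Python sketch of the binapprox steps; assumes std > 0."""
    n = len(values)
    mean = sum(values) / n
    # population standard deviation, matching the default of np.std
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    low = mean - std
    width = 2 * std / B

    left = 0            # number of values below mean - std
    bins = [0] * B      # counts for the B bins covering [mean - std, mean + std)
    for v in values:
        if v < low:
            left += 1
        elif v < mean + std:
            # clamp guards against floating-point round-up at the top edge
            idx = min(int((v - low) / width), B - 1)
            bins[idx] += 1

    mid = (n + 1) / 2
    count = left
    for b, bincount in enumerate(bins):
        count += bincount
        if count >= mid:   # cumulative count has reached the midpoint
            break
    return low + width * (b + 0.5)   # midpoint of the selected bin

for data, B in (([1, 1, 3, 2, 2, 6], 3), ([1, 5, 7, 7, 3, 6, 1, 1], 4)):
    approx = binapprox_median(data, B)
    exact = statistics.median(data)
    print('approx = {:.4f}, exact = {}, difference = {:.4f}'.format(
        approx, exact, abs(approx - exact)))
```

For these two lists the exact medians are 2 and 4, so the approximation is off by roughly 0.5 in both cases, which is a little less than half a bin width. Increasing B narrows the bins.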
