In [2]:
%load_ext rpy2.ipython




The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


---
title: "M2R_Parallel_Quicksort"
output: html_notebook
---
# Task 1: Compute confidence intervals for the data from M2R Parallel quicksort experiment

Data is contained in "measurements_03_47.csv"
First we read the data:

In [3]:
%%R 

data <- read.csv("measurements_03_47.csv")



Then we group the data in new tables, by the experiment types sequential, parallel, and built-in.

In [31]:
%%R
data_sequential = data[c(TRUE,FALSE, FALSE), ] 
data_parallel = data[c(FALSE,TRUE,FALSE), ]
data_built_in = data[c(FALSE, FALSE, TRUE), ]





Let's compute confidence intervals for the mean time for the different sizes:
1. for each size\
compute the sample mean time $S_{5} = \frac{1}{5} \times \sum_{i=1}^{5} x_i$\
compute the sample standard deviation $\sigma = \sum_{i=1}^{5} (x_i - \mu)^2 $
compute the 95% confidence interval = $[\mu - 2 \times \frac{\sigma}{\sqrt{n}} , \mu + 2 \times \frac{\sigma}{\sqrt{n}} ] $
   

In [35]:
%%R
library(dplyr)

data_sequential_mean = data_sequential %>% group_by(Size) %>% summarize(mean = sum(Time)/5) # 5x2 array holding Size and mean

replication_times = c(5, 5, 5, 5, 5) # to extend the mean array to 25x2
data_sequential_mean_long = data_sequential_mean[rep(row.names(data_sequential_mean), times = replication_times),] # 25x2 array holding Size and mean


data_sequential$squared_difference = (data_sequential_mean_long$mean - data_sequential$Time)^2



 [1] 4.000000e-14 4.000000e-14 6.400000e-13 4.000000e-14 4.000000e-14
 [6] 4.000000e-14 3.240000e-12 4.000000e-14 4.000000e-14 1.440000e-12
[11] 6.115240e-09 4.840000e-12 1.918440e-09 2.496400e-10 4.326400e-10
[16] 2.274064e-08 1.317904e-08 1.592644e-08 5.664400e-10 2.663424e-08
[21] 8.510056e-06 4.896484e-06 2.321120e-05 4.149936e-07 1.203535e-05


In [36]:
%%R
data_sequential_variance = data_sequential %>% group_by(Size) %>% summarize(variance = sum(squared_difference)/4)



# A tibble: 5 × 2
     Size variance
    <int>    <dbl>
1     100 2.  e-13
2    1000 1.20e-12
3   10000 2.18e- 9
4  100000 1.98e- 8
5 1000000 1.23e- 5


In [39]:
%%R
data_sequential_stdeviation = (data_sequential_variance$variance)^(1/2)  # list of the standard errors


Now make this list of standard deviations into a list of confidence intervals:

In [29]:
%%R
data_sequential_mean$lower_bound = data_sequential_mean$mean - 2*data_sequential_stdeviation/2
data_sequential_mean$upper_bound = data_sequential_mean$mean + 2*data_sequential_stdeviation/2
data_sequential_mean


# A tibble: 5 × 4
     Size      mean lower_bound upper_bound
    <int>     <dbl>       <dbl>       <dbl>
1     100 0.0000098  0.00000935   0.0000102
2    1000 0.000128   0.000127     0.000129 
3   10000 0.00170    0.00165      0.00174  
4  100000 0.0199     0.0197       0.0200   
5 1000000 0.234      0.230        0.237    


the lower_bound and upper_Bound columns in data_sequential_mean now hold the corresponding confidence intervals fo