In [189]:
%load_ext rpy2.ipython






---
title: "M2R_Parallel_Quicksort"
output: html_notebook
---
# Task 1: Compute confidence intervals for the data from the M2R parallel quicksort experiment

The data is stored in `measurements_03_47.csv`.
First, we read it:

In [190]:
%%R 

data <- read.csv("measurements_03_47.csv")



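Then we split the data into one table per experiment type: sequential, parallel, and built-in. The row selection below relies on the rows of the CSV cycling through the three types in a fixed order.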

In [191]:
%%R
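# Each logical vector is recycled over all rows, so every third row is kept.
# This assumes the rows cycle through the three types in a fixed order.
data_sequential = data[c(TRUE, FALSE, FALSE), ]
data_parallel = data[c(FALSE, TRUE, FALSE), ]
data_built_in = data[c(FALSE, FALSE, TRUE), ]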
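
The positional selection above only works if the CSV rows cycle through the three types in a fixed order. A minimal sketch of a label-based split, written in the notebook's host language (Python) with the standard library only, could look as follows. The column names `Size`, `Type`, and `Time` come from the printed table; the inline rows, the `Parallel` and `Built-in` labels, and the helper name `split_by_type` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the layout of measurements_03_47.csv.
# The rows and the "Parallel"/"Built-in" labels are illustrative assumptions.
SAMPLE_CSV = """Size,Type,Time
100,Sequential,0.000010
100,Parallel,0.004024
100,Built-in,0.000013
1000,Sequential,0.000128
1000,Parallel,0.005524
1000,Built-in,0.000133
"""


def split_by_type(csv_text):
    """Group (size, time) pairs by the value of the Type column."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        key = row["Type"].strip()  # tolerate padding around the label
        groups[key].append((int(row["Size"]), float(row["Time"])))
    return dict(groups)


groups = split_by_type(SAMPLE_CSV)
for type_name, rows in groups.items():
    print(type_name, rows)
```

Selecting by label does not depend on row order, so it keeps working if a measurement is missing or the file is sorted differently.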




Let's compute confidence intervals for the mean time at each size. For each size, with $n = 5$ measurements $x_1, \dots, x_n$:
1. compute the sample mean $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
2. compute the standard deviation $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$
3. compute the approximate 95% confidence interval $[\mu - 2 \times \frac{\sigma}{\sqrt{n}} , \mu + 2 \times \frac{\sigma}{\sqrt{n}}]$, where the factor 2 approximates the normal quantile 1.96
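
To make these steps concrete, here is a small sketch in Python (the notebook's host language), applied to the five sequential times for size 100 that appear in the table further down. The helper name `confidence_interval` is hypothetical. It divides by $n$ inside the standard deviation, as the R cells do for the variance, and it uses the factor 2 from the interval formula.

```python
import math


def confidence_interval(times, factor=2.0):
    """Return (mean, sigma, lower, upper) for a list of timings."""
    n = len(times)
    mean = sum(times) / n
    # Standard deviation with a 1/n normalisation (an assumption that
    # mirrors the variance computed in the R cells).
    sigma = math.sqrt(sum((x - mean) ** 2 for x in times) / n)
    half_width = factor * sigma / math.sqrt(n)
    return mean, sigma, mean - half_width, mean + half_width


# Sequential times for Size = 100, copied from the notebook's table.
times_100 = [0.000010, 0.000010, 0.000009, 0.000010, 0.000010]
mean, sigma, lower, upper = confidence_interval(times_100)
print(f"mean={mean:.3e} sigma={sigma:.3e} CI=[{lower:.3e}, {upper:.3e}]")
```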
   

In [192]:
%%R
library(dplyr)

data_sequential_mean = data_sequential %>% group_by(Size) %>% summarize(mean = sum(Time)/5) # 5x2 array holding Size and mean

replication_times = c(5, 5, 5, 5, 5) # to extend the mean array to 25x2
data_sequential_mean_long = data_sequential_mean[rep(row.names(data_sequential_mean), times = replication_times),] # 25x2 array holding Size and mean


data_sequential$squared_difference = (data_sequential_mean_long$mean - data_sequential$Time)^2


data_sequential

      Size        Type     Time squared_difference
1      100  Sequential 0.000010       4.000000e-14
4      100  Sequential 0.000010       4.000000e-14
7      100  Sequential 0.000009       6.400000e-13
10     100  Sequential 0.000010       4.000000e-14
13     100  Sequential 0.000010       4.000000e-14
16    1000  Sequential 0.000128       4.000000e-14
19    1000  Sequential 0.000126       3.240000e-12
22    1000  Sequential 0.000128       4.000000e-14
25    1000  Sequential 0.000128       4.000000e-14
28    1000  Sequential 0.000129       1.440000e-12
31   10000  Sequential 0.001774       6.115240e-09
34   10000  Sequential 0.001698       4.840000e-12
37   10000  Sequential 0.001652       1.918440e-09
40   10000  Sequential 0.001680       2.496400e-10
43   10000  Sequential 0.001675       4.326400e-10
46  100000  Sequential 0.020040       2.274064e-08
49  100000  Sequential 0.020004       1.317904e-08
52  100000  Sequential 0.019763       1.592644e-08
55  100000  Sequential 0.019913

In [195]:
%%R
data_sequential_variance = data_sequential %>% group_by(Size) %>% summarize(variance = sum(squared_difference)/5)
data_sequential_variance

# A tibble: 5 × 2
     Size variance
    <int>    <dbl>
1     100 1.6 e-13
2    1000 9.60e-13
3   10000 1.74e- 9
4  100000 1.58e- 8
5 1000000 9.81e- 6
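
The variance above divides by 5, the number of measurements per size. As a cross-check, Python's standard `statistics` module offers both normalisations: `pvariance` divides by $n$ and `variance` divides by $n - 1$. A sketch on the size-1000 sequential times from the table:

```python
import statistics

# Sequential times for Size = 1000, copied from the notebook's table.
times_1000 = [0.000128, 0.000126, 0.000128, 0.000128, 0.000129]

var_n = statistics.pvariance(times_1000)          # divides by n
var_n_minus_1 = statistics.variance(times_1000)   # divides by n - 1

print(f"1/n normalisation:     {var_n:.3e}")
print(f"1/(n-1) normalisation: {var_n_minus_1:.3e}")
```

The $1/n$ value agrees with the 9.60e-13 reported for size 1000; the $1/(n-1)$ value is larger by a factor of $5/4$.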


In [199]:
%%R
# sqrt(variance) / 5 for each size; note that sigma / sqrt(n) would divide by sqrt(5) instead
data_sequential_stderror = (data_sequential_variance$variance)^(1/2) / 5
data_sequential_stderror

[1] 8.000000e-08 1.959592e-07 8.352628e-06 2.514706e-05 6.265338e-04
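
The values printed above equal $\sqrt{\text{variance}} / 5$. The interval formula stated earlier divides by $\sqrt{n}$ instead, so the two quantities differ by a factor of $\sqrt{5}$. A short Python sketch comparing both for size 100, using the variance 1.6e-13 from the tibble (the variable names are illustrative):

```python
import math

n = 5
variance_100 = 1.6e-13  # Size = 100 variance, taken from the tibble above

sigma = math.sqrt(variance_100)
se_as_computed = sigma / n               # what the R cell evaluates
se_from_formula = sigma / math.sqrt(n)   # sigma / sqrt(n), per the formula

print(f"sigma / n       = {se_as_computed:.3e}")
print(f"sigma / sqrt(n) = {se_from_formula:.3e}")
```

Dividing by $\sqrt{n}$ gives wider intervals than the ones derived from the printed values.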


Now we turn these standard errors into confidence interval bounds for each size:

In [201]:
%%R
# A data frame (unlike an array) supports named columns assigned via `$`.
confidence_intervals = data.frame(
  Size = data_sequential_mean$Size,
  lower_bound = data_sequential_mean$mean - 2 * data_sequential_stderror,
  upper_bound = data_sequential_mean$mean + 2 * data_sequential_stderror
)
confidence_intervals$lower_bound

[1] 0.0000096400 0.0001274081 0.0016790947 0.0198389059 0.2323121324
