
first draft with working boxplots

1 parent f71b94c commit 0ffe2424424e60ea3f8ca9981ab388f647f65741 @novoid committed May 15, 2012
Showing with 171 additions and 24 deletions.
  1. +153 −24 iKNOW2012-orgmode-demo.org
  2. BIN iKNOW2012-orgmode-demo.pdf
  3. +18 −0 refinding_folders.csv
@@ -122,30 +122,32 @@ languages, and the describing text of a paper in one single file.
\cite{Schulte2012}
-** Formal Experiment
+** Formal Experiment
In \cite{Voit2011} the authors describe a formal experiment conducted
with 18 test persons in the field of information retrieval. The
original data set is available
online\footnote{https://github.com/novoid/2011-01-tagstore-formal-experiment}.
This paper here demonstrates FIXXME\todo{add introduction text}
+*** Reading in refinding tagstore values
+
Reading in raw data related to seconds per task from CSV file:
-The following shell commands read in a CSV file, removes all values
+The following shell pipeline reads in a CSV file, removes all values
before the character ";" (thus removing all values related to number
of mouse clicks), removes all incomplete lines (containing the string
\enquote{TC}), and removes the header line as well (using the
\texttt{tail} command):
-#+NAME: time-per-task
+#+NAME: TS-time-per-task
#+BEGIN_SRC sh :exports both
sed 's/.*;//' refinding_tagstore.csv | \
grep -v "TC" | \
tail -n +2
#+END_SRC
-#+RESULTS: time-per-task
+#+RESULTS: TS-time-per-task
| 5.7 | 3.8 | 4.3 | 2.4 | 4.3 | 3.2 |
| 6.0 | 2.9 | 4.3 | 4.6 | 4.4 | 3.3 |
| 5.4 | 3.2 | 6.1 | 6.5 | 5.7 | 4.4 |
@@ -162,46 +164,173 @@ sed 's/.*;//' refinding_tagstore.csv | \
| 5.0 | 5.6 | 6.0 | 14.2 | 2.0 | 6.6 |
| 6.7 | 5.6 | 7.2 | 12.3 | 5.1 | 6.6 |
-
+*** Generating mean values
In the next step, the mean values per test person will be calculated
using the programming language Python:
-#+NAME: calculate-mean-values
-#+BEGIN_SRC python :var mytable=time-per-task :exports both
+#+NAME: calculate-TS-mean-values
+#+BEGIN_SRC python :var mytable=TS-time-per-task :exports code
import numpy
return [round(numpy.average(row),2) for row in mytable]
#+END_SRC
-#+RESULTS: calculate-mean-values
+#+RESULTS: calculate-TS-mean-values
| 3.95 | 4.25 | 5.22 | 6.2 | 5.27 | 10.27 | 4.63 | 5.97 | 3.27 | 4.55 | 4.55 | 6.92 | 7.48 | 6.57 | 7.25 |
+This time, the (long) output list of mean values is suppressed
+for layout purposes.
+*** Sorting values
+#+NAME: TS-sort-mean-values
+#+BEGIN_SRC sh :var myvalues=calculate-TS-mean-values :exports both
+echo ${myvalues} | sed 's/ /\n/g' | sort -nr
+#+END_SRC
-** Test :noexport:
+#+RESULTS: TS-sort-mean-values
+| 10.27 |
+| 7.48 |
+| 7.25 |
+| 6.92 |
+| 6.57 |
+| 6.2 |
+| 5.97 |
+| 5.27 |
+| 5.22 |
+| 4.63 |
+| 4.55 |
+| 4.55 |
+| 4.25 |
+| 3.95 |
+| 3.27 |
+
+*** Process folder values
+
+#+NAME: F-time-per-task
+#+BEGIN_SRC sh :exports both
+sed 's/.*;//' refinding_folders.csv | \
+ grep -v "TC" | \
+ tail -n +2
+#+END_SRC
-#+name: directory-pie-chart(dirs = directories)
-#+begin_src R :session R-pie-example :file ../../images/babel/dirs.png
- pie(dirs[,1], labels = dirs[,2])
-#+end_src
+#+RESULTS: F-time-per-task
+
+| 6.7 | 5.4 | 2.4 | 3.9 | 3.6 | 3.8 |
+| 5.4 | 3.1 | 3.4 | 3.5 | 3.3 | 3.6 |
+| 6.5 | 6.6 | 4.0 | 4.4 | 4.0 | 5.1 |
+| 3.0 | 3.3 | 3.7 | 7.1 | 2.8 | 4.3 |
+| 6.6 | 3.6 | 10.3 | 4.6 | 5.4 | 3.7 |
+| 2.7 | 3.2 | 9.4 | 18.0 | 4.7 | 3.8 |
+| 7.0 | 3.7 | 8.1 | 4.9 | 5.2 | 5.2 |
+| 34.1 | 2.8 | 8.9 | 8.9 | 3.1 | 8.3 |
+| 4.0 | 2.9 | 3.6 | 5.7 | 5.0 | 5.5 |
+| 4.8 | 1.4 | 3.5 | 3.5 | 3.3 | 1.9 |
+| 42.9 | 1.9 | 12.3 | 5.8 | 7.6 | 3.4 |
+| 7.0 | 5.2 | 5.0 | 3.8 | 5.1 | 4.2 |
+| 19.3 | 1.6 | 11.9 | 7.0 | 3.9 | 4.0 |
+| 6.6 | 6.6 | 4.6 | 7.5 | 3.8 | 5.2 |
+| 6.0 | 3.2 | 5.1 | 4.4 | 5.9 | 4.0 |
+| 4.6 | 1.6 | 3.4 | 4.1 | 4.4 | 3.8 |
+| 7.1 | 4.5 | 7.0 | 7.6 | 5.5 | 7.5 |
+
+
+#+NAME: calculate-F-mean-values
+#+BEGIN_SRC python :var mytable=F-time-per-task :exports code
+import numpy
+return [round(numpy.average(row),2) for row in mytable]
+#+END_SRC
+
+#+RESULTS: calculate-F-mean-values
+| 4.3 | 3.72 | 5.1 | 4.03 | 5.7 | 6.97 | 5.68 | 11.02 | 4.45 | 3.07 | 12.32 | 5.05 | 7.95 | 5.72 | 4.77 | 3.65 | 6.53 |
+
+#+NAME: F-sort-mean-values
+#+BEGIN_SRC sh :var myvalues=calculate-F-mean-values :exports both
+echo ${myvalues} | sed 's/ /\n/g' | sort -nr
+#+END_SRC
+
+#+RESULTS: F-sort-mean-values
+| 12.32 |
+| 11.02 |
+| 7.95 |
+| 6.97 |
+| 6.53 |
+| 5.72 |
+| 5.7 |
+| 5.68 |
+| 5.1 |
+| 5.05 |
+| 4.77 |
+| 4.45 |
+| 4.3 |
+| 4.03 |
+| 3.72 |
+| 3.65 |
+| 3.07 |
+
+
+*** Plotting data
+
+#+NAME: boxplot-data
+#+BEGIN_SRC R :var TSdata=TS-sort-mean-values :var Fdata=F-sort-mean-values :exports code :results none
+png('my_boxplot_data.png')
+mFdata=c(4.3, 3.72, 5.1, 4.03, 5.7, 6.97, 5.68, 11.02, 4.45, 3.07, 12.32, 5.05, 7.95, 5.72, 4.77, 3.65, 6.53)
+mTSdata=c(3.95, 4.25, 5.22, 6.2, 5.27, 10.27, 4.63, 5.97, 3.27, 4.55, 4.55, 6.92, 7.48, 6.57, 7.25)
+#par(mai=c(0.8,0.8,0,0), omd=c(0,0.5,0,1))
+# bot, lef, top, rig
+boxplot( list(mTSdata, mFdata),
+ names=c("tagstore", "folders"),
+ xlab="Task Times", ylab="Seconds",
+ pars = list(boxwex = 0.3, staplewex = 0.5,
+ boxfill="lightblue"))
+#+END_SRC
+
+#+RESULTS: boxplot-data
+
+
+#+ATTR_LaTeX: width=0.5\textwidth
+#+CAPTION: Comparison of the two task conditions for re-finding: tagstore and folders. There is no significant difference between the two conditions.
+[[file:my_boxplot_data.png]]
+
+
+#+NAME: draw-histogram
+#+BEGIN_SRC python :var myvalues=TS-sort-mean-values :exports both
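+import matplotlib.pyplot as plt
+
+# Org may pass a one-column table as a list of one-element rows;
+# flatten it so that plt.bar() receives plain numbers.
+values = [row[0] if isinstance(row, (list, tuple)) else row
+          for row in myvalues]
+
+#n, bins, patches = plt.hist(myvalues, histtype="bar")
+plt.xlabel('Sorted Average Task Times')
+plt.ylabel('Seconds')
+plt.bar(range(1, len(values) + 1), values)
+
+plt.savefig("my_hist.png", format="png")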
+#+END_SRC
+
+** boxplot-test
+
+#+NAME: boxplot-test
+#+BEGIN_SRC R :var mydata=TS-sort-mean-values :exports code :results none
+png('my_boxplot_test.png')
+#lmts <- range(x1,x2,y1,y2)
+par(mfrow = c(1, 2))
+boxplot(mydata, mydata, xlab="x")
+#+END_SRC
-#+tblname: mean-values
-| 5.7 |
-#+TBLFM: @1$1=remote(time-per-task, @1$1)
+#+RESULTS: boxplot-test
-#+TBLFM: @1$1..@6$15=remote(time-per-task, @1$1..@6$15)
-: $2=vmean($1)::
+** Overview
-** Tables :noexport:
+#+CAPTION: Overview of the input values, execution languages, and output values.
+#+LABEL: tab:overview-values-languages
+| *Input* | *Language* | *Output* |
+|----------------------------------+------------+-----------------------|
+| \texttt{refinding\_tagstore.csv} | shell | task time values |
+| task time values | Python | average time values |
+| average time values | shell | sorted numbers |
+| average time values | R | boxplot of times |
-#+CAPTION: [Short caption]{Long caption}
-#+LABEL: tab:my-table
-| *Head1* | *Head2* |
-|---------+---------|
-| foo | bar |
** End
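
The boxplot block above compares the two lists of per-person mean values. As a rough cross-check of what such a plot displays, the following sketch computes a five-number summary for both lists using only the Python standard library. The function name `five_number_summary` is hypothetical and not part of the commit, and the quartile convention used here (`method="inclusive"`) is an assumption; R's `boxplot` derives its box edges from Tukey hinges, so quartiles may differ slightly for other data sets.

```python
from statistics import quantiles

# Mean task times per test person, copied from the #+RESULTS blocks
# of calculate-TS-mean-values and calculate-F-mean-values.
ts_means = [3.95, 4.25, 5.22, 6.2, 5.27, 10.27, 4.63, 5.97,
            3.27, 4.55, 4.55, 6.92, 7.48, 6.57, 7.25]
f_means = [4.3, 3.72, 5.1, 4.03, 5.7, 6.97, 5.68, 11.02, 4.45,
           3.07, 12.32, 5.05, 7.95, 5.72, 4.77, 3.65, 6.53]


def five_number_summary(values):
    """Return min, quartiles, and max (hypothetical helper, not in the commit)."""
    # Assumption: "inclusive" quartiles; R's boxplot uses Tukey hinges instead.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values)}


if __name__ == "__main__":
    for name, data in (("tagstore", ts_means), ("folders", f_means)):
        print(name, five_number_summary(data))
```

The summary only describes the two distributions; it does not test whether they differ significantly.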
Binary file not shown.
@@ -0,0 +1,18 @@
+Mouseclicks;Time per Task
+3, 3, 2, 3, 3, 3;'6.7', '5.4', '2.4', '3.9', '3.6', '3.8'
+3, 3, 3, 3, 3, 3;'5.4', '3.1', '3.4', '3.5', '3.3', '3.6'
+4, 5, 3, 3, 3, 3;'6.5', '6.6', '4.0', '4.4', '4.0', '5.1'
+2, 2, 2, 4, 2, 3;'3.0', '3.3', '3.7', '7.1', '2.8', '4.3'
+2, 3, 2, 2, 3, 2;'6.6', '3.6', '10.3', '4.6', '5.4', '3.7'
+2, 2, 3, 5, 2, 2;'2.7', '3.2', '9.4', '18.0', '4.7', '3.8'
+5, 3, 5, 4, 3, 4;'7.0', '3.7', '8.1', '4.9', '5.2', '5.2'
+18, 2, 4, 5, 2, 5;'34.1', '2.8', '8.9', '8.9', '3.1', '8.3'
+2, 2, 2, 3, 2, 3;'4.0', '2.9', '3.6', '5.7', '5.0', '5.5'
+2, 2, 2, 2, 2, 2;'4.8', '1.4', '3.5', '3.5', '3.3', '1.9'
+13, 2, 5, 4, 3, 3;'42.9', '1.9', '12.3', '5.8', '7.6', '3.4'
+4, 3, 4, 2, 3, 2;'7.0', '5.2', '5.0', '3.8', '5.1', '4.2'
+7, 2, 5, 4, 3, 3;'19.3', '1.6', '11.9', '7.0', '3.9', '4.0'
+2, 2, 3, 4, 3, 2;'6.6', '6.6', '4.6', '7.5', '3.8', '5.2'
+3, 3, 3, 3, 5, 3;'6.0', '3.2', '5.1', '4.4', '5.9', '4.0'
+3, 2, 3, 2, 3, 2;'4.6', '1.6', '3.4', '4.1', '4.4', '3.8'
+2, 2, 2, 2, 2, 3;'7.1', '4.5', '7.0', '7.6', '5.5', '7.5'
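
For readers who want to check the numbers without Org-mode, the `sed`/`grep`/`tail` pipeline and the mean calculation can be approximated in plain Python. This is a minimal sketch, assuming the file layout shown above (a header line, then `clicks;times` with single-quoted times); the names `read_task_times` and `SAMPLE` are hypothetical and do not appear in the commit. Only the first three data rows are embedded to keep the example short.

```python
import csv
import io
from statistics import mean

# First rows of refinding_folders.csv as shown in the diff above.
SAMPLE = """Mouseclicks;Time per Task
3, 3, 2, 3, 3, 3;'6.7', '5.4', '2.4', '3.9', '3.6', '3.8'
3, 3, 3, 3, 3, 3;'5.4', '3.1', '3.4', '3.5', '3.3', '3.6'
4, 5, 3, 3, 3, 3;'6.5', '6.6', '4.0', '4.4', '4.0', '5.1'
"""


def read_task_times(handle):
    """Return one list of task times in seconds per test person."""
    reader = csv.reader(handle, delimiter=";")
    next(reader)  # skip the header line, like `tail -n +2`
    rows = []
    for record in reader:
        if not record:
            continue  # ignore empty lines
        times_field = record[-1]  # part after the last ";", like the sed call
        if "TC" in times_field:  # incomplete line, like `grep -v "TC"`
            continue
        rows.append([float(v.strip(" '")) for v in times_field.split(",")])
    return rows


rows = read_task_times(io.StringIO(SAMPLE))
# Mean per test person, rounded as in the Org source blocks.
means = [round(mean(row), 2) for row in rows]
# Descending numeric order, like `sort -nr`.
sorted_means = sorted(means, reverse=True)

if __name__ == "__main__":
    print(means)
    print(sorted_means)
```

To process the real file, pass `open("refinding_folders.csv", newline="")` to `read_task_times` instead of the embedded sample.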
