<style>
.reveal pre code {
font-size: 1em;
}
.midcenter {
position: fixed;
top: 50%;
left: 50%;
}
.footer {
position: fixed;
top: 90%;
text-align:right;
width:90%;
margin-top:-150px;
}
.reveal section img {
border: 0px;
box-shadow: 0 0 0 0;
}
</style>
R 4 hackers
========================================================
author:
date:
autosize: true
width: 1440
incremental:true
Hello World
========================================================
```{r setup, include=FALSE}
opts_chunk$set(cache=TRUE)
opts_chunk$set(fig.align='left')
```
```{r, echo=FALSE}
library(ggplot2)
library(GGally)
library(dplyr)
library(forecast)
library(purrr)
```
&nbsp;
... that is, _Data Science Hello World_.
We got some data...
========================================================
&nbsp;
Sure, first we ALWAYS do some data exploration.
```{r}
data(longley)
head(longley)
```
Sure, first we ALWAYS visualize...
========================================================
&nbsp;
```{r}
ggpairs(longley)
```
But then: Hello World!
========================================================
&nbsp;
Linear models.
&nbsp;
```{r}
fit <- lm(Employed ~ GNP, longley)
```
&nbsp;
Now what can we do with this thing returned by lm?
Print it
========================================================
&nbsp;
```{r}
# equivalently: print(fit)
fit
```
Output a summary
========================================================
&nbsp;
```{r}
summary(fit)
```
Use it to make predictions
========================================================
&nbsp;
```{r}
predict(fit, newdata = data.frame(GNP = c(222, 223)))
```
We can plot it...
========================================================
&nbsp;
```{r}
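# 2 x 2 grid for the four diagnostic plots
# (no dev.off() here: knitr manages the graphics device itself)
par(mfrow = c(2, 2))
plot(fit)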
```
And even do some fancy stuff.
========================================================
&nbsp;
Like extracting the Akaike Information Criterion ...
```{r}
extractAIC(fit)
```
Or getting confidence intervals for coefficients.
```{r}
confint(fit)
```
Let's see what other objects we can print, plot, get a summary of!
========================================================
&nbsp;
Data frames (1)
========================================================
&nbsp;
```{r}
df <- data.frame(x = 1:8, y = cumsum(rnorm(8)))
df
summary(df)
```
Data frames (2)
========================================================
&nbsp;
```{r}
plot(df)
```
Time series objects (1)
========================================================
&nbsp;
```{r}
ts <- ts(cumsum(round(rnorm(120), 2)), start = c(2004,12), frequency = 12)
ts
summary(ts)
```
Time series objects (2)
========================================================
&nbsp;
```{r}
plot(ts)
```
Things we can get a summary of (but not plot) ...
========================================================
&nbsp;
```{r}
m <- matrix(1:10, nrow = 2)
summary(m)
```
Things we can plot (but not get a summary of)
========================================================
&nbsp;
```{r}
plot(function(x) x^2, from = -2, to = 2)
```
What is going on?
========================================================
&nbsp;
- R has several OO systems (on top of base (internal) objects)
- the oldest and most widely used is S3
S3
========================================================
&nbsp;
- generic function OO (instead of the more common message-passing OO)
- methods belong to (generic) functions, not to classes!
- _UseMethod()_ performs method dispatch
```{r}
print
```
```{r}
UseMethod
```
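To see what that dispatch amounts to, we can ask R which method it would find for a given class. A minimal sketch (the name `cars_fit` and the class `no_such_class` are made up for illustration):

```{r}
# an lm object carries its class as an attribute
cars_fit <- lm(dist ~ speed, data = cars)
class(cars_fit)
# UseMethod("print") looks for print.<class>, here print.lm ...
is.function(getS3method("print", "lm"))
# ... and no specific method exists for a class nobody wrote one for
is.null(getS3method("print", "no_such_class", optional = TRUE))
```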
In S3, there are no formal class definitions
========================================================
&nbsp;
```{r}
# a bike constructor
bike <- function(type, color) {structure (list(type = type, color = color), class = 'bike')}
# create an instance of class bike
mybike <- bike('cyclocross', 'green')
class(mybike)
```
```{r}
# still prints like a list
mybike
```
Define a print method for bikes
========================================================
&nbsp;
```{r}
# methods are named <generic>.<class>
# and should take the same arguments as the generic: print(x, ...)
print.bike <- function(x, ...) {
  print(paste0('a ', x$color, ' ', x$type, ' bike'))
  invisible(x)
}
mybike
```
With S3, you could shoot yourself in the foot...
========================================================
&nbsp;
... or just ... don't.
```{r}
# what if we just change the class
class(mybike) <- 'lm'
# and then print it
mybike
# let's undo this ASAP ;-)
class(mybike) <- 'bike'
# and nothing's broken
mybike
```
S3 "inheritance" is informal as well
========================================================
```{r}
ebike <- function(type, color) {
  parent <- bike(type, color)
  structure(c(unclass(parent), motor = TRUE), class = c('ebike', class(parent)))
}
theotherpersonsbike <- ebike('mountain', 'red')
class(theotherpersonsbike)
theotherpersonsbike
```
```{r}
print.ebike <- function(x, ...) {
  # first let the next class in line (bike) print itself
  NextMethod()
  print('... with a motor!')
  invisible(x)
}
theotherpersonsbike
```
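What makes this work is simply the order of the class vector: dispatch tries the classes from left to right, and NextMethod() moves on to the next one. A stripped-down sketch (all `*_demo` names are hypothetical and exist only for this example):

```{r}
# an object with two classes: most specific first
two_class_demo <- structure(list(), class = c("child_demo", "parent_demo"))
describe_demo <- function(x, ...) UseMethod("describe_demo")
describe_demo.parent_demo <- function(x, ...) "parent"
# the child method adds its bit, then hands over to the parent method
describe_demo.child_demo <- function(x, ...) paste("child, then", NextMethod())
describe_demo(two_class_demo)
# inherits() just checks whether a name appears in the class vector
inherits(two_class_demo, "parent_demo")
```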
S3: Create your own generic
========================================================
```{r, echo=FALSE}
accelerate_until_at <- function(target_speed) {
paste('Now accelerating to', target_speed, 'km/h')
}
```
```{r}
# create a generic function that calls UseMethod to do the dispatching
speed_up <- function(object, ...) UseMethod("speed_up")
# create an implementation for our bike class
speed_up.bike <- function(object, target_speed) {
accelerate_until_at(target_speed)
}
speed_up(mybike, target_speed = 33)
```
```{r}
# also of course create an implementation for the e-bike
speed_up.ebike <- function(object, target_speed) {
adjusted_speed <- ifelse(target_speed <= 25, target_speed, 25) # ... you've seen that coming
accelerate_until_at(adjusted_speed)
}
speed_up(theotherpersonsbike, target_speed = 33)
```
S3: What happens if there is no implementation for a class?
========================================================
&nbsp;
The generic's default method will be used (if there is one).
Remember confint() from above?
```{r}
# these implementations exist for confint:
methods('confint')
```
```{r}
data("lynx")
fit <- auto.arima(lynx)
# same as explicitly calling confint.default
confint(fit)
```
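We can give our own generics a default method in the same way. A minimal sketch (the generic `honk` and the class `car_demo` are invented for illustration):

```{r}
honk <- function(object, ...) UseMethod("honk")
# fallback for every class without a method of its own
honk.default <- function(object, ...) "no horn here"
honk.car_demo <- function(object, ...) "beep"
honk(structure(list(), class = "car_demo"))
# a plain number has no honk method, so the default is used
honk(42)
```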
OO wrap-up: Other systems
========================================================
&nbsp;
- S4:
- more formal than S3 (formal class definitions)
- but methods still belong to functions, not classes
- Reference classes (RC):
- methods belong to objects, not functions
- objects are mutable (the usual R copy-on-modify semantics do not apply)
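
A rough sketch of both systems, just to show the flavour (class, generic and object names such as `BikeS4`, `describe_s4` and `BikeRC` are made up for this example):

```{r}
library(methods)
# S4: formal class definition, but the method still belongs to a generic
setClass("BikeS4", representation(type = "character", color = "character"))
setGeneric("describe_s4", function(object) standardGeneric("describe_s4"))
setMethod("describe_s4", "BikeS4", function(object) {
  paste0("a ", object@color, " ", object@type, " bike")
})
s4bike_demo <- new("BikeS4", type = "gravel", color = "blue")
describe_s4(s4bike_demo)

# RC: the method belongs to the object, and the object is mutable
BikeRC <- setRefClass("BikeRC",
  fields = list(color = "character"),
  methods = list(
    repaint = function(new_color) {
      color <<- new_color
      invisible(.self)
    }
  )
)
rcbike_demo <- BikeRC$new(color = "green")
alias_demo <- rcbike_demo      # no copy is made, both names point to one object
alias_demo$repaint("red")
rcbike_demo$color              # the original has changed, too
```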
On to...
========================================================
&nbsp;
functional programming!
The Magic Three: map, fold, and filter
========================================================
&nbsp;
Magic Three in Haskell:
- $map$: map a function over a list of elements
```
λ: map (+1) [1..10]
[2,3,4,5,6,7,8,9,10,11]
```
- $filter$: filter a list of elements according to some predicate
```
λ: filter even [1..10]
[2,4,6,8,10]
```
- $fold$: combine values recursively (a.k.a. $reduce$ (Clojure, Java, Python...), $apply$ (Scheme, ...))
```
λ: foldl (+) 0 [1..10]
55
```
Mapping in R (1): meet the APPLY family
========================================================
&nbsp;
- apply, lapply, sapply, vapply, mapply, tapply ... ouch!
- Basic question: What data structure(s) am I working with?
- one-dimensional?
- more than one dimension?
- more than one data structure?
The apply family (1): just ... apply
========================================================
&nbsp;
Use with more-than-one-dimensional data structures: data.frame, matrix, array
```{r}
m <- matrix(1:12, nrow = 3, ncol = 4)
m
# apply mean to the columns
apply(m, 2, mean)
# apply mean to the rows
apply(m, 1, mean)
```
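For the special case of means (and sums), base R has dedicated shortcuts that give the same result. A quick check, using a separate matrix `mat_demo` so nothing above gets overwritten:

```{r}
mat_demo <- matrix(1:12, nrow = 3, ncol = 4)
# MARGIN = 2 means "over columns", MARGIN = 1 means "over rows"
all.equal(apply(mat_demo, 2, mean), colMeans(mat_demo))
all.equal(apply(mat_demo, 1, mean), rowMeans(mat_demo))
```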
The apply family (2): lapply and friends
========================================================
Use with one-dimensional stuff (list, vector)
- lapply: outputs a list
```{r}
mychars <- c("a", "b"); str(lapply(mychars, toupper))
```
- sapply: simplifies the result
```{r}
mychars <- c("a", "b"); str(sapply(mychars, toupper))
```
- vapply: returns requested type
```{r}
mychars <- c("a", "b"); str(vapply(mychars, utf8ToInt, integer(1)))
```
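The remaining two family members cover the "more than one data structure" case. A small sketch (the `*_demo` names are only for illustration):

```{r}
# mapply: walk over several vectors in parallel
sums_demo <- mapply(function(x, y) x + y, 1:3, 4:6)
sums_demo
# tapply: apply a function to the values within each group
group_totals_demo <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), sum)
group_totals_demo
```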
Aside: What's the problem with sapply? (1)
========================================================
&nbsp;
```{r}
# a list of 3
l1 <-list(
col1 = "a",
col2 = "b",
col3 = c("c", "d")
)
str(l1)
# a list of 2
l2 <- l1[1:2]
str(l2)
```
Aside: What's the problem with sapply? (2)
========================================================
&nbsp;
```{r}
# upper case everything
u1 <- sapply(l1, toupper)
# result is still a list!
str(u1)
u2 <- sapply(l2, toupper)
# result is a vector!
str(u2)
```
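With vapply, the same situation fails loudly instead of silently changing the return type. A sketch, using a fresh list `mixed_demo` with the same shape as the one above:

```{r}
mixed_demo <- list(col1 = "a", col2 = "b", col3 = c("c", "d"))
# every element yields exactly one string: fine
ok_demo <- vapply(mixed_demo[1:2], toupper, character(1))
ok_demo
# col3 yields two strings: vapply throws an error (caught here so the chunk keeps running)
bad_demo <- tryCatch(
  vapply(mixed_demo, toupper, character(1)),
  error = function(e) e
)
inherits(bad_demo, "error")
```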
Mapping in R (2): Map
========================================================
&nbsp;
Yes, we have them in R, too:
- Map
- Filter
- Reduce
- (plus Find, Negate, and Position)
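
A quick look at the three extras (the predicate `is_even_demo` and the vector `nums_demo` are made up for this example):

```{r}
is_even_demo <- function(x) x %% 2 == 0
nums_demo <- c(3, 8, 5, 6)
# Find: the first element that satisfies the predicate
Find(is_even_demo, nums_demo)
# Position: the index of that element
Position(is_even_demo, nums_demo)
# Negate: turn the predicate around
Filter(Negate(is_even_demo), nums_demo)
```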
Redoing the Magic Three, in R
========================================================
&nbsp;
Map
```{r}
# same as lapply(1:10, function(x) x+1), but see order of args!
m <- Map(function(x) x+1, 1:10)
```
Filter
```{r}
# keep only the elements for which the predicate is TRUE
Filter(function(x) x %% 2 == 0, 1:10)
```
Reduce
```{r}
# fold the vector with +, like foldl (+) 0 [1..10] in Haskell
Reduce(`+`, 1:10)
```
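Two things these can do beyond the basics: Map takes several inputs in parallel, and Reduce can keep the intermediate results. A short sketch (result names are only for illustration):

```{r}
# Map over two vectors at once; the result is always a list
products_demo <- Map(function(x, y) x * y, 1:3, 4:6)
str(products_demo)
# accumulate = TRUE returns the running totals instead of just the final value
running_demo <- Reduce(`+`, 1:5, accumulate = TRUE)
running_demo
```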
Mapping in R (3): Typesafe mapping with purrr
========================================================
&nbsp;
```{r}
m <- map_dbl(1:10, function(x) x+1)
# but a character result needs map_chr
m2 <- map_chr(letters[1:3], toupper)
```
&nbsp;
So ... what is purrr?
![](purrr.png)
- functional programming package for R, by Hadley Wickham
- not just the "big three"...
Functional programming with purrr (examples)
========================================================
&nbsp;
Too verbose?
```{r}
m <- map_dbl(1:10, function(x) x+1)
```
How about partial application:
```{r}
m <- map_dbl(1:10, partial(`+`,1))
```
Or, how about function composition?
```{r}
inttolower <- compose(tolower, intToUtf8)
inttolower(65:68)
```
OK. Time for the real internals ...
========================================================
&nbsp;
We've seen objects, we've seen functions.
But what's R _basically_ made of?
Object types: class(), typeof(), mode() ... oh my!
========================================================
&nbsp;
Just wanna use R? Use _class()_:
```{r}
myfunc <- function(x) x + 1
tests1 <- c(`<-`, `if`, `[`, length, c, sum, nrow, eval, myfunc)
sapply(tests1, class)
```
For the user, these are all _functions_.
Even though they do such different things as
- assignment (x <- 1)
- constructing new objects (x <- c(1,2))
- branching (if)
typeof() tells about the internal object type:
========================================================
&nbsp;
```{r}
tests1 <- c(`<-`, `if`, `[`, length, c, sum, nrow, eval, myfunc)
sapply(tests1, typeof)
```
So we have three different corresponding object types:
- specials,
- builtins, and
- closures.
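
One representative of each, side by side (a sketch; the list `prim_demo` is ours):

```{r}
prim_demo <- list(`if` = `if`, sum = sum, nrow = nrow)
sapply(prim_demo, typeof)
# specials and builtins are primitives implemented in C, closures are not
sapply(prim_demo, is.primitive)
```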
Closures (1)
========================================================
&nbsp;
Every user-defined function is a closure.
With closures, we can conveniently print the source code on the console:
```{r}
nrow
```
Closures (2)
========================================================
&nbsp;
Closures have formals, a body, and an associated environment.
```{r}
c(formals(myfunc), body(myfunc), environment(myfunc))
```
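The associated environment is what lets a function remember state between calls. A classic sketch (the names `make_counter_demo` and `counter_demo` are invented for this example):

```{r}
make_counter_demo <- function() {
  count <- 0
  # the returned function keeps a reference to the environment holding count
  function() {
    count <<- count + 1
    count
  }
}
counter_demo <- make_counter_demo()
counter_demo()
counter_demo()
# the state lives in the closure's environment
environment(counter_demo)$count
```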
Let's try this with eval!
========================================================
&nbsp;
(Remember, this was a closure, too.)
```{r}
body(eval)
```
Oops!
So, for the .Internals...
========================================================
&nbsp;
For .Internal and .Primitive functions (which include the "specials" and "builtins" above),
__$R_source/src/main/names.c__
contains the mapping to the corresponding C function:
![](names.c.png)
And this is (the beginning of) do_eval
========================================================
&nbsp;
![](do_eval.png)
Notice something?
Yes. There's some LISP in there
========================================================
&nbsp;
Not just the CARs, CADRs, CADDRs...
... S-expressions...
... the whole idea of environments and closures in current R is modeled after Lisp.
(No time for that here and now, but there's always SICP to read up on environments etc.)
![](sicp.jpg)
What's so special about specials?
========================================================
&nbsp;
Specials receive their arguments unevaluated and decide for themselves what to evaluate, and when.
What do you think will happen here?
```{r}
# no confint.bike defined for bikes -> confint.default will get called
# but confint.default needs some other methods that do not exist
if(1 > 0) 1 else confint(mybike)
```
Just so you believe me: an error _would_ be raised if confint(mybike) were called.
```{r, error=TRUE}
confint(mybike)
```
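_if_ is not the only special that leaves arguments alone. Two more, as a small sketch:

```{r}
# && is a special: the right-hand side is never evaluated here
typeof(`&&`)
FALSE && stop("never evaluated")
# quote is a special, too: it hands back its argument without evaluating it
typeof(quote)
quote(stop("also never evaluated"))
```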
I promise you (1)
========================================================
&nbsp;
Here we have a user-defined function, f. What will happen?
```{r, error=TRUE}
f <- function(exp1, exp2) {
exp1
}
f(confint(mybike), f)
f(123, confint(mybike))
```
&nbsp;
Closures evaluate their arguments lazily (by need).
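We can watch this happen by giving the argument a visible side effect. A sketch (the function name `lazy_demo` is made up):

```{r}
lazy_demo <- function(arg) {
  cat("inside the function\n")
  arg   # first use: the argument gets evaluated now
  arg   # second use: the stored value is reused, no re-evaluation
  invisible(NULL)
}
lazy_demo({cat("evaluating the argument\n"); 1})
```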
I promise you (2)
========================================================
&nbsp;
As users, we can create _promises_, too:
```{r, error=TRUE}
# normal assignment - this can't work
x <- confint(mybike)
# promise to evaluate when needed
# this works without error
delayedAssign('x', confint(mybike))
# here it gets evaluated
x
```
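The same mechanism with an expression that does not fail, so we can see _when_ the evaluation happens (a sketch; the name `lazy_value_demo` is only for illustration):

```{r}
delayedAssign("lazy_value_demo", {cat("evaluating now\n"); 42})
cat("promise created, nothing evaluated yet\n")
# first access forces the promise
lazy_value_demo
# later accesses just return the stored value
lazy_value_demo
```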
I promise you (3)
========================================================
&nbsp;
... this could go on for quite some time ... but ;-)
Thanks a lot for your attention!
:-)