
Commit

cleaning up
edwindj committed Jul 7, 2016
2 parents 40416dd + 2e9bd99 commit ea660ec
Showing 2 changed files with 20 additions and 51 deletions.
71 changes: 20 additions & 51 deletions useR/lightning.Rmd
@@ -1,7 +1,7 @@
---
title: "Chunked"
author: "Edwin de Jonge"
date: "Statistisc Netherlands / UseR! 2016"
date: "Statistics Netherlands / UseR! 2016"
output:
beamer_presentation:
keep_tex: false
@@ -40,7 +40,23 @@ Short answer:
- Another text file
- A database

## Option 1: Use unix tools
## Option 1: Read data with R

### Use:

- ~~read.csv~~ uh, `readr::read_csv`
- `data.table::fread`
- Fast reading of data into memory!

### However...

- You will need a lot of RAM!
- Text files tend to be 1 to 100 GB.
- **Even though these procedures use memory mapping, the resulting `data.frame`
does not!**
- The development cycle of a processing script is long...
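
To make the trade-off concrete, here is a minimal sketch of this in-memory route (the file name `my_data.csv` and the columns `col1`, `col2` are assumptions, not part of the slides):

```{r}
# the whole file is parsed into RAM: fast, but the full data.frame must fit in memory
library(data.table)

dt <- fread("my_data.csv")              # or: readr::read_csv("my_data.csv")

# every subsequent step also operates on the complete in-memory table
dt_small <- dt[col1 > 1, .(col1, col2)]
```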
## Option 2: Use unix tools
### Good choice!
@@ -58,7 +74,7 @@ It is nice to stay in `R`-universe (one data-processing tool)
- Does it work on my OS/shell?
- I want to use dplyr verbs! (dplyr-deprivation...)
## Option 2: Import data in DB
## Option 3: Import data in DB
### Import data into DB
@@ -67,25 +83,9 @@ It is nice to stay in `R`-universe (one data-processing tool)
### However
- That is LET (Load, Extract, Transform) instead of ELT (Extract, Load, Transform)
- It is not really an R solution, but a DB solution
- It may not be efficient.
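
For illustration, a minimal sketch of this route with SQLite (the database file, table, and column names are assumptions): bulk-load the text file with the database's own loader, then query it from R.

```{r}
# bulk load outside R, e.g. in the sqlite3 shell:
#   .mode csv
#   .import my_data.csv my_large_table
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "my_db.sqlite3")

# the transformation now runs in the database, not in R
res <- dbGetQuery(con, "SELECT col1, col2 FROM my_large_table WHERE col1 > 1")
dbDisconnect(con)
```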
## Option 3: Read data with R

### Use:

- ~~read.csv~~ uh, `readr::read_csv`
- `data.table::fread`
- Fast reading of data into memory!

### However...

- You will need a lot of RAM!
- Text files tend to be 1 to 100 GB.
- **Even though these procedures use memory mapping, the resulting `data.frame`
does not!**
- The development cycle of a processing script is long...
## Process in chunks?
\begin{center}
@@ -107,37 +107,6 @@
- All `dplyr` verbs on `chunk_wise` objects are recorded and replayed when
writing.
## Option 4: Use chunked!
### Idea:
- Process data chunk by chunk using `dplyr` verbs
- Memory efficient, only one chunk at a time in memory
- Lazy processing
- Development cycle is short: test on first chunk.
###
- Read (and write) one chunk at a time using the R package `LaF`.
- All `dplyr` verbs on `chunk_wise` objects are recorded and replayed when
writing.
## Scenario 1: TXT -> TXT
### Preprocess a text file with data
```{r}
read_chunkwise("my_data.csv", chunk_size = 5000) %>%
select(col1, col2) %>%
filter(col1 > 1) %>%
mutate(col3 = col1 + 1) %>%
write_chunkwise("output.csv")
```
This code:
- evaluates chunk by chunk
- allows for column name completion in RStudio!
## Scenario 1: TXT -> TXT
### Preprocess a text file with data
@@ -168,7 +137,7 @@ tbl <-
mutate(col6 = col1 + col2) %>%
write_chunkwise(db, 'my_large_table')
```
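
The top of this chunk is collapsed in the diff view above; a self-contained sketch of the TXT -> DB scenario (the database file, input file, and column names are assumptions) could look like:

```{r}
library(chunked)
library(dplyr)

# an SQLite file serves as the target database
db <- src_sqlite("my_db.sqlite3", create = TRUE)

tbl <-
  read_chunkwise("my_data.csv", chunk_size = 5000) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise(db, "my_large_table")
```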
## Scenario 2: DB -> TXT
## Scenario 3: DB -> TXT
### Extract a large table from a DB to a text file
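
The slide body is collapsed in this view; a minimal sketch for the DB -> TXT direction (the connection, table, and column names are assumptions) might be:

```{r}
library(chunked)
library(dplyr)

db <- src_sqlite("my_db.sqlite3")

# read the database table chunk by chunk and stream it into a csv file
tbl(db, "my_large_table") %>%
  read_chunkwise(chunk_size = 5000) %>%
  select(col1, col5) %>%
  write_chunkwise("my_large_table.csv")
```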
Binary file modified useR/lightning.pdf
Binary file not shown.
