Talks given at
- csv, conf, v2: http://csvconf.com 2016 May 03/04, Berlin
- Genentech, 2016 June 23, San Francisco
- BioC 2016: http://bioconductor.org/help/course-materials/2016/BioC2016/ 2016 June 25/26, Stanford
- useR! 2016: http://user2016.org 2016 June 27 - 30, Stanford
- JSM 2016: Reproducibility in Statistics and Data Science, invited session, 2016 Aug 3, Chicago
Browse on Speakerdeck
- The ~1 hour version: https://speakerdeck.com/jennybc/spreadsheets
PDFs here in this repo
- The ~1 hour version: 2016-06_genentech-bioc.pdf (warning: big PDF; more pleasant to browse on Speakerdeck)
- The 15 minute useR version: 2016-06_useR-stanford.pdf
Video of the 15 min talk at useR! 2016 @ Stanford:
Me, in other places
- twitter: https://twitter.com/JennyBryan
- github: https://github.com/jennybc
- STAT 545 twitter: https://twitter.com/STAT545
- STAT 545 website: http://stat545.com
csv, conf, v1
http://csvconf.com/2014/
Felienne Hermans
Rich FitzJohn
- Research Software Engineer, University College London
- github: https://github.com/richfitz
- twitter: https://twitter.com/rgfitzjohn
- website: http://richfitz.github.io
- https://github.com/richfitz/jiffy
Gordon Shotwell tweets about dystopian moonscapes:
- https://twitter.com/gshotwell/status/646219102609014785
- https://twitter.com/gshotwell/status/577485681146097664
"50 million accountants use monads in Excel. They just don't go around explaining monads to everyone..."
https://twitter.com/tomaspetricek/status/687947134088392704
Felienne Hermans shows an example of monads in Excel @14:45 in this talk: https://vimeo.com/162206549
My middle finger gif re: Excel having had reactivity for years
https://twitter.com/JennyBryan/status/713198745022693377
Not So Standard Deviations (NSSD) Podcast by Hilary Parker and Roger Peng
Episode 9 - Spreadsheet Drama
https://soundcloud.com/nssd-podcast/episode-9-spreadsheet-drama
Philip Guo https://twitter.com/pgbovine blog posts about command-line bullshittery
- Helping my students overcome command-line bullshittery
- An example of command-line bullshittery in computer science research
Sources for approximate facts re: Excel, R, Python usage
- http://www.wsj.com/articles/do-you-really-need-microsoft-office-anymore-1407873198
- http://news.microsoft.com/bythenumbers/
- https://twitter.com/grammarware/status/709173561663983617 and replies
- https://twitter.com/JennyBryan/status/709252242109440000 and replies
- https://www.quora.com/How-many-people-use-R
- http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
- https://blog.pythonanywhere.com/67/
Tools used in /r/DataIsBeautiful posts
http://www.randalolson.com/2016/03/11/what-data-visualization-tools-do-rdataisbeautiful-oc-creators-use/
Donald Rumsfeld: "As you know, you go to war with the army you have, not the army you might want or wish to have at a later time."
Chris Albon https://twitter.com/chrisalbon riffing on that w/r/t a crazy web form: "You go to production with the features you know, not the features you need."
https://twitter.com/chrisalbon/status/720613507457093632
Me https://twitter.com/JennyBryan/status/735671032636276736: "The more spreadsheets I look at, the more I realize ppl use them like I use a git repo. It's how they manage data + "code" + figs & results."
Thread of CRAZY spreadsheet stories, in response to my tweet:
https://twitter.com/JennyBryan/status/722954354198597632
The Kinsey report:
https://en.wikipedia.org/wiki/Kinsey_Reports
https://en.wikipedia.org/wiki/File:Kinsey-Male.jpg
Enron corpus:
- https://en.wikipedia.org/wiki/Enron_Corpus
- Blog post: A modern day Pompeii: Spreadsheets at Enron
- Enron spreadsheets and emails
- Hermans and Murphy-Hill paper about the Enron spreadsheets
- Pompeii image https://www.flickr.com/photos/simonpocock/318438014
- https://en.wikipedia.org/wiki/Enron#/media/File:Logo_de_Enron.svg
- Enron sheets shown:
- frank_ermis__11178__Enron Pricing Report.xlsx
- NFL betting series: errol_mclaughlin_jr__XXXXX__weekY.xlsx
Excel color-formatted table with exploding pie chart:
http://pubpages.unh.edu/~mjp262/excel_spreadsheet.jpg
Dog shaming
- http://www.dogshaming.com
- http://giveitlove.com/wp-content/uploads/Dog-Hiding-Between-Two-Couch-Cushions.jpg
xkcd comic:
http://xkcd.com/1667/
http://imgs.xkcd.com/comics/algorithms.png
Advice re: organizing data in spreadsheets
- Tutorial by Karl Broman
- Lesson from Data Carpentry
- Advice from UK Gov’t Statistical Service Good Practice Team
- Luis D. Verde blog post
Examples of new spreadsheet implementations
- Stencila sheets
- Alphasheets
- pyspread
Some of the other csvconf talks from Stencila contributors
Tweet: If your collaborator asks, “In what form would you like the data?” you should respond, “In its current form.”
https://twitter.com/JeanVAdams/status/707241263645392896
Some of the other csvconf talks re: dealing with data in suboptimal form
TableToLongForm R package
- "automatically converts hierarchical Tables intended for a human reader into a simple LongForm Dataframe that is machine readable"
- https://cran.r-project.org/web/packages/TableToLongForm/index.html
- R Journal article: https://journal.r-project.org/archive/2014-2/oh.pdf
R packages that talk to Excel (there are even more!):
- readxl: CRAN, GitHub
- openxlsx: CRAN, GitHub (zip dep.)
- xlsx: CRAN, GitHub (Java dep.)
- XLConnect: CRAN, GitHub (Java dep.)
- gdata: CRAN (Perl dep.)
- RODBC: CRAN (Windows only)
Mango blog post about R packages to read/write Excel:
http://www.mango-solutions.com/wp/2015/05/r-the-excel-connection/
googlesheets package
- On GitHub: https://github.com/jennybc/googlesheets
- On CRAN: https://cran.r-project.org/web/packages/googlesheets/index.html
rsheets organization: holds several packages, mostly works in progress
- Organization: https://github.com/rsheets
- https://github.com/rsheets/cellranger
- https://github.com/rsheets/linen
- https://github.com/rsheets/rexcel
- https://github.com/rsheets/jailbreakr
David Robinson's RPubs post tidying one of the Enron sheets
http://rpubs.com/dgrtwo/tidying-enron
UBC Master of Data Science program
http://mds.science.ubc.ca