… want to refer to later.
Assuming people will be linked to the README directly, rather than find it in `slides/`
What about mentioning mixing R with LaTeX or Markdown? |
+1 to @r-gaia-cs's suggestion to highlight the ability to create nice, well-documented reports from your analyses.

These slides do a good job on the long-term motivations for using R, which is that no matter how complicated a problem you encounter in the future, you will likely not have to start from scratch, and you can also extend existing solutions yourself relatively easily. But what about the short term for those completely new to programming? I would argue that if you have a tabular data set, R is the quickest option to start learning about your data as you learn the language. Consider this small example:

```r
my_dat <- read.table("data.txt", header = TRUE)  # header = TRUE keeps the column names
summary(my_dat)
boxplot(continuous_var ~ categorical_var, data = my_dat)
```

This is so empowering because you can get to this level after only a little bit of learning R. Most other languages have a much steeper learning curve. And even in Python, you would first have to import pandas and matplotlib and be familiar with calling methods with dot notation before accomplishing something similar. |
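For anyone who wants to try the three-line workflow above without a data file at hand, here is a minimal sketch. It assumes the built-in `ToothGrowth` data set as a stand-in for your own table and writes it to a temporary file first; the choice of data set and the `data_file` name are illustrative assumptions, not part of the original example.

```r
# Assumption: ToothGrowth (ships with R) stands in for the reader's own table.
data_file <- tempfile(fileext = ".txt")
write.table(ToothGrowth, data_file, row.names = FALSE)

# The same three steps as in the comment above: read, summarise, plot.
my_dat <- read.table(data_file, header = TRUE)
summary(my_dat)
bp <- boxplot(len ~ supp, data = my_dat)  # tooth length by supplement type
```

The object returned by `boxplot()` is kept in `bp` only so the group statistics can be inspected afterwards; the plot itself is drawn either way.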
This looks wonderful, concise and covers all the main selling points. I have two suggestions for additions, perhaps just two bullet points somewhere on the slides:

First, I'd add somewhere that R has a very large community of users who are generally very helpful, and mention some of the bigger and better sources of free online information like http://stackoverflow.com/questions/tagged/r, http://www.statmethods.net/, http://www.twotorials.com/ and so on. Package authors are also generally willing to help users with their packages. This is a very important detail because people who have grown up with a commercial stats package will be anxious about switching to R and not having a help line that they are entitled to call because of their licensing fees. They will want to know that help is available for R, but it's not in the form that they might be used to!

Second, I'd make a brief mention of how R improves the reproducibility and transparency of research. People using point-and-click stats packages are typically not very aware of this issue (because of how hard it is to do reproducible research with a point-and-click interface), and as a script-driven environment R gives it to them for free. Using R, a researcher can script an analysis that they can run over and over with different data and on different projects, and give to someone else (e.g. students) to use and verify. They can also publish their code online for others to inspect and validate their analyses, and so on. Since the target audience here is mostly non-programmers, this benefit of openness that we get from a stats package based on scripting is likely to make quite an impression. This might be a good topic to connect to LaTeX and Markdown as @jdblischak and @r-gaia-cs suggest. |
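To make the "run it over and over with different data" point concrete, a rough sketch might look like the following. The helper name `summarise_by_group` and the two built-in data sets are assumptions chosen purely for illustration; any scripted analysis would make the same point.

```r
# Hypothetical helper: mean of one response column for each level of a grouping column.
summarise_by_group <- function(dat, response, group) {
  aggregate(dat[[response]], by = list(group = dat[[group]]), FUN = mean)
}

# The same script applied, unchanged, to two different data sets.
tooth_means <- summarise_by_group(ToothGrowth, "len", "supp")
iris_means <- summarise_by_group(iris, "Sepal.Length", "Species")
print(tooth_means)
print(iris_means)
```

Because the analysis lives in a function in a script, a collaborator or student can rerun it on their own data, or read it to check exactly what was done.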
All worth mentioning, I didn't think of adding those points: I was trying to
|
A couple of counter arguments:
But I agree with you, definitely the easiest and fastest language for
|
It's fine to have arguments pro and con in motivational slides - if |
Looks great. Listing some cons is wise: I'd just say the syntax is more challenging/frustrating than most, (so that users don't get too discouraged when they struggle with the use of

@sritchie73 I'm surprised that you find data cleaning harder in R, I would have listed that as one of its greatest strengths! Have you had a read through http://vita.had.co.nz/papers/tidy-data.html or the more recent http://blog.rstudio.org/2014/07/22/introducing-tidyr/ ? |
@sritchie73 @gvwilson It is important to be neutral in providing pros and cons and some of those cons are very personal.
For cleaning data, I rarely use anything else but R for this. Probably that's because I know a lot more R than bash or python or some other such language. |
Rather than just tell people why R is so great, why not show them an excellent example? One that is used quite often for a more statistically-minded group is creating a bootstrap confidence interval on the kernel density estimate of the Old Faithful Waiting Time data:

```r
kde <- with(faithful, density(waiting))  # kernel density estimate of the observed data
from <- min(kde$x)
to <- max(kde$x)
boots <- with(faithful, replicate(10000, {  # 10000 bootstrap resamples
    samp <- sample(waiting, replace = TRUE)
    density(samp, from = from, to = to)$y  # re-estimate on the same x range
}))
ci <- apply(boots, 1, quantile, probs = c(0.025, 0.975))  # pointwise 95% interval
plot(kde, ylim = range(ci))
polygon(c(kde$x, rev(kde$x)),
        c(ci[1, ], rev(ci[2, ])), col = "grey", border = FALSE)
lines(kde, lwd = 2) 
```
|
Another +1 to adding a mention of knitr/rmarkdown to create nice reports. People are always impressed when I show them how easy it is to make a nice deliverable (HTML or PDF) to share with their collaborators and/or PI.

I would also mention what a great resource RStudio is for coding in R, particularly for novices: built-in help, objects browser, tab completion(!!), not to mention more advanced things like git integration, etc. This makes R a lot more familiar for people coming from something like MATLAB.

I think it may also be worth mentioning briefly that different disciplines tend to have different 'default/go-to' languages. In ecology, for example, R is certainly the 'go-to', which means a lot of code from manuscripts, or for new analysis methods, is in R. I think other disciplines have other defaults, e.g. Python or MATLAB. |
Regarding data cleaning, an R novice might be a novice in other languages too. So I don't think it will help them to refer to other languages (it depends on the audience). Also, as a novice, you might prefer to do as much as you can in a single environment (i.e. all in R). |
@dhaine agree completely with both of those points. Seems a more consistent approach with the novice student as someone coming to the command line for the first time. |
Great to see a lot of discussion in the second day of the bootcamp!

@cboettig agreed on the syntax, but better to not demotivate novices by saying it's hard straight away. Personally I think the major reason things are so difficult is that most courses don't teach the basic data structures and how to access them, instead focussing solely on statistics, so we've all had to struggle through it. I'm a big advocate of Hadley Wickham's Advanced R in that regard. I haven't seen

@dhaine I completely agree with you. It's not helpful for novices to compare to other languages (unless they come from a Comp Sci background), but we also shouldn't be advocating any one language as the be-all and end-all solution.

@gavinsimpson On giving motivating examples, I believe @gvwilson's intention was to have the pitches be quite short, ~3 mins each. See |
@sritchie73 as a pitch, you aren't going to need to explain what each line of code does. You just need to explain the general steps (KDE in line 1, bootstrap on lines 4-7, CI on line 8, the rest plotting in the above example), point to the efficiency of the small amount of code needed to do this, and point to the result. We don't even need to have just one example, but perhaps a few to choose from, or insert your own favorite. The pitch needs to be more than "trust me on these things, language x is great because it will save you time / allow you to do x, y, & z once you've invested a bit of time learning". At least that's been my experience.

My point re

Re your reply to @dhaine: I agree that knowing about a range of languages is helpful, right tool for the job and all, but that doesn't mean R isn't an easy or useful language for data manipulation/processing. This isn't a negative against R; it isn't bad at data processing. |
My 2 cents on the motivating example idea. I strongly concur with @gavinsimpson that to gain people's trust, it is best to walk the talk and show them how a few lines of code could do something for them that would typically take many lines of code in other languages. Having a carefully curated list of examples will allow instructors to pick the one most relevant to their audience. |
@jdblischak @dhaine Please merge if you think this is close enough. |
My understanding is that this content is being merged into #628. |
Created some motivational slides for why you should learn and use R.
Issues that need solving: