faq: Is R used extensively today in data science?
rasbt committed Feb 14, 2016
1 parent af6e05f commit d749cb1
Showing 2 changed files with 19 additions and 0 deletions.
4 changes: 4 additions & 0 deletions faq/README.md
@@ -122,6 +122,10 @@ Sebastian
- [What is Euclidean distance in terms of machine learning?](./euclidean-distance.md)
- [When should one use median, as opposed to the mean or average?](./median-vs-mean.md)

##### Programming Languages for Data Science and ML

- [Is R used extensively today in data science?](./r-in-datascience.md)

<!--- end -->

<br>
15 changes: 15 additions & 0 deletions faq/r-in-datascience.md
@@ -0,0 +1,15 @@
# Is R used extensively today in data science?

"Extensively" is a relative term, so let me discuss this in comparison to other languages.
I would say that R was probably THE language for statistics or "data science" work about 5-10 years ago. Today, now that the Python sci-stack has caught up and keeps growing, R is about as widely used as Python for similar tasks. I can see a further shift towards Python in the future, though, because much of the current development effort is aimed at scalability and computational efficiency. For example,

- Blaze for out-of-core analysis of big datasets
- Dask for parallel computing on multi-core machines or distributed clusters
- Theano and TensorFlow for the optimization and evaluation of mathematical expressions involving multi-dimensional arrays, utilizing GPUs

and many, many more. Although R is fine for "small-scale" analyses, performance can become a big weakness of R in real-world applications.
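
As a rough illustration of the out-of-core, parallel style of analysis these libraries enable, here is a minimal, hypothetical Dask sketch (not from the original answer; the file name and column names are placeholders):

```python
# Minimal Dask sketch: read a large CSV lazily in partitions and compute
# a group-wise mean without loading the whole dataset into memory.
# "data.csv", "group", and "value" are hypothetical placeholders.
import dask.dataframe as dd

df = dd.read_csv("data.csv")                          # lazy, partitioned dataframe
mean_by_group = df.groupby("group")["value"].mean()   # only builds a task graph
print(mean_by_group.compute())                        # runs the graph in parallel
```
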
However, keep in mind that Scala is also on the rise right now; take Spark, for instance.
Ultimately, I think it all depends on the task and the problem you want to solve. For "smallish" analyses and projects, Python's default sci-stack and R work just fine. For large-scale distributed computing, you'd typically use Spark (written in Scala). For deep learning, you'd use Theano or TensorFlow (via Python) or Torch (written in Lua).

(If all you have is a hammer, everything looks like a nail :).)
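
To make the Spark option above a bit more concrete, here is a minimal, hypothetical PySpark word-count sketch (not part of the original answer; "data.txt" is a placeholder). The same code can run on all local cores or on a cluster, depending on the master URL:

```python
# Minimal PySpark sketch: count words in a text file; "data.txt" is a placeholder.
from pyspark import SparkContext

sc = SparkContext("local[*]", "word-count")  # "local[*]" = use all local cores
counts = (sc.textFile("data.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.take(5))                        # a few (word, count) pairs
sc.stop()
```
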
