Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
faq: Is R used extensively today in data science?
- Loading branch information
Showing
2 changed files
with
19 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Is R used extensively today in data science? | ||
|
||
"Extensively" is a relative term, so let me discuss this in comparison to other languages. | ||
I would say that R was probably THE language for doing statistics or "data science" work about 5-10 years ago. Today, as the Python sci-stack caught up and keeps growing, it's about as widely used as Python for similar tasks. I can see a shift more towards Python in future though because there seems to be more development going on at the moment towards scalability and computational efficiency. For example, | ||
|
||
- Blaze for out-of-core analysis of big datasets | ||
- Dask for parallel computing on multi-core machines or on a distributed clusters | ||
- Theano and Tensorflow for the optimization and evaluation of mathematical | ||
|
||
expressions involving multi-dimensional arrays utilizing GPUs | ||
and many, many more. Although R is fine for "small scale" analyses, performance can be (become) a big weakness of R for real-world applications. | ||
However, keep in mind that Scala is also big on the rise right now, take Spark for instance. | ||
Eventually, I think it all depends on the task and the problem you'd want to solve. For "smallish" analysis and projects, Python's default sci-stack and R work just fine. For large-scale distributed computing, you'd typically use Spark (written in Scala). For deep learning, you use Theano or Tensorflow (via Python) or Torch (written in Lua). | ||
|
||
(If all you have is a hammer, everything looks like a nail :).) |