Developed by
Feng Li
School of Statistics and Mathematics
Central University of Finance and Economics
feng.li@cufe.edu.cn
- Distributed Statistical Computing for Big Data and Case Studies (大数据分布式计算与案例), ISBN: 9787300230276. Available at JD.COM
- New version (in preparation)
You can view all the notebooks in this repository via the Jupyter Notebook Viewer.
Requirements to run the notebook interactively
- Python (>= 3.6.0)
  - findspark (invokes Spark from a Python session)
  - numpy
  - scipy
  - pandas
- Hadoop (>= 2.7.0)
- Hive (>= 2.3.3)
- Spark (>= 2.3.1)
- Jupyter Notebook (>= 5.0)
  - RISE (for Jupyter slides). Use Alt+R to enter slideshow mode.
- Bash Kernel (for Linux and Hadoop, Hive, Spark batch mode)
- IPython kernel for Python 3 (for interactive PySpark sessions)
- HiveQL Kernel (for interactive Hive sessions)
- Spark Toree (for interactive Spark Scala sessions)
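
Before opening the notebooks, it may help to check which of the requirements above are already present. The following is a minimal sketch, not part of the repository: the helper names (`python_ok`, `missing_packages`, `missing_commands`, `start_spark`) are hypothetical, and the command names `hadoop`, `hive`, `spark-submit`, and `jupyter` are assumptions about what a typical installation puts on the `PATH`. It checks only that packages and commands exist, not their versions.

```python
import importlib.util
import shutil
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 6, 0)

# Python packages named in the requirements list.
PY_PACKAGES = ["findspark", "numpy", "scipy", "pandas"]

# Assumed command names for Hadoop, Hive, Spark, and Jupyter; adjust as needed.
COMMANDS = ["hadoop", "hive", "spark-submit", "jupyter"]


def python_ok(version=None):
    """Return True if the given (or the running) Python meets MIN_PYTHON."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= MIN_PYTHON


def missing_packages(names=PY_PACKAGES):
    """Return the top-level packages in `names` that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_commands(names=COMMANDS):
    """Return the commands in `names` that are not found on the PATH."""
    return [n for n in names if shutil.which(n) is None]


def start_spark(app_name="requirements-check"):
    """Locate Spark with findspark, then start a PySpark session.

    Requires findspark and a working Spark installation; findspark.init()
    raises an error if it cannot locate Spark.
    """
    import findspark
    findspark.init()  # makes the pyspark package importable
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing packages:", missing_packages())
    print("Missing commands:", missing_commands())
```

If both lists come back empty, calling `start_spark()` from a notebook cell is one way to confirm that findspark can actually reach the Spark installation.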
 