You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
"description": "Apache Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. At first glance, it seems that getting started with programming the Hadoop eco-system is quite cumbersome, and not so user-friendly for a data scientist or a machine learning specialist. In this talk I will briefly introduce Apache Spark, and its programming paradigm. I will show how to easily execute a distributed training of the common multi-class classifiers (na\u00efve Bayes, random forest, logistic regression), without installing a single virtual machine, virtual box or a docker. I will share my experience of managing long-term software projects which are based on the Hadoop technology for data storage, extraction and transformation.",