pydata-meetups/videos/victor-makarenkov-easy-spark-exploiting-large-datasets-for-multi-class-classification.json

{
  "description": "Apache Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. At first glance, it seems that getting started with programming the Hadoop eco-system is quite cumbersome, and not so user-friendly for a data scientist or a machine learning specialist. In this talk I will briefly introduce Apache Spark, and its programming paradigm. I will show how to easily execute a distributed training of the common multi-class classifiers (na\u00efve Bayes, random forest, logistic regression), without installing a single virtual machine, virtual box or a docker. I will share my experience of managing long-term software projects which are based on the Hadoop technology for data storage, extraction and transformation.",
  "duration": 2173,
  "language": "eng",
  "recorded": "2017-04-14",
  "related_urls": [
    "https://github.com/vicmak/Mining-Massive-Datasets"
  ],
  "speakers": [
    "Victor Makarenkov"
  ],
  "tags": [],
  "thumbnail_url": "https://i.ytimg.com/vi/lLDE5y_yJSs/hqdefault.jpg",
  "title": "Easy Spark: Exploiting large datasets for multi-class classification",
  "videos": [
    {
      "type": "youtube",
      "url": "https://www.youtube.com/watch?v=lLDE5y_yJSs"
    }
  ]
}