ronert-obst-dat-tran-pyspark-in-practice.json
{
"copyright_text": "Standard YouTube License",
"description": "PyData London 2016\n\nIn this talk we will share our best practices of using PySpark in numerous customer facing data science engagements. Topics covered in this talk are:\n\n- Configuration\n- Unit testing with PySpark\n- Integration with SQL on Hadoop engines\n- Data pipeline management and workflows\n- Data Structures (RDDs vs. Data Frames vs. Data Sets)\n- When to use MLlib vs scikit-learn\n- Operationalisation\n\nAt Pivotal Labs we have many data science engagements on big data. Typical problems involve real-time data from sensors collected by telecom operators to GPS data produced by vehicle tracking systems. One widespread framework to solve those inherently difficult problems is Apache Spark. In this talk, we want to share our best practices with PySpark, Spark\u2019s Python API, highlighting our experience as well as dos and don'ts. In particular, we will focus on the whole data science pipeline from data ingestion, data munging and wrangling to the actual model building.\n\nFinally, many businesses have started to realise that there is no return on investment from data science if the models do not go into production. At Pivotal Labs, one our core principle is API first. Therefore, we will also talk how we put our models into production, sharing our hands-on knowledge in this field and also how this fits into test-driven development.\n\nSlides available here: http://pydata2016.cfapps.io/\n\nGitHub Repo: https://github.com/datitran/spark-tdd-example",
"duration": 2567,
"id": 5249,
"language": "eng",
"recorded": "2016-05-11",
"related_urls": [
"http://pydata2016.cfapps.io/",
"https://github.com/datitran/spark-tdd-example"
],
"slug": "ronert-obst-dat-tran-pyspark-in-practice",
"speakers": [
"Ronert Obst",
"Dat Tran"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/SETpipUZ_Lc/maxresdefault.jpg",
"title": "PySpark in Practice",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=SETpipUZ_Lc"
}
]
}