c-barra-scaling-your-data-infrastructure.json
{
"copyright_text": null,
"description": "**This talk aims to answer a few questions:**\n\n- What do you do when you need to move your model from your laptop to\n production?\n- Is ``big data == I need to use JVM`` the right assumption?\n- How can I put my jupyter notebook in production?\n- How do you apply the best software engineering practices (testing and\n ci for example) inside your data science process?\n- How do you \u201cdecouple\u201d your data scientists, developers and devops\n teams?\n- How do you guarantee the reproducibility of your models?\n- How do you scale your training process when does not fit in memory\n anymore?\n- How do you serve your models and provide an easy rollback system?\n\nThe Agenda:\n\n- The Data Science workflow\n- Scaling is not just a matter of the size of your Data\n- Scaling when the size of your Data matters\n- DDS, Dockerized Data Science\n- Cassiny\n\nI\u2019ll share my experience highlighting some of the challenges I faced and\nthe solutions I came up to answer these questions.\n\nDuring this presentation I will mention libraries like jupyter, atom,\nscikit- learn, dask, ray, parquet, arrow and many others.\n\nThe principles and best practices I will share are something that you\ncan apply, more or less easily, if you are running or in the process to\nrun a production system based on the Python stack.\n\nThis talk will focus on (my) best practices to run the Python Data stack\ntogether and I will also talk about Cassiny, an open source project I\nstarted, that aims to simplify your life if you want to use a completely\nPython based solution in your data science workflow.\n\nin \\_\\_on **Friday 20 April** at 11:00 `**See\nschedule** </en/sprints/schedule/pycon9/>`__\n",
"duration": 2227,
"language": "eng",
"recorded": "2018-04-20",
"related_urls": [
{
"label": "Conference schedule",
"url": "https://www.pycon.it/p3/schedule/pycon9/"
}
],
"speakers": [
"Christian Barra"
],
"tags": [
"Jupyter",
"CloudComputing",
"pydata",
"#lessonslearned",
"Big-Data",
"S3",
"Data-Scientist",
"#amicodialessia",
"java",
"docker",
"cloud"
],
"thumbnail_url": "https://i.ytimg.com/vi/rK4rGrhIWsk/maxresdefault.jpg",
"title": "Scaling your Data infrastructure",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=rK4rGrhIWsk"
}
]
}