chloe-mawer-jonathan-whitmore-exploratory-data-analysis-in-python-pycon-2017.json
{
"description": "With the recent advancements in machine learning algorithms and\nstatistical techniques, and the increasing ease of implementing them in\nPython, it is tempting to ignore the power and necessity of exploratory\ndata analysis (EDA), the crucial step before diving into machine\nlearning or statistical modeling. Simply applying machine learning\nalgorithms without first getting oriented in the dataset can lead to\nwasted time and spurious conclusions. EDA allows practitioners to gain\nintuition for patterns in the data, identify anomalies, narrow down a\nset of alternative modeling approaches, devise strategies to handle\nmissing data, and ensure correct interpretation of the results. Further,\nEDA can rapidly generate insights and answer many questions without\nrequiring complex modeling.\n\nPython is a fantastic language not only for machine learning but also\nfor EDA. In this tutorial, we will walk through two hands-on examples of\nhow to perform EDA using Python and discuss various EDA techniques for\ncross-sectional data, time-series data, and panel data. One example will\ndemonstrate how to use EDA to answer questions, test business\nassumptions, and generate hypotheses for further analysis. The other\nexample will focus on performing EDA to prepare for modeling. Between\nthese two examples, we will cover:\n\n- Data profiling and quality assessment\n- Basic description of the data\n- Visualizing the data, including interactive visualizations\n- Identifying patterns in the data (including patterns of correlated\n missing data)\n- Dealing with many attributes (columns)\n- Dealing with large datasets using sampling techniques\n- Informing the engineering of features for future modeling\n- Identifying challenges of using the data (e.g. skewness, outliers)\n- Developing an intuition for interpreting the results of future\n modeling\n\nThe intended audience for this tutorial is aspiring and practicing data\nscientists and analysts, or anyone who wants to be able to get insights\nout of data. Students must have at least intermediate-level knowledge\nof Python; some familiarity with analyzing data would be beneficial.\nJupyter Notebook must be installed (we may also demonstrate analysis\nin JupyterLab if its development over the next few months allows).\nInstructions on which packages to install will be sent beforehand.\n",
"duration": 10496,
"language": "eng",
"recorded": "2017-05-17",
"speakers": [
"Chloe Mawer",
"Jonathan Whitmore"
],
"thumbnail_url": "https://i.ytimg.com/vi/W5WE9Db2RLU/hqdefault.jpg",
"title": "Exploratory data analysis in Python",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=W5WE9Db2RLU"
}
]
}