# Talk details are specified in YAML files
# YAML was selected because we can use multi-line strings and add
# comments in the file.
speaker_name: "Serena Peruzzo"
talk_title: "From hot mess to information, or why you should spend more time processing your data"
# At least 1 tag is necessary!!
talk_tags:
- "Community, Social, Ethics, and Education"
- "Machine Learning & Data Science"
talk_abstract: "Over the last few years machine learning has drawn a lot of attention from both inside and outside the data science community. The internet is flooded with articles on the latest or coolest algorithms. What these articles often don't cover is that at the beginning of your project, you'll be spending a lot of time collecting, cleaning and otherwise pre-processing your data, no matter what type of project or model you're working on. There's a tendency to dismiss this first stage as mundane, but this couldn't be further from the truth. This first, exploratory stage of the analysis is when you'll learn the most about the information that is available for solving your problem and how to harness it. In this talk, I'll use practical examples to describe some of the statistical techniques that I've found most useful over the years. Some, like box plots, offer a simple way to detect outliers and inconsistencies. Others, like imputation, are more complex and can even leverage machine learning. These methods can be combined in multiple ways to create useful representations of data, making it a whole lot easier to build a good model."
about_author: "Serena is a senior data scientist at the analytics consultancy Bardess, currently based in Toronto, Canada. Before joining Bardess, she worked in academia as an ML researcher and in industry as a data science consultant in the Australian, British and Canadian markets. Serena is passionate about education, community and tech for good, and she splits her free time between mentoring data science students, organizing meetups and volunteering."
talk_metadata:
- "**Date:** Saturday Nov. 16"
- "**Location:** Sky Room"
- "**Begin time:** 15:00"
- "**Duration:** 25 minutes"