Welcome to the 2024 edition of the Exploratory Data Analysis (EDA) course, part of the Specialization in Economics with a Data Science option. This course was taught in Spanish and provides students with the necessary tools to perform exploratory analysis on various types of datasets, covering a broad range of techniques from basic statistics to advanced EDA methods such as time series and text data analysis.
Guillermo Lezama
Email: guillermo.lezama@cienciassociales.edu.uy
This course is designed to guide students through the process of exploratory data analysis, focusing on real-world datasets and problems. Throughout the course, students gained practical experience in data cleaning, visualization, transformation, and analysis using Google Colab for interactive notebooks. The course is structured into 10 classes, each with its own notebook that introduces specific topics and techniques.
Additionally, each folder contains a set of slides that were used during the corresponding class.
The homework folder contains the five homework assignments and the final project (in Spanish).
Content: Introduction to EDA principles through voter turnout and electoral data.
Goal: Teach students how to conduct initial exploratory analysis and visualizations on datasets related to election results.
Content: Repetition of basic EDA steps using a COVID-19 dataset.
Goal: Strengthen students' skills in summary statistics and handling missing data.
Content: EDA in a marketing context, exploring customer personality traits and preferences.
Goal: Teach students how to uncover insights from customer data through visualizations and correlations.
Content: Review and consolidation of EDA concepts covered in the first three classes.
Goal: Reinforce students' ability to apply EDA techniques independently.
Content: Visual exploration of relationships between variables using the Iris dataset.
Goal: Teach students how to use visual tools to identify relationships and insights.
Content: Analysis of the Titanic dataset, focusing on survival rates by various categories (e.g., class, gender, age).
Goal: Demonstrate how to analyze categorical and numerical variables using grouping and aggregation techniques.
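As a rough illustration of the grouping and aggregation idea, the sketch below computes survival rates by category with pandas. It is not taken from the course notebook: the rows are invented toy data, and the column names (`pclass`, `sex`, `survived`) are assumptions chosen to resemble common versions of the Titanic dataset.

```python
import pandas as pd

# Toy rows invented for illustration; this is NOT the real Titanic data.
# Column names are assumptions and may differ from the course notebook.
passengers = pd.DataFrame(
    {
        "pclass": [1, 1, 2, 2, 3, 3, 3, 3],
        "sex": ["female", "male", "female", "male",
                "female", "male", "male", "female"],
        "survived": [1, 0, 1, 0, 1, 0, 0, 0],
    }
)

# The mean of a 0/1 column is the share of ones, i.e. the survival rate.
rate_by_class = passengers.groupby("pclass")["survived"].mean()

# Several aggregations at once: group size and survival rate
# for each combination of class and sex.
summary = (
    passengers.groupby(["pclass", "sex"])["survived"]
    .agg(n="count", survival_rate="mean")
    .reset_index()
)

print(rate_by_class)
print(summary)
```

Reporting the group size next to each rate is a deliberate choice here: a rate computed from a handful of rows deserves less weight than one computed from hundreds.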
Content: Analysis of Amazon customer reviews using text data analysis.
Goal: Introduce basic natural language processing (NLP) techniques to explore customer sentiment and patterns.
Content: Exploratory analysis of song lyrics to identify themes and similarities between songs.
Goal: Teach students how to analyze textual data and create visualizations such as word clouds.
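Word clouds are typically built from word frequencies, so a minimal sketch of that counting step may help. It uses only the Python standard library; the lyric lines, the stop-word list, and the `tokenize` helper are all hypothetical and are not drawn from the course materials.

```python
import re
from collections import Counter

# Placeholder lines invented for illustration; not taken from the course data.
lyrics = [
    "The night is long and the road is cold",
    "The road goes on and the night goes on",
]

# A tiny hand-picked stop-word list (an assumption); real analyses
# usually rely on a much larger list.
STOP_WORDS = {"the", "is", "and", "on"}


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


# Count every remaining word across all lines.
word_counts = Counter(
    word
    for line in lyrics
    for word in tokenize(line)
    if word not in STOP_WORDS
)

print(word_counts.most_common(3))
```

A frequency table like `word_counts` is the kind of input a word cloud library would scale into font sizes; comparing such tables across songs is one simple way to look for shared themes.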
Content: Time series analysis of U.S. inflation data.
Goal: Introduce time series EDA, focusing on trends, seasonal patterns, and shocks.
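Two common first steps when exploring a price series are a year-over-year percentage change and a rolling mean to smooth the trend. The sketch below shows both with pandas. The index values are made up for illustration and are not real U.S. inflation figures, and the names `cpi`, `yoy_inflation`, and `trend` are assumptions rather than names from the course notebook. Seasonal patterns and shocks are not covered by this sketch.

```python
import pandas as pd

# Made-up monthly price index values; NOT real U.S. CPI figures.
index_values = [100 + i for i in range(24)]
dates = pd.date_range("2020-01-01", periods=24, freq="MS")  # month starts
cpi = pd.Series(index_values, index=dates, name="price_index")

# Year-over-year inflation in percent: each month compared with
# the same month one year earlier. The first 12 values are NaN.
yoy_inflation = cpi.pct_change(periods=12) * 100

# A 3-month rolling mean smooths short-term noise so the trend
# is easier to see. The first 2 values are NaN.
trend = cpi.rolling(window=3).mean()

print(yoy_inflation.dropna().head())
print(trend.dropna().head())
```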
Content: Introduction to SQL and PySpark for database querying and large-scale data processing.
Goal: Equip students with skills to handle large datasets efficiently using SQL and distributed computing tools like PySpark.
- Mode of Instruction: In-person / Hybrid
- Credits: 4
- Hours: 20 hours of in-person instruction, 40 hours of independent work
- Prerequisites: Basic Python, Jupyter Notebook, Basic Statistics
- Platform: Google Colab
- Final Project (60%): Apply EDA techniques to a given dataset and present findings.
- Classwork (40%): Practical exercises assigned throughout the course.
The syllabus for the course is available in two versions:
- Syllabus.pdf - Spanish
- Syllabus_in_English.pdf - English
No specific textbook is required, but the course draws on the following resources:
- Python for Data Analysis by Wes McKinney
- Python Data Science Handbook by Jake VanderPlas
- Learning SQL by Alan Beaulieu
- Practical Statistics for Data Scientists by Peter Bruce and Andrew Bruce
- Introduction to Time Series Forecasting with Python by Jason Brownlee
- The Effect: An Introduction to Research Design and Causality by Nick Huntington-Klein
Feel free to explore each notebook and the slides within each folder to learn more about the specific techniques and topics covered in the course. For any questions or clarifications, don't hesitate to reach out to me via email.
Happy coding and exploring!