marcoshsq/SQLBasicsForDataScience

Learn SQL Basics for Data Science


This repository contains my SQL studies, based on the Learn SQL Basics for Data Science specialization available on Coursera, plus some complementary materials.

About the Specialization:

There are four courses in this specialization:

  1. SQL for Data Science
  2. Data Wrangling, Analysis and AB Testing with SQL
  3. Distributed Computing with Spark SQL
  4. SQL for Data Science Capstone Project

Below are links to the directories with the projects and assignments developed, along with a brief description of what was studied in each course.

Learning goals: Data Analysis, Apache Spark, Delta Lake, SQL, Data Science, SQLite, A/B Testing, Query String, Predictive Analytics, Presentation Skills, Creating Metrics, Exploratory Data Analysis.

Projects, assignments and exercises:

Course 01 - SQL for Data Science.

Week 01: Getting Started and Selecting & Retrieving Data with SQL;
Week 02: Filtering, Sorting, and Calculating Data with SQL;
Week 03: Subqueries and Joins in SQL;
Week 04: Modifying and Analyzing Data with SQL.

This is the first course of the specialization. It presents the fundamentals of reading and manipulating data in SQL: how to write SQL queries to retrieve results; how to filter, sort, and summarize data; and how to manipulate strings, dates, and numeric data with functions, so that data from different sources ends up in fields with the correct format for analysis.
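As a rough illustration of the kind of query covered here, the sketch below filters, sorts, and summarizes a small in-memory SQLite table from Python. The `orders` table, its columns, and the `summarize_orders` helper are hypothetical and are not taken from the course material.

```python
import sqlite3


def summarize_orders():
    """Filter, group, and sort a tiny hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("ana", 30.0, "2021-01-05"),
            ("ana", 20.0, "2021-02-10"),
            ("bob", 15.0, "2021-01-20"),
            ("bob", 5.0, "2020-12-31"),
            ("cy", 40.0, "2021-03-01"),
        ],
    )
    rows = conn.execute(
        """
        SELECT UPPER(customer) AS customer_upper,  -- string function
               COUNT(*)        AS n_orders,        -- summarize
               SUM(amount)     AS total
        FROM orders
        WHERE order_date >= '2021-01-01'           -- filter (ISO dates sort as text)
        GROUP BY customer
        ORDER BY total DESC                        -- sort
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in summarize_orders():
        print(row)
```

Storing dates as ISO-8601 text is an assumption made to keep the example small; it lets a plain string comparison act as a date filter in SQLite.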

Learning goals: Data Science, Data Analysis, SQLite, SQL.

Course 02 - Data Wrangling, Analysis and AB Testing with SQL.

Week 01: Data of Unknown Quality;
Week 02: Creating Clean Datasets;
Week 03: SQL Problem Solving;
Week 04: Case Study: AB Testing.

This is the second course of the specialization. It deepens SQL skills through four data science case studies, which are used to practice validating and cleaning data, build a simple framework for A/B testing, and finally perform data analysis with SQL.
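A minimal sketch of that workflow, assuming two hypothetical tables (`assignments` and `purchases`) that are not part of the course material: the query first cleans the assignment data by dropping rows with a missing user ID and removing duplicates, then computes a conversion rate per test group. It reports rates only and does not perform a significance test.

```python
import sqlite3


def conversion_by_group():
    """Clean hypothetical A/B assignment data and compute conversion rates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assignments (user_id INTEGER, test_group TEXT);
        CREATE TABLE purchases (user_id INTEGER);
        -- Sample data with a duplicate row and a NULL user_id on purpose.
        INSERT INTO assignments VALUES
            (1, 'control'), (2, 'control'), (2, 'control'),
            (3, 'treatment'), (4, 'treatment'), (NULL, 'treatment');
        INSERT INTO purchases VALUES (1), (3), (3), (4);
        """
    )
    rows = conn.execute(
        """
        WITH clean AS (              -- validation and cleaning step
            SELECT DISTINCT user_id, test_group
            FROM assignments
            WHERE user_id IS NOT NULL
        ),
        buyers AS (                  -- one row per purchasing user
            SELECT DISTINCT user_id FROM purchases
        )
        SELECT c.test_group,
               COUNT(*)          AS users,
               COUNT(b.user_id)  AS converted,
               ROUND(1.0 * COUNT(b.user_id) / COUNT(*), 2) AS rate
        FROM clean AS c
        LEFT JOIN buyers AS b ON b.user_id = c.user_id
        GROUP BY c.test_group
        ORDER BY c.test_group
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in conversion_by_group():
        print(row)
```

The `LEFT JOIN` keeps users who never purchased, so `COUNT(b.user_id)` counts only converted users while `COUNT(*)` counts everyone assigned to the group.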

Learning goals: A/B Testing, Query String, Data Analysis, Predictive Analytics, SQL.

Course 03 - Distributed Computing with Spark SQL.

Week 01: Introduction to Spark;
Week 02: Spark Core Concepts;
Week 03: Engineering Data Pipelines;
Week 04: Data Lakes, Warehouses and Lakehouses.

This is the third course of the specialization, focused entirely on Big Data and Apache Spark.

Learning goals: Apache Spark, Delta Lake, SQL.

Course 04 - SQL for Data Science Capstone Project.

Project Proposal and Data Selection/Preparation;
Descriptive Stats & Understanding Your Data;
Beyond Descriptive Stats (Dive Deeper/Go Broader);
Presenting Your Findings (Storytelling).

This is the fourth and final course: the capstone project of the specialization.

Learning goals: Presentation Skills, Data Analysis, SQL, Creating Metrics, Exploratory Data Analysis.