# UDEMY COURSES DATA ANALYSIS REPORT

This notebook analyzes Udemy course data using the Udemy Courses Data 2023 dataset from Kaggle. It answers five simple questions to demonstrate my proficiency in SQL for data analysis.

We will find answers to questions such as:

* How many instructors and courses are there in total?

* What are the 10 most popular courses? 

* What are the 10 most popular instructors?

* What are the most reviewed instructor's courses?

* What are the 5 most popular data analysis courses?

Jupyter notebooks are a convenient way to work with a database: they let you streamline, reproduce, and document your analysis. Python's Database API (DB-API) has drivers for many relational database systems. In this notebook we connect to PostgreSQL, a popular open-source relational database, and run SQL queries from Python.
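
The connect, execute, fetch pattern is the same across DB-API drivers. As a minimal sketch, the block below uses the standard library's `sqlite3` module with an in-memory database as a stand-in for PostgreSQL, so it runs without a server. The `demo_courses` table and its two rows are made up for illustration and are not part of the Udemy dataset.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, only to have something to query.
cur.execute("CREATE TABLE demo_courses (id INTEGER PRIMARY KEY, title TEXT)")
cur.executemany(
    "INSERT INTO demo_courses (id, title) VALUES (?, ?)",
    [(1, "Course A"), (2, "Course B")],
)
conn.commit()

# Run a query and fetch the result rows as a list of tuples.
cur.execute("SELECT id, title FROM demo_courses ORDER BY id")
rows = cur.fetchall()
print(rows)

conn.close()
```

With PostgreSQL the same steps would go through a driver such as `psycopg2`, which uses `%s` placeholders instead of `?`.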

In [1]:
pip install sqlalchemy

Note: you may need to restart the kernel to use updated packages.


In [2]:
import sqlalchemy

In [3]:
engine = sqlalchemy.create_engine('postgresql://postgres:postgres@localhost/archive')

In [4]:
pip install ipython-sql

Note: you may need to restart the kernel to use updated packages.


In [5]:
%load_ext sql

In [6]:
%env DATABASE_URL=postgresql://postgres:postgres@localhost/archive

env: DATABASE_URL=postgresql://postgres:postgres@localhost/archive


## How many instructors and courses are there in total?
Let's start answering the questions.
The first query counts the distinct instructors with `COUNT(DISTINCT instructors_id)` and the total number of courses with `COUNT(id)`.

In [7]:
%%sql 
SELECT COUNT(DISTINCT instructors_id) AS total_instructors, 
COUNT(id) AS total_courses 
FROM courses;

1 rows affected.


total_instructors,total_courses
32233,83104
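
To see why `DISTINCT` matters here, the sketch below runs the same two aggregates on a hypothetical three-row miniature of the `courses` table, again using in-memory SQLite as a stand-in for PostgreSQL. The ids are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature of the courses table: three courses, two instructors.
conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, instructors_id INTEGER)")
conn.executemany(
    "INSERT INTO courses (id, instructors_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20)],
)

# COUNT(DISTINCT ...) collapses repeated instructor ids; COUNT(id) counts every row.
total_instructors, total_courses = conn.execute(
    "SELECT COUNT(DISTINCT instructors_id), COUNT(id) FROM courses"
).fetchone()
print(total_instructors, total_courses)

conn.close()
```

Without `DISTINCT`, instructor 10 would be counted once per course.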


## What are the 10 most popular courses?
We define the most popular courses as those with the most reviews among courses whose rating is at or above the overall average.

In [8]:
%%sql 
SELECT title, rating, num_reviews 
FROM courses
WHERE rating >= (SELECT avg(rating) FROM courses) 
ORDER BY num_reviews DESC
LIMIT 10;

 * postgresql://postgres:***@localhost/archive
10 rows affected.


title,rating,num_reviews
The Complete Python Bootcamp From Zero to Hero in Python,4.5927815,452973
Microsoft Excel - Excel from Beginner to Advanced,4.659531,357442
The Complete 2023 Web Development Bootcamp,4.667258,263152
The Web Developer Bootcamp 2023,4.6961474,254711
Angular - The Complete Guide (2023 Edition),4.5926924,180257
100 Days of Code: The Complete Python Pro Bootcamp for 2023,4.6952515,177568
Java Programming Masterclass updated to Java 17,4.5502763,177184
"React - The Complete Guide (incl Hooks, React Router, Redux)",4.609824,176452
Ultimate AWS Certified Solutions Architect Associate SAA-C03,4.70473,168032
The Complete JavaScript Course 2023: From Zero to Expert!,4.728533,167670
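
The effect of the average-rating filter can be checked on a small example. The sketch below uses in-memory SQLite as a stand-in for PostgreSQL and four invented courses: the course with the most reviews has a below-average rating, so it should drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, rating REAL, num_reviews INTEGER)")

# Hypothetical rows; the average rating is (4.8 + 4.6 + 3.0 + 4.0) / 4 = 4.1.
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("A", 4.8, 100), ("B", 4.6, 500), ("C", 3.0, 900), ("D", 4.0, 50)],
)

# Same shape as the notebook query: filter by average rating, then rank by reviews.
popular = conn.execute(
    """
    SELECT title, num_reviews
    FROM courses
    WHERE rating >= (SELECT AVG(rating) FROM courses)
    ORDER BY num_reviews DESC
    LIMIT 10
    """
).fetchall()
print(popular)

conn.close()
```

Course C has the most reviews but is excluded because its rating is below 4.1.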


## What are the 10 most popular instructors?
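We rank instructors by the number of reviews on their courses. The query below works per course rather than per instructor, so an instructor with several highly reviewed courses (here, Dr. Angela Yu) appears more than once. Jose Portilla leads by a clear margin.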

In [9]:
%%sql 
SELECT i.title, c.num_reviews 
FROM courses c INNER JOIN instructors i ON i.id = c.instructors_id 
ORDER BY num_reviews DESC LIMIT 10;

 * postgresql://postgres:***@localhost/archive
10 rows affected.


title,num_reviews
Jose Portilla,452973
Kyle Pew,357442
Dr. Angela Yu,263152
Colt Steele,254711
Maximilian Schwarzmüller,180257
Dr. Angela Yu,177568
Tim Buchalka,177184
Academind by Maximilian Schwarzmüller,176452
"Stephane Maarek | AWS Certified Cloud Practitioner,Solutions Architect,Developer",168032
Jonas Schmedtmann,167670
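
The query above ranks individual courses. To rank instructors by their total reviews across all of their courses, one option is to aggregate with `GROUP BY` and `SUM`. The sketch below is an assumption about how that could look, not the query used in this notebook. It runs on in-memory SQLite with invented rows, and the alias `total_reviews` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructors (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, instructors_id INTEGER, num_reviews INTEGER)")

# Hypothetical data: Instructor A has two mid-sized courses, Instructor B one large course.
conn.executemany("INSERT INTO instructors VALUES (?, ?)", [(1, "Instructor A"), (2, "Instructor B")])
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 80), (3, 2, 150)],
)

# One row per instructor, ranked by the sum of reviews over all of their courses.
ranking = conn.execute(
    """
    SELECT i.title, SUM(c.num_reviews) AS total_reviews
    FROM courses c
    INNER JOIN instructors i ON i.id = c.instructors_id
    GROUP BY i.id, i.title
    ORDER BY total_reviews DESC
    LIMIT 10
    """
).fetchall()
print(ranking)

conn.close()
```

Here Instructor B owns the single most-reviewed course, but Instructor A ranks first once reviews are summed per instructor.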


## What are the most reviewed instructor's courses?
This query lists all courses by the instructor of the single most-reviewed course, sorted by rating in descending order. It returns 38 courses; the highest-rated ones shown below are all rated above 4.6.

In [10]:
%%sql 
SELECT title,rating FROM courses
WHERE instructors_id = (SELECT i.id FROM courses c INNER JOIN instructors i ON i.id = c.instructors_id 
                        ORDER BY num_reviews DESC
                        LIMIT 1)
ORDER BY rating DESC;


 * postgresql://postgres:***@localhost/archive
38 rows affected.


title,rating
Python for Finance and Algorithmic Trading with QuantConnect,4.707246
Complete Tensorflow 2 and Keras Deep Learning Bootcamp,4.68416
Go Bootcamp: Master Golang with 1000+ Exercises and Projects,4.6828847
The Complete SQL Bootcamp: Go from Zero to Hero,4.6764865
PyTorch for Deep Learning with Python Bootcamp,4.67154
Master Git and GitHub in 5 Days: Go from Zero to Hero,4.667742
Python and Flask Bootcamp: Create Websites using Flask!,4.6664762
Python for Machine Learning & Data Science Masterclass,4.6310763
Probability and Statistics for Business and Data Science,4.630266
Math for Data Science Masterclass,4.629254
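
Since `courses.instructors_id` already holds the instructor's id, the join inside the subquery can likely be dropped, assuming every course references an existing instructor. The sketch below shows that simplified form on in-memory SQLite with invented rows. It is an illustration, not the query that produced the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (title TEXT, instructors_id INTEGER, rating REAL, num_reviews INTEGER)"
)

# Hypothetical rows: instructor 10 owns the most-reviewed course (X1).
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?, ?)",
    [("X1", 10, 4.2, 900), ("X2", 10, 4.7, 50), ("Y1", 20, 4.9, 300)],
)

# The scalar subquery picks the instructor of the most-reviewed course;
# the outer query lists that instructor's courses by rating.
top_instructor_courses = conn.execute(
    """
    SELECT title, rating
    FROM courses
    WHERE instructors_id = (
        SELECT instructors_id FROM courses ORDER BY num_reviews DESC LIMIT 1
    )
    ORDER BY rating DESC
    """
).fetchall()
print(top_instructor_courses)

conn.close()
```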


## What are the 5 most popular data analysis courses?
For the most popular data analysis courses, we take courses whose title contains "data analysis" (case-insensitive, via `ILIKE`), keep those rated above 4.5, and show the five with the most reviews.

In [11]:
%%sql 
SELECT title,num_reviews,rating 
FROM courses
WHERE title ILIKE '%Data Analysis%' AND rating > 4.5
ORDER BY num_reviews DESC
LIMIT 5;

 * postgresql://postgres:***@localhost/archive
5 rows affected.


title,num_reviews,rating
Microsoft Excel - Data Analysis with Excel Pivot Tables,46711,4.6264486
Data Analysis with Pandas and Python,18831,4.668358
Complete Introduction to Business Data Analysis,12161,4.602791
Data Analysis Essentials Using Excel,9904,4.5659075
SQL for Data Analysis: Beginner MySQL Business Intelligence,7412,4.651596
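
`ILIKE` is a PostgreSQL extension. As a rough sketch of the same filter in a more portable form, the block below lowercases the title before applying `LIKE`, which should behave the same way for ASCII titles. It runs on in-memory SQLite with invented rows, so the titles and numbers are not from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, num_reviews INTEGER, rating REAL)")

# Hypothetical rows with mixed-case titles.
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("DATA ANALYSIS with Excel", 300, 4.6),
        ("Intro to data analysis", 100, 4.7),
        ("Data Analysis Basics", 900, 4.2),  # matches the title but is rated too low
        ("Cooking 101", 1000, 4.9),          # highly rated but off-topic
    ],
)

# Case-insensitive title match, rating threshold, then rank by number of reviews.
matches = conn.execute(
    """
    SELECT title, num_reviews
    FROM courses
    WHERE LOWER(title) LIKE ? AND rating > ?
    ORDER BY num_reviews DESC
    LIMIT 5
    """,
    ("%data analysis%", 4.5),
).fetchall()
print(matches)

conn.close()
```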
