GitHub - iemad406/Udemy-Data-Analysis-using-python

Project Overview

This project analyzes a Udemy course dataset to uncover trends in pricing, enrollment, revenue, and content characteristics. The analysis uses a dual approach: Python for automated data processing and visualization, and Excel for statistical modeling, correlation analysis, and content classification.

Dataset Description

The dataset includes 3,672 course records with the following key attributes:

Course Metadata: Title, ID, URL, Subject, and Level.

Engagement Metrics: Number of subscribers and reviews.

Course Specifics: Price, content duration (in hours), number of lectures, and publication timestamp.

Financial Data: Calculated revenue based on price and subscriber count.

Tools & Technologies

Python:

pandas & NumPy: Data cleaning, manipulation, and feature engineering.

Matplotlib & Seaborn: Exploratory Data Analysis (EDA) and data visualization.

Excel:

Statistical reporting and summary metrics.

Correlation analysis and classification modeling.

Key Analysis Components

Data Cleaning & Preprocessing (Python)

Duplicate Removal: Identified and removed duplicate course entries to ensure data integrity.

Type Conversion: Converted published_timestamp to datetime objects to extract publication year, date, and time.

Feature Engineering: Created a revenue column by calculating the product of price and num_subscribers.
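The three preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal illustration rather than the project's actual script: the column names `published_timestamp`, `price`, and `num_subscribers` come from this README, while `course_id`, the derived `year`/`date`/`time` column names, the helper `clean_courses`, and the sample rows are assumptions made up for the example.

```python
import pandas as pd


def clean_courses(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, parse timestamps, and add a revenue column."""
    # Remove rows that are exact duplicates of an earlier row.
    out = df.drop_duplicates().copy()

    # Parse timestamps; values that cannot be parsed become NaT.
    ts = pd.to_datetime(out["published_timestamp"], errors="coerce", utc=True)
    out["published_timestamp"] = ts
    out["year"] = ts.dt.year   # hypothetical derived column names
    out["date"] = ts.dt.date
    out["time"] = ts.dt.time

    # Revenue is defined in this README as price * num_subscribers.
    out["revenue"] = out["price"] * out["num_subscribers"]
    return out


# Made-up sample rows (not taken from the real dataset).
sample = pd.DataFrame({
    "course_id": [1, 1, 2],
    "price": [20, 20, 50],
    "num_subscribers": [100, 100, 10],
    "published_timestamp": [
        "2016-05-01T10:00:00Z",
        "2016-05-01T10:00:00Z",
        "2017-01-15T08:30:00Z",
    ],
})
cleaned = clean_courses(sample)
print(cleaned[["course_id", "year", "revenue"]])
```

Note that revenue computed this way is an estimate: it assumes every subscriber paid the listed price.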

Statistical Analysis & Classification (Excel)

Revenue Summary:

Total Revenue generated across all courses: ~881,674,940

Average Content Duration: ~4.10 hours
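The summary figures above were produced in Excel. A pandas equivalent might look like the sketch below, assuming a `revenue` column already exists and that the duration column is named `content_duration` (an assumed name; this README only describes it as content duration in hours). The helper `summarize` and the sample values are hypothetical.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return total revenue and mean content duration (hours)."""
    return {
        "total_revenue": float(df["revenue"].sum()),
        "avg_content_duration": float(df["content_duration"].mean()),
    }


# Made-up sample values, not the real dataset.
sample = pd.DataFrame({
    "revenue": [2000, 500, 0],
    "content_duration": [1.5, 4.0, 6.5],
})
summary = summarize(sample)
print(summary)
```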

Correlations:

Price vs. Subscribers: Found a very weak positive correlation (0.05), suggesting that price is not a primary driver for enrollment volume.

Price vs. Reviews: Analyzed the relationship between course cost and user feedback frequency.
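The correlations above were computed in Excel. The sketch below shows how the same comparison could be run in pandas. It assumes Pearson correlation (this README does not say which coefficient was used) and a reviews column named `num_reviews` (an assumed name); `price_correlations` and the sample data are hypothetical.

```python
import pandas as pd


def price_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of price with subscribers and reviews."""
    cols = ["price", "num_subscribers", "num_reviews"]
    corr = df[cols].corr(method="pearson")
    # Keep only the price column, minus price's correlation with itself.
    return corr["price"].drop("price")


# Made-up sample: reviews are an exact multiple of price,
# subscribers are not linearly related to it.
sample = pd.DataFrame({
    "price": [10, 20, 30, 40],
    "num_subscribers": [500, 100, 400, 200],
    "num_reviews": [20, 40, 60, 80],
})
result = price_correlations(sample)
print(result)
```

A coefficient near 0, like the 0.05 reported for price vs. subscribers, indicates almost no linear relationship; it does not rule out a non-linear one.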

Content Duration Classification: Categorized courses by duration using the mean (4.10 hours) and standard deviation (6.06 hours):

Normal Content: Courses within a standard duration range (~92.1% of the dataset, 3,383 courses).

Long Content: Courses significantly exceeding the mean duration (~7.87% of the dataset, 289 courses).
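This README does not state the exact cutoff between "Normal" and "Long" content. One plausible rule, shown here purely as an assumption, is to label a course "Long Content" when its duration exceeds the mean plus `k` standard deviations. With `k = 1` and the figures above, that cutoff would be 4.10 + 6.06 = 10.16 hours. The function `classify_duration`, the parameter `k`, and the sample values are hypothetical.

```python
import pandas as pd


def classify_duration(durations: pd.Series, k: float = 1.0) -> pd.Series:
    """Label durations above mean + k * std as long (assumed rule)."""
    # pandas uses the sample standard deviation (ddof=1) by default;
    # the Excel workbook may have used a different variant.
    threshold = durations.mean() + k * durations.std()
    is_long = durations.gt(threshold)
    return is_long.map({True: "Long Content", False: "Normal Content"})


# Made-up durations in hours, with one clear outlier.
hours = pd.Series([1.0, 2.0, 3.0, 2.0, 30.0])
labels = classify_duration(hours)
print(labels.value_counts())
```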

Exploratory Data Analysis (Python)

Yearly Revenue Trends: Visualized how total revenue evolved over time using line and bar charts.

Subject Analysis: Analyzed revenue distribution across different subjects (e.g., Web Development, Business Finance) to identify high-performing categories.

Market Share: Used pie charts to visualize the percentage contribution of each year and subject to the total revenue.
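The aggregations behind these charts can be sketched as a group-by on revenue, with each group's percentage share feeding the pie charts. The column names `year` and `subject`, the helpers `revenue_by` and `plot_revenue`, and the sample rows are assumptions for illustration; the project's script may structure this differently.

```python
import pandas as pd


def revenue_by(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Total revenue per group and each group's share of the total."""
    totals = df.groupby(key)["revenue"].sum().sort_index()
    return pd.DataFrame({
        "revenue": totals,
        "share_pct": 100 * totals / totals.sum(),
    })


def plot_revenue(table: pd.DataFrame, title: str):
    """Draw a bar chart and a pie chart of revenue side by side."""
    # Imported here so the aggregation works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))
    table["revenue"].plot(kind="bar", ax=ax_bar, title=title)
    table["revenue"].plot(kind="pie", ax=ax_pie, autopct="%1.1f%%")
    ax_pie.set_ylabel("")  # hide the default series-name label
    fig.tight_layout()
    return fig


# Made-up sample rows, not the real dataset.
sample = pd.DataFrame({
    "year": [2016, 2016, 2017],
    "subject": ["Web Development", "Business Finance", "Web Development"],
    "revenue": [3000, 1000, 4000],
})
by_year = revenue_by(sample, "year")
by_subject = revenue_by(sample, "subject")
print(by_subject)
```

Calling `plot_revenue(by_subject, "Revenue by subject")` would then render both views from the same table.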

Summary of Insights

Top Performing Year: The analysis identifies the specific year with peak revenue and course publication activity.

Content Strategy: About 92% of courses fall into the "Normal" duration class, and the mean duration is about 4.10 hours, which suggests a preference for concise, focused learning modules.

Revenue Drivers: While individual course prices vary, the bulk of revenue is driven by high subscriber counts in specific high-demand subjects such as Web Development.

Project Structure

udemy_courses_analysis_second_project.py: Main script for data cleaning, transformation, and plotting.

Report.csv: Summary of high-level project metrics.

Correlations.csv: Detailed statistical correlations between price, engagement, and duration.

ClassifyContentDuration.csv: Classification logic and results for course lengths.

How to Run

Ensure Python 3.x is installed along with pandas, numpy, matplotlib, and seaborn.

Place the dataset udemy_online_education_courses_dataset.csv in the project directory.

Run the analysis script:

python udemy_courses_analysis_second_project.py
