End to End Data Science Project using the Udemy Dataset

This repository contains data analytics insights drawn from the Udemy courses dataset, covering 4 selected major domains.

The dataset is taken from Kaggle: https://www.kaggle.com/andrewmvd/udemy-courses

This dataset includes 3683 courses from Udemy in 4 areas: business finance, graphic design, musical instruments, and web design. Udemy is an online platform for massive open online courses (MOOCs) that offers both free and paid courses. Udemy's business model lets anyone create a course, which is how it has grown to host hundreds of thousands of courses. Online courses and digital learning are becoming increasingly popular, and more students, teachers, and professionals are taking classes online through sites such as Udemy and Coursera. This analysis was therefore done to understand how many people sign up for courses on the Udemy platform.

From the insights developed, I answer the following questions:

Questions

Course Title
- What are the most frequent words in course titles
- Longest/Shortest course title
- How can we build recommendation systems via title using similarity
- Most popular courses by number of subscribers
Subjects/Category
- What is the distribution of subjects
- How many courses per subject
- Distribution of subjects per year
- How many people purchase a particular subject
- Which subject is the most popular
Published Year
- Number of courses per year
- Which year has the highest number of courses
- What is the trend of courses per year
Levels
- How many levels do we have
- What is the distribution of courses per levels
- Which subject has the highest levels
- How many subscribers per levels
- How many courses per levels
Duration of Course
- Which courses have the highest duration (paid or not)
- Which courses have higher duration
- Duration vs number of subscribers
Subscribers
- Which course has the highest number of subscribers
- Average number of subscribers
- Number of subscribers per Subject
- Number of subscribers per year
Price
- What is the average price of a course
- What is the min/max price
- How much does Udemy earn
- The most profitable courses
Correlation Questions
- Does the number of subscribers depend on
  - number of reviews
  - price
  - number of lectures
  - content duration
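
Several of the questions above reduce to simple pandas aggregations. The following is a minimal sketch, not the notebook's actual code. The column names (`subject`, `level`, `num_subscribers`, `num_reviews`, `price`, `num_lectures`, `content_duration`, `published_timestamp`) are assumptions about the schema of `udemy_courses.csv`, and the rows are made up for illustration only.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from udemy_courses.csv.
# The column names are assumptions about the dataset's schema.
sample = pd.DataFrame(
    {
        "course_title": [
            "Guitar Basics",
            "Web Design 101",
            "Advanced CSS Layouts",
            "Financial Modeling",
        ],
        "subject": [
            "Musical Instruments",
            "Web Design",
            "Web Design",
            "Business Finance",
        ],
        "level": ["Beginner Level", "All Levels", "Expert Level", "All Levels"],
        "num_subscribers": [100, 500, 300, 200],
        "num_reviews": [10, 60, 25, 20],
        "price": [20, 50, 0, 100],
        "num_lectures": [15, 40, 30, 25],
        "content_duration": [1.5, 4.0, 3.0, 5.0],
        "published_timestamp": [
            "2015-03-01T00:00:00Z",
            "2016-06-10T00:00:00Z",
            "2016-09-05T00:00:00Z",
            "2017-01-18T00:00:00Z",
        ],
    }
)


def summarize(df):
    """Answer a handful of the listed questions with basic aggregations."""
    # Extract the publication year from the timestamp strings.
    year = pd.to_datetime(df["published_timestamp"]).dt.year
    numeric = [
        "num_subscribers",
        "num_reviews",
        "price",
        "num_lectures",
        "content_duration",
    ]
    return {
        # How many courses per subject
        "courses_per_subject": df["subject"].value_counts(),
        # Number of subscribers per subject
        "subscribers_per_subject": df.groupby("subject")["num_subscribers"].sum(),
        # How many subscribers per level
        "subscribers_per_level": df.groupby("level")["num_subscribers"].sum(),
        # Number of courses per year, in chronological order
        "courses_per_year": year.value_counts().sort_index(),
        # Pearson correlation of each numeric column with subscriber count
        "corr_with_subscribers": df[numeric]
        .corr()["num_subscribers"]
        .drop("num_subscribers"),
    }


summary = summarize(sample)
print(summary["courses_per_subject"])
print(summary["corr_with_subscribers"])
```

To try this on the real data, replace `sample` with `pd.read_csv("udemy_courses.csv")`, adjusting the column names if the file differs from the assumptions above. Note that a correlation only indicates association, so it cannot by itself show that subscribers "depend on" a given column.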

The insights that answer all of the above questions were developed with the help of pandas, NumPy, and Matplotlib.
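
For the title-similarity question in the list above, one simple approach is to compare titles as bags of words using cosine similarity. This is a sketch under that assumption, written with the standard library only; it is not necessarily how the notebook builds its recommendations, and the function names and example titles are hypothetical.

```python
import math
import re
from collections import Counter


def title_vector(title):
    """Turn a title into a bag-of-words count vector (lowercased tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, titles, top_n=2):
    """Return up to top_n titles most similar to query, skipping zero scores."""
    query_vec = title_vector(query)
    scored = [
        (cosine_similarity(query_vec, title_vector(t)), t)
        for t in titles
        if t != query
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:top_n] if score > 0]


# Hypothetical titles, for illustration only.
titles = [
    "Learn Guitar Fast",
    "Guitar for Beginners",
    "Web Design Basics",
    "Learn Piano Fast",
]
print(recommend("Learn Guitar Fast", titles, top_n=2))
```

Plain word counts treat every word equally, so common words can dominate the score; removing stop words first or weighting terms (for example with TF-IDF) would be natural refinements.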

I also performed keyword extraction, which involves removing stop words. A stop word is a word that is automatically omitted from a computer-generated concordance or index. Examples of stop words in English are "a", "the", "is", and "are". Stop words are commonly filtered out in text mining and natural language processing (NLP) because they are used so often that they carry very little useful information.
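
As a rough illustration of stop word removal followed by word counting, here is a minimal sketch that uses only the standard library. The small `STOPWORDS` set is a hand-written stand-in, not the stop word list used by neattext or rake, and the function names and example titles are hypothetical.

```python
import re
from collections import Counter

# A tiny hand-written stop word set for illustration; real libraries
# ship much longer lists.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "for", "and", "in", "with"}


def extract_keywords(title):
    """Lowercase a title, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def most_common_words(titles, n=3):
    """Count keywords across all titles and return the n most frequent."""
    counts = Counter()
    for title in titles:
        counts.update(extract_keywords(title))
    return counts.most_common(n)


# Hypothetical titles, for illustration only.
titles = ["Learn the Guitar", "Guitar for Beginners", "Learn Web Design"]
print(extract_keywords("The Complete Guide to Web Design"))
print(most_common_words(titles, n=2))
```

The same counting step is one way to answer the "most frequent words in course titles" question once the stop words are out of the way.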

Libraries used: pandas, numpy, matplotlib, seaborn, warnings, datetime, neattext, counter, rake.

If you find this insightful, feel free to star the repository. Please report any issues to me.

If you want to work with this analysis, you can clone or fork the repository and then make changes as you wish.
