# Udemy Course Analysis

**Dataset used: [Udemy Dataset](https://www.kaggle.com/andrewmvd/udemy-courses)**

### Importing the necessary packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Reading the dataset

In [None]:
df_udemy = pd.read_csv(r'./udemy_courses.csv',index_col = 0)

In [None]:
df_udemy.describe()

In [None]:
df_udemy.shape

In [None]:
df_udemy.columns

In [None]:
df_udemy.head()

### Data cleaning and preprocessing

In [None]:
df_udemy.isnull().sum()

In [None]:
df_udemy.select_dtypes('object').columns

In [None]:
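# Drop the row labelled 96698; with index_col=0 the index is the first CSV column (course_id in the Kaggle file)
df_udemy.drop(index=96698, inplace=True)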
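Later cells select paid and free courses by comparing `is_paid` against both `'True'`/`'TRUE'` and `'False'`/`'FALSE'`, which suggests the column holds mixed-case strings. A minimal sketch, on hypothetical toy data (the `toy` frame and the `'not-a-boolean'` value are made up), of converting the column to real booleans once, so that later filters could be written as `df[df['is_paid']]`:

```python
import pandas as pd

# Hypothetical toy data standing in for df_udemy; 'not-a-boolean' mimics a malformed entry
toy = pd.DataFrame({
    'course_title': ['A', 'B', 'C', 'D', 'E'],
    'is_paid': ['True', 'TRUE', 'False', 'FALSE', 'not-a-boolean'],
    'num_subscribers': [10, 0, 250, 1, 5],
})

# Upper-case every value, then map the two valid spellings to booleans;
# any other value becomes NaN
flags = toy['is_paid'].astype(str).str.upper().map({'TRUE': True, 'FALSE': False})

# Keep only rows with a recognizable flag and store the column as bool
toy = toy.assign(is_paid=flags)[flags.notna()].astype({'is_paid': bool})

paid = toy[toy['is_paid']]    # paid courses
free = toy[~toy['is_paid']]   # free courses
print(paid['course_title'].tolist(), free['course_title'].tolist())
```

With a real boolean column, the two-way `|` comparisons used in questions 3, 5, 8 and 9 are no longer needed.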

In [None]:
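# One bar is drawn per course, so bars for the same subject overlap and each
# visible bar shows the largest single-course subscriber count for that subject
plt.barh(df_udemy['subject'],df_udemy['num_subscribers'])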
plt.xlabel('Number of subscribers')
plt.ylabel('Subject')
plt.show()

## 1. Which course overall has the maximum number of enrollments?

In [None]:
overall_max_enrolled_course = df_udemy['num_subscribers'].max()
overall_max_enrolled_course

In [None]:
df_udemy[df_udemy['num_subscribers']==overall_max_enrolled_course]

##### From the analysis, "Learn HTML5 Programming from Scratch" is the most-enrolled course across the four subjects in this dataset, with 268923 subscribers.
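The notebook finds the maximum with `.max()` and then filters for rows equal to it, which returns every course tied at the top. When a single row or a short ranking is enough, `idxmax` and `nlargest` can do this in one step. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for df_udemy
toy = pd.DataFrame({
    'course_title': ['Course A', 'Course B', 'Course C'],
    'num_subscribers': [120, 980, 45],
})

# idxmax returns the index label of the largest value (the first one on ties)
top_label = toy['num_subscribers'].idxmax()
top_title = toy.loc[top_label, 'course_title']

# nlargest returns the n rows with the largest values, sorted in descending order
top_two = toy.nlargest(2, 'num_subscribers')['course_title'].tolist()

print(top_title)  # Course B
print(top_two)    # ['Course B', 'Course A']
```

Note that `idxmax` keeps only the first row on ties, while an equality filter on the maximum keeps all of them.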

## 2. Which courses have the fewest subscribers (minimum enrollment) overall?

In [None]:
overall_min_enrolled_courses = df_udemy['num_subscribers'].min()
df_min = df_udemy[df_udemy['num_subscribers']==overall_min_enrolled_courses]
df_min

In [None]:
df_min.shape

In [None]:
plt.plot(df_min['subject'],df_min['num_subscribers'])
plt.ylabel('Number of subscribers')
plt.xlabel('Subject')
plt.show()

##### 70 of the 3683 courses have no subscribers at all: their num_subscribers value is 0.
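The line plot above draws values that are all 0, so it shows little about where these courses sit. A sketch, on hypothetical toy data, that instead counts the zero-subscriber courses per subject with `value_counts`:

```python
import pandas as pd

# Hypothetical toy data standing in for df_udemy
toy = pd.DataFrame({
    'subject': ['Web Development', 'Graphic Design', 'Web Development',
                'Business Finance', 'Web Development'],
    'num_subscribers': [0, 0, 15, 0, 0],
})

# Keep only courses nobody has enrolled in, then count them per subject
zero_by_subject = toy.loc[toy['num_subscribers'] == 0, 'subject'].value_counts()
print(zero_by_subject)
```

On the real frame, `zero_by_subject.plot(kind='bar')` would chart these counts.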

## 3. Which paid course has the highest enrollment?

In [None]:
features_paid = (df_udemy['is_paid'] == 'True') | (df_udemy['is_paid']=='TRUE')
df_paid = df_udemy[features_paid]
df_paid.head()

In [None]:
df_paid.shape

In [None]:
plt.scatter(df_paid['subject'],df_paid['num_subscribers'])
plt.ylabel('Number of subscribers')
plt.xlabel('Subject')
plt.show()

In [None]:
max_enrolled_in_paid_courses = df_paid['num_subscribers'].max()
max_enrolled_in_paid_courses

In [None]:
df_paid[df_paid['num_subscribers']==max_enrolled_in_paid_courses]

##### There are 3372 paid courses in this dataset. The most-enrolled of these is "The Web Developer Bootcamp", with 121584 subscribers, 43 hours of content and a price of 200.
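The paid filter in this question ORs two equality tests. `Series.isin` expresses the same condition as a single membership test; a sketch on hypothetical toy data (the frame and its values are made up):

```python
import pandas as pd

# Hypothetical toy data: is_paid stored as mixed-case strings
toy = pd.DataFrame({
    'course_title': ['A', 'B', 'C', 'D'],
    'is_paid': ['True', 'FALSE', 'TRUE', 'False'],
    'num_subscribers': [300, 50, 1200, 7],
})

# Same condition as (x == 'True') | (x == 'TRUE'), written as one membership test
paid_mask = toy['is_paid'].isin(['True', 'TRUE'])
toy_paid = toy[paid_mask]

print(toy_paid['course_title'].tolist())  # ['A', 'C']
```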

## 4. Which paid courses have the lowest enrollment?

In [None]:
min_enrolled_in_paid_courses = df_paid['num_subscribers'].min()
min_enrolled_in_paid_courses

In [None]:
min_paid_courses = df_paid[df_paid['num_subscribers']==min_enrolled_in_paid_courses]
min_paid_courses

In [None]:
min_paid_courses.shape

##### 70 paid courses have the lowest enrollment: each of them has 0 subscribers. Since question 2 also found 70 courses with 0 subscribers overall, every course without subscribers is a paid course.
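A cross-tabulation can show in one table how zero-subscriber courses split between paid and free. A sketch, assuming `is_paid` has already been converted to booleans (the toy frame is hypothetical):

```python
import pandas as pd

# Hypothetical toy data; is_paid is already a real boolean here
toy = pd.DataFrame({
    'is_paid': [True, True, False, True, False],
    'num_subscribers': [0, 120, 3, 0, 450],
})

# Rows: free (False) / paid (True); columns: whether the course has zero subscribers
table = pd.crosstab(toy['is_paid'],
                    toy['num_subscribers'].eq(0).rename('zero_subscribers'))
print(table)
```

A 0 in the free row of the `True` column means no free course is without subscribers, which is the pattern questions 2 and 4 describe.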

## 5. Which free course has the highest enrollment?

In [None]:
features_free = (df_udemy['is_paid'] == 'False') | (df_udemy['is_paid']=='FALSE')
df_free = df_udemy[features_free]
df_free.head()

In [None]:
df_free.shape

In [None]:
plt.scatter(df_free['subject'],df_free['num_subscribers'])
plt.ylabel('Number of subscribers')
plt.xlabel('Subject')
plt.show()

In [None]:
max_enrolled_in_free_courses = df_free['num_subscribers'].max()
max_enrolled_in_free_courses

In [None]:
df_free[df_free['num_subscribers']==max_enrolled_in_free_courses]

##### "Learn HTML5 Programming From Scratch" is the free course with the highest enrollment, at 268923 subscribers. As question 1 showed, it is also the most-enrolled course overall.

## 6. Which free course has the lowest enrollment?

In [None]:
min_enrolled_in_free_courses = df_free['num_subscribers'].min()
min_enrolled_in_free_courses

In [None]:
df_free[df_free['num_subscribers']==min_enrolled_in_free_courses]

##### "Learn to Play Tabla - The Indian drums" is the least-enrolled free course, with only one subscriber.

## 7. Which of the four subjects in this dataset has the highest and lowest enrollment?

In [None]:
df_subjectwise = df_udemy.groupby(['subject'],as_index=False)['num_subscribers'].sum()
df_subjectwise

In [None]:
plt.bar(df_subjectwise['subject'],df_subjectwise['num_subscribers'])
plt.xticks(rotation=90)
plt.ylabel('Number of subscribers')
plt.xlabel('Subject')
plt.show()

In [None]:
plt.plot(df_subjectwise['subject'],df_subjectwise['num_subscribers'])
plt.xticks(rotation=90)
plt.ylabel('Number of subscribers - subjectwise')
plt.xlabel('Subject')
plt.show()

In [None]:
ms1 = df_subjectwise['num_subscribers'].max()
ms1

In [None]:
df_subjectwise[df_subjectwise['num_subscribers']==ms1]

In [None]:
ms2 = df_subjectwise['num_subscribers'].min()
ms2

In [None]:
df_subjectwise[df_subjectwise['num_subscribers']==ms2]

##### "Web Development" has the most subscribers, with a total of 7980572, while "Musical Instruments" has the fewest, with 846689.
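When `groupby` keeps `subject` as the index, the subject totals form a Series, and `idxmax` and `idxmin` return the top and bottom subject names directly, without a separate filtering step. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for df_udemy
toy = pd.DataFrame({
    'subject': ['Web Development', 'Musical Instruments',
                'Web Development', 'Graphic Design'],
    'num_subscribers': [500, 40, 300, 90],
})

# Total subscribers per subject, as a Series indexed by subject name
totals = toy.groupby('subject')['num_subscribers'].sum()

# idxmax/idxmin return the subject labels of the largest and smallest totals
print(totals.idxmax())  # Web Development
print(totals.idxmin())  # Musical Instruments
```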

## 8. Which subjects have the highest and lowest enrollment in free courses?

In [None]:
df_subjectwise_paidornot = df_udemy.groupby(['subject','is_paid'],as_index=False)['num_subscribers'].sum()
df_subjectwise_paidornot

In [None]:
df_notpaid_subjectwise = df_subjectwise_paidornot[(df_subjectwise_paidornot['is_paid']=='False') | (df_subjectwise_paidornot['is_paid']=='FALSE')]
df_notpaid_subjectwise

In [None]:
plt.bar(df_notpaid_subjectwise['subject'],df_notpaid_subjectwise['num_subscribers'])
plt.xticks(rotation=90)
plt.ylabel('Number of subscribers for free courses - subjectwise')
plt.xlabel('Subject')
plt.show()

In [None]:
msf1 = df_notpaid_subjectwise['num_subscribers'].max()
msf1

In [None]:
df_notpaid_subjectwise[df_notpaid_subjectwise['num_subscribers']==msf1]

In [None]:
msf2 = df_notpaid_subjectwise['num_subscribers'].min()
msf2

In [None]:
df_notpaid_subjectwise[df_notpaid_subjectwise['num_subscribers']==msf2]

##### "Web Development" has the highest enrollment in free courses, with 2382741 subscribers, and "Graphic Design" has the lowest, with 284821 subscribers.

## 9. Which subjects have the highest and lowest enrollment in paid courses?

In [None]:
df_paid_subjectwise = df_subjectwise_paidornot[(df_subjectwise_paidornot['is_paid']=='True') | (df_subjectwise_paidornot['is_paid']=='TRUE')]
df_paid_subjectwise

In [None]:
plt.bar(df_paid_subjectwise['subject'],df_paid_subjectwise['num_subscribers'])
plt.xticks(rotation=90)
plt.ylabel('Number of subscribers for paid courses - subjectwise')
plt.xlabel('Subject')
plt.show()

In [None]:
msp1 = df_paid_subjectwise['num_subscribers'].max()
msp1

In [None]:
df_paid_subjectwise[df_paid_subjectwise['num_subscribers']==msp1]

In [None]:
msp2 = df_paid_subjectwise['num_subscribers'].min()
msp2

In [None]:
df_paid_subjectwise[df_paid_subjectwise['num_subscribers']==msp2]

##### "Web Development" has the highest enrollment in paid courses, with 5597831 subscribers, and "Musical Instruments" has the lowest, with 541954 subscribers.
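Questions 8 and 9 group on the raw `is_paid` strings, so `'True'` and `'TRUE'` (or `'False'` and `'FALSE'`) would form separate groups if a subject used both spellings. A sketch, on hypothetical toy data (the `paid_label` column is introduced here for illustration), that normalizes the flag first and then builds a subject-by-paid table with `pivot_table`:

```python
import pandas as pd

# Hypothetical toy data with mixed-case is_paid strings
toy = pd.DataFrame({
    'subject': ['Web Development', 'Web Development', 'Web Development',
                'Graphic Design', 'Graphic Design'],
    'is_paid': ['True', 'TRUE', 'False', 'FALSE', 'True'],
    'num_subscribers': [100, 50, 400, 30, 70],
})

# Normalize the spelling so 'True'/'TRUE' and 'False'/'FALSE' each form one group
toy['paid_label'] = toy['is_paid'].str.upper().map({'TRUE': 'paid', 'FALSE': 'free'})

# One row per subject, one column per label, summing subscribers (0 where a combination is missing)
totals = toy.pivot_table(index='subject', columns='paid_label',
                         values='num_subscribers', aggfunc='sum', fill_value=0)
print(totals)

# Highest and lowest subjects for free and for paid courses
print(totals['free'].idxmax(), totals['free'].idxmin())
print(totals['paid'].idxmax(), totals['paid'].idxmin())
```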

## 10. Which course level has the most and the fewest subscribers?

In [None]:
df_levelwise = df_udemy.groupby(['level'],as_index=False)['num_subscribers'].sum()

In [None]:
df_levelwise

In [None]:
plt.bar(df_levelwise['level'],df_levelwise['num_subscribers'])
plt.xticks(rotation=90)
plt.ylabel('Number of subscribers - levelwise')
plt.xlabel('Level')
plt.show()

In [None]:
plt.plot(df_levelwise['level'],df_levelwise['num_subscribers'])
plt.xticks(rotation=90)
plt.ylabel('Number of subscribers - levelwise')
plt.xlabel('Level')
plt.show()

In [None]:
ml1 = df_levelwise['num_subscribers'].max()
ml1

In [None]:
df_levelwise[df_levelwise['num_subscribers']==ml1]

In [None]:
ml2 = df_levelwise['num_subscribers'].min()
ml2

In [None]:
df_levelwise[df_levelwise['num_subscribers']==ml2]

##### The largest number of users has subscribed to "All levels" courses.

##### "Expert level" courses have the fewest subscribers, with 50196 users enrolled.
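Total subscribers per level also reflect how many courses each level offers. A sketch, on hypothetical toy data, that reports the course count and the mean per course next to the sum, so the two effects can be told apart:

```python
import pandas as pd

# Hypothetical toy data standing in for df_udemy
toy = pd.DataFrame({
    'level': ['All Levels', 'All Levels', 'All Levels',
              'Expert Level', 'Beginner Level'],
    'num_subscribers': [900, 300, 0, 60, 240],
})

# Total, number of courses and mean subscribers per level in one pass
summary = toy.groupby('level')['num_subscribers'].agg(['sum', 'count', 'mean'])
print(summary)
```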

## 11. Which course has been created but has no content?

In [None]:
min_hrs = df_udemy['content_duration'].min()
min_hrs

In [None]:
min_hr_course = df_udemy[df_udemy['content_duration']==min_hrs]
min_hr_course

##### The "Mutual Funds for Investors in Retirement Accounts" course has been created but has no course content.
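To see whether the zero-duration course is an isolated case, it can help to list the few shortest courses and count how many have no content at all. A sketch on hypothetical toy data, with `content_duration` assumed to be in hours:

```python
import pandas as pd

# Hypothetical toy data standing in for df_udemy (content_duration assumed to be hours)
toy = pd.DataFrame({
    'course_title': ['Course A', 'Course B', 'Course C', 'Course D'],
    'content_duration': [2.5, 0.0, 0.5, 1.0],
})

# The three shortest courses, shortest first
shortest_three = toy.nsmallest(3, 'content_duration')['course_title'].tolist()

# How many courses are listed with no content at all
n_empty = int((toy['content_duration'] == 0).sum())

print(shortest_three, n_empty)
```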

## 12. Which do users choose: paid or free courses?

In [None]:
plt.barh(df_udemy['is_paid'],df_udemy['num_subscribers'])
plt.xticks(rotation=90)
plt.xlabel('Number of subscribers')
plt.ylabel('Is Paid')
plt.show()

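##### In this chart, `plt.barh` draws one bar per course, so bars in the same category overlap and each visible bar reaches the largest single-course enrollment in that category. The longest bar belongs to a False (free) category, because the most-enrolled course overall, "Learn HTML5 Programming From Scratch", is free.

##### From the above insight, free courses can attract the largest audiences, which suggests that many users choose free courses over paid ones. This is also a reasonable choice, since many free courses are of high quality and rich in content. Because the chart compares individual courses rather than totals, this reading should be weighed against the subject-wise totals in questions 8 and 9, where Web Development's paid courses (5597831 subscribers) outnumber its free courses (2382741).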
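A chart of individual courses cannot separate total enrollment from enrollment per course. A sketch, on hypothetical toy data, that normalizes `is_paid` and reports both; in this made-up example paid courses lead on total while free courses lead on average, and which pattern holds for the real dataset would need to be checked by running the same steps on `df_udemy`:

```python
import pandas as pd

# Hypothetical toy data with mixed-case is_paid strings
toy = pd.DataFrame({
    'is_paid': ['True', 'TRUE', 'True', 'TRUE', 'False', 'FALSE'],
    'num_subscribers': [200, 150, 250, 400, 700, 100],
})

# Collapse the four spellings into two labels before grouping
paid_label = toy['is_paid'].str.upper().map({'TRUE': 'paid', 'FALSE': 'free'})

# Total enrollment, enrollment per course and number of courses for each label
summary = toy.groupby(paid_label)['num_subscribers'].agg(['sum', 'mean', 'count'])
print(summary)
```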

## CONCLUSION:
  From the analysis in this project, "Web Development" is the subject in highest demand in this dataset: it has the most subscribers overall (7980572) and the most subscribers among both paid and free courses.

  It also appears that users favor free courses, and that "All levels" courses attract more subscribers than any other level.

### Thank you : )