# Udemy

## General Info on Udemy:

Udemy, Inc. is an American massive open online course provider aimed at professional adults and students. 

It was founded in May 2010 by Eren Bali, Gagan Biyani, and Oktay Caglar.

As of Jan 2020, the platform had more than 35 million students and 57,000 instructors teaching courses in over 65 languages.

### Many learners wonder whether learning from Udemy is effective and whether they should enroll in courses there.

### In this notebook, I try to clear up a few of those doubts and give you a bigger picture of Udemy courses!

![Udemy](https://lh3.googleusercontent.com/3bJYxTz2wadBVS21234Cl5l_Aksm04whiYa4KaWB8boywSfd1YN3LstlSGsA7oUpWZrx=s180-rw)

In [None]:
!pip install joypy -q

import os

import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Plotting libraries
import joypy
import matplotlib.pyplot as plt
from matplotlib import cm
import seaborn as sns
import plotly.express as px
import plotly.io as pio

%matplotlib inline
plt.style.use('ggplot')
sns.set_style('darkgrid')
pio.templates.default = "plotly_dark"  # dark theme for all Plotly figures

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
data = pd.read_csv('/kaggle/input/udemy-courses/udemy_courses.csv')
data.sample(5).reset_index(drop=True).style.set_properties(**{'background-color': '#161717','color': '#30c7e6','border-color': '#8b8c8c'})

## Data Cleaning

In [None]:
data[data['num_lectures']==0]

## It is odd that a course has 0 lectures!

In [None]:
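# Remove undesired rows: courses with zero lectures (selected by condition, not a hard-coded index)
data.drop(data[data['num_lectures'] == 0].index, inplace=True)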

In [None]:
# Count courses per subject; naming the columns explicitly works in both pandas 1.x and 2.x
Subject = data['subject'].value_counts().rename_axis('subject').reset_index(name='count')
fig = px.pie(Subject, values='count', names='subject',
             title='Distribution of Various Courses on Udemy!')
fig.show()

<div class="alert alert-block alert-success">
<li>32.6% of the courses on Udemy are in Web Development</li>
<li>32.5% of the courses on Udemy are in Business Finance</li>
<li>18.5% of the courses on Udemy are in Musical Instruments</li>
<li>16.4% of the courses on Udemy are in Graphic Design</li>
</div>
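The percentages above are each subject's share of all courses. A minimal sketch of computing them directly with pandas, assuming the notebook's setup; the toy DataFrame below is made-up illustrative data, not the Udemy dataset:

```python
import pandas as pd

# Made-up rows standing in for the notebook's `data` DataFrame
toy = pd.DataFrame({
    "subject": ["Web Development", "Web Development", "Web Development",
                "Business Finance", "Business Finance", "Business Finance",
                "Musical Instruments", "Graphic Design"],
})

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages
shares = (toy["subject"].value_counts(normalize=True) * 100).round(1)
print(shares)
```

On the real data, the same expression applied to `data['subject']` should reproduce the pie chart's slices up to rounding.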


In [None]:
# Convert 'published_timestamp' to a datetime dtype
data['published_timestamp'] = pd.to_datetime(data['published_timestamp'])
data['year'] = data['published_timestamp'].dt.year

Year_wise = data.groupby('year')['course_id'].count().sort_values().reset_index()
Year_wise.rename({'course_id':'Number of Courses'},axis = 1, inplace = True)
fig = px.bar(Year_wise, y = 'Number of Courses', x = 'year', color = 'year')
fig.show()

## We can see a gradual increase in the number of courses published over time

<div class="alert alert-block alert-success">
    There was a sudden rise in the number of courses from 2014 to 2015.
</div>

## There was a 106.5% increase from 2014 to 2015
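A percentage increase is (new - old) / old × 100. A small sketch of that arithmetic with pandas `Series.pct_change`; the yearly counts below are hypothetical values chosen only to illustrate the formula, not figures from the dataset:

```python
import pandas as pd

# Hypothetical number of courses published per year (illustrative only)
yearly = pd.Series({2013: 200, 2014: 400, 2015: 826})

# pct_change computes (current - previous) / previous for consecutive entries
growth = (yearly.pct_change() * 100).round(1)
print(growth)  # the first year has no previous value, so it is NaN
```

On the notebook's data, `data['year'].value_counts().sort_index().pct_change() * 100` would give the year-over-year change for every year.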

In [None]:
fig = px.box(data,
       x='content_duration',
       y='is_paid',
       orientation='h',
       color='is_paid',
       title='Duration Distribution Across Type of Course',
       color_discrete_sequence=['#03cffc','#eb03fc']
      )

fig.update_layout(showlegend=False)
fig.update_xaxes(title='Content Duration')
fig.update_yaxes(title='Paid Course')
fig.show()

<div class="alert alert-block alert-info">
<b>Paid courses tend to be longer, with a median duration of about 2.5 hours, whereas free courses have a median of about 1.5 hours. Duration also varies considerably more among paid courses.
 </b>
</div>
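In a box plot, the line inside each box is the median and the box spans the interquartile range (IQR). A minimal sketch of checking both numbers with a groupby, assuming `content_duration` is in hours as in the notebook; the rows below are invented toy values:

```python
import pandas as pd

# Invented toy rows standing in for `data` (content_duration in hours)
toy = pd.DataFrame({
    "is_paid": [True, True, True, False, False, False],
    "content_duration": [1.0, 2.5, 10.0, 0.5, 1.5, 2.0],
})

# Median and IQR (the length of the box) for free (False) vs paid (True) courses
summary = toy.groupby("is_paid")["content_duration"].agg(
    median="median",
    iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
)
print(summary)
```

Running the same groupby on `data` would give the exact medians behind the plot instead of reading them off the chart.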



In [None]:
fig = px.box(data,     
       x='content_duration',
       y='subject',
       orientation='h',
       color='is_paid',
       title='Duration Distribution Across Subject and Type of Course',
       color_discrete_sequence=['#03cffc','#eb03fc']
      )


fig.update_xaxes(title='Content Duration')
fig.update_yaxes(title='Course Subject')
fig.show()

<div class="alert alert-block alert-info">
<b>
    Paid courses reach much longer content durations, but within each subject the median durations of paid and free courses do not differ much!
 </b>
</div>
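To compare paid and free medians per subject numerically, one option is a two-level groupby followed by `unstack`. A sketch on invented toy rows; the `course_type` column is a hypothetical helper label added only for readability:

```python
import pandas as pd

# Invented toy rows standing in for `data` (content_duration in hours)
toy = pd.DataFrame({
    "subject": ["Web Development"] * 4 + ["Graphic Design"] * 4,
    "is_paid": [True, True, False, False] * 2,
    "content_duration": [3.0, 5.0, 2.0, 4.0, 1.0, 2.0, 1.0, 1.0],
})

# One row per subject, one column per course type
medians = (
    toy.assign(course_type=toy["is_paid"].map({True: "paid", False: "free"}))
       .groupby(["subject", "course_type"])["content_duration"]
       .median()
       .unstack("course_type")
)
# Positive values mean paid courses run longer than free ones in that subject
medians["paid_minus_free"] = medians["paid"] - medians["free"]
print(medians)
```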



# Box Plot of Course Prices by Subject

In [None]:
fig = px.box(data,
      x = 'subject',
      y = 'price',
      hover_name = 'course_title',
      color = 'subject',
      title = 'Course Prices x Subject'
)
fig.show()


<div class="alert alert-block alert-success">
Business Finance and Web Development courses have a very wide price range!
</div>
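"Range" can be read two ways: max minus min (the whisker span) or the interquartile range (the box). A minimal sketch computing both per subject; the prices below are made-up toy values, not the dataset's:

```python
import pandas as pd

# Made-up toy prices standing in for `data`
toy = pd.DataFrame({
    "subject": ["Business Finance"] * 3 + ["Musical Instruments"] * 3,
    "price": [20, 95, 200, 20, 40, 50],
})

# Full range (max - min) and interquartile range per subject
spread = toy.groupby("subject")["price"].agg(
    price_range=lambda s: s.max() - s.min(),
    iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
)
print(spread.sort_values("price_range", ascending=False))
```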


# Price Distribution across various Subjects:

In [None]:
# Ridgeline Plot
fig, axes = joypy.joyplot(data,
                    by      = 'subject',
                    column  = 'price',
                    figsize = (16,12),
                    grid    = 'both',
                    linewidth = 3,
                    colormap  = cm.winter,
                    fade      = True,
                    title     = 'Price Distribution Across Subjects',
                    overlap   = 2
                   )
plt.show()


<div class="alert alert-block alert-success">
Most Musical Instruments courses are priced between 0 and 50, and very few are expensive!
</div>
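The ridgeline plot shows this visually; a quick numeric check is the share of each subject's courses priced at or below 50. A sketch on invented toy prices (the threshold of 50 comes from the note above):

```python
import pandas as pd

# Invented toy prices standing in for `data`
toy = pd.DataFrame({
    "subject": ["Musical Instruments"] * 4 + ["Web Development"] * 4,
    "price": [0, 20, 45, 150, 50, 100, 200, 25],
})

# Mean of a boolean mask per group = fraction of courses priced at most 50
share_cheap = (toy["price"] <= 50).groupby(toy["subject"]).mean() * 100
print(share_cheap.round(1))
```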


# Top 25 Most Popular Courses, with Price

<div class="alert alert-block alert-info">
<b>Ranked by number of subscribers.
 </b>
</div>



In [None]:
# 25 courses with the most subscribers, sorted ascending so the largest bar is on top
top25 = data.nlargest(25, 'num_subscribers').sort_values('num_subscribers')
fig = px.bar(top25,
             y='course_title',
             x='num_subscribers',
             orientation='h',
             color='num_subscribers',
             hover_data=['is_paid', 'price', 'num_reviews', 'num_lectures'])


fig.update_layout(showlegend=False)
fig.update_xaxes(title='Number of Subscribers')
fig.update_yaxes(title='Course Title',showticklabels=False)
fig.show()

# Top 25 Most Popular Free Courses 
<div class="alert alert-block alert-info">
<b>Ranked by number of subscribers.
 </b>
</div>


In [None]:
Unpaid = data[data['is_paid'] == False]

# 25 free courses with the most subscribers, ascending so the largest bar is on top
top25_free = Unpaid.nlargest(25, 'num_subscribers').sort_values('num_subscribers')
fig = px.bar(top25_free,
       y = 'course_title',
       x= 'num_subscribers',
       orientation = 'h',
       color='num_subscribers',
      hover_data=['num_reviews','num_lectures','year'])


fig.update_layout(showlegend=False)
fig.update_xaxes(title='Number of Subscribers')
fig.update_yaxes(title='Course Title',showticklabels=False)
fig.show()

# Top 10 Courses by Subject
<div class="alert alert-block alert-info">
<b>Ranked by number of subscribers.
 </b>
</div>
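The four subject cells that follow repeat the same filter, sort, and plot steps. A minimal sketch of factoring the selection into one function; `top_n_by_subscribers` is a hypothetical helper name, and the toy rows are invented for illustration:

```python
import pandas as pd

def top_n_by_subscribers(df, n=10, subject=None):
    """Return the n most-subscribed courses, optionally within one subject,
    sorted ascending so a horizontal bar chart shows the largest bar on top."""
    if subject is not None:
        df = df[df["subject"] == subject]
    return df.nlargest(n, "num_subscribers").sort_values("num_subscribers")

# Invented toy rows standing in for `data`
toy = pd.DataFrame({
    "course_title": ["A", "B", "C", "D"],
    "subject": ["Web Development", "Web Development",
                "Graphic Design", "Web Development"],
    "num_subscribers": [500, 1500, 900, 50],
})

top_web = top_n_by_subscribers(toy, n=2, subject="Web Development")
top_overall = top_n_by_subscribers(toy, n=2)
print(top_web)
```

In the notebook, something like `px.bar(top_n_by_subscribers(data, 10, 'Web Development'), y='course_title', x='num_subscribers', orientation='h')` could then replace each per-subject selection.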


## Top 10 Web Development Courses on Udemy:

In [None]:
Web = data[data['subject']=='Web Development']
top_web = Web.nlargest(10, 'num_subscribers').sort_values('num_subscribers')

fig = px.bar(top_web,
       y = 'course_title',
       x= 'num_subscribers',
       orientation = 'h',
       color='num_subscribers',
      hover_data=['num_reviews','num_lectures','year','url'])


fig.update_layout(showlegend=False)
fig.update_xaxes(title='Number of Subscribers')
fig.update_yaxes(title='Course Title',showticklabels=False)
fig.show()

## Top 10 Business Finance Courses on Udemy

In [None]:
Bus = data[data['subject']=='Business Finance']

top_bus = Bus.nlargest(10, 'num_subscribers').sort_values('num_subscribers')

fig = px.bar(top_bus,
       y = 'course_title',
       x= 'num_subscribers',
       orientation = 'h',
       color='num_subscribers',
      hover_data=['num_reviews','num_lectures','year','url'])


fig.update_layout(showlegend=False)
fig.update_xaxes(title='Number of Subscribers')
fig.update_yaxes(title='Course Title',showticklabels=False)
fig.show()

## Top 10 Graphic Design courses on Udemy:

In [None]:
Graphic = data[data['subject']=='Graphic Design']

top_graph = Graphic.nlargest(10, 'num_subscribers').sort_values('num_subscribers')

fig = px.bar(top_graph,
       y = 'course_title',
       x= 'num_subscribers',
       orientation = 'h',
       color='num_subscribers',
      hover_data=['num_reviews','num_lectures','year','url'])


fig.update_layout(showlegend=False)
fig.update_xaxes(title='Number of Subscribers')
fig.update_yaxes(title='Course Title',showticklabels=False)
fig.show()

## Top 10 Musical Instruments Courses on Udemy:

In [None]:
Music = data[data['subject']=='Musical Instruments']

top_music = Music.nlargest(10, 'num_subscribers').sort_values('num_subscribers')

fig = px.bar(top_music,
       y = 'course_title',
       x= 'num_subscribers',
       orientation = 'h',
       color='num_subscribers',
      hover_data=['num_reviews','num_lectures','year','url'])


fig.update_layout(showlegend=False)
fig.update_xaxes(title='Number of Subscribers')
fig.update_yaxes(title='Course Title',showticklabels=False)
fig.show()

## Correlation: Price, Reviews, Subscribers, Duration

In [None]:
plt.figure(figsize = (10,7))
f = data[['num_reviews','price','num_subscribers','content_duration']].corr()
sns.heatmap(f, annot=True)

### Let's plot the graphs for the most correlated features:
- Price vs Content Duration
- Price vs Number of Subscribers
- Number of Reviews vs Number of Subscribers
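Instead of reading the heatmap by eye, the pairs can be ranked by absolute Pearson correlation. A sketch under the assumption that the same four columns are used; the toy values are invented and will not match the real heatmap:

```python
from itertools import combinations

import pandas as pd

# Invented toy values with the notebook's column names
toy = pd.DataFrame({
    "num_reviews":      [10, 50, 200, 5, 80],
    "num_subscribers":  [100, 600, 2500, 40, 900],
    "price":            [20, 50, 20, 200, 95],
    "content_duration": [1.0, 3.5, 2.0, 6.0, 4.0],
})

corr = toy.corr()  # Pearson correlation by default

# Each unordered pair once (no diagonal), ranked by strength regardless of sign
pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs.head(3))
```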

# Graph: Price vs Content Duration

In [None]:
fig = px.scatter(data, x='price', y='content_duration',
                 hover_data=['course_title'], color='subject')

fig.update_xaxes(title='Price')
fig.update_yaxes(title='Content Duration',showticklabels=False)
fig.show()

## Note from 'Price vs Content Duration' plot:

<div class="alert alert-block alert-info">
<b><li>
    Higher-priced Web Development courses tend to have longer content durations.
</li>
    <li>
        Graphic Design courses can have long content durations even at lower prices.
    </li>
 </b>
</div>
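The notes above describe trends within individual subjects. One way to quantify them is a price/duration correlation computed separately per subject; a sketch on invented toy rows:

```python
import pandas as pd

# Invented toy rows standing in for `data`
toy = pd.DataFrame({
    "subject": ["Web Development"] * 4 + ["Graphic Design"] * 4,
    "price": [20, 50, 100, 200, 20, 30, 50, 200],
    "content_duration": [1.0, 2.0, 5.0, 12.0, 6.0, 8.0, 7.0, 2.0],
})

# Pearson correlation between price and duration within each subject
by_subject = pd.Series({
    subject: group["price"].corr(group["content_duration"])
    for subject, group in toy.groupby("subject")
})
print(by_subject)
```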

# Graph: Number of Reviews vs Number of Subscribers

In [None]:
fig = px.scatter(data, x='num_reviews', y='num_subscribers',
                 hover_data=['course_title'], color='subject')

fig.update_xaxes(title='Number of Reviews')
fig.update_yaxes(title='Number of Subscribers',showticklabels=False)
fig.show()

## Note from 'Number of Reviews vs Number of Subscribers'

<div class="alert alert-block alert-info">
<b><li>
    Web Development courses have the highest review counts and also large numbers of subscribers.
</li>
    <li>
        Most courses have relatively few subscribers and reviews.
    </li>
 </b>
</div>
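A quick way to confirm that most courses get few subscribers and reviews is to compare the mean with the median: when a handful of very popular courses pull the mean far above the median, the distribution is heavily skewed. A sketch on invented toy values:

```python
import pandas as pd

# Invented toy values: four small courses and one very popular one
toy = pd.DataFrame({
    "num_subscribers": [10, 20, 30, 40, 100000],
    "num_reviews": [0, 1, 2, 3, 5000],
})

# Rows "mean" and "median", one column per feature
sub_stats = toy.agg(["mean", "median"])
print(sub_stats)
```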


# Graph for 'Price vs Number of Subscribers'

In [None]:
fig = px.scatter(data, x='price', y='num_subscribers',
                 hover_data=['course_title'], color='subject')

fig.update_xaxes(title='Price of a Course')
fig.update_yaxes(title='Number of Subscribers',showticklabels=False)
fig.show()

## Note from 'Price vs Number of Subscribers'
<div class="alert alert-block alert-info">
<b>
     The courses with the most subscribers are free (unpaid) courses.
 </b>
</div>
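To back this up with numbers, subscriber counts can be summarized separately for free and paid courses. A sketch on invented toy rows; on the real data the same groupby over `data` would apply:

```python
import pandas as pd

# Invented toy rows standing in for `data`
toy = pd.DataFrame({
    "is_paid": [False, False, True, True, True],
    "num_subscribers": [5000, 90000, 1200, 300, 40000],
})

# Typical (median) and best-case (max) subscriber counts per course type
paid_stats = toy.groupby("is_paid")["num_subscribers"].agg(["median", "max"])
print(paid_stats)
```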


## I hope this notebook gave you a good overview of Udemy courses and helps you decide whether to enroll in one!