![logo](https://user-images.githubusercontent.com/8652642/113849287-020f8600-97a2-11eb-9430-3c7af8823cf9.png)
<hr style="margin-bottom: 40px;">

## Udemy Course Analysis

This dataset contains 3,682 Udemy courses across four subjects: Musical Instruments, Business Finance, Graphic Design, and Web Development.

![purple_divider](https://user-images.githubusercontent.com/8652642/113848477-2f0f6900-97a1-11eb-8d8f-f30fb9e8433d.png)


In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3

%matplotlib inline

In [13]:
df = pd.read_csv('data/data.csv')

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)


### Overview of the data

In [14]:
df.head()

| | course_id | course_title | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 288942 | #1 Piano Hand Coordination: Play 10th Ballad i... | True | 35 | 3137 | 18 | 68 | All Levels | 1.5 hours | 2014-09-18T05:07:05Z | Musical Instruments |
| 1 | 1170074 | #10 Hand Coordination - Transfer Chord Ballad ... | True | 75 | 1593 | 1 | 41 | Intermediate Level | 1 hour | 2017-04-12T19:06:34Z | Musical Instruments |
| 2 | 1193886 | #12 Hand Coordination: Let your Hands dance wi... | True | 75 | 482 | 1 | 47 | Intermediate Level | 1.5 hours | 2017-04-26T18:34:57Z | Musical Instruments |
| 3 | 1116700 | #4 Piano Hand Coordination: Fun Piano Runs in ... | True | 75 | 850 | 3 | 43 | Intermediate Level | 1 hour | 2017-02-21T23:48:18Z | Musical Instruments |
| 4 | 1120410 | #5 Piano Hand Coordination: Piano Runs in 2 ... | True | 75 | 940 | 3 | 32 | Intermediate Level | 37 mins | 2017-02-21T23:44:49Z | Musical Instruments |


In [15]:
df.describe()

| | course_id | num_subscribers | num_reviews | num_lectures |
|---|---|---|---|---|
| count | 3682.0 | 3682.0 | 3682.0 | 3682.0 |
| mean | 676612.1 | 3194.23031 | 156.093156 | 40.065182 |
| std | 343635.5 | 9499.378361 | 934.957204 | 50.373299 |
| min | 8324.0 | 0.0 | 0.0 | 0.0 |
| 25% | 407843.0 | 110.25 | 4.0 | 15.0 |
| 50% | 688558.0 | 911.5 | 18.0 | 25.0 |
| 75% | 961751.5 | 2540.25 | 67.0 | 45.0 |
| max | 1282064.0 | 268923.0 | 27445.0 | 779.0 |


In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3682 entries, 0 to 3681
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   course_id            3682 non-null   int64 
 1   course_title         3682 non-null   object
 2   is_paid              3682 non-null   bool  
 3   price                3682 non-null   object
 4   num_subscribers      3682 non-null   int64 
 5   num_reviews          3682 non-null   int64 
 6   num_lectures         3682 non-null   int64 
 7   level                3682 non-null   object
 8   content_duration     3682 non-null   object
 9   published_timestamp  3682 non-null   object
 10  subject              3682 non-null   object
dtypes: bool(1), int64(4), object(6)
memory usage: 291.4+ KB


In [17]:
df.shape

(3682, 11)

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 1. What are all the different subjects offered by Udemy?

In [19]:
df['subject'].unique()

array(['Musical Instruments', 'Business Finance', 'Graphic Design',
       'Web Development'], dtype=object)

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 2. Which subject has the most courses?

In [21]:
df['subject'].value_counts()

Web Development        1200
Business Finance       1199
Musical Instruments     680
Graphic Design          603
Name: subject, dtype: int64

In [22]:
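# idxmax() on the counts gives the most frequent subject; .max() on the column only returns the alphabetically last name
df['subject'].value_counts().idxmax()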

'Web Development'

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 3. Which courses are free?

In [44]:
df[df['is_paid'] == False].head(2)

| | course_id | course_title | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | 286070 | 5 lecciones que todo guitarrista debe tomar | False | Free | 4452 | 263 | 14 | Beginner Level | 1 hour | 2014-08-23T05:08:14Z | Musical Instruments |
| 49 | 696630 | 7 Ways A Beginner Guitarist Can Sound Better, ... | False | Free | 4529 | 193 | 7 | Beginner Level | 36 mins | 2015-12-21T18:50:50Z | Musical Instruments |


In [45]:
df[df['is_paid'] == False].count()

course_id              310
course_title           310
is_paid                310
price                  310
num_subscribers        310
num_reviews            310
num_lectures           310
level                  310
content_duration       310
published_timestamp    310
subject                310
dtype: int64

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 4. Which courses are paid?

In [46]:
df[df['is_paid'] == True].count()

course_id              3372
course_title           3372
is_paid                3372
price                  3372
num_subscribers        3372
num_reviews            3372
num_lectures           3372
level                  3372
content_duration       3372
published_timestamp    3372
subject                3372
dtype: int64

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 5. Which are the top-selling courses?

In [47]:
df['num_subscribers'].sort_values(ascending = False)

2230    268923
776     161029
3385    121584
640     120291
3316    114512
         ...  
3328         0
910          0
3261         0
3259         0
649          0
Name: num_subscribers, Length: 3682, dtype: int64

In [48]:
df.sort_values('num_subscribers').tail(2)

| | course_id | course_title | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 776 | 59014 | Coding for Entrepreneurs Basic | False | Free | 161029 | 279 | 27 | Beginner Level | 3.5 hours | 2013-06-09T15:51:55Z | Web Development |
| 2230 | 41295 | Learn HTML5 Programming From Scratch | False | Free | 268923 | 8629 | 45 | All Levels | 10.5 hours | 2013-02-14T07:03:41Z | Web Development |


![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 6. Which are the least-selling courses?

In [49]:
df['num_subscribers'].sort_values(ascending = True)

649          0
3259         0
3261         0
910          0
3328         0
         ...  
3316    114512
640     120291
3385    121584
776     161029
2230    268923
Name: num_subscribers, Length: 3682, dtype: int64

In [50]:
df.sort_values('num_subscribers').head(2)

| | course_id | course_title | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 649 | 1233314 | Building a Balanced Scorecard | True | 50 | 0 | 0 | 11 | Intermediate Level | 2 hours | 2017-07-03T21:38:22Z | Business Finance |
| 3259 | 1232282 | The Cash Flow Statement - An Introduction | True | 50 | 0 | 0 | 10 | Beginner Level | 1.5 hours | 2017-06-28T16:05:51Z | Business Finance |


![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 7. Which Graphic Design courses are priced below 100?
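
This question has no cell yet, so here is one possible sketch. `df.info()` reports `price` as `object`, and free courses carry the string `Free`, so the column has to be converted before it can be compared with a number. The helper name `graphic_design_below`, the choice to count free courses as price 0, and the sample rows are all assumptions made for illustration; in the notebook you would pass the loaded `df` instead of `sample`.

```python
import pandas as pd


def graphic_design_below(df: pd.DataFrame, limit: float = 100) -> pd.DataFrame:
    """Return Graphic Design courses whose price is below `limit`."""
    # Assumption: 'Free' is treated as a price of 0. Any other
    # non-numeric text becomes NaN and is excluded by the comparison.
    price = pd.to_numeric(df['price'].replace('Free', '0'), errors='coerce')
    mask = (df['subject'] == 'Graphic Design') & (price < limit)
    return df[mask]


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    'course_title': ['Logo Basics', 'Advanced Typography',
                     'Intro to Sketching', 'Piano 101'],
    'price': ['50', '150', 'Free', '20'],
    'subject': ['Graphic Design', 'Graphic Design',
                'Graphic Design', 'Musical Instruments'],
})

result = graphic_design_below(sample)
print(result)
```

If free courses should not count as "below 100", drop the `replace` call and let `errors='coerce'` turn `Free` into `NaN`.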

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 8. Which courses are related to Python?
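
This question has no cell yet. One simple reading of "related" is that the word appears in `course_title`, which can be checked with a case-insensitive substring match. The helper name `courses_matching` is hypothetical, and the two Python titles in the sample are made up for illustration; the other two titles appear in the outputs above.

```python
import pandas as pd


def courses_matching(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return courses whose title contains `keyword`, ignoring case."""
    # regex=False treats the keyword as plain text; na=False skips missing titles.
    mask = df['course_title'].str.contains(keyword, case=False,
                                           na=False, regex=False)
    return df[mask]


# The two Python titles are hypothetical examples, not rows from the dataset.
sample = pd.DataFrame({
    'course_title': [
        'Learn HTML5 Programming From Scratch',
        'Python for Beginners',
        'Coding for Entrepreneurs Basic',
        'Data Analysis with python and pandas',
    ],
})

result = courses_matching(sample, 'Python')
print(result)
```

A title match will miss courses that teach Python without naming it, so treat the result as a lower bound.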

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 9. Which courses were published in 2015?
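
This question has no cell yet. Since `published_timestamp` is stored as ISO 8601 text (dtype `object`), one approach is to parse it with `pd.to_datetime` and filter on the year. The helper name `published_in_year` is hypothetical; the sample reuses three `course_id` and timestamp pairs from the outputs above.

```python
import pandas as pd


def published_in_year(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Return courses whose published_timestamp falls in `year` (UTC)."""
    # Unparseable timestamps become NaT and never match the comparison.
    published = pd.to_datetime(df['published_timestamp'],
                               utc=True, errors='coerce')
    return df[published.dt.year == year]


# Three (course_id, timestamp) pairs copied from the outputs shown earlier.
sample = pd.DataFrame({
    'course_id': [286070, 696630, 1170074],
    'published_timestamp': ['2014-08-23T05:08:14Z',
                            '2015-12-21T18:50:50Z',
                            '2017-04-12T19:06:34Z'],
})

result = published_in_year(sample, 2015)
print(result)
```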

![green_divider](https://user-images.githubusercontent.com/8652642/113848708-6aaa3300-97a1-11eb-8dd0-e1c60c5ab0fc.png)



### 10. What is the maximum number of subscribers for each course level?
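
This question has no cell yet. A `groupby` on `level` followed by `max()` on `num_subscribers` is one straightforward way to answer it. The helper name `max_subscribers_by_level` is hypothetical; the sample rows are copied from the outputs above, so the numbers it prints describe only those six rows, not the full dataset.

```python
import pandas as pd


def max_subscribers_by_level(df: pd.DataFrame) -> pd.Series:
    """Return the largest num_subscribers value within each level."""
    per_level = df.groupby('level')['num_subscribers'].max()
    # Sort so the level with the biggest course comes first.
    return per_level.sort_values(ascending=False)


# Six rows copied from the outputs shown earlier in the notebook.
sample = pd.DataFrame({
    'course_id': [288942, 1170074, 1193886, 286070, 59014, 41295],
    'level': ['All Levels', 'Intermediate Level', 'Intermediate Level',
              'Beginner Level', 'Beginner Level', 'All Levels'],
    'num_subscribers': [3137, 1593, 482, 4452, 161029, 268923],
})

result = max_subscribers_by_level(sample)
print(result)
```

To see which course holds each maximum rather than just the number, `df.loc[df.groupby('level')['num_subscribers'].idxmax()]` returns the full rows.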

![purple_divider](https://user-images.githubusercontent.com/8652642/113848477-2f0f6900-97a1-11eb-8d8f-f30fb9e8433d.png)
