<h1><center>Udemy Courses Analysis</center></h1>

<center>Analysis of data gathered from Udemy courses. The dataset is available at https://www.kaggle.com/andrewmvd/udemy-courses?select=udemy_courses.csv</center>

### I. Importing modules and libraries

For this step, I imported the pandas, NumPy, Matplotlib, and Plotly modules. After setting up the modules, the CSV file was loaded into a DataFrame.

In [9]:
import pandas as pd
import numpy  as np
import plotly.express  as px
import matplotlib.pyplot as plt
import plotly.graph_objects as go

from matplotlib import cm
plt.style.use('ggplot')

In [10]:
import os

# A raw string keeps the backslashes in the Windows path literal.
os.chdir(r"C:\Users\maegr\Documents\Maymay\DataScience\modules")

# Read the data file into courses_df (pd.read_csv opens the file itself)
courses_df = pd.read_csv("udemy_courses.csv")
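
Changing the working directory ties the notebook to one machine. As a minimal sketch of an alternative, the hypothetical helper `load_courses` below (not part of the original notebook) reads the file from an explicit path and checks that the columns used later in this analysis are present. The list `REQUIRED_COLUMNS` is an assumption collected from the cells that follow.

```python
import pandas as pd

# Columns the later cells rely on (assumed from this notebook's code).
REQUIRED_COLUMNS = ["course_title", "is_paid", "price", "num_subscribers",
                    "num_reviews", "content_duration", "subject"]


def load_courses(csv_source):
    """Hypothetical helper: read the CSV and verify the expected columns exist."""
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df
```

With this helper, the load step could be written as `courses_df = load_courses("udemy_courses.csv")`, with the path adjusted to wherever the file is stored.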

### II. Describing the Courses

Instead of computing each statistic separately, the `describe` method was used to summarize all numeric columns at once.

In [3]:
courses_df.describe()

| | course_id | price | num_subscribers | num_reviews | num_lectures | content_duration |
|---|---|---|---|---|---|---|
| count | 3678.0 | 3678.0 | 3678.0 | 3678.0 | 3678.0 | 3678.0 |
| mean | 675972.0 | 66.049483 | 3197.150625 | 156.259108 | 40.108755 | 4.094517 |
| std | 343273.2 | 61.005755 | 9504.11701 | 935.452044 | 50.383346 | 6.05384 |
| min | 8324.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 407692.5 | 20.0 | 111.0 | 4.0 | 15.0 | 1.0 |
| 50% | 687917.0 | 45.0 | 911.5 | 18.0 | 25.0 | 2.0 |
| 75% | 961355.5 | 95.0 | 2546.0 | 67.0 | 45.75 | 4.5 |
| max | 1282064.0 | 200.0 | 268923.0 | 27445.0 | 779.0 | 78.5 |


#### Conclusions: 
1. Anyone planning to take a Udemy course should note that the average price is **66.05 USD**. Some courses are entirely **free**, while others cost as much as **200 USD**.
2. The average number of subscribers per course is **3197**.
3. Some courses have as many as **27445** reviews, while others have no reviews at all.
4. The average course contains **4.09 hours** of content.
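
The mean alone can mislead when a column is skewed. In the table above the mean price (66.05) sits well above the median (45), and the mean subscriber count (3197) is more than three times the median (911.5). The sketch below shows one way to put the mean and median side by side. It runs on a small made-up frame, not on the real data, so the values are illustrative assumptions only.

```python
import pandas as pd

# Toy data for illustration only; not taken from udemy_courses.csv.
sample_df = pd.DataFrame({
    "price": [0, 20, 20, 45, 200],
    "num_subscribers": [10, 100, 900, 2500, 50000],
})

# agg() with a list of function names returns one row per statistic;
# transposing puts one column of the data on each row instead.
summary = sample_df.agg(["mean", "median"]).T

# A ratio well above 1 suggests a long right tail.
summary["mean_over_median"] = summary["mean"] / summary["median"]
print(summary)
```

Applied to `courses_df`, the same two lines would operate on a selection of its numeric columns in place of `sample_df`.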

### III. Number of Courses by Subject

The pie chart below shows how the courses are distributed across subjects.

In [11]:
# Count courses per subject. Naming both columns explicitly keeps the
# result the same across pandas versions.
temp_df = (courses_df['subject'].value_counts()
           .rename_axis('subject')
           .reset_index(name='num_courses'))

fig = go.Figure(data=[go.Pie(labels = temp_df['subject'],
                             values = temp_df['num_courses'],
                             hole = .7,
                             title = '% of Courses by Subject',
                             marker_colors = px.colors.sequential.Blues_r,
                            )])
fig.update_layout(title = 'Number of Courses by Subject')
fig.show()

#### Conclusions: 
1. About 32.6% of the courses in the dataset are about Web Development, which makes it the most common subject on the site.
2. Web Development is followed closely by Business Finance, with around 32.5%.
3. Musical Instruments is third, while Graphic Design is last.
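
The percentages read off the pie chart can also be computed directly. The sketch below uses a toy `subjects` Series (an assumption for illustration) and is not the notebook's actual data.

```python
import pandas as pd

# Toy subject column for illustration; the real one is courses_df['subject'].
subjects = pd.Series(
    ["Web Development", "Web Development", "Business Finance",
     "Musical Instruments"],
    name="subject",
)

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages.
share_pct = subjects.value_counts(normalize=True).mul(100).round(1)
print(share_pct)
```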

### IV. Duration of contents across subjects and the type of course
The box plot below shows the distribution of course durations for each subject, split into free and paid courses.

In [14]:
fig = px.box(courses_df,
       x = 'content_duration',
       y = 'subject',
       orientation = 'h',
       color = 'is_paid',
       title = 'Duration Distribution Across Subject and Type of Course',
       color_discrete_sequence = ['#03cffc','#eb03fc']
      )

fig.update_xaxes(title = 'Content Duration')
fig.update_yaxes(title = 'Course Subject')
fig.show()

#### Conclusions:
1. Based on the graph above, free courses typically last less than 10 hours.
2. Paid courses are around 15 to 30 hours long.
3. Some courses last more than 70 hours; the longest in the dataset runs 78.5 hours.
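
Values read from a box plot are approximate. As a hedged sketch, the same comparison can be tabulated by grouping on `subject` and `is_paid`. The frame `toy_df` below is made-up data that only mirrors the column names used in this notebook.

```python
import pandas as pd

# Toy data for illustration only; durations are in hours.
toy_df = pd.DataFrame({
    "subject": ["Web Development"] * 4 + ["Graphic Design"] * 4,
    "is_paid": [True, True, False, False] * 2,
    "content_duration": [10.0, 30.0, 1.0, 3.0, 4.0, 8.0, 0.5, 1.5],
})

# One row per (subject, is_paid) pair with the median and maximum duration.
duration_summary = (
    toy_df.groupby(["subject", "is_paid"])["content_duration"]
          .agg(["median", "max"])
)
print(duration_summary)
```

Replacing `toy_df` with `courses_df` would give the medians and maxima behind each box in the chart.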

### V. Price distribution 
The box plot below shows the price distribution of paid courses for each subject.

In [26]:
paid_courses_df = courses_df.query("price != 0")
fig = px.box(paid_courses_df,
             x = 'subject',
             y = 'price',
             color = 'subject',
             title = 'Course Prices x Subject',
             color_discrete_sequence = ['#03cffc','#0362fc','#eb03fc','#0ecc83'],
             hover_name = 'course_title',
            )

fig.update_layout(showlegend = False)
fig.update_yaxes(range = [0,220], title = 'Course Price')
fig.update_xaxes(title = 'Course Subject')
fig.show()

#### Conclusions:
1. Business Finance courses usually cost around 20-90 USD.
2. Most Graphic Design courses cost around 20-80 USD.
3. Most Musical Instruments courses cost around 20-50 USD, but some cost more than 100 USD.
4. Web Development courses cost around 20-120 USD.
5. The most expensive course in Musical Instruments is "Bones of the Blues - learn 4 cool tunes to expert level now!". For Graphic Design, the most expensive course is "Adobe Photoshop: Complete Beginner".
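
The most expensive course per subject can be looked up in code as well as by hovering over the chart. The sketch below is one possible approach on made-up data; the titles and prices in `toy_df` are assumptions, not values from the dataset.

```python
import pandas as pd

# Toy data for illustration only.
toy_df = pd.DataFrame({
    "course_title": ["Piano Basics", "Blues Guitar", "Logo Design", "Photoshop 101"],
    "subject": ["Musical Instruments", "Musical Instruments",
                "Graphic Design", "Graphic Design"],
    "price": [50, 200, 20, 150],
})

# idxmax() gives the row label of the highest price within each subject.
# Ties resolve to the first occurrence, so if several courses share the
# top price only one of them is returned.
top_rows = toy_df.loc[toy_df.groupby("subject")["price"].idxmax()]
most_expensive = top_rows.set_index("subject")["course_title"]
print(most_expensive)
```

Because of the tie behaviour noted in the comment, a subject with several courses at the maximum price would need a filter on `price == price.max()` to list them all.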

### VI. Most Popular Paid Courses
The bar chart below shows the most popular paid courses on Udemy, ranked by number of subscribers.

In [27]:
top25_paid = paid_courses_df.sort_values("num_subscribers", ascending=False)[0:25].sort_values("num_subscribers", ascending=True).reset_index(drop=True).reset_index()
fig = px.bar(top25_paid,
               y = 'index',
               x = 'num_subscribers',
               orientation = 'h',
               color = 'num_subscribers',
               hover_name  = 'course_title',
               title = 'Top 25 Most Popular Paid Courses (by number of subscribers)',
               opacity = 0.8,
               color_continuous_scale = px.colors.sequential.ice,
               height = 800,
              )

fig.update_layout(showlegend = False)
fig.update_xaxes(title = 'Number of Subscribers')
fig.update_yaxes(title = 'Course Title',showticklabels=False)
fig.show()

#### Conclusions:
1. The most popular paid course is "The Web Developer Bootcamp".
2. 21 of the 25 most popular paid courses are about Web Development. Together with the earlier conclusion, this suggests that many Udemy users are interested in learning Web Development.
3. The most popular paid courses in Musical Instruments teach piano and guitar.
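
The count of subjects among the top courses can be computed rather than read off the chart. The sketch below uses `nlargest` as a shorter alternative to sorting and slicing, on a made-up frame with a top 3 instead of a top 25. All values are illustrative assumptions.

```python
import pandas as pd

# Toy data for illustration only.
toy_df = pd.DataFrame({
    "course_title": ["A", "B", "C", "D", "E"],
    "subject": ["Web Development", "Web Development", "Musical Instruments",
                "Business Finance", "Web Development"],
    "price": [200, 20, 50, 0, 95],
    "num_subscribers": [500, 400, 300, 900, 100],
})

# Keep paid courses only, then take the n rows with the most subscribers.
paid = toy_df[toy_df["price"] != 0]
top3_paid = paid.nlargest(3, "num_subscribers")

# How many of the top courses fall under each subject.
subject_counts = top3_paid["subject"].value_counts()
print(subject_counts)
```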

### VII. Most Popular Free Courses

In [28]:
free_courses_df = courses_df.query("price == 0")
top25_free = free_courses_df.sort_values("num_subscribers", ascending=False)[0:25].sort_values("num_subscribers", ascending=True).reset_index(drop=True).reset_index()
fig = px.bar(top25_free,
               y = 'index',
               x = 'num_subscribers',
               orientation = 'h',
               color = 'num_subscribers',
               hover_name = 'course_title',
               title = 'Top 25 Most Popular Free Courses (by number of subscribers)',
               opacity = 0.8,
               color_continuous_scale = px.colors.sequential.Aggrnyl,
               height = 800,
              )

fig.update_layout(showlegend = False)
fig.update_xaxes(title = 'Number of Subscribers')
fig.update_yaxes(title = 'Course Title',showticklabels = False)
fig.show()

#### Conclusions:
1. The most popular free course is "Learn HTML5 Programming from scratch".
2. 21 of the 25 most popular free courses are about Web Development.
3. The most popular free course in Musical Instruments teaches electric guitar and is titled "Free Beginner Electric Guitar Lessons".
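
The free and paid subsets in this notebook are built from `price`, while the box plot in section IV colors by `is_paid`. The notebook does not check whether the two always agree. A minimal sketch of such a check follows, assuming `is_paid` is a boolean column; the data in `toy_df` is made up.

```python
import pandas as pd

# Toy data for illustration only; course C is flagged paid but priced at 0.
toy_df = pd.DataFrame({
    "course_title": ["A", "B", "C"],
    "is_paid": [True, False, True],
    "price": [20, 0, 0],
})

# Rows where the flag and the price disagree
# (flagged paid with price 0, or flagged free with a nonzero price).
mismatch = toy_df[toy_df["is_paid"] != (toy_df["price"] > 0)]
print(mismatch)
```

An empty result on the real data would mean the two definitions of "free" are interchangeable.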

### VIII. Most Reviewed Course

In [37]:
num_reviews_df = courses_df.query("num_reviews != 0")
top25_reviewed = num_reviews_df.sort_values("num_reviews", ascending=False)[0:25].sort_values("num_reviews", ascending=True).reset_index(drop=True).reset_index()
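# Note: max() takes the maximum of each column independently, so the
# result mixes values from different rows of the top 25.
top25_reviewed.max()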

index                                                                 24
course_id                                                         995016
course_title           Web Design for Web Developers: Build Beautiful...
url                          https://www.udemy.com/web-developer-course/
is_paid                                                             True
price                                                                200
num_subscribers                                                   268923
num_reviews                                                        27445
num_lectures                                                         362
level                                                 Intermediate Level
content_duration                                                    43.0
published_timestamp                                 2016-11-08T18:55:21Z
subject                                                  Web Development
dtype: object

#### Conclusions:
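1. `max()` is applied to each column separately, so the output above lists the largest value per column across the 25 most reviewed courses (the alphabetically last value for text columns). These values do not necessarily belong to a single course.
2. The highest review count in the dataset is **27445**.
3. Within the top 25, the highest price is 200 USD and the longest course runs 43 hours.
4. The URL in the output, https://www.udemy.com/web-developer-course/, currently leads to a course titled "Ultimate Web Designer & Web Developer Course for 2021", priced at 89.99 USD.
5. Identifying the single most reviewed course, along with its title, subject, and level, requires selecting that course's row rather than taking column-wise maxima.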

To check the course, visit the link https://www.udemy.com/web-developer-course/
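
The difference between a column-wise `max()` and selecting the row with the most reviews can be seen on a two-row example. This is a sketch on made-up data; the course names are hypothetical.

```python
import pandas as pd

# Toy data for illustration only.
toy_df = pd.DataFrame({
    "course_title": ["Zebra Course", "Alpha Course"],
    "price": [200, 20],
    "num_reviews": [10, 500],
})

# Per-column maxima: a mix of values that matches no actual row.
column_max = toy_df.max()

# The one row with the highest review count.
top_row = toy_df.loc[toy_df["num_reviews"].idxmax()]

print(column_max)
print(top_row)
```

Here `column_max` pairs the title "Zebra Course" with 500 reviews, although that course has only 10. On the real data, `courses_df.loc[courses_df["num_reviews"].idxmax()]` would return the details of the single most reviewed course.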

## General Conclusions

Based on the data above, we can conclude the following:

1. Web Development is the most popular subject among both paid and free courses on Udemy.
2. Anyone planning to take a Udemy course should note that the average price is **66.05 USD**. Some courses are entirely **free**, while others cost as much as **200 USD**.
3. On average, a Udemy course contains about 4 hours of content.