# What are the most popular courses on Coursera?
This is a data analysis project using the Coursera course dataset from Kaggle.

- First, let's look at Coursera courses related to 'Data'.
- Then, look at all courses by programming language (or framework).
- Finally, look at them by field (sector).


In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('../input/coursera-course-dataset/coursea_data.csv')

## Step 1. Data Selection

### Coursera Course Dataset | [Kaggle](https://www.kaggle.com/siddharthm1698/coursera-course-dataset#)
>This dataset contains 6 main columns and data on 890 courses. The columns are:
1. **course_title**: the course title.
2. **course_organization**: the organization offering the course.
3. **course_Certificate_type**: the type of certificate the course offers.
4. **course_rating**: the course's rating.
5. **course_difficulty**: the course's difficulty level.
6. **course_students_enrolled**: the number of students enrolled in the course.

## Step 2. Data Lookup

In [None]:
# Look up the first rows and the columns
data.head()

In [None]:
# Summarize only the data needed for analysis
data[['course_title','course_rating','course_difficulty','course_students_enrolled']]

## Step 3. Create DataFrame

In [None]:
# Create a working DataFrame; .copy() keeps later column edits from touching 'data'
df = data[['course_title','course_rating','course_difficulty','course_students_enrolled']].copy()
df

In [None]:
# Rename the columns to shorter names
df.columns = ['title', 'rating', 'level', 'enrolled']
df

## Step 4. Preprocessing

In [None]:
# Look up the distinct values in the 'level' column
df['level'].unique()

In [None]:
# Convert the 'level' column into numeric form (ordinal encoding)
def levels(x):
    if x == 'Beginner':
        return 1
    elif x == 'Mixed':
        return 2
    elif x == 'Intermediate':
        return 3
    elif x == 'Advanced':
        return 4

In [None]:
df['level'] = df['level'].apply(levels)
df['level'].unique()
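The `levels()` function above is an ordinal encoding. A minimal sketch of the same idea with `Series.map` and a dictionary, plus a check that no label was left unmapped (any label missing from the dictionary becomes NaN). The dictionary name `LEVEL_ORDER` and the sample labels are illustrative assumptions.

```python
import pandas as pd

# Same ordering as levels(): 'Mixed' sits between Beginner and Intermediate
LEVEL_ORDER = {'Beginner': 1, 'Mixed': 2, 'Intermediate': 3, 'Advanced': 4}

# Illustrative sample, not the real dataset
sample = pd.Series(['Beginner', 'Advanced', 'Mixed', 'Intermediate', 'Beginner'])

encoded = sample.map(LEVEL_ORDER)

# Labels missing from LEVEL_ORDER would show up here (they map to NaN)
unmapped = sample[encoded.isna()].unique()

print(encoded.tolist())  # [1, 4, 2, 3, 1]
print(list(unmapped))    # []
```

On the notebook's frame this would be `df['level'].map(LEVEL_ORDER)`; the `isna()` check makes any unexpected label visible instead of silently storing a missing value.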

In [None]:
# Look up the 'enrolled' column (stored as text with a 'k' suffix)
df['enrolled']

In [None]:
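# Convert enrollment strings such as '5.3k' or '17k' into integers.
# An 'm' (million) suffix is also handled, in case such values occur.
def counts(x):
    x = str(x).strip().lower()
    multipliers = {'k': 1_000, 'm': 1_000_000}
    if x and x[-1] in multipliers:
        return int(round(float(x[:-1]) * multipliers[x[-1]]))
    return int(float(x))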
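The enrollment values combine a number with an optional suffix letter. A vectorized sketch of the same conversion using `Series.str.extract` and a multiplier lookup; it does not assume a fixed number of decimal digits. The helper name `parse_enrolled`, the `m` (million) entry, and the sample strings are assumptions for illustration.

```python
import pandas as pd

# Hypothetical suffix table: '' = no suffix, 'k' = thousand, 'm' = million (assumed)
MULTIPLIERS = {'': 1, 'k': 1_000, 'm': 1_000_000}

def parse_enrolled(s: pd.Series) -> pd.Series:
    """Convert strings such as '5.3k' or '17k' into integers."""
    # Split each string into its numeric part and an optional suffix letter
    parts = s.astype(str).str.strip().str.lower().str.extract(r'^([\d.]+)([km]?)$')
    number = parts[0].astype(float)
    factor = parts[1].map(MULTIPLIERS)
    # Round before casting so float noise (e.g. 5299.9999...) cannot truncate
    return (number * factor).round().astype(int)

sample = pd.Series(['5.3k', '17k', '1.25k', '950'])
print(parse_enrolled(sample).tolist())  # [5300, 17000, 1250, 950]
```

A string that does not fit the pattern becomes NaN and makes the final `astype(int)` fail, which surfaces malformed values early rather than converting them silently.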

In [None]:
# Test function 'counts()'
df['enrolled'].apply(counts)

In [None]:
# Apply the function to the dataframe
df['enrolled'] = df['enrolled'].apply(counts)
df

In [None]:
# Check the courses whose title contains 'Data'
df[df['title'].str.contains('Data')]

In [None]:
# Check the courses whose title contains 'data'
df[df['title'].str.contains('data')]

In [None]:
# The matching titles use the capitalized form 'Data'. Sort courses containing 'Data' by 'enrolled'
df[df['title'].str.contains('Data')].sort_values(by=['enrolled'], ascending=False)
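Rather than checking 'Data' and 'data' in separate cells, `str.contains` can match case-insensitively with `case=False`. A small sketch on made-up titles; it also shows that a plain substring search for 'data' matches words such as 'Databases', while a word-boundary regex (`\bdata\b`) excludes them, if that is what is wanted.

```python
import pandas as pd

# Illustrative titles, not taken from the dataset
titles = pd.DataFrame({
    'title': ['Data Science Basics', 'Working with big data', 'Intro to Databases', 'Python for Everybody'],
    'enrolled': [120000, 80000, 50000, 300000],
})

# Case-insensitive substring match: catches 'Data', 'data' and 'Databases'
any_case = titles[titles['title'].str.contains('data', case=False)]

# Whole-word match only: excludes 'Databases'
whole_word = titles[titles['title'].str.contains(r'\bdata\b', case=False, regex=True)]

print(any_case['title'].tolist())
# ['Data Science Basics', 'Working with big data', 'Intro to Databases']
print(whole_word['title'].tolist())
# ['Data Science Basics', 'Working with big data']
```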

In [None]:
# Check the lowest rating, to see whether any low-rated courses need to be excluded
df[df['title'].str.contains('Data')].sort_values(by=['enrolled'], ascending=False)['rating'].min()

In [None]:
# All ratings are high, so save the sorted result as 'Courses_data'
Courses_data = df[df['title'].str.contains('Data')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Make sure it was saved correctly
Courses_data.head()

In [None]:
# Extract courses with more than 100,000 students enrolled
Courses_data[Courses_data['enrolled']>100000]

In [None]:
# Count these courses
len(Courses_data[Courses_data['enrolled']>100000])

In [None]:
# Save courses with more than 100,000 students enrolled, and extract the top 10 separately
BestSellers = Courses_data[Courses_data['enrolled']>100000]
AboutData_Top_Ten = BestSellers.head(10)

In [None]:
AboutData_Top_Ten
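Filtering on the threshold, sorting, and slicing the first ten rows can also be written with `DataFrame.nlargest`, and summing a boolean mask counts the matching rows without building the filtered frame. A sketch on a made-up frame that only assumes the notebook's 'title' and 'enrolled' columns:

```python
import pandas as pd

# Illustrative data with the notebook's column names
courses = pd.DataFrame({
    'title': [f'Course {i}' for i in range(15)],
    'enrolled': [50_000 + 20_000 * i for i in range(15)],
})

threshold = 100_000
# Number of courses strictly above the threshold (same as len() of the filtered frame)
n_best = (courses['enrolled'] > threshold).sum()

# Top 10 by enrollment among the courses above the threshold
top_ten = courses[courses['enrolled'] > threshold].nlargest(10, 'enrolled')

print(n_best)  # 12
print(top_ten['title'].tolist())
```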

## Step 5. Apply - Language/Framework
So far, the courses with 'Data' in their titles have been extracted from the 890 Coursera courses and sorted by total enrollment. Using the same approach, let's find the best sellers by language or framework.

1. **Python** Best Sellers
2. **R** Best Sellers
3. **Java** Best Sellers
4. **JavaScript** Best Sellers
5. **Django** Best Sellers
6. **Flask** Best Sellers
7. **React** Best Sellers
8. **Vue** Best Sellers
9. **TensorFlow** Best Sellers
10. **PyTorch** Best Sellers
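Each section below repeats the same three steps: filter titles by a keyword, sort by enrollment, and keep courses above 100,000 students. A sketch that factors this pattern into one function; the name `best_sellers`, its parameters, and the sample rows are hypothetical.

```python
import pandas as pd

def best_sellers(frame: pd.DataFrame, keyword: str, min_enrolled: int = 100_000,
                 case: bool = True) -> pd.DataFrame:
    """Hypothetical helper: courses whose title contains `keyword`,
    sorted by enrollment, keeping those with more than `min_enrolled` students."""
    # regex=False treats the keyword as a plain substring
    matches = frame[frame['title'].str.contains(keyword, case=case, regex=False)]
    matches = matches.sort_values(by='enrolled', ascending=False)
    return matches[matches['enrolled'] > min_enrolled]

# Illustrative data with the notebook's column names
courses = pd.DataFrame({
    'title': ['Python Basics', 'Python Data Structures', 'Advanced Python', 'Intro to Java'],
    'enrolled': [90_000, 420_000, 150_000, 200_000],
})

result = best_sellers(courses, 'Python')
print(result['title'].tolist())  # ['Python Data Structures', 'Advanced Python']
```

With such a helper, a section like TensorFlow would reduce to a single call, e.g. `best_sellers(df, 'TensorFlow')`.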

### 1. Python Best Sellers

In [None]:
# Look up all 'Python' courses
df[df['title'].str.contains('Python')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# 'Python' courses with more than 100,000 students enrolled
Python_Courses = df[df['title'].str.contains('Python')].sort_values(by=['enrolled'], ascending=False)
Python_Courses[Python_Courses['enrolled']>100000]

In [None]:
# save data
Python_BestSellers = Python_Courses[Python_Courses['enrolled']>100000]

### 2. R Best Sellers

In [None]:
# Look up R courses (1): titles containing 'R '
df[df['title'].str.contains('R ')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Look up R courses (2)
# Keep only the top result (the row with index 691)
df[df['title'].str.contains('R ')].sort_values(by=['enrolled'], ascending=False).iloc[0,:]

In [None]:
# Save the extracted row; iloc[[0]] keeps it as a one-row DataFrame
R_temp_1 = df[df['title'].str.contains('R ')].sort_values(by=['enrolled'], ascending=False).iloc[[0]]

In [None]:
# Look up R courses (3): titles ending with 'R'
df[df['title'].str[-1] =='R'].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Look up R courses (4)
# Keep the top two results (the rows with index 199 and 753)
df[df['title'].str[-1] =='R'].sort_values(by=['enrolled'], ascending=False).iloc[:2,:]

In [None]:
# Save the extracted rows
R_temp_2 = df[df['title'].str[-1] =='R'].sort_values(by=['enrolled'], ascending=False).iloc[:2,:]

In [None]:
pd.concat([R_temp_1, R_temp_2])

In [None]:
# save to R_BestSellers
R_BestSellers = pd.concat([R_temp_1, R_temp_2])
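The two searches for R above (titles containing 'R ' and titles ending in 'R') approximate a whole-word match. A regex with word boundaries, `\bR\b`, expresses that in one filter; it stays case-sensitive so that lowercase words are not caught. Unlike the 'R ' substring test, it skips a title such as 'PR Strategy', but it can still catch unrelated standalone capital R's, so a manual check of the results is still advisable. The titles below are illustrative.

```python
import pandas as pd

# Illustrative titles, not taken from the dataset
courses = pd.DataFrame({
    'title': ['R Programming', 'Data Analysis with R', 'PR Strategy', 'Regression Models', 'C and R Basics'],
    'enrolled': [650_000, 120_000, 40_000, 90_000, 30_000],
})

# \b marks a word boundary, so 'R' must stand alone as a word
is_r = courses['title'].str.contains(r'\bR\b', regex=True)
r_courses = courses[is_r].sort_values(by='enrolled', ascending=False)

print(r_courses['title'].tolist())
# ['R Programming', 'Data Analysis with R', 'C and R Basics']
```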

### 3. Java Best Sellers

In [None]:
# look up Java courses
df[df['title'].str.contains('Java')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Java_Courses = df[df['title'].str.contains('Java')].sort_values(by=['enrolled'], ascending=False)
Java_BestSellers = Java_Courses[Java_Courses['enrolled']>100000]
Java_BestSellers

In [None]:
# 'Java' also matches JavaScript courses; exclude them in any capitalization
Java_BestSellers = Java_BestSellers[~Java_BestSellers['title'].str.contains('javascript', case=False)]
Java_BestSellers
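Instead of matching 'Java' and then removing JavaScript titles, a negative lookahead can express "Java not followed by script" in a single regex. With `case=False`, pandas compiles the pattern with `re.IGNORECASE`, so the lookahead also ignores case. A sketch on made-up titles:

```python
import pandas as pd

# Illustrative titles, not taken from the dataset
courses = pd.DataFrame({
    'title': ['Java Programming', 'JavaScript Basics', 'Intro to Javascript', 'Object Oriented Java'],
    'enrolled': [300_000, 250_000, 120_000, 180_000],
})

# 'Java' not immediately followed by 'script' (case-insensitive)
java_only = courses[courses['title'].str.contains(r'Java(?!script)', case=False, regex=True)]

print(java_only['title'].tolist())  # ['Java Programming', 'Object Oriented Java']
```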

### 4. JavaScript Best Sellers


In [None]:
# Look up courses with 'JavaScript' in the title
df[df['title'].str.contains('JavaScript')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Keep only the top 'JavaScript' result; iloc[[0]] returns it as a one-row DataFrame
JavaScript_temp_1 = df[df['title'].str.contains('JavaScript')].sort_values(by=['enrolled'], ascending=False).iloc[[0]]

In [None]:
# Check the first part before merging
JavaScript_temp_1

In [None]:
# Look up the other spelling, 'Javascript'
df[df['title'].str.contains('Javascript')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Save the second part before merging
JavaScript_temp_2 = df[df['title'].str.contains('Javascript')].sort_values(by=['enrolled'], ascending=False)

In [None]:
pd.concat([JavaScript_temp_1, JavaScript_temp_2])

In [None]:
# save to JavaScript_BestSellers
JavaScript_BestSellers = pd.concat([JavaScript_temp_1, JavaScript_temp_2])
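The two JavaScript lookups differ only in capitalization ('JavaScript' vs 'Javascript'). A sketch that collects every spelling with one case-insensitive filter, which removes the need for temporary frames and `pd.concat`. Note that it keeps all matching courses, not just the top 'JavaScript' result, so a threshold or `head()` would be applied afterwards if needed. The titles are illustrative.

```python
import pandas as pd

# Illustrative titles, not taken from the dataset
courses = pd.DataFrame({
    'title': ['JavaScript Basics', 'Intro to Javascript', 'HTML, CSS, and JAVASCRIPT', 'Java Programming'],
    'enrolled': [250_000, 120_000, 90_000, 300_000],
})

# One case-insensitive substring filter covers every capitalization
js = courses[courses['title'].str.contains('javascript', case=False, regex=False)]
js = js.sort_values(by='enrolled', ascending=False)

print(js['title'].tolist())
# ['JavaScript Basics', 'Intro to Javascript', 'HTML, CSS, and JAVASCRIPT']
```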

### 5. Django Best Sellers

In [None]:
# look up courses about Django
df[df['title'].str.contains('Django')].sort_values(by=['enrolled'], ascending=False)

### 6. Flask Best Sellers

In [None]:
# look up courses about Flask
df[df['title'].str.contains('Flask')].sort_values(by=['enrolled'], ascending=False)

### 7. React Best Sellers

In [None]:
# look up courses about React
df[df['title'].str.contains('React')].sort_values(by=['enrolled'], ascending=False)

In [None]:
React_Courses = df[df['title'].str.contains('React')].sort_values(by=['enrolled'], ascending=False)
# save to React_BestSellers
React_BestSellers = React_Courses[React_Courses['enrolled']>100000]

In [None]:
React_BestSellers

### 8. Vue Best Sellers

In [None]:
# look up the courses about Vue
df[df['title'].str.contains('Vue')].sort_values(by=['enrolled'], ascending=False)

In [None]:

df[df['title'].str.contains('vue')].sort_values(by=['enrolled'], ascending=False)

### 9. TensorFlow Best Sellers

In [None]:
# look up the courses about Tensorflow
df[df['title'].str.contains('TensorFlow')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Tensorflow_Courses = df[df['title'].str.contains('TensorFlow')].sort_values(by=['enrolled'], ascending=False)
# Save to Tensorflow_BestSellers
Tensorflow_BestSellers = Tensorflow_Courses[Tensorflow_Courses['enrolled']>100000]
Tensorflow_BestSellers

### 10. PyTorch Best Sellers

In [None]:
# look up the courses about PyTorch
df[df['title'].str.contains('PyTorch')].sort_values(by=['enrolled'], ascending=False)

## Step 6. Apply - Sector
Finally, let's find the most popular courses by field.

1. Web
2. App
3. Algorithm
4. Computer
5. Hacking
6. Finance
7. Commerce
8. Neural Network
9. Machine Learning
10. Deep Learning
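Each sector below is found with the same keyword filter on the title. A sketch that loops over a list of keywords and summarizes, per keyword, the number of matching courses, how many have more than 100,000 students, and their total enrollment. The keyword subset and the titles are illustrative assumptions; note that a plain substring keyword such as 'Finance' would not match a title containing only 'Financial'.

```python
import pandas as pd

# Illustrative data with the notebook's column names
courses = pd.DataFrame({
    'title': ['Web Design Basics', 'Mobile App Development', 'Intro to Algorithms',
              'Corporate Finance', 'Machine Learning Basics', 'Web Security'],
    'enrolled': [150_000, 90_000, 400_000, 700_000, 3_000_000, 110_000],
})

# Hypothetical subset of the sector keywords
keywords = ['Web', 'App', 'Algorithm', 'Finance', 'Machine']

rows = []
for kw in keywords:
    matched = courses[courses['title'].str.contains(kw, regex=False)]
    best = matched[matched['enrolled'] > 100_000]
    rows.append({'keyword': kw,
                 'courses': len(matched),
                 'best_sellers': len(best),
                 'total_enrolled': int(matched['enrolled'].sum())})

summary = pd.DataFrame(rows).set_index('keyword')
print(summary)
```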

### 1. Web

In [None]:
# look up the courses about 'Web'
df[df['title'].str.contains('Web')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Web_Courses = df[df['title'].str.contains('Web')].sort_values(by=['enrolled'], ascending=False)
# save to Web_BestSellers
Web_BestSellers = Web_Courses[Web_Courses['enrolled']>100000]
Web_BestSellers

### 2. App

In [None]:
# look up the courses about 'App'
df[df['title'].str.contains('App')].sort_values(by=['enrolled'], ascending=False)

In [None]:
App_Courses = df[df['title'].str.contains('App')].sort_values(by=['enrolled'], ascending=False)
# save to App_BestSellers
App_BestSellers = App_Courses[App_Courses['enrolled']>100000]
App_BestSellers

### 3. Algorithm

In [None]:
# look up the courses about 'Algorithm'
df[df['title'].str.contains('Algorithm')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Algorithm_Courses = df[df['title'].str.contains('Algorithm')].sort_values(by=['enrolled'], ascending=False)
# save to Algorithm_BestSellers
Algorithm_BestSellers = Algorithm_Courses[Algorithm_Courses['enrolled']>100000]
Algorithm_BestSellers

### 4. Computer

In [None]:
# look up the courses about 'Computer'
df[df['title'].str.contains('Computer')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Computer_Courses = df[df['title'].str.contains('Computer')].sort_values(by=['enrolled'], ascending=False)
# save to Computer_BestSellers
Computer_BestSellers = Computer_Courses[Computer_Courses['enrolled']>100000]
Computer_BestSellers

### 5. Hacking

In [None]:
# look up the courses about 'Hacking'
df[df['title'].str.contains('Hacking')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Check related data
df[df['title'].str.contains('Security')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Security_Courses = df[df['title'].str.contains('Security')].sort_values(by=['enrolled'], ascending=False)
# save to Security_BestSellers
Security_BestSellers = Security_Courses[Security_Courses['enrolled']>100000]
Security_BestSellers
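The Hacking section checks 'Security' as a related term in a separate cell. Related keywords can be combined into one filter with regex alternation (`|`), so each course is tested once and appears at most once. A sketch on made-up titles:

```python
import pandas as pd

# Illustrative titles, not taken from the dataset
courses = pd.DataFrame({
    'title': ['Ethical Hacking Essentials', 'Cyber Security Basics', 'IT Security', 'Web Design'],
    'enrolled': [80_000, 210_000, 150_000, 300_000],
})

# Either keyword matches; adding case=False would also catch 'hacking' or 'security'
pattern = r'Hacking|Security'
related = courses[courses['title'].str.contains(pattern, regex=True)]
related = related.sort_values(by='enrolled', ascending=False)

print(related['title'].tolist())
# ['Cyber Security Basics', 'IT Security', 'Ethical Hacking Essentials']
print(related[related['enrolled'] > 100_000]['title'].tolist())
# ['Cyber Security Basics', 'IT Security']
```

The same pattern would cover the related terms checked in the Neural Network section, e.g. `r'Neural|CNN|RNN|Reinforcement'`.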

### 6. Finance

In [None]:
# look up the courses about 'Finance'
df[df['title'].str.contains('Finance')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Finance_Courses = df[df['title'].str.contains('Finance')].sort_values(by=['enrolled'], ascending=False)
Finance_BestSellers = Finance_Courses[Finance_Courses['enrolled']>100000]
Finance_BestSellers

### 7. Commerce

In [None]:
# look up the courses about 'Commerce'
df[df['title'].str.contains('Commerce')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Check related data
df[df['title'].str.contains('Shopping')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Check related data
df[df['title'].str.contains('Business')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Business_Courses = df[df['title'].str.contains('Business')].sort_values(by=['enrolled'], ascending=False)
# save to Business_BestSellers
Business_BestSellers = Business_Courses[Business_Courses['enrolled']>100000]
Business_BestSellers

### 8. Neural Network

In [None]:
# look up the courses about 'Neural Network'
df[df['title'].str.contains('Neural')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Check related data
df[df['title'].str.contains('CNN')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Check related data
df[df['title'].str.contains('RNN')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# Check related data
df[df['title'].str.contains('Reinforcement')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# look up the courses about 'Neural Network'
Neural_Courses = df[df['title'].str.contains('Neural')].sort_values(by=['enrolled'], ascending=False)
# save to 'Neural_BestSellers'
Neural_BestSellers = Neural_Courses[Neural_Courses['enrolled']>100000]
Neural_BestSellers

### 9. Machine Learning

In [None]:
# look up the courses about 'Machine Learning'
df[df['title'].str.contains('Machine')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Machine_Courses = df[df['title'].str.contains('Machine')].sort_values(by=['enrolled'], ascending=False)
# save to 'Machine_BestSellers'
Machine_BestSellers = Machine_Courses[Machine_Courses['enrolled']>100000]
Machine_BestSellers

### 10. Deep Learning

In [None]:
# look up the courses about 'Deep Learning'
df[df['title'].str.contains('Deep')].sort_values(by=['enrolled'], ascending=False)

In [None]:
# All of these have more than 100,000 students enrolled, so save them all as 'Deep_BestSellers'
Deep_BestSellers = df[df['title'].str.contains('Deep')].sort_values(by=['enrolled'], ascending=False)

In [None]:
Deep_BestSellers.head()
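Once the per-topic `*_BestSellers` frames exist, they could be stacked into one table with `pd.concat`, passing a dict so the topic is recorded as an index level. Because one course can match several keywords (for example a title containing both 'Python' and 'Machine'), deduplicating by title gives an overall list. A sketch with two small made-up frames standing in for the notebook's variables:

```python
import pandas as pd

# Stand-ins for frames such as Python_BestSellers and Machine_BestSellers (made-up rows)
python_best = pd.DataFrame({'title': ['Intro to Python', 'Python for ML'],
                            'enrolled': [400_000, 200_000]})
machine_best = pd.DataFrame({'title': ['Machine Learning Basics', 'Python for ML'],
                             'enrolled': [900_000, 200_000]})

# Stack the tables; the 'topic' index level records which table each row came from
combined = pd.concat({'Python': python_best, 'Machine Learning': machine_best},
                     names=['topic', 'row'])

# A course listed under several topics is kept once (first occurrence wins)
overall = (combined.reset_index(level='topic')
                   .drop_duplicates(subset='title')
                   .sort_values(by='enrolled', ascending=False))

print(overall[['title', 'enrolled', 'topic']])
```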