### Udemy Course Recommendation System

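### Algorithm:
* Cosine similarity: measures how similar or different two documents are by the cosine of the angle between their term vectors (1 = same direction, 0 = no shared terms for count vectors)
* Linear kernel (`linear_kernel` in scikit-learn): a plain dot product. It equals cosine similarity when the vectors are already L2-normalized (as `TfidfVectorizer` output is by default), and in that case it is faster because it skips the normalization step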
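To make the relationship between the two measures concrete, here is a small pure-Python sketch (not scikit-learn's implementation): cosine similarity divides the dot product by both vector lengths, while the linear kernel is the bare dot product, so the two agree once each vector has been scaled to unit length. The vectors `a` and `b` are made-up examples.

```python
import math


def dot(u, v):
    """Linear kernel: the plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def l2_normalize(u):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(dot(u, u))
    return [x / norm for x in u] if norm else list(u)


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot(u, v) / (norm_u * norm_v)


# Hypothetical term-count vectors for two short titles
a = [1, 1, 0, 1]
b = [1, 0, 1, 1]

print(cosine(a, b))                           # 2 / (sqrt(3) * sqrt(3)), about 0.667
print(dot(l2_normalize(a), l2_normalize(b)))  # same value via the linear kernel
```

The saving only materializes when the vectors are already unit length; on raw counts the linear kernel returns unnormalized dot products rather than cosine scores.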

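### Workflow:
* Load the dataset
* Vectorize the course titles
* Compute the cosine similarity matrix
* Pair each course ID with its similarity score to the query course
* Recommend the highest-scoring courses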
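Those five steps can be sketched end to end in plain Python on a toy list of titles. This illustrates the workflow and is not the notebook's code: it swaps pandas and scikit-learn for the standard library, and the titles and the `recommend` helper are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\b\w\w+\b")  # words of two or more characters


def vectorize(titles):
    """Step 2: turn each title into a bag-of-words count vector over a shared vocabulary."""
    counts = [Counter(TOKEN_RE.findall(t.lower())) for t in titles]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def similarity_matrix(vectors):
    """Step 3: cosine similarity between every pair of titles."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def recommend(titles, query, k=3):
    """Steps 4-5: pair each course ID with its score, sort, and return the top k titles."""
    matrix = similarity_matrix(vectorize(titles))
    idx = titles.index(query)
    pairs = [(i, s) for i, s in enumerate(matrix[idx]) if i != idx]  # drop the query itself
    pairs.sort(key=lambda p: p[1], reverse=True)
    return [(titles[i], round(s, 3)) for i, s in pairs[:k]]


# Step 1: a tiny hypothetical dataset
titles = [
    "Intro to Financial Modeling",
    "Financial Modeling 101",
    "Learn Guitar Chords",
    "Guitar for Beginners",
]
result = recommend(titles, "Financial Modeling 101", k=2)
print(result)  # [('Intro to Financial Modeling', 0.577), ('Learn Guitar Chords', 0.0)]
```

One design choice differs from a "drop the first sorted entry" approach: the query is excluded by its index, so it cannot leak back into its own recommendations.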


## 1. Exploratory Data Analysis

In [5]:
import pandas as pd
import neattext.functions as nfx

#### 1.1 Preliminary Checks

In [8]:
udemy_df = pd.read_csv('data/udemy_courses.csv')
udemy_df.head()

| | course_id | course_title | url | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1070968 | Ultimate Investment Banking Course | https://www.udemy.com/ultimate-investment-bank... | True | 200 | 2147 | 23 | 51 | All Levels | 1.5 | 2017-01-18T20:58:58Z | Business Finance |
| 1 | 1113822 | Complete GST Course & Certification - Grow You... | https://www.udemy.com/goods-and-services-tax/ | True | 75 | 2792 | 923 | 274 | All Levels | 39.0 | 2017-03-09T16:34:20Z | Business Finance |
| 2 | 1006314 | Financial Modeling for Business Analysts and C... | https://www.udemy.com/financial-modeling-for-b... | True | 45 | 2174 | 74 | 51 | Intermediate Level | 2.5 | 2016-12-19T19:26:30Z | Business Finance |
| 3 | 1210588 | Beginner to Pro - Financial Analysis in Excel ... | https://www.udemy.com/complete-excel-finance-c... | True | 95 | 2451 | 11 | 36 | All Levels | 3.0 | 2017-05-30T20:07:24Z | Business Finance |
| 4 | 1011058 | How To Maximize Your Profits Trading Options | https://www.udemy.com/how-to-maximize-your-pro... | True | 200 | 1276 | 45 | 26 | Intermediate Level | 2.0 | 2016-12-13T14:57:18Z | Business Finance |


In [10]:
udemy_df.shape

(3678, 12)

In [14]:
## Check Missing Values
udemy_df.isna().sum()

course_id              0
course_title           0
url                    0
is_paid                0
price                  0
num_subscribers        0
num_reviews            0
num_lectures           0
level                  0
content_duration       0
published_timestamp    0
subject                0
dtype: int64

Inference: there are no missing values in any column.

In [29]:
## Check Null and Dtypes
udemy_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3678 entries, 0 to 3677
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   course_id            3678 non-null   int64  
 1   course_title         3678 non-null   object 
 2   url                  3678 non-null   object 
 3   is_paid              3678 non-null   bool   
 4   price                3678 non-null   int64  
 5   num_subscribers      3678 non-null   int64  
 6   num_reviews          3678 non-null   int64  
 7   num_lectures         3678 non-null   int64  
 8   level                3678 non-null   object 
 9   content_duration     3678 non-null   float64
 10  published_timestamp  3678 non-null   object 
 11  subject              3678 non-null   object 
dtypes: bool(1), float64(1), int64(5), object(5)
memory usage: 319.8+ KB


In [32]:
# Number of unique values in each column
udemy_df.nunique()

course_id              3672
course_title           3663
url                    3672
is_paid                   2
price                    38
num_subscribers        2197
num_reviews             511
num_lectures            229
level                     4
content_duration        105
published_timestamp    3672
subject                   4
dtype: int64
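The counts above show more rows (3678) than distinct `course_id` values (3672), so 6 rows repeat a course that already appeared; since `url` and `published_timestamp` also have 3672 distinct values, these are probably exact duplicate rows. `course_title` has 3663 distinct values, so 15 rows reuse an earlier title. In pandas such rows can be flagged with `udemy_df.duplicated()` or `udemy_df['course_title'].duplicated()`. The standard-library sketch below shows the same keep-first logic on a hypothetical list of IDs:

```python
def duplicate_positions(values):
    """Return the positions of values already seen earlier
    (keep-first semantics, like pandas' duplicated(keep='first'))."""
    seen = set()
    repeats = []
    for pos, value in enumerate(values):
        if value in seen:
            repeats.append(pos)
        else:
            seen.add(value)
    return repeats


# Hypothetical course IDs: 1070968 and 1006314 each appear twice
course_ids = [1070968, 1113822, 1006314, 1070968, 1210588, 1006314]
print(duplicate_positions(course_ids))         # [3, 5]
print(len(course_ids) - len(set(course_ids)))  # 2, i.e. rows minus distinct values
```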

In [33]:
## Statistics of dataset
udemy_df.describe()

| | course_id | price | num_subscribers | num_reviews | num_lectures | content_duration |
|---|---|---|---|---|---|---|
| count | 3678.0 | 3678.0 | 3678.0 | 3678.0 | 3678.0 | 3678.0 |
| mean | 675972.0 | 66.049483 | 3197.150625 | 156.259108 | 40.108755 | 4.094517 |
| std | 343273.2 | 61.005755 | 9504.11701 | 935.452044 | 50.383346 | 6.05384 |
| min | 8324.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 407692.5 | 20.0 | 111.0 | 4.0 | 15.0 | 1.0 |
| 50% | 687917.0 | 45.0 | 911.5 | 18.0 | 25.0 | 2.0 |
| 75% | 961355.5 | 95.0 | 2546.0 | 67.0 | 45.75 | 4.5 |
| max | 1282064.0 | 200.0 | 268923.0 | 27445.0 | 779.0 | 78.5 |


In [35]:
## Categories of Subject
print("Categories of Subject : ", end=" ")
print(udemy_df['subject'].unique())

## Categories of Level
print("Categories of Level : ", end=" ")
print(udemy_df['level'].unique())

Categories of Subject :  ['Business Finance' 'Graphic Design' 'Musical Instruments'
 'Web Development']
Categories of Level :  ['All Levels' 'Intermediate Level' 'Beginner Level' 'Expert Level']


#### 1.2 Data Cleaning

In [36]:
udemy_df['course_title']

0                      Ultimate Investment Banking Course
1       Complete GST Course & Certification - Grow You...
2       Financial Modeling for Business Analysts and C...
3       Beginner to Pro - Financial Analysis in Excel ...
4            How To Maximize Your Profits Trading Options
                              ...                        
3673    Learn jQuery from Scratch - Master of JavaScri...
3674    How To Design A WordPress Website With No Codi...
3675                        Learn and Build using Polymer
3676    CSS Animations: Create Amazing Effects on Your...
3677    Using MODX CMS to Build Websites: A Beginner's...
Name: course_title, Length: 3678, dtype: object

The titles contain many stop words ("to", "for", "and", ...), so let's get rid of those.
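Stop-word removal simply drops tokens that appear in a fixed list of very common words. A minimal sketch, assuming a tiny hand-picked stop-word set (`neattext` ships its own, much longer English list, so its output can differ):

```python
# Hypothetical, deliberately small stop-word list for illustration only
STOP_WORDS = {"a", "and", "for", "from", "how", "in", "of", "the", "to", "your"}


def remove_stopwords(text):
    """Drop whitespace-separated tokens whose lowercase form is a stop word,
    keeping the original casing of the remaining words."""
    return " ".join(word for word in text.split() if word.lower() not in STOP_WORDS)


print(remove_stopwords("How To Maximize Your Profits Trading Options"))
# Maximize Profits Trading Options
```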

In [37]:
dir(nfx)

['BTC_ADDRESS_REGEX',
 'CURRENCY_REGEX',
 'CURRENCY_SYMB_REGEX',
 'Counter',
 'DATE_REGEX',
 'EMAIL_REGEX',
 'EMOJI_REGEX',
 'HASTAG_REGEX',
 'MASTERCard_REGEX',
 'MD5_SHA_REGEX',
 'MOST_COMMON_PUNCT_REGEX',
 'NUMBERS_REGEX',
 'PHONE_REGEX',
 'PoBOX_REGEX',
 'SPECIAL_CHARACTERS_REGEX',
 'STOPWORDS',
 'STOPWORDS_de',
 'STOPWORDS_en',
 'STOPWORDS_es',
 'STOPWORDS_fr',
 'STOPWORDS_ru',
 'STOPWORDS_yo',
 'STREET_ADDRESS_REGEX',
 'TextFrame',
 'URL_PATTERN',
 'USER_HANDLES_REGEX',
 'VISACard_REGEX',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__generate_text',
 '__loader__',
 '__name__',
 '__numbers_dict',
 '__package__',
 '__spec__',
 '_lex_richness_herdan',
 '_lex_richness_maas_ttr',
 'clean_text',
 'defaultdict',
 'digit2words',
 'extract_btc_address',
 'extract_currencies',
 'extract_currency_symbols',
 'extract_dates',
 'extract_emails',
 'extract_emojis',
 'extract_hashtags',
 'extract_html_tags',
 'extract_mastercard_addr',
 'extract_md5sha',
 'extract_numbers',
 'extr

In [38]:
# Text cleanup: remove stop words from the course_title column
udemy_df['cleaner_course_title'] = udemy_df['course_title'].apply(nfx.remove_stopwords)
udemy_df['cleaner_course_title']

0                      Ultimate Investment Banking Course
1       Complete GST Course & Certification - Grow Pra...
2        Financial Modeling Business Analysts Consultants
3            Beginner Pro - Financial Analysis Excel 2017
4                        Maximize Profits Trading Options
                              ...                        
3673     Learn jQuery Scratch - Master JavaScript library
3674                      Design WordPress Website Coding
3675                                  Learn Build Polymer
3676       CSS Animations: Create Amazing Effects Website
3677            MODX CMS Build Websites: Beginner's Guide
Name: cleaner_course_title, Length: 3678, dtype: object

In [40]:
# Remove special characters such as '&', '-', ':' and apostrophes
udemy_df['cleaner_course_title'] = udemy_df['cleaner_course_title'].apply(nfx.remove_special_characters)
udemy_df['cleaner_course_title']

0                      Ultimate Investment Banking Course
1       Complete GST Course  Certification  Grow Practice
2        Financial Modeling Business Analysts Consultants
3             Beginner Pro  Financial Analysis Excel 2017
4                        Maximize Profits Trading Options
                              ...                        
3673      Learn jQuery Scratch  Master JavaScript library
3674                      Design WordPress Website Coding
3675                                  Learn Build Polymer
3676        CSS Animations Create Amazing Effects Website
3677              MODX CMS Build Websites Beginners Guide
Name: cleaner_course_title, Length: 3678, dtype: object
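Judging from this output, `remove_special_characters` deletes punctuation such as `&`, `-`, `:` and `'` without touching the surrounding spaces, which is why double spaces appear (e.g. "Course  Certification"). A rough regex equivalent, assuming the rule is "keep ASCII letters, digits and whitespace" (neattext's actual pattern may differ); `strip_special_characters` is a hypothetical name:

```python
import re

# Assumed rule: keep ASCII letters, digits and whitespace; drop everything else
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")


def strip_special_characters(text, collapse_spaces=False):
    """Strip punctuation; optionally squeeze runs of whitespace to single spaces."""
    cleaned = SPECIAL_RE.sub("", text)
    return " ".join(cleaned.split()) if collapse_spaces else cleaned


title = "Complete GST Course & Certification - Grow Practice"
print(repr(strip_special_characters(title)))
# 'Complete GST Course  Certification  Grow Practice'
print(strip_special_characters(title, collapse_spaces=True))
# Complete GST Course Certification Grow Practice
```

The leftover double spaces are harmless for the next step, because `CountVectorizer`'s default tokenizer extracts words by pattern and never produces empty tokens.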

In [46]:
## Compare the raw and cleaned course titles
udemy_df[['course_title','cleaner_course_title']]

| | course_title | cleaner_course_title |
|---|---|---|
| 0 | Ultimate Investment Banking Course | Ultimate Investment Banking Course |
| 1 | Complete GST Course & Certification - Grow You... | Complete GST Course Certification Grow Practice |
| 2 | Financial Modeling for Business Analysts and C... | Financial Modeling Business Analysts Consultants |
| 3 | Beginner to Pro - Financial Analysis in Excel ... | Beginner Pro Financial Analysis Excel 2017 |
| 4 | How To Maximize Your Profits Trading Options | Maximize Profits Trading Options |
| ... | ... | ... |
| 3673 | Learn jQuery from Scratch - Master of JavaScri... | Learn jQuery Scratch Master JavaScript library |
| 3674 | How To Design A WordPress Website With No Codi... | Design WordPress Website Coding |
| 3675 | Learn and Build using Polymer | Learn Build Polymer |
| 3676 | CSS Animations: Create Amazing Effects on Your... | CSS Animations Create Amazing Effects Website |


## 2. Vectorization

In [48]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

In [49]:

count_vect = CountVectorizer()  # turns each title into a vector of token counts (bag of words)
cv_matrix = count_vect.fit_transform(udemy_df['cleaner_course_title'])

In [50]:
# Sparse matrix
cv_matrix

<3678x3559 sparse matrix of type '<class 'numpy.int64'>'
	with 18333 stored elements in Compressed Sparse Row format>
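The sparse format matters here: 18333 stored elements out of 3678 × 3559 (about 13.1 million) cells means only about 0.14% of the matrix is non-zero. The sketch below imitates what `CountVectorizer` does by default (lowercase, keep tokens of two or more word characters, sort the vocabulary) and stores each row as a `{column: count}` dict, keeping only non-zero entries in the same spirit as the CSR matrix. It is an illustration, not scikit-learn's code, and the three titles are made up.

```python
import re
from collections import Counter

# Same regex as CountVectorizer's default token_pattern: tokens of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(docs):
    """Return (vocabulary, rows): a sorted vocabulary and one sparse {column: count} dict per doc."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {tok: j for j, tok in enumerate(vocab)}
    rows = [{column[tok]: n for tok, n in Counter(toks).items()} for toks in tokenized]
    return vocab, rows


# Hypothetical titles, not rows from the dataset
docs = ["Financial Modeling 101", "Intro to Financial Modeling", "Learn Guitar in 1 Day"]
vocab, rows = count_vectorize(docs)

stored = sum(len(row) for row in rows)  # non-zero cells, like "stored elements"
density = stored / (len(docs) * len(vocab))
print(vocab)
print(stored, f"{density:.1%}")  # 11 40.7%
```

Note that the single-character token "1" is dropped by this pattern, which is also why one-letter words never show up in the vocabulary.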

In [51]:
## dense matrix
cv_matrix.todense()

matrix([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [58]:
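# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
# (it returns a NumPy array of strings rather than a list)
count_vect.get_feature_names_out()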



['000005',
 '001',
 '01',
 '02',
 '10',
 '100',
 '101',
 '101master',
 '102',
 '10k',
 '10th',
 '11',
 '110',
 '111creating',
 '112',
 '12',
 '123d',
 '13',
 '13customer',
 '14',
 '15',
 '150',
 '16',
 '16propertyplant',
 '17',
 '175',
 '175pa',
 '18',
 '183pa',
 '1872',
 '188',
 '19',
 '1a',
 '1presentation',
 '1year',
 '20',
 '200',
 '201',
 '2012',
 '2013',
 '20132016365',
 '2014',
 '2015',
 '20153',
 '2016',
 '20162017',
 '2017',
 '20172018',
 '2020',
 '21',
 '23',
 '24',
 '24hrs',
 '25',
 '263432aprende',
 '28',
 '2creating',
 '2d',
 '2hour',
 '2x',
 '30',
 '30day',
 '31',
 '35',
 '38',
 '398746piano',
 '3course',
 '3d',
 '3dcgblender',
 '3ds',
 '3tier',
 '40',
 '42038learn',
 '45',
 '48',
 '4a',
 '4d',
 '4hours',
 '4trial',
 '50',
 '500',
 '52',
 '53',
 '54',
 '59',
 '5creating',
 '5k',
 '5ths',
 '60',
 '60mins',
 '61',
 '650804guitar',
 '66',
 '70461',
 '70462',
 '72',
 '800',
 '8020',
 '874284weekly',
 '88',
 '8accounting',
 '8currency',
 '90',
 '94',
 '97',
 'abc',
 'abcs',
 '

In [52]:
df_cv_words = pd.DataFrame(cv_matrix.todense(), columns=count_vect.get_feature_names_out())
df_cv_words



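| | 000005 | 001 | 01 | 02 | 10 | 100 | 101 | 101master | 102 | 10k | ... | zend | zero | zerotohero | zf2 | zinsen | zoho | zombie | zu | zuhause | zur |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3673 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3674 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3675 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |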


In [59]:
## Cosine similarity
cosine_sim_matrix = cosine_similarity(cv_matrix)
cosine_sim_matrix

array([[1.        , 0.20412415, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.20412415, 1.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.        ,
        0.23570226],
       [0.        , 0.        , 0.        , ..., 0.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.23570226, 0.        ,
        1.        ]])
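The matrix is symmetric with 1s on the diagonal, and because count vectors have no negative entries every score lies between 0 and 1. The value 0.20412415 in row 0, column 1 can be checked by hand: the cleaned titles of courses 0 and 1 share only the word "course" and contain 4 and 6 distinct tokens respectively, so the cosine is 1 / (2 × √6) ≈ 0.2041. Likewise, courses 2 and 3 share only "financial" (5 and 6 tokens), giving 1 / √30 ≈ 0.1826, the score that appears for index 3 in the similarity list further down. The same arithmetic in Python, using the cleaned titles from the earlier output and a tokenizer that mimics `CountVectorizer`'s defaults:

```python
import math
import re
from collections import Counter


def cosine_from_text(a, b):
    """Cosine similarity of two strings' bag-of-words counts
    (lowercased tokens of 2+ word characters, as in CountVectorizer's defaults)."""
    ca = Counter(re.findall(r"\b\w\w+\b", a.lower()))
    cb = Counter(re.findall(r"\b\w\w+\b", b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Cleaned titles of courses 0, 1, 2 and 3 from the notebook output
t0 = "Ultimate Investment Banking Course"
t1 = "Complete GST Course  Certification  Grow Practice"
t2 = "Financial Modeling Business Analysts Consultants"
t3 = "Beginner Pro  Financial Analysis Excel 2017"

print(cosine_from_text(t0, t1))  # 1 / (2 * sqrt(6)), about 0.2041
print(cosine_from_text(t2, t3))  # 1 / sqrt(30), about 0.1826
```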

In [60]:
# import seaborn as sns

# sns.heatmap(cosine_sim_matrix[:10],annot=True)

In [63]:
# Map each course title to its row index in udemy_df (and therefore in cosine_sim_matrix)

course_indices = pd.Series(udemy_df.index,index=udemy_df['course_title']).drop_duplicates()
course_indices

course_title
Ultimate Investment Banking Course                                0
Complete GST Course & Certification - Grow Your CA Practice       1
Financial Modeling for Business Analysts and Consultants          2
Beginner to Pro - Financial Analysis in Excel 2017                3
How To Maximize Your Profits Trading Options                      4
                                                               ... 
Learn jQuery from Scratch - Master of JavaScript library       3673
How To Design A WordPress Website With No Coding At All        3674
Learn and Build using Polymer                                  3675
CSS Animations: Create Amazing Effects on Your Website         3676
Using MODX CMS to Build Websites: A Beginner's Guide           3677
Length: 3678, dtype: int64
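One caveat: `Series.drop_duplicates()` removes repeated *values*, not repeated index labels. The values here are row positions, which are all unique, so nothing is dropped (the output still has length 3678) even though `course_title` has only 3663 distinct values. For a duplicated title, `course_indices[title]` therefore returns a Series of positions instead of a single integer, which breaks the lookup into `cosine_sim_matrix`. In pandas, `course_indices[~course_indices.index.duplicated(keep='first')]` keeps only the first position per title. The keep-first mapping itself is sketched below with the standard library; the titles and the `first_index_by_title` helper are hypothetical:

```python
def first_index_by_title(titles):
    """Map each title to the position of its first occurrence, ignoring later repeats."""
    mapping = {}
    for pos, title in enumerate(titles):
        mapping.setdefault(title, pos)  # only the first position is stored
    return mapping


# Hypothetical titles: "Learn Build Polymer" occurs twice
titles = ["Financial Modeling 101", "Learn Build Polymer", "CSS Animations", "Learn Build Polymer"]
course_index = first_index_by_title(titles)
print(course_index["Learn Build Polymer"])  # 1, a single int rather than two positions
print(len(course_index))                    # 3 unique titles
```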

In [70]:
## now let's find out which courses are similar
idx = course_indices['Financial Modeling for Business Analysts and Consultants']
idx

2

In [71]:

scores = list(enumerate(cosine_sim_matrix[idx]))
scores


[(0, 0.0),
 (1, 0.0),
 (2, 0.9999999999999999),
 (3, 0.18257418583505539),
 (4, 0.0),
 (5, 0.0),
 (6, 0.0),
 (7, 0.0),
 (8, 0.0),
 (9, 0.0),
 (10, 0.0),
 (11, 0.0),
 (12, 0.19999999999999998),
 (13, 0.0),
 (14, 0.0),
 (15, 0.0),
 (16, 0.0),
 (17, 0.0),
 (18, 0.0),
 (19, 0.25819888974716115),
 (20, 0.0),
 (21, 0.0),
 (22, 0.1690308509457033),
 (23, 0.19999999999999998),
 (24, 0.0),
 (25, 0.0),
 (26, 0.0),
 (27, 0.0),
 (28, 0.0),
 (29, 0.0),
 (30, 0.0),
 (31, 0.0),
 (32, 0.0),
 (33, 0.0),
 (34, 0.0),
 (35, 0.1690308509457033),
 (36, 0.0),
 (37, 0.19999999999999998),
 (38, 0.36514837167011077),
 (39, 0.0),
 (40, 0.19999999999999998),
 (41, 0.0),
 (42, 0.39999999999999997),
 (43, 0.0),
 (44, 0.0),
 (45, 0.18257418583505539),
 (46, 0.0),
 (47, 0.0),
 (48, 0.0),
 (49, 0.0),
 (50, 0.0),
 (51, 0.0),
 (52, 0.0),
 (53, 0.25819888974716115),
 (54, 0.0),
 (55, 0.0),
 (56, 0.22360679774997896),
 (57, 0.0),
 (58, 0.0),
 (59, 0.0),
 (60, 0.14907119849998596),
 (61, 0.18257418583505539),
 (62, 0.0),
 

In [73]:
## let's sort the scores
sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)  # sort on x[1], the similarity score, highest first
sorted_scores 

[(2, 0.9999999999999999),
 (119, 0.5163977794943223),
 (140, 0.5163977794943223),
 (822, 0.5163977794943223),
 (806, 0.4472135954999579),
 (829, 0.4472135954999579),
 (1023, 0.4472135954999579),
 (1024, 0.4472135954999579),
 (490, 0.44721359549995787),
 (42, 0.39999999999999997),
 (566, 0.39999999999999997),
 (744, 0.39999999999999997),
 (823, 0.39999999999999997),
 (1073, 0.39999999999999997),
 (1077, 0.39999999999999997),
 (38, 0.36514837167011077),
 (229, 0.36514837167011077),
 (268, 0.36514837167011077),
 (319, 0.36514837167011077),
 (927, 0.36514837167011077),
 (941, 0.36514837167011077),
 (1193, 0.36514837167011077),
 (195, 0.3380617018914066),
 (267, 0.3380617018914066),
 (373, 0.3162277660168379),
 (423, 0.3162277660168379),
 (473, 0.3162277660168379),
 (558, 0.3162277660168379),
 (636, 0.3162277660168379),
 (567, 0.2981423969999719),
 (1135, 0.2981423969999719),
 (1158, 0.2981423969999719),
 (19, 0.25819888974716115),
 (53, 0.25819888974716115),
 (99, 0.25819888974716115),
 (1
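Sorting all 3678 scores is cheap at this size, but when only the top few recommendations are needed, `heapq.nlargest` from the standard library selects them without ordering the whole list. A sketch on a made-up score row:

```python
import heapq

# Hypothetical (course index, similarity) pairs, as produced by enumerate(cosine_sim_matrix[idx])
scores = [(0, 0.0), (1, 0.2), (2, 1.0), (3, 0.52), (4, 0.45), (5, 0.52)]

# Top 3 pairs by similarity score, highest first (ties keep their original order)
top3 = heapq.nlargest(3, scores, key=lambda pair: pair[1])
print(top3)  # [(2, 1.0), (3, 0.52), (5, 0.52)]
```

`heapq.nlargest(n, iterable, key=...)` is documented as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, so it can replace the full sort without changing the result.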

In [74]:
# The first entry (index 2, score ~1.0) is the course itself, so skip it
sorted_scores[1:]

[(119, 0.5163977794943223),
 (140, 0.5163977794943223),
 (822, 0.5163977794943223),
 (806, 0.4472135954999579),
 (829, 0.4472135954999579),
 (1023, 0.4472135954999579),
 (1024, 0.4472135954999579),
 (490, 0.44721359549995787),
 (42, 0.39999999999999997),
 (566, 0.39999999999999997),
 (744, 0.39999999999999997),
 (823, 0.39999999999999997),
 (1073, 0.39999999999999997),
 (1077, 0.39999999999999997),
 (38, 0.36514837167011077),
 (229, 0.36514837167011077),
 (268, 0.36514837167011077),
 (319, 0.36514837167011077),
 (927, 0.36514837167011077),
 (941, 0.36514837167011077),
 (1193, 0.36514837167011077),
 (195, 0.3380617018914066),
 (267, 0.3380617018914066),
 (373, 0.3162277660168379),
 (423, 0.3162277660168379),
 (473, 0.3162277660168379),
 (558, 0.3162277660168379),
 (636, 0.3162277660168379),
 (567, 0.2981423969999719),
 (1135, 0.2981423969999719),
 (1158, 0.2981423969999719),
 (19, 0.25819888974716115),
 (53, 0.25819888974716115),
 (99, 0.25819888974716115),
 (178, 0.25819888974716115),
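Dropping the first sorted entry assumes the query course always sorts first. That holds here, but it can fail when another row has an identical cleaned title: both rows then get the same top score, the stable sort keeps the lower row index first, and `[1:]` discards the twin while keeping the query itself. Filtering on the index avoids that. A sketch with a hypothetical score row in which course 0 duplicates the query course 2:

```python
# Hypothetical row of cosine_sim_matrix for query course idx = 2;
# course 0 has the same title as course 2, so it also scores 1.0
idx = 2
scores = list(enumerate([1.0, 0.3, 1.0, 0.5]))
sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

by_position = [i for i, _ in sorted_scores[1:]]       # drops whichever entry sorts first
by_index = [i for i, _ in sorted_scores if i != idx]  # drops the query course itself

print(by_position)  # [2, 3, 1]: the query course is recommended to itself
print(by_index)     # [0, 3, 1]
```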


In [75]:
## Indices of the recommended courses
selected_course_indices = [i[0] for i in sorted_scores[1:]]
selected_course_indices  # ordered from most to least similar to course idx

[119,
 140,
 822,
 806,
 829,
 1023,
 1024,
 490,
 42,
 566,
 744,
 823,
 1073,
 1077,
 38,
 229,
 268,
 319,
 927,
 941,
 1193,
 195,
 267,
 373,
 423,
 473,
 558,
 636,
 567,
 1135,
 1158,
 19,
 53,
 99,
 178,
 224,
 245,
 272,
 289,
 412,
 418,
 479,
 519,
 520,
 523,
 547,
 585,
 623,
 719,
 739,
 778,
 894,
 921,
 928,
 936,
 943,
 972,
 977,
 988,
 1074,
 1086,
 1156,
 56,
 82,
 101,
 130,
 168,
 256,
 257,
 322,
 333,
 374,
 447,
 458,
 482,
 486,
 552,
 559,
 561,
 572,
 574,
 611,
 691,
 708,
 759,
 785,
 852,
 862,
 905,
 986,
 1101,
 1113,
 1131,
 1141,
 1255,
 1406,
 1586,
 2991,
 12,
 23,
 37,
 40,
 141,
 171,
 187,
 213,
 222,
 247,
 250,
 251,
 262,
 317,
 371,
 426,
 453,
 454,
 505,
 522,
 533,
 550,
 582,
 584,
 633,
 657,
 685,
 698,
 699,
 765,
 779,
 787,
 788,
 836,
 844,
 847,
 949,
 1078,
 1133,
 1186,
 1523,
 1704,
 1787,
 2509,
 3034,
 3498,
 3,
 45,
 61,
 146,
 243,
 259,
 270,
 350,
 393,
 409,
 419,
 429,
 546,
 721,
 724,
 728,
 768,
 805,
 824,
 857,
 870

In [94]:
## Similarity scores of the recommended courses (same order as selected_course_indices)
selected_course_scores = [i[1] for i in sorted_scores[1:]]  # i[1] is the score; i[0] would repeat the index
selected_course_scores

[0.5163977794943223,
 0.5163977794943223,
 0.5163977794943223,
 0.4472135954999579,
 0.4472135954999579,
 0.4472135954999579,
 0.4472135954999579,
 0.44721359549995787,
 0.39999999999999997,
 0.39999999999999997,
 0.39999999999999997,
 0.39999999999999997,
 0.39999999999999997,
 0.39999999999999997,
 ...

In [95]:
recommended_courses = udemy_df['course_title'].iloc[selected_course_indices]
recommended_courses

119                    Introduction to Financial Modeling
140                           Intro to Financial Modeling
822                                Financial Modeling 101
806        Financial Modeling for Professionals in 1 Day!
829              Financial Modeling in Excel for Startups
                              ...                        
3673    Learn jQuery from Scratch - Master of JavaScri...
3674    How To Design A WordPress Website With No Codi...
3675                        Learn and Build using Polymer
3676    CSS Animations: Create Amazing Effects on Your...
3677    Using MODX CMS to Build Websites: A Beginner's...
Name: course_title, Length: 3677, dtype: object

In [87]:
recommended_courses_df = pd.DataFrame(recommended_courses)
recommended_courses_df

| | course_title |
|---|---|
| 119 | Introduction to Financial Modeling |
| 140 | Intro to Financial Modeling |
| 822 | Financial Modeling 101 |
| 806 | Financial Modeling for Professionals in 1 Day! |
| 829 | Financial Modeling in Excel for Startups |
| ... | ... |
| 3673 | Learn jQuery from Scratch - Master of JavaScri... |
| 3674 | How To Design A WordPress Website With No Codi... |
| 3675 | Learn and Build using Polymer |
| 3676 | CSS Animations: Create Amazing Effects on Your... |

In [99]:
recommended_courses_df['similarity_scores'] = selected_course_scores
recommended_courses_df

| | course_title | similarity_scores |
|---|---|---|
| 119 | Introduction to Financial Modeling | 0.516398 |
| 140 | Intro to Financial Modeling | 0.516398 |
| 822 | Financial Modeling 101 | 0.516398 |
| 806 | Financial Modeling for Professionals in 1 Day! | 0.447214 |
| 829 | Financial Modeling in Excel for Startups | 0.447214 |
| ... | ... | ... |
| 3673 | Learn jQuery from Scratch - Master of JavaScri... | 0.0 |
| 3674 | How To Design A WordPress Website With No Codi... | 0.0 |
| 3675 | Learn and Build using Polymer | 0.0 |
| 3676 | CSS Animations: Create Amazing Effects on Your... | 0.0 |


In [105]:
def recommend_course(title, num_of_recommendations=10):
    # Row index of the query course in udemy_df and cosine_sim_matrix
    idx = course_indices[title]

    # Pair every course index with its similarity to the query course
    scores = list(enumerate(cosine_sim_matrix[idx]))
    # Sort by similarity score, highest first
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry (the query course itself), then split indices and scores
    selected_course_indices = [i[0] for i in sorted_scores[1:]]
    selected_course_scores = [i[1] for i in sorted_scores[1:]]

    # Collect titles and scores in a DataFrame and return the top N
    result = udemy_df['course_title'].iloc[selected_course_indices]
    recommend_df = pd.DataFrame(result)
    recommend_df['similarity_scores'] = selected_course_scores
    return recommend_df.head(num_of_recommendations)
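`recommend_course` raises a `KeyError` unless the title matches a key of `course_indices` exactly. One possible refinement, assuming approximate matching is acceptable for this use case, is to resolve the user's input to the closest known title first with the standard library's `difflib.get_close_matches`. The `resolve_title` helper and its 0.6 cutoff (difflib's default) are hypothetical choices:

```python
import difflib


def resolve_title(query, known_titles, cutoff=0.6):
    """Return the query if it is a known title, otherwise the closest known title
    by difflib's similarity ratio, or None if nothing clears the cutoff."""
    if query in known_titles:
        return query
    matches = difflib.get_close_matches(query, known_titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


titles = [
    "Ultimate Investment Banking Course",
    "The Ultimate jQuery Course",
    "Learn and Build using Polymer",
]
print(resolve_title("Ultimate Investment Banking", titles))  # Ultimate Investment Banking Course
print(resolve_title("zzzz", titles))                         # None
```

In the notebook, the candidate list could be `list(course_indices.index)`, with a `None` result reported to the user instead of calling `recommend_course`.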
    
    

In [106]:
udemy_df.course_title.head()

0                   Ultimate Investment Banking Course
1    Complete GST Course & Certification - Grow You...
2    Financial Modeling for Business Analysts and C...
3    Beginner to Pro - Financial Analysis in Excel ...
4         How To Maximize Your Profits Trading Options
Name: course_title, dtype: object

In [107]:
recommend_course('Ultimate Investment Banking Course')

| | course_title | similarity_scores |
|---|---|---|
| 39 | The Complete Investment Banking Course 2017 | 0.67082 |
| 3474 | The Ultimate jQuery Course | 0.57735 |
| 240 | Advanced Accounting for Investment Banking | 0.5 |
| 417 | The Investment Banking Recruitment Series | 0.5 |
| 2714 | The Ultimate Web Development Course | 0.5 |
| 2802 | Ultimate WordPress Plugin Course | 0.5 |
| 657 | Financial Accounting - The Ultimate Beginner C... | 0.447214 |
| 1069 | Managerial Accounting - The Ultimate Beginner ... | 0.447214 |
| 1211 | The Ultimate Drawing Course - Beginner to Adva... | 0.447214 |
| 2643 | The Ultimate Vue JS 2 Developers Course | 0.447214 |
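These scores show a limitation of raw counts: "The Ultimate jQuery Course" ranks second only because it shares the words "ultimate" and "course" with the query, words that recur across many titles (several of the results above contain them). TF-IDF weighting, which the imported but unused `TfidfVectorizer` provides, downweights such common words. The sketch below assumes scikit-learn's documented defaults (smoothed idf = ln((1 + n) / (1 + df)) + 1, raw term counts, L2-normalized rows, so the linear kernel equals cosine similarity) and runs on a hypothetical five-title corpus, not the notebook's data:

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\b\w\w+\b")


def tfidf_vectors(docs):
    """TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1 and L2-normalized rows.
    Returns (tfidf rows, raw count rows) as sparse {term: weight} dicts."""
    counts = [Counter(TOKEN_RE.findall(d.lower())) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # number of docs containing each term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for c in counts:
        v = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vectors.append({t: x / norm for t, x in v.items()} if norm else v)
    return vectors, counts


def linear_kernel(u, v):
    """Dot product of two sparse {term: weight} vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def count_cosine(u, v):
    """Cosine similarity of two raw count vectors."""
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return linear_kernel(u, v) / norm if norm else 0.0


# Hypothetical cleaned titles; document 0 is the query
docs = [
    "Ultimate Investment Banking Course",
    "Complete Investment Banking Course 2017",
    "Ultimate jQuery Course",
    "Ultimate WordPress Plugin Course",
    "Ultimate Web Development Course",
]
tfidf, counts = tfidf_vectors(docs)
for j in (1, 2):
    print(docs[j], round(count_cosine(counts[0], counts[j]), 3), round(linear_kernel(tfidf[0], tfidf[j]), 3))
```

On this toy corpus the jQuery title drops from about 0.58 with counts to about 0.32 with TF-IDF, while the investment-banking title stays near 0.6, so the gap between a topical match and a mere word-overlap match widens.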
