# Search Campaigns for Datacamp

## Keyword Generation

## 1 Instructors
This campaign will be targeting people who search for courses by any of the instructors at Datacamp. 
We can have the keywords restricted to `course by <instructor name>` or we can go for a broad set of keywords targeting the instructor name only. At the beginning, it's better to target the names with 'course' or 'courses' and see the effect.    
We will also need to check if one of the names that we have also happens to be the name of another famous person, and then we will need to restrict the keywords for that instructor. 

We begin by getting the names of the instructors and the URLs for each:

In [3]:
import requests
import pandas as pd
from bs4 import BeautifulSoup


instructors_page = 'https://www.datacamp.com/instructors?all=true'
instructor_link_selector = '.instructor-block__description .instructor-block__link' # CSS class of the link
instructor_name_selector = '.mb-sm'  # CSS class of the name

instructor_resp = requests.get(instructors_page)
soup = BeautifulSoup(instructor_resp.text, 'lxml')

instructor_urls = [url['href'] for url in soup.select(instructor_link_selector)]
instructor_names = [name.text.strip() for name in soup.select(instructor_name_selector)]
instructor_urls = ['https://www.datacamp.com' + url for url in instructor_urls]

We put them in a data frame for later use. The URLs will be used later for generating ads.

In [5]:
instructor_df = pd.DataFrame({
    'name': instructor_names,
    'url': instructor_urls
})
instructor_df.head()

Unnamed: 0,name,url
0,Filip Schouwenaars,https://www.datacamp.com/instructors/filipsch
1,Jonathan Cornelissen,https://www.datacamp.com/instructors/jonathana...
2,Hugo Bowne-Anderson,https://www.datacamp.com/instructors/hugobowne
3,Nick Carchedi,https://www.datacamp.com/instructors/nickyc
4,Greg Wilson,https://www.datacamp.com/instructors/greg48f64...


### Generate Instructor Keywords

Now that we have the names of instructors, we will be using a template whereby we combine each name with a set of keywords related to our topic, which is mainly courses by the instructor. 

`col_names`: This is a list of the header names of the table that we will end up uploading to Google AdWords.    
`words`: The words that we will be combining with the instructor names to generate the full keywords / phrases.     
`match_types`: [data science course], "data science course", and data science course are technically three different keywords.
- exact match (in brackets) will only trigger ads if a user searches for exactly that keyword: 'data science course', for example, written exactly like this.
- phrase match (in quotes) will trigger our ads if a user searches for the exact string together with anything before or after it. So "best data science course" or "data science course online" would trigger our ads.
- broad match (no punctuation) will trigger our ads if someone searches for anything similar to or related to 'data science course'. This is up to Google's algorithms, and you have to be careful with it, as it might trigger ads when someone searches for 'data science platform', for example, which is not exactly what we are trying to promote. I prefer the modified broad match because it restricts targeting a little more: it triggers ads only if someone searches for a derivative of the word, not for any word with a similar meaning. It is denoted with a '+' sign at the beginning of each word. So '+game' would trigger ads for 'gaming' and 'gamers', but not for 'play'.

More details can be found on [AdWords help center](https://support.google.com/adwords/answer/2497836?hl=en)
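
As a minimal sketch of the notation described above, the helper below renders a plain keyword in each written form. The function name `format_keyword` and the label `'Modified Broad'` are hypothetical and only for illustration; the upload table built in this notebook keeps the match type in a separate `Criterion Type` column, so brackets and quotes are not actually added to the keywords there.

```python
def format_keyword(keyword, match_type):
    """Render a keyword in the written notation of a match type (illustrative only)."""
    keyword = ' '.join(keyword.lower().split())  # normalise case and whitespace
    if match_type == 'Exact':
        return '[' + keyword + ']'
    if match_type == 'Phrase':
        return '"' + keyword + '"'
    if match_type == 'Modified Broad':
        # every token gets its own '+' prefix
        return ' '.join('+' + token for token in keyword.split())
    if match_type == 'Broad':
        return keyword
    raise ValueError('unknown match type: ' + match_type)


for match_type in ['Exact', 'Phrase', 'Broad', 'Modified Broad']:
    print(match_type, '->', format_keyword('Data Science Course', match_type))
```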

In [None]:
col_names = ['Campaign', 'Ad Group', 'Keyword', 'Criterion Type']
instructor_keywords = []

words = ['course', 'courses', 'learn', 'data science', 'data camp', 'datacamp']
match_types = ['Exact', 'Phrase', 'Broad']
for instructor in instructor_df['name']:
    for word in words:
        for match in match_types:
            if match == 'Broad':
                keyword = '+' + ' +'.join([instructor.replace(' ', ' +').lower(), word.replace(' ', ' +')])  # modified broad match
            else:
                keyword = instructor.lower() + ' ' + word
            row = ['SEM_Instructors',  # campaign name
                   instructor,  # ad group name
                   keyword, # instructor <keyword>
                   match]  # keyword match type
            instructor_keywords.append(row)

# do the same by having the keywords come before the instructor name
for instructor in instructor_df['name']:
    for word in words:
        for match in match_types:
            if match == 'Broad':
                keyword = '+' + ' +'.join([word.replace(' ', ' +'), instructor.replace(' ', ' +').lower()])
            else:
                keyword = word + ' ' + instructor.lower()
            row = ['SEM_Instructors',  # campaign name
                   instructor,  # ad group name
                   keyword, # <keyword> instructor
                   match]  # keyword match type
            instructor_keywords.append(row)
            

instructor_keywords_df = pd.DataFrame.from_records(instructor_keywords, columns=col_names)
print('total keywords:', instructor_keywords_df.shape[0])
instructor_keywords_df.head()

Basically, what we are doing is looping over the instructor names, all the different keywords, and all the match types, so that we end up with every possible combination. We do it twice: once for the template 'instructor name keyword' and once for 'keyword instructor name'.
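
The same set of combinations can be sketched with `itertools.product`, which also makes the expected row count easy to check: names * words * match types * 2 word orders. This is an alternative formulation under simplifying assumptions (two names and two words only), not the code used above, and it emits the two word orders next to each other rather than in two separate passes.

```python
from itertools import product

names = ['Filip Schouwenaars', 'Nick Carchedi']  # two names from the instructor table
words = ['course', 'courses']
match_types = ['Exact', 'Phrase', 'Broad']

rows = []
for name, word, match in product(names, words, match_types):
    # build both word orders for every combination
    for first, second in [(name.lower(), word), (word, name.lower())]:
        phrase = first + ' ' + second
        if match == 'Broad':
            # modified broad match: prefix every token with '+'
            phrase = ' '.join('+' + token for token in phrase.split())
        rows.append(['SEM_Instructors', name, phrase, match])

expected = len(names) * len(words) * len(match_types) * 2
print(len(rows), expected)
```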

Now we simply do the same for all of our segments. A good guideline I learned from [R for Data Science](http://r4ds.had.co.nz/functions.html) is that if you are going to copy and paste more than twice, it's time to write a function! 

In [8]:
def generate_keywords(topics, keywords, match_types=['Exact', 'Phrase', 'Broad'],
                     campaign='SEM_Campaign'):
    col_names = ['Campaign', 'Ad Group', 'Keyword', 'Criterion Type']
    campaign_keywords = []
    
    for topic in topics:
        for word in keywords:
            for match in match_types:
                if match == 'Broad':
                    keyword = '+' + ' +'.join([topic.lower().replace(' ', ' +'), word.replace(' ', ' +')])
                else:
                    keyword = topic.lower() + ' ' + word
                row = [campaign,  # campaign name
                       topic,  # ad group name
                       keyword, # topic <keyword>
                       match]  # keyword match type
                campaign_keywords.append(row)

    # I said more than twice! :)             
    for topic in topics:
        for word in keywords:
            for match in match_types:
                if match == 'Broad':
                    keyword = '+' + ' +'.join([word.replace(' ', ' +'), topic.lower().replace(' ', ' +')])
                else:
                    keyword = word + ' ' + topic.lower()
                row = [campaign,  # campaign name
                       topic,  # ad group name
                       keyword, # <keyword> topic
                       match]  # keyword match type
                campaign_keywords.append(row)

    return pd.DataFrame.from_records(campaign_keywords, columns=col_names)


Let's give it a try:

In [71]:
topics = ['Data Science', 'Machine Learning']
keywords = ['learn', 'course']
generate_keywords(topics, keywords).head(10)

Unnamed: 0,Campaign,Ad Group,Keyword,Criterion Type
0,SEM_Campaign,Data Science,data science learn,Exact
1,SEM_Campaign,Data Science,data science learn,Phrase
2,SEM_Campaign,Data Science,+data +science +learn,Broad
3,SEM_Campaign,Data Science,data science course,Exact
4,SEM_Campaign,Data Science,data science course,Phrase
5,SEM_Campaign,Data Science,+data +science +course,Broad
6,SEM_Campaign,Machine Learning,machine learning learn,Exact
7,SEM_Campaign,Machine Learning,machine learning learn,Phrase
8,SEM_Campaign,Machine Learning,+machine +learning +learn,Broad
9,SEM_Campaign,Machine Learning,machine learning course,Exact


Looks good. Now we simply need to generate the relevant topics and keywords for each of our segments.

## 2 Technologies

In [55]:
topics = ['R', 'Python', 'SQL', 'Git', 'Shell']  # listed on the /courses page
keywords = ['data science', 'programming', 'analytics', 'data analysis', 'machine learning',
            'deep learning', 'financial analysis', 'data viz', 'visualization', 'data visualization',
            'learn', 'course', 'courses', 'education', 'data import', 'data cleaning', 
            'data manipulation', 'probability', 'stats', 'statistics', 'course', 'courses',
           'learn', 'education', 'tutorial']  # @marketing_team: this list can / should be refined or 
                                              # expanded based on the strategy and how specific the 
                                              # targeting needs to be
tech_keywords = generate_keywords(topics, keywords, campaign='SEM_Technologies')
print('total keywords:', tech_keywords.shape[0])
tech_keywords.head()

total keywords: 750


Unnamed: 0,Campaign,Ad Group,Keyword,Criterion Type
0,SEM_Technologies,R,r data science,Exact
1,SEM_Technologies,R,r data science,Phrase
2,SEM_Technologies,R,+r +data +science,Broad
3,SEM_Technologies,R,r programming,Exact
4,SEM_Technologies,R,r programming,Phrase
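
One thing worth checking before uploading: the `keywords` list in this section repeats 'course', 'courses', 'learn' and 'education', so the 750 rows (5 topics * 25 qualifiers * 3 match types * 2 word orders) contain duplicates. A minimal sketch of removing the repeats while keeping the original order is shown below; whether to deduplicate the list or the finished table is a judgment call.

```python
keywords = ['data science', 'programming', 'analytics', 'data analysis', 'machine learning',
            'deep learning', 'financial analysis', 'data viz', 'visualization', 'data visualization',
            'learn', 'course', 'courses', 'education', 'data import', 'data cleaning',
            'data manipulation', 'probability', 'stats', 'statistics', 'course', 'courses',
            'learn', 'education', 'tutorial']

# dict.fromkeys keeps the first occurrence of each item and preserves order
unique_keywords = list(dict.fromkeys(keywords))

n_topics, n_match_types, n_orders = 5, 3, 2
print('qualifiers:', len(keywords), '->', len(unique_keywords))
print('rows:', n_topics * len(keywords) * n_match_types * n_orders,
      '->', n_topics * len(unique_keywords) * n_match_types * n_orders)
```

On the finished data frame, `tech_keywords.drop_duplicates()` should have the same effect, since the repeated qualifiers produce identical rows.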


## 3 Courses

This is probably the most specific, and therefore the most relevant of the segments to target people. If someone is searching for "data visualization with r" and we have that course, then the ad would be extremely relevant, because we would have the right landing page that exactly satisfies that user's need. 

One small problem: some of the course names don't correspond to what a user would search for, for example
'Machine Learning with the Experts: School Budgets' or 'Sentiment Analysis in R: The Tidy Way'. These are NOT bad course names. They just need some attention when selecting the keywords that people might actually use to search for them.

Again, we can scrape the names and corresponding URLs as we did for the instructors campaign.


In [11]:
courses_page = 'https://www.datacamp.com/courses/all'
course_link_selector = '.courses__explore-list .course-block'

course_resp = requests.get(courses_page)
soup = BeautifulSoup(course_resp.text, 'lxml')

course_urls = [link.contents[1]['href'] for link in soup.select(course_link_selector)] 
course_urls = ['https://www.datacamp.com' + url for url in course_urls]
course_names = [link.h4.text for link in soup.select(course_link_selector)]

In [13]:
course_df = pd.DataFrame({
    'name': course_names,
    'url': course_urls
})
course_df['name_clean'] = course_df['name'].str.replace(r'\(.*\)', '', regex=True).str.strip()  # remove (part x)
print('total courses:', course_df.shape[0])
course_df.head()

total courses: 94


Unnamed: 0,name,url,name_clean
0,Intro to Python for Data Science,https://www.datacamp.com/courses/intro-to-pyth...,Intro to Python for Data Science
1,Introduction to R,https://www.datacamp.com/courses/free-introduc...,Introduction to R
2,Intermediate Python for Data Science,https://www.datacamp.com/courses/intermediate-...,Intermediate Python for Data Science
3,Intro to SQL for Data Science,https://www.datacamp.com/courses/intro-to-sql-...,Intro to SQL for Data Science
4,Intermediate R,https://www.datacamp.com/courses/intermediate-r,Intermediate R
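
The `(part x)` clean-up can be checked on a few strings with the standard `re` module. This is a sketch with made-up course names (the 'Example ...' titles are hypothetical, not scraped data). Note that the pattern `\(.*\)` used in the cell above is greedy, so a name with two parenthesised groups would lose everything between the first `(` and the last `)`; the narrower pattern below removes each group separately. The helper name `clean_name` is an assumption for illustration.

```python
import re


def clean_name(name):
    """Remove every parenthesised group, plus the whitespace in front of it."""
    return re.sub(r'\s*\([^)]*\)', '', name).strip()


examples = ['Example Course (Part 1)',
            'Example (Beta) Course (Part 2)',
            'Intermediate R']

for name in examples:
    greedy = re.sub(r'\(.*\)', '', name).strip()  # pattern from the cell above
    print(repr(name), '->', repr(clean_name(name)), '| greedy:', repr(greedy))
```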


We will use the `generate_keywords` function for the courses as well, but the results need to be reviewed: as mentioned above, some of the names are not really what people would search for, and we have to account for that case by case. The following should be good enough for a start, and then we can look at the data and make decisions.    
Please note that the empty string in the list below is not a mistake. The course names are long and specific enough to serve as keywords in and of themselves, without qualifier keywords like 'learn' or 'course'. So we will be using the course names alone, as well as with the qualifier keywords.

In [57]:
keywords = ['', 'learn', 'course', 'courses', 'tutorial', 'education']
course_keywords = generate_keywords(course_df['name_clean'], keywords, campaign='SEM_Courses')
print('total keywords:', course_keywords.shape[0])
course_keywords.head(10)

total keywords: 3384


Unnamed: 0,Campaign,Ad Group,Keyword,Criterion Type
0,SEM_Courses,Intro to Python for Data Science,intro to python for data science,Exact
1,SEM_Courses,Intro to Python for Data Science,intro to python for data science,Phrase
2,SEM_Courses,Intro to Python for Data Science,+intro +to +python +for +data +science +,Broad
3,SEM_Courses,Intro to Python for Data Science,intro to python for data science learn,Exact
4,SEM_Courses,Intro to Python for Data Science,intro to python for data science learn,Phrase
5,SEM_Courses,Intro to Python for Data Science,+intro +to +python +for +data +science +learn,Broad
6,SEM_Courses,Intro to Python for Data Science,intro to python for data science course,Exact
7,SEM_Courses,Intro to Python for Data Science,intro to python for data science course,Phrase
8,SEM_Courses,Intro to Python for Data Science,+intro +to +python +for +data +science +course,Broad
9,SEM_Courses,Intro to Python for Data Science,intro to python for data science courses,Exact
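
Row 2 of this output shows a side effect of the empty qualifier: the Broad keyword ends in a dangling `+`. By the same logic the Exact and Phrase keywords get a trailing or leading space, and both word orders produce the same phrase. A minimal sketch of one way to handle this is shown below; `build_keyword` is a hypothetical helper, not part of `generate_keywords`, and it simply skips empty parts and drops repeated rows.

```python
def build_keyword(parts, match):
    """Join the non-empty parts into one keyword; prefix each token with '+' for Broad."""
    tokens = ' '.join(part.lower() for part in parts if part.strip()).split()
    if match == 'Broad':
        return ' '.join('+' + token for token in tokens)
    return ' '.join(tokens)


topic = 'Intro to Python for Data Science'
seen = set()
for word in ['', 'learn']:
    for match in ['Exact', 'Phrase', 'Broad']:
        for parts in [(topic, word), (word, topic)]:  # both word orders
            row = (build_keyword(parts, match), match)
            if row not in seen:  # the empty qualifier yields the same row twice
                seen.add(row)
                print(row)
```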


## 4 Topics

These are basically generic topics that people might be interested in searching for. They are covered by the 'tracks' section, which has skills and careers as sub-sections. For our purposes they can be grouped under the same campaign.

The process is again the same. 

#### Skills

In [40]:
skills_page = 'https://www.datacamp.com/tracks/skill'
skills_link_selector = '#all .shim'

skills_resp = requests.get(skills_page)
skill_soup = BeautifulSoup(skills_resp.text, 'lxml')

skills_urls = [link['href'] for link in skill_soup.select(skills_link_selector)] 
skills_names = [skill.replace('/tracks/', '').replace('-', ' ') for skill in skills_urls]
skills_urls = ['https://www.datacamp.com' + url for url in skills_urls]


#### Careers

In [40]:
career_page = 'https://www.datacamp.com/tracks/career'
career_link_selector = '#all .shim'

career_resp = requests.get(career_page)
career_soup = BeautifulSoup(career_resp.text, 'lxml')

career_urls = [link['href'] for link in career_soup.select(career_link_selector)] 

career_names = [career.replace('/tracks/', '').replace('-', ' ') for career in career_urls]
career_urls = ['https://www.datacamp.com' + url for url in career_urls]

In [48]:
tracks_df = pd.DataFrame({
    'name': skills_names + career_names,
    'url': skills_urls + career_urls
})
tracks_df['name'] = [x.title() for x in tracks_df['name']]
tracks_df.head()

Unnamed: 0,name,url
0,R Programming,https://www.datacamp.com/tracks/r-programming
1,Importing Cleaning Data With R,https://www.datacamp.com/tracks/importing-clea...
2,Data Manipulation With R,https://www.datacamp.com/tracks/data-manipulat...
3,Python Programming,https://www.datacamp.com/tracks/python-program...
4,Importing Cleaning Data With Python,https://www.datacamp.com/tracks/importing-clea...
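
The track names in this table come from the URL slugs rather than from the page text, which is why they read like 'Importing Cleaning Data With R'. The slug handling used for both skills and careers can be sketched as one small function; `slug_to_name` is a hypothetical name, and the only path used here is the one fully visible in the output above.

```python
def slug_to_name(path, prefix='/tracks/'):
    """Turn a relative track path into a title-cased name."""
    slug = path[len(prefix):] if path.startswith(prefix) else path
    return slug.replace('-', ' ').title()


print(slug_to_name('/tracks/r-programming'))
```

Names built this way may still need a manual pass before they are used as ad group names or keywords.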


In [49]:
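# NOTE: `keywords` must hold the technologies list from section 2 here (the output below was
# produced with it); if the cells are run top to bottom, it holds the course qualifiers instead.
tracks_keywords = generate_keywords(tracks_df['name'], keywords, campaign='SEM_Tracks')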
print('total keywords:', tracks_keywords.shape[0])
tracks_keywords.head()

total keywords: 3000


Unnamed: 0,Campaign,Ad Group,Keyword,Criterion Type
0,SEM_Tracks,R Programming,r programming data science,Exact
1,SEM_Tracks,R Programming,r programming data science,Phrase
2,SEM_Tracks,R Programming,+r +programming +data +science,Broad
3,SEM_Tracks,R Programming,r programming programming,Exact
4,SEM_Tracks,R Programming,r programming programming,Phrase


In [60]:
full_keywords_df = pd.concat([instructor_keywords_df, tech_keywords, course_keywords, tracks_keywords])
print('total keywords:', full_keywords_df.shape[0])
print('total campaigns:', len(set(full_keywords_df['Campaign'])))
print('total ad groups:', len(set(full_keywords_df['Ad Group'])))
full_keywords_df.to_csv('keywords.csv', index=False)
full_keywords_df.head()

total keywords: 9438
total campaigns: 4
total ad groups: 173


Unnamed: 0,Campaign,Ad Group,Keyword,Criterion Type
0,SEM_Instructors,Filip Schouwenaars,filip schouwenaars course,Exact
1,SEM_Instructors,Filip Schouwenaars,filip schouwenaars course,Phrase
2,SEM_Instructors,Filip Schouwenaars,+filip +schouwenaars +course,Broad
3,SEM_Instructors,Filip Schouwenaars,filip schouwenaars courses,Exact
4,SEM_Instructors,Filip Schouwenaars,filip schouwenaars courses,Phrase


Now we are ready to go with our keywords, and the full set can be found [here](keywords.csv). Next we need to generate ads for each of our ad groups. 