## 1. The Summary
<p>Imagine working for a digital marketing agency that has been approached by a large discount retailer. The retailer wants to be sure the agency's analysts can build large search campaigns for its website. The goal of this project is to create a prototype set of keywords for search campaigns for the retailer's athletic shoe department. The client wants keywords generated for the following products: </p>
<ul>
<li>trail shoes</li>
<li>running shoes</li>
<li>basketball shoes</li>
<li>walking shoes</li>
<li>cleats</li>
<li>tennis shoes</li>
</ul>
<p><strong>The Summary</strong>: Since the client is a low-cost retailer, it runs many promotions and discounts, so the analyst should focus on keywords related to them. The analyst should also steer clear of keywords and topics related to luxury, because the target customer is price-sensitive. Because the company is also on a tight budget, it is best to focus on a tightly targeted set of keywords and set them all to exact and phrase match, rather than to broader match types that trigger ads on a wide range of loosely related searches.</p>
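<p>The "no luxury" constraint can be enforced mechanically once candidate keywords exist. The sketch below is a minimal illustration, assuming a hypothetical <code>EXCLUDED_TERMS</code> set; the real list of terms to exclude would come from the client brief.</p>

```python
# Hypothetical luxury-related terms to keep out of a low-cost campaign
EXCLUDED_TERMS = {"luxury", "designer", "premium", "exclusive"}


def filter_keywords(keywords, excluded=EXCLUDED_TERMS):
    """Drop any keyword that contains an excluded term as a whole word."""
    kept = []
    for kw in keywords:
        tokens = set(kw.lower().split())
        if tokens.isdisjoint(excluded):
            kept.append(kw)
    return kept


candidates = ["discount running shoes", "luxury running shoes",
              "designer cleats", "cleats sale"]
print(filter_keywords(candidates))  # ['discount running shoes', 'cleats sale']
```

<p>Matching on whole words rather than substrings avoids accidentally dropping a keyword just because an excluded term appears inside a longer word.</p>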
<p>Based on the summary above, the analyst first needs to generate a list of words that, together with the products above, would make good keywords. Here are some examples:</p>
<ul>
<li>Products: trail shoes, running shoes, basketball shoes, walking shoes, cleats, tennis shoes</li>
<li>Words: buy, prices, promotion, promo, sale, discount, shop</li>
</ul>
<p>Here are some of the resulting keywords: 'buy shoes', 'shoes buy', 'discount shoes', 'shoes discount',
              'buy basketball shoes', 'prices running shoes', 'prices walking shoes', 'cleat shoes', 'cleat prices', 'prices shoes', 'shoe prices' etc.</p>
<p>As a final result, we want to have a DataFrame that looks something like this: </p>
<table>
<thead>
<tr>
<th>Campaign</th>
<th>Ad Group</th>
<th>Keyword</th>
<th>Criterion Type</th>
</tr>
</thead>
<tbody>
<tr>
<td>Campaign1</td>
<td>AdGroup_1</td>
<td>keyword 1a</td>
<td>Exact</td>
</tr>
<tr>
<td>Campaign1</td>
<td>AdGroup_1</td>
<td>keyword 1a</td>
<td>Phrase</td>
</tr>
<tr>
<td>Campaign1</td>
<td>AdGroup_1</td>
<td>keyword 1b</td>
<td>Exact</td>
</tr>
<tr>
<td>Campaign1</td>
<td>AdGroup_1</td>
<td>keyword 1b</td>
<td>Phrase</td>
</tr>
<tr>
<td>Campaign1</td>
<td>AdGroup_2</td>
<td>keyword 2a</td>
<td>Exact</td>
</tr>
<tr>
<td>Campaign1</td>
<td>AdGroup_2</td>
<td>keyword 2a</td>
<td>Phrase</td>
</tr>
</tbody>
</table>
<p>The first step is to come up with a list of words that users might use to express their intent to buy low-cost shoes. Some were already suggested above, so they are used here.</p>

In [50]:
# List of words to pair with products
words = ['buy', 'price', 'discount', 'promotion', 'promo', 'shop', 'sale']

# Print list of words
words

['buy', 'price', 'discount', 'promotion', 'promo', 'shop', 'sale']

## 2. Combine the words with the product names
<p>Coming up with all the possible combinations of keywords can be difficult! But not for an analyst who knows how to translate campaign briefs into Python data structures and can think through the DataFrames they need to create.</p>
<p>Now that the words that fit the brief have been brainstormed, it is time to combine them with the product names to generate meaningful search keywords for the client. The analyst wants to combine every word with every product twice: once before the product name and once after it, as in the example above.</p>
<p>As a quick recap, for the product 'shoes' and the words 'buy' and 'price' for example, we would want to generate the following combinations: </p>
<p>buy shoes<br>
shoes price<br>
price shoes<br>
shoes buy<br>
…  </p>
<p>and so on for all the words and products that we have.</p>
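<p>The same before-and-after pairing can also be expressed with <code>itertools.product</code> from the standard library. This is a minimal sketch with a shortened product and word list for illustration; it yields the same <code>[ad group, keyword]</code> pairs, in the same order, as the nested loops in the next cell.</p>

```python
from itertools import product

products = ["running shoes", "cleats"]
words = ["buy", "price"]

# For every (product, word) pair, emit one keyword with the word after the
# product and one with the word before it, tagged with the product as ad group
keywords_list = []
for prod, word in product(products, words):
    keywords_list.append([prod, f"{prod} {word}"])
    keywords_list.append([prod, f"{word} {prod}"])

print(len(keywords_list))  # 2 products x 2 words x 2 orderings = 8
print(keywords_list[:2])   # [['running shoes', 'running shoes buy'], ['running shoes', 'buy running shoes']]
```

<p>With the seven products and seven words used in this project, the count is 7 × 7 × 2 = 98, which matches the 98 rows of the DataFrame built below.</p>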

In [51]:
products = ['running shoes', 'walking shoes', 'cleats', 'basketball shoes', 'trail shoes', 'tennis shoes', 'shoes']

# Create an empty list for the keywords
keywords_list = []

# Loop through products
for product in products:
    # Loop through words
    for word in words:
        # Append combinations
        keywords_list.append([product, product + ' ' + word])
        keywords_list.append([product, word + ' ' + product])
        
# Inspect keyword list
from pprint import pprint
pprint(keywords_list)

[['running shoes', 'running shoes buy'],
 ['running shoes', 'buy running shoes'],
 ['running shoes', 'running shoes price'],
 ['running shoes', 'price running shoes'],
 ['running shoes', 'running shoes discount'],
 ['running shoes', 'discount running shoes'],
 ['running shoes', 'running shoes promotion'],
 ['running shoes', 'promotion running shoes'],
 ['running shoes', 'running shoes promo'],
 ['running shoes', 'promo running shoes'],
 ['running shoes', 'running shoes shop'],
 ['running shoes', 'shop running shoes'],
 ['running shoes', 'running shoes sale'],
 ['running shoes', 'sale running shoes'],
 ['walking shoes', 'walking shoes buy'],
 ['walking shoes', 'buy walking shoes'],
 ['walking shoes', 'walking shoes price'],
 ['walking shoes', 'price walking shoes'],
 ['walking shoes', 'walking shoes discount'],
 ['walking shoes', 'discount walking shoes'],
 ['walking shoes', 'walking shoes promotion'],
 ['walking shoes', 'promotion walking shoes'],
 ['walking shoes', 'walking shoes prom

## 3. Convert the list of lists into a DataFrame
<p>Now we want to convert this list of lists into a DataFrame so we can easily manipulate it and manage the final output.</p>

In [52]:
# Load library
import pandas as pd

# Create a DataFrame from list
keywords_df = pd.DataFrame.from_records(keywords_list)

# Print the keywords DataFrame to explore it
print(keywords_df)

                0                       1
0   running shoes       running shoes buy
1   running shoes       buy running shoes
2   running shoes     running shoes price
3   running shoes     price running shoes
4   running shoes  running shoes discount
..            ...                     ...
93          shoes             promo shoes
94          shoes              shoes shop
95          shoes              shop shoes
96          shoes              shoes sale
97          shoes              sale shoes

[98 rows x 2 columns]


## 4. Rename the columns of the DataFrame
<p>Before we can upload this table of keywords, we will need to give the columns meaningful names. If we inspect the DataFrame we just created above, we can see that the columns are currently named <code>0</code> and <code>1</code>. <code>Ad Group</code> (example: "shoes") and <code>Keyword</code> (example: "shoes buy") are much more appropriate names.</p>

In [53]:
# Rename the columns of the DataFrame
keywords_df = keywords_df.rename(columns={0: 'Ad Group', 1: 'Keyword'})
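<p>As an alternative to renaming afterwards, the column names can be supplied when the DataFrame is created. A minimal sketch on a two-row stand-in for <code>keywords_list</code>:</p>

```python
import pandas as pd

keywords_list = [["cleats", "cleats sale"], ["cleats", "sale cleats"]]

# Passing column names up front avoids the separate rename step
keywords_df = pd.DataFrame(keywords_list, columns=["Ad Group", "Keyword"])
print(keywords_df.columns.tolist())  # ['Ad Group', 'Keyword']
```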

## 5. Add a campaign column
<p>Now we need to add some additional information to our DataFrame. 
We need a new column called <code>Campaign</code> for the campaign name. We want campaign names to be descriptive of our group of keywords and products, so let's call this campaign 'SEM_Shoes'.</p>

In [54]:
# Add a campaign column
keywords_df['Campaign'] = 'SEM_Shoes'

## 6. Create the match type column
<p>There are several keyword match types. One is exact match, which matches the exact term or close variations of it. Another is broad match, which means ads may show on searches that include misspellings, synonyms, related searches, and other relevant variations.</p>
<p>Straight from Google's AdWords <a href="https://support.google.com/google-ads/answer/2497836?hl=en">documentation</a>:</p>
<blockquote>
  <p>In general, the broader the match type, the more traffic potential that keyword will have, since your ads may be triggered more often. Conversely, a narrower match type means that your ads may show less often—but when they do, they’re likely to be more related to someone’s search.</p>
</blockquote>
<p>Since the client is tight on budget, we want to make sure all the keywords are in exact match at the beginning.</p>

In [55]:
# Add a criterion type column
keywords_df['Criterion Type'] = 'Exact'
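<p>When keywords are typed or pasted directly into Google Ads instead of being uploaded with a separate match-type column, the match type is conventionally written with punctuation: square brackets for exact match, double quotes for phrase match, and no punctuation for broad match. The helper below is hypothetical (not part of any library) and simply renders each row in that notation:</p>

```python
import pandas as pd


def to_match_notation(keyword: str, criterion_type: str) -> str:
    """Render a keyword in Google Ads punctuation notation (hypothetical helper)."""
    if criterion_type == "Exact":
        return f"[{keyword}]"
    if criterion_type == "Phrase":
        return f'"{keyword}"'
    return keyword  # broad match has no punctuation


df = pd.DataFrame({
    "Keyword": ["buy cleats", "buy cleats"],
    "Criterion Type": ["Exact", "Phrase"],
})
df["Notation"] = [to_match_notation(k, t)
                  for k, t in zip(df["Keyword"], df["Criterion Type"])]
print(df["Notation"].tolist())  # ['[buy cleats]', '"buy cleats"']
```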

## 7. Duplicate all the keywords into 'phrase' match
<p>The great thing about exact match is that it is very specific, and we can control the process very well. The tradeoff, however, is that:  </p>
<ol>
<li>The search volume for exact match is lower than other match types</li>
<li>We can't possibly think of all the ways in which people search, and so, we are probably missing out on some high-quality keywords.</li>
</ol>
<p>So it is a good idea to use another match type, <em>phrase match</em>, as a discovery mechanism: it allows our ads to be triggered by searches that contain our exact match keywords together with other words before or after them.</p>
<p>Later, once the campaign is live, we can experiment with modified broad match, broad match, and negative keywords for better visibility and control of our campaigns.</p>
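<p>To make the difference between exact and phrase match concrete, the sketch below models both as simple token comparisons. This is a deliberate simplification, assuming whitespace tokenization and case-insensitive comparison; real Google Ads matching also accepts close variants, which this code ignores.</p>

```python
def tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace (simplified tokenizer)."""
    return text.lower().split()


def exact_match(keyword: str, query: str) -> bool:
    """Query must consist of exactly the keyword's tokens, in order."""
    return tokens(keyword) == tokens(query)


def phrase_match(keyword: str, query: str) -> bool:
    """Keyword tokens must appear as a contiguous run somewhere in the query."""
    kw, q = tokens(keyword), tokens(query)
    return any(q[i:i + len(kw)] == kw for i in range(len(q) - len(kw) + 1))


queries = ["buy cleats", "buy cleats online", "cheap buy cleats", "cleats buy"]
for q in queries:
    print(f"{q!r:22} exact={exact_match('buy cleats', q)} "
          f"phrase={phrase_match('buy cleats', q)}")
```

<p>Only the first query passes the exact test; the next two pass the phrase test because they contain "buy cleats" with extra words around it; the last fails both because the word order differs.</p>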

In [56]:
# Make a copy of the keywords DataFrame
keywords_phrase = keywords_df.copy()

# Change criterion type match to phrase
keywords_phrase['Criterion Type'] = 'Phrase'

# Stack the exact and phrase match DataFrames
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
keywords_df_final = pd.concat([keywords_df, keywords_phrase], ignore_index=True)


## 8. Save and summarize!
<p>To upload our campaign, we need to save it as a CSV file, which we can then import into AdWords Editor or Bing Ads Editor. Pasting the data directly into the editor is also an option, but saving the data to a file keeps it easy to access, so let's save it as a CSV.</p>
<p>Now that the keyword work is wrapped up, it is good to look at a summary of the campaign structure. We can do that by grouping by ad group and criterion type and counting the keywords in each group. The summary shows that we assigned specific keywords to specific ad groups, each of which belongs to a campaign. In essence, we are telling Google (or Bing, etc.) that we want any of the keywords in an ad group to trigger one of the ads in that same ad group. Separately, we will have to create another table for ads, which is a task for another day and would look something like this:</p>
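<p>Passing <code>index=False</code> when saving matters: by default <code>to_csv</code> also writes the DataFrame index, which comes back as an extra <code>Unnamed: 0</code> column when the file is read again. A quick round trip through an in-memory buffer (<code>io.StringIO</code>, used here only so the example does not touch disk) shows the difference on a one-row stand-in for the keywords table:</p>

```python
import io

import pandas as pd

df = pd.DataFrame({"Campaign": ["SEM_Shoes"], "Ad Group": ["cleats"],
                   "Keyword": ["cleats sale"], "Criterion Type": ["Exact"]})

with_index = io.StringIO()
df.to_csv(with_index)                   # index written as an unnamed first column
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)
# ['Unnamed: 0', 'Campaign', 'Ad Group', 'Keyword', 'Criterion Type']

without_index = io.StringIO()
df.to_csv(without_index, index=False)   # only the data columns are written
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)
# ['Campaign', 'Ad Group', 'Keyword', 'Criterion Type']
```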
<table>
<thead>
<tr>
<th>Campaign</th>
<th>Ad Group</th>
<th>Headline 1</th>
<th>Headline 2</th>
<th>Description</th>
<th>Final URL</th>
</tr>
</thead>
<tbody>
<tr>
<td>SEM_Shoes</td>
<td>Cleats</td>
<td>Looking for Quality Cleats?</td>
<td>Explore Our Massive Collection</td>
<td>30-day Returns With Free Delivery Within the US. Start Shopping Now</td>
<td>DataCampShoes.com/cleats</td>
</tr>
<tr>
<td>SEM_Shoes</td>
<td>Shoes</td>
<td>Looking for Affordable Shoes?</td>
<td>Check Out Our Weekly Offers</td>
<td>30-day Returns With Free Delivery Within the US. Start Shopping Now</td>
<td>DataCampShoes.com/shoes</td>
</tr>
<tr>
<td>SEM_Shoes</td>
<td>Running Shoes</td>
<td>Looking for Quality Running Shoes?</td>
<td>Explore Our Massive Collection</td>
<td>30-day Returns With Free Delivery Within the US. Start Shopping Now</td>
<td>DataCampShoes.com/running-shoes</td>
</tr>
<tr>
<td>SEM_Shoes</td>
<td>Walking Shoes</td>
<td>Need Affordable Walking Shoes?</td>
<td>Check Out Our Weekly Offers</td>
<td>30-day Returns With Free Delivery Within the US. Start Shopping Now</td>
<td>DataCampShoes.com/walking-shoes</td>
</tr>
</tbody>
</table>
<p>Together, these tables get us the sample <strong>keywords -> ads -> landing pages</strong> mapping shown in the diagram below.</p>
<p><img src="https://assets.datacamp.com/production/project_400/img/kwds_ads_lpages.png" alt="Keywords-Ads-Landing pages flow"></p>
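<p>The keywords → ads → landing pages mapping can be checked in code by joining the two tables on their shared <code>Campaign</code> and <code>Ad Group</code> columns. The sketch below assumes small, hypothetical keyword and ads tables, with ad group names written in lowercase so they line up with the keyword table built above; the headlines and URLs are taken from the example ads table. A left join keeps every keyword, so any keyword whose ad group has no ad shows up with a missing URL.</p>

```python
import pandas as pd

keywords = pd.DataFrame({
    "Campaign": ["SEM_Shoes"] * 3,
    "Ad Group": ["cleats", "cleats", "shoes"],
    "Keyword": ["buy cleats", "cleats sale", "discount shoes"],
})

# Hypothetical ads table; ad group names must match the keyword table exactly
ads = pd.DataFrame({
    "Campaign": ["SEM_Shoes", "SEM_Shoes"],
    "Ad Group": ["cleats", "shoes"],
    "Headline 1": ["Looking for Quality Cleats?", "Looking for Affordable Shoes?"],
    "Final URL": ["DataCampShoes.com/cleats", "DataCampShoes.com/shoes"],
})

# Left join: every keyword is kept, with the ad(s) of its ad group attached
mapping = keywords.merge(ads, on=["Campaign", "Ad Group"], how="left")

# Keywords whose ad group has no ad would have a missing landing page
orphans = mapping[mapping["Final URL"].isna()]
print(mapping[["Keyword", "Headline 1", "Final URL"]])
print(f"keywords without an ad: {len(orphans)}")
```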

In [57]:
# Save the final keywords to a CSV file
keywords_df_final.to_csv('keywords_shoes.csv', index=False)

# View a summary of our campaign work
summary = keywords_df_final.groupby(['Ad Group', 'Criterion Type'])['Keyword'].count()
print(summary)

Ad Group          Criterion Type
basketball shoes  Exact             14
                  Phrase            14
cleats            Exact             14
                  Phrase            14
running shoes     Exact             14
                  Phrase            14
shoes             Exact             14
                  Phrase            14
tennis shoes      Exact             14
                  Phrase            14
trail shoes       Exact             14
                  Phrase            14
walking shoes     Exact             14
                  Phrase            14
Name: Keyword, dtype: int64
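<p>The long summary above can be pivoted into one row per ad group with <code>unstack</code>, which makes it easier to spot an ad group whose exact and phrase counts differ. A minimal sketch on a toy frame standing in for <code>keywords_df_final</code>:</p>

```python
import pandas as pd

keywords_df_final = pd.DataFrame({
    "Ad Group": ["cleats", "cleats", "shoes", "shoes"],
    "Keyword": ["buy cleats", "buy cleats", "shoes sale", "shoes sale"],
    "Criterion Type": ["Exact", "Phrase", "Exact", "Phrase"],
})

# Same count as the groupby above, pivoted so each match type becomes a column;
# fill_value=0 marks a missing match type explicitly instead of leaving NaN
wide = (keywords_df_final
        .groupby(["Ad Group", "Criterion Type"])["Keyword"]
        .count()
        .unstack("Criterion Type", fill_value=0))
print(wide)
```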


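## 9. Create a dummy dataset
<p>No real performance data is available for these keywords, so for illustration we generate random values for click-through rate (CTR), conversion rate, and cost per click (CPC) for each keyword, and then test whether a model can predict CTR from the other two metrics.</p>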

In [59]:
# Load library
import numpy as np

# Generate CTR, conversion rate, and CPC for each keyword in the extracted list
# (only the combined keyword strings are used, for simplicity)
combined_keywords = [kw[1] for kw in keywords_list]

# Creating dummy data for these keywords
np.random.seed(0)  # Ensuring reproducibility
ctr = np.random.rand(len(combined_keywords)) * 100  # CTR in percentage
conversion_rate = np.random.rand(len(combined_keywords)) * 100  # Conversion rate in percentage
cpc = np.random.rand(len(combined_keywords)) * 10  # Cost per click in dollars

# Combining into a DataFrame
keywords_data = pd.DataFrame({
    'Keyword': combined_keywords,
    'CTR': ctr,
    'Conversion_Rate': conversion_rate,
    'CPC': cpc
})

keywords_data.head()

                  Keyword        CTR  Conversion_Rate       CPC
0       running shoes buy  54.881350        82.894003  2.274146
1       buy running shoes  71.518937         0.469548  2.543565
2     running shoes price  60.276338        67.781654  0.580292
3     price running shoes  54.488318        27.000797  4.344166
4  running shoes discount  42.365480        73.519402  3.117959


In [60]:
keywords_data.tail()

        Keyword        CTR  Conversion_Rate       CPC
93  promo shoes  71.632720        20.984375  6.394725
94   shoes shop  28.940609        18.619301  3.685846
95   shop shoes  18.319136        94.437239  1.369003
96   shoes sale  58.651293        73.955080  8.221177
97   sale shoes   2.010755        49.045881  1.898479


In [61]:
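# Load libraries
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Proceed with the performance prediction: predict CTR from conversion rate and CPC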

# Splitting the data into training and test sets
X = keywords_data[['Conversion_Rate', 'CPC']]  # Features: Conversion Rate and CPC
y = keywords_data['CTR']  # Target: CTR
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Training a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting CTR on the test set
y_pred = model.predict(X_test)

# Calculating the mean squared error (MSE) for the model's performance
mse = mean_squared_error(y_test, y_pred)
print(mse)

932.7847312811916
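<p>A useful reference point for the MSE above is a baseline that ignores the features and always predicts the mean training CTR. The sketch below rebuilds the dummy data with the same seed and split so that it runs on its own, and uses scikit-learn's <code>DummyRegressor</code> for the baseline; if the linear model does not clearly beat it, the features carry little usable signal.</p>

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Rebuild the dummy data the same way as above (98 keywords, seed 0)
np.random.seed(0)
n = 98
data = pd.DataFrame({
    "CTR": np.random.rand(n) * 100,
    "Conversion_Rate": np.random.rand(n) * 100,
    "CPC": np.random.rand(n) * 10,
})
X = data[["Conversion_Rate", "CPC"]]
y = data["CTR"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: always predict the mean CTR of the training set
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))

# The linear model from the cell above, on the same split
linear = LinearRegression().fit(X_train, y_train)
linear_mse = mean_squared_error(y_test, linear.predict(X_test))

print(f"baseline MSE: {baseline_mse:.1f}")
print(f"linear MSE:   {linear_mse:.1f}")
```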


In [62]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
import scipy.stats as stats

# Setting up the Random Forest Regressor
rf = RandomForestRegressor(random_state=1)

# Defining a range of hyperparameters for Random Search
param_dist = {
    'n_estimators': stats.randint(10, 200),  # Randomly chosen number of trees
    'max_depth': [10, 20, 30, None],  # Maximum depth of the tree
    'min_samples_split': stats.randint(2, 11),  # Min number of samples to split a node
    'min_samples_leaf': stats.randint(1, 5)  # Min number of samples at a leaf node
}

# Setting up Random Search with Cross-Validation
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=10, cv=3, 
                                   n_jobs=-1, scoring='neg_mean_squared_error', random_state=0)

# Fitting the Random Search to the data
random_search.fit(X_train, y_train)

# Best parameters and MSE from Random Search
best_params_random = random_search.best_params_
best_mse_random = -random_search.best_score_  # Convert negative MSE to positive

best_params_random, best_mse_random


({'max_depth': None,
  'min_samples_leaf': 4,
  'min_samples_split': 9,
  'n_estimators': 19},
 975.8647284320941)
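<p><code>best_score_</code> is a cross-validated score computed on the training folds, whereas the linear regression MSE was measured on the held-out test set, so the two numbers are not directly comparable. A like-for-like sketch rebuilds the data, reruns the search with the same settings as above (except that <code>n_jobs</code> is left at its default), and scores <code>best_estimator_</code> on the same test split:</p>

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Rebuild the dummy data and split exactly as above
np.random.seed(0)
n = 98
data = pd.DataFrame({
    "CTR": np.random.rand(n) * 100,
    "Conversion_Rate": np.random.rand(n) * 100,
    "CPC": np.random.rand(n) * 10,
})
X_train, X_test, y_train, y_test = train_test_split(
    data[["Conversion_Rate", "CPC"]], data["CTR"], test_size=0.2, random_state=0)

param_dist = {
    "n_estimators": stats.randint(10, 200),
    "max_depth": [10, 20, 30, None],
    "min_samples_split": stats.randint(2, 11),
    "min_samples_leaf": stats.randint(1, 5),
}
search = RandomizedSearchCV(RandomForestRegressor(random_state=1), param_dist,
                            n_iter=10, cv=3, scoring="neg_mean_squared_error",
                            random_state=0)
search.fit(X_train, y_train)

# refit=True (the default) retrains the best configuration on all of X_train
rf_test_mse = mean_squared_error(y_test, search.best_estimator_.predict(X_test))
print(f"RF cross-validated MSE: {-search.best_score_:.1f}")
print(f"RF test-set MSE:        {rf_test_mse:.1f}")
```

<p>The test-set figure is the one to set against the linear regression's 932.78.</p>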

## 10. Choosing a Model
<p>A reasonable criterion for choosing a model is the lowest MSE. Here, a linear regression model and a random forest model were tested, with the random forest's hyperparameters tuned by randomized search.</p>
<ol>
<li>Linear Regression Model MSE: 932.7847312811916</li>
<li>Random Forest Model MSE: 975.8647284320941</li>
</ol>
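<p>The linear regression has the lower MSE of the two, so on this evidence it is the better choice. Note, however, that the figures were not computed the same way: the linear regression MSE comes from the held-out test set, while the random forest figure is a cross-validated score on the training data. More importantly, both MSEs are high. This could be due to the hyperparameters, or to the fact that the data was randomly generated. The second explanation is more likely, since the random forest's hyperparameters were already tuned: CTR, conversion rate, and CPC were drawn independently at random, so there is no real relationship between the features and the target for either model to learn, unlike the patterns that would appear in real search data.</p>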
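<p>The size of these errors can be anticipated. Because CTR, conversion rate, and CPC are drawn independently, the features carry no information about CTR, and under squared error the best any model can do on average is predict the mean CTR. Its expected MSE is then the variance of CTR, which for a uniform distribution on [0, 100] (as produced by <code>np.random.rand(...) * 100</code>) is:</p>

$$
\operatorname{Var}(\text{CTR}) = \frac{(b - a)^2}{12} = \frac{(100 - 0)^2}{12} = \frac{10\,000}{12} \approx 833.3
$$

<p>Both reported MSEs sit above this floor, which is consistent with models fitting noise in a 98-row dataset; with only 20 test rows, individual MSE values are also noisy.</p>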