### Title: Clustering in Marketing: Unveiling Patterns in Urban Demographics

#### Part A - Slide Contents and Brief Discussion:
- **Clustering U.S. Cities for Marketing Insights**
  - **Concept**: Cluster analysis groups cities based on demographic similarities to unveil patterns useful in marketing research, advertising, and sales strategies.
  - **Application**: By identifying demographically similar clusters, marketers can tailor advertising campaigns and product offerings to meet the specific needs of each group.
  - **Real-World Example**: Suppose a company aims to launch a new product in U.S. cities. Using cluster analysis, it discovers four distinct demographic clusters. For Atlanta, which has a high percentage of Black residents, it applies a different marketing approach than for a city with a larger Hispanic or Asian population. This targeted strategy can make advertising more effective and improve market penetration.

#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Excel File**: `cluster.xlsx`
- **Steps for Standardizing Demographic Attributes**:
  1. Open `cluster.xlsx` and navigate to the sheet with demographic data.
  2. To compute the mean for the Black percentage, enter `=AVERAGE(C10:C58)` in cell C1.
  3. For the standard deviation of Black percentages, use `=STDEV(C10:C58)` in cell C2.
  4. Copy these formulas across D1:H2 to calculate the mean and standard deviation for each demographic attribute.
  5. In cell I10, calculate the standardized Black percentage for Albuquerque by using `=STANDARDIZE(C10, C$1, C$2)`.
  6. Extend this formula from I10 to N58 to compute z-scores for all cities and attributes.
- **Troubleshooting Tips**:
  - Ensure formulas are correctly copied to reflect each attribute's specific column references.
  - Verify the mean and standard deviation calculations by comparing with manual computations for accuracy.
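
The manual check suggested in the tips can also be done outside Excel. The following is a minimal sketch on made-up percentages (not values from `cluster.xlsx`). It assumes that Excel's `STDEV` is the sample standard deviation (n - 1 denominator), which corresponds to `ddof=1` in NumPy.

```python
import numpy as np

# Hypothetical percentages for five cities (illustrative only, not from cluster.xlsx)
values = np.array([3.0, 67.0, 12.0, 25.0, 43.0])

mean = values.mean()
# Excel's STDEV uses the sample formula (n - 1), hence ddof=1
std = values.std(ddof=1)

# Equivalent of =STANDARDIZE(x, mean, std) applied to every value
z_scores = (values - mean) / std

print("mean:", mean)
print("sample std:", std)
print("z-scores:", np.round(z_scores, 3))
```

After standardizing, the z-scores should average to 0 and have a sample standard deviation of 1, which is a quick sanity check on the worksheet values as well.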

#### Part B2 - Python+SQLite3 Practice:

In [None]:
import pandas as pd
import sqlite3

# Load the header row (Excel row 9) plus the 49 city rows (Excel rows 10-58)
data = pd.read_excel('data/cluster.xlsx', sheet_name='cluster', usecols='C:G', skiprows=8, nrows=49)

# Standardize attributes and replace spaces with underscores in column names
data.columns = [column.replace(" ", "_") for column in data.columns]
for column in data.columns:
    data[column] = (data[column] - data[column].mean()) / data[column].std()

# Save to SQLite3 database with modified column names
conn = sqlite3.connect('data/cluster.db')
data.to_sql('cities', conn, if_exists='replace', index=False)

# Function to print DataFrame in a more readable, table-like format
def print_dataframe_sqlite(query, connection):
    df = pd.read_sql(query, connection)
    print(df.to_string(index=False))

# Display the table structure
print("Table Structure:")
cursor = conn.cursor()
cursor.execute("SELECT sql FROM sqlite_master WHERE tbl_name = 'cities' AND type = 'table'")
print(cursor.fetchone()[0])

# Example SQL Query: Load and display data in a readable format
print("\nExample Data Query:")
print_dataframe_sqlite("SELECT * FROM cities LIMIT 5", conn)

conn.close()


### Title: Harnessing Clustering for Strategic Marketing Insights

#### Part A - Slide Contents and Brief Discussion:
- **Understanding Clustering for Market Segmentation**
  - **Concept Overview**: Clustering allows marketers to group cities or consumers based on demographic similarities or preferences, facilitating targeted marketing strategies.
  - **Application in Marketing**: Identifying clusters helps in tailoring marketing campaigns, product development, and distribution strategies to meet the specific needs of different segments.
  - **Real-World Example**: Analyzing moviegoers' ratings for "Fight Club" and "Seabiscuit" to segment audiences into four distinct preference groups enables a movie distribution company to customize promotional activities, enhancing audience engagement and maximizing box office returns.

#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Excel File**: `Clustermotivation.xlsx`
- **Steps for Cluster Analysis**:
  1. **Setup Trial Anchors**: In cells H5:H8, input trial values (1-4) representing initial cluster anchors.
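  2. **Lookup Cluster Anchors' Names**: Use `=VLOOKUP(H5, $A$9:$N$58, 2, FALSE)` in G5 and copy through G8 to identify each cluster center candidate by name. The absolute reference keeps the lookup range fixed as the formula is copied down.
  3. **Identify Z-Scores for Anchors**: In I5:N8, apply `=VLOOKUP($H5, $A$9:$N$58, COLUMN(), FALSE)` to find z-scores for each cluster anchor. The third argument must be the position of the z-score column within the lookup range (9 for column I when the range starts in column A), so adjust the `COLUMN()` expression if your layout differs.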
  4. **Compute Squared Distances**: Use `=SUMXMY2($I$5:$N$5, $I10:$N10)` in O10 to calculate the squared distance from Albuquerque to the first cluster anchor. Adjust cell references for subsequent anchors and copy from O10:R10 down to O58:R58.
  5. **Find Minimum Distance**: Enter `=MIN(O10:R10)` in S10 and copy down to S58 to determine the closest cluster anchor for each city.
  6. **Sum of Squared Distances**: Calculate the total squared distance with `=SUM(S10:S58)` in S8.
  7. **Assign Clusters**: In T10, use `=MATCH(S10, O10:R10, 0)` and copy down to T58 to identify the cluster assignment for each city.
- **Troubleshooting Tips**:
  - Ensure correct cell references and formulas are copied accurately.
  - If you use Solver to improve on the trial anchors in H5:H8 (minimizing the total in S8), select the Evolutionary engine and set the Mutation Rate to 0.5.
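
The steps above score one set of trial anchors; the Solver mentioned in the tips is what would search for better ones. As a rough illustration of that search, the sketch below swaps the Evolutionary Solver for an exhaustive search over every possible anchor set, which is only feasible on tiny data. The z-scores are made up, and three clusters are used instead of four to keep the example small.

```python
from itertools import combinations
import numpy as np

# Hypothetical z-scores for six cities on two attributes (illustrative only)
z = np.array([
    [-1.2, -1.0],
    [-1.0, -1.1],
    [ 0.1,  0.0],
    [ 0.0,  0.2],
    [ 1.1,  1.0],
    [ 1.0,  0.9],
])

def total_squared_distance(z, anchor_rows):
    """Sum over cities of the squared distance to the nearest anchor (cell S8)."""
    anchors = z[list(anchor_rows)]
    # Pairwise squared distances: shape (n_cities, n_anchors), like O10:R58
    diffs = z[:, None, :] - anchors[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    nearest = sq_dist.argmin(axis=1) + 1      # like MATCH in column T (1-based)
    return sq_dist.min(axis=1).sum(), nearest

# Try every set of 3 anchor cities and keep the one with the smallest total
best_anchors = min(combinations(range(len(z)), 3),
                   key=lambda rows: total_squared_distance(z, rows)[0])
best_total, assignment = total_squared_distance(z, best_anchors)

print("Best anchor rows:", best_anchors)
print("Total squared distance:", round(best_total, 3))
print("Cluster assignment:", assignment)
```

With 49 cities and four anchors there are over 200,000 candidate sets, which is still searchable by brute force, but the count grows quickly with more cities or clusters; that is where a heuristic such as the Evolutionary Solver becomes useful.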

#### Part B2 - Python+SQLite3 Practice:

In [None]:
import pandas as pd
import numpy as np
import sqlite3
from scipy.spatial.distance import cdist

# Load the Excel file
data = pd.read_excel('data/Clustermotivation.xlsx', sheet_name='Sheet1', skiprows=8, nrows=49, usecols='C:G')
data.columns = [c.replace(" ", "_") for c in data.columns]

# Standardize the data
z_scores = (data - data.mean()) / data.std()

# Save standardized data to SQLite
conn = sqlite3.connect('data/clustering.db')
z_scores.to_sql('cities', conn, if_exists='replace', index=False)

# Define a function to calculate squared distances and assign clusters
def assign_clusters(conn, trial_anchors):
    cities = pd.read_sql("SELECT * FROM cities", conn)
    # Rows of the chosen anchor cities (0-based positions)
    anchors = cities.iloc[trial_anchors]
    # Squared Euclidean distance from every city to every anchor (like SUMXMY2)
    distances = cdist(cities, anchors, 'sqeuclidean')
    # 1-based index of the nearest anchor (like MATCH) and its distance (like MIN)
    closest_anchor = np.argmin(distances, axis=1) + 1
    min_distances = np.min(distances, axis=1)
    return closest_anchor, min_distances

# Example: Assigning clusters with trial anchors
trial_anchors = [0, 1, 2, 3]  # Example anchor indices
closest_anchor, min_distances = assign_clusters(conn, trial_anchors)
print("Assigned Clusters:", closest_anchor)
print("Minimum Distances:", min_distances)

conn.close()


#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Step-by-Step Guide for Conjoint Analysis in Excel**:
  1. Open the `CokePepsi.xlsx` file and navigate to the 'Conjoint Data' worksheet.
  2. For each customer (range AC29:AW160), run a regression using the LINEST function:
     - Select a blank range five rows tall and one column wider than the number of product attributes (four columns for the three attributes in K6:M25).
     - Enter `=LINEST(J6:J25, K6:M25, TRUE, TRUE)` in the first cell of the selected range.
     - Press Control+Shift+Enter to apply the array formula.
  3. Create a one-way data table for customer numbers (AY11:AY130):
     - Enter customer numbers in AY11:AY130.
     - Copy `=R12` into AZ10 and extend to BA10:BB10.
     - Select the range AY10:BB130, go to Data > What-If Analysis > Data Table, and set $J$3 as the column input cell.
  4. Copy the regression results to the 'cluster' worksheet and run a cluster analysis with five clusters.
     - Use customers 1–5 as initial anchors for the clusters.
- **Troubleshooting Tips**:
  - Ensure the array formula is entered correctly with Control+Shift+Enter.
  - Verify the cell references match the data ranges in your worksheet.
  - Check for consistent use of absolute and relative cell references.

#### Part B2 - Python+SQLite3 Practice:
```python
import pandas as pd
import sqlite3
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
import numpy as np

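# Load the 20 product profiles (attribute columns K:M, Excel rows 6-25).
# Assumption: these rows hold the independent variables used in the LINEST step.
profiles = pd.read_excel('data/CokePepsi.xlsx', sheet_name='Conjoint Data',
                         usecols='K:M', skiprows=5, nrows=20, header=None)

# Load the customer block AC29:AW160 (one row per customer).
# Assumption: column AC holds the customer number and AD:AW hold that customer's 20 scores.
data = pd.read_excel('data/CokePepsi.xlsx', sheet_name='Conjoint Data',
                     usecols='AC:AW', skiprows=28, nrows=132, header=None)
data.columns = ['customer'] + [f'profile_{i}' for i in range(1, data.shape[1])]

# Connect to SQLite3 database
conn = sqlite3.connect('CokePepsi.db')
data.to_sql('conjoint_data', conn, if_exists='replace', index=False)

# Function to run one regression per customer and return the coefficients
def run_regressions(data, profiles):
    X = profiles.to_numpy(dtype=float)
    coefficients = []
    for _, row in data.iterrows():
        # Dependent variable: this customer's score for each product profile
        y = row.iloc[1:].to_numpy(dtype=float)
        model = LinearRegression().fit(X, y)
        coefficients.append(model.coef_)
    return np.array(coefficients)

# Run the regressions and get coefficients
coefficients = run_regressions(data, profiles)

# Perform cluster analysis on the per-customer coefficients
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
clusters = kmeans.fit_predict(coefficients)

# Output the cluster results
for i, cluster in enumerate(clusters):
    print(f"Customer {i+1} is in cluster {cluster+1}")

# Close the database connection
conn.close()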
```
- **Comments:**
  - The code loads the Excel data into a pandas DataFrame.
  - It then creates a SQLite3 database and imports the data.
  - A function is defined to run linear regressions for each customer.
  - KMeans clustering is performed on the regression coefficients.
  - The cluster for each customer is printed out.
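
To make the LINEST step from Part B1 more concrete, here is a minimal sketch of the same kind of least-squares fit for a single customer using NumPy. The profile matrix and scores are made up (they are not taken from `CokePepsi.xlsx`), and the scores are constructed from known coefficients so the result can be checked by eye. One detail worth knowing when comparing against Excel: LINEST lists the slopes in reverse column order, followed by the intercept.

```python
import numpy as np

# Hypothetical design: 6 product profiles described by 2 attributes
# (illustrative only; the worksheet uses K6:M25)
X = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 0.0],
    [0.0, 2.0],
    [2.0, 1.0],
])
# Hypothetical scores built as 3 + 2*x1 - 1*x2, so the true coefficients are known
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

# Add a column of ones so the fit includes an intercept (LINEST's const=TRUE)
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

# LINEST lists slopes in reverse attribute order, followed by the intercept
linest_first_row = np.append(slopes[::-1], intercept)

print("intercept:", round(intercept, 6))
print("slopes:", np.round(slopes, 6))
print("LINEST-style row:", np.round(linest_first_row, 6))
```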


### Strategic Insights through Market Basket Analysis

#### Part A - Slide Contents and Brief Discussion:
- **Title: Leveraging Market Basket Analysis for Enhanced Retail Strategy**
  - **Market Basket Analysis Overview**
    - Introduction to the concept of analyzing consumer purchase patterns to identify product associations.
    - Significance in marketing for optimizing product placement, inventory management, and promotional strategies.
  - **Application in Real-World Retail**
    - Example: Supermarkets leveraging insights from cereal and banana purchases to optimize product placement, enhancing the likelihood of simultaneous purchases.
    - Lift Calculation: Demonstrates how understanding product purchase combinations can inform strategic decisions, like store layout and cross-promotional offers.
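
As a small illustration of the lift idea, the sketch below computes the lift for two products as the observed fraction of baskets containing both, divided by the fraction expected if the two purchases were independent. The basket data and the `lift` helper are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical transactions: 1 means the product was in the basket (illustrative only)
baskets = pd.DataFrame({
    "cereal":  [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    "bananas": [1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
})

def lift(df, a, b):
    """Observed joint purchase rate divided by the rate expected under independence."""
    p_a = df[a].mean()
    p_b = df[b].mean()
    p_both = ((df[a] == 1) & (df[b] == 1)).mean()
    return p_both / (p_a * p_b)

value = lift(baskets, "cereal", "bananas")
print(f"Lift for cereal and bananas: {value:.2f}")
```

A lift above 1 means the two products appear together more often than independence would predict; a lift below 1 means they appear together less often.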

#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Excel File:** `marketbasket.xlsx`
- **Creating Named Ranges:**
  1. **Total Transactions Calculation:**
     - Cell `L7`: Enter `=COUNT(B:B)` to count the number of transactions.
  2. **Computing Product Purchase Fractions:**
     - Cells `L9:L14`: Use `=COUNTIF(INDIRECT(K9),1)/$L$7` to calculate the fraction of transactions involving each product. Column K holds the name of each product's named range, so copying the formula down from L9 picks up the next product automatically.
  3. **Day of the Week Transactions Fraction:**
     - Cells `L17:L23`: Apply `=COUNTIF(day_week, K17)/COUNT(day_week)` and copy down; `K17:K23` hold the day numbers, so each row computes that day's fraction of transactions.
- **Troubleshooting Tips:**
  - Ensure named ranges are correctly defined for seamless formula copying.
  - Verify cell references and formulas for accuracy in calculations.

#### Part B2 - Python+SQLite3 Practice:
```python
import pandas as pd
import sqlite3

# Load the Excel file
data = pd.read_excel('marketbasket.xlsx', sheet_name='data', usecols='B:H', skiprows=8)

# Creating the SQLite3 database from the loaded Excel file
conn = sqlite3.connect('marketbasket.db')
data.to_sql('transactions', conn, if_exists='replace', index=False)

# Calculate and display the total number of transactions
total_transactions = pd.read_sql_query('SELECT COUNT(*) as total FROM transactions', conn)
print('Total transactions:', total_transactions['total'][0])

# Computing fractions of transactions involving each product
products = ['vegetables', 'meat', 'milk']  # Example product list
for product in products:
    query = f'''
    SELECT COUNT(*) * 1.0 / (SELECT COUNT(*) FROM transactions) as fraction
    FROM transactions
    WHERE "{product}" = 1
    '''
    fraction = pd.read_sql_query(query, conn)
    print(f'Fraction of transactions involving {product}:', fraction['fraction'][0])

# Calculating day of the week transaction fractions
days = range(1, 8)  # 1=Monday, 7=Sunday
for day in days:
    day_query = f'''
    SELECT COUNT(*) * 1.0 / (SELECT COUNT(*) FROM transactions) as fraction
    FROM transactions
    WHERE day_week = {day}
    '''
    day_fraction = pd.read_sql_query(day_query, conn)
    print(f'Fraction of transactions on day {day}:', day_fraction['fraction'][0])

# Example of calculating lift for meat and vegetables:
# lift = P(meat and vegetables) / (P(meat) * P(vegetables)) = both * total / (meat * vegetables).
# The 1.0 factor forces floating-point division; SQLite truncates integer division.
lift_query = '''
SELECT (SELECT COUNT(*) FROM transactions WHERE meat = 1 AND vegetables = 1) * 1.0 *
       (SELECT COUNT(*) FROM transactions) /
       ((SELECT COUNT(*) FROM transactions WHERE meat = 1) *
        (SELECT COUNT(*) FROM transactions WHERE vegetables = 1)) AS lift
'''
lift = pd.read_sql_query(lift_query, conn)
print('Lift for meat and vegetables:', lift['lift'][0])

conn.close()
```
- **Explanation:** This Python script demonstrates the process of loading data from an Excel file into a SQLite3 database, then computing the total number of transactions, fractions of transactions involving specific products, day of the week transaction fractions, and calculating lift for product combinations using SQL queries.
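
The script above issues one query per day of the week. As an alternative sketch, a single `GROUP BY` query returns every day's fraction in one pass. The example uses an in-memory database with a few made-up rows so it runs without the Excel file; the table and column names mirror the ones assumed above.

```python
import sqlite3

# In-memory database with a few hypothetical transactions (illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (day_week INTEGER, meat INTEGER)")
rows = [(1, 1), (1, 0), (2, 1), (2, 1), (2, 0), (3, 0), (3, 0), (3, 1)]
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

# One pass over the table: share of all transactions that fall on each day
query = """
SELECT day_week,
       COUNT(*) * 1.0 / (SELECT COUNT(*) FROM transactions) AS fraction
FROM transactions
GROUP BY day_week
ORDER BY day_week
"""
fractions = dict(conn.execute(query).fetchall())
conn.close()

for day, fraction in fractions.items():
    print(f"Day {day}: {fraction:.3f}")
```

Days with no transactions simply do not appear in the result, so the fractions always sum to 1.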


### Advanced Market Basket Analysis: Triadic Lifts and Marketing Implications

#### Part A - Slide Contents and Brief Discussion:
- **Title: Enhancing Marketing Strategy with Triadic Lift Analysis**
  - **Exploring Triadic Lift Analysis**
    - Introduction to the concept of calculating the lift for three attributes, such as two product categories and a day of the week, to uncover deeper insights into customer purchasing patterns.
    - Application in marketing for identifying specific days and product combinations that significantly affect purchasing behavior, aiding in targeted promotions and inventory management.
  - **Real-World Application Scenario**
    - Example: Analyzing the lift for purchasing baby goods and DVDs on Thursdays helps retailers understand specific customer behaviors, enabling them to tailor promotions, such as special Thursday discounts on baby goods when bought with DVDs, to drive sales.
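
Written out, the three-way lift compares the observed count with the count expected if the two products and the day were mutually independent. The notation below is introduced here for illustration and restates the calculation used in the Excel steps of Part B1: $N$ is the total number of transactions, $p_A$ and $p_B$ are the fractions of transactions containing each product, and $p_D$ is the fraction of transactions occurring on the chosen day.

$$
\text{lift}(A, B, D) = \frac{\#\{A = 1,\ B = 1,\ \text{day} = D\}}{N \, p_A \, p_B \, p_D}
$$

A value of 1 means the combination occurs exactly as often as independence would predict.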

#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Excel File:** `marketbasketoptimize.xls`
- **Calculating Three-Way Lifts:**
  1. **Actual Transactions Calculation:**
     - Cell `Q14`: Use array formula `=SUM((INDIRECT(P13)=$P$14)*(INDIRECT(N13)=1)*(INDIRECT(O13)=1))` to compute actual transactions for chosen combinations (e.g., baby goods and vegetables on Friday).
  2. **Predicted Transactions Calculation:**
     - Cell `R14`: Formula `=IF(N13<>O13, VLOOKUP(N13, K9:L14,2, FALSE)*L7*VLOOKUP(O13, K9:L14,2, FALSE)*VLOOKUP(P14, K17:L23,2),0)` calculates predicted transactions assuming independence between the variables.
  3. **Lift Computation:**
     - Cell `S14`: Compute lift with `=IF(R14=0,1,Q14/R14)`, indicating how much more (or less) frequently the selected combination occurs compared to what would be expected if they were independent.
  4. **Optimizing the Three-Way Lift**
     - In an actual situation with many products, there would be a huge number of three-way lifts. For example, with 1,000 products, you can expect 1,000³ = 1 billion three-way lifts! Despite this, a retailer is often interested in finding the largest three-way lifts. Intelligent use of the Evolutionary Solver can ease this task. To illustrate the basic idea, you can use the Evolutionary Solver to determine the combination of products and day of the week with maximum lift.
        1. Use Evolutionary Solver with the changing cells being the day of the week (cell P14) and an index reflecting the product classes (cells N12 and O12). Cells N12 and O12 are linked with lookup tables to cells N13:O13. For instance, a 1 in cell N12 makes N13 be vegetables. Figure 29-4 shows the Evolutionary Solver window.
        2. Maximize lift (S14), and then choose N12 and O12 (product classes) to be integers between 1 and 6. P14 is an integer between 1 and 7.
        3. Add a constraint that Q14 >= 20 to ensure you count only combinations that occur a reasonable number of times.
        4. Set the Mutation Rate to 0.5.

       You can find the maximum lift combination, as shown in Figure 29-5.

The three-way lift shown in Figure 29-5 indicates that roughly 6.32 times as many people buy DVDs and baby goods on Thursday as would be expected under an independence assumption. This suggests that placing DVDs (often an impulse purchase) in the baby section on Thursdays could increase profits.

- **Troubleshooting Tips:**
  - Ensure correct use of array formulas and verify cell references.
  - Check for correct implementation of INDIRECT functions for dynamic data referencing.
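
For a small number of products, the search for the largest three-way lift can be done without a solver. The sketch below replaces the Evolutionary Solver with a plain exhaustive loop over every product pair and day, keeping only combinations that occur at least a minimum number of times. The transactions, product names, and the `MIN_COUNT` threshold are made up for illustration.

```python
from itertools import combinations
import pandas as pd

# Hypothetical transactions (illustrative only): day_week is 1-7, products are 0/1 flags
data = pd.DataFrame({
    "day_week": [4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6],
    "dvd":      [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "baby":     [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    "milk":     [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0],
})
products = ["dvd", "baby", "milk"]
MIN_COUNT = 2  # plays the role of the Q14 >= 20 constraint on this tiny data set

def three_way_lift(df, a, b, day):
    """Return (actual count, lift) for products a and b bought together on a given day."""
    n = len(df)
    actual = int(((df[a] == 1) & (df[b] == 1) & (df["day_week"] == day)).sum())
    expected = n * df[a].mean() * df[b].mean() * (df["day_week"] == day).mean()
    return actual, (actual / expected if expected > 0 else 1.0)

results = []
for a, b in combinations(products, 2):
    for day in sorted(data["day_week"].unique()):
        actual, lift = three_way_lift(data, a, b, day)
        if actual >= MIN_COUNT:  # ignore combinations that rarely occur
            results.append((lift, a, b, int(day), actual))

best = max(results)
print(f"Highest lift: {best[1]} + {best[2]} on day {best[3]} "
      f"(lift {best[0]:.2f}, {best[4]} transactions)")
```

With six product classes and seven days this loop covers only 105 combinations, so exhaustive search is practical; the heuristic search becomes worthwhile as the number of products grows.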

#### Part B2 - Python+SQLite3 Practice:
- **Explanation:** This Python script demonstrates how to perform a three-way lift analysis using a SQLite database, providing insights into the complex interplay between product purchases and specific days of the week.

In [None]:
import pandas as pd
import sqlite3

# Load the Excel file into a DataFrame
data = pd.read_excel('marketbasketoptimize.xls', sheet_name='Initial')  # Assume correct sheet name

# Convert DataFrame to SQLite database
conn = sqlite3.connect('marketbasketoptimize.db')
data.to_sql('transactions', conn, if_exists='replace', index=False)

# Function to calculate three-way lift
def calculate_three_way_lift(product1, product2, day):
    def count(where="1 = 1"):
        # Number of transactions matching a WHERE clause
        return pd.read_sql_query(f"SELECT COUNT(*) AS count FROM transactions WHERE {where}", conn).iloc[0]['count']

    total_transactions = count()
    transactions_day = count(f"day_week = {day}")
    product1_count = count(f"{product1} = 1")
    product2_count = count(f"{product2} = 1")
    both_and_day_count = count(f"{product1} = 1 AND {product2} = 1 AND day_week = {day}")
    # Expected count if the two products and the day were independent
    expected_count = (transactions_day / total_transactions) * (product1_count / total_transactions) * (product2_count / total_transactions) * total_transactions
    # Return 1 when the expected count is 0, matching the Excel formula in S14
    return both_and_day_count / expected_count if expected_count > 0 else 1

# Example calculation
lift_veg_baby_thursday = calculate_three_way_lift('vegetables', 'baby', 4)  # Assuming 4 represents Thursday
print(f'Lift for Vegetables and Baby Goods on Thursday: {lift_veg_baby_thursday}')

conn.close()