### Title: Clustering in Marketing: Unveiling Patterns in Urban Demographics

#### Part A - Slide Contents and Brief Discussion:
- **Clustering U.S. Cities for Marketing Insights**
  - **Concept**: Cluster analysis groups cities based on demographic similarities to unveil patterns useful in marketing research, advertising, and sales strategies.
  - **Application**: By identifying demographically similar clusters, marketers can tailor advertising campaigns and product offerings to meet the specific needs of each group.
  - **Real-World Example**: Suppose a company aims to launch a new product in U.S. cities. Using cluster analysis, it finds four distinct demographic clusters. A city such as Atlanta, with a high percentage of Black residents, then calls for a different marketing approach than a city with a larger Hispanic or Asian population. Targeting by cluster is intended to make advertising more effective and improve market penetration.

#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Excel File**: `cluster.xlsx`
- **Steps for Standardizing Demographic Attributes**:
  1. Open `cluster.xlsx` and navigate to the sheet with demographic data.
  2. To compute the mean for the Black percentage, enter `=AVERAGE(C10:C58)` in cell C1.
  3. For the standard deviation of Black percentages, use `=STDEV(C10:C58)` in cell C2.
  4. Copy these formulas across D1:G2 to calculate the mean and standard deviation for each demographic attribute.
  5. In cell I10, calculate the standardized Black percentage (z-score) for Albuquerque with `=STANDARDIZE(C10, C$1, C$2)`.
  6. Extend this formula from I10 to N58 to compute z-scores for all cities and attributes.
- **Troubleshooting Tips**:
  - Ensure formulas are correctly copied to reflect each attribute's specific column references.
  - Verify the mean and standard deviation calculations by comparing with manual computations for accuracy.
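
As a cross-check on the Excel steps above, the same standardization can be sketched in pandas. The figures below are made-up sample values, not the contents of `cluster.xlsx`, and the variable names are chosen for illustration. The point is that `Series.std()` defaults to the sample standard deviation (`ddof=1`), which is what `STDEV` computes.

```python
import pandas as pd

# Hypothetical sample percentages (not taken from cluster.xlsx)
pct = pd.Series([3.0, 67.0, 12.0, 59.0, 26.0], name="pct_black")

mean = pct.mean()        # Excel: =AVERAGE(range)
sd = pct.std()           # Excel: =STDEV(range); pandas uses ddof=1 by default
z = (pct - mean) / sd    # Excel: =STANDARDIZE(x, mean, sd)

print(z.round(3).tolist())
```

After standardizing, each column should have a mean of 0 and a sample standard deviation of 1, which is a quick way to verify the worksheet formulas.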

#### Part B2 - Python+SQLite3 Practice:

In [9]:
import pandas as pd
import sqlite3

# Load data from Excel: header in row 9, 49 cities in rows 10-58
data = pd.read_excel('data/cluster.xlsx', sheet_name='cluster', usecols='C:G', skiprows=8, nrows=49)

# Standardize attributes and replace spaces with underscores in column names
data.columns = [column.replace(" ", "_") for column in data.columns]
for column in data.columns:
    data[column] = (data[column] - data[column].mean()) / data[column].std()

# Save to SQLite3 database with modified column names
conn = sqlite3.connect('data/cluster.db')
data.to_sql('cities', conn, if_exists='replace', index=False)

# Function to print DataFrame in a more readable, table-like format
def print_dataframe_sqlite(query, connection):
    df = pd.read_sql(query, connection)
    print(df.to_string(index=False))

# Display the table structure
print("Table Structure:")
cursor = conn.cursor()
cursor.execute("SELECT sql FROM sqlite_master WHERE tbl_name = 'cities' AND type = 'table'")
print(cursor.fetchone()[0])

# Example SQL Query: Load and display data in a readable format
print("\nExample Data Query:")
print_dataframe_sqlite("SELECT * FROM cities LIMIT 5", conn)

conn.close()


Table Structure:
CREATE TABLE "cities" (
"%age_Black" REAL,
  "%age_Hispanic" REAL,
  "%age_Asian" REAL,
  "Median_Age" REAL,
  "Unemployment_rate" REAL
)

Example Data Query:
 %age_Black  %age_Hispanic  %age_Asian  Median_Age  Unemployment_rate
  -1.178721       1.238954   -0.362574    0.061342          -0.751463
   2.355188      -0.764434   -0.452302   -0.439617          -0.751463
  -0.681765       0.510449   -0.272846   -1.441536          -1.495336
   1.913450      -0.825143   -0.452302    0.562301           1.480155
   0.091278      -0.218056   -0.093390   -0.940577          -0.751463


### Title: Harnessing Clustering for Strategic Marketing Insights

#### Part A - Slide Contents and Brief Discussion:
- **Understanding Clustering for Market Segmentation**
  - **Concept Overview**: Clustering allows marketers to group cities or consumers based on demographic similarities or preferences, facilitating targeted marketing strategies.
  - **Application in Marketing**: Identifying clusters helps in tailoring marketing campaigns, product development, and distribution strategies to meet the specific needs of different segments.
  - **Real-World Example**: Analyzing moviegoers' ratings for "Fight Club" and "Seabiscuit" to segment audiences into four distinct preference groups lets a movie distribution company customize promotional activities for each group, with the aim of improving audience engagement and box office returns.

#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Excel File**: `Clustermotivation.xlsx`
- **Steps for Cluster Analysis**:
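  1. **Set Up Trial Anchors**: In cells H5:H8, enter trial city index numbers (for example, 1 through 4) to serve as the initial cluster anchors.
  2. **Lookup Cluster Anchors' Names**: Use `=VLOOKUP(H5, $A$9:$N$58, 2, FALSE)` in G5 and copy through G8 to identify each cluster center candidate by name. The absolute references keep the lookup table fixed as the formula is copied down.
  3. **Identify Z-Scores for Anchors**: In I5:N8, apply `=VLOOKUP($H5, $A$9:$N$58, COLUMN(), FALSE)` to pull each anchor's z-scores. Because the lookup table starts in column A, `COLUMN()` returns the table column that lines up with the cell holding the formula (9 for column I through 14 for column N); adjust the offset if your layout differs.
  4. **Compute Squared Distances**: Use `=SUMXMY2($I$5:$N$5, $I10:$N10)` in O10 to calculate the squared distance from Albuquerque to the first cluster anchor. In P10, Q10, and R10, point the first argument at `$I$6:$N$6`, `$I$7:$N$7`, and `$I$8:$N$8` for the second, third, and fourth anchors. Then copy O10:R10 down to O58:R58.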
  5. **Find Minimum Distance**: Enter `=MIN(O10:R10)` in S10 and copy down to S58 to determine the closest cluster anchor for each city.
  6. **Sum of Squared Distances**: Calculate the total squared distance with `=SUM(S10:S58)` in S8.
  7. **Assign Clusters**: In T10, use `=MATCH(S10, O10:R10, 0)` and copy down to T58 to identify the cluster assignment for each city.
- **Troubleshooting Tips**:
  - Ensure cell references are correct and formulas are copied accurately.
  - If you use Solver to search for better anchors (minimizing S8 by changing H5:H8), verify that the Evolutionary Solver is selected with a Mutation rate of 0.5.
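
In symbols, steps 4 through 6 evaluate the quantity below for a trial set of anchors. The notation is introduced here for illustration: $z_i$ is the vector of z-scores for city $i$ (rows 10 through 58, so 49 cities), and $a_1, \dots, a_4$ are the anchor indices entered in H5:H8.

$$
\mathrm{SSD}(a_1, \dots, a_4) = \sum_{i=1}^{49} \min_{k \in \{1, \dots, 4\}} \lVert z_i - z_{a_k} \rVert^2
$$

This is the value in S8. Step 7 records, for each city, the $k$ that attains the minimum.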

#### Part B2 - Python+SQLite3 Practice:

In [None]:
import pandas as pd
import numpy as np
import sqlite3
from scipy.spatial.distance import cdist

# Load the Excel file
data = pd.read_excel('data/Clustermotivation.xlsx', sheet_name='Sheet1', skiprows=8, nrows=49, usecols='C:G')
data.columns = [c.replace(" ", "_") for c in data.columns]

# Standardize the data
z_scores = (data - data.mean()) / data.std()

# Save standardized data to SQLite
conn = sqlite3.connect('data/clustering.db')
z_scores.to_sql('cities', conn, if_exists='replace', index=False)

# Calculate squared distances to each trial anchor and assign every city
# to its nearest anchor (mirrors Excel steps 4, 5, and 7)
def assign_clusters(conn, trial_anchors):
    cities = pd.read_sql("SELECT * FROM cities", conn)
    anchors = cities.iloc[trial_anchors]  # positional (0-based) row indices
    distances = cdist(cities, anchors, 'sqeuclidean')
    closest_anchor = np.argmin(distances, axis=1) + 1  # 1-based cluster labels
    min_distances = np.min(distances, axis=1)
    return closest_anchor, min_distances

# Example: assign clusters using the first four cities as trial anchors
trial_anchors = [0, 1, 2, 3]  # 0-based, so these match Excel indices 1-4
closest_anchor, min_distances = assign_clusters(conn, trial_anchors)
print("Assigned Clusters:", closest_anchor)
print("Minimum Distances:", min_distances)
print("Sum of Squared Distances:", min_distances.sum())  # Excel cell S8

conn.close()
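
The cell above scores a single trial set of anchors. The following is a minimal sketch of one way to search for better anchors, using random restarts over anchor index sets on synthetic z-scores. It stands in for Excel's Evolutionary Solver and is not equivalent to it. The helper names `total_ssd` and `random_anchor_search` and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def total_ssd(z, anchors):
    """Sum over rows of the squared distance to the nearest anchor row."""
    d = cdist(z, z[list(anchors)], "sqeuclidean")
    return d.min(axis=1).sum()

def random_anchor_search(z, k=4, n_trials=500, seed=0):
    """Try random sets of k distinct row indices and keep the best one."""
    rng = np.random.default_rng(seed)
    best_anchors, best_ssd = None, np.inf
    for _ in range(n_trials):
        anchors = rng.choice(len(z), size=k, replace=False)
        ssd = total_ssd(z, anchors)
        if ssd < best_ssd:
            best_anchors, best_ssd = sorted(anchors.tolist()), ssd
    return best_anchors, best_ssd

# Synthetic stand-in for the standardized city table (49 rows, 5 attributes)
rng = np.random.default_rng(42)
z = rng.standard_normal((49, 5))

baseline = total_ssd(z, [0, 1, 2, 3])
anchors, ssd = random_anchor_search(z)
print("Baseline SSD for anchors [0, 1, 2, 3]:", round(baseline, 2))
print("Best anchors found:", anchors, "SSD:", round(ssd, 2))
```

Random search is used here only because it is short and easy to follow; it gives no guarantee of finding the best anchor set, and more trials or a different search strategy may do better.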


#### Part B1 - MS Excel Practice/Exercise/Steps:
- **Step-by-Step Guide for Conjoint Analysis in Excel**:
  1. Open the `CokePepsi.xlsx` file and navigate to the 'Conjoint Data' worksheet.
  2. For each customer (rows AC29:AW160), run a regression using the `LINEST` function:
     - Select a range with five rows and as many columns as the number of product attributes plus one.
     - Enter `=LINEST(J6:J25, K6:M25, TRUE, TRUE)` in the first cell of the selected range.
     - Press Control+Shift+Enter to apply the array formula.
  3. Create a one-way data table for customer numbers (AY11:AY130):
     - Enter customer numbers in AY11:AY130.
     - Copy `=R12` into AZ10 and extend to BA10:BB10.
     - Select the range AY10:BB130, go to Data > What-If Analysis > Data Table, and set $J$3 as the column input cell.
  4. Copy the regression results to the 'cluster' worksheet and run a cluster analysis with five clusters.
     - Use customers 1–5 as initial anchors for the clusters.
- **Troubleshooting Tips**:
  - Ensure the array formula is entered correctly with Control+Shift+Enter.
  - Verify the cell references match the data ranges in your worksheet.
  - Check for consistent use of absolute and relative cell references.
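
For readers who want to see what the `LINEST` call in step 2 computes, here is a minimal sketch of the same least-squares fit with NumPy. The 20-row, 3-column design matrix and the scores are synthetic stand-ins for K6:M25 and J6:J25, and the coefficient values are invented for illustration.

```python
import itertools
import numpy as np

# Synthetic stand-in for K6:M25: a full-factorial design with 20 profiles
# (two binary attributes and one five-level attribute)
X = np.array(list(itertools.product([0, 1], [0, 1], [1, 2, 3, 4, 5])), dtype=float)

true_coefs = np.array([2.0, -1.0, 0.5])  # invented for illustration
intercept = 10.0
y = intercept + X @ true_coefs           # noise-free scores, stand-in for J6:J25

# Add a column of ones so the fit includes an intercept (LINEST const = TRUE)
A = np.column_stack([X, np.ones(len(X))])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print("Attribute coefficients:", beta[:3].round(3))
print("Intercept:", beta[3].round(3))
```

Note that `LINEST` lists the coefficients in reverse column order with the intercept last, whereas this sketch keeps them in the same order as the attribute columns.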

#### Part B2 - Python+SQLite3 Practice:
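```python
import pandas as pd
import sqlite3
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
import numpy as np

XLSX = 'data/CokePepsi.xlsx'

# Load the customer block AC29:AW160 (usecols takes column letters only, so
# the rows are selected with skiprows/nrows). Assumption: the block has no
# header row, column AC holds the customer number, and AD:AW hold that
# customer's scores for the 20 product profiles.
data = pd.read_excel(XLSX, sheet_name='Conjoint Data', header=None,
                     usecols='AC:AW', skiprows=28, nrows=132)
data.columns = ['customer'] + [f'profile_{i}' for i in range(1, 21)]

# Load the 20 x 3 attribute matrix used by LINEST (K6:M25)
design = pd.read_excel(XLSX, sheet_name='Conjoint Data', header=None,
                       usecols='K:M', skiprows=5, nrows=20).to_numpy()

# Connect to SQLite3 database
conn = sqlite3.connect('CokePepsi.db')
data.to_sql('conjoint_data', conn, if_exists='replace', index=False)

# Run one regression per customer and return the attribute coefficients
def run_regressions(data, design):
    coefficients = []
    for _, row in data.iterrows():
        y = row.iloc[1:].to_numpy(dtype=float)  # the 20 profile scores
        model = LinearRegression().fit(design, y)
        coefficients.append(model.coef_)
    return np.array(coefficients)

# Run the regressions and get coefficients
coefficients = run_regressions(data, design)

# Cluster the coefficients, seeding with customers 1-5 as in the Excel step.
# KMeans moves its centers to cluster means, so results can differ from Excel.
kmeans = KMeans(n_clusters=5, init=coefficients[:5], n_init=1)
clusters = kmeans.fit_predict(coefficients)

# Output the cluster results
for i, cluster in enumerate(clusters):
    print(f"Customer {i+1} is in cluster {cluster+1}")

# Close the database connection
conn.close()
```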
- **Comments**:
  - The code loads the Excel data into a pandas DataFrame.
  - It then creates a SQLite3 database and imports the data.
  - A function is defined to run linear regressions for each customer.
  - KMeans clustering is performed on the regression coefficients.
  - The cluster for each customer is printed out.

