In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw3.ipynb")

# HW3: Choice-Based Conjoint Analysis

A sports car manufacturer conducted a conjoint survey where 200 respondents each answered 10 choice questions. In each question, respondents chose between 3 sports car configurations described by:

- **Seats**: 2, 4, or 5
- **Transmission**: Manual or Automatic
- **Convertible**: Yes or No
- **Price**: $30K, $35K, or $40K

The file `hw3_data.csv` contains the survey data, with additional dummy variables for 4-seat, 5-seat, and automatic transmission options.

In [None]:
import polars as pl
import numpy as np
from plotnine import ggplot, aes, geom_line, geom_point, geom_vline, labs, theme_minimal, theme
from xlogit import MultinomialLogit

In [None]:
df = pl.read_csv("hw3_data.csv")
df.head(10)

## Part 1: Model Estimation

We model choice probability using the multinomial logit:

$$P(\text{choice} = j) = \frac{e^{V_j}}{\sum_{k} e^{V_k}}$$

where $V_j = \beta_{seat4} \cdot \mathbb{1}[seat=4] + \beta_{seat5} \cdot \mathbb{1}[seat=5] + \beta_{auto} \cdot \mathbb{1}[auto] + \beta_{conv} \cdot \mathbb{1}[convertible] + \beta_{price} \cdot price$
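
As a minimal sketch of the formula above, the snippet below turns a list of utilities into choice probabilities. The utilities are hypothetical values chosen for illustration, not estimates from `hw3_data.csv`, and the helper name `choice_probabilities` is not part of the assignment.

```python
import math

def choice_probabilities(utilities):
    """Return multinomial logit probabilities for a list of utilities V_j."""
    # Subtracting the largest utility leaves the probabilities unchanged
    # but avoids overflow when exponentiating large values.
    v_max = max(utilities)
    exp_v = [math.exp(v - v_max) for v in utilities]
    denom = sum(exp_v)
    return [e / denom for e in exp_v]

# Hypothetical utilities for three alternatives (illustration only)
example_utilities = [0.5, 0.0, -1.0]
example_probs = choice_probabilities(example_utilities)
print([round(p, 3) for p in example_probs])
```

The probabilities sum to one, and only utility differences matter: adding the same constant to every $V_j$ leaves the result unchanged.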

To estimate the model, we will use the `xlogit` Python package. It requires a NumPy array of covariates `X`, a NumPy vector of choice indicators `choice` (each entry is 0 or 1), a list of variable names `varnames`, a NumPy vector of alternative IDs, and a NumPy vector of choice IDs (one ID per respondent and survey question).
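
To make the expected layout concrete, here is a small sketch of "long" format data, assuming one row per alternative per choice question. The rows are made-up values, and plain Python lists stand in for the NumPy arrays used in the notebook.

```python
# Hypothetical toy data: 2 choice questions with 3 alternatives each.
# Columns: choice_id, alt, seat4, seat5, auto, convert, price, choice
toy_rows = [
    (1, 1, 1, 0, 1, 0, 35, 0),
    (1, 2, 0, 1, 1, 1, 40, 1),
    (1, 3, 0, 0, 0, 0, 30, 0),
    (2, 1, 0, 0, 1, 1, 30, 1),
    (2, 2, 1, 0, 0, 0, 35, 0),
    (2, 3, 0, 1, 0, 1, 40, 0),
]

toy_choice_id = [row[0] for row in toy_rows]  # which question the row belongs to
toy_alt = [row[1] for row in toy_rows]        # which alternative within the question
toy_X = [list(row[2:7]) for row in toy_rows]  # covariates, in the order of varnames
toy_choice = [row[7] for row in toy_rows]     # 1 if chosen, 0 otherwise

# Sanity check: each choice question has exactly one chosen alternative.
chosen_per_question = {}
for cid, chosen in zip(toy_choice_id, toy_choice):
    chosen_per_question[cid] = chosen_per_question.get(cid, 0) + chosen
print(chosen_per_question)
```

A similar check on the real data can catch malformed rows before estimation.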

In [None]:
# Prepare data for estimation
varnames = ["seat4", "seat5", "auto", "convert", "price"]
X = df.select(varnames).to_numpy()
choice = df["choice"].to_numpy()
choice_id = df["choice_id"].to_numpy()
alt_id = df["alt"].to_numpy()

In [None]:
model = MultinomialLogit()
model.fit(X=X, y=choice, varnames=varnames, alts=alt_id, ids=choice_id, fit_intercept=False)

# Save the coefficients into a dictionary for later use
coefs = dict(zip(model.coeff_names, model.coeff_))

# Print the model summary
model.summary()

**Question 1 (5 pts):** Which of the estimated coefficients are statistically significant at the 1% level?

Mark true or false on each of the following options.

In [None]:
q1_answer = {
    "seat4_significant": ..., # True or False
    "seat5_significant": ..., # True or False
    "auto_significant": ..., # True or False
    "convertible_significant": ..., # True or False
    "price_significant": ... # True or False
}
q1_answer

## Part 2: Willingness to Pay

Now calculate the willingness to pay for each of the following features:
- 5 seats instead of 2 seats
- Automatic transmission instead of manual
- Convertible instead of non-convertible

Recall that willingness to pay involves the trade-off between how strongly you *value* a feature and how strongly you *dislike* a higher price.
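
The trade-off can be sketched numerically. Assuming utility is linear in price, as in $V_j$ above, the willingness to pay for a feature is the price increase that exactly offsets the utility the feature adds. The coefficients below are hypothetical and the helper name is illustrative only.

```python
def willingness_to_pay(beta_feature, beta_price):
    """Price increase (in the units of the price variable) that exactly
    offsets the utility gained from the feature."""
    if beta_price >= 0:
        raise ValueError("Expected a negative price coefficient.")
    # Solve beta_feature + beta_price * delta_price = 0 for delta_price.
    return -beta_feature / beta_price

# Hypothetical coefficients, not estimates from hw3_data.csv
example_wtp = willingness_to_pay(beta_feature=0.6, beta_price=-0.15)
print(f"Example WTP: ${example_wtp:.2f}K")
```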

**Question 2 (10 pts):** Calculate the willingness to pay for each feature (in $K). Store the results in `wtp_seat5`, `wtp_auto`, and `wtp_convertible`.

In [None]:
# You can access the estimated coefficients through the `coefs` dictionary, e.g. coefs["price"]

wtp_seat5 = ...
wtp_auto = ...
wtp_convertible = ...

print(f"WTP for convertible: ${wtp_convertible:.2f}K")
print(f"WTP for automatic transmission: ${wtp_auto:.2f}K")
print(f"WTP for 5 seats: ${wtp_seat5:.2f}K")

In [None]:
grader.check("q2")

## Part 3: Market Share and Demand

The power of this model is that we can now flexibly predict market shares and demand for different product configurations and prices.

Recall from class that the market share for product $j$ is:

$$Share_j = \frac{e^{V_j}}{\sum_{k} e^{V_k}}$$

(Note that the outside option is not included in this formula since we are only interested in market shares among the offered products, which is consistent with our survey design.)
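
To see what leaving out the outside option implies, the sketch below compares shares with and without a no-purchase alternative. It assumes the outside option's utility is normalized to 0, a common convention that is not estimated in this assignment, and uses hypothetical utilities.

```python
import math

def logit_shares(utilities, include_outside_option=False):
    """Logit shares; optionally add a no-purchase option with utility 0."""
    exp_v = [math.exp(v) for v in utilities]
    # exp(0) = 1 is the outside option's contribution to the denominator.
    denom = sum(exp_v) + (1.0 if include_outside_option else 0.0)
    return [e / denom for e in exp_v]

example_v = [0.2, -0.4]  # hypothetical utilities for two offered products
inside_only = logit_shares(example_v)
with_outside = logit_shares(example_v, include_outside_option=True)

print("Without outside option:", [round(s, 3) for s in inside_only])
print("With outside option:   ", [round(s, 3) for s in with_outside])
```

Without the outside option the shares of the offered products sum to one, so every potential buyer is assumed to purchase one of them. With it, the shares sum to less than one, while the ratio between the two products' shares stays the same.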

For convenience, the function below computes market shares for a given set of product attributes and prices.

In [None]:
def utility(attributes, price, coefs):
    """Calculate utility for a product configuration."""
    assert isinstance(attributes, dict), "Attributes must be provided as a dictionary."
    assert attributes.keys() == {"seat", "trans", "convertible"}, "Attributes dictionary must contain 'seat', 'trans', and 'convertible' keys."
    assert attributes["seat"] in [2, 4, 5], "Seat attribute must be 2, 4, or 5."
    assert attributes["trans"] in ["auto", "manual"], "Transmission attribute must be 'auto' or 'manual'."
    assert attributes["convertible"] in ["yes", "no"], "Convertible attribute must be 'yes' or 'no'."    
    v = coefs["price"] * price
    if attributes["seat"] == 4: v += coefs["seat4"]
    elif attributes["seat"] == 5: v += coefs["seat5"]
    if attributes["trans"] == "auto": v += coefs["auto"]
    if attributes["convertible"] == "yes": v += coefs["convert"]
    return v

def market_shares(attributes, prices, coefs):
    """Calculate market shares among the offered products (no outside option)."""
    exp_v = [np.exp(utility(attr, price, coefs=coefs)) for attr, price in zip(attributes, prices)]
    denom = sum(exp_v)
    return [e / denom for e in exp_v]

For example, we can compute market shares for the following three products:
1. 4-seat, automatic, non-convertible, $35K
2. 5-seat, automatic, convertible, $40K
3. 2-seat, manual, non-convertible, $32K


In [None]:
attributes = [
    {"seat": 4, "trans": "auto", "convertible": "no"},
    {"seat": 5, "trans": "auto", "convertible": "yes"},
    {"seat": 2, "trans": "manual", "convertible": "no"},
]

prices = [35, 40, 32]

shares = market_shares(attributes, prices, coefs=coefs)
for i, share in enumerate(shares):
    print(f"Product {i+1} market share: {share*100:.2f}%")

## Part 4: Optimal Pricing

You are launching the **Thunderbolt**, a new sports car:
- 4 seats, automatic, non-convertible
- Marginal cost: $25K

The competitor **Speedster** is already in the market:
- 2 seats, manual, convertible
- Price: $35K (fixed)

Total market size: 10,000 potential buyers.



In [None]:
thunderbolt_attrs = {"seat": 4, "trans": "auto", "convertible": "no"}
speedster_attrs = {"seat": 2, "trans": "manual", "convertible": "yes"}
speedster_price = 35 # In thousands of dollars

marginal_cost = 25 # In thousands of dollars
market_size = 10_000
thunderbolt_prices = np.arange(26, 51)

**Question 3 (10 pts):** Compute the demand for the Thunderbolt at each price point from $26K to $50K in increments of $1K. Remember that the price variable we have been using is already in thousands of dollars.

Store the result in `thunderbolt_demand` as a numpy array.

You can use the `market_shares` function defined above to compute the Thunderbolt's market share at each price point, then multiply by the total market size to get demand. The function takes a list of attribute dictionaries and a list of prices, so pass the Thunderbolt and Speedster attributes together with each candidate Thunderbolt price and the fixed Speedster price.

In [None]:
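def demand(thunderbolt_prices, thunderbolt_attrs, speedster_attrs, speedster_price, coefs=coefs):
    """Return Thunderbolt demand (in units) at each candidate price.

    Uses the global `market_size` defined above.
    """
    # Named `units` so the local array does not shadow the function name
    units = np.zeros(thunderbolt_prices.shape)
    for i, price in enumerate(thunderbolt_prices):
        attributes = [
            thunderbolt_attrs,
            speedster_attrs
        ]
        price_list = ...
        shares = ...
        thunderbolt_share = shares[0]  # Thunderbolt is listed first
        units[i] = thunderbolt_share * market_size
    return units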

thunderbolt_demand = ...

In [None]:
# Plot the demand curve
demand_df = pl.DataFrame({
    'Price': thunderbolt_prices,
    'Demand': thunderbolt_demand
})

(
    ggplot(demand_df, aes(x='Price', y='Demand'))
    + geom_line()
    + geom_point()
    + labs(title="Demand Curve for Thunderbolt", x="Price ($K)", y="Demand (units)")
    + theme_minimal()
    + theme(figure_size=(10, 6))
)

In [None]:
grader.check("q3")

**Question 4 (10 pts):** Compute the profit (in $K) at each price point and store it in `thunderbolt_profit` as a NumPy array. Then find the profit-maximizing price and store it in `optimal_price`.

In [None]:
thunderbolt_profit = ...
optimal_price = ...

In [None]:
# Plot the profit curve and the optimal price point
profit_df = pl.DataFrame({
    'Price': thunderbolt_prices,
    'Profit': thunderbolt_profit / 1000
})

(
    ggplot(profit_df, aes(x='Price', y='Profit'))
    + geom_line()
    + geom_point()
    + geom_vline(xintercept=optimal_price, linetype='dashed', color='red')
    + labs(title="Profit Curve for Thunderbolt", x="Price ($K)", y="Profit ($M)")
    + theme_minimal()
    + theme(figure_size=(10, 6))
)

In [None]:
grader.check("q4")

## Part 5: Product Design Decision

The head of engineering at the company has proposed making the Thunderbolt a convertible. This would increase the marginal cost and require an upfront investment, but could also increase demand.

Here are the details:
- Additional marginal cost: $700 per unit
- Upfront investment: $8M for retooling and R&D
- Planning horizon: 3 years at 10,000 buyers/year

As head of pricing, you are asked for your input on this business decision.

In [None]:
thunderbolt_conv_attrs = {"seat": 4, "trans": "auto", "convertible": "yes"}
mc_convertible = 25.7  # $25K + $700
upfront = 8000  # $8M in $K
years = 3

**Question 5 (10 pts):** Repeat the demand calculation for the convertible Thunderbolt at each price point from $26K to $50K. Store in `thunderbolt_convertible_demand`.

In [None]:
# You can re-use the demand function from above
thunderbolt_convertible_demand = ...

In [None]:
# Plot the demand curve for the convertible Thunderbolt against the non-convertible version
demand_compare_df = pl.DataFrame({
    'Price': np.tile(thunderbolt_prices, 2),
    'Demand': np.concatenate([thunderbolt_demand, thunderbolt_convertible_demand]),
    'Type': ['Non-convertible'] * len(thunderbolt_prices) + ['Convertible'] * len(thunderbolt_prices)
})

(
    ggplot(demand_compare_df, aes(x='Price', y='Demand', color='Type'))
    + geom_line()
    + geom_point()
    + labs(title="Demand Curve for Thunderbolt: Convertible vs Non-convertible", x="Price ($K)", y="Demand (units)")
    + theme_minimal()
    + theme(figure_size=(10, 6))
)

In [None]:
grader.check("q5")

**Question 6 (10 pts):** Compute the profit curve and profit-maximizing price for the convertible Thunderbolt. Store in `thunderbolt_conv_profit` and `optimal_conv_price`.

In [None]:
# Compute the profit curve and the optimal price for the convertible Thunderbolt
thunderbolt_conv_profit = ...
optimal_conv_price = ...

In [None]:
# Plot the profit curves for the convertible and non-convertible Thunderbolt before upfront cost
profit_compare_df = pl.DataFrame({
    'Price': np.tile(thunderbolt_prices, 2),
    'Profit': np.concatenate([thunderbolt_profit / 1000, thunderbolt_conv_profit / 1000]),
    'Type': ['Non-convertible'] * len(thunderbolt_prices) + ['Convertible'] * len(thunderbolt_prices)
})

vlines_df = pl.DataFrame({
    'xintercept': [optimal_price, optimal_conv_price],
    'Type': ['Non-convertible', 'Convertible']
})

(
    ggplot(profit_compare_df, aes(x='Price', y='Profit', color='Type'))
    + geom_line()
    + geom_point()
    + geom_vline(aes(xintercept='xintercept', color='Type'), data=vlines_df, linetype='dashed')
    + labs(title="Profit Curves for Thunderbolt: Convertible vs Non-convertible (before upfront cost)", x="Price (in $K)", y="Profit ($M)")
    + theme_minimal()
    + theme(figure_size=(10, 6))
)

In [None]:
grader.check("q6")

**Question 7 (5 pts):** Compute the difference in 3-year total profit (after upfront investment) between the convertible and non-convertible versions. Is the convertible upgrade worth the investment? Set `convertible_worth_it` to `True` or `False`.

In [None]:
profit_diff = ...
convertible_worth_it = ...

In [None]:
grader.check("q7")

## Part 6: Segment-Specific Analysis


The survey classified respondents into three segments: "basic", "fun", and "racer". The executive team has decided to target only the "fun" segment with the Thunderbolt. This segment represents approximately 2,550 potential buyers in the market.

In [None]:
# Check segment distribution
df.filter(pl.col("alt") == 1).group_by("segment").len()

In [None]:
market_size_fun = market_size * (df['segment'] == 'fun').mean()

**Question 8 (10 pts):** Re-estimate the model using only the "fun" segment. Store the coefficients in `coefs_fun` and calculate the WTP for convertible (`wtp_conv_fun`).

In [None]:
df_fun = df.filter(pl.col("segment") == "fun")

model_fun = MultinomialLogit()
model_fun.fit(
    X=df_fun.select(varnames).to_numpy(),
    y=df_fun["choice"].to_numpy(),
    varnames=varnames,
    alts=df_fun["alt"].to_numpy(),
    ids=df_fun["choice_id"].to_numpy(),
    fit_intercept=False
)
model_fun.summary()

coefs_fun = ...
wtp_conv_fun = ...

print(f"\nWTP for convertible (fun segment): ${wtp_conv_fun:.2f}K")
print(f"WTP for convertible (full sample): ${wtp_convertible:.2f}K")

In [None]:
grader.check("q8")

**Question 9 (10 pts):** Using the "fun" segment coefficients, find the optimal prices for both non-convertible and convertible Thunderbolts. Store results in:
- `optimal_price_fun`: optimal price for non-convertible (using fun coefs)
- `optimal_conv_price_fun`: optimal price for convertible (using fun coefs)

In [None]:
# Compute demand, profit, and optimal price for the fun segment
thunderbolt_demand_fun = ...
profits_fun = ...
optimal_price_fun = ...

# Compute demand, profit, and optimal price for the fun segment convertible
thunderbolt_convertible_demand_fun = ...
thunderbolt_conv_profit_fun = ...
optimal_conv_price_fun = ...

In [None]:
# Plot the new profit curves under the fun segment demand for the convertible and non-convertible Thunderbolt before upfront cost
profit_fun_compare_df = pl.DataFrame({
    'Price': np.tile(thunderbolt_prices, 2),
    'Profit': np.concatenate([profits_fun / 1000, thunderbolt_conv_profit_fun / 1000]),
    'Type': ['Non-convertible'] * len(thunderbolt_prices) + ['Convertible'] * len(thunderbolt_prices)
})

vlines_fun_df = pl.DataFrame({
    'xintercept': [optimal_price_fun, optimal_conv_price_fun],
    'Type': ['Non-convertible', 'Convertible']
})

(
    ggplot(profit_fun_compare_df, aes(x='Price', y='Profit', color='Type'))
    + geom_line()
    + geom_point()
    + geom_vline(aes(xintercept='xintercept', color='Type'), data=vlines_fun_df, linetype='dashed')
    + labs(title="Within Fun Segment Profit: Convertible vs Non-convertible (before upfront cost)", x="Price (in $K)", y="Profit ($M)")
    + theme_minimal()
    + theme(figure_size=(10, 6))
)

In [None]:
grader.check("q9")

**Question 10 (5 pts):** Compute the difference in 3-year total profit (after upfront investment) between the convertible and non-convertible versions with the **fun segment demand**. Is the convertible upgrade worth the investment? Set `convertible_worth_it_fun_segment` to `True` or `False`.

In [None]:
profit_diff_fun = ...
convertible_worth_it_fun_segment = ...

In [None]:
grader.check("q10")

<!-- BEGIN QUESTION -->

**Question 11 (10 pts, Open-Ended):** The convertible upgrade was not worth it using the full sample, but became profitable when targeting the "fun" segment. Explain why accounting for preference heterogeneity across customer segments is important for pricing and product design decisions.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 12 (5 pts, Open-Ended):** What are some limitations of using the above choice-based conjoint analysis for these pricing decisions?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Once you have the zip file, upload the **entire** zip file to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)