# Exploring Conjoint Data

Welcome to day three of the _Conjoint Analysis for Retail Business_ webinar series.

As an overview, today we will cover:
* exploring conjoint data from the car retail industry
* exploring the data and its attributes

## _Prerequisites_

First, we need the following Python libraries.

In [None]:
from itertools import product

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Data Exploration

Here we use conjoint data from a car retail company.

```{note}
Download the data from the following link: [🔗](https://drive.google.com/drive/folders/1rfwS32TI0GnChKLkJamBmzwDDvwbEa-I?usp=sharing)
```

Before building a model, let's explore the data first.

In [None]:
sportscar = pd.read_csv("../../data/sportscar_choice/sportscar_choice_long.csv")
sportscar

```{dropdown} The data above is in ... format
* long
* <s>wide</s>
```
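
To make the long versus wide distinction concrete, here is a minimal sketch on made-up data. The frame `toy_long` and its values are hypothetical and are not taken from the sportscar file; only the column names mirror it.

```python
import pandas as pd

# Hypothetical toy data: one respondent, one question, two alternatives.
# Long format: one row per (respondent, question, alternative).
toy_long = pd.DataFrame({
    "resp_id": [1, 1],
    "ques": [1, 1],
    "alt": [1, 2],
    "price": [30, 40],
    "choice": [0, 1],
})

# Wide format: one row per (respondent, question),
# with one column per (variable, alternative) pair.
toy_wide = toy_long.pivot(
    index=["resp_id", "ques"], columns="alt", values=["price", "choice"]
)
toy_wide.columns = [f"{var}_{alt}" for var, alt in toy_wide.columns]
toy_wide = toy_wide.reset_index()

print(toy_long)
print(toy_wide)
```

The long frame has two rows for the single choice task, while the wide frame holds the same information in one row.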

How many respondents are there?

In [None]:
sportscar.resp_id.nunique()

How many question sets were asked?

In [None]:
sportscar.ques.nunique()

How many alternatives are there in one question?

In [None]:
sportscar.alt.nunique()

How many choices were made in the survey?

In [None]:
sportscar.choice.value_counts()
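
The four counts above can be cross-checked against each other. The sketch below does this on a hypothetical toy design, not on the real file, and assumes a balanced survey in which every respondent sees every question with the same number of alternatives and picks exactly one per question.

```python
from itertools import product

import pandas as pd

# Hypothetical toy design: 2 respondents x 2 questions x 3 alternatives.
rows = list(product([1, 2], [1, 2], [1, 2, 3]))
toy = pd.DataFrame(rows, columns=["resp_id", "ques", "alt"])
# Made-up choices: alternative 1 is picked in every question.
toy["choice"] = (toy["alt"] == 1).astype(int)

n_resp = toy["resp_id"].nunique()
n_ques = toy["ques"].nunique()
n_alt = toy["alt"].nunique()

# In a balanced long-format file, rows = respondents x questions x alternatives.
is_balanced = len(toy) == n_resp * n_ques * n_alt
# Each choice task contributes exactly one row with choice == 1.
n_choices = int(toy["choice"].sum())

print(n_resp, n_ques, n_alt, is_balanced, n_choices)
```

If the same identities hold for `sportscar`, the number of rows with `choice == 1` should equal the number of respondents times the number of questions.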

### Levels and Attributes

How many attributes does the data contain, and how many levels does each attribute have?

```{figure} ../../assets/sample-sportscar.png
:name: sample-sportscar
The first 5 rows of the data.
```

In [None]:
sportscar.describe(include="all")

In [None]:
sportscar.segment.value_counts()

In [None]:
sportscar.seat.value_counts()

In [None]:
sportscar.trans.value_counts()

In [None]:
sportscar.convert.value_counts()

In [None]:
sportscar.price.value_counts()

The data contains **5 attributes**, with the following levels:
* `segment`: basic, fun, racer
* `seat`: 2, 4, 5
* `trans`: auto, manual
* `convert`: yes, no
* `price`: 30, 35, 40

Next, let's look at what respondents chose for each of these attributes. To do so, we count how many times a car with a given feature was chosen, that is, the rows with `choice = 1`.

In [None]:
chosen_by_trans = pd.crosstab(sportscar.choice, sportscar.trans)
chosen_by_segment = pd.crosstab(sportscar.choice, sportscar.segment)
chosen_by_seat = pd.crosstab(sportscar.choice, sportscar.seat)
chosen_by_convert = pd.crosstab(sportscar.choice, sportscar.convert)
chosen_by_price = pd.crosstab(sportscar.choice, sportscar.price)

In [None]:
chosen_by_trans

In [None]:
chosen_by_segment

In [None]:
chosen_by_seat

In [None]:
chosen_by_convert

In [None]:
chosen_by_price

In [None]:
chosen_by_trans.loc[1].plot(kind="bar")
plt.show()

In [None]:
chosen_by_segment.loc[1].plot(kind="bar")
plt.show()

In [None]:
chosen_by_seat.loc[1].plot(kind="bar")
plt.show()

In [None]:
chosen_by_convert.loc[1].plot(kind="bar")
plt.show()

In [None]:
chosen_by_price.loc[1].plot(kind="bar")
plt.show()
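
Raw counts of chosen cars depend on how often each level was shown. As a complementary view, the sketch below computes the share of times a level was chosen among the times it appeared, using `pd.crosstab` with `normalize="columns"`. The toy data is hypothetical and is built so that `manual` appears twice as often as `auto`.

```python
import pandas as pd

# Hypothetical toy data: "manual" is shown 4 times, "auto" only 2 times.
toy = pd.DataFrame({
    "trans": ["auto", "auto", "manual", "manual", "manual", "manual"],
    "choice": [1, 0, 1, 1, 0, 0],
})

# Raw counts: "manual" looks twice as popular as "auto".
counts = pd.crosstab(toy["choice"], toy["trans"])

# normalize="columns" divides each column by its own total,
# giving the share of appearances in which the level was chosen.
shares = pd.crosstab(toy["choice"], toy["trans"], normalize="columns")

print(counts.loc[1])
print(shares.loc[1])
```

In this toy case the counts differ while the shares are equal, which is why it can help to look at both before interpreting the bar charts.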

```{attention}
What can you take away from the visualizations above?
```

### Profiles

Let's check whether every respondent received question sets with the same alternatives.

In [None]:
sportscar[
    sportscar.duplicated(
        subset=["segment", "trans", "seat", "convert", "price"],
        keep="first"
    )
].sort_values(["resp_id", "ques", "alt"])

In [None]:
sportscar[
    (sportscar.segment == "basic") & (sportscar.seat == 2)
    & (sportscar.trans == "auto") & (sportscar.convert == "no")
    & (sportscar.price == 30)
].sort_values("resp_id")

In [None]:
sportscar[
    (sportscar.segment == "basic") & (sportscar.seat == 2)
    & (sportscar.trans == "auto") & (sportscar.convert == "no")
    & (sportscar.price == 30)
].choice.sum()

In [None]:
sportscar[sportscar.resp_id.isin([188])]

In [None]:
sportscar[
    (sportscar.segment == "fun") & (sportscar.seat == 2)
    & (sportscar.trans == "manual") & (sportscar.convert == "no")
    & (sportscar.price == 30)
].sort_values("resp_id")

In [None]:
sportscar[
    (sportscar.segment == "fun") & (sportscar.seat == 2)
    & (sportscar.trans == "manual") & (sportscar.convert == "no")
    & (sportscar.price == 30)
].shape

In [None]:
sportscar[
    (sportscar.segment == "fun") & (sportscar.seat == 2)
    & (sportscar.trans == "manual") & (sportscar.convert == "no")
    & (sportscar.price == 30)
].choice.sum()

In [None]:
sportscar[
    (sportscar.segment == "basic") & (sportscar.seat == 5)
    & (sportscar.trans == "auto") & (sportscar.convert == "yes")
    & (sportscar.price == 30)
].sort_values("resp_id")

In [None]:
sportscar[
    (sportscar.segment == "basic") & (sportscar.seat == 5)
    & (sportscar.trans == "auto") & (sportscar.convert == "yes")
    & (sportscar.price == 30)
].choice.sum()

In [None]:
sportscar[
    (sportscar.resp_id == 4) & (sportscar.ques == 10)
]

In [None]:
sportscar[
    (sportscar.resp_id == 197) & (sportscar.ques == 1)
]
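
Instead of filtering one profile at a time, the same questions can be asked of all profiles at once. The sketch below first enumerates the full factorial of the five attributes listed earlier, then counts how often each profile was shown and chosen. The frame `toy` is a hypothetical stand-in for `sportscar`, and the column names `times_shown` and `times_chosen` are illustrative.

```python
from itertools import product

import pandas as pd

# Attribute levels as listed in the "Levels and Attributes" section.
levels = {
    "segment": ["basic", "fun", "racer"],
    "seat": [2, 4, 5],
    "trans": ["auto", "manual"],
    "convert": ["yes", "no"],
    "price": [30, 35, 40],
}

# Every possible profile: 3 * 3 * 2 * 2 * 3 = 108 combinations.
full_factorial = pd.DataFrame(list(product(*levels.values())), columns=list(levels))
print(len(full_factorial))

# Hypothetical toy rows standing in for `sportscar`.
toy = pd.DataFrame({
    "segment": ["basic", "basic", "fun"],
    "seat": [2, 2, 2],
    "trans": ["auto", "auto", "manual"],
    "convert": ["no", "no", "no"],
    "price": [30, 30, 30],
    "choice": [1, 0, 1],
})

# For each distinct profile: how often it was shown and how often it was chosen.
profile_stats = (
    toy.groupby(list(levels))["choice"]
    .agg(times_shown="count", times_chosen="sum")
    .reset_index()
)
print(profile_stats)
```

Comparing the number of distinct profiles in the real data with the size of the full factorial shows how much of the design space the survey covers.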

```{attention}
What can you learn from the profile exploration above?
```

### Inspecting Respondents' Choices

Having explored how respondents chose with respect to each attribute, let's now inspect the choices of a few individual respondents.

In [None]:
sportscar[sportscar.resp_id == 50]

In [None]:
resp_50 = sportscar[sportscar.resp_id == 50]
chosen_by_trans = pd.crosstab(resp_50.choice, resp_50.trans)
chosen_by_segment = pd.crosstab(resp_50.choice, resp_50.segment)
chosen_by_seat = pd.crosstab(resp_50.choice, resp_50.seat)
chosen_by_convert = pd.crosstab(resp_50.choice, resp_50.convert)
chosen_by_price = pd.crosstab(resp_50.choice, resp_50.price)

In [None]:
chosen_by_trans.loc[1].plot(kind="bar")
plt.show()
chosen_by_segment.loc[1].plot(kind="bar")
plt.show()
chosen_by_seat.loc[1].plot(kind="bar")
plt.show()
chosen_by_convert.loc[1].plot(kind="bar")
plt.show()
chosen_by_price.loc[1].plot(kind="bar")
plt.show()

In [None]:
resp_50[resp_50.price == 40]

Let's check how many choices respondent 50 made.

In [None]:
sportscar[sportscar.resp_id == 50].choice.sum()

In [None]:
sportscar.groupby("resp_id").agg({"choice": "sum"}).value_counts()
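
A stricter version of this check looks at each choice task rather than each respondent. The sketch below assumes that a valid task has exactly one row with `choice == 1`. It runs on hypothetical toy data in which respondent 2 leaves question 2 without a chosen alternative.

```python
import pandas as pd

# Hypothetical toy data: respondent 2 has no chosen alternative in question 2.
toy = pd.DataFrame({
    "resp_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "ques":    [1, 1, 2, 2, 1, 1, 2, 2],
    "alt":     [1, 2, 1, 2, 1, 2, 1, 2],
    "choice":  [1, 0, 0, 1, 0, 1, 0, 0],
})

# Number of chosen alternatives per (respondent, question) task.
per_task = toy.groupby(["resp_id", "ques"])["choice"].sum()
# Tasks that break the "exactly one choice" assumption.
invalid_tasks = per_task[per_task != 1]

# Number of choices per respondent, as in the cell above.
per_resp = toy.groupby("resp_id")["choice"].sum()

print(invalid_tasks)
print(per_resp.value_counts())
```

An empty `invalid_tasks` on the real data would indicate that every respondent answered every question with a single choice.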

## Conjoint Data from Qualtrics

Next, we explore conjoint survey data that **Qualtrics** provides as a _template_.

```{note}
Download the data from the following link:
[🔗](https://drive.google.com/drive/folders/1a_coQ8Ek1jCjghmtPw9g4jBUcaCzkJoo?usp=sharing)
```

In [None]:
qualtrics = pd.read_csv("../../data/Sample Product Optimization (Conjoint) - Numeric.csv", header=0, skiprows=[1, 2])
qualtrics

In [None]:
qualtrics.info()

In [None]:
qualtrics.iloc[:, [0, 1, 2, 3, 4, 5, 6, 7, 8, 21, 22, 23, 24, 25, 26, 27]]

In [None]:
questions = [q for ques in range(1, 8) for q in [ques] * 2]
alternatives = [1, 2] * 7
profiles = {
    "battery_life": [9, 11, 11, 9, 7, 7, 9, 11, 7, 7, 13, 13, 7, 13],
    "camera": ["great", "best", "good", "good", "great", "great", "best", "great", "great", "good", "good", "best", "best", "great"],
    "front_camera": ["notch", "cut-out", "cut-out", "under display", "notch", "notch", "under display", "under display", "cut-out", "cut-out", "notch", "under display", "cut-out", "cut-out"],
    "biometric": ["face", "face+fingerprint", "face", "face+fingerprint", "fingerprint", "face+fingerprint", "face", "face", "face", "fingerprint", "face+fingerprint", "face", "fingerprint", "face"],
    "price": [900, 700, 500, 900, 700, 900, 1100, 1100, 1100, 500, 1300, 1300, 1100, 1300]
}
surveys = pd.DataFrame({"ques": questions, "alt": alternatives, **profiles})
surveys
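
One possible next step is to combine a design table like `surveys` with the respondents' answers to obtain the same long format as the sportscar data. The sketch below is only an illustration: the response columns `Q1` and `Q2` and the respondent ids are hypothetical and do not reflect the actual column names in the Qualtrics export. The `design` frame reuses the prices of the first two questions in `surveys`.

```python
import pandas as pd

# Design excerpt: the first two questions of `surveys`, price attribute only.
design = pd.DataFrame({
    "ques": [1, 1, 2, 2],
    "alt": [1, 2, 1, 2],
    "price": [900, 700, 500, 900],
})

# Hypothetical wide responses: column "Q<k>" stores the alternative
# picked in question k. Real Qualtrics column names will differ.
responses = pd.DataFrame({"resp_id": [101, 102], "Q1": [1, 2], "Q2": [2, 2]})

# Wide to long: one row per (respondent, question) with the picked alternative.
picked = responses.melt(id_vars="resp_id", var_name="ques", value_name="picked_alt")
picked["ques"] = picked["ques"].str.lstrip("Q").astype(int)

# Attach every alternative of each question, then flag the picked one.
long_df = picked.merge(design, on="ques")
long_df["choice"] = (long_df["alt"] == long_df["picked_alt"]).astype(int)
long_df = (
    long_df.drop(columns="picked_alt")
    .sort_values(["resp_id", "ques", "alt"])
    .reset_index(drop=True)
)
print(long_df)
```

The result has one row per respondent, question, and alternative, with `choice` equal to 1 on the picked alternative, so the exploration steps used for the sportscar data can be applied to it unchanged.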