<div class="alert alert-info">

## Information
- Your solutions to this exercise need to be **submitted by the end of the week**. 
</div>

Consider the following starter code. For each car make (Alfa-Romeo, Audi, BMW, ...), it computes the proportion of cars with each exterior color (Beige, Black, Blue, ...) and visualizes the result as a stacked bar chart.

In [140]:
import pandas as pd
import plotly.express as px

In [141]:
df = pd.read_csv("used_cars.csv")

In [142]:
make_color = pd.crosstab(df.make, df.exterior_color, normalize='index')
plotdata = make_color.stack().reset_index(name='proportion')
px.bar(plotdata, x='make', y='proportion', color='exterior_color', barmode='stack', title="Car colors by make")

# Exercise 1

This visualization is intentionally overwhelming and ugly. In this exercise, list its weaknesses. Where possible, use terminology from the course (e.g. "proximity", "alignment", "pre-attentive attributes").

- Figure-Ground Separation
    - there is basically no white space
    - apart from the white borders of the image, no background is distinguishable from the figure
    - way too much information is crowded into the figure

- Pre-Attentive Attributes
    - no pre-attentive attribute guides the eye
    - with so many different colors for so many different car makes, no single color or pattern pops out

- Encoding Effectiveness
    - too many car colors to encode each with its own clearly distinguishable color
    - y-axis runs from 0 to 1 $\rightarrow$ hard to interpret; it is not immediately clear what share of cars has a given color

- Proximity
    - legend is too far away, especially from the left part of the figure $\rightarrow$ requires a lot of eye movement

- Similarity
    - color encoding does not match the actual car color (e.g. black cars are shown in red)
    - the figure uses too many colors, which makes them hard to distinguish and assigns very similar hues to car colors that are not necessarily related

- Color Choice
    - color encoding does not match the actual car color
    - similar colors suggest a relationship that does not necessarily exist
    - for colorblind viewers, even more of the colors probably look alike and are harder to distinguish

- Alignment
    - rotated x-axis labels are hard to read

- Other things
    - legend is not fully visible and has to be scrolled
    - y-axis shows proportions but not % values
    - legend order is reversed compared with the stacking order of the bars
    - no hierarchy: all brands have equal visual weight, so there is no storytelling


# Exercise 2

**Try to make this visualization truthful, useful and beautiful.** For this, you need to find out what is interesting about the data, and redesign the visualization accordingly.

- **Note 1**: you are fully free in your choices. For example, you can
    - change the data processing (e.g. grouping or filtering of colors or car makes)
    - change the plot type
    - change colors, title, layout, fonts, etc. 
    - add annotations
    - etc.
- **Note 2**: Do not only submit your final visualization, but also your initial and intermediate steps and visualizations. This way, I can see your thought process, and give feedback on it.
- **Note 3**: You don’t need to get everything perfect. What matters most is the thinking process, and that you move toward a clearer, more effective result.


#### Preprocessing

In [219]:
import numpy as np
import plotly.graph_objects as go
import cmocean as cm

In [143]:
df.head()

Unnamed: 0,make,model,subtitle,zip_code,longitude,latitude,price,body_type,fuel_type,power,...,environment_badge,exterior_color,exterior_color_detail,interior_color,seats,price_label,seller_type,boost_level,relevance_adjustment,position
0,volkswagen,amarok,3.0 TDI Aventura 4Motion MATRIX-LED,25337,9.69682,53.74876,44900,SUV/Geländewagen/Pickup,Diesel,177.0,...,4 (Grün),Black,Midnight black,Black,5.0,good-price,Commercial,t40,boost,291
1,volkswagen,amarok,Life DC 2.0 TDI 4Motion AUT LED AHK Navi,22761,9.89825,53.56986,43889,Allrad,Diesel,151.0,...,,Metallic,Schwarz,Keine Angabe,,good-price,Commercial,t40,boost,363
2,volkswagen,amarok,Life DC 2.0 TDI 4Motion,24941,9.42538,54.76741,48890,SUV/Geländewagen/Pickup,Diesel,151.0,...,4 (Grün),White,Clear White,Black,5.0,fair-price,Commercial,t50,boost,182
3,volkswagen,amarok,"Pan Americana 3.0 TDI*STANDHZG,LEDER,H&K,AHK*",22761,9.91817,53.56846,64950,SUV/Geländewagen/Pickup,Diesel,177.0,...,4 (Grün),,Midnight Black,Black,,somewhat-expensive,Commercial,t50,boost,131
4,volkswagen,arteon,2.0 TDI Elegance DSG Navi App-Connect SHZ,23560,10.68267,53.84972,24700,Coupé,Diesel,140.0,...,4 (Grün),Black,Deep black perleffekt,Black,5.0,top-price,Commercial,t50,boost,58


In [144]:
df.columns

Index(['make', 'model', 'subtitle', 'zip_code', 'longitude', 'latitude',
       'price', 'body_type', 'fuel_type', 'power', 'gearbox', 'age', 'mileage',
       'fahrzeughalter', 'service_book', 'smoke_free', 'cylinders',
       'engine_size', 'weight', 'co2_emissions', 'environment_badge',
       'exterior_color', 'exterior_color_detail', 'interior_color', 'seats',
       'price_label', 'seller_type', 'boost_level', 'relevance_adjustment',
       'position'],
      dtype='object')

In [145]:
df.make.unique()

array(['volkswagen', 'ford', 'mercedes-benz', 'audi', 'opel', 'renault',
       'bmw', 'skoda', 'hyundai', 'volvo', 'peugeot', 'kia', 'fiat',
       'toyota', 'nissan', 'cupra', 'dacia', 'seat', 'citroen', 'honda',
       'mitsubishi', 'land-rover', 'smart', 'porsche', 'suzuki', 'mini',
       'jeep', 'mazda', 'mg', 'ferrari', 'maserati', 'chevrolet', 'tesla',
       'dodge', 'lexus', 'ds-automobiles', 'iveco', 'alfa-romeo',
       'jaguar', 'subaru'], dtype=object)

In [146]:
df.exterior_color.unique()

array(['Black', 'Metallic', 'White', nan, 'Gray', 'Blue', 'Red', 'Brown',
       'Beige', 'Silver', 'Green', 'Yellow', 'Other', 'Purple', 'Orange',
       'Gold', 'Bronze', '€ 0,-'], dtype=object)

In [147]:
# drop rows with exterior_color '€ 0,-'
df = df[(~df['exterior_color'].isin(['€ 0,-']))]
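
Note that `isin` never matches missing values, so the filter above removes only the `'€ 0,-'` rows and leaves rows with a missing `exterior_color` in place. A minimal sketch on a made-up toy frame (the data is hypothetical, not taken from `used_cars.csv`) shows one way to check this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one bogus '€ 0,-' entry and one missing color
toy = pd.DataFrame({"exterior_color": ["Black", np.nan, "€ 0,-", "White", "Black"]})

# Same filter as above: drop only the bogus entry
cleaned = toy[~toy["exterior_color"].isin(["€ 0,-"])]

# dropna=False makes the remaining missing value visible in the counts
print(cleaned["exterior_color"].value_counts(dropna=False))
print(len(cleaned), cleaned["exterior_color"].isna().sum())  # -> 4 1
```

Whether the rows with a missing color should be kept or dropped depends on the question being asked; the sketch only makes their presence visible.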

#### First Idea: Decrease colors and brands

Decrease the number of colors used.

In [148]:
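# Collapse the exterior colors into four broad categories
def categorize_color(color):
    if pd.isna(color):
        # keep missing colors missing so groupby() drops them
        # instead of counting them as 'Colors'
        return None
    if color == 'Black':
        return 'Black'
    elif color == 'White':
        return 'White'
    elif color in ['Gray', 'Silver']:
        return 'Gray/Silver'
    else:
        # every remaining value, including 'Metallic' and 'Other'
        return 'Colors'

df['color_category'] = df['exterior_color'].apply(categorize_color)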

# Calculate proportions
counts = df.groupby(['make', 'color_category']).size()
props = (counts / counts.groupby(level=0).sum()).reset_index()
props.columns = ['make', 'color_category', 'proportion']
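
As a sanity check, the proportions computed this way should sum to 1 within each make and agree with `pd.crosstab(..., normalize='index')` from the starter code. A minimal sketch on a made-up toy frame (the data and the `toy_*` names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for used_cars.csv
toy = pd.DataFrame({
    "make": ["audi", "audi", "audi", "bmw", "bmw", "bmw", "bmw"],
    "color_category": ["Black", "Black", "White", "Black", "Colors", "Colors", "White"],
})

# Same groupby approach as above: counts per (make, color) divided by counts per make
toy_counts = toy.groupby(["make", "color_category"]).size()
toy_props = toy_counts / toy_counts.groupby(level=0).sum()

# Row-normalized crosstab, as in the starter code
toy_xtab = pd.crosstab(toy["make"], toy["color_category"], normalize="index")

# Proportions within each make sum to 1
print(toy_props.groupby(level=0).sum())

# Both approaches give the same share of black Audis (2 of 3 cars)
print(toy_props.loc[("audi", "Black")], toy_xtab.loc["audi", "Black"])
```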

Stacked bar plot (horizontal):

In [149]:
fig = px.bar(props, x='proportion', y='make', color='color_category',
             barmode='stack',
             title='Car Colors by Make',
             color_discrete_map={
                 'Black': '#2c2c2c',
                 'White': '#f0f0f0',
                 'Gray/Silver': '#808080',
                 'Colors': '#ff6b6b'
             },
             category_orders={'color_category': ['Black', 'White', 'Gray/Silver', 'Colors']})

#for trace in fig.data:
#    if trace.name == 'White':
#        trace.marker.line.color = '#666666'
#        trace.marker.line.width = 2

# Horizontal bars: proportions are on the x-axis, makes on the y-axis
fig.update_xaxes(title='Proportion')
fig.update_yaxes(title='Car Make')
fig.update_layout(legend_title_text= 'Car Color', height=800)
fig.show()

Grouped bar plot:

In [150]:
fig = px.bar(props, x='make', y='proportion', color='color_category',
             barmode='group',
             title='Car Colors by Make',
             color_discrete_map={
                 'Black': '#2c2c2c',
                 'White': '#f0f0f0',
                 'Gray/Silver': '#808080',
                 'Colors': '#ff6b6b'
             },
             category_orders={'color_category': ['Black', 'White', 'Gray/Silver', 'Colors']})

for trace in fig.data:
    if trace.name == 'White':
        trace.marker.line.color = '#666666'
        trace.marker.line.width = 1

fig.update_yaxes(title='Proportion')
fig.update_xaxes(title='Car Make')
fig.update_layout(legend_title_text= 'Car Color', height=600)
fig.show()

Group car makes into categories (highly subjective):

In [151]:
df.make.unique()

array(['volkswagen', 'ford', 'mercedes-benz', 'audi', 'opel', 'renault',
       'bmw', 'skoda', 'hyundai', 'volvo', 'peugeot', 'kia', 'fiat',
       'toyota', 'nissan', 'cupra', 'dacia', 'seat', 'citroen', 'honda',
       'mitsubishi', 'land-rover', 'smart', 'porsche', 'suzuki', 'mini',
       'jeep', 'mazda', 'mg', 'ferrari', 'maserati', 'chevrolet', 'tesla',
       'dodge', 'lexus', 'ds-automobiles', 'iveco', 'alfa-romeo',
       'jaguar', 'subaru'], dtype=object)

In [152]:
def categorize_make(make):
    luxury = ['bmw', 'mercedes-benz', 'audi', 'lexus', 'tesla', 'jaguar', 'maserati']
    sports = ['ferrari', 'porsche', 'cupra', 'dodge']
    suv_truck = ['jeep', 'land-rover', 'iveco', 'chevrolet']
    
    make_lower = make.lower()
    if make_lower in luxury:
        return 'Luxury Car'
    elif make_lower in sports:
        return 'Sports Car'
    elif make_lower in suv_truck:
        return 'SUV/Truck'
    else:
        return 'Economy/Family Car'
    
df['make_category'] = df['make'].apply(categorize_make)

# Calculate proportions by make category
counts_2 = df.groupby(['make_category', 'color_category']).size()
props_2 = (counts_2 / counts_2.groupby(level=0).sum()).reset_index()
props_2.columns = ['make_category', 'color_category', 'proportion']

In [153]:
fig = px.bar(props_2, x='make_category', y='proportion', color='color_category',
             barmode='group',
             title='Car Colors by Vehicle Category',
             labels={'make_category': 'Vehicle Category', 
                     'color_category': 'Color Category',
                     'proportion': 'Proportion'},
             color_discrete_map={
                 'Black': '#2c2c2c',
                 'White': '#f0f0f0',
                 'Gray/Silver': '#808080',
                 'Colors': '#ff6b6b'
             },
             category_orders={
                 'color_category': ['Black', 'White', 'Gray/Silver', 'Colors'],
                 'make_category': ['Economy/Family Car', 'SUV/Truck', 'Luxury Car', 'Sports Car']
             })

fig.update_traces(marker_line_color='#333', marker_line_width=1)
fig.update_yaxes(tickformat='.0%')
fig.update_layout(height=600)
fig.show()

#### Second Idea: Find top colors

In [None]:
heatmap_data = pd.crosstab(df['make'], df['exterior_color'], normalize='index')

# Sort by Black proportion
heatmap_data = heatmap_data.sort_values('Black', ascending=False)

# Create heatmap
fig = go.Figure(data=go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=heatmap_data.index,
    colorscale='Viridis',
    text=heatmap_data.values,
    texttemplate='%{text:.1%}',
    textfont={"size": 10},
    colorbar=dict(title="Proportion", tickformat='.0%'),
    hovertemplate='<b>%{y}</b><br>%{x}: %{z:.1%}<extra></extra>'
))

fig.update_layout(
    title='Car Color Distribution by Make (Sorted by Black Preference)',
    xaxis_title='Exterior Color',
    yaxis_title='Car Make',
    height=1000,
    width=1000,
    yaxis=dict(tickfont=dict(size=10))
)

fig.show()

In [None]:
counts = df.groupby(['make', 'exterior_color']).size()
props = counts / counts.groupby(level=0).sum()

# Keep only the single most frequent color for each make
top_color = (
    props.reset_index(name='proportion')
    .sort_values('proportion', ascending=False)
    .groupby('make')
    .head(1)
)

# Plot
fig = px.bar(top_color, x='make', y='proportion', color='exterior_color', barmode='group',
             color_discrete_map={
                 'Black': '#2c2c2c',
                 'White': "#f0f0f0",
                 'Gray': '#808080',
                 'Red': "#f82b2b"
             })

for trace in fig.data:
    if trace.name == 'White':
        trace.marker.line.color = '#666666'
        trace.marker.line.width = 1
        
fig.update_yaxes(tickformat='.0%')
fig.update_layout(title='Top Car Color by Make')
fig.show()
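
The "top color per make" selection relies on sorting by proportion and then keeping the first row of each group. A minimal sketch of that pattern on made-up numbers (the toy data and the `toy_*` names are hypothetical):

```python
import pandas as pd

# Hypothetical per-make color proportions
toy_props = pd.DataFrame({
    "make": ["audi", "audi", "bmw", "bmw", "bmw"],
    "exterior_color": ["Black", "White", "Black", "Blue", "Gray"],
    "proportion": [0.6, 0.4, 0.3, 0.5, 0.2],
})

# Sort so the largest proportion comes first, then keep one row per make
toy_top = (
    toy_props.sort_values("proportion", ascending=False)
    .groupby("make")
    .head(1)
    .sort_values("make")
    .reset_index(drop=True)
)
print(toy_top)
```

If two colors tie for first place within a make, `head(1)` keeps whichever row the sort happens to place first, so ties may deserve a separate check.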

#### Third Idea: Find color patterns

In [None]:
# Calculate color proportions per brand
make_color = pd.crosstab(df.make, df.exterior_color, normalize='index')

# Calculate metrics for each brand
metrics = []
for make in make_color.index:
    props = make_color.loc[make]
    
    # Color diversity
    col_div = -np.sum(props[props > 0] * np.log2(props[props > 0]))
    
    # Dominant color info
    dom_prop = props.max()
    dom_color = props.idxmax()
    
    metrics.append({
        'brand': make,
        'entropy': col_div,
        'dominant_proportion': dom_prop * 100,
        'dominant_color': dom_color
    })

metrics_df = pd.DataFrame(metrics)

# Calculate deviation from average
avg_entropy = metrics_df['entropy'].mean()
metrics_df['deviation'] = abs(metrics_df['entropy'] - avg_entropy)

# Sort and get top diverging brands
metrics_df = metrics_df.sort_values('deviation', ascending=False)
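
The "color diversity" value computed above has the form of Shannon entropy in bits, $H = -\sum_i p_i \log_2 p_i$: it is 0 when a make comes in a single color and grows as the colors become more evenly spread. A small sketch with made-up proportions (the helper name `shannon_entropy_bits` is hypothetical and not part of the notebook):

```python
import numpy as np

def shannon_entropy_bits(proportions):
    """Shannon entropy in bits of a discrete distribution; zero entries are skipped."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]
    # sum(p * log2(1/p)) is algebraically the same as -sum(p * log2(p))
    return float(np.sum(p * np.log2(1.0 / p)))

single = shannon_entropy_bits([1.0, 0.0, 0.0, 0.0])       # one color only
uniform = shannon_entropy_bits([0.25, 0.25, 0.25, 0.25])  # four equally common colors
skewed = shannon_entropy_bits([0.7, 0.1, 0.1, 0.1])       # one dominant color

print(single)   # -> 0.0
print(uniform)  # -> 2.0
print(skewed)   # lies between the two values above
```

With 16 possible exterior colors, the largest attainable value is $\log_2 16 = 4$ bits, which gives a reference point for reading the x-axis of a diversity plot.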

In [229]:
fig = px.scatter(
    metrics_df,
    x='entropy',
    y='dominant_proportion',
    text='brand',
    title='Color Diversity vs. Single Color Dominance',
    labels={
        'entropy': 'Color Diversity',
        'dominant_proportion': 'Dominant Color Percentage (%)'
    },
    color='deviation',
    color_continuous_scale='sunsetdark',
    size='deviation',
    hover_data=['dominant_color']
)
fig.update_traces(textposition='top center', textfont_size=9)
fig.update_layout(height=600)
fig.show()

In [216]:
# top diverging brands
top_brands = metrics_df.head(4)['brand'].tolist()
small_multiples_data = []
for brand in top_brands:
    for color, prop in make_color.loc[brand].items():
        small_multiples_data.append({
            'brand': brand,
            'color': color,
            'proportion': prop * 100
        })

small_df = pd.DataFrame(small_multiples_data)

color_map = {
    'Beige': '#d4a574',
    'Black': '#1a1a1a',
    'Blue': '#2563eb',
    'Bronze': '#b87333',
    'Brown': '#92400e',
    'Gold': '#d4af37',
    'Gray': '#808080',
    'Green': '#16a34a',
    'Metallic': '#a0a0a0',
    'Orange': '#ea580c',
    'Other': '#808080',
    'Purple': '#7e22ce',
    'Red': '#dc2626',
    'Silver': '#c0c0c0',
    'White': '#f5f5f5',
    'Yellow': '#fbbf24',
}

fig2 = px.bar(
    small_df,
    x='color',
    y='proportion',
    facet_col='brand',
    facet_col_wrap=2,
    labels={'proportion': 'Percentage (%)', 'color': 'Color'},
    color='color',
    color_discrete_map=color_map,
    height=600
)

for trace in fig2.data:
    if trace.name == 'White':
        trace.marker.line.color = '#666666'
        trace.marker.line.width = 1

# Adjust spacing between subplots
# (domains are fractions of the figure; y = 0 is the bottom edge)
fig2.update_layout(
    margin=dict(t=100),  # Top margin for annotations
    xaxis=dict(domain=[0, 0.45]),      # Left column
    xaxis2=dict(domain=[0.55, 1]),     # Right column (gap = 0.1)
    xaxis3=dict(domain=[0, 0.45]),     # Left column
    xaxis4=dict(domain=[0.55, 1]),     # Right column
    
    yaxis=dict(domain=[0, 0.4]),       # Bottom row
    yaxis2=dict(domain=[0, 0.4]),      # Bottom row
    yaxis3=dict(domain=[0.55, 1]),     # Top row (gap = 0.15)
    yaxis4=dict(domain=[0.55, 1]),     # Top row
    
    title={
        'text': 'Color Distribution: Most Distinctive Brands',
        'x': 0.5,           # Horizontal position (0=left, 0.5=center, 1=right)
        'y': 0.98,          # Vertical position (higher = more at top)
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 20}
    }
)

fig2.for_each_annotation(lambda a: a.update(
    text=a.text.split("=")[-1].upper(),
    font=dict(size=14),
    y=a.y - 0.06 if a.y < 0.5 else a.y  # Move down only for lower plots
))

fig2.add_annotation(
    text="<b>Brands with strong color preferences</b>",
    xref="paper", yref="paper",
    x=0.15, y=1.1,  # Position above first row
    showarrow=False,
    font=dict(size=14),
    xanchor='center'
)

fig2.add_annotation(
    text="<b>Brands with highest color diversity</b>",
    xref="paper", yref="paper",
    x=0.15, y=0.48,  # Position above second row
    showarrow=False,
    font=dict(size=14),
    xanchor='center'
)
fig2.update_xaxes(tickangle=45)
fig2.show()