# Day 3: Data Transformation
### Belgian Brewery Portfolio – Glide Data Analyst Role
**Objective**: Transform the ingested data into a structured format suitable for analysis and visualization.

**Tasks**:  
1. Translate French and Dutch names (e.g., beer styles, brewery names) into English.
2. Clean and normalize the data (e.g., remove duplicates, handle missing values).
3. Look up geographic information for each brewery (e.g., latitude, longitude, address).
4. Store the transformed data in Google Cloud Storage or BigQuery for further analysis.
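
As a rough illustration of the translation step, the sketch below maps French and Dutch style names to English through a small lookup table. The glossary entries, `STYLE_GLOSSARY`, and `translate_style` are hypothetical names introduced here for illustration; they are not taken from the dataset, and a real glossary would need to be built from the actual `style_name` values.

```python
import unicodedata

# Hypothetical glossary: illustrative entries only, not derived from the dataset.
STYLE_GLOSSARY = {
    "witbier": "white beer",        # Dutch
    "bière blanche": "white beer",  # French
    "oud bruin": "old brown",       # Dutch
    "ambrée": "amber",              # French
    "donker": "dark",               # Dutch
}


def translate_style(name: str, glossary: dict[str, str] = STYLE_GLOSSARY) -> str:
    """Return the English label for a style, or the cleaned input if it is unknown."""
    # Normalize accents to one Unicode form and collapse stray whitespace
    cleaned = " ".join(unicodedata.normalize("NFC", name).split())
    # Look up case-insensitively; fall back to the cleaned original name
    return glossary.get(cleaned.casefold(), cleaned)


print(translate_style("  Witbier "))
print(translate_style("Tripel"))
```

Unknown names are returned unchanged (apart from whitespace cleanup), so styles that are already English, or that have no settled translation, pass through untouched.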

In [None]:
# Load the raw beers/breweries/provinces table and preview it.
from pathlib import Path

import pandas as pd
from tabulate import tabulate

# Project root: assumes the notebook is run from a direct subdirectory of the project
ROOT_DIR = Path.cwd().parent

# Read the file data/raw/wiki_be_beers_breweries_provinces.csv
df = pd.read_csv(ROOT_DIR / "data/raw/wiki_be_beers_breweries_provinces.csv")

# Print the first few rows with tabulate
print(tabulate(df.head(), headers='keys', tablefmt='github', showindex=False)) # type: ignore

In [None]:
# Collect the distinct style names in the "style_name" column.
# A cell can list several comma-separated styles, so split each one.
unique_styles = set()
for style in df["style_name"].dropna():
    if not isinstance(style, str):
        continue  # skip non-string values such as stray numbers
    for s in style.split(","):
        unique_styles.add(s.strip())

# Drop placeholders: empty strings, "?", and any other single-character entry
unique_styles = {s for s in unique_styles if len(s) > 1}

# Sort the unique styles for better readability
unique_styles = sorted(unique_styles)

# Print the count and the first 5 unique styles
print(f"Total unique styles: {len(unique_styles)}")
print("First 5 unique styles:")
print(unique_styles[:5])
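
The same split-and-deduplicate logic can also be written with vectorized pandas string methods. The following is a minimal sketch on a small made-up frame; `df_demo` and its values are assumptions for illustration, not rows from the real CSV.

```python
import pandas as pd

# Made-up sample standing in for the real "style_name" column
df_demo = pd.DataFrame(
    {"style_name": ["Tripel, Blond", "Blond", None, "?", "Saison ,  Tripel"]}
)

styles = (
    df_demo["style_name"]
    .dropna()        # ignore missing cells
    .astype(str)     # guard against non-string values
    .str.split(",")  # one list of styles per row
    .explode()       # one style per row
    .str.strip()     # trim surrounding whitespace
)

# Drop placeholders such as "" and "?" (anything of length 0 or 1)
styles = styles[styles.str.len() > 1]

unique_styles_demo = sorted(styles.unique())
print(unique_styles_demo)
```

Whether this reads better than the explicit loop is a matter of taste; on a table of this size the performance difference is unlikely to matter.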

In [None]:
# Read the file with brewery addresses
brewery_addresses = pd.read_csv(ROOT_DIR / "data/clean/wiki_be_brewery_addresses.csv")

# Display the first few rows of the brewery addresses with tabulate
print(tabulate(brewery_addresses.head(), headers='keys', tablefmt='github', showindex=False))  # type: ignore

In [None]:
# Save the names of breweries that have no full_address to a new file
missing_address_path = ROOT_DIR / "data/raw/brewery_names_without_address.csv"
brewery_names_without_address = brewery_addresses.loc[
    brewery_addresses["full_address"].isna(), "brewery_name"
]
brewery_names_without_address.to_csv(missing_address_path, index=False)
print(f"Saved {len(brewery_names_without_address)} brewery names without an address to {missing_address_path}")