# Lowe's Data Cleaning Automation Tool

## Overview

This notebook automates the process of cleaning and transforming data for Lowe's product listings. The objective is to ensure the data meets required quality standards before analysis and reporting.

Key tasks include:
- Loading JSON data
- Handling missing values
- Normalizing nested specifications data
- Rearranging columns for analysis
- Exporting the data to Excel and applying styling

The following changes were applied to improve efficiency:
- Removed unused imports
- Consolidated DataFrame operations
- Streamlined Excel styling

## 1. Import Libraries and Load Data

In [1]:
# Importing necessary libraries
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Border, Side, Alignment, Font

In [2]:
# Load the JSON file 
df = pd.read_json("../data/raw/snap_m82yajnf2q1vcwnkpi.json")

## 2. Data Cleaning and Preprocessing

In [4]:
# Fill missing values in combinations/specifications/average_rating
df.combinations = df.combinations.fillna('[{"Manufacturer Color/Finish":" "}]')
df.specifications = df.specifications.fillna('{" ":" "}')
df.average_rating = df.average_rating.fillna(0.0)
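
The cell above fills missing `specifications` with a JSON string rather than a parsed object, so the column ends up holding a mix of types. One possible alternative, sketched here as an assumption and not as the notebook's own method, is to substitute an empty dict so every value is a dict before normalization. The sample values and the `Color` key are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one parsed specification dict and one missing value
specs = pd.Series([{"Color": "Red"}, None])

# Keep existing dicts and replace anything else (None/NaN) with an empty dict
filled = specs.apply(lambda value: value if isinstance(value, dict) else {})

# Every entry is now a dict, so normalization yields one row per product
normalized = pd.json_normalize(filled.tolist())
print(normalized.shape)  # (2, 1)
```

The empty dict produces a row of missing values instead of a blank-named placeholder column.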

In [5]:
# Normalize the nested JSON in specifications
spec_normalized = pd.json_normalize(df['specifications'])

# Concatenate the normalized specifications with original DataFrame
df_new = pd.concat([df, spec_normalized], axis=1)
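
To illustrate what the normalization step produces, here is a minimal sketch on made-up rows. The keys (`Item#`, `Model#`, `Warranty`) are assumptions for the example; the real keys come from the JSON file.

```python
import pandas as pd

# Hypothetical sample rows standing in for the loaded product data
sample = pd.DataFrame({
    "name": ["Drill", "Saw"],
    "specifications": [
        {"Item#": "1001", "Model#": "DR-1", "Warranty": {"Years": 2}},
        {"Item#": "1002", "Model#": "SW-9"},
    ],
})

# Each dict becomes one row; nested keys are flattened and joined with "."
spec_sample = pd.json_normalize(sample["specifications"].tolist())

# concat with axis=1 aligns rows on the index, which is 0..n-1 in both frames here
combined = pd.concat([sample, spec_sample], axis=1)
print(combined.shape)  # (2, 5)
```

Keys that are absent from a given row, such as `Warranty.Years` for the second product, come out as missing values.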

In [7]:
# Essential columns
essential_columns = [
    'Item#', 'Model#', 'name', 'category', 'combinations',
    'description', 'regular_retail_price', 'discounted_retail_price',
]

# Combine essential columns + the rest
remaining_columns = [col for col in df_new.columns if col not in essential_columns]
selected_columns = essential_columns + remaining_columns
df_new = df_new[selected_columns]
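
Selecting with `df_new[selected_columns]` raises a `KeyError` if any essential column is absent from the data. A small helper that skips missing names is one way to guard against that. The function name `move_to_front` and the sample frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def move_to_front(frame: pd.DataFrame, first: list[str]) -> pd.DataFrame:
    """Return frame with the columns in `first` leading, the rest in original order."""
    # Ignore requested columns that the frame does not contain
    present = [col for col in first if col in frame.columns]
    rest = [col for col in frame.columns if col not in present]
    return frame[present + rest]

# Hypothetical sample frame without a 'Model#' column
demo = pd.DataFrame({"brand": ["A"], "name": ["Drill"], "Item#": ["1001"]})
reordered = move_to_front(demo, ["Item#", "Model#", "name"])
print(list(reordered.columns))  # ['Item#', 'name', 'brand']
```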

## 3. Export to Excel

In [8]:
# Write the cleaned data to Excel (the DataFrame index goes to column A by default)
df_new.to_excel('../data/processed/lowes_retail_data_3132025.xlsx')

## 4. Load Excel Workbook and Apply Styling

In [9]:
# Load the exported Excel workbook
wb = load_workbook(filename='../data/processed/lowes_retail_data_3132025.xlsx')

# Select the active worksheet
ws = wb.active

# Apply an auto-filter
ws.auto_filter.ref = ws.dimensions
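
`ws.dimensions` returns the bounding range of all populated cells as a string, so assigning it to `auto_filter.ref` places filter dropdowns on every column. A minimal sketch with an in-memory workbook and made-up rows:

```python
from openpyxl import Workbook

# In-memory workbook with hypothetical sample rows
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["Item#", "name"])
demo_ws.append(["1001", "Drill"])

# The filter range covers the header row and all data rows
demo_ws.auto_filter.ref = demo_ws.dimensions
print(demo_ws.auto_filter.ref)  # A1:B2
```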

In [10]:
# Define a header font style
font = Font(size=15, bold=True, italic=False, vertAlign=None, underline='none', strike=False, color='FF000000')

# Define text wrapping alignment
wrap = Alignment(wrapText=True, horizontal='left')

# Define left alignment for cells
left_alignment = Alignment(horizontal='left')

# Define a fill pattern
fill = PatternFill("solid", fgColor="00CCFFCC")

# Define thin borders for cells (the same Side is reused on all four edges)
thin = Side(border_style='thin', color="FF000000")
border = Border(top=thin, bottom=thin, left=thin, right=thin)

In [11]:
# Get the total number of rows in the worksheet
last_row = ws.max_row

# Set a standard row height for all data rows (row 1 is the header)
for i in range(2, last_row + 1):
    ws.row_dimensions[i].height = 15

# Apply left alignment and thin borders to every cell
for row in ws.iter_rows(min_row=1, max_row=last_row):
    for cell in row:
        cell.alignment = left_alignment
        cell.border = border

In [12]:
# Format the header row 
for cell in ws["1:1"]:
    cell.font = font
    cell.fill = fill

## 5. Apply Additional Alignments for Specific Columns

In [13]:
# Wrap text in certain columns
wrap_columns = ['D', 'H', 'I', 'O']
for col in wrap_columns:
    for cell in ws[col]:
        cell.alignment = wrap

## 6. Freeze Panes and Set Column Widths

In [14]:
# Freeze row 1 and column A so the header and index stay visible when scrolling
ws.freeze_panes = ws["B2"]

In [15]:
# Define column widths
col_widths = {
    "B": 20, "C": 20, "D": 60, "E": 20, "F": 20,
    "G": 20, "H": 60, "I": 60, "J": 20, "K": 20,
    "L": 20, "M": 20, "N": 60, "O": 60, "Q": 60
}

# Apply column widths
for col, width in col_widths.items():
    ws.column_dimensions[col].width = width
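
The widths above are hard-coded per column letter, so they need updating whenever the column layout changes. As an alternative technique, not the approach used in this notebook, widths can be derived from the longest value in each column. The bounds `MIN_WIDTH` and `MAX_WIDTH`, the padding of 2, and the sample rows are assumptions for illustration.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

# In-memory workbook with hypothetical sample rows
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["Item#", "description"])
demo_ws.append(["1001", "Cordless drill with two batteries"])

MIN_WIDTH, MAX_WIDTH = 10, 60  # hypothetical bounds

for col_idx, column_cells in enumerate(demo_ws.iter_cols(), start=1):
    # Longest rendered value in the column; empty cells are ignored
    longest = max(
        (len(str(cell.value)) for cell in column_cells if cell.value is not None),
        default=0,
    )
    # Pad slightly, then clamp to the chosen bounds
    width = min(max(longest + 2, MIN_WIDTH), MAX_WIDTH)
    demo_ws.column_dimensions[get_column_letter(col_idx)].width = width
```

Character counts only approximate the rendered width, so long text columns may still benefit from wrapping.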

## 7. Save the Styled Workbook

In [16]:
# Save under a new file name so the unstyled export is preserved
wb.save("../data/processed/lowes_retail_data_3132025_styled.xlsx")