# Creating the Dataset

---

## 1. Importing Libraries and Settings

In [6]:
# Libraries
import json
import random
import os
import pandas as pd

In [7]:
# Notebook settings
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
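
The settings above remove the display limits for the whole notebook session. If that becomes unwieldy with large frames, one option is to scope an override to a single block with `pd.option_context`. The following is a minimal sketch; the limit of 5 rows is an arbitrary illustrative value, not something the notebook uses.

```python
import pandas as pd

# Remember the current global setting so we can confirm it is restored.
before = pd.get_option("display.max_rows")

# Options passed to option_context apply only inside the `with` block.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

# On leaving the block, the previous value is back in effect.
after = pd.get_option("display.max_rows")
```

This keeps the global settings untouched while still allowing a compact preview of a large frame where needed.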

---

## 2. Creating and Inspecting the Dataset

In [8]:
# Build the dataset
def process_file(file_path, category_name, sample_size=50000):
    """Read a JSONL metadata file and return a random sample of rows for one category."""
    rows = []

    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            # Skip blank lines so json.loads does not fail on them
            if not line.strip():
                continue
            data = json.loads(line)

            if data.get("main_category") == category_name:
                # `or []` also covers keys that are present but set to null
                images = data.get("images") or []
                image_thumb, image_large, image_hi_res = None, None, None
                # Prefer the image marked as MAIN; otherwise fall back to the first image
                main_image = next((img for img in images if img.get("variant") == "MAIN"), None)

                if main_image:
                    image_thumb = main_image.get("thumb")
                    image_large = main_image.get("large")
                    image_hi_res = main_image.get("hi_res")
                elif images:
                    fallback_image = images[0]
                    image_thumb = fallback_image.get("thumb")
                    image_large = fallback_image.get("large")
                    image_hi_res = fallback_image.get("hi_res")

                row = {
                    "category": data.get("main_category"),
                    "title": data.get("title"),
                    "description": " ".join(data.get("description") or []),
                    "image_thumb": image_thumb,
                    "image_large": image_large,
                    "image_hi_res": image_hi_res
                }

                rows.append(row)

    # Note: random.sample is not seeded here, so the sampled rows differ between runs
    sampled_rows = random.sample(rows, min(len(rows), sample_size))

    return pd.DataFrame(sampled_rows)


input_path = "./datasets/dataset_downloaded/"
output_path = "./datasets/dataset_dirty/dataset_dirty.csv"

categories = {
    "Automotive": "meta_Automotive.jsonl",
    "Baby": "meta_Baby_Products.jsonl",
    "Amazon Home": "meta_Home_and_Kitchen.jsonl",
    "Pet Supplies": "meta_Pet_Supplies.jsonl",
    "Sports & Outdoors": "meta_Sports_and_Outdoors.jsonl"
}

columns = ["category", "title", "description", "image_thumb", "image_large", "image_hi_res"]

all_data = []

for category, filename in categories.items():
    file_path = os.path.join(input_path, filename)
    
    print(f"Processing category: {category}...")
    category_data = process_file(file_path, category)

    all_data.append(category_data)

combined_data = pd.concat(all_data, ignore_index=True)
shuffled_data = combined_data.sample(frac=1, random_state=42).reset_index(drop=True)

os.makedirs(os.path.dirname(output_path), exist_ok=True)
shuffled_data.to_csv(output_path, index=False)

print(f"Dataset saved to: {output_path}")

Processing category: Automotive...
Processing category: Baby...
Processing category: Amazon Home...
Processing category: Pet Supplies...
Processing category: Sports & Outdoors...
Dataset saved to: ./datasets/dataset_dirty/dataset_dirty.csv
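
The function above collects every matching row in memory before sampling, and its sample is not seeded. A possible alternative is reservoir sampling, which keeps at most `sample_size` records while streaming through the file. The sketch below is not the notebook's method: the function name `reservoir_sample_jsonl`, the `seed` parameter, and the in-memory stand-in file with made-up records are all assumptions for illustration.

```python
import io
import json
import random


def reservoir_sample_jsonl(lines, category_name, sample_size, seed=42):
    """Keep a uniform random sample of matching records while streaming."""
    rng = random.Random(seed)  # local RNG so the sample is repeatable
    reservoir = []
    seen = 0  # number of matching records seen so far

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = json.loads(line)
        if data.get("main_category") != category_name:
            continue
        seen += 1
        if len(reservoir) < sample_size:
            # Fill the reservoir until it holds sample_size records.
            reservoir.append(data)
        else:
            # Replace a kept record with probability sample_size / seen.
            j = rng.randrange(seen)
            if j < sample_size:
                reservoir[j] = data
    return reservoir


# Tiny in-memory stand-in for a .jsonl file (made-up records).
fake_file = io.StringIO(
    "\n".join(
        json.dumps({"main_category": "Baby" if i % 2 else "Automotive", "title": f"item {i}"})
        for i in range(100)
    )
)
sample = reservoir_sample_jsonl(fake_file, "Baby", sample_size=10)
print(len(sample))  # prints 10
```

Any iterable of lines works as input, so an open file handle could be passed in place of the `io.StringIO` object. The trade-off is that the sampled records come back in reservoir order rather than file order, which does not matter here because the combined data is shuffled afterwards.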


In [9]:
# Loading and inspecting the dataset
dataset = pd.read_csv("./datasets/dataset_dirty/dataset_dirty.csv")
dataset.head(10)

,category,title,description,image_thumb,image_large,image_hi_res
0,Automotive,"Funny The More People I Meet The More I Love My Dog Vinyl Sticker Car Decal (6"" Black)","You will receive 1x decals made of the highest quality 651 Oracal Vinyl. All decals are made in the USA and generally ship within 24 hours of purchase. Decal Serpent stands behind its product and will guarantee all decals for 5 years. If you want this decal in another color or size please contact us for customization requests. Our decals can be used on cars, laptops, desktops, house windows, walls, and many other surfaces. Please note, that our decals are die-cut vinyl, not stickers. Application instructions will be included.",https://m.media-amazon.com/images/I/41aTgIj41OL._AC_US40_.jpg,https://m.media-amazon.com/images/I/41aTgIj41OL._AC_.jpg,https://m.media-amazon.com/images/I/51wO6SH3qqL._AC_SL1002_.jpg
1,Baby,The First Years American Red Cross Soothing Baby Scale,"Product Description The First Years American Red Cross Baby Scale let you monitor the health of growing breastfed babies and keep track of their health at home. The contour surface can be used with a blanket to keep the baby comfortable on the scale while the tare function helps auto-deduct that extra weight to give accurate weight measurement. It features a large digital display that shows results in various units like pounds, kilograms, or ounces. It also has a button to show the last results and shuts down automatically when not in use to save battery. From the Manufacturer Parents often seek the reassurance of weighing their baby in between pediatrician visits, but this experience can be intimidating for baby. The First Years American Red Cross Soothing Baby Scale has reliable digital accuracy to + or - 10g 0.022 pounds. The Inviting, contoured design is comfortable for lying or sitting baby max and min capacity is 44 pounds to 0.44 pounds. The additional features include easy read LCD display, memory recall of last weight, tare feature to accommodate a baby blanket.",https://m.media-amazon.com/images/I/31EfatyAmWL._SS40_.jpg,https://m.media-amazon.com/images/I/31EfatyAmWL.jpg,https://m.media-amazon.com/images/I/61aCJTAHF1L._SL1500_.jpg
2,Automotive,Goodyear Ultra Grip Ice WRT Winter Radial Tire - 195/65R15 91S,Goodyear's Ultra Grip Ice WRT offers enhanced winter traction no matter what the winter throws at you. It's got two-dimensional blades in center of the tire to offer stopping and starting power that's rock solid. It also handles wet roads like they were dry. So if you're looking for extreme handling on ice and snow then look no further than the Ultra Grip Ice WRT. The Goodyear Ultra Grip Ice WRT tire has the mountain/snowflake symbol for severe snow service.,https://m.media-amazon.com/images/I/51Mz2e1rl5L._AC_US40_.jpg,https://m.media-amazon.com/images/I/51Mz2e1rl5L._AC_.jpg,https://m.media-amazon.com/images/I/71TmF1AhPLL._AC_SL1200_.jpg
3,Amazon Home,KWO Gingerbread Seller Wood German Christmas Nutcracker Decoration Germany New,,https://m.media-amazon.com/images/I/41iE-bu8gRL._AC_US75_.jpg,https://m.media-amazon.com/images/I/41iE-bu8gRL._AC_.jpg,
4,Pet Supplies,Kingspeed Dog Treat Bag Dog Training Pouch for Treats & Toys Lightweight & Stylish Treat Tote with YKK Zippered Compartments & Mesh Pockets Shoulder & Waist Pet Treat Pouch Red & Black,,https://m.media-amazon.com/images/I/410DVXQ8zxL._AC_US40_.jpg,https://m.media-amazon.com/images/I/410DVXQ8zxL._AC_.jpg,https://m.media-amazon.com/images/I/815teT9H-TL._AC_SL1500_.jpg
5,Amazon Home,PERA Pin Lock MFL Fitting Pin Lock Gas and Liquid Set Pin Lock Quick Disconnects for Pin Lock Keg used for Home Brewing Beer Making,,https://m.media-amazon.com/images/I/31AjQq3RSoL._AC_US75_.jpg,https://m.media-amazon.com/images/I/31AjQq3RSoL._AC_.jpg,https://m.media-amazon.com/images/I/51og27DoGaL._AC_SL1001_.jpg
6,Amazon Home,Baxton Studio Larissa Upholstery Lounge Chair in Taupe,,https://m.media-amazon.com/images/I/41VXYTl9HBL._AC_US75_.jpg,https://m.media-amazon.com/images/I/41VXYTl9HBL._AC_.jpg,https://m.media-amazon.com/images/I/61C0uxQpf7L._AC_SL1000_.jpg
7,Pet Supplies,"Arm & Hammer for Pets Air Care Pet Scents Room Spray for Pets in Lavender Scent | 18 oz Air Freshener Spray for Pet Odors in The Home, Lavender Fields Scent","The Arm & Hammer Pet Scents Room Spray in Lavender Fields is just what you need to neutralize pet odors and keep a fresh-smelling, clean home. This easy to use deodorizing room spray takes care of unwelcome pet smells and odors by using the natural deodorizing power of baking soda plus a calming lavender fragrance. This odor neutralizing room spray freshens the air while absorbing and removing all cat and pet odors. In a scent both humans and pets love, and in a formula that's safe for all, Arm & Hammer gets the job done right. The pet odor eliminator spray from Arm & Hammer tackles all kinds of pet smells to leave you with a home that smells fresh and clean. Easy to use and available in a convenient 18 oz pet spray odor eliminator bottle, this odor neutralizer spray for pets is the perfect complement to your household needs. Great for all pet odors, this pet odor eliminator spray mist absorbs and removes pet smells and odors, leaving your home smelling fresh and just how you want it. Air fresheners like this air purifier are great odor eliminators for pet homes. Great for pet households, this could be a dog odor eliminator for home or an air freshener in place of air freshener spray or a glad plug in or plug in air freshener, air wick, scented candle, room spray, febreze spray, air freshener plug in, charcoal bags odor absorber, lavender candle, glad plugins, pet odor eliminator, room fresheners, automatic air freshener spray, cat odor eliminator, dog odor eliminators for home, air wick refills, room air fresheners.",https://m.media-amazon.com/images/I/31nj93FU0BS._AC_US40_.jpg,https://m.media-amazon.com/images/I/31nj93FU0BS._AC_.jpg,https://m.media-amazon.com/images/I/71iRgRhRFgS._AC_SL1500_.jpg
8,Baby,Nail Trimmer Replacement Pads fit for Cuby Baby Buzz B 12sets,,https://m.media-amazon.com/images/I/31GBkE2kCPL._SS40_.jpg,https://m.media-amazon.com/images/I/31GBkE2kCPL.jpg,https://m.media-amazon.com/images/I/51qYwv-vRxL._SL1100_.jpg
9,Amazon Home,"Gutsdoor Porcelain Espresso Cups with Saucers, 4 oz Cappuccino Cups Set of 6, Demitasse Coffee Cups for Coffee Drinks Latte Tea, Ideal Christmas Gift (Marbled Grey)",,https://m.media-amazon.com/images/I/41-Bo5XVfNL._AC_US75_.jpg,https://m.media-amazon.com/images/I/41-Bo5XVfNL._AC_.jpg,https://m.media-amazon.com/images/I/51QBnV453FL._AC_SL1000_.jpg
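
Several rows in the preview have an empty `description`, and one lacks `image_hi_res`. By default `pd.read_csv` reads empty fields back as `NaN`, so a quick way to quantify this is to compute the share of missing values per column and the number of rows per category. The sketch below uses a small made-up frame named `demo` in place of `dataset`; the rows and `example.com` URLs are illustrative only.

```python
import pandas as pd

# Made-up rows mimicking the columns above; in the notebook this would be `dataset`.
demo = pd.DataFrame(
    {
        "category": ["Automotive", "Baby", "Amazon Home", "Amazon Home"],
        "title": ["Decal", "Scale", "Nutcracker", "Chair"],
        "description": ["Vinyl decal.", "Baby scale.", None, None],
        "image_hi_res": [
            "https://example.com/a.jpg",
            "https://example.com/b.jpg",
            None,
            "https://example.com/c.jpg",
        ],
    }
)

# Fraction of missing values in each column (0.0 means no gaps).
missing_share = demo.isna().mean()

# Number of rows per category, to check class balance.
category_counts = demo["category"].value_counts()

print(missing_share)
print(category_counts)
```

Running the same two expressions on `dataset` would show how much of each column needs attention before the data is cleaned.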
