# Data Pipeline for Web Shop Scraping

This notebook demonstrates a data pipeline for scraping product information and images from a web shop.

It uses custom modules for:
- Configuration management
- MongoDB interaction
- Image storage
- Product handling
- Web scraping

**Workflow:**
1. Import necessary modules and set up the Python path.
2. Initialize configuration managers and pipeline components.
3. Scrape products and images, render images, and store results in MongoDB.
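
The loop in step 3 can be sketched without the real `iris` modules. This is a minimal illustration only: `Product`, `Image`, `InMemoryStore`, `scrape`, and `run_pipeline` are hypothetical stand-ins, not the notebook's actual classes. It assumes that the scraper yields `(product, images)` pairs and that the store is used as a context manager with an `upsert` method. Image rendering is left out.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical stand-in for the product objects yielded by the scraper.
    id: str
    title: str


@dataclass
class Image:
    # Hypothetical stand-in for scraped image metadata.
    id: str
    product_id: str
    url: str


class InMemoryStore:
    """Hypothetical stand-in for a database manager used as a context manager."""

    def __init__(self):
        self.collections = {}

    def __enter__(self):
        # A real manager would open its database connection here.
        return self

    def __exit__(self, exc_type, exc, tb):
        # A real manager would close the connection here; exceptions propagate.
        return False

    def upsert(self, collection, document):
        # Insert the document, or overwrite an existing one with the same id.
        self.collections.setdefault(collection, {})[document.id] = document


def scrape():
    # Yield (product, images) pairs, the shape the scraping loop iterates over.
    yield Product("p1", "Red Mug"), [
        Image("i1", "p1", "https://example.com/mug-front.jpg"),
        Image("i2", "p1", "https://example.com/mug-side.jpg"),
    ]
    yield Product("p2", "Blue Plate"), [
        Image("i3", "p2", "https://example.com/plate.jpg"),
    ]


def run_pipeline(store):
    # Store every product and each of its images while the store is open.
    with store as db:
        for product, images in scrape():
            db.upsert("products", product)
            for image in images:
                db.upsert("image_metadata", image)
    return store


store = run_pipeline(InMemoryStore())
```

Because `upsert` is keyed on the document id, running the sketch twice against the same store leaves the collection sizes unchanged, which is the property that makes repeated scraping runs safe.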

In [None]:
import sys

# Make the local `iris` package importable; adjust this path to your own checkout.
sys.path.append(r'c:\Users\ice\projects\iris')

from iris.config.data_pipeline_config_manager import DataPipelineConfigManager
from iris.data_pipeline.mongodb_manager import MongoDBManager
from iris.data_pipeline.image_store_manager import ImageStoreManager
from iris.data_pipeline.product_handler import ProductHandler
from iris.data_pipeline.web_shop_scraper import WebShopScraper

In [None]:
# Initialize the configuration manager and read the per-component configurations
config_manager = DataPipelineConfigManager()
shop_config = config_manager.shop_config
mongodb_config = config_manager.mongodb_config
image_store_config = config_manager.image_store_config

# Initialize MongoDB and image store manager with configurations
mongodb_manager = MongoDBManager(mongodb_config)
image_store_manager = ImageStoreManager(image_store_config)

# Initialize the web shop scraper with its product handler
web_shop_scraper = WebShopScraper(
    shop_config=shop_config,
    product_handler=ProductHandler(shop_config=shop_config),
)

In [None]:
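# Scrape the shop and store each product together with its image metadata
with mongodb_manager as db:
    for product, images in web_shop_scraper.scrape():
        print(f"Scraped product: {product.title}, Images: {len(images)}")

        # Render each image using the image store manager
        # (the returned image is not used further in this cell)
        for image in images:
            pil_image = image.render(image_store_manager)

        # Upsert the product and its image metadata into their collections
        db.upsert(db.config.product_collection, product)
        db.upsert(db.config.image_metadata_collection, images)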