# 3. MinIO-to-Azure

This notebook moves a Parquet file, managed as an Iceberg table through the Nessie catalog, from the `grupo-2` bucket in MinIO to a group-specific folder in Azure storage. The workflow uses `dlt` together with the Iceberg libraries to carry out the transfer, and it writes the output as Parquet so the data stays intact and readable from Azure storage. Prerequisites: access to MinIO, Azure credentials, and an installed Iceberg library.

In [1]:
!pip install pandas pyarrow fsspec s3fs adlfs dlt



In [None]:
import logging

import dlt
from dlt.sources.filesystem import readers

In [None]:
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger("minio_to_azure")

In [None]:
pipeline_config = {
    "pipeline_name": "s3_to_adls",
    "destination": "filesystem",
    "dataset_name": "grupo_2_parquet",
}

try:
    pipeline = dlt.pipeline(**pipeline_config)
    logger.info("Pipeline configured successfully.")
except Exception as e:
    logger.error(f"Error configuring the pipeline: {e}")
    raise
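
The `filesystem` destination named above does not say where in Azure the files land, and the MinIO source also needs an endpoint and access keys. One way to supply both is through environment variables, which `dlt` resolves by section name. The sketch below is an assumption, not this notebook's actual configuration: the variable names follow dlt's `SECTION__KEY` convention as documented for the filesystem source and destination, the helper `build_dlt_env` is hypothetical, and every value is a placeholder to replace.

```python
import os


def build_dlt_env(minio_endpoint, minio_access_key, minio_secret_key,
                  azure_bucket_url, azure_account_name, azure_account_key):
    """Return the environment variables dlt is assumed to read for this pipeline."""
    return {
        # Source side: S3-compatible credentials pointing at MinIO
        "SOURCES__FILESYSTEM__CREDENTIALS__AWS_ACCESS_KEY_ID": minio_access_key,
        "SOURCES__FILESYSTEM__CREDENTIALS__AWS_SECRET_ACCESS_KEY": minio_secret_key,
        "SOURCES__FILESYSTEM__CREDENTIALS__ENDPOINT_URL": minio_endpoint,
        # Destination side: Azure container/folder and storage account
        "DESTINATION__FILESYSTEM__BUCKET_URL": azure_bucket_url,
        "DESTINATION__FILESYSTEM__CREDENTIALS__AZURE_STORAGE_ACCOUNT_NAME": azure_account_name,
        "DESTINATION__FILESYSTEM__CREDENTIALS__AZURE_STORAGE_ACCOUNT_KEY": azure_account_key,
    }


# All values below are placeholders (assumptions), not real endpoints or secrets.
env = build_dlt_env(
    minio_endpoint="http://localhost:9000",
    minio_access_key="<minio-access-key>",
    minio_secret_key="<minio-secret-key>",
    azure_bucket_url="az://<container>/<group-folder>",
    azure_account_name="<storage-account-name>",
    azure_account_key="<storage-account-key>",
)
os.environ.update(env)
```

These variables need to be set before the reader and the pipeline run, since that is when the credentials are resolved. In practice the secrets would come from a secret store or a `secrets.toml` file rather than being written into the notebook.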

In [None]:
# Define the Parquet reader resource
bucket_url = "s3://grupo-2/grupo_2_parquet/df_data"  # Source bucket in MinIO

try:
    parquet_reader = readers(
        bucket_url=bucket_url,
        file_glob="*.parquet"  # Pattern to match Parquet files
    ).read_parquet()  # Read Parquet files
    parquet_reader = parquet_reader.with_name("df_parquet")  # Assign table name
    # The resource does not expose the bucket URL, so log the variable instead
    logger.info(f"Parquet reader configured for bucket: {bucket_url}")
except Exception as e:
    logger.error(f"Error configuring Parquet reader: {e}")
    raise

In [None]:
try:
    load_info = pipeline.run(
        parquet_reader,
        loader_file_format="parquet",
        write_disposition="replace"
    )
    logger.info("Pipeline execution completed successfully.")
    print(f"Load info: {load_info}")
except Exception as e:
    logger.error(f"Error executing pipeline: {e}")
    raise
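
Printing `load_info` shows what happened, but an explicit check makes a failed load harder to miss. The helper below is a minimal sketch: `check_load` is a hypothetical name, and it relies on the `has_failed_jobs`, `raise_on_failed_jobs()`, and `loads_ids` members that dlt's `LoadInfo` object provides. Recent dlt versions may already raise on failed jobs during `pipeline.run`, in which case this check is only a safeguard.

```python
def check_load(load_info):
    """Raise if any load job failed; otherwise return the ids of the loaded packages."""
    if load_info.has_failed_jobs:
        # Delegates to dlt, which raises an exception describing the failed jobs
        load_info.raise_on_failed_jobs()
    return list(load_info.loads_ids)
```

After the pipeline cell has run, `check_load(load_info)` returns the load package ids, which can be logged with `logger.info` for later reference.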