masalinas/poc-minio-parquet-docker

Description

Documentation for deploying a MinIO server locally with Docker, with the Parquet parser for S3 Select enabled

Steps

Create the minio-net Docker network before deploying:

docker network create minio-net

Create a Docker Compose deployment file like the one below. Setting the environment variable MINIO_API_SELECT_PARQUET to on enables S3 Select queries on Parquet files in the MinIO server:

version: '3.8'

services:
  minio:
    container_name: minio_local
    image: minio/minio:latest
    ports:
      - '9010:9000'
      - '9100:9090'
    environment:
      - MINIO_ROOT_USER=${MINIO_ROOT_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD}
      - MINIO_CONFIG_ENV_FILE=/etc/config.env
      - MINIO_API_SELECT_PARQUET=on
    volumes:
      - ./data:/mnt/data
      - ./credentials/.env.dev:/etc/config.env
    command: server --console-address ":9090"
    networks:
      - minio-net

networks:
  minio-net:
    external: true
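The compose file mounts ./credentials/.env.dev as the MinIO config env file, but its contents are not shown here. As a minimal sketch, assuming the file holds plain KEY=VALUE lines (for example MINIO_ROOT_USER and MINIO_ROOT_PASSWORD), a Python client could read the same credentials instead of hard-coding them. The helper name parse_env_text and the sample values are hypothetical:

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments.

    Assumption: simple KEY=VALUE syntax. Only one pair of surrounding
    quotes is stripped; other shell quoting rules are not handled.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition('=')
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Placeholder content for illustration only (assumed layout of .env.dev).
example = """
# credentials/.env.dev
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD="password"
"""
env = parse_env_text(example)
print(env['MINIO_ROOT_USER'])
```

With the real file, passing the text of credentials/.env.dev to parse_env_text would give the values to use as aws_access_key_id and aws_secret_access_key in a boto3 client.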

Execute the deployment:

docker-compose up -d
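After the container starts, it can take a moment before the S3 API accepts requests. The following is a minimal sketch of a readiness check, assuming MinIO's liveness endpoint /minio/health/live is reachable on host port 9010 (the port mapped in the compose file above). The function names http_probe and wait_until_live are hypothetical:

```python
import time
import urllib.request


def http_probe(url):
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_live(url, timeout=30.0, interval=1.0, probe=http_probe):
    """Poll url until probe(url) succeeds or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_until_live('http://localhost:9010/minio/health/live') returns True once the server answers, or False if it is still unreachable after the timeout.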

Open the MinIO console at this URL (host port 9100 maps to the console port 9090), create a bucket called samples, and upload the file people.parquet so that it can be queried from a Python client

http://localhost:9100
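The bucket and the upload can also be handled from Python instead of the console. This is a minimal sketch, assuming a boto3 S3 client pointed at http://localhost:9010 and a local copy of people.parquet. The function name ensure_bucket_and_upload is hypothetical:

```python
def ensure_bucket_and_upload(s3, bucket, filename, key):
    """Create bucket if it is missing, then upload filename under key.

    s3 is expected to be a boto3 S3 client (or any object exposing the same
    list_buckets, create_bucket and upload_file methods).
    Returns True if the bucket had to be created.
    """
    existing = {b['Name'] for b in s3.list_buckets().get('Buckets', [])}
    created = bucket not in existing
    if created:
        s3.create_bucket(Bucket=bucket)
    # upload_file takes the local path, the bucket name and the object key.
    s3.upload_file(filename, bucket, key)
    return created
```

For example, ensure_bucket_and_upload(s3, 'samples', 'people.parquet', 'people.parquet') with s3 created by boto3.client('s3', endpoint_url='http://localhost:9010', ...) leaves the object ready for the query step.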

Run this Python client to query the Parquet file people.parquet with S3 Select

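import boto3

# Connect to the S3 API of the local MinIO server (host port 9010 in the compose file).
# The keys are assumed to be the values of MINIO_ROOT_USER and MINIO_ROOT_PASSWORD.
s3 = boto3.client('s3',
                  endpoint_url='http://localhost:9010',
                  aws_access_key_id='admin',
                  aws_secret_access_key='password',
                  region_name='us-east-1')

# Run an S3 Select query on the Parquet object and ask for the rows as CSV.
r = s3.select_object_content(
    Bucket='samples',
    Key='people.parquet',
    ExpressionType='SQL',
    Expression="select * from s3object",
    InputSerialization={'Parquet': {}},
    OutputSerialization={'CSV': {}},
)

# The response payload is an event stream: Records events carry the data,
# and the Stats event reports how many bytes were scanned and processed.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        stats_details = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(stats_details['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(stats_details['BytesProcessed'])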
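The loop above prints each Records chunk as it arrives. In S3 Select, a Records event may carry a partial record, so a chunk boundary can fall in the middle of a row. As a minimal sketch, assuming CSV output as requested above, the chunks can be joined before decoding and parsed into rows. The function name collect_select_rows is hypothetical:

```python
import csv
import io


def collect_select_rows(events):
    """Join all Records chunks of an S3 Select event stream and parse them as CSV.

    Returns (rows, stats): rows is a list of lists of strings, and stats is
    the Details dict of the Stats event, or None if no Stats event was seen.
    """
    chunks = []
    stats = None
    for event in events:
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])
        elif 'Stats' in event:
            stats = event['Stats']['Details']
    # Decode once, after joining, so a row or multi-byte character that was
    # split across two chunks is reassembled before parsing.
    text = b''.join(chunks).decode('utf-8')
    rows = list(csv.reader(io.StringIO(text, newline='')))
    return rows, stats
```

With the response from the client above, rows, stats = collect_select_rows(r['Payload']) gives the rows as lists of strings and the same BytesScanned and BytesProcessed counters. Since the event stream can only be iterated once, this replaces the printing loop rather than following it.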
