sujanks/trino-superset

Trino-Superset Integration with Data Masking

This project sets up a data analytics environment using Trino and Apache Superset with data masking capabilities. It provides a secure way to handle sensitive data while allowing analysis through Superset's visualization tools.

Components

  • Trino: A distributed SQL query engine
  • Apache Superset: A modern data exploration and visualization platform
  • PostgreSQL: Database used by Superset for metadata storage

Configuration

Trino Configuration

The Trino server is configured with:

  • Coordinator mode enabled
  • HTTP port: 8080
  • Discovery URI: http://trino:8080
  • Node scheduler includes coordinator
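The settings above correspond to keys in trino/etc/config.properties. The repository's file is not reproduced here, so the following is only a minimal sketch: it assumes the standard Trino property names and checks a properties text against the values listed above. `EXPECTED`, `parse_properties`, and `check_config` are illustrative names, not part of this project.

```python
# Expected values, taken from the list above. The property names are the
# standard Trino ones and are assumed to match the repository's file.
EXPECTED = {
    "coordinator": "true",
    "node-scheduler.include-coordinator": "true",
    "http-server.http.port": "8080",
    "discovery.uri": "http://trino:8080",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_config(text):
    """Return {key: (expected, actual)} for every setting that differs."""
    props = parse_properties(text)
    return {
        key: (want, props.get(key))
        for key, want in EXPECTED.items()
        if props.get(key) != want
    }


sample = """
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://trino:8080
"""
print(check_config(sample))  # an empty dict means every setting matches
```

To check the real file, pass its contents instead of `sample`, for example `check_config(open("trino/etc/config.properties").read())`.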

Data Masking Rules

Trino is configured with data masking rules (trino/etc/rules.json) that mask sensitive information:

  • Masked columns:
    • email
    • phone_number
    • ssn
    • credit_card
    • address

The masking rules apply to users with the "support" role, replacing sensitive data with "*****".

Access Control

  • All users have access to all catalogs
  • The "support" user has SELECT privileges with data masking applied
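The contents of trino/etc/rules.json are not shown in this README. As a rough sketch of what such a file could look like, the snippet below builds a rules document in Trino's file-based access control format, where each column mask is a SQL expression. It assumes the rule matches on the user name "support" and that the masked columns are varchar; `build_rules` is a hypothetical helper, and the real file may differ.

```python
import json

# Columns listed in the masking section above.
MASKED_COLUMNS = ["email", "phone_number", "ssn", "credit_card", "address"]


def build_rules(user="support", mask="*****"):
    """Build a rules document in Trino's file-based access control format."""
    # Each mask is a SQL expression; a quoted literal suits varchar columns.
    columns = [{"name": name, "mask": f"'{mask}'"} for name in MASKED_COLUMNS]
    return {
        # Every user may access every catalog.
        "catalogs": [{"allow": "all"}],
        "tables": [
            # Assumption: the rule matches on the user name "support";
            # Trino also accepts a "role" or "group" pattern here.
            {"user": user, "privileges": ["SELECT"], "columns": columns},
        ],
    }


rules = build_rules()
print(json.dumps(rules, indent=2))
```

With only this one table rule, users other than "support" would match no table rule at all, so a complete file would normally carry further rules for them.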

Usage

  1. Start the services:

    docker-compose up -d
  2. Access Superset at http://localhost:8088

  3. Configure Trino Connection in Superset:

    • Add a new database connection
    • Use the following parameters:
      {
          "connect_args": {
              "user": "support",
              "password": ""
          }
      }
  4. Verify data masking:

    • Create a new SQL query in Superset
    • Query tables containing sensitive data
    • Verify that masked columns show "*****" instead of actual values
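The verification step can also be scripted. The sketch below assumes a result set shaped like DB-API output (a list of column names plus row tuples) and reports every cell in a sensitive column that does not contain the mask. The sample rows are made up for illustration, and `find_unmasked` is a hypothetical helper.

```python
MASKED_COLUMNS = {"email", "phone_number", "ssn", "credit_card", "address"}
MASK = "*****"


def find_unmasked(column_names, rows):
    """Return (row_index, column, value) for sensitive cells that are not masked."""
    leaks = []
    for row_index, row in enumerate(rows):
        for name, value in zip(column_names, row):
            # Under a constant mask, every cell in a masked column equals MASK.
            if name in MASKED_COLUMNS and value != MASK:
                leaks.append((row_index, name, value))
    return leaks


# Hypothetical result set: the second row leaks an email address.
columns = ["id", "name", "email", "ssn"]
rows = [(1, "Ada", "*****", "*****"), (2, "Bob", "bob@example.com", "*****")]
print(find_unmasked(columns, rows))  # [(1, 'email', 'bob@example.com')]
```

An empty list means every sensitive column in the result came back masked.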

Troubleshooting

  1. If Trino is not accessible:

    • Check if the container is running: docker-compose ps
    • View Trino logs: docker logs trino
    • Verify the configuration in trino/etc/config.properties
  2. If data masking is not working:

    • Ensure you're connected as the "support" user
    • Verify the rules in trino/etc/rules.json
    • Check Trino logs for any access control related errors
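Beyond checking the container and logs, Trino's /v1/info endpoint reports whether the server has finished starting. The sketch below separates the HTTP call from the readiness check so the latter can be tried on a sample payload. It assumes Trino's port 8080 is published on localhost, which this README does not state; `fetch_info` and `is_ready` are illustrative names.

```python
import json
import urllib.request


def is_ready(info):
    """True when a /v1/info payload reports that startup has finished."""
    return info.get("starting") is False


def fetch_info(base_url="http://localhost:8080", timeout=5):
    """GET <base_url>/v1/info and decode the JSON body.

    Assumption: the Trino port is reachable at base_url from where this runs.
    """
    with urllib.request.urlopen(f"{base_url}/v1/info", timeout=timeout) as resp:
        return json.load(resp)


# Example payload in the shape Trino returns (values are illustrative).
sample = {"coordinator": True, "starting": False, "environment": "dev"}
print(is_ready(sample))
```

Once the container is up, `is_ready(fetch_info())` performs the same check against the running server.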

Security Considerations

  • Data masking is applied at the query level
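  • Users with the "support" role see "*****" in place of the masked columns; because this setup configures no Trino authentication, that protection holds only as long as clients cannot connect under a different user name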
  • All connections should be properly authenticated
  • Regular audits of access patterns are recommended

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Architecture

The setup consists of three main components:

  • PostgreSQL: The underlying database storing the data
  • Trino: Distributed SQL query engine for big data
  • Apache Superset: Modern data exploration and visualization platform

Prerequisites

  • Docker
  • Docker Compose
  • Git

Quick Start

  1. Clone the repository:

    git clone <repository-url>
    cd trino-superset
  2. Start the services:

    docker-compose up -d
  3. Wait for all services to be healthy (this may take a few minutes on first run)

  4. Access Superset at http://localhost:8088

    • Username: admin
    • Password: admin

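Services and Ports

  • Superset: http://localhost:8088
  • Trino: HTTP port 8080 (trino:8080 from inside the Compose network)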

Connecting Superset to Trino

  1. Log in to Superset

  2. Go to Data → Databases → + Database

  3. Select "Trino" from the database options

  4. Use these connection details:

    SQLAlchemy URI: trino://trino:8080/postgresql
    

    Or fill in the form fields:

    • Host: trino
    • Port: 8080
    • Database Name: postgresql
    • Username: (leave empty, or support to have the masking rules applied)
    • Password: (leave empty)
  5. Additional recommended settings:

    • Check "Expose in SQL Lab"
    • Check "Allow DML"
    • Check "Allow DQL"
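The SQLAlchemy URI is just the form fields joined together, in the shape trino://[user@]host:port/catalog[/schema]. As a small sketch, the hypothetical helper below assembles it from those fields; the values in the usage lines are the ones given in this README.

```python
from urllib.parse import quote


def build_trino_uri(host, port, catalog, schema=None, user=None):
    """Assemble a trino:// SQLAlchemy URI from the form fields above."""
    # Percent-encode the user name in case it contains reserved characters.
    auth = f"{quote(user, safe='')}@" if user else ""
    path = catalog if schema is None else f"{catalog}/{schema}"
    return f"trino://{auth}{host}:{port}/{path}"


print(build_trino_uri("trino", 8080, "postgresql"))
# trino://trino:8080/postgresql
print(build_trino_uri("trino", 8080, "postgresql", schema="public", user="support"))
# trino://support@trino:8080/postgresql/public
```

The second form names the user in the URI itself, which is one way to make sure queries run as "support".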

Configuration Files

  • docker-compose.yml: Main configuration file for all services
  • Dockerfile.superset: Custom Superset image with Trino dependencies
  • superset_config.py: Superset configuration
  • trino/etc/catalog/postgresql.properties: Trino connector configuration for PostgreSQL
  • docker/docker-init.sh: Initialization script for Superset
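For orientation, a Trino catalog file for PostgreSQL usually contains a connector name, a JDBC URL, and credentials. The sketch below renders such a file using the default PostgreSQL credentials from this README. The host name `postgres` and port 5432 are assumptions (the Compose service name and PostgreSQL's default port), and the repository's actual postgresql.properties may differ; `render_catalog` is a hypothetical helper.

```python
def render_catalog(host="postgres", port=5432, database="postgres",
                   user="postgres", password="postgres"):
    """Render a Trino PostgreSQL catalog file as properties text.

    Assumption: host and port are guesses; credentials are the README defaults.
    """
    lines = [
        "connector.name=postgresql",
        f"connection-url=jdbc:postgresql://{host}:{port}/{database}",
        f"connection-user={user}",
        f"connection-password={password}",
    ]
    return "\n".join(lines) + "\n"


print(render_catalog(), end="")
```

The catalog name Trino exposes comes from the file name, which is why the connection URI elsewhere in this README ends in /postgresql.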

Default Credentials

PostgreSQL

  • Username: postgres
  • Password: postgres
  • Database: postgres

Superset

  • Username: admin
  • Password: admin

Trino

  • No authentication configured (development setup)

Development

Rebuilding Services

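If you make changes to the Dockerfile or configuration, rebuild and restart the services. Note that down -v also removes the named volumes, so any persisted data is lost: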

docker-compose down -v
docker-compose build
docker-compose up -d

Viewing Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f superset
docker-compose logs -f trino
docker-compose logs -f postgres

Volumes

The setup uses Docker volumes for data persistence:

  • postgres_data: PostgreSQL data
  • superset_data: Superset configurations and metadata
  • trino_data: Trino metadata

Security Notes

This is a development setup and is not secured by default. For production:

  • Change all default passwords
  • Enable authentication for Trino
  • Use proper SSL/TLS certificates
  • Configure proper security policies

Configs

Connection settings for the masked "support" user:

  • SQLAlchemy URI: trino://support@trino:8080/postgresql/public
  • Engine parameters: {"connect_args":{"user":"support"}}
