This project sets up a data analytics environment using Trino and Apache Superset with data masking capabilities. It provides a secure way to handle sensitive data while allowing analysis through Superset's visualization tools.
- Trino: A distributed SQL query engine
- Apache Superset: A modern data exploration and visualization platform
- PostgreSQL: Database used by Superset for metadata storage
The Trino server is configured with:
- Coordinator mode enabled
- HTTP port: 8080
- Discovery URI: http://trino:8080
- Node scheduler includes coordinator
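The four settings above correspond to standard keys in Trino's config.properties (coordinator, http-server.http.port, discovery.uri, node-scheduler.include-coordinator). As a quick sanity check, the following Python sketch, a hypothetical helper that is not part of the repository, parses simple key=value lines and reports every one of those keys whose value differs from what this README describes:

```python
from __future__ import annotations

# Hypothetical helper: expected values are the ones listed in this README.
EXPECTED = {
    "coordinator": "true",
    "http-server.http.port": "8080",
    "discovery.uri": "http://trino:8080",
    "node-scheduler.include-coordinator": "true",
}


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def config_mismatches(text: str) -> dict[str, tuple[str | None, str]]:
    """Return {key: (actual, expected)} for every expected setting that differs."""
    props = parse_properties(text)
    return {
        key: (props.get(key), want)
        for key, want in EXPECTED.items()
        if props.get(key) != want
    }
```

Passing it the contents of trino/etc/config.properties should return an empty dict when the file matches the configuration described above.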
Trino is configured with data masking rules (trino/etc/rules.json) that mask sensitive information:
- Masked columns:
- phone_number
- ssn
- credit_card
- address
The masking rules apply to users with the "support" role, replacing sensitive data with "*****".
- All users have access to all catalogs
- The "support" user has SELECT privileges with data masking applied
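The exact contents of trino/etc/rules.json are not reproduced here. As a sketch of the general shape, assuming Trino's file-based access control format, the Python snippet below builds a rules document that gives every user access to all catalogs and grants the "support" user SELECT with the four columns masked. It matches on the user name; Trino rules can also match on a "role" key if the project uses roles. The catch-all rule for other users is an assumption, not taken from the project.

```python
import json

# Columns listed as masked in this README.
MASKED_COLUMNS = ["phone_number", "ssn", "credit_card", "address"]


def build_rules(masked_user: str = "support") -> dict:
    """Sketch of a Trino file-based access control document (assumed layout)."""
    return {
        # Every user may access every catalog.
        "catalogs": [{"allow": "all"}],
        "tables": [
            {
                # The masked user gets SELECT only; each listed column is masked.
                "user": masked_user,
                "privileges": ["SELECT"],
                "columns": [
                    # The mask is a SQL expression; a string literal only fits
                    # string-typed (varchar) columns.
                    {"name": name, "mask": "'*****'"}
                    for name in MASKED_COLUMNS
                ],
            },
            {
                # Assumed catch-all for everyone else. Trino uses the first
                # matching table rule, so this entry must come last.
                "privileges": ["SELECT", "INSERT", "DELETE", "UPDATE", "OWNERSHIP"],
            },
        ],
    }


RULES_JSON = json.dumps(build_rules(), indent=2)
```

Trino typically picks such a file up through etc/access-control.properties with access-control.name=file and security.config-file pointing at the JSON file.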
- Start the services:
docker-compose up -d
- Access Superset:
- URL: http://localhost:8088
- Default credentials:
- Username: admin
- Password: admin
- Configure the Trino connection in Superset:
- Add a new database connection
- In the advanced settings, set the engine parameters so that queries run as the "support" user:
{ "connect_args": { "user": "support", "password": "" } }
- Verify data masking:
- Create a new SQL query in Superset
- Query tables containing sensitive data
- Verify that masked columns show "*****" instead of actual values
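The manual check above can also be scripted. This sketch assumes the trino Python client package is installed and that Trino is reachable on localhost:8080 without authentication, as in this development setup. The catalog postgresql and schema public mirror the connection URIs elsewhere in this README. The checker function unmasked_cells needs no external packages and flags every sensitive cell whose value is not "*****".

```python
from __future__ import annotations

MASK = "*****"
MASKED_COLUMNS = {"phone_number", "ssn", "credit_card", "address"}


def unmasked_cells(columns: list[str], rows: list[tuple]) -> list[tuple[int, str]]:
    """Return (row_index, column) for every sensitive cell not equal to MASK."""
    leaks = []
    for i, row in enumerate(rows):
        for name, value in zip(columns, row):
            if name in MASKED_COLUMNS and value != MASK:
                leaks.append((i, name))
    return leaks


def fetch_as(user: str, sql: str, catalog: str = "postgresql", schema: str = "public"):
    """Run a query against Trino as `user` (requires the trino client package)."""
    # Imported lazily so unmasked_cells works without the client installed.
    from trino.dbapi import connect

    conn = connect(host="localhost", port=8080, user=user, catalog=catalog, schema=schema)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        rows = cur.fetchall()
        columns = [desc[0] for desc in cur.description]
        return columns, rows
    finally:
        conn.close()
```

For example, columns, rows = fetch_as("support", "SELECT * FROM customers LIMIT 10") followed by unmasked_cells(columns, rows) should return an empty list; customers is a placeholder table name.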
- If Trino is not accessible:
- Check if the container is running:
docker-compose ps
- View the Trino logs:
docker logs trino
- Verify the configuration in trino/etc/config.properties
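Trino's coordinator serves a small JSON status document at /v1/info that includes a starting flag. This standard-library sketch, which assumes port 8080 is published on localhost as listed under the service URLs, fetches that document and reports whether the server has finished starting:

```python
import json
import urllib.request


def trino_ready(info: dict) -> bool:
    """Trino is ready once /v1/info reports that it is no longer starting."""
    return info.get("starting") is False


def fetch_info(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """GET /v1/info from the Trino server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/info", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A connection error from fetch_info (urllib.error.URLError) usually means the container is not running or the port is not published; trino_ready(fetch_info()) returning False means Trino is still initializing.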
- If data masking is not working:
- Ensure you're connected as the "support" user
- Verify the rules in trino/etc/rules.json
- Check the Trino logs for access-control errors
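To check the rules file programmatically, the sketch below takes the parsed rules.json and lists any of the four sensitive columns that no table rule matching the "support" user masks. It is a simplification: it treats each rule's "user" field as a regular expression (Trino's default is .*), but ignores catalog, schema, and table scoping as well as rule order, so an empty result is necessary but not sufficient.

```python
from __future__ import annotations

import re

REQUIRED_MASKS = {"phone_number", "ssn", "credit_card", "address"}


def missing_masks(rules: dict, user: str = "support") -> set[str]:
    """Return required columns that no table rule matching `user` masks."""
    masked: set[str] = set()
    for rule in rules.get("tables", []):
        # Trino treats "user" as a regex; an absent field matches everyone.
        if not re.fullmatch(rule.get("user", ".*"), user):
            continue
        for column in rule.get("columns", []):
            if "mask" in column:
                masked.add(column["name"])
    return REQUIRED_MASKS - masked
```

A typical call would be missing_masks(json.load(open("trino/etc/rules.json"))).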
- Data masking is applied by Trino when queries run; the underlying data is not modified
- Values in the masked columns are never returned unmasked to users with the "support" role
- All connections should be properly authenticated
- Regular audits of access patterns are recommended
- Fork the repository
- Create a feature branch
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
The setup consists of three main components:
- PostgreSQL: The underlying database storing the data
- Trino: Distributed SQL query engine for big data
- Apache Superset: Modern data exploration and visualization platform
- Docker
- Docker Compose
- Git
- Clone the repository:
git clone <repository-url>
cd trino-superset
- Start the services:
docker-compose up -d
- Wait for all services to be healthy (this may take a few minutes on the first run)
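Rather than waiting a fixed amount of time, a script can poll the services until they respond. This standard-library sketch assumes Superset's /health endpoint on port 8088 and Trino's /v1/info endpoint on port 8080; the timeout and interval defaults are arbitrary. The sleep and clock parameters are injectable so the loop can be tested without real waiting.

```python
from __future__ import annotations

import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example: wait_until(lambda: http_ok("http://localhost:8088/health")) and wait_until(lambda: http_ok("http://localhost:8080/v1/info")).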
- Access Superset at http://localhost:8088
- Username: admin
- Password: admin
- Superset: http://localhost:8088
- Trino: http://localhost:8080
- PostgreSQL: localhost:5432
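A quick way to confirm that each service is listening on the ports above is a plain TCP connection attempt. This sketch only checks that something accepts connections on each port; it does not confirm that the service behind it is healthy.

```python
import socket

SERVICES = {
    "Superset": ("localhost", 8088),
    "Trino": ("localhost", 8080),
    "PostgreSQL": ("localhost", 5432),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```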
- Log in to Superset
- Go to Data → Databases → + Database
- Select "Trino" from the database options
- Use these connection details:
SQLAlchemy URI: trino://trino:8080/postgresql
Or fill in the form fields:
- Host: trino
- Port: 8080
- Database Name: postgresql
- Username: (leave empty)
- Password: (leave empty)
- Additional recommended settings:
- Check "Expose in SQL Lab"
- Check "Allow DML"
- Check "Allow DQL"
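The URIs in this README follow the pattern trino://[user@]host:port/catalog[/schema]. The hypothetical helper below assembles them. The host name trino only resolves inside the Docker Compose network (for example, from the Superset container), so a client running on the host machine would use localhost instead.

```python
from __future__ import annotations

from urllib.parse import quote


def trino_uri(
    host: str = "trino",
    port: int = 8080,
    catalog: str = "postgresql",
    schema: str | None = None,
    user: str | None = None,
) -> str:
    """Build a trino:// SQLAlchemy URI of the form used in this README."""
    # Percent-encode the user name so characters like '@' cannot break the URI.
    auth = f"{quote(user, safe='')}@" if user else ""
    path = f"/{catalog}" + (f"/{schema}" if schema else "")
    return f"trino://{auth}{host}:{port}{path}"
```

With the defaults it yields the URI above; trino_uri(user="support", schema="public") yields the URI for the masked "support" user.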
- docker-compose.yml: Main configuration file for all services
- Dockerfile.superset: Custom Superset image with Trino dependencies
- superset_config.py: Superset configuration
- trino/etc/catalog/postgresql.properties: Trino connector configuration for PostgreSQL
- docker/docker-init.sh: Initialization script for Superset
PostgreSQL:
- Username: postgres
- Password: postgres
- Database: postgres
Superset:
- Username: admin
- Password: admin
Trino:
- No authentication configured (development setup)
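The PostgreSQL catalog file (trino/etc/catalog/postgresql.properties) is not reproduced in this README. Assuming it uses Trino's PostgreSQL connector with the default credentials above, and that the database container is reachable as postgres (the service name used in the docker-compose logs commands), it would contain roughly the properties rendered by this sketch:

```python
def postgresql_catalog(
    host: str = "postgres",
    port: int = 5432,
    database: str = "postgres",
    user: str = "postgres",
    password: str = "postgres",
) -> str:
    """Render a Trino PostgreSQL catalog file using the development defaults."""
    lines = [
        "connector.name=postgresql",
        f"connection-url=jdbc:postgresql://{host}:{port}/{database}",
        f"connection-user={user}",
        f"connection-password={password}",
    ]
    return "\n".join(lines) + "\n"
```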
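If you make changes to the Dockerfile or configuration, rebuild and restart the services. Note that the -v flag also removes the named volumes, which deletes all persisted PostgreSQL, Superset, and Trino data: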
docker-compose down -v
docker-compose build
docker-compose up -d
To view the logs:
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f superset
docker-compose logs -f trino
docker-compose logs -f postgres
The setup uses Docker volumes for data persistence:
- postgres_data: PostgreSQL data
- superset_data: Superset configurations and metadata
- trino_data: Trino metadata
This is a development setup and is not secured by default. For production:
- Change all default passwords
- Enable authentication for Trino
- Use proper SSL/TLS certificates
- Configure proper security policies
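On the client side, enabling password authentication on Trino changes how connections are made: Trino only accepts passwords over HTTPS, so clients must switch scheme and send credentials. This sketch builds keyword arguments for trino.dbapi.connect from the trino Python client; the server-side setup (password authenticator and TLS certificates) is out of scope.

```python
from __future__ import annotations


def connect_kwargs(host: str, port: int, user: str, password: str | None = None) -> dict:
    """Keyword arguments for trino.dbapi.connect (sketch)."""
    kwargs = {"host": host, "port": port, "user": user}
    if password is not None:
        # Trino requires HTTPS for password authentication.
        from trino.auth import BasicAuthentication

        kwargs["http_scheme"] = "https"
        kwargs["auth"] = BasicAuthentication(user, password)
    return kwargs
```

Without a password the result matches this development setup; with one, the arguments target an HTTPS endpoint (for example a hypothetical trino.example.com on port 8443).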
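To connect as the "support" user (so that data masking applies), use either the SQLAlchemy URI or the engine parameters:
- trino://support@trino:8080/postgresql/public
- {"connect_args":{"user":"support"}}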