Flint is a minimalist, engine-agnostic Python framework for streamlining and standardizing data engineering pipelines. By embracing Convention over Configuration, Flint eliminates environment friction, hardcoded absolute paths, and complex PySpark session management.
- Zero-Config File Discovery: Flint walks the directory tree to locate your local `pyproject.toml` file and anchors your data catalog to that project root.
- Decentralized Catalog: Declare dataset metadata in modular, self-contained mini-YAML files.
- Elastic Processing Runtimes: Switch between Pandas and PySpark execution engines through the same unified interface.
- Interactive CLI Scaffolding: Spin up a new production-ready data directory structure instantly with `flint init`.
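The zero-config discovery idea can be illustrated with a short upward directory search. This is a minimal sketch that assumes discovery means "start at the current directory and stop at the first ancestor containing `pyproject.toml`". `find_project_root` is a hypothetical helper written for illustration, not part of Flint's documented API.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper for illustration; Flint's real discovery logic may differ.
    """
    current = (start if start is not None else Path.cwd()).resolve()
    # Check the starting directory first, then each ancestor up to the filesystem root.
    for directory in (current, *current.parents):
        if (directory / marker).is_file():
            return directory
    raise FileNotFoundError(f"No {marker} found in {current} or any parent directory")
```

Called from a notebook in `src/notebooks/`, `find_project_root()` returns the directory that holds `pyproject.toml`. Catalog entries such as `data/sample_table.csv` can then stay relative to that root instead of hardcoding absolute paths.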
Once Flint is published to PyPI, install it with pip:

```bash
pip install flint-core
```

Or install it directly from the source repository using Poetry:

```bash
poetry add git+https://github.com/idperez720/data-engineering-exp.git
```

Navigate to an empty directory and let the interactive wizard scaffold the workspace conventions:

```bash
flint init
```
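This README does not list everything `flint init` creates. As a rough, non-interactive sketch, the hypothetical `scaffold_workspace` helper below creates only the directories this README itself refers to (`conf/catalog/`, `data/`, and `src/notebooks/`). The real wizard is interactive and may produce a different or larger layout.

```python
from pathlib import Path
from typing import List, Sequence

# Directories mentioned elsewhere in this README; the actual `flint init`
# layout is an assumption here and may differ.
ASSUMED_LAYOUT = ("conf/catalog", "data", "src/notebooks")


def scaffold_workspace(root: Path, layout: Sequence[str] = ASSUMED_LAYOUT) -> List[Path]:
    """Create the assumed directory layout under `root` (hypothetical helper)."""
    created = []
    for rel in layout:
        target = root / rel
        # exist_ok=True makes the sketch idempotent: re-running it is harmless.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```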
Add a specification block inside conf/catalog/sample_dataset.yaml:
```yaml
customers:
  description: "Main production customer data"
  format: "csv"
  engine: "pandas"
  storage_path: "data/sample_table.csv"
```
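Because each mini-YAML file holds only part of the catalog, the files have to be merged before a dataset can be looked up by name. A minimal sketch of one way to do this, assuming every `*.yaml` file under `conf/catalog/` maps dataset names to specs like the `customers` entry and that names must be unique across files. `merge_catalog_entries`, `load_catalog`, and `REQUIRED_KEYS` are hypothetical names, not Flint's API. Parsing uses PyYAML's `yaml.safe_load`.

```python
from pathlib import Path
from typing import Dict, Iterable, Mapping, Tuple

# Keys taken from the `customers` example; treating them as required is an assumption.
REQUIRED_KEYS = ("format", "engine", "storage_path")


def merge_catalog_entries(files: Iterable[Tuple[str, Mapping[str, dict]]]) -> Dict[str, dict]:
    """Merge (source_name, parsed_yaml) pairs into one catalog, rejecting duplicates."""
    catalog: Dict[str, dict] = {}
    origin: Dict[str, str] = {}
    for source, entries in files:
        for name, spec in entries.items():
            if name in catalog:
                raise ValueError(f"Dataset {name!r} defined in both {origin[name]} and {source}")
            missing = [key for key in REQUIRED_KEYS if key not in spec]
            if missing:
                raise ValueError(f"Dataset {name!r} in {source} is missing keys {missing}")
            catalog[name] = dict(spec)
            origin[name] = source
    return catalog


def load_catalog(catalog_dir: Path) -> Dict[str, dict]:
    """Read every *.yaml file in `catalog_dir` and merge the results."""
    import yaml  # PyYAML (third-party); imported lazily so the merge logic has no dependency

    parsed = []
    for path in sorted(catalog_dir.glob("*.yaml")):
        with path.open(encoding="utf-8") as handle:
            # safe_load returns None for an empty file; treat that as "no entries".
            parsed.append((path.name, yaml.safe_load(handle) or {}))
    return merge_catalog_entries(parsed)
```

Failing loudly on duplicate names is the main cost of decentralizing: without a check, a later file would silently shadow an earlier definition.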
Create a Python script or open a Jupyter Notebook inside src/notebooks/ and fetch your data instantly:
```python
from flint_core.core.io import DataLoader

# Auto-discovers the project root (via pyproject.toml) and its catalog settings
loader = DataLoader()

# Loads the "customers" entry as a Pandas DataFrame (its catalog engine is "pandas")
df = loader.load("customers")
print(df.head())
```

For comprehensive guides, testing architecture deep-dives, and complete API references, visit our documentation site: 👉 http://127.0.0.1:8000/ (Replace with your deployed docs URL, e.g., GitHub Pages)
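One way an interface like `loader.load("customers")` can serve both Pandas and PySpark is to dispatch on the catalog entry's `engine` and `format` fields and resolve `storage_path` against the discovered project root. The sketch below assumes that design. `READERS`, `register_reader`, `load_dataset`, and the `"pyspark"` engine name are hypothetical, and Flint's real internals may differ. Only `pandas.read_csv`, `SparkSession.builder.getOrCreate()`, and `DataFrameReader.csv` are real library calls.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

Reader = Callable[[Path], Any]

# Registry keyed by (engine, format); a hypothetical design, not Flint's internals.
READERS: Dict[Tuple[str, str], Reader] = {}


def register_reader(engine: str, fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader function for one (engine, format) pair."""
    def decorator(func: Reader) -> Reader:
        READERS[(engine, fmt)] = func
        return func
    return decorator


@register_reader("pandas", "csv")
def _read_csv_pandas(path: Path) -> Any:
    import pandas as pd  # imported lazily so other engines don't require pandas
    return pd.read_csv(path)


@register_reader("pyspark", "csv")
def _read_csv_spark(path: Path) -> Any:
    from pyspark.sql import SparkSession  # lazy import; PySpark stays optional
    spark = SparkSession.builder.getOrCreate()
    return spark.read.csv(str(path), header=True, inferSchema=True)


def load_dataset(spec: Dict[str, str], project_root: Path) -> Any:
    """Pick a reader from the spec's engine/format; storage_path stays root-relative."""
    key = (spec["engine"], spec["format"])
    try:
        reader = READERS[key]
    except KeyError:
        raise ValueError(f"No reader registered for engine={key[0]!r}, format={key[1]!r}") from None
    return reader(project_root / spec["storage_path"])
```

Keeping the library imports inside each reader means a Pandas-only project never needs PySpark installed, and supporting a new format takes one decorated function.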
Distributed under the MIT License. Any modification or distribution (including forks) must include the original copyright notice and liability waiver. See LICENSE for more information.