Read the documentation: gloe.ideos.com.br
Source code: github.com/ideos/gloe
Gloe (pronounced /ɡloʊ/, like "glow") is a general-purpose library designed to guide developers in expressing their code as a flow.
Why follow this approach? Because it keeps your Python code easy to maintain, document, and test. Gloe guides you to write code in the form of small, safely connected units, rather than relying on scattered functions and classes with no clear relationship.
What is a flow? Formally, a flow is defined as a DAG (Directed Acyclic Graph) with one source and one sink, meaning it has a beginning and an end. In practice, it is a sequence of steps that transform data from one form to another.
Where can I use it? Anywhere! Gloe is particularly useful for data science and machine learning pipelines, as well as for servers, scripts, or any area where there is a gap between code and logical business understanding.
- Write type-safe pipelines with pure Python.
- Express a pipeline as a set of atomic, isolated, extensible and trackable units of responsibility called transformers.
- Validate the input and output of transformers, and the changes between them, during execution.
- Mix sync and async code without worrying about its concurrent nature.
- Keep your code readable and maintainable, even for complex flows. ¹
- Use it anywhere without changing your existing workflow.
- Visualize your pipelines and the data flowing through them. ²
- Use a functional approach to work with conditions and collections.
1. Gloe emerged in the context of a large application with extremely complex business logic. After transitioning the code to Gloe, maintenance efficiency improved by 200%, leading the entire team to adopt it widely.
2. This feature is under development.
Requirements:
- Python (>= 3.9)
- typing-extensions (>= 4.7)
You can install Gloe using pip or conda:
# PyPI
pip install gloe
# or conda
conda install -c conda-forge gloe
Consider the following flow. It is part of an e-commerce server: it starts from an HTTP request and ends with a list of recommended products.
get_recommendations = (
    extract_request_id >>
    get_user_by_id >> (
        get_last_seen_items,
        get_last_ordered_items,
        get_user_network_data,
    ) >>
    get_recommended_items
)
Steps of this flow:
- The user ID is extracted from the request.
- The corresponding user data is retrieved using this ID.
- Three pieces of information about the user are gathered: the last seen items, the last ordered items, and the user's network data (such as personal information and relationships).
- These three pieces of information are used to generate a list of recommended items for this specific user.
Can we agree that it is easy to understand just by reading the code?
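To connect these steps with the formal definition given earlier (a DAG with one source and one sink), here is a minimal sketch that encodes the same flow as a plain dependency graph and checks those properties with the standard library. The dictionary encoding is an assumption made purely for illustration; it is not how Gloe represents a flow internally.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
# Step names come from the example above; the dict encoding itself is
# hypothetical and only serves to illustrate the DAG definition.
flow = {
    "extract_request_id": set(),
    "get_user_by_id": {"extract_request_id"},
    "get_last_seen_items": {"get_user_by_id"},
    "get_last_ordered_items": {"get_user_by_id"},
    "get_user_network_data": {"get_user_by_id"},
    "get_recommended_items": {
        "get_last_seen_items",
        "get_last_ordered_items",
        "get_user_network_data",
    },
}

# A source has no predecessors; a sink is nobody's predecessor.
sources = [step for step, deps in flow.items() if not deps]
all_deps = set().union(*flow.values())
sinks = [step for step in flow if step not in all_deps]

# static_order() raises graphlib.CycleError when the graph has a cycle,
# so obtaining an order confirms that the graph is acyclic.
order = list(TopologicalSorter(flow).static_order())

print(sources)  # the single beginning of the flow
print(sinks)    # the single end of the flow
print(order)    # one valid execution order
```

The three middle steps may appear in any relative order, since none of them depends on another.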
Each step of a flow is called a transformer and creating one is as easy as:
from gloe import transformer

@transformer
def extract_request_id(req: Request) -> int:
    # your logic goes here
    ...
You can connect many transformers using the right shift operator, as in the example above. When the argument of >> is a tuple, you are creating branches using the default parallel gateway.
Learn more about creating transformers, pipelines, and gateways.
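To give an intuition for how an operator like >> can chain steps and fan out into branches, here is a toy sketch built only on Python's `__rshift__` hook. The `Step` class and the `parse`, `double`, `square`, and `total` functions are hypothetical names invented for this illustration. This is not Gloe's implementation, and it omits what Gloe adds on top, such as static typing of the connections.

```python
from typing import Any, Callable, Tuple, Union


class Step:
    """Toy stand-in for a transformer: wraps a one-argument function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, data: Any) -> Any:
        return self.func(data)

    def __rshift__(self, nxt: Union["Step", Tuple["Step", ...]]) -> "Step":
        if isinstance(nxt, tuple):
            branches = nxt

            def run_branches(data: Any) -> tuple:
                # Run the upstream step once, then hand its result to
                # every branch and collect the outputs in branch order.
                upstream = self(data)
                return tuple(branch(upstream) for branch in branches)

            return Step(run_branches)
        # Plain sequencing: the output of self feeds the next step.
        return Step(lambda data: nxt(self(data)))


@Step
def parse(text: str) -> int:
    return int(text)


@Step
def double(n: int) -> int:
    return n * 2


@Step
def square(n: int) -> int:
    return n * n


@Step
def total(pair: tuple) -> int:
    return sum(pair)


pipeline = parse >> (double, square) >> total
result = pipeline("3")  # parse gives 3, branches give (6, 9), total gives 15
print(result)
```

In this sketch the step after a tuple receives the branch results as a single tuple, which is why `total` takes one `pair` argument.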
When the manager asks you to return the items grouped by department instead of in a flat list, the refactoring process is straightforward:
get_user_recommendations = (
    extract_request_id >>
    get_user_by_id >> (
        get_last_seen_items,
        get_last_ordered_items,
        get_user_network_data,
    ) >>
    get_recommended_items >>
    group_by_department
)
If, by some chance, you connect two transformers with incompatible types, the IDE along with Mypy will warn you about the malformed flow.
For example, suppose you implemented the extract_request_id transformer to return a string instead of an integer ID:
@transformer
def extract_request_id(request: Request) -> str:
    ...
But the get_user_by_id transformer expects an int as input:
@transformer
def get_user_by_id(user_id: int) -> User:
    ...
The result will be something like this:
Assuming the types all check out, this pipeline can be invoked from a server, for example:
@users_router.get('/:user_id/recommended')
def recommended_items_route(req: Request):
    return get_user_recommendations(req)
When you need to document it somewhere, you can just plot it.
Suppose you don't need to extract the user ID within the flow because you are using a web framework that already does it, like FastAPI. In that case, you can remove the extract_request_id transformer. Since the input type of get_user_by_id is an integer, the configuration would be:
get_user_recommendations = (
    get_user_by_id >> (
        get_last_seen_items,
        get_last_ordered_items,
        get_user_network_data,
    ) >>
    get_recommended_items >>
    group_by_department
)

@users_router.get('/{user_id}/recommended')
def recommended_items_route(user_id: int):
    return get_user_recommendations(user_id)
We hope the above example illustrates how easily you can identify maintenance points and gain confidence that the rest of the code will continue to work properly, as long as the transformers' interfaces remain satisfied.
Software development has many patterns and good practices related to the code itself, such as how to document, test, and structure it, and which programming paradigm to use. Beyond all that, however, we believe that the key to a successful software project is good communication between everyone involved in the development. Of course, this communication is not restricted to meetings or text messages. It is also present in documentation artifacts and in well-written code.
When developers write code, they are telling a story to the next person who will read or refactor it. Depending on the code's quality, this story can be quite confusing, with no clear roles for the characters and a messy plot (sometimes with an undesired twist). The next person to maintain the software may take a long time to understand the narrative and clean it up, or they may simply give up and leave it as is.
Gloe aims to make this story coherent, logically organized, and easy to follow. This is achieved by dividing the code into concise steps, each with an unambiguous responsibility and an explicit interface. Gloe then allows you to connect these steps, clarifying their relationships and identifying necessary changes during refactoring. Thus, you can quickly understand the entire story being told and improve it. Inspired by concepts such as natural transformations and the free monad (found in Scala and Haskell), Gloe implements this approach using functional programming and strong typing concepts.
Currently, unlike platforms such as Airflow that include scheduler backends for task orchestration, Gloe's primary purpose is to aid in development. The graph structure aims to make the code flatter and hence more readable. However, it is important to note that Gloe does not offer functionality for executing tasks in a dedicated environment, nor does it directly contribute to execution speed or scalability improvements.