forked from A3Data/rony

Data Engineering made simple - An opinionated Data Engineering framework


gquaresma89/rony

 
 


Rony - Data Engineering made simple


An opinionated Data Engineering framework

Developed with ❤️ by A3Data

What is Rony?

Rony is an open source framework that helps Data Engineers set up more organized code and build, test, and deploy data pipelines faster.

Why Rony?

Rony is Hermione's best friend (or so...). That made it a perfect name for the second framework released by A3Data, this one focusing on Data Engineering.

Over many years of helping companies build their data analytics projects and cloud infrastructure, we built up a knowledge base that led to a collection of code snippets and automation procedures that speed things up when developing data structures and data pipelines.

Some choices we made

Rony relies on a few decisions that make sense for the majority of projects conducted by A3Data.

You are free to change these decisions as you wish (that's the whole point of the framework: flexibility).

Installing

Dependencies

  • Python (>=3.6)

Install

pip install -U rony

How do I use Rony?

After installing Rony, you can check that the installation is OK by running:

rony info

You should see a cute logo. Then:

  1. Create a new project:
rony new project_rony
  2. Rony already creates a virtual environment for the project. Windows users can activate it with
<project_name>_env\Scripts\activate

Linux and macOS users can run

source <project_name>_env/bin/activate
  3. After activating it, you should install some libraries. There are a few suggestions in the requirements.txt file:
pip install -r requirements.txt
  4. Rony also has some handy CLI commands to build and run Docker images locally. You can run
cd etl
rony build <image_name>:<tag>

to build an image and run it with

rony run <image_name>:<tag>

In this particular implementation, run.py contains simple ETL code that accepts a parameter to filter the data by the Sex column. To use it, run

docker run <image_name>:<tag> -s female
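
To make the filtering parameter more concrete, here is a minimal sketch of what a script like run.py could look like. This is not the actual file shipped with Rony: the inline sample data, the function names, and the `--sex` long option are assumptions made for illustration. Only the `-s` flag and the Sex column come from the description above.

```python
import argparse
import csv
import io

# Hypothetical sample data standing in for whatever dataset run.py really reads.
SAMPLE_CSV = """Name,Sex,Age
Alice,female,29
Bob,male,35
Carol,female,41
"""


def filter_by_sex(rows, sex):
    """Keep only the rows whose Sex column matches `sex` (case-insensitive).

    When `sex` is None, every row is kept.
    """
    if sex is None:
        return list(rows)
    return [row for row in rows if row["Sex"].lower() == sex.lower()]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy ETL step that filters rows by Sex.")
    # `-s` mirrors the flag used in the docker run command; `--sex` is an assumed long form.
    parser.add_argument("-s", "--sex", default=None,
                        help="keep only rows with this value in the Sex column")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    result = filter_by_sex(rows, args.sex)
    for row in result:
        print(row)
    return result


if __name__ == "__main__":
    main()
```

With a script like this as the image entry point, the `-s female` arguments passed to docker run would reach the parser and only the matching rows would be printed.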

Implementation suggestions

When you start a new Rony project, you will find

  • an infrastructure folder with Terraform code that creates on AWS:

    • an S3 bucket
    • a Lambda function
    • a CloudWatch log group
    • an ECR repository
    • an AWS Glue Crawler
    • IAM roles and policies for Lambda and Glue
  • an etl folder with:

    • a Dockerfile and a run.py example of ETL code
    • a lambda_function.py with a "Hello World" example
  • a tests folder with unit tests for the Lambda function

  • a .github/workflow folder with a GitHub Actions CI/CD pipeline suggestion. This pipeline

    • Tests the Lambda function
    • Builds and runs the Docker image
    • Sets AWS credentials
    • Makes a Terraform plan (but does not actually deploy anything)
  • a dags folder with some Airflow example code.

You also have a scripts folder with a bash file that builds a Lambda deploy package.

Feel free to adjust and adapt everything according to your needs.
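
As an illustration of the "Hello World" Lambda function and its unit test listed above, the following is a minimal sketch. The generated lambda_function.py and tests may differ: the handler name `lambda_handler`, the response shape, and the test name are assumptions, chosen because they follow the common convention for Python handlers on AWS Lambda.

```python
import json


def lambda_handler(event, context):
    """Minimal Lambda handler: ignores its input and returns a greeting."""
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello World"}),
    }


def test_lambda_handler_returns_hello_world():
    # Lambda passes an event dict and a context object; this handler ignores both,
    # so an empty dict and None are enough for a unit test.
    response = lambda_handler({}, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"message": "Hello World"}
```

A test written this way needs no AWS credentials, so it can run in a CI step before any Terraform plan is made.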

Contributing

Make a pull request with your implementation.

For suggestions, contact us: rony@a3data.com.br

License

Rony is open source and is released under the Apache 2.0 License.
