PyFlink data stream processing utilities

eli64s/pyflink-poc


FlinkFlow

Real-time stream processing with PyFlink.

Developed with the software and tools below.

py pyflink aioresponses aiohttp asyncio pack



Overview

FlinkFlow is a repository for building real-time data processing apps with PyFlink.

Features

[INSERT-PROJECT-FEATURES]


Project Structure

.
├── README.md
├── conf
│   ├── conf.toml
│   └── flink-config.yaml
├── data
│   └── data.csv
├── requirements.txt
├── scripts
│   ├── clean.sh
│   └── run.sh
├── setup
│   └── setup.sh
├── setup.py
└── src
    ├── alerts_handler.py
    ├── consumer.py
    └── logger.py

6 directories, 12 files

Modules

Scripts
File | Summary
run.sh | Bash script that starts a Flink cluster, submits a PyFlink job, and then stops the cluster.
clean.sh | Bash script that cleans up Python, Jupyter Notebook, and pytest artifacts: Python cache files, build artifacts, notebook checkpoints, and log files.
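
For readers who prefer Python to Bash, the kind of cleanup attributed to clean.sh can be sketched as follows. This is a minimal illustration and not a port of the actual script: the directory names, file suffixes, and the `clean` function are assumptions chosen to match the summary, and the real script may target different paths.

```python
import shutil
from pathlib import Path

# Assumed targets, inferred from the summary of clean.sh rather than from
# the script itself.
DIR_NAMES = {"__pycache__", ".pytest_cache", ".ipynb_checkpoints", "build", "dist"}
FILE_SUFFIXES = {".pyc", ".pyo", ".log"}


def clean(root: Path) -> list[Path]:
    """Delete matching directories and files under root; return what was removed."""
    removed = []
    # Materialize and sort the walk first so that parents are visited before
    # their children and deletions do not disturb the iteration.
    for path in sorted(root.rglob("*")):
        if not path.exists():
            continue  # already removed together with a parent directory
        if path.is_dir() and not path.is_symlink() and path.name in DIR_NAMES:
            shutil.rmtree(path)
            removed.append(path)
        elif path.is_file() and path.suffix in FILE_SUFFIXES:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `clean(Path("."))` from the project root would remove the matching entries and return their paths, which makes it easy to log what was deleted.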
Src
File | Summary
alerts_handler.py | REST API alert handler for the Flink consumer. It buffers alerts, serializes them with Apache Avro, and sends them to the API in batches using aiohttp.
logger.py | Logger class for the project that provides colored output and multiple log levels.
consumer.py | Python script that uses Apache Flink to process streaming data. It creates a StreamExecutionEnvironment, sets the parallelism, time characteristic, and checkpointing mode, and creates a StreamTableEnvironment.
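
The buffer-then-send-in-batches pattern described for alerts_handler.py can be illustrated with a small asyncio sketch. It is not the repository's implementation: `AlertBuffer`, `SendBatch`, `demo`, and `fake_send` are hypothetical names, JSON stands in for Apache Avro so the example needs only the standard library, and the sender is injected so that an aiohttp-based POST could be plugged in where `fake_send` is used.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Hypothetical type: any coroutine that delivers one serialized batch.
SendBatch = Callable[[bytes], Awaitable[None]]


class AlertBuffer:
    """Collect alerts and hand them to send_batch in fixed-size batches."""

    def __init__(self, send_batch: SendBatch, batch_size: int = 3) -> None:
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._pending: list[dict] = []

    async def add(self, alert: dict) -> None:
        """Buffer one alert and flush once the batch size is reached."""
        self._pending.append(alert)
        if len(self._pending) >= self._batch_size:
            await self.flush()

    async def flush(self) -> None:
        """Send whatever is buffered, if anything."""
        if not self._pending:
            return
        # Swap the list out before awaiting so alerts added during the send
        # go into the next batch.
        batch, self._pending = self._pending, []
        # Assumption: JSON replaces Avro here to avoid third-party packages.
        await self._send_batch(json.dumps(batch).encode("utf-8"))


async def demo() -> list[bytes]:
    sent: list[bytes] = []

    async def fake_send(payload: bytes) -> None:
        # In a real handler this would be an HTTP POST to the alerts API.
        sent.append(payload)

    buffer = AlertBuffer(fake_send, batch_size=2)
    for i in range(5):
        await buffer.add({"id": i})
    await buffer.flush()  # deliver the final partial batch
    return sent


if __name__ == "__main__":
    for payload in asyncio.run(demo()):
        print(payload)
```

Injecting the sender keeps the batching logic testable without a network, which is also where a mocking library such as aioresponses would come in for an aiohttp-based sender.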

Getting Started

Prerequisites

Before you begin, ensure that you have the following prerequisites installed:

[INSERT-PROJECT-PREREQUISITES]

Installation

  1. Clone the FlinkFlow repository:
git clone https://github.com/eli64s/FlinkFlow
  2. Change to the project directory:
cd FlinkFlow
  3. Install the dependencies:
pip install -r requirements.txt

Using FlinkFlow

python main.py

Running Tests

#run tests

Future Development

  • [COMPLETED-TASK]
  • [INSERT-TASK]
  • [INSERT-TASK]

Contributing

Contributions are always welcome! Please follow these steps:

  1. Fork the project repository. This creates a copy of the project on your account that you can modify without affecting the original project.
  2. Clone the forked repository to your local machine using a Git client such as the git command line or GitHub Desktop.
  3. Create a new branch with a descriptive name (e.g., new-feature-branch or bugfix-issue-123).
git checkout -b new-feature-branch
  4. Make changes to the project's codebase.
  5. Commit your changes to your local branch with a clear commit message that explains the changes you've made.
git commit -m 'Implemented new feature.'
  6. Push your changes to your forked repository on GitHub using the following command:
git push origin new-feature-branch
  7. Open a pull request against the original project repository. In the pull request, describe the changes you've made and why they're necessary. The project maintainers will review your changes and provide feedback or merge them into the main branch.

License

This project is licensed under the [INSERT-LICENSE-TYPE] License. See the LICENSE file for additional info.


Acknowledgments

[INSERT-DESCRIPTION]