Flow PHP

Flow is a PHP-based, strongly typed, asynchronous ETL (Extract, Transform, Load) data processing library with constant memory consumption.

Usage

Extract from the Source, Transform, Load to the Sink.

<?php

declare(strict_types=1);

use function Flow\ETL\Adapter\Parquet\{from_parquet, to_parquet};
use function Flow\ETL\DSL\{data_frame, lit, ref, sum, to_output};
use Flow\ETL\Filesystem\SaveMode;

require __DIR__ . '/vendor/autoload.php';

data_frame()
    ->read(from_parquet(__FLOW_DATA__ . '/orders_flow.parquet'))
    ->select('created_at', 'total_price', 'discount')
    ->withEntry('created_at', ref('created_at')->cast('date')->dateFormat('Y/m'))
    ->withEntry('revenue', ref('total_price')->minus(ref('discount')))
    ->select('created_at', 'revenue')
    ->groupBy('created_at')
    ->aggregate(sum(ref('revenue')))
    ->sortBy(ref('created_at')->desc())
    ->withEntry('daily_revenue', ref('revenue_sum')->round(lit(2))->numberFormat(lit(2)))
    ->drop('revenue_sum')
    ->write(to_output(truncate: false))
    ->withEntry('created_at', ref('created_at')->toDate('Y/m'))
    ->mode(SaveMode::Overwrite)
    ->write(to_parquet(__FLOW_OUTPUT__ . '/daily_revenue.parquet'))
    ->run();
$ php daily_revenue.php
+------------+---------------+
| created_at | daily_revenue |
+------------+---------------+
|    2023/10 |    206,669.74 |
|    2023/09 |    227,647.47 |
|    2023/08 |    237,027.31 |
|    2023/07 |    240,111.05 |
|    2023/06 |    225,536.35 |
|    2023/05 |    234,624.74 |
|    2023/04 |    231,472.05 |
|    2023/03 |    231,697.36 |
|    2023/02 |    211,048.97 |
|    2023/01 |    225,539.81 |
+------------+---------------+
10 rows
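
To make the data flow of the example above easier to follow, here is a rough plain-PHP sketch of the same computation (revenue per `Y/m` period, newest first). It does not use Flow at all: the `monthly_revenue()` function and the sample orders are hypothetical, and unlike the pipeline above it keeps all totals in a plain array instead of reading from and writing to Parquet.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of Flow: sums (total_price - discount) per 'Y/m' period.
 *
 * @param iterable<array{created_at: string, total_price: float, discount: float}> $orders
 * @return array<string, string> period => formatted revenue, newest period first
 */
function monthly_revenue(iterable $orders): array
{
    $totals = [];

    foreach ($orders as $order) {
        // Same idea as cast('date')->dateFormat('Y/m') in the pipeline above.
        $period = (new DateTimeImmutable($order['created_at']))->format('Y/m');
        $revenue = $order['total_price'] - $order['discount'];
        $totals[$period] = ($totals[$period] ?? 0.0) + $revenue;
    }

    // 'Y/m' strings sort chronologically, so reverse key order puts the newest period first.
    krsort($totals, SORT_STRING);

    // Round to two decimals and add thousands separators, keeping the period keys.
    return array_map(
        static fn (float $sum): string => number_format(round($sum, 2), 2),
        $totals
    );
}

// Made-up sample data for illustration only.
$orders = [
    ['created_at' => '2023-09-30', 'total_price' => 100.00, 'discount' => 10.00],
    ['created_at' => '2023-10-01', 'total_price' => 1250.50, 'discount' => 0.25],
    ['created_at' => '2023-10-15', 'total_price' => 20.00, 'discount' => 0.00],
];

foreach (monthly_revenue($orders) as $period => $revenue) {
    echo $period, ' ', $revenue, PHP_EOL;
}
```

The Flow version expresses the same steps declaratively and processes the input in portions, which is what keeps memory usage constant for large files.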

The reasons behind creating this project can be explained in a few tweets. To get familiar with the basic ETL API, see the flow-php/etl repository; everything else is listed below.

Features

  • constant memory consumption
  • caching
  • reading from any data source
  • writing to any data source
  • rich collection of data transformation functions
  • grouping & aggregating
  • remote files processing
  • joins
  • sorting
  • displaying datasets as ASCII table
  • validation against schema
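
As an illustration of how constant memory consumption is typically achieved in PHP, the following minimal sketch reads a source lazily and hands it over in fixed-size batches using generators. It is not Flow's implementation; `in_batches()` and `numbers()` are hypothetical helpers written only for this example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: groups a lazy source into batches of at most $batchSize items.
 * Only one batch is held in memory at a time.
 *
 * @param iterable<mixed> $source
 * @return Generator<int, list<mixed>>
 */
function in_batches(iterable $source, int $batchSize): Generator
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];

    foreach ($source as $item) {
        $batch[] = $item;

        if (count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the final, possibly smaller batch.
    if ($batch !== []) {
        yield $batch;
    }
}

/** Hypothetical lazy source: yields 1..$count without building an array. */
function numbers(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield $i;
    }
}

$batches = 0;
$sum = 0;

foreach (in_batches(numbers(10), 4) as $batch) {
    $batches++;
    $sum += array_sum($batch);
}

echo $batches, ' batches, sum ', $sum, PHP_EOL; // prints "3 batches, sum 55"
```

Because both the source and the batching step are generators, peak memory depends on the batch size rather than on the total number of items.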

Building blocks

  • DataFrame - Lazy data processing frame.
  • Rows - Immutable collection of Row objects.
  • Row - Immutable, strongly typed collection of Entry objects.
  • Entry - Immutable, strongly typed object representing a cell in a row.
  • Extractor (Reader) - Memory-safe data source returning a \Generator that yields Rows to the Pipeline.
  • Transformer - Data transformer that receives Rows and returns Rows (in most cases transformed), one instance of Rows at a time.
  • Loader (Writer) - Memory-safe representation of a data sink; a Loader is responsible for writing Rows into the destination storage, one at a time.
  • Pipeline - Interface representing the ETL process; each received Rows instance is passed through all Pipes. Also responsible for error handling.
  • Pipe - Loader or Transformer instance existing in the Pipes collection.
  • Function - Transformation that can be applied to a single row, a single entry, rows, or a group of rows.
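
The relationship between these building blocks can be sketched in plain PHP as follows. This is a simplified illustration under stated assumptions, not Flow's real API: the `SketchExtractor`, `SketchTransformer` and `SketchLoader` interfaces, their implementations, and `run_pipeline()` are all hypothetical, and plain arrays stand in for the immutable Rows, Row and Entry objects.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-ins for the building blocks; not Flow's interfaces.
interface SketchExtractor
{
    /** @return Generator<int, list<array<string, mixed>>> one batch of rows per iteration */
    public function extract(): Generator;
}

interface SketchTransformer
{
    public function transform(array $rows): array;
}

interface SketchLoader
{
    public function load(array $rows): void;
}

final class ArrayExtractor implements SketchExtractor
{
    public function __construct(
        private readonly array $data,
        private readonly int $batchSize,
    ) {
    }

    public function extract(): Generator
    {
        // Yield the data in batches, the way an extractor yields Rows.
        foreach (array_chunk($this->data, $this->batchSize) as $rows) {
            yield $rows;
        }
    }
}

final class AddRevenue implements SketchTransformer
{
    public function transform(array $rows): array
    {
        // Return new rows instead of modifying the received ones.
        return array_map(
            static fn (array $row): array => $row + ['revenue' => $row['total_price'] - $row['discount']],
            $rows
        );
    }
}

final class MemoryLoader implements SketchLoader
{
    public array $written = [];

    public function load(array $rows): void
    {
        array_push($this->written, ...$rows);
    }
}

/** Passes every batch from the extractor through all transformers, then to the loader. */
function run_pipeline(SketchExtractor $extractor, array $transformers, SketchLoader $loader): void
{
    foreach ($extractor->extract() as $rows) {
        foreach ($transformers as $transformer) {
            $rows = $transformer->transform($rows);
        }

        $loader->load($rows);
    }
}

$loader = new MemoryLoader();

run_pipeline(
    new ArrayExtractor([
        ['total_price' => 100, 'discount' => 10],
        ['total_price' => 50, 'discount' => 5],
        ['total_price' => 20, 'discount' => 0],
    ], 2),
    [new AddRevenue()],
    $loader
);

echo json_encode($loader->written), PHP_EOL;
```

In this sketch only one batch travels through the transformers and the loader at a time, which mirrors the "one instance of Rows at a time" behaviour described in the list above.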

Supported PHP versions

  • 8.1 - ✅
  • 8.2 - ✅
  • 8.3 - ✅

Available Data Types

Available Adapters

Transformation Functions

Flow ETL provides a rich set of official functions to transform data; you can find them all in the flow-php/etl repository.

Sponsors

Flow PHP is sponsored by:

  • Blackfire - the best PHP profiling and monitoring tool!