Apply default batchSize based on Loader type #738

Closed
norberttech opened this issue Nov 5, 2023 · 0 comments · Fixed by #788
Labels: core, Developer Experience (Resolving this issue should improve development experience for the library users.)

Comments

@norberttech
Member

In order to improve DX, Flow should try to detect if the current BatchSize is suitable for a given Loader.

(new Flow())
    ->read(
        Dbal::from_limit_offset(
            $sourceDbConnection,
            'source_dataset_table',
            new OrderBy('id', Order::DESC)
        )
    )
    ->withEntry('id', ref('id')->cast('int'))
    ->withEntry('name', concat(ref('name'), lit(' '), ref('last_name')))
    ->drop('last_name')
    ->write(Dbal::to_table_insert($dbConnection, 'flow_dataset_table'))
    ->run();

In this example, batchSize is equal to 1, which means that Flow will insert rows into the database one by one.
This can easily be changed by calling batchSize(1_000) just above write, but that requires the developer to know how loaders work internally.
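
To make the cost of a batch size of 1 concrete, here is a minimal plain-PHP sketch that does not use Flow at all. The countWrites function is hypothetical; it assumes a loader issues exactly one write call per batch and simply counts those calls for a given batch size.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of Flow: counts how many write calls
 * a loader would receive when rows are handed over in batches.
 * Assumption: one write call (e.g. one INSERT) per batch.
 *
 * @param array<int, array<string, mixed>> $rows
 */
function countWrites(array $rows, int $batchSize): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $writes = 0;

    // array_chunk splits the rows into groups of at most $batchSize rows.
    foreach (array_chunk($rows, $batchSize) as $batch) {
        // A real loader would issue a single write for the whole batch here.
        $writes++;
    }

    return $writes;
}

// 2,500 illustrative rows.
$rows = array_map(
    static fn (int $id): array => ['id' => $id, 'name' => 'user ' . $id],
    range(1, 2_500)
);

echo countWrites($rows, 1), PHP_EOL;     // 2500
echo countWrites($rows, 1_000), PHP_EOL; // 3
```

With 2,500 rows, a batch size of 1 results in 2,500 separate writes, while a batch size of 1,000 results in 3.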

What we can do is use the Optimizer to check the current batchSize when Loaders are added; whenever we notice that batchSize wasn't set, we can automatically apply one.
The exact numbers should be predefined. I think we can start with 1k for each of the following:

  • ElasticSearch
  • Dbal
  • Meilisearch

This is irrelevant for the file-based loaders, as most of them write rows one by one.
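
The idea could be sketched roughly as follows. This is not Flow's actual Optimizer; every class name here (Loader, DbalLoader, ElasticsearchLoader, MeilisearchLoader, CsvLoader, DefaultBatchSize) is a hypothetical stand-in, and "batchSize wasn't set" is modelled as null purely for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for loader types; these are not Flow classes.
interface Loader {}
final class DbalLoader implements Loader {}
final class ElasticsearchLoader implements Loader {}
final class MeilisearchLoader implements Loader {}
final class CsvLoader implements Loader {}

/**
 * Hypothetical optimization step: picks a batch size when a loader is added.
 */
final class DefaultBatchSize
{
    /** Predefined defaults, starting from 1k as proposed in this issue. */
    private const DEFAULTS = [
        DbalLoader::class => 1_000,
        ElasticsearchLoader::class => 1_000,
        MeilisearchLoader::class => 1_000,
    ];

    /**
     * Returns the batch size to use, or null when nothing should be applied.
     * Assumption: null for $explicitBatchSize means the user never set one.
     */
    public function resolve(Loader $loader, ?int $explicitBatchSize): ?int
    {
        // An explicitly configured batch size always wins.
        if ($explicitBatchSize !== null) {
            return $explicitBatchSize;
        }

        // Loaders without a predefined default (e.g. file-based) are left alone.
        return self::DEFAULTS[$loader::class] ?? null;
    }
}

$optimizer = new DefaultBatchSize();

var_dump($optimizer->resolve(new DbalLoader(), null)); // int(1000)
var_dump($optimizer->resolve(new DbalLoader(), 50));   // int(50)
var_dump($optimizer->resolve(new CsvLoader(), null));  // NULL
```

Keeping the defaults in one predefined map makes the numbers easy to adjust per loader, and returning null for unknown loaders leaves the file-based case untouched.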

norberttech added the Developer Experience and core labels Nov 5, 2023
norberttech added this to the 0.5.0 milestone Nov 6, 2023
norberttech self-assigned this Nov 7, 2023