Pulse Scalability rocks! - extract Pulse data repository? #353

Closed
cord opened this issue Apr 5, 2024 · 1 comment

cord commented Apr 5, 2024

While enjoying Pulse as a window into my Laravel apps, I am also evaluating the Pulse data storage as a generic, versatile store for time series data.

Time series data could be, for example, measurements from IoT sensors (temperature, humidity, ...), but also anything else you want to measure and analyse over time.

To validate this, I ran a scalability and performance test with 97M recorded events spread across a virtual timeframe of 4 weeks.

Nice! Versatile key value approach

Because all queries go through key_hash, the value of the key itself does not influence indexing or performance.

So the key can be used as a path to a sensor, e.g. location72.sectionB.sensor42, for recording, retrieval and analytics.
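Why the key's value does not matter for lookups can be illustrated with a small sketch. It assumes that key_hash is derived from an MD5 digest of the key and stored as 16 raw bytes (which is how I understand Pulse's MySQL migration). The `keyHash()` helper is hypothetical and not part of Pulse's API.

```php
<?php

// Hypothetical helper: derive a fixed-length lookup hash from an arbitrary key.
// Assumption: key_hash is the MD5 digest of the key, stored as 16 raw bytes.
function keyHash(string $key): string
{
    return md5($key, true); // true = raw binary output, always 16 bytes
}

$keys = [
    'key:1',
    'location72.sectionB.sensor42',
    str_repeat('very.long.path.', 50) . 'sensor',
];

foreach ($keys as $key) {
    // Every key, short or long, maps to a hash of the same size.
    printf("%4d chars -> %d byte hash (%s)\n", strlen($key), strlen(keyHash($key)), bin2hex(keyHash($key)));
}
```

Whatever path is used as a key, the index only ever sees a value of the same fixed size.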

Test Scenario

For the test scenario I seeded Pulse with 1000 keys and 5 types using the command attached at the bottom.

The entries are randomly spread across 4 weeks to generate a large number of aggregates.

All tests were run locally on a MacBook Pro M2 with MySQL 8. After running the command a couple of times, the database looks like this:

| table | rows |
| --- | --- |
| pulse_aggregates | 2.3M |
| pulse_entries | 97M |

Pulse::Graph

The following tests were performed in Tinkerwell, without printing the result.

graph() - default method

graph(1 key)* - #344, supporting keys

The default method is called as follows:

Pulse::graph(["type:1", "type:3"], "max", CarbonInterval::hours(168));

The modified method is called as follows:

Pulse::graph(["type:1", "type:3"], "max", CarbonInterval::hours($hrs), keys: ["key:142", "key:1", "key:789"]);
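For anyone who wants to reproduce "ms / MB" figures for calls like these, a helper along the following lines could be used. This is only a sketch: `measure()` is a hypothetical function, and the workload is a stand-in so the snippet runs without Laravel. In Tinkerwell the closure would wrap the `Pulse::graph(...)` call instead.

```php
<?php

// Hypothetical helper: run a callable and report wall time and peak memory.
function measure(callable $fn): array
{
    $start = hrtime(true);   // monotonic clock, in nanoseconds
    $fn();                   // the result is discarded, only the cost is of interest
    $elapsedMs = (hrtime(true) - $start) / 1e6;

    return [
        'ms' => round($elapsedMs, 1),
        // Peak memory of the whole process so far, in MB.
        'mb' => round(memory_get_peak_usage(true) / 1024 / 1024, 1),
    ];
}

// Stand-in workload; replace with the call that should be measured.
$stats = measure(fn () => array_sum(range(1, 100000)));

printf("%.1fms / %.1fMB\n", $stats['ms'], $stats['mb']);
```

Note that the peak memory value covers the whole process, so each measurement is best taken in a fresh session.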

Test results

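| Number of types | Timeframe (hours) | graph() | graph(1 key)* | graph(10 keys)* |
| --- | --- | --- | --- | --- |
| 1 | 1 | 30ms / 32MB | 24ms / 31MB | 24ms / 31MB |
| 1 | 6 | 140ms / 34MB | 100ms / 31MB | 150ms / 31MB |
| 1 | 24 | 500ms / 42MB | 200ms / 31MB | 200ms / 31MB |
| 1 | 168 | 500ms / 42MB | 220ms / 31MB | 220ms / 31MB |
| 2 | 1 | 40ms / 32MB | | |
| 2 | 6 | 200ms / 37MB | | |
| 2 | 24 | 1000ms / 51MB | | |
| 2 | 168 | 1000ms / 52MB | | |
| 3 | 1 | 30ms / 32MB | | |
| 3 | 6 | 250ms / 37MB | | |
| 3 | 24 | 1100ms / 52MB | | |
| 3 | 168 | 1100ms / 53MB | | |
| 4 | 1 | 40ms / 32MB | | |
| 4 | 6 | 300ms / 39MB | | |
| 4 | 24 | x | | |
| 4 | 168 | x | | |
| 5 | 1 | 50ms / 32MB | 25ms / 32MB | 30ms / 32MB |
| 5 | 6 | 250ms / 39MB | 140ms / 32MB | 140ms / 32MB |
| 5 | 24 | x | 300ms / 32MB | 300ms / 32MB |
| 5 | 168 | x | 500ms / 32MB | 500ms / 32MB |

Values are time / memory. x = out of memory. * = modified method from #344.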

The query itself is always fast (a few ms) thanks to the index structure.

The overall performance of the default implementation depends on the number of types and the timeframe.
However, with more than 3 types and a timeframe of 24 hours or more, it runs out of memory (128MB), because the results for all keys are loaded into the result collection.

Improvements of #344

The method from #344 shows a significant improvement in both time and memory consumption, as only the data for the given keys is loaded into the result collection.
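The effect can be illustrated with a toy example. This is not the code from #344, only a sketch of the principle: a generator stands in for the database cursor, `rows()` and `collectRows()` are hypothetical helpers, and the 60 buckets per key are an arbitrary choice. In a real query the filtering would presumably happen in SQL rather than in PHP.

```php
<?php

// Simulated query result: one row per (key, bucket). Yielding lazily stands in
// for a database cursor, so rows only occupy memory once they are collected.
function rows(int $numberOfKeys, int $bucketsPerKey): Generator
{
    for ($k = 1; $k <= $numberOfKeys; $k++) {
        for ($b = 0; $b < $bucketsPerKey; $b++) {
            yield ['key' => "key:$k", 'bucket' => $b, 'max' => $k + $b];
        }
    }
}

// Collect rows grouped by key; passing null for $keys collects every key.
function collectRows(iterable $rows, ?array $keys = null): array
{
    $wanted = $keys === null ? null : array_flip($keys);
    $result = [];

    foreach ($rows as $row) {
        if ($wanted !== null && !isset($wanted[$row['key']])) {
            continue; // skip keys that were not requested
        }
        $result[$row['key']][$row['bucket']] = $row['max'];
    }

    return $result;
}

$all = collectRows(rows(1000, 60));
$some = collectRows(rows(1000, 60), ['key:142', 'key:1', 'key:789']);

printf("all keys: %d series, filtered: %d series\n", count($all), count($some));
// prints "all keys: 1000 series, filtered: 3 series"
```

With the key filter the result holds 3 series instead of 1000, which is why memory stays flat regardless of how many keys exist.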

Conclusion

With some minor changes the data storage of Pulse could be used as a powerful storage for Time Series Data adding to the great Laravel stack!

Some ideas

  • adding more analytics features
  • configurable periods for aggregates
  • configurable trimming periods
  • data retention to archive historic data
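To make the "configurable periods" idea a bit more concrete, here is a rough sketch of how a timestamp could be assigned to aggregate buckets for a freely chosen list of periods. `bucketsFor()` is a hypothetical function and the period lengths are illustrative values only; nothing here is taken from Pulse's actual implementation.

```php
<?php

// Hypothetical sketch: assign a timestamp to the start of its aggregate bucket
// for each configured period. The period lengths are illustrative values only.
function bucketsFor(int $timestamp, array $periodsInSeconds): array
{
    $buckets = [];

    foreach ($periodsInSeconds as $period) {
        // Round down to the nearest multiple of the period.
        $buckets[$period] = $timestamp - ($timestamp % $period);
    }

    return $buckets;
}

$periods = [60, 900, 3600]; // e.g. 1 minute, 15 minutes, 1 hour
$timestamp = 1712311445;    // 2024-04-05 10:04:05 UTC

foreach (bucketsFor($timestamp, $periods) as $period => $bucket) {
    printf("period %5ds -> bucket starts at %s\n", $period, gmdate('Y-m-d H:i:s', $bucket));
}
```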

Your thoughts?


Command used for seeding Data

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Laravel\Pulse\Facades\Pulse;

class SeedData extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'app:seed-data';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Pulse Data Seeder';

    // Seeding parameters, declared up front (dynamic properties are deprecated since PHP 8.2).
    private int $numberOfKeys = 1000;
    private int $numberOfTypes = 5;
    private int $numberOfEvents = 10000; // events per key and type
    private int $dateRange = 4 * 7 * 24 * 60 * 60; // spread across 4 weeks, in seconds

    /**
     * Execute the console command.
     */
    public function handle(): int
    {
        $keys = collect(range(1, $this->numberOfKeys))->map(fn ($number) => 'key:' . $number);
        $types = collect(range(1, $this->numberOfTypes))->map(fn ($number) => 'type:' . $number);

        $keys->each(function ($key) use ($types) {
            for ($i = 1; $i <= $this->numberOfEvents; $i++) {
                $types->each(function ($type) use ($key) {
                    // Record one random value at a random moment within the date range
                    // and have Pulse maintain avg, min, max and count aggregates for it.
                    Pulse::record($type, key: $key, value: rand(-10000, 10000), timestamp: time() - rand(0, $this->dateRange))
                        ->avg()->min()->max()->count();
                });
            }
        });

        return self::SUCCESS;
    }
}

@driesvints (Member)

Hey @cord, thank you for this overview. Could you post this as a comment on the PR so we can keep the conversation focussed in one place?
