Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage schema improvement #22

Closed
R4scal opened this issue Sep 14, 2018 · 4 comments

Comments

@R4scal
Copy link
Contributor

commented Sep 14, 2018

Now

SELECT
    table,
    partition,
    column,
    formatReadableSize(size) AS size,
    rows,
    days,
    formatReadableSize(avgDaySize) AS avgDaySize
FROM
(
    SELECT
        table,
        partition,
        column,
        sum(column_bytes_on_disk) AS size,
        sum(rows) AS rows,
        min(min_date) AS min_date,
        max(max_date) AS max_date,
        max_date - min_date AS days,
        size / (max_date - min_date) AS avgDaySize
    FROM system.parts_columns
    WHERE active
    GROUP BY
        table,
        partition,
        column
    ORDER BY rows DESC
)

┌─table───────────────────┬─partition─┬─column───────────────┬─size───────┬─────────rows─┬─days─┬─avgDaySize─┐
│ graphite_reverse        │ 201808    │ Path                 │ 105.61 GiB │ 166439821590 │   30 │ 3.52 GiB   │
│ graphite_reverse        │ 201808    │ Date                 │ 1.92 GiB   │ 166439821590 │   30 │ 65.54 MiB  │
│ graphite_reverse        │ 201808    │ Timestamp            │ 515.08 GiB │ 166439821590 │   30 │ 17.17 GiB  │
│ graphite_reverse        │ 201808    │ Time                 │ 553.24 GiB │ 166439821590 │   30 │ 18.44 GiB  │
│ graphite_reverse        │ 201808    │ Value                │ 140.33 GiB │ 166439821590 │   30 │ 4.68 GiB   │
│ graphite_tagged         │ 201808    │ Date                 │ 406.78 MiB │    644830350 │   30 │ 13.56 MiB  │
│ graphite_tagged         │ 201808    │ Path                 │ 4.02 GiB   │    644830350 │   30 │ 137.09 MiB │
│ graphite_tagged         │ 201808    │ Version              │ 1.62 GiB   │    644830350 │   30 │ 55.35 MiB  │
│ graphite_tagged         │ 201808    │ Deleted              │ 12.35 MiB  │    644830350 │   30 │ 421.51 KiB │
│ graphite_tagged         │ 201808    │ Tags                 │ 13.28 GiB  │    644830350 │   30 │ 453.18 MiB │
│ graphite_tagged         │ 201808    │ Tag1                 │ 110.11 MiB │    644830350 │   30 │ 3.67 MiB   │
│ graphite_series_reverse │ 201808    │ Deleted              │ 327.00 B   │        12211 │   30 │ 10.90 B    │
│ graphite_series_reverse │ 201808    │ Date                 │ 476.00 B   │        12211 │   30 │ 15.87 B    │
│ graphite_series_reverse │ 201808    │ Path                 │ 18.46 KiB  │        12211 │   30 │ 630.07 B   │
│ graphite_series_reverse │ 201808    │ Level                │ 494.00 B   │        12211 │   30 │ 16.47 B    │
│ graphite_series_reverse │ 201808    │ Version              │ 1.01 KiB   │        12211 │   30 │ 34.57 B    │
│ graphite_tree           │ 201611    │ Deleted              │ 53.00 B    │          478 │    0 │ inf YiB    │
│ graphite_tree           │ 201611    │ Level                │ 88.00 B    │          478 │    0 │ inf YiB    │
│ graphite_tree           │ 201611    │ Path                 │ 6.65 KiB   │          478 │    0 │ inf YiB    │
│ graphite_tree           │ 201611    │ Version              │ 381.00 B   │          478 │    0 │ inf YiB    │
│ graphite_tree           │ 201611    │ Date                 │ 56.00 B    │          478 │    0 │ inf YiB    │

Time + Timestamp columns in graphite_reverse table 1 Tb, while Path + Value is 240 Gb
Questions:

  1. It's possible to remove Timestamp column?
    Timestamp used in graphite-clickhouse AS SELECT Path, Time, Value, Timestamp FROM graphite_reverse but filter AND (Time >= 1536922240 AND Time <= 1536922309))
  2. May be split Path column to:
    Metric
    Tags
    This change will make easier to share, analyse and aggregate data
@R4scal

This comment has been minimized.

Copy link
Contributor Author

commented Sep 20, 2018

For me remove Timestamp save 3 Tb of disk space on clickhouse cluster.
Can I create PR with new uploader type?

@lomik

This comment has been minimized.

Copy link
Owner

commented Sep 21, 2018

It's possible to remove Timestamp column?

Timestamp is used only for select last value for several points with equal time. Making this field optional is hard, but you can try writing a zero value to it.

May be split Path column to: Metric, Tags.

This is incompatible with GraphiteMergeTree engine. You can use url functions like extractURLParameter for work with current Path encoding.

Can I create PR with new uploader type?

Maybe it's better to add option zero-timestamp topoints and points-reverse uploaders

lomik added a commit that referenced this issue Oct 4, 2018
@ipeacocks

This comment has been minimized.

Copy link

commented Dec 16, 2018

Timestamp is used only for select last value for several points with equal time.

Could you explain this a little bit better?

@lomik

This comment has been minimized.

Copy link
Owner

commented Jan 10, 2019

zero-timestamp option released in v0.9.0

@lomik lomik closed this Jan 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.