Alert example fires for maintained snapshots #11

Closed
v4rakh opened this issue Mar 23, 2023 · 15 comments
Labels
question Further information is requested

Comments

v4rakh commented Mar 23, 2023

Hi,

Maybe it's just my use case, but I found it confusing to get alerts for old snapshots that I intentionally keep, e.g. backups older than 15 days. This happens with the example alert provided in the README because restic-exporter reports a timestamp for each snapshot, and some of those are naturally old. In my case the alert should only fire if the latest snapshot exceeds a certain age, i.e. a backup has potentially been missed.

Maybe we'd like to add something like this to the README?

# for one day
(time() - (max without (snapshot_hash) (restic_backup_timestamp))) > (1 * 86400)

# for 15 days as currently outlined in the README
(time() - (max without (snapshot_hash) (restic_backup_timestamp))) > (15 * 86400)

ngosang (Owner) commented Mar 23, 2023

The alerts are optional and are provided just as a reference.

The common use case is to automate backups with a cron job or another scheduler. I'm doing incremental backups every day and I have configured the two alerts from the README:

  • check alert => checks whether the backup repository has errors or is corrupted
  • old backup alert => checks whether a device is not running backups on schedule; it could be due to network issues, an error in the scheduled task...

In your case, you can keep the first alert and create custom alerts for specific backups. Could you post the response of the /metrics endpoint? I would like to understand the issue better.

ngosang added the question label Mar 23, 2023

v4rakh (Author) commented Mar 23, 2023

Thanks as always for your quick reply.

I'm not sure whether this is an issue or just the expected behavior of the metric: each snapshot hash is exported as its own series, so if a lot of snapshots are retained they all end up in the metrics. Or is this unexpected?

Here's an example from one of my backups, copied from the exporter's /metrics endpoint:

# HELP restic_check_success Result of restic check operation in the repository
# TYPE restic_check_success gauge
restic_check_success 2.0
# HELP restic_snapshots_total Total number of snapshots in the repository
# TYPE restic_snapshots_total counter
restic_snapshots_total 24.0
# HELP restic_backup_timestamp Timestamp of the last backup
# TYPE restic_backup_timestamp gauge
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="428a4022933f2a1e162cbfa6685055afb27fbaefb20b784c63fbefc33a25d49e",snapshot_tag=""} 1.667183411e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="2ec546bf3f53ecef07491a6536fe1e889b9e2d3a230d26cd3d0b189fd9325bb3",snapshot_tag=""} 1.673836203e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="dba9a400b7961289953865932ed0c142ed218bcc0d736ca8bb92af2141340160",snapshot_tag=""} 1.674441002e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="13a9064b8f2f0c6e176b6997dcd52668660d5f2e8dbcf77ed047f4a26e20ed6b",snapshot_tag=""} 1.675045802e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="57a5b3ab8dfb3d770ff484bf52a75ec94569d8a8e2da68dd932b193675f99629",snapshot_tag=""} 1.679279404e+09

EDIT: By the way, I have multiple exporters running at the same time. They are all exposed as different instances and scraped separately. Not sure if that's helpful.

Interestingly enough, I've just run restic snapshots manually and the result differs:

repository 2663d695 opened (version 1)
ID        Time                 Host        Tags        Paths
-----------------------------------------------------------------------
c72f769a  2023-03-17 01:00:03  mantell                 /home/data/stripped
5afb6fb9  2023-03-20 01:00:15  mantell                 /home/data/stripped
599983ef  2023-03-22 01:00:14  mantell                 /home/data/stripped

The hashes don't seem to match, although this is the first time I've taken a closer look.

ngosang (Owner) commented Mar 23, 2023

The label "snapshot_hash" in the exporter is not the snapshot hash in Restic. The hash in Restic changes frequently when you do maintenance operations or full backups.
The exporter's hash is calculated from the hostname and the paths =>

def calc_snapshot_hash(self, snapshot: dict) -> str:

In your case, you should have just one line in the exporter because the hostname and paths match. 🤔
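
For reference, a minimal standalone sketch of this calculation; the concatenation matches the line quoted later in this thread, while SHA-256 is only an assumption based on the 64-character hex values in the metrics output:

import hashlib

def calc_snapshot_hash(snapshot: dict) -> str:
    # Group key: hostname + username + comma-joined paths
    # (the digest algorithm is assumed to be SHA-256, not confirmed here).
    text = snapshot["hostname"] + snapshot["username"] + ",".join(snapshot["paths"])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two snapshots with the same host, user and paths map to the same hash,
# so they should collapse into a single restic_backup_timestamp series.
a = {"hostname": "mantell", "username": "root", "paths": ["/etc"]}
b = {"hostname": "mantell", "username": "root", "paths": ["/etc"]}
assert calc_snapshot_hash(a) == calc_snapshot_hash(b)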

Update: Could you run this command?

  • restic snapshots --json --latest 1

v4rakh (Author) commented Mar 23, 2023

I cleaned up my setup by assigning a different network to each of the exporters in my docker-compose file, but I guess the root cause is something different: that solved it for one exporter, but now the other one reports more entries.

The correct one:

# HELP restic_backup_timestamp Timestamp of the last backup
# TYPE restic_backup_timestamp gauge
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="8bc201180ab9e700369b659f3f0a75c99dd1a3a72fbbd9e8ad24389d3917cead",snapshot_tag=""} 1.679443214e+09
[{"time":"2023-03-22T01:00:14.29359609+01:00","parent":"5afb6fb92919f038a09f8f53fd6181b556e16ed0d1a7b5c31d3666696c11ef10","tree":"2c9a7be5967957a22deb5e28b5fc89ebdce29ffe70465bba46e145f1d074d8f2","paths":["/home/data/music"],"hostname":"mantell","username":"root","id":"599983efe95ad60dcf93a8ef05a23e6ab38f9b11156849c91fe3a57d8ce80c7e","short_id":"599983ef"}]

Sorry, I closed this too early.

Now I checked another exporter and the underlying repository with the command you asked for, and it actually returned an array with more than one entry.

[
  {
    "time": "2022-10-31T03:30:11.085083219+01:00",
    "parent": "19e95feb063cab22ed2281fbfed5aa16c53c03ab62ed312b06e55a91e5bb2244",
    "tree": "7fc6f808fd4165c3736d85f517420d8cdee14c59c406df69600c8a1683a15599",
    "paths": ["/etc"],
    "hostname": "mantell",
    "username": "root",
    "id": "c7ef0b0cc48bdc723b8822589ea5988fd4c3671c08f581aa5d79fab26d1c9690",
    "short_id": "c7ef0b0c"
  },
  {
    "time": "2023-01-16T03:30:03.544488116+01:00",
    "parent": "4f5b4ef0d38223ccabff879b075e2361f7b9c373ea8b05c622236c0cc160b2b7",
    "tree": "302e423e9321b826044973b8a7591099346fdef73f6c1cd1e5c77c2d3d450906",
    "paths": ["/etc"],
    "hostname": "mantell",
    "username": "root",
    "id": "a3381e1fecced88ec099b447355fc22f10fdb4b7b24a33805fdf4c024eb31924",
    "short_id": "a3381e1f"
  },
  {
    "time": "2023-01-23T03:30:02.336435099+01:00",
    "tree": "ad4ba7054b03c3325b62a3f6e724a2160ba13cbecfe266523ef5ea4a640ff4ed",
    "paths": ["/etc"],
    "hostname": "mantell",
    "username": "root",
    "id": "18907418d224b7cb3dbaa2f9a1e80bed02a15b4de8d0c06d7276c2636655fab3",
    "short_id": "18907418"
  },
  {
    "time": "2023-01-30T03:30:02.926379261+01:00",
    "parent": "9176548f5fe7369dda9f923b229e8bb8fd19a2bd9a0e1bc2ca462a7d157d7608",
    "tree": "66223f24e4b670ebb836404a9cbc403627c30a455de7c789f96768c9949f22c5",
    "paths": ["/etc"],
    "hostname": "mantell",
    "username": "root",
    "id": "486b28989cfffac24a34b037b14cacdaa3343849e499c3f41351f1b80eb7967a",
    "short_id": "486b2898"
  },
  {
    "time": "2023-03-20T03:30:04.545656117+01:00",
    "parent": "8c45d75b8a8c08984556e352397850a1b0646359280a1a561463c04835d93c88",
    "tree": "db41d0846694a609f9b9a651ca91dceb371792b9442c294947d1387e7f91a839",
    "paths": ["/etc"],
    "hostname": "mantell",
    "username": "root",
    "id": "4d1d3f09f6f056a905d6f295375cea23d28e83dfe10da2331307b906c4e64959",
    "short_id": "4d1d3f09"
  }
]

The exporter reports the following:

# TYPE restic_backup_timestamp gauge
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="428a4022933f2a1e162cbfa6685055afb27fbaefb20b784c63fbefc33a25d49e",snapshot_tag=""} 1.667183411e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="2ec546bf3f53ecef07491a6536fe1e889b9e2d3a230d26cd3d0b189fd9325bb3",snapshot_tag=""} 1.673836203e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="dba9a400b7961289953865932ed0c142ed218bcc0d736ca8bb92af2141340160",snapshot_tag=""} 1.674441002e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="13a9064b8f2f0c6e176b6997dcd52668660d5f2e8dbcf77ed047f4a26e20ed6b",snapshot_tag=""} 1.675045802e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="57a5b3ab8dfb3d770ff484bf52a75ec94569d8a8e2da68dd932b193675f99629",snapshot_tag=""} 1.679279404e+09

v4rakh closed this as completed Mar 23, 2023
v4rakh reopened this Mar 23, 2023

ngosang (Owner) commented Mar 23, 2023

Could you try the previous release, 1.1.0?

v4rakh (Author) commented Mar 23, 2023

Same result. By the way, I've always had this and thought it was expected behavior that every snapshot gets its own gauge. :-)

So maybe it wasn't the network setup after all; that would have been weird. In any case, the output of snapshots --json --latest 1 also seems to report more than one entry.

Not sure if it helps, but the restic version on the host that creates the snapshots is 0.15.1.

ngosang (Owner) commented Mar 23, 2023

The results you posted in #11 (comment) don't make sense. I calculated the hash, and it's impossible that the JSON with 5 snapshots produces those 5 metrics.
Could you double-check that you are getting the JSON and the metrics from the same repository? I have some ideas to improve the code, but I have to reproduce the issue first.

v4rakh (Author) commented Mar 23, 2023

This is from one repository: #11 (comment)
This is from the other (the second half of the post): #11 (comment)
I'll double-check later.

ngosang (Owner) commented Mar 23, 2023

Please test this PR; it should fix your problem => #12

v4rakh (Author) commented Mar 23, 2023

Thanks for providing this. I tested it by building the Docker image locally, and I get the very same results.

# HELP restic_backup_timestamp Timestamp of the last backup
# TYPE restic_backup_timestamp gauge
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="20c494f14bbb7e5188a4b36702a1dcce59baa4c516f34268106f92f494eba783",snapshot_tag=""} 1.667183411e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="734c259855b1ad6067777f85598521cab79a4d0fd5a149b4698d8081de33ca88",snapshot_tag=""} 1.673836203e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="f5247872f6ead56c1d7add82e8cbe2d873f47f894fdacd93f22b4b5140273a3b",snapshot_tag=""} 1.674441002e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="8571d9536f0d1616c6e99ad3a1f68d94af547e2616a343e29e97e3c4a2ed557f",snapshot_tag=""} 1.675045802e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="14fb46e68138c0033e32be3a67ac6aa20f6c95f6f4f339e1ee83e1cec4ad8d93",snapshot_tag=""} 1.679279404e+09

ngosang (Owner) commented Mar 25, 2023

@v4rakh please build this PR and post the log traces: #13

v4rakh (Author) commented Mar 26, 2023

See comment in #13.

ngosang (Owner) commented Mar 26, 2023

@v4rakh When I run the code with your dump I see 5 metrics:

restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="428a4022933f2a1e162cbfa6685055afb27fbaefb20b784c63fbefc33a25d49e",snapshot_tag=""} 1.667183411e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="2ec546bf3f53ecef07491a6536fe1e889b9e2d3a230d26cd3d0b189fd9325bb3",snapshot_tag=""} 1.673836203e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="dba9a400b7961289953865932ed0c142ed218bcc0d736ca8bb92af2141340160",snapshot_tag=""} 1.674441002e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="13a9064b8f2f0c6e176b6997dcd52668660d5f2e8dbcf77ed047f4a26e20ed6b",snapshot_tag=""} 1.675045802e+09
restic_backup_timestamp{client_hostname="mantell",client_username="root",snapshot_hash="57a5b3ab8dfb3d770ff484bf52a75ec94569d8a8e2da68dd932b193675f99629",snapshot_tag=""} 1.679279404e+09

That is totally fine because each of your backups has different folders. For example, you don't have embyserver in the other snapshots.

        "paths": [
            "/etc",
            "/home/admin",
            "/home/musicstreamer",
            "/opt/embyserver/config",
            "/opt/jellyfin/config",
            "/opt/portainer",
            "/opt/unifi",
            "/root",
            "/tmp/package_list.txt",
            "/tmp/package_list_aur.txt",
            "/usr/local/bin"
        ],
        "hostname": "mantell",
        "username": "root",

        "paths": [
            "/etc",
            "/home/admin",
            "/home/musicstreamer",
            "/opt/jellyfin/config",
            "/opt/nodered_data",
            "/opt/portainer",
            "/opt/prometheus_config",
            "/opt/unifi",
            "/root",
            "/tmp/package_list.txt",
            "/tmp/package_list_aur.txt",
            "/usr/local/bin"
        ],
        "hostname": "mantell",
        "username": "root",

The function that calculates the hash takes into account the username, hostname and paths. If you have different paths, they are considered different backups. That makes sense to me if you want to track the number of files or the size over time: you cannot compare different things.

text = snapshot["hostname"] + snapshot["username"] + ",".join(snapshot["paths"])
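
To make that concrete, the two path lists shown above already produce different concatenated texts, so the resulting hashes differ and each set is tracked as a separate backup (a quick illustration of the line above; the digest step is omitted since the inputs already differ):

paths_a = ["/etc", "/home/admin", "/home/musicstreamer", "/opt/embyserver/config",
           "/opt/jellyfin/config", "/opt/portainer", "/opt/unifi", "/root",
           "/tmp/package_list.txt", "/tmp/package_list_aur.txt", "/usr/local/bin"]
paths_b = ["/etc", "/home/admin", "/home/musicstreamer", "/opt/jellyfin/config",
           "/opt/nodered_data", "/opt/portainer", "/opt/prometheus_config", "/opt/unifi",
           "/root", "/tmp/package_list.txt", "/tmp/package_list_aur.txt", "/usr/local/bin"]

# Same host and user, different path sets => different hash input text,
# so the exporter reports them as two separate backups.
text_a = "mantell" + "root" + ",".join(paths_a)
text_b = "mantell" + "root" + ",".join(paths_b)
assert text_a != text_b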

Since most of your backups have the same folders, I would recommend including all folders in just one backup and deleting all previous backups. I'm closing this since I cannot fix what is not broken.

ngosang closed this as completed Mar 26, 2023

v4rakh (Author) commented Mar 26, 2023

I agree. Changing source directories can be a requirement though, e.g. not everyone will create a new restic repository or wipe all existing snapshots when adding new paths or changing existing ones, e.g. for a newly installed application. In my example I could have used /opt instead of listing the directories individually, or worked with excludes, though I prefer to include them explicitly.

Just an idea: would it be an option to add an env var that controls whether the hash calculation includes the paths, enabled by default?
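
A hypothetical sketch of what that option could look like; the INCLUDE_PATHS variable name and this code are illustrative, not part of the exporter:

import hashlib
import os

def calc_snapshot_hash(snapshot: dict) -> str:
    # Hypothetical INCLUDE_PATHS toggle: when set to "false", the paths are left
    # out of the hash, so adding or renaming source directories does not split
    # the backup into a new series. Defaults to the current behavior.
    include_paths = os.environ.get("INCLUDE_PATHS", "true").lower() == "true"
    text = snapshot["hostname"] + snapshot["username"]
    if include_paths:
        text += ",".join(snapshot["paths"])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()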

Also, documenting the alternative alerts mentioned above could help people with a similar setup; dismissing this as not a use case is probably not right, since source directories won't stay the same forever. Your call: if you prefer to keep the current behavior, I would still propose documenting how the hash is actually calculated and what impact it has on the underlying metrics.

Thanks for looking into it in depth.

ngosang (Owner) commented Mar 26, 2023

#14
