
Remove fss files from a snapshot when a block is removed. #2859

Merged: 1 commit into main on Feb 10, 2022

Conversation

@derekcollison (Member)

During a filestore snapshot we generate the fss files for the snapshot, but we were not cleaning them up if the block was deleted before a server restart.

https://gist.github.com/nekufa/010185dfb59261f222a0042d3a7d2a1c

Signed-off-by: Derek Collison derek@nats.io

/cc @nats-io/core
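
For context, here is a minimal Go sketch of the idea behind the fix. It is illustrative only, not the actual nats-server filestore code: the `removeMsgBlock` helper and the `%d.blk` naming are assumptions for the example, while the `.fss` per-block index files are the ones discussed in this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeMsgBlock deletes a message block file and the companion .fss
// (per-subject index) file generated for it during a snapshot, so the
// index can no longer be left behind on disk after the block is gone.
func removeMsgBlock(msgsDir string, index uint64) error {
	blk := filepath.Join(msgsDir, fmt.Sprintf("%d.blk", index))
	fss := filepath.Join(msgsDir, fmt.Sprintf("%d.fss", index))
	if err := os.Remove(blk); err != nil && !os.IsNotExist(err) {
		return err
	}
	// Previously the .fss file could survive until a server restart;
	// removing it together with the block keeps the directory consistent.
	if err := os.Remove(fss); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}

func main() {
	// Demonstration against a throwaway directory.
	dir, _ := os.MkdirTemp("", "msgs")
	defer os.RemoveAll(dir)
	os.WriteFile(filepath.Join(dir, "1.blk"), []byte("block"), 0600)
	os.WriteFile(filepath.Join(dir, "1.fss"), []byte("index"), 0600)
	if err := removeMsgBlock(dir, 1); err != nil {
		fmt.Println("cleanup failed:", err)
		return
	}
	left, _ := filepath.Glob(filepath.Join(dir, "*"))
	fmt.Println("files left:", left) // expect: none
}
```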

Review comments on server/filestore.go and server/filestore_test.go (outdated, resolved).

@kozlovic (Member) left a comment:

LGTM

@derekcollison merged commit 6fcd879 into main on Feb 10, 2022.
@derekcollison deleted the fss_cleanup branch on Feb 10, 2022, 01:33.
@nekufa commented on Feb 17, 2022:

@derekcollison by the way, is this fix included in the Docker image synadia/nats-server:nightly-20220217? Even though we have started from scratch, disk usage grows indefinitely. I have tried to reproduce it locally, but without success.

@derekcollison (Member, Author)

Should be; let me double-check that the nightlies are being built correctly.

@derekcollison (Member, Author)

Yes, it should be. What are you seeing?

@nekufa commented on Feb 17, 2022:

I see that PVC usage grows indefinitely (screenshot of PVC usage attached), yet running nats stream report shows that all streams have zero messages. When I run nats stream purge X, the disk space is magically freed.

I have reduced the cluster configuration to a single instance (we use the official NATS Helm chart), but it looks like that configuration is not the same as simply running the official NATS image locally in Docker. I debugged the messages on the client side: all messages are fetched and acked correctly. The nats-server log does not contain any issues or errors. We tried setting debug to true, but it mostly logs client connections opening and closing, which does not look useful in this case.

I understand it is hard to fix something that is not reproducible, but I cannot write a reproducer; locally everything works fine and the load is not very high (screenshot of load attached).

@derekcollison (Member, Author)

Similar to before, I would need to see du -sh for the storage directory and a full ls -lR as well, to see which files are still there.

@exename commented on Feb 17, 2022:

There are only a few files now, since we recently purged the streams:
https://gist.github.com/exename/24b4935836840e09f4c4a446bef4e87e

du -sh /data/jetstream/$G
7.3M	/data/jetstream/

image: synadia/nats-server:nightly-20220217

@derekcollison
Copy link
Member Author

With nightly and the fix referenced, on a server restart all *.fss files from /data/jetstream/$G/streams/anomaly/msgs should be cleaned up, we have a test for that.

So I would suggest doing a server restart and specifically checking /data/jetstream/$G/streams/anomaly/msgs directory, if fss files are still there that is not expected.
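
If it helps with that check, here is a small standalone Go utility, purely a convenience sketch and not part of nats-server, that lists any leftover *.fss files in a stream's msgs directory (the example path in the comment is taken from this thread):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: fsscheck <stream msgs directory>")
		os.Exit(1)
	}
	// e.g. /data/jetstream/<account>/streams/anomaly/msgs
	msgsDir := os.Args[1]
	leftovers, err := filepath.Glob(filepath.Join(msgsDir, "*.fss"))
	if err != nil {
		fmt.Println("glob failed:", err)
		os.Exit(1)
	}
	if len(leftovers) == 0 {
		fmt.Println("no leftover .fss files")
		return
	}
	for _, f := range leftovers {
		fmt.Println("unexpected leftover:", f)
	}
}
```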

@exename commented on Feb 17, 2022:

I can't find any *.fss files; our problem is many *.key files. NATS/JetStream does not clean them up, and they use up all the inodes on the disk.

@derekcollison (Member, Author)

My apologies, are you using encrypted file storage?

@exename commented on Feb 17, 2022:

It seems so, yes. These are our options in the Helm values file (https://github.com/nats-io/k8s/tree/main/helm/charts/nats):

nats:
  image: synadia/nats-server:nightly-20220217

  logging:
    debug: true
    trace: false
    logtime: true
    connectErrorReports:
    reconnectErrorReports:
  jetstream:
    enabled: true
    domain:
    encryption:
      key: Xaxe8eith2joseiX
    memStorage:
      enabled: true
      size: 1Gi
    fileStorage:
      enabled: true
      storageDirectory: /data
      size: 2Gi
      accessModes:
        - ReadWriteOnce
      annotations:

cluster:
  enabled: false
  replicas: 1

natsbox:
  enabled: true
  image: natsio/nats-box:0.7.0
  pullPolicy: IfNotPresent
 
reloader:
  enabled: true
  image: natsio/nats-server-config-reloader:0.6.2

exporter:
  enabled: true
  image: natsio/prometheus-nats-exporter:0.9.0

The resulting nats.conf:

  nats.conf: |
    # PID file shared with configuration reloader.
    pid_file: "/var/run/nats/nats.pid"

    ###############
    #             #
    # Monitoring  #
    #             #
    ###############
    http: 8222
    server_name:$POD_NAME
    ###################################
    #                                 #
    # NATS JetStream                  #
    #                                 #
    ###################################
    jetstream {
      key: "Xaxe8eith2joseiX"
      max_mem: 1Gi
      store_dir: /data

      max_file:2Gi
    }
    debug: true
    logtime: true
    lame_duck_duration: 120s

@derekcollison
Copy link
Member Author

Ok, will get that fixed today and posted to nightly, same logic as this fix.

Thanks for your patience.
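
For context on "same logic as this fix": with encrypted file storage each message block also has a companion *.key file (the files reported above), so the cleanup has to remove those as well. An illustrative extension of the earlier sketch, again not the actual nats-server code and with the per-block .key naming assumed from this thread:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeEncryptedMsgBlock removes a message block together with its
// companion files: the .fss subject index and, for encrypted file
// storage, the per-block .key file.
func removeEncryptedMsgBlock(msgsDir string, index uint64) error {
	for _, name := range []string{
		fmt.Sprintf("%d.blk", index),
		fmt.Sprintf("%d.fss", index),
		fmt.Sprintf("%d.key", index), // left behind before the follow-up fix
	} {
		if err := os.Remove(filepath.Join(msgsDir, name)); err != nil && !os.IsNotExist(err) {
			return err
		}
	}
	return nil
}

func main() {
	// Demonstration against a throwaway directory.
	dir, _ := os.MkdirTemp("", "msgs")
	defer os.RemoveAll(dir)
	for _, n := range []string{"1.blk", "1.fss", "1.key"} {
		os.WriteFile(filepath.Join(dir, n), []byte("x"), 0600)
	}
	if err := removeEncryptedMsgBlock(dir, 1); err != nil {
		fmt.Println("cleanup failed:", err)
		return
	}
	left, _ := filepath.Glob(filepath.Join(dir, "*"))
	fmt.Println("files left:", left) // expect: none
}
```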

@exename commented on Feb 17, 2022:

Thanks for the hint about encryption; without it we don't see the key files and everything looks good. We will wait for the fix and will probably check again with encryption tomorrow.

@derekcollison (Member, Author)

#2878

@derekcollison (Member, Author)

OK, this has been merged and is also being built for the nightly as we speak.

@nekufa commented on Feb 18, 2022:

@derekcollison With the latest build everything works with encrypted storage :) Thanks!
