Object loss: (not so) redundant local copy removed on all machines #2267

Closed
roman-khimov opened this issue Feb 21, 2023 · 5 comments · Fixed by #2273
@roman-khimov
Member

Expected Behavior

An object storage system should store objects: you put them and can get them back at any time until you delete them.

Current Behavior

So we have a four-node network with the Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H container using this policy:

REP 2 IN X
CBF 2
SELECT 2 FROM F AS X
FILTER Deployed EQ NSPCC AS F

An AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1 object was uploaded into this container some (pretty long) time ago. It was stored on nodes 3 and 4 (there were some movements before the incident, but they're not relevant) until this happened:

Feb 15 17:25:49 node3 neofs-node[9352]: 2023-02-15T17:25:49.912Z        error        replicator/process.go:62        could not replicate object        {"component": "Object Replicator", "node": "029538f2e2de2beff5a380e672732461ccac0f1c6c255fc2c364dfc0356f7f5bd3", "object": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "error": "(*putsvc.RemoteSender) could not send object: (*putsvc.remoteTarget) could not put object to [/dns4/st1.t5.fs.neo.org/tcp/8080]: write object via client: status: code = 2048 message = access to object operation denied"}
Feb 15 17:25:50 node3 neofs-node[9352]: 2023-02-15T17:25:50.139Z        error        replicator/process.go:62        could not replicate object        {"component": "Object Replicator", "node": "024cda0e7f60284295101465fc47e8a324da994b1cb39ba56825550025c9e9598b", "object": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "error": "(*putsvc.RemoteSender) could not send object: (*putsvc.remoteTarget) could not put object to [/dns4/st2.t5.fs.neo.org/tcp/8080]: write object via client: status: code = 2048 message = access to object operation denied"}
Feb 15 17:25:50 node3 neofs-node[9352]: 2023-02-15T17:25:50.163Z        info        policer/check.go:129        redundant local object copy detected        {"component": "Object Policer", "object": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1"}
Feb 15 17:25:50 node3 neofs-node[9352]: 2023-02-15T17:25:50.164Z        info        log/log.go:13        local object storage operation        {"shard_id": "Bcd5yGWeVfJgj2iKLdPiQC", "address": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "op": "db DELETE"}

and

Feb 15 17:25:51 node4 neofs-node[9802]: 2023-02-15T17:25:51.424Z        error        replicator/process.go:62        could not replicate object        {"component": "Object Replicator", "node": "029538f2e2de2beff5a380e672732461ccac0f1c6c255fc2c364dfc0356f7f5bd3", "object": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "error": "(*putsvc.RemoteSender) could not send object: (*putsvc.remoteTarget) could not put object to [/dns4/st1.t5.fs.neo.org/tcp/8080]: write object via client: status: code = 2048 message = access to object operation denied"}
Feb 15 17:25:51 node4 neofs-node[9802]: 2023-02-15T17:25:51.603Z        error        replicator/process.go:62        could not replicate object        {"component": "Object Replicator", "node": "024cda0e7f60284295101465fc47e8a324da994b1cb39ba56825550025c9e9598b", "object": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "error": "(*putsvc.RemoteSender) could not send object: (*putsvc.remoteTarget) could not put object to [/dns4/st2.t5.fs.neo.org/tcp/8080]: write object via client: status: code = 2048 message = access to object operation denied"}
Feb 15 17:25:51 node4 neofs-node[9802]: 2023-02-15T17:25:51.604Z        info        policer/check.go:129        redundant local object copy detected        {"component": "Object Policer", "object": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1"}
Feb 15 17:25:51 node4 neofs-node[9802]: 2023-02-15T17:25:51.605Z        info        log/log.go:13        local object storage operation        {"shard_id": "2xcLzuvvNzKg8jELk1mn4q", "address": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "op": "db DELETE"}
Feb 15 17:26:35 node4 neofs-node[9802]: 2023-02-15T17:26:35.502Z        info        log/log.go:13        local object storage operation        {"shard_id": "2xcLzuvvNzKg8jELk1mn4q", "address": "Hw57cmN31gCrqyEyKL5km31TFYETzQa3qk8DNECk6a4H/AV3Z7kpn8hxnaWWB2QRZUF9x8YhnyuB5ntC5fX6PRhz1", "op": "metabase DELETE"}

Nodes 3 and 4 (holding the object) decide to move it to nodes 1 and 2 at around the same time. Both fail to do so for some reason (the exact cause is not really important; replication can fail for a number of reasons). Both then delete their local copies. The object is gone. Forever.
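The core of the problem, sketched as a minimal self-contained Go program (hypothetical names, not the actual neofs-node code): a replication target that rejected the object is still effectively treated as a prospective holder, so the local copy looks "redundant" and is deleted even though no remote copy was ever confirmed.

```go
package main

import "fmt"

type replicationResult struct {
	target string
	ok     bool
}

// localCopyRedundant models the unsafe decision: it only looks at how many
// targets were asked to hold the object, not how many confirmed it.
func localCopyRedundant(results []replicationResult, rep int) bool {
	return len(results) >= rep // wrong: ignores per-target failures
}

// localCopyRedundantSafe counts only confirmed remote copies.
func localCopyRedundantSafe(results []replicationResult, rep int) bool {
	confirmed := 0
	for _, r := range results {
		if r.ok {
			confirmed++
		}
	}
	return confirmed >= rep
}

func main() {
	// Both replication attempts failed with "access ... denied", as in the logs above.
	results := []replicationResult{
		{target: "st1.t5.fs.neo.org", ok: false},
		{target: "st2.t5.fs.neo.org", ok: false},
	}
	fmt.Println("unsafe:", localCopyRedundant(results, 2))     // true -> local copy deleted, object lost
	fmt.Println("safe:  ", localCopyRedundantSafe(results, 2)) // false -> local copy kept
}
```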

Possible Solution

Looks like something is wrong in the logic that ensures a proper number of copies exists before deleting the local one.

Context

Yeah, it's T5 testnet.

Your Environment

Node version 0.34.0.

@roman-khimov roman-khimov added bug Something isn't working triage neofs-storage Storage node application issues labels Feb 21, 2023
@alexchetaev alexchetaev added the U3 Regular label Feb 21, 2023
@roman-khimov
Member Author

Blocks around this event:

Feb 15 17:24:38 node1 neogo-morph-rpc[4584]: 2023-02-15T17:24:38.164Z#011INFO#011persisted to disk#011{"blocks": 1, "keys": 20, "headerHeight": 1151494, "blockHeight": 1151494, "took": "177.768938ms"}
Feb 15 17:24:53 node1 neogo-morph-rpc[4584]: 2023-02-15T17:24:53.128Z#011INFO#011persisted to disk#011{"blocks": 1, "keys": 21, "headerHeight": 1151495, "blockHeight": 1151495, "took": "131.762306ms"}
Feb 15 17:25:08 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:08.859Z#011INFO#011runtime log#011{"tx": "c66a2af0d83bfdd03554e6e354cad1c7dba7f609e65238acab968c8aecf946fe", "script": "c4576ea5c3081dd765a17aaaa73d9352e74bdc28", "msg": "process new epoch"}
Feb 15 17:25:09 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:09.396Z#011INFO#011persisted to disk#011{"blocks": 1, "keys": 21, "headerHeight": 1151496, "blockHeight": 1151496, "took": "391.008975ms"}
Feb 15 17:25:24 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:24.262Z#011INFO#011runtime log#011{"tx": "ad15b4112d432d823b0a8cc9dd66470cef80baf5d814bb98b173e357bfaa55de", "script": "c4576ea5c3081dd765a17aaaa73d9352e74bdc28", "msg": "process new epoch"}
Feb 15 17:25:24 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:24.284Z#011INFO#011runtime log#011{"tx": "292275efff910309398460f0100ac595bf5e5d756f238c55ac52bdc0da3aa897", "script": "70cf38deb1ff7a4f64cb1127eafcd75cd38083d4", "msg": "notification has been produced"}
Feb 15 17:25:25 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:25.382Z#011INFO#011persisted to disk#011{"blocks": 1, "keys": 86, "headerHeight": 1151497, "blockHeight": 1151497, "took": "368.991893ms"}
Feb 15 17:25:40 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:40.319Z#011INFO#011runtime log#011{"tx": "0322a27625fd219d131b8d4868d04f26398c7446e56b88fcaf8ffa62f37aa224", "script": "70cf38deb1ff7a4f64cb1127eafcd75cd38083d4", "msg": "notification has been produced"}
Feb 15 17:25:41 node1 neogo-morph-rpc[4584]: 2023-02-15T17:25:41.499Z#011INFO#011persisted to disk#011{"blocks": 1, "keys": 123, "headerHeight": 1151498, "blockHeight": 1151498, "took": "474.346908ms"}

IR:

Feb 15 17:25:08 node1 neofs-ir[4832]: 2023-02-15T17:25:08.840Z#011info#011netmap/handlers.go:15#011tick#011{"type": "epoch"}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.273Z#011info#011netmap/handlers.go:29#011notification#011{"type": "new epoch", "value": 4244}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.306Z#011info#011netmap/handlers.go:91#011tick#011{"type": "netmap cleaner"}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.315Z#011info#011settlement/calls.go:20#011new audit settlement event#011{"epoch": 4244}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.315Z#011info#011settlement/handlers.go:14#011process audit settlements#011{"epoch": 4244}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.315Z#011info#011audit/calculate.go:65#011calculate audit settlements#011{"current epoch": 4244}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.315Z#011info#011governance/handlers.go:33#011new event#011{"type": "sync"}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.319Z#011info#011settlement/handlers.go:18#011audit processing finished#011{"epoch": 4244}
Feb 15 17:25:24 node1 neofs-ir[4832]: 2023-02-15T17:25:24.321Z#011info#011governance/process_update.go:49#011no governance update, alphabet list has not been changed
Feb 15 17:25:26 node1 neofs-ir[4832]: 2023-02-15T17:25:26.638Z#011info#011netmap/handlers.go:48#011notification#011{"type": "add peer"}
Feb 15 17:25:26 node1 neofs-ir[4832]: 2023-02-15T17:25:26.645Z#011info#011netmap/process_peers.go:65#011approving network map candidate#011{"key": "03580e5bf6318513059da53975d692d6492c03fb478d7fff9a7370bb368d034a22"}
Feb 15 17:25:27 node1 neofs-ir[4832]: 2023-02-15T17:25:27.153Z#011info#011netmap/handlers.go:48#011notification#011{"type": "add peer"}
Feb 15 17:25:27 node1 neofs-ir[4832]: 2023-02-15T17:25:27.158Z#011info#011netmap/process_peers.go:65#011approving network map candidate#011{"key": "03000a222431891c481f2fbce61297547e816e164750f5ea94e2831c219bbe3504"}

Netmap:

$ ./bin/neo-go contract testinvokefunction -r https://rpc5.morph.t5.fs.neo.org:51331 --historic 1151496 c4576ea5c3081dd765a17aaaa73d9352e74bdc28 netmap       
{
  "state": "HALT",
  "gasconsumed": "248743",
  "script": "wh8MBm5ldG1hcAwUKNxL51KTPaeqeqFl1x0Iw6VuV8RBYn1bUg==",
  "stack": [
    {
      "type": "Array",
      "value": [
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiEClTjy4t4r7/WjgOZycyRhzKwPHGwlX8LDZN/ANW9/W9MSIC9kbnM0L3N0MS50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhMKCUNvbnRpbmVudBIGRXVyb3BlGhYKB0NvdW50cnkSC05ldGhlcmxhbmRzGhEKC0NvdW50cnlDb2RlEgJOTBoRCghEZXBsb3llZBIFTlNQQ0MaFQoITG9jYXRpb24SCUFtc3RlcmRhbRoLCgVQcmljZRICMTAaEwoJVU4tTE9DT0RFEgZOTCBBTVMgAQ=="
            }
          ]
        }
      ]
    }
  ],
  "exception": null,
  "notifications": []
}
$ ./bin/neo-go contract testinvokefunction -r https://rpc5.morph.t5.fs.neo.org:51331 --historic 1151497 c4576ea5c3081dd765a17aaaa73d9352e74bdc28 netmap
{
  "state": "HALT",
  "gasconsumed": "248743",
  "script": "wh8MBm5ldG1hcAwUKNxL51KTPaeqeqFl1x0Iw6VuV8RBYn1bUg==",
  "stack": [
    {
      "type": "Array",
      "value": [
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiECTNoOf2AoQpUQFGX8R+ijJNqZSxyzm6VoJVUAJcnpWYsSIC9kbnM0L3N0Mi50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhMKCUNvbnRpbmVudBIGRXVyb3BlGhIKB0NvdW50cnkSB0dlcm1hbnkaEQoLQ291bnRyeUNvZGUSAkRFGhEKCERlcGxveWVkEgVOU1BDQxodCghMb2NhdGlvbhIRRnJhbmtmdXJ0IGFtIE1haW4aCwoFUHJpY2USAjEwGhAKBlN1YkRpdhIGSGVzc2VuGhAKClN1YkRpdkNvZGUSAkhFGhMKCVVOLUxPQ09ERRIGREUgRlJBIAE="
            }
          ]
        },
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiEClTjy4t4r7/WjgOZycyRhzKwPHGwlX8LDZN/ANW9/W9MSIC9kbnM0L3N0MS50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhMKCUNvbnRpbmVudBIGRXVyb3BlGhYKB0NvdW50cnkSC05ldGhlcmxhbmRzGhEKC0NvdW50cnlDb2RlEgJOTBoRCghEZXBsb3llZBIFTlNQQ0MaFQoITG9jYXRpb24SCUFtc3RlcmRhbRoLCgVQcmljZRICMTAaEwoJVU4tTE9DT0RFEgZOTCBBTVMgAQ=="
            }
          ]
        }
      ]
    }
  ],
  "exception": null,
  "notifications": []
}

At around the next epoch:

$ ./bin/neo-go contract testinvokefunction -r https://rpc5.morph.t5.fs.neo.org:51331 --historic 1151738 c4576ea5c3081dd765a17aaaa73d9352e74bdc28 netmap
{
  "state": "HALT",
  "gasconsumed": "248743",
  "script": "wh8MBm5ldG1hcAwUKNxL51KTPaeqeqFl1x0Iw6VuV8RBYn1bUg==",
  "stack": [
    {
      "type": "Array",
      "value": [
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiECTNoOf2AoQpUQFGX8R+ijJNqZSxyzm6VoJVUAJcnpWYsSIC9kbnM0L3N0Mi50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhMKCUNvbnRpbmVudBIGRXVyb3BlGhIKB0NvdW50cnkSB0dlcm1hbnkaEQoLQ291bnRyeUNvZGUSAkRFGhEKCERlcGxveWVkEgVOU1BDQxodCghMb2NhdGlvbhIRRnJhbmtmdXJ0IGFtIE1haW4aCwoFUHJpY2USAjEwGhAKBlN1YkRpdhIGSGVzc2VuGhAKClN1YkRpdkNvZGUSAkhFGhMKCVVOLUxPQ09ERRIGREUgRlJBIAE="
            }
          ]
        },
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiEClTjy4t4r7/WjgOZycyRhzKwPHGwlX8LDZN/ANW9/W9MSIC9kbnM0L3N0MS50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhMKCUNvbnRpbmVudBIGRXVyb3BlGhYKB0NvdW50cnkSC05ldGhlcmxhbmRzGhEKC0NvdW50cnlDb2RlEgJOTBoRCghEZXBsb3llZBIFTlNQQ0MaFQoITG9jYXRpb24SCUFtc3RlcmRhbRoLCgVQcmljZRICMTAaEwoJVU4tTE9DT0RFEgZOTCBBTVMgAQ=="
            }
          ]
        },
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiEDAAoiJDGJHEgfL7zmEpdUfoFuFkdQ9eqU4oMcIZu+NQQSIC9kbnM0L3N0NC50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhoKCUNvbnRpbmVudBINTm9ydGggQW1lcmljYRoYCgdDb3VudHJ5Eg1Vbml0ZWQgU3RhdGVzGhEKC0NvdW50cnlDb2RlEgJVUxoRCghEZXBsb3llZBIFTlNQQ0MaGQoITG9jYXRpb24SDVNhbiBGcmFuY2lzY28aCwoFUHJpY2USAjEwGhQKBlN1YkRpdhIKQ2FsaWZvcm5pYRoQCgpTdWJEaXZDb2RlEgJDQRoTCglVTi1MT0NPREUSBlVTIFNGTyAB"
            }
          ]
        },
        {
          "type": "Struct",
          "value": [
            {
              "type": "ByteString",
              "value": "CiEDWA5b9jGFEwWdpTl11pLWSSwD+0eNf/+ac3C7No0DSiISIC9kbnM0L3N0My50NS5mcy5uZW8ub3JnL3RjcC84MDgwGhEKCUNvbnRpbmVudBIEQXNpYRoUCgdDb3VudHJ5EglTaW5nYXBvcmUaEQoLQ291bnRyeUNvZGUSAlNHGhEKCERlcGxveWVkEgVOU1BDQxoVCghMb2NhdGlvbhIJU2luZ2Fwb3JlGgsKBVByaWNlEgIxMBoTCglVTi1MT0NPREUSBlNHIFNJTiAB"
            }
          ]
        }
      ]
    }
  ],
  "exception": null,
  "notifications": []
}

So the network map was seriously degraded for some time (for an unknown reason).

@roman-khimov
Member Author

Ah, they were degraded because of #2263, then notary balances were fixed and nodes came back.

@cthulhu-rider
Contributor

In the current implementation, when a node leaves a container (due to network map changes), it initiates replication of the objects and throws them away.

The point is that we could hold unreplicated out-of-container objects until successful migration. So there are two possible approaches (a rough sketch of the first one follows below):

  1. mark out-of-container objects in replication tasks, and make the `Replicator` consider a replica redundant only if replication succeeded
  2. store out-of-container object replicas anyway, and throw them away only on particular need (e.g. no more space for in-container objects)
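A rough sketch of the first approach, under stated assumptions (all names here are hypothetical, not the actual neofs-node types): the replication task carries an out-of-container marker, and the local replica is treated as removable only after the `Replicator` confirms a successful transfer.

```go
package main

import "fmt"

type replicationTask struct {
	objectAddr     string
	outOfContainer bool // local node no longer belongs to the object's container
}

type replicator struct {
	send func(addr string) error // transfers the object to a remote node
}

// process returns true only when the local replica may be treated as redundant.
func (r replicator) process(t replicationTask) bool {
	if err := r.send(t.objectAddr); err != nil {
		fmt.Println("replication failed, keeping local copy:", err)
		return false
	}
	// Only a confirmed remote copy makes an out-of-container replica redundant.
	return t.outOfContainer
}

func main() {
	rep := replicator{send: func(string) error { return fmt.Errorf("access denied") }}
	task := replicationTask{objectAddr: "Hw57.../AV3Z...", outOfContainer: true}
	if !rep.process(task) {
		fmt.Println("local copy retained until migration succeeds")
	}
}
```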

@roman-khimov
Member Author

roman-khimov commented Feb 22, 2023

  1. A node must not delete any data if it's not a part of the current network map. If it's not on the map, we don't know its relation to containers at all, so some safe behaviour should be assumed. This would be a complete fix for this particular case (a sketch of this check follows below).
  2. If a node is on the map and it can be used for some container, then it can delete objects only after successful replication (that, I believe, is Remove redundant local copies by Replicator not Policer #1453); also note that holding the object in this case is not an error, so the node might as well keep it until some GC cycle.
  3. If a node is on the map but it's not a part of some container, then it's Attempt to replicate object in case of a policy with only one copy and ACL error (node in the container) #1184, and the fix from Attempt to replicate object in case of a policy with only one copy and ACL error (node in the container) #1184 (comment) is the most appropriate one.
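A minimal sketch of point 1 (hypothetical names, not the actual Policer code): a node that cannot find itself in the current network map has no reliable view of container placement, so it must not delete anything.

```go
package main

import "fmt"

type netmapSnapshot struct {
	nodes map[string]bool // node public key -> present in the current network map
}

// mayDropLocalCopy gates deletion on network-map membership first and on the
// number of confirmed remote copies second.
func mayDropLocalCopy(nm netmapSnapshot, localKey string, confirmedCopies, rep int) bool {
	if !nm.nodes[localKey] {
		return false // not on the map: assume the safest behaviour and keep the data
	}
	return confirmedCopies >= rep
}

func main() {
	offline := netmapSnapshot{nodes: map[string]bool{"some-other-node": true}}
	online := netmapSnapshot{nodes: map[string]bool{"local-node": true}}

	fmt.Println(mayDropLocalCopy(offline, "local-node", 2, 2)) // false: local node is not in the netmap
	fmt.Println(mayDropLocalCopy(online, "local-node", 2, 2))  // true: on the map and REP is satisfied
}
```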

@roman-khimov
Member Author

1. mark out-of-container objects in replication tasks, and make the `Replicator` consider a replica redundant only if replication succeeded

2. store out-of-container object replicas anyway, and throw them away only on particular need (e.g. no more space for in-container objects)

Of the two, BTW, I think the first one is more correct. In general, out-of-container may mean that we have a node in CN holding some data belonging to a RU-only container, which is a serious policy violation. We can't delete data unless we ensure that proper replicas exist, but once we do, it must be deleted.

cthulhu-rider pushed a commit to cthulhu-rider/neofs-node that referenced this issue Feb 26, 2023
In the previous implementation, `Policer.processObject` considered the local
object replica redundant if the `processNodes` loop didn't set the
`needLocalCopy` flag. Given how `processNodes` is implemented, the loop could
break on context cancellation with the flag still unset, after which
`processObject` triggered the cleanup routine. To prevent potential data loss,
`Policer` must handle this case.

Check context expiration after the placement loop but before the
`needLocalCopy` condition in `Policer.processObject`.

Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
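A minimal sketch of the fix described in the commit above, with hypothetical names (not the actual `Policer.processObject` code): the placement loop can be cut short by context cancellation, so the context has to be re-checked before the `needLocalCopy` flag is trusted.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func processObject(ctx context.Context, placement []string, hasRemoteCopy func(string) bool) {
	needLocalCopy := false
	for _, node := range placement {
		if ctx.Err() != nil {
			break // loop interrupted: needLocalCopy may be left unset spuriously
		}
		if !hasRemoteCopy(node) {
			needLocalCopy = true
		}
	}
	// The fix: bail out on an expired context *before* interpreting the flag.
	if ctx.Err() != nil {
		fmt.Println("context expired, skipping the redundancy decision")
		return
	}
	if !needLocalCopy {
		fmt.Println("redundant local object copy detected")
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Nanosecond)
	defer cancel()
	time.Sleep(time.Millisecond) // let the context expire before processing

	processObject(ctx, []string{"st1", "st2"}, func(string) bool { return false })
}
```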
cthulhu-rider pushed a commit to cthulhu-rider/neofs-node that referenced this issue Feb 26, 2023
…o context

There is no need to explicitly pass the object descriptor being checked and
the node cache to `processNodes` since they do not change during processing.

Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
cthulhu-rider pushed a commit to cthulhu-rider/neofs-node that referenced this issue Feb 26, 2023
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
cthulhu-rider pushed a commit to cthulhu-rider/neofs-node that referenced this issue Feb 26, 2023
A local storage node can be outside the container of some object and hold its
single replica. In the previous implementation, `Policer` considered this
replica redundant. The behavior must change for such cases: `Policer` must not
mark the replica as redundant if it did not find any valid replicas while
traversing the container.

Make `Policer.processObject` determine the presence of the local node in the
container within the `processNodes` call loop. Do not consider a single
existing replica outside the container as redundant and, accordingly, do not
pass it to the redundant replica callback.

Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
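A rough sketch of the behavior change described in the commit above (hypothetical names, not the actual `Policer` implementation): while walking the container nodes, track whether the local node is among them and how many valid remote replicas were found; a single out-of-container replica with no confirmed remote copies must not be reported as redundant.

```go
package main

import "fmt"

type containerNode struct {
	isLocal bool
	hasCopy bool
}

func redundantLocalCopy(placement []containerNode) bool {
	localInContainer := false
	validReplicas := 0
	for _, n := range placement {
		if n.isLocal {
			localInContainer = true
			continue
		}
		if n.hasCopy {
			validReplicas++
		}
	}
	if !localInContainer && validReplicas == 0 {
		return false // our copy may be the only one in existence: keep it
	}
	// An out-of-container copy is redundant only once remote replicas exist.
	return !localInContainer
}

func main() {
	// Node 3's view during the incident: it is not in the new placement and
	// neither st1 nor st2 actually accepted the object.
	fmt.Println(redundantLocalCopy([]containerNode{{hasCopy: false}, {hasCopy: false}})) // false -> keep
	fmt.Println(redundantLocalCopy([]containerNode{{hasCopy: true}, {hasCopy: true}}))   // true -> redundant
}
```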
cthulhu-rider pushed a commit to cthulhu-rider/neofs-node that referenced this issue Feb 26, 2023
When a node is absent from the network map, it is impossible to determine
whether it would belong to the container upon its return. To prevent the
potential loss of a single instance of data, offline nodes must hold their
objects.

Make `Policer.processObject` look up the local node in the network map before
considering an object replica redundant. If the node is outside the network
map, the replica is considered meaningful.

Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
roman-khimov added a commit that referenced this issue Mar 1, 2023
* closes #2267
* closes #1453

I decided not to directly implement the #1453 proposal:
* out-of-container (netmap) cases are handled in a special way
* instant replica removal after successful replication doesn't seem like very
good behavior: some nodes have already violated the storage policy and lost
the object, therefore it is worth waiting for some time interval and making
the decision at the next check
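A tentative sketch of that last point, with hypothetical names: after a successful replication the `Replicator` does not drop the local copy immediately; removal is left to a later `Policer` check, which re-verifies the number of copies before calling the redundant-copy callback.

```go
package main

import "fmt"

type object struct {
	addr            string
	confirmedCopies int
}

// replicate records a confirmed remote copy but defers any removal decision.
func replicate(o *object) {
	o.confirmedCopies++
	fmt.Println("replicated", o.addr, "- keeping the local copy until the next check")
}

// nextPolicerCheck makes the removal decision on a later cycle.
func nextPolicerCheck(o *object, rep int, dropLocalCopy func(string)) {
	if o.confirmedCopies >= rep {
		dropLocalCopy(o.addr)
	}
}

func main() {
	obj := &object{addr: "Hw57.../AV3Z..."}
	replicate(obj)
	replicate(obj)
	nextPolicerCheck(obj, 2, func(addr string) { fmt.Println("removing redundant local copy of", addr) })
}
```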