Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[efr32] Settings #4706

Open
dismirlian opened this issue Mar 19, 2020 · 25 comments
Open

[efr32] Settings #4706

dismirlian opened this issue Mar 19, 2020 · 25 comments

Comments

@dismirlian
Copy link
Contributor

Recently, a rework of the settings storage was merged (see PR #4552), which retains format compatibility with the original implementation. This settings format (both the original and new implementations) has an issue with the efr32 platform:

The MG12 datasheet states:

It is possible to write words twice between each erase by keeping at 1 the bits that are not to be changed. [...] Note that there is a maximum of two writes to the same word between each erase due to a physical limitation of the flash.

The MG21 datasheet says something similar:

The Flash memory is organized into 64-bit wide double-words. Each 64-bit double-word can be written only twice between erase cycles

The current implementation writes the RecordHeader twice during Add and once more during Delete.

Additionally, PR #4521 provides an alternate implementation of the settings area, specifically for the efr32 platforms. This will not retain compatibility with the old format.

I wanted to "expose" a potential issue, for those with devices on the field:

  • Either they update to [efr32] add nvm3 support #4521, and break format compatibility or
  • They stay with the original format which violates the datasheet recommendations. I don't know what implications this may have. Maybe increased wear or decreased reliability/retention? Perhaps someone from Silabs can shed some light on this.

In our case, we are not impacted by this potential problem, because we don't have devices on the field. I am opening this issue to see to what extent this is an actual issue or not. If it's not, then we can close it.

Thanks!

@silabs-YupingX
Copy link

Diego,

You are correct that the maximum number of writes on a flash word in between erase cycles is 2 on EFR32MG21. It is a statistically safe recommendation according to our testing. Writing a 3rd time may not be a problem immediately but it’s risky and violates our recommendation, in which case we can no longer guarantee that the flash always perform to its datasheet specifications such as write/erase cycle endurance and data retention.

Two questions about PR #4552:

  1. I can see the two writes in Add and one write in Delete but it’s not obvious to me that they are writing to the exact same flash location. Can you confirm/elaborate? This question is due to my lack of familiarity with the code.
  2. Permitting a small number of writes between erase cycles is a typical limitation on flash, not an exception, although the exact number of such guaranteed writes may vary. Assuming answer to question 1 is positive, then I’m curious to know which chip families this implementation is intended for, and if the respective datasheets have been consulted with to confirm it’s a sound implementation.

I should add that we ask those using EFR32 to adopt Silicon Labs NVM3 solution instead of the OT default NV manager. One thing to note is the token values cannot be expected to be preserved during firmware upgrade if switching from OT default NV manager to NVM3.

Best regards,
Yuping Xiao

@jwhui
Copy link
Member

jwhui commented Apr 9, 2020

@yuping-xiao, thank you for the input.

@LuDuda, can you comment on any flash limitations that the nRF52xx may have?

@LuDuda
Copy link
Member

LuDuda commented Apr 9, 2020

@jwhui @dismirlian thanks for finding this issue!

Indeed for nRF52840/nRF52833 the OPS says:

The same 32-bit word in the flash can only be written nWRITE number of times before a page erase must be performed.

The NVMC is only able to write 0 to bits in the flash that are erased (set to 1). It cannot rewrite a bit back to 1. Only full 32-bit words can be written to flash using the NVMC interface. To write less than 32 bits, write the data as a full 32-bit word and set all the bits that should remain unchanged in the word to 1. Note that the restriction on the number of writes (nWRITE) still applies in this case.

The nWRITE values is defined as 2 (maximum).

I'll ask hardware engineers to elaborate what are the implicantions with writing to the same address 3 times. Most likely it follows the @yuping-xiao description. But in any case, I think we should try to follow the datasheets (now we know there are few CPU affected) and fix it asap.

Can you Jonathan provide answers for these questions:

  1. Does the 3x flash write per one 32-bit address were performed also with the previous implementation of the flash driver?
  2. Elaborate if you can see the possibility to rewrite the code to have just 2x write operation per 32-bit word, keeping the backward compatibility?

@dismirlian
Copy link
Contributor Author

Hi @yuping-xiao, thanks for your response!

  1. I can see the two writes in Add and one write in Delete but it’s not obvious to me that they are writing to the exact same flash location. Can you confirm/elaborate? This question is due to my lack of familiarity with the code.

Each record begins with a RecordHeader, which has the following members:

        uint16_t mKey;
        uint16_t mFlags;
        uint16_t mLength;
        uint16_t mReserved;

mFlags has some relevant bits that indicate the state of the record:

            kFlagsInit       = 0xffff, ///< Flags initialize to all-ones.
            kFlagAddBegin    = 1 << 0, ///< 0 indicates record write has started, 1 otherwise.
            kFlagAddComplete = 1 << 1, ///< 0 indicates record write has completed, 1 otherwise.
            kFlagDelete      = 1 << 2, ///< 0 indicates record was deleted, 1 otherwise.
            kFlagFirst       = 1 << 3, ///< 0 indicates first record for key, 1 otherwise.

The code adds a record by writing on erased flash space, aligned to 4 bytes. During Add, it writes the RecordHeader with kFlagAddBegin = 0 (the rest of the mFlags/mLength/mReserved bits are 1). Then, it writes the data, and afterwards rewrites the whole header with kFlagAddComplete = 0. So the whole RecordHeader (2x 32bit words) is written twice during Add, even if only one bit is flipped from 1 to 0 during the second write.

During Delete, the record is marked as deleted by writing the whole header with kFlagDelete = 0. This is the third write operation on the header, only modifying one bit. There is something similar going on with the kFlagFirst bit.

I should add that we ask those using EFR32 to adopt Silicon Labs NVM3 solution instead of the OT default NV manager. One thing to note is the token values cannot be expected to be preserved during firmware upgrade if switching from OT default NV manager to NVM3.

Ok, that's clear, thanks.

@dismirlian
Copy link
Contributor Author

dismirlian commented Apr 9, 2020

@LuDuda, given that only mFlags requires update, we could use mReserved for the first and deleted flags. Incidentally (and fortunately) mReserved and mFlags are in separate 32-bit words. What do you think?

@jwhui
Copy link
Member

jwhui commented Apr 9, 2020

@LuDuda, given that only mFlags requires update, we could use mReserved for the first and deleted flags. Incidentally (and fortunately) mReserved and mFlags are in separate 32-bit words. What do you think?

@dismirlian, I think this could work. Of course, we would also need to support reading the existing delete flag for backward compatibility. Would you like to take a stab at implementing this?

@dismirlian
Copy link
Contributor Author

@jwhui I can give it a shot.

@yuping-xiao, the EFR32MG21 reference manual says:

The Flash memory is organized into 64-bit wide double-words. Each 64-bit double-word can be written only twice between erase cycles. The lower and upper 32-bit words may be written sequentially in any order, or one at a time. Each flash bit is 1 after erase. Writing
a 0 will clear the bit. Writing a 1 will not change the bit value.

While it is possible to write twice to the lower or upper 32-bit word of the 64-bit double word, then the other 32-bit word cannot be used. In this case, it is permitted to write to either the lower or upper 32-bit word twice between each erase, so long as no bit is ever cleared
more than once.

Could you clarify the last sentence? Specifically, what does it mean that "... the other 32-bit word cannot be used"?

Thanks.

@dismirlian
Copy link
Contributor Author

@jwhui on second thought, a word can be written more than 2 times with this method. Imagine we put the deleted and first flags in mReserved. Then, there is a possibility that mReserved is first written to 0xffff together with mLength, then a second time with kFlagFirst = 0, and then a third time with kFlagDeleted = 0. I think it would be possible to elide the second write, which occurs if we are deleting record index 0, and there is another record which should become the first, here:

if ((index == 1) && (aIndex == 0))
{
record.SetFirst();
otPlatFlashWrite(&GetInstance(), mSwapIndex, offset, &record, sizeof(record));
}

What do you think? Any other ideas?

@jwhui
Copy link
Member

jwhui commented Apr 10, 2020

@dismirlian, some thoughts:

  1. Only write the "key", "length", and "first" values after the data has been written out. This would require always scanning the "unused" portion of flash during initialization to check if there was a record in the middle of writing. This would remove the need to have "add begin" and "add complete" flags.

  2. Change the Delete(aIndex=-1) implementation to simply add a zero-length record marked as "first" and "deleted" simultaneously.

In summary, the last record with the "first" flag is where index 0 is. We should have at most 2 writes to the header (once when writing the header out after writing the value and potentially a second time if the record is deleted using an explicit index value).

Do you think this will work?

@jwhui
Copy link
Member

jwhui commented Apr 30, 2020

@dismirlian , any thoughts on the above? Have you been able to make any progress on this?

@dismirlian
Copy link
Contributor Author

Hi @jwhui, sorry for the delay, I've been busy... I'm planning to do it over the weekend. I'll keep you updated.

@silabs-YupingX
Copy link

@dismirlian about your question on April 10 about flash write on EFR32MG21 (sorry about the super late response, I missed it earlier but thought it'd be better late than never):

The Flash memory is organized into 64-bit wide double-words. Each 64-bit double-word can be written only twice between erase cycles, so long as no bit is ever cleared (i.e. written as a 0) more than once.

If a bit is written during the first write (1->0), then in the second write this bit should be masked. Any other “1” bits can be cleared to 0. For example, if first time we write
0xFFFFFFFF_FFFFFF00
We can then write 0xFFFFFFFF_FFFF00FF the 2nd time to achieve a result of 0xFFFFFFFF_FFFF0000,
or we can write 0xFFFFFF00_FFFFFFFF the 2nd time to achieve a result of 0xFFFFFF00_FFFFFF00,
or we can write 0xFFFFFF00_FFFF00FF the 2nd time to achieve a result of 0xFFFFFF00_FFFF0000.
After a second write, the 64-bit double-word cannot be written again until after a page erase.

Hope this is clearer now!

@dismirlian
Copy link
Contributor Author

Hi @silabs-YupingX, thanks for your answer. Yes, it's clear!

@jwhui I think this would require a larger change to the settings driver. Given that the problem is specific for the EFR32MG2x platform, and that Silabs has transitioned to the NVM3 implementation, I don't think this is worth tackling. What do you think?

@jwhui
Copy link
Member

jwhui commented Jun 1, 2020

@jwhui I think this would require a larger change to the settings driver. Given that the problem is specific for the EFR32MG2x platform, and that Silabs has transitioned to the NVM3 implementation, I don't think this is worth tackling. What do you think?

Are you suggesting to close #4926 or keep it since it potentially helps the Nordic case?

What do you think of my proposal in #4706 (comment) ? This would allow effectively two writes (once when we initially create the record, and only one more when we delete the record).

@dismirlian
Copy link
Contributor Author

Hi @jwhui, I think #4926 potentially helps the Nordic case. If you (and Nordic) agree, we could keep it (or a variation of it). I don't know if it's worth tackling the 64-bit word case.

About your proposals:

  1. Only write the "key", "length", and "first" values after the data has been written out. This would require always scanning the "unused" portion of flash during initialization to check if there was a record in the middle of writing. This would remove the need to have "add begin" and "add complete" flags.

I think this has the minor disadvantage that data written as 0xff will not be counted/"detected"; I know this is probably nitpicking.

  1. Change the Delete(aIndex=-1) implementation to simply add a zero-length record marked as "first" and "deleted" simultaneously.

I actually liked this one, but when you posted your idea I had already began working on the alternative I ended up proposing. If you prefer this one, I can update the PR.

@jwhui
Copy link
Member

jwhui commented Jun 8, 2020

@dismirlian, thanks again for your contributions on this issue.

In general, if we can solve the issue more generally, I prefer that. In this case, while EFR-32 has its preferred path moving forward, there may be other platforms in the future that have similar limitations. It would be nice not to have to keep churning on the flash layout.

I think this has the minor disadvantage that data written as 0xff will not be counted/"detected"; I know this is probably nitpicking.

One possibility it to check and avoid writing "0xff"?

I actually liked this one, but when you posted your idea I had already began working on the alternative I ended up proposing. If you prefer this one, I can update the PR.

Apologies for the additional effort. As mentioned above, I would prefer a more general solution. Thanks again for your effort on this.

@jwhui
Copy link
Member

jwhui commented Jul 15, 2020

@dismirlian , any updates on this? Is this still something you would like to push forward?

@dismirlian
Copy link
Contributor Author

Hi @jwhui, I'm sorry for the slow responsiveness; I've been very busy. I can give it a shot this weekend.

@dismirlian
Copy link
Contributor Author

Hi @jwhui, I've been thinking about this. I think your proposals are a step in the right direction, but I still have some concerns:

  1. Your first suggestion eliminates the need for the begin/complete flags (we must also avoid writing 0xffffffff words altogether). There is one minor detail on which I'd appreciate some input: nothing assures us that the header is written atomically. In case of a power failure/reset while writing the header, it can end up in a bad state. The kComplete flag ensured that this was never the case. We could implement some kind of check, maybe duplicating part of the header; what do you think?

  2. We should align each record to 64 bits, and increase kSwapMarkerSize accordingly.

  3. There are some cases in which this will result in three writes. For example, starting with an empty swap area:

    • Operation 1: Add(kKey1, some_data)
    • Operation 2: Add(kKey1, some_data)
    • Operation 3: Delete(kKey1, index=0)
    • Operation 4: Delete(kKey1, index=0)

    Operation 4 would result in three writes in the header of record 2. Something like 9e7e920 would solve the issue. There could be other possible solutions.

In sum, I propose the following path forward:

  • Implement your suggestion to eliminate kBegin/kComplete flags, taking care of avoiding writing 0xffffffff, and implementing some kind of mechanism to detect partial header writes (suggestions?).
  • Align everything to 64 bits.
  • Avoid writing kFlagFirst when deleting record with index=0.

As an unrelated issue, I propose implementing an erase counter for every swap area. I think this would provide valuable insights. This would be easy if we increase kSwapMarkerSize.

@jwhui
Copy link
Member

jwhui commented Aug 3, 2020

@dismirlian

  1. Your first suggestion eliminates the need for the begin/complete flags (we must also avoid writing 0xffffffff words altogether). There is one minor detail on which I'd appreciate some input: nothing assures us that the header is written atomically. In case of a power failure/reset while writing the header, it can end up in a bad state. The kComplete flag ensured that this was never the case. We could implement some kind of check, maybe duplicating part of the header; what do you think?

I wonder if we should go one step further and add some kind of integrity check to the end of the record, like a CRC-32. The integrity check would ensure the header and data were written properly. Thoughts?

  1. We should align each record to 64 bits, and increase kSwapMarkerSize accordingly.

Agree.

  1. There are some cases in which this will result in three writes. For example, starting with an empty swap area:

    • Operation 1: Add(kKey1, some_data)
    • Operation 2: Add(kKey1, some_data)
    • Operation 3: Delete(kKey1, index=0)
    • Operation 4: Delete(kKey1, index=0)

    Operation 4 would result in three writes in the header of record 2. Something like 9e7e920 would solve the issue. There could be other possible solutions.

Why would Operation 4 cause 3 writes to record 2? Operation 3 should set the "delete" flag in record 1. Operation 4 should set the "delete" flag in record 2. I must be missing something.

As an unrelated issue, I propose implementing an erase counter for every swap area. I think this would provide valuable insights. This would be easy if we increase kSwapMarkerSize.

An erase counter sounds like a great idea.

@dismirlian
Copy link
Contributor Author

@jwhui

I wonder if we should go one step further and add some kind of integrity check to the end of the record, like a CRC-32. The integrity check would ensure the header and data were written properly. Thoughts?

Well, this would make the implementation much more robust. We could use the CRC16 implementation already present (here).

Why would Operation 4 cause 3 writes to record 2? Operation 3 should set the "delete" flag in record 1. Operation 4 should set the "delete" flag in record 2. I must be missing something.

Maybe I'm making a mistake, but here is what I understand:

  • Operation 1: Add(kKey1, some_data): Write to record location 1.
  • Operation 2: Add(kKey1, some_data): Write to record location 2. <== 1st write to record 2.
  • Operation 3: Delete(kKey1, index=0): Mark record 1 as deleted, mark record 2 as first (here). <== 2nd write to record 2.
  • Operation 4: Delete(kKey1, index=0): Mark record 2 as deleted. <== 3rd write to record 2.

Commit 9e7e920 elides the marking of the record with index == 1 as first when deleting index == 0.

As an unrelated issue, I propose implementing an erase counter for every swap area. I think this would provide valuable insights. This would be easy if we increase kSwapMarkerSize.

An erase counter sounds like a great idea.
Great.

I have another concern:

  1. This change will decrease the effective room for data in the flash. I really don't know if this is something which could affect the operation of OT in implementations with small swap areas. As a side note, the first swap operation between the old and new format might be unable to copy some records to the new area with the new format (what would happen with the dataset?).

What do you think?

Thanks!

@jwhui
Copy link
Member

jwhui commented Aug 6, 2020

@dismirlian

Thanks for your continued effort on this.

Well, this would make the implementation much more robust. We could use the CRC16 implementation already present (here).

Sounds good to me.

Maybe I'm making a mistake, but here is what I understand:

  • Operation 1: Add(kKey1, some_data): Write to record location 1.
  • Operation 2: Add(kKey1, some_data): Write to record location 2. <== 1st write to record 2.
  • Operation 3: Delete(kKey1, index=0): Mark record 1 as deleted, mark record 2 as first (here). <== 2nd write to record 2.
  • Operation 4: Delete(kKey1, index=0): Mark record 2 as deleted. <== 3rd write to record 2.

Commit 9e7e920 elides the marking of the record with index == 1 as first when deleting index == 0.

I was hoping to remove the "first" flag. We only write to a record a second time if it is deleted.

In the case of deleting all records for a given key, we could write a new zero-length record at the end indicating that all previous records have been deleted.

I have another concern:

  1. This change will decrease the effective room for data in the flash. I really don't know if this is something which could affect the operation of OT in implementations with small swap areas. As a side note, the first swap operation between the old and new format might be unable to copy some records to the new area with the new format (what would happen with the dataset?).

I think the swap areas need to be sufficiently large to avoid wear. If the swap areas are too small, then each new write could cause an erase. My inclination is that the size increase should not be much of a concern.

@dismirlian
Copy link
Contributor Author

@jwhui

Maybe I'm making a mistake, but here is what I understand:

  • Operation 1: Add(kKey1, some_data): Write to record location 1.
  • Operation 2: Add(kKey1, some_data): Write to record location 2. <== 1st write to record 2.
  • Operation 3: Delete(kKey1, index=0): Mark record 1 as deleted, mark record 2 as first (here). <== 2nd write to record 2.
  • Operation 4: Delete(kKey1, index=0): Mark record 2 as deleted. <== 3rd write to record 2.

Commit 9e7e920 elides the marking of the record with index == 1 as first when deleting index == 0.

I was hoping to remove the "first" flag. We only write to a record a second time if it is deleted.

In the case of deleting all records for a given key, we could write a new zero-length record at the end indicating that all previous records have been deleted.

Ok, so for the Set() operation we should delete all existing records instead of using the "first" flag. The zero-length record would "invalidate" all previous records (of the same index). I think this would work.

BTW, I think that marking the record "deleted" instead of writing a zero-length record at the end would simplify the Get and Swap logic. However, it would make Delete(-1) take longer.

I have another concern:

  1. This change will decrease the effective room for data in the flash. I really don't know if this is something which could affect the operation of OT in implementations with small swap areas. As a side note, the first swap operation between the old and new format might be unable to copy some records to the new area with the new format (what would happen with the dataset?).

I think the swap areas need to be sufficiently large to avoid wear. If the swap areas are too small, then each new write could cause an erase. My inclination is that the size increase should not be much of a concern.

Great!

@jwhui
Copy link
Member

jwhui commented Aug 6, 2020

@dismirlian

I think that marking the record "deleted" instead of writing a zero-length record at the end would simplify the Get and Swap logic. However, it would make Delete(-1) take longer.

Yes, I think the zero-length record is an optimization. I'll leave it to your judgement on which approach to take.

@dismirlian
Copy link
Contributor Author

@jwhui, please see #5378.

@LuDuda LuDuda removed their assignment Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants