Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Remove attribute self._reserved from Dict #2905

Closed

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented May 31, 2024

Rationale:

Our Dict implementation is too complicated for what it does. We have to keep self._reserved updated so that it's always equal to len(self._entries). That's 1) taking memory (64 bits) for nothing 2) is a potential source of bugs as self._reserved could become out of sync with len(self._entries) in future code changes.

We could use len(self._entries) everywhere, but from an algorithmic point of view, I guess it's also easier to read self._reserved.

Since we don't have @property available in Mojo yet, I settled with

fn _reserved(self) -> Int:
     return len(self._entries)

Using all the capacity available in the List

I also added another optimization here, to make the dict compatible in the future with small buffer optimization:
If we ask self._entries to allocate for N elements, but the List allocator tell us that now the capacity of the list is M, then we use M as the size of the list (and also the capacity).

Concretely, let's say that we have small buffer optimization and that we ask for a list with a capacity of 8, it may be possible that the list has a small buffer of size 16, thus giving us a list of capacity 16 (the minimum) when we asked 8. Then we can use the 16 slots available. This is also good in case we later on allow Dict to grab another structure's pointer/list, as we'll use all the memory available.

If the second optimization is not a good idea, I can remove it. There might be some requirement (always be a power of 2 for example) that I'm not aware of and in this case we shouldn't let List decide of the capacity of our Dict.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@gabrieldemarmiesse gabrieldemarmiesse requested a review from a team as a code owner May 31, 2024 18:28
self._index = _DictIndex(self._reserved)
self._entries = Self._new_entries(self._reserved)
self._entries = Self._new_entries(Self._initial_reservation)
self._index = _DictIndex(len(self._entries))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I guess it makes more sense to use Self._initial_reservation here to make clear that it's compile time constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(self._entries) can be greater than or equal to Self._initial_reservation, but they're not equal in all cases. See the section "Using all the capacity available in the List" of the pull request description

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point exactly: should we leave that to List at all? Your call anyways, shouldn't be blocking.

@ConnorGray
Copy link
Collaborator

!sync

1 similar comment
@ConnorGray
Copy link
Collaborator

!sync

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@ConnorGray
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Jun 5, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Jun 5, 2024
@modularbot
Copy link
Collaborator

Landed in a831f69! Thank you for your contribution 🎉

@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Jun 6, 2024
@modularbot modularbot closed this Jun 6, 2024
modularbot pushed a commit that referenced this pull request Jun 6, 2024
…1393)

[External] [stdlib] Remove attribute `self._reserved` from `Dict`

### Rationale:
Our Dict implementation is too complicated for what it does. We have to
keep `self._reserved` updated so that it's always equal to
`len(self._entries)`. That's 1) taking memory (64 bits) for nothing 2)
is a potential source of bugs as `self._reserved` could become out of
sync with `len(self._entries)` in future code changes.

We could use `len(self._entries)` everywhere, but from an algorithmic
point of view, I guess it's also easier to read `self._reserved`.

Since we don't have `@property` available in Mojo yet, I settled with
```mojo
fn _reserved(self) -> Int:
     return len(self._entries)
```

### Using all the capacity available in the List
I also added another optimization here, to make the dict compatible in
the future with small buffer optimization:
If we ask `self._entries` to allocate for N elements, but the List
allocator tell us that now the capacity of the list is M, then we use M
as the size of the list (and also the capacity).

Concretely, let's say that we have small buffer optimization and that we
ask for a list with a capacity of 8, it may be possible that the list
has a small buffer of size 16, thus giving us a list of capacity 16 (the
minimum) when we asked 8. Then we can use the 16 slots available. This
is also good in case we later on allow Dict to grab another structure's
pointer/list, as we'll use all the memory available.

If the second optimization is not a good idea, I can remove it. There
might be some requirement (always be a power of 2 for example) that I'm
not aware of and in this case we shouldn't let `List` decide of the
capacity of our Dict.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2905
MODULAR_ORIG_COMMIT_REV_ID: 066be7c692f8effeba508574f753d2acbbb358b0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants