
Hide data representation inside RDKit::Dict #9113

Merged
greglandrum merged 9 commits into rdkit:master from postera-ai:rm-getData
Mar 20, 2026

Conversation

@bddap
Contributor

@bddap bddap commented Feb 14, 2026

implements #9112

bddap (Coding Agent) and others added 4 commits February 13, 2026 23:33
Replace direct access to Dict's internal std::vector<Pair> with
encapsulated methods: size(), empty(), const iteration via
begin()/end(), appendPair(), markNonPOD(), and getRawVal().

This enables future changes to Dict's internal representation
without breaking callers.

Ref: rdkit#9112

appendPair(Pair&&) now auto-detects non-POD status via
RDValue::needsCleanup(), eliminating markNonPOD() and the
risk of dangling references or uninitialized entries.

needsCleanup() is placed next to destroy() on RDValue to
keep the POD/non-POD distinction in one place.

Both callers ignored the output. Non-POD detection is now handled
by Dict::appendPair via RDValue::needsCleanup().
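
The commit messages above describe an encapsulated container whose append operation derives the non-POD flag from the value itself. A minimal sketch of that idea follows. `SketchValue` and `SketchDict` are hypothetical stand-ins written for illustration and are not the RDKit types; treating `std::string` as the "non-POD" case is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for RDValue: a tagged value that can report whether
// it owns heap data. This is NOT the RDKit type.
struct SketchValue {
  std::variant<int, double, std::string> data;
  // Assumption of this sketch: strings play the role of "non-POD" values.
  bool needsCleanup() const {
    return std::holds_alternative<std::string>(data);
  }
};

// Hypothetical stand-in for RDKit::Dict with its storage kept private.
class SketchDict {
 public:
  struct Pair {
    std::string key;
    SketchValue val;
  };
  using const_iterator = std::vector<Pair>::const_iterator;

  std::size_t size() const { return _data.size(); }
  bool empty() const { return _data.empty(); }
  const_iterator begin() const { return _data.begin(); }
  const_iterator end() const { return _data.end(); }
  bool hasNonPodData() const { return _hasNonPodData; }

  // The flag is computed from the value, so a caller cannot forget to set it.
  void appendPair(Pair &&pair) {
    _hasNonPodData |= pair.val.needsCleanup();
    _data.push_back(std::move(pair));
  }

 private:
  std::vector<Pair> _data;
  bool _hasNonPodData = false;
};

// Counts entries that need cleanup using only const iteration.
inline std::size_t countNonPod(const SketchDict &d) {
  std::size_t n = 0;
  for (const auto &p : d) {
    n += p.val.needsCleanup() ? 1 : 0;
  }
  return n;
}
```

Because callers only see `size()`, `empty()`, const iteration, and `appendPair()`, the private `std::vector` in this sketch could be swapped for another layout without touching them.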
Comment thread Code/RDGeneral/StreamOps.h Outdated
  for (unsigned index = 0; index < count; ++index) {
    CHECK_INVARIANT(streamReadProp(ss, dict.getData()[startSz + index],
                                   dict.getNonPODStatus(), handlers),
    Dict::Pair pair;
Contributor

This is likely to be much less performant since we don't have the call to "resize".

Contributor

@bp-kelley bp-kelley Feb 18, 2026

See note below

Contributor Author

I've added a new bulk-append method.

  void extend(std::vector<Pair> &&pairs) {
    for (auto &p : pairs) {
      _hasNonPodData |= p.val.needsCleanup();
    }
    _data.insert(_data.end(), std::make_move_iterator(pairs.begin()),
                 std::make_move_iterator(pairs.end()));
  }

It's still one more allocation than master, since we are moving the elements into a new vector.
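
To make the allocation trade-off concrete, here is a hedged sketch of the reader side paired with a bulk append. `SketchPair`, `SketchStore`, and `makePairs` are hypothetical names invented for this example, and the `reserve` call inside `extend` follows the "with reserve" wording of the later commit rather than the snippet quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified pair; the real Dict::Pair holds an RDValue.
struct SketchPair {
  std::string key;
  std::string val;
  bool needsCleanup = false;
};

class SketchStore {
 public:
  // Bulk append: grow the buffer once, then move every element across.
  void extend(std::vector<SketchPair> &&pairs) {
    // A range insert from random-access iterators would usually size the
    // buffer in one step anyway; the explicit reserve just states the intent.
    _data.reserve(_data.size() + pairs.size());
    for (const auto &p : pairs) {
      _hasNonPodData |= p.needsCleanup;
    }
    _data.insert(_data.end(), std::make_move_iterator(pairs.begin()),
                 std::make_move_iterator(pairs.end()));
    pairs.clear();
  }
  std::size_t size() const { return _data.size(); }
  bool hasNonPodData() const { return _hasNonPodData; }
  const SketchPair &at(std::size_t i) const { return _data.at(i); }

 private:
  std::vector<SketchPair> _data;
  bool _hasNonPodData = false;
};

// Reader side: the temporary vector is sized once from the known count.
// That temporary buffer is the extra allocation mentioned in the comment.
inline std::vector<SketchPair> makePairs(std::size_t count) {
  std::vector<SketchPair> pairs;
  pairs.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    pairs.push_back(SketchPair{"k" + std::to_string(i), "v", i % 2 == 1});
  }
  return pairs;
}
```

Calling `store.extend(makePairs(n))` costs one buffer for the temporary vector plus at most one growth of the store, whereas writing directly into pre-resized storage would need only the latter.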

    if (propType == handler->getPropName()) {
      handler->read(ss, pair.val);
      dictHasNonPOD = true;
      return true;
Contributor

You don't have CustomTag in the needsCleanup above.

Contributor

After review, this might be converted to an AnyTag. We should add a test for this now that we no longer have dictHasNonPOD.

Contributor

See FooHandler in testDict.cpp

Comment thread Code/RDGeneral/StreamOps.h Outdated
Comment thread Code/RDGeneral/Dict.h Outdated
@bp-kelley
Contributor

Nice work, I think it's a good idea to hide the implementation. I have a few suggestions and the Custom handlers need to be explicitly tested so that we are sure they are destroyed.

- Add Dict::append(vector<Pair>&&) for bulk insertion with reserve
- Use bulk append in streamReadProps to restore pre-allocation
- Rename getRawVal -> getRDValue per reviewer preference
- Add test verifying custom AnyTag data is destroyed through Dict lifecycle
@bp-kelley
Contributor

@greglandrum I think this is worth looking at now.

Comment thread Code/RDGeneral/Dict.h
Comment thread Code/RDGeneral/Dict.h Outdated
Comment thread Code/RDGeneral/Dict.h Outdated
Comment thread Code/RDGeneral/testDict.cpp Outdated
bddap (opencode opus-4.6) added 3 commits February 27, 2026 13:01
Exercises the full streamWriteProps/streamReadProps path with an
ExplicitBitVect in an RDProps Dict, confirming the custom handler
is invoked and no memory is leaked (verified under valgrind).
@bddap
Contributor Author

bddap commented Mar 4, 2026

> Nice work, I think it's a good idea to hide the implementation. I have a few suggestions and the Custom handlers need to be explicitly tested so that we are sure they are destroyed.

5116890 adds a test to exercise the DataStructsExplicitBitVecPropHandler custom handler. valgrind reports no leaks.

@bddap bddap marked this pull request as ready for review March 4, 2026 00:36
@bddap bddap changed the title from "[wip] Hide data representation inside RDKit::Dict" to "Hide data representation inside RDKit::Dict" Mar 4, 2026
Member

@greglandrum greglandrum left a comment

One small change requested to the tests.
Otherwise this looks good to me and I agree with Brian that it makes a lot of sense.

}
};

TEST_CASE("custom AnyTag data is destroyed through Dict lifecycle") {
Member

The count tests here pass even if you never call setVal.
I think this would be more precise if you checked the expected value of count instead of simply testing that it's > a target value.
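
The difference between a lower-bound check and an exact check can be sketched with an instance counter. `Counted`, `Holder`, and `liveCountInsideScope` are hypothetical names made up for this illustration and are not taken from testDict.cpp.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical payload that tracks how many instances are currently alive.
struct Counted {
  static int &live() {
    static int n = 0;
    return n;
  }
  Counted() { ++live(); }
  Counted(const Counted &) { ++live(); }
  ~Counted() { --live(); }
};

// Hypothetical container standing in for a Dict that owns custom data.
struct Holder {
  std::vector<std::unique_ptr<Counted>> items;
  void setVal() { items.push_back(std::make_unique<Counted>()); }
};

// Returns the live count observed while the holder still owns one value.
// The return value is computed before `h` is destroyed.
inline int liveCountInsideScope() {
  Holder h;
  h.setVal();
  return Counted::live();
}
```

A check such as `live() >= 0` would hold even if `setVal()` were never called, so it says nothing about whether the value was stored. Comparing against the exact expected values (1 while the holder is alive, 0 after it is destroyed) fails as soon as either the store or the cleanup is skipped.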

Contributor Author

done in 0a31cf2

@greglandrum
Member

@bddap any chance that you can make this change relatively quickly?

This one would need to be in a major release and we're doing one of those next week. Otherwise we'll have it on master, but it won't make it out in a release until September.

Member

@greglandrum greglandrum left a comment

LGTM

@greglandrum greglandrum merged commit cbedbb7 into rdkit:master Mar 20, 2026
12 checks passed
@greglandrum
Member

Thanks for the contribution @bddap!

@greglandrum greglandrum added the "Cleanup" (code cleanup and refactoring) label Mar 22, 2026
@greglandrum greglandrum added this to the 2026_03_1 milestone Mar 22, 2026