Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Python reference cycles especially when interacting with MS Word #11414

merged 5 commits into from Jul 27, 2020


Copy link

@michaelDCurran michaelDCurran commented Jul 23, 2020

Link to issue number:


Summary of the issue:

Python reference counts all its objects, and automatically deletes them as soon as their reference count reaches 0. Most of the time this is as soon as the object goes out of scope. However, if there is a reference cycle, where by an object directly or indirectly references itself, the reference count will never reach 0.
The Python garbage collection (gc) therefore periodically walks all tracked objects, and detects and breaks reference cycles once an object is deemed to be unreachable. However, this can happen in any thread, and sometimes a comtypes COM object might be deleted in the wrong thread.
Although reference cycles can never be completely avoided in Python (tracebacks can sometimes cause them) there are a few places in NvDA where we are creating them, where we don't really have to. These particular cycles are part of the cause for the freezes in MS Word as these cycles are cleaned up by the garbage collector sometimes in one of our incoming rpc threads.

Description of how this pull request fixes the issue:

Refactor the code to address these particular cycles.


Previously, when initializing, this class creates an XMLParser object and stores it as a member on itself. But it also assigns some of its own instance methods to this parser object, thereby creating a reference cycle.
This in tern is holding around ControlField objects, which in tern are holding around some MS Word Range COM objects.
This code has now been refactored to not do anything at initialization time, and instead just hold the parser as a local variable in the parse method. So although we still assign self's instance methods to the parser, we don't store the parser on self.

WordDocumentTextInfo and lazy controlFields

When we normalize the ControlFields for Microsoft Word text content, we specifically make the row and column header values load lazily, as these are rather costly to compute, and not always required. We previously did this by creating a ControlField subclass which an overridden 'get' method. However, this subclass was defined inline inside WordDocumentextInfo's _normalizeControlField method, also making reference to self as a non-local variable. This somehow was causing a reference cycle on WordDocumentTextInfo... I don't fully understand this one, but it was hard to read and rather unnecessary anyway.
This code has been refactored out into its own class that is not inline, and takes the WordDocumentExtInfo instance as an argument, rather than referencing it as a non-local variable.
This fix, and the fix above, no longer leaves MS Word Range COM objects alive for the garbage collector to eventually find and destroy (possibly in the wrong thread).


For certain editable controls that fire suitable caret events and such, we try and detect selection changes automatically. We do this by storing a TextInfo representing the initial selection on the object in the gainFocus event, and then keep checking and updating the TextInfo each time the caret moves. However, we never delete this TextInfo from the object, and for some TextInfos (such as CompoundDocument TextInfos) this creates reference cycles as the TextInfo may hold child NVDAObjects who may have cached their parent (the original object).
Now, a terminateautoSelectDetection method has been added to EditableTextWithAutoSelectDetection which clears the TextInfo from the object. the loseFocus event now calls terminateAutoSelectDetection, matching the call to inititAutoSelectDetection in the gainFocus event.

Testing performed:

  • Instructed Python to save all unreachable objects in gc.garbage with gc.set_debug(gc.DEBUG_SAVEALL)
  • Interacted with Microsoft Word (sch as moving with up and down arrows, navigating a table)
  • Forced Python to do a garbage collection run with gc.collect()
  • Looked through the contents of gc.garbage to verify that none of the above objects (WordDocumentTextInfo, XMLParser etc) appear in this list anymore.

Also, ensured that Microsoft word no longer froze after typing very quickly in a Word document for about 30 seconds or so, with Braille enabled.

Known issues with pull request:

This by no means gets rid of all possible reference cycles caused by NVDA. But it does address the ones that played a part in the Microsoft Word freezes we have been seeing recently.

…internal parser at the same time as assigning our object's methods to it.
…longer inline and does not create a reference cycle.
…tion in loseFocus event to ensure that NVDAObjects don't hold on to the TextInfo representing the selection for ever.
Copy link

@feerrenrut feerrenrut left a comment

Good work, couple of minor things, but otherwise this all makes sense to me.

@@ -798,25 +823,7 @@ def _normalizeControlField(self,field):
# Hack support for lazy fetching of row and column header text values
Copy link

@feerrenrut feerrenrut Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this comment.

@@ -558,6 +558,31 @@ def collectionFromRange(self,rangeObj):
def filter(self,item):
return item.type==wdInlineShapeChart

class LazyControlField(textInfos.ControlField):
Copy link

@feerrenrut feerrenrut Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is just for row and column header text values, consider renaming it to make it clear eg LazyControlField_RowAndColumnHeaderText verbose, but captures the specific nature of what this does. At the moment this class sounds more generic than it is.

Lazy fetching of row and column header text values

class LazyControlField(textInfos.ControlField):

def __init__(self, ti):
self._ti = ti
Copy link

@leonardder leonardder Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using a weak reference here?

Copy link
Member Author

@michaelDCurran michaelDCurran Jul 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that the TextInfo may die while the ControlField is still alive, as the TextInfo may only be a local variable where it is used in both speech and braille.

Copy link
Member Author

michaelDCurran commented Jul 26, 2020

@feerrenrut I addressed both your review comments.

@michaelDCurran michaelDCurran merged commit 27134cd into master Jul 27, 2020
1 check passed
@michaelDCurran michaelDCurran deleted the reduceReferenceCycles branch Jul 27, 2020
@nvaccessAuto nvaccessAuto added this to the 2020.3 milestone Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging this pull request may close these issues.

None yet

4 participants