
Auto language detection based on script detection In t2990 review #7629

Closed
wants to merge 17 commits

Conversation

dineshkaushal
Contributor

Link to issue number:

#2990

Summary of the issue:

Often text is written in multiple languages, but some applications such as Notepad do not detect language automatically. The synthesizer needs to know which language to use for a given script.

Description of how this pull request fixes the issue:

We use the Unicode script property to detect the script of a character. This approach is different from block-based approaches. The problem with the block-based approach is that a single script's characters may be spread across multiple blocks.

At the next level, if a script could be used by multiple languages, the Language Detection dialog provides an option to choose a preferred language.
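To illustrate why the script property is preferable to blocks: a single script's characters span several Unicode blocks. A minimal check with Python's unicodedata (the two characters below are ordinary Latin letters; the block names are stated for context, not derived by the code):

```python
import unicodedata

# 'A' (U+0041) lives in the Basic Latin block; 'Ā' (U+0100) lives in
# Latin Extended-A. Both carry Script=Latin in Unicode's Scripts.txt,
# so a script-based lookup groups them together while a purely
# block-based lookup would split them.
for ch in ("A", "\u0100"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Running this prints the code point and name of each character, both of which are Latin-script letters despite sitting in different blocks.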

Testing performed:

@nishimoto and I have developed 21 test cases to verify our expectations.

Known issues with pull request:

The language detection may not work for some languages, so I have added a "disable script detection" option to the Language Detection dialog. For now this option is disabled by default. Thanks to @nishimoto, the Japanese language should be working fine.

Change log entry:

  • New features
    Add the ability to detect language based on the Unicode script property

Contributor

@feerrenrut feerrenrut left a comment

This is a big change that I will have to go through carefully again. Thanks for all the work!

source/languageDetection.py (outdated, resolved)
source/gui/settingsDialogs.py (outdated, resolved)
source/gui/settingsDialogs.py (outdated, resolved)
source/gui/settingsDialogs.py (outdated, resolved)
#while adding new languages we filter out existing languages in the preferred language list
ignoreLanguages = {x[0] for x in self.languageNames}
languageList = languageDetection.getLanguagesWithDescriptions(ignoreLanguages)
dialog = wx.SingleChoiceDialog(None,
Contributor

Would you consider using a wx.MultiChoiceDialog? This would allow the user to add all of the languages they care about at once.

Contributor Author

The reason we are not using a multi-choice dialog is that when we add a language, it gets added at the top, and in most cases users would only add one language.

The sequence of these languages also determines which language gets priority.

@@ -0,0 +1,824 @@
scriptCode = [
Contributor

Please add the usual copyright header. Also a docstring / explanation for this would be helpful.
Include in the docstring answers to the following:

  • What does each value represent? Is it (unicodeStartOfRange, unicodeEndOfRange, rangeDescription)?
  • Should value ranges overlap?
  • Should all values be represented?
  • Are these values contiguous?

Please also add unit tests to confirm the above restrictions, should they exist.

Contributor Author

Added the copyright header and a docstring with details about each item, but adding test cases may not be needed, as these values are obtained from Scripts.txt and regeneration is only done occasionally.

Contributor

Even though this data is created programmatically, tests for the data will help to ensure its correctness. It will be hard to test the script that creates this data, and removing the assumptions would complicate the code in NVDA. An automated test that these assumptions hold will make it much easier to track down the kind of bugs that would arise if this data does get out of order or its values overlap.

Please also state if the start / end of each range is inclusive or exclusive.

Contributor Author

This range comes from Scripts.txt, which comes from unicode.org. Should we still check it? I am assuming, though I could be wrong, that Unicode would not publish an overlapping list.

Contributor Author

Another point: this list is sorted, and it is only regenerated once in a while.

Contributor

I'm more concerned that a mistake is made in the process of creating this. Perhaps the source data on unicode.org changes format and this causes a bug in the importer. A mistake is even more likely if it's been a while since someone last went through the process, and there is nothing in the production code to ensure the assumptions are met. It's quite hard to manually check that our assumptions about the scriptCode data hold true, but it's quite easy to write a unit test to do the same.

Contributor Author

Added 3 tests for checking the validity of the list.

  1. test_unicodeRangesEntryStartLessEqualEnd
  2. test_unicodeRangesEntriesDoNotOverlapAndAreSorted
  3. test_unicodeRangesEntryScriptNamesExist

Do let me know if we need to add more tests.
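A sketch of what such integrity checks might look like, assuming scriptRanges entries are (start, end, scriptName) tuples with inclusive bounds. The sample data below is illustrative, not the generated list:

```python
# Illustrative stand-in for the generated scriptRanges list:
# (start, end, scriptName) tuples with inclusive bounds,
# expected to be sorted and non-overlapping.
scriptRanges = [
    (0x0041, 0x005A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0900, 0x097F, "Devanagari"),
]

def check_start_less_equal_end(ranges):
    # Each range must have start <= end.
    return all(start <= end for start, end, _ in ranges)

def check_sorted_and_non_overlapping(ranges):
    # Each range must end before the next one starts.
    return all(prev[1] < cur[0] for prev, cur in zip(ranges, ranges[1:]))

def check_script_names_exist(ranges):
    # Every entry must carry a non-empty script name.
    return all(isinstance(name, str) and name for _, _, name in ranges)

def check_first_start_non_negative(ranges):
    return ranges[0][0] >= 0
```

Each helper mirrors one of the tests listed above; wrapping them in unittest methods is straightforward.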

Contributor

For completeness, we could also test that the first element's start value is greater than 0.

elif 0xff10 <= characterUnicodeCode <= 0xff19:
    return "FullWidthNumber"

while mEnd >= mStart:
    midPoint = (mStart + mEnd) >> 1
Contributor

Is there not a built in way of performing this search? So we don't reinvent the wheel, so to speak?

Contributor Author

I don't think so, as it is not a simple binary search.

Contributor

You could use bisect, in the initialisation create a list of the rangeEndValues like so:
unicodeScriptRangeEnd = [ k[1] for k in scriptCode]
Then in this function you can do:

# Based on the following assumptions: 
# - ranges must overlap
# - range end and start values are included in that range
# - there may be gaps between ranges.

# Approach: Look for the first index of a range where the range end value is greater
# than the code we are searching for. If this is found, and the start value for this range
# is less than or equal to the code we are searching for then we have found the range.
# That is startValue <= characterUnicodeCode <= endValue

index = bisect.bisect_left(unicodeScriptRangeEnd, characterUnicodeCode )
if index == len(unicodeScriptRangeEnd):
    # there is no value of index such that: `characterUnicodeCode <= scriptCode[index][1]`
    # characterUnicodeCode is larger than all of the range end values so a range is not 
    # found for the value:
    return None

# Since the range at index is the first where `characterUnicodeCode <= rangeEnd` is True,
# we now ensure that for the range at the index `characterUnicodeCode >= rangeStart` 
# is also True. 
candidateRange = scriptCode[index]
rangeStart = candidateRange[0]
if rangeStart > characterUnicodeCode :
    # characterUnicodeCode comes before the start of the range at index so a range 
    # is not found for the value
    return None
rangeName = candidateRange[2]
return rangeName
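A self-contained sketch of this bisect-based lookup, using illustrative ranges rather than the real scriptRanges data (the function name getScriptName here is mine, not the PR's getScriptCode):

```python
import bisect

# Hypothetical (start, end, scriptName) entries: inclusive bounds,
# sorted, non-overlapping, possibly with gaps between them.
scriptRanges = [
    (0x0041, 0x005A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0900, 0x097F, "Devanagari"),
]
# Built once at module load, as discussed in this thread.
unicodeScriptRangeEnd = [entry[1] for entry in scriptRanges]

def getScriptName(characterUnicodeCode):
    # Index of the first range whose end value is >= the code.
    index = bisect.bisect_left(unicodeScriptRangeEnd, characterUnicodeCode)
    if index == len(unicodeScriptRangeEnd):
        # Larger than every range end: no range found.
        return None
    start, end, name = scriptRanges[index]
    if characterUnicodeCode < start:
        # Falls in a gap before the candidate range starts.
        return None
    return name

print(getScriptName(ord("अ")))  # Devanagari
print(getScriptName(0x0060))   # None (gap between ranges)
```

The lookup is O(log N) per character once the end-value list exists.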

Contributor Author

The assumption of ranges overlapping is not valid.

At the outset, this algorithm looks to me like O(N), while binary search is O(log N), N being the number of ranges.
The loop
unicodeScriptRangeEnd = [ k[1] for k in scriptCode]
runs once per range.
getScriptCode is called for every character, so it should be as fast as possible. Considering that, I think binary search is better.

I am not familiar with bisect, so I am looking into it.
Isn't it better not to modify this function if it is working fine?

Contributor

The assumption of ranges overlapping is not valid.

Oh, the comment meant to say "must not overlap"

At the outset this algorithm looks to me is O(N) and binary search is O(log N) n being the number of ranges.

I believe bisect is O(log N). There is the step to split out the rangeEndValues, though this is only done once, and as such should not be noticeable. If the performance of the function is such a concern, then we should measure it and compare the results.

Isn't it better not to modify this function if it is working fine?

My concerns were readability, and edge cases that are easy to miss and therefore worth writing tests for.

Contributor Author

As I said earlier, bisect would be O(log N), but the loop before it is O(N). We can take that loop out, but that approach looks more complicated to me.
unicodeScriptRangeEnd = [ k[1] for k in scriptCode]

I would rather write tests than change this algorithm, which took me some time to test. But at that time we didn't have unit tests in NVDA, so I had only done manual testing.

Contributor

Yes, the creation of unicodeScriptRangeEnd is O(N), but it is only done once, so it does not need to be considered. Since speed is a concern, I thought I would measure these approaches; see this gist with a timing script. Place the file testSpeedUnicodeCharacterLookup.py in the <nvdaRepoRoot>/source/ directory of your branch.

The results on my machine were:

time python testSpeedUnicodeCharacterLookup.py
using 100 iterations
testing over range 0-65535, a total of 65535 values
withBisect: 5.69379344301
customBinarySearch: 12.9052367034

real    0m18.679s
user    0m0.015s
sys     0m0.000s

I have not profiled the two solutions, so I can not say why bisect is so much faster.
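A rough sketch of how such a comparison can be timed with the standard library. This is not the gist referenced above; the ranges are synthetic and the absolute numbers will differ per machine:

```python
import bisect
import timeit

# Hypothetical ranges standing in for scriptRanges: 2048 sorted,
# non-overlapping (start, end, name) entries with a gap of one code
# point between neighbours.
scriptRanges = [(i * 32, i * 32 + 30, "Script%d" % i) for i in range(2048)]
rangeEnds = [r[1] for r in scriptRanges]  # built once, outside the timed code

def withBisect(code):
    i = bisect.bisect_left(rangeEnds, code)
    if i < len(rangeEnds) and scriptRanges[i][0] <= code:
        return scriptRanges[i][2]
    return None

def customBinarySearch(code):
    lo, hi = 0, len(scriptRanges) - 1
    while hi >= lo:
        mid = (lo + hi) >> 1
        start, end, name = scriptRanges[mid]
        if code < start:
            hi = mid - 1
        elif code > end:
            lo = mid + 1
        else:
            return name
    return None

for fn in (withBisect, customBinarySearch):
    t = timeit.timeit(lambda: [fn(c) for c in range(0, 65536, 17)], number=20)
    print(fn.__name__, round(t, 3))
```

bisect's inner loop runs in C, which is a plausible explanation for the speed gap the gist measured, though only profiling would confirm it.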

Contributor

You will also notice in this script that there are problems with using ord() on values of 0x10000 / 65536 or greater. This is something that might need to be considered for this PR. Is there any situation where the chr passed to getScriptCode() could be this large? The values in scriptRanges go up to 0xE007F.

Contributor Author

Thanks for your timing check. I have implemented your suggestion and removed my custom implementation of binary search.

@rtype: string"""
# we are using loop during search to maintain priority
for priorityLanguage, priorityScript, priorityDescription in languagePriorityListSpec:
log.debugWarning(u"priorityLanguage {}, priorityScript {}, priorityDescription {}".format(priorityLanguage, priorityScript, priorityDescription ) )
Contributor

Please remove

Contributor Author

Done

if isinstance(item,ScriptChangeCommand):
scriptCode = item.scriptCode
else:
log.debugWarning(u"script: {} for text {} ".format( scriptCode , unicode(item) ) )
Contributor

Should this be here? What does this log message tell us?

Contributor Author

Removed

url = 'http://www.unicode.org/Public/UNIDATA/Scripts.txt'
scriptDataFile = urllib2.urlopen(url)
for line in scriptDataFile:
p = re.findall(r'([0-9A-F]+)(?:\.\.([0-9A-F]+))?\W+(\w+)\s*#\s*(\w+)', line)
Contributor

Please provide an example (in a comment) of the data that this should work on, including edge cases that had to be worked around.
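For reference, Scripts.txt entries carry either a range or a single code point, a script name, and the general category after the "#". A quick check of what the regular expression extracts from such lines (the sample lines are typed from memory of the file's format, so treat them as illustrative):

```python
import re

# The pattern used by unicodeScriptPrep, quoted from the diff above.
pattern = r'([0-9A-F]+)(?:\.\.([0-9A-F]+))?\W+(\w+)\s*#\s*(\w+)'

# Illustrative lines in the shape of unicode.org's Scripts.txt.
sampleLines = [
    "0041..005A    ; Latin # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR",
]
for line in sampleLines:
    print(re.findall(pattern, line))
# First line  -> [('0041', '005A', 'Latin', 'L')]
#   (the '&' in the 'L&' category is not a \w character, so only 'L' is captured)
# Second line -> [('00AA', '', 'Latin', 'Lo')]
#   (edge case: single code points leave the second group empty)
```

The empty second group for single code points is the edge case the importer has to handle when building (start, end, name) tuples.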

@dineshkaushal
Contributor Author

dineshkaushal commented Jan 18, 2018 via email

@mohdshara

@feerrenrut requested changes. Once those review comments / changes are addressed, this PR will undergo another review and hopefully be accepted, or other changes will be requested.

@dineshkaushal could you please look into the review comments above?

@dineshkaushal
Contributor Author

dineshkaushal commented Jan 19, 2018 via email

@param scriptCode: the script identifier
@type scriptCode: int
"""
self.scriptCode =scriptCode
Contributor

I'm a little bit worried about the name collision with the import on line 14: from unicodeScriptData import scriptCode. I don't think there is anywhere where this causes a bug. But I think it's a bit confusing, and isn't immediately obvious.

Contributor Author

Removed the confusion by moving the getScriptCode function from languageDetection.py to unicodeScriptData.py. Earlier I did not put this function in unicodeScriptData, as the entire file was being generated by unicodeScriptPrep. Now unicodeScriptPrep generates unicodeScriptDataTemp, from which the scriptRanges can be copied to unicodeScriptData. The scriptRanges list was earlier known as scriptCode.

…nicodeScriptData, added comments in unicodeScriptPrep explaining what the regular expression does and updated the unicode script ranges
characterUnicodeCode = ord(chr)
# Number should respect preferred language setting
# FullWidthNumber is in Common category, however, it indicates Japanese language context
if DIGIT_ZERO <= characterUnicodeCode <= DIGIT_NINE:
Contributor

Rather than having a special condition for "Number" and "FullWidthNumber", I think it's cleaner to make these entries in the scriptRanges. Add these two entries explicitly from unicodeScriptPrep. Actually, this will require ensuring that there is no overlap with other entries, which may be tricky to do. So maybe not.

Contributor

What do you think of this suggestion? It would be interesting to test the performance of the getScriptCode function without the two special cases, and to manually put those ranges for numbers in the scriptRanges list.
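One way to sketch this suggestion, assuming the generator keeps scriptRanges sorted: insert the two digit ranges explicitly with bisect.insort. The "Common" entries below are hypothetical placeholders chosen to leave gaps where the digit ranges belong; the real generated data would need the overlap check mentioned above:

```python
import bisect

# Hypothetical slice of scriptRanges with gaps left at the digit ranges.
scriptRanges = [
    (0x0020, 0x002F, "Common"),
    (0x003A, 0x0040, "Common"),
    (0xFF01, 0xFF0F, "Common"),
    (0xFF1A, 0xFF20, "Common"),
]
digitRanges = [
    (0x0030, 0x0039, "Number"),           # ASCII 0-9
    (0xFF10, 0xFF19, "FullWidthNumber"),  # full-width 0-9
]
# insort keeps the list sorted by tuple order, i.e. by range start.
for entry in digitRanges:
    bisect.insort(scriptRanges, entry)

# With the list still sorted and non-overlapping, the single bisect
# lookup now covers digits too, with no special cases in getScriptCode.
assert all(a[1] < b[0] for a, b in zip(scriptRanges, scriptRanges[1:]))
```

The trade-off is exactly the one noted above: the generator must guarantee the inserted ranges do not overlap the entries Scripts.txt already provides for these code points.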

…@feerrenrut and added test cases for testing integrity of scriptRanges
@dineshkaushal
Contributor Author

dineshkaushal commented Feb 3, 2018 via email

@@ -885,28 +887,49 @@
( 0Xe0020 , 0Xe007f , "Common" ),
]


unicodeScriptRangeEnd = [ k[1] for k in scriptRanges]
Contributor

Could you add a doc string here please? Mention that for performance reasons this should only be created once.

@feerrenrut
Contributor

ord() should be able to handle larger values if we compile with UCS-4, but I could not find out whether the NVDA build is compiled with that.

If NVDA can not process Unicode characters greater than 0x10000, perhaps we should test and slice the scriptRanges list so that NVDA is not searching through a list much larger than necessary. Something like:

OUTSIDE_RANGE_NARROW_PYTHON_BUILD = 0x10000
try:
    unichr(OUTSIDE_RANGE_NARROW_PYTHON_BUILD)
except ValueError:
    newEndIndex = bisect.bisect_left(unicodeScriptRangeEnd, OUTSIDE_RANGE_NARROW_PYTHON_BUILD)
    scriptRanges = scriptRanges[:newEndIndex]
    unicodeScriptRangeEnd = [ k[1] for k in scriptRanges]

Again this is something that should only be done once, and we should perhaps first test the timing to find how big of a difference this actually makes by just manually splitting the list within the tests.

The reason I suggest testing for a narrow Python build, and only splitting in that case, is that in the future the build may change, and then we have support for this. Though there are likely many things that would need to be updated for that anyway. Will the move to Python 3 affect this?
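For what it's worth on the Python 3 question: CPython 3 dropped the narrow/wide build split with PEP 393, so strings are always "wide". A quick check, runnable on any Python 3:

```python
import sys

# Since PEP 393 (Python 3.3), every CPython build reports the full
# Unicode range, and chr()/ord() handle astral code points directly.
# The narrow-build slicing above would therefore become a no-op after
# a move to Python 3.
print(hex(sys.maxunicode))     # 0x10ffff
print(hex(ord(chr(0x1F600))))  # 0x1f600 (an astral code point)
```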

@dineshkaushal
Contributor Author

dineshkaushal commented Feb 6, 2018 via email

…criptRanges is greater than or equal to zero
@dineshkaushal
Contributor Author

dineshkaushal commented Feb 6, 2018 via email

@dineshkaushal
Contributor Author

dineshkaushal commented Feb 6, 2018 via email

@feerrenrut
Contributor

After limiting the range to 0x10000, the performance for the same iterations is 0.76481559737

Could you fork that gist to show how you got this result?

Contributor

@larry801 larry801 left a comment

Hiragana and Katakana are only used in Japanese. Bopomofo is used in Chinese.

@dineshkaushal
Contributor Author

dineshkaushal commented Sep 3, 2018 via email

@Adriani90
Collaborator

@feerrenrut were there any further discussions on this PR with @dineshkaushal? Are there any further suggestions to be added / considered? I think this PR has been waiting far too long to be finalized.
cc: @michaelDCurran, @Qchristensen, @josephsl, @derekriemer.
@dineshkaushal thank you for your important work on this.

@Adriani90
Collaborator

@dineshkaushal did you consider further testing as @feerrenrut proposed above? I guess even if this PR is not perfect, it is an optional setting, and further issues can be reported by many users. I think this is stable enough at least for a try build or an alpha build; if it causes big issues, users will report them and it can be reverted. So, is there a major performance lag? Note that if you enable pages in document formatting settings, NVDA becomes incredibly slow in big Word documents, and yet that feature made it into a final release even though lots of users complain about performance issues in MS Word. So testing this PR in a big Word document with many tables and with pages enabled will push NVDA to its limits if this PR causes significant performance issues.

@mohdshara

I completely agree with @Adriani90. As @dineshkaushal stressed, this PR is optional and won't have any impact if not enabled. I really think it deserves to see the light of day. @michaelDCurran could you kindly also give us your 2 cents on this matter?

@aaclause
Contributor

aaclause commented Jan 6, 2019

I also completely agree with @Adriani90.
@dineshkaushal In the meantime, maybe an add-on could be an alternative for this work?

@feerrenrut
Contributor

I understand the frustrations expressed here. Regarding priorities, I agree this is important, and expect this PR to become my main focus soon. This PR is stuck because there has still not been a thorough explanation of the performance impact.

only one function is the entry point for the detection code

How often is this function called per NVDA update loop / pump?

I gave some suggestions for how to put the performance impact into perspective:

  • The current timing specifically looks at one function and gets an average of 2.4 milliseconds for the detection.
  • Some context needs to be given to this: how long (absolute and relative) is this compared to the standard core pump time?
  • Analysis of the number of times this code will run within each pump (the min, norm and max).

Secondly, this code only gets invoked when the user opts in to the detection, so for those for whom this feature is not helpful, it has no impact in any way.

This is fine, but we want to be sure that we understand the performance impact of the feature when it is enabled.

Speaking generally, it may seem like a good idea to merge a PR, wait for complaints, and then fix them. But it actually means that NVDA maintainers are committing to maintain this code. It also relies on users to test the feature, and sometimes issues are not found for several releases. There is no guarantee that the original contributor will address issues raised. This is particularly concerning when our code review feedback is not addressed.

@larry801
Contributor

larry801 commented Jan 7, 2019 via email

@feerrenrut
Contributor

I have started looking at this PR. The results of running the unit test on my machine are as follows; I believe the units are seconds:

Additional time per line with detection on 0.000394934361649
total timeTakenWithDetectionOn 2.50316016579
Total timeTakenWithDetectionOff 0.0123091468666
Number of lines 6307 text length 1545715

The timeTakenWithDetectionOff value is the time taken to run the same code with the feature turned off using config, which results in it falling back to regular language switching.

Manual Testing

Using the "SampleText.txt" document (from the unit tests, which has many lines in different scripts), I open it in Notepad++ and move by line (using the eSpeak synth), paying attention to the responsiveness of NVDA.

  • Some lines seem to be ok, but when encountering a new script there is occasionally a long pause.
  • When trying to use OneCore or SAPI5 I do not get any output for the majority of these lines. I assume that it is because I do not have the required languages installed.

Further performance testing

To put the performance of this feature in perspective, let's answer the following:

  • What is the total time taken for a loop in NVDA?
  • How much of this time is taken by the language detection code?
  • How many times per loop is the language detection code run?

To achieve this I:

  • I added some metrics to core.py main() / corePump(), speech.py speak() and speech.py speakSpellingGen()

  • Each time we detect language it is timed and added to a total, and a counter is incremented

  • In corePump() we time the whole pump, and at the end reset the detectLanguage total time and counter, and print out the values to the log.

  • Results look like this:
    end of core pump, took 0.0716s, detect language took 0.0002s, called 2 times

    Stats:

    Measure             Core Loop   Language Detection
    Number of samples   279         279
    Mean                0.1515      0.0001
    Mode                0.0016      0
    Max                 9.4875      0.0015
    Min                 0           0
    Std Dev             0.9173      0.0001
  • This is interesting, but it does not seem that the delays are coming from the language detection code. Looking at the top 10 records for core loop time:

    Core loop   Language detection time   Calls to language detection
    9.4875      0.001                     3
    8.7615      0.0001                    1
    5.4363      0.0001                    1
    5.1658      0.0001                    1
    4.083       0                         1
    0.3585      0                         0
    0.2061      0.0007                    1
    0.184       0.0009                    1
    0.1828      0.0006                    1
    0.182       0.0015                    1

UX

  • Add button should add the next item to the bottom of the list.
  • Due to the presence of the "move up" and "move down" buttons, it seems like the order of the preferred languages is important, however this is not explained in the GUI.
  • The explanation for preferred languages needs improvement.
  • If this is tied to a setting in another dialog, ("automatic language switching" in voice settings) perhaps we should rethink its location. Perhaps automatic language switching should be present in both.
    • From the user's standpoint, why is this a different option from automatic language switching?
  • The label for "Disable scriptDetection":
    • This is not a very user friendly name. Consider those who are not developers.
    • It's clearer if the wording / logic of this check box is inverted: check to enable, uncheck to disable.
      If all the settings on this dialog are related to script detection, enabling / disabling script detection should be the first choice, and the other controls should be disabled when script detection is disabled.

Other issues

  • This branch needs to be cleaned up and rebased onto master.
  • I got an error when using input help; it occurred with several keys incl. NVDA+F1:
INFO - inputCore.InputManager._handleInputHelp (13:39:03.426):
Input help: gesture kb(desktop):NVDA
ERROR - queueHandler.flushQueue (13:39:03.433):
Error in func _handleInputHelp from eventQueue
Traceback (most recent call last):
  File "queueHandler.py", line 50, in flushQueue
    func(*args,**kwargs)
  File "inputCore.py", line 514, in _handleInputHelp
    speech.speakText(textList[0], reason=controlTypes.REASON_MESSAGE, symbolLevel=characterProcessing.SYMLVL_ALL)
  File "speech.py", line 431, in speakText
    speak(speechSequence,symbolLevel=symbolLevel)
  File "speech.py", line 544, in speak
    detectedLanguageSequence = languageDetection.detectLanguage(item , prevLanguage)
  File "languageDetection.py", line 204, in detectLanguage
    tempSequence = detectScript(text)
  File "languageDetection.py", line 168, in detectScript
    if unicodedata.category(text[index] ) == "Ps":
TypeError: category() argument 1 must be unicode, not str
  • I got the following error after setting preferred languages:
ERROR - eventHandler.executeEvent (14:28:27.943):
error executing event: gainFocus on <NVDAObjects.Dynamic_TerminalDisplayModelLiveTextIAccessibleWindowNVDAObject object at 0x05414530> with extra args of {}
Traceback (most recent call last):
  File "eventHandler.py", line 143, in executeEvent
    _EventExecuter(eventName,obj,kwargs)
  File "eventHandler.py", line 91, in __init__
    self.next()
  File "eventHandler.py", line 98, in next
    return func(*args, **self.kwargs)
  File "NVDAObjects\behaviors.py", line 352, in event_gainFocus
    super(Terminal, self).event_gainFocus()
  File "NVDAObjects\__init__.py", line 936, in event_gainFocus
    self.reportFocus()
  File "NVDAObjects\__init__.py", line 831, in reportFocus
    speech.speakObject(self,reason=controlTypes.REASON_FOCUS)
  File "speech.py", line 387, in speakObject
    speakObjectProperties(obj,reason=reason,index=index,**allowProperties)
  File "speech.py", line 325, in speakObjectProperties
    speakText(text,index=index)
  File "speech.py", line 431, in speakText
    speak(speechSequence,symbolLevel=symbolLevel)
  File "speech.py", line 544, in speak
    detectedLanguageSequence = languageDetection.detectLanguage(item , prevLanguage)
  File "languageDetection.py", line 221, in detectLanguage
    languageCode = getLangID( item.scriptCode  )
  File "languageDetection.py", line 123, in getLangID
    if scriptName in priorityLanguage.scriptID:
TypeError: argument of type 'NoneType' is not iterable
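For context on the first traceback: Python 2's unicodedata.category required a unicode object, so passing a byte str raised exactly that TypeError. In Python 3, where every str is unicode, the check works directly. A minimal illustration:

```python
import unicodedata

# "Ps" is the general category for opening punctuation such as "(",
# which is what the detectScript check in the traceback appears to
# test for.
print(unicodedata.category("("))  # Ps

# Passing a non-text argument still fails, mirroring the Python 2
# failure mode from the traceback above:
try:
    unicodedata.category(b"(")
except TypeError as e:
    print("TypeError:", e)
```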

@feerrenrut
Contributor

Perhaps I should clarify my last comment. I am happy with the performance of this part of the code, though there does seem to be a performance issue with Notepad++ when opening the sample.txt file. This seems to be unrelated to NVDA; it also occurs when NVDA is not running. However, I would advise against using the sample.txt file for testing.

@feerrenrut
Contributor

In order for this issue to progress, there are several other concerns that I have also highlighted. Several errors, some issues with the UX, lacking user guide updates, and the branch needs to be rebased and made ready for merging.

@dpy013
Contributor

dpy013 commented Aug 24, 2019

Found the following merge conflicts:
source/config/configSpec.py
source/gui/settingsDialogs.py
source/speech.py
Could these be fixed? Thanks.

@feerrenrut
Contributor

Closing this PR due to lack of activity. This feature is on our road map; in the meantime, anyone interested may take it on.
A new PR will also make this easier to follow, since many of the comments on this one are no longer relevant.

@feerrenrut feerrenrut closed this Apr 1, 2020
@Adriani90
Collaborator

I vote for reopening this to make it easier to find for new developers who want to take over this very valuable work. Indeed, the concept of this PR has not been rejected, and in my view finding closed pull requests is very difficult on GitHub. I suggest labeling this pull request as abandoned rather than closing it.

cc: @CyrilleB79 maybe you have also an opinion on this.

In the end we need a decision from @seanbudd and @michaelDCurran on how to proceed with this.

@CyrilleB79
Collaborator

Since I'm asked for my opinion:
I do not think that reopening this PR is a good idea, since its author is no longer active.

It seems to me that the process is that only PRs that have a chance of being finalized by the initial author are kept open, so that NV Access can look over open PRs to review them.
If a PR has no available active author, there is a risk that it remains open indefinitely. I think there is an "abandoned" label for such PRs. There is also the "concept approved" label for PRs whose motivation has been approved by NV Access.

A contributor wanting to take over an abandoned PR may search for closed (not merged) PRs with these 2 labels, provided PRs are correctly labeled. Otherwise, triage work should be done on closed PRs to label them correctly.

If needed, a comment in the corresponding issue can be added to mention this existing PR and summarize its state (i.e. quite advanced development).

Triage and contribution documentation is currently being updated; this may be an opportunity to clarify these points if they are not already clear.

@Adriani90
Collaborator

It seems to me that the process is that only PRs that have a chance of being finalized by the initial author are kept open, so that NV Access can look over open PRs to review them.

This assumption is actually wrong and contradicts the open source principles of this project. There are pull requests that have been taken over by others; see for example #15331 replacing #11270. Had #11270 been closed, I am quite sure @LeonarddeR would not have found it as fast unless he was tagged on that PR or was aware of the work without knowing the PR number exactly.
In fact, I would prefer to have such valuable work be visible, even if it is open forever, because this is really work that should not be hidden.
Given the small number of PRs in this project, I still think labeling a PR as abandoned should be enough to mark inactive work, instead of hiding it among almost 3,000 closed PRs.
I am reopening this PR and hope that NV Access will share a strong argument why this work should be hidden even though the concept as such is approved according to the comments above.

@Adriani90 Adriani90 reopened this Sep 4, 2023
@Adriani90 Adriani90 requested a review from a team as a code owner September 4, 2023 09:25
@Adriani90 Adriani90 requested review from seanbudd and removed request for feerrenrut September 4, 2023 09:25
@CyrilleB79
Collaborator

It seems to me that the process is that only PRs that have a chance of being finalized by the initial author are kept open, so that NV Access can look over open PRs to review them.

This assumption is actually wrong and contradicts the open source principles of this project. There are pull requests that have been taken over by others; see for example #15331 replacing #11270. Had #11270 been closed, I am quite sure @LeonarddeR would not have found it as fast unless he was tagged on that PR or was aware of the work without knowing the PR number exactly. In fact, I would prefer to have such valuable work be visible, even if it is open forever, because this is really work that should not be hidden. Given the small number of PRs in this project, I still think labeling a PR as abandoned should be enough to mark inactive work, instead of hiding it among almost 3,000 closed PRs. I am reopening this PR and hope that NV Access will share a strong argument why this work should be hidden even though the concept as such is approved according to the comments above.

@Adriani90 I disagree with you when you state that my assumption contradicts the open source principles of this project.
There's no point in making assumptions about NV Access's thoughts; better to ask them to clarify this point.

@seanbudd, @michaelDCurran:
What is recommended for abandoned PRs? Keep open or close them? Add a specific label to be able to find them?

@seanbudd
Member

seanbudd commented Sep 4, 2023

What is recommended for abandoned PRs? Keep open or close them? Add a specific label to be able to find them?

Abandoned PRs should be closed. Draft state implies that work will continue. Ready state implies that it is ready for review.
You can easily find abandoned PRs by searching by label.

@seanbudd seanbudd closed this Sep 4, 2023
@Adriani90 Adriani90 added the Abandoned requested reports or updates are missing since more than 1 year, author or users are not available. label Sep 6, 2023
@Adriani90
Collaborator

For this, the PR needs of course to have the correct label. I marked this as abandoned so someone can find it more easily by filtering on the corresponding label.
