CollateX Python port, unicode and Python 3 #16

Closed
rhdekker opened this issue Oct 22, 2014 · 3 comments

Comments

@rhdekker
Member

Python 3 is a backwards-incompatible break in Python's API, made with the goal of making Unicode strings the default. See the reasoning on the following page:
http://ncoghlan-devs-python-notes.readthedocs.org/en/latest/python3/questions_and_answers.html
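To make the change concrete, here is a minimal Python 3 sketch of what "Unicode strings by default" means. It uses only the standard library, and the sample word is arbitrary. `str` holds text as code points, `bytes` holds encoded data, and the two no longer mix implicitly.

```python
# In Python 3, str holds Unicode text; bytes holds encoded binary data.
text = "Grüße"                  # 5 code points
data = text.encode("utf-8")     # ü and ß each need 2 bytes in UTF-8

print(type(text).__name__, len(text))  # str 5
print(type(data).__name__, len(data))  # bytes 7

# Decoding with the same codec restores the original text.
assert data.decode("utf-8") == text

# Mixing the two types is an error in Python 3
# (Python 2 would instead attempt an implicit ASCII decode of the bytes).
try:
    text + data
except TypeError:
    print("cannot concatenate str and bytes")
```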

Unicode support is a major problem in the current preview versions of CollateX Python.

I am currently investigating what a clean port to Python 3 would entail.

@dirkroorda

Hi Ronald,
Nice, informative piece.
After we parted, another difference between Python 2 and Python 3 came to mind: sorting.
In Python 2 you can pass a compare function to a sort. In Python 3 you can no longer do that, but you can still pass a key function.
That seems weaker.
But with a key you can still simulate a compare, namely by using as the key an object with methods that compare it with other objects.
There is even a module with a function that converts a compare function into a key automatically.
See cmp_to_key in https://docs.python.org/3.4/library/functools.html#module-functools
I expect you will need this for your collation algorithm.
Regards, Dirk

Dirk Roorda
researcher
dirk.roorda@dans.knaw.nl
Data Archiving and Networked Services (DANS)
DANS promotes sustained access to digital research data. DANS is an institute of KNAW and NWO.
www.dans.knaw.nl
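The two techniques Dirk describes can be sketched as follows. The compare function `compare_tokens` (length first, then alphabetical) is a hypothetical example, not CollateX code; `functools.cmp_to_key` is the standard-library function he links to; and `CompareKey` is a hypothetical hand-written key class.

```python
from functools import cmp_to_key


def compare_tokens(a, b):
    """Hypothetical Python 2-style compare function: negative, zero or positive.

    Orders by length first, then alphabetically.
    """
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ["witness", "a", "text", "collate", "an"]

# 1. Automatic conversion with the standard library.
sorted_auto = sorted(words, key=cmp_to_key(compare_tokens))


# 2. Hand-written key object whose comparison method delegates to the compare function.
class CompareKey:
    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        # sorted() only uses < between keys, so __lt__ is sufficient here.
        return compare_tokens(self.obj, other.obj) < 0


sorted_manual = sorted(words, key=CompareKey)

print(sorted_auto)  # ['a', 'an', 'text', 'collate', 'witness']
```

In Python 2 the same ordering could be obtained with `sorted(words, cmp=compare_tokens)`; Python 3 removed the `cmp` argument, which is why one of the two key-based routes is needed.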


@rhdekker
Member Author

Hi Dirk,
Thanks for this additional information. I did indeed run into this issue.

Old code:
```python
#################################
# Dichotomy search of subString #
#################################
# Python 2 only: relies on the built-in cmp().
# string, SA (the suffix array), subString and lenSubString come from the enclosing method.
lower=0
upper=self.length
success=False

while upper-lower >0:
    middle=(lower+upper)//2

    middleSubString=string[SA[middle]:min(SA[middle]+lenSubString,self.length)]
    cmpRes=cmp(subString, middleSubString)

    if cmpRes == -1:
        upper=middle
    elif cmpRes == 1:
        lower=middle+1
    else:
        success=True
        break

if not success:
    return False
else:
    return middle
```
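An alternative to rewriting the comparisons would be to keep the Old code and define a local replacement for the removed built-in. The Python 3.0 "What's New" notes suggest `(a > b) - (a < b)` as the equivalent of `cmp(a, b)`. The helper below is a minimal sketch of that idea; note that it still performs two comparisons, so it gains nothing in efficiency over the New code.

```python
def cmp(a, b):
    """Local stand-in for Python 2's built-in cmp(): returns -1, 0 or 1."""
    # bool is a subclass of int, so True - False == 1 and False - True == -1.
    return (a > b) - (a < b)


print(cmp("ab", "b"), cmp("b", "ab"), cmp("ab", "ab"))  # -1 1 0
```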

New code:
```python
    # ... inside the while loop of the Old code ...
    middleSubString=string[SA[middle]:min(SA[middle]+lenSubString,self.length)]

    # NOTE: the built-in cmp() function was removed in Python 3.
    # Strictly speaking, we are doing one comparison more now.
    if subString < middleSubString:
        upper=middle
    elif subString > middleSubString:
        lower=middle+1
    else:
        success=True
        break
```

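This conversion is correct, but as I remarked in the NOTE comment, it is strictly speaking less efficient: whenever subString is not smaller than middleSubString, a second comparison is needed to tell "greater" from "equal".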
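If the extra comparison matters, one option is to restructure the search as a lower-bound binary search: a single `<` per iteration, with equality checked once after the loop. The sketch below is self-contained and illustrative only. The naive suffix-array builder and the names `build_suffix_array` and `find_substring` are hypothetical, not CollateX's API. Unlike the Old code, it returns the *first* matching position rather than any match, and it returns `None` on failure, because returning `False` cannot be distinguished from a match at index 0 (`False == 0`).

```python
def build_suffix_array(text):
    """Naive suffix array: start offsets of all suffixes, sorted by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_substring(text, sa, sub):
    """Return the first index into sa whose suffix starts with sub, or None.

    Lower-bound binary search: one `<` comparison per iteration,
    plus a single equality check after the loop.
    """
    n = len(sub)
    lower, upper = 0, len(sa)
    while lower < upper:
        middle = (lower + upper) // 2
        # Slices clamp at the end of the string, so no min(..., len(text)) is needed.
        if text[sa[middle]:sa[middle] + n] < sub:
            lower = middle + 1
        else:
            upper = middle
    if lower < len(sa) and text[sa[lower]:sa[lower] + n] == sub:
        return lower
    return None


sa = build_suffix_array("banana")
print(sa)                                   # [5, 3, 1, 0, 4, 2]
print(find_substring("banana", sa, "ana"))  # 1
print(find_substring("banana", sa, "nab"))  # None
```

Because the n-character prefixes of sorted suffixes are themselves in non-decreasing order, the lower bound lands on the first suffix whose prefix is at least `sub`; one final `==` then decides whether it is a match.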

@rhdekker
Copy link
Member Author

This issue is fixed in the CollateX Python 2.0.0pre10 release.
