Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read from an abbreviations file outside of titlecase() and only when invoked from command line #60

Merged
merged 2 commits into from
Jun 15, 2020
Merged

Conversation

igorburago
Copy link
Contributor

@igorburago igorburago commented Jun 12, 2020

If the titlecase() function is called directly (say, from a library),
the caller can still use a custom list of abbreviations by passing
an appropriate function as callback.

Fixes #58.

@coveralls
Copy link

coveralls commented Jun 12, 2020

Coverage Status

Coverage decreased (-3.9%) to 81.218% when pulling 2882fc6 on iburago:no-wordlist-arg into c88fb22 on ppannuto:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-3.7%) to 81.443% when pulling bc0fbca on iburago:no-wordlist-arg into c88fb22 on ppannuto:master.

@igorburago
Copy link
Contributor Author

igorburago commented Jun 12, 2020

If we choose this PR over #59, we might want to rename
create_wordlist_filter() to create_wordlist_filter_from_file()
(or something similar), and allow the user to import it to
simplify creating the corresponding callback function.

But we can even keep that function private, of course, unless we
expect a great number of users who are calling the titlecase()
function directly—rather than using the command line tool—to load
their abbreviations lists from files. In that case, we would save the
user some time and effort in reimplementing reading the list from file
and creating a callback.

Here I keep this function importable, but not added to __all__.

@igorburago
Copy link
Contributor Author

igorburago commented Jun 12, 2020

As a side note, I am not sure it is a great idea to have that
logger.debug() call in the titlecase(). When the titlecase()
is called frequently, it might have a small performance cost.
However tiny it is, it still might be undesirable for such a small
library function like titlecase().

I would remove that logging call or, at least, would protect it
with if __debug__:.

I am also not sure that we really need any logging in
create_wordlist_filter, but I’m fine with keeping it there
(assuming there are reasons for that).

@1kastner
Copy link
Contributor

I don't believe that this library targets high-performance use cases. Moreover, one old programmer's wisdom: Only optimize if you need to optimize! As long as there is no urge, it might be a waste of time and it might clutter the code. I could be persuaded by proper benchmarks though. In addition, have you checked the logging library whether they maybe already have something similar in place? I guess they did optimize their code.

@1kastner
Copy link
Contributor

To be honest, for my use case I only care about the command line interface so the internals do not bother me so much as long as nothing breaks, there are tests, and everything is nicely documented.

@igorburago
Copy link
Contributor Author

igorburago commented Jun 13, 2020

Regarding logging:

If Python is started with the -O option, the code protected by
if __debug__: is optimized out when the code is loaded into
the interpreter and, therefore, has no performance cost during
interpretation later on.

The logger’s debug() method in the logging module does check
whether the debug level of logging is enabled, and does not log the
given message otherwise. But one still pays the (however small)
performance cost of the function call and the log-level check.

It is unfortunate that no benchmarks were made when this logging
code was added. It is adding code that may clutter, not removing.

I do not have time to do benchmarks at the moment, so, as I said
previously, I would be all right with keeping the logging code for
now, as long as we merge this PR (which makes no modifications
to the logging code).

(I brought up logging as a side node, just out of curiosity.)

@igorburago
Copy link
Contributor Author

igorburago commented Jun 13, 2020

As to the subject of this PR, your interest is not affected, then, since:

  1. The command-line interface remains unchanged with the command
    having exactly the same default behavior of reading the ~/.titlecase.txt
    abbreviations file as before.

  2. Those who are using the package as a library instead (by calling
    the titlecase() function directly), still have the ability to protect
    abbreviations loaded from a file by specifying an appropriate callback.
    The create_wordlist_filter_from_file() function provides an easy
    way for the user to make a callback with that same original functionality,
    so that calls like

    titlecase(str, wordlist_filter=file_path)
    

    simply change to

    titlecase(str, callback=create_wordlist_filter_from_file(file_path))
    
  3. The test for the wordlist-based filtering in the tests.py file
    was modified to use the new approach (just like shown above).
    The tests are all passed.

@igorburago
Copy link
Contributor Author

igorburago commented Jun 13, 2020

Although I have touched on the advantages of this PR in #58, let me
reiterate on them here a bit more elaborately.

  1. Separation of concerns. The titlecase() function is kept to its
    main purpose of title-casing a given string, and is not burdened by
    any particular abbreviation-list mechanism being embedded into it.

  2. Simplification of the API without loss of functionality. The
    titlecase() function still has (the original) mechanism for
    supporting any exceptions in capitalization via the callback
    parameter. That is, it uses a single interface for all kinds of custom
    words, regardless of wether they were loaded from a file or populated
    any other way.

  3. Abbreviation file parsing is still available through a callback.
    The create_wordlist_filter_from_file() function is kept around
    to support the same use cases (with the primary instance being
    the command implementation in cmd()). The only change is that
    the call to create_wordlist_filter_from_file() now happens
    outside of titlecase(), instead.

  4. Flexibility in composing multiple capitalization-exception sources.
    When all custom capitalization code is shaped into those callback
    functions, the user can easily compose them in any order, while
    currently callback is always called before wordlist_filter.

@1kastner
Copy link
Contributor

That sounds great to me!

@igorburago igorburago changed the title Read from an abbreviations file only when invoked from command line Read from an abbreviations file outside of titlecase() and only when invoked from command line Jun 14, 2020
Igor Burago added 2 commits June 14, 2020 14:59
If the titlecase() function is called directly (say, from a library),
the caller can still use a custom list of abbreviations by passing
an appropriate function as callback.
@ppannuto ppannuto changed the base branch from master to main June 15, 2020 15:32
@ppannuto
Copy link
Owner

This makes a lot of sense to me. I was probably a little quick to move on pushing out the prior iteration of this, it had just sat for a while so I felt bad.

Strictly speaking, I think this will also require a 2.0 release as it changes the interface; not that I think a ton of folks have pulled the 1.0 version yet, but I would like to try to be a good citizen of the semver universe.

@ppannuto ppannuto merged commit c2651ee into ppannuto:main Jun 15, 2020
@igorburago igorburago deleted the no-wordlist-arg branch June 15, 2020 16:29
@igorburago
Copy link
Contributor Author

igorburago commented Jun 15, 2020

Changing the major version number due to the change of the API
makes total sense.

However, unless you have to, could you please postpone making
a new release? I have another couple of pull requests in the works
for cleaning up the regular-expression constants, for making it
easier for the user to provide the list of custom small words, and,
possibly, for doing some minor refactoring.

@ppannuto
Copy link
Owner

No urgency to release here. Excited to see the new PRs!

ppannuto added a commit that referenced this pull request Jan 22, 2021
Major (interface) changes (#60) -- big thanks @iburago!!:
 - Read from an abbreviations file only when invoked from command line
 - Rename and refactor the create_wordlist_filter() function

Minor changes:
 - #63: Do not capitalize small words occurring within hyphenated word groups (thanks @iburago!)
 - #65: Always capitalize 'Mc'-prefixed small words in compound word groups (thanks @iburago!)
 - #67: Don't capitalize Mr, Ms, Mrs (thanks @GurraB!)
 - #71: Add Manifest.in and include license file in src dist (thanks @synapticarbors!)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

It is impossible to prevent titlecase() from reading the ‘~/.titlecase.txt’ file
4 participants