PyGlossary 3.0.0

@ilius ilius released this Jun 9, 2016 · 82 commits to master since this release

New versioning

  • Using date as the version was a mistake I made 7 years ago
  • From now on, versions are in X.Y.Z format (major.minor.patch)
  • While X, Y and Z are digits(0-9) for simplicity (version strings can be compared alphabetically)
  • Starting from 3.0.0
    • Take it for migrating to Python 3.x, or Gtk 3.x, or being alphabetically larger than previous versions (date string)

Since I believe this is the first standard version, I'm not sure which code revision should I compare it with. So I just write the most important recent changes, in both application-view and library-view.

Breaking Compatibility

  • Config migration
    • Config file becomes a config directory containing config file
    • Config file format changes from Python (loaded by exec) to JSON
    • Remove some obsolete / unused config parameters, and rename some
    • Remove permanent sort boolean flag
      • Must give --sort in command line to enable sorting for most of output formats
    • Load user-defined plugins from a directory named plugins inside config directory
  • Glossary class
    • Remove some obsolete / unused method
      • copy, attach, merge, deepMerge, takeWords, getInputList, getOutputList
    • Rename some methods:
      • reverseDic -> reverse
    • Make some public attributes private:
      • data -> _data
      • info -> _info
      • filename -> _filename
    • Clear (reset) the Glossary instance (data, info, etc) after write operation
      • Glossary class is for converting from file(s) to file, not keeping data in memory
    • New methods:
      • convert:
        • convert method is added to be used instead of read and then write
        • Not just for convenience, but it's also recommended,
          • and let's Glossary class to have a better default behavior
          • for example it enables direct mode by default (stay tuned) if sorting is not enabled (by user or plugin)
        • all UI modules (Command line, Gtk3, Tkinter) use Glossary.convert method now
    • Sorting policy
      • sort boolean flag is now an argument to write method
        • sort=True if user gives --sort in command line
        • sort=False if user gives --no-sort in command line
        • sort=None if user does not give either, so write method itself decides what to do
      • Now we allow plugins to specify sorting policy based on output format
        • By sortOnWrite variable in plugin, with allowed values:
          • ALWAYS: force sorting even if sort=False (user gives --no-sort), used only for writing StarDict
          • DEFAULT_YES: enable sorting unless sort=False (user gives --no-sort)
          • DEFAULT_NO: disable sorting unless sort=True (user gives --sort)
          • NEVER: disable sorting even if sort=True (user gives --sort)
        • The default and common value is: sortOnWrite = DEFAULT_NO
        • Plugin can also have a global sortKey function to be used for sorting
        • (like the key argument to list.sort method, See pydoc list.sort)
    • New way of interacting with Glossary instance in plugins:
      •, defi)) -> glos.addEntry(word, defi)
      • for item in -> for entry in glos:
      • for key, value in -> for key, value in glos.iterInfo():

Gtk2 to Gtk3

  • Replace obsolete PyGTK-based interface with a simpler PyGI-based (Gtk3) interface

Migrating to Python 3

  • Even though master branch was based on Python 3 since 2016 Apr 29, there was some problem that are fixed in this release
  • If you are still forced need to use Python 2.7, you can use branch python2.7

Introducing Direct mode

  • --direct command line option
  • reads and writes at the same time, without loading the whole data into memory
  • Partial sorting is supported
    • --sort in command line
    • --sort-cache-size=1000 is optional
  • If plugin defines sortOnWrite=ALWAYS, it means output format requires full sorting, so direct mode will be disabled
  • As mentioned above (using Glossary.convert method), direct mode is enabled by default if sorting is not enabled (by user or plugin)
  • Of course user can manually disable direct mode by giving --indirect option in command line

Progress Bar

Automatic command line Progress Bar for all input / output formats is now supported

  • Implemented based on plugins Reader classes
  • Works both for direct mode and indirect mode
    • Only one progress bar for direct mode
    • Two progress bars for indirect mode (one while reading, one while writing)
  • Plugins must not update the progress bar anymore
  • Still no progress bar when both --direct and --sort flags are given, will be fixed later
  • User can disable progress bar by giving --no-progress-bar option (recommended for Windows users)

BGL Plugin

  • BGL plugin works better now (comparing to latest Python 2.7 code), and it's much cleaner too
  • I totally refactored the code, made it fully Python3-compatible, and much more easier to understand
  • This fixes bytes/str bugs (like Bug #54), and CRC check problem for some glossaries (Bug #55)
  • I'm a fan of micro-commits and I usually hate single-commit refactoring, but this time I had no choice!

Other Changes

Feature: Add encoding option to read and write drivers of some plain-text formats

Feature: SQL and SQLite: read/write extra information from/to a new table dbinfo_extra, backward compatible

New format invented and implemented for later implementation of a Glossary Editor

  • (Editable Linked List of Entries) is optimized for adding/modifying/removing one entry at a time
  • while we can save the changes instantly after each modification
  • Using the ideas of Doubly Linked List, and Git's hash-based object database

Rewrite non-working Reverse functionality

  • The old code was messy, not working by default, slow, and language-dependent
  • It's much faster and cleaner now

Improve and complete command line help (-h or --help)