Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usability improvement proposal: Pythonic enums #1251

Merged
merged 29 commits into from Dec 30, 2023
Merged

Conversation

jorio
Copy link
Contributor

@jorio jorio commented Nov 24, 2023

This PR introduces "pythonic" IntEnum and IntFlag classes that can supersede GIT_... integers in user code.

The main benefit of these enum classes is that they naturally guide the user toward the set of flags you can pass to a function, instead of having the user trawl through a list of GIT_ constants. Some examples:

-repo.checkout(someref, strategy=GIT_CHECKOUT_FORCE)
+repo.checkout(someref, strategy=CheckoutStrategy.FORCE)

-repo.apply(patch, GIT_APPLY_LOCATION_INDEX)
+repo.apply(patch, ApplyLocation.INDEX)

-diff = tree1.diff_to_tree(tree2, GIT_DIFF_SHOW_BINARY | GIT_DIFF_INCLUDE_UNMODIFIED)
+diff = tree1.diff_to_tree(tree2, DiffFlags.SHOW_BINARY | DiffFlags.INCLUDE_UNMODIFIED)

Another benefit is that the new enums are augmented with docstrings. I'm a heavy user of pygit2 and I tend to frequently have to dive into libgit2's source code to recall what some flags do – the new enums reduce this friction:

Screenshot_20231124_230949

Notes:

  • GIT_ constants are still available for backward compatibility.
  • The enums transparently demote to ints where needed, including passing values to C functions that expect an int.
  • IntFlag objects can be constructed with arbitrary ints. So, those are fine for forward compatibility. For example, if libgit2 ever returns a GIT_STATUS that is, say, 1<<31, and we don't support it yet, StatusFlags(1<<31) will still work and it'll convert to a valid int.
  • IntEnum objects, on the other hand, can NOT be constructed with arbitrary ints. Extra care must be given when converting an int that comes from libgit2 to an IntEnum. Luckily, such cases are rare – so far the only occurrence is in Repository.state() (where I just return a raw int if it can't be converted to a RepositoryState).
  • Additionally, this PR exposes DiffFlags.IGNORE_BLANK_LINES (GIT_DIFF_IGNORE_BLANK_LINES) a bunch of new checkout strategy constants that were previously missing from pygit2.

If you think this PR worth merging into pygit2, I'm happy to discuss naming choices, etc.


Next steps: If this PR is accepted, I will submit another one later that makes some C functions return actual Python enums. This way, functions such as repo.status_file(...) will return a real StatusFlags instead of an int, which would make it easier to toy with pygit2 in an REPL.

 >>> r.status_file('pygit2/callbacks.py')   # this is implemented in pure C
- now returns: 256      # still an int, but you can already compare it to StatusFlags transparently
+ will return: <StatusFlags.WT_MODIFIED: 256>       # next PR (if this one is accepted)

@jdavid
Copy link
Member

jdavid commented Nov 27, 2023

This is great @jorio , we just need to get the names right because we won't be able to change them later.

My first reflex would be to do a literal translation from the libgit2 constants, as you have done in some cases, like:

GIT_APPLY_LOCATION_WORKDIR -> ApplyLocation.WORKDIR

But for others this is not so obvious. For some you have chosen to add the suffix Flags, like:

GIT_ATTR_CHECK_FILE_THEN_INDEX -> AttrCheckFlags.FILE_THEN_INDEX

An alternative would be to namespace the enums and flags, for example:

from pygit2 import flags
flags.AttrCheck

Instead of:

from pygit2 import AttrCheckFlags
AttrCheckFlags

I think this approach would have the following benefits:

  • Help with name consistency
  • Reduce the clutter in the root namespace pygit2.*

What do you think?

@jdavid
Copy link
Member

jdavid commented Nov 27, 2023

For reference, if I have not made errors, the proposed name changes in this PR could be summarized like...

IntEnum:

GIT_APPLY_LOCATION_WORKDIR          ApplyLocation.WORKDIR
GIT_CONFIG_LEVEL_PROGRAMDATA        ConfigLevel.PROGRAMDATA
GIT_DELTA_UNMODIFIED                DeltaStatus.UNMODIFIED
GIT_DESCRIBE_DEFAULT                DescribeStrategy.DEFAULT
GIT_FETCH_PRUNE_UNSPECIFIED         FetchPruneMode.UNSPECIFIED
GIT_FILTER_TO_WORKTREE              FilterMode.TO_WORKTREE
GIT_OPT_GET_MWINDOW_SIZE            LibraryOption.GET_MWINDOW_SIZE
GIT_OBJECT_ANY                      GitObjectType.ANY
GIT_REFERENCES_ALL                  ReferenceFilter.ALL
GIT_REPOSITORY_INIT_SHARED_UMASK    RepositoryInitMode.SHARED_UMASK
GIT_REPOSITORY_STATE_NONE           RepositoryState.NONE
GIT_RESET_SOFT                      ResetType.SOFT
GIT_STASH_APPLY_PROGRESS_NONE       StashApplyProgress.NONE
GIT_SUBMODULE_IGNORE_UNSPECIFIED    SubmoduleIgnore.UNSPECIFIED

IntFlag:

GIT_ATTR_CHECK_FILE_THEN_INDEX      AttrCheckFlags.FILE_THEN_INDEX
GIT_BLAME_NORMAL                    BlameFlags.NORMAL
GIT_BLOB_FILTER_CHECK_FOR_BINARY    BlobFilterFlags.CHECK_FOR_BINARY
GIT_BRANCH_LOCAL                    BranchType.LOCAL
GIT_CHECKOUT_NOTIFY_NONE            CheckoutNotifyFlags.NONE
GIT_CHECKOUT_NONE                   CheckoutStrategy.NONE
GIT_CREDENTIAL_USERPASS_PLAINTEXT   CredentialType.USERPASS_PLAINTEXT
GIT_DIFF_FLAG_BINARY                DeltaFlags.BINARY
GIT_DIFF_FIND_BY_CONFIG             DiffFindFlags.FIND_BY_CONFIG
GIT_DIFF_NORMAL                     DiffFlags.NORMAL
GIT_DIFF_STATS_NONE                 DiffStatsFlags.NONE
GIT_FEATURE_THREADS                 FeatureFlags.THREADS
GIT_FILEMODE_UNREADABLE             FileMode.UNREADABLE
GIT_FILTER_DEFAULT                  FilterFlags.DEFAULT
GIT_MERGE_ANALYSIS_NONE             MergeAnalysis.NONE
GIT_MERGE_PREFERENCE_NONE           MergePreference.NONE
GIT_REF_INVALID                     ReferenceType.INVALID
GIT_REPOSITORY_INIT_BARE            RepositoryInitFlags.BARE
GIT_REPOSITORY_OPEN_NO_SEARCH       RepositoryOpenFlags.NO_SEARCH
GIT_REVSPEC_SINGLE                  RevSpecFlags.SINGLE
GIT_STATUS_CURRENT                  StatusFlags.CURRENT
GIT_SUBMODULE_STATUS_IN_HEAD        SubmoduleStatus.IN_HEAD
GIT_SORT_NONE                       WalkerSortFlags.NONE

Let's agree on the names first.

@jorio
Copy link
Contributor Author

jorio commented Dec 2, 2023

Thank you for your time reviewing this!

For context, most class names are actually based on the C typedef name, because the names of the actual values within an enum can be messy. It's difficult to make perfect conversions because there's a lot of overlapping prefixes across different enums.

As for your suggestions:

  • Cleaning up the root namespace by making enums available only via pygit2.enums is fine with me
  • I believe that splitting enums and flags might create some friction for users. In 6 months I'm certainly going to go "uh, is FileMode in enums or flags again?" :)

I also believe that users should be free to import enum classes directly (from pygit2.enums import Whatever) without dealing with name clashes. If you agree with this point, then we should avoid simplifications like BranchType -> Branch, DiffFlags -> Diff, or ReferenceType -> Reference.

Here's a proposal for revised names. I attempted to simplify them a bit while still making sure they're distinct from existing classes in pygit2.

C enum/typedef pygit2? Passed to/Returned by
GIT_APPLY_LOCATION_WORKDIR
git_apply_location_t
ApplyLocation Repository.apply(...)
Repository.applies(...)
GIT_ATTR_CHECK_FILE_THEN_INDEX
(no typedef)
AttrCheck Repository.get_attr(flags=...)
GIT_BLAME_NORMAL
git_blame_flag_t
BlameFlag?
('Blame' would clash)
Repository.blame(flags=...)
GIT_BLOB_FILTER_CHECK_FOR_BINARY
git_blob_filter_flag_t
BlobFilter BlobIO(flags=...)
GIT_BRANCH_LOCAL
git_branch_t
BranchType
('Branch' would clash)
Branches(flag=...)
GIT_CHECKOUT_NONE
git_checkout_strategy_t
Checkout(Strategy?) Repository.checkout(strategy=...)
GIT_CHECKOUT_NOTIFY_NONE
git_checkout_notify_t
CheckoutNotify CheckoutCallbacks.checkout_notify_flags()
CheckoutCallbacks.checkout_notify(why=...)
GIT_CONFIG_LEVEL_PROGRAMDATA
git_config_level_t
ConfigLevel pygit2.settings.search_path
GIT_CREDENTIAL_SSH_KEY
git_credential_t
CredentialType
GIT_DELTA_UNMODIFIED
git_delta_t
Delta
(might be too broad?)
DiffDelta.status
GIT_DESCRIBE_DEFAULT
git_describe_strategy_t
Describe(Strategy?) Repository.describe(describe_strategy=...)
GIT_DIFF_FIND_BY_CONFIG
git_diff_find_t
DiffFind Diff.find_similar(flags=...)
GIT_DIFF_FLAG_BINARY
git_diff_flag_t
DiffFlag? DiffDelta.flags
DiffFile.flags
GIT_DIFF_NORMAL
git_diff_option_t
DiffOption? Blob.diff_to_buffer(flag=...)
Tree.diff_to_tree(flag=...)
etc.
GIT_DIFF_STATS_NONE
git_diff_stats_format_t
DiffStatsFormat
('DiffStats' would clash)
DiffStats.format(format=...)
GIT_FEATURE_THREADS
git_feature_t
Feature pygit2.features
GIT_FETCH_PRUNE_UNSPECIFIED
git_fetch_prune_t
FetchPrune Remote.fetch(...)
GIT_FILEMODE_UNREADABLE
git_filemode_t
FileMode DiffFile.mode
IndexEntry.mode
TreeBuilder.insert(attr=...)
GIT_FILTER_DEFAULT
git_filter_flag_t
FilterFlag FilterSource.flags
GIT_FILTER_TO_WORKTREE
git_filter_mode_t
FilterMode FilterSource.mode
GIT_MERGE_ANALYSIS_NONE
git_merge_analysis_t
MergeAnalysis Repository.merge_analysis()
GIT_MERGE_PREFERENCE_NONE
git_merge_preference_t
MergePreference Repository.merge_analysis()
GIT_OBJECT_ANY
git_object_t
ObjectType
('Object' would clash)
peel
Object.type
GIT_OPT_GET_MWINDOW_SIZE
git_libgit2_opt_t
Option? _pygit2.option(option=...)
GIT_REFERENCE_INVALID
git_reference_t
ReferenceType
('Reference' would clash)
Reference.type
GIT_REFERENCES_ALL
(made up by pygit2)
ReferenceFilter
('References' would clash)
References.iterator( references_return_type=...)
GIT_REPOSITORY_INIT_BARE
git_repository_init_flag_t
RepositoryInitFlag pygit2.init_repository(flags=...)
GIT_REPOSITORY_INIT_SHARED_UMASK
git_repository_init_mode_t
RepositoryInitMode
('Mode' in the chmod sense)
pygit2.init_repository(mode=...)
GIT_REPOSITORY_OPEN_NO_SEARCH
git_repository_open_flag_t
RepositoryOpenFlag Repository(flags=...)
GIT_REPOSITORY_STATE_NONE
git_repository_state_t
RepositoryState Repository.state()
GIT_RESET_SOFT
git_reset_t
Reset(Mode?) Repository.reset(...)
GIT_REVSPEC_SINGLE
git_revspec_t
RevSpecFlag
('RevSpec' would clash)
RevSpec.flags
GIT_SORT_NONE
git_sort_t
WalkerSort
('Sort' feels too broad to me)
Walker.sort(...)
GIT_STASH_APPLY_PROGRESS_NONE
git_stash_apply_progress_t
StashApplyProgress StashApplyCallbacks.stash_apply_progress
GIT_STATUS_CURRENT
git_status_t
FileStatus?
(Would 'Status' be too broad?)
Repository.status()
GIT_SUBMODULE_IGNORE_UNSPECIFIED
git_submodule_ignore_t
SubmoduleIgnore SubmoduleCollection.status(...)
GIT_SUBMODULE_STATUS_IN_HEAD
git_submodule_status_t
SubmoduleStatus SubmoduleCollection.status(...)

Trickier alternative: dispatch enums to relevant classes

Yet another approach might be to get rid of enums.py and move the enums to the classes they pertain to.

In this case, name clashes wouldn't matter anymore.

It might clean up the namespaces even more, but there are a couple downsides:

  • Enums scattered across codebase – more complicated maintenance
  • Embedding the enums into classes that are currently defined in C isn't going to be pretty...
  • A lot of enums would have to be stuffed into Repository
GIT_APPLY_LOCATION_WORKDIR      --> Diff.ApplyLocation? (Repo.apply takes a Diff + an ApplyLocation)
GIT_DIFF_FIND_BY_CONFIG         --> Diff.Find
GIT_DIFF_NORMAL                 --> Diff.Option?

GIT_DIFF_FLAG_BINARY            --> DiffDelta.Flag, DiffFile.Flag? (used by both)
GIT_DELTA_UNMODIFIED            --> DiffDelta.Status

GIT_FILTER_DEFAULT              --> FilterSource.Flag (I couldn't find unit tests for those)
GIT_FILTER_TO_WORKTREE          --> FilterSource.Mode

GIT_SUBMODULE_IGNORE_UNSPECIFIED--> SubmoduleCollection.Ignore
GIT_SUBMODULE_STATUS_IN_HEAD    --> SubmoduleCollection.Status

GIT_ATTR_CHECK_FILE_THEN_INDEX  --> Repository.AttrCheck
GIT_CHECKOUT_NONE               --> Repository.Checkout(Strategy?)
GIT_DESCRIBE_DEFAULT            --> Repository.Describe(Strategy?)
GIT_MERGE_ANALYSIS_NONE         --> Repository.MergeAnalysis
GIT_MERGE_PREFERENCE_NONE       --> Repository.MergePreference
GIT_REPOSITORY_INIT_BARE        --> Repository.InitFlag
GIT_REPOSITORY_INIT_SHARED_UMASK--> Repository.InitMode
GIT_REPOSITORY_OPEN_NO_SEARCH   --> Repository.OpenFlag
GIT_REPOSITORY_STATE_NONE       --> Repository.State
GIT_RESET_SOFT                  --> Repository.Reset

GIT_BLAME_NORMAL                --> Blame.Flag
GIT_BLOB_FILTER_CHECK_FOR_BINARY--> BlobIO.Filter
GIT_BRANCH_LOCAL                --> Branches.Filter
GIT_CHECKOUT_NOTIFY_NONE        --> CheckoutCallbacks.Notify
GIT_DIFF_STATS_NONE             --> DiffStats.Format
GIT_FETCH_PRUNE_UNSPECIFIED     --> Remote.FetchPrune
GIT_REFERENCES_ALL              --> References.Filter
GIT_REFERENCE_INVALID           --> Reference.Type
GIT_REVSPEC_SINGLE              --> RevSpec.Flag
GIT_STASH_APPLY_PROGRESS_NONE   --> StashApplyCallbacks.Progress
GIT_SORT_NONE                   --> Walker.Sort

GIT_CONFIG_LEVEL_PROGRAMDATA    --> (root).ConfigLevel?
GIT_FEATURE_THREADS             --> (root).Feature
GIT_FILEMODE_UNREADABLE         --> (root).FileMode (used by multiple classes)
GIT_STATUS_CURRENT              --> (root).FileStatus?
GIT_OBJECT_ANY                  --> (root).GitObjectType?
GIT_OPT_GET_MWINDOW_SIZE        --> (root).Option?

@jdavid
Copy link
Member

jdavid commented Dec 17, 2023

Hi @jorio

I agree with your points (all in pygit2.enums and the name must be unique).
Here a proposal, it's a synthesis of your two proposals:

ApplyLocation
AttrCheck
BlameFlag
BlobFilter
BranchType
CheckoutStrategy
CheckoutNotify
ConfigLevel
CredentialType
DeltaStatus
DescribeStrategy
DiffFind
DiffFlag
DiffOption
DiffStatsFormat
Feature
FetchPrune
FileMode
FilterFlag
FilterMode
MergeAnalysis
MergePreference
ObjectType
Option
ReferenceType
ReferenceFilter
RepositoryInitFlag
RepositoryInitMode
RepositoryOpenFlag
RepositoryState
ResetMode
RevSpecFlag
SortMode *
StashApplyProgress
FileStatus 
SubmoduleIgnore
SubmoduleStatus

Only change of mine is that I've changed WalkerSort to SortMode.

So the general rule would be:

  • names based on the typedef,
  • shortened when that does not make the name ambiguous,
  • when the name is ambiguous, added a suffix: either Flag, Type or Mode

With few exceptions: FileStatus because it makes sense, Option instead of Opt.

If the list of names above looks good, then you can go ahead with the PR. Or tell me if you still want to make changes.
Thanks

@jorio
Copy link
Contributor Author

jorio commented Dec 17, 2023

Hi @jdavid, this looks good to me! I'll amend the PR with the new names soon-ish. Thanks again.

@jorio
Copy link
Contributor Author

jorio commented Dec 29, 2023

Hi @jdavid, I rebased the PR on master with the name changes that you suggested.

Additional changes since last time:

  • Removed individual enum classes from the root namespace per your suggestion.
  • Moved the GIT_... constants (for backwards compatibility) into legacyenums.py to make __init__.py a bit lighter. I don't mind undoing this if you prefer having fewer files.

Let me know if there's anything that still needs tweaking. Thank you!

@jdavid jdavid merged commit 611a3b3 into libgit2:master Dec 30, 2023
6 checks passed
@jdavid
Copy link
Member

jdavid commented Dec 30, 2023

Thanks @jorio !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants