Notes from libgit2 contributor summit 2018 #5873

@ethomson

On 9 March 2018, we invited the contributors and users of libgit2 to the libgit2 contributor summit, coinciding with Git Merge 2018.

Individual attendees submitted topics for discussion, then we voted on the interest of these topics and discussed them in order of popularity. Here were the topics discussed:


Memory Usage

GitLab is seeing memory usage blow up on certain hosting instances, where their processes use very large amounts of memory. This could be a Rugged or a libgit2 problem; further investigation is required.

  • We offered to provide a build that shows how much we think that we've allocated.
  • We should provide an API to show the usage for the caches
  • Rugged doesn't free native memory until a GC happens. Can we have a force-free function in Rugged?
  • Pluggable malloc?
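
As a rough illustration of the "show how much we think we've allocated" and "pluggable malloc" ideas above, here is a minimal sketch of a counting allocator. Every name in it (`tracked_malloc`, `tracked_free`, `example_allocator`, and so on) is hypothetical and not part of libgit2; the sketch is not thread-safe and only tracks bytes requested through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of this is libgit2's API. */

/* Header stored in front of every block so the free function knows its size. */
typedef union {
    size_t size;
    max_align_t align; /* keeps the returned pointer suitably aligned */
} block_header;

static size_t bytes_in_use; /* bytes currently allocated through this allocator */
static size_t peak_bytes;   /* high-water mark of bytes_in_use */

void *tracked_malloc(size_t size)
{
    block_header *hdr;

    if (size > (size_t)-1 - sizeof(*hdr))
        return NULL; /* header + size would overflow size_t */

    hdr = malloc(sizeof(*hdr) + size);
    if (hdr == NULL)
        return NULL;

    hdr->size = size;
    bytes_in_use += size;
    if (bytes_in_use > peak_bytes)
        peak_bytes = bytes_in_use;

    return hdr + 1; /* caller sees the memory just past the header */
}

void tracked_free(void *ptr)
{
    block_header *hdr;

    if (ptr == NULL)
        return;

    hdr = (block_header *)ptr - 1;
    assert(bytes_in_use >= hdr->size);
    bytes_in_use -= hdr->size;
    free(hdr);
}

size_t tracked_bytes_in_use(void) { return bytes_in_use; }
size_t tracked_peak_bytes(void) { return peak_bytes; }

/* A "pluggable" allocator is then just a table of function pointers. */
typedef struct {
    void *(*malloc_fn)(size_t);
    void (*free_fn)(void *);
} example_allocator;

const example_allocator tracked_allocator = { tracked_malloc, tracked_free };
```

A library built this way would route its allocations through the table, and a diagnostic build could print `tracked_bytes_in_use()` and `tracked_peak_bytes()` at interesting points.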

Similarly, Working Copy is seeing mmap errors on checkout with large repositories (WebKit sized).

  • Could be mmap limits, which can actually be set in the library before operating on a repository.

Memory leak detection in the CI build - we continue to have problems with both openssh and openssl leaking memory.

  • We should get our CI jobs to fail as soon as we have a memory leak so that we do not introduce new memory leaks.
  • We should consider regression testing memory usage. Test operations on a large repository.

Benchmarking

We should be benchmarking both against ourselves (historically) and against core git.

  • Benchmark time and memory
  • Ed will publish his gitbench repository somewhere somehow sometime
  • Perf is the primary concern
  • Merge + checkout is slightly slower
  • We welcome problem reports - please provide a PerfView trace or a repository that reproduces the problem
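
To make "benchmark time and memory" concrete, here is a minimal sketch of a harness that times one operation and records the process's peak resident set size. It assumes a POSIX system; `bench_run`, `bench_result`, and `bench_example_workload` are hypothetical names, and this is not how gitbench works.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <time.h>

typedef struct {
    double wall_seconds; /* elapsed wall-clock time of the operation */
    long peak_rss;       /* ru_maxrss: kilobytes on Linux, bytes on macOS */
} bench_result;

/*
 * Runs op(payload) once and fills in *out. Returns 0 on success, -1 if a
 * clock or rusage call fails. Note that peak_rss is the high-water mark of
 * the whole process so far, not of this one operation.
 */
int bench_run(void (*op)(void *), void *payload, bench_result *out)
{
    struct timespec start, end;
    struct rusage usage;

    assert(op != NULL && out != NULL);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    op(payload);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;

    out->wall_seconds = (double)(end.tv_sec - start.tv_sec) +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    out->peak_rss = usage.ru_maxrss;
    return 0;
}

/* Example workload: sums 0..99999 into the unsigned long long at payload. */
void bench_example_workload(void *payload)
{
    unsigned long long *sum = (unsigned long long *)payload;
    unsigned long long i;

    *sum = 0;
    for (i = 0; i < 100000ULL; i++)
        *sum += i;
}
```

Because peak RSS is process-wide, a real harness would probably run each operation in a fresh process so that memory numbers from different operations do not mask each other.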

Transparency to our Users

What happens inside of libgit2? We're not doing a good job of making announcements when new things happen. Who are the current maintainers? What's going on in the release pipeline?

  • Should we have a blog? Regular updates, what changes have been made, what changes are coming? We should also do a monthly report and be sure to mention contributors who are working on the project. Action Item
  • Slack is a good platform for chat, not a good platform for discussion?
  • Some discussion location: a mailing list is a possibility, but something indexed by Google is a requirement. We'll try working in a separate GitHub repository for discussions. Resolved: libgit2/discussions
  • Mailing list for announcements. Action Item
  • Have regular conversations / Skype calls. Publish an agenda in advance in a location where people can request / suggest topics. Action Item

Feature focus

Here's a list of features that we're missing that people are interested in:

  • Blame
  • untracked cache
  • grafting / replacements
  • sparse checkout
  • per-domain http(s) config
  • binary search for ref backends
  • git bundles
  • intent-to-add bit on index
  • serialized graph support
  • MIDX
  • GVFS
  • GC
  • upload-pack/receive-pack
  • NTLM2 (non-windows)
  • rev-list limiting
  • commit API
  • patch application
  • repository formats
  • warnings
  • atexit handler

Action Item: Publish this

Feature matrix / compare with core git

We should have a page that documents the feature matrix and comparison with core git, in particular, features that we're missing.

  • What features and functionality are we missing?
  • Are there features that we don't recommend using? (eg, blame)
  • Need to keep it updated. With some things we can control this with testing. eg, set up indexv4 tests that fail when we add the functionality, so that we can remember to update the documentation.
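
The "tests that fail when we add the functionality" idea can be sketched as a tripwire test: it passes while a feature is reported as unsupported and fails once the feature starts working, which is the cue to update the matrix. The error codes and probe functions below are hypothetical stand-ins, not libgit2's real API or test framework.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch only. */
enum { EXAMPLE_OK = 0, EXAMPLE_EUNSUPPORTED = -1 };

typedef int (*feature_probe)(void);

/* Stand-in for "try the missing feature": today it reports unsupported. */
int example_probe_unsupported(void) { return EXAMPLE_EUNSUPPORTED; }

/* Stand-in for a feature that has since been implemented. */
int example_probe_supported(void) { return EXAMPLE_OK; }

/*
 * Returns 0 (test passes) while the probe still reports "unsupported".
 * Returns 1 (test fails) if the probe succeeds, or fails in some other
 * way, so that somebody looks at the feature matrix.
 */
int expect_still_unsupported(const char *feature, feature_probe probe)
{
    int rc;

    assert(feature != NULL && probe != NULL);

    rc = probe();
    if (rc == EXAMPLE_EUNSUPPORTED)
        return 0;

    if (rc == EXAMPLE_OK)
        fprintf(stderr, "%s now works: update the feature matrix and retire this test\n", feature);
    else
        fprintf(stderr, "%s failed with unexpected code %d\n", feature, rc);
    return 1;
}
```

In this sketch the test run reports a failure on purpose when support lands, so the documentation change travels in the same pull request as the feature.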

GVFS

We're missing support. We don't really even know what it would take to add support.

Security releases and discussion

We need processes in place for the maintainers about how to update things.

  • GitLab gave information about their process: when they discover an issue, they send an announcement to their security list that a security release is pending, along with the date of the release. When they do the release, they don't do a full disclosure. 2-4 weeks later, they do a full disclosure about what happened and where.
  • Need to provide libgit2 package maintainers information about upcoming security releases. Prefer not to use an email list; instead we'll open up the private security repository to people who rely on security information.

Reviewing code quickly

We're bad at getting code reviewed

  • Can we put out a call for reviewers?
  • We bikeshed too much on formatting. Can we use the kernel's perl script for formatting correctness? Make it easy for people to use as they're developing.
  • Put this as the first step in the build process to catch formatting problems quickly.
  • We need to clean up the old pull requests:
    • Close the ones and move them to issues.
    • Add the Feedback Given label
    • Ask if people are interested in continuing their work? Give them two weeks to fix their problems, otherwise close.

Attracting New Contributors

  • We need code reviewers
  • A good getting-started guide for contributors. We made a pass on this earlier in 2018 but should take another one to make sure that we clearly document:
    • How do you set up your work environment?
    • How do you run your tests...
  • Blog posts: point out new people, point out the contributors
  • Bring how to contribute stuff to the top of READMEs and getting started documentation
  • What's holding people back from contributing? It may be that C is hard and people don't like it but we're probably not making enough of an effort.
  • We should consider doing more 101-level libgit2 examples, YouTube videos, etc.
  • We don't have direct end-users; even though a lot of people use libgit2 indirectly, only a few people use it directly.

Binding Maintainership

We should provide a completeness matrix for the bindings so that people know which ones are "complete" and what features are offered (or missing) from them.

Fuzzing

We would like to fuzz the library:

  • PR for libfuzzing needs reviewing
  • We need to decide what we want to fuzz:
    • network-facing code is most important
    • index pack
    • config
    • indexes
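
For readers who have not written a fuzz target before, here is a minimal sketch of a libFuzzer-style entry point. `LLVMFuzzerTestOneInput` is the entry point libFuzzer expects; `example_parse_config_line` is a hypothetical toy parser standing in for whichever library code (config, index, pack indexing) is being fuzzed, and it is not the target from the pending PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical parser standing in for the code under test. Accepts
 * "key=value" with a non-empty key and no NUL bytes; returns 0 on success
 * and -1 on malformed input. Reads only data[0..size).
 */
int example_parse_config_line(const uint8_t *data, size_t size)
{
    size_t i, eq = size; /* eq == size means "no '=' seen yet" */

    assert(data != NULL || size == 0);

    for (i = 0; i < size; i++) {
        if (data[i] == '\0')
            return -1;
        if (data[i] == '=' && eq == size)
            eq = i;
    }
    return (eq == size || eq == 0) ? -1 : 0;
}

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Malformed input is expected; only crashes and sanitizer reports matter. */
    (void)example_parse_config_line(data, size);
    return 0;
}
```

With clang, a target like this is typically built with `-fsanitize=fuzzer,address`, which supplies `main` and the mutation loop.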

Release Schedules

We need to get better about doing releases on a set schedule:

  • Maintenance / point releases every 2 months
  • Minor release every 6 months
  • We need to add this information to the release notes Action Item
  • Send calendar entries reminding people that this needs to happen? Action Item

API stability / 1.0 release

Thinking about API stability, we have talked about "1.0" as the point at which the API becomes stable. We probably want a release with a stable API before that (eg, 0.28; if its API stays stable, we can then release 1.0).

We may also want to break APIs up into "stable" and "experimental"; the former will not change, the latter may.

We briefly bikeshedded API naming that we might want to change:

  • git_statuslist -> git_status
  • signature function names

Platform support

  • NFS
  • We should have three tiers of support:
    • We explicitly support these platforms (Mac, Win32, Linux)
    • We will take patches
    • We will reject patches

SSH

Should we vendor libssh2?

Command-based library

Try to improve the command-based library framework: for example, a git_cmd_commit that behaves more like git commit. The goal is to give people the semantics of the git commands, while libgit2 stays worried about the data and data structures.

A libgit2 based git CLI

  • Would be nice to have this for testing and benchmarks.

  • Can we change the core git unit tests to use one git to set up the tests and one git to execute the tests?

  • Maybe we can implement a git wrapper that looks at arguments: do we support checkout? Yes! Run it. Do we support merge? No. Don't run it.
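
The wrapper idea could look roughly like the sketch below: inspect the subcommand and report whether the libgit2-based CLI implements it. The function name and the list of supported subcommands are made up for illustration, and the sketch ignores global options that may precede the subcommand (such as `-C <dir>`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of subcommands the libgit2-based CLI implements. */
static const char *const supported_commands[] = { "checkout", "commit", "status" };

/*
 * Given an invocation like `git <subcommand> ...`, returns 1 when the
 * subcommand is in the supported list and 0 otherwise. What the caller
 * does with a 0 (skip the test, or hand off to core git) is left open.
 */
int wrapper_supports(int argc, const char *const *argv)
{
    size_t i;
    const size_t count = sizeof(supported_commands) / sizeof(supported_commands[0]);

    assert(argv != NULL);

    if (argc < 2 || argv[1] == NULL)
        return 0; /* no subcommand given */

    for (i = 0; i < count; i++) {
        if (strcmp(argv[1], supported_commands[i]) == 0)
            return 1;
    }
    return 0;
}
```

A wrapper's `main` would call `wrapper_supports(argc, argv)` first and only run the libgit2-based implementation when it returns 1.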

CI Infrastructure

  • Travis is weird. We would like to have a container based build.

  • We should consider moving to another system, but only if it offers a distinctive improvement over what we have.

  • We need to start running SSH on the Windows CI builds. Investigate SSH on macOS. Action Item

  • We have a GitLab CI system set up already, but this wouldn't benefit macOS and is not good on Windows either.

  • Do we want to add "the BSDs" to the CI matrix? eg, FreeBSD amd64

Should we start thinking about running a test pass on ARM, SPARC or other unusual systems before a release?

CI on Security Releases

We need it. It needs to be private. Ed would like to have this set up before our next security release.
