Milestone 5 summary

Milestone 5 is the last milestone of this grant RFP and the culmination of the work.

It is the convergence of all the previous development work. Throughout this journey, we tried to ensure accessibility and operational ease for both users and developers.

We think of this project as an entry point to the Zcash ecosystem for other communities. That was an extra motivation for ensuring good documentation practices and ease of use of the generated packages. However, we expect that future use of this project will reveal further improvements we are not able to visualize yet.

If we had to define the achieved results in a few words, we could describe this project as an "API mapping, binding generation and package distribution platform". We really hope these results match the Zcash community's expectations, and we will be glad to receive feedback of any kind.

All the RFP goals, plus some extras, were achieved, except where this document states otherwise. Let's do a high-level walk-through of the achieved goals.

Testing tools

As expected, writing similar tests for the different target languages made room for developing some testing tools that helped reduce boilerplate code and duplication. All the testing tools are located in the uniffi-zcash-test crate. There are two main tools developed here that should be taken into account for further testing.

Data generators

Most of our test coverage is based on feeding in an input buffer and expecting an output one. But how do we obtain such "golden data vectors"? The answer is by making use of the original librustzcash API to generate and store them (see the next section regarding storage). These data generators can be found here. We then ensure the bindings produce the same data output, which provides certain guarantees that we are calling the correct, original methods from the wrapper code.

K/V store

In order not to repeat the same dataset in each language runtime, we developed a K/V store for storing and retrieving such datasets. This K/V store is also exposed to all languages by using ... yes, UniFFI! So the testing tools also take advantage of the UniFFI layer in order to be available in all languages.

The presence of this K/V store can be noticed in tests that retrieve the needed data. Let's see an example:

import unittest
from zcash import *

class Test(unittest.TestCase):
    def test_spending_key_conversions(self):
        zts = TestSupport.from_csv_file()
        key_bytes = zts.get_as_u8_array("orchard_spending_key") # Retrieving the orchard spending key bytes.

        key = ZcashOrchardSpendingKey.from_bytes(key_bytes)

        self.assertEqual(key.to_bytes(), key_bytes)
# More tests ...

The storage backing this K/V store is a CSV file that can be updated by using our internal repo CLI (see the following sections).
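
For illustration, here is a minimal sketch of how such a CSV-backed store could be read from Python. The file name, schema and hex encoding are assumptions for illustration, not necessarily the project's exact format:

import csv

# Hypothetical illustration: one (key, hex-encoded value) pair per CSV row.
def load_test_vectors(path="test_data.csv"):
    with open(path, newline="") as f:
        return {key: bytes.fromhex(value) for key, value in csv.reader(f)}

vectors = load_test_vectors()
key_bytes = vectors["orchard_spending_key"]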

Improved testing speed

We noticed a slow feedback loop in tests. This has been improved. The problem is that the UniFFI tool brings up an entire testing environment for each test file (see here and here). When the runtime is lightweight, as in Python or Ruby, this doesn't seem to be a problem. But in the case of Kotlin, this was slowing down testing, reaching times of 35-45 minutes until the whole test suite passed. There is a tradeoff here. On the one hand, we do not want to have all tests in the same file. On the other hand, splitting the tests into different files tends to be slow. Finally, we decided to group all the current Kotlin tests in the same file, as it looks like the heaviest runtime. The improvement is that all tests now pass in "only" 4-12 minutes, depending on the saturation of the CI host runner.

In summary, the above limitations should be taken into account when new tests are added: think twice before splitting tests into multiple files, and probably think about a better strategy for splitting the testing work.

Internal repo CLI

As we started to work on the respective languages' package build processes, some questions came up:

  • How many executable workflows does this repository have?
  • How would a developer easily find all the available workflows?
  • How do we avoid coupling too much with custom CI process scripts in the future?
  • How can we ensure all this building logic is cross-platform?
  • How do we ensure that the same build steps in the CI pipeline can be reproduced locally with exactly the same effects?

The answer to these questions was to create an internal CLI that basically groups all the possible execution paths. This CLI gives access to all the build stages of the shared libs, bindings, packages and publishing. These commands need to be invoked in cascade: the output of the previous command is required as input to the next one.

graph LR;
sharedlibs-->bindgen-->release-->publish;

The intention is also to provide a better debugging experience in case of problems, as all generation and publishing steps are properly isolated. It also makes the process more resilient and replayable in case of problems. This philosophy extends to the CI pipeline, as it uses the CLI; we will see this in the following sections. We highly recommend taking a look at the CLI docs to get an in-depth view and maybe do some experimentation.
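
Purely as an illustration, the cascade could be driven by a script like the following sketch. The exact binary invocation, arguments and flags are assumptions; the CLI docs describe the real interface:

import subprocess

# Hypothetical driver for the CLI cascade; each stage consumes the
# artifacts produced by the previous one.
for stage in ["sharedlibs", "bindgen", "release", "publish"]:
    # Invocation path and arguments are assumptions for illustration.
    subprocess.run(["cargo", "run", "-p", "uniffi-zcash-cli", "--", stage], check=True)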

Extending the CI pipeline

The CI pipeline was designed with GitHub workflows. There are 2 main workflows:

  • main.yml - This triggers on each PR that is going to be merged into the main branch, and again after the merge. It builds and executes the entire test suite. It also simulates the package creation and runs the packages by automatically importing them from sample applications (built in flight), for both Linux and MacOS on the x86_64 architecture.
  • tag.yml - This triggers whenever a tag is pushed. It assumes that all the progress up to that git tag is OK, so it just builds the packages and publishes them to the relevant package registries, skipping the testing part. Tags should only be pushed from the main branch.

The 2 workflows above make use of lower-level CI building blocks, called re-usable workflows in the GitHub CI jargon. We can think of these "reusable workflows" as function calls: they expect some input arguments and can even produce outputs. At the moment of this writing, they are:

  • bindgen.yml - Generates the shared libraries for Linux and MacOS. It also generates the bindings and uploads the artifacts for the next CI jobs.

  • docs.yml - This was intended to automatically publish the generated docs to a GitHub pages server. We had some issues regarding this, by the way; see the automatic-code-docs generation section for more info.

  • packages.yml - This uses the output artifacts of the bindgen.yml workflow as inputs. It builds the relevant packages and runs them with sample dummy applications. That ensures the entire import chain and shared library loading is not broken by, e.g., a change in the way UniFFI generates bindings. It produces artifacts with the relevant packages, ready to be published. There is more information about how we build packages in the details-on-packaging section.

  • publish.yml - This uses the outputs of the packages.yml workflow as inputs. It just publishes the previously built packages to the relevant package registries. It uses an exponential-backoff retry mechanism to avoid partial releases as much as possible.

In general, the CI uses caches for all the relevant parts that would otherwise require recurrent downloads, as well as cargo caches. We think it's fairly optimized, but that doesn't mean no more improvements are possible.

The CI always makes use of the CLI commands. Our idea was to keep all the logic in the CLI, so a developer can easily replay the same steps locally. It also reduces the coupling with specific CI systems.

Details on shared libs (.so and .dylib)

The shared libs are where all the bits of our Rust wrapping code, plus librustzcash, reside. Currently, the sharedlibs command of the CLI is able to cross-compile for the following targets:

  • x86_64-unknown-linux-gnu
  • aarch64-apple-darwin
  • x86_64-apple-darwin

The sharedlibs CLI command can be executed on Linux or MacOS without problems, as can all the CLI commands.

The shared libs are included in the packages, so the internal FFI loading code will find them at runtime. We thought this would be more ergonomic for users: just require the packages and be able to run the software.

It's important to note that we excluded the Sapling crypto parameters from the shared libs. Initially we wanted to include them, but the binary sizes were exceeding 100MB. Finally, we reached the conclusion that Orchard is already available and doesn't require the crypto setup, so we expect low interaction with Sapling transactions. However, users can make use of the exposed local prover interface to include the params in the desired way. The CLI also includes a setup saplingparams subcommand that helps developers download the Sapling params to their $HOME directory, which is where the local prover looks by default.
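
As a hedged sketch, using the local prover from Python could look like the following. The class and constructor names here are assumptions mirroring librustzcash's LocalTxProver API, not a confirmed binding surface:

from zcash import *

# Hypothetical: build a prover from the params previously downloaded to
# $HOME by the `setup saplingparams` CLI subcommand.
prover = ZcashLocalTxProver.with_default_location()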

Details on MacOS support

We received early feedback from the community in the RFP forum regarding MacOS support. Although it was initially outside our targets, we decided to go for it.

Unlike Linux ELF binaries, MacOS uses the Mach-O binary format, which allows packing binaries for different architectures in the same file, discriminating which section should be loaded depending on the executing hardware. To do that, we made use of the fat-macho dependency, packing the previously generated shared libs into a unified one. So we have built a universal 2 binary that supports aarch64-apple-darwin and x86_64-apple-darwin in the same .dylib shared library. That means we have support for all M1/M2 laptops and also previous ones based on Intel processors. Of course, this fat binary doubled the size of the .dylib shared library, which is currently around 20MB. We accepted this tradeoff in favour of increasing the target audience.
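
On MacOS, the architectures packed in the resulting universal binary can be verified with Apple's lipo tool; here is a small sketch (the library file name is an assumption for illustration):

import subprocess

# Print the architectures packed in the fat .dylib (macOS only).
info = subprocess.run(
    ["lipo", "-info", "libuniffi_zcash.dylib"],
    capture_output=True, text=True, check=True,
).stdout
print(info)  # expected to list both x86_64 and arm64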

Details on Packaging

The packaging process

The packaging process happens in its own CLI command. It is nurtured by the previously built bindings and shared libs (.so for Linux and .dylib for MacOS). It uses each specific language's packaging tooling under the hood, in the most standard way, to properly pack the bindings and the shared libs.

During this process, we need to generate each language's package manifest, which contains the package metadata used by package indexes. In order to do this, the CLI release command copies the package templates from here to the respective package working directory (usually under the git-ignored lib/packages folder), replacing the specific dynamic metadata, like the version, via a templating system.

Packages contain the following main elements:

  • The UniFFI bindings.
  • A .so shared library for Linux platforms.
  • A .dylib shared library for MacOS platforms.

The specific OS shared library (.so or .dylib) is loaded at runtime by the bindings code. We decided to ship it along with the packages, so the user just needs to import the package like any other one written in "pure" language code.

How do we test packages

Once the packages are built, the CLI release command automatically tests that the packages can be imported and executed, with the help of sample test applications. This is a critical step that ensures the entire import chain and the dynamic, runtime loading of the shared libs are still healthy. There are a lot of ongoing changes in the UniFFI tool that could break this at any time.

All these sample testing applications reside in the templates folder. They are named <language>_test_app, and each of them contains a proper project structure for its language and imports the packages in each language's specific way.

This check is always enforced in the CI, both before and after merging into the main branch.
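
To give an idea, a minimal Python smoke-test application could look like the following sketch. It reuses the names from the earlier test example; the real sample apps in the templates folder may differ:

from zcash import *

# Exercise the full chain: package import, shared library loading and a
# round-trip call through the bindings.
zts = TestSupport.from_csv_file()
key_bytes = zts.get_as_u8_array("orchard_spending_key")
key = ZcashOrchardSpendingKey.from_bytes(key_bytes)
assert key.to_bytes() == key_bytes
print("package import and shared library loading OK")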

Automatic code docs generation

The CLI has documentation generation capabilities once the packages have been built. Basically, it generates API surface documentation for the bindings and places it in the docs/<language>/<version> folder. Initially, we thought that committing this documentation to the repo would be doable, but we found that the Kotlin documentation weighs around 30MB. Committing such a size for each release looks excessive from our point of view. Probably another solution, like uploading the docs to a cloud object storage service that provides static web serving of the content, would be better.

Regarding the initial idea, we tried to set up a CI step to automatically upload the committed docs to a GitHub pages site. But it looks like this is a beta feature, and we were getting errors without any feedback. Finally, we left that step disabled in case further testing is done in the future.

One of the RFP requirements was to properly forward code-level comments from the Rust wrapper code to each target language's bindings automatically. Although there was no support for this in UniFFI, we tried to make an upstream contribution to the project, but sadly, due to low time resources, we had to postpone it. We hope to continue pushing it in parallel in future grants. That's the reason this project still has the Eiger fork as its UniFFI dependency and not the original one. Moving to the original one is fine, but it will disable in-code comment propagation.

Available documentation

In this summary, we did a more or less complete review of the work done. However, more details can be found in the repo documentation, which also needs evaluation. Here is the documentation a user will find, in order of appearance:

  1. The README.md, which immediately shares what we think are the most useful resources for users:
    1. A glance at the project.
    2. Direct links to the respective language registries, where the published packages will be located (see the following sections).
    3. How to build the packages locally with the in-repo, cross-platform CLI (see the following sections).
    4. Access to the Manuals and a FAQ.
  2. The CONTRIBUTING.md, in which information about contributions, local setup and releases can be found.
  3. All the internal workflows of this repository are automated with the help of the in-repo CLI, and the documentation can be found here.

It would probably be good to add a changelog when the first package release takes place.

Manuals

The Manuals need to be evaluated. Our intention was to provide a basic understanding of the API bindings for:

  • Key derivation
  • Unified key sets
  • Transactions

The examples can be copied and pasted, and they should run without problems once the packages are imported into the user's project.
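
As a flavour of what the manuals cover, a key derivation example might look like the following sketch. The from_zip32_seed constructor and its exact signature are assumptions based on the underlying orchard crate, not a confirmed binding API:

from zcash import *

# Hypothetical: derive an Orchard spending key from a ZIP 32 seed.
seed = bytes(32)   # placeholder seed; use real entropy in practice
coin_type = 133    # ZIP 32 coin type for Zcash mainnet
account = 0
key = ZcashOrchardSpendingKey.from_zip32_seed(seed, coin_type, account)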

How to kickstart this repo

As indicated in the README.md, users can already build the packages locally and use them in their applications. However, the packages still need to be published for the first time to the respective language package registries. As this is an immutable operation, we think it should be done on the first release, after setting the following elements in order:

  1. Review the respective package template metadata and adjust it properly. It can be found for each target language here:

  2. The following environment variables need to be set up in the CI as secrets (a pre-flight check sketch follows this list).

    • GIT_USER_EMAIL

      • This is currently needed for committing the swift package in the backing git repo.
    • GIT_USER_NAME

      • This is currently needed for committing the swift package in the backing git repo.
    • LANGUAGES

      • This indicates the target languages for the bindgen CLI step. For this project, the value should be a comma-separated list of languages: python,ruby,kotlin,swift.
    • PYTHON_REGISTRY_URL

      • The URL of the target package index. For PyPI production, it should be https://upload.pypi.org/legacy/.
    • PYTHON_REGISTRY_USERNAME

      • If using a token, it should be __token__.
    • PYTHON_REGISTRY_PASSWORD

      • The password or token.
    • RUBY_REGISTRY_URL

      • For rubygems.org production, this should be https://rubygems.org
    • RUBY_REGISTRY_TOKEN

    • KOTLIN_REGISTRY_URL

      • For production in maven, this should be: https://repo.maven.apache.org/maven2
    • KOTLIN_REGISTRY_USERNAME

      • If using a token, it should be token.
    • KOTLIN_REGISTRY_PASSWORD

    • SWIFT_GIT_REPO_URL

      • This is the Git URL where the package is committed and pushed. It should follow the format https://user:ghp_token@github.com/user/myrepo.git . This currently works with GitHub personal access tokens.
    • SWIFT_REGISTRY_URL

      • For production it should be https://swiftpackageindex.com/
    • SWIFT_REGISTRY_TOKEN

  3. Now we can "release the brakes" on the publish steps by setting the dry-run option to false, just by removing it from here.

  4. Add the language registry links to README.md whenever the first package publication happens, so users can directly download the packages and see the version listings.

  5. Configure a place to drop the code-level documentation. This is something that still needs to be configured. As commented in previous sections, the CLI already provides docs generation capabilities.
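
As mentioned in step 2, a small pre-flight script could verify that all the publishing secrets are present before triggering a release. This is just a sketch; the variable names are taken from the list above:

import os

# The CI secrets listed in step 2 of this section.
REQUIRED = [
    "GIT_USER_EMAIL", "GIT_USER_NAME", "LANGUAGES",
    "PYTHON_REGISTRY_URL", "PYTHON_REGISTRY_USERNAME", "PYTHON_REGISTRY_PASSWORD",
    "RUBY_REGISTRY_URL", "RUBY_REGISTRY_TOKEN",
    "KOTLIN_REGISTRY_URL", "KOTLIN_REGISTRY_USERNAME", "KOTLIN_REGISTRY_PASSWORD",
    "SWIFT_GIT_REPO_URL", "SWIFT_REGISTRY_URL", "SWIFT_REGISTRY_TOKEN",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing CI secrets: {', '.join(missing)}")
print("All publishing secrets are set.")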

A final word

We really hope this project meets the expectations of the Zcash community. We will be glad to answer any questions and to receive any kind of feedback or suggestions.

At Eiger, we strongly believe in the importance of increasing the adoption of privacy-oriented solutions like the Zcash project. With the delivery of this grant, we hope to be helping not only the Zcash community, but also the global community from a holistic perspective.