Skip to content
Tom Kaitchuck edited this page Apr 15, 2022 · 84 revisions

Table of contents:

Introduction

Issues and Pull requests

Signing your commits

Code style

Javadoc

Threading

Logging

Exceptions

Test cases

Debugging the Build

Lombok

Dependencies

TODOs in the code

Introduction

Pravega really encourages anyone to contribute and to make our system better. We ask contributors, however, to read this guide carefully and follow the established guidelines. We don't claim this is perfect, so these guidelines will change over time as we feel that it is not complete or appropriate. Suggestions on how to improve this guide for contributors are also welcome.

Issues and Pull requests

To produce a pull request against Pravega, follow these steps:

  • Create an issue: Create an issue and describe what you are trying to solve. It doesn't matter whether it is a new feature, a bug fix, or an improvement. All pull requests need to be associated to an issue. See more here: Creating an issue
  • Issue branch: Create a new branch on your fork of the repository. Typically, you need to branch off master, but there could be exceptions. To branch off master, use git checkout master; git checkout -b <new-branch-name>.
  • Push the changes: To be able to create a pull request, push the changes to origin: git push --set-upstream origin <new-branch-name>. I'm assuming that origin is your personal repo, e.g., fpj/pravega.git.
  • Branch name: Use the following pattern to create your new branch name: issue-number-description, e.g., issue-1023-reformat-testutils.
  • Create a pull request: Github gives you the option of creating a pull request. Give it a title following this format Issue ###: Description, _e.g., Issue 1023: Reformat testutils. If your PR covers more than one issue, use the format Issues 123, 456: Description. Follow the guidelines in the description and try to provide as much information as possible to help the reviewer understand what is being addressed. It is important that you try to do a good job with the description to make the job of the code reviewer easier. A good description not only reduces review time, but also reduces the probability of a misunderstanding with the pull request.
  • Merging: If you are a committer, see this document: Guidelines for Committers

Another important point to consider is how to keep up with changes against the base the branch (the one your pull request is comparing against). Let's assume that the base branch is master. To make sure that your changes reflect the recent commits, we recommend that you rebase frequently. The command we suggest you use is:

git pull --rebase upstream master
git push --force origin <pr-branch-name>

Very important: in the above, I'm assuming that:

  • upstream is pravega/pravega.git
  • origin is youraccount/pravega.git

The rebase might introduce conflicts, so you better do it frequently to avoid outrageous sessions of conflict resolving.

Creating an issue

When creating an issue, there are two important parts: title and description. The title should be succinct, but give a good idea of what the issue is about. Try to add all important keywords to make it clear to the reader. For example, if the issue is about changing the log level of some messages in the segment store, then instead of saying "Log level" say "Change log level in the segment store". The suggested way includes both the goal where in the code we are supposed to do it.

For the description, there are three parts:

  • Problem description: Describe what it is that we need to change. If it is a bug, describe the observed symptoms. If it is a new feature, describe it is supposed to be with as much detail as possible.

  • Problem location: This part refers to where in the code we are supposed to make changes. For example, if it is bug in the client, then in this part say at least "Client". If you know more about it, then please add it. For example, if you that there is an issue with SegmentOutputStreamImpl, say it in this part.

  • Suggestion for an improvement: This section is designed to let you give a suggestion for how to fix the bug described in the Problem description or how to implement the feature described in that same section. Please make an effort to separate between problem statement (Problem Description section) and solution (Suggestion for an improvement).

We next discuss how to create a pull request.

Creating a pull request

When creating a pull request, there are also two important parts: title and description. The title can be the same as the one of the issue, but it must be prefixed with the issue number, e.g.:

Issue 724: Change log level in the segment store

NOTE: Github has this trap that if the PR has a single commit, then at merge time, it will populate the commit title with the title of the commit and not with the title of the PR. It happens often that contributors put generic messages, like "fix" or "comments", and committers don't pay close enough attention and that becomes part of the commit log forever.

It is good practice to avoid such mistakes to make the message of the first commit being the same as the title of the PR. Committers should still always revise title and message according to the committer guidelines, but the contributor can clearly help to reduce the number of occurrences of such mistakes.


The description has four parts:

  • Changelog description*: This section should be the two or three main points about this PR. A detailed description should be left for the What the code does section. The two or three points here should be used by a committer for the merge log.
  • Purpose of the change: Say whether this closes an issue or perhaps is a subtask of an issue. This section should link the PR to at least one issue.
  • What the code does: Use this section to freely describe the changes in this PR. Make sure to give as much detail as possible to help a reviewer to do a better job understanding your changes.
  • How to verify it: For most of the PRs, the answer here will be trivial: the build must pass, system tests must pass, visual inspection, etc. This section becomes more important when the way to reproduce the issue the PR is resolving is non-trivial, like running some specific command or workload generator.

Signing your commits

We require that developers sign off their commits to certify that they have permission to contribute the code in a pull request. This way of certifying is commonly known as the Developer Certificate of Origin (DCO). We encourage all contributors to read the DCO text before signing a commit and making contributions.

To make sure that pull requests have all commits signed off, we use the Probot DCO plugin.

Signing off a commit

Using the git command line

Use either --signoffor -s with the commit command. See this document for an example.

Make sure you have your user name and e-mail set. The --signoff | -s option will use the configured user name and e-mail, so it is important to configure it before the first time you commit. Check the following references:

Using InteliJ

Intellij has a checkbox in the top-right corner of the Commit Dialog named Sign-off commit. Make sure you also enter your correct email address in the Author box right above it.

Using Eclipse

The EGit plugin supports signing commits. See their instructions. To enable it by default, in Window -> Preferences -> Team -> Git -> Commiting under "Footers" check "Insert Signed-off-by".

Using Sourcetree

When you commit right above where you enter the commit message on the far right, the "Commit options..." pull down has a "Sign Off" option. Selecting it will enable it for the commit. There is no configuration to force it always to be on.

I have forgot to sign-off a commit in my pull request

There are a few ways to fix this, one such a way is to squash commits into one that is signed off. Use a command like:

git rebase --signoff -i <after-this-commit>

where <after-this-commit> corresponds to the last commit you want to maintain as is and follow the instructions here.

Code Style

The base for these is Oracle Code Conventions - a pretty verbose document, if you have time to read it.

Checkstyle-enforced rules

Please refer to our checkstyle rules for all enabled checkstyle rules. The same folder has code style files that can be imported in IntelliJ and Eclipse. Use them.

Rules not enforceable by checkstyle

  • Comments
    • Every comment starts with a space, followed by a capital letter (just like a real sentence). Except commented out code.
    • Every comment ends with a period (it's a sentence).
    • Empty line required before a line of comment (except if precended by another comment line or the first line in a code block)
    • No 'author' JavaDoc tags.
  • Code format
    • Empty line after the closing brace of each code block.
    • If a line needs to wrap around, place each parameter on its own line (i.e., if we have 3 params on a line, then 1, then 2, it looks really ugly)
    • Order class components like this:
      • Class variable declarations
      • Constructor
      • Auto-closeable implementation (if applicable)
      • Public methods (includes any interface implementations)
      • Package-private methods
      • Protected methods
      • Private methods
      • Static methods (public first, followed by private)
      • Nested classes, interfaces, enums
  • Annotations
    • Use the @Override attribute to mark every method that is implemented from an interface or overridden from a super class.
    • All guarded fields should be annotated by '@GuardedBy'.
    • Use Lombok for equals()/hash() if possible.
  • Class organization
    • Classes should not be cyclically dependent on one another.
    • No public nested classes. The only exception is data-only classes (no logic).
      • For example, if class Foo has class Bar that only has get() methods, this is OK. But if Bar has other complex logic baked in it, then Bar needs to be its top level class.
      • Public nested interfaces or enums are OK, as long as they are mainly used inside the hosting class (in method signatures) and not used extensively outside of the class.
    • Every class, constructor, method should have the minimum possible visibility. If only private is needed, then make it private. Otherwise package-private, protected, public.
      • If a method needs to be made visible for testing, use the '@VisibleForTesting' tag.
    • Every public method should do its best to validate arguments, by means of using Preconditions (guava) or Exceptions (our extensions to Preconditions)
    • No singletons. Static constructors or factories are preferred.
    • No static non-final member variables.
  • Code organization
    • Imports at the top of the file should be ordered alphabetically followed by all of the static imports ordered alphabetically.
    • Use "assert" to validate state when needed (after complex internal computations or at the beginning of private methods)
    • Don't hide a member variable with a parameter name or local variable name. Exception: constructor arguments.
    • All members should be final if possible.
    • No code labels.
    • Methods whose name starts with 'get' should not block the current thread. Return a Future if you need to do async work.

Desired checkstyle rules (not yet enforced, but soon will be)

  • Check indentation (currently disabled because checkstyle does not like lambda expressions).
    • No tabs and 4-space indentation.
  • Annotations should go on their own line above the thing they are annotating (AnnotationLocation with default values)
  • Max line length (120).
  • Javadoc
    • Enforce Javadoc on every public method of every protected/public class (currently disabled because there's lots of places where this is missing).
  • No fallthrough in switch statement. (IE: each case should end with break;)
  • No newlines between the keyword 'new' and name of the class being instantiated.

Javadoc Guidelines

The base for these is the Oracle JavaDoc Conventions. The items below supersede that document.

Javadoc scopes (from highest to lowest):

  • Scope 1. All client-visible APIs (contracts and classes we expect the users to consume in order to make use of Pravega) and all publicly visible code in common (which we treat as stuff we'd like to have in the JRE but is not there).
  • Scope 2. Inter-application contracts and APIs (i.e., SegmentStore contracts used by Client or Controller and Controller contracts used by SegmentStore or Client).
  • Scope 3. Intra-application contracts used by various components in that application that are not exposed outside.
  • Scope 4. Individual components and types (classes, interfaces and enums) within an application. This includes "package-private" types, public nested types, as well as any non-private method, field or constructor.
  • Scope 5. Non-public nested types and private methods. Essentially everything else.

Refer to the scopes above for each type of Javadoc listed below. Please note that these are minimum requirements - it is always encouraged to be more detailed with Javadoc for a particular type of component.

  • Types (Classes, Interfaces and Enums)
    • Scopes 1-4 types should have a description explaining the purpose of the type.
    • Scope 1 Classes: should include a small code example of how to use.
    • Scope 1-4 enums should have a description on each enum value explaining its meaning.
    • Javadoc required for Scopes 1-4 and recommended for Scope 5.
  • Constructor
    • Default text for a constructor should be Creates a new instance of the <class-name> class.. If the constructor does something unusual, it should be described here.
    • Javadoc required for Scopes 1-4.
  • Methods
    • Should include descriptive text (not just reiterating method name). Scope 1 requires sample usage, where appropriate.
    • @params tag for every argument and generic type
    • @throws tag for every checked exception. Runtime exceptions are not required, however if those exceptions are used in an unusual way, it is highly recommended documenting that here.
      • If declaring a common exception (i.e., @throws StreamingException, then list possible sub-exceptions under this tag and under what conditions they arise).
      • If the method returns a type extending from Future (i.e., executes asynchronously), do not include exceptions that the returned future can be failed with (see below how to document this). Only include exceptions that are thrown synchronously.
    • @returns tag required for all non-getter methods.
      • If the method returns a type extending from Future list all possible exceptions the returned future may fail here. Do not include them in the @throws tag.
    • If a method overrides a method from a super class or interface it is not necessary to add Javadoc to it as long as it has the @Override annotation.
    • Javadoc required for Scopes 1-4 and recommended for Scope 5.

Other miscellaneous things of importance:

  • In addition, any check imposed by checkstyle or the Javadoc Compiler that results in a warning or error should be fixed rather than suppressed.
  • For Lombok auto-generated getters and setters (done via @Data, @Getter or @Setter annotations) the Javadoc should be set on the field itself. There is no need to add the @return or @param docs as they will be auto-generated.

Example:

This example is for Javadoc illustration purposes only; the methods have been left empty on purpose and no guarantees are made as to the correctness of the code or the design of the class.

/**
 * Represents an asynchronous processor for general items.
 *
 * @param <ItemT> Type of items to process.
 */
public class ItemProcessor<ItemT> implements AutoCloseable {
    /**
     * Defines how many items we can have in the queue at any given time.
     */
    public static final int MAX_QUEUE_SIZE = 1000;
    private final Queue<ItemT> queue;
    private final Executor executor;
    private final Consumer<ItemT> itemProcessor;

    /**
     * Keeps track of how many items have been completed since the creation of this ItemProcessor.
     */
    @Getter
    private int completedItemCount;

    /**
     * Creates a new instance of the ItemProcessor class.
     *
     * @param itemProcessor A Consumer that will be invoked on each item to process.
     * @param executor      An Executor to use for background operations.
     */
    public ItemProcessor(Consumer<ItemT> itemProcessor, Executor executor) {
    }

    @Override
    public void close() {
    }

    /**
     * Gets a value indicating the number of items in the queue.
     */
    public int getQueueSize() {
    }

    /**
     * Queues the given item for processing.
     *
     * @param item    The item to process.
     * @param timeout A timeout for the operation. The item will not be processed if the timeout elapsed prior to this 
     *                timeout expiring.
     * @return A CompletableFuture that, when completed, will indicate the item has been processed. If an exception occurred,
     * it will be completed with the causing exception. Notable exceptions:
     * * {@link java.util.concurrent.TimeoutException} If the given timeout expired prior to processing the item.
     * * {@link io.pravega.common.ObjectClosedException} If this instance of ItemProcessor has been closed prior to processing
     * the item.
     */
    public CompletableFuture<Void> process(ItemT item, Duration timeout) {
    }

    private void executeNextItem() {
    }
    
    private class QueueItem{
        QueueItem(ItemT item, CompletableFuture<Void> result){
        }
        private final ItemT item;
        private final CompletableFuture<Void> result;
    }
}

Threading

All top level and non-private classes should be thread safe.

This is accomplished as follows:

  • If a class can be made immutable; it should be.
    • This is the easiest way to ensure thread safety.
  • Any static member MUST be immutable. As static object is implicitly shared between all instances.
  • If a class has members that change; they should be protected by a lock.
    • When this lock is held, calls to 'external' objects that are not 'wholly owned' by the class with the lock should not be performed. This generally implies, but does not strictly require that blocking operations not be performed while a lock is held.
    • If a blocking operation is performed under lock, it should be very obvious and clear that it is absolutely necessary.
  • If there is a non-threadsafe class it should be annotated @NotThreadSafe and the class that holds it is responsible for its synchronization.
    • This implies that any non-threadsafe class must have exactly one owner and should not be passed around.
  • When a member is protected by a lock it should be annotated with @GuardedBy and the name of the lock object.
  • Prefer Lombok's synchronized to the Java synchronized keyword.
  • Classes passed into or returned from methods should not be used to circumvent ownership, lock ordering etc. IE: Passing a mutable map to a setter is not ok, even if the map itself is threadsafe as the caller could retain a copy of it and modify it later to the surprise of the class holding it.
    • This is generally not a problem for "top level" classes being "wired together", but other mutable class being passed into or returned from a method should be viewed with suspicion, and avoided if possible.
  • Following from the above rules Collections.synchronizedList() wrapping is discouraged because it does not 'internalize' the synchronization because the caller is expected synchronize while iterating. When threadsafe collections are needed the 'concurrent' collections are prefered. (If there is no concurrency required but synchronization is needed for passing an object between threads, immutable lists/objects are prefered, but if this is inconvenient, Vector may be a replacement for "synchronizedList".)

Within the Pravega host there are some more specific rules:

  • Multiple operations can occur in parallel to each of the components of the system, even involving the same stream.
  • All modifications to stream data should have a strictly defined order.
    • Updates to the state for a stream should be atomic and involve no IO.
    • Each read should see data from one version. (IE: If events 1, 2, 3 are written it is unacceptable for 1 and 3 to be visible but not 2)
  • All IO proceeds in parallel and is performed by dedicated pools.
  • Callbacks should not perform blocking operations. They may update in-memory objects and queue blocking operations to be done later.

Logging

To be consistent on what log levels mean what, use the following as a rough guide:

  • Errors - Something bad happened that can't be automatically be handled, e.g., "There is no configured endpoint to connect to", "Data from X is corrupted".
  • Warnings - Something bad has happened that can be handled, but you still might want to alarm on if there is a ton of them, e.g., "A connection has dropped".
  • Info - Anything that would be required to be known by someone looking at a log to understand the state of the server in order to make sense of a strange error that happened subsequently. Note that this should not be frequent or very detailed that it imposes a significant performance penalty, e.g., "Connecting to host X."
  • Debug - Information that is useful for debugging, it could be frequent and detailed. Keep in mind that occasionally we will want to run in production with debug on, so there should be some effort to minimize the impact to performance. E.g., "Event published successfully"
  • Trace - Anything too detailed or frequent to be Debug. We do not expect production systems to run with trace level enabled.

Full stack traces should be logged at Error levels, but never at info levels or below. They should be logged at Warn level only if it is likely to have anything interesting. (e.g., A connection drop would not).

Exceptions

Throwing exceptions

  1. Specificity
    • Applicability: everywhere.
    • Make sure you throw adequate/specific exceptions, so that when someone does catch them, or when they see it in a log, they have a better idea on what to do.
      • Example 1: Do not throw general "Exception" - unless you have no other choice.
      • Example 2: Do not throw general "IOException" if you have a more specific exception that derives from IOException.
      • Example 3: Do not throw general IllegalStateException if you have a more specific exception. It is OK to use Preconditions.checkState to throw this, (see below), but in case you want to indicate an object has been closed and is no longer usable, you should use ObjectClosedException (which derives from IllegalStateException, but delivers a more specific message) - callers can still catch IllegalStateException (which will handle both), or they may decide to catch ObjectClosedException and take action based on that alone.
    • Create specific exceptions that derive from a general exception (example: ObjectClosedException extends IllegalStateException) See below
    • Declaring exceptions
      • You should declare each type of checked (non-runtime) exception that you may throw
      • It is desirable to aggregate some of these into their closest parent exception (i.e., IOException is good, but Throwable is not OK) if the list is longer than 4 , but you need to document them in the method's JavaDoc.
        • Ex1: if you have E1, E2, E3, E4 extend IOException, and method foo() throws all of them, it is OK for foo() to declare it throws IOException (provided its JavaDoc describes everything).
        • Ex2: If you have E1, E2, E3, E4 extend IOException, but method foo() throws only E1 and E2, then foo() must declare it throws E1 and E2.
  2. API/3rd party exceptions
    • Applicability: in those modules/classes dealing directly with external (non-Pravega, non Java-library) components.
    • Rethrow external exceptions wrapped in your own exception so you still have a trace of the original, without forcing the caller to have a hard dependency on that external API. Not doing this would break encapsulation - your code's users would need to know how your code works and design around it.
      • A deviation from this rule is acceptable when the original is a type built into Java that makes sense. However, bear in mind that if you are working on an abstraction layer (such at Tier 1/2/Cache abstraction), then you may want to translate the actual implementation's exceptions into your own standard; otherwise every time you change the implementation you'll get a different set of exceptions, which may cause the calling code not to behave properly.
  3. Preconditions
    • Applicability
      • Public methods arguments
      • Constructor arguments (especially in those constructors which just store an argument into a private member)
    • Usage
      • Check whether the operation is valid given the current state of the object
        • Preconditions.checkState()
        • Exceptions.checkNotClosed - for those objects implementing AutoCloseable.
      • Check whether the arguments are valid
        • Preconditions.checkNotNull for null arguments
        • Exceptions.checkNotNullOrEmpty for string arguments that cannot be null or empty
        • Preconditions.checkArgument will generally satisfy the rest
      • Check out Preconditions and Exceptions for a whole list of checks that can be used.
  4. Assertions
    • Applicability:
      • Private methods arguments
      • Non-trivial computations
    • Usage
      • Use assertions (assert : ) to validate that your input/output/invariant is as expected
    • Notes
      • Assertions are not enabled by default in Java. It is highly recommended you run your JVM with the "-ea" options to enable this locally.
        • If you want assertions that always fire, consider doing "if() { throw new AssertionError();}". This will always execute like regular Java code regardlessof whether "-ea" is passed or not.
      • If assertions are disabled, or if is false, the statement on the right () is never evaluated, and it will have no consequences.

Catching exceptions

  1. If you don't know what to do with an exception, don't catch it.
  2. When catching an exception, never print it to the console, whether using System.out.print or Exception.printStackTrace. This is ok for your own local development, but it makes no sense in a service environment.
  3. Don't catch, log and then rethrow the exception, because this results in the exception being duplicated in the log.
    • When rethrowing the exception, make sure you include the original exception as "cause" - otherwise it will be lost.
  4. When catching multiple exceptions that are handled with the same logic, the Java 8 multi-catch is preferred to using a parent class.
  5. Never silently swallow exceptions. Even if you think it's impossible for an exception to be thrown (but its method declares so), consider using @SneakyThrows or rethrowing it as a RuntimeException (see below)
    • "Silently" means just catching and doing absolutely nothing about it. It is OK if the exception is expected behavior and you properly handle the exception, without rethrowing it.
    • @SneakyThrows
      • For the specific case where you know with certainty an exception cannot occur, (i.e., the IOException when writing to a ByteArrayOutputStream) @SneakyThrows can be used to force the compiler to treat the exception as a runtime exception when it is naturally a checked exception. Because it hides the exception from callers @SneakyThrows should be used judiciously, only where this behavior is desired.
      • It can be used to allow an exception that is expected and handled to be thrown through an intermediate class who's signature prohibits it. When doing a CompletableFuture.thenApply(...) the function is not allowed to throw, even though if it does the exception is handled correctly by causing the future to fail with the corresponding exception. So rather than wrapping the checked exception in a Runtime which would lose the type, it can be annotated with @SneakyThrows so that the exception does not have to be wrapped in runtime and then unwrapped by the caller.
  6. Catching Throwable
    • Java has Exceptions and Errors. Both inherit from Throwable. Normally you need to only catch Exception, but in rare occasions you may need to catch Throwable (i.e. AssertionError is an Error) so that you may fail a Future that someone else may be waiting for.
    • If you need to catch Throwable, ALWAYS check if it's safe to do anything with it. Use ExceptionHelpers.mustRethrow() to verify if it's a terminal exception (OutOfMemory/StackOverflow). If so, you need to rethrow the exception immediately and not do anything else.
  7. Catching InterruptedException
    • When you catch InterruptedException it removes the interrupt from the thread. For this reason, if something throws InterruptedException you should use Exceptions.handleInterrupted() or FutureHelpers.getAndHandleExceptions() if it is coming from a future. This ensures that the current thread will also be interrupted.

Exception Structure

  1. Consider having all Pravega exceptions (except runtime exceptions (NullPtr, InvalidState, InvalidArgument)) derive from a common class - this will make it easier to spot & manage:
    • PravegaException: top-level exception.
      • SegmentStoreException: top-level exception that indicates it was raised by the Segment Store Service (derived from PravegaException)
      • ControllerException: top-level exception that indicates it was raised by the Controller (derived from PravegaException)
      • PravegaClientException: top-level exception that indicates it was raised by the Client (derived from PravegaException).
  2. Guidelines for when you are creating an exception type, deciding if it is runtime or checked
    • Runtime exceptions should be used in cases where the caller cannot reasonably be expected to do anything about it. IllegalArgmentException and IllegalStateException are good examples of what should be RuntimeExceptions
    • Checked Exceptions should be used in cases where the caller will almost certainly be handling the exception. An example of this would be an exception on an RPC.
  3. Examples from SegmentStore:
    • SegmentStoreException: top-level exception, which derives directly from Exception. Like the name implies, this is a very vague exception, but it does imply it has to do with the SegmentStore.
    • StreamSegmentException: derived from SegmentStoreException, it indicates an exception dealing with a segment. Its constructor requires a segment name.
      • Various exceptions that derive from StreamSegmentException (SegmentExists/SegmentNotExists/SegmentMerged/SegmentSealed, etc)
      • The beauty of this is that all exceptions derived from this will have a common message format and are guaranteed to include the segment name. This makes debugging light years faster.
    • ContainerException: similar to StreamSegmentException.

Test cases

In addition to the required code coverage, all pull requests should include at least one test case. If it is a bug fix, then the test case should reproduce the issue it is fixing. If it is a new feature, then the test case(s) should cover all the new code and changes. Reviewers are expected to enforce this and request test cases from the contributor accordingly.

In some instances, providing a test case is really hard if not impossible, so it is acceptable for some pull requests to not have a test case, e.g., for some nasty race condition that is really hard to reproduce. In some cases, when doing the test is difficult but doable, we expect the contributor to go that extra mile and possibly build some classes that help with future test cases.

We also have a framework for system tests. Consider writing system tests to see how it runs in a real cluster.

In addition, we have a tool that we can use for local stress testing and correctness testing (if you do not have access to a real cluster or want to debug locally). See Local Stress Testing.

Test timeouts

Tests should be designed to run quickly as there are many tests and they all get run on every PR. For this reason sleeps or other time consuming operations should be avoided if at all possible. Similarly the use of fixed resources such as static variables or constant ports which would prevent test parallelization are not allowed. (There are multiple utility classes in testcommon to work within these constraints)

Additionally, we don't want a broken test case with say a deadlock preventing a build from completing indefinitely. To avoid such situations, we typically add timeouts to test case. A fine-grained way of adding a timeout value is to set a parameter along with the @Test annotation like this:

@Test(timeout = 30000)

Although we prefer to set a timeout to each test case individually so that we have more control, it is also acceptable to set the same timeout to all test cases in a class by using a @Rule:

@Rule
 public final Timeout globalTimeout = new Timeout(30, TimeUnit.SECONDS);

The default value we suggest is 30s, but if you really think that your test is simple enough to always complete in less, then pick a smaller value. Note that jobs share resources in some build systems, so a conservative value is desirable.

Debugging the Build

There are a few things to consider when the build fails. The worst thing to do is immediately restart the build without doing a root-cause-analysis of the problem. DON'T DO IT! Consider one of the following debugging strategies:

Compilation error.

This is the simplest one. Fix it. But better yet, please make sure your code compiles before checking it in. ./gradlew compileMain compileTest will do the trick.

Checkstyle error.

We have various Checkstyle rules in place (see above). To check these locally, run ./gradlew checkstyleMain checkstyleTest and make sure it's happy.

Test error

This falls into multiple categories:

  1. Assertion error. Verify the root cause of the assertion. Run the test locally on your computer. If reproducible, fix the code. If not, see next.
  2. Flaky test (i.e., sometimes passes, sometimes doesn't). This just adds to the pain for every developer on the project. Just because it may "work on your machine", it doesn't mean that it will work in Travis (or on someone else's machine). Environments differ (think CPU, memory pressure, IO latency, etc) so there could be multiple things that do not work as expected. Most likely, due to the async nature of our code, the result of your computation may not be immediately available and you may need to adjust your testing code to account that it may eventually update. In most cases, this is a test bug, but this could very well indicate a real production bug.
  3. Timeouts. This means that either you haven't allowed enough timeout for a slow build system (like Travis), or that you are doing too much work in your unit test, or that something in your code hung (i.e., a Future) and your test will likely never complete. The first two are easy to fix, but the last one may require significant debugging. Whatever the fix is, increasing the timeout should be used as a last resort, only after all other attempts to fix the test have proved fruitless.
  4. Resource leaks. See next section.

Resource Leaks

Leaking a resource means that the code released all of its pointers (references) to the object that owns it (i.e, it has been garbage-collected), yet that resource hasn't been released or otherwise deallocated. Common suspects include direct (native) memory, file handles, network connections and thread leaks.

Direct memory leaks.

Most of our unit tests extend the LeakDetectorTestSuite class. This special class is designed to track Netty ByteBuf leaks and halt the test execution when one of them is encountered. Note that due to the way the GC works (much later than the problematic test, and on a different thread), you will not be immediately notified of a leak. On the contrary, some other subsequent test will fall victim to this.

To debug this, you will need to run the tests locally and inspect the logs (yes, all the tests output logs). You will see plenty of WARNs and ERRORs coming from Netty's allocators indicating the source of the problem.

Thread (Pool) Leaks

This is one of the most serious leaks and also one of the hardest to spot. While a memory leak will just waste "some memory", in Java, each Thread is backed by a native OS thread. That means that, unless properly terminated, a thread leak in your code will consume kernel resources (to manage the native thread) and about 1MB (more or less) of memory in your process, which is reserved for the thread stack (also an OS setting). It is easy to see that leaking only 100 threads will eat up about 100MB of memory for your process. This is significant if our build is run in a constrained environment such as Travis or GitHub Actions.

Fortunately for us, there is a way to detect these (though not an easy one). All of the thread pools in Pravega are (and should be - no exceptions) created using ExecutorServiceHelpers. This class check the ThreadLeakDetectionLevel system property and based on that it may operate in 3 modes:

  • None: No thread leak detection is performed. This is the default value and should be enabled for production use.
  • Light detection: A WARN log message is output when a thread pool leak is detected, along with a stack trace pointing to the code that instantiated that thread pool.
  • Aggressive detection: Same as Light, but will halt the VM with an exit code of 99.

The Gradle build is configured to run tests using an Aggressive Detection Level. As such, if you see that your build failed due to the process exiting with code 99, do the following:

  1. If you have the ability, extract the build logs. One of the last things logged should be an ERROR message with the stack trace pointing to the leaked thread pool. If you got this, skip to step 3 below.
  2. Either:
    • Run the build locally and get the logs
    • Update the build to output more details:
      • Modify java.gradle (Section task.withType(Test)) like this:
        • testLogging.showStandardStreams = true
        • testLogging.events = ["PASSED", "FAILED", "STANDARD_ERROR"]
      • Optionally check in the java.gradle ONLY into your branch.
      • Re-run the build.
      • Don't forget to revert the java.gradle file before merging your PR.
  3. Once you get the stack trace that identifies the thread pool you are leaking, you need to verify that you are properly shutting it down:
    • Does your owning class have a close/shutdown method? If so, close your pool there.
    • In a Unit test? Consider extending from ThreadPooledTestSuite, or using @Cleanup("shutdownNow") or shutting it down in your @After block.
    • In production code? Verify that your pool is not left orphan (i.e, an exception after you create it) and that your owning class properly shuts down (i.e., no timeouts or unexpected exceptions before shutting down the pool).
    • There could be other reasons too. But the ones listed above are the most common.

Lombok

Pravega uses lombok to reduce boiler plate code. If you use an IDE you'll need to install a plugin to make the IDE understand it. See their instruction on how to do that.

Dependencies

Our code makes extensive use of:

These libraries should be used whenever possible rather than adding more dependencies. Additionally feel free to make use of annotations in JSR-305. However additional dependencies need to be weighed carefully. If possible create a reusable class and place it in Common rather than adding an additional external dependency.

If an additional dependency is needed it should be scoped using a gradle target as narrowly as possible, so that other code does not accidentally depend on it.

TODOs in the code

We don't like TODOs in the code. It is really best if you sort out all issues you can see with the changes before we check the changes in. However, if you exceptionally need to leave a TODO comment in the code, we ask you to:

  1. Create an issue explaining the remaining work that needs to be done
  2. Reference the issue number in the comment

E.g., // TODO: Writing to /dev/null sounds like a bad idea (Issue #666)

Clone this wiki locally