Skip to content

Coding Standards

Tim edited this page Mar 1, 2016 · 1 revision

TODO: formatting cleanup and review

Philosophy

Why have coding guidelines? Why not let each developer use their own style and patterns at will? In our experience, there are a few good reasons:

  1. reading code becomes easier. You can actually discern patterns visually without having to read each token. This greatly speeds up reading each others' code
  2. It speeds up writing code. If you have simple rules to follow when writing code, you don't have to make judgements about how it looks. You have more brainpower left to think about the important things (such as if you have an off-by-1 error..., potential NPEs, semantic bugs, ...)
  3. they should encapsulate shared experience of many developers, and hence help one developer not make the same error another did in the past

The style standard itself doesn't matter too much. I have personally worked in four distinct coding standards and adapted to each. Here we aim to try to be consistent so that while we code, we can do most code formatting with the IDE or w/o thinking; and having the same design "patterns" for the same cases is important. Personally, if I see a pattern that feels "foreign" or "odd", I first try to understand the logic behind the pattern. I then present the logic I have (pros, cons--technical or design facts). The decision for what to do should be based on the technical facts as much as is possible. If an arbitrary choice must be made, then each pattern should be listed and pros/cons of each, and an attempt to explain when and why one might choose one or another (tie goes to coin flip)

Our standards should free us to think in higher abstractions, and come up with new patterns, rather than enforcing just what we've always done or seen.

All standards are always open to discussion amongst contributors to the java developers group, and these are subject to change (though hopefully not drastically and frequently, else, we'll not have consistent code)

Recommended Reading

Warmup

New to Java? Have 10 years of C++ experience and figure that'll make you a Java Guru? Start with the basics. Java is not just interpreted C++-like code. You end up being a top-tier Java programmer both in terms of readable code and code that performs well, but understanding how to help the JVM trigger all it's advanced runtime optimizations (JIT, etc)

Style and Patterns

  • Effective Java 2nd Edition : Get this book. We have a copy in Data Freeway if you can't afford. I'm sure your manager will expense it. We follow virtually every guideline he has here on coding format, style, and guidelines. It covers good dos and don'ts of patterns and style for maintainability as well as our good friend performance.

Concurrency

If you want to understand concurrency pretty well, these books are must reads:

  • Java Concurrency in Practice
  • The Art of Multiprocessor Programming (goes into way more details on how to construct lock-free data structures--helps just understand what the java libs do and where you can do better--rare, but sometimes; even talks about how to optimize building locks from CAS on NUMA architecture--from 2008, hoping for an update soon)

In a nutshell, to summarize 100s of pages, use task-based parallelism, limit the # of threads (cost of creation, teardown in jvm, use built-in concurrency (Executors, CountDownLatch, CyclicBarrier, Phaser, ForkJoin, ..... see the java.util.concurrent package contents)

Performance

  • Java Performance : The best book that will help you understand how to either tune your JVM directly, or how to alter your application to better utilize how the JVM. Chapters 1,2 are probably worth skimming (if you didn't know pidstat or sar very well maybe). Chapter 3 and 7 are essential to understanding how the VM works.

Overview

We adhere to Java naming conventions and style.
The best reference is within http://www.amazon.com/Java-Programming-Language-4th-Arnold/dp/0321349806

Settings

If you use Intellij Idea (why aren't you?), here is a settings file

Media:Java--CodingStandards--IntelliJIdea11-settings.jar

You can import only the code-style portion if you don't want the colors, fonts, etc.

alternatively, a more up-to-date file may be in:

~srash/settings/idea-settings/current-settings.jar

do be careful. If you've made any changes, export your settings first so you can re-import pieces. I've had reports of importing just code style messing with live templates

Major Java Projects

Many of our standards may be inferred from our large existing codebase. Not all code follows all conventions to the letter, and you'll find newer ones do more closely (ex: Ptail, DFC, Calligraphus, jcommon, Puma). Also ones that aren't bound by open source standards that might conflict (as far as I know, the ones for HDFS or HBase do not, but just have been around much longer; I don't know Hive's well enough to comment).

Specifics

class definitions

We do not define classes as final, and even frown upon making a method final in our own code-base. This can make testing via manual override more difficult and typically the reason for making a class final is flawed in some way (ie, whatever it achieves may be achieved via other mechanisms).

For methods, if a public method is final, this is usually because the class calls it. In this case, prefer a private internal method that both the class implementation and public method may call. If a method is protected, similar is suggested. Sub-classes directly inherit so there is also no chance of implementations changing.

Final classes also violate the open/closed principle: http://en.wikipedia.org/wiki/Open/closed_principle

keyword order

  1. annotations
  2. access modifiers (public, private, protected)
  3. abstract
  4. static
  5. final
  6. synchronized
  7. native
  8. strictfp

See The Java Programming Language, 4th edition

ex: public static final String FUU = "bar";

Naming Conventions

  • Use full names. Java IDEs such as Intellij will complete these for you. ex: Manager, Interface, RefreshableConfigProvider, ...

  • try to limit to 4-5 'words' in the name

  • examples

      private static final Logger LOG = Logger.getLogger(ParallelStorage.class);
    
      private final Storage<K, V> storage;
      private final int degreeOfParallelism;
    

Units

When unable to use proper types that includes units (such as java6 TimeUnit), we use the convention of including units in variable names. This has saved a number of headaches traversing code and finding one place expected seconds, another millis:

ex:

intervalMillis; // interval in milliseconds
durationSecond;  // duration in seconds

I can't emphasize this enough

member variables

simple use camel-case with the first being a lower-case letter. Special naming using _ at the start or end is not used (This is controversial). Example from JDK source of PriorityQueue (by Josh Bloch, author of Effective Java, major JDK contributor)

public class PriorityQueue<E> extends AbstractQueue<E>
  implements java.io.Serializable {

  private static final long serialVersionUID = -7720805057305804111L;

  private static final int DEFAULT_INITIAL_CAPACITY = 11;

  /**
   * Priority queue represented as a balanced binary heap: the two
   * children of queue[n] are queue[2*n+1] and queue[2*(n+1)].  The
   * priority queue is ordered by comparator, or by the elements'
   * natural ordering, if comparator is null: For each node n in the
   * heap and each descendant d of n, n <= d.  The element with the
   * lowest value is in queue[0], assuming the queue is nonempty.
   */
  private transient Object[] queue;

  /**
   * The number of elements in the priority queue.
   */
  private int size = 0;

  /**
   * The comparator, or null if priority queue uses elements'
   * natural ordering.
   */
  private final Comparator<? super E> comparator;

  /**
   * The number of times this priority queue has been
   * <i>structurally modified</i>.  See AbstractList for gory details.
   */
  private transient int modCount = 0;

local variables

Avoid redefining local variables and arguments. Variables should be assigned to once.

Note that there are a couple common exceptions to the rule:

  • variables used in "for" or "while" loops, e.g.,

      for (int i = 0; i < max; i++) {
        // do stuff
      }
    
      while ((line = reader.readLine()) != null) {
        //do stuff
      }
    
  • variables that require a conditional "initialization" if they happen to be certain values (in these cases, the reassignment should happen before the variable is used for any other purpose), e.g.,

      public void foo(String bar) {
        if (bar == null) {
          bar = DEFAULT_BAR;
        }
      }
    
      public void goo(int bar) {
        if (bar == -1) {
          bar = DEFAULT_BAR;
        }
      }
    
      public void hoo(Config config) {
        int bar = config.getBar();
    
        if (bar == null) {
          bar = DEFAULT_BAR;
        }
      }
    

final variables

member variables should always be final unless it's absolutely impossible (and if not, consider a builder pattern to isolate mutability). This makes the member variables thread-safe. ex:

public class Fuu {
  private final int num;

  public Fuu(int num) {
    this.num = num;
  }
...

There are also cases where final is required when doing poor-man's closures in java (aka anonymous inner classes;

public Baz someMethod(int num) {
  final int numSquared = num * num;

  return new Baz() {
    public int getNum() {
      return numSquared;
    }
  }

we typically do NOT declare local variables as final except as above, but theere is some dissent in opinions here. The argument being if the language can enforce a convention, use it. On the other hand, we treat all local variables as final pretty much in all cases except those enumerated--loops and complicated initialization sequences where using many variables would make it less readable. The general consensus in calligraphus, ptail, puma, jcommon, nifty, swift, and "presto" (placeholder name) is not to use final for local variables. It clutters up otherwise readable code.The same goes for method parameters--they are only declared final if needed for a closure.

Consider declaring class constants for literal values that are used more than once in the code. That way those values can be easily updated and automatically kept in sync. (at the top, private static final <type> <THE_NAME_HERE> = <value>)

Constructor Variant Order

Go with most params to least. The idea here is to keep the declarations nearest the code that manipulates it (or at least initializes it; hopefully not much more if we strive for immutability)

public class ExecutorServiceFront extends AbstractExecutorService {
  private final Lock lock = new ReentrantLock();
  private final BlockingQueue<Runnable> workQueue;
  private final ExecutorService executor;
  private final String poolName;
  private final long maxTimeSliceMillis;
  private final BlockingQueue<Drainer> drainerList;

  // this the only location directly assigns to variables (they are all final! yay!
  public ExecutorServiceFront(
    BlockingQueue<Runnable> workQueue, 
    ExecutorService executor,
    String poolName,
    int maxDrainers,
    long maxTimeSlice,
    TimeUnit maxTimeSliceUnit
  ) {
    this.workQueue = workQueue;
    this.executor = executor;
    this.poolName = poolName;
    this.maxTimeSliceMillis = maxTimeSliceUnit.toMillis(maxTimeSlice);
    drainerList = new ArrayBlockingQueue<Drainer>(maxDrainers);

    for (int i = 0; i < maxDrainers; i++) {
      drainerList.add(new Drainer(String.format("%s-%03d", poolName, i)));
    }
  }

 // convenience constructor 1
  public ExecutorServiceFront(
    BlockingQueue<Runnable> workQueue, 
    ExecutorService executor,
    int maxDrainers,
    long maxTimeSlice,
    TimeUnit maxTimeSliceUnit
  ) {
    this(workQueue, executor, "Drainer", maxDrainers, maxTimeSlice, maxTimeSliceUnit);
  }

 // convenience constructor 2
  public ExecutorServiceFront(
    BlockingQueue<Runnable> workQueue, 
    ExecutorService executor,
    String poolName,
    int maxDrainers
  ) {
    this(
      workQueue, executor, poolName, maxDrainers, Long.MAX_VALUE, TimeUnit.MILLISECONDS);
  }

 // convenience constructor 3
  public ExecutorServiceFront(
    BlockingQueue<Runnable> workQueue, 
    ExecutorService executor, 
    int maxDrainers
  ) {
    this(
      workQueue, executor, "Drainer", maxDrainers, Long.MAX_VALUE, TimeUnit.MILLISECONDS);
  }
...

Static Methods

If a method is to be used by a class other than the current (ie, pkg private, protected, or public visibility), especially in the case of static "helper" methods, well, static makes sense. However, when it comes to private static methods, there is confusion to the benefit of this. Will the "compiler" inline the static and not the member function? The answer here is no, they both get inlined identically. It's a trivial optimization based on visibility constraints and that a constructor cannot call any member function. Here's the one case our conventions call for private static methods: when the language requires it.

One case is during the constructor. Say you have a bit of complicated code and want to factor it out. It must be a static method (and I suggest private) because this is not fully initialized. Probably some other cases exist that I'm blatantly forgetting.

Possible counter-argument: "well, static tells me as a reader if the function uses any member variables". This sounds great on the surface, but especially as code evolves quickly, that which was a member function could be turned into static and results in more lines changing in 'diffs' if the author remembers, or inconsistencies in the code where some member functions don't use member variables.

Method Definitions and Invocations

Many methods in Java will simply not fit on one line, even at larger widths due to generics and verbose naming. Here are schemes we use for consistent wrapping

There are no differences between constructors and other methods

cases:

  • fit on one line:

      public ParallelStorage(Storage<K, V> storage, int degreeOfParallelism) {
    
  • throws to next line to first

      public ParallelStorage(Storage<K, V> storage, int degreeOfParallelism) 
        throws MyVeryOwnException {
    
  • fit on one line by themselves

      public AllocationManager(
        AtomicLong globalQueueSize  AtomicLong localQueueSize, int fuu
      ){
    
  • do not all fit on one line

      public AllocationManager(
        AtomicLong globalQueueSize,
        AtomicLong localQueueSize,
        long maxLocalQueueSize,
        long maxGlobalQueueSize,
        PipeStatsCollector statsCollector
      ) {
    
  • includes a throws statement:

      public void fuu(
        String longArgName1, String longArgName2
      ) throws FuuException {
    
  • a fourth case was recently shown to me where you can't even make the first '('. In this case, favor moving everything up to the return type up one line

      public static 
      Collection<Method> findAnnotatedMethods( 
        Class<?> type, Class<? extends Annotation> annotation
      ) {
    

Method Calls

Single Method calls

object.method()

should remain on line line. Move arguments to subsequent lines. The only case these shouldn't be on the same line is if length(object) + length(method) + 1 > 100 (our max width)

Method Chaining

When doing method chaining (>= 1 method call, "chained")

object.method1()
  .method2()
  .method3();

More Wrapping

  • variable declaration and initialization don't fit on one line. Avoid splitting the object.method, but you have options:

      TimeSeriesStorage<RowKey, ReadableAggregation<Datum, Datum>> storage =
        storageFactory.create(storageKey);
    

    -or-

      TimeSeriesStorage<RowKey, ReadableAggregation<Datum, Datum>> storage = storageFactory.create(
        storageKey
      );
    

The first is preferred as it's fewer lines; unless the arguments to create() spanned many lines:

TimeSeriesStorage<RowKey, ReadableAggregation<Datum, Datum>> storage = storageFactory.create(
  storageKey,
  limit,
  bufferSize,
  clusterName,
  tableName,
  columnNames,
);

Then you the above is preferred, again, due to fewer lines and readability.

this

We do not use this.x except when required to disambiguate. This code snippet illustrates:

public CoreConcurrentCache(
  ValueFactory<K, V, E> valueFactory, ExceptionHandler<E> exceptionHandler
) {
  this.valueFactory = valueFactory;  // using this here allows the param + member that match to be named identically
  this.exceptionHandler = exceptionHandler;
}

@Override
public V get(final K key) throws E {
  Object value = cache.get(key);  // no 'this' here

Comments

Yes you should have them, but do so judiciously. Too many comments, or useless ones that state the obvious clutter up code. Now to formatting. Simple; comments should stand on their own line.

single-line comment:

// soft limit on # of permits for this semaphore
private final int maxPermits;

We strongly discourage comments on the same line, even if short

for multi-line comments, there is not convention (except javadocs of course).

// this is fine
// for a multi-line comment
System.out.println("exiting");
/* 
so 
is 
this
*/

However, even though the java coding language can be overly explicit and verbose, we need not be in our comments:

/**
 * the timeInterval
 */
private final TimeInterval timeInterval;
/**
 * list of aggregation functions
 */
private final List<String> aggregationFunctionList = new LinkedList<String>();
  • using /* */ is fine, but we don't need the blank lines. You can also use //

      /* the timeInterval */
      private final TimeInterval timeInterval;
      // list of aggregation functions
      private final List<String> aggregationFunctionList = new LinkedList<String>();
    

Eight lines to four! Now, to reduce it further, do these comments add any value? We've named our variables clearly, unless there's something to add, here's the ideal code for the original block:

private final TimeInterval timeInterval;
private final List<String> aggregationFunctionList = new LinkedList<String>();
  • Aim to write self-documenting code when possible, and we'll save comments for the really tricky stuff
  • the benefit to removing unnecessary comments in say, class declarations, or keeping comments that are one line of text to one line is more info is on the screen at once. It's beneficial to get as much of a class's declaration without lots of whitespace. There's no control flow vs statement vs block statement that we suggest in the newlines below

Newlines

We aren't strict on anything here. We don't even enforce it. It's just a style a couple of us follow to keep things "regular". The benefit of following it? As usual with standards, it takes the thinking out of "should I put a newline here? this is a logical group...but that end statement is related to the next." Yet, it puts newlines in the code to make it readable.

Without further ado, here is one schema Rationale This scheme contains some logical grouping for readability. You can glance at code and see common constructs.

Rules:

  1. newline before/after series of declarations. None within the series. ExceptIon: declarations at the start of a statement block (ie, first line in { } block )
  2. newline before/after any 'block' construct (if, for, while, try-catch, etc)
  3. newline before/after a series of "statements" (ones that don't define a variable). Exception: first statement of a { } block
  4. newline between class/member variable declarations (see above) of varying types: as a sub-example, group public static final, private static final, private final, private, private volatile with newlines between, none within (not a complete statement of all modifier types, but group by distinct modifier types and order specified above)
  5. before the return statement (unless it is the first statement in a block, ie only)
  6. before a throw statement; it's like a return statement (same exception, not if only statement in a statement block)
  7. other normal cases such as between functions

Exceptions:

  1. no blank line before any of the above case if they are the first line of a function, unless there is a throws clause or other function info that aligns directly with the statement making it confusing
  2. comment lines are considered part of the lines of code they are about, so apply rules above. If the comment is about a non-variable declaration, and a variable declaration is above, put a blank line above the comment. If the comment is about one variable declaration in the middle of a series, no blank line is needed.

example

private Storage<K, V> checkoutStorage() throws StorageException {
  Storage<K, V> storage = storagePool.poll();

  // safe double-checked locking for fast-path reduction of lock contention
  if (storage == null) {
    boolean createStorageOk = false;
  
    synchronized (creationLock) {
      storage = storagePool.poll();

      if (storage == null) {
        if (numInstances < maxDegreeOfConcurrency) {
          numInstances++;
          createStorageOk = true;
        }
      } else {
        return storage;
      }
    }
    // check while holding lock acquired a 'permit' to create a lock
    if (createStorageOk) {
      LOG.debug("creating instance number %d / %d", numInstances, maxDegreeOfConcurrency);

      return factory.create();
    }
  
  } else {
    return storage;
  }

  // if we make it here, it means no storage instances were available, and we reached the max
  // # we are allowed to create, so block and wait for someone to return an instance
  try {
    return storagePool.take();
  } catch (InterruptedException e) {
    throw new StorageException(e);
  }
}

Format Strings

As a general rule, use %s instead of %d or %f. It's both technically correct (every int, long, float or double is auto-boxed into an Integer, Long, Float or Double before it's added to the varargs array), but also -- and more importantly -- it's more forgiving. The most-common use of format strings is in logging, and a lot of that logging happens in catch statements, so it's better if the formatter fails quietly (and you get a slightly-garbled error message) than if it throws an exception (and you lose any information about the actual error). For example, here are some cases we've run into in the past which %s would have handled perfectly well:

  • using %d with an Integer, and the Integer was sometimes null
  • changing the order/number of format args and forgetting/messing up the ordering of the %ss and %ds

In practice, the only time to use %d or %f is when you need additional format specifiers, e.g., %.02d.

Visibility

Default Guideline

We typically use getters() and setters() for accessing values in alien object types, even if nested or package private. This holds even for nested static and non-static inner classes where java allows the container class to access private/protected fields. The concept of using such getters/setters still holds in these cases: changes in representation now are exposed, just to a single package or file. Further, modern IDEs will auto-generate the getters and setters, so this shouldn't be much of a hassle.

Example:

public class Bar {
  private final Deque<Fuu> fuuList = new ArrayDeque<Fuu>();

  public void addFuu(int value) {
    Fuu fuu = new Fuu(value);

    fuuList.add(fuu);
  }

  public int getLastFuu() {
    Fuu lastFuu = fuuList.getLast();

    return  lastFuu.value; // add a getValue() to Fuu by default
  }

  private static class Fuu {
    private final int value;

    private Fuu(int value) {
      this.value = value;
    }
  }
}

just add a getter; it's a little superfluous, and you can access the field via the getter instead.

private static class Fuu {
  private final int value;

  private Fuu(int value) {
    this.value = value;
  }

  public int getValue() {
    return value;
  }
}

Counter-Opinion

From Effective Java, 2nd edition, item 14, bottom of page 71, Bloch does say:

However, if a class is package-private or is a private nested class, there is nothing inherently wrong with exposing its data fields—assuming they do an adequate job of describing the abstraction provided by the class.This approach generates less visual clutter than the accessor-method approach, both in the class definition and in the client code that uses it. While the client code is tied to the class’s internal representation, this code is confined to the package containing the class. If a change in representation becomes desirable, you can make the change without touching any code outside the package. In the case of a private nested class, the scope of the change is further restricted to the enclosing class.

He makes a strong argument for the restricted scope, especially for nested classes.

Final Opinion

Stick with the default unless you can clearly satisfy Bloch's statement of "adequately describing the abstraction provided by the class, and are comfortable with the cost in change in representation.

If you do use judgement and forego getters/setters for the reason of "less visual clutter", document the variables well (as you would methods as they are now a private API)

Interfaces

"Always" an Interface

We prefer that virtually any class that is non-trivial to construct, or performs a non-trivial action be an interface. A common exception would be a 'bean' or data container class. The reasons are a few

  1. it's easy to see what the methods are that define an "interface" is for complex classes as an actual interface
  2. javadoc fits nicely here w/o making implementation files overly large
  3. there may be public methods that aren't part of the interface, but are necessary for property initialization (may be made non-private for use by factories) or for other utilities and testing.
  4. you can do you own custom mocking w/o mockito which is sometimes more natural, depending what is you want to do (see [https://phabricator.fb.com/diffusion/H/browse/hadoop/branches/titan/datafreeway/common/src/main/java/com/facebook/datafreeway/hdfs/MemoryFileSystem.java MemoryFileSystem] which can be used with anything that uses the hadoop FileSystem API
  5. This approach encourages you to think in terms of API and what an object should do, not how it does it,

Naming

If you think in terms of objects being interfaces, even when not, then the "...If" suffix becomes superfluous. We shy away from it, but sometimes, there's just nothing better. If you can name an object what it is or provides, use that. A corollary is that the "...Impl" suffix. This is often used when there's just one implementation, and the interface really is the best name for it, but we've followed the rule above and are stuck. Coming up with another name to avoid "...Impl" usually just makes two, unrelated names and causes more confusion.

Unless...

some or all

  1. Your object is cleanly defined in terms of public methods themselves defining the interface
  2. has good javadoc on the public "interface" methods
  3. mocking it out manually is not difficult for someone else needing to create a stub/mock version of it in other test cases

Our Quirks

We over-specify the access levels on an interface:

public MyInterface {
  public void methodA();
  public void methodB();
}

on a public interface, all methods are public. Here, we include the access specifier even though it's redundant. The primary motivation is that interfaces then are consistent with classes; all members have an access modifier.

Filters and the likes

Often you have a default filter for some type of interface:

public Filter {
  public boolean passes(SomeClass someInstance);
}

it's common practice to provide the default as a public static variable inside the class. Depending on the nature of the filter, this may be one instance, a true and false instance (two), or more. If it exceeds more than this, consider an enum that implements the filter.

example of a true filter

public Filter {
  public static final Filter TRUE = new Filter() {
    public boolean passess(SomeClass someInstance) {
      return true;
    }
  }

  public boolean passes(SomeClass someInstance);
}

See Josh Bloch's Effective Java book for extensible enums for what I mean on having enums and interfaces mixed.

Order of Components in a class

We define the order in which nested classes and variables should appear in a class. The order could be arbitrary, but there is some logic to it in that it tends to group constructors that initialize variables (often final) close, and bookend common public static vars and sometimes used public static classes.

NOTE: group all vars by final, regular, volatile for readability if possible. We also typically put a blank line between groups.

  1. public static variables
  2. package private static variables
  3. private static variables
  4. public instance variables
  5. package private instance variables
  6. private instance variables
  7. constructors
  8. static constructor methods
  9. if abstract class, put abstract methods here
  10. methods (partial order proposed below. for instance methods, tend towards order that is easy to follow call-graph, or a documented logical grouping)
    1. public instance methods
    2. other public static methods
    3. private instance methods
    4. private static methods
  11. non-public inner classes
  12. non-public static inner classes
  13. public inner classes
  14. public static inner classes

Data Structures

From Effective Java, 2nd Edition, item 25: Prefer lists to arrays. We follow this heavily.

This first avoids two major complexities around arrays, Generics, and types.

First, arrays are covariant:

class T extends S {
..
}

T[] tArray = ...;
S[] sArray = tArray;

ie, if T extends S, T[] extends S[]. This is not true in generics.

Try this code and I'm sure you've seen it fail:

List<T> tList = ...;
List<S> sList = tList; // compiler error

For what it's worth, Generics have ways around this by allowing wildcards and bounded wildcards in the <type>

The second major difference is that arrays are reified which means at compile time, the type of the elements in the array are known. Barring some tricks, which really don't contradict this statement, instances created with a generic type have that type erased.

// at runtime, reflection cannot tell us the type of stringList--it is in fact "Object" and all the generics are syntactic sugar that does allow strict type-checking at compile time.
List<String> stringList = new ArrayList<>;

Another bit that is amusing is that these are illegal statements: (in some scope where you have a generic E)

static <E>void fuu() {
  new List<E>[];
  new List<String>[];
  new E[];
}

You may see some code like this, but it's due to very shady casting that's fragile at best.

So, basically summarizing Bloch so far, arrays and generics don't mix. If you want more detail, read the item on page 119 (again, item 25).

The end point is that they don't mix well, and Lists have far more utility functions. Even an immutable list you don't plan on using anything can be backed by an array. Look at some source from Guava's Immutable array

class RegularImmutableList<E> extends ImmutableList<E> {
  private final transient int offset;
  private final transient int size;
  private final transient Object[] array;
...

Where size and offset are the range in array that this list represents. The great part about immutable objects is copying sublists is constant time, as you can see above. If you are wondering if you should use an array or List for some reason, go with the List. Lists convert to arrays easier than going back,

Testing

This covers style, content, and philosophy of testing. We are a bit more lax on the formatting of our code here, but much more rigorous on the nature of the tests.

Salad

  • we use TestNG. We could just have well used junit 4, but we started with one, and the are about they same, so we'll switch if there's significant benefit to offset converting all existing tests (mixed types are discouraged)
  • tests can be run in Idea/Intellij or with mvn at the command-line, both on our laptops (mac a least) and dev boxes. They should pass on all three. While linux is ostensibly the most important, understanding a failure in idea or on mvn + mac is good
  • name the test class for Fuu TestFuu
  • we use groups "fast" and "slow". Tests in the "fast" group should finish in ms, whereas slow may takes seconds.
  • try to come up with good test names, such as testNullForFuu. Common ones to just start out are testSanity which does some simple positive test.

Hearty Starters

  1. anatomy a good test class: It's about thinking through the edge cases, and often writing the test case before you are sure how your class will behave. In this way, a well-written test class will have test cases for the various "cases" (as defined by 'types" of input and state), including positive (succeeds) and failures. The latter test that the failure behavior is as expected: throws exception FuuException, returns null, etc. For both positive and negative, consider all edge cases (null, positive/negative, 0, "", and so on)
  2. use a setup() method annonated with @BeforeMethod(alwaysRun = true) for almost all setup of test methods. To quote one experienced Java code, the test method is "essence over ceremony". Put the ceremony in the setup() method. Another way to look at it is that test method should largely flow: state a precondition (may be a comment or aptly named object you constructed or both), execute some statements that trigger some known case, assert outcome. Doing this as concisely as possible greatly helps in reading a test.
  3. guidelines:
    • if the construction of the object has nothing to do with the test, it usually should go in the setup() method. If it's a trivially construction, and the exact (or functionally same) construct appears more than once, put in a setup() method
    • for complex objects with variations, you should construct all variations in the setup() method and give them meaningful names such as pipeThatThrowsException or pipeThatDropsMessages. Less preferred alternatives (that do the same thing, but ostensibly are more efficient, is make a method createPipeThatDropsMesssages() and call it only when needed. Construction in setup is still preferred (it is "setup", after all?) imrprovments
    • we hope to add more to the test

Main Course

  1. dataproviders
    • avoid dataproviders. They obfuscate what is going on. Unit tests are not about blasting a lot of data at code and hoping you cover it. Beck has great comments on how to design cases in his book (see below) which talk about figuring out what data + what state exercise a class. Using brute force isn't mentioned, that I recall (I could be wrong).
    • exceptions to the rule
      • when you can't reason through the cases for your class (which often means a design flaw in the class itself), for whatever reason, use them judiciously.
      • start with tons of data, then when you can see which half-dozen of your 200 inputs actually cover the true cases, fix it (mantra: iterate and improve)
      • anecdote: I used to ask mergeSort() a lot and when the decent candidates finished it up, I'd ask for how to test it. So often what separated the mediocre from the good (at least in this domain) was "well,I'd generate random data, or just some sets of numbers and check that they are sorted). The better ones said: "well, I'd try the empty list, size 1 list, size 2, ... and some known sequences with known results" depending on their algorithm. Better ones even pointed out they'd try lists size of max_int (assuming enough memory) or other good boundary cases. Often, too, actively thinking about how your code can fail causes you to catch bugs at this stage.
    • another question to ask here is if adding new data covers new code paths. You can find out, run cobertura or see our generated ones from continusous builds

Dessert

  1. lastly: see [http://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/0321146530 Test Driven Development]. we don't follow it the extreme, but the idea that you can often bootstrap a class or two of a project, write some test cases (which aslo let you use the class and see if the API is broken in obvious ways). Does Kent still work here? I just haven't seen him post or a while.
  2. make unit tests as easy to debug as possible. This doesn't mean pretty messages or the like--we all can use a debugger. It means having both some end2end tests that make sure the unique combinations of pieces of the system in different states play well together (really more integration test), but focus on unit tests for classes. This way, when something breaks, and you have a test for every class, you know where to look immediately versus coarse end2end tests.
  3. test methods ("cases") should be independent of one another. If one depends on the other, it only complicates debugging and in fact, can often slow the tests down (ex: test2 depends on test1, and test1 does some extra work on top of it's normal work to make some data available to test 2 by writing to disk when it need only write to memory
    • it makes it difficult and less intuitive to pick a single test method and run it as a test case

Guice

Ah Guice, (look it up, it is pronounced "Juice"). It looks so scary at first, but is so magical and helpful in testing and modularizing code later.

These are some standards and guidelines we aim for (but in practice you will see we take shortcuts)

  1. do not create any object inside the configure() method. Binding constants from configs is alright, but the basic idea is that any class should be created by injection, either via the appropriate constructor, or a provider. If we follow the guideline, than any node in the object graph may be replaced via a Guice mechanism. This is quite useful in tests. If an object is created in the configure() method, it can't be overridden.
  2. use @Provides judiciously. Ask why--a provider makes clear what params you are getting in injected, allows for optional parameters via setX() methods with the optional=true annotation, and are generally more clear. They also promote modularity by having you make classes that you eventually even move to top level clases.

Basically, @Provides straddles a fine line between making the object itself @Inject with annotated constructors (maybe you don't want to and this can substitute for that) and providers that start to do anything non-trivial. If you can't make the class itself injectable, but the provider would be absolutely trivial (no optional params, not more than one instance with varied constructor args, etc), then you found the sweet spot.