Contributing to GISMO

Glenn Thompson edited this page May 20, 2016 · 13 revisions

There are two ways you can contribute to GISMO:

  1. If you have a useful seismic code (whether or not it currently uses GISMO), you are welcome to email it to us. We will be happy to work with you to rebuild it using GISMO best practices.

  2. If you wish to contribute to GISMO directly, you can join the GISMO development team. In practice, you need a GitHub account; once we add you to the GitHub project, you can commit and push your modifications and additions directly.

In addition to these, please cite https://github.com/geoscience-community-codes/GISMO whenever you publish a paper using GISMO.

If you have questions about GISMO functionality, or if you find bugs, please join The GISMO Users Group and post your question / issue there. We will still get the email, but others may be able to help you first (we have day jobs too).


Contents

  • Overview
  • Which contributed archive should I add to?
  • Required elements
  • Recommended elements
  • Packages and Classes
  • Naming conventions

1. OVERVIEW

The directory structure of GISMO is:

core/                    - a set of core classes & packages
contributed/             - contributed codes that depend only on the core classes & MATLAB toolboxes
contributed_antelope/    - contributed codes that also require Antelope
contributed_uaf/         - contributed codes that are likely only of use at the University of Alaska Fairbanks

The core classes are developed & maintained by GISMO's development team. They handle the basics of loading seismic waveform, catalog and instrument response data from a variety of different formats, e.g. SAC, MiniSEED, Seisan, Antelope. Simple visualization and processing of these data are also supported. The core classes have been designed more as building blocks than application software. The emphasis has been on designing robust tools to handle the routine manipulations of seismic data in a standardized fashion. These allow users to develop more glamorous products without concerning themselves with the underlying "bookkeeping" inherent in reading (and writing) seismic waveform data, different event catalog formats, instrument responses, etc.

The contributed archives come from the GISMO user community. They may be written as functions, packages or classes. The contributed archives are designed to help users share higher-level developments that build on the same foundation. When a code is of possible use to a wider audience (not project-specific), we strongly encourage users to become contributors and add these codes to the contributed archives.

This document provides a description of standard protocols for contributed software in the GISMO suite. The REQUIRED category describes the minimum standards for all contributed code. The RECOMMENDED category describes styles and features that are suggested, but not required. The contributed archive is self-policing, or at least follows the honor system. But the motivation for agreeing to a common set of protocols (even if they are not the best protocols) is to create an intuitive and self-consistent codebase that can be used by a wide audience without having to understand the inner workings of the archive.

The contributed motto is "It should just work" - C. Reyes

2. WHICH CONTRIBUTED ARCHIVE SHOULD I ADD TO?

GISMO/contributed

The contributed archive contains codes that have widespread use and have as their only dependencies the core elements of the GISMO suite. Contributed codes may also depend on other contributed codes. This is encouraged for efficiency, though authors should understand that not all contributed codes have the same level of documentation, error handling or bug testing as the core products.

GISMO/contributed_antelope

The contributed_antelope archive contains codes that may have widespread use and require access to the Antelope toolbox for MATLAB in addition to GISMO tools. At startup, these tools are only added if the Antelope toolbox is already in the path. When writing these codes it is useful to make use of the admin.antelope_exists function, e.g.

if ~admin.antelope_exists
    % error() aborts the function, so no return statement is needed after it
    error('This function requires the Antelope toolbox for MATLAB');
end

GISMO/contributed_internal

The contributed_internal archive contains codes that build on the GISMO suite but have additional requirements or functionality that probably limit their use to the University of Alaska's Geophysical Institute.

3. REQUIRED ELEMENTS

Directory structure

Each set of tools belongs in a contributed subdirectory with a descriptive name based on the functionality of the tools. In order to be included in the MATLAB search path, all codes need to be in contributed subdirectories. Do not place files (including .txt, .doc, .pdf, etc.) directly in contributed/. This interferes with the startup routine and they may not be included in the path.

Documentation

The success of the contributed GISMO code relies almost entirely on good documentation. Fortunately this is quite easy in MATLAB. The only non-negotiable requirement for inclusion in the contributed archive is an unambiguous usage statement and an explanation of the input and output variables that is detailed enough for others to actually use the code. This description must be written into the commented header of each M-file so that it can be displayed by MATLAB's help function (see simple example here). Several strongly suggested features of documentation are listed under the RECOMMENDED ELEMENTS section.
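As a minimal sketch of such a header, the function below is purely illustrative: peak_amplitude is a hypothetical name and is not part of GISMO. The comment block directly under the function line is what MATLAB's help function prints, so it carries the usage statement and the description of every input and output.

```matlab
function [amp, idx] = peak_amplitude(data)
%PEAK_AMPLITUDE Find the largest absolute amplitude in a data vector.
%   AMP = PEAK_AMPLITUDE(DATA) returns the largest absolute value in the
%   numeric vector DATA.
%
%   [AMP, IDX] = PEAK_AMPLITUDE(DATA) also returns IDX, the index of the
%   sample at which that amplitude occurs.
%
%   Input:
%       DATA - numeric vector of samples
%
%   Outputs:
%       AMP  - largest absolute amplitude, in the same units as DATA
%       IDX  - index into DATA of the peak sample
%
%   Example:
%       [amp, idx] = peak_amplitude([0.1 -2.5 1.3]);
%
%   See also MAX, ABS.

    % max() returns both the peak value and the position of its first occurrence
    [amp, idx] = max(abs(data));
end
```

With the file saved as peak_amplitude.m on the MATLAB path, typing help peak_amplitude at the prompt displays the header text.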

4. RECOMMENDED ELEMENTS

Directory names

Contributed subdirectories should have descriptive names that tell what the codes have in common. If they have nothing in common, consider placing them in multiple directories. Avoid vague names such as "tools", "seismic_codes" or "mikes_stuff".

Documentation

In addition to the non-negotiable minimum requirements above, several simple features will make contributed codes easier to understand (and troubleshoot) by others.

  • Authors are encouraged to go beyond the function usage and summarize how the code actually works. When implementing a specific technique, it is best practice to include a reference to the paper(s) from which the technique is drawn. Authors are also encouraged to include their name, affiliation, and the contribution or revision date. This will assist with bug tracking and will encourage people to cite your work.

  • The specific style of the usage and comments should follow the standards set by MATLAB. By doing this, authors will ensure that their codes will be properly aggregated by documentation engines such as m2html and others that may not even exist yet.

  • It goes without saying that comments embedded in the code will make it far easier for others to understand the exact procedure.

Functions vs. scripts

Contributed codes should generally be written as functions or classes. There may be limited situations where an M-script might make sense, though even these can usually be converted to functions with the addition of a simple function line at the top. The resulting code, however, is more robust because it is more fully encapsulated. Scripts require, and produce, variables with specific names in the caller's workspace. Functions accept and return values under whatever names the caller chooses. Functions also avoid variable name confusion because their variables are internal to the function.
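To illustrate the conversion, consider a one-line script, data_demeaned = data - mean(data);, which only works if the workspace already holds a variable named data and which leaves data_demeaned behind. A sketch of the same step as a function follows; remove_mean is a hypothetical name used only for this example.

```matlab
function y = remove_mean(x)
%REMOVE_MEAN Subtract the mean from a numeric vector.
%   Y = REMOVE_MEAN(X) returns X with its mean removed.
%
%   Input:
%       X - numeric vector
%
%   Output:
%       Y - vector of the same size as X, with zero mean

    % x and y exist only inside this function; nothing leaks to the caller
    y = x - mean(x);
end
```

The caller is now free to choose any variable names, e.g. centered = remove_mean(trace_samples);, and no intermediate variables are left in the workspace.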

5. PACKAGES AND CLASSES

Packages

To prevent namespace clashes, you are encouraged to make use of MATLAB packages. These are identified by a directory name that begins with "+". For example, an M-file called set_debug.m in the directory +debug is called through its package name, as debug.set_debug() (or by its bare name after an explicit import debug.set_debug). This differentiates it from any other function on the MATLAB path called set_debug.
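As a sketch of how a package function is laid out, assume a hypothetical package directory +mypkg whose parent directory is on the MATLAB path, containing the file +mypkg/scale_counts.m shown below. Both names are invented for illustration and do not exist in GISMO.

```matlab
function y = scale_counts(x, calib)
%SCALE_COUNTS Multiply raw counts by a calibration factor.
%   Y = MYPKG.SCALE_COUNTS(X, CALIB) returns X multiplied by the scalar
%   CALIB. The function lives in the +mypkg package directory, so it is
%   called with the package prefix.

    y = x * calib;
end
```

From outside the package the call is y = mypkg.scale_counts(counts, 0.5);. A different scale_counts.m elsewhere on the path would not conflict with it, because the package prefix keeps the two apart.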

Classes

Object-oriented MATLAB programming is encouraged. If you find yourself using a structure to aggregate related data, e.g. earthquake latitudes, longitudes, origin times, Ml, etc., you will find that a class is what you really want (e.g. the Catalog class). Functions that operate on the structure become methods in your class. You will see that waveform, scnlobject and filterobject are examples of classes in GISMO that use the old style of MATLAB class definition, where each method is in a separate M-file and the directory containing them has a name that begins with "@". The modern way of writing classes is to use the classdef statement. A simple example of a class is given here. Follow some of the examples in GISMO for guidance.
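The sketch below shows the general shape of a classdef file that replaces a structure of latitudes, longitudes and magnitudes. EventList and its methods are hypothetical names chosen for illustration; this is not the GISMO Catalog class and makes no claim about how Catalog is implemented. The file would be saved as EventList.m.

```matlab
classdef EventList
%EVENTLIST Minimal container for earthquake locations (illustrative only).
%   OBJ = EVENTLIST(LAT, LON, MAG) stores equal-length vectors of
%   latitudes, longitudes and magnitudes.

    properties
        lat   % latitudes in decimal degrees (column vector)
        lon   % longitudes in decimal degrees (column vector)
        mag   % magnitudes (column vector)
    end

    methods
        function obj = EventList(lat, lon, mag)
            % Constructor: allow a no-argument call, otherwise check sizes.
            if nargin > 0
                if numel(lat) ~= numel(lon) || numel(lat) ~= numel(mag)
                    error('EventList:sizeMismatch', ...
                        'lat, lon and mag must have the same number of elements');
                end
                obj.lat = lat(:);
                obj.lon = lon(:);
                obj.mag = mag(:);
            end
        end

        function n = numevents(obj)
            % Number of events stored in the object.
            n = numel(obj.mag);
        end

        function sub = subset_by_magnitude(obj, minmag)
            % Return a new EventList holding only events with mag >= minmag.
            keep = obj.mag >= minmag;
            sub = EventList(obj.lat(keep), obj.lon(keep), obj.mag(keep));
        end
    end
end
```

A short usage example: events = EventList([61.2 60.5], [-149.9 -152.7], [2.1 4.3]); followed by big = events.subset_by_magnitude(3); yields an object for which big.numevents() is 1. What would have been free-standing functions operating on a structure are now methods that travel with the data.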

6. FILE/FOLDER NAMING CONVENTIONS

Following the MATLAB style guide, these naming conventions are encouraged:

  • Class M-files: use CamelCase, e.g. "EventRate", "SeismicTrace", "Catalog". The class name must match the M-file name, e.g. the Catalog class is defined in a file called Catalog.m.

  • Packages: use lowercase, e.g. "+debug", "+unittest", "+magnitude"

  • Function M-files: use lowercase, not CamelCase. Where names are long, use "_", e.g. db_get_origins.m.