diff --git a/aboutcode-data/README.rst b/aboutcode-data/README.rst
new file mode 100644
index 0000000..9beb9cd
--- /dev/null
+++ b/aboutcode-data/README.rst
@@ -0,0 +1,1202 @@
+ABCD: Using the AboutCode Data structures to describe data ... about code.
+==========================================================================
+
+Summary
+-------
+
+ABCD is an abbreviation for ABout Code Data. The AboutCode Data goal is
+to provide a simple, standardized and extensible way to document data
+about software code such that:
+
+-  It is a common way to exchange data about code between any nexB tools
+   by import and export.
+
+-  It becomes the preferred way to exchange data between nexB tools and
+   other tools.
+
+-  It could become a valuable structure to exchange data between any
+   tools concerned with data about software.
+
+ABCD is technology and programming language neutral, preferring JSON or
+YAML document formats.
+
+ABC Data is structured around a few basic objects: Products, Components,
+Packages, Files, Parties and Licenses. It is extensible to other
+specific or future object types.
+
+Objects have "attributes" that are simple name/value pairs. A value can
+be either a plain value or another object or a list of objects and
+attributes.
+
+ABC Data is minimally specified by design: only a few basic objects and
+attributes are documented with conventions to name and structure data
+and how to define relationships between objects. There is only a small
+reference dictionary for some well known attributes documented here.
+
+The planned benefit for tools using ABC Data is simplified data exchange
+and integration between multiple best-of-breed tools.
+
+Context
+-------
+
+There is currently no easy way to describe information about code and
+software in a simple and standardized way. There have been many efforts
+to provide this data in a more structured way such as:
+
+-  SPDX (focused on packages and licenses),
+-  DOAP (focused on projects),
+-  The original ABOUT metafile format, and
+-  The many different software package metadata formats (Maven, NPM,
+   RPM, Deb, etc).
+
+These data structures are fragmented and generally too purpose- or
+technology-specific.
+
+Recently there have been efforts to collect and expose more data such
+as:
+
+-  libraries.io (a catalog of packages, AGPL-licensed) 
+   and dependencyci.com its companion commercial service,
+-  versioneye.com (a catalog of package versions updates, now
+   MIT-licensed),
+-  softwarearchive.org (an effort to build an all-encompassing software
+   source code archive),
+-  sources.debian.net (a Debian-focused code and metadata serach
+   facility),
+-  searchcode.com (an add-supported source code search engine exposing
+   some metadata)
+-  appstream (a cross-distro effort to normalize desktop package
+   metadata access to Linux desktops
+   https://www.freedesktop.org/software/appstream/docs/).
+
+These efforts are all useful, but they do not address how you can
+consistently exchange data about code in a user-centric and
+technology-neutral, normalized way.
+
+Why does this matter? Software and code are everywhere. FLOSS code is
+exploding with millions of components and packages.  The data about this
+code is out there somewhere but getting it is harder than needed. This a
+problem of data normalization, aggregation and exchange.  
+
+Whether you consume or produce software, accessing and creating
+normalized data about your code and the code you use should be made
+easier such that:
+
+-  You can efficiently make code selection and re-use decisions,
+
+-  You can discover what is in your code, and
+
+-  You can continuously track updates, bugs, licenses, liveliness,
+   quality and security attributes of the code you use or consider
+   using.
+
+With the notable exceptions of SPDX and the earlier versions of the
+ABOUT format, available data formats about software have been designed
+first for a specific technology (e.g. Linux distros) or programming
+language (e.g. maven, npm, etc.) and documentation of code provenance
+and related attributes has been secondary and minimal. In most cases,
+the primary focus has been to provide first comprehensive support for
+package installation, dependency resolution or package building and
+provenance and licensing information is often treated with lesser
+details.
+
+
+ABCD: the AboutCode Data structure
+----------------------------------
+
+ABCD is an abbreviation for ABout Code Data. The goal is to provide a
+simple, standardized and extensible way to document things about
+software code.
+
+In contrast with other approaches, the AboutCode Data structure is
+focused on providing data that is useful to users first and is not
+limited to software package data only. AboutCode Data does not need not
+be as strictly specified as traditional package manager data formats
+because its purpose is not to drive a software build, package creation
+or software installation nor is it to compute the resolution of
+dependencies. It only provides information (metadata) about the code.
+
+The vision for the ABC Data structure is to provide a common way to
+exchange data about code between all nexB tools, such that these tools
+can all import and export data about code seamlessly (TraceCode,
+ScanCode, AboutCode Manager, AttributeCode, upcoming MineCode, etc.).
+The ABCD structure should also be the preferred way to exchange data
+about code between nexB tools and other tools. We may create small
+adapters to convert other data formats in and out of the ABCD structure
+and encourage other tool authors to natively support ABC Data though
+the main focus is on our tools.
+
+The ABCD structure is technology and programming language neutral and
+designed so that the parties exchanging data about code can do so
+reliably with some minimal conventions; and that the data is easily
+processed by machines and not hard to read by humans.
+
+ABC Data is structured around "objects". Objects have "attributes" that
+are simple name/value pairs. A value can be either a plain value or
+another object or a list of nested objects and attributes.
+
+ABC Data is organized around:
+
+-  a few known object types,
+
+-  simple conventions to create lists of objects and describe object
+   relationships,
+
+-  simple conventions to create attributes as name/value pairs, and
+
+-  a small dictionary or vocabulary of well-known attribute names that
+   have a common definition across all tools.
+
+ABC Data is "under-specified" by design: only a few essential objects
+and attributes are documented here with the conventions on how to
+structure the ABC Data.
+
+
+Basic objects describing data about code
+----------------------------------------
+
+At the top level we have these main object types:
+
+-  Product(s): a software product, application or system such as a Cost
+   Accounting application.
+
+-  Component(s): a software component, such as the PostgreSQL 9 database
+   system, usually a significant or major version
+
+-  Package(s): a set of files that comprise a Component as used, such as
+   a postgresql-9.4.5-linux-x86.zip archive. The version is exact and
+   specific.
+
+-  File(s): any file and directory identified by a path, such as a
+   binary package or a source code directory or file.
+
+
+And these secondary, important but less prominent object types:
+
+-  Party(ies): a person or an organization. An organization can be a
+   project, formally or informally organized, a company, a department
+   within a company, etc. A Party typically has contact information
+   (such as an email or physical address or home url). A Party may have
+   defaults that apply to much of its software (for an org that creates
+   software) such as a default Apache license for Apache Foundation
+   projects. Parties often relate to other objects through a
+   role relationship such as owner, author, maintainer, etc.
+
+-  License(s): information about the license of code. A License
+   typically has a name and text and additional categories, tags or
+   attributes.
+
+
+Each of these objects has a few identifying attributes and eventually
+many tool- or application-specific data attributes. Each tool defines
+and documents the attributes they can handle and care for. When some
+agreement is reached on the definition of new attributes or objects, the
+ABCD dictionary may be updated accordingly with new objects types such
+as for software security, quality or other interesting aspects.
+
+Objects are interrelated with other objects. Objects can relate to each
+other via a reference using identifiers pointing to other objects or via
+an embedded list of objects. The nature of the relationship between two
+objects can also be specified with additional attributes as needed.
+
+Here are typical relationships between objects:
+
+|image1|
+
+Here is an example of relationships for a simple Widget product:
+
+|image2|
+
+Tools can define any custom objects and some used more commonly may be
+promoted to be documented here over time.
+
+
+Attribute Names and Values
+--------------------------
+
+By convention, a tool receiving ABC Data should process only the data it
+knows and should ignore unknown attributes or objects. This is important
+to allow the data structure to evolve and provide some forward and
+backward compatibility. When an ABCD payload contains data elements that
+a receiver does not know about, the receiver should still be able to
+process the known objects and attributes.
+
+-  Attributes are name/value pairs.
+
+-  Attribute names are always strings, not numbers, not booleans, not any
+   other data format. In these strings, leading and trailing white spaces
+   (spaces, tabs, line returns, etc) are not significant and can be safely
+   ignored or removed.
+
+-  Attribute values are one of the standard JSON types: string, number,
+   boolean or null. In strings, leading and trailing white spaces (spaces,
+   tabs, line returns, etc) are not significant and can be safely ignored
+   or removed.
+
+-  Self-explicit names should be used rather than obscure names or
+   abbreviations: names should be self-explicit and self-evident.
+
+Except for the data organization conventions described here and the use
+of the well-known object and attribute names, nothing is mandatory in
+the ABCD format. This means that even partial, incomplete or sketchy
+data about code can be transferred in this format.
+
+The meaning of well known object names such as Product, Component,
+Package, File, Party and License is defined in this document.
+
+
+Name conventions
+----------------
+
+-  Names are strings composed ONLY of ASCII letters, numbers or
+   underscores. Names cannot start with a number. Names cannot contain
+   spaces nor other punctuation, not even a dot or period.
+
+-  Names are NOT case sensitive: upper or lowercase does not matter and
+   the standard is to use lowercase. It is a mistake to use upper or
+   mixed case but this is something a parser receiving ABC Data should
+   recover from nicely by converting the names to lowercase.
+
+-  Names are made of English words: there is no provision currently for
+   non-English names. Tools that deal with multilingual content may
+   define their own conventions to provide content in other languages.
+   ABCD may add one of these conventions in the future.
+
+-  Parser implementation can be smarter and gentler: For names, anything
+   that is not ASCII or number or underscore can be accepted by a parser
+   and could be replaced by an underscore, including a starting digit if
+   any. Or a parser may provide a warning if there is an unknown name
+   that is very close to a well known name. Or a parser may accept
+   CamelCase and transform names to underscore_case and perform another
+   transformation to conventional ABC Data.
+
+-  Names are singular or plural: When a name refers to more than one
+   item, the name of the field is plural and the value is a list of
+   values. For instance "url" and "urls".
+
+-  Top level known objects are ALWAYS plural and stored in lists:
+   "parties" or "files" or "products" or "components". This makes it
+   easier to write tools because the top level types are always lists,
+   even when there is a single object in that list.
+
+-  A value must not be used as a name: in an attribute name/value pair,
+   the name is always a name, not a value and every value must have a
+   name.
+
+-  For instance, this JSON snippet would not be correct where a URL is
+   used as a name::
+
+       {"http://someurl.com": "this is the home URL"}
+
+-  Use rather this form to specify a name for the URL attribute::
+    
+       {"url": "http://someurl.com", "note": "this is the home URL"}
+
+-  But this would be correct when using a list of plain values where
+   "urls" is plural::
+    
+       {"urls": ["http://someurl.com", "http://someurl2.com"]}
+
+-  An attribute names without a value is not needed. Only names with
+   values are needed, and attributes without values can be omitted: each
+   tool may do what it wants for these cases. For instance it may handy
+   to provide all attributes even if not defined in an API payload. But
+   when serializing data as YAML meant for human editing, including all
+   empty values may not help with reading and processing the YAML text.
+   An undefined attribute without a set value should be assigned with
+   the null JSON value: this has the same meaning as if the attribute
+   was not specified and absent from the payload. If you want to specify
+   that an attribute has an empty value and does not have a value (as
+   opposed to have an unknown value) use an empty string instead.
+
+-  Avoid abbreviated names, with some exceptions: names should be always
+   fully spelled out except for:
+
+    -  url: uniform resource locator
+    -  uri: uniform resource identifier
+    -  urn: uniform resource name
+    -  vcs: version control system
+    -  uuid: universally unique identifier, used for uuid4 string
+       `https://tools.ietf.org/html/rfc4122.html <https://tools.ietf.org/html/rfc4122.html&sa=D&ust=1487355496775000&usg=AFQjCNFPvpqA_MFbGaOmykUF8a5GGUKRSw>`__ 
+    -  id: identifier
+    -  info: information
+    -  os: operating system
+    -  arch: architecture
+
+-  For some common names we use the common compound form such as:
+
+    -  codebase: and not code_base
+    -  filename: and not file_name
+    -  homepage: and not home_page
+
+Well known attribute names include:
+
+-  name: the name of a product, component, license or package.
+-  version: the version of a product, component, package.
+-  description: description text.
+-  type: some type information about an object. For instance, a File
+   type could be: directory, file or link.
+-  keywords: a list of keywords about an object. For example, the
+   keywords of a component used to "tag" a component.
+-  path: the value is the path to a file or directory, either absolute
+   or relative and using the POSIX convention (a forward slash as
+   separator). For Windows paths, replace backslash with forward
+   slashes. Directories should end with a slash in a canonical form.
+-  key: the value is some key string, slug-like, case-insensitive and
+   composed only of ASCII letters and digits, dash, dot and underscore.
+   No white spaces. For example: org.apache.maven-parent
+-  role: the value describes the role of a Party in a relationship with
+   other objects. For instance a Party may be the
+   "owner" or "author" of a Component or Package.
+-  uuid: a uuid4 string
+   `https://tools.ietf.org/html/rfc4122.html <https://tools.ietf.org/html/rfc4122.html>`__ 
+-  algorithms for checksums: to store checksums we use a name/value
+   pairs where the name is an algorithm such as sha1 and the value is a
+   checksum in hexadecimal such as "sha1": "asasa231212" . The value is
+   the standard/default string created by command line tools such as
+   sha1sum. Supported algorithm may evolve over time. Common checksums
+   include md5, sha1, sha256, sha512.
+-  notes: some text notes. This is an exception to the singular/plural
+   rule for names: notes is a single text field and not a list.
+
+As the usage of the ABCD structure matures, more well known names will
+be documented in a vocabulary.
+
+
+Value conventions
+-----------------
+
+-  Attribute values are one of the standard JSON types: string, number,
+   boolean or null. In strings, leading and trailing white spaces
+   (spaces, tabs, line returns, etc) are not significant and can be
+   safely ignored or removed.
+
+-  To represent a date/time use the ISO format such as 2016-08-15
+   defaulting to UTC time zone if the time zone is not specified in the
+   date/time stamp.
+
+-  All string values are UTF-8 encoded.
+
+
+Well known name prefixes or suffixes can be used to provide a type hint
+for the value type or meaning:
+
+-  xxx_count, xxx_number, xxx_level: the value is an integer number.
+   Example: results_count or curation_level
+
+-  date_xxx or xxx_date: the value is a date/time stamp in ISO format
+   such as 2016-08-16 (See https://www.ietf.org/rfc/rfc3339.txt ).
+   Examples: last_modified_date, date_created
+
+-  xxx_url: the value is a URL for web http(s) or ftp url that points
+   to an existing valid web resource (that could possibly no longer
+   exist on the web). Example: homepage_url or api_url
+
+-  xxx_uri: the value is a URI typically used as an identifier that may
+   not point to an existing web resource. Example:
+   git://github.com/nexb/scancode-toolkit
+
+-  xxx_file or xxx_path: the value is a file path. This can come handy
+   for external files such as a license file. Example: notice_file
+
+-  xxx_filename: the value is a file name. Example: notice_filename
+
+-  xxx_text: the value is a long text. This is only a hint that it may
+   be large and may span multiple lines. Example: notice_text
+
+-  xxx_line: such as start_line and end_line: the value is a line
+   number. The first line number is 1.
+
+-  xxx_status: such as configuration_status. Indicates that the value
+   is about some status.
+
+-  xxx_name: such as short_name. Indicates that the value is a name.
+   Commonly used for long_name, short_name. The bare name shout be
+   preferred for the obvious and most common way an object is named.
+
+-  xxx_flag, is_xxx, has_xxx: such as is_license_notice. Indicates
+   that the string value is a boolean.
+
+
+Object identifiers
+------------------
+
+We like objects to be identifiable: there is a natural way to identify
+and name most objects: for instance the full name of a person or
+organization or the name and version of a Component or Package or the
+path to a File are all natural identifiers to an object.
+
+However, natural names are not always enough to fully identify an object
+and may need extra context to reference an object unambiguously. There
+could be several persons or organizations with the same name at a
+different address.. Or the foo-1.4 Package could be available as a
+public RubyGem and also as an NPM; or a private Python package foo-1.4
+has been created by a company and is also available on Pypi. Or the
+"foo" Package is the name of a Linux Package, an NPM and a Ruby Package
+but these three packages are for unrelated components.
+
+Hence each object may need several attributes to be fully identified.
+
+For example, public package managers ensure that a name is unique within
+the confines of a source. "logging" is the unique name of a single
+Sourceforge project at 
+`https://sourceforge.net/projects/logging/ <https://sourceforge.net/projects/logging/>`__ .
+"logging" is the unique name of an Apache project at the Apache
+Foundation `http://logging.apache.org/ <http://logging.apache.org/>`__ .
+
+Yet, these two names point to completely different software. In most
+cases, providing information about the "source" where an identifier is
+guaranteed to be unique is enough to ensure proper identification. This
+"source" is easily identified by its internet source name, and an
+internet source name is guaranteed to be unique globally. The "source"
+of identifiers is not mandatory but it is strongly encouraged to use as
+an attribute to provide good unique identifiers: still, tools exchanging
+ABC Data must be able to exchange under-specified and partially
+identified data and may sometimes rely on comparing many attributes of
+two objects to decide if they are the same.
+
+The minimal way to identify top level objects is the combination of a
+"source" and a unique identifier within this source. The source can be
+implicit when two parties are exchanging data privately or explicit
+using the "source" attribute.
+
+Within a source, we use the most obvious and natural identifies for an
+object. For example:
+
+-  For Products, Components and Packages we can use their name and
+   version.
+
+-  For Files we use a path of a file or directory, possibly relative to
+   a package or a product codebase; or a checksum of a file or archive
+   such as a sha1.
+
+-  For Parties, we use a name possibly supplemented with a URL or email.
+
+-  For all object types we can use a "universally unique id" or UUID-4
+   (https://tools.ietf.org/html/rfc4122.html)
+
+-  For all object types, we can use a key, which is a slug-like string
+   identifier such as a license key.
+
+-  For all object types, we can use a URN
+   (https://en.wikipedia.org/wiki/Uniform_resource_name) Tools may
+   also define their own URNs namespaces and names such as a DejaCode
+   urn as is urn:dje:component:16fusb:1.0
+
+
+
+Beyond direct identification, an object may have several alternative
+identifiers aka "external references". For instance a Package may have
+different names and slightly different versions in the Linux Debian or
+Fedora distros and a Pypi Package with yet another name where all these
+Packages are for the same Component and the same code. Or a Party such
+as the Eclipse Foundation may be named differently in DejaCode and the
+NVD CPEs.
+
+To support these cases, the "external_reference(s)" attribute can be
+used where needed in any object to reference one or more external
+identifiers and what is the source for this identifier (note: "external"
+is really a matter of point of view of who owns or produces the ABC
+Data.)  An attribute with name suffix of "xxx_reference" may also be
+used to provide a simpler external reference such as "approval_reference".
+
+
+For example this ABC Data could describe the external id of Party to a
+CPE and to TechnoPedia (here in a YAML format)::
+
+    parties:
+      - name: Apache Foundation
+        homepage_url: http://apache.org
+        type: organization
+        external_references:
+            - source: nvd.nist.gov
+              identifier: apache
+            - source: technopedia.com
+              identifier: Apache Foundation (The)
+            - source: googlecode.com
+              identifier: apache-foundation
+
+Other identifiers may be used as needed by some tools, such as
+in hyperlinked APIs.
+
+
+Organizing data and relationships
+---------------------------------
+
+Describing relationships between objects is essential in AboutCode Data.
+There are two ways to describe these relationship: by referencing or by
+embedding objects.
+
+When using a reference, you relate objects by providing identifiers to
+these objects and may provide additional object details in separate
+lists. When embedding, you include not only the reference but also the
+related object details in another object data. This could include all
+data about an object or a subset as needed.
+
+For example this components list embeds a list of two packages. Note
+that components is always a list even when it has a single component::
+
+    {"components": [{
+        "source": "http://apache.org",
+        "name": "Apache httpd",
+        "version": "2.3",
+        "packages": [
+            {"name": "httpd", 
+             "version": "2.3.4", 
+             "download_url": "http://apache.org/dist/httpd/httpd-2.3.4.zip",
+             "sha1": "acbf23256361abcdf", 
+             "size": 3267, 
+             "filename": "httpd-2.3.4.zip"
+            },        
+    
+            {"name": "httpd", 
+             "version": "2.3.5",
+             "download_url": "http://apache.org/dist/httpd/httpd-2.3.5.tar.gz",
+             "sha1": "ac8823256361adfcdf", 
+             "size": 33267, 
+             "filename": "httpd-2.3.5.tar.gz"
+            }
+        ]
+    }]}
+
+
+In this example, the component list references two packages that are
+listed separately and uses the checksum as package identifiers for the
+reference. This data is strictly equivalent to the previous example but
+using a different layout. When all the data is provided, the effect of
+embedding or referencing objects results in the same data, just
+organized differently::
+
+    {"components": [{
+        "source": "http://apache.org",
+        "name": "Apache httpd",
+        "version": "2.3",
+        "packages": [
+            {"sha1": "aacbf23256361abcdf"},
+            {"sha1": "ac8823256361adfcdf"}
+        ]
+    }],
+    
+    "packages": [
+        {"name": "httpd", "version": "2.3.4",  
+         "download_url":
+         "http://apache.org/dist/httpd/httpd-2.3.4.zip",
+         "sha1": "acbf23256361abcdf", "size": 23267, "filename": "httpd-2.3.4.zip"},
+    
+        {"name": "httpd", "version": "2.3.5", 
+         "download_url": "http://apache.org/dist/httpd/httpd-2.3.5.tar.gz",
+         "sha1": "ac8823256361adfcdf", "size": 33267, "filename": "httpd-2.3.5.tar.gz"}
+    ]}
+
+In this third example the packages are referencing one component
+instead. That component is always wrapped in a components list. The
+component detail data is not provided. The details may be available
+elsewhere in a tool that tracks components::
+
+    "packages": [
+        {"name": "httpd", "version": "2.3.4", 
+         "download_url": "http://apache.org/dist/httpd/httpd-2.3.4.zip",
+         "sha1": "acbf23256361abcdf", "size": 23267, "filename": "httpd-2.3.4.zip",
+         "components": [
+            {"source": "http://apache.org", "name": "Apache httpd", "version": "2.3"}
+         ]
+        },
+    
+        {"name": "httpd", "version": "2.3.5", 
+         "download_url":"http://apache.org/dist/httpd/httpd-2.3.5.tar.gz",
+         "sha1": "ac8823256361adfcdf", "size": 33267, "filename": "httpd-2.3.5.tar.gz",
+         "components": [
+            {"source": "http://apache.org", "name": "Apache httpd", "version": "2.3"}
+         ]
+        }
+    ]
+
+
+Relationships can be documented with this approach in different ways.
+Typically when the primary concern is about a Product, then the Product
+object may embed data about its Components. When the primary concern is
+Packages, they may embed or reference Products or Components or files.
+For example:
+
+-  A tool may prefer to provide data with products or components as top level
+   objects. The components used in a Product are naturally embedded in the products.
+
+-  A tool may be concerned more with files will provide files as top
+   level objects and may embed packages details when they are found for
+   a file or directory path.
+
+-  Another tool may focus on packages and provide packages first with
+   components references and possibly embedded files. A matching tool
+   may provide packages first and reference matched files. The file
+   paths of a package are naturally embedded in the package, though
+   using references may help keep the data simpler when there is a large
+   volume of files
+
+-  A tool that generates attribution documentation may be interested
+   first by components and second by licenses or packages references.
+
+-  A tool dealing with security vulnerabilities may define a
+   Vulnerability object and reference Packages and Files that are
+   affected by a Vulnerability.
+
+To better understand the embedding or referencing relationships:
+
+-  using references is similar to a tabular data layout, akin to a
+   relational database table structure
+
+-  using embedding is similar to a tree data layout such as in a
+   file/directory tree or nested data such as XML.
+
+Another way to think about these relationships is a "GROUP BY" statement
+in SQL. The data can be grouped-by Component, then Packages or
+grouped-by Files then Components.
+
+Both referencing and embedding layouts can be combined freely and are
+not mutually exclusive. When using both at the same time, some care is
+needed to avoid creating documents with conflicting or duplicated data
+that is referenced and embedded at the same time.
+
+Using references is often useful when there is an agreement on how to
+reference objects between two tools or parties. For instance, when using
+nexB tools, a unique and well defined license key is used to reference a
+license rather than embedding the full license details. A concise
+reference to the name and version of a public package from a well known
+package repository such as RPM or Maven can be used to the same effect.
+Or an SPDX license identifier can be used to reference an SPDX-listed
+license without having to embed its full license text.
+
+The nature of the relationship between two objects can be specified when
+it is not obvious and requires some extra specification.  Each tool can
+define additional attributes to document these. For instance a common
+relationship between a party and a product or component is a role such
+as owner. For packages a role can be maintainer, author, etc.  Or the
+license of a file or package may be the "asserted" license by the
+project authors. It may differ from the "detected" license from a scan
+or code inspection and may further differ from a "concluded" license or
+a "selected" license when there is a license choice. At the package and
+license level the types of relationships documented in the SPDX
+specification are a good source for more details. For example this
+component references two parties where one is the author and the other
+is the maintainer documented using a role attribute::
+
+    "components": [{
+        "source": "http://apache.org",
+        "name": "Apache httpd",
+        "version": "2.3",
+        "parties": [
+            {"name": "John Doe", "type": "person", "role": "author"},
+            {"name": "Jane Smith", "type": "person", "role": "maintainer"},
+            {"name": "Jane Smith", "type": "person", "role": "owner"},
+        ]
+    }]
+
+
+Document format conventions
+---------------------------
+
+The default ABC Data format is JSON (though it can be serialized to
+anything else that would preserve its structure). YAML is also supported
+and preferred for storage of simple documents that document one or a few
+top level objects and that need to be edited by a human.
+
+The data structure by nested name/value pairs attributes and lists of
+values maps naturally to the corresponding JSON and YAML constructs. In
+JSON-speak these are arrays (lists) and objects (name/value pairs).
+
+ABC Data can be provided as simple files or embedded in some API
+payload. As files, their content can be either JSON or YAML and should
+have either a .json or .yml extension by convention. For backwards
+compatibility with previous AboutCode conventions, the .ABOUT extension
+can be used for YAML documents. For instance this is used in the legacy
+about_code_tool and its successors. The DocumentCode tool can store
+individual attribution data in a .ABOUT yml file.
+
+The top level structure of an ABC Data block is always a JSON object or
+YAML dictionary. Depending on the context this top level structure may
+be wrapped in another data structure (for instance when exchanging
+AboutCode Data in some web api, the API may provide ABC Data as a
+payload in a "results" or "body" or "data" block and also have some
+"headers" or "meta" block).
+
+The top level elements must contain at least one of the object names and
+a list of objects such as here with a list of files::
+
+    files:
+        - path: this/foo/bar
+          size: 123
+          sha1: aaf35463472abcd
+        - path: that/baz
+
+Optionally an "aboutcode_version" attribute can be added at the top
+level to document which version of the AboutCode Data structure is used
+for a document. For example: aboutcode_version: 4.0
+
+Order of attributes matters to help reading documents: tools that write
+ABC Data should attempt to  use a consistent order for objects and
+attribute names rather than a random ordering. However, some tools may
+not be able to set a specific order so thi is only a recommendation. The
+preferred order is to start with identifiers and keys and from the most
+important to the least important attributes, followed by attributes
+grouped logically together,  followed by related objects.
+
+
+References between documents and payload, embedding other files
+---------------------------------------------------------------
+
+ABC Data may reference other data. For instance in a hyperlinked REST
+API a list of URLs to further invoke the API and get licenses details
+may be provided with an api_url attribute to identify which API calls
+to invoke. The ways to reference data and the semantics and mechanics of
+each of these embeddings or references needed to get the actual data are
+not specified here. Each tool may offer its own mechanism. A convention
+for an hyperlinked REST API JSON payload could be to use
+api_url(s) identifier to specify additional "GET"able endpoints. The
+AttributeCode tool use \*_file attributes in YAML or JSON documents
+to reference external license and notices text files to load with the
+text content.
+
+Another convention is used in ScanCode to reference license texts and
+license detection rules by key:
+An ABC Data YAML file contains the ABC Data. And side by side there is a
+file with the same base name and a LICENSE, SPDX or NOTICE, RULE,
+extension that contains the actual text corresponding to the license,
+the SPDX text or the rule text. The convention here is to use an
+implicit reference between files because they have the same base name
+and different extensions.
+
+In the future, we may specify how to embed an external ABC Data file in
+another ABC Data file; this would only apply to file-based ABC Data
+payload though and could not apply to hyperlinked REST APIs.
+
+
+Document-as-files naming, exchange and storage
+----------------------------------------------
+
+Each tool handling ABC Data may name an ABC Data file in any manner and
+store the data in any way that is appropriate. The structure is a set of
+data exchange conventions and may be used for storage but nothing is
+specified on how to do this.
+
+For consistency, tools consuming AboutCode Data are encouraged to use
+the same data structure internally and in their user interface to
+organize and name the data, but this is only a recommendation.
+
+For instance, the AtttributeCode tool use a convention to store ABC Data
+as YAML in a file with an .ABOUT extension and use the ABC Data structures
+internally and externally.
+
+When exchanging data (for instance over an API) the API provider of ABC
+Data should support a request to return either embedded data or data by
+reference and ideally allow the caller to specify which objects and
+attributes it is interested in (possibly in the future using something
+like GraphQL).
+
+When interacting with tools through an  API, the conversation could
+start by sending an ABC Data payload with some extra request data and
+receiving an ABC Data payload in return. For instance, when requesting
+matching packages from a matching tool, you could start by passing scan
+data with checksums for several files at once and receive detailed data
+for each of the matched files or packages.
+
+
+Documenting and validating attributes
+-------------------------------------
+
+Each tool handling ABC Data may only be interested in processing certain
+objects and attributes when accepting data in, or when providing data
+out. Attributes that are unknown should be ignored. To document which
+objects and which attributes a tool can handle, a tool should provide
+some documentation. The documentation format is not specified here, but
+it could use a JSON schema in the future. This should include
+documentation regarding if and how data is validated, and when and how
+errors or warnings are triggered and provided when there is a validation
+error. For example, a validation could be to check that an SPDX license
+id exists at SPDX or that a URL is valid.
+
+
+Notes on YAML format
+--------------------
+
+YAML is the preferred file format for ABC Data destined for reading or
+writing primarily by humans.
+
+-  Block-style is better.
+
+-  When you write AboutCode Data as YAML, you should privilege block-style
+   and avoid flow-style YAML which is less readable for humans.
+
+-  Avoid Multi-document YAML.
+
+-  Multi-document YAML documents should be avoided (when using the ---
+   separators).
+
+-  Beware of parser shenanigans:  Most YAML parsers recognize and convert 
+   automatically certain data types such as numbers, booleans or dates. 
+   You should be aware of this because the ABC Data strings may contain
+   date stamps. You may want to configure a YAML parser to deactivate some
+   of these automated format conversions to avoid unwanted conversions.
+
+
+Notes on JSON Format
+--------------------
+
+JSON is the preferred file format for ABC Data destined for reading and
+writing primarily by machines.
+
+- "Streamable"  JSON with JSON-lines.
+
+A large JSON document may benefit from being readable line-by-line
+rather than loaded all at once in memory. For this purpose, the
+convention is to use JSON lines where each line in the document is a
+valid JSON document itself: this enables reading the document in
+line-by-line increments. The preferred way to do so is to provide one
+ABCD top level object per document where the first line contains meta
+information about the stream such as a notice, a tool version or  the
+aboutcode version.
+
+- Avoid escaped slash.
+
+The JSON specification says you CAN escape forward slash, but this is
+optional. It is best to avoid escaping slash when not needed for better
+readability.
+
+For instance for URLs this form::
+
+    "https://enterprise.dejacode.com/component_catalog/nexB/16fusb/1.0/"
+
+should be preferred over this escaped form when backslashes are not
+needed::
+
+    "https:\\/\\/enterprise.dejacode.com\\/component_catalog\\/nexB\\/16fusb\\/1.0\\/"
+
+
+Notes on embedding ABC Data in source code files.
+-------------------------------------------------
+
+It could be useful to include ABC Data directly in a source code file,
+such as to provide structured license and provenance data for a single
+file. This requires of course a file modification. While this is not a
+preferred use case, it can be handy to document your own code one file
+at a time. Using an external ABC Data file should be preferred but here
+are conventions for this use case:
+
+-  The ABC Data should be embedded in a top level block of comments.
+-  Inside that block of comments the preferred format is YAML.
+-  How a tool collects that ABC Data when embedded in code is to be
+   determined.
+-  Tools offering such support should document and eventually enforce
+   their own conventions.
+
+
+Notes on spreadsheet and CSV files
+----------------------------------
+
+ABC Data does not support or endorse using CSV or spreadsheets for data
+exchange.
+
+CSV and other spreadsheet file formats are NOT recommended to store ABC
+Data. In most cases you cannot store a correct data set in a spreadsheet.
+However, these tools are also widely used and convenient.
+Here are some recommendations when you need  to communicate ABC data in
+a CSV or spreadsheet format: even though ABC Data is naturally nested
+and tree-like, it should be possible to serialize certain ABCD objects
+as flat, tabular data.
+
+-  Naming columns
+
+The table column names may need to be adjusted to correctly reference
+the multiple level of object and attribute nesting using a dot as a
+separator. The dot or period is otherwise not allowed in attribute
+names. For example, you could use files.path for files or
+components.name to reference a component name. Some tools may prefer to
+create tabular files with their own column names and layout, and provide
+mappings to ABC Data attribute and object names.
+
+-  Example for an inventory:
+
+Since ABC Data can be related by reference, the preferred (and
+cumbersome) way to store ABC Data in a spreadsheet is to use one tab for
+each object type and use identifying attributes to relate objects
+between each others across tabs. For instance, in a Bill of Materials
+(BOM) spreadsheet for a Product, you could use a tab to describe the
+Product attributes and another tab to describe the Components used in
+this Product and possibly additional tabs to describe the related
+packages and files corresponding to these
+
+-  Care is needed for Packages, Components and other names and for dates,
+   versions, unicode and UTF-8 to avoid damaging content (aka. mojibake)
+
+Spreadsheet tools such as Excel or LibreOffice  automatically recognize
+and convert data to their own format: a date of 20016-08-17 may be
+converted to a date number when a CSV is loaded and difficult to recover
+as a correct original date stamp string afterwards. Or a version 1.0 may
+be irreversibly converted to 1 or 1.90 to 1.9 losing important version
+information.
+
+Spreadsheet tools may not recognize and handle properly UTF-8 texts and
+damage descriptions and texts. These tools may also treat strings
+starting with the equal sign as a formula. When incorrectly recognizing
+special accentuated characters this may damage texts creating what is
+called "mojibake" (See https://en.wikipedia.org/wiki/Mojibake)
+
+Always use these tools with caution and be prepared for damage to your
+data if you use these tools to save or create ABC Data.
+
+
+Impact on AttributeCode
+~~~~~~~~~~~~~~~~~~~~~~~
+
+As an integration tool, AttributeCode itself may specify only a very few elements.
+
+The new structure will need to be implemented. Here could be an example
+in YAML::
+
+    aboutcode_version: 4.0
+    components:
+     -  source: dejacode.com
+        name: bitarray
+        version: 0.8.1
+        homepage_url: https://github.com/ilanschnell/bitarray
+        copyright: Copyright (c) Ilan Schnell and others
+        files:
+            - path: some/directory/
+               type: dir
+            - path: bitarray-0.8.1-cp27-cp27m-macosx_10_9_intel.whl
+            - path: someotherdir/bitarray-0.8.1-cp27-cp27m-manylinux1_i686.whl
+            - path: bitarray-0.8.1-cp27-cp27m-manylinux1_x86_64.whl
+            - path: bitarray-0.8.1-cp27-cp27m-win_amd64.whl
+            - path: bitarray-0.8.1-cp27-cp27m-win32.whl
+            - path: bitarray-0.8.1-cp27-cp27mu-manylinux1_i686.whl
+            - path: bitarray-0.8.1-cp27-cp27mu-manylinux1_x86_64.whl
+            - path: bitarray-0.8.1-cp27-none-macosx_10_6_intel.whl
+            - path: bitarray-0.8.1.tar.gz
+    
+        parties:
+          - role: owner
+            name: Ilan Schnell
+    
+        packages:
+          - download_url: http://pypi.python.org/packages/source/b/bitarray/bitarray-0.8.1.tar.gz
+            sha1: 468456384529abcdef342
+    
+        license_expression: psf
+    
+        licenses:
+          - source: scancode.com
+            key: psf
+            text_file: PSF.LICENSE
+
+
+And here would be similar data in JSON::
+
+    {"components": [{
+                   "name": "bitarray",
+                   "version": "0.8.1"
+                   "homepage_url": "https://github.com/ilanschnell/bitarray",
+                   "copyright": "Copyright (c) Ilan Schnell and others",
+                   "license_expression": "psf",
+                   "licenses": [{"key": "psf", "text_file": "PSF.LICENSE", "source": "scancode.com"}],
+                   "packages": [{"download_url": "http://pypi.python.org/packages/source/b/bitarray/bitarray-0.8.1.tar.gz"
+                                "sha1": "468456384529abcdef342"
+                    }],
+                   "parties": [{"name": "Ilan Schnell", "role": "owner"}],
+    
+                   "files": [{"path": "some/directory/", "type": "dir"},
+                             {"path": "bitarray-0.8.1-cp27-cp27m-macosx_10_9_intel.whl"},
+                             {"path": "bitarray-0.8.1-cp27-cp27m-manylinux1_i686.whl"},
+                             {"path": "bitarray-0.8.1-cp27-cp27m-manylinux1_x86_64.whl"},
+                             {"path": "bitarray-0.8.1-cp27-cp27m-win_amd64.whl"},
+                             {"path": "bitarray-0.8.1-cp27-cp27m-win32.whl"},
+                             {"path": "bitarray-0.8.1-cp27-cp27mu-manylinux1_i686.whl"},
+                             {"path": "bitarray-0.8.1-cp27-cp27mu-manylinux1_x86_64.whl"},
+                             {"path": "bitarray-0.8.1-cp27-none-macosx_10_6_intel.whl"},
+                             {"path": "bitarray-0.8.1.tar.gz"}],
+                   }],
+    
+     aboutcode_version: "4.0"}
+
+
+Impact on ScanCode Toolkit
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The new format will need to be implemented for scan results in general
+and for packages in particular.
+
+ScanCode will specify Package and several attributes related to scanning
+and referencing clues for files, directories and packages.
+
+Alternatively Packages could be extracted to an independent PackagedCode library.
+
+The changes will minimal impact onthe layout of the scan results. Here is an
+example of a scan payload in ABCD format: this is essentially the standard scan
+format::
+
+    {
+      "scancode_notice": "Generated with ScanCode and provided .......",
+      "scancode_version": "2.0.0.dev0",
+      "files_count": 7,
+      "files": [
+        {
+          "path": "samples/JGroups/src/",
+          "type": "directory",
+          "files_count": 29
+          "licenses" : [
+              { "key":"apache-2.0",
+                "concluded": true}
+          ]
+        }
+        {
+          "path": "samples/JGroups/src/GuardedBy.java",
+          "date": "2015-12-10",
+          "programming_language": "Java",
+          "sha1": "981d67087e65e9a44957c026d4b10817cf77d966",
+          "name": "GuardedBy.java",
+          "extension": ".java",
+          "file_type": "ASCII text",
+          "is_text": true,
+          "is_source": true,
+          "md5": "c5064400f759d3e81771005051d17dc1",
+          "type": "file",
+          "is_archive": null,
+          "mime_type": "text/plain",
+          "size": 813,
+          "copyrights": [
+            {
+              "end_line": 12,
+              "start_line": 9,
+              "holder": "Brian Goetz and Tim Peierls",
+              "statement": "Copyright (c) 2005 Brian Goetz and Tim Peierls"
+            }
+          ],
+          "licenses": [
+            { "detected": true,
+              "key": "cc-by-2.5",
+              "short_name": "CC-BY-2.5",
+              "homepage_url": "http://creativecommons.org/licenses/by/2.5/",
+              "dejacode_url": "https://enterprise.dejacode.com/license_library/Demo/cc-by-2.5/",
+              "text_url": "http://creativecommons.org/licenses/by/2.5/legalcode",
+              "owner": {
+                "name": "Creative Commons",
+              },
+              "detection_score": 100.0,
+              "start_line": 11,
+              "end_line": 11,
+              "category": "Attribution",
+              "external_reference": {
+                "source": "spdx.org",
+                "key": "CC-BY-2.5"
+                "url": "http://spdx.org/licenses/CC-BY-2.5",
+              },
+            }
+          ],
+        },
+        {
+          "path": "samples/JGroups/src/ImmutableReference.java",
+          "date": "2015-12-10",
+          "md5": "48ca3c72fb9a65c771a321222f118b88",
+          "type": "file",
+          "mime_type": "text/plain",
+          "size": "1838",
+          "programming_language": "Java",
+          "sha1": "30f56b876d5576d9869e2c5c509b08db57110592",
+          "name": "ImmutableReference.java",
+          "extension": ".java",
+          "file_type": "ASCII text",
+          "is_text": true,
+          "license_expression": "lgpl-2.1-plus and lgpl-2.0-plus",
+          "is_source": true,
+          "copyrights": [{
+            "end_line": 5,
+            "start_line": 2,
+            "holder": "Red Hat, Inc.",
+            "statement": "Copyright 2010, Red Hat, Inc."
+          }],
+          "licenses": [
+            { "detected": true,
+              "key": "lgpl-2.1-plus",  
+              "category": "Copyleft Limited",
+              "homepage_url": "http://www.gnu.org/licenses/old-licenses/lgpl-2.1-standalone.html",
+              "start_line": 7,
+              "end_line": 10,
+              "short_name": "LGPL 2.1 or later",
+              "owner": "Free Software Foundation (FSF)",
+              "dejacode_url": "https://enterprise.dejacode.com/license_library/Demo/lgpl-2.1-plus/",
+              "detection_score": 100.0,
+              "external_reference": {
+                "url": "http://spdx.org/licenses/LGPL-2.1+",
+                "source": "spdx.org",
+                "key": "LGPL-2.1+"
+              }
+            },
+            { "concluded": true,
+              "key": "lgpl-2.0-plus",
+              "short_name": "LGPL 2.0 or later",
+              "homepage_url": "http://www.gnu.org/licenses/old-licenses/lgpl-2.0.html",
+              "end_line": 20,
+              "dejacode_url": "https://enterprise.dejacode.com/license_library/Demo/lgpl-2.0-plus/",
+              "text_url": "http://www.gnu.org/licenses/old-licenses/lgpl-2.0-standalone.html",
+              "owner": "Free Software Foundation (FSF)",
+              "start_line": 12,
+              "detection_score": 47.46,
+              "category": "Copyleft Limited",
+              "external_reference": {
+                "url": "http://spdx.org/licenses/LGPL-2.0+",
+                "source": "spdx.org",
+                "key": "LGPL-2.0+"
+              }
+            }
+          ],
+        },
+        {
+          "path": "samples/arch/zlib.tar.gz",
+          "file_type": "gzip compressed data, last modified: Wed Jul 15 11:08:19 2015, from Unix",
+          "date": "2015-12-10",
+          "is_binary": true,
+          "md5": "20b2370751abfc08bb3556c1d8114b5a",
+          "sha1": "576f0ccfe534d7f5ff5d6400078d3c6586de3abd",
+          "name": "zlib.tar.gz",
+          "extension": ".gz",
+          "size": 28103,
+          "type": "file",
+          "is_archive": true,
+          "mime_type": "application/x-gzip",
+          "packages": [
+            {
+              "type": "plain tarball"
+            }
+          ],
+        }
+      ]
+    }
+
+
+AboutCode Manager
+~~~~~~~~~~~~~~~~~
+
+As a primary GUI for data review and integration, AboutCode Manager
+will need to be fluent in ABC Data to read/write ABC Data locally and
+remotely through API from several sources. 
+
+The short term changes would include:
+
+-  Support reading ABC Data from ScanCode
+-  Writing ABC Data, adding conclusions as related objects in the proper
+   lists
+
+
+New and Future tools
+~~~~~~~~~~~~~~~~~~~~
+
+-  TraceCode: would likely specify low level attributes for files (such
+   as debug symbols, etc) and how files are related from devel to deploy
+   and back.
+-  VulnerableCode: would likely specify a new Vulnerability object and
+   the related attributes and may track several identifiers to the NIST
+   NVD CPE and CVE.
+-  DeltaCode: would likely specify attributes to describe the changes
+   between codebases, files, packages.
+
+Copyright (c) 2016 nexB Inc.
+
+.. |image1| image:: image00.png
+.. |image2| image:: image02.png
diff --git a/aboutcode-data/image00.png b/aboutcode-data/image00.png
new file mode 100644
index 0000000..6982e41
Binary files /dev/null and b/aboutcode-data/image00.png differ
diff --git a/aboutcode-data/image01.png b/aboutcode-data/image01.png
new file mode 100644
index 0000000..5ada6ef
Binary files /dev/null and b/aboutcode-data/image01.png differ
diff --git a/aboutcode-data/image02.png b/aboutcode-data/image02.png
new file mode 100644
index 0000000..8f7f906
Binary files /dev/null and b/aboutcode-data/image02.png differ