5288 Introducing proper deps management and docs about it. #5289

Merged
merged 10 commits into from
Nov 30, 2018
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/conf.py
Original file line number Diff line number Diff line change
@@ -40,7 +40,8 @@
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx',
'sphinx.ext.ifconfig',
-'sphinx.ext.viewcode'
+'sphinx.ext.viewcode',
+'sphinx.ext.graphviz'
]

# Add any paths that contain templates here, relative to this directory.
274 changes: 274 additions & 0 deletions doc/sphinx-guides/source/developers/dependencies.rst
@@ -0,0 +1,274 @@
=====================
Dependency Management
=====================

.. contents:: |toctitle|
    :local:

Dataverse is (currently) a Java EE 7 based application that uses many additional libraries for special purposes.
This includes features like support for SWORD-API, S3 storage and many others.

Besides writing the code that glues the individual pieces together, developers need to declare the dependencies they use for the
Maven-based build system. As is familiar to any Maven user, this happens inside the "Project Object Model" (POM) living in
``pom.xml`` at the root of the project repository. Recursive and convergent dependency resolution makes dependency
management with Maven very easy. But sometimes, in projects with many complex dependencies like Dataverse, you have
to help Maven make the right choices.

Terms
-----

As a developer, you should familiarize yourself with the following terms:

- **Direct dependencies**: things *you use* yourself in your own code for Dataverse.
- **Transitive dependencies**: things *others use* for things you use, pulled in recursively.
See also: `Maven docs <https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Transitive_Dependencies>`_.

.. graphviz::

    digraph {
        rankdir="LR";
        node [fontsize=10]

        yc [label="Your Code"]
        da [label="Direct Dependency A"]
        db [label="Direct Dependency B"]
        ta [label="Transitive Dependency TA"]
        tb [label="Transitive Dependency TB"]
        tc [label="Transitive Dependency TC"]
        dtz [label="Direct/Transitive Dependency Z"]

        yc -> da -> ta;
        yc -> db -> tc;
        da -> tb -> tc;
        db -> dtz;
        yc -> dtz;
    }

Direct dependencies
-------------------

Within the POM, any direct dependencies reside within the ``<dependencies>`` tag:

.. code:: xml

    <dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>example</artifactId>
            <version>1.1.0</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>


Anytime you add a ``<dependency>``, Maven will try to fetch it from defined/configured repositories and use it
within the build lifecycle. You have to define a ``<version>``, but ``<scope>`` is optional for ``compile``.
(See `Maven docs: Dep. Scope <https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope>`_)
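
To illustrate a non-default scope, here is a minimal sketch of a test-only dependency. JUnit serves purely as a
familiar example; the artifact and version are assumptions for illustration and are not taken from the Dataverse POM:

.. code:: xml

    <dependencies>
        <!-- Illustrative only: on the classpath for compiling and running tests,
             but not bundled into the deployment package. -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

Because the scope is ``test``, classes from this library are not available to the main application code, which
helps to keep "testing only" libraries out of the package.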


During fetching, Maven will analyse all transitive dependencies (see graph above) and, if necessary, fetch those, too.
Everything downloaded once is cached locally by default, so nothing needs to be fetched again and again, as long as the
dependency definition does not change.

**Rules to follow:**

1. You should only use direct dependencies for **things you are actually using** in your code.
2. **Clean up** direct dependencies no longer in use. Otherwise they bloat the deployment package!
3. Pay attention to the **scope**. Do not include "testing only" dependencies in the package - it will hurt you in IDEs and bloat things. [#f1]_
4. Avoid using different dependencies for the **same purpose**, e.g. different JSON parsing libraries.
5. Refactor your code to **use Java EE** standards as much as possible.
6. When you rely on big SDKs or similar big cool stuff, try to **include the smallest portion possible**. Complete SDK
bundles are typically heavyweight and most of the time unnecessary.
7. **Don't include transitive dependencies.** [#f2]_

* Exception: if you are relying on it in your code (see *Z* in the graph above), you must declare it. See below
for proper handling in these (rare) cases.


Transitive dependencies
-----------------------

Maven is convenient for developers; it handles recursive resolution, downloading, and adding "dependencies of dependencies".
However, as life is a box of chocolates, you might find yourself in *version conflict hell* sooner rather than later,
not even knowing why, but experiencing unintended side effects.

When you look at the graph above, imagine *B* and *TB* rely on different *versions* of *TC*. How does Maven decide
which version it will include? Easy: the version declared nearest to your code in the dependency tree wins
(the *nearest definition*). The same rule applies to *Z*:

.. graphviz::

    digraph {
        rankdir="LR";
        node [fontsize=10]

        yc [label="Your Code"]
        db [label="Direct Dependency B"]
        dtz1 [label="Z v1.0"]
        dtz2 [label="Z v2.0"]

        yc -> db -> dtz1;
        yc -> dtz2;
    }

In this case, version "2.0" will be included. If you know something about semantic versioning, an alarm should go off in your mind right now.
How do we know that *B* is compatible with *Z v2.0* when it was built against *Z v1.0*?

Another scenario that gets us into trouble is the indirect use of transitive dependencies. Imagine the following: we rely on *Z*
in our code, but do not include a direct dependency for it within the POM. Now *B* is updated and drops its dependency
on *Z*. Suddenly *Z* is missing from our classpath and our code no longer builds. You definitely don't want to head down that road.

**Follow the rules to be safe:**

1. Do **not use transitive deps implicitly**: add a direct dependency for transitive deps you re-use in your code.
2. On every build check that no implicit usage was added by accident.
3. **Explicitly declare versions** of transitive dependencies in use by multiple direct dependencies.
4. On every build check that there are no convergence problems hiding in the shadows.
5. **Do special tests** on every build to verify these explicit combinations work.

Managing transitive dependencies in ``pom.xml``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maven can manage versions of transitive dependencies in four ways:

1. Make a transitive-only dependency not used in your code a direct one and add a ``<version>`` tag.
Typically a bad idea, don't do that.
2. Use ``<optional>`` or ``<exclusion>`` tags on direct dependencies that request the transitive dependency.
*Last resort*, you really should avoid this. Not explained or used here.
`See Maven docs <https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html>`_.
3. Explicitly declare the transitive dependency in ``<dependencyManagement>`` and add a ``<version>`` tag.
4. For more complex transitive dependencies, reuse a "Bill of Materials" (BOM) within ``<dependencyManagement>``
and add a ``<version>`` tag. Many larger, widely used projects provide one, making the POM much less bloated
compared to adding every bit yourself.

A reduced example, only showing bits relevant to the above cases and usage of an explicit transitive dep directly:

.. code-block:: xml
    :linenos:

    <properties>
        <aws.version>1.11.172</aws.version>
        <!-- We need to ensure that our chosen version is compatible with every dependency relying on it.
             This is manual work and needs testing, but a good investment in stability and up-to-date dependencies. -->
        <jackson.version>2.9.6</jackson.version>
        <joda.version>2.10.1</joda.version>
    </properties>

    <!-- Transitive dependencies, bigger library "bill of materials" (BOM) and
         versions of dependencies used both directly and transitively are managed here. -->
    <dependencyManagement>
        <dependencies>
            <!-- First example for case 4. Only one part of the SDK (S3) is used and transitive deps
                 of that are again managed by the upstream BOM. -->
            <dependency>
                <groupId>com.amazonaws</groupId>
                <artifactId>aws-java-sdk-bom</artifactId>
                <version>${aws.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <!-- Second example for case 4 and an example for explicit direct usage of a transitive dependency.
                 Jackson is used by AWS SDK and others, but we also use it in Dataverse. -->
            <dependency>
                <groupId>com.fasterxml.jackson</groupId>
                <artifactId>jackson-bom</artifactId>
                <version>${jackson.version}</version>
                <scope>import</scope>
                <type>pom</type>
            </dependency>
            <!-- Example for case 3. Joda is not used in Dataverse (as of writing this). -->
            <dependency>
                <groupId>joda-time</groupId>
                <artifactId>joda-time</artifactId>
                <version>${joda.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <!-- Declare any DIRECT dependencies here.
         In case the dependency is both transitive and direct (e.g. some common lib for logging),
         manage the version above and add the direct dependency here WITHOUT version tag, too.
    -->
    <dependencies>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <!-- no version here as managed by BOM above! -->
        </dependency>
        <!-- Should be refactored and removed once on Java EE 8 -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <!-- no version here as managed above! -->
        </dependency>
        <!-- Should be refactored and removed once on Java EE 8 -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <!-- no version here as managed above! -->
        </dependency>
    </dependencies>


Helpful tools
--------------

Maven provides some plugins that are of great help to detect possible conflicts and implicit usage.

For *implicit usage detection*, use ``mvn dependency:analyze``. Examine the output with great care. Sometimes you will
see implicit usages that do no harm, especially if you are using bigger SDKs having some kind of ``core`` package.
This will also report on any direct dependency which is not in use and can be removed from the POM. Again, do this with
great caution and double check.

If you want to see the dependencies, both direct and transitive, in a *dependency tree format*, use ``mvn dependency:tree``.

This will, however, not help you detect possible version conflicts. For this you need to use the `Enforcer Plugin
<https://maven.apache.org/enforcer/maven-enforcer-plugin/index.html>`_ with its built-in `dependency convergence rule
<https://maven.apache.org/enforcer/enforcer-rules/dependencyConvergence.html>`_.
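
As a rough sketch of how the convergence check could be wired into a build, the fragment below enables the
``dependencyConvergence`` rule of the Enforcer Plugin. The execution id and the plugin version are assumptions chosen
for illustration and are not taken from the Dataverse POM; use a current plugin release in practice:

.. code:: xml

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-enforcer-plugin</artifactId>
                <!-- Assumed version, for illustration only. -->
                <version>1.4.1</version>
                <executions>
                    <execution>
                        <!-- Hypothetical id; any unique name works. -->
                        <id>enforce-dependency-convergence</id>
                        <goals>
                            <goal>enforce</goal>
                        </goals>
                        <configuration>
                            <rules>
                                <!-- Fail the build if one artifact is requested in different versions. -->
                                <dependencyConvergence/>
                            </rules>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

With such a configuration, the check runs as part of a normal build (the ``enforce`` goal binds to the ``validate``
phase by default) and fails when transitive dependencies do not converge on a single version.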

Repositories
------------

Maven receives all dependencies from *repositories*. Those can be public like `Maven Central <https://search.maven.org/>`_
and others, but you can also use a private repository on premises or in the cloud. Last but not least, you can use
local repositories, which can live next to your application code (see ``local_lib`` dir within Dataverse codebase).

Repositories are defined within the Dataverse POM like this:

.. code:: xml

    <repositories>
        <repository>
            <id>central-repo</id>
            <name>Central Repository</name>
            <url>http://repo1.maven.org/maven2</url>
            <layout>default</layout>
        </repository>
        <repository>
            <id>prime-repo</id>
            <name>PrimeFaces Maven Repository</name>
            <url>http://repository.primefaces.org</url>
            <layout>default</layout>
        </repository>
        <repository>
            <id>dvn.private</id>
            <name>Local repository for hosting jars not available from network repositories.</name>
            <url>file://${project.basedir}/local_lib</url>
        </repository>
    </repositories>

You can also add repositories to your local Maven settings, see `docs <https://maven.apache.org/ref/3.6.0/maven-settings/settings.html>`_.
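
In the Maven settings file (by default ``~/.m2/settings.xml``), repositories are declared inside a profile. A minimal
sketch, assuming a hypothetical repository id and URL that are not part of the Dataverse setup:

.. code:: xml

    <settings>
        <profiles>
            <profile>
                <!-- Hypothetical profile id. -->
                <id>extra-repos</id>
                <repositories>
                    <repository>
                        <!-- Hypothetical repository, for illustration only. -->
                        <id>example-repo</id>
                        <url>https://repo.example.org/maven2</url>
                    </repository>
                </repositories>
            </profile>
        </profiles>
        <!-- Activate the profile for every build on this machine. -->
        <activeProfiles>
            <activeProfile>extra-repos</activeProfile>
        </activeProfiles>
    </settings>

Repositories configured this way apply only to your local machine, so anything the project itself needs should still
be declared in the POM.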

Typically you will skip the addition of the central repository, but adding it to the POM has the benefit that
dependencies are first looked up there (which in theory can speed up downloads). You should keep in mind that repositories
are used in the order they appear.

----

.. rubric:: Footnotes
Thanks for adding this - I tried to find other examples, but didn't find one.


.. [#f1] Modern IDEs import your Maven POM and offer import autocompletion for classes based on direct dependencies in the model. You might end up using legacy or repackaged classes because of a wrong scope.
.. [#f2] This is going to bite back in modern IDEs when importing classes from transitive dependencies by "autocompletion accident".

----

Previous: :doc:`documentation` | Next: :doc:`debugging`
13 changes: 12 additions & 1 deletion doc/sphinx-guides/source/developers/documentation.rst
@@ -73,6 +73,17 @@ Every non-index page should use the following code to display a table of content

This code should be placed below any introductory text/images and directly above the first subheading, much like a Wikipedia page.

GraphViz based images
---------------------

In some parts of the documentation, graphs are rendered as images via the Sphinx GraphViz extension.

This requires `GraphViz <http://graphviz.org/>`_ to be installed and either ``dot`` on the path or
`adding options to the make call <https://groups.google.com/forum/#!topic/sphinx-users/yXgNey_0M3I>`_.

This has been tested and works on Mac, Linux, and Windows. If you have not properly configured GraphViz, then the worst thing that might happen is a warning and missing images in your local documentation build.
Great! Good to know this works on a Windows box 😄



Versions
--------

@@ -86,4 +97,4 @@ In order to make it clear to the crawlers that we only want the latest version d

----

-Previous: :doc:`testing` | Next: :doc:`debugging`
+Previous: :doc:`testing` | Next: :doc:`dependencies`
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -19,6 +19,7 @@ Developer Guide
sql-upgrade-scripts
testing
documentation
dependencies
debugging
coding-style
deployment
@@ -0,0 +1 @@
f578d8ec91811d5d72981355cb7a1f0f
@@ -0,0 +1 @@
523abaf48b4423eb874dbc086b876aa917930a04
@@ -0,0 +1 @@
346e9f235523e52256006bbe8eba60bb
@@ -0,0 +1 @@
cd83d08c097d6aa1b27b20ef4742c7e4fa47e6b5
@@ -0,0 +1 @@
546f1ab3f3f654280f88e429ba3471ae
@@ -0,0 +1 @@
f4da7ebc3fda69e1e7db12bda6d7b5fb4aecc7a4
@@ -0,0 +1 @@
ec87ba7cb8e7396fc903acdbacd31ff6
@@ -0,0 +1 @@
370c2955550a42b11fe7b9007771c506f5769639
@@ -0,0 +1 @@
544e9b97062d054370695b9b09d4bb1c
@@ -0,0 +1 @@
67c505461f3c190894bb036cc866eb640c2f6a48
@@ -0,0 +1 @@
a1b49c13fcf448de9628798f8682fcaa
@@ -0,0 +1 @@
41be98af31f8d17d83ab6c38bd7939ba212eab8d
@@ -0,0 +1 @@
21bc45a29b715720f4b77f51bf9f1754
@@ -0,0 +1 @@
b544162e82d322116b87d99f2fbb6ddd4c4745e1
@@ -0,0 +1 @@
88dc05805672ebe01ded1197a582cd60
@@ -0,0 +1 @@
fe41289cb74c56e9282dd09c22df2eda47c68a0d
@@ -0,0 +1 @@
52f8b446f78009757d593312778f428c
@@ -0,0 +1 @@
feb6903ad32d4b42461b7ca1b3fae6146740bb31
@@ -0,0 +1 @@
b97b8ee92daa5fc4fd87004465f9ad2b
@@ -0,0 +1 @@
f772583549263bd72ea4d5268d9db0a84c27cb9f
@@ -0,0 +1 @@
b50966bebe8cfdcb58478cf029b08aa3
@@ -0,0 +1 @@
28a5d65399cbc25b29b270caebbb86e292c5ba18
@@ -0,0 +1 @@
f9bb7a20a9d538819606ec1630d661fe
@@ -0,0 +1 @@
37a9d8e464a57b90c04252f265572e5274beb605
@@ -0,0 +1 @@
c2d1a458dc809cb3833f3b362a23ed79
@@ -0,0 +1 @@
0f195ee47691c7ee8611db63b6d5ee262c139129
@@ -0,0 +1 @@
c3605bd6434ebeef82ef655d21075652
@@ -0,0 +1 @@
d8dc496b4d408dd6a9ed7429e6fa4d1ce5f57403
@@ -0,0 +1 @@
bcac19fbdf825c5e93e785413815b998
@@ -0,0 +1 @@
1f983c8cf895056f4d4efe7a717b8d73d5c6b091
@@ -0,0 +1 @@
3f6f413fb54c5142f2e34837bb9369b4
@@ -0,0 +1 @@
475409b6444aba6bdc96ce42431b6d601c7abe5f
@@ -0,0 +1 @@
7f9939585e369ad60ac1f8a99b2fa75f
@@ -0,0 +1 @@
804fffb163526c6bea975038702ea90f24f89419
1 change: 1 addition & 0 deletions local_lib/edu/harvard/iq/dvn/unf5/5.0/unf5-5.0.jar.md5
@@ -0,0 +1 @@
eeef5c0dc201d1105b9529a51fa8cdab
1 change: 1 addition & 0 deletions local_lib/edu/harvard/iq/dvn/unf5/5.0/unf5-5.0.jar.sha1
@@ -0,0 +1 @@
1fa716d318920fd59fc63f77965d113decf97355
1 change: 1 addition & 0 deletions local_lib/edu/harvard/iq/dvn/unf5/5.0/unf5-5.0.pom.md5
@@ -0,0 +1 @@
2df5dac09375e1e7fcd66c705d9ca2ef
1 change: 1 addition & 0 deletions local_lib/edu/harvard/iq/dvn/unf5/5.0/unf5-5.0.pom.sha1
@@ -0,0 +1 @@
431cd55e2e9379677d14e402dd3c474bb7be4ac9
@@ -0,0 +1 @@
f6099186cd4ef67ea91b4c3b724c1113
@@ -0,0 +1 @@
2232318434cab52dd755fba7003958204459f404
@@ -0,0 +1 @@
b1390a875687dad3cc6527b83f84e635