Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move datastorage providers to maven modules and include them in the build on demand #7302

Merged
merged 9 commits into from
Sep 29, 2023

Conversation

josegar74
Copy link
Member

@josegar74 josegar74 commented Aug 30, 2023

GeoNetwork currently includes support for external data storage providers for S3, JCloud, and CMIS.

Including support for these 3 external data storage providers, that are barely used, adds an additional 113 library dependencies to the application.

This pull request extracts the data storage providers to independent maven modules. These modules are not included by default in the build. They can be included in the build using maven profiles, to include only the data storage provider required:

  • CMIS: mvn install -Pdatastorage-cmis
  • JCloud: mvn install -Pdatastorage-jcloud
  • S3: mvn install -Pdatastorage-s3

Tested with the default file system datastorage provider and looks working fine.

@ianwallen if you can test CMIS, would be great, thanks.

I'm not sure about other maintainers for JCloud or S3.


Some additional tasks to consider maybe:

  • Create a maven task to package each module in a zip file, including the jar file and all the related libraries.
  • Include in each release the previous packages.

@josegar74 josegar74 added this to the 4.4.0 milestone Aug 30, 2023
@ianwallen
Copy link
Contributor

@josegar74
I quickly scanned the code and it looks good.
Is there a reason to create folders like

  • datastorage-jcloud
  • datastorage-cmis
  • datastorage-s3

I would have expected a folder called datastorage with each datastore as a subfolder which would be similar to the schema plugins.
i.e.

  • datastorage/jcloud
  • datastorage/cmis
  • datastorage/s3

I'm currently on vacation. I should be able to do some testing around mid September.
I can test CMIS and Jcloud with Azure Blob Storage. I cannot test S3 however jcloud is a general cloud storage which supports 5 blobs store providers including s3 (https://jclouds.apache.org/reference/providers/#blobstore-providers). So if jcloud works correctly then users using s3 should be able to migrate over to jcloud should there be issues with s3.

@josegar74
Copy link
Member Author

@ianwallen thanks for the review. The structure you propose makes more sense, I'll check to refactor the code.

…uild on demand - add maven task to package the datastorage and related dependencies in a zip file
@jodygarnett
Copy link
Contributor

jodygarnett commented Sep 6, 2023

I like this it is clean and easy to follow what is going on :)

The GeoServer extension setup got more complicated because:

  • We did not know what we were doing, and maven got easier over time
  • We pushed the assembly step to the end of the build process (no sense making a zip if the next module fails)

Ideas:

  1. If this is of interest I could look at making a release module that depends on webapp, jcloud, cmis, s3 and packages only if they are successful.

  2. You could also only make the assembly happen with a profile like -Prelease is used.

  3. Maven has a "ModuleSet" way of doing assemblies now but I have not figured it out yet

Notes on what more complicated looks like for GeoServer.

  1. We moved all our assembly stuff to the end of our build in GeoServer: src/release
  2. The src/release/pom.xml depends on all the things so it happens last, and it unpacks all the jars into release/target.
  3. The src/pom.xml actually does all the assembly
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptors>
            <descriptor>release/war.xml</descriptor>
            <descriptor>release/javadoc.xml</descriptor>
            <descriptor>release/bin.xml</descriptor>
            <descriptor>release/data.xml</descriptor>
            <descriptor>release/ext-db2.xml</descriptor>
            <descriptor>release/ext-feature-pregeneralized.xml</descriptor>
            ... 
  1. The extension assembly files release/ext-db2.xml files are complicated as they have to carefully include and exclude just what is needed from the mess of release/target jars.
    . (This is an ongoing maintenance chore as transitive dependencies change)

datastorages/s3/pom.xml Outdated Show resolved Hide resolved
<execution>
<id>test-jar</id>
<goals>
<goal>test-jar</goal>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the easy way to create a test-jar, but since you have a number of test scope dependencies here it may not be the correct one? The result is a test-jar with no transitive dependencies on other test scope jars required for use.

See "The preferred way" here: https://maven.apache.org/plugins/maven-jar-plugin/examples/create-test-jar.html

Summary: create test jar as a normal module, then folks including it as a test scope dependency will get all the transitive dependencies as test scope.

<module>cmis</module>
<module>jcloud</module>
<module>s3</module>
</modules>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea only:

  1. These modules could be isolated into the same profiles enjoyed by web/pom.xml and thus only compiled if used.
  2. If you also make the profiles enabled by a parameter, example -Dall, the build server has a quick way to ensure everything compiles

pom.xml Outdated Show resolved Hide resolved
Copy link
Contributor

@jodygarnett jodygarnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have provided lots of feedback, but only one solid thing that must be changed.

We should remove the duplication from pom.xml and datastorages/pom.xml which results in the same module being included twice in a full-build.

…uild on demand - change installation instructions to Markdown format
…uild on demand - include LICENSE.md in the assembly packages
…uild on demand - move artifact versions to dependency management
…uild on demand - move the maven profiles to build the datastorages to the datastorages module
@jodygarnett
Copy link
Contributor

The QA check shows test failures:

Error: 7,895 [ERROR] Errors: 
Error: 7,895 [ERROR]   TransactionAspectDoRunInTransactionTest.testRunInTransactionAlwaysCreateNewTransaction_AlwaysCommit » IllegalState
...

@jodygarnett
Copy link
Contributor

@josegar74 I have made the discussed changes; there are some test failures that may or may not be related? Please advise ...

@jodygarnett
Copy link
Contributor

I would like if this change could be adopted, as I would like to include html license files when packaging user docs.

@josegar74 josegar74 marked this pull request as ready for review September 22, 2023 15:01
Copy link
Contributor

@jodygarnett jodygarnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my feedback has been accounted for

@jodygarnett
Copy link
Contributor

@ianwallen as the affected party here I believe your review/feedback/testing is most desired :)

Copy link
Contributor

@ianwallen ianwallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested both CMIS and JCLOUD (with Blob storage) and it worked as expected.

@jodygarnett
Copy link
Contributor

Excellent :)

@josegar74 josegar74 merged commit 4388436 into geonetwork:main Sep 29, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants