JFleet persist to database large collections of Java POJOs using alternate persistence methods from JDBC
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.circleci Codecov integration (#8) Jan 31, 2018
config First commit Aug 25, 2017
gradle/wrapper Upgrade Gradle wrapper to 4.5 version Mar 30, 2018
jfleet-core Csv serialization tests Aug 5, 2018
jfleet-samples Fix markdown Aug 31, 2018
.gitignore First commit Aug 25, 2017
CODE_OF_CONDUCT.md Create CODE-OF-CONDUCT.md Feb 5, 2018
CONTRIBUTING.md Readme Feb 5, 2018
LICENSE.txt Add license Aug 25, 2017
README.md Release 0.5.12 version May 3, 2018
build.gradle Refactor sample names and layout Aug 31, 2018
gradle.properties Release 0.5.12 version May 3, 2018
gradlew First commit Aug 25, 2017
gradlew.bat First commit Aug 25, 2017
settings.gradle First commit Aug 25, 2017

README.md

Maven Central Build Status Codecov Download License

JFleet

JFleet is a Java library which persist in database large collections of Java POJOs as fast as possible, using the best available technique in each database provider, achieving it with alternate persistence methods from each JDBC driver implementation.

Its goal is to store a large amount of information in a single table using available batch persistence techniques.

Despite using basic JPA annotations to map Java objects to tables and columns, JFleet is not an ORM.

Table of Contents

Supported databases

Each database provides some technique to insert a bulk of information bypassing standard JDBC commands, but accessible from Java:

  • MySQL : Using the LOAD DATA statement.
  • PostgreSQL: Using the COPY command.

In both cases, and in unsupported databases, you can use the default implementation based on the standard JDBC executeBatch statement.

Benchmark

JFleet performance is comparable to using the native database import tool, and is between 2.1X and 3.8X faster than using the JDBC driver directly.

mysql vs postgres

You can find all the benchmarks numbers and results here

Usage

JFleet needs to know how to map your Java objects or entities to a table. The default mechanism used is standard JPA annotations like @Entity, @Column or @ManyToOne.

import javax.persistence.*;

@Entity
@Table(name = "customer_contact")
public class Customer {

    private Long id;

    private String contactName;
    
    @Column(name="customer_name")
    private String name;
    
    @ManyToOne
    @JoinColumn(name = "city_id")
    private City city;
    
    //Getters and setters
}

@Entity
@Table(name = "city")
public class City {
    
    @Id
    private Integer id;
    
    private String name;
    
    //Getters and setters
}

Like JPA, JFleet follows the convention of using the field name if no @Column name is provided, or the class name if no @Table name is provided.

Given a collection of objects Customer to persist in MySQL with the Load Data technique, you only need to provide a JDBC Connection:

    Collection<Customer> customers = buildLargeAmountOfCustomers();
    BulkInsert<Customer> bulkInsert = new LoadDataBulkInsert<>(Customer.class);
    bulkInsert.insertAll(connection, customers);

If you are using PostgreSQL the BulkInsert implementation is PgCopyBulkInsert. JFleet prefers Streams to Collections because it does not force you to instantiate all objects in memory, and allows you to create them lazily in some stream process:

    Stream<Customer> customers = createLongStreamOfCustomers();
    BulkInsert<Customer> bulkInsert = new PgCopyBulkInsert<>(Customer.class);
    bulkInsert.insertAll(connection, customers);

Dependency

JFleet is uploaded to Maven Central Repository and to use it, you need to add the following Maven dependency:

<dependency>
    <groupId>org.jfleet</groupId>
    <artifactId>jfleet</artifactId>
    <version>0.5.12</version>
</dependency>

or download the single jar from Maven repository.

You can always find the latest published version in the MvnRepository searcher.

By default JFleet uses basic javax.persistence annotations. If you don't have any JPA implementation as a dependency in your project, you must add the Javax Persistence API dependency:

<dependency>
    <groupId>javax.persistence</groupId>
    <artifactId>persistence-api</artifactId>
    <version>1.0.2</version>
</dependency>

Apart from persistence-api and SLF4J for logging, JFleet does not have any dependency. JFleet has not been tested against all JDBC driver versions, but it is expected that any modern version will work properly.

Advanced topics

IDs

JFleet does not manage the @Id of your entities as other ORMs do and you are responsible of it. You have some strategies to deal with it:

  • Use the mechanism provided by each database to autogenerate primary keys:

  • Assign manually an Id to each object:

    • Use an UUID generator
    • If your domain allows it, use a natural key
    • Use a composite key as primary key if the domain also allows it
    • If you control the concurrency access to the table, at the beginning of insertion process, get the max Id value in database and, from Java, increment and set a new Id value to each object

If you opt for an autogenerate strategy, breaking the JPA specification, you can avoid creating a field with the @Id column because it will be always null. But you can keep it if you want, or you are reusing a class from a existing JPA model.

In an autogenerate strategy, ORMs like JPA populate the @Id field of your objects as they insert rows in the database. But due to the insertion technique used by JFleet, primary keys created by the database can not be retrieved for each inserted row, and is not possible to set it back to each object.

In PostgreSQL, if you have a field in an entity which the corresponding database column is declared as SERIAL, you must annotate the field with @Id and @GeneratedValue(strategy = GenerationType.IDENTITY). Otherwise JFleet will try to insert a null value and the database will raise an error. Internally SERIAL is an alias to NOT NULL with a DEFAULT value implemented as a sequence, and does not accept to insert a null value, even when afterwards it will assign one.

JFleet needs to know if a field is SERIAL, and the convention used is annotating it with IDENTITY strategy.

Annotations

By default JFleet reuses existing JPA annotations to map Java object to tables.

JPA allows to define how to map your entities in two ways:

  • entity attributes (instance fields)
  • or the accessors (instance properties)

In JPA by default, the placement of the @Id annotation gives the default access strategy.

JFleet only supports access by entity attributes, and it expects annotations on fields.

The supported annotations are:

  • @Entity: Specifies that the class is an entity.
  • @Table: Specifies the table name. If no value is specified, the class name in lower case is used.
  • @Column: Is used to specify a mapped column for a persistent field. If no value is specified, the field name is used.
  • @Id: Specifies the primary key field of an entity. It is only used to fetch foreign key values in ManyToOne and OneToOne relationships.
  • @MappedSuperclass: Designates a class whose mapping information is applied to the entities that inherit from it. A mapped superclass has no separate table defined for it.
  • @Transient: This annotation specifies that field is not persistent.
  • @Embedded: Defines a persistent field of an entity whose value is an instance of an embeddable class.
  • @EmbeddedId: Is applied to a persistent field of an entity class or mapped superclass to denote a composite primary key that is an embeddable class.
  • @AttributeOverrides: Is used to override mappings of multiple fields.
  • @AttributeOverride: Is used to override the mapping of a basic field or Id field. May be applied to an entity that extends a mapped superclass or to an embedded field to override a basic mapping defined by the mapped superclass or embeddable class.
  • @ManyToOne: Defines a single-valued association to another entity class that has many-to-one multiplicity. The targetEntity value is ignored if provided. Uses the field class annotated.
  • @OneToOne: Defines a single-valued association to another entity that has one-to-one multiplicity. The targetEntity value is ignored if provided. Uses the field class annotated.
  • @JoinColumn: The name of the foreign key column. If no name is provided, the column name is the concatenation of the name of the referencing relationship field of the referencing entity, the char "_", and the name of the referenced primary key column.
  • @Enumerated: Specifies that a persistent field should be persisted as a enumerated type. The used value is specified by the EnumType value. If no annotation is used or no EnumType is used, the default enum type is ORDINAL.
  • @Temporal: This annotation must be specified for persistent fields of type java.util.Date. DATE, TIME and TIMESTAMP values are accepted.

Some common annotations which are not supported are: @GeneratedValue, @OneToMany, @ManyToMany, @JoinColumns and @JoinTable.

These annotations, and many configuration properties in supported annotations, are ignored mainly because has no effect o meaning in the purpose and limitations of JFleet. If you find a relevant annotation or property which could be included create an issue.

BulkInsert configuration

Load Data and Copy methods are based on serializing a batch of rows to a CSVlike StringBuilder, and when serialized information reach a limit of characters, flush it to the database. Depending on the available memory and the size of each row you can tune this limit.

In the JDBC batch insert method you can configure the numbers of rows of each batch operation.

You can also configure how transactions are managed persisting your Stream or Collection:

  • Let JFleet commit to database each time a batch of rows is flushed.
  • Join to the existing transaction in the provided connection, and deciding on your own code when to commit or rollback it.

If you override the default values (10MB and autocommit), you must use a different BulkInsert constructor.

For LoadDataBulkInsert version, with 5MB batch size and no autocommit:

import org.jfleet.mysql.LoadDataBulkInsert.Configuration;

Configuration<Employee> config = new Configuration<>(Employee.class)
        .batchSize(5 * 1024 * 1024)
        .autocommit(false);
BulkInsert<Employee> bulkInsert = new LoadDataBulkInsert<>(config);
bulkIsert.insertAll(connection, stream);

For PgCopyBulkInsert version, with 30MB batch size and autocommit after each batch:

import org.jfleet.postgres.PgCopyBulkInsert.Configuration;

Configuration<Employee> config = new Configuration<>(Employee.class)
        .batchSize(30 * 1024 * 1024)
        .autocommit(true);
BulkInsert<Employee> bulkInsert = new PgCopyBulkInsert<>(config);
bulkInsert.insertAll(connection, stream);

For JdbcBulkInsert version, with 1000 rows batch size and autocommit after each batch:

import org.jfleet.jdbc.JdbcBulkInsert.Configuration;

Configuration<Employee> config = new Configuration<>(Employee.class)
        .batchSize(1000)
        .autocommit(true);
BulkInsert<Employee> bulkInsert = new JdbcBulkInsert<>(config);
bulkInsert.insertAll(connection, stream);

Avoid javax.persistence annotations and dependency

If you have any problem using JPA annotations in your domain objects or directly you don't want to add javax.persistence dependency to your project, you can configure it manually mapping each column to a field path.

Given the same domain objects:

public class Customer {

    private Long id;
    private String contactName;
    private String name;
    private City city;
    
    //Getters and setters
}

public class City {
    private Integer id;
    private String name;
    
    //Getters and setters
}

You configure JFleet with the mapping info:

EntityInfo customerMap = new EntityInfoBuilder(Customer.class, "customer_contact")
	.addField("id", "id")
	.addField("contactName", "contactname")
	.addField("name", "customer_name")
	.addField("city.id", "city_id")
	.build();
    
Configuration<Customer> config = new Configuration<>(customerMap);
BulkInsert<Customer> bulkInsert = new LoadDataBulkInsert<>(config);

MySQL LOAD DATA error handling

In MySQL LOAD DATA command, data-interpretation, duplicate-key errors or foreign key errors become warnings and the operation continues until finish the whole data. Rows with errors are discarded and no SQLException is thrown by database or JDBC driver.

If your business logic is sensitive to these errors you can configure JFleet to detect when some row is missing and throw an exception:

import org.jfleet.mysql.LoadDataBulkInsert.Configuration;

Configuration<Employee> config = new Configuration<>(Employee.class)
    .errorOnMissingRow(true);
BulkInsert<Employee> bulkInsert = new LoadDataBulkInsert<>(config);
try {
    bulkInsert.insertAll(connection, employeesWithForeignKeyError);
} catch (JFleetException e) {
    logger.info("Expected error on missed FK");
}

Supported database versions

JFleet is configured to execute continuous integration tests against CircleCI service, using the latest stable release of MySQL 5.7 and the latest stable release of PostgreSQL 9.6.

PostgreSQL 10.1 release has been manually tested without any problem.

MySQL 8 is not yet supported by JFleet because requires the latest JDBC driver which it is completely rewritten, and all internal classes used by JFleet change from 5.x versions.

Any database engine with a standard JDBC driver should be used with the JdbcBulkInsert implementation.

Running the tests

Tests need a MySQL and a PostgreSQL instances running in localhost. A database called testdb must exist and an user test with password test must have CREATE TABLE and DROP TABLE permissions.

You can modify this settings changing locally mysql-test.properties and postgres-test.properties files.

To execute all test you must execute the command:

$ gradle test

You can also fork the project and test it in your CircleCI free account.

Contribute

Feel free to dive in! Open an issue or submit PRs.

Any contributor and maintainer of this project follows the Contributor Covenant Code of Conduct.

License

Apache 2 © Jerónimo López