Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline `Package.java` API #29

Open
iSnow opened this issue Oct 18, 2019 · 5 comments
Assignees
Labels
Projects

Comments

@iSnow
Copy link

@iSnow iSnow commented Oct 18, 2019

Overview

Besides the no-args constructor, the Package class has 8 constructors, half of them just convenience functions (See "AS IS" below).

The way the API is structured is also not very Java-like, as

  • URL-based constructors are limited to HTTP/HTTPS protocols according to the JavaDoc, but should (and probably do) also work for file-URLs.

  • Usage of filepaths in the form of Strings is at least unusual and feels like a direct Python-port. A Java-API preferably should use either File/Path or InputStream (which would cover URLs, Files, even JSON-Strings if need be)

  • Using a String to either hold JSON-content or a file-path to a ZIP-file in public Package(String jsonStringSource, boolean strict) is rather unusual for a Java-API.

Proposal 1: Trim and make idiomatic Java

It would be sufficient to trim the host of constructors to three, maybe four:

/**
 * Load from String representation of JSON object.
 */
public Package(String jsonStringSource, boolean strict)   

/**
 * Load from URL (either 'http'/'https' or file URL).
 */
public Package(URL urlSource, boolean strict) 

/**
 * Load from File
 */
public Package(File sourceFile, boolean strict) 

/**
 * Load from InputStream.
 */
public Package(InputStream source, boolean strict) 

The most fundamental constructor would rely on an InputStream, the other constructors are syntactic sugar and would simply create an InputStream on either the URL or the JSON-String and delegate to this.

It would be necessary to validate that this setup supports ZIP-packaged DataPackages, but that should be possible. Going through the code, it seems that Resource resolution has a handful of problems in both directory-based datapackages with resources (paths can't really be relative to the datapackage.json) and ZIP-packaged DataPackages, but I would have to look into this more deeply.

Proposal 2: Switch to a Builder-pattern

Since fluid interfaces are very much a part of Java since a couple of years, it would maybe make sense to go with the practice and switch over to a builder-based API:

public Package {
	private Package();

	public static PackageBuilder builder();

	public static class PackageBuilder {

	   public PackageBuilder fromSource(String jsonStringSource);

	   public PackageBuilder fromSource(URL urlSource);

	   public PackageBuilder fromSource(InputStream source);
	
	   //maybe add
	    public PackageBuilder fromZipSource(URL urlSource);

	   public PackageBuilder setStrict (boolean strict);

	   public Package build();
	}
}

The special method for reading from a ZIP file might be needed to read Resources from inside the ZIP-files, I am not totally sure.

If there's any interest in this, I would volunteer to work on either proposal to demo its validity.


AS IS:

 /**
     * Load from native Java JSONObject.
 */
public Package(JSONObject jsonObjectSource, boolean strict) 

/**
 * Load from native Java JSONObject.
 */
public Package(JSONObject jsonObjectSource) 

/**
 * Load from String representation of JSON object or from a zip file path.
 */
public Package(String jsonStringSource, boolean strict)         

/**
 * Load from String representation of JSON object or from a zip file path.
 */
public Package(String jsonStringSource) 

/**
 * Load from URL (must be in either 'http' or 'https' schemes).
 */
public Package(URL urlSource, boolean strict) 

/**
 * Load from URL (must be in either 'http' or 'https' schemes).
 * No validation by default.
 */
public Package(URL urlSource) 

/**
 * Load from local file system path.
 */
public Package(String filePath, String basePath, boolean strict) 

/**
 * Load from local file system path.
 * No validation by default.
 */
public Package(String filePath, String basePath) 

Please preserve this line to notify @georgeslabreche (maintainer of this repository)

@roll

This comment has been minimized.

Copy link
Member

@roll roll commented Oct 29, 2019

@iSnow
It's really hard to say for me because I don't use Java but it seems a reasonable change. BTW as a client how can I use the lib comparing these 3 options?

@iSnow

This comment has been minimized.

Copy link
Author

@iSnow iSnow commented Oct 29, 2019

Hi, considering you aren't that familiar with Java, I won't directly write out code examples, but pseudo code. I am trying to be fair to the current API, eg. Case 4 is shorter in the current setup, but binds you to one certain JSON lib (Java, unfortunately, has many).

I'd like to point you to Case 5 & 6. Similar setup, one time the datapackage.json is directly referenced via URL (in this case, the library would need to store the base URL to load relative resources, but currently, it doesn't if I am not very wrong). In the other case, a ZIP file is referenced via URL. What you can see is that the current API needs to be used very differently between those cases.

It would still be good to get @georgeslabreche to give his opinion on the proposed change.

So, some scenarios how to use it:

Case 1: reading from a File (datapackage.json) in a directory

current:

  • get absolute path from File object pointing to datapackage.json
  • get parent to get directory object
  • convert to path string
  • create Package.java with the path string and "datapackage.json" as fixed value

proposal 1:

  • create Package.java with File pointing to the datapackage.json. The library would get the parent from the File object and know where to look for the resources

proposal 2:

  • get Builder for Package
  • set File via fromSource()
  • call build(). The library would get the parent from the File object and know where to look for the resources

Case 2: reading from a Zip somewhere in the file system

current:

  • get absolute path from File object pointing to zip file
  • convert to path string
  • create Package.java with the path string

proposal 1:

  • create Package.java with File object pointing to zip file

proposal 2:

  • get Builder for Package
  • set File via fromSource()
  • call build()

Case 3: reading from in-memory blob representing a ZIP, eg. received in a server app via HTTP POST

current:

  • write blob to temp file
  • get absolute path from File object pointing to zip file
  • convert to path string
  • create Package.java with the path string

proposal 1:

  • get InputStream from blob (everything stays in RAM)
  • create Package.java with InputStream

proposal 2:

  • get InputStream from blob (everything stays in RAM)
  • get Builder for Package
  • set InputStream via fromSource()
  • call build()

Case 4: reading from a JSON-object

current:

  • create Package.java with the object

proposal 1:

  • serialize JSON object to string
  • create Package.java with string

proposal 2:

  • serialize JSON object to string
  • get Builder for Package
  • set string via fromSource()
  • call build()

But what happens if the application doesn't use the json.org JSON library, but the GSON from Google? Then:

current:

  • serialize JSON obj to string
  • create Package.java with the string

proposal 1:

  • serialize JSON object to string
  • create Package.java with string

proposal 2:

  • serialize JSON object to string
  • get Builder for Package
  • set string via fromSource()
  • call build()

Case 5: reading from a ZIP somewhere on a server

current:

  • download ZIP to temp file
  • get absolute path from File object pointing to zip file
  • convert to path string
  • create Package.java with the path string

proposal 1:

  • get InputStream from ZIP's URL
  • create Package.java with InputStream

proposal 2:

  • get InputStream from ZIP's URL
  • get Builder for Package
  • set InputStream via fromSource()
  • call build()

Case 6: reading from a 'datapackage.json' somewhere on a server

current:

  • create Package.java with the URL (no resolution of relative URLs in Resources, will break)

proposal 1:

  • create Package.java with the URL (should do resolution of relative URLs in Resources)

proposal 2:

  • get Builder for Package
  • set URL via fromSource()
  • call build() (should do resolution of relative URLs in Resources)
@georgeslabreche

This comment has been minimized.

Copy link
Collaborator

@georgeslabreche georgeslabreche commented Oct 29, 2019

Hi guys!

I really like these proposals and agree that they are an improvement with respect to streamlining.

It's been a while so it's difficult for me to remember the exact details as to why it was implemented the way it is now. However, I do recall that special attention was made to stay true to the specifications as well as preserving consistent behaviour across implementations in other languages.

If we still want to enforce cross-language consistency then these proposals need to be compared with how the library has been implemented in other languages.

@iSnow

This comment has been minimized.

Copy link
Author

@iSnow iSnow commented Oct 30, 2019

Hi @georgeslabreche,

great to hear from you! I do recognize that you were following the interface draft very closely, which is a good thing. In multi-language projects though, you will always have a certain tension between what is proposed as a common interface and what is true to the nature of each language. It is a valid decision to stick to a common interface as closely as possible, but I'd argue that it's permissible to take some liberties to create idiomatic code for each environment, not least because the implementation guide states:

We prefer to focus on actions rather than features, feature sets, user stories, or more formal API specifications as we want to leave enough flexibility for implementations that follow the idioms of the host language, yet we do want to ensure a common base of what can be done with an implementation in any language.

There is no absolute right or wrong here, but my feeling when it comes to different language implementations, is that it's better to focus on what to achieve (the actions) than the how (precise API). Even more so because few developers constantly switch between Python, Java and JS.

@georgeslabreche

This comment has been minimized.

Copy link
Collaborator

@georgeslabreche georgeslabreche commented Oct 30, 2019

I totally agree. We just need an A-OK or No-Go from the frictionlessdata peeps. @roll?

@iSnow iSnow self-assigned this Nov 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
General
  
Software (core)
3 participants
You can’t perform that action at this time.