
Commit

Merge branch 'feature/gh-4-source-customisation' into develop
evadne committed Nov 30, 2019
2 parents 628c6dc + 9f4c985 commit 9439e7e
Showing 13 changed files with 274 additions and 71 deletions.
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,26 @@ The format is based on [Keep a Changelog][1], and this project adheres to [Seman
[1]: https://keepachangelog.com/en/1.0.0/
[2]: https://semver.org/spec/v2.0.0.html

## [Unreleased]

### Added

- Added support for custom Sources.
- Any module which implements `Packmatic.Source` can be used as a Source.

### Changed

- Revised `Packmatic.Source`.
- Added callback `validate/1` for entry validation.

- Revised `Packmatic.Manifest.Entry`.
- Moved validation of Initialisation Arguments to Sources.

### Fixed

- Revised `Packmatic.Encoder`.
- Fixed acceptance of IO Lists, in case of custom Sources returning these instead of binaries.

## [1.0.0] — 18 November 2019

### Changed
86 changes: 63 additions & 23 deletions README.md
@@ -6,6 +6,16 @@ By using a Stream, the caller can compose it within the confines of Plug’s req

The generated archive uses Zip64, and works with individual files that are larger than 4GB. See the Compatibility section for more information.

* * *

- [Design Rationale](#design-rationale)
- [Installation](#installation)
- [Usage](#usage)
- [Source Types](#source-types)
- [Notes](#notes)

* * *

## Design Rationale

### Problem
@@ -50,38 +60,54 @@ end

## Usage

In order to use Packmatic, you will first create such a Stream by `Packmatic.build_stream/2`. You can then send it off for download with help from `Packmatic.Conn.send_chunked/3`.
The general way to use Packmatic within your application is to generate a `Stream` dynamically by passing a list of Source Entries directly to `Packmatic.build_stream/2`. This gives you a standard Stream which you can then send off for download with help from `Packmatic.Conn.send_chunked/3`.

Internally, this is powered by `Packmatic.Encoder`, which consumes the Entries within a built Manifest iteratively at the pace set by the client’s download connection.
If you need more control, for example if you desire context separation, or if you wish to validate the Entries prior to vending a Stream, you may generate a `Packmatic.Manifest` struct ahead of time, then pass it to `Packmatic.build_stream/2` at a later time. See `Packmatic.Manifest` for more information.

Each Source Entry within the Manifest specifies the source from where to obtain the content of a particular file to be placed in the package, and which path to put it under; it is your own responsibility to ensure that paths are not duplicated (see the Notes for an example).
In either case, the Stream is powered by `Packmatic.Encoder`, which consumes the Entries within the Manifest iteratively as the Stream is consumed, at the pace set by the client’s download connection.

### Building Stream
### Building the Stream with Entries

The usual way to construct a Stream is as follows.

```elixir
entries = [
[source: {:file, "/tmp/hello.pdf"}, path: "hello.pdf"],
[source: {:file, "/tmp/world.pdf"}, path: "world.pdf"],
[source: {:file, "/tmp/world.pdf"}, path: "world.pdf", timestamp: DateTime.utc_now()],
[source: {:url, "https://example.com/foo.pdf"}, path: "foo/bar.pdf"]
]

stream = Packmatic.build_stream(entries)
```

If you desire, you may pass an additional option entry to `Packmatic.build_stream/2`, such as:
As you can see, each Entry used to build the Stream is a keyword list, which specifies the source, the path, and optionally a timestamp:

- `source:` is a 2-element tuple holding the name of the Source and its Initialisation Argument. This data structure specifies the nature of the data, and how to obtain its content.

- `path:` represents the path in the Zip file that the content should be put under; it is your own responsibility to ensure that paths are not duplicated (see the Notes for an example).

- `timestamp:` is optional, and represents the creation/modification timestamp of the file. Packmatic emits both the basic form (DOS / FAT) of the timestamp, and the Extended Timestamp Extra Field which represents the same value with higher precision and range.

Packmatic supports reading from any Source which conforms to the `Packmatic.Source` behaviour. To aid adoption and general implementation, there are built-in Sources as well; these are documented under [Source Types](#source-types).

### Building a Manifest

If you wish, you can use the `Packmatic.Manifest` module to build a Manifest ahead-of-time, in order to validate the Entries prior to vending the Stream.

Manifests can be created iteratively by calling `Packmatic.Manifest.prepend/2` against an existing Manifest, or by calling `Packmatic.Manifest.create/1` with a list of Entries created elsewhere. For more information, see `Packmatic.Manifest`.
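
As a rough sketch of the ahead-of-time flow described above, assuming `packmatic` is listed in your dependencies: `create/1`, `prepend/2` and `build_stream/2` are named in this README, but the argument order shown for `prepend/2` (Manifest first, Entry second) is an assumption, so check `Packmatic.Manifest` before relying on it.

```elixir
entries = [
  [source: {:file, "/tmp/hello.pdf"}, path: "hello.pdf"],
  [source: {:url, "https://example.com/foo.pdf"}, path: "foo/bar.pdf"]
]

# Build the Manifest up front, so the Entries are validated before any
# Stream is vended.
manifest = Packmatic.Manifest.create(entries)

# Assumption: prepend/2 takes the existing Manifest first and one Entry second.
manifest =
  Packmatic.Manifest.prepend(manifest, source: {:random, 1024}, path: "random.bin")

# Later, possibly in another context, turn the Manifest into a Stream.
stream = Packmatic.build_stream(manifest)
```

Keeping the Manifest as a separate value lets the code that decides what goes into the archive live apart from the code that serves the download.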

### Specifying Error Behaviour

By default, Packmatic fails the Stream when any Entry fails to process for any reason. If you desire, you may pass an additional option to `Packmatic.build_stream/2` in order to modify this behaviour:

```elixir
stream = Packmatic.build_stream(entries, on_error: :skip)
```

Each Entry used to build the Stream is a 2-arity tuple, representing the Source Entry and the Path for the file.

Further, the Source Entry is a 2-arity tuple which represents the type of Source and the initialising argument of that type of Source. See [Source Types](#source-types).

### Writing Stream to File

You can use the standard `Stream.into/2` call to operate on the Stream:

```elixir
stream
|> Stream.into(File.stream!(file_path, [:write]))
@@ -90,6 +116,8 @@ stream

### Writing Stream to Conn (with Plug)

You can use the bundled `Packmatic.Conn` module to send a Packmatic stream down the wire:

```elixir
stream
|> Packmatic.Conn.send_chunked(conn, "download.zip")
@@ -99,31 +127,43 @@ When writing the stream to a chunked `Plug.Conn`, Packmatic automatically escape

## Source Types

Within Packmatic, there are four types of Sources:
Packmatic has default Source types that you can use easily when building Manifests and/or Streams:

1. **File,** representing content on disk, useful when the content is already available and only needs to be integrated. See `Packmatic.Source.File`.

2. **URL,** representing content that is available remotely. Packmatic will run a chunked download routine to incrementally download and archive available chunks. See `Packmatic.Source.URL`.

3. **Random,** representing randomly generated bytes, which are useful for testing. See `Packmatic.Source.Random`.

1. **File,** representing content on disk, useful when the content is already available and only needs to be integrated.
4. **Dynamic,** representing a dynamically resolved Source, which is ultimately fulfilled by pulling content from either a File or a URL. If you have any need to inject a dynamically generated file, you may use this Source type to do it. This also has the benefit of avoiding expensive computation work in case the customer abandons the download midway. See `Packmatic.Source.Dynamic`.

Example: `{:file, "/tmp/hello/pdf"}`.
These Sources can be referred to by their internal aliases:

See `Packmatic.Source.File`.
- `{:file, "/tmp/hello/pdf"}`.
- `{:url, "https://example.com/hello/pdf"}`.
- `{:random, 1048576}`.
- `{:dynamic, fn -> {:ok, {:random, 1048576}} end}`.

2. **URL,** representing content that is available remotely. Packmatic will run a chunked download routine to incrementally download and archive available chunks.
Alternatively, they can also be referred to by module names:

Example: `{:url, "https://example.com/hello/pdf"}`.
- `{Packmatic.Source.File, "/tmp/hello/pdf"}`.
- `{Packmatic.Source.URL, "https://example.com/hello/pdf"}`.
- `{Packmatic.Source.Random, 1048576}`.
- `{Packmatic.Source.Dynamic, fn -> {:ok, {:random, 1048576}} end}`.

See `Packmatic.Source.URL`.
### Dynamic & Custom Sources

3. **Random,** representing randomly generated bytes which is useful for testing.
If you have a use case where you wish to dynamically generate the content that goes into the archive, you may either use the Dynamic source or implement a Custom Source.

Example: `{:random, 1048576}`.
For example, if the amount of dynamic computation is small, but the results are time-sensitive, like when you already have Object IDs and just need to pre-sign URLs, you can use a Dynamic source with a curried function:

See `Packmatic.Source.Random`.
{:dynamic, MyApp.Packmatic.build_dynamic_fun(object_id)}
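
One possible shape for such a function is sketched below. `MyApp.Packmatic`, `build_dynamic_fun/2` and `presign_url/1` are hypothetical names used only for illustration, and the pre-signing logic is a placeholder for whatever your storage client provides. The only parts taken from Packmatic are that the function must be zero-arity and must resolve to `{:ok, {:url, url}}`, `{:ok, {:file, path}}` or `{:error, reason}`.

```elixir
defmodule MyApp.Packmatic do
  @doc """
  Returns a zero-arity function suitable for `{:dynamic, fun}`.

  The URL is only pre-signed when Packmatic reaches this Entry, so a
  time-limited signature is not spent on downloads that get abandoned.
  """
  def build_dynamic_fun(object_id, presign \\ &presign_url/1) do
    fn ->
      case presign.(object_id) do
        {:ok, url} -> {:ok, {:url, url}}
        {:error, reason} -> {:error, reason}
      end
    end
  end

  # Placeholder (hypothetical): replace with a call to your storage client.
  defp presign_url(object_id) do
    {:ok, "https://example.com/objects/#{object_id}?signature=placeholder"}
  end
end
```

Passing the pre-signer as an argument is a design choice made here so the function can be exercised in tests without touching a real storage service.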

4. **Dynamic,** representing a dynamically resolved Source, which is ultimately fulfilled by pulling content from either a File or an URL. If you have any need to inject a dynamically generated file, you may use this Source type to do it. This also has the benefit of avoiding expensive computation work in case the customer abandons the download midway.
If you have a different use case, for example if you need to pull data from an FTP server (which uses a protocol that Packmatic does not have a bundled Source to work with), you can implement a module that conforms to the `Packmatic.Source` behaviour, and pass it:

Example: `{:dynamic, fn -> {:ok, {:random, 1048576}} end}`.
{MyApp.Packmatic.Source.FTP, "ftp://example.com/my.docx"}

See `Packmatic.Source.Dynamic`.
See `Packmatic.Source` for more information.
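
To give a feel for the three callbacks, here is a minimal sketch of a Custom Source that serves a list of in-memory chunks. The module name `MyApp.Packmatic.Source.Chunks` is hypothetical and `packmatic` is assumed to be in your dependencies. Because `read/1` returns only data and not an updated state, the remaining chunks are kept in a helper process (an `Agent` here), much like the File source keeps a device in its struct.

```elixir
defmodule MyApp.Packmatic.Source.Chunks do
  @moduledoc "Hypothetical Custom Source that emits a list of binaries, one per read."
  @behaviour Packmatic.Source

  # A struct is required: Packmatic uses the struct name to route read/1
  # back to this module.
  @enforce_keys [:pid]
  defstruct pid: nil

  @impl Packmatic.Source
  def validate(chunks) when is_list(chunks) and chunks != [] do
    if Enum.all?(chunks, &is_binary/1), do: :ok, else: {:error, :invalid}
  end

  def validate(_), do: {:error, :invalid}

  @impl Packmatic.Source
  def init(chunks) do
    # The Agent holds the chunks that have not been emitted yet. It is linked
    # to whichever process initialises the Source.
    with {:ok, pid} <- Agent.start_link(fn -> chunks end) do
      {:ok, %__MODULE__{pid: pid}}
    end
  end

  @impl Packmatic.Source
  def read(%__MODULE__{pid: pid}) do
    case Agent.get_and_update(pid, &pop_chunk/1) do
      :eof ->
        # Nothing left: stop the helper process and signal the end of the data.
        :ok = Agent.stop(pid)
        :eof

      chunk ->
        chunk
    end
  end

  defp pop_chunk([]), do: {:eof, []}
  defp pop_chunk([chunk | rest]), do: {chunk, rest}
end
```

An Entry using this sketch would then look like `[source: {MyApp.Packmatic.Source.Chunks, ["hello, ", "world"]}, path: "hello.txt"]`.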

## Notes

2 changes: 1 addition & 1 deletion lib/packmatic/encoder.ex
@@ -101,7 +101,7 @@ defmodule Packmatic.Encoder do

defp stream_encode(%{current: {_, source, _}} = state) do
case Source.read(source) do
data when is_binary(data) -> stream_encode_data(data, state)
data when is_binary(data) or is_list(data) -> stream_encode_data(data, state)
:eof -> stream_encode_eof(state)
{:error, reason} -> stream_encode_error(reason, state)
end
8 changes: 1 addition & 7 deletions lib/packmatic/manifest/entry.ex
@@ -26,13 +26,7 @@ end

defimpl Packmatic.Validator.Target, for: Packmatic.Manifest.Entry do
def validate(%{source: nil}, :source), do: {:error, :missing}
def validate(%{source: {:file, ""}}, :source), do: {:error, :invalid}
def validate(%{source: {:file, path}}, :source) when is_binary(path), do: :ok
def validate(%{source: {:url, ""}}, :source), do: {:error, :invalid}
def validate(%{source: {:url, url}}, :source) when is_binary(url), do: :ok
def validate(%{source: {:dynamic, fun}}, :source) when is_function(fun, 0), do: :ok
def validate(%{source: {:random, bytes}}, :source) when is_number(bytes) and bytes > 0, do: :ok
def validate(%{source: _}, :source), do: {:error, :invalid}
def validate(%{source: entry}, :source), do: Packmatic.Source.validate(entry)

def validate(%{path: nil}, :path), do: {:error, :missing}
def validate(%{path: _}, :path), do: :ok
144 changes: 104 additions & 40 deletions lib/packmatic/source.ex
@@ -3,72 +3,136 @@ defmodule Packmatic.Source do
Defines how data can be acquired in a piecemeal fashion, perhaps by reading only a few pages
from the disk at a time or only a few MBs of data from an open socket.
The Source behaviour defines two functions, `init/1` and `read/1`, that must be implemented by
conforming modules. The first function initialises the Source and the second one iterates it,
reading more data until there is no more.
The Source behaviour defines three callbacks that must be implemented by conforming modules:
1. `c:validate/1`, which is called to check the initialisation argument.
2. `c:init/1`, which is called to instantiate the source and return its state.
3. `c:read/1`, which is called to read data from the source, given the state.
## Representing Sources
Sources are represented in Manifest Entries as tuples such as `{:file, path}` or `{:url, url}`.
This form of representation is called a Source Entry; the first element in the tuple is the name
and the second element is called the Initialisation Argument (`init_arg`).
This form of representation is called a Source Entry.
The Source Entry is a stable locator of the underlying data which has no runtime implications.
The Encoder hydrates the Source Entry into whatever the Source module implements internally,
when it is time to pull data from that source.
The first element in the tuple is the Source Name, and the second element is called the
Initialisation Argument (`init_arg`).
### Source Name
The Source names can be special atoms (short names) or full module names:
1. `:file` resolves to `Packmatic.Source.File`.
2. `:url` resolves to `Packmatic.Source.URL`.
3. `:dynamic` resolves to `Packmatic.Source.Dynamic`.
4. `:random` resolves to `Packmatic.Source.Random`.
If another atom is passed, Packmatic will first ensure that a module with that name has been
loaded, then use it.
### Initialisation Argument
The Initialisation Argument is usually a basic Elixir type, but in the case of Dynamic Sources,
it is a function which resolves to a Source Entry understood by either the File or URL source.
### Examples
The Source Entry `{:file, path}` is resolved during encoding:
iex(1)> {:ok, file_path} = Briefly.create()
iex(2)> {:ok, state} = Packmatic.Source.build({:file, file_path})
iex(3)> state.__struct__
Packmatic.Source.File
"""

@doc "Converts the Entry to a Source, or return failure."
@callback init(term()) :: {:ok, struct()} | {:error, term()}
@typedoc """
Represents the Name of the Source, which can be a shorthand (atom) or a module.
"""
@type name :: atom() | module()

@doc "Iterates the Source and return data as an IO List, `:eof`, or failure."
@callback read(struct()) :: iodata() | :eof | {:error, term()}
@typedoc """
Represents the Initialisation Argument which is a stable locator for the underlying data, that
the Source will initialise based upon.
"""
@type init_arg :: term()

defmodule Builder do
@moduledoc false
@typedoc """
Represents the internal State for a resolved Source that is being read from.
def build_sources(source_names, module) do
for source_name <- source_names do
{:"#{String.downcase(source_name)}", Module.concat([module, source_name])}
end
end
Sources that hold state must use `defstruct` to define a struct, as the name of the struct is
used to refer them back to the Source module when reading data.
def build_quoted_entry_type(sources) do
for {name, module} <- sources, reduce: [] do
acc -> [quote(do: {unquote(name), unquote(module).init_arg()}) | acc]
end
end
end
In case of a File source, the struct may hold the File Handle; in case of a URL source, it may
indirectly refer to the underlying network socket, etc.
"""
@type t :: struct()

source_names = ~w(File URL Random Dynamic)
sources = Builder.build_sources(source_names, __MODULE__)
@doc """
Validates the given Initialisation Argument.
"""
@callback validate(init_arg) :: :ok | {:error, term()}

@typedoc """
Represents an internal tuple that can be used to initialise a Source with `build/1`. This allows
the Entries to be dynamically resolved. Dynamic sources use this to prepare their work lazily,
and other Sources may use this to open sockets or file handles.
@doc """
Converts the Entry to a Source State.
"""
@type entry :: unquote(Builder.build_quoted_entry_type(sources))
@callback init(term()) :: {:ok, t} | {:error, term()}

@doc """
Iterates the Source State. Returns an IO List, `:eof`, or `{:error, reason}`.
"""
@callback read(t) :: iodata() | :eof | {:error, term()}

@typedoc """
Represents the internal (private) struct which holds runtime state for a resolved Source. In
case of a File source, this may hold the File Handle indirectly; in case of a URL source this
may indirectly refer to the underlying network socket.
Represents an internal tuple that can be used to initialise a Source with `build/1`.
This allows the Entries to be dynamically resolved. Dynamic sources use this to prepare their
work lazily, and other Sources may use this mechanism to delay opening of sockets or handles.
"""
@type t :: struct()
@type entry :: {name, init_arg}

@spec validate(entry) :: :ok | {:error, term()}
@spec build(entry) :: {:ok, t} | {:error, term()}
@spec read(t) :: iodata() | :eof | {:error, term()}

for {name, module} <- sources do
@spec build({unquote(name), unquote(module).init_arg()}) :: unquote(module).init_result()
@doc """
Validates the given Entry.
Called by `Packmatic.Manifest.Entry`.
"""
def validate({name, init_arg}) do
with {:module, module} <- resolve(name) do
module.validate(init_arg)
end
end

@doc "Transforms an Entry into a Source ready for acquisition. Called by `Packmatic.Encoder`."
for {name, module} <- sources do
def build({unquote(name), init_arg}), do: unquote(module).init(init_arg)
@doc """
Initialises the Source with the Initialisation Argument as specified in the Entry. This prepares
the Source for acquisition.
Called by `Packmatic.Encoder`.
"""
def build({name, init_arg}) do
with {:module, module} <- resolve(name) do
module.init(init_arg)
end
end

@doc "Consumes bytes off an initialised Source. Called by `Packmatic.Encoder`."
def read(%{__struct__: module} = source), do: module.read(source)
@doc """
Consumes bytes off an initialised Source.
Called by `Packmatic.Encoder`.
"""
def read(state)
def read(%{__struct__: module} = state), do: module.read(state)
def read(_), do: {:error, :invalid_state}

defp resolve(:file), do: {:module, __MODULE__.File}
defp resolve(:url), do: {:module, __MODULE__.URL}
defp resolve(:random), do: {:module, __MODULE__.Random}
defp resolve(:dynamic), do: {:module, __MODULE__.Dynamic}
defp resolve(module) when is_atom(module), do: Code.ensure_loaded(module)
defp resolve(_), do: {:error, :invalid_name}
end
3 changes: 3 additions & 0 deletions lib/packmatic/source/dynamic.ex
@@ -50,6 +50,9 @@ defmodule Packmatic.Source.Dynamic do
@type resolve_result_url :: {:ok, {:url, Source.URL.init_arg()}}
@type resolve_result_error :: {:error, term()}

def validate(fun) when is_function(fun, 0), do: :ok
def validate(_), do: {:error, :invalid}

def init(resolve_fun) do
case resolve_fun.() do
{:ok, {:file, path}} -> Source.File.init(path)
4 changes: 4 additions & 0 deletions lib/packmatic/source/file.ex
@@ -15,6 +15,10 @@ defmodule Packmatic.Source.File do
@enforce_keys ~w(path device)a
defstruct path: nil, device: nil

@impl Source
def validate(path) when is_binary(path) and path != "", do: :ok
def validate(_), do: {:error, :invalid}

@impl Source
def init(path) do
with {:ok, device} <- File.open(path, [:binary, :read]) do
