# Quilt 4 API (working title)

This notebook serves as both living documentation and the integration test suite.

In [1]:
!pip uninstall -y quilt3
!pip install -e .

Uninstalling quilt3-3.1.4:
  Successfully uninstalled quilt3-3.1.4
Obtaining file:///Users/armandmcqueen/code/quilt/api/python
Installing collected packages: quilt3
  Running setup.py develop for quilt3
Successfully installed quilt3


In [2]:
!mkdir -p quilt-tmp
!printf "# README\n\n## This is a test dataset\n" > quilt-tmp/README.md
!echo "test" > quilt-tmp/testfile
!echo "test2" > quilt-tmp/testfile2
!echo "test3" > quilt-tmp/testfile3
!echo "test4" > quilt-tmp/testfile4

In [3]:
from quilt3.new_packages import (
    Package, PackageEntry, PackageBuilder, 
    MockPackageWithoutInitLogic, QuiltAddCollisionException
)
import pandas as pd

## Creating a `Package` with `PackageBuilder`

### Adding entries to the `PackageBuilder`

There are several ways to add entries to a `PackageBuilder`. We illustrate six of them below, then confirm at the end that all of the `PackageBuilders` build identical `Packages`.

#### (1) Add files individually, with logical_key = physical_key

In [4]:
pb1 = PackageBuilder()
pb1.add_file("quilt-tmp/testfile")
pb1.add_file("quilt-tmp/README.md")

#### (2) Add files individually, specifying the logical key

Note: there is no good reason to specify the logical key in this example, but we want all of our `PackageBuilders` to match at the end.

In [5]:
pb2 = PackageBuilder()
pb2.add_file("quilt-tmp/testfile",   "./quilt-tmp/testfile")
pb2.add_file("quilt-tmp/README.md",  "./quilt-tmp/README.md")

#### (3) Add a directory, with logical_key = physical_key

In [6]:
pb3 = PackageBuilder()
pb3.add_dir("quilt-tmp/")

AttributeError: 'str' object has no attribute 'scheme'

#### (4) Add a directory, specifying the logical_key to use

In [None]:
pb4 = PackageBuilder()
pb4.add_dir("quilt-tmp/", "./quilt-tmp/")

#### (5) Add a `PackageEntry` (i.e. one entry that comes from a `Package`)

In practice you should never need to create a `PackageEntry` directly, but we do so here for mocking purposes.

In [None]:
pkg_name = "bucket/package_name"
pkg_hash = "some_fake_hash"
entry_hash = "another_fake_hash"

entry1 = PackageEntry(logical_key="quilt-tmp/old_logical_key/testfile", 
                      physical_key="quilt-tmp/testfile", 
                      pkg_name=pkg_name, pkg_hash=pkg_hash, size=1, 
                      entry_hash=entry_hash, metadata={})

entry2 = PackageEntry(logical_key="quilt-tmp/old_logical_key/README.md", 
                      physical_key="quilt-tmp/README.md", 
                      pkg_name=pkg_name, pkg_hash=pkg_hash, size=1,
                      entry_hash=entry_hash, metadata={})

pb5 = PackageBuilder()
pb5.add_package_entry("quilt-tmp/testfile", entry1)
pb5.add_package_entry("quilt-tmp/README.md", entry2)

# Mutate the original entry after adding it. If the builder deep copied
# the entry correctly, this change must not affect pb5.
entry1.logical_key = "Incorrect logical key"
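The reason for mutating `entry1` after the add can be shown with a small self-contained model. This is only a sketch: `ToyEntry` and `ToyBuilder` are hypothetical stand-ins written for this illustration, not the `quilt3.new_packages` classes, and the real `PackageBuilder` may copy entries differently.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyEntry:
    """Hypothetical stand-in for PackageEntry (not the quilt3 class)."""
    logical_key: str
    physical_key: str
    metadata: dict = field(default_factory=dict)


class ToyBuilder:
    """Hypothetical stand-in for PackageBuilder that copies entries on add."""

    def __init__(self):
        self.entries = {}

    def add_package_entry(self, logical_key, entry):
        # Take a deep snapshot so later mutations by the caller cannot leak in.
        stored = copy.deepcopy(entry)
        stored.logical_key = logical_key
        self.entries[logical_key] = stored


entry = ToyEntry("quilt-tmp/old_logical_key/testfile", "quilt-tmp/testfile", {"tag": "v1"})
builder = ToyBuilder()
builder.add_package_entry("quilt-tmp/testfile", entry)

# Mutate the caller's object, including its nested metadata dict.
entry.logical_key = "Incorrect logical key"
entry.metadata["tag"] = "mutated"

stored = builder.entries["quilt-tmp/testfile"]
assert stored.logical_key == "quilt-tmp/testfile"
assert stored.metadata == {"tag": "v1"}
```

A shallow copy would have protected `logical_key` but not the nested `metadata` dict, which is why the sketch uses `copy.deepcopy`.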

#### (6) Add a `Package`

In [None]:
entryA = PackageEntry(logical_key="quilt-tmp/testfile", 
                      physical_key="quilt-tmp/testfile", 
                      pkg_name=pkg_name, pkg_hash=pkg_hash, size=1, 
                      entry_hash=entry_hash, metadata={})

entryB = PackageEntry(logical_key="quilt-tmp/README.md", 
                      physical_key="quilt-tmp/README.md", 
                      pkg_name=pkg_name, pkg_hash=pkg_hash, size=1, 
                      entry_hash=entry_hash, metadata={})

pkg = MockPackageWithoutInitLogic(pkg_name=pkg_name, pkg_tag=None, pkg_hash=pkg_hash, pkg_metadata={}, 
                                  pkg_entries=[entryA, entryB])

pb6 = PackageBuilder()
pb6.add_package(pkg)

#### Make sure all of these `PackageBuilders` build identical `Packages`

(`PackageBuilders` can't be compared directly until they are built, because the `PackageBuilders` above are very different under the hood: they hold different types of `PackageBuilderEntries` in different states of completeness until `build` is called.)

In [None]:
pkg1 = pb1.build(pkg_name, allow_local_files=True)
pkg2 = pb2.build(pkg_name, allow_local_files=True)
pkg3 = pb3.build(pkg_name, allow_local_files=True)
pkg4 = pb4.build(pkg_name, allow_local_files=True)
pkg5 = pb5.build(pkg_name, allow_local_files=True)
pkg6 = pb6.build(pkg_name, allow_local_files=True)

assert pkg1 == pkg2 == pkg3 == pkg4 == pkg5 == pkg6
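To make the "compare only after `build`" point concrete, here is a minimal sketch using a hypothetical `ToyBuilder` (not the quilt3 implementation). It assumes that a builder stores whatever it was given and only normalizes entries at build time. The `pending` list, the `add_entry` method, and the dict-based build result are all invented for this illustration.

```python
import posixpath


class ToyBuilder:
    """Hypothetical builder that defers normalization until build()."""

    def __init__(self):
        self.pending = []  # heterogeneous, unnormalized records

    def add_file(self, logical_key, physical_key=None):
        # The physical key defaults to the logical key and is stored as given.
        self.pending.append(("file", logical_key, physical_key or logical_key))

    def add_entry(self, logical_key, entry):
        # An already-resolved entry is stored as a different kind of record.
        self.pending.append(("entry", logical_key, dict(entry)))

    def build(self):
        # Resolve every pending record to the same canonical form.
        resolved = {}
        for kind, logical_key, payload in self.pending:
            physical = payload if kind == "file" else payload["physical_key"]
            resolved[logical_key] = posixpath.normpath(physical)
        return resolved


a = ToyBuilder()
a.add_file("quilt-tmp/testfile")

b = ToyBuilder()
b.add_file("quilt-tmp/testfile", "./quilt-tmp/testfile")

c = ToyBuilder()
c.add_entry("quilt-tmp/testfile", {"physical_key": "quilt-tmp/testfile", "size": 1})

# Before build, the internal states differ pairwise.
assert a.pending != b.pending
assert b.pending != c.pending
assert a.pending != c.pending

# After build, all three resolve to the same result.
assert a.build() == b.build() == c.build()
```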

### `Adding` and logical_key collisions

`Adding` should be a purely additive operation. If a `PackageBuilder` already contains an entry with logical_key='KEY' and you add a file/dir/`Package` that also uses logical_key='KEY', the `add` raises a `QuiltAddCollisionException` unless you explicitly specify `overwrite=True`. This is illustrated below.

In [None]:
pb = PackageBuilder()
pb.add_file("quilt-tmp/testfile", "quilt-tmp/testfile")
pb.add_file("quilt-tmp/README.md", "quilt-tmp/README.md")
pkg = pb.build(pkg_name, allow_local_files=True)

Collision on 'quilt-tmp/README.md', which raises a `QuiltAddCollisionException`

In [None]:
pb = PackageBuilder()
pb.add_file("quilt-tmp/testfile2", "quilt-tmp/testfile2")
pb.add_file("quilt-tmp/README.md", "quilt-tmp/README.md")

exception_raised = False
try:
    pb.add_package(pkg)  # 'quilt-tmp/README.md' will cause a collision
except QuiltAddCollisionException:
    exception_raised = True

# Assert outside the try block so an unrelated exception is not masked
assert exception_raised, "expected a QuiltAddCollisionException"

No collision

In [None]:
pb = PackageBuilder()
pb.add_file("quilt-tmp/testfile2", "quilt-tmp/testfile2")

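exception_raised = False
try:
    pb.add_package(pkg)  # No collision: 'quilt-tmp/testfile2' is not in pkg
except QuiltAddCollisionException:
    exception_raised = True

# Assert outside the try block so an unrelated exception is not masked
assert not exception_raised, "unexpected QuiltAddCollisionException"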

In [None]:
!rm -r quilt-tmp/