Opacify reads a file and builds a manifest of external sources to rebuild said file.

Opacify

Opacify reads a file and builds a manifest of external URLs to rebuild said file.


Install

pip install opacify

Must Knows

  1. Opacify is slow (and probably always will be)!
  2. A cache is built locally to speed up both pacify and satisfy. It is removed on completion unless you specify --keep.
  3. The cache is built by downloading the data from the URLs list. TODO: Add cache limit flag.
  4. The --threads N option can help speed up the pacify command.
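The caching behaviour described above can be pictured with a short Python sketch. It assumes a cache layout of one file per URL, named by the sha256 of the URL, and takes a hypothetical `fetch` callable in place of a real HTTP download. Opacify's actual cache format is not documented here, so treat `cached_fetch` and `cleanup` as illustrations only.

```python
import hashlib
import os
import shutil
from typing import Callable


def cached_fetch(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the data for `url`, calling `fetch` only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Assumed layout (hypothetical): one file per URL, named by the sha256 hex digest of the URL.
    path = os.path.join(cache_dir, hashlib.sha256(url.encode("utf-8")).hexdigest())
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(url)  # in a real tool this would be an HTTP download
    with open(path, "wb") as f:
        f.write(data)
    return data


def cleanup(cache_dir: str, keep: bool) -> None:
    """Remove the cache directory unless the caller asked to keep it (like --keep)."""
    if not keep:
        shutil.rmtree(cache_dir, ignore_errors=True)
```

Keying files by a hash of the URL avoids problems with characters in URLs that are not valid in file names.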

Examples

Please note that the example output below may not be accurate, as Opacify is still a work in progress.

Pacify A File

$ opacify pacify --input test.txt --manifest test.manifest --cache cache/ --urls urls.txt --keep --threads 4 --force
Progress: |████████████████████████████████████████████████████| * 100.0% thread-2 0.00m remaining

Wrote manifest to: test.manifest
   Avg chunk size: 3.40
     Total chunks: 2107
    Manifest size: 164291
    Original size: 7173
     Input sha256: 44060449ed92a19e59231d48ab634cbe89d7328f1c24ac7b48b4992b1256657f
         Duration: 7.170s
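Conceptually, pacify has to find each piece of the input inside the data downloaded from the URLs list. The summary figures fit this picture: 7173 bytes split into 2107 chunks gives the reported average chunk size of about 3.40 bytes. The sketch below shows one way to do the search, a greedy longest-match over the downloaded sources. It is an assumption for illustration, not necessarily the algorithm Opacify uses, and `build_items` and `longest_match` are hypothetical names. Each resulting item is a `(url, offset, length)` triple, matching the manifest body format described under Manifest Format.

```python
from typing import Dict, List, Optional, Tuple

Item = Tuple[str, int, int]  # (url, offset, length), as in a manifest body line


def longest_match(src: bytes, data: bytes, pos: int) -> Tuple[int, int]:
    """Return (offset, length) of the longest prefix of data[pos:] found in src."""
    best_off, best_len = -1, 0
    length = 1
    # If a prefix is missing from src, every longer prefix is missing too.
    while pos + length <= len(data):
        off = src.find(data[pos:pos + length])
        if off < 0:
            break
        best_off, best_len = off, length
        length += 1
    return best_off, best_len


def build_items(data: bytes, sources: Dict[str, bytes]) -> List[Item]:
    """Greedily cover `data` with slices taken from the downloaded sources."""
    items: List[Item] = []
    pos = 0
    while pos < len(data):
        best: Optional[Item] = None
        for url, src in sources.items():
            off, length = longest_match(src, data, pos)
            if length and (best is None or length > best[2]):
                best = (url, off, length)
        if best is None:
            raise ValueError(f"byte at position {pos} was not found in any source")
        items.append(best)
        pos += best[2]
    return items


sources = {"http://a/x": b"hello world", "http://b/y": b"opacify!"}
for url, offset, length in build_items(b"hello opacify", sources):
    print(url, offset, length)
```

A greedy search keeps the number of manifest items low, but the repeated `bytes.find` calls make this sketch slow on large inputs; a serious implementation would likely want a proper index over the source data.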

Satisfy A File

$ opacify satisfy --out test.txt.out --manifest test.manifest --cache dcache/ --force
Progress: |████████████████████████████████████████████████████| . 100.0%  0.00m remaining

    Manifest size: 164291
    Output sha256: 44060449ed92a19e59231d48ab634cbe89d7328f1c24ac7b48b4992b1256657f
      Output size: 7173
         Duration: 15.079s
$ shasum test.txt.out test.txt
85c7bd6f40ba36326f9acd695779db7847434db4  test.txt.out
85c7bd6f40ba36326f9acd695779db7847434db4  test.txt

Build Url List from Reddit

Please note that Reddit data is volatile and often disappears.

$ opacify reddit --out reddit-urls.txt --count 20
Generating urls from reddit data...
Wrote urls data to: reddit-urls.txt

$ wc -l reddit-urls.txt
      20 reddit-urls.txt

Validate Manifest

As time goes by, external sources may disappear or their content may change. The verify command checks that each source listed in the manifest still exists (returns a valid HTTP response) and still provides at least offset+length bytes of data:

$ opacify verify --manifest test.manifest
Validating external sources listed in manifest ...
Status: 100% ... Complete!
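The check described above amounts to comparing each source's size against offset+length. The sketch below assumes a hypothetical `fetch` callable that returns the source bytes, or `None` when the request fails (a real check would go over HTTP, for example with `urllib.request.urlopen`). `verify_items` is an illustrative name, not part of Opacify's API.

```python
from typing import Callable, List, Optional, Tuple

Item = Tuple[str, int, int]  # (url, offset, length)


def verify_items(items: List[Item], fetch: Callable[[str], Optional[bytes]]) -> List[Item]:
    """Return the items whose source is unreachable or too short."""
    failures: List[Item] = []
    for url, offset, length in items:
        data = fetch(url)  # None stands in for a failed HTTP request
        # The source must provide at least offset + length bytes.
        if data is None or len(data) < offset + length:
            failures.append((url, offset, length))
    return failures


items = [("http://foo/bar.png", 23, 55), ("http://bar/foo.png", 100, 32)]
available = {"http://foo/bar.png": bytes(78)}  # 23 + 55 = 78 bytes: just enough
print(verify_items(items, available.get))  # [('http://bar/foo.png', 100, 32)]
```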

Usage

usage: opacify [-h] [-V] {pacify,satisfy,verify,reddit} ...

Opacify : v0.3.0
Project : http://github.com/mtingers/opacify
Author  : Matth Ingersoll <matth@mtingers.com>

positional arguments:
  {pacify,satisfy,verify,reddit}
    pacify              Run in pacify mode (builds manifest from input file)
    satisfy             Run in satisfy mode (extracts file using manifest)
    verify              Validate manifest URLs and response length
    reddit              Auto-generate a urls file from reddit links

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         Display Opacify version info

Examples:
    $ opacify pacify --input test.txt --urls urls.txt --manifest test.opm --cache /tmp/cache/
    $ opacify satisfy --out test.txt.out --urls urls.txt --manifest test.opm --cache /tmp/dcache/

usage: opacify pacify [-h] -i INPUT -u URLS -m MANIFEST -c CACHE [-k] [-f]
                      [-d] [-t THREADS] [-s CHUNKSIZE]

Run in pacify mode (builds manifest from input file)

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to input file
  -u URLS, --urls URLS  Path to urls file
  -m MANIFEST, --manifest MANIFEST
                        Output path of manifest file
  -c CACHE, --cache CACHE
                        Path to cache directory
  -k, --keep            Do not remove cache after completed. Useful for
                        testing
  -f, --force           Overwrite manifest if it exists
  -d, --debug           Turn on debug output
  -t THREADS, --threads THREADS
                        Run processing multiple threads
  -s CHUNKSIZE, --chunksize CHUNKSIZE
                        Specify a different chunk size (default is 1 byte)

usage: opacify satisfy [-h] -m MANIFEST -o OUT -c CACHE [-k] [-f] [-d]

Run in satisfy mode (rebuilds file using manifest)

optional arguments:
  -h, --help            show this help message and exit
  -m MANIFEST, --manifest MANIFEST
                        Path of manifest file
  -o OUT, --out OUT     Path to write output file to
  -c CACHE, --cache CACHE
                        Path to cache directory
  -k, --keep            Do not remove cache after completed. Useful for
                        testing
  -f, --force           Overwrite output file if it exists
  -d, --debug           Turn on debug output

usage: opacify verify [-h] -m MANIFEST [-d]

Validate manifest URLs and response length

optional arguments:
  -h, --help            show this help message and exit
  -m MANIFEST, --manifest MANIFEST
                        Path of manifest file
  -d, --debug           Turn on debug output

usage: opacify reddit [-h] -o OUT -c COUNT

Auto-generate a urls file from reddit links

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     Path to write urls to
  -c COUNT, --count COUNT
                        How many links to get

Errors

See Error Codes for a list of errors and meanings.

Manifest Format

The manifest consists of a header and a body.

Header

The header is a single line of fields separated by ':'. As of this writing, it contains the following fields, in order: version:source-file-sha256:source-file-length

  • version: The version of Opacify that the manifest was built with.
  • source-file-sha256: The sha256 of the input file. This is used to validate the rebuilt file on satisfy.
  • source-file-length: The length of the input file. This is also used to validate the rebuilt file on satisfy.
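Parsing and checking the header takes only a few lines. In this sketch the example header reuses the sha256 and size from the pacify example above; the exact form of the version field (written here as `0.3.0`) is an assumption, and `Header`, `parse_header` and `output_matches` are hypothetical names.

```python
import hashlib
from typing import NamedTuple


class Header(NamedTuple):
    version: str
    sha256: str  # hex digest of the original input file
    length: int  # size of the original input file


def parse_header(line: str) -> Header:
    """Split a version:source-file-sha256:source-file-length header line."""
    # Raises ValueError if the line does not have exactly three fields.
    version, sha256, length = line.strip().split(":")
    return Header(version, sha256, int(length))


def output_matches(header: Header, data: bytes) -> bool:
    """Check rebuilt data against the length and sha256 recorded in the header."""
    return len(data) == header.length and hashlib.sha256(data).hexdigest() == header.sha256


header = parse_header(
    "0.3.0:44060449ed92a19e59231d48ab634cbe89d7328f1c24ac7b48b4992b1256657f:7173"
)
print(header.length)  # 7173
```

Comparing the length first is cheap and catches truncated output before hashing the whole file.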

Body

Each line represents one item, with its fields separated by spaces. The lines appear in the same order as the data they describe in the input file. Example:

http://foo/bar.png 23 55
http://bar/foo.png 100 32

The body items (each line) consist of the following parts:

  1. encoded URL
  2. offset into the external source data
  3. length of the data to read from the external source
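A body line therefore maps directly onto a small record. This sketch parses the example lines above; `Item` and `parse_body` are hypothetical names, and skipping blank lines is a choice made here rather than documented Opacify behaviour.

```python
from typing import Iterable, List, NamedTuple


class Item(NamedTuple):
    url: str     # encoded URL of the external source
    offset: int  # where to start reading in the source data
    length: int  # how many bytes to read


def parse_body(lines: Iterable[str]) -> List[Item]:
    """Parse space-delimited body lines, keeping their original order."""
    items: List[Item] = []
    for line in lines:
        if not line.strip():
            continue  # this sketch tolerates blank lines, e.g. a trailing newline
        url, offset, length = line.split()
        items.append(Item(url, int(offset), int(length)))
    return items


body = "http://foo/bar.png 23 55\nhttp://bar/foo.png 100 32\n"
print(parse_body(body.splitlines()))
```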

Using the example above, the input file is rebuilt as follows:

  1. Read 55 bytes from http://foo/bar.png starting at an offset of 23 bytes.
  2. Append this data to the output file.
  3. Read 32 bytes from http://bar/foo.png starting at an offset of 100 bytes.
  4. Append this data to the output file.
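The steps above can be sketched as a loop that reads each slice and appends it to the output. Here `fetch` is a hypothetical callable standing in for downloading (or reading from the cache) the full source data; the real satisfy command may fetch data differently, so `rebuild` is only an illustration. The two sources are fake data shaped to match the example manifest.

```python
import io
from typing import BinaryIO, Callable, Iterable, Tuple

Item = Tuple[str, int, int]  # (url, offset, length)


def rebuild(items: Iterable[Item], fetch: Callable[[str], bytes], out: BinaryIO) -> int:
    """Append each referenced slice to `out` in manifest order; return bytes written."""
    written = 0
    for url, offset, length in items:
        # Read `length` bytes starting at `offset` from the source.
        chunk = fetch(url)[offset:offset + length]
        if len(chunk) != length:
            raise ValueError(f"{url} has fewer than {offset + length} bytes")
        out.write(chunk)
        written += length
    return written


sources = {
    "http://foo/bar.png": b"." * 23 + b"A" * 55,
    "http://bar/foo.png": b"." * 100 + b"B" * 32,
}
buf = io.BytesIO()
rebuild([("http://foo/bar.png", 23, 55), ("http://bar/foo.png", 100, 32)], sources.__getitem__, buf)
print(len(buf.getvalue()))  # 87
```

After the loop, the output would be checked against the sha256 and length stored in the header.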

TODO

Backup

Add a --backup-level N option that creates multiple manifest items for each buffer. This acts like replication (a backup) for parts of a file: if one URL source fails, a backup URL can be used instead.
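Since --backup-level is only planned, the following is purely hypothetical. It assumes each buffer would carry an ordered list of alternative `(url, offset, length)` items and that satisfy would try them in turn; `fetch_with_backups` and the `fetch` callable (returning `None` on failure) are illustrative names, not an existing API.

```python
from typing import Callable, Optional, Sequence, Tuple

Item = Tuple[str, int, int]  # (url, offset, length)


def fetch_with_backups(candidates: Sequence[Item], fetch: Callable[[str], Optional[bytes]]) -> bytes:
    """Return the slice from the first candidate whose source is still usable."""
    for url, offset, length in candidates:
        data = fetch(url)  # None stands in for a failed request
        if data is not None and len(data) >= offset + length:
            return data[offset:offset + length]
    raise LookupError("every source for this buffer failed")


alive = {"http://backup/y.png": b"zzabcz"}
print(fetch_with_backups([("http://dead/x.png", 0, 3), ("http://backup/y.png", 2, 3)], alive.get))
# b'abc'
```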
