tesla-cachefile

An addon to tesla-microservice for using a cachefile locally or on hdfs. For hdfs, the namenode can be determined automatically by querying zookeeper every time the cachefile is read or written.

Usage

Add this to your project's dependencies:

Changelog

Version 0.4.0 changes:

requires tesla-microservice 0.5.0, which has a scheduler in the base-system

Version 0.3.3 changes:

bugfix: history files were compressed twice

Version 0.3.2 changes:

historization-strategy creates gzipped files

Version 0.3.0 changes:

Refactoring of repo-structure
Added a historization-strategy, which is used by the new file-historizer component

Version 0.2.0 changes:

Complete redesign of the API. The Filesystem is now treated as an immutable resource.

Version 0.1.2 changes:

added the possibility to clean up hdfs-generations: (cleanup-generations [self]).
To make it work, define a property called your.name.nr.gens.to.keep, which specifies the number of generations with success-files to keep.
The delete is executed for the path specified by your.name.toplevel.path with the corresponding generation injected, e.g. hdfs://namenode:1234/foo/bar/000001/subfolder
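
The cleanup rule can be illustrated with a small pure function. This is only a sketch under stated assumptions, not the library's code: the in-memory `generations` map (generation number to set of file names) and the function name `generations-to-delete` are hypothetical, and the choice to delete nothing when no successful generation exists is an assumption.

```clojure
;; Hypothetical model: {generation-number #{file names in that generation}}
(defn generations-to-delete
  "Returns the generation numbers older than the oldest of the
   nr-gens-to-keep most recent generations that contain a _SUCCESS file."
  [generations nr-gens-to-keep]
  (let [successful (->> generations
                        (filter (fn [[_ files]] (contains? files "_SUCCESS")))
                        (map first)
                        (sort >))               ; newest first
        kept       (take nr-gens-to-keep successful)]
    (if (empty? kept)
      []                                        ; assumption: nothing to keep, nothing to delete
      (let [oldest-kept (last kept)]
        (->> (keys generations)
             (filter #(< % oldest-kept))
             (sort)
             (vec))))))
```

With generations 1 to 5, of which 1, 3 and 4 are successful, and two generations to keep, the sketch selects generations 1 and 2 for deletion and leaves the unfinished generation 5 untouched.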

Version 0.1.0 has some major changes:

a folder is now configured by the property your.name.toplevel.path ({ZK_NAMENODE} and {GENERATION} can be used)
many files can now be written to the configured folder
generation-logic now works based on _SUCCESS-files: read from the latest generation with a _SUCCESS-file; write to the latest generation if it has no _SUCCESS-file, otherwise create a new generation and write to it
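
The _SUCCESS-based selection described above can be sketched as two pure functions. This is a minimal illustration, not the library's implementation: the in-memory `generations` map (generation number to set of file names), the function names, and starting at generation 1 when no generation exists yet are assumptions made for the example. The zero-padded six-digit folder name follows the 000001 example path from the 0.1.2 notes.

```clojure
;; Hypothetical model: {generation-number #{file names in that generation}}
(defn successful? [files]
  (contains? files "_SUCCESS"))

(defn generation-to-read
  "Latest generation that contains a _SUCCESS file, or nil if there is none."
  [generations]
  (->> generations
       (filter (fn [[_ files]] (successful? files)))
       (map first)
       (sort)
       (last)))

(defn generation-to-write
  "Latest generation if it has no _SUCCESS file yet, otherwise the next one."
  [generations]
  (if (empty? generations)
    1                                           ; assumption: numbering starts at 1
    (let [latest (apply max (keys generations))]
      (if (successful? (get generations latest))
        (inc latest)
        latest))))

(defn generation-dir
  "Formats a generation number as a six-digit folder name, e.g. 000001."
  [n]
  (format "%06d" n))
```

Keeping the selection free of filesystem calls makes the rule easy to check in isolation; a real handler would first have to list the generation folders on the local disk or on hdfs.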

Version 0.0.10 has some api-changes:

write-cache-file now takes a line-seq as input
read-cache-file now takes an additional argument (read-fn), which is a function to accept a BufferedReader
slurp-cache-file has the old behaviour of getting the file's content as one big string.

Version 0.0.9 has some major changes:

the property hdfs.namenode has been removed. The namenode is now configured directly in the cache-file-path
you can use {ZK_NAMENODE} in your cache-file-path to determine the namenode from zookeeper
you can use {GENERATION} in your cache-file-path to read from the latest generation in which the cache-file is present, and to write to the latest generation if the cache-file is absent there, otherwise to a new generation
uses [hdfs-clj "0.1.15"]

Cachefile-Handler component

The component, if used within a system, can be accessed using this protocol:

        (defprotocol GenerationHandling
          (folder-to-write-to [self] "Creates new generation directory and returns the path.")
          (folder-to-read-from [self] "Finds newest generation with a success file and returns the path.")
          (write-success-file [self path] "Creates a file named _SUCCESS in the given path, which is a marker for the other functions of this protocol.")
          (cleanup-generations [self] "Determines n last successful generations and deletes any older generation."))

Local cachefile

Add your.name.toplevel.path to your properties pointing to e.g. /tmp/yourfolder
your.name is defined when adding the CacheFileHandler to your system:

    (assoc :cachefile-handler (c/using (cfh/new-cachefile-handler "your-name") [:config :zookeeper]))

HDFS cachefile

Add your.name.toplevel.path to your properties pointing to e.g. hdfs://namenode:port/some/folder

Cachefile with generations

Add your.name.toplevel.path to your properties pointing to e.g. hdfs://namenode:port/some/{GENERATION}/folder

Configuring a namenode via zookeeper

Add your.name.toplevel.path to your properties pointing to e.g. hdfs://{ZK_NAMENODE}/some/folder
Add zookeeper.connect to your properties containing a valid zookeeper connection string.
The module currently looks for a namenode-string at a zk-node called /hadoop-ha/hadoop-ha/ActiveBreadCrumb.
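
To make the placeholder handling concrete, the following sketch substitutes {ZK_NAMENODE} and {GENERATION} in a configured path. It is an illustration only: the function name `resolve-toplevel-path`, the option keys, and the example host are hypothetical, and the values are passed in directly rather than looked up in zookeeper or on the filesystem.

```clojure
(require '[clojure.string :as str])

(defn resolve-toplevel-path
  "Replaces the {ZK_NAMENODE} and {GENERATION} placeholders in template.
   A placeholder is left untouched when no value is supplied for it."
  [template {:keys [namenode generation]}]
  (cond-> template
    namenode   (str/replace "{ZK_NAMENODE}" namenode)
    generation (str/replace "{GENERATION}" generation)))

;; Example call with made-up values:
(resolve-toplevel-path "hdfs://{ZK_NAMENODE}/some/{GENERATION}/folder"
                       {:namenode "nn1.example:8020" :generation "000002"})
```

A local path such as /tmp/yourfolder contains neither placeholder and passes through unchanged.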

File-Historizer component

The component, if used within a system, can be accessed using this protocol:

    (defprotocol HistorizationHandling
      (writer-for-timestamp [self timestamp] "Returns a PrintWriter-instance for the given timestamp (see Historization)"))

Historization

A new PrintWriter is returned for every new hour. This leads to the following fs-structure:

    └── output
        └── 2015
            └── 11
                └── 13
                    ├── 13
                    │   └── a2586bd5-2636-4130-1fef-cd35af8e433k.hist.gz
                    └── 14
                        └── c7586bd8-2636-4130-9fef-cd35af8e433f.hist.gz
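
The year/month/day/hour layout shown above can be derived from a timestamp as in the sketch below. It rests on assumptions the README does not state: the timestamp is taken to be epoch milliseconds, the hour is computed in UTC, and the function name `hour-directory` is hypothetical. The actual component may use a different time zone or timestamp type.

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; Formatter for the <year>/<month>/<day>/<hour> part of the path (UTC is an assumption).
(def hour-path-format
  (.withZone (DateTimeFormatter/ofPattern "yyyy/MM/dd/HH") ZoneOffset/UTC))

(defn hour-directory
  "Returns the directory below output-path that belongs to the hour
   containing timestamp-millis."
  [output-path timestamp-millis]
  (str output-path "/"
       (.format hour-path-format (Instant/ofEpochMilli timestamp-millis))))
```

All timestamps within the same hour map to the same directory, which matches the statement that a new PrintWriter is only needed once per hour.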

Initial Contributors

Christian Stamm, Kai Brandes, Torsten Mangner, Daley Chetwynd, Carl Düvel, Florian Weyandt

License

Apache License
