dedup_simulation

This deduplication simulator was used for two research papers [1,2]. An earlier, java-based version of this simulator was used in an other paper [5]. It uses deduplication traces as generated by Meister's fs-c tool (https://github.com/dmeister/fs-c) as input.

The implemented deduplication approaches are:

container caching The basic deduplication approach introduced by Zhu et. al [3]
Sparse Indexing The basic deduplication approach introduced by Lillibridge et. al [4]
Block Locality Caching The basic deduplication approach introduced by Meister et. al [5]
Sorted Chunk Indexing The basic deduplication approach introduced by Kaiser et. al [2]

As discussed in [1], the first three approaches are possible in a version performing exact deduplication as well as a version performing approximate deduplication. Both versions are included. The approximate versions of container caching and Block Locality Caching are called sampling. There is no extra sampling version of Sparse Indexing as the design for its approximate and exact version is too small to implement them in two files. In fact, only the exact version is simulated and it's possible to extract the results for the approximate version based on the result files. Please see [1] for a discussion about the design.

Implementation

The simulator is written in Go (https://golang.org/). It depends on two third-party libraries:

protocol buffers: Meister's fs-c tool encodes the deduplication traces as protobuf messages. Therefore, this simulator must decode this messages to perform its task. It uses the gogoprotobuf library (https://github.com/gogo/protobuf) for this tasks because of performance reasons. The .proto message definitions are in the de_pc2_dedup_fschunk directory.
seelog : A simple logging library for cmd-line outputs. (https://github.com/cihub/seelog)

References

[1] Jürgen Kaiser, Dirk Meister, André Brinkmann, and Tim Süß. Deriving and Comparing Deduplication Techniques Using a Model-Based Classification. In Proceedings of the European Conference on Computer Systems (EuroSys). ACM, 2015.

[2] Jürgen Kaiser, Tim Süß, Lars Nagel, and André Brinkmann. Sorted Deduplication: How to Process Thousands of Backup Streams. In Proceedings of the 32th IEEE Conference on Massive Storage Systems and Technologies (MSST). IEEE, May 2016.

[3] Benjamin Zhu, Kai Li, and R. Hugo Patterson. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST)

[4] Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezis, and Peter Camble. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST)

[5] Dirk Meister, Jürgen Kaiser, André Brinkmann. Block Locality Caching for Data Deduplication. In Proceedings of the 6th International Systems and Storage Conference. ACM, 2013

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
common		common
de_pc2_dedup_fschunk		de_pc2_dedup_fschunk
dedupAlgorithms		dedupAlgorithms
main		main
README.md		README.md
install_dependencies.sh		install_dependencies.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

common

common

de_pc2_dedup_fschunk

de_pc2_dedup_fschunk

dedupAlgorithms

dedupAlgorithms

main

main

README.md

README.md

install_dependencies.sh

install_dependencies.sh

Repository files navigation

dedup_simulation

Implementation

References

About

Releases

Packages

Languages

jkaiser/dedup_simulation

Folders and files

Latest commit

History

Repository files navigation

dedup_simulation

Implementation

References

About

Resources

Stars

Watchers

Forks

Languages