A simple proof-of-concept C++ class object that performs caching of frequently accessed items.
Table of Contents
This is the given assignment:
This project is to develop a simple concept program that performs caching of frequently accessed items, where the size of the cache is quite small compared to the number of the items.
* There are multiple readers and writers of items. Number of readers and writers comes from Readers and Writers file which is passed to the program from command line.
* Design an efficient mechanism to retrieve items that are cached.
* Design and implement an efficient eviction mechanism when there is a cache miss.
* Handle the case of dirty cache item, where a writer is modifying the item that is already cached.
* Your program should handle all corner cases and robust to failures. If it fails, it should fail gracefully giving out meaningful failure messages.
* List out all the assumptions you have made for any unclear requirements.
This is an example of how you may give instructions on setting up your project locally. To get a local copy up and running follow these simple example steps.
-
clang
-
cmake
- The CMake support is directly integrated since Visual Studio 2017. CMake projects in Visual Studio: Installation
-
ninja
- The CMake support is directly integrated since Visual Studio 2017
-
cmake
-
ninja
- Linux: Installing Ninja Build
-
clang, or gcc/ g++
- Linux: clang, or gcc/ g++ is installed by default on most Linux distros
Launch Visual Studio:
- Select
Open a local folder
, and open the root directory (this repository). - Select
<preset_name>
fromConfiguration
. - Select
Build
->Build All
Where <preset_name>
can be one of the following configurations:
x64-debug
,x64-release
,x86-debug
,x86-release
In the root directory (this repository), execute the command below:
mkdir build
cmake --preset <preset_name>
cmake --build ./build/<preset_name> --clean-first
Where <preset_name>
can be one of the following configurations:
- Linux:
linux-debug
,linux-release
The executable(test_memcacher
) is created in the current directory(build
).
-
Open terminal, copy the data files in folder
data
to folderbuild/<preset_name>/examples
:- Items.txt
- Readers.txt
- Reader1.txt
- Writers.txt
- Writer1.txt
-
Navigate to
build/<preset_name>/examples
directory, run the binary/executable like below:
./cacher <size_of_cache> <reader_file> <writer_file> <items_file>
-
<size_of_cache>
is integer value indicating number of elements that the cache can hold at any given time. -
<reader_file>
is a file that contains list of names of reader files. Each reader file will contain list of positions and the value to be read. -
<writer_file>
is a file that contains list of names of writer files. Each writer file will contain list of positions and the value to be written. -
<items_file>
is the actual data file forcacher
.
Each line of the file is either blank or contains one number. The positions in the reader and writer files correspond to line numbers in the item file.
From the build/<preset_name>
folder in terminal, execute the following command:
./cacher 4 Readers.txt Writers.txt items.txt
Note that the items_file
( items.txt
in this example) will be modified and updated with the latest data modified/added at the end of program.
To use Memcacher
in your C++ project, you need to include the MemCacher.h
header file and link with the Memcacher
object library. (See ./example/cacher.cpp
)
The project requires implementation of a simple class object (MemCacher
)that functions like a memory cache, accepting read/ write requests from multiple threads.
-
LRU (Least-Recently-Used) Cache Scheme:
MemCacher
uses the LRU scheme to ensure that the most recently accessed items are retained in the cache while older and less frequently accessed items are evicted from the cache. -
Read- and Write-Through Cache Scheme: The
MemCacher
supports read- and write-through caching, meaning it automatically loads data from file, viaFileDB
when acache miss
orinvalid cache
occurs during read access; and writes dirty cache items back to disk upon eviction. -
Write-Around Cache Scheme: The
MemCacher
supports write-around caching, meaning when a write request is received, it will writes directly to file, viaFileDB
, by-passing the cache. If the data is currently in the cache, it will be flagged asinvalid
. -
Thread-Safe:
MemCacher
is designed to be thread-safe, allowing multiple threads to access the cache concurrently without causing data corruption or race conditions.
Read-through illustration from "Caching Strategies and How to Choose the Right One" at codeahoy.com |
All data have to be through from the cache MemCacher
. Any misses, the data have to be fetched from the file via FileDB
.
This strategy is suitable for read-heavy scenario, especially when the same data is access frequently. However, when the cached data becomes invalid (when using Write-Around scheme), the cached data have to be updated from file. Therefore, a write-heavy scenario could impact the Read-Through scheme.
Write-around illustration |
Write-through illustration from "Caching Strategies and How to Choose the Right One" at codeahoy.com |
Since there is no assumption on whether MemCacher
will be handling write-heavy or read-heavy scenarios, we are going implement both write-around and write-through schemes. This will be interesting for us to test and see how it affects occurence of different cache events.
A write-through scheme would be beneficial for heavy-read and heavy-write scenarios, since the cached data will be consistent (no invalid cached data).
A write-around scheme would be beneificial for a occasional-write scenario, where the cached item is less likely to become invalid.
LRU Scheme diagram from "LRU Cache Implementation" at medium.com |
It is assumed that there is no look-ahead
attempts on the readers
and writers
files. Thus, there will be no information on future access patterns or the likelihood of certain items being accessed, which can be used to optimized the cache to reduces misses. Eviction of cached item during Cache Full
will be solely based on their past usage. Furthermore, we will assume that the past usage is indicative of future usage.
Therefore, a simple Least-Recently-Used (LRU) scheme will be suitable for our implementation of MemCacher
. The LRU is implemented using the following data containers in MemCacher
:
///< list of least-recently-used items.
std::list<CacheItem> m_lru_q;
///< map to store iterators of cache items.
std::unordered_map<int, decltype(m_lru_q)::iterator> m_cache;
The first item in m_lru_q
will be the most recently accessed cache item.
The least-frequently used item will be at the back of m_lru_q
. (Note: m_lru_q
must be a std::list container as we want all iterators and references unaffected by erase and insertion operations)
Whenever a CacheItem
is accessed or added, it will be removed in re-inserted to the front of the m_lru_q
list.
Whenever an eviction is required, the last item in m_lru_q
list will be removed.
The m_cache
map contains paired data, a key and a iterator, which allows us to have quick reference to the CacheItem in the std::list, given a key.
The following assumptions are made about the reading and writing the data:
-
Since the given
items_file
example shows blank lines, we shall assume the data read from cache or file can be a empty string. -
Subsequently, data from
items_file
can be non-numbers. -
The first line in the
items_file
is taken to be position1
. Therefore, the keys associated with each paired string data must be an integer greater than zero. -
Subsquently, the data written to cache or file can be a empty string.
-
When the maximum lines in
items_file
is N, and a data write request is made at M-th position, where M > N; then the lines in the range (N,M) shall be blanks. -
When the maximum lines in
items_file
is N, and a data read request is made at M-th position, where M > N; then an error should be raised. -
When writing a string at the last line in
items_file
, it shall not end with a newline character. This is to avoid a new blank line being added to theitems_file
. -
When a position value read from
readers
file is less than 1, the read request shall be ignored. -
When a position value read from
writers
file is less than 1, the write request shall be ignored.
Basic considerations of cache misses
, dirty cache
, invalid cache
, and cache full
have be taken care of. The following steps will be taken when such events occurs:
-
Cache Miss
, Reader request a data not found in the cache.MemCacher
obeject will attempt to read the data fromitems_file
throughFileBD
object.- If
MemCacher
successfully read data fromFileDB
, it will write the data into cache (See alsoCache Full
), and update the Reader with the data; - Otherwise,
MemCacher
will update the Reader with error code.
- If
-
Invalid Cache
, Reader request a data that has been flaggeddirty
in cache.MemCacher
object will attempt to read the data fromitems_file
throughFileBD
object.- If
MemCacher
successfully read data fromFileDB
, it will update the data in the cache, set itsdirty
flag to FALSE, and update the Reader with the data; - Otherwise,
MemCacher
will update the Reader with error code.
- If
-
Dirty Cache
, Writer request a write to a data found in the cache inWrite-through
mode. Cached data is dirty, proceed to overwrite the cached data,dirty
flag remains TRUE. -
Dirty Cache
, Reader request a read to a data found in the cache. Cached data is dirty,MemCacher
object will attempt to read the data fromitems_file
throughFileBD
object.- If
MemCacher
successfully read data fromFileDB
, it will update the data in the cache (See alsoCache Full
), set itsdirty
flag to FALSE, and update the Reader with the data; - Otherwise,
MemCacher
will update the Reader with error code.
- If
-
Cache Full
,MemCacher
attempts to add data into cache but it is full.- Using the LRU scheme,the least-recently accessed cached item will be removed from cache.
- The newly added data will be added as most-recently used.
-
cacher.cpp
:main()
to show usage ofMemCacher
class object.- Reads command line options and filename parameters.
- Creates
MemCacher
object. - Creates and launches reader thread(s).
- Creates and launches writer thread(s).
- Reader thread(s) output
Reader*.out.txt
files
-
MemCacher.h
MemCacher.cpp
: DefinesMemCacher
class.- Simple Cache object with predefined size.
- Creates
FileDB
object to accessitems_file
. - Supports
Write-through
andWrite-Around
mode. - Store frequently accessed data in the cache.
- Use LRU scheme to swap out data when cache is full.
- Marks
dirty
cache data to be written toFileDB
. - Marks
invalid
cache data to be refreshed fromFileDB
.
-
FileDB.h
FileDB.cpp
: DefinesFileDB
class.- Simple file management object to handle data reading and writing request.
- Reads data from file, at a given line number.
- Write data to file, at a given line number.
-
tests/main.cpp
:main()
for unit testing usinggoogle-test
. -
tests/test_FileDB.h
: test cases forFileDB
-
tests/test_MemCacher.h
: test cases forMemCacher
-
tests/test_helper.h
: helper functions.
This repository contains:
cmake/
: CMake scripts.data/
: Sample data files that can be used.example/
: Main program to useMemCacher
incl/
: Header files listed abovemd_imgs/
: Image data for README.mdsrc/
: Source files listed abovetests/
: Source files for unit testing (dep.google-test
).CMakePresets.json
: file to define and manage different build configurations for a cmake project.CMakeLists.txt
: cmake configuration fileREADME.md
: This file
The folder data/
contains sample data files that can be use at jumpstart for testing. The files are as follows:
-
Items.txt, is a
items_file
which contains actual data. Each line of the file is either blank or contains one number. The first line in the file has position 1. Example contents ofitems_file
:0 23.5 -33 75.2 45 90.0 100 8 9 10000.0
Note: Line 4 and 11 is blank.
-
Readers.txt, is a
readers_file
which contains list ofreader
filenames to be read. Example contents of areaders_file
:Reader1.txt Reader2.txt Reader3.txt
In the above example, there are 3
reader
files listed. -
Reader1.txt, is a
reader
file which contains a list of line positions initems_file
that it reads from. Example contents of areader
:1 3 5 7 1 3 5 7
In the above example, the
reader
file shows data from line 1, 3, 5, 7, ... will be read. -
Writers.txt, is a
writers_file
which contains list ofwriter
filenames to be read. Example contents of awriters_file
:Writer1.txt Writer2.txt
In the above example, there are 2
writer
files listed. -
Writer1.txt, is a
writer
file which contain list of positions and the value to be written to either cache or file ( will be explained further later). Example contents of awriter
:1 100 2 200 4 300 5 500 8 800
In the above example, there are 5 write operations. The last write operation being writing
800
to the line position8
.
Wei Siong (Garfield) Lee - weisiong.lee@gmail.com
Project Link: https://github.com/leews2001/MemCacher