A deduplication lib built Over SIMHASH lib, this provides:
- Simplified but necessary apis.
- An extra api supports user pre-define key-words, which means this supports simhash for multi-languages.
- Detailed benchmarkings.
- A python binding.
It's a header lib, just include include
path.
Using python3.7 as example, run
mkdir build && cd build
cmake ../ -DOSIMHASH_PY_BIND=ON -DPYTHON_EXECUTABLE="/usr/bin/python3.7"
make -j12
The python moduls is located under build/bindings/python/pyosimhash.cpython-37m-x86_64-linux-gnu.so
.
python -m pip install ${OSIMHASH_PATH}
# or
python -m pip install https://github.com/innerNULL/osimhash.git
# or
python -m pip install https://github.com/innerNULL/osimhash/archive/refs/heads/main.zip
- Build Python binding first
- Run
cd build && python3 -m venv osimhash_example
source ./osimhash_example/bin/activate
pip install -r ../examples/python/requirements.txt
deactivate
Run
mkdir build && cd build
cmake ../
make -j12
./benchmarks/cpp/benchmarking
- Build Python binding first
- Run
cd build && python3 -m venv osimhash_benchmark
source ./osimhash_benchmark/bin/activate
pip install -r ../benchmarks/python/requirements.txt
python ../benchmarks/python/benchmarking.py
deactivate
See benchmark