fix parallel AddIndex issue, hash table size issue and configurable refine iteration issue (#105)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copyright in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (#71) (#10)

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compiling error in g++7

* fix largest cluster cannot be split during clustering

* update fresh ANN implementation (#85) (#12)

* fix maxcluster is -1 bug

* fix Reader type definition and add more support

* fix maxcluster is -1 bug (#91) (#14)

* move threadPool init into DefaultReader

* try to move VectorsetReader into CoreLibrary

* fix bktree cluster split issue

* remove spaces and fix newCount is zero issue

* Merge from microsoft.SPTAG (#15)

* fix maxcluster is -1 bug (#91)

* fix some type definition in the Reader and add more support to create Reader (#93)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (#71) (#10)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compling error in g++7

* fix largest cluster cannot be split during clustering

* update fresh ANN implementation (#85) (#12)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (#71) (#10)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compling error in g++7

* fix largest cluster cannot be split during clustering

* fix maxcluster is -1 bug

* fix Reader type definition and add more support

* fix maxcluster is -1 bug (#91) (#14)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (#71) (#10)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compling error in g++7

* fix largest cluster cannot be split during clustering

* update fresh ANN implementation (#85) (#12)

* remove the tbb dependency (#71) (#10)

* fix maxcluster is -1 bug

* move threadPool init into DefaultReader

* try to move VectorSetReader into CoreLibrary

* fix bktree cluster split issue

* fix merge issues

* fix space issues

* fix files in the VectorSetReaders directory not being included in CMakeLists.txt

* remove VectorSetReaders from indexbuilder

* add copyright

* fix refine iterations usage

* try to fix hash table size issue

* try to use maxCheckForRefineGraph in the build stage

* use maxcheckforrefinegraph

* enlarge nodecheckstatus hash table size

* fix pool size

* try to fix FineGrainedLock

* fix FineGrainLock concurrent issue

* try to fix add meta concurrent issue

* move AddIndex to each algorithm

* avoid write lock in the FineGrainLock

* optimize the insertneighbor performance

* fix hashtable size issue

* try to remove finegrained lock

* remove finegrainlock and fix insertneighbors

* fix CLR and Core Wrapper

* remove add log

* try to mergeindex in parallel add mode

* remove parallel add

* add parallel add

* try to make it parallel

* fix pool size

* support rebuild tree in the backend

* add background rebuild tree thread

* add buildmetaindex support for addindex operation

* fix some implementations

* fix rebuild and search delete issues

* fix refine for BKT

* fix add rebuild tree job

* fix compile issue in Azure pipeline

* enable AVX2 in Linux

* change avx to sse

* try to fix aligned_malloc

* avx support check

* add linux avx support flag

* avoid executing jobs after destruction

* fix error when inserting after all vectors are deleted

* fix print percentage overflow

* try to fix graph save issue and delete performance issue

* Add RefineIndex to a newIndex and fix RefineIndex bugs

* fix Dataset Refine not returning a value

* try to use one thread for tree rebuild

* try to use one thread for tree rebuild

* fix different compiler issue

* fix BOOST_CHECK being used from multiple threads

* fix setting the number of threads in the child thread

* fix m_workspacepool init problem

* change the swap interface to rebuild and remove the lock in the labelset

* rename m_deleted in labelset to m_inserted
MaggieQi committed Dec 10, 2019
1 parent 92a42ff commit 4f7290a
Showing 33 changed files with 953 additions and 428 deletions.
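The "add deletePercentageForRefine judgement" change surfaces in the BKT `Index.h` diff below as `NeedRefine()`, which compares the deleted-ID count against `GetNumSamples() * m_fDeletePercentageForRefine` (default `0.4F` in ParameterDefinitionList.h). The free function here is a hypothetical standalone restatement of that one expression, with the index members passed in as plain arguments:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free-function version of BKT::Index::NeedRefine():
// a refine is due once the deleted count reaches the configured fraction of samples.
inline bool NeedRefine(std::size_t deletedCount, int numSamples, float deletePercentageForRefine)
{
    assert(numSamples >= 0 && deletePercentageForRefine >= 0.0f);
    // Same shape as the diff: the float product is truncated to size_t before comparing.
    return deletedCount >= (std::size_t)(numSamples * deletePercentageForRefine);
}
```

With 1,000 samples and the default 0.4, the check first passes at 400 deleted IDs. An empty index (zero samples) always reports that it needs a refine, because `0 >= 0`.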
2 changes: 2 additions & 0 deletions AnnService/CoreLibrary.vcxproj
@@ -132,6 +132,7 @@
</ItemDefinitionGroup>
<ItemGroup>
<ClInclude Include="inc\Core\Common\FineGrainedLock.h" />
<ClInclude Include="inc\Core\Common\Labelset.h" />
<ClInclude Include="inc\Core\Common\WorkSpace.h" />
<ClInclude Include="inc\Core\Common\CommonUtils.h" />
<ClInclude Include="inc\Core\Common\Dataset.h" />
@@ -164,6 +165,7 @@
<ClInclude Include="inc\Core\Common\RelativeNeighborhoodGraph.h" />
<ClInclude Include="inc\Core\Common\BKTree.h" />
<ClInclude Include="inc\Core\Common\KDTree.h" />
<ClInclude Include="inc\Helper\ThreadPool.h" />
<ClInclude Include="inc\Helper\VectorSetReader.h" />
<ClInclude Include="inc\Helper\VectorSetReaders\DefaultReader.h" />
</ItemGroup>
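CoreLibrary.vcxproj now lists `inc\Core\Common\Labelset.h`, which the BKT `Index.h` diff uses in place of `ConcurrentSet<SizeType>` for `m_deletedID` (through `Contains`, `Count`, and `BufferSize`). Labelset.h itself is not part of this excerpt, so the class below is only a hypothetical sketch of a lock-free deleted-ID set in the spirit of the "remove the lock in the labelset" commit: a fixed-capacity array of atomic flags plus an atomic counter. The class name `LabelSet`, its fixed capacity, and the `Insert` method are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical lock-free label set: one atomic flag per vector ID plus an
// atomic counter. Capacity is fixed at construction in this sketch.
class LabelSet {
public:
    explicit LabelSet(std::size_t capacity)
        : m_capacity(capacity), m_flags(new std::atomic<char>[capacity]), m_count(0) {
        for (std::size_t i = 0; i < capacity; ++i) m_flags[i].store(0, std::memory_order_relaxed);
    }

    // Returns true only for the call that actually flips the flag, so the
    // counter is incremented exactly once per ID even when threads race.
    bool Insert(std::size_t id) {
        assert(id < m_capacity);
        if (m_flags[id].exchange(1) == 0) {
            m_count.fetch_add(1);
            return true;
        }
        return false;
    }

    // IDs outside the capacity are treated as not labeled.
    bool Contains(std::size_t id) const {
        return id < m_capacity && m_flags[id].load() == 1;
    }

    std::size_t Count() const { return m_count.load(); }

private:
    std::size_t m_capacity;
    std::unique_ptr<std::atomic<char>[]> m_flags;
    std::atomic<std::size_t> m_count;
};
```

Using `exchange` rather than a separate load and store means two threads deleting the same ID at the same time cannot both increment the count.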
6 changes: 6 additions & 0 deletions AnnService/CoreLibrary.vcxproj.filters
@@ -151,6 +151,12 @@
<ClInclude Include="inc\Helper\VectorSetReader.h">
<Filter>Header Files\Helper</Filter>
</ClInclude>
<ClInclude Include="inc\Helper\ThreadPool.h">
<Filter>Header Files\Helper</Filter>
</ClInclude>
<ClInclude Include="inc\Core\Common\Labelset.h">
<Filter>Header Files\Core\Common</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<ClCompile Include="src\Core\VectorIndex.cpp">
2 changes: 0 additions & 2 deletions AnnService/IndexBuilder.vcxproj
@@ -138,12 +138,10 @@
</ItemDefinitionGroup>
<ItemGroup>
<ClInclude Include="inc\IndexBuilder\Options.h" />
<ClInclude Include="inc\IndexBuilder\ThreadPool.h" />
</ItemGroup>
<ItemGroup>
<ClCompile Include="src\IndexBuilder\main.cpp" />
<ClCompile Include="src\IndexBuilder\Options.cpp" />
<ClCompile Include="src\IndexBuilder\ThreadPool.cpp" />
</ItemGroup>
<ItemGroup>
<None Include="packages.config" />
11 changes: 4 additions & 7 deletions AnnService/IndexBuilder.vcxproj.filters
@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<Filter Include="Source Files">
@@ -14,9 +14,6 @@
<ClInclude Include="inc\IndexBuilder\Options.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="inc\IndexBuilder\ThreadPool.h">
<Filter>Header Files</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<ClCompile Include="src\IndexBuilder\Options.cpp">
@@ -25,8 +22,8 @@
<ClCompile Include="src\IndexBuilder\main.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="src\IndexBuilder\ThreadPool.cpp">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<None Include="packages.config" />
</ItemGroup>
</Project>
40 changes: 29 additions & 11 deletions AnnService/inc/Core/BKT/Index.h
@@ -15,12 +15,13 @@
#include "../Common/WorkSpacePool.h"
#include "../Common/RelativeNeighborhoodGraph.h"
#include "../Common/BKTree.h"
#include "inc/Helper/ConcurrentSet.h"
#include "../Common/Labelset.h"
#include "inc/Helper/SimpleIniReader.h"
#include "inc/Helper/StringConvert.h"
#include "inc/Helper/ThreadPool.h"

#include <functional>
#include <mutex>
#include <shared_mutex>

namespace SPTAG
{
@@ -35,6 +36,18 @@ namespace SPTAG
template<typename T>
class Index : public VectorIndex
{
class RebuildJob : public Helper::ThreadPool::Job {
public:
RebuildJob(VectorIndex* p_index, COMMON::BKTree* p_tree, COMMON::RelativeNeighborhoodGraph* p_graph) : m_index(p_index), m_tree(p_tree), m_graph(p_graph) {}
void exec() {
m_tree->Rebuild<T>(m_index);
}
private:
VectorIndex* m_index;
COMMON::BKTree* m_tree;
COMMON::RelativeNeighborhoodGraph* m_graph;
};

private:
// data points
COMMON::Dataset<T> m_pSamples;
@@ -50,12 +63,16 @@ namespace SPTAG
std::string m_sDataPointsFilename;
std::string m_sDeleteDataPointsFilename;

std::mutex m_dataAddLock; // protect data and graph
Helper::Concurrent::ConcurrentSet<SizeType> m_deletedID;
int m_addCountForRebuild;
float m_fDeletePercentageForRefine;
std::unique_ptr<COMMON::WorkSpacePool> m_workSpacePool;
std::mutex m_dataAddLock; // protect data and graph
std::shared_timed_mutex m_dataDeleteLock;
COMMON::Labelset m_deletedID;

std::unique_ptr<COMMON::WorkSpacePool> m_workSpacePool;
Helper::ThreadPool m_threadPool;
int m_iNumberOfThreads;

DistCalcMethod m_iDistCalcMethod;
float(*m_fComputeDistance)(const T* pX, const T* pY, DimensionType length);

@@ -89,15 +106,15 @@ namespace SPTAG

inline float ComputeDistance(const void* pX, const void* pY) const { return m_fComputeDistance((const T*)pX, (const T*)pY, m_pSamples.C()); }
inline const void* GetSample(const SizeType idx) const { return (void*)m_pSamples[idx]; }
inline bool ContainSample(const SizeType idx) const { return !m_deletedID.contains(idx); }
inline bool NeedRefine() const { return m_deletedID.size() >= (size_t)(GetNumSamples() * m_fDeletePercentageForRefine); }
inline bool ContainSample(const SizeType idx) const { return !m_deletedID.Contains(idx); }
inline bool NeedRefine() const { return m_deletedID.Count() >= (size_t)(GetNumSamples() * m_fDeletePercentageForRefine); }
std::shared_ptr<std::vector<std::uint64_t>> BufferSize() const
{
std::shared_ptr<std::vector<std::uint64_t>> buffersize(new std::vector<std::uint64_t>);
buffersize->push_back(m_pSamples.BufferSize());
buffersize->push_back(m_pTrees.BufferSize());
buffersize->push_back(m_pGraph.BufferSize());
buffersize->push_back(m_deletedID.bufferSize());
buffersize->push_back(m_deletedID.BufferSize());
return std::move(buffersize);
}

@@ -110,8 +127,8 @@ namespace SPTAG
ErrorCode LoadIndexDataFromMemory(const std::vector<ByteArray>& p_indexBlobs);

ErrorCode BuildIndex(const void* p_data, SizeType p_vectorNum, DimensionType p_dimension);
ErrorCode SearchIndex(QueryResult &p_query) const;
ErrorCode AddIndex(const void* p_vectors, SizeType p_vectorNum, DimensionType p_dimension, SizeType* p_start = nullptr);
ErrorCode SearchIndex(QueryResult &p_query, bool p_searchDeleted = false) const;
ErrorCode AddIndex(const void* p_data, SizeType p_vectorNum, DimensionType p_dimension, std::shared_ptr<MetadataSet> p_metadataSet, bool p_withMetaIndex = false);
ErrorCode DeleteIndex(const void* p_vectors, SizeType p_vectorNum);
ErrorCode DeleteIndex(const SizeType& p_id);

@@ -120,9 +137,10 @@ namespace SPTAG

ErrorCode RefineIndex(const std::string& p_folderPath);
ErrorCode RefineIndex(const std::vector<std::ostream*>& p_indexStreams);
ErrorCode RefineIndex(std::shared_ptr<VectorIndex>& p_newIndex);

private:
void SearchIndexWithDeleted(COMMON::QueryResultSet<T> &p_query, COMMON::WorkSpace &p_space, const Helper::Concurrent::ConcurrentSet<SizeType> &p_deleted) const;
void SearchIndexWithDeleted(COMMON::QueryResultSet<T> &p_query, COMMON::WorkSpace &p_space) const;
void SearchIndexWithoutDeleted(COMMON::QueryResultSet<T> &p_query, COMMON::WorkSpace &p_space) const;
};
} // namespace BKT
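The new `RebuildJob` wraps `m_tree->Rebuild<T>(m_index)` in a `Helper::ThreadPool::Job` whose only entry point is `exec()`, so tree rebuilding can run on `m_threadPool` rather than on the caller's thread. `Helper/ThreadPool.h` is not part of this excerpt; the pool below is a hypothetical minimal version with the same `Job`/`exec()` shape, and `Add`, `WaitIdle`, and `CountingRebuildJob` are invented names. In line with the "avoid exec jobs after destroy" commit, it drops queued jobs when the pool is destroyed instead of running them.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool mirroring the Job/exec() shape used by RebuildJob.
class ThreadPool {
public:
    class Job {
    public:
        virtual ~Job() = default;
        virtual void exec() = 0;
    };

    explicit ThreadPool(int numThreads) {
        assert(numThreads > 0);
        for (int i = 0; i < numThreads; ++i)
            m_workers.emplace_back([this] { Loop(); });
    }

    // Stops the workers; jobs still queued at this point are discarded, not run.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& t : m_workers) t.join();
    }

    void Add(std::unique_ptr<Job> job) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }
        m_cv.notify_one();
    }

    // Blocks until the queue is empty and no job is executing.
    void WaitIdle() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_idle.wait(lock, [this] { return m_jobs.empty() && m_running == 0; });
    }

private:
    void Loop() {
        for (;;) {
            std::unique_ptr<Job> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_jobs.empty(); });
                if (m_stop) return;  // drop any remaining jobs
                job = std::move(m_jobs.front());
                m_jobs.pop();
                ++m_running;  // counted under the same lock as the pop
            }
            job->exec();
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                --m_running;
            }
            m_idle.notify_all();
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::unique_ptr<Job>> m_jobs;
    std::mutex m_mutex;
    std::condition_variable m_cv, m_idle;
    int m_running = 0;
    bool m_stop = false;
};

// Stand-in for RebuildJob: records how many times a "rebuild" ran.
class CountingRebuildJob : public ThreadPool::Job {
public:
    explicit CountingRebuildJob(std::atomic<int>& counter) : m_counter(counter) {}
    void exec() override { m_counter.fetch_add(1); }
private:
    std::atomic<int>& m_counter;
};
```

Dropping pending work on shutdown is the design choice that matters here: a queued rebuild that ran after its index was destroyed would dereference a dangling `VectorIndex*`.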
5 changes: 3 additions & 2 deletions AnnService/inc/Core/BKT/ParameterDefinitionList.h
@@ -22,14 +22,15 @@ DefineBKTParameter(m_pGraph.m_numTopDimensionTPTSplit, int, 5L, "NumTopDimension
DefineBKTParameter(m_pGraph.m_iNeighborhoodSize, DimensionType, 32L, "NeighborhoodSize")
DefineBKTParameter(m_pGraph.m_iNeighborhoodScale, int, 2L, "GraphNeighborhoodScale")
DefineBKTParameter(m_pGraph.m_iCEFScale, int, 2L, "GraphCEFScale")
DefineBKTParameter(m_pGraph.m_iRefineIter, int, 0L, "RefineIterations")
DefineBKTParameter(m_pGraph.m_iRefineIter, int, 2L, "RefineIterations")
DefineBKTParameter(m_pGraph.m_iCEF, int, 1000L, "CEF")
DefineBKTParameter(m_pGraph.m_iMaxCheckForRefineGraph, int, 10000L, "MaxCheckForRefineGraph")
DefineBKTParameter(m_pGraph.m_iMaxCheckForRefineGraph, int, 8192L, "MaxCheckForRefineGraph")

DefineBKTParameter(m_iNumberOfThreads, int, 1L, "NumberOfThreads")
DefineBKTParameter(m_iDistCalcMethod, SPTAG::DistCalcMethod, SPTAG::DistCalcMethod::Cosine, "DistCalcMethod")

DefineBKTParameter(m_fDeletePercentageForRefine, float, 0.4F, "DeletePercentageForRefine")
DefineBKTParameter(m_addCountForRebuild, int, 1000, "AddCountForRebuild")
DefineBKTParameter(m_iMaxCheck, int, 8192L, "MaxCheck")
DefineBKTParameter(m_iThresholdOfNumberOfContinuousNoBetterPropagation, int, 3L, "ThresholdOfNumberOfContinuousNoBetterPropagation")
DefineBKTParameter(m_iNumberOfInitialDynamicPivots, int, 50L, "NumberOfInitialDynamicPivots")
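ParameterDefinitionList.h is a list of `DefineBKTParameter(member, type, default, name)` entries. This commit raises the default `RefineIterations` from 0 to 2, lowers `MaxCheckForRefineGraph` from 10000 to 8192, and adds `AddCountForRebuild` (default 1000). Lists like this are typically consumed as X-macros: the header is included several times, each time with `DefineBKTParameter` defined differently. The sketch below assumes that pattern and inlines the list as a hypothetical `BKT_PARAMETER_LIST` macro so it compiles on its own; `std::atof` stands in for whatever string conversion SPTAG actually uses.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical X-macro list standing in for ParameterDefinitionList.h:
// each entry is (member, type, default, name used in config files).
#define BKT_PARAMETER_LIST(X)                                                   \
    X(m_iRefineIter, int, 2, "RefineIterations")                                \
    X(m_iMaxCheckForRefineGraph, int, 8192, "MaxCheckForRefineGraph")           \
    X(m_fDeletePercentageForRefine, float, 0.4F, "DeletePercentageForRefine")   \
    X(m_addCountForRebuild, int, 1000, "AddCountForRebuild")

struct BKTParams {
    // Expansion 1: declare every member with its default value.
#define DECLARE_PARAM(member, type, def, name) type member = def;
    BKT_PARAMETER_LIST(DECLARE_PARAM)
#undef DECLARE_PARAM

    // Expansion 2: set a member from its config name; false if the name is unknown.
    bool SetParameter(const std::string& key, const std::string& value) {
#define SET_PARAM(member, type, def, name) \
        if (key == name) { member = static_cast<type>(std::atof(value.c_str())); return true; }
        BKT_PARAMETER_LIST(SET_PARAM)
#undef SET_PARAM
        return false;
    }
};
```

Because both expansions come from one list, adding a parameter such as `AddCountForRebuild` is a one-line change that updates the defaults and the name-based setter together.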
60 changes: 42 additions & 18 deletions AnnService/inc/Core/Common/BKTree.h
@@ -8,6 +8,7 @@
#include <stack>
#include <string>
#include <vector>
#include <shared_mutex>

#include "../VectorIndex.h"

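BKTree.h now includes `<shared_mutex>` because the tree gains an `m_lock` of type `std::unique_ptr<std::shared_timed_mutex>`: readers such as `sizePerTree()` and `SaveTrees()` take a `std::shared_lock`, while `Rebuild()` builds a fresh `BKTree` without holding the lock and takes a `std::unique_lock` only to swap the node arrays in. (`std::shared_timed_mutex` is the C++14 type, which fits the earlier "fix gcc 5.x cannot support shared_mutex" commit.) A reduced, hypothetical version of the same copy-build-swap pattern over a plain `std::vector<int>`; `SwappableTree` is an invented name and the sketch needs C++14:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical structure following BKTree::Rebuild: build a replacement
// without the lock, then swap it in under an exclusive lock.
class SwappableTree {
public:
    SwappableTree() : m_lock(new std::shared_timed_mutex) {}

    // Readers (cf. sizePerTree/SaveTrees) share the lock with each other.
    std::size_t Size() const {
        std::shared_lock<std::shared_timed_mutex> lock(*m_lock);
        return m_nodes.size();
    }

    // The expensive build happens outside the lock; only the swap is exclusive.
    void Rebuild(std::size_t numNodes) {
        std::vector<int> fresh;
        fresh.reserve(numNodes);
        for (std::size_t i = 0; i < numNodes; ++i) fresh.push_back(static_cast<int>(i));

        std::unique_lock<std::shared_timed_mutex> lock(*m_lock);
        m_nodes.swap(fresh);
        assert(m_nodes.size() == numNodes);
    }  // `lock` is released first, then `fresh` (now the old nodes) is freed

private:
    std::vector<int> m_nodes;
    std::unique_ptr<std::shared_timed_mutex> m_lock;
};
```

Holding the mutex through a `std::unique_ptr` matters for `BKTree`, whose copy constructor (used by `Rebuild`) gives the copy a new mutex, since `std::shared_timed_mutex` itself cannot be copied. It also lets `const` members lock, because dereferencing a `const std::unique_ptr` yields a non-const mutex.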
@@ -46,25 +47,25 @@ namespace SPTAG
T* newTCenters;

KmeansArgs(int k, DimensionType dim, SizeType datasize, int threadnum) : _K(k), _D(dim), _T(threadnum) {
centers = new T[k * dim];
centers = (T*)aligned_malloc(sizeof(T) * k * dim, ALIGN);
newTCenters = (T*)aligned_malloc(sizeof(T) * k * dim, ALIGN);
counts = new SizeType[k];
newCenters = new float[threadnum * k * dim];
newCounts = new SizeType[threadnum * k];
label = new int[datasize];
clusterIdx = new SizeType[threadnum * k];
clusterDist = new float[threadnum * k];
newTCenters = new T[k * dim];
}

~KmeansArgs() {
delete[] centers;
aligned_free(centers);
aligned_free(newTCenters);
delete[] counts;
delete[] newCenters;
delete[] newCounts;
delete[] label;
delete[] clusterIdx;
delete[] clusterDist;
delete[] newTCenters;
}

inline void ClearCounts() {
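In `KmeansArgs`, `centers` and `newTCenters` move from `new T[]`/`delete[]` to `aligned_malloc(..., ALIGN)`/`aligned_free`, presumably so the SIMD distance functions can read cluster centers from aligned addresses. The two calls must stay paired: passing an aligned pointer to `delete[]`, or a `new[]` pointer to `aligned_free`, is undefined behavior. SPTAG's own `aligned_malloc` is not shown here, so the sketch below is a hypothetical portable version (`AlignedMallocSketch`/`AlignedFreeSketch`) using the common over-allocate-and-stash technique:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical portable aligned allocator: over-allocate with malloc, round the
// address up to `alignment`, and stash the original pointer just before it.
inline void* AlignedMallocSketch(std::size_t size, std::size_t alignment) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;  // slot lies inside the malloc'd block
    return reinterpret_cast<void*>(aligned);
}

// Must be paired with AlignedMallocSketch; delete[] or free() on the returned
// pointer would hand the allocator an address it never returned.
inline void AlignedFreeSketch(void* p) {
    if (p != nullptr) std::free(reinterpret_cast<void**>(p)[-1]);
}
```

A caller mirroring `KmeansArgs` would allocate with `AlignedMallocSketch(sizeof(float) * k * dim, 32)` and release the buffer only through `AlignedFreeSketch`.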
@@ -106,23 +107,41 @@ namespace SPTAG
class BKTree
{
public:
BKTree(): m_iTreeNumber(1), m_iBKTKmeansK(32), m_iBKTLeafSize(8), m_iSamples(1000) {}
BKTree(): m_iTreeNumber(1), m_iBKTKmeansK(32), m_iBKTLeafSize(8), m_iSamples(1000), m_lock(new std::shared_timed_mutex) {}

BKTree(BKTree& other): m_iTreeNumber(other.m_iTreeNumber),
m_iBKTKmeansK(other.m_iBKTKmeansK),
m_iBKTLeafSize(other.m_iBKTLeafSize),
m_iSamples(other.m_iSamples) {}
m_iSamples(other.m_iSamples),
m_lock(new std::shared_timed_mutex) {}
~BKTree() {}

inline const BKTNode& operator[](SizeType index) const { return m_pTreeRoots[index]; }
inline BKTNode& operator[](SizeType index) { return m_pTreeRoots[index]; }

inline SizeType size() const { return (SizeType)m_pTreeRoots.size(); }

inline SizeType sizePerTree() const {
std::shared_lock<std::shared_timed_mutex> lock(*m_lock);
return (SizeType)m_pTreeRoots.size() - m_pTreeStart.back();
}

inline const std::unordered_map<SizeType, SizeType>& GetSampleMap() const { return m_pSampleCenterMap; }

template <typename T>
void BuildTrees(VectorIndex* index, std::vector<SizeType>* indices = nullptr)
void Rebuild(VectorIndex* p_index)
{
BKTree newTrees(*this);
newTrees.BuildTrees<T>(p_index, nullptr, nullptr, 1);

std::unique_lock<std::shared_timed_mutex> lock(*m_lock);
m_pTreeRoots.swap(newTrees.m_pTreeRoots);
m_pTreeStart.swap(newTrees.m_pTreeStart);
m_pSampleCenterMap.swap(newTrees.m_pSampleCenterMap);
}

template <typename T>
void BuildTrees(VectorIndex* index, std::vector<SizeType>* indices = nullptr, std::vector<SizeType>* reverseIndices = nullptr, int numOfThreads = omp_get_num_threads())
{
struct BKTStackItem {
SizeType index, first, last;
@@ -133,12 +152,12 @@ namespace SPTAG
std::vector<SizeType> localindices;
if (indices == nullptr) {
localindices.resize(index->GetNumSamples());
for (SizeType i = 0; i < index->GetNumSamples(); i++) localindices[i] = i;
for (SizeType i = 0; i < localindices.size(); i++) localindices[i] = i;
}
else {
localindices.assign(indices->begin(), indices->end());
}
KmeansArgs<T> args(m_iBKTKmeansK, index->GetFeatureDim(), (SizeType)localindices.size(), omp_get_num_threads());
KmeansArgs<T> args(m_iBKTKmeansK, index->GetFeatureDim(), (SizeType)localindices.size(), numOfThreads);

m_pSampleCenterMap.clear();
for (char i = 0; i < m_iTreeNumber; i++)
@@ -156,26 +175,29 @@ namespace SPTAG
m_pTreeRoots[item.index].childStart = newBKTid;
if (item.last - item.first <= m_iBKTLeafSize) {
for (SizeType j = item.first; j < item.last; j++) {
m_pTreeRoots.push_back(BKTNode(localindices[j]));
SizeType cid = (reverseIndices == nullptr)? localindices[j]: reverseIndices->at(localindices[j]);
m_pTreeRoots.push_back(BKTNode(cid));
}
}
else { // clustering the data into BKTKmeansK clusters
int numClusters = KmeansClustering(index, localindices, item.first, item.last, args);
if (numClusters <= 1) {
SizeType end = min(item.last + 1, (SizeType)localindices.size());
std::sort(localindices.begin() + item.first, localindices.begin() + end);
m_pTreeRoots[item.index].centerid = localindices[item.first];
m_pTreeRoots[item.index].centerid = (reverseIndices == nullptr) ? localindices[item.first] : reverseIndices->at(localindices[item.first]);
m_pTreeRoots[item.index].childStart = -m_pTreeRoots[item.index].childStart;
for (SizeType j = item.first + 1; j < end; j++) {
m_pTreeRoots.push_back(BKTNode(localindices[j]));
m_pSampleCenterMap[localindices[j]] = m_pTreeRoots[item.index].centerid;
SizeType cid = (reverseIndices == nullptr) ? localindices[j] : reverseIndices->at(localindices[j]);
m_pTreeRoots.push_back(BKTNode(cid));
m_pSampleCenterMap[cid] = m_pTreeRoots[item.index].centerid;
}
m_pSampleCenterMap[-1 - m_pTreeRoots[item.index].centerid] = item.index;
}
else {
for (int k = 0; k < m_iBKTKmeansK; k++) {
if (args.counts[k] == 0) continue;
m_pTreeRoots.push_back(BKTNode(localindices[item.first + args.counts[k] - 1]));
SizeType cid = (reverseIndices == nullptr) ? localindices[item.first + args.counts[k] - 1] : reverseIndices->at(localindices[item.first + args.counts[k] - 1]);
m_pTreeRoots.push_back(BKTNode(cid));
if (args.counts[k] > 1) ss.push(BKTStackItem(newBKTid++, item.first, item.first + args.counts[k] - 1));
item.first += args.counts[k];
}
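`BuildTrees` gains a `reverseIndices` argument, and every place that creates a `BKTNode` or an `m_pSampleCenterMap` entry now goes through the same ternary: with no reverse map the value taken from `localindices` is used as-is, otherwise it is translated through `reverseIndices->at(...)`. The diff does not show a caller that passes a reverse map, so `ToNodeId` below is a hypothetical extraction of that ternary, with `SizeType` assumed to be a 32-bit `int`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SizeType = int;  // assumption: stands in for SPTAG's SizeType

// Mirrors `(reverseIndices == nullptr) ? localindices[j] : reverseIndices->at(localindices[j])`.
inline SizeType ToNodeId(SizeType localId, const std::vector<SizeType>* reverseIndices) {
    if (reverseIndices == nullptr) return localId;  // IDs are already global
    assert(localId >= 0 && static_cast<std::size_t>(localId) < reverseIndices->size());
    return reverseIndices->at(localId);  // translate a subset position back to its ID
}
```

For example, if a subset's sample positions 0, 1, and 2 correspond to IDs 10, 42, and 7, `ToNodeId(1, &rev)` yields 42, while `ToNodeId(1, nullptr)` leaves the value at 1.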
@@ -195,6 +217,7 @@ namespace SPTAG

bool SaveTrees(std::ostream& p_outstream) const
{
std::shared_lock<std::shared_timed_mutex> lock(*m_lock);
p_outstream.write((char*)&m_iTreeNumber, sizeof(int));
p_outstream.write((char*)m_pTreeStart.data(), sizeof(SizeType) * m_iTreeNumber);
SizeType treeNodeSize = (SizeType)m_pTreeRoots.size();
@@ -270,7 +293,7 @@ namespace SPTAG
void SearchTrees(const VectorIndex* p_index, const COMMON::QueryResultSet<T> &p_query,
COMMON::WorkSpace &p_space, const int p_limits) const
{
do
while (!p_space.m_SPTQueue.empty())
{
COMMON::HeapCell bcell = p_space.m_SPTQueue.pop();
const BKTNode& tnode = m_pTreeRoots[bcell.node];
@@ -290,7 +313,7 @@ namespace SPTAG
p_space.m_SPTQueue.insert(COMMON::HeapCell(begin, p_index->ComputeDistance((const void*)p_query.GetTarget(), p_index->GetSample(index))));
}
}
} while (!p_space.m_SPTQueue.empty());
}
}

private:
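`SearchTrees` now tests `!p_space.m_SPTQueue.empty()` before each pop instead of after it, so a search that starts with an empty tree queue no longer pops once unconditionally. The sketch below shows the same loop shape on a `std::priority_queue`, where popping an empty queue is undefined behavior; how SPTAG's `COMMON::Heap` handles an empty pop is not visible in this diff. `Cell`, `MinHeap`, and `DrainNearestFirst` are hypothetical stand-ins for `HeapCell` and the search loop:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Cell = std::pair<float, int>;  // (distance, node): hypothetical stand-in for HeapCell
using MinHeap = std::priority_queue<Cell, std::vector<Cell>, std::greater<Cell>>;

// Pops cells nearest-first. Because the emptiness check precedes the first pop,
// an empty queue is a no-op; the old `do { ... } while` shape popped once regardless.
inline std::vector<int> DrainNearestFirst(MinHeap& queue) {
    std::vector<int> order;
    while (!queue.empty()) {
        order.push_back(queue.top().second);
        queue.pop();
    }
    assert(queue.empty());
    return order;
}
```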
@@ -300,11 +323,11 @@ namespace SPTAG
std::vector<SizeType>& indices,
const SizeType first, const SizeType last, KmeansArgs<T>& args, const bool updateCenters) const {
float currDist = 0;
int threads = omp_get_num_threads();
int threads = args._T;
float lambda = (updateCenters) ? COMMON::Utils::GetBase<T>() * COMMON::Utils::GetBase<T>() / (100.0f * (last - first)) : 0.0f;
SizeType subsize = (last - first - 1) / threads + 1;

#pragma omp parallel for
#pragma omp parallel for num_threads(threads)
for (int tid = 0; tid < threads; tid++)
{
SizeType istart = first + tid * subsize;
@@ -483,6 +506,7 @@ namespace SPTAG
std::unordered_map<SizeType, SizeType> m_pSampleCenterMap;

public:
std::unique_ptr<std::shared_timed_mutex> m_lock;
int m_iTreeNumber, m_iBKTKmeansK, m_iBKTLeafSize, m_iSamples;
};
}
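The k-means loop now takes its thread count from `args._T` and passes it to `#pragma omp parallel for num_threads(threads)`. Previously it called `omp_get_num_threads()`, which returns 1 when called outside a parallel region, so the range was most likely split into a single chunk. The new `BuildTrees` default argument `numOfThreads = omp_get_num_threads()` is evaluated at the call site and has the same property unless the caller passes a count, as `Rebuild` does with `1`. The sketch below reproduces the static split `subsize = (last - first - 1) / threads + 1` with `std::thread` instead of OpenMP so it builds without OpenMP flags; the per-chunk work (a sum), the clamped end bound, and the name `ParallelChunkSum` are assumptions, since the hunk stops right after `istart` is computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Splits [first, last) into `threads` contiguous chunks with the same arithmetic
// as the k-means loop; the caller chooses the thread count explicitly.
inline long long ParallelChunkSum(const std::vector<int>& data, int first, int last, int threads) {
    assert(threads > 0 && first < last);
    int subsize = (last - first - 1) / threads + 1;
    std::vector<long long> partial(static_cast<std::size_t>(threads), 0LL);
    std::vector<std::thread> workers;
    for (int tid = 0; tid < threads; ++tid) {
        workers.emplace_back([&, tid] {
            int istart = first + tid * subsize;
            int iend = std::min(first + (tid + 1) * subsize, last);  // trailing chunks may be empty
            for (int i = istart; i < iend; ++i) partial[tid] += data[i];
        });
    }
    for (auto& w : workers) w.join();
    long long total = 0;
    for (long long p : partial) total += p;  // reduce per-thread results
    return total;
}
```

Each thread writes only its own `partial[tid]` slot, so no locking is needed until the final reduction.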