fix parallel AddIndex issue, hash table size issue and configurable refine iteration issue (microsoft#105)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copyright in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (microsoft#71) (microsoft#10)

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compiling error in g++7

* fix largest cluster cannot be split during clustering

* update fresh ANN implementation (microsoft#85) (microsoft#12)

* fix maxcluster is -1 bug

* fix Reader type definition and add more support

* fix maxcluster is -1 bug (microsoft#91) (microsoft#14)

* move threadPool init into DefaultReader

* try to move VectorsetReader into CoreLibrary

* fix bktree cluster split issue

* remove spaces and fix newCount is zero issue

* Merge from microsoft.SPTAG (microsoft#15)

* fix maxcluster is -1 bug (microsoft#91)

* fix some type definition in the Reader and add more support to create Reader (microsoft#93)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (microsoft#71) (microsoft#10)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compling error in g++7

* fix largest cluster cannot be split during clustering

* update fresh ANN implementation (microsoft#85) (microsoft#12)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (microsoft#71) (microsoft#10)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compling error in g++7

* fix largest cluster cannot be split during clustering

* fix maxcluster is -1 bug

* fix Reader type definition and add more support

* fix maxcluster is -1 bug (microsoft#91) (microsoft#14)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* remove the tbb dependency (microsoft#71) (microsoft#10)

* remove dup code

* Update Readme.md

* Fix DataSet GNU compile fail bug

* fix GNU Windows align alloc bugs

* add copyright in each file

* fix copy right in dataset

* change kdt distance judgement

* change code structure and add more wrappers

* Update docs

* fix search result

* change IndexBuilder to support binary input data

* temp remove java related projects

* remove javaclient and javacore from the windows build

* Fix SetData issue

* Add vector record count and dimension for reuse and debug

* change default parameter definition

* add uint8 support

* small fix for cosine distance of uint8

* fix AVX distance calculation epu8

* update readme

* Update DistanceUtils.h

* fix python wrapper cannot load larger than 4G memory error

* try to add C# wrapper

* fix owner of C# wrapper

* add C# cmake support

* fix byte array copy

* fix tab to space

* Try to make shared_ptr<T> as Array template

* fix copy

* add Parameters documents

* remove tbb dependency

* fix concurrent_set

* fix gcc 5.x cannot support shared_mutex

* move concurrentset to Helper folder and change find to contains

* Update README.md

* try to use shared_lock to replace lock and unlock, try to use block to manage the increased memory

* fix filling -1

* fix initialization

* change to memset

* add CLR CoreInterface for managed dll

* try to reserve incBlocks capacity

* fix return ErrorCode for AddBatch in Dataset.h

* change return type to ErrorCode for AddBatch

* fix type definition

* change incremental update design

* fix all type

* fix debug mode memory delete assert

* add deletePercentageForRefine judgement

* add dump and load from byte array

* add dump and load from byte array

* fix getNumThreads

* fix loadindex and add index bugs

* Update AlgoTest to add metamapping test

* fix compling error in g++7

* fix largest cluster cannot be split during clustering

* update fresh ANN implementation (microsoft#85) (microsoft#12)

* fix maxcluster is -1 bug

* move threadPool init into DefaultReader

* try to move VectorsetReader into CoreLibrary

* fix bktree cluster split issue

* fix merge issues

* fix space issues

* fix bug where files in the VectorSetReaders directory were not included in CMakeLists.txt

* remove VectorSetReaders from indexbuilder

* add copy right

* fix refine iterations usage

* try to fix hash table size issue

* try to use maxCheckForRefineGraph in the build stage

* use maxcheckforrefinegraph

* enlarge nodecheckstatus hash table size

* fix pool size

* try to fix FineGrainedLock

* fix FineGrainLock concurrent issue

* try to fix add meta concurrent issue

* move AddIndex to each algorithm

* avoid write lock in the FineGrainLock

* optimize the insertneighbor performance

* fix hashtable size issue

* try to remove finegrained lock

* remove finegrainlock and fix insertneighbors

* fix CLR and Core Wrapper

* remove add log

* try to mergeindex in parallel add mode

* remove parallel add

* add parallel add

* try to make it parallel

* fix pool size

* support rebuild tree in the backend

* add background rebuild tree thread

* add buildmetaindex support for addindex operation

* fix some implementations

* fix rebuild and search delete issues

* fix refine for BKT

* fix add rebuild tree job

* fix compile issue in azure pipeline

* enable AVX2 in Linux

* change avx to sse

* try to fix aligned_malloc

* avx support check

* add linux avx support flag

* avoid exec jobs after destroy

* fix all delete and then insert error

* fix print percentage overflow

* try to fix graph save issue and delete performance issue

* Add RefineIndex to a newIndex and fix RefineIndex bugs

* fix Dataset Refine must return a value issue

* try to use one thread for tree rebuild

* try to use one thread for tree rebuild

* fix different compiler issue

* fix BOOST_CHECK cannot be used in multi thread issue

* fix set num of threads in the child thread issue

* fix m_workspacepool init problem

* change the swap interface to rebuild and remove the lock in the labelset

* rename m_deleted in labelset to m_inserted
MaggieQi committed Dec 10, 2019
1 parent e064e24 commit 0fe1773
Showing 33 changed files with 953 additions and 428 deletions.
2 changes: 2 additions & 0 deletions AnnService/CoreLibrary.vcxproj
@@ -132,6 +132,7 @@
</ItemDefinitionGroup>
<ItemGroup>
<ClInclude Include="inc\Core\Common\FineGrainedLock.h" />
<ClInclude Include="inc\Core\Common\Labelset.h" />
<ClInclude Include="inc\Core\Common\WorkSpace.h" />
<ClInclude Include="inc\Core\Common\CommonUtils.h" />
<ClInclude Include="inc\Core\Common\Dataset.h" />
@@ -164,6 +165,7 @@
<ClInclude Include="inc\Core\Common\RelativeNeighborhoodGraph.h" />
<ClInclude Include="inc\Core\Common\BKTree.h" />
<ClInclude Include="inc\Core\Common\KDTree.h" />
<ClInclude Include="inc\Helper\ThreadPool.h" />
<ClInclude Include="inc\Helper\VectorSetReader.h" />
<ClInclude Include="inc\Helper\VectorSetReaders\DefaultReader.h" />
</ItemGroup>
6 changes: 6 additions & 0 deletions AnnService/CoreLibrary.vcxproj.filters
@@ -151,6 +151,12 @@
<ClInclude Include="inc\Helper\VectorSetReader.h">
<Filter>Header Files\Helper</Filter>
</ClInclude>
<ClInclude Include="inc\Helper\ThreadPool.h">
<Filter>Header Files\Helper</Filter>
</ClInclude>
<ClInclude Include="inc\Core\Common\Labelset.h">
<Filter>Header Files\Core\Common</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<ClCompile Include="src\Core\VectorIndex.cpp">
2 changes: 0 additions & 2 deletions AnnService/IndexBuilder.vcxproj
@@ -138,12 +138,10 @@
</ItemDefinitionGroup>
<ItemGroup>
<ClInclude Include="inc\IndexBuilder\Options.h" />
<ClInclude Include="inc\IndexBuilder\ThreadPool.h" />
</ItemGroup>
<ItemGroup>
<ClCompile Include="src\IndexBuilder\main.cpp" />
<ClCompile Include="src\IndexBuilder\Options.cpp" />
<ClCompile Include="src\IndexBuilder\ThreadPool.cpp" />
</ItemGroup>
<ItemGroup>
<None Include="packages.config" />
11 changes: 4 additions & 7 deletions AnnService/IndexBuilder.vcxproj.filters
@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<Filter Include="Source Files">
@@ -14,9 +14,6 @@
<ClInclude Include="inc\IndexBuilder\Options.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="inc\IndexBuilder\ThreadPool.h">
<Filter>Header Files</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<ClCompile Include="src\IndexBuilder\Options.cpp">
@@ -25,8 +22,8 @@
<ClCompile Include="src\IndexBuilder\main.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="src\IndexBuilder\ThreadPool.cpp">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<None Include="packages.config" />
</ItemGroup>
</Project>
40 changes: 29 additions & 11 deletions AnnService/inc/Core/BKT/Index.h
@@ -15,12 +15,13 @@
#include "../Common/WorkSpacePool.h"
#include "../Common/RelativeNeighborhoodGraph.h"
#include "../Common/BKTree.h"
#include "inc/Helper/ConcurrentSet.h"
#include "../Common/Labelset.h"
#include "inc/Helper/SimpleIniReader.h"
#include "inc/Helper/StringConvert.h"
#include "inc/Helper/ThreadPool.h"

#include <functional>
#include <mutex>
#include <shared_mutex>

namespace SPTAG
{
Expand All @@ -35,6 +36,18 @@ namespace SPTAG
template<typename T>
class Index : public VectorIndex
{
class RebuildJob : public Helper::ThreadPool::Job {
public:
RebuildJob(VectorIndex* p_index, COMMON::BKTree* p_tree, COMMON::RelativeNeighborhoodGraph* p_graph) : m_index(p_index), m_tree(p_tree), m_graph(p_graph) {}
void exec() {
m_tree->Rebuild<T>(m_index);
}
private:
VectorIndex* m_index;
COMMON::BKTree* m_tree;
COMMON::RelativeNeighborhoodGraph* m_graph;
};

private:
// data points
COMMON::Dataset<T> m_pSamples;
Expand All @@ -50,12 +63,16 @@ namespace SPTAG
std::string m_sDataPointsFilename;
std::string m_sDeleteDataPointsFilename;

std::mutex m_dataAddLock; // protect data and graph
Helper::Concurrent::ConcurrentSet<SizeType> m_deletedID;
int m_addCountForRebuild;
float m_fDeletePercentageForRefine;
std::unique_ptr<COMMON::WorkSpacePool> m_workSpacePool;
std::mutex m_dataAddLock; // protect data and graph
std::shared_timed_mutex m_dataDeleteLock;
COMMON::Labelset m_deletedID;

std::unique_ptr<COMMON::WorkSpacePool> m_workSpacePool;
Helper::ThreadPool m_threadPool;
int m_iNumberOfThreads;

DistCalcMethod m_iDistCalcMethod;
float(*m_fComputeDistance)(const T* pX, const T* pY, DimensionType length);

@@ -89,15 +106,15 @@ namespace SPTAG

inline float ComputeDistance(const void* pX, const void* pY) const { return m_fComputeDistance((const T*)pX, (const T*)pY, m_pSamples.C()); }
inline const void* GetSample(const SizeType idx) const { return (void*)m_pSamples[idx]; }
inline bool ContainSample(const SizeType idx) const { return !m_deletedID.contains(idx); }
inline bool NeedRefine() const { return m_deletedID.size() >= (size_t)(GetNumSamples() * m_fDeletePercentageForRefine); }
inline bool ContainSample(const SizeType idx) const { return !m_deletedID.Contains(idx); }
inline bool NeedRefine() const { return m_deletedID.Count() >= (size_t)(GetNumSamples() * m_fDeletePercentageForRefine); }
std::shared_ptr<std::vector<std::uint64_t>> BufferSize() const
{
std::shared_ptr<std::vector<std::uint64_t>> buffersize(new std::vector<std::uint64_t>);
buffersize->push_back(m_pSamples.BufferSize());
buffersize->push_back(m_pTrees.BufferSize());
buffersize->push_back(m_pGraph.BufferSize());
buffersize->push_back(m_deletedID.bufferSize());
buffersize->push_back(m_deletedID.BufferSize());
return std::move(buffersize);
}

Expand All @@ -110,8 +127,8 @@ namespace SPTAG
ErrorCode LoadIndexDataFromMemory(const std::vector<ByteArray>& p_indexBlobs);

ErrorCode BuildIndex(const void* p_data, SizeType p_vectorNum, DimensionType p_dimension);
ErrorCode SearchIndex(QueryResult &p_query) const;
ErrorCode AddIndex(const void* p_vectors, SizeType p_vectorNum, DimensionType p_dimension, SizeType* p_start = nullptr);
ErrorCode SearchIndex(QueryResult &p_query, bool p_searchDeleted = false) const;
ErrorCode AddIndex(const void* p_data, SizeType p_vectorNum, DimensionType p_dimension, std::shared_ptr<MetadataSet> p_metadataSet, bool p_withMetaIndex = false);
ErrorCode DeleteIndex(const void* p_vectors, SizeType p_vectorNum);
ErrorCode DeleteIndex(const SizeType& p_id);

Expand All @@ -120,9 +137,10 @@ namespace SPTAG

ErrorCode RefineIndex(const std::string& p_folderPath);
ErrorCode RefineIndex(const std::vector<std::ostream*>& p_indexStreams);
ErrorCode RefineIndex(std::shared_ptr<VectorIndex>& p_newIndex);

private:
void SearchIndexWithDeleted(COMMON::QueryResultSet<T> &p_query, COMMON::WorkSpace &p_space, const Helper::Concurrent::ConcurrentSet<SizeType> &p_deleted) const;
void SearchIndexWithDeleted(COMMON::QueryResultSet<T> &p_query, COMMON::WorkSpace &p_space) const;
void SearchIndexWithoutDeleted(COMMON::QueryResultSet<T> &p_query, COMMON::WorkSpace &p_space) const;
};
} // namespace BKT
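Reviewer note: the `NeedRefine()` and `ContainSample()` one-liners above depend only on `Contains()`, `Count()` and the `DeletePercentageForRefine` ratio. The sketch below illustrates that threshold check in isolation. `DeletedLabels` is a hypothetical stand-in written for this note, not the real `COMMON::Labelset` (whose implementation is not shown in this hunk), and it is deliberately not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a deleted-ID label set: one flag per vector ID
// plus a running count of marked IDs. Not the actual COMMON::Labelset.
class DeletedLabels {
public:
    explicit DeletedLabels(std::size_t numSamples) : m_flags(numSamples, 0), m_count(0) {}

    // Marks an ID as deleted; returns true only the first time the ID is marked.
    bool Insert(std::size_t id) {
        if (id >= m_flags.size() || m_flags[id]) return false;
        m_flags[id] = 1;
        ++m_count;
        return true;
    }

    bool Contains(std::size_t id) const { return id < m_flags.size() && m_flags[id] != 0; }
    std::size_t Count() const { return m_count; }
    std::size_t Size() const { return m_flags.size(); }

private:
    std::vector<std::uint8_t> m_flags;
    std::size_t m_count;
};

// Same shape as the NeedRefine() expression in the diff:
// deleted count >= (size_t)(number of samples * delete percentage).
inline bool NeedRefine(const DeletedLabels& deleted, float deletePercentageForRefine) {
    return deleted.Count() >= static_cast<std::size_t>(deleted.Size() * deletePercentageForRefine);
}
```

With 10 samples and the default ratio of 0.4 from `ParameterDefinitionList.h`, the check becomes true once the fourth ID has been marked.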
5 changes: 3 additions & 2 deletions AnnService/inc/Core/BKT/ParameterDefinitionList.h
@@ -22,14 +22,15 @@ DefineBKTParameter(m_pGraph.m_numTopDimensionTPTSplit, int, 5L, "NumTopDimension
DefineBKTParameter(m_pGraph.m_iNeighborhoodSize, DimensionType, 32L, "NeighborhoodSize")
DefineBKTParameter(m_pGraph.m_iNeighborhoodScale, int, 2L, "GraphNeighborhoodScale")
DefineBKTParameter(m_pGraph.m_iCEFScale, int, 2L, "GraphCEFScale")
DefineBKTParameter(m_pGraph.m_iRefineIter, int, 0L, "RefineIterations")
DefineBKTParameter(m_pGraph.m_iRefineIter, int, 2L, "RefineIterations")
DefineBKTParameter(m_pGraph.m_iCEF, int, 1000L, "CEF")
DefineBKTParameter(m_pGraph.m_iMaxCheckForRefineGraph, int, 10000L, "MaxCheckForRefineGraph")
DefineBKTParameter(m_pGraph.m_iMaxCheckForRefineGraph, int, 8192L, "MaxCheckForRefineGraph")

DefineBKTParameter(m_iNumberOfThreads, int, 1L, "NumberOfThreads")
DefineBKTParameter(m_iDistCalcMethod, SPTAG::DistCalcMethod, SPTAG::DistCalcMethod::Cosine, "DistCalcMethod")

DefineBKTParameter(m_fDeletePercentageForRefine, float, 0.4F, "DeletePercentageForRefine")
DefineBKTParameter(m_addCountForRebuild, int, 1000, "AddCountForRebuild")
DefineBKTParameter(m_iMaxCheck, int, 8192L, "MaxCheck")
DefineBKTParameter(m_iThresholdOfNumberOfContinuousNoBetterPropagation, int, 3L, "ThresholdOfNumberOfContinuousNoBetterPropagation")
DefineBKTParameter(m_iNumberOfInitialDynamicPivots, int, 50L, "NumberOfInitialDynamicPivots")
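Reviewer note: each `DefineBKTParameter(member, type, default, name)` line reads like an entry in an X-macro list. How SPTAG actually expands the macro is not visible in this diff, so the following is only a sketch of the general technique. `SKETCH_PARAMETERS`, `SketchOptions` and `Set` are names invented for illustration; the three entries reuse the defaults shown in the hunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical parameter list with the same (member, type, default, name) shape.
#define SKETCH_PARAMETERS(X) \
    X(m_iRefineIter, int, 2, "RefineIterations") \
    X(m_iMaxCheckForRefineGraph, int, 8192, "MaxCheckForRefineGraph") \
    X(m_addCountForRebuild, int, 1000, "AddCountForRebuild")

struct SketchOptions {
    // Expansion 1: declare each member and give it its default value.
#define X(member, type, defaultValue, name) type member = defaultValue;
    SKETCH_PARAMETERS(X)
#undef X

    // Expansion 2: find a parameter by its textual name and overwrite it.
    // Returns false when the name is not in the list.
    bool Set(const std::string& key, int value) {
#define X(member, type, defaultValue, name) \
        if (key == name) { member = static_cast<type>(value); return true; }
        SKETCH_PARAMETERS(X)
#undef X
        return false;
    }
};
```

Keeping the list in one place means a new entry such as `AddCountForRebuild` gets a member, a default and a name lookup from a single added line.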
60 changes: 42 additions & 18 deletions AnnService/inc/Core/Common/BKTree.h
@@ -8,6 +8,7 @@
#include <stack>
#include <string>
#include <vector>
#include <shared_mutex>

#include "../VectorIndex.h"

@@ -46,25 +47,25 @@ namespace SPTAG
T* newTCenters;

KmeansArgs(int k, DimensionType dim, SizeType datasize, int threadnum) : _K(k), _D(dim), _T(threadnum) {
centers = new T[k * dim];
centers = (T*)aligned_malloc(sizeof(T) * k * dim, ALIGN);
newTCenters = (T*)aligned_malloc(sizeof(T) * k * dim, ALIGN);
counts = new SizeType[k];
newCenters = new float[threadnum * k * dim];
newCounts = new SizeType[threadnum * k];
label = new int[datasize];
clusterIdx = new SizeType[threadnum * k];
clusterDist = new float[threadnum * k];
newTCenters = new T[k * dim];
}

~KmeansArgs() {
delete[] centers;
aligned_free(centers);
aligned_free(newTCenters);
delete[] counts;
delete[] newCenters;
delete[] newCounts;
delete[] label;
delete[] clusterIdx;
delete[] clusterDist;
delete[] newTCenters;
}

inline void ClearCounts() {
@@ -106,23 +107,41 @@ namespace SPTAG
class BKTree
{
public:
BKTree(): m_iTreeNumber(1), m_iBKTKmeansK(32), m_iBKTLeafSize(8), m_iSamples(1000) {}
BKTree(): m_iTreeNumber(1), m_iBKTKmeansK(32), m_iBKTLeafSize(8), m_iSamples(1000), m_lock(new std::shared_timed_mutex) {}

BKTree(BKTree& other): m_iTreeNumber(other.m_iTreeNumber),
m_iBKTKmeansK(other.m_iBKTKmeansK),
m_iBKTLeafSize(other.m_iBKTLeafSize),
m_iSamples(other.m_iSamples) {}
m_iSamples(other.m_iSamples),
m_lock(new std::shared_timed_mutex) {}
~BKTree() {}

inline const BKTNode& operator[](SizeType index) const { return m_pTreeRoots[index]; }
inline BKTNode& operator[](SizeType index) { return m_pTreeRoots[index]; }

inline SizeType size() const { return (SizeType)m_pTreeRoots.size(); }

inline SizeType sizePerTree() const {
std::shared_lock<std::shared_timed_mutex> lock(*m_lock);
return (SizeType)m_pTreeRoots.size() - m_pTreeStart.back();
}

inline const std::unordered_map<SizeType, SizeType>& GetSampleMap() const { return m_pSampleCenterMap; }

template <typename T>
void BuildTrees(VectorIndex* index, std::vector<SizeType>* indices = nullptr)
void Rebuild(VectorIndex* p_index)
{
BKTree newTrees(*this);
newTrees.BuildTrees<T>(p_index, nullptr, nullptr, 1);

std::unique_lock<std::shared_timed_mutex> lock(*m_lock);
m_pTreeRoots.swap(newTrees.m_pTreeRoots);
m_pTreeStart.swap(newTrees.m_pTreeStart);
m_pSampleCenterMap.swap(newTrees.m_pSampleCenterMap);
}

template <typename T>
void BuildTrees(VectorIndex* index, std::vector<SizeType>* indices = nullptr, std::vector<SizeType>* reverseIndices = nullptr, int numOfThreads = omp_get_num_threads())
{
struct BKTStackItem {
SizeType index, first, last;
@@ -133,12 +152,12 @@ namespace SPTAG
std::vector<SizeType> localindices;
if (indices == nullptr) {
localindices.resize(index->GetNumSamples());
for (SizeType i = 0; i < index->GetNumSamples(); i++) localindices[i] = i;
for (SizeType i = 0; i < localindices.size(); i++) localindices[i] = i;
}
else {
localindices.assign(indices->begin(), indices->end());
}
KmeansArgs<T> args(m_iBKTKmeansK, index->GetFeatureDim(), (SizeType)localindices.size(), omp_get_num_threads());
KmeansArgs<T> args(m_iBKTKmeansK, index->GetFeatureDim(), (SizeType)localindices.size(), numOfThreads);

m_pSampleCenterMap.clear();
for (char i = 0; i < m_iTreeNumber; i++)
@@ -156,26 +175,29 @@ namespace SPTAG
m_pTreeRoots[item.index].childStart = newBKTid;
if (item.last - item.first <= m_iBKTLeafSize) {
for (SizeType j = item.first; j < item.last; j++) {
m_pTreeRoots.push_back(BKTNode(localindices[j]));
SizeType cid = (reverseIndices == nullptr)? localindices[j]: reverseIndices->at(localindices[j]);
m_pTreeRoots.push_back(BKTNode(cid));
}
}
else { // clustering the data into BKTKmeansK clusters
int numClusters = KmeansClustering(index, localindices, item.first, item.last, args);
if (numClusters <= 1) {
SizeType end = min(item.last + 1, (SizeType)localindices.size());
std::sort(localindices.begin() + item.first, localindices.begin() + end);
m_pTreeRoots[item.index].centerid = localindices[item.first];
m_pTreeRoots[item.index].centerid = (reverseIndices == nullptr) ? localindices[item.first] : reverseIndices->at(localindices[item.first]);
m_pTreeRoots[item.index].childStart = -m_pTreeRoots[item.index].childStart;
for (SizeType j = item.first + 1; j < end; j++) {
m_pTreeRoots.push_back(BKTNode(localindices[j]));
m_pSampleCenterMap[localindices[j]] = m_pTreeRoots[item.index].centerid;
SizeType cid = (reverseIndices == nullptr) ? localindices[j] : reverseIndices->at(localindices[j]);
m_pTreeRoots.push_back(BKTNode(cid));
m_pSampleCenterMap[cid] = m_pTreeRoots[item.index].centerid;
}
m_pSampleCenterMap[-1 - m_pTreeRoots[item.index].centerid] = item.index;
}
else {
for (int k = 0; k < m_iBKTKmeansK; k++) {
if (args.counts[k] == 0) continue;
m_pTreeRoots.push_back(BKTNode(localindices[item.first + args.counts[k] - 1]));
SizeType cid = (reverseIndices == nullptr) ? localindices[item.first + args.counts[k] - 1] : reverseIndices->at(localindices[item.first + args.counts[k] - 1]);
m_pTreeRoots.push_back(BKTNode(cid));
if (args.counts[k] > 1) ss.push(BKTStackItem(newBKTid++, item.first, item.first + args.counts[k] - 1));
item.first += args.counts[k];
}
@@ -195,6 +217,7 @@ namespace SPTAG

bool SaveTrees(std::ostream& p_outstream) const
{
std::shared_lock<std::shared_timed_mutex> lock(*m_lock);
p_outstream.write((char*)&m_iTreeNumber, sizeof(int));
p_outstream.write((char*)m_pTreeStart.data(), sizeof(SizeType) * m_iTreeNumber);
SizeType treeNodeSize = (SizeType)m_pTreeRoots.size();
@@ -270,7 +293,7 @@ namespace SPTAG
void SearchTrees(const VectorIndex* p_index, const COMMON::QueryResultSet<T> &p_query,
COMMON::WorkSpace &p_space, const int p_limits) const
{
do
while (!p_space.m_SPTQueue.empty())
{
COMMON::HeapCell bcell = p_space.m_SPTQueue.pop();
const BKTNode& tnode = m_pTreeRoots[bcell.node];
@@ -290,7 +313,7 @@ namespace SPTAG
p_space.m_SPTQueue.insert(COMMON::HeapCell(begin, p_index->ComputeDistance((const void*)p_query.GetTarget(), p_index->GetSample(index))));
}
}
} while (!p_space.m_SPTQueue.empty());
}
}

private:
@@ -300,11 +323,11 @@ namespace SPTAG
std::vector<SizeType>& indices,
const SizeType first, const SizeType last, KmeansArgs<T>& args, const bool updateCenters) const {
float currDist = 0;
int threads = omp_get_num_threads();
int threads = args._T;
float lambda = (updateCenters) ? COMMON::Utils::GetBase<T>() * COMMON::Utils::GetBase<T>() / (100.0f * (last - first)) : 0.0f;
SizeType subsize = (last - first - 1) / threads + 1;

#pragma omp parallel for
#pragma omp parallel for num_threads(threads)
for (int tid = 0; tid < threads; tid++)
{
SizeType istart = first + tid * subsize;
@@ -483,6 +506,7 @@ namespace SPTAG
std::unordered_map<SizeType, SizeType> m_pSampleCenterMap;

public:
std::unique_ptr<std::shared_timed_mutex> m_lock;
int m_iTreeNumber, m_iBKTKmeansK, m_iBKTLeafSize, m_iSamples;
};
}
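Reviewer note: `BKTree::Rebuild` in this file builds a fresh tree first and takes the exclusive lock only for the final `swap`, while readers such as `sizePerTree()` and `SaveTrees()` hold a shared lock. The sketch below isolates that build-then-swap pattern. `SwappableTree` and its `std::vector<int>` payload are hypothetical simplifications for illustration, not the real tree structures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical container showing the locking pattern only.
class SwappableTree {
public:
    SwappableTree() : m_lock(new std::shared_timed_mutex) {}

    // Readers take a shared lock, so several of them can run concurrently.
    std::size_t Size() const {
        std::shared_lock<std::shared_timed_mutex> lock(*m_lock);
        return m_nodes.size();
    }

    // The expensive build happens without holding the lock;
    // the exclusive lock covers only the cheap swap.
    void Rebuild(const std::vector<int>& samples) {
        std::vector<int> fresh(samples.begin(), samples.end()); // stand-in for BuildTrees
        std::unique_lock<std::shared_timed_mutex> lock(*m_lock);
        m_nodes.swap(fresh);
    }

private:
    std::vector<int> m_nodes;
    // Held through a pointer, as in the diff, so the mutex never needs to be copied.
    std::unique_ptr<std::shared_timed_mutex> m_lock;
};
```

The old data is released when `fresh` goes out of scope at the end of `Rebuild`, which in this sketch happens while the exclusive lock is still held.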