Jegao/label hot fix with main2 #430

Sanhaoji2 · 2023-08-18T03:59:37Z

Does this PR have a descriptive title that could go in our release notes?
Does this PR add any new dependencies?
Does this PR modify any existing APIs?
- Is the change to the API backwards compatible?
Should this result in any changes to our documentation, either updating existing docs or adding new ones?

Reference Issues/PRs

What does this implement/fix? Briefly explain your changes.

Any other comments?

* Transferring Varun's changes from external fork with squash merge
* generating multiple gt's for each filter label + search with multiple filter labels (code cleanup)
* supporting no-filter + one filter label + filter label file (multiple filters) while computing GT
* generating multiple gt's + refactoring code for readability & cleanliness
* adding more tests for filtered search
* updating pr-test to test filtered cases
* lowering recall requirement for disk index
* transferred functions to filter_utils
* adding more tests for build and search without universal label
* adding one_per_point distribution to generate_synthetic_labels + cleaning up artifacts after compute gt + removing minor errors
* refactoring search_disk_index to use a query filter vector
---------
Co-authored-by: patelyash <patelyash@microsoft.com>
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>

- add code for two variants of filtered index, readme and CI tests
- add utils for synthetic label generation and CI tests.
* Add co-authors
Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>
---------
Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: David Kaczynski <dkaczynski@microsoft.com>
Co-authored-by: Siddharth Gollapudi <t-gollapudis@microsoft.com>
Co-authored-by: Neelam Mahapatro <nmahapatro@microsoft.com>
Co-authored-by: Harsha Vardhan Simhadri <harshasi@microsoft.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
Co-authored-by: REDMOND\patelyash <patelyash@microsoft.com>
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>

* Rather than sift through all the *.cpp and *.h in the root directory, we're looking for only the sources in our main repository for formatting. Git submodules are excluded
* Removing the --Werror flag only until we actually format all of the code in a future commit
* We're choosing to base our style on the Microsoft style guide and not make any changes
* Running format action on source code. Settling on Google styling. Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')
* Enabling error on malformatted file
* Revert "Enabling error on malformatted file" This reverts commit fa33e82.
* Revert "Running format action on source code. Settling on Google styling. Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')" This reverts commit e0281be.
* Trying again; formatting rules based on Google rules, disables sorting includes as that breaks us, and enabling check on build.
* Somehow this was missed in the mass format. Formatting include/distance.h.
* Manually fixing the formatting because clang-format wouldn't, but WOULD flag it as invalid

* some bug fixes when enabling EXEC_ENV_OLS
* avoid unit test failure
* unit test testing
* changed based on gopal's suggestion
* update load_impl(AlignedFileReader &reader)
* change the load_impl to be identical to objectstore
* remove blank

* Refactor of diskannpy module code.
* 0.5.0.rc1 for python and enabling the build-python portion of the pr-test process.
* clang-format changes
* In theory this should speed up the python build drastically by only building the wheel for the python version and OS we're attempting to fan out to in our CICD job tree
* Missed a dollar sign
* Copy/pasting left a CICD step name that implied we were running a code formatting check when instead we were building a wheel. This is now fixed.
* In theory, readying the release action too. We won't know if it works until it merges and we cut a release, but at least the paths have been fixed
* Designated initializers just happened to work on linux but shouldn't have as they weren't added until cpp20
* Formatting
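The note above about designated initializers can be illustrated with a small sketch. `BuildParams` and `make_build_params` are hypothetical names invented for this example and are not taken from the DiskANN sources. The commented-out line shows the C++20 syntax, which some compilers accept earlier as an extension; the function shows one portable alternative for older language modes.

```cpp
#include <cassert>

// Hypothetical parameter struct, used only for illustration.
struct BuildParams
{
    unsigned complexity;
    unsigned graph_degree;
    float alpha;
};

// C++20 designated-initializer form. GCC and Clang accept it as an
// extension in earlier modes, which is why it can appear to work on Linux:
//   BuildParams p{.complexity = 64, .graph_degree = 32, .alpha = 1.2f};

// Portable form for pre-C++20 builds: value-initialize, then assign by name.
inline BuildParams make_build_params(unsigned complexity, unsigned graph_degree, float alpha)
{
    BuildParams p{};
    p.complexity = complexity;
    p.graph_degree = graph_degree;
    p.alpha = alpha;
    return p;
}
```

Assigning members by name keeps the call site readable without depending on a C++20 feature or a compiler extension.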

* small bug fix
* test ubuntu fail
* formatting
* re-triggering unit test
* cause error, remove two character params
* cause error, remove two character params
* unit test fix
* clean up code
* add more accurate error handling
* fix filter build
* re-trigger test
* try lower recall number
* test with more values
* revert back to test unit test

* Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python. Perhaps it also will fix any issues in our label test too? I'd like to push it to GH and see.
* Formatting fixes
* Revert "Formatting fixes" This reverts commit 9042595.
* Revert "Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python. Perhaps it also will fix any issues in our label test too? I'd like to push it to GH and see." This reverts commit 7561009.
* The custom logging implementation is causing segfaults in python. We're not sure exactly where, but this is the easiest and quickest way to getting a working python release.
* All the integration tests are failing, and there's a chance the virtual dtor on AbstractDataStore might be the culprit, though I am not sure why. I'm hoping it is so it won't fall on the logging changes.
* Formatting. Again.

* Added utilities to standardize help across cli tools. #370
* Made three option groupings (required/optional/print)
* Moved common parameter descriptions to a common file. #370
* Updated usage statement for search_disk_app #370
* Updated range_search_disk_index to use the new required/optional format. #370
* Updated test apps to use the new help format. #370
* Fixed format issue. #370
* Updated help format for the 'build' apps. #370
* Fixed code formatting. #370
* Added src/*.hpp to the clang format. #370
* Moved header into the headers directory. #370
* Added missing configs. #370
* Removed superfluous paths from include. #370
* Added #pragma once. #370
* Typo fixes. #370
* Fixed capitalization of constant. #370
* Make fail_if_recall description more accurate. #370
* Changed to using set notation. #370
* Better explanations for some options. #370
* Added short explanation of file format. #370
---------
Co-authored-by: Jon McLean <none@example.com>
Co-authored-by: Jonathan McLean <Jonathan.McLean@microsoft.com>

* Identified the appropriate build flags to get a working python build that doesn't rely on -march=native or -mtune=native. We've run benchmarks on multiple computers that indicate the only important flag other than -mavx2 -msse2 -mfma is -funroll-loops. Optimization levels such as -O1, -O2, or -O3 actually make for less performant code. -Ofast is unavailable for use in Python, as it causes problems with floating point math in Python
* 1.22 was left in a comment despite 1.25 being the value specified
* Python 3.8 is not supported by numpy 1.25, so we're removing it.

* While simply creating a unit test to repro Issue #400, I found a number of bugs that I needed to address just to get it to work the way I had intended. This does not yet have what I would consider a comprehensive suite of test coverage for the DynamicMemoryIndex, but we at least do save it with the metadata file, we can load it correctly, and saving *always* calls consolidate_deletes() first if any item has been marked for deletion.
* We actually cannot save without compacting before save anyway. Removing the parameter from save() and hardcoding it to True until we can actually support it.
* Addressing some PR comments and readying a 0.5.0.rc5 release

* Some early staging for README updates and pyproject updates for a 0.6.0 release for diskannpy.
* Trying to fix the CI badge to point toward main's latest build
* Updating documentation for pdoc generation
* Documentation updates. Tightened up the API to drop list support (there were entirely too many cases where it wouldn't work, and it's easier to just tell people to convert it themselves)
* Some module reorganization to make pdoc actually display the docstrings for variables re-exported at the top level
* A copy paste happened that shouldn't have.
* Updating the apps to use the new 0.6.0 api
* Addressing PR feedback
* Some of the documentation changes didn't get made in both from_file and the constructor

jinwei14 and others added 30 commits March 29, 2023 10:59

add codebook passing and pq/opq dim overwrite.

68f1443

Update SSD_index.md (#258)

cd8bee3

Fix typo in SSD index readme

Add filter-diskann paper link to readme (#275)

4320bad

Update README.md (#277)

6c7c2b3

update citation (#281)

936922c

Some fixes to pass internal building pipeline (#282)

066b9ed

Remove warnings affecting internal build pipelines
---------
Co-authored-by: Yiyong Lin <yiyolin@microsoft.com>

Add support for multiple frozen points (#283)

09e8404

* Add support for multiple frozen points
* Add the missing parameters to the constructor.

Added filtered disk index readme (#276)

162d1ea

* Added filtered disk index readme

update merging code

308b377

merge commit

5444f79

Using boost program options under Visual Studio MSVC 14.0 Assertion f…

6c1a175

…ailed

some comments and rewriting

0f5ec26

add back LF which might conflict with MSVC 14.0

918d11f

clang formatting change

8d2a007

clang formatting

9b46adc

revert back to Lf

dde79a3

unexpected failure on UT re-try

a7c8db6

adding default string to the path

6dd780c

fix reference issue

86ee064

Fixing Build errors in remove_extra_typedef (#290)

6a43218

* remove _u, _s typedefs
* converting uint64's to size_t where they represent array offsets
---------
Co-authored-by: harsha vardhan simhadri <harsha.v.simhadri@gmail.com>

merge all new PR change into my branch

7b1553d

clang format

6833255

bump it up to 512 for MAX_PQ_CHUNKS

29f53d0

default codebook prefix value pass in for generate_quantized_data

27d1653

add check for disabling both -B and -QD pass in

a1792e4

Merge pull request #288 from jinwei14/codebookPassin

4c8041b

add codebook passing and pq/opq dim overwrite.

yashpatel007 and others added 29 commits June 27, 2023 10:54

hot fix for python build (#383)

2b50d8e

Output distance file in memory index search (#382)

132b62a

* Output distance file
* fix
---------
Co-authored-by: Shengjie Qian <shenqian@microsoft.com>

Add WIN macro for non-win function (#360)

775a9e9

* Add WIN macro for non-win function
* fix vc16 compile issue
* fix compile issue
* fix compile issue
* fix compile issue
* clean up code

small EXEC_ENV_OLS bug fix (#387)

051df41

* small bug fix
* test ubuntu fail
* formatting
* re-triggering unit test

Update python-release.yml

bc167d1

Github actions fix: composite action `python-wheel` publishes wheels to the `wheels` artifact. `python-release` workflow then looks for it in the `dist` artifact, which does not exist. This is a CICD change only.

Fixed inputs type-o (#391)

bbed8f8

* Fixed inputs type-o
* Action 'checkout@v2' is deprecated

Update pyproject.toml

a7b2087

Trying a new release of the python lib to see if there was a packaging error in the publication of rc1.

Fixed param documentation (#393)

233c08c

* Fixed param name in comments
* Hide rust/target

Jomclean/write timings (#397)

1de9cb3

* Work-in-progress commit adding JSON output for timings. in-mem-static is complete
* Added timings to dynamic and total-time to static

Update pyproject.toml (#398)

dea70fc

Using the correct README for our publication to pypi.

Added filename to log (#399)

5bb07e4

Jinwei/fix in memory compile error (#401)

0488c03

* small bug fix
* test ubuntu fail
* formatting
* re-triggering unit test
* add small fix for in_mem_data_store when EXEC_ENV_OLS is enabled

fix: use the passed in io_limit (#403)

e1a8d78

* fix: use the passed in io_limit
* fix to be clang-formatted

Pass nullptr as nullT when creating thread_data that's of ConcurrentQ…

1eac702

…ueue<SSDThreadData*> type, otherwise the default null_T is uninitialized, could point to arbitrary memory (#408)
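The hazard described in this commit message, a queue whose "empty" sentinel is left uninitialized when the element type is a pointer, can be sketched as follows. `SentinelQueue` is a hypothetical, simplified stand-in written for this illustration; it is not DiskANN's `ConcurrentQueue`, and the `SSDThreadData` struct here is a placeholder with an invented field. The sketch assumes the fix amounts to supplying the sentinel explicitly at construction.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Placeholder for the real thread-data type; the field is illustrative only.
struct SSDThreadData
{
    int id;
};

// Simplified queue that hands back a caller-supplied sentinel when empty.
template <typename T> class SentinelQueue
{
  public:
    // Requiring the sentinel up front means it is never left indeterminate,
    // which is the risk when T is a raw pointer and nothing initializes it.
    explicit SentinelQueue(T null_value) : null_value_(null_value)
    {
    }

    void push(T item)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(item);
    }

    // Returns the sentinel instead of blocking when nothing is queued.
    T pop()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty())
            return null_value_;
        T front = items_.front();
        items_.pop();
        return front;
    }

    // True when the given item is the "empty" sentinel.
    bool is_null(const T &item) const
    {
        return item == null_value_;
    }

  private:
    std::queue<T> items_;
    T null_value_;
    std::mutex mutex_;
};
```

With `SentinelQueue<SSDThreadData *> q(nullptr)`, a caller can compare the result of `pop()` against a well-defined value instead of whatever happened to be in memory.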

sync to latest

4021c4f

Fix compile issue

4f84164

fix some issue

8a7d262

fix universal label

2947353

fix label file path

87c42e8

add unv label filepath in separate path function

2c75f4c

fix empty path issue

467202c

Sanhaoji2 merged commit 07938b9 into jegao/LabelHotFix Aug 18, 2023
5 of 32 checks passed
