Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: IDIOMS Update & BULKI v0.1 #203

Open
wants to merge 872 commits into
base: develop
Choose a base branch
from

Conversation

zhangwei217245
Copy link
Collaborator

@zhangwei217245 zhangwei217245 commented Jun 24, 2024

Related Issues / Pull Requests

Direct
#151 #169 #206

Indirect
#109 #132

Description

Major changes:

  • IDIOMS -> affix-based query benchmark
  • IDIOMS -> index persistence mechanism
  • IDIOMS -> numeric value query support (exact query and range query)
  • IDIOMS -> numeric value query benchmark
  • IDIOMS -> simulation test
  • BULKI V0.1 -> a data serialization and deserialization mechanism
  • Metadata JSON validator
  • Metadata JSON Loader for PDC

Minor changes

  • Updated Dockerfile for faster docker image build
  • Updated pdc_generic.h for full data type support capability and better ordering of PDC data type enums

In-progress changes

  • Some code for multi-condition query.

What changes are proposed in this pull request?

  • Bug fix (non-breaking change which fixes an issue)
  • [ x ] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected; for instance, examples in this repository must be updated too)
  • [ x ] This change requires a documentation update

Checklist:

  • My code modifies existing public API, or introduces new public API, and I updated or wrote docstrings
  • [ x ] I have commented my code
  • My code requires documentation updates, and I have made corresponding changes to the documentation
  • [ x ] I have added tests that prove my fix is effective or that my feature works
  • [ x ] New and existing unit tests pass locally with my changes

zhangwei217245 and others added 10 commits May 28, 2024 17:24
Major changes: 
* IDIOMS -> affix-based query benchmark
* IDIOMS -> Simulation Test
* IDIOMS -> Multi data type supported for the same key
* IDIOMS -> Range Query and Exact Query for Numeric Values
* IDIOMS -> benchmark for numeric values (exact search and range query)
* IDIOMS -> Index Persistence
* BULKI -> A data serialization and deserialization mechanism.
src/api/pdc_client_connect.c Show resolved Hide resolved
HG_Destroy(dart_get_server_info_handle);
return ret;
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming: why do we not need a return code anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation was taking serverID and returning a dart_server struct instance that contains the indexed_word_count as well as request_count fields. However, when writing the simulator test of IDIOMS, I realize that we can send a dart_server pointer as an input, which can pass in the server ID, while still be able to take the result information of the function. This simplifies the implementation of , not only the simulator test, but also everywhere else in IDIOMS.

@@ -0,0 +1,264 @@
#ifndef PDC_GENERIC_H
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this _old.h file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since all the existing test passed, I don't think we need this anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Could you remove this file, then? If needed we can get it back for git

Copy link
Collaborator Author

@zhangwei217245 zhangwei217245 Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's remove it. I will update this PR along with the end-to-end CI test of IDIOMS.

@@ -0,0 +1,174 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire file commented out, do we still need it or can we remove it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the controlled experiment in the DART paper (for full-string hash and prefix hashing). But now, this is not used.
Let's leave this here for this version. Will possibly need it somewhere in the future.

@@ -0,0 +1,25 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this file, is it still needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the controlled experiment in the DART paper (for full-string hash and prefix hashing). But now, this is not used.
Let's leave this here for this version. Will possibly need it somewhere in the future.

src/server/include/pdc_server.h Show resolved Hide resolved
@@ -32,7 +32,7 @@
#include "mercury_proc_string.h"
#include "mercury_atomic.h"

#include "pdc_hash-table.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header file seems to be removed in the PR, but it is still included. Is this correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "pdc_hash-table.h" is different than "pdc_hash_table.h" in terms of the file name and location. But both files contain the same implementation. After this change, we only have one location for the pdc_hash_table which is located in pdc_commons. This helps with avoid redundant code as well as inconsistency between two copies of the exact same code.

set(SUPPRESSED_LIST "-Wno-alloc-size-larger-than" ${SUPPRESSED_LIST})
set(SUPPRESSED_LIST "-Wno-array-bounds" ${SUPPRESSED_LIST})
set(SUPPRESSED_LIST "-Wno-stringop-truncation" ${SUPPRESSED_LIST})
set(SUPPRESSED_LIST "-Wno-use-after-free" ${SUPPRESSED_LIST})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we suppressing those?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are some warnings that can be ignored. Given the complexity of our PDC implementation, these warnings are not very sense-making in our project and there is marginal value for fixing these warnings in the code.

We need a cleaner compilation log to highlight the compilation errors that really matters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Wno-use-after-free should probably be removed from here, is there an example code segment and compiler warning message that shows it is not helping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DIdn't we have a logger in PDC already? Is this used only for IDIOMS?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, there is no logger that controls different level of logging.

@jeanbez jeanbez added type: enhancement New feature or request type: new feature Request for new feature labels Jul 1, 2024
@jeanbez jeanbez added this to the v.0.6 milestone Jul 1, 2024
@jeanbez jeanbez changed the title IDIOMS Update & BULKI v0.1 Draft: IDIOMS Update & BULKI v0.1 Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request type: new feature Request for new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants