Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Make secondary index configurable #1461

Merged
merged 2 commits into from Apr 21, 2023

Conversation

chubei
Copy link
Collaborator

@chubei chubei commented Apr 21, 2023

This PR adds to the configuration file the option of skip default secondary indexes and create new indexes.

The configuration format is not as nice as we wanted because protobuf doesn't support repeated oneof or repeated within oneof. On that note, I think it's time to consider decoupling our configuration format and protobuf.

Closes #1445.

Tested with data in https://github.com/getdozer/dozer-samples/tree/main/sql/join, and query:

POST localhost:8080/trips/query
{
    "$filter": {
        "hvfhs_license_num": "HV0003"
    },
    "$order_by": {
        "trip_miles": "desc"
    }
}

Error message format

With configuration:

connections:    
  - config : !LocalStorage
      details:
        path: data
      tables:
        - !Table
          name: trips
          prefix: /trips
          file_type: parquet
          extension: .parquet
    name: ny_taxi

endpoints:
  - name: trips
    path: /trips
    table_name: trips
    index:
      secondary:
        create:

cache_max_map_size: 10000000000

We get 500:

Failed to query cache: Plan error: Matching index not found. Try to add following secondary index configuration:
- index: !SortedInverted
    fields:
    - hvfhs_license_num
    - trip_miles

After index creation

If we add the configuration:

connections:    
  - config : !LocalStorage
      details:
        path: data
      tables:
        - !Table
          name: trips
          prefix: /trips
          file_type: parquet
          extension: .parquet
    name: ny_taxi

endpoints:
  - name: trips
    path: /trips
    table_name: trips
    index:
      secondary:
        create:
          - index: !SortedInverted
              fields:
              - hvfhs_license_num
              - trip_miles

cache_max_map_size: 10000000000

We get query results:

[
    {
        "hvfhs_license_num": "HV0003",
        "dispatching_base_num": "B03404",
        "originating_base_num": "B03404",
        "request_datetime": "2022-01-01T00:13:51.000Z",
        "on_scene_datetime": "2022-01-01T00:19:52.000Z",
        "pickup_datetime": "2022-01-01T00:19:52.000Z",
        "dropoff_datetime": "2022-01-01T01:34:54.000Z",
        "PULocationID": 138,
        "DOLocationID": 265,
        "trip_miles": 63.33,
        "trip_time": 4502,
        "base_passenger_fare": 166.66,
        "tolls": 26.55,
        "bcf": 5.87,
        "sales_tax": 0.0,
        "congestion_surcharge": 0.0,
        "airport_fee": 2.5,
        "tips": 0.0,
        "driver_pay": 127.05,
        "shared_request_flag": "N",
        "shared_match_flag": "N",
        "access_a_ride_flag": " ",
        "wav_request_flag": "N",
        "wav_match_flag": "N",
        "__dozer_record_id": 8283,
        "__dozer_record_version": 1
    },
    {
        "hvfhs_license_num": "HV0003",
        "dispatching_base_num": "B03404",
        "originating_base_num": "B03404",
        "request_datetime": "2022-01-01T00:37:13.000Z",
        "on_scene_datetime": "2022-01-01T00:52:47.000Z",
        "pickup_datetime": "2022-01-01T00:53:17.000Z",
        "dropoff_datetime": "2022-01-01T02:01:14.000Z",
        "PULocationID": 168,
        "DOLocationID": 265,
        "trip_miles": 61.04,
        "trip_time": 4077,
        "base_passenger_fare": 156.82,
        "tolls": 20.0,
        "bcf": 5.45,
        "sales_tax": 16.14,
        "congestion_surcharge": 0.0,
        "airport_fee": 0.0,
        "tips": 0.0,
        "driver_pay": 121.03,
        "shared_request_flag": "N",
        "shared_match_flag": "N",
        "access_a_ride_flag": " ",
        "wav_request_flag": "N",
        "wav_match_flag": "N",
        "__dozer_record_id": 20654,
        "__dozer_record_version": 1
    },
    ...
]

@coveralls
Copy link

Pull Request Test Coverage Report for Build 4763381855

  • 103 of 135 (76.3%) changed or added relevant lines in 6 files are covered.
  • 289 unchanged lines in 11 files lost coverage.
  • Overall coverage decreased (-0.4%) to 78.435%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dozer-api/src/lib.rs 7 8 87.5%
dozer-cache/src/cache/plan/planner.rs 33 41 80.49%
dozer-api/src/cache_builder/mod.rs 19 42 45.24%
Files with Coverage Reduction New Missed Lines %
dozer-api/src/grpc/shared_impl/mod.rs 1 85.37%
dozer-cache/src/errors.rs 1 0%
dozer-api/src/grpc/typed/tests/service.rs 2 96.97%
dozer-core/src/executor.rs 2 98.1%
dozer-api/src/grpc/typed/service.rs 3 78.82%
dozer-core/src/executor/receiver_loop.rs 10 94.97%
dozer-sql/src/pipeline/projection/factory.rs 19 80.41%
dozer-api/src/grpc/typed/helper.rs 20 49.66%
dozer-sql/src/pipeline/aggregation/sum.rs 40 69.92%
dozer-sql/src/pipeline/pipeline_builder/join_builder.rs 88 54.4%
Totals Coverage Status
Change from base Build 4762840796: -0.4%
Covered Lines: 34877
Relevant Lines: 44466

💛 - Coveralls

@mediuminvader mediuminvader merged commit 3549b93 into getdozer:main Apr 21, 2023
6 of 7 checks passed
@chubei chubei deleted the feat/config branch April 21, 2023 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Secondary index configuration
3 participants