FixNPEIncremental #9

Closed
wants to merge 254 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Nov 14, 2021

  1. c2f9094
  2. 0bb6d8f

Commits on Nov 15, 2021

  1. a14d104
  2. a0dae41
  3. [MINOR] Fix typo in IntervalTreeBasedGlobalIndexFileFilter (apache#3993)

    Co-authored-by: 闫杜峰 <yandufeng@sinochem.com>
    dufeng1010 and 闫杜峰 committed Nov 15, 2021
    3c43197
  4. 53d2d6a
  5. [HUDI-2683] Parallelize deleting archived hoodie commits (apache#3920)

    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Nov 15, 2021
    38b6934

Commits on Nov 16, 2021

  1. bff8769
  2. 6f5e661

Commits on Nov 17, 2021

  1. [MINOR] Fixed checkstyle config to be based off Maven root-dir (requires Maven >=3.3.1 to work properly); (apache#4009)
    
    Updated README
    Alexey Kudinkin committed Nov 17, 2021
    cbcbec4
  2. 04eb5fd
  3. [HUDI-2151] Part3 Enabling marker based rollback as default rollback strategy (apache#3950)
    
    * Enabling timeline server based markers
    
    * Enabling timeline server based markers and marker based rollback
    
    * Removing constraint that timeline server can be enabled only for hdfs
    
    * Fixing tests
    nsivabalan committed Nov 17, 2021
    ce7d233
  4. Check --source-avro-schema-path parameter (apache#3987)

    Co-authored-by: 0x3E6 <dragon1996>
    0x3E6 committed Nov 17, 2021
    aec5d11
  5. 4d884bd
  6. [MINOR] Add the Schema for GooseFS to StorageSchemes (apache#3982)

    Co-authored-by: lubo <bollu@tencent.com>
    lubo212 and lubo committed Nov 17, 2021
    826414c
  7. [HUDI-2314] Add support for DynamoDb based lock provider (apache#3486)

    - Co-authored-by: Wenning Ding <wenningd@amazon.com>
    - Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
    zhedoubushishi committed Nov 17, 2021
    1ee12cf
  8. f715cf6
  9. 2d3f2a3

Commits on Nov 18, 2021

  1. 71a2ae0
  2. 8772cec
  3. [HUDI-2362] Add external config file support (apache#3416)

    Co-authored-by: Wenning Ding <wenningd@amazon.com>
    zhedoubushishi and Wenning Ding committed Nov 18, 2021
    24def0b
  4. 4e067ca

Commits on Nov 19, 2021

  1. 7a00f86
  2. bf00876
  3. eba354e
  4. [HUDI-2593] Virtual keys support for metadata table (apache#3968)

    - Metadata table today has virtual keys disabled, thereby populating the metafields
      for each record written out and increasing the overall storage space used. Hereby
      adding virtual keys support for metadata table so that metafields are disabled
      for metadata table records.
    
    - Adding a custom KeyGenerator for Metadata table so as to not rely on the
      default Base/SimpleKeyGenerators which currently look for record key
      and partition field set in the table config.
    
    - AbstractHoodieLogRecordReader's version of processing next data block and
      createHoodieRecord() will be a generic version and making the derived class
      HoodieMetadataMergedLogRecordReader take care of the special creation of
      records from explicitly passed-in partition names.
    manojpec committed Nov 19, 2021
    459b342

Commits on Nov 20, 2021

  1. c8617d9
  2. [HUDI-2796] Metadata table support for Restore action to first commit (apache#4039)
    
     - Adding support for the metadata table to restore to first commit and
       take proper action for the bootstrap on subsequent commits.
    manojpec committed Nov 20, 2021
    0230d40
  3. [HUDI-2242] Add configuration inference logic for few options (apache#3359)
    
    
    Co-authored-by: Wenning Ding <wenningd@amazon.com>
    zhedoubushishi and Wenning Ding committed Nov 20, 2021
    3dc6262
  4. 6cc97cc
  5. [HUDI-2742] Added S3 object filter to support multiple S3EventsHoodieIncrSources single S3 meta table (apache#4025)
    h7kanna committed Nov 20, 2021
    f4b974a
  6. [HUDI-2795] Add mechanism to safely update,delete and recover table properties (apache#4038)
    
    * [HUDI-2795] Add mechanism to safely update,delete and recover table properties
    
      - Fail safe mechanism, that lets queries succeed off a backup file
      - Readers who are not upgraded to this version of code will just fail until recovery is done.
      - Added unit tests that exercises all these scenarios.
      - Adding CLI for recovery, updation to table command.
      - [Pending] Add some hash based verification to ensure any rare partial writes for HDFS
    
    * Fixing upgrade/downgrade infrastructure to use new updation method
    vinothchandar committed Nov 20, 2021
    ae0c67d

Commits on Nov 21, 2021

  1. 1a5484d
  2. [MINOR] optimize in constructor of inputbatch class (apache#4040)

    Co-authored-by: 闫杜峰 <yandufeng@sinochem.com>
    dufeng1010 and 闫杜峰 committed Nov 21, 2021
    305d160
  3. 74b59a4
  4. 0411f73
  5. 520538b
  6. [HUDI-1932] Update Hive sync timestamp when change detected (apache#3053)
    
    * Update Hive sync timestamp when change detected
    
    Only update the last commit timestamp on the Hive table when the table schema
    has changed or a partition is created/updated.
    
    When using AWS Glue Data Catalog as the metastore for Hive this will ensure
    that table versions are substantive (including schema and/or partition
    changes). Prior to this change when a Hive sync is performed without schema
    or partition changes the table in the Glue Data Catalog would have a new
    version published with the only change being the timestamp property.
    
    https://issues.apache.org/jira/browse/HUDI-1932
    
    * add conditional sync flag
    
    * fix testSyncWithoutDiffs
    
    * fix HiveSyncConfig
    
    Co-authored-by: Raymond Xu <2701446+xushiyan@users.noreply.github.com>
    nateradtke and xushiyan committed Nov 21, 2021
    887787e
  7. 2533a9c

Commits on Nov 22, 2021

  1. 8281cbf
  2. [HUDI-1870] Add more Spark CI build tasks (apache#4022)

    * [HUDI-1870] Add more Spark CI build tasks
    
    - build for spark3.0.x
    - build for spark-shade-unbundle-avro
    - fix build failures
      - delete unnecessary assertion for spark 3.0.x
      - use AvroConversionUtils#convertAvroSchemaToStructType instead of calling SchemaConverters#toSqlType directly to solve the compilation failures with spark-shade-unbundle-avro (apache#5)
    
    Co-authored-by: Yann <biyan900116@gmail.com>
    xushiyan and YannByron committed Nov 22, 2021
    02f7ca2
  3. [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job (apache#3765)
    
    * coding finished and need to do uts
    
    * add uts
    
    * code review
    
    * code review
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Nov 22, 2021
    a2c91a7
  4. [HUDI-2472] Enabling metadata table for TestHoodieIndex test case (apache#4045)
    
    - Enabling the metadata table for testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath.
       This is more of a test issue.
    manojpec committed Nov 22, 2021
    7f3b89f
  5. 8945206
  6. [HUDI-2559] Converting commit timestamp format to millisecs (apache#4024)
    
    - Adds support for generating commit timestamps with millisecs granularity. 
    - Older commit timestamps (in secs granularity) will be suffixed with 999 and parsed with millisecs format.
    nsivabalan committed Nov 22, 2021
    fc9ca6a
  7. fe57e9b
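
The timestamp handling described in HUDI-2559 above (older second-granularity commit timestamps get a 999 suffix and are then parsed in the millisecond format) can be sketched as follows. This is an illustrative Python sketch, not Hudi's Java implementation; the 14-digit `yyyyMMddHHmmss` and 17-digit `yyyyMMddHHmmssSSS` layouts and both function names are assumptions made for the example.

```python
from datetime import datetime

SECS_LEN = 14    # assumed layout: yyyyMMddHHmmss
MILLIS_LEN = 17  # assumed layout: yyyyMMddHHmmssSSS


def normalize_instant(ts: str) -> str:
    """Return the timestamp at millisecond granularity (hypothetical helper)."""
    if not ts.isdigit():
        raise ValueError(f"timestamp must be all digits: {ts!r}")
    if len(ts) == SECS_LEN:
        # Older second-granularity timestamps are suffixed with 999.
        return ts + "999"
    if len(ts) == MILLIS_LEN:
        return ts
    raise ValueError(f"unexpected timestamp length: {ts!r}")


def parse_instant(ts: str) -> datetime:
    """Parse either granularity using the millisecond layout."""
    full = normalize_instant(ts)
    base = datetime.strptime(full[:SECS_LEN], "%Y%m%d%H%M%S")
    # The last three digits are milliseconds; datetime stores microseconds.
    return base.replace(microsecond=int(full[SECS_LEN:]) * 1000)
```

Under these assumptions a second-granularity timestamp maps to the last millisecond of its second, so it sorts after any millisecond-granularity timestamp generated within that same second.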

Commits on Nov 23, 2021

  1. [HUDI-2550] Expand File-Group candidates list for appending for MOR tables (apache#3986)
    Alexey Kudinkin committed Nov 23, 2021
    3bdab01
  2. [HUDI-2737] Use earliest instant by default for async compaction and clustering jobs (apache#3991)
    
    Address review comments
    
    Fix test failures
    
    Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
    yihua and codope committed Nov 23, 2021
    772af93
  3. 0d1e7ec
  4. [HUDI-1937] Rollback unfinished replace commit to allow updates (apache#3869)
    
    * [HUDI-1937] Rollback unfinished replace commit to allow updates while clustering
    
    * Revert and delete requested replacecommit too
    
    * Rollback pending clustering instants transactionally
    
    * No double locking and add a config to enable rollback
    
    * Update config to be clear about rollback only on conflict
    codope committed Nov 23, 2021
    e22150f
  5. [MINOR] Add more configuration to Kafka setup script (apache#3992)

    * [MINOR] Add more configuration to Kafka setup script
    
    * Add option to reuse Kafka topic
    
    * Minor fixes to README
    yihua committed Nov 23, 2021
    6aa710e
  6. c88c2af
  7. [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order and fix some bugs (apache#4013)
    
    * [HUDI-2778] Optimize statistics collection related codes and add more docs for z-order.
    
    * add test code for multi-thread parquet footer read
    xiarixiaoyao committed Nov 23, 2021
    9de9951
  8. [HUDI-2409] Using HBase shaded jars in Hudi presto bundle (apache#3623)

    * using hbase-shaded-jars-in-hudi-presto-hundle
    
    * test
    
    * add hudi-common-bundle
    
    * code review
    
    * code review
    
    * code review
    
    * code review
    
    * test
    
    * test
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Nov 23, 2021
    9ed28b1
  9. [HUDI-2332] Add clustering and compaction in Kafka Connect Sink (apache#3857)
    
    * [HUDI-2332] Add clustering and compaction in Kafka Connect Sink
    
    * Disable validation check on instant time for compaction and adjust configs
    
    * Add javadocs
    
    * Add clustering and compaction config
    
    * Fix transaction causing missing records in the target table
    
    * Add debugging logs
    
    * Fix kafka offset sync in participant
    
    * Adjust how clustering and compaction are configured in kafka-connect
    
    * Fix clustering strategy
    
    * Remove irrelevant changes from other published PRs
    
    * Update clustering logic and others
    
    * Update README
    
    * Fix test failures
    
    * Fix indentation
    
    * Fix clustering config
    
    * Add JavaCustomColumnsSortPartitioner and make async compaction enabled by default
    
    * Add test for JavaCustomColumnsSortPartitioner
    
    * Add more changes after IDE sync
    
    * Update README with clarification
    
    * Fix clustering logic after rebasing
    
    * Remove unrelated changes
    yihua committed Nov 23, 2021
    ca9bfa2
  10. 969a5bf
  11. [HUDI-2325] Add hive sync support to kafka connect (apache#3660)

    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Nov 23, 2021
    fbff079

Commits on Nov 24, 2021

  1. 18cf595
  2. 5078d29
  3. 0cf2f10
  4. 323be33
  5. 0bb506f
  6. a234833
  7. 9af219b
  8. [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol (apache#4021)
    
    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Nov 24, 2021
    90f2ea2
  9. [HUDI-2443] Hudi KVComparator for all HFile writer usages (apache#3889)

    * [HUDI-2443] Hudi KVComparator for all HFile writer usages
    
    - Hudi relies on custom class shading for Hbase's KeyValue.KVComparator to
      avoid versioning and class loading issues. There are few places which are
      still using the Hbase's comparator class directly and version upgrades
      would make them obsolete. Refactoring the HoodieKVComparator and making
      all HFile writer creation using the same shaded class.
    
    * [HUDI-2443] Hudi KVComparator for all HFile writer usages
    
    - Moving HoodieKVComparator from common.bootstrap.index to common.util
    
    * [HUDI-2443] Hudi KVComparator for all HFile writer usages
    
    - Retaining the old HoodieKVComparatorV2 for bootstrap case. Adding the
      new comparator as HoodieKVComparatorV2 to differentiate from the old
      one.
    
    * [HUDI-2443] Hudi KVComparator for all HFile writer usages
    
     - Renamed HoodieKVComparatorV2 to HoodieMetadataKVComparator and moved it
       under the package org.apache.hudi.metadata.
    
    * Make comparator classname configurable
    
    * Revert new config and address other review comments
    
    Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
    manojpec and codope committed Nov 24, 2021
    973f78f
  10. [HUDI-2788] Fixing issues w/ Z-order Layout Optimization (apache#4026)

    * Simplifying, tidying up
    
    * Fixed packaging for `TestOptimizeTable`
    
    * Cleaned up `HoodieFileIndex` file filtering seq;
    Removed optimization manually reading Parquet table circumventing Spark
    
    * Refactored `DataSkippingUtils`:
      - Fixed checks to validate all statistics cols are present
      - Fixed some predicates being constructed incorrectly
      - Rewrote comments for easier comprehension, added more notes
      - Tidying up
    
    * Tidying up tests
    
    * `lint`
    
    * Fixing compilation
    
    * `TestOptimizeTable` > `TestTableLayoutOptimization`;
    Added assertions to test data skipping paths
    
    * Fixed tests to properly hit data-skipping path
    
    * Fixed pruned files candidates lookup seq to conservatively include all non-indexed files
    
    * Added java-doc
    
    * Fixed compilation
    Alexey Kudinkin committed Nov 24, 2021
    60b23b9
  11. [HUDI-2766] Cluster update strategy should not be fenced by write config (apache#4093)
    
    Fix pending clustering rollback test
    codope committed Nov 24, 2021
    ff94d92
  12. [HUDI-2793] Fixing deltastreamer checkpoint fetch/copy over (apache#4034)
    
    - Removed the copy over logic in transaction utils. Deltastreamer will go back to previous commits and get the checkpoint value.
    nsivabalan committed Nov 24, 2021
    435ea15

Commits on Nov 25, 2021

  1. [HUDI-2853] Add JMX deps in hudi utilities and kafka connect bundles (apache#4108)
    
    
    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Nov 25, 2021
    7286b56
  2. [HUDI-2844][CLI] Fixing archived Timeline crashing if timeline contains REPLACE_COMMIT (apache#4091)
    Alexey Kudinkin committed Nov 25, 2021
    5129773
  3. bef373f
  4. [HUDI-1290] [RFC-39] Deltastreamer avro source for Debezium CDC (apache#4048)
    
    * Add RFC entry for deltastreamer source for debezium
    
    * Add RFC for debezium source
    
    * Add RFC for debezium source
    
    * Add RFC for debezium source
    
    * fix hyperlink issue and rebase
    
    * Update progress
    
    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Nov 25, 2021
    abc0175
  5. [HUDI-1290] Add Debezium Source for deltastreamer (apache#4063)

    * add source for postgres debezium
    
    * Add tests for debezium payload
    
    * Fix test
    
    * Fix test
    
    * Add tests for debezium source
    
    * Add tests for debezium source
    
    * Fix schema for debezium
    
    * Fix checkstyle issues
    
    * Fix config issue for schema registry
    
    * Add mysql source for debezium
    
    * Fix checkstyle issues and tests
    
    * Improve code for merging toasted values
    
    * Improve code for merging toasted values
    
    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Nov 25, 2021
    83f8ed2
  6. [HUDI-2792] Configure metadata payload consistency check (apache#4035)

    - Relax metadata payload consistency check to consider spark task failures with spurious deletes
    nsivabalan committed Nov 25, 2021
    a9bd208
  7. 88067f5
  8. [HUDI-2480] FileSlice after pending compaction-requested instant-time… (apache#3703)
    
    * [HUDI-2480] FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader
    
    * include file slice after a pending compaction for spark reader
    
    Co-authored-by: garyli1019 <yanjia.gary.li@gmail.com>
    danny0405 and garyli1019 committed Nov 25, 2021
    a2eb2b0
  9. 264e1ce
  10. b972aa5
  11. [HUDI-2794] Guarding table service commits within a single lock to commit to both data table and metadata table (apache#4037)
    
    * Fixing a single lock to commit table services across metadata table and data table
    
    * Addressing comments
    
    * rebasing with master
    nsivabalan committed Nov 25, 2021
    7bb90e8
  12. [HUDI-2671] Making error -> warn logs from timeline server with concurrent writers for inconsistent state (apache#4088)
    
    * Making error -> warn logs from timeline server with concurrent writers for inconsistent state
    
    * Fixing bad request response exception for timeline out of sync
    
    * Addressing feedback. removed write concurrency mode dependency
    nsivabalan committed Nov 25, 2021
    f692078
  13. 6a0f079
  14. 8e13793
  15. e0125a7
  16. [HUDI-2840] Fixed DeltaStreamer to properly respect configuration passed t/h properties file (apache#4090)
    
    * Rebased `DFSPropertiesConfiguration` to access Hadoop config in lieu of FS to avoid confusion
    
    * Fixed `readConfig` to take Hadoop's `Configuration` instead of FS;
    Fixing usages
    
    * Added test for local FS access
    
    * Rebase to use `FSUtils.getFs`
    
    * Combine properties provided as a file along w/ overrides provided from the CLI
    
    * Added helper utilities to `HoodieClusteringConfig`;
    Make sure corresponding config methods fallback to defaults;
    
    * Fixed DeltaStreamer usage to respect properly combined configuration;
    Abstracted `HoodieClusteringConfig.from` convenience utility to init Clustering config from `Properties`
    
    * Tidying up
    
    * `lint`
    
    * Reverting changes to `HoodieWriteConfig`
    
    * Tidying up
    
    * Fixed incorrect merge of the props
    
    * Converted `HoodieConfig` to wrap around `Properties` into `TypedProperties`
    
    * Fixed compilation
    
    * Fixed compilation
    Alexey Kudinkin committed Nov 25, 2021
    6f5d8d0
  17. 8340ccb
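
HUDI-2840 above mentions combining properties provided as a file with overrides provided from the CLI. A minimal Python sketch of that merge order follows, assuming simple `key=value` lines and that CLI overrides take precedence over file values; the function names and the example keys are hypothetical and are not Hudi's API.

```python
def parse_properties(lines):
    """Parse key=value lines, skipping blanks and '#' comments (hypothetical helper)."""
    props = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def combine_properties(file_lines, cli_overrides):
    """Start from the file's properties, then let CLI overrides win on conflicts."""
    merged = parse_properties(file_lines)
    merged.update(parse_properties(cli_overrides))
    return merged


# Example keys below are made up for illustration.
file_lines = ["# base config", "example.parallelism=2", "example.table=trips"]
cli_overrides = ["example.parallelism=8"]
combined = combine_properties(file_lines, cli_overrides)
```

Keeping the merge in one place means every consumer sees the same combined view instead of reading the file and the CLI values separately.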

Commits on Nov 26, 2021

  1. 38585e4
  2. [MINOR] Include hudi-aws in flink bundle jar (apache#4127)

    HUDI-2801 makes this jar required.
    danny0405 committed Nov 26, 2021
    f5da9b5
  3. [HUDI-2852] Table metadata returns empty for non-exist partition (apache#4117)
    
    * [HUDI-2852] Table metadata returns empty for non-exist partition
    
    * add unit test
    
    * fix code checkstyle
    
    Co-authored-by: wangminchao <wangminchao@asinking.com>
    minchowang and wangminchao committed Nov 26, 2021
    e554c7f
  4. e9efbdb
  5. [HUDI-2850] Fixing Clustering CLI - schedule and run command fixes to avoid NumberFormatException (apache#4101)
    manojpec committed Nov 26, 2021
    3d75aca
  6. [HUDI-2814] Addressing issues w/ Z-order Layout Optimization (apache#4060)
    
    * `ZCurveOptimizeHelper` > `ZOrderingIndexHelper`;
    Moved Z-index helper under `hudi.index.zorder` package
    
    * Tidying up `ZOrderingIndexHelper`
    
    * Fixing compilation
    
    * Fixed index new/original table merging sequence to always prefer values from new index;
    Cleaned up `HoodieSparkUtils`
    
    * Added test for `mergeIndexSql`
    
    * Abstracted Z-index name composition w/in `ZOrderingIndexHelper`;
    
    * Fixed `DataSkippingUtils` to interrupt pruning in case data filter contains non-indexed column reference
    
    * Properly handle exceptions originating during pruning in `HoodieFileIndex`
    
    * Make sure no errors are logged upon encountering `AnalysisException`
    
    * Cleaned up Z-index updating sequence;
    Tidying up comments, java-docs;
    
    * Fixed Z-index to properly handle changes of the list of clustered columns
    
    * Tidying up
    
    * `lint`
    
    * Suppressing `JavaDocStyle` first sentence check
    
    * Fixed compilation
    
    * Fixing incorrect `DecimalType` conversion
    
    * Refactored test `TestTableLayoutOptimization`
      - Added Z-index table composition test (against fixtures)
      - Separated out GC test;
    Tidying up
    
    * Fixed tests re-shuffling column order for Z-Index table `DataFrame` to align w/ the one by one loaded from JSON
    
    * Scaffolded `DataTypeUtils` to do basic checks of Spark types;
    Added proper compatibility checking b/w old/new index-tables
    
    * Added test for Z-index tables merging
    
    * Fixed import being shaded by creating internal `hudi.util` package
    
    * Fixed packaging for `TestOptimizeTable`
    
    * Revised `updateMetadataIndex` seq to provide Z-index updating process w/ source table schema
    
    * Make sure existing Z-index table schema is sync'd to source table's one
    
    * Fixed shaded refs
    
    * Fixed tests
    
    * Fixed type conversion of Parquet provided metadata values into Spark expected schemas
    
    * Fixed `composeIndexSchema` utility to propose proper schema
    
    * Added more tests for Z-index:
      - Checking that Z-index table is built correctly
      - Checking that Z-index tables are merged correctly (during update)
    
    * Fixing source table
    
    * Fixing tests to read from Parquet w/ proper schema
    
    * Refactored `ParquetUtils` utility reading stats from Parquet footers
    
    * Fixed incorrect handling of Decimals extracted from Parquet footers
    
    * Worked around issues in javac failing to compile stream's collection
    
    * Fixed handling of `Date` type
    
    * Fixed handling of `DateType` to be parsed as `LocalDate`
    
    * Updated fixture;
    Make sure test loads Z-index fixture using proper schema
    
    * Removed superfluous scheme adjusting when reading from Parquet, since Spark is actually able to perfectly restore schema (given Parquet was previously written by Spark as well)
    
    * Fixing race-condition in Parquet's `DateStringifier` trying to share `SimpleDateFormat` object which is inherently not thread-safe
    
    * Tidying up
    
    * Make sure schema is used upon reading to validate input files are in the appropriate format;
    Tidying up;
    
    * Worked around javac (1.8) inability to infer expression type properly
    
    * Updated fixtures;
    Tidying up
    
    * Fixing compilation after rebase
    
    * Assert clustering have in Z-order layout optimization testing
    
    * Tidying up exception messages
    
    * XXX
    
    * Added test validating Z-index lookup filter correctness
    
    * Added more test-cases;
    Tidying up
    
    * Added tests for string expressions
    
    * Fixed incorrect Z-index filter lookup translations
    
    * Added more test-cases
    
    * Added proper handling on complex negations of AND/OR expressions by pushing NOT operator down into inner expressions for appropriate handling
    
    * Added `-target:jvm-1.8` for `hudi-spark` module
    
    * Adding more tests
    
    * Added tests for non-indexed columns
    
    * Properly handle non-indexed columns by falling back to a re-write of containing expression as `TrueLiteral` instead
    
    * Fixed tests
    
    * Removing the parquet test files and disabling corresponding tests
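
    The negation push-down and the non-indexed column fallback mentioned in the list above can be illustrated with a small sketch. This is not the code from this commit: the node classes `Pred`, `And`, `Or`, `Not`, the `TRUE` marker, and both function names are hypothetical, and NULL handling is ignored. It only shows the general technique of applying De Morgan's laws so that `NOT` ends up on leaf predicates, and then replacing predicates on non-indexed columns with a always-true literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    column: str
    op: str        # one of "=", "!=", "<", "<=", ">", ">="
    value: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    child: object


TRUE = "TRUE"  # hypothetical stand-in for an always-true literal

NEGATED_OP = {"=": "!=", "!=": "=", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def push_not_down(expr, negate=False):
    """Rewrite expr so that no Not node remains (De Morgan's laws)."""
    if isinstance(expr, Not):
        return push_not_down(expr.child, not negate)
    if isinstance(expr, And):
        node = Or if negate else And
        return node(push_not_down(expr.left, negate), push_not_down(expr.right, negate))
    if isinstance(expr, Or):
        node = And if negate else Or
        return node(push_not_down(expr.left, negate), push_not_down(expr.right, negate))
    if isinstance(expr, Pred):
        return Pred(expr.column, NEGATED_OP[expr.op], expr.value) if negate else expr
    raise TypeError(f"unsupported expression node: {expr!r}")


def relax_non_indexed(expr, indexed_columns):
    """Replace predicates on non-indexed columns with TRUE (expects NOT-free input)."""
    if isinstance(expr, Not):
        raise ValueError("call push_not_down first")
    if isinstance(expr, (And, Or)):
        return type(expr)(relax_non_indexed(expr.left, indexed_columns),
                          relax_non_indexed(expr.right, indexed_columns))
    if isinstance(expr, Pred) and expr.column not in indexed_columns:
        return TRUE
    return expr


expr = Not(And(Pred("a", "<", 5), Or(Pred("b", "=", 1), Not(Pred("c", ">", 3)))))
pushed = push_not_down(expr)
relaxed = relax_non_indexed(pushed, {"a", "c"})
```

    The order matters in this sketch: relaxing a predicate to `TRUE` before the push-down would leave it under a `NOT`, turning it into an always-false filter that could wrongly skip files.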
    
    Co-authored-by: Vinoth Chandar <vinoth@apache.org>
    Alexey Kudinkin and vinothchandar committed Nov 26, 2021
    Copy the full SHA
    5755ff2 View commit details
    Browse the repository at this point in the history
  7. Copy the full SHA
    a88691f View commit details
    Browse the repository at this point in the history
  8. Copy the full SHA
    f8e0176 View commit details
    Browse the repository at this point in the history
  9. [HUDI-2767] Enabling timeline-server-based marker as default (apache#…

    …4112)
    
    - Changes the default config of marker type (HoodieWriteConfig.MARKERS_TYPE or hoodie.write.markers.type) from DIRECT to TIMELINE_SERVER_BASED for Spark Engine.
    - Adds engine-specific marker type configs: Spark -> TIMELINE_SERVER_BASED, Flink -> DIRECT, Java -> DIRECT.
    - Uses DIRECT markers as well for Spark structured streaming, because the timeline server is only available for the first mini-batch.
    - Fixes the marker creation method for non-partitioned table in TimelineServerBasedWriteMarkers.
    - Adds a fallback to direct markers in WriteMarkersFactory even when TIMELINE_SERVER_BASED is configured: the fallback happens when HDFS is used or the embedded timeline server is disabled.
    - Fixes the closing of timeline service.
    - Fixes tests that depend on markers, mainly by starting the timeline service for each test.
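
    The defaults and the fallback rule listed above can be summarized as a small decision function. This is a minimal sketch, not the logic of `WriteMarkersFactory` itself: the function name `resolve_marker_type`, its parameters, and the `ENGINE_DEFAULTS` table are hypothetical, and only the marker type names and the per-engine defaults come from the commit message.

```python
# Per-engine defaults as listed in the commit message.
ENGINE_DEFAULTS = {
    "spark": "TIMELINE_SERVER_BASED",
    "flink": "DIRECT",
    "java": "DIRECT",
}


def resolve_marker_type(engine, configured=None, on_hdfs=False,
                        timeline_server_enabled=True):
    """Return the marker type that would effectively be used (hypothetical helper)."""
    # An explicitly configured type wins over the engine default.
    marker_type = configured or ENGINE_DEFAULTS[engine]
    # Fall back to direct markers when timeline-server-based markers are
    # requested but HDFS is used or the embedded timeline server is disabled.
    if marker_type == "TIMELINE_SERVER_BASED" and (on_hdfs or not timeline_server_enabled):
        return "DIRECT"
    return marker_type
```

    For example, `resolve_marker_type("spark", on_hdfs=True)` yields `DIRECT` in this sketch even though the Spark default is `TIMELINE_SERVER_BASED`.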
    yihua committed Nov 26, 2021
    Copy the full SHA
    d1e83e4 View commit details
    Browse the repository at this point in the history
  10. [HUDI-2845] Metadata CLI - files/partition file listing fix and new v…

    …alidate option (apache#4092)
    
    - Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
    manojpec committed Nov 26, 2021
    Copy the full SHA
    445208a View commit details
    Browse the repository at this point in the history
  11. Copy the full SHA
    8402cac View commit details
    Browse the repository at this point in the history
  12. [HUDI-2864] Fix README and scripts with current limitations of hive s…

    …ync (apache#4129)
    
    * Fix README with current limitations of hive sync
    
    * Fix README with current limitations of hive sync
    
    * Fix dep issue
    
    * Fix Copy on Write flow
    
    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Nov 26, 2021
    Copy the full SHA
    9028e6e View commit details
    Browse the repository at this point in the history
  13. [HUDI-2856] Bit cask disk map delete modified (apache#4116)

    * modified BitCaskDiskMap_close_function
    
    * change iterators location to finally
    
    * Update BitCaskDiskMap.java
    xuzifu666 committed Nov 26, 2021
    Copy the full SHA
    257a6a7 View commit details
    Browse the repository at this point in the history

Commits on Nov 27, 2021

  1. Copy the full SHA
    9c059ef View commit details
    Browse the repository at this point in the history
  2. [HUDI-2868] Fix skipped HoodieSparkSqlWriterSuite (apache#4125)

    - Co-authored-by: Yann Byron <biyan900116@gmail.com>
    xushiyan committed Nov 27, 2021
    Copy the full SHA
    3a8d64e View commit details
    Browse the repository at this point in the history
  3. [HUDI-2475] [HUDI-2862] Metadata table creation and avoid bootstrappi…

    …ng race for write client & add locking for upgrade (apache#4114)
    
    Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
    manojpec and nsivabalan committed Nov 27, 2021
    Copy the full SHA
    2c7656c View commit details
    Browse the repository at this point in the history
  4. [HUDI-2102] Support hilbert curve for hudi (apache#3952)

    Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
    xiarixiaoyao and yihua committed Nov 27, 2021
    Copy the full SHA
    780a2ac View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    a1d0ff4 View commit details
    Browse the repository at this point in the history

Commits on Nov 28, 2021

  1. Copy the full SHA
    eca1693 View commit details
    Browse the repository at this point in the history

Commits on Nov 29, 2021

  1. Copy the full SHA
    52aae36 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    38e75ea View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    536af4b View commit details
    Browse the repository at this point in the history

Commits on Nov 30, 2021

  1. Copy the full SHA
    3433f00 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    a398aad View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    ea009b5 View commit details
    Browse the repository at this point in the history

Commits on Dec 1, 2021

  1. Revert "[HUDI-2855] Change the default value of 'PAYLOAD_CLASS_NAME' …

    …to 'DefaultHoodieRecordPayload' (apache#4115)" (apache#4169)
    
    This reverts commit 88067f5.
    Alexey Kudinkin committed Dec 1, 2021
    Copy the full SHA
    24380c2 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    9b254b6 View commit details
    Browse the repository at this point in the history
  3. [HUDI-2880] Fixing loading of props from default dir (apache#4167)

    * Fixing loading of props from default dir
    
    * addressing comments
    nsivabalan committed Dec 1, 2021
    Copy the full SHA
    f4c25ba View commit details
    Browse the repository at this point in the history

Commits on Dec 2, 2021

  1. Copy the full SHA
    5284730 View commit details
    Browse the repository at this point in the history
  2. Fixed partitions produced by layout optimization in case order-by key…

    … is composed of a single column (apache#4183)
    Alexey Kudinkin committed Dec 2, 2021
    Copy the full SHA
    772f5ca View commit details
    Browse the repository at this point in the history
  3. [MINOR] Fix the wrong usage of timestamp length variable bug (apache#…

    …4179)
    
    Signed-off-by: zzzhy <candle_1667@163.com>
    zzzhy committed Dec 2, 2021
    Copy the full SHA
    61a03bc View commit details
    Browse the repository at this point in the history
  4. [HUDI-2904] Fix metadata table archival overstepping between regular …

    …writers and table services (apache#4186)
    
    - Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    - Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
    rmahindra123 committed Dec 2, 2021
    Copy the full SHA
    91d2e61 View commit details
    Browse the repository at this point in the history

Commits on Dec 3, 2021

  1. Copy the full SHA
    934fe54 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    f74b3d1 View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    0699521 View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    ca42724 View commit details
    Browse the repository at this point in the history
  5. [HUDI-2902] Fixing populate meta fields with Hfile writers and Disabl…

    …ing virtual keys by default for metadata table (apache#4194)
    nsivabalan committed Dec 3, 2021
    Copy the full SHA
    e483f7c View commit details
    Browse the repository at this point in the history
  6. [HUDI-2911] Removing default value for PARTITIONPATH_FIELD_NAME res…

    …ulting in incorrect `KeyGenerator` configuration (apache#4195)
    Alexey Kudinkin committed Dec 3, 2021
    Copy the full SHA
    bed7f98 View commit details
    Browse the repository at this point in the history
  7. Revert "[HUDI-2495] Resolve inconsistent key generation for timestamp…

    … types by GenericRecord and Row (apache#3944)" (apache#4201)
    YannByron committed Dec 3, 2021
    Copy the full SHA
    2f96f43 View commit details
    Browse the repository at this point in the history
  8. [HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures …

    …on base files over S3 (apache#4185)
    
    - Fetching partition files or all partitions from the metadata table is failing
       when run over S3. Metadata table uses HFile format for the base files and the
       record lookup uses HFile.Reader and HFileScanner interfaces to get records by
       partition keys. When the backing storage is S3, this record lookup from HFiles
       is failing with IOException, in turn failing the caller commit/update operations.
    
     - The metadata table looks up HFile records with positional read enabled so as to
       perform better for random lookups. But over S3 this positional read key lookup
       returns partial read sizes, leading to the HFile scanner throwing an
       IOException. This doesn't happen over HDFS. Although the metadata table uses HFiles
       for random key lookups, positional read is not mandatory, as we sort the keys
       when doing a lookup for multiple keys.
    
     - The fix is to disable HFile positional read for all HFile scanner based
       key lookups.
    manojpec committed Dec 3, 2021
    Copy the full SHA
    383d5ed View commit details
    Browse the repository at this point in the history

Commits on Dec 4, 2021

  1. Revert "[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTa…

    …bleFileSystemView, aiming to reduce unnecessary list/get requests"
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Dec 4, 2021
    Copy the full SHA
    5616830 View commit details
    Browse the repository at this point in the history
  2. [MINOR] Mitigate CI jobs timeout issues (apache#4173)

    * skip shutdown zookeeper in `@AfterAll` in TestHBaseIndex
    
    * rebalance CI tests
    xushiyan committed Dec 4, 2021
    Copy the full SHA
    a799fae View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    0fd6b2d View commit details
    Browse the repository at this point in the history
  4. [HUDI-2890] Kafka Connect: Fix failed writes and avoid table service …

    …concurrent operations (apache#4211)
    
    * Fix kafka connect readme
    
    * Fix handling of errors in write records for kafka connect
    
    * By default, ensure we skip error records and keep the pipeline alive
    
    * Fix indentation
    
    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Dec 4, 2021
    Copy the full SHA
    94f45e9 View commit details
    Browse the repository at this point in the history
  5. [HUDI-2923] Fixing metadata table reader when metadata compaction is …

    …inflight (apache#4206)
    
    * [HUDI-2923] Fixing metadata table reader when metadata compaction is inflight
    
    * Fixing retry of pending compaction in metadata table and enhancing tests
    nsivabalan committed Dec 4, 2021
    Copy the full SHA
    1d4fb82 View commit details
    Browse the repository at this point in the history
  6. Copy the full SHA
    568181a View commit details
    Browse the repository at this point in the history
  7. [HUDI-2935] Remove special casing of clustering in deltastreamer chec…

    …kpoint retrieval (apache#4216)
    
    - We now seek backwards to find the checkpoint
     - No need to return empty anymore
    vinothchandar committed Dec 4, 2021
    Copy the full SHA
    36b69d8 View commit details
    Browse the repository at this point in the history

Commits on Dec 5, 2021

  1. [HUDI-2877] Support flink catalog to help user use flink table conven…

    …iently (apache#4153)
    
    * [HUDI-2877] Support flink catalog to help user use flink table conveniently
    
    * Fix comment
    
    * fix comment2
    lsyldliu committed Dec 5, 2021
    Copy the full SHA
    a8fb696 View commit details
    Browse the repository at this point in the history
  2. [HUDI-2937] Introduce a pulsar implementation of hoodie write commit … (

    apache#4217)
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    
    * [HUDI-2937] Introduce a pulsar implementation of hoodie write commit callback
    XuQianJin-Stars committed Dec 5, 2021
    Copy the full SHA
    63b1560 View commit details
    Browse the repository at this point in the history
  3. [HUDI-2418] Support HiveSchemaProvider (apache#3671)

    Co-authored-by: jian.feng <fengjian428@gmial.com>
    fengjian428 and jian.feng committed Dec 5, 2021
    Copy the full SHA
    734c9f5 View commit details
    Browse the repository at this point in the history

Commits on Dec 6, 2021

  1. Copy the full SHA
    f0e46bf View commit details
    Browse the repository at this point in the history
  2. [HUDI-2900] Fix corrupt block end position (apache#4181)

    * [HUDI-2900] Fix corrupt block end position
    
    * add a test
    lsyldliu committed Dec 6, 2021
    Copy the full SHA
    84b531a View commit details
    Browse the repository at this point in the history
  3. [HUDI-2876] for hive/presto hudi should remove the temp file which cr…

    …eated by HoodieMergedLogRecordScanner when the query finished. (apache#4139)
    xiarixiaoyao committed Dec 6, 2021
    Copy the full SHA
    57c4bf8 View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    2d66451 View commit details
    Browse the repository at this point in the history
  5. [MINOR] Use maven-shade-plugin version for hudi-timeline-server-bundl…

    …e from main pom.xml (apache#4209)
    
    Co-authored-by: Wenning Ding <wenningd@amazon.com>
    zhedoubushishi and Wenning Ding committed Dec 6, 2021
    Copy the full SHA
    4a437f2 View commit details
    Browse the repository at this point in the history

Commits on Dec 7, 2021

  1. [MINOR] Remove redundant and conflicting spark-hive dependency (apach…

    …e#4228)
    
    Disable TestHiveSchemaProvider
    codope committed Dec 7, 2021
    Copy the full SHA
    6dab307 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    e8473b9 View commit details
    Browse the repository at this point in the history

Commits on Dec 8, 2021

  1. Copy the full SHA
    c9e18d1 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    c56d93e View commit details
    Browse the repository at this point in the history
  3. [HUDI-2832][RFC-41] Proposal to integrate Hudi on Snowflake platform (a…

    …pache#4074)
    
    * [HUDI-2832][RFC-40] Proposal to integrate Hudi on Snowflake platform
    
    * rebased and addressed review comments
    Vinoth Govindarajan committed Dec 8, 2021
    Copy the full SHA
    082faa3 View commit details
    Browse the repository at this point in the history

Commits on Dec 9, 2021

  1. Copy the full SHA
    7c3f077 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    bd08470 View commit details
    Browse the repository at this point in the history
  3. [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter (a…

    …pache#3912)
    
    Co-authored-by: guanziyue.gzy <guanziyue.gzy@bytedance.com>
    guanziyue and ZiyueGuan-bytedance committed Dec 9, 2021
    Copy the full SHA
    9c8ad0f View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    5ac9ce7 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    f612a20 View commit details
    Browse the repository at this point in the history
  6. [HUDI-2966] Add TaskCompletionListener for HoodieMergeOnReadRDD to cl…

    …ose logScanner when the query finished. (apache#4265)
    
    * [HUDI-2966] Add TaskCompletionListener for HoodieMergeOnReadRDD to close logScanner when the query finished.
    xiarixiaoyao committed Dec 9, 2021
    Copy the full SHA
    68f8597 View commit details
    Browse the repository at this point in the history
  7. Copy the full SHA
    3fb2f97 View commit details
    Browse the repository at this point in the history

Commits on Dec 10, 2021

  1. Copy the full SHA
    8321d20 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    ea154bc View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    456d74c View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    c7473a7 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    f194566 View commit details
    Browse the repository at this point in the history
  6. Copy the full SHA
    be36826 View commit details
    Browse the repository at this point in the history
  7. [HUDI-2912] Fix CompactionPlanOperator typo (apache#4187)

    Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
    yuzhaojing and yuzhaojing committed Dec 10, 2021
    Copy the full SHA
    3ad9b12 View commit details
    Browse the repository at this point in the history
  8. Copy the full SHA
    3ce0526 View commit details
    Browse the repository at this point in the history
  9. [HUDI-2892][BUG] Pending Clustering may stain the ActiveTimeLine and …

    …lead to incomplete query results (apache#4172)
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Dec 10, 2021
    Copy the full SHA
    3ba2909 View commit details
    Browse the repository at this point in the history
  10. Copy the full SHA
    72901a3 View commit details
    Browse the repository at this point in the history
  11. Copy the full SHA
    2d864f7 View commit details
    Browse the repository at this point in the history

Commits on Dec 11, 2021

  1. Copy the full SHA
    c48a2a1 View commit details
    Browse the repository at this point in the history
  2. [HUDI-2974] Make the prefix for metrics name configurable (apache#4274)

    Co-authored-by: Rajesh Mahindra <rmahindra@Rajeshs-MacBook-Pro.local>
    rmahindra123 and Rajesh Mahindra committed Dec 11, 2021
    Copy the full SHA
    9797fdf View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    9bdcee0 View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    2dcb3f0 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    b5f05fd View commit details
    Browse the repository at this point in the history
  6. Copy the full SHA
    8dd0444 View commit details
    Browse the repository at this point in the history

Commits on Dec 12, 2021

  1. [HUDI-2946] Upgrade maven plugins to be compatible with higher Java v…

    …ersions (apache#4232)
    
    Co-authored-by: Wenning Ding <wenningd@amazon.com>
    zhedoubushishi and Wenning Ding committed Dec 12, 2021
    Copy the full SHA
    15444c9 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    b22c2c6 View commit details
    Browse the repository at this point in the history

Commits on Dec 13, 2021

  1. Copy the full SHA
    dd96129 View commit details
    Browse the repository at this point in the history
  2. [HUDI-2994] Add judgement to existed partitionPath in the catch code …

    …block for HU… (apache#4294)
    
    * [HUDI-2994] Add judgement to existed partition path in the catch code block for HUDI-2743
    
    Co-authored-by: wangminchao <wangminchao@asinking.com>
    minchowang and wangminchao committed Dec 13, 2021
    Copy the full SHA
    46de25d View commit details
    Browse the repository at this point in the history

Commits on Dec 14, 2021

  1. Copy the full SHA
    29bc5fd View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    c8d6bd8 View commit details
    Browse the repository at this point in the history
  3. [HUDI-2995] Enabling metadata table by default (apache#4295)

    - Enabling metadata table by default
    manojpec committed Dec 14, 2021
    Copy the full SHA
    bc8bf04 View commit details
    Browse the repository at this point in the history

Commits on Dec 15, 2021

  1. [HUDI-3022] Fix NPE for isDropPartition method (apache#4319)

    * [HUDI-3022] Fix NPE for isDropPartition method
    XuQianJin-Stars committed Dec 15, 2021
    Copy the full SHA
    dbec6c5 View commit details
    Browse the repository at this point in the history
  2. [HUDI-3024] Add explicit write handler for flink (apache#4329)

    Co-authored-by: wangminchao <wangminchao@asinking.com>
    minchowang and wangminchao committed Dec 15, 2021
    Copy the full SHA
    9a2030a View commit details
    Browse the repository at this point in the history
  3. [HUDI-3025] Add additional wait time for namenode availability during…

    … IT tests initialization (apache#4328)
    
    - Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
    yihua committed Dec 15, 2021
    Copy the full SHA
    3b89457 View commit details
    Browse the repository at this point in the history
  4. [HUDI-3028] Use blob storage to speed up CI downloads (apache#4331)

    Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
    xushiyan and nsivabalan committed Dec 15, 2021
    Copy the full SHA
    27907de View commit details
    Browse the repository at this point in the history
  5. [HUDI-2998] claiming rfc number for consistent hashing index (apache#…

    …4303)
    
    Co-authored-by: xiaoyuwei <xiaoyuwei.yw@alibaba-inc.com>
    YuweiXiao and xiaoyuwei committed Dec 15, 2021
    Copy the full SHA
    f5b07a7 View commit details
    Browse the repository at this point in the history

Commits on Dec 16, 2021

  1. Copy the full SHA
    ea2eba1 View commit details
    Browse the repository at this point in the history
  2. [Minor] Catch and ignore all the exceptions in quietDeleteMarkerDir (a…

    …pache#4301)
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Dec 16, 2021
    Copy the full SHA
    a8a192a View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    294d712 View commit details
    Browse the repository at this point in the history

Commits on Dec 17, 2021

  1. [HUDI-3043] Revert async cleaner leak commit to unblock CI failure (a…

    …pache#4343)
    
    * Revert "[HUDI-2959] Fix the thread leak of cleaning service (apache#4252)"
    Reverting to unblock CI failure for now. will revisit this with the right fix
    nsivabalan committed Dec 17, 2021
    Copy the full SHA
    7e7ad15 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    d0087d4 View commit details
    Browse the repository at this point in the history
  3. [HUDI-3046] Claim RFC number for RFC for Compaction / Clustering Serv…

    …ice (apache#4347)
    
    Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
    yuzhaojing and yuzhaojing committed Dec 17, 2021
    Copy the full SHA
    e4cfb42 View commit details
    Browse the repository at this point in the history
  4. [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, wh…

    …en using bulkinsert to insert data which contains decimalType (apache#4253)
    xiarixiaoyao committed Dec 17, 2021
    Copy the full SHA
    9246b16 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    6eba834 View commit details
    Browse the repository at this point in the history

Commits on Dec 18, 2021

  1. [HUDI-2962] InProcess lock provider to guard single writer process wi…

    …th async table operations (apache#4259)
    
     - Adding Local JVM process based lock provider implementation
    
     - This local lock provider can be used by a single writer process with async
       table operations to guard the metadata table against concurrent updates.
    manojpec committed Dec 18, 2021
    Copy the full SHA
    7784249 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    4785244 View commit details
    Browse the repository at this point in the history
  3. [HUDI-3029] Transaction manager: avoid deadlock when doing begin and …

    …end transactions (apache#4363)
    
    * [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
    
     - Transaction manager has begin and end transactions as synchronized methods.
       Depending on the lock provider implementation, this can lead to a deadlock
       when the underlying lock() calls are blocking or have a long timeout.
    
     - Fixing transaction manager begin and end transactions to not get to deadlock
       and to not assume anything on the lock provider implementation.
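
    The deadlock described above arises when a second writer waits inside a synchronized begin call while the current owner needs the same monitor to end its transaction. A minimal sketch of one way to avoid this, assuming nothing about the actual fix in this commit: the class below and its method names are hypothetical, and `threading.Lock` stands in for an arbitrary blocking lock provider. The blocking acquisition happens outside the short-lived mutex that guards the bookkeeping.

```python
import threading


class TransactionManager:
    """Toy manager: only bookkeeping is guarded by a mutex; the possibly
    blocking lock acquisition is done without holding that mutex."""

    def __init__(self, lock_provider):
        self._lock_provider = lock_provider    # stand-in for a blocking lock provider
        self._state_mutex = threading.Lock()   # guards _owner only
        self._owner = None

    def begin_transaction(self, owner):
        # Wait for the provider lock WITHOUT holding _state_mutex, so a waiting
        # writer cannot prevent the current owner from ending its transaction.
        self._lock_provider.acquire()
        with self._state_mutex:
            self._owner = owner

    def end_transaction(self, owner):
        with self._state_mutex:
            if self._owner != owner:
                return False                   # not the current owner, nothing to end
            self._owner = None
        self._lock_provider.release()
        return True


manager = TransactionManager(threading.Lock())
manager.begin_transaction("writer-A")

# A second writer blocks in begin_transaction until writer-A ends.
waiter = threading.Thread(target=manager.begin_transaction, args=("writer-B",))
waiter.start()

ended_a = manager.end_transaction("writer-A")  # does not deadlock
waiter.join(timeout=5)
```

    If both methods instead held one shared monitor for their whole duration, `writer-B` would hold it while waiting for the lock and `writer-A` could never enter `end_transaction` to release it.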
    manojpec committed Dec 18, 2021
    Copy the full SHA
    d1d48ed View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    733732b View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    dc40397 View commit details
    Browse the repository at this point in the history
  6. Copy the full SHA
    77abb5c View commit details
    Browse the repository at this point in the history

Commits on Dec 19, 2021

  1. Copy the full SHA
    f57e28f View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    bb99836 View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    478f9f3 View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    03f71ef View commit details
    Browse the repository at this point in the history
  5. [HUDI-3064][HUDI-3054] FileSystemBasedLockProviderTestClass tryLock f…

    …ix and TestHoodieClientMultiWriter test fixes (apache#4384)
    
     - Made FileSystemBasedLockProviderTestClass thread safe and fixed the
       tryLock retry logic.
    
     - Made TestHoodieClientMultiWriter.testHoodieClientBasicMultiWriter
       deterministic in verifying the HoodieWriteConflictException.
    manojpec committed Dec 19, 2021
    Copy the full SHA
    4a48f99 View commit details
    Browse the repository at this point in the history

Commits on Dec 20, 2021

  1. Copy the full SHA
    3ca9210 View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    f166dda View commit details
    Browse the repository at this point in the history

Commits on Dec 21, 2021

  1. Copy the full SHA
    982ae3d View commit details
    Browse the repository at this point in the history
  2. [HUDI-3070] Add rerunFailingTestsCount for flaky tests (apache#4398)

    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Dec 21, 2021
    Copy the full SHA
    f3f6112 View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    32a44bb View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    7d046f9 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    92f54ce View commit details
    Browse the repository at this point in the history

Commits on Dec 22, 2021

  1. Copy the full SHA
    f1286c2 View commit details
    Browse the repository at this point in the history
  2. [HUDI-2547] Schedule Flink compaction in service (apache#4254)

    Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
    yuzhaojing and yuzhaojing committed Dec 22, 2021
    Copy the full SHA
    15eb7e8 View commit details
    Browse the repository at this point in the history
  3. Merge pull request apache#4308 from harsh1231/HUDI-3008

    [HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
    xiarixiaoyao committed Dec 22, 2021
    Copy the full SHA
    b5890cd View commit details
    Browse the repository at this point in the history
  4. [HUDI-3011] Adding ability to read entire data with HoodieIncrSource …

    …with empty checkpoint (apache#4334)
    
    * Adding ability to read entire data with HoodieIncrSource with empty checkpoint
    
    * Addressing comments
    nsivabalan committed Dec 22, 2021
    Copy the full SHA
    1a5f869 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    5d93edc View commit details
    Browse the repository at this point in the history
  6. Copy the full SHA
    57f43de View commit details
    Browse the repository at this point in the history

Commits on Dec 23, 2021

  1. Copy the full SHA
    032b883 View commit details
    Browse the repository at this point in the history

Commits on Dec 24, 2021

  1. Copy the full SHA
    4721073 View commit details
    Browse the repository at this point in the history

Commits on Dec 25, 2021

  1. Copy the full SHA
    7b07aac View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    c81df99 View commit details
    Browse the repository at this point in the history

Commits on Dec 28, 2021

  1. Copy the full SHA
    282aa68 View commit details
    Browse the repository at this point in the history
  2. [HUDI-2374] Fixing AvroDFSSource does not use the overridden schema t…

    …o deserialize Avro binaries (apache#4353)
    harsh1231 committed Dec 28, 2021
    Copy the full SHA
    6409fc7 View commit details
    Browse the repository at this point in the history
  3. Copy the full SHA
    1f7afba View commit details
    Browse the repository at this point in the history
  4. Copy the full SHA
    32505d5 View commit details
    Browse the repository at this point in the history
  5. Copy the full SHA
    05942e0 View commit details
    Browse the repository at this point in the history
  6. Copy the full SHA
    3d7a869 View commit details
    Browse the repository at this point in the history
  7. Copy the full SHA
    9412281 View commit details
    Browse the repository at this point in the history

Commits on Dec 29, 2021

  1. Copy the full SHA
    a29b27c View commit details
    Browse the repository at this point in the history
  2. Copy the full SHA
    504747e View commit details
    Browse the repository at this point in the history

Commits on Dec 30, 2021

  1. Copy the full SHA
    5c0e4ce View commit details
    Browse the repository at this point in the history
  2. [HUDI-3083] Support component data types for flink bulk_insert (apache#4470)
    
    * [HUDI-3083] Support component data types for flink bulk_insert
    
    * add nested row type test
    lsyldliu committed Dec 30, 2021
    674c149
  3. 436becf
  4. [HUDI-3124] Bootstrap when timeline have completed instant (apache#4467)

    Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
    yuzhaojing and yuzhaojing committed Dec 30, 2021
    0f0088f
  5. [HUDI-1951] Add bucket hash index, compatible with the hive bucket (apache#3173)
    
    * [HUDI-2154] Add index key field to HoodieKey
    
    * [HUDI-2157] Add the bucket index and its read/write implementation of Spark engine.
    * revert HUDI-2154 add index key field to HoodieKey
    * fix all comments and introduce a new tricky way to get index key at runtime
    support double insert for bucket index
    * revert spark read optimizer based on bucket index
    * add the storage layout
    * index tag, hash function and add ut
    * fix ut
    * address partial comments
    * Code review feedback
    * add layout config and docs
    * fix ut
    * rename hoodie.layout and rebase master
    
    Co-authored-by: Vinoth Chandar <vinoth@apache.org>
    minihippo and vinothchandar committed Dec 30, 2021
    a4e622a

Commits on Dec 31, 2021

  1. [HUDI-3120] Cache compactionPlan in buffer (apache#4463)

    Co-authored-by: yuzhaojing <yuzhaojing@bytedance.com>
    yuzhaojing and yuzhaojing committed Dec 31, 2021
    e88b5fd
  2. [HUDI-3095] abstract partition filter logic to enable code reuse (apache#4454)
    
    * [HUDI-3095] abstract partition filter logic to enable code reuse
    
    * [HUDI-3095] address reviews
    YuweiXiao committed Dec 31, 2021
    2444f40
  3. [HUDI-3107]Fix HiveSyncTool drop partitions using JDBC or hivesql or hms (apache#4453)
    
    * constructDropPartitions when drop partitions using jdbc
    
    * done
    
    * done
    
    * code style
    
    * code review
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Dec 31, 2021
    ef9923f

Commits on Jan 1, 2022

  1. bfa169d

Commits on Jan 2, 2022

  1. 188d033
  2. 1622b52
  3. fe9406d

Commits on Jan 3, 2022

  1. [HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartitions (apache#4493)
    
    Co-authored-by: yuezhang <yuezhang@freewheel.tv>
    zhangyue19921010 and yuezhang committed Jan 3, 2022
    1e2d2c4
  2. [MINOR] Update README.md (apache#4492)

    Update Spark 3 build instructions
    xushiyan committed Jan 3, 2022
    0273f2e
  3. 2b2ae34

Commits on Jan 4, 2022

  1. 29ab6fb
  2. 7329d22
  3. aaf5727
  4. [HUDI-3141] Metadata merged log record reader - avoiding NullPointerException when records by keys (apache#4505)
    
    - HoodieMetadataMergedLogRecordReader#getRecordsByKeys() and its parent class methods
      are not thread safe. When multiple queries come in for getting log records
      by keys, they all operate on the same log record reader instance provided by
      HoodieBackedTableMetadata#openReadersIfNeeded() and they trip over each other
      as they clear/put/get the same class member records.
    
    - The fix is to streamline mutation of the class member records by making
      HoodieMetadataMergedLogRecordReader#getRecordsByKeys() a synchronized method,
      so that concurrent log record readers no longer run into an NPE.
    manojpec committed Jan 4, 2022
    bf4e3d6
  5. [HUDI-3147] Add endpoint_url to dynamodb lock provider (apache#4500)

    Co-authored-by: Nicolas Paris <nicolas.paris@adevinta.com>
    parisni and parisni committed Jan 4, 2022
    37b15ff
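
The HUDI-3141 commit message above describes a reader whose lookups all clear and refill one shared class-member buffer, and a fix that makes the lookup method synchronized. A minimal Java sketch of that pattern follows. The class `SharedBufferReader`, its fields, and the in-memory `log` map are hypothetical stand-ins for illustration and are not Hudi's actual code; only the method name `getRecordsByKeys` is borrowed from the commit message.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a log record reader that reuses one mutable
// buffer across lookups. This is an illustration, not Hudi's class.
class SharedBufferReader {
    // Backing data the reader scans; stands in for the log files.
    private final Map<String, String> log;
    // Class-member buffer that every lookup clears and refills.
    private final Map<String, String> records = new HashMap<>();

    SharedBufferReader(Map<String, String> log) {
        this.log = new HashMap<>(log);
    }

    // synchronized: only one thread at a time may clear, fill, and read
    // the shared buffer, so a lookup never observes another lookup's state.
    synchronized Map<String, String> getRecordsByKeys(List<String> keys) {
        records.clear();
        for (String key : keys) {
            String value = log.get(key);
            if (value != null) {
                records.put(key, value);
            }
        }
        // Copy while still holding the lock so the caller receives a
        // snapshot that later lookups cannot modify.
        return new HashMap<>(records);
    }
}
```

Without `synchronized`, one thread could call `records.clear()` while another is between its `put` and its read, which is the kind of interleaving the commit message says led to the NPE. Serializing the whole method trades lookup concurrency for correctness; whether the actual fix also copies the result before returning is not stated in the commit message.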

Commits on Jan 5, 2022

  1. [HUDI-2966] Closing LogRecordScanner in compactor (apache#4478)

    * Closing LogRecordScanner in compactor
    
    * Addressing comments
    nsivabalan committed Jan 5, 2022
    a66212d
  2. 0e297c0
  3. 8307160