*: parse the data source directly into data and skip the KV encoder #145
What problem does this PR solve?
In our testing with a 4.1 TB workload, we found that parsing SQL takes almost half of the time needed to encode a row. Since we have already used a parser to extract each row, parsing it again wastes computing resources. Additionally, for CSV we currently perform a complex and unnecessary round trip: Parse CSV → Reconstruct SQL → Parse SQL.
What is changed and how it works?
We change the Lightning parsers to directly produce an array of
According to our experiments, this cuts the encoding time in half.
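To illustrate the difference between the two paths for CSV input, here is a minimal sketch in Python. It is not Lightning's actual code: every function name, the toy SQL tuple parser, and the type-conversion rules in `to_value` are assumptions made up for this example. It only shows why the SQL round trip is redundant once the CSV parser has already split the row into fields.

```python
import csv
import io


def parse_csv_rows(text):
    """Yield each CSV record as a list of raw field strings."""
    yield from csv.reader(io.StringIO(text))


def to_value(field):
    """Convert a raw field to a typed value.

    Hypothetical rule for this sketch: try int, then float, else keep the string.
    """
    for convert in (int, float):
        try:
            return convert(field)
        except ValueError:
            pass
    return field


def reconstruct_sql(row):
    """Old path, step 2: rebuild a SQL value tuple from the parsed fields."""
    quoted = ("'" + f.replace("'", "''") + "'" for f in row)
    return "(" + ",".join(quoted) + ")"


def parse_sql_tuple(sql):
    """Old path, step 3: toy parser for a tuple of single-quoted literals."""
    fields = []
    i = 1  # skip the opening "("
    while i < len(sql) and sql[i] != ")":
        if sql[i] == ",":
            i += 1
            continue
        i += 1  # skip the opening quote
        buf = []
        while True:
            if sql[i] == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    buf.append("'")  # doubled quote is an escaped quote
                    i += 2
                else:
                    i += 1  # closing quote
                    break
            else:
                buf.append(sql[i])
                i += 1
        fields.append("".join(buf))
    return fields


def rows_via_sql(text):
    """Old path: Parse CSV -> Reconstruct SQL -> Parse SQL -> values."""
    return [
        [to_value(f) for f in parse_sql_tuple(reconstruct_sql(row))]
        for row in parse_csv_rows(text)
    ]


def rows_direct(text):
    """New path: the CSV parser's output is converted to values directly."""
    return [[to_value(f) for f in row] for row in parse_csv_rows(text)]
```

Both functions return the same rows for the same input, but `rows_direct` skips the quoting and the second parse entirely, which is the work this PR removes.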
Mar 15, 2019
@lonng Some metrics are temporarily removed; we need to decide whether to tweak the metrics or the process. The old process:
The following metrics were involved in this change and may need to be repurposed: