Copy CSV and Storage TODOs for Phase 1 #504

Closed · 7 tasks done
semihsalihoglu-uw opened this issue Mar 8, 2022 · 0 comments
semihsalihoglu-uw commented Mar 8, 2022

Functionality

  • Support for the list data type.
  • Replace the robin_hood_node_id map with our hash_index. Two main things need to be done (a sketch follows this list):
    1. Hash_Index needs to support parallel updates.
    2. Replace the robin_hood_node_id map itself. After the index is constructed, it should be saved to disk and, in later parts of the loader, accessed through the BufferManager.
  • Allow copying a node/rel CSV file with or without a header.
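
Not the actual Hash_Index design; the following is a minimal sketch of one way parallel updates could be supported during loading, using per-shard locks. All names here (ShardedHashIndex, insert, numShards) are hypothetical placeholders for whatever the real API ends up being.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sharded index: each key hashes to one shard, and only that
// shard's mutex is held during an insert, so many loader threads can insert
// concurrently with little contention.
class ShardedHashIndex {
public:
    explicit ShardedHashIndex(size_t numShards = 64)
        : shards(numShards), locks(numShards) {}

    // Safe to call from multiple loader threads at once.
    void insert(const std::string& key, uint64_t nodeOffset) {
        const size_t shardIdx = std::hash<std::string>{}(key) % shards.size();
        std::lock_guard<std::mutex> guard(locks[shardIdx]);
        shards[shardIdx].emplace(key, nodeOffset);
    }

private:
    std::vector<std::unordered_map<std::string, uint64_t>> shards;
    std::vector<std::mutex> locks;
};
```

Once construction finishes, the index would be flushed to disk and subsequent loader phases would read it back through the BufferManager rather than keeping it pinned in memory.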

Usability

  • Progress Reporting Mechanism: This piece of code works nicely: https://stackoverflow.com/questions/14539867/how-to-display-a-progress-indicator-in-pure-c-c-cout-printf. We need to wrap it in an updateProgress function, compute progress percentages from the number of files and the number of lines in each file, and call updateProgress in different parts of the code (see the first sketch after this list). [Loader progress bar #507]
  • When large strings are inserted, give a warning instead of failing. => Just keep the 4096-character prefix for now.
  • If a relationship has a single source nodeLabel and a single destination nodeLabel, do not require relationship files to contain START_ID_LABEL and END_ID_LABEL fields.
  • Verify each CSV header as the first step, and if something is wrong, error early with a proper error message (see the second sketch after this list). => Be graceful with capitalization, but let's not accept arbitrary column names that don't match the schema.
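
First sketch: the progress indicator from the linked StackOverflow answer, wrapped in an updateProgress function as proposed above. The function signature and the way the fraction is computed from line counts are assumptions, not the final design.

```cpp
#include <iostream>

// Redraws a text progress bar in place. `progress` is a fraction in [0, 1].
void updateProgress(double progress) {
    constexpr int barWidth = 70;
    const int pos = static_cast<int>(barWidth * progress);
    std::cout << "[";
    for (int i = 0; i < barWidth; ++i) {
        if (i < pos) std::cout << "=";
        else if (i == pos) std::cout << ">";
        else std::cout << " ";
    }
    // '\r' moves the cursor back to the start of the line, so each call
    // overwrites the previous bar instead of printing a new line.
    std::cout << "] " << static_cast<int>(progress * 100.0) << " %\r";
    std::cout.flush();
}

int main() {
    // Example: progress = lines processed so far / total lines across all CSV files.
    const long totalLines = 1'000'000;
    for (long processed = 0; processed <= totalLines; processed += 50'000) {
        updateProgress(static_cast<double>(processed) / totalLines);
    }
    std::cout << std::endl;
    return 0;
}
```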
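
Second sketch: early CSV header verification that is lenient about capitalization but rejects column names that don't match the schema. The schema representation (a vector of expected column names) and the function name are assumptions for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Throws with a descriptive message before any rows are loaded.
void verifyCSVHeader(const std::vector<std::string>& headerColumns,
                     const std::vector<std::string>& schemaColumns) {
    if (headerColumns.size() != schemaColumns.size()) {
        throw std::invalid_argument("CSV header has " +
            std::to_string(headerColumns.size()) + " columns; schema expects " +
            std::to_string(schemaColumns.size()) + ".");
    }
    for (size_t i = 0; i < headerColumns.size(); ++i) {
        // Case-insensitive comparison: graceful with capitalization only.
        if (toLower(headerColumns[i]) != toLower(schemaColumns[i])) {
            throw std::invalid_argument("Column " + std::to_string(i) + ": '" +
                headerColumns[i] + "' does not match schema column '" +
                schemaColumns[i] + "'.");
        }
    }
}
```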