
Add CSV data export directly from Postgres, and add integration test #187

Merged: 9 commits, Mar 20, 2024

@smarr (Owner) commented Mar 20, 2024

This PR adds support for exporting data directly from Postgres, which avoids Node.js memory limitations. The data is exported as a standard CSV file and compressed with gzip.
Because Postgres writes the data itself, it does not have to pass through Node.js and a JSON conversion first, which makes it possible to export very large data sets.
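For illustration, server-side CSV export in Postgres is usually done with a `COPY (query) TO 'file' WITH (FORMAT csv, HEADER true)` statement. The sketch below builds such a statement. The helper name `buildCsvCopyStatement` and the example query are hypothetical and not taken from this PR. The sketch covers only the CSV step. Compression could happen afterwards, or inside Postgres via `COPY ... TO PROGRAM 'gzip > ...'`; which of these the PR uses is not shown here. Server-side file output also requires superuser rights or the `pg_write_server_files` role.

```typescript
// Hypothetical helper, not the PR's code: builds a server-side COPY statement
// that makes Postgres itself write the query result as CSV with a header row.
function buildCsvCopyStatement(selectQuery: string, rdbFilePath: string): string {
  // Postgres requires an absolute output path for COPY ... TO 'file'.
  // Assumes POSIX-style paths, e.g. Postgres running in a Linux container.
  if (!rdbFilePath.startsWith('/')) {
    throw new Error(`COPY output path must be absolute: ${rdbFilePath}`);
  }
  // Double any single quotes so the path is a valid SQL string literal.
  const literal = `'${rdbFilePath.replace(/'/g, "''")}'`;
  return `COPY (${selectQuery}) TO ${literal} WITH (FORMAT csv, HEADER true)`;
}

// `results` is a made-up table name used only for this example.
console.log(buildCsvCopyStatement('SELECT * FROM results', '/export/results.csv'));
// COPY (SELECT * FROM results) TO '/export/results.csv' WITH (FORMAT csv, HEADER true)
```

The resulting string would then be run with whatever Postgres client the application already uses. The data then goes straight from the database to the file and never passes through the Node.js heap.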

To make this work in practice, extra configuration may be necessary, for instance when Postgres runs in a container. In general, all data files are assumed to end up in the same folder.

For this purpose, there are two new environment variables:

  • RDB_DATA_EXPORT_PATH: the path as seen from Postgres
  • NODE_DATA_EXPORT_PATH: the path as seen from Node.js; all files are assumed to be accessible to Node.js or to the web server that serves them to the client
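Here is a minimal sketch of how the two variables could be combined. It assumes both name the same directory, for example a volume mounted into the Postgres container at `/export` and visible to Node.js at `/srv/data` (both paths are made up). The helper `exportLocation` is hypothetical. Postgres would get `rdbPath` in its `COPY` statement, while Node.js checks for the file and serves it via `nodePath`.

```typescript
// Environment lookup, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

// Where one export file lives, seen from each side.
interface ExportLocation {
  rdbPath: string;  // path Postgres writes to (used in COPY ... TO)
  nodePath: string; // path Node.js uses to check for and serve the file
}

// Join with a single '/', assuming POSIX-style paths on both sides.
function joinDir(dir: string, fileName: string): string {
  return dir.endsWith('/') ? dir + fileName : `${dir}/${fileName}`;
}

// Hypothetical helper: maps one export file name into both directories.
function exportLocation(fileName: string, env: Env): ExportLocation {
  const rdbDir = env.RDB_DATA_EXPORT_PATH;
  const nodeDir = env.NODE_DATA_EXPORT_PATH;
  if (!rdbDir || !nodeDir) {
    throw new Error('RDB_DATA_EXPORT_PATH and NODE_DATA_EXPORT_PATH must both be set');
  }
  // Accept only a bare file name so the result stays inside the export folder.
  if (fileName === '' || fileName === '.' || fileName === '..' || /[\/\\]/.test(fileName)) {
    throw new Error(`expected a plain file name, got: ${fileName}`);
  }
  return { rdbPath: joinDir(rdbDir, fileName), nodePath: joinDir(nodeDir, fileName) };
}

console.log(exportLocation('results.csv.gz', {
  RDB_DATA_EXPORT_PATH: '/export',
  NODE_DATA_EXPORT_PATH: '/srv/data',
}));
// { rdbPath: '/export/results.csv.gz', nodePath: '/srv/data/results.csv.gz' }
```

In the application, this would be called with `process.env` instead of a literal object.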

This is now tested on GitHub with a basic end-to-end test.
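The sketch below shows the kind of assertion such an end-to-end test could make on an exported file that has been read into memory. The function name is hypothetical, and this is not the PR's actual test. It checks the gzip magic bytes (`0x1f 0x8b`, per RFC 1952), decompresses with Node's `zlib.gunzipSync`, and compares the first CSV line with the expected column names.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Hypothetical test helper: does `data` look like a gzip-compressed CSV file
// whose first line lists exactly the expected columns?
function isGzippedCsvWithHeader(data: Uint8Array, expectedColumns: string[]): boolean {
  // Every gzip member starts with the magic bytes 0x1f 0x8b (RFC 1952).
  if (data.length < 2 || data[0] !== 0x1f || data[1] !== 0x8b) {
    return false;
  }
  // Decompresses the whole file in memory; acceptable for small test fixtures.
  const text = gunzipSync(data).toString('utf8');
  const header = text.split(/\r?\n/, 1)[0];
  // Assumes column names contain no characters that CSV would need to quote.
  return header === expectedColumns.join(',');
}

// Usage with an in-memory file instead of a real export:
const sample = gzipSync('id,value\n1,2.5\n');
console.log(isGzippedCsvWithHeader(sample, ['id', 'value'])); // true
```

Decompressing everything at once is fine for a small test data set. It would defeat the purpose for the very large exports this PR targets, which would need a streaming check instead.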

There are a few other minor commits here:

  • improve log output
  • avoid test race conditions by using separate databases
  • always resolve the path to be absolute when getting a "robust" one
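Two of these changes can be sketched as follows. Both helpers are hypothetical illustrations, not the PR's code. `robustPath` uses Node's `path.resolve`, so a relative path always becomes an absolute one. `testDatabaseName` derives one database name per test suite, so suites running concurrently do not interfere.

```typescript
import { resolve } from 'node:path';

// Hypothetical helper: always returns an absolute path. Relative input is
// resolved against `base`, which defaults to the current working directory.
function robustPath(p: string, base: string = process.cwd()): string {
  return resolve(base, p);
}

// Hypothetical helper: one database per test suite, so suites that run
// concurrently never share (or clean up) each other's tables.
function testDatabaseName(suite: string): string {
  // Lower-case and keep only characters that are safe in an unquoted identifier.
  const safe = suite.toLowerCase().replace(/[^a-z0-9_]/g, '_');
  // Postgres truncates identifiers longer than 63 bytes by default.
  return `test_${safe}`.slice(0, 63);
}

console.log(robustPath('data/export', '/srv/app')); // /srv/app/data/export (on POSIX)
console.log(testDatabaseName('Rebench-Integration.test')); // test_rebench_integration_test
```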

Signed-off-by: Stefan Marr <git@stefan-marr.de>
The CSV is directly generated by Postgres, which avoids any memory issues in Node.js with very large data sets.

RDB_DATA_EXPORT_PATH is used by Postgres to store CSV files and should be accessible to Node.js via NODE_DATA_EXPORT_PATH.

NODE_DATA_EXPORT_PATH is used to check whether data files need to be generated, and Node.js stores JSON files directly in it.
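Here is a minimal sketch of that check. It assumes an export counts as up to date as soon as its file exists in `NODE_DATA_EXPORT_PATH`. The helpers `needsGeneration` and `writeJsonExport` and the file name `run-1.json` are hypothetical.

```typescript
import { existsSync, mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical rule: an export is up to date as soon as its file exists in
// NODE_DATA_EXPORT_PATH, so it only needs to be generated when missing.
function needsGeneration(nodeExportPath: string, fileName: string): boolean {
  return !existsSync(join(nodeExportPath, fileName));
}

// JSON exports are written by Node.js itself, directly into the same folder.
function writeJsonExport(nodeExportPath: string, fileName: string, data: unknown): string {
  const target = join(nodeExportPath, fileName);
  writeFileSync(target, JSON.stringify(data));
  return target;
}

// Usage with a throwaway directory standing in for NODE_DATA_EXPORT_PATH:
const dir = mkdtempSync(join(tmpdir(), 'export-'));
console.log(needsGeneration(dir, 'run-1.json')); // true
writeJsonExport(dir, 'run-1.json', { runs: 1 });
console.log(needsGeneration(dir, 'run-1.json')); // false
```

Checking only for existence keeps the check cheap. The trade-off is that a stale file would keep being served until it is removed.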

…ards

@smarr added the enhancement label (New feature or request) Mar 20, 2024
@smarr merged commit 0d0176d into master Mar 20, 2024
2 checks passed
@smarr deleted the integration-test branch March 20, 2024 18:01