Stream output rather than writing at end of process. #3

Open
steveworley opened this issue Jun 20, 2019 · 0 comments
Labels
optimisation Performance, memory or some other optimisation for the framework.

Comments

@steveworley
Contributor

Currently the output file is written only once, at the end of the process. This means the entire JSON representation has to be held in memory until the run finishes, which limits the number of URLs that can be processed in a single run.

Proposed solution

  • Stream the output to the output file after each request
  • Manage a SHA of each row processed so we don't add duplicates (this can be stored in memory)
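
A rough sketch of how the two points above could fit together. This is illustrative only and is written in Python rather than the framework's own language. The class name `StreamingRowWriter`, its methods and the sample rows are all hypothetical. It assumes the output is a single JSON array and that each row can be serialised on its own. Only the SHA-256 digests are kept in memory, not the rows themselves.

```python
import hashlib
import io
import json


class StreamingRowWriter:
    """Hypothetical helper: writes rows to a stream as a JSON array, skipping duplicates."""

    def __init__(self, stream):
        self._stream = stream
        self._seen = set()   # SHA-256 digests of rows already written
        self._count = 0      # number of rows written so far
        self._stream.write("[")

    def write_row(self, row):
        """Write one row. Returns True if written, False if it was a duplicate."""
        # sort_keys gives a canonical encoding, so equal rows hash identically
        # regardless of key order.
        encoded = json.dumps(row, sort_keys=True)
        digest = hashlib.sha256(encoded.encode("utf-8")).digest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        if self._count:
            self._stream.write(",")
        self._stream.write(encoded)
        self._stream.flush()  # push the row out now instead of at the end
        self._count += 1
        return True

    def close(self):
        """Terminate the JSON array so the output is valid JSON."""
        self._stream.write("]")
        self._stream.flush()


def demo():
    # An in-memory buffer stands in for the real output file.
    buffer = io.StringIO()
    writer = StreamingRowWriter(buffer)
    results = [
        writer.write_row(row)
        for row in (
            {"url": "/a", "title": "A"},
            {"title": "A", "url": "/a"},  # same row, different key order
            {"url": "/b", "title": "B"},
        )
    ]
    writer.close()
    return results, json.loads(buffer.getvalue())
```

With this shape, memory use grows with the number of distinct digests (32 bytes each, plus set overhead) rather than with the size of the full JSON document. One trade-off worth noting: if the process dies before `close()` is called, the file is left without its closing bracket, so a line-delimited format might be a safer choice.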
@steveworley steveworley added the optimisation Performance, memory or some other optimisation for the framework. label Jun 20, 2019
@stooit stooit added this to the 0.4.0 milestone Aug 6, 2019
steveworley pushed a commit that referenced this issue Feb 4, 2020
# This is the 1st commit message:

Rename the merlin namespace.

# This is the commit message #2:

Merlin framework composer project

# This is the commit message #3:

Code consistency.
@stooit stooit removed this from the 0.4.0 milestone Apr 27, 2020