Run the pipeline synchronously? #49

Closed
OmarJay1 opened this issue May 23, 2020 · 5 comments

Comments

@OmarJay1

Hi, I've been stepping through the data acquisition code, and it's hard to do when multiple processes are running. Is there a way to make the processes run sequentially instead of asynchronously? I can comment stuff out and add breakpoints, but a command line switch or something similar would be cleaner.

It looks like the --only argument might do something like that, but I haven't been able to get it to work.

Thanks.

@owahltinez
Contributor

owahltinez commented May 23, 2020

Some pipelines simply can't do processing synchronously (e.g. weather would take over a day to process) but you can now run the pipelines within a chain one at a time by passing the --process-count 1 option to the run.py script. Again, keep in mind that each pipeline might still be running multiple threads / processes.
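
For anyone wondering what a process-count switch amounts to, here is a minimal sketch. It is not the project's actual code: `run_chain` is a hypothetical name, and a thread pool stands in for whatever pool the real script uses. With a count of 1 the pipelines run one after another in the calling thread, which is what makes stepping through with a debugger practical.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List


def run_chain(pipelines: List[Callable[[], str]], process_count: int = 4) -> List[str]:
    """Run every pipeline in the chain and return the results in input order.

    Hypothetical helper for illustration only; not taken from the project.
    """
    if process_count <= 1:
        # Sequential path: one pipeline at a time in the calling thread,
        # so breakpoints are hit in a predictable order.
        return [pipeline() for pipeline in pipelines]

    # Concurrent path: a thread pool is used here as a stand-in (assumption).
    with ThreadPool(process_count) as pool:
        return pool.map(lambda pipeline: pipeline(), pipelines)
```

Note that this only serializes the chain. As mentioned above, an individual pipeline may still start its own threads or processes.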

--only and --exclude select which pipelines within the chain are run. If you run a single pipeline, then multiprocessing is disabled automatically. To use them, pass the names of the output tables separated by commas: run.py --only epidemiology,demographics or run.py --exclude weather
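
As a rough illustration of how comma-separated `--only` / `--exclude` values could be turned into a list of pipelines to run, here is a sketch under stated assumptions. The flag names come from this thread; `select_pipelines`, `ALL_PIPELINES`, and the filtering logic are hypothetical and may differ from what the script really does.

```python
import argparse
from typing import List, Optional

# Hypothetical list of output tables; the real project may define more.
ALL_PIPELINES = ["epidemiology", "demographics", "weather"]


def select_pipelines(argv: Optional[List[str]] = None) -> List[str]:
    """Return the pipeline names left after applying --only and --exclude."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--only", type=str, default=None)
    parser.add_argument("--exclude", type=str, default=None)
    args = parser.parse_args(argv)

    names = list(ALL_PIPELINES)
    if args.only:
        # Keep only the tables that were named, preserving chain order.
        wanted = args.only.split(",")
        names = [name for name in names if name in wanted]
    if args.exclude:
        # Drop every table that was named.
        unwanted = args.exclude.split(",")
        names = [name for name in names if name not in unwanted]
    return names
```

For example, `select_pipelines(["--exclude", "weather"])` would leave the two other tables in this sketch.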

@OmarJay1
Author

OmarJay1 commented May 24, 2020

Thanks. I'm assuming that run.py is now called update.py?

Also, I noticed that the previous version I'm using captures more epidemiology data than the most recent one. I can say more once I better understand how the code works.

Thank you.

@owahltinez
Contributor

> Thanks. I'm assuming that run.py is now called update.py?

Yes, sorry for the change -- the project is currently under active development but I don't expect that update.py will change again soon.

> Also, I noticed that a previous version I'm using captures more epidemiology data than the most recent.

Can you please share a few example data points (key + date)?

@owahltinez
Contributor

@OmarJay1 did you get a chance to collect a few examples of data points that are missing compared to the previous version?

@OmarJay1
Author

I didn't get a chance to look closely, but the overall size was back to what it was before. Thanks.
