add frequency metric to determine some average per-second metrics #760
Conversation
@erip thanks for the PR! I see the idea; maybe we can iterate on the implementation... And to make our CI happy I can add some tests :)
@vfdev-5 absolutely! I mostly wanted to put pen to paper quickly - happy to add some tests and see where that leads the implementation.
Thanks for the update. Just saw the docs on usage. How about doing it as here for GPU Info? In that one it is configured without using
@vfdev-5 I have fixed some simple flake8 issues and added a docstring with an envisioned usage. I suspect there are still some improvements to be made. Since I'd like to compute average throughput, I think it might be good to include a class that inherits from
Ah, you beat me to the comment. :-)
@vfdev-5 Ok, hopefully the distributed is-initialized checks are consistent with the rest of ignite.
@erip maybe we can add a single-CPU distributed test to ensure the correct behavior. For example, like here:
If you have any questions, please do not hesitate to ask. This can run on CPU with the command shown at Line 79 in 918746b.
Brilliant. That's the next thing I wanted to add. 😄
Yay, it looks like it works. 😄
Okay, let's wait until CI finishes its job and then go on with merging.
Awesome! For your awareness, I'm hoping to begin tackling facebookresearch/fairseq#1648, and this is the first step in that journey. You may see more of me as I run into features that ignite doesn't currently support but fairseq needs for parity.
LGTM! Thanks @erip
That would be great! So yes, feel free to send other PRs and we can work on them as well.
Just another point I would like to discuss before merging, seeing the context where it could potentially be used: maybe we can put this metric directly into the core part, as it does not require any additional packages? cc @justusschock, thoughts?
I'm happy to do that - I thought
@vfdev-5 looks like I've run into some flakiness in the tests, which is likely just a result of my misunderstanding. Is this a correct understanding? If so, is
@erip well, I think we did not implement that correctly. This is my fault, from when I suggested to
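Since the question is how the metric should combine per-worker counts, here is a rough sketch of the counting logic in plain Python, with the distributed all_reduce simulated as a sum over instances. The names `SimpleFrequency` and `reduced_rate` are hypothetical illustrations, not the PR's API:

```python
import time


class SimpleFrequency:
    """A minimal events-per-second counter (hypothetical, not the PR's API).

    In a real distributed setup the per-worker counts would be combined
    with an all_reduce; here that reduction is simulated by summing the
    counts of several independent instances.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        self._count = 0
        self._start = time.perf_counter()

    def update(self, n):
        self._count += n

    def compute(self, elapsed=None):
        # elapsed can be injected to make the computation deterministic
        if elapsed is None:
            elapsed = time.perf_counter() - self._start
        return self._count / elapsed


def reduced_rate(workers, elapsed):
    # simulated all_reduce(SUM) over worker counts, divided by wall time
    return sum(w._count for w in workers) / elapsed


# two "workers", each having processed 1000 tokens over 2 seconds of wall time
w0, w1 = SimpleFrequency(), SimpleFrequency()
w0.update(1000)
w1.update(1000)
print(reduced_rate([w0, w1], elapsed=2.0))  # 1000.0
```

The key point is that the counts are summed across workers before dividing by the elapsed wall time, rather than averaging per-worker rates.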
@erip to help you with your random search for a good testing setup, you can execute a script like that:

```python
import time

import torch.distributed as dist

from ignite.engine import Engine


class Frequency:
    # implementation under review in this PR
    ...


def _test_frequency_with_engine(device, workers):
    artificial_time = 2  # seconds
    batch_size = 4
    n_tokens = 10000
    total_tokens = n_tokens * batch_size
    time_per_epoch = total_tokens / artificial_time
    average_upper_bound = time_per_epoch * workers
    average_lower_bound = average_upper_bound * 0.9

    def update_fn(engine, batch):
        time.sleep(artificial_time)
        return {"ntokens": len(batch)}

    engine = Engine(update_fn)
    wps_metric = Frequency(output_transform=lambda x: x["ntokens"], device=device)
    wps_metric.attach(engine, 'wps')
    data = [list(range(n_tokens))] * batch_size
    wps = engine.run(data, max_epochs=1).metrics['wps']
    print("{} | {} | wps: {} | {}".format(dist.get_rank(), average_lower_bound, wps, average_upper_bound))


def test_frequency_with_engine_nondistributed():
    device = "cpu"
    _test_frequency_with_engine(device, workers=1)


if __name__ == "__main__":
    dist.init_process_group("gloo", init_method="env://")
    device = "cpu"
    _test_frequency_with_engine(device, workers=dist.get_world_size())
```
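For reference, scripts like the one above are typically launched with one process per worker. One common way to do that (assuming `torch.distributed.launch`; the script name here is a placeholder) is:

```shell
# launch 2 CPU processes; the gloo backend is chosen inside the script
# check_frequency.py is a placeholder name for the script above
python -u -m torch.distributed.launch --nproc_per_node=2 check_frequency.py
```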
Thanks! I tried the local testing prescribed by the Travis script, but ran into a weird issue. 😓
Actually, another thing I forgot to mention about distributed: the idea is to perform DDP, so we split the token data by process. If the world size is 4, each process sees 1/4 of the total data. And I think this is not coded in the tests...
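To make the split concrete, here is the usual rank-based partitioning in plain Python (a sketch of what a distributed sampler does, not ignite code; `partition` is a hypothetical helper):

```python
def partition(data, rank, world_size):
    # each rank takes every world_size-th item starting at its own rank,
    # so the shards are disjoint and together cover the data exactly once
    return data[rank::world_size]


data = list(range(8))
shards = [partition(data, rank, 4) for rank in range(4)]
print(shards)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With world size 4, each process therefore iterates over only a quarter of the batches, which is exactly the factor the tests were missing.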
There's the missing factor. 😅 |
Another thing on notation I did not get right: in distributed code, the batch size is generally scaled by the number of processes (world_size), so the effective batch size stays the same regardless of the configuration.
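As a quick arithmetic check of that convention (the later test in this thread uses `batch_size = 128 // workers`), the per-process batch shrinks as workers are added while the effective batch per step is unchanged:

```python
global_batch = 128  # the batch size you would use on a single process

for workers in (1, 2, 4):
    per_process_batch = global_batch // workers
    # each process handles per_process_batch samples per step, so the
    # effective batch per step across all processes stays the same
    print(workers, per_process_batch, per_process_batch * workers)
```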
Interestingly, I find that when the world size is 2, there seem to be two passes over the data? I added
My algebra is failing me for some reason today...
Ok, I think I really found it this time... whew
I believe my own bastardization of "batch_size" was causing a lot of confusion.
Yes, there are two processes that run the training. That's why we need DDP to set it up like that. So, when you print, you see the stdout of all processes; normally we guard that with an if (e.g. printing only from rank 0).
Ok, I think this is better now. Thanks for your patience and help!
@erip I'm playing with the tests and the code, and probably it is not the end :)

```python
import time

import torch.distributed as dist

from ignite.engine import Engine, Events

# Frequency is the metric implemented in this PR


def _test_frequency_with_engine(device, workers):
    artificial_time = 0.1  # seconds
    total_tokens = 2000
    batch_size = 128 // workers

    def update_fn(engine, batch):
        time.sleep(artificial_time)
        return {"ntokens": len(batch)}

    engine = Engine(update_fn)
    wps_metric = Frequency(output_transform=lambda x: x["ntokens"], device=device)
    wps_metric.attach(engine, 'wps')

    @engine.on(Events.ITERATION_COMPLETED)
    def assert_wps(e):
        wps = e.state.metrics['wps']
        if dist.get_rank() == 0:
            print("{}: wps={}".format(e.state.iteration, wps))

    data = [[i] * batch_size for i in range(0, total_tokens, batch_size)]
    engine.run(data, max_epochs=1)


if __name__ == "__main__":
    dist.init_process_group("gloo", init_method="env://")
    device = "cpu"
    _test_frequency_with_engine(device, workers=dist.get_world_size())
```

If executed as
the output is
It is OK, as it is about 128 samples per 0.1 seconds. If executed as
we have
Something still to fix with the distributed config. PS: I'm curious about what they do in fairseq for this?
Fairseq uses what they call a
@erip I found the problem; let me commit the fix and the updated test directly.
@erip I made these changes vs. your code:
In this code it will also be sensitive to IO, as the timer measures the whole time between iterations (read data -> batch prep -> update model): ignite/ignite/engine/engine.py, Line 408 in 918746b
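A small sketch of why that matters: if the rate is computed from total wall time between iterations, simulated "data loading" time lowers the reported throughput even though the update itself is just as fast. The `run` helper and its timings are purely illustrative:

```python
import time


def run(n_iters, tokens_per_batch, load_time, update_time):
    start = time.perf_counter()
    update_total = 0.0
    for _ in range(n_iters):
        time.sleep(load_time)        # simulated read data + batch prep
        t0 = time.perf_counter()
        time.sleep(update_time)      # simulated model update
        update_total += time.perf_counter() - t0
    wall = time.perf_counter() - start
    tokens = n_iters * tokens_per_batch
    # rate over total wall time vs. rate over update time only
    return tokens / wall, tokens / update_total


wall_wps, update_wps = run(5, 100, load_time=0.02, update_time=0.01)
print(wall_wps < update_wps)  # True: data loading drags the wall-clock rate down
```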
I suspect there's no way around it. :-)
Well, we need to set up the timer as here: ignite/ignite/handlers/timing.py, Line 90 in 918746b, on the correct events...
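The idea behind attaching the timer to the right events can be sketched with a minimal pause/resume timer (a pure-Python stand-in illustrating the pattern, not ignite's actual Timer implementation):

```python
import time


class PauseResumeTimer:
    """Accumulates time only between resume() and pause() calls."""

    def __init__(self):
        self._total = 0.0
        self._running_since = None

    def resume(self):
        # e.g. attached to an iteration-started event, after the batch is ready
        self._running_since = time.perf_counter()

    def pause(self):
        # e.g. attached to an iteration-completed event
        if self._running_since is not None:
            self._total += time.perf_counter() - self._running_since
            self._running_since = None

    def value(self):
        return self._total


timer = PauseResumeTimer()
time.sleep(0.05)   # "data loading": timer paused, not counted
timer.resume()
time.sleep(0.01)   # "model update": counted
timer.pause()
print(0 < timer.value() < 0.05)  # True: only the update interval was accumulated
```

Wired to the right engine events, such a timer would exclude data loading from the throughput measurement.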
@erip if you're ok with this implementation, we can merge it and, if needed, update the code later to exclude data processing.
I'm OK with the implementation as-is for now. I think there may be complications surrounding wiring the
Thanks for pointing that out. Actually, so in this way,
I will defer to you about whether to merge now or to wait for a more complete solution. For fairseq this is good enough. 👍
Fixes # N/A

Description:

This code computes X-per-second performance metrics (like words per second, images per second, etc.). Likely this will be used in conjunction with ignite.metrics.RunningAverage for most utility.

Check list: