Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Done is never True in data iter for seq len 1 #217

Closed
abiro opened this issue Aug 14, 2019 · 5 comments · Fixed by #233

Comments

@abiro
Copy link
Contributor

commented Aug 14, 2019

Hi,

I've found a bug in version 0.2.3 where the done flag is never set to True when requesting sequences of length 1. Code to reproduce:

import minerl

data = minerl.data.make('MineRLObtainDiamond-v0', data_dir='./data')

# Iterate through a single epoch gathering sequences of at most 32 steps
# max_sequence_len has to be 0 to return sequences of length 1
diter = data.sarsd_iter(num_epochs=1, max_sequence_len=0,
                        include_metadata=True, seed=1)
i = 1
stream_name = None
for current_state, action, reward, next_state, done, meta in diter:
    if stream_name is None:
        stream_name = meta['stream_name']
        print(meta)
    if done[-1] or stream_name != meta['stream_name']:
        print(i, done)
        break
    i += 1

Prints:

{'success': True, 'duration_ms': 95250, 'duration_steps': 1905, 'total_reward': 19.0, 'stream_name': 'v1_cute_breadfruit_spirit-1_35476-37415'}
1906 [False]

The problem is the code only stops where the stream name changes on the 1906th step, but it should stop on the 1905th step, since the first demonstration has that many steps according to its meta data.

I've addressed the max_sequence_len problem in PR#216.

Thanks,
Agost

@abiro

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

The cause of the problem seems to be that the done arrays are always 1 longer than the other ones.

@abiro

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

I've found an other issue which is probably related: when setting max_sequence_len=32 (the default value), only 1848 observations are returned instead of 1905 for the first demonstration. Code to reproduce:

import minerl

data = minerl.data.make('MineRLObtainDiamond-v0', data_dir='./data')

# Iterate through a single epoch gathering sequences of at most 32 steps
diter = data.sarsd_iter(num_epochs=1, max_sequence_len=32,
                        include_metadata=True, seed=1)
i = 0
stream_name = None
for current_state, action, reward, next_state, done, meta in diter:
    i += len(current_state['pov'])
    if stream_name is None:
        stream_name = meta['stream_name']
        print(meta)
    if done[-1]:
        print(i)
        break

Prints:

{'success': True, 'duration_ms': 95250, 'duration_steps': 1905, 'total_reward': 19.0, 'stream_name': 'v1_cute_breadfruit_spirit-1_35476-37415'}
1848
@MadcowD

This comment has been minimized.

Copy link
Collaborator

commented Aug 20, 2019

Hi, taking a look at this now! Thanks for all of the debug information :) I'll get back to you with a detailed explaination soon!

@MadcowD MadcowD self-assigned this Aug 20, 2019

@brandonhoughton

This comment has been minimized.

Copy link
Collaborator

commented Aug 26, 2019

It should be noted the claimed number of frames in the video observations is longer than the actual number of valid frames. Extra frames before are needed to prevent re-encoding key frames in the compressed videos

@MadcowD

This comment has been minimized.

Copy link
Collaborator

commented Sep 6, 2019

@abiro It looks like your first sample doesn't necessarily iterate to the end of the first stream when I try and reproduce (maybe my number of workers is higher). I will post a snippet which reproduces the error.

brandonhoughton added a commit that referenced this issue Sep 11, 2019
Fixing done trajectory lengths bug. (#233)
* Fixing the done lenth bug. Closes #222. Closes #217.

* Making a cursory fix to the data pipeline.

* Adding iteration for single file test.

* Finished the data ppeline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.