This model cannot handle extremely large dataset #43

Open
RayXu14 opened this issue Aug 6, 2018 · 2 comments


RayXu14 commented Aug 6, 2018

Just to point out that using
tf.convert_to_tensor -> tf.train.slice_input_producer -> tf.train.shuffle_batch
raises

ValueError: Cannot create a tensor proto whose content is larger than 2GB.

if the dataset is too large.
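For reference, a minimal sketch of the failing pattern, assuming the TF 1.x queue-runner API; the file name and array contents are hypothetical placeholders:

```python
import numpy as np
import tensorflow as tf

features = np.load("train_features.npy")  # hypothetical multi-GB NumPy array

# convert_to_tensor turns the array into a tf.constant, i.e. a TensorProto
# inside the GraphDef; protobuf caps that at 2 GB, so this line raises:
#   ValueError: Cannot create a tensor proto whose content is larger than 2GB.
data = tf.convert_to_tensor(features)

single_example = tf.train.slice_input_producer([data], shuffle=True)
batch = tf.train.shuffle_batch(single_example, batch_size=128,
                               capacity=10000, min_after_dequeue=5000)
```

The error is raised as soon as convert_to_tensor tries to serialize the array into the GraphDef, whose protobuf encoding is limited to 2 GB.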

RayXu14 changed the title from "This model cannot handle large dataset" to "This model cannot handle extremely large dataset" on Aug 6, 2018

4pal commented Sep 6, 2018

It means that you have very long sentences in your dataset, which consume a lot of memory during batching. You need to summarize your dataset line by line using an extractive summarizer like TextRank, which is unsupervised and doesn't require training (see the sketch below). Then try the summarized dataset; it can even be 10 GB and you won't have any problem.
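A sketch of that line-by-line summarization step, assuming the sumy package as one possible TextRank implementation; the file paths are hypothetical placeholders, and nltk's punkt tokenizer data must be downloaded first:

```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer

summarizer = TextRankSummarizer()

# Hypothetical input/output paths; each long line of the corpus is reduced
# to its single most central sentence.
with open("train.txt") as fin, open("train.summarized.txt", "w") as fout:
    for line in fin:
        parser = PlaintextParser.from_string(line, Tokenizer("english"))
        summary = summarizer(parser.document, 1)  # keep 1 sentence per line
        fout.write(" ".join(str(s) for s in summary) + "\n")
```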


xlniu commented Feb 20, 2019

You can use feed_dict or tf.data; see the sketch below.
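For example, a minimal sketch that combines both suggestions, assuming TF 1.x: the array is fed through a placeholder when the iterator is initialized, so it is never serialized into the graph as a constant (the file name is a hypothetical placeholder):

```python
import numpy as np
import tensorflow as tf

features = np.load("train_features.npy")  # hypothetical multi-GB NumPy array

# Feeding the data via a placeholder at iterator-initialization time keeps it
# out of the GraphDef, so the 2 GB proto limit never applies.
features_ph = tf.placeholder(features.dtype, features.shape)
dataset = (tf.data.Dataset.from_tensor_slices(features_ph)
           .shuffle(buffer_size=10000)
           .batch(128)
           .repeat())

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()  # use this tensor as the model input

with tf.Session() as sess:
    sess.run(iterator.initializer, feed_dict={features_ph: features})
    first_batch = sess.run(next_batch)
```

Alternatively, writing the data to TFRecord files and reading them with tf.data.TFRecordDataset avoids holding the whole array in host memory at all.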
