
Memory error while working with The Signal Media One-Million News Articles Dataset (approx. 2.7 GB) #3

Open
iabhi7 opened this issue Mar 21, 2017 · 2 comments

Comments


iabhi7 commented Mar 21, 2017

I tried creating the vocabulary embeddings with The Signal Media One-Million News Articles Dataset (which is approximately 2.7 GB in size), but it gave me an error on a g2.8xlarge instance. I'm not sure what I'm doing wrong here.
vocabulary-embedding.py runs as expected, but training the model raises a memory error.
I also tried distributing the model on the 4 GPUs that are available.

Is there any workaround, code snippet, or alternate dataset that could help me solve this problem?


jmsfcb commented Jun 14, 2017

In case it helps: I had issues too, so I replaced block 6 in vocabulary-embedding.ipynb with the following:

import json

fndata = 'data/signalmedia-1m.jsonl'
heads = []
desc = []
keywords = []
counter = 0
with open(fndata) as f:
    for line in f:
        if counter < 20000:
            jdata = json.loads(line)
            heads.append(jdata["title"].lower())
            desc.append(jdata["content"].lower())
            keywords.append(None)
            #counter += 1

Creating a separate pickle file was just causing me grief, so I read the data directly from the source. You can uncomment the counter increment in the last line if you only want to grab the first 20,000 articles.

@KevinDanikowski

@jmsfcb, you should also add an else: break so the loop doesn't have to run through all 1 million articles once the limit is reached.
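A minimal sketch of the loop with the counter increment enabled and the suggested else: break. It uses a small in-memory sample (with a limit of 2) standing in for data/signalmedia-1m.jsonl, since the real file is 2.7 GB; the field names "title" and "content" match the snippet above.

import io
import json

# Hypothetical sample standing in for data/signalmedia-1m.jsonl
sample = io.StringIO(
    '{"title": "First Article", "content": "Body one."}\n'
    '{"title": "Second Article", "content": "Body two."}\n'
    '{"title": "Third Article", "content": "Body three."}\n'
)

LIMIT = 2  # stands in for the 20,000-article cap in the snippet above
heads, desc, keywords = [], [], []
counter = 0
for line in sample:
    if counter < LIMIT:
        jdata = json.loads(line)
        heads.append(jdata["title"].lower())
        desc.append(jdata["content"].lower())
        keywords.append(None)
        counter += 1
    else:
        break  # stop scanning instead of reading the remaining lines

print(heads)  # ['first article', 'second article']

Without the break, the loop would still iterate over every remaining line even though the if-body never runs again, which is exactly the wasted time KevinDanikowski is pointing out.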

3 participants