
Numpy memory error #30

Closed

Deepakchawla opened this issue Aug 21, 2017 · 19 comments

@Deepakchawla

When I run the python scripts/retriever/interactive.py command, it shows me the error below.
root@ubuntu-2gb-nyc3-01:~/DrQA# python scripts/retriever/interactive.py
08/21/2017 08:13:28 AM: [ Initializing ranker... ]
08/21/2017 08:13:28 AM: [ Loading /root/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Traceback (most recent call last):
  File "scripts/retriever/interactive.py", line 27, in <module>
    ranker = retriever.get_class('tfidf')(tfidf_path=args.model)
  File "/root/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 37, in __init__
    matrix, metadata = utils.load_sparse_csr(tfidf_path)
  File "/root/DrQA/drqa/retriever/utils.py", line 34, in load_sparse_csr
    matrix = sp.csr_matrix((loader['data'], loader['indices'],
  File "/root/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 233, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/numpy/lib/format.py", line 664, in read_array
    array = numpy.ndarray(count, dtype=dtype)
MemoryError
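For context on the failing step: per the traceback, load_sparse_csr rebuilds a SciPy CSR matrix from the arrays stored in the .npz archive. Below is a minimal sketch of that same save/load pattern on a toy 2x3 matrix (the file name and matrix values are made up for illustration); the real Wikipedia tf-idf index has the same structure at multi-gigabyte scale, which is what exhausts RAM here.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# A tiny CSR matrix, round-tripped through .npz the way DrQA stores its index:
# the data/indices/indptr/shape arrays are saved and the matrix rebuilt on load.
matrix = sp.csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))
path = os.path.join(tempfile.gettempdir(), 'tiny-tfidf.npz')

np.savez(path, data=matrix.data, indices=matrix.indices,
         indptr=matrix.indptr, shape=matrix.shape)

loader = np.load(path)
# Each loader[key] access deserializes one full array into memory; with the
# real index, a MemoryError fires at this point when an array exceeds free RAM.
restored = sp.csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                         shape=loader['shape'])
print(restored.toarray())
```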

I am using it without a GPU, and below is my system information.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 2199.998
BogoMIPS: 4399.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-3
Can someone help me resolve this problem?

Thank You

@ajfisch
Contributor

ajfisch commented Aug 21, 2017

How much free RAM does your system have? Is it possible your download was interrupted and got corrupted?

@Deepakchawla
Author

Below is the output of free:
              total   used   free  shared  buff/cache  available
Mem:           7484     92   7176       9         215       7158
Swap:             0      0      0

@Deepakchawla
Author

I set /proc/sys/vm/overcommit_memory to 1 using echo 1 > /proc/sys/vm/overcommit_memory and ran interactive.py again; it shows me the message below:
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/21/2017 05:49:49 PM: [ Running on CPU only. ]
08/21/2017 05:49:49 PM: [ Initializing pipeline... ]
08/21/2017 05:49:49 PM: [ Initializing document ranker... ]
08/21/2017 05:49:49 PM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Killed

Now what should I do?

@ajfisch
Contributor

ajfisch commented Aug 21, 2017

From your free output, it looks like you do not have enough RAM on your machine. You need at least around 15 GB, and it looks like you have 8 (if the units you posted are MB).

@Deepakchawla
Author

Okay, I will upgrade from 8 GB to 15 GB. But when I changed the overcommit value from 0 to 1, it stopped showing any memory-related error and ran smoothly until it printed that Killed message. What is the reason behind the Killed message?

@ajfisch
Contributor

ajfisch commented Aug 22, 2017

Setting the value from 0 to 1 enables overcommit, always. In overcommit mode, the Linux kernel always lets a memory allocation like malloc succeed. But then when your program actually uses that memory, you run out of space, and the kernel OOM killer kills the process (hence your Killed message).

On the other hand, if overcommit is not enabled, the kernel will not let programs allocate more virtual memory than is physically available. malloc will fail and the actual program (in this case numpy) will exit with an error (MemoryError).
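For reference, the current policy can be inspected without root; a small read-only sketch (Linux-only, mode meanings per the kernel's overcommit accounting):

```python
# Inspect the kernel's overcommit policy. Read-only, so safe to run
# unprivileged; changing it, as in the thread above, needs root:
#   echo 1 > /proc/sys/vm/overcommit_memory
MODES = {0: 'heuristic (default)', 1: 'always overcommit', 2: 'never overcommit'}
with open('/proc/sys/vm/overcommit_memory') as f:
    mode = int(f.read().strip())
print('vm.overcommit_memory =', mode, '->', MODES[mode])
```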

@Deepakchawla
Author

Deepakchawla commented Aug 22, 2017

Okay, got your point. I have now changed my RAM size; here is free -m before running the Python file:
              total   used   free  shared  buff/cache  available
Mem:          22099    148  21876      10          74      21708
Swap:             0      0      0
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/22/2017 03:17:25 AM: [ Running on CPU only. ]
08/22/2017 03:17:25 AM: [ Initializing pipeline... ]
08/22/2017 03:17:25 AM: [ Initializing document ranker... ]
08/22/2017 03:17:25 AM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
08/22/2017 03:19:24 AM: [ Initializing document reader... ]
08/22/2017 03:19:24 AM: [ Loading model /home/deepakchawla35/DrQA/data/reader/multitask.mdl ]
08/22/2017 03:19:31 AM: [ Initializing tokenizers and document retrievers... ]
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "/home/deepakchawla35/DrQA/drqa/pipeline/drqa.py", line 140, in __init__
    initargs=(tok_class, tok_opts, db_class, db_opts, fixed_candidates)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

And while the Python file is running, free -m shows a different breakdown:
              total   used   free  shared  buff/cache  available
Mem:          22099    148  13961      10        7989      21628
Swap:             0      0      0

@ajfisch
Contributor

ajfisch commented Aug 22, 2017

Do you still have overcommit enabled? You might need that to run with the tokenizers, as it allocates (but doesn't use all) memory for the JVM for each tokenizer process.

You can also see if running with --tokenizer spacy works.
Edit: Try --tokenizer regexp first, as you'd need to pip install spacy && python -m spacy download en for the former

@Deepakchawla
Author

No, overcommit is currently disabled:
deepakchawla35@deepak-server:~/DrQA$ cat /proc/sys/vm/overcommit_memory
0

"You can also see if running with --tokenizer spacy works." => I don't get your point...

@ajfisch
Contributor

ajfisch commented Aug 22, 2017

  1. Try running with overcommit enabled (echo 1 > /proc/sys/vm/overcommit_memory)
  2. If that still errors, try running python scripts/pipeline/interactive.py --tokenizer regexp; it uses a less resource-intensive tokenizer (which is where your machine is failing).

@Deepakchawla
Author

okay, let me try...

@Deepakchawla
Author

Now it is working perfectly, thank you so much! But it gives me the wrong prediction for some questions:
>>> process('when facebook company ipo launched')
08/22/2017 03:49:42 AM: [ Processing 1 queries... ]
08/22/2017 03:49:42 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:49:43 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:49:51 AM: [ Processed 1 queries in 8.7226 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
| 1 | 2009 | Initial public offering of Facebook | 49060 | 248.07 |
+------+--------+-------------------------------------+--------------+-----------+

Contexts:
[ Doc = Initial public offering of Facebook ]
To ensure that early investors would retain control of the company, Facebook in 2009 instituted a dual-class stock structure. After the IPO, Zuckerberg was to retain a 22% ownership share in Facebook and was to own 57% of the voting shares. The document also stated that the company was seeking to raise $5 billion, which would make it one of the largest IPOs in tech history and the biggest in Internet history.

>>> process('when facebook company IPO launched')
08/22/2017 03:51:07 AM: [ Processing 1 queries... ]
08/22/2017 03:51:07 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:51:07 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:51:14 AM: [ Processed 1 queries in 6.7024 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
| 1 | 2012 | Initial public offering of Facebook | 4.8931e+05 | 248.07 |
+------+--------+-------------------------------------+--------------+-----------+

Contexts:
[ Doc = Initial public offering of Facebook ]
The social networking company Facebook held its initial public offering (IPO) on Friday, May 18, 2012. The IPO was the biggest in technology and one of the biggest in Internet history, with a peak market capitalization of over $104 billion. Media pundits called it a "cultural touchstone."

>>> process('who is father of deep learning')
08/22/2017 03:52:47 AM: [ Processing 1 queries... ]
08/22/2017 03:52:47 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:52:48 AM: [ Reading 479 paragraphs... ]
08/22/2017 03:52:55 AM: [ Processed 1 queries in 7.3674 (s) ]
Top Predictions:
+------+---------------------+---------------+--------------+-----------+
| Rank | Answer | Doc | Answer Score | Doc Score |
+------+---------------------+---------------+--------------+-----------+
| 1 | Juergen Schmidhuber | Deep learning | 3.7192e+08 | 453.99 |
+------+---------------------+---------------+--------------+-----------+

Contexts:
[ Doc = Deep learning ]
Deep learning algorithms transform their inputs through more layers than shallow learning algorithms. At each layer, the signal is transformed by a processing unit, like an artificial neuron, whose parameters are 'learned' through training. A chain of transformations from input to output is a "credit assignment path" (CAP). CAPs describe potentially causal connections between input and output and may vary in length – for a feedforward neural network, the depth of the CAPs (thus of the network) is the number of hidden layers plus one (as the output layer is also parameterized), but for recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP is potentially unlimited in length. There is no universally agreed upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that deep learning has multiple nonlinear layers (CAP > 2) and Juergen Schmidhuber considers CAP > 10 to be very deep learning.

@ajfisch
Contributor

ajfisch commented Aug 22, 2017

I am glad that it is working.

DrQA is just an AI research project -- of course there is no guarantee that it will answer all questions correctly (or in the case of this model be invariant to spelling, capitalization, or phrasing). In fact from our reported evaluations on several QA datasets, you can expect that DrQA will get most questions wrong (but also a fair amount correct). Hopefully this model can be a baseline for machine reading at scale that someone like you can beat 😉.

Then again, the answers to some of these questions are subjective. Perhaps Juergen wouldn't mind the answer to your question 3...

@Deepakchawla
Author

Okay. Are you working on improving its QA datasets to give more accurate answers? One more thing: currently it takes a lot of time to give answers, and I want it to take at most 3 seconds. What should I do to achieve this?

@ajfisch
Contributor

ajfisch commented Aug 22, 2017

Reading comprehension and open-domain QA is an active area of research, for FAIR and others.

To improve the runtime performance of DrQA you will need a machine with better specs. It also scales better with large batches (faster average time per question).

  • Ideally you will have a machine with a GPU and CUDNN. The higher quality the GPU, the better.
  • Having more CPU cores (especially if you are lacking a GPU) is also very helpful. The prediction pipeline runs on both CPU and GPU. >15 cores is good, more if not using a GPU.
  • Running in large batch sizes (say up to 1000 questions) is much more efficient than asking single questions. You can see how batching is done in scripts/pipeline/predict.py, for example.
  • As an immediate measure, you can reduce the number of documents DrQA reads per question (the n_docs parameter in process, default is 5). This will hurt your accuracy, however.
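The batching point can be illustrated with a toy model of per-call overhead. The process/process_batch functions below are hypothetical stand-ins with made-up costs, not DrQA's actual API: each call pays a fixed setup cost, so spreading it over many questions lowers the average time per question.

```python
import time

# Illustrative (made-up) costs: a fixed per-call overhead plus a small
# marginal cost per question.
FIXED_OVERHEAD = 0.01   # seconds per call
PER_QUESTION = 0.001    # seconds per question

def process(question):
    time.sleep(FIXED_OVERHEAD + PER_QUESTION)
    return 'answer'

def process_batch(questions):
    # One fixed overhead amortized across the whole batch.
    time.sleep(FIXED_OVERHEAD + PER_QUESTION * len(questions))
    return ['answer'] * len(questions)

questions = ['q%d' % i for i in range(20)]

start = time.time()
for q in questions:
    process(q)
single_total = time.time() - start

start = time.time()
process_batch(questions)
batch_total = time.time() - start

print('avg per question: %.4fs one-by-one vs %.4fs batched'
      % (single_total / len(questions), batch_total / len(questions)))
```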

@Deepakchawla
Author

Okay, so I will try with a GPU and try to reduce the execution time. Thanks a lot once again; you helped a lot and contributed to the accomplishment of my passion project... 😄

@ajfisch
Contributor

ajfisch commented Aug 22, 2017

You are very welcome!

@Deepakchawla
Author

Deepakchawla commented Aug 22, 2017

😊

@Deepakchawla Deepakchawla changed the title Numpu memory error Numpy memory error Aug 22, 2017
@ajfisch ajfisch closed this as completed Aug 28, 2017
@augmen

augmen commented Apr 17, 2018

Hi, I am having the same issue with 8 GB RAM and 4 CPU cores. Can you help us?
(pt) root@ml:~/DrQA# python3 scripts/pipeline/interactive.py --tokenizer regexp
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 16, in <module>
    from drqa import pipeline
ImportError: No module named 'drqa'
