ConnectionError while parse_corpus #454

Closed
yejunbin opened this issue Oct 3, 2016 · 34 comments

@yejunbin

yejunbin commented Oct 3, 2016

from snorkel.parser import CorpusParser
cp = CorpusParser(doc_parser, sent_parser)
%time corpus = cp.parse_corpus(session, 'News Training')
---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-5-277d2c9f9bed> in <module>()
      2 
      3 cp = CorpusParser(doc_parser, sent_parser)
----> 4 get_ipython().magic(u"time corpus = cp.parse_corpus(session, 'News Training')")

/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159 
   2160     #-------------------------------------------------------------------------

/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2077                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078             with self.builtin_trap:
-> 2079                 result = fn(*args,**kwargs)
   2080             return result
   2081 

<decorator-gen-59> in time(self, line, cell, local_ns)

/usr/local/lib/python2.7/dist-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):

/usr/local/lib/python2.7/dist-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
   1178         else:
   1179             st = clock2()
-> 1180             exec(code, glob, local_ns)
   1181             end = clock2()
   1182             out = None

<timed exec> in <module>()

/home/yejunbin/Github/snorkel/snorkel/parser.pyc in parse_corpus(self, session, name)
     38                     break
     39             corpus.append(doc)
---> 40             for _ in self.sent_parser.parse(doc, text):
     41                 pass
     42         if self.max_docs is not None:

/home/yejunbin/Github/snorkel/snorkel/parser.pyc in parse(self, doc, text)
    274     def parse(self, doc, text):
    275         """Parse a raw document as a string into a list of sentences"""
--> 276         for parts in self.corenlp_handler.parse(doc, text):
    277             yield Sentence(**parts)

/home/yejunbin/Github/snorkel/snorkel/parser.pyc in parse(self, document, text)
    211         if isinstance(text, unicode):
    212             text = text.encode('utf-8', 'error')
--> 213         resp = self.requests_session.post(self.endpoint, data=text, allow_redirects=True)
    214         text = text.decode('utf-8')
    215         content = resp.content.strip()

/usr/local/lib/python2.7/dist-packages/requests/sessions.pyc in post(self, url, data, json, **kwargs)
    520         """
    521 
--> 522         return self.request('POST', url, data=data, json=json, **kwargs)
    523 
    524     def put(self, url, data=None, **kwargs):

/usr/local/lib/python2.7/dist-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    473         }
    474         send_kwargs.update(settings)
--> 475         resp = self.send(prep, **send_kwargs)
    476 
    477         return resp

/usr/local/lib/python2.7/dist-packages/requests/sessions.pyc in send(self, request, **kwargs)
    594 
    595         # Send the request
--> 596         r = adapter.send(request, **kwargs)
    597 
    598         # Total elapsed time of the request (approximately)

/usr/local/lib/python2.7/dist-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    485                 raise ProxyError(e, request=request)
    486 
--> 487             raise ConnectionError(e, request=request)
    488 
    489         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=12345): Max retries exceeded with url: /?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,lemma,depparse,ner%22,%20%22outputFormat%22:%20%22json%22%7D (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))
@ajratner
Contributor

ajratner commented Oct 4, 2016

Did you use the run.sh script / download Stanford CoreNLP?

@yejunbin
Author

yejunbin commented Oct 4, 2016

Yes, Stanford CoreNLP has been downloaded. Here are the files in the parser folder:
patterns
sutime
tokensregex
build.xml
CoreNLP-to-HTML.xsl
corenlp.sh
ejml-0.23-src.zip
ejml-0.23.jar
input.txt
input.txt.out
input.txt.xml
javax.json-api-1.0-sources.jar
javax.json.jar
joda-time-2.9-sources.jar
joda-time.jar
jollyday-0.4.7-sources.jar
jollyday.jar
LIBRARY-LICENSES
LICENSE.txt
Makefile
pom.xml
protobuf.jar
README.txt
SemgrexDemo.java
ShiftReduceDemo.java
slf4j-api.jar
slf4j-simple.jar
stanford-corenlp-3.6.0-javadoc.jar
stanford-corenlp-3.6.0-models.jar
stanford-corenlp-3.6.0-sources.jar
stanford-corenlp-3.6.0.jar
StanfordCoreNlpDemo.java
StanfordDependenciesManual.pdf
xom-1.2.10-src.jar
xom.jar

@ajratner
Contributor

ajratner commented Oct 4, 2016

Was the parser running at all before this error? I.e., I see Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)), so I'm wondering whether there was an error during parsing, or whether the parser was never set up correctly.

@yejunbin
Author

yejunbin commented Oct 4, 2016

How can I make sure the parser is running?

@ajratner
Contributor

ajratner commented Oct 4, 2016

You should see it printing out in the terminal where you ran the notebook?

@yejunbin
Author

yejunbin commented Oct 4, 2016

Hi ajratner, the problem was solved after I ran "chmod" on the files in the parser folder. Thanks.
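
For reference, a minimal sketch of what that fix amounts to (a hypothetical example: the snorkel/parser location and the u+rx permission bits are assumptions; adjust to your layout):

# Sketch of the "chmod" fix described above: make everything under the
# parser directory readable and executable for the current user.
import os
import stat

loc = 'snorkel/parser'  # assumed CoreNLP install location
for root, dirs, files in os.walk(loc):
    for name in dirs + files:
        path = os.path.join(root, name)
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IRUSR | stat.S_IXUSR)  # chmod u+rx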

@yejunbin
Author

yejunbin commented Oct 4, 2016

Hi ajratner, there is a memory error when running the parser. Here is the error message:

Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000000073655b000, 369131520, 0) failed; error='Cannot allocate memory' (errno=12)

How much memory does the parser need to run the tutorials? Thanks.

@zzdang

zzdang commented Nov 27, 2016

Hi yejunbin, I am hitting the same error: "ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=12345)..."
You said the problem was solved after you ran "chmod" on files in the parser folder. Which files did you chmod?
Thanks.

@ajratner
Contributor

Closed in v0.5.0

@pinkal08cece

Hi ajratner, I am facing the same error "ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=12345)..." in v0.5.0.
Thanks.

@ajratner
Contributor

Hi @pinkal08cece any further details? Did you check the things noted above? Sorry for the delayed response!

@ajratner ajratner reopened this Feb 22, 2017
@pinkal08cece

Thank you. The error is resolved.

@hlchen123

@pinkal08cece I am hitting the same problem. How did you solve it? Thank you!

@ajratner ajratner reopened this Apr 21, 2017
@ajratner
Contributor

Hi @pinkal08cece, @yejunbin,

If you get a chance, could you post what helped you resolve your issues in enough detail to reproduce? We'd greatly appreciate it! Then, if relevant, I will also add it to the README.

Thanks!
Alex

@hlchen123

hlchen123 commented Apr 24, 2017

@ajratner Thank you! It is OK now! At first I did as yejunbin said and ran "chmod" on all the files in the parser folder, but the error persisted; when I switched to another computer, it was solved.

@furmanv

furmanv commented May 8, 2017

Hi folks! Just wanted to report that I am also stuck with the parser (it seems I cannot reach it due to a connection error). I have already tried chmod on the parser folder via:

import subprocess
...
subprocess.call(['chmod', '-R', '+w', loc])

However, I receive the error below after about 20 minutes of running time.

P.S. I am trying to run the tutorials on Windows, in a Jupyter Notebook environment.

---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-...> in <module>()
      2 
      3 corpus_parser = CorpusParser()
----> 4 get_ipython().magic(u'time corpus_parser.apply(doc_preprocessor)')

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\IPython\core\interactiveshell.pyc in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159 
   2160     #-------------------------------------------------------------------------

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\IPython\core\interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2077                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078             with self.builtin_trap:
-> 2079                 result = fn(*args,**kwargs)
   2080             return result
   2081 

<decorator-gen-...> in time(self, line, cell, local_ns)

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\IPython\core\magics\execution.pyc in time(self, line, cell, local_ns)
   1174         if mode=='eval':
   1175             st = clock2()
-> 1176             out = eval(code, glob, local_ns)
   1177             end = clock2()
   1178         else:

<timed eval> in <module>()

C:\Users\tigran\Desktop\snorkel-master\snorkel\udf.pyc in apply(self, xs, clear, parallelism, progress_bar, count, **kwargs)
     38         print "Running UDF..."
     39         if parallelism is None or parallelism < 2:
---> 40             self.apply_st(xs, progress_bar, clear=clear, count=count, **kwargs)
     41         else:
     42             self.apply_mt(xs, parallelism, clear=clear, **kwargs)

C:\Users\tigran\Desktop\snorkel-master\snorkel\udf.pyc in apply_st(self, xs, progress_bar, count, **kwargs)
     61 
     62             # Apply UDF and add results to the session
---> 63             for y in udf.apply(x, **kwargs):
     64 
     65                 # Uf UDF has a reduce step, this will take care of the insert; else add to session

C:\Users\tigran\Desktop\snorkel-master\snorkel\parser.py in apply(self, x, **kwargs)
     47         """Given a Document object and its raw text, parse into processed Sentences"""
     48         doc, text = x
---> 49         for parts in self.corenlp_handler.parse(doc, text):
     50             parts = self.fn(parts) if self.fn is not None else parts
     51             yield Sentence(**parts)

C:\Users\tigran\Desktop\snorkel-master\snorkel\parser.py in parse(self, document, text)
    310         if isinstance(text, unicode):
    311             text = text.encode('utf-8', 'error')
--> 312         resp = self.requests_session.post(self.endpoint, data=text, allow_redirects=True)
    313         text = text.decode('utf-8')
    314         content = resp.content.strip()

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\requests\sessions.pyc in post(self, url, data, json, **kwargs)
    533         """
    534 
--> 535         return self.request('POST', url, data=data, json=json, **kwargs)
    536 
    537     def put(self, url, data=None, **kwargs):

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\requests\sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    486         }
    487         send_kwargs.update(settings)
--> 488         resp = self.send(prep, **send_kwargs)
    489 
    490         return resp

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\requests\sessions.pyc in send(self, request, **kwargs)
    607 
    608         # Send the request
--> 609         r = adapter.send(request, **kwargs)
    610 
    611         # Total elapsed time of the request (approximately)

C:\Users\tigran\Anaconda3\envs\py2env\lib\site-packages\requests\adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    485                 raise ProxyError(e, request=request)
    486 
--> 487             raise ConnectionError(e, request=request)
    488 
    489         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=12345): Max retries exceeded with url: /?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,lemma,depparse,ner%22,%20%22outputFormat%22:%20%22json%22%7D (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x00000000094F32B0>: Failed to establish a new connection: [Errno 10061] No connection could be made because the target machine actively refused it',))

Thanks in advance !

Kind Regards,
Tigran

@ajratner
Contributor

ajratner commented Jun 1, 2017

Is this resolved with the latest parser? @jason-fries ?

@neda-abolhassani

@ajratner
Hi Alex,
I have the same issue while running the first tutorial. It happens in the corpus-parsing section and I get the following error in the notebook:

/home/ubuntu/anaconda3/envs/py2Env/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    486 
    487         except (ProtocolError, socket.error) as err:
--> 488             raise ConnectionError(err, request=request)
    489 
    490         except MaxRetryError as e:

ConnectionError: ('Connection aborted.', BadStatusLine("''",))

In addition, I get an error printed in the command line at the same time:

[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000072b81e000, 247824384, 0) failed; error='Cannot allocate memory' (errno=12)

I have tried chmod for all the files in the parser folder. However, it keeps showing me the same error. Would you please help me with it?
Thanks,
Neda

@ajratner
Contributor

Hi @neda-abolhassani ,

This looks like a memory issue? Do you have enough memory to run CoreNLP on the machine you're using? (@jason-fries any thoughts / ever seen this before?)

@jason-fries
Contributor

Hi @neda-abolhassani,
I have seen this error before on a low-memory virtual machine instance. An easy fix is to instantiate the StanfordCoreNLPServer object with lower default memory requirements. The default Java -Xmx is 4GB, so I would try 2GB (or 1GB) using the following code snippet:

# import first if needed; the path may vary by version, e.g.:
# from snorkel.parser.corenlp import StanfordCoreNLPServer
corenlp_server = StanfordCoreNLPServer(verbose=True, java_xmx="2g", num_threads=1)
corpus_parser = CorpusParser(corenlp_server)
corpus_parser.apply(doc_preprocessor)

@neda-abolhassani

Hi @jason-fries
I have tried it and modified corenlp.py:

def __init__(self, annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'depparse', 'ner'],
             annotator_opts={}, tokenize_whitespace=False, split_newline=False,
             java_xmx='1g', port=12345, num_threads=1, verbose=True, version='3.6.0'):

However, I still get the same error when I am on the py2Env kernel.

I have also tried changing the kernel to python2. I got a bunch of dependency errors even though I had installed the libraries in python-package-requirement.txt. After updating and installing all the required dependencies, I got the following error in the command window while running the corpus-parsing section:

port=12345
threads=1
timeout=600000
Starting server on port 12345 with timeout of 600000 milliseconds.
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
    at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
    at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
    at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.run(StanfordCoreNLPServer.java:692)
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.main(StanfordCoreNLPServer.java:756)

The Jupyter Notebook was also complaining about _htmlparser.

@jason-fries
Contributor

Hi @neda-abolhassani,

Your second error suggests that a CoreNLP instance is already running. I would make certain you've terminated all of your java processes and try the second approach again.
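
For example, a sketch of one way to check for a lingering server (this assumes a Unix-like system with pgrep available; the process-name pattern is an assumption):

import subprocess

try:
    # pgrep -f matches the full command line, so this finds a CoreNLP
    # server java process if one is still alive
    pids = subprocess.check_output(['pgrep', '-f', 'StanfordCoreNLPServer'])
    print('CoreNLP server still running, PID(s): %s' % pids.strip())
except subprocess.CalledProcessError:
    print('No CoreNLP server process found; port 12345 should be free.')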

@neda-abolhassani

@jason-fries The problem is that the error shown in the command window is different from the error shown in the Notebook. I have terminated all the processes, but I still get the same error in the command window, and the Notebook says:

ImportError: No module named packages.urllib3.util.retry

I have double-checked, and the folder exists in the directory:
/usr/lib/python2.7/dist-packages/requests/

@jason-fries
Contributor

Hi @neda-abolhassani,

Your Jupyter notebook kernel might not match the environment where you installed your dependencies -- I would double-check that first. I've never seen the missing urllib3 error before -- that suggests to me that something is off in your environment settings.
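
For example, a generic sketch of that check, run from inside the notebook (the final import is exactly the one the error message complains about):

import sys
import requests

# The interpreter the kernel is actually using; it should point into the
# environment where the snorkel dependencies were installed (e.g. py2Env)
print(sys.executable)
print(requests.__version__)

# The import the notebook error complains about
from requests.packages.urllib3.util.retry import Retry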

You might also want to check and see that you can manually launch CoreNLP from the command line (see CoreNLP docs on how to do this).
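
For example, a sketch along these lines, launched from the directory containing the CoreNLP jars (the heap size, port, and timeout simply mirror the values seen earlier in this thread):

import subprocess

# Start the server by hand to verify CoreNLP itself works; the JVM expands
# the classpath wildcard '*' to all jars in the current directory
server = subprocess.Popen([
    'java', '-mx2g', '-cp', '*',
    'edu.stanford.nlp.pipeline.StanfordCoreNLPServer',
    '-port', '12345', '-timeout', '600000',
])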

@neda-abolhassani

Hi @jason-fries
I fixed the problem with urllib3; as you mentioned, it was an environment issue. CoreNLP also runs perfectly. However, I am still getting the memory allocation error :(

@neda-abolhassani

@jason-fries I have even changed the heap size to 512m:

def __init__(self, annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'depparse', 'ner'],
             annotator_opts={}, tokenize_whitespace=False, split_newline=False,
             java_xmx='512m', port=12345, num_threads=1, verbose=True, version='3.6.0'):

@varun-tandon

varun-tandon commented Jun 19, 2017

Hi! I am experiencing the same error; here is my output.
It generally takes about 20 minutes and then produces this output.

Running UDF...
[=                                       ] 0%
---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-2-e096403833eb> in <module>()
     22 corpus_parser = CorpusParser()
     23 
---> 24 corpus_parser.apply(list(doc_preprocessor)) #parallelism can be run with a Postgres DBMS, but not SQLite
     25 
     26 # Let's now analayze the counts of documents and sentences in our corpus

/home/varun/Documents/MarkerSub/snorkel/snorkel/udf.pyc in apply(self, xs, clear, parallelism, progress_bar, count, **kwargs)
     41         print("Running UDF...")
     42         if parallelism is None or parallelism < 2:
---> 43             self.apply_st(xs, progress_bar, clear=clear, count=count, **kwargs)
     44         else:
     45             self.apply_mt(xs, parallelism, clear=clear, **kwargs)

/home/varun/Documents/MarkerSub/snorkel/snorkel/udf.pyc in apply_st(self, xs, progress_bar, count, **kwargs)
     64 
     65             # Apply UDF and add results to the session
---> 66             for y in udf.apply(x, **kwargs):
     67 
     68                 # Uf UDF has a reduce step, this will take care of the insert; else add to session

/home/varun/Documents/MarkerSub/snorkel/snorkel/parser/corpus_parser.pyc in apply(self, x, **kwargs)
     29         """Given a Document object and its raw text, parse into Sentences"""
     30         doc, text = x
---> 31         for parts in self.req_handler.parse(doc, text):
     32             parts = self.fn(parts) if self.fn is not None else parts
     33             yield Sentence(**parts)

/home/varun/Documents/MarkerSub/snorkel/snorkel/parser/corenlp.pyc in parse(self, document, text, conn)
    204 
    205         text = text.encode('utf-8', 'error')
--> 206         resp = conn.post(self.endpoint, data=text, allow_redirects=True)
    207         content = resp.content.strip().decode('utf-8')
    208 

/home/varun/py2_kernel/local/lib/python2.7/site-packages/requests/sessions.pyc in post(self, url, data, json, **kwargs)
    547         """
    548 
--> 549         return self.request('POST', url, data=data, json=json, **kwargs)
    550 
    551     def put(self, url, data=None, **kwargs):

/home/varun/py2_kernel/local/lib/python2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    500         }
    501         send_kwargs.update(settings)
--> 502         resp = self.send(prep, **send_kwargs)
    503 
    504         return resp

/home/varun/py2_kernel/local/lib/python2.7/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
    610 
    611         # Send the request
--> 612         r = adapter.send(request, **kwargs)
    613 
    614         # Total elapsed time of the request (approximately)

/home/varun/py2_kernel/local/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    502                 raise ProxyError(e, request=request)
    503 
--> 504             raise ConnectionError(e, request=request)
    505
    506         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=12345): Max retries exceeded with url: /?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,lemma,depparse,ner%22,%22outputFormat%22:%20%22json%22%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6a70d19150>: Failed to establish a new connection: [Errno 111] Connection refused',))

@varun-tandon

I was able to resolve this issue by running chmod on snorkel/parser and updating my JDK, thanks!

@ajratner
Contributor

@varun-tandon thanks for the tip! @neda-abolhassani let us know if that helps for you?

@neda-abolhassani

Hi @ajratner
I have tried all the solutions; however, it has not worked on the AWS instance even though I increased its memory. I installed everything all over again on my local system, and it works there. No clue why it does not work on the instance.

@ajratner
Contributor

Hm, I've run Snorkel on AWS instances before; not sure what's happening here.

@asc313x

asc313x commented Jun 30, 2017

I hit the same error as the original poster myself just now. I was running the Intro_Tutorial_1 notebook. The StanfordCoreNLPServer had started and was listed in the notebook terminal output — at least it was listed there after the error occurred.

In my case, re-running the cell in the notebook was sufficient to get it working. Perhaps this is a problem of the resource being requested before the server has finished starting up?
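
If that is the cause, a small startup guard along these lines should confirm it (a sketch; the URL assumes the default host and port used throughout this thread):

import time
import requests

def wait_for_corenlp(url='http://127.0.0.1:12345', timeout=120):
    """Poll the CoreNLP endpoint until it answers, then return True."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            requests.get(url, timeout=5)
            return True  # the server answered; safe to start parsing
        except requests.exceptions.ConnectionError:
            time.sleep(2)  # not listening yet; retry shortly
    return False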

@ajratner
Contributor

ajratner commented Jun 30, 2017 via email

@ajratner
Contributor

This should be closed in v0.6, re-open if not
