# Stanford CoreNLP

Installation notes taken from here: https://stackoverflow.com/questions/32879532/stanford-nlp-for-python

## Install Stanford CoreNLP
The latest version at this time (2018-02-27) is 3.9.1:

```
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
```

## Start the server

```
cd stanford-corenlp-full-2018-02-27
java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000
```

Notes:
* timeout is in milliseconds, I set it to 10 sec above. You should increase it if you pass huge blobs to the server.
* There are more options, you can list them with --help.

## Notes

1. You pass the whole text to the server and it splits it into sentences.
2. The sentiment is ascribed to each sentence, not the whole text.
3. The average sentiment of text is between `Positive (3)`, `Neutral (2)` and `Negative (1)`, the range is from `VeryNegative (0)` to `VeryPositive (4)` which appear to be quite rare.
4. You can stop the server either by typing `Ctrl-C` at the terminal you started it from or using the shell command `kill $(lsof -ti tcp:9000)`. `9000` is the default port, you can change it using the `-port` option when starting the server.
5. Increase timeout (in milliseconds) in server or client if you get timeout errors.

## Install the python packages

Python packages list available here: https://stanfordnlp.github.io/CoreNLP/other-languages.html#python

```
pip install pycorenlp
pip install stanfordcorenlp
```

## PyCoreNLP Package

Documentation: https://github.com/smilli/py-corenlp

In [1]:
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
res = nlp.annotate("I love you. I hate him. You are nice. He is dumb.",
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })
#print(type(res)) # <class 'dict'>

for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb .': 1 Negative


## StanfordCoreNLP Package

Documentation: https://github.com/Lynten/stanford-corenlp

In [2]:
from stanfordcorenlp import StanfordCoreNLP
import json

nlp = StanfordCoreNLP(r'http://localhost', port=9000)

sentence = 'This World is an amazing place.'
properties = {'annotators': 'sentiment', 
              'pinelineLanguage': 'en', 
              'outputFormat': 'json'}
res = nlp.annotate(sentence, properties = properties)

#print(res) # <class 'str'>
res = json.loads(res)

for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

nlp.close() # Do not forget to close! The backend server will consume a lot memery.

0: 'This World is an amazing place .': 4 Verypositive


## Another Language

Available languages: https://stanfordnlp.github.io/CoreNLP/download.html

```
cd stanford-corenlp-full-2018-02-27
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar
```

In [3]:
# _*_coding:utf-8_*_
from stanfordcorenlp import StanfordCoreNLP
import json

nlp = StanfordCoreNLP(r'http://localhost', port=9000, lang='zh')

#sentence = '这个世界是一个了不起的地方。' #simplified, This World is an amazing place.
sentence = '這個世界是一個了不起的地方。' #traditional, This World is an amazing place.
properties = {'annotators': 'sentiment', 
              'pinelineLanguage': 'zh', 
              'outputFormat': 'json'}
res = nlp.annotate(sentence, properties = properties)
res = json.loads(res)

for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

nlp.close()

0: '這個世界是一個了不起的地方 。': 2 Neutral


In [4]:
# _*_coding:utf-8_*_
from stanfordcorenlp import StanfordCoreNLP
import json

# Other human languages support, e.g. Chinese
sentence = '這個世界是一個了不起的地方。'

with StanfordCoreNLP(r'http://localhost', port=9000, lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))

['這個世界是一個了不起的地方', '。']
[('這個世界是一個了不起的地方', 'NN'), ('。', 'SYM')]
[('這個世界是一個了不起的地方', 'O'), ('。', 'O')]
(ROOT
  (NP (NN 這個世界是一個了不起的地方) (SYM 。)))
[('ROOT', 0, 1), ('dep', 1, 2)]


In [5]:
# _*_coding:utf-8_*_
from stanfordcorenlp import StanfordCoreNLP
import json

nlp = StanfordCoreNLP(r'http://localhost', port=9000, lang='ar')

sentence = 'أحبك. أنا أكرهه. أنت لطيف. إنه غبي' #I love you. I hate him. you are kind. Stupid
properties = {'annotators': 'sentiment', 
              'pinelineLanguage': 'ar', 
              'outputFormat': 'json'}
res = nlp.annotate(sentence, properties = properties)
res = json.loads(res)

for s in res["sentences"]:
    print("%d: '%s': %s %s" % (
        s["index"],
        " ".join([t["word"] for t in s["tokens"]]),
        s["sentimentValue"], s["sentiment"]))

nlp.close()

0: 'أحبك .': 2 Neutral
1: 'أنا أكرهه .': 2 Neutral
2: 'أنت لطيف .': 2 Neutral
3: 'إنه غبي': 2 Neutral
