# Stanford NLP

*Note*: A newer, improved Python wrapper is Stanza. See the notebook in this folder. This notebook is here for those using the older stanfordnlp package. 

Stanford's [CoreNLP](https://stanfordnlp.github.io/CoreNLP/) system is a Java implementation of a full suite of NLP tools. The tools can be run from the command line, so integrating them with a Python system could involve running a shell script that saves the Java output, which is then processed in Python. Python wrappers for CoreNLP have existed, but performance was an issue.
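
The command-line route described above can be sketched in Python as follows. This is a minimal sketch and not part of the original notebook: the helper names `build_corenlp_command` and `tokens_from_corenlp_json` are hypothetical, the install directory and input file are placeholders, and it assumes that the CoreNLP jars sit in a single directory and that the JSON output uses the usual `sentences` / `tokens` layout.

```python
import json
import os


def build_corenlp_command(corenlp_dir, input_file,
                          annotators=("tokenize", "ssplit", "pos", "lemma")):
    """Build the java command that annotates input_file and writes JSON.

    corenlp_dir and input_file are placeholders supplied by the caller.
    """
    return [
        "java", "-cp", os.path.join(corenlp_dir, "*"),
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", ",".join(annotators),
        "-file", input_file,
        "-outputFormat", "json",
    ]


def tokens_from_corenlp_json(json_text):
    """Return one list of (word, lemma, pos) tuples per sentence."""
    data = json.loads(json_text)
    return [
        [(tok["word"], tok.get("lemma"), tok.get("pos")) for tok in sent["tokens"]]
        for sent in data["sentences"]
    ]


# Hand-written stand-in for CoreNLP's JSON output, for illustration only.
sample = json.dumps({"sentences": [{"index": 0, "tokens": [
    {"index": 1, "word": "Turing", "lemma": "Turing", "pos": "NNP"},
    {"index": 2, "word": "published", "lemma": "publish", "pos": "VBD"},
]}]})
print(tokens_from_corenlp_json(sample))
```

To actually run the annotation, the command list could be passed to `subprocess.run(cmd, check=True)`; unless an output directory is specified, CoreNLP should write a `.json` file named after the input file, which can then be read and handed to the parsing helper.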

Recently Stanford NLP released a [Python implementation](https://github.com/stanfordnlp/stanfordnlp) of some of this functionality. As of this writing, the NLP pipeline tokenizes, assigns part-of-speech (POS) tags, finds lemmas, and outputs a dependency parse. This notebook demonstrates this functionality. The link at the start of this paragraph provides installation instructions and explains how to open sample notebooks in Google Colab. One of those notebooks demonstrates how to set up a local server, which gives access to more functionality such as named entity recognition (NER).

Refer to the [documentation](https://pypi.org/project/stanfordnlp/) for installation and usage information.

Running CoreNLP through Python is slower and offers fewer options than running it directly in Java.

In [20]:
import stanfordnlp

nlp = stanfordnlp.Pipeline()

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/Users/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/Users/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/Users/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings: 
{'model_path': '/Users/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings: 
{'model_path': '/Users/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/Users/mazidi/stanfordnlp_resources/en_ewt_mode

In [21]:
text = 'The history of natural language processing (NLP) generally started in the 1950s, \
although work can be found from earlier periods. In 1950, Alan Turing published an article \
titled "Computing Machinery and Intelligence" which proposed what is now called the \
Turing test as a criterion of intelligence.'

doc = nlp(text)



In [22]:
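# print one row per word: text, lemma, POS tag, governor, dependency relation
for i, sentence in enumerate(doc.sentences):
    print('\n[Sentence {}]'.format(i+1))
    for word in sentence.words:
        # governor is the 1-based index of the word's head; 0 marks the root
        print('{:12s}\t{:12s}\t{:6s}\t{:d}\t{:12s}'.format(
            word.text, word.lemma, word.pos, word.governor, word.dependency_relation))
        print(' ')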


[Sentence 1]
The         	the         	DT    	2	det         
 
history     	history     	NN    	11	nsubj       
 
of          	of          	IN    	6	case        
 
natural     	natural     	JJ    	5	amod        
 
language    	language    	NN    	6	compound    
 
processing  	processing  	NN    	2	nmod        
 
(           	(           	-LRB- 	8	punct       
 
NLP         	NLP         	NN    	6	appos       
 
)           	)           	-RRB- 	8	punct       
 
generally   	generally   	RB    	11	advmod      
 
started     	start       	VBD   	0	root        
 
in          	in          	IN    	14	case        
 
the         	the         	DT    	14	det         
 
1950s       	1950        	NNS   	11	obl         
 
,           	,           	,     	11	punct       
 
although    	although    	IN    	20	mark        
 
work        	work        	NN    	20	nsubj:pass  
 
can         	can         	MD    	20	aux         
 
be          	be          	VB    	20	aux:pass    
 
found       	find        	
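
The fourth column in the output above is the governor: the 1-based position of each word's head within the sentence, with 0 marking the root. As a small sketch of how to read that column, the snippet below resolves each governor index to its head word. The rows are copied from the first eleven words of the output above; the helper name `resolve_heads` is hypothetical, and it works on plain tuples, not on stanfordnlp objects.

```python
# (text, governor, relation) rows copied from the first eleven words above
rows = [
    ("The", 2, "det"),
    ("history", 11, "nsubj"),
    ("of", 6, "case"),
    ("natural", 5, "amod"),
    ("language", 6, "compound"),
    ("processing", 2, "nmod"),
    ("(", 8, "punct"),
    ("NLP", 6, "appos"),
    (")", 8, "punct"),
    ("generally", 11, "advmod"),
    ("started", 0, "root"),
]


def resolve_heads(rows):
    """Map each (text, governor, relation) row to (text, relation, head text)."""
    resolved = []
    for text, governor, relation in rows:
        # governor 0 means the word is the root; otherwise it is a 1-based index
        head = "ROOT" if governor == 0 else rows[governor - 1][0]
        resolved.append((text, relation, head))
    return resolved


for text, relation, head in resolve_heads(rows):
    print("{:12s}{:10s}{}".format(text, relation, head))
```

Read this way, `The` attaches to `history` as its determiner, `history` is the subject of `started`, and `started` is the root of the sentence.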

You can also start a local CoreNLP server to process sentences. For details, see [the demo script here](https://github.com/stanfordnlp/stanfordnlp/blob/master/demo/corenlp.py).

In [23]:
import os

os.environ["CORENLP_HOME"] = r'/Users/mazidi/stanford-corenlp-4.1.0'

from stanfordnlp.server import CoreNLPClient



In [24]:
# set up the client
with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner','parse','depparse'], timeout=60000, memory='16G') as client:
    # submit the request to the server
    ann = client.annotate(text)
    
    # first sentence
    sentence = ann.sentence[0]
    
    
    # get the dependency parse of the first sentence
    print('---')
    print('dependency parse of first sentence')
    dependency_parse = sentence.basicDependencies
    print(dependency_parse)
 
    # get the constituency parse of the first sentence
    print('---')
    print('constituency parse of first sentence')
    constituency_parse = sentence.parseTree
    print(constituency_parse)

    # get the first token of the first sentence
    print('---')
    print('first token of first sentence')
    token = sentence.token[0]
    print(token)
    
    # get the part-of-speech tag
    print('---')
    print('part of speech tag of token')
    print(token.pos)

    # get the named entity tag
    print('---')
    print('named entity tag of token')
    print(token.ner)


Starting server with command: java -Xmx16G -cp /Users/mazidi/stanford-corenlp-4.1.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-a70f7fb6c42b441e.props -preload tokenize,ssplit,pos,lemma,ner,parse,depparse
---
dependency parse of first sentence
node {
  sentenceIndex: 0
  index: 1
}
node {
  sentenceIndex: 0
  index: 2
}
node {
  sentenceIndex: 0
  index: 3
}
node {
  sentenceIndex: 0
  index: 4
}
node {
  sentenceIndex: 0
  index: 5
}
node {
  sentenceIndex: 0
  index: 6
}
node {
  sentenceIndex: 0
  index: 7
}
node {
  sentenceIndex: 0
  index: 8
}
node {
  sentenceIndex: 0
  index: 9
}
node {
  sentenceIndex: 0
  index: 10
}
node {
  sentenceIndex: 0
  index: 11
}
node {
  sentenceIndex: 0
  index: 12
}
node {
  sentenceIndex: 0
  index: 13
}
node {
  sentenceIndex: 0
  index: 14
}
node {
  sentenceIndex: 0
  index: 15
}
node {
  sentenceIndex: 0
  index: 16
}
node {
  senten