# Stanford NLP

Stanford's [CoreNLP](https://stanfordnlp.github.io/CoreNLP/) system is a Java implementation of a full suite of NLP tools. The tools can be run from the command line, so one way to integrate them with a Python system is to run a shell script that saves the Java output to a file, which can then be processed in Python. Python wrappers for CoreNLP have existed, but performance was an issue.
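
The command-line route described above can be sketched in Python with `subprocess` instead of a separate shell script. This is a minimal sketch, not a tested integration: it assumes Java is on the PATH, that `corenlp_dir` is an unpacked CoreNLP distribution, and that CoreNLP writes its JSON output as the input file name plus `.json`. The helper names `build_corenlp_command`, `run_corenlp`, and `tokens_from_json` are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_corenlp_command(input_file, output_dir,
                          annotators="tokenize,ssplit,pos,lemma"):
    """Return the argument list for a CoreNLP command-line run with JSON output."""
    return [
        "java", "-cp", "*",  # Java expands the classpath wildcard itself
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", annotators,
        "-file", str(input_file),
        "-outputFormat", "json",
        "-outputDirectory", str(output_dir),
    ]


def run_corenlp(corenlp_dir, input_file, output_dir):
    """Run CoreNLP from its install directory and load the JSON it writes."""
    # Resolve paths first, because the Java process runs with a different cwd.
    input_file = Path(input_file).resolve()
    output_dir = Path(output_dir).resolve()
    subprocess.run(build_corenlp_command(input_file, output_dir),
                   cwd=corenlp_dir, check=True)
    # Assumption: the output file is named <input name>.json
    with open(output_dir / (input_file.name + ".json"), encoding="utf-8") as f:
        return json.load(f)


def tokens_from_json(result):
    """Flatten the parsed JSON into (word, lemma, pos) tuples.

    Assumes the layout {"sentences": [{"tokens": [{"word", "lemma", "pos"}]}]}.
    """
    return [(tok["word"], tok["lemma"], tok["pos"])
            for sentence in result["sentences"]
            for tok in sentence["tokens"]]
```

A call such as `tokens_from_json(run_corenlp('path/to/corenlp', 'input.txt', 'out'))` would then give Python the tagged tokens. Each run starts a new JVM and reloads the models, which is one reason this approach can be slow.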

Recently Stanford NLP released a [Python implementation](https://github.com/stanfordnlp/stanfordnlp) of some of the functionality. As of this writing, the NLP pipeline tokenizes, POS-tags, lemmatizes, and outputs a dependency parse. This notebook demonstrates this functionality. The link at the start of this paragraph provides installation instructions and explains how to view sample notebooks in Google Colab. One of those notebooks demonstrates how to set up a local server, which gives access to more functionality such as NER.
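
To make the dependency parse easier to read, it helps to see how head indices work. The sketch below does not use the library. It assumes the CoNLL-U convention, in which each word stores the 1-based position of its head and 0 marks the root. The toy sentence, its hand-written annotations, and the helper names `head_text` and `dependency_edges` are illustrative only; they are not pipeline output.

```python
# Hand-annotated toy sentence: (text, head index, relation).
# Head indices are 1-based; 0 means the word is the root of the sentence.
words = [
    ("Turing", 2, "nsubj"),
    ("published", 0, "root"),
    ("an", 4, "det"),
    ("article", 2, "obj"),
]


def head_text(words, head_index):
    """Return the text of the head word, or 'ROOT' for index 0."""
    return "ROOT" if head_index == 0 else words[head_index - 1][0]


def dependency_edges(words):
    """Turn (text, head, relation) rows into (dependent, relation, head text)."""
    return [(text, rel, head_text(words, head)) for text, head, rel in words]


for dependent, relation, head in dependency_edges(words):
    print(f"{dependent:10s} --{relation}--> {head}")
```

The `governor` column printed later in this notebook can be read the same way: a value of 0 marks the root, and any other value is the position of the head word within the sentence.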

In [1]:
import stanfordnlp

nlp = stanfordnlp.Pipeline()

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/home/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/home/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': '/home/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings: 
{'model_path': '/home/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings: 
{'model_path': '/home/mazidi/stanfordnlp_resources/en_ewt_models/en_ewt_parser.pt', 'pretrain_path': '/home/mazidi/stanfordnlp_resources/en_ewt_models/en_

In [10]:
doc = nlp('The history of natural language processing (NLP) generally started in the 1950s, \
although work can be found from earlier periods. In 1950, Alan Turing published an article \
titled "Computing Machinery and Intelligence" which proposed what is now called the \
Turing test as a criterion of intelligence.')

In [11]:
# Print one row per word: text, lemma, POS tag, head index, dependency relation.
for i, sentence in enumerate(doc.sentences):
    print('\n[Sentence {}]'.format(i+1))
    for word in sentence.words:
        print('{:12s}\t{:12s}\t{:6s}\t{:d}\t{:12s}'.format(
            word.text, word.lemma, word.pos, word.governor, word.dependency_relation))
        print(' ')  # spacer line after each word


[Sentence 1]
The         	the         	DT    	2	det         
 
history     	history     	NN    	11	nsubj       
 
of          	of          	IN    	6	case        
 
natural     	natural     	JJ    	5	amod        
 
language    	language    	NN    	6	compound    
 
processing  	processing  	NN    	2	nmod        
 
(           	(           	-LRB- 	8	punct       
 
NLP         	NLP         	NN    	6	appos       
 
)           	)           	-RRB- 	8	punct       
 
generally   	generally   	RB    	11	advmod      
 
started     	start       	VBD   	0	root        
 
in          	in          	IN    	14	case        
 
the         	the         	DT    	14	det         
 
1950s       	1950        	NNS   	11	obl         
 
,           	,           	,     	11	punct       
 
although    	although    	IN    	20	mark        
 
work        	work        	NN    	20	nsubj:pass  
 
can         	can         	MD    	20	aux         
 
be          	be          	VB    	20	aux:pass    
 
found       	find        	