# Objective

Many agronomic terms appear in natural language in multiple forms, e.g.:
* "The awns are rough", "It has rough awns", or "It is rough-awned". In all these cases, the plant part (PLAN) "awn" is modified by the adjective "rough". The combination "rough" + "awn" is a trait (TRAT).
* "early maturing", "matures early". In these cases, the trait (TRAT) "maturing" is modified by "early" (strictly an adverb here). The combination "early" + "maturing" is a compound trait (TRAT).
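The two rules above can be sketched as a token-level check. This is a minimal sketch under stated assumptions, not the notebook's own method. It assumes each token arrives as a `(text, POS, entity label)` tuple, such as spaCy's `token.text`, `token.pos_` and `token.ent_type_` would give when a tagger is in the pipeline. The names `find_compound_traits`, `MODIFIER_POS` and `COPULAS` are hypothetical. ADV is accepted alongside ADJ because "early" in "early maturing" may be tagged as an adverb.

```python
from typing import List, Tuple

# Hypothetical token format: (text, coarse POS tag, entity label or "")
Token = Tuple[str, str, str]

MODIFIER_POS = {"ADJ", "ADV"}            # "rough" is ADJ; "early" may be tagged ADV
HEAD_LABELS = {"PLAN", "TRAT"}           # plant parts and traits can both be modified
COPULAS = {"is", "are", "was", "were"}   # assumed linking verbs for "awns are rough"


def find_compound_traits(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (modifier, head) pairs that together form a compound TRAT."""
    pairs = []
    for i, (text, _pos, label) in enumerate(tokens):
        if label not in HEAD_LABELS:
            continue
        # Rule A: modifier directly before the head ("rough awns", "early maturing")
        if i > 0 and tokens[i - 1][1] in MODIFIER_POS:
            pairs.append((tokens[i - 1][0], text))
            continue
        # Rule B: modifier after the head, optionally behind a copula
        # ("the awns are rough", "matures early")
        j = i + 1
        if j < len(tokens) and tokens[j][0].lower() in COPULAS:
            j += 1
        if j < len(tokens) and tokens[j][1] in MODIFIER_POS:
            pairs.append((tokens[j][0], text))
    return pairs


tokens = [("It", "PRON", ""), ("has", "VERB", ""), ("rough", "ADJ", ""),
          ("awns", "NOUN", "PLAN"), (",", "PUNCT", ""), ("is", "AUX", ""),
          ("early", "ADV", ""), ("maturing", "VERB", "TRAT")]
print(find_compound_traits(tokens))  # [('rough', 'awns'), ('early', 'maturing')]
```

The hyphenated form "rough-awned" is a single token, so it would need a separate rule (for example, splitting the token on the hyphen).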

In this notebook, we run a short passage of text through a trained NER model, read its predictions, identify compound traits using the rules above, and output the modified named entities, including the compound traits, in JSON format.
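For the JSON output step, one option is to replace the entities covered by each compound span with a single TRAT entity and serialize the result. This is a minimal sketch: the schema (`"text"` plus a list of entities with `"text"`, `"start"`, `"end"`, `"label"`) and the helper `add_compound_traits` are hypothetical. The character offsets are taken from the entity output printed later in this notebook.

```python
import json
from typing import Dict, List, Tuple


def add_compound_traits(text: str, entities: List[Dict],
                        compound_spans: List[Tuple[int, int]]) -> List[Dict]:
    """Hypothetical helper: swap entities inside each (start, end) span for one TRAT."""
    # Keep only entities that are not fully contained in a compound span
    kept = [e for e in entities
            if not any(s <= e["start"] and e["end"] <= t for s, t in compound_spans)]
    # Build one TRAT entity per compound span, text sliced from the source string
    compounds = [{"text": text[s:t], "start": s, "end": t, "label": "TRAT"}
                 for s, t in compound_spans]
    return sorted(kept + compounds, key=lambda e: e["start"])


test_text = ('Kold is a six-rowed winter feed barley. It was released by the Oregon '
             'and Idaho AESs in 1993. It has rough awns, is early maturing, and is high yielding.')

# A subset of the predicted entities (offsets as printed by the model)
entities = [
    {"text": "Kold", "start": 0, "end": 4, "label": "CVAR"},
    {"text": "awns", "start": 107, "end": 111, "label": "PLAN"},
    {"text": "maturing", "start": 122, "end": 130, "label": "TRAT"},
    {"text": "yielding", "start": 144, "end": 152, "label": "TRAT"},
]
# "rough awns", "early maturing", "high yielding"
compound_spans = [(101, 111), (116, 130), (139, 152)]

merged = add_compound_traits(test_text, entities, compound_spans)
print(json.dumps({"text": test_text, "entities": merged}, indent=2))
```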

# Parse sample text with the NLP model

```python
import spacy
from spacy import displacy

# nlp = spacy.load('en_core_web_sm')  # stock English pipeline, for comparison
nlp = spacy.load('NerModel')           # custom-trained NER model saved to disk

test_text = '''Kold is a six-rowed winter feed barley. It was released by the Oregon and Idaho AESs in 1993. It has rough awns, is early maturing, and is high yielding.'''

doc = nlp(test_text)

# One display colour per custom entity label; only these labels are rendered
colors = {'ALAS': 'BlueViolet', 'CROP': 'Aqua', 'CVAR': 'Chartreuse', 'PATH': 'red',
          'PED': 'orange', 'PLAN': 'pink', 'PPTD': 'brown', 'TRAT': 'yellow'}
cust_options = {'ents': list(colors), 'colors': colors}

# Highlight the predicted entities inline in the notebook
displacy.render(doc, style='ent', jupyter=True, options=cust_options)
```


# Identify compound traits (ADJ + PLAN = TRAT)

```python
# Predicted entities with their character offsets
print("Entities:")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

# Commented out: doc.noun_chunks requires a dependency parse,
# which NerModel does not appear to provide.
# print("\nNoun Chunks:")
# for chunk in doc.noun_chunks:
#     print(chunk.text, chunk.root.text, chunk.root.dep_,
#           chunk.root.head.text)

# Per-token lemma, coarse POS, fine-grained tag and dependency label
print("\nParts of Speech:")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_)
```

```
Entities:
Kold 0 4 CVAR
six-rowed 10 19 TRAT
winter 20 26 TRAT
feed 27 31 TRAT
barley 32 38 CROP
Oregon and Idaho AESs 63 84 ORG
1993 88 92 DATE
awns 107 111 PLAN
maturing 122 130 TRAT
yielding 144 152 TRAT

Parts of Speech:
Kold Kold
is be
a a
six-rowed six-rowed
winter winter
feed fee
barley barley
. .
It It
was be
released release
by by
the the
Oregon Oregon
and and
Idaho Idaho
AESs AESs
in in
1993 1993
. .
It It
has have
rough rough
awns awn
, ,
is be
early early
maturing mature
, ,
and and
is be
high high
yielding yield
. .
```


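Strangely, when we use the custom NerModel instead of the standard English model, the POS, tag, and dependency columns come back empty, and noun chunks cannot be computed. I suspect we need to change the way we did the training. One likely cause is that the saved NerModel pipeline contains only an NER component, with no tagger or parser; inspecting `nlp.pipe_names` would show which components are present.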
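Because `pos_`, `tag_` and `dep_` come back empty with this model, an ADJ/ADV test cannot yet be applied to its output. As a stop-gap, compound spans can be located from the entity character offsets (`Span.start_char`, `Span.end_char`) plus a small word list. This is a minimal sketch: `MODIFIERS` and `compound_spans_from_offsets` are hypothetical, and a real lexicon would have to come from the project's own annotations. The entity offsets are copied from the output above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical modifier lexicon; extend it from your own annotated data
MODIFIERS = {"rough", "smooth", "early", "late", "high", "low", "tall", "short"}


def compound_spans_from_offsets(text: str, entities: List[Dict],
                                modifiers=MODIFIERS) -> List[Tuple[int, int]]:
    """Return (start, end) spans of modifier + PLAN/TRAT compounds.

    Only the single word immediately before each entity is checked, so
    predicate forms such as "the awns are rough" are not covered.
    """
    spans = []
    for ent in entities:
        if ent["label"] not in {"PLAN", "TRAT"}:
            continue
        # Last word (letters, digits, hyphens) before the entity, then only whitespace
        match = re.search(r"([\w-]+)\s+$", text[:ent["start"]])
        if match and match.group(1).lower() in modifiers:
            spans.append((match.start(1), ent["end"]))
    return spans


test_text = ('Kold is a six-rowed winter feed barley. It was released by the Oregon '
             'and Idaho AESs in 1993. It has rough awns, is early maturing, and is high yielding.')

# Entities as printed above; from a live doc they could be built with
# [{"text": e.text, "start": e.start_char, "end": e.end_char, "label": e.label_} for e in doc.ents]
entities = [
    {"text": "Kold", "start": 0, "end": 4, "label": "CVAR"},
    {"text": "six-rowed", "start": 10, "end": 19, "label": "TRAT"},
    {"text": "winter", "start": 20, "end": 26, "label": "TRAT"},
    {"text": "feed", "start": 27, "end": 31, "label": "TRAT"},
    {"text": "barley", "start": 32, "end": 38, "label": "CROP"},
    {"text": "Oregon and Idaho AESs", "start": 63, "end": 84, "label": "ORG"},
    {"text": "1993", "start": 88, "end": 92, "label": "DATE"},
    {"text": "awns", "start": 107, "end": 111, "label": "PLAN"},
    {"text": "maturing", "start": 122, "end": 130, "label": "TRAT"},
    {"text": "yielding", "start": 144, "end": 152, "label": "TRAT"},
]

spans = compound_spans_from_offsets(test_text, entities)
print([test_text[s:e] for s, e in spans])  # ['rough awns', 'early maturing', 'high yielding']
```

A word list is brittle compared with POS tags, so this is best treated as a placeholder until the pipeline includes a tagger.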