### Noun Chunks

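- A noun chunk is a noun plus the words describing it (for example, determiners and adjectives).
- Noun chunks are also called base noun phrases: flat phrases that have a noun as their head.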

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [3]:
# Read the file inside a context manager so the handle is closed afterwards
with open('covid19.txt') as f:
    doc_covid = nlp(f.read())
doc_covid

Through the International Food Safety Authorities Network (INFOSAN),
national food safety authorities are seeking more information on the
potential for persistence of SARS-CoV-2, which causes COVID-19, on foods
traded internationally as well as the potential role of food in the transmission
of the virus. Currently, there are investigations conducted to evaluate the
viability and survival time of SARS-CoV-2. As a general rule, the consumption
of raw or undercooked animal products should be avoided. Raw meat, raw
milk or raw animal organs should be handled with care to avoid crosscontamination with uncooked foods.

### Finding Noun Chunks

In [9]:
# doc.noun_chunks yields Span objects (not tokens), so name the loop variable accordingly
noun_chunk = [chunk.text for chunk in doc_covid.noun_chunks]
print(noun_chunk)


['the International Food Safety Authorities Network', 'INFOSAN', 'national food safety authorities', 'more information', 'the\npotential', 'persistence', 'SARS-CoV-2', 'COVID-19', 'foods', 'the potential role', 'food', 'the transmission', 'the virus', 'investigations', 'the\nviability', 'survival time', 'SARS-CoV-2', 'a general rule', 'the consumption', 'raw or undercooked animal products', 'Raw meat', 'raw\nmilk', 'raw animal organs', 'care', 'crosscontamination', 'uncooked foods']
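Some chunks in the output above, such as 'the\npotential' and 'raw\nmilk', carry the hard line breaks of the source text file. A minimal sketch of cleaning them up, assuming it is acceptable to collapse every run of whitespace into a single space; `normalize_chunk` is a hypothetical helper name, not part of spaCy.

```python
def normalize_chunk(text):
    """Collapse any run of whitespace (including newlines) into one space."""
    return ' '.join(text.split())

# A few chunks copied from the output above
sample_chunks = ['the\npotential', 'raw\nmilk', 'survival time']

clean_chunks = [normalize_chunk(chunk) for chunk in sample_chunks]
print(clean_chunks)
# ['the potential', 'raw milk', 'survival time']
```

The same helper could be applied to `chunk.text` inside the list comprehension if clean strings are wanted from the start.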


### Root Text

In [10]:
# Root text
# The root is the head noun of each chunk: the word that connects the chunk to the rest of the sentence

noun_chunk_root = [chunk.root.text for chunk in doc_covid.noun_chunks]
print(noun_chunk_root)

['Network', 'INFOSAN', 'authorities', 'information', 'potential', 'persistence', 'CoV-2', 'COVID-19', 'foods', 'role', 'food', 'transmission', 'virus', 'investigations', 'viability', 'time', 'CoV-2', 'rule', 'consumption', 'products', 'meat', 'milk', 'organs', 'care', 'crosscontamination', 'foods']


### Root Token Head

In [53]:
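# For each chunk, print the head of its root token (the word the chunk attaches to)
# in the first column and the root itself in the last column

for chunk in doc_covid.noun_chunks:
    print('{:<20}{:<20}{:<20}'.format(chunk.root.head.text, 'Connector_text', chunk.root.text))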

Through             Connector_text      Network             
Network             Connector_text      INFOSAN             
seeking             Connector_text      authorities         
seeking             Connector_text      information         
on                  Connector_text      potential           
for                 Connector_text      persistence         
of                  Connector_text      CoV-2               
causes              Connector_text      COVID-19            
on                  Connector_text      foods               
potential           Connector_text      role                
of                  Connector_text      food                
in                  Connector_text      transmission        
of                  Connector_text      virus               
are                 Connector_text      investigations      
evaluate            Connector_text      viability           
viability           Connector_text      time                
of                  Conn

### Finding Nouns from POS

In [11]:
noun = [token.text for token in doc_covid if token.pos_ == 'NOUN' ]
print(noun)

['food', 'safety', 'authorities', 'information', 'potential', 'persistence', 'foods', 'role', 'food', 'transmission', 'virus', 'investigations', 'viability', 'survival', 'time', 'rule', 'consumption', 'animal', 'products', 'meat', 'milk', 'animal', 'organs', 'care', 'crosscontamination', 'foods']
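The list above keeps duplicates, so it can also be used to see which nouns occur more than once. A minimal sketch using `collections.Counter` from the standard library; the list is copied from the output above, and the names `nouns`, `noun_counts` and `repeated` are illustrative only.

```python
from collections import Counter

# Nouns copied from the POS-based output above
nouns = ['food', 'safety', 'authorities', 'information', 'potential',
         'persistence', 'foods', 'role', 'food', 'transmission', 'virus',
         'investigations', 'viability', 'survival', 'time', 'rule',
         'consumption', 'animal', 'products', 'meat', 'milk', 'animal',
         'organs', 'care', 'crosscontamination', 'foods']

noun_counts = Counter(nouns)

# Keep only the nouns that appear more than once
repeated = {word: count for word, count in noun_counts.items() if count > 1}
print(repeated)
# {'food': 2, 'foods': 2, 'animal': 2}
```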


In [31]:
# Reuse the POS-based noun list instead of pasting its printed output
l1 = sorted(noun)

In [32]:
# Reuse the noun chunk roots instead of pasting their printed output
l2 = sorted(noun_chunk_root)

In [36]:
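# Finding all nouns: the union of both lists, with duplicates removed

combined = l1 + l2  # avoid the name `all`, which shadows the built-in all()
nouns_all = list(set(combined))
print(nouns_all)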

['consumption', 'survival', 'products', 'virus', 'information', 'care', 'CoV-2', 'crosscontamination', 'authorities', 'time', 'milk', 'Network', 'potential', 'viability', 'foods', 'persistence', 'safety', 'investigations', 'meat', 'animal', 'INFOSAN', 'rule', 'organs', 'food', 'role', 'COVID-19', 'transmission']


In [39]:
len(nouns_all)

27

In [38]:
# Nouns common to both lists
set_l1 = set(l1)
set_l2 = set(l2)
common_nouns = set_l1.intersection(set_l2)
common_nouns = list(common_nouns)
print(common_nouns)

['consumption', 'products', 'virus', 'information', 'care', 'crosscontamination', 'authorities', 'time', 'milk', 'potential', 'viability', 'foods', 'persistence', 'investigations', 'meat', 'rule', 'organs', 'food', 'role', 'transmission']


In [40]:
len(common_nouns)

20
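The union has 27 entries and the intersection 20, so seven words appear in only one of the two lists. A minimal sketch of finding them with set difference; the two sets are copied from the outputs above, and the names `pos_nouns`, `chunk_roots`, `only_pos` and `only_roots` are illustrative only.

```python
# Unique nouns found via token.pos_ == 'NOUN' (copied from the output above)
pos_nouns = {'food', 'safety', 'authorities', 'information', 'potential',
             'persistence', 'foods', 'role', 'transmission', 'virus',
             'investigations', 'viability', 'survival', 'time', 'rule',
             'consumption', 'animal', 'products', 'meat', 'milk', 'organs',
             'care', 'crosscontamination'}

# Unique noun chunk roots (copied from the output above)
chunk_roots = {'Network', 'INFOSAN', 'authorities', 'information',
               'potential', 'persistence', 'CoV-2', 'COVID-19', 'foods',
               'role', 'food', 'transmission', 'virus', 'investigations',
               'viability', 'time', 'rule', 'consumption', 'products',
               'meat', 'milk', 'organs', 'care', 'crosscontamination'}

only_pos = sorted(pos_nouns - chunk_roots)    # tagged NOUN, never a chunk root
only_roots = sorted(chunk_roots - pos_nouns)  # chunk root, not tagged NOUN

print(only_pos)
# ['animal', 'safety', 'survival']
print(only_roots)
# ['COVID-19', 'CoV-2', 'INFOSAN', 'Network']
```

The POS-only words are nouns that modify another noun inside a chunk ('food safety authorities', 'survival time', 'raw animal organs'), so they never become a root. The root-only words were presumably given a tag other than NOUN, such as PROPN for proper nouns; checking `token.pos_` on those tokens would confirm this.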