# Example notebook for testing out the basics

In [1]:
# Jupyter magic
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
# imports
import pandas as pd
import pathlib
import spacy

ModuleNotFoundError: No module named 'spacy'
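
The `ModuleNotFoundError` above means spaCy was not installed in the kernel's environment when this cell ran. A minimal sketch for checking this up front, using the standard library's `importlib.util.find_spec`. The helper name `missing_packages` is hypothetical. `en_core_web_sm` is included on the assumption that the spaCy model is installed as a regular Python package.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages this notebook relies on; "en_core_web_sm" is the spaCy model it loads.
required = ["pandas", "spacy", "en_core_web_sm"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If `spacy` shows up as missing, `pip install spacy` followed by `python -m spacy download en_core_web_sm` should cover both entries.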

## Load sample articles

These articles were mentioned by Kathy in the [brief](https://docs.google.com/document/d/1ncZfFND1ytnZ3kgMl8RC4GtBFkC3-tAlBDQxuFhRgAU/edit). I manually copied and pasted their text into files on Dropbox to create a quick sample dataset.

In [3]:
data_dir = '../1_data/example_articles/'
p = pathlib.Path(data_dir)

articles = {}
for article_path in p.glob('*'):
    name = article_path.name
    txt = article_path.read_text()
    articles[name] = txt
    
    print(f'{name}    - has {len(txt)} chars')

orbia_2019_forbes    - has 6585 chars
orbia_2019_howtolead    - has 6753 chars
orbia_2018_reuters_businessnews    - has 2376 chars
orbia_2019_quartzatwork    - has 20006 chars


IsADirectoryError: [Errno 21] Is a directory: '../1_data/example_articles/.ipynb_checkpoints'
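
The `IsADirectoryError` above arises because `p.glob('*')` also matches the `.ipynb_checkpoints` directory, and `read_text()` cannot read a directory. A sketch of one way to avoid it is to skip anything that is not a regular file. The function name `load_articles` and the explicit UTF-8 encoding are assumptions, not part of the original cell.

```python
import pathlib


def load_articles(data_dir):
    """Read every regular file directly inside data_dir into a {name: text} dict."""
    articles = {}
    for article_path in sorted(pathlib.Path(data_dir).glob('*')):
        # Skip directories such as .ipynb_checkpoints, which read_text() cannot open.
        if not article_path.is_file():
            continue
        articles[article_path.name] = article_path.read_text(encoding='utf-8')
    return articles


# Usage in this notebook would look like:
# articles = load_articles('../1_data/example_articles/')
```

Filtering on `is_file()` rather than on the checkpoint directory's name keeps the loop safe if other subdirectories appear later.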

In [4]:
articles['orbia_2019_forbes']

'One of the biggest questions I get asked is whether companies who did not have an overarching purpose in the first place can find one and transform themselves into a purpose-driven one. The answer is yes–but it is extremely hard and takes time.\n\nOne person who has taken his company on such a journey is Daniel Martínez-Valle, CEO of Orbia, CEO of Orbia (formerly Mexichem). Almost two years ago, he took the helm of a company with disparate business groups, missions and purposes (if they existed at all.) In undergoing their transformation, Martínez-Valle, CEO of Orbia asked a couple key questions: why do we exist beyond profit and how do you unite 22,000 employees operating in 110 countries?\n\nThe company re-organized into five global business groups, each focused on responding to opportunities to address of the world’s biggest challenges including: food security, water scarcity, rapid urbanization, community access to global data infrastructure, and expanding access to health and wel

## Add some NLP

The code below is taken straight from [spaCy's example page](https://nightly.spacy.io/).

In [5]:
# This will fail if the "en_core_web_sm" model hasn't been installed yet
# (install it with: python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

In [8]:
doc = nlp(articles['orbia_2019_forbes'])

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print()
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])


Noun phrases: ['the biggest questions', 'I', 'companies', 'who', 'an overarching purpose', 'the first place', 'themselves', 'The answer', 'it', 'time', 'One person', 'who', 'his company', 'such a journey', 'Daniel Martínez-Valle', 'CEO', 'Orbia', 'CEO', 'Orbia', 'formerly Mexichem', 'he', 'the helm', 'a company', 'disparate business groups', 'missions', 'purposes', 'they', 'their transformation', 'Martínez-Valle', 'CEO', 'Orbia', 'a couple key questions', 'we', 'profit', 'you', '22,000 employees', '110 countries', 'The company', 'five global business groups', 'opportunities', 'the world', 'biggest challenges', 'food security', 'water scarcity', 'rapid urbanization', 'community access', 'global data infrastructure', 'access', 'health', 'wellness', 'advanced materials', 'innovation', 'customer-centricity', 'Daniel Martínez-Valle', 'I', 'CEO', 'Orbia', 'those days', 'I', 'the CEO', 'a 22,000-person organization', 'I', 'we', 'products', 'an impact', 'people', 'the way', 'they', 'their live
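
The noun-phrase list above mixes content phrases with bare pronouns such as 'I', 'who', and 'it'. A rough sketch of filtering those out and counting what remains, using only the standard library. The sample list is copied from the start of the output above. The pronoun set and the helper name `top_phrases` are assumptions for illustration.

```python
from collections import Counter

# Assumed stop list of pronouns; extend as needed.
PRONOUNS = {"i", "we", "you", "he", "she", "it", "they", "who", "themselves"}


def top_phrases(phrases, n=5):
    """Count noun phrases case-insensitively, ignoring bare pronouns."""
    counts = Counter(
        phrase.lower() for phrase in phrases if phrase.lower() not in PRONOUNS
    )
    return counts.most_common(n)


sample = ['the biggest questions', 'I', 'companies', 'who', 'an overarching purpose',
          'the first place', 'themselves', 'The answer', 'it', 'time', 'One person',
          'who', 'his company', 'such a journey', 'Daniel Martínez-Valle', 'CEO',
          'Orbia', 'CEO', 'Orbia', 'formerly Mexichem']
print(top_phrases(sample, n=2))
```

In the notebook itself, the input would be `[chunk.text for chunk in doc.noun_chunks]` rather than the hard-coded sample.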

In [9]:
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

One CARDINAL
first ORDINAL
Daniel Martínez-Valle PERSON
Orbia GPE
Orbia GPE
Almost two years ago DATE
Martínez-Valle PERSON
Orbia GPE
22,000 CARDINAL
110 CARDINAL
five CARDINAL
Daniel Martínez-Valle PERSON
Orbia GPE
today DATE
a day DATE
daily DATE
Orbia GPE
Valle GPE
Orbia GPE
today DATE
the UN Sustainable Development Goals ORG
Valle GPE
Valle GPE
Orbia GPE
22,000 CARDINAL
dozens CARDINAL
Martínez-Valle PERSON
ImpactMark PERSON
first ORDINAL
ImpactMark PERSON
Valle GPE
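
In the output above, the small model labels 'Orbia', a company, as `GPE`, and splits 'Valle' off the surname 'Martínez-Valle' as a separate `GPE`. A minimal sketch of tallying labels and applying hand-written corrections as a post-processing step. The pairs are copied from the first lines of the output. The `LABEL_OVERRIDES` table and the `relabel` helper are hypothetical and not part of spaCy.

```python
from collections import Counter

# (text, label) pairs copied from the first lines of the output above.
entities = [
    ("One", "CARDINAL"), ("first", "ORDINAL"), ("Daniel Martínez-Valle", "PERSON"),
    ("Orbia", "GPE"), ("Orbia", "GPE"), ("Almost two years ago", "DATE"),
    ("Martínez-Valle", "PERSON"), ("Orbia", "GPE"), ("22,000", "CARDINAL"),
    ("110", "CARDINAL"), ("five", "CARDINAL"),
]

# Hypothetical manual corrections, keyed by entity text.
LABEL_OVERRIDES = {"Orbia": "ORG"}


def relabel(pairs, overrides):
    """Return pairs with the label replaced wherever the text has an override."""
    return [(text, overrides.get(text, label)) for text, label in pairs]


corrected = relabel(entities, LABEL_OVERRIDES)
label_counts = Counter(label for _, label in corrected)
print(label_counts)
```

In the notebook, the pairs could instead be built with `[(ent.text, ent.label_) for ent in doc.ents]`. A lookup table like this only suits a handful of known names; it is a stopgap, not a substitute for a better-suited model.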
