Information Extraction (Knowledge Triples) #3303

Closed
ryan-clancy opened this issue Feb 20, 2019 · 3 comments
Labels
usage General spaCy usage

Comments

@ryan-clancy

Feature description

I've seen scattered posts and issues about information extraction using spaCy, but no concrete solution.

Ideally, we'd have the following:

  • Given a sentence, extract all the entities.
  • For each entity, extract all the possible knowledge triples.

Example:

Input: "Barrack Obama was born in Hawaii. He was president of the United States and lived in the White House"

Output:

  • (Barrack Obama, was born in, Hawaii)
  • (Barrack Obama, president of, United States)
  • (Barrack Obama, lived in, White House)

Is this something that can be easily done at the moment?

Could the feature be a custom component or spaCy plugin?

If so, we will tag it as a project idea so other users can take it on.

@ines ines added the usage General spaCy usage label Feb 20, 2019
@ines
Member

ines commented Feb 20, 2019

If you haven't seen it yet, you might find these examples useful: https://github.com/explosion/spaCy/tree/master/examples/information_extraction

Especially the entity relations script shows a very similar use case: extracting the relationships between phrases and named entity types, using the dependency parse.
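To make the idea concrete, here is a minimal sketch of relating entities of one type to other words through the dependency parse. It is only loosely inspired by that approach and does not reproduce the linked script. So that it runs without a model, it uses a hypothetical `FakeToken` class exposing the spaCy-style attributes `text`, `dep_`, `ent_type_`, `head` and `children`; the function name `relations_for_entity_type` and the hand-built parse are assumptions made for illustration.

```python
class FakeToken:
    """Hypothetical stand-in exposing the spaCy-style attributes used below."""

    def __init__(self, text, dep_, head=None, ent_type_=""):
        self.text = text
        self.dep_ = dep_
        self.ent_type_ = ent_type_
        # As in spaCy, the root token is its own head.
        self.head = head if head is not None else self
        self.children = []
        if head is not None:
            head.children.append(self)


def relations_for_entity_type(tokens, ent_type):
    """Pair each token of the given entity type with the word it relates to."""
    relations = []
    for tok in tokens:
        if tok.ent_type_ != ent_type:
            continue
        if tok.dep_ == "pobj" and tok.head.dep_ == "prep":
            # "founded in Cupertino": climb pobj -> prep -> governing word.
            relations.append((tok.head.head.text, tok.text))
        elif tok.dep_ in ("attr", "dobj"):
            # Attribute or direct object: relate it to the clause's subject.
            subjects = [c for c in tok.head.children if c.dep_ == "nsubj"]
            if subjects:
                relations.append((subjects[0].text, tok.text))
    return relations


# Hand-built parse of "Apple was founded in Cupertino" (labels are assumed).
founded = FakeToken("founded", "ROOT")
apple = FakeToken("Apple", "nsubjpass", founded, ent_type_="ORG")
was = FakeToken("was", "auxpass", founded)
in_ = FakeToken("in", "prep", founded)
cupertino = FakeToken("Cupertino", "pobj", in_, ent_type_="GPE")
sentence = [apple, was, founded, in_, cupertino]

print(relations_for_entity_type(sentence, "GPE"))
```

Because the function only reads those five attributes, it should in principle accept real spaCy tokens too, though that is untested here and the dependency labels a model assigns may differ.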

For the new v2.1 docs, I also added a section on combining models with rules for information extraction. It's not live yet, but you can already read the draft here: https://github.com/explosion/spaCy/blob/develop/website/docs/usage/rule-based-matching.md#combining-models-and-rules-models-rules

@ines ines closed this as completed Feb 21, 2019
@ryan-clancy
Author

ryan-clancy commented Feb 21, 2019

Thanks for pointing me in the right direction, @ines

I have the following code, but it doesn't seem very robust:

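import spacy


def extract_relations(doc):
    """Return (entity, "head prep", object) triples found via the dependency parse."""
    # Merge entities and noun chunks into single tokens so that each phrase
    # is one node in the parse. Note: Span.merge() is deprecated as of
    # spaCy v2.1 in favour of doc.retokenize().
    spans = list(doc.ents) + list(doc.noun_chunks)
    for span in spans:
        span.merge()

    triples = []

    for ent in doc.ents:
        # Prepositions attached to the entity's syntactic head, e.g. "in" under "born".
        preps = [prep for prep in ent.root.head.children if prep.dep_ == "prep"]
        for prep in preps:
            # Every child of the preposition is treated as the object of the relation.
            for child in prep.children:
                triples.append((ent.text, "{} {}".format(ent.root.head, prep), child.text))

    return triples


TEXTS = [
    'Barrack Obama was born in Hawaii in the year 1961. He was president of the United States.',
    'Apple was founded in Cupertino in the year 1981.'
]

nlp = spacy.load("en")

for text in TEXTS:
    print("\n" + text)
    relations = extract_relations(nlp(text))
    for r1, r2, r3 in relations:
        print('({}, {}, {})'.format(r1, r2, r3))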

which produces:

Barrack Obama was born in Hawaii in the year 1961. He was president of the United States.
(Barrack Obama, born in, Hawaii)
(Barrack Obama, born in, the year 1961)

Apple was founded in Cupertino in the year 1981.
(Apple, founded in, Cupertino)
(Apple, founded in, the year 1981)

For example, it doesn't capture relations such as Obama being president of the United States, which appears in the second sentence of the first text. Do you have any recommendations to make this more robust?
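One possible direction, sketched here under several assumptions: start from the grammatical subject rather than from the entity, and add a second pattern for copular sentences where the relation sits on an attribute ("was president of"). The sketch assumes the parser labels "president" as `attr` and "of" as `prep`, and that phrases were already merged into single tokens as in the code above. To stay runnable without a model it uses a hypothetical `Tok` stand-in with the spaCy-style attributes `text`, `dep_`, `head` and `children`; `prep_triples` and `extract_triples` are illustrative names, not spaCy APIs.

```python
class Tok:
    """Hypothetical stand-in for a (merged) spaCy token."""

    def __init__(self, text, dep_, head=None):
        self.text = text
        self.dep_ = dep_
        # As in spaCy, the root token is its own head.
        self.head = head if head is not None else self
        self.children = []
        if head is not None:
            head.children.append(self)


SUBJECT_DEPS = {"nsubj", "nsubjpass"}


def prep_triples(subj, relation, anchor):
    """Yield a triple for every prep -> pobj path directly under `anchor`."""
    for prep in anchor.children:
        if prep.dep_ != "prep":
            continue
        for obj in prep.children:
            if obj.dep_ == "pobj":
                yield (subj.text, "{} {}".format(relation, prep.text), obj.text)


def extract_triples(tokens):
    triples = []
    for subj in tokens:
        if subj.dep_ not in SUBJECT_DEPS:
            continue
        verb = subj.head
        # Pattern 1: subject -> verb -> prep -> pobj ("born in Hawaii").
        triples.extend(prep_triples(subj, verb.text, verb))
        # Pattern 2: subject -> copula -> attr -> prep -> pobj
        # ("was president of the United States").
        for attr in verb.children:
            if attr.dep_ == "attr":
                relation = "{} {}".format(verb.text, attr.text)
                triples.extend(prep_triples(subj, relation, attr))
    return triples


# Hand-built parse of "He was president of the United States" (labels assumed).
was = Tok("was", "ROOT")
he = Tok("He", "nsubj", was)
president = Tok("president", "attr", was)
of = Tok("of", "prep", president)
us = Tok("the United States", "pobj", of)
tokens = [he, was, president, of, us]

print(extract_triples(tokens))
# [('He', 'was president of', 'the United States')]
```

Note that the subject comes out as "He": linking the pronoun back to the entity in the previous sentence needs coreference resolution, which this sketch does not attempt.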

@lock

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Mar 23, 2019
2 participants