# Named Entity Linking (NEL) with DBpedia SPARQL

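This notebook demonstrates how to use the `get_best_match` function to link an organization name extracted from a news article to its corresponding entity in DBpedia. The function queries the public DBpedia SPARQL endpoint for companies whose English label contains the name, then ranks the candidates by string similarity.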
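Linking happens in two stages. First, the SPARQL query retrieves candidate companies whose English label contains the name. Second, `difflib.SequenceMatcher` ranks those candidates by how closely each label matches the name. The ranking stage can be tried offline. In this minimal sketch, the candidate labels are hard-coded stand-ins (hypothetical, not fetched from DBpedia), and `rank_labels` is an illustrative helper rather than part of the notebook.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1], computed the same way as in get_best_match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_labels(org_name, labels):
    """Return (label, score) pairs sorted from most to least similar to org_name (hypothetical helper)."""
    scored = [(label, similarity(org_name, label)) for label in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate labels, standing in for rows the SPARQL query might return
candidates = ["Microsoft Press", "Microsoft", "Microsoft Studios"]
for label, score in rank_labels("Microsoft", candidates):
    print(f"{score:.2f}  {label}")
# 1.00  Microsoft
# 0.75  Microsoft Press
# 0.69  Microsoft Studios
```

`SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. Extra words in a label therefore lower its score. However, the ranker can only choose from the candidates the query returns. If the exact company is missing from those rows, a longer label wins.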

In [1]:
!pip install SPARQLWrapper pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting SPARQLWrapper
  Downloading SPARQLWrapper-2.0.0-py3-none-any.whl.metadata (2.0 kB)
Downloading SPARQLWrapper-2.0.0-py3-none-any.whl (28 kB)
Installing collected packages: SPARQLWrapper
Successfully installed SPARQLWrapper-2.0.0

In [2]:
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
from difflib import SequenceMatcher

# DBpedia SPARQL Endpoint
DBPEDIA_SPARQL_URL = "http://dbpedia.org/sparql"

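def get_best_match(org_name):
    """Fetch the best-matching company entity from DBpedia using SPARQL.

    Returns a dict describing the company whose English label contains
    `org_name` and is most similar to it, or None if nothing matches.
    """

    # Escape backslashes and double quotes so the name cannot break out of the SPARQL string literal
    safe_name = org_name.replace("\\", "\\\\").replace('"', '\\"')

    # SPARQL query: English-labelled companies whose label contains the name.
    # The language filter on ?abstract sits inside its OPTIONAL block; placed outside it,
    # the filter fails on an unbound ?abstract and silently drops companies without an abstract.
    sparql_query = f"""
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?company ?label ?industry ?country ?abstract ?wikiPage WHERE {{
      ?company rdf:type dbo:Company.
      ?company rdfs:label ?label.

      OPTIONAL {{ ?company dbo:industry ?industry. }}
      OPTIONAL {{ ?company dbo:country ?country. }}
      OPTIONAL {{ ?company dbo:abstract ?abstract. FILTER (lang(?abstract) = 'en') }}
      OPTIONAL {{ ?company foaf:isPrimaryTopicOf ?wikiPage. }}

      FILTER (CONTAINS(LCASE(?label), LCASE("{safe_name}")))
      FILTER (lang(?label) = 'en')
    }}
    LIMIT 5
    """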

    sparql = SPARQLWrapper(DBPEDIA_SPARQL_URL)
    sparql.setQuery(sparql_query)
    sparql.setReturnFormat(JSON)
    
    results = sparql.query().convert()
    matches = results["results"]["bindings"]

    if not matches:
        return None

    # Rank candidates by string similarity between the query name and each label
    ranked_matches = sorted(
        matches,
        key=lambda x: SequenceMatcher(None, org_name.lower(), x["label"]["value"].lower()).ratio(),
        reverse=True,
    )

    # Best match
    best_match = ranked_matches[0]

    return {
        "Matched Entity": best_match["label"]["value"],
        "Industry": best_match["industry"]["value"] if "industry" in best_match else "Unknown",
        "Country": best_match["country"]["value"] if "country" in best_match else "Unknown",
        "Description": best_match["abstract"]["value"] if "abstract" in best_match else "No description available",
        "Wikipedia URL": best_match["wikiPage"]["value"] if "wikiPage" in best_match else "No URL available"
    }

if __name__ == "__main__":
    org_name = input("Enter an organization name: ")
    best_match = get_best_match(org_name)

    if best_match:
        print("Best Matching Entity Found:")
        for key, value in best_match.items():
            print(f"{key}: {value}")
    else:
        print("No matching entity found.")


Best Matching Entity Found:
Matched Entity: Microsoft Press
Industry: Unknown
Country: http://dbpedia.org/resource/United_States
Description: Microsoft Press is the publishing arm of Microsoft, usually releasing books dealing with various current Microsoft technologies. Microsoft Press' first introduced books were The Apple Macintosh Book by Cary Lu and Exploring the IBM PCjr Home Computer by Peter Norton in 1984 at the West Coast Computer Faire. The publisher has gone on to release books by other recognizable authors such as Charles Petzold, Steve McConnell, Mark Russinovich and . Following a deal signed in 2009, O'Reilly Media became the official distributor of Microsoft Press books. In 2014, the distributor was changed to Pearson. In July 2016, Microsoft Press editorial staff was laid off.
Wikipedia URL: http://en.wikipedia.org/wiki/Microsoft_Press
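The query stops at `LIMIT 5` without an `ORDER BY`. As a result, the endpoint may return any five matching rows, and the similarity ranking can only choose among them. Each combination of optional values (for example, several industries) also produces its own row, so five rows can cover fewer than five companies. The input for the run above is not recorded. If a broad name such as "Microsoft" was entered, these limits would be one plausible reason the result is a subsidiary.

One possible mitigation, sketched here as an assumption rather than the notebook's method, is to request the shortest matching labels first with the standard SPARQL 1.1 functions `STRLEN` and `STR`. The sketch also escapes the name and keeps the abstract language filter inside its `OPTIONAL` block. The hypothetical `build_query` helper only builds the query string, so you can inspect it without network access.

```python
def build_query(org_name, limit=5):
    """Build a candidate-retrieval query that prefers short labels (hypothetical helper)."""
    # Escape backslashes and double quotes for use inside a SPARQL string literal
    safe_name = org_name.replace("\\", "\\\\").replace('"', '\\"')
    # Assumption: among labels that contain the name, the shortest is most likely the entity itself
    return f"""
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?company ?label ?industry ?country ?abstract ?wikiPage WHERE {{
      ?company rdf:type dbo:Company.
      ?company rdfs:label ?label.
      OPTIONAL {{ ?company dbo:industry ?industry. }}
      OPTIONAL {{ ?company dbo:country ?country. }}
      OPTIONAL {{ ?company dbo:abstract ?abstract. FILTER (lang(?abstract) = 'en') }}
      OPTIONAL {{ ?company foaf:isPrimaryTopicOf ?wikiPage. }}
      FILTER (CONTAINS(LCASE(?label), LCASE("{safe_name}")))
      FILTER (lang(?label) = 'en')
    }}
    ORDER BY STRLEN(STR(?label))
    LIMIT {int(limit)}
    """


print(build_query("Microsoft"))
```

To try it, pass the result to `sparql.setQuery(build_query(org_name))` in place of the inline query. Sorting forces the endpoint to consider every matching row, so the query may be slower on the public DBpedia server, especially for short names.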


In [3]:
# Test the function with an example entity
org_name = "Apple Inc."  # Replace with an extracted organization name
best_match = get_best_match(org_name)

# Display results
if best_match:
    print("Best Matching Entity Found:")
    for key, value in best_match.items():
        print(f"{key}: {value}")
else:
    print("No matching entity found.")

Best Matching Entity Found:
Matched Entity: Apple Inc.
Industry: http://dbpedia.org/resource/Consumer_electronics
Country: Unknown
Description: Apple Inc. is an American multinational technology company headquartered in Cupertino, California, United States. Apple is the largest technology company by revenue (totaling US$365.8 billion in 2021) and, as of June 2022, is the world's biggest company by market capitalization, the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet, Amazon, Meta, and Microsoft. Apple was founded as Apple Computer Company on April 1, 1976, by Steve Jobs, Steve Wozniak and Ronald Wayne to develop and sell Wozniak's Apple I personal computer. It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977 and the company's next computer, the Apple II, became a best seller and one of the first mass-produced microcomputers
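The `Industry` and `Country` fields come back as DBpedia resource URIs (for example `http://dbpedia.org/resource/Consumer_electronics`) because the query selects the objects of `dbo:industry` and `dbo:country` directly. A lightweight way to display them is to take the part after the resource prefix, percent-decode it, and replace underscores with spaces. This minimal sketch relies on DBpedia's naming convention for resource URIs and does not replace fetching each resource's `rdfs:label`. The helper name `uri_to_display_name` is hypothetical.

```python
from urllib.parse import unquote

# DBpedia resource namespaces (http is what the endpoint returns; https handled for safety)
RESOURCE_PREFIXES = ("http://dbpedia.org/resource/", "https://dbpedia.org/resource/")


def uri_to_display_name(value):
    """Turn a DBpedia resource URI into a readable name; pass any other value through unchanged."""
    for prefix in RESOURCE_PREFIXES:
        if value.startswith(prefix):
            # Everything after the prefix, percent-decoded, with underscores shown as spaces
            return unquote(value[len(prefix):]).replace("_", " ")
    return value  # e.g. the "Unknown" placeholder or a plain literal


print(uri_to_display_name("http://dbpedia.org/resource/Consumer_electronics"))  # Consumer electronics
print(uri_to_display_name("http://dbpedia.org/resource/United_States"))  # United States
print(uri_to_display_name("Unknown"))  # Unknown
```

You could apply it to the returned dictionary, for example `best_match["Industry"] = uri_to_display_name(best_match["Industry"])`, before printing.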