<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/neo4j-partners/apevue-knowledge-graph/blob/master/enrich.ipynb" target="_blank">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/neo4j-partners/apevue-knowledge-graph/blob/master/enrich.ipynb" target="_blank">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/neo4j-partners/apevue-knowledge-graph/main/enrich.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">Open in Vertex AI Workbench
    </a>
</td>
</table>

# Enrich
In this notebook, we'll enrich the data we previously loaded using Google Enterprise Knowledge Graph (EKG).

## Google Enterprise Knowledge Graph
Google Enterprise Knowledge Graph (EKG) is a tool for enriching datasets with data that Google has collected and made available through the EKG API. You can use it to enrich datasets stored in Neo4j, the leading graph database. In this notebook we'll show how to do that.

Some helpful resources on EKG are:

* Documentation - [Google Knowledge Graph Search API](https://developers.google.com/knowledge-graph)
* Documentation - [Enterprise Knowledge Graph field enrichment](https://developers.google.com/knowledge-graph)
* Blog Post - [Add intelligence to your document processing with Google's Enterprise Knowledge Graph](https://cloud.google.com/blog/products/ai-machine-learning/improves-document-ai-accuracy-and-consistency-with-ekg)
* Support - [How Google's Knowledge Graph works](https://support.google.com/knowledgepanel/answer/9787176?hl=en)
* Wikipedia - [Google Knowledge Graph](https://en.wikipedia.org/wiki/Google_Knowledge_Graph)


## Getting Started with EKG

We're going to need to enable the Knowledge Graph Search API.  To do so, you'll need a Google Cloud account.  If you don't have one, you can sign up for one [here](https://console.cloud.google.com/freetrial).

With that complete, navigate [here](https://console.cloud.google.com/marketplace/product/google/kgsearch.googleapis.com) and click "Enable."

Next you're going to need credentials.  Click on the "Credentials" link and then click "Create Credentials."  Select "API Key."  Copy your key and paste it below:

In [None]:
api_key='<enter your key here>'
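If you'd rather not save the key inside the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `GOOGLE_API_KEY`; that name is our own choice for illustration, not something the API requires, and `load_api_key` is a hypothetical helper.

```python
import os

def load_api_key(env_var="GOOGLE_API_KEY", fallback=None):
    """Return the API key from an environment variable, or `fallback` if it is unset.

    GOOGLE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if fallback is not None:
        return fallback
    raise RuntimeError(f"Set the {env_var} environment variable to your API key.")

# In the notebook you might prefer the environment variable and keep the
# pasted value as a fallback:
# api_key = load_api_key(fallback=api_key)
```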

Ultimately we want to use EKG to enrich the data we already loaded into Neo4j. First, let's start with a simple search. One of the companies in the ApeVue dataset we loaded earlier is Impossible Foods. Let's search EKG for it.

In [None]:
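import json
import pprint
# Import the submodules explicitly; `import urllib` alone does not load them.
import urllib.parse
import urllib.request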

query = 'Impossible Foods'
service_url = 'https://kgsearch.googleapis.com/v1/entities:search'
params = {
    'query': query,
    'limit': 1,
    'indent': True,
    'key': api_key,
}
url = service_url + '?' + urllib.parse.urlencode(params)
response = json.loads(urllib.request.urlopen(url).read())
for element in response['itemListElement']:
    print(element['result']['name'] + ' (' + str(element['resultScore']) + ')')

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response['itemListElement'][0])

OK, that's pretty cool. We looked up "Impossible Foods" and EKG told us that it's an organization, a corporation, and a thing. Its description is "Food company," which is pretty reasonable.
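The response is JSON-LD: each entry in `itemListElement` carries a `result` object (with `name`, an `@type` list, and usually a `description`) plus a `resultScore`. As a sketch, here is a small helper that pulls those fields out. It runs against a hand-written dict shaped like a response; the values are illustrative placeholders, not real API output, and `summarize_results` is a hypothetical name.

```python
def summarize_results(response):
    """Return (name, types, description, score) tuples from a Search API response dict."""
    summary = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        summary.append((
            result.get("name"),
            result.get("@type", []),
            result.get("description"),  # some entities have no description
            element.get("resultScore"),
        ))
    return summary

# Hand-written sample shaped like a Search API response; values are placeholders.
sample = {
    "itemListElement": [
        {
            "result": {
                "name": "Impossible Foods",
                "@type": ["Organization", "Corporation", "Thing"],
                "description": "Food company",
            },
            "resultScore": 100.0,
        }
    ]
}

for name, types, description, score in summarize_results(sample):
    print(f"{name} ({score}): {description} | types: {', '.join(types)}")
```

Passing the real `response` from the cell above should work the same way, since the helper only reads the keys shown.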

Let's wrap the search code above in a function that we can reuse later. We'll also add a little error handling in case EKG doesn't return a result; in that case, we'll default the description to "Company."

In [None]:
def search_ekg(query):
    service_url = 'https://kgsearch.googleapis.com/v1/entities:search'
    params = {
        'query': query,
        'limit': 1,
        'indent': True,
        'key': api_key,
    }
    url = service_url + '?' + urllib.parse.urlencode(params)
    response = json.loads(urllib.request.urlopen(url).read())

    try:
        description = response['itemListElement'][0]['result']['description']
    except (IndexError, KeyError):
        # No match, or the top match has no description
        description = 'Company'
    return description

Let's test our new function out.

In [None]:
search_ekg('Impossible Foods')

That looks good.  
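Because `search_ekg` mixes the HTTP call with the parsing, its fallback behavior is hard to check without a network connection. One option is to move the parsing into its own function that works on plain dicts. The sketch below uses a hypothetical `extract_description` helper; `search_ekg` could then end with `return extract_description(response)`.

```python
def extract_description(response, default="Company"):
    """Return the top result's description, or `default` if there isn't one."""
    try:
        return response["itemListElement"][0]["result"]["description"]
    except (KeyError, IndexError):
        # Empty result list, missing keys, or a top result without a description.
        return default

# A response with no matches falls back to the default.
print(extract_description({"itemListElement": []}))
```

Keeping the default as a parameter also makes it easy to use something other than "Company" for non-firm nodes.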

Now let's try something more advanced.  We're going to connect to the Neo4j instance we were using in the previous notebook and enrich that graph with data from EKG.

## Connect to Neo4j
We'll run the same commands as before to create a Neo4j connection.

In [None]:
%pip install graphdatascience

In [None]:
# Edit these variables!
DB_URL = "neo4j+s://XXXXX.databases.neo4j.io"
DB_PASS = "<your-password>"

# You can leave this default
DB_USER = "neo4j"

In [None]:
from graphdatascience import GraphDataScience

gds = GraphDataScience(DB_URL, auth=(DB_USER, DB_PASS))

## Enriching our Graph with EKG
Now let's run a Cypher query that gets the firms in our graph.

In [None]:
result = gds.run_cypher(
    """
        MATCH (n:Firm) RETURN n
    """
)
display(result)

The formatting of this output leaves something to be desired. We can improve it by turning each node's properties into a DataFrame row.

In [None]:
import pandas as pd
df = pd.DataFrame([dict(record.items()) for record in result['n']])
df

OK. Now we'll iterate through our dataframe and grab the name of every firm. For each firm, we'll look it up in EKG to get a description, then update the firm node in Neo4j with that description.

In [None]:
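for index, row in df.iterrows():
    name = row['name']
    description = search_ekg(name)
    print(name + ' - ' + description)

    # Pass values as query parameters so a name or description containing
    # a quote (e.g. an apostrophe) can't break the Cypher statement.
    gds.run_cypher(
        "MATCH (n:Firm {name: $name}) SET n.description = $description RETURN n",
        params={'name': name, 'description': description},
    )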
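The loop above sends one Cypher query per firm. An alternative is to collect all the `(name, description)` pairs first and send them in a single `UNWIND` query through a `$rows` parameter. This is a sketch: `build_rows` and `UPDATE_DESCRIPTIONS` are hypothetical names, and `lookup` is passed in (for example `search_ekg`) so the row building can be checked without calling the API.

```python
# Cypher that updates every firm in one round trip; $rows is a list of maps.
UPDATE_DESCRIPTIONS = """
UNWIND $rows AS row
MATCH (n:Firm {name: row.name})
SET n.description = row.description
"""

def build_rows(names, lookup):
    """Pair each firm name with the description returned by lookup(name)."""
    return [{"name": name, "description": lookup(name)} for name in names]

# In the notebook, something like:
# rows = build_rows(df['name'], search_ekg)
# gds.run_cypher(UPDATE_DESCRIPTIONS, params={'rows': rows})
```

Since the values travel as parameters rather than being spliced into the query text, names containing apostrophes need no escaping.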


Now, let's query the firms in the graph again to check how we did.

In [None]:
result = gds.run_cypher(
    """
        MATCH (n:Firm) RETURN n
    """
)
display(result)

In [None]:
import pandas as pd
df = pd.DataFrame([dict(record.items()) for record in result['n']])
df

This is pretty neat. Some firms, like Impossible Foods, are tagged correctly. We do have some ambiguity issues, though. The most obvious is Gorillas, which comes back as a primate. While technically true, that's probably not the gorilla we were looking for.
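One way to reduce this kind of ambiguity may be the Search API's `types` parameter, which restricts results to entities of the given schema.org types, such as `Organization`. The sketch below only builds the request URL; `build_search_url` is a hypothetical helper, and we haven't verified that restricting to `Organization` returns the Gorillas company rather than no result at all.

```python
import urllib.parse

SERVICE_URL = "https://kgsearch.googleapis.com/v1/entities:search"

def build_search_url(query, key, types=None, limit=1):
    """Build a Search API URL, optionally restricting results to schema.org types."""
    params = {"query": query, "limit": limit, "key": key}
    if types:
        params["types"] = list(types)
    # doseq=True expands a list into repeated parameters: types=A&types=B
    return SERVICE_URL + "?" + urllib.parse.urlencode(params, doseq=True)

print(build_search_url("Gorillas", "YOUR_KEY", types=["Organization"]))
```

The resulting URL could be fetched exactly as in `search_ekg`, so the rest of the enrichment loop would stay the same.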

Regardless, we've now enriched our data with EKG!

# Conclusion
In this notebook, you enriched the graph we built in Neo4j Aura DS with additional data from Google Enterprise Knowledge Graph. The next step is to explore the enriched graph.