<a href="https://colab.research.google.com/github/mneedham/data-science-training/blob/master/03_Recommendations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recommendations: Part 1

In this notebook we're going to learn how to make recommendations using Neo4j. As with the other notebooks, let's start by getting our environment set up.

Let's import the libraries we'll need:

In [None]:
from neo4j import GraphDatabase
import pandas as pd

import matplotlib 
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')
pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.set_option('display.max_colwidth', 100)


Update the cell below with the same Sandbox credentials that you used in the first notebook:

In [None]:
driver = GraphDatabase.driver("bolt://data-science-training-neo4j", auth=("neo4j", "admin"))        
print(driver.address)

## Finding popular authors

Since we're going to make collaborator suggestions, let's first find the authors who have written the most articles so that we have some data to work with.

In [None]:
popular_authors_query = """
MATCH (author:Author)
RETURN author.name, size((author)<-[:AUTHOR]-()) AS articlesPublished
ORDER BY articlesPublished DESC
LIMIT 10
"""

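with driver.session() as session:
    result = session.run(popular_authors_query)
    # Build the DataFrame while the session is still open,
    # so the records are read before it closes
    popular_authors = pd.DataFrame([dict(record) for record in result])

popular_authors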
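
The run-then-convert pattern in the cell above recurs throughout this notebook. As a minimal sketch, a small helper could wrap it. The name `run_query` is hypothetical (it is not part of the Neo4j driver), and it only assumes that `driver` exposes `session()` and that the session exposes `run()`, as in the cells here:

```python
def run_query(driver, query, params=None):
    """Run a Cypher query and return its rows as a list of plain dicts.

    `run_query` is a hypothetical convenience helper, not a driver API.
    """
    with driver.session() as session:
        result = session.run(query, params or {})
        # Copy every record into a dict before the session closes
        return [dict(record) for record in result]
```

With this in place, a cell could be reduced to something like `pd.DataFrame(run_query(driver, popular_authors_query))`.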

Let's pick one of these authors...

In [None]:
author_name = "Tao Xie"

And let's have a look at which articles they've published and how many citations each one has received:

In [None]:
author_articles_query = """
MATCH (:Author {name: $authorName})<-[:AUTHOR]-(article)
RETURN article.title AS article, article.year AS year, size((article)<-[:CITED]-()) AS citations
ORDER BY citations DESC
LIMIT 20
"""

with driver.session() as session:
    result = session.run(author_articles_query, {"authorName": author_name})
    # Read the records before the session closes
    author_articles = pd.DataFrame([dict(record) for record in result])

author_articles

Next, let's find the author's collaborators:

In [None]:
collaborations_query = """
MATCH (:Author {name: $authorName})<-[:AUTHOR]-(article)-[:AUTHOR]->(coauthor)
RETURN coauthor.name AS coauthor, count(*) AS collaborations
ORDER BY collaborations DESC
LIMIT 10
"""

with driver.session() as session:
    result = session.run(collaborations_query, {"authorName": author_name})
    # Read the records before the session closes
    collaborations = pd.DataFrame([dict(record) for record in result])

collaborations

How would we suggest some future collaborators for this author? One way is by looking at the collaborators of their collaborators!

In [None]:
collaborations_query = """
MATCH (author:Author {name: $authorName})<-[:AUTHOR]-(article)-[:AUTHOR]->(coauthor),
      (coauthor)<-[:AUTHOR]-()-[:AUTHOR]->(coc)
WHERE NOT ((coc)<-[:AUTHOR]-()-[:AUTHOR]->(author)) AND coc <> author
RETURN coc.name AS coauthor, count(*) AS collaborations
ORDER BY collaborations DESC
LIMIT 10
"""

with driver.session() as session:
    result = session.run(collaborations_query, {"authorName": author_name})
    # Read the records before the session closes
    suggestions = pd.DataFrame([dict(record) for record in result])

suggestions

Each of these people has collaborated with someone that Tao Xie has worked with before, so that shared collaborator might be able to make an introduction.
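
To see what the collaborators-of-collaborators query is counting, here is a rough pure-Python sketch of the same idea on a tiny in-memory dataset. The article IDs, author names, and the `suggest_collaborators` function are all made up for illustration, and the sketch assumes each author appears at most once per article. Like `count(*)` in the Cypher query, it counts one point per path from the author, through a shared article, to a co-author, and on through another article to a candidate.

```python
from collections import Counter

# Hypothetical toy data: each article ID maps to the names of its authors
articles = {
    "a1": ["Ann", "Bob"],
    "a2": ["Bob", "Cat"],
    "a3": ["Bob", "Cat", "Dan"],
    "a4": ["Ann", "Eve"],
    "a5": ["Eve", "Fay"],
}


def suggest_collaborators(articles, author, limit=10):
    """Rank collaborators-of-collaborators that `author` has not worked with."""
    # Index the articles written by each author
    written = {}
    for article, names in articles.items():
        for name in names:
            written.setdefault(name, set()).add(article)

    # Direct co-authors, counted once per shared article
    coauthors = Counter(
        name
        for article in written.get(author, ())
        for name in articles[article]
        if name != author
    )

    scores = Counter()
    for coauthor, shared in coauthors.items():
        for article in written[coauthor]:
            for coc in articles[article]:
                # Skip the author and anyone they have already worked with
                if coc != author and coc not in coauthors:
                    scores[coc] += shared

    # Highest count first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


print(suggest_collaborators(articles, "Ann"))
# [('Cat', 2), ('Dan', 1), ('Fay', 1)]
```

Ann has worked with Bob and Eve, so neither is suggested. Cat scores highest because she shares two articles with Bob.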


## Exercise

* Can you find the top 20 suggested collaborators for 'Brian Fitzgerald' or 'Peter G. Neumann' instead of 'Tao Xie'?
* How many of these potential collaborators have collaborated with Brian's collaborators more than 3 times?
