# News Relevance with Knowledge Graphs
In this section, I demonstrate how to discover details of emerging crypto projects by building a knowledge graph. A knowledge graph organizes information according to an ontology: entities become nodes, and the relationships between them become edges.

It turns out that many real-world systems can be modelled as networks. By choosing this model, we can leverage many graph algorithms and perform interesting tasks.

In this case, we will monitor specially defined relationships (predicates) that connect an identified list of relevant crypto projects (subjects) to related crypto projects, companies, people involved, and news articles. These related entities appear as objects in a triple and can themselves be subjects of other triples.
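To make the graph model concrete, here is a minimal sketch in plain Python of how (subject, predicate, object) triples form a network that a standard graph algorithm can traverse. The entity names (`ProjectA`, `Alice`, and so on) and the helper functions `build_adjacency` and `related_entities` are hypothetical placeholders, not part of this notebook's data or of any library. The traversal is an ordinary breadth-first search that treats edges as undirected.

```python
from collections import defaultdict, deque

# Illustrative triples only; these entities and relations are made-up placeholders.
triples = [
    ("ProjectA", "founded by", "Alice"),
    ("ProjectB", "founded by", "Alice"),
    ("ProjectB", "part of", "EcosystemX"),
    ("Article1", "mentions", "ProjectB"),
]


def build_adjacency(triples):
    """Map each entity to the set of (predicate, neighbour) pairs it touches."""
    adjacency = defaultdict(set)
    for subject, predicate, obj in triples:
        adjacency[subject].add((predicate, obj))
        # Store the reverse direction too, so reachability ignores edge direction.
        adjacency[obj].add((predicate, subject))
    return adjacency


def related_entities(adjacency, start, max_hops):
    """Breadth-first search: every entity within max_hops edges of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


adjacency = build_adjacency(triples)
print(sorted(related_entities(adjacency, "ProjectA", 2)))
```

In this toy graph, `ProjectA` reaches `ProjectB` in two hops because both share a founder. Surfacing indirect connections like this is the kind of inference the knowledge graph is meant to support.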

We will perform the following tasks to achieve this objective:

1. Build the graph through triples extraction
2. Extract information from news
3. Add news data to the knowledge graph
4. Make inferences

We use the qwikidata package to query Wikidata for information about entities.


## 1. Imports

In [1]:
!pip install qwikidata

Collecting qwikidata
  Downloading qwikidata-0.4.0-py3-none-any.whl (20 kB)
Installing collected packages: qwikidata
Successfully installed qwikidata-0.4.0


## Triples Extraction from WikiData
Given a list of subjects and a list of predicates we are interested in, let's write a function that extracts the list of (subject, predicate, object) triples.

In [15]:
#Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from qwikidata.entity import WikidataItem, WikidataLexeme, WikidataProperty
from qwikidata.linked_data_interface import get_entity_dict_from_api

# Progress bar
from tqdm.notebook import tqdm

Let's write a function that takes the list of companies we want to consider, along with the relationships (predicates) these companies have with other entities. The function should return the (subject, predicate, object) triples, where the subject is the company. For example, we would like:

In [None]:
# List of Companies/Financial Entities
KG_financial = ["Amazon Web Services", "Bitcoin", "Bank of Montreal", "Ethereum", "Adastra Corporation", "Polygon", "Goldman Sachs", "Bloomberg", "Google", "TD", "Apple", "Tesla", "CME"]


In [None]:
q42_dict = get_entity_dict_from_api("Q42")
q42 = WikidataItem(q42_dict)

In [23]:
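def get_triples_from_wikidata(companies_list, predicate_list):
    """
    Inputs: companies_list - a list of company Wikidata item ids (Q ids).
            predicate_list - a list of Wikidata property ids (P ids).
    Outputs: three parallel lists of labels (subjects, predicates, objects);
             position i across the lists forms one (company, predicate, object) triple.
    """
    subjects, predicates, objects = [], [], []
    # Fetch each property once and keep its label, rather than once per claim
    predicate_labels = {
        predicate: WikidataProperty(get_entity_dict_from_api(predicate)).get_label()
        for predicate in predicate_list
    }
    # For each company, fetch its WikidataItem by Q id
    for Q_id in tqdm(companies_list):
        print(companies_list)  # debug trace: prints the full list of Q ids on every iteration
        # WikidataItem - https://qwikidata.readthedocs.io/en/stable/entity.html?highlight=WikidataItem
        Q_company = WikidataItem(get_entity_dict_from_api(Q_id))
        for predicate in predicate_list:
            # A claim group holds every statement the item has for this property
            for claim in Q_company.get_claim_group(predicate):
                # Skip "no value" / "unknown value" claims, which carry no object
                if claim.mainsnak.snaktype != "value":
                    continue
                object_id = claim.mainsnak.datavalue.value["id"]
                object_entity = WikidataItem(get_entity_dict_from_api(object_id))

                subjects.append(Q_company.get_label())
                # Store the readable property label, not the WikidataProperty object
                predicates.append(predicate_labels[predicate])
                objects.append(object_entity.get_label())
    return subjects, predicates, objects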
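The function returns three parallel lists rather than a list of triples, so a small post-processing step is useful before building the graph. The sketch below assumes all three lists hold plain-text labels. The sample values are illustrative stand-ins rather than actual output from Wikidata, and `by_subject` is a hypothetical helper structure.

```python
from collections import defaultdict

# Illustrative values only, shaped like the three lists the function returns.
subjects = ["Google", "Google"]
predicates = ["instance of", "country"]
objects = ["technology company", "United States of America"]

# Position i across the three lists forms one (subject, predicate, object) triple.
triples = list(zip(subjects, predicates, objects))

# Group the outgoing edges by subject, which is convenient for quick lookups.
by_subject = defaultdict(list)
for subject, predicate, obj in triples:
    by_subject[subject].append((predicate, obj))

for subject, edges in by_subject.items():
    for predicate, obj in edges:
        print(f"({subject}, {predicate}, {obj})")
```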
    

### Companies List

In [24]:
# TODO: generate these Q ids from KG_financial (insert the name-to-Q-id conversion query here)
companies_list = ["Q95", "Q2283", "Q193326", "Q744149", "Q1418", "Q312", "Q41187", "Q20716", "Q248", "Q37156", "Q17077936", "Q355", "Q23548", "Q3884", "Q66", "Q4781944", "Q478214",
                  "Q907311"]
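One possible way to fill in the missing name-to-Q-id conversion is Wikidata's `wbsearchentities` API action, which returns candidate entities for a search string. The sketch below is an assumption about how this step could be done, not the method used to produce the list above. The helper names `build_search_url`, `candidates_from_response` and `search_wikidata`, and the `User-Agent` string, are hypothetical. It uses only the standard library and keeps response parsing separate from the network call.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=5):
    """Build a wbsearchentities request URL for an entity name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def candidates_from_response(response):
    """Extract (Q id, label, description) tuples from a parsed JSON response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def search_wikidata(name):
    """Query Wikidata and return candidate entities for a name (needs network access)."""
    request = urllib.request.Request(
        build_search_url(name),
        headers={"User-Agent": "kg-notebook-example/0.1"},  # placeholder identifier
    )
    with urllib.request.urlopen(request) as reply:
        return candidates_from_response(json.load(reply))


# Example usage (commented out because it performs live HTTP requests):
# for name in ["Bank of Montreal", "Ethereum"]:
#     print(name, search_wikidata(name)[:3])
```

Short or ambiguous names such as "Polygon", "Apple" or "TD" can match many unrelated entities, so the returned candidates should be reviewed by hand (for example by reading each description) before a Q id is added to `companies_list`.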
    

### Predicate List
The predicate list holds the relationships we would like to capture in our knowledge graph. For example, P31 is "instance of", which could be used in a triple such as (Google, instance of, technology company).

In [25]:
predicate_list = ["P31", "P17", "P361", "P452", "P112", "P169", "P463", "P355", "P1830", "P1056"]

In [None]:
subjects, predicates, objects = get_triples_from_wikidata(companies_list, predicate_list)
print(subjects)

  0%|          | 0/18 [00:00<?, ?it/s]

['Q95', 'Q2283', 'Q193326', 'Q744149', 'Q1418', 'Q312', 'Q41187', 'Q20716', 'Q248', 'Q37156', 'Q17077936', 'Q355', 'Q23548', 'Q3884', 'Q66', 'Q4781944', 'Q478214', 'Q907311']
['Q95', 'Q2283', 'Q193326', 'Q744149', 'Q1418', 'Q312', 'Q41187', 'Q20716', 'Q248', 'Q37156', 'Q17077936', 'Q355', 'Q23548', 'Q3884', 'Q66', 'Q4781944', 'Q478214', 'Q907311']
['Q95', 'Q2283', 'Q193326', 'Q744149', 'Q1418', 'Q312', 'Q41187', 'Q20716', 'Q248', 'Q37156', 'Q17077936', 'Q355', 'Q23548', 'Q3884', 'Q66', 'Q4781944', 'Q478214', 'Q907311']
['Q95', 'Q2283', 'Q193326', 'Q744149', 'Q1418', 'Q312', 'Q41187', 'Q20716', 'Q248', 'Q37156', 'Q17077936', 'Q355', 'Q23548', 'Q3884', 'Q66', 'Q4781944', 'Q478214', 'Q907311']
['Q95', 'Q2283', 'Q193326', 'Q744149', 'Q1418', 'Q312', 'Q41187', 'Q20716', 'Q248', 'Q37156', 'Q17077936', 'Q355', 'Q23548', 'Q3884', 'Q66', 'Q4781944', 'Q478214', 'Q907311']
['Q95', 'Q2283', 'Q193326', 'Q744149', 'Q1418', 'Q312', 'Q41187', 'Q20716', 'Q248', 'Q37156', 'Q17077936', 'Q355', 'Q23548', 