# Getting Started with Affinity

This notebook gives practical examples of how to use the Affinity endpoints of the Signal AI API.

Full technical documentation of the Affinity endpoints is available here:
https://api.signal-ai.com/docs#tag/Concept-Affinity


## What is Affinity?


The Signal platform processes a huge amount of unstructured data (text) every day and enriches it with meaningful classifications (entities, topics, sentiment). This lets our users retrieve **relevant information**, for example articles about their brand, and the Search endpoints of the Signal AI API let you do exactly that.

Taking the platform to the next level, we built **the Signal knowledge graph** - a powerful structure to represent connections between entities and topics and how they change over time. With this knowledge graph and through the **Affinity** capabilities in the API, you will be able to ask questions about the data and find unknowns.

To give an idea, here are examples of questions that you will be able to answer with the Affinity endpoints:

a. What are currently the top topics associated with my brand (in media)? Is Sustainability one of them?

b. How is the conversation changing around a certain company? Are there any risky topics associated with it?


## 0. Setting things up!

### 0.1 Prerequisites

Please make sure that you have familiarised yourself with the Signal AI API using the [Getting Started notebook](getting_started.ipynb).

In particular, you will need to be able to use the following endpoints:

1- Authentication: to be able to access the API

2- Discovery: to be able to search for entities and topics of interest

Note: run `pip install -r requirements.txt` to install the dependencies for this notebook

In [27]:
import backoff
import requests
import os
import pandas as pd

### 0.2 Check if authentication works! 

You will need a `client_id` and `client_secret` to gain access to the API. The code below assumes they have been set as the environment variables `SIGNAL_API_CLIENT_ID` and `SIGNAL_API_CLIENT_SECRET` respectively.

Using your credentials, you can request a temporary access token from the API at:
https://api.signal-ai.com/auth/token
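If you prefer to keep credentials out of the notebook entirely, a small helper can read them from the environment and fail with a clear message when one is missing. This is a minimal sketch; `load_credentials` is a hypothetical helper name, not part of the Signal AI API, and the default variable names are the ones this notebook assumes.

```python
import os


def load_credentials(
    id_var: str = "SIGNAL_API_CLIENT_ID",
    secret_var: str = "SIGNAL_API_CLIENT_SECRET",
) -> tuple[str, str]:
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper: raises a RuntimeError naming every missing or empty
    variable, instead of a bare KeyError for the first one.
    """
    missing = [name for name in (id_var, secret_var) if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return os.environ[id_var], os.environ[secret_var]
```

Usage: `client_id, client_secret = load_credentials()`, then pass both values to `authenticate`.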

In [18]:
def authenticate(client_id, client_secret, url = "https://api.signal-ai.com"):
    """ obtain a temporary access token using user credentials """
    token_url = f'{url}/auth/token'
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret
    }
    response = requests.post(token_url, data=payload)
    return response.json().get("access_token")

In [23]:
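# Replace the placeholders with your own credentials, or set these variables in your shell before starting Jupyter
os.environ.setdefault('SIGNAL_API_CLIENT_ID', '<your-client-id>')
os.environ.setdefault('SIGNAL_API_CLIENT_SECRET', '<your-client-secret>')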

In [25]:
TEMP_ACCESS_TOKEN = authenticate(os.environ['SIGNAL_API_CLIENT_ID'], os.environ['SIGNAL_API_CLIENT_SECRET'])
if TEMP_ACCESS_TOKEN:
    print('Congratulations! You have an access token, it will last for 24 hours before you will need to reauthenticate by repeating this step')
else:
    print('Error: Perhaps the credentials are incorrect?')

Congratulations! You have an access token, it will last for 24 hours before you will need to reauthenticate by repeating this step
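The message above says a token lasts 24 hours. Rather than re-running the authentication cell by hand, a small cache can fetch a fresh token only when the current one is close to expiry. This is a minimal sketch: `TokenCache` is a hypothetical helper, the 24-hour lifetime is taken from the message above rather than read from the API response, and the five-minute safety margin is an arbitrary choice.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token until it is near expiry, then fetch a new one.

    `fetch_token` is any zero-argument callable returning a token string
    (or None on failure). The default lifetime assumes the 24-hour validity
    printed by the authentication cell; the margin is an arbitrary buffer.
    """

    def __init__(
        self,
        fetch_token: Callable[[], Optional[str]],
        lifetime_s: float = 24 * 60 * 60,
        margin_s: float = 5 * 60,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token
        self._max_age = lifetime_s - margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return a cached token, re-authenticating if it is missing or too old."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            token = self._fetch_token()
            if not token:
                raise RuntimeError("Authentication failed: no access token returned")
            self._token, self._fetched_at = token, now
        return self._token
```

For example, `tokens = TokenCache(lambda: authenticate(os.environ['SIGNAL_API_CLIENT_ID'], os.environ['SIGNAL_API_CLIENT_SECRET']))`, then build each header with `f'Bearer {tokens.get()}'`. Injecting the clock keeps the class testable without waiting a day.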


## Using Affinity

The Affinity endpoints allow API users to leverage the Signal knowledge graph, which we construct from our content and update over time.

Below we give examples of useful questions that can be answered with this knowledge graph.
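All of the examples below call one of two relationships endpoints, `/affinity/topics/{id}/relationships` or `/affinity/entities/{id}/relationships`, with query parameters such as `rel-type`, `start-month`, `end-month` and `size`. The sketch below builds the full request URL so you can inspect exactly what is sent. `relationships_url` is a hypothetical helper; list values (as in the `entity-id` list of example 1.4) are encoded as repeated keys, which is also how `requests` encodes a list passed in `params`. That the API expects this form is an assumption based on that example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.signal-ai.com"


def relationships_url(concept_kind: str, concept_id: str, params: dict) -> str:
    """Build the full URL of an Affinity relationships request.

    concept_kind must be "topics" or "entities", matching the two URL
    patterns used in this notebook. List values become repeated keys
    (e.g. entity-id=a&entity-id=b) thanks to doseq=True.
    """
    if concept_kind not in ("topics", "entities"):
        raise ValueError(f"unsupported concept kind: {concept_kind!r}")
    path = f"{BASE_URL}/affinity/{concept_kind}/{quote(concept_id)}/relationships"
    return f"{path}?{urlencode(params, doseq=True)}"
```

For instance, `relationships_url('topics', topic_id, {'rel-type': 'proximity', 'size': 20})` returns a URL you can print, or pass straight to `requests.get` together with the `Authorization` header.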

## 1. Proximity Examples

### 1.1 Entities most associated with a topic (at a certain time)

Example use case:
- Who are currently the main active organisations within a certain industry? Are there new players in the market? Is there rising competition?



How: Using the Affinity endpoints, rank organisations by their proximity score to a topic.

Note that you will need the ID of the topic as an input. To search for topics and obtain their IDs, please consult the [Getting Started notebook](getting_started.ipynb).

**Example**: top 20 organisations in the Wearables industry in December 2020

In [45]:
topic_id = 'f861d9df-5a65-41fe-8077-1714b838f1e1' ## topic ID for Wearables
response = requests.get(
    # call the topic relationships endpoint
    'https://api.signal-ai.com/affinity/topics/{}/relationships'.format(topic_id),
    params={
        'rel-type':'proximity',
        # Limit the search to organisations
        'entity-type':'organisation',
        'start-month':'2020-12',
        'end-month': '2020-12',
        'size': 20
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

response.raise_for_status()  # stop with a clear HTTP error rather than a KeyError further down
result = response.json()


In [46]:
df = pd.DataFrame.from_dict(result['relationships'])
df[['concept-name']]

|   | concept-name |
|---|--------------|
| 0 | Fitbit |
| 1 | Garmin |
| 2 | Polar Electro |
| 3 | TAG Heuer |
| 4 | Withings |
| 5 | Gorilla glass |
| 6 | Corning Inc. |
| 7 | SmartThings |
| 8 | Xiaomi Tech |
| 9 | Mobvoi |


### 1.2 Topics most associated with an entity (at a certain time)

Example use case:
- What are currently the top topics associated with my brand (in media)? Is Sustainability one of them?

How: Using the Affinity endpoints, rank topics by their proximity score to a certain organisation, and retrieve the scores to generate a visualisation.


**Example**: top 10 topics associated with Tesla, Inc. in December 2020

In [50]:
entity_id = '11cab8df-4be1-470f-8f49-8f7f0863ec95' ##  ID for Tesla, Inc.
response = requests.get(
    # call the entities endpoint
    'https://api.signal-ai.com/affinity/entities/{}/relationships'.format(entity_id),
    params={
        'rel-type':'proximity',
        'start-month':'2020-12',
        'end-month': '2020-12',
        'size': 10
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

response.raise_for_status()  # stop with a clear HTTP error rather than a KeyError further down
result = response.json()

In [51]:
df = pd.DataFrame.from_dict(result['relationships'])
df[['concept-name']]

|   | concept-name |
|---|--------------|
| 0 | Electric Vehicles (EV) |
| 1 | Futures of Transport |
| 2 | Cleantech |
| 3 | Automotive Industry |
| 4 | Autonomous Vehicles |
| 5 | Travel |
| 6 | Decarbonisation |
| 7 | Transport |
| 8 | Sustainability |
| 9 | Executive Compensation |
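The How note for this example mentions using the scores for a visualisation. Each relationship in the response carries a `proximity` value (visible in the raw output of section 1.4), so a quick text chart needs no plotting library. This is a minimal sketch with a hypothetical helper, `proximity_bars`; bars are scaled relative to the highest score in the list. If matplotlib is installed, `df.plot.barh(x='concept-name', y='proximity')` is the pandas route to a proper chart.

```python
def proximity_bars(relationships: list[dict], width: int = 40) -> str:
    """Render proximity scores as a plain-text horizontal bar chart.

    Each line shows the concept name, a bar scaled so the highest score
    gets `width` characters, and the score to three decimals.
    """
    if not relationships:
        return ""
    top = max(rel["proximity"] for rel in relationships)
    label_width = max(len(rel["concept-name"]) for rel in relationships)
    lines = []
    for rel in relationships:
        bar_len = round(width * rel["proximity"] / top) if top > 0 else 0
        lines.append(
            f"{rel['concept-name']:<{label_width}} | {'#' * bar_len} {rel['proximity']:.3f}"
        )
    return "\n".join(lines)
```

Usage: `print(proximity_bars(result['relationships']))`.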


### 1.3 Changes of associations over time

Example use case:
- How is the conversation changing around a certain company? Are there any emerging risks?

How: Using the Affinity endpoints, identify the top topics by proximity to a certain organisation over time.



**Example**: How is the conversation about Wirecard changing between September and November 2019?

In [54]:
entity_id = 'eae834b7-5b2f-430e-99ce-770786a971cf' ##  ID for Wirecard
response = requests.get(
    # call the entities endpoint
    'https://api.signal-ai.com/affinity/entities/{}/relationships'.format(entity_id),
    params={
        'rel-type':'proximity',
        'start-month':'2019-09',
        'end-month': '2019-11',
        'size': 100
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

response.raise_for_status()  # stop with a clear HTTP error rather than a KeyError further down
result = response.json()

In [56]:
df = pd.DataFrame.from_dict(result['relationships'])
df[['concept-name']]

|     | concept-name |
|-----|--------------|
| 0   | FinTech |
| 1   | SME Lending |
| 2   | Commodity Prices |
| 3   | Active & Passive Funds |
| 4   | Banking |
| ... | ... |
| 275 | Commercial Aviation |
| 276 | Antitrust Crime |
| 277 | Financial Statements Release |
| 278 | Mining |
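The table above mixes all three months together, so it does not yet show how the conversation changes. Each relationship also carries `year` and `month` fields (see the raw output in section 1.4), which makes it possible to rank topics within each month and list the ones that newly enter the top of the ranking. This is a minimal sketch: `top_by_month` and `emerging` are hypothetical helpers, and sorting months as strings assumes `month` is zero-padded (`'01'`), as in the raw output.

```python
from collections import defaultdict


def top_by_month(relationships: list[dict], n: int = 10) -> dict[str, list[str]]:
    """Group relationships by "YYYY-MM" and keep the n highest-proximity concept names.

    The returned dict is ordered chronologically.
    """
    by_month: dict[str, list[dict]] = defaultdict(list)
    for rel in relationships:
        by_month[f"{rel['year']}-{rel['month']}"].append(rel)
    return {
        month: [r["concept-name"] for r in sorted(rels, key=lambda r: r["proximity"], reverse=True)[:n]]
        for month, rels in sorted(by_month.items())
    }


def emerging(top: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each month after the first, the concepts in its top list absent from the previous month's."""
    months = list(top)
    changes = {}
    for prev, cur in zip(months, months[1:]):
        seen = set(top[prev])
        changes[cur] = [name for name in top[cur] if name not in seen]
    return changes
```

Usage: `emerging(top_by_month(result['relationships'], n=20))` lists, month by month, the topics that moved into Wirecard's top 20, which is a quick way to spot candidate risk topics.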


### 1.4 Comparing associations over time

Example use case:
- Over the past year, how do certain organisations compare with regard to their association with a certain topic?

**Example**: How do these car manufacturers (Mercedes-Benz, General Motors and Tesla, Inc.) compare with regard to their association with the ‘Product Recall’ topic?
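The "past year" in this example is the window from `2020-01` to `2020-12`. To compute a rolling window for another date, a small helper can produce the `start-month` and `end-month` strings in the `YYYY-MM` format used throughout this notebook. `month_window` is a hypothetical helper, and treating both months as inclusive is an assumption based on examples 1.1 and 1.2, where the start and end month are the same.

```python
from datetime import date


def month_window(end: date, months: int = 12) -> tuple[str, str]:
    """Return ("YYYY-MM", "YYYY-MM") covering `months` months that end in `end`'s month."""
    if months < 1:
        raise ValueError("months must be at least 1")
    # Count months on a single axis so subtraction handles year boundaries.
    index = end.year * 12 + (end.month - 1) - (months - 1)
    start_year, start_month0 = divmod(index, 12)
    return f"{start_year:04d}-{start_month0 + 1:02d}", f"{end.year:04d}-{end.month:02d}"
```

Usage: `start, end = month_window(date.today())`, then pass `start` and `end` as the `start-month` and `end-month` parameters.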

In [59]:
topic_id = '734a342f-a053-4823-9bda-8abb687182ba' ##  ID for Product Recall
response = requests.get(
    # call the topic relationships endpoint
    'https://api.signal-ai.com/affinity/topics/{}/relationships'.format(topic_id),
    params={
        'rel-type':'proximity',
        'start-month':'2020-01',
        'end-month': '2020-12',
        'entity-id': [
            '8d9ee12f-f4f2-3dc7-a8b9-673a07bd7747', # Mercedes-Benz
            'a9cf01c5-751f-4fe5-a529-12e0d297cb63', # General Motors
            '11cab8df-4be1-470f-8f49-8f7f0863ec95', # Tesla
        ]
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

response.raise_for_status()  # stop with a clear HTTP error rather than a KeyError further down
result = response.json()
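The raw response contains one record per organisation per month, each with `proximity` and sentiment fields. Reshaping it into a month-by-organisation table makes the comparison easier to read. This is a minimal standard-library sketch with a hypothetical helper, `pivot_by_month`; with pandas, `pd.DataFrame(result['relationships']).pivot_table(index=['year', 'month'], columns='concept-name', values='proximity')` should give an equivalent DataFrame.

```python
from collections import defaultdict


def pivot_by_month(relationships: list[dict], value: str = "proximity") -> dict[str, dict[str, float]]:
    """Reshape a relationships list into {"YYYY-MM": {concept-name: value}}.

    `value` can be any numeric field of a relationship, for example
    "proximity" or "sentiment-score". Months are returned in order.
    """
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for rel in relationships:
        table[f"{rel['year']}-{rel['month']}"][rel["concept-name"]] = rel[value]
    return dict(sorted(table.items()))
```

Usage: loop over `pivot_by_month(result['relationships']).items()` and print each month's scores, or pass `value='sentiment-score'` to compare how positively each manufacturer is discussed alongside recalls.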

In [60]:
result

{'id': '734a342f-a053-4823-9bda-8abb687182ba',
 'name': 'Product Recall',
 'start-month': '2020-01',
 'end-month': '2020-12',
 'relationships': [{'positive-sentiment-score': 0.007344507568413539,
   'sentiment-score': 0.3229383973318769,
   'month': '01',
   'year': '2020',
   'negative-sentiment-score': 0.0008397413846836597,
   'concept-name': 'General Motors',
   'rel-type': 'proximity',
   'concept-id': 'a9cf01c5-751f-4fe5-a529-12e0d297cb63',
   'proximity': 0.4235058893777329,
   'concept-type': 'entity'},
  {'positive-sentiment-score': 0.0038080338481681943,
   'sentiment-score': -0.37988671470879387,
   'month': '01',
   'year': '2020',
   'negative-sentiment-score': 0.006902103767606367,
   'concept-name': 'Mercedes-Benz',
   'rel-type': 'proximity',
   'concept-id': '8d9ee12f-f4f2-3dc7-a8b9-673a07bd7747',
   'proximity': 0.3865937360137157,
   'concept-type': 'entity'},
  {'positive-sentiment-score': 0.0060806682938589,
   'sentiment-score': -0.45423525341109494,
   'month': '