# Getting Started with Affinity

This notebook walks through examples of how to use the Affinity endpoints in the Signal AI API.

Full technical documentation of the Affinity endpoints is available here:
https://api.signal-ai.com/docs#tag/Concept-Affinity

## What is Affinity?


The Affinity API endpoints allow API users to leverage the power of the **Signal AI Knowledge Graph**, derived from billions of documents and updated regularly.

The Signal AI Knowledge Graph consists of:

- nodes, which represent concepts such as entities and topics
- edges, which represent connections describing relationships between these concepts

Using the Affinity API, users can retrieve data for hundreds of thousands of entities and topics. The sole relationship type currently accessible via the Affinity API is `proximity`. The proximity between an entity (e.g. Tesla) and a topic (e.g. Product recall) is a measure of how they are related over a period of time (e.g. a certain month).  It reflects the likelihood of a salient mention of the entity with the topic.

The proximity relationships can be used for discovery and comparison use cases, as we will show in this notebook.

## 0. Setting things up!

### 0.1 Prerequisites

Please make sure that you have familiarised yourself with the Signal AI API using the [Getting Started with the Search API notebook](getting_started.ipynb).

In particular, you will need to be able to use the following endpoints:

1. Authentication: to be able to access the API
2. Discovery: to be able to search for entities and topics of interest

Note: run `pip install -r requirements.txt` to install the dependencies for this notebook

In [39]:
import backoff
import requests
import os
import pandas as pd
import datetime

### 0.2 Check if authentication works! 

You will need a client_id and client_secret to gain access to the API. The code below assumes they have been set as the environment variables SIGNAL_API_CLIENT_ID and SIGNAL_API_CLIENT_SECRET respectively.

Using your credentials, you can request a temporary access token from the API at the URL:
https://api.signal-ai.com/auth/token

In [2]:
def authenticate(client_id, client_secret, url = "https://api.signal-ai.com"):
    """ obtain a temporary access token using user credentials """
    token_url = f'{url}/auth/token'
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret
    }
    response = requests.post(token_url, data=payload)
    return response.json().get("access_token")

In [6]:
TEMP_ACCESS_TOKEN = authenticate(os.environ['SIGNAL_API_CLIENT_ID'], os.environ['SIGNAL_API_CLIENT_SECRET'])
if TEMP_ACCESS_TOKEN:
    print('Congratulations! You have an access token, it will last for 24 hours before you will need to reauthenticate by repeating this step')
else:
    print('Error: Perhaps the credentials are incorrect?')

Congratulations! You have an access token, it will last for 24 hours before you will need to reauthenticate by repeating this step
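
Before calling `authenticate`, it can help to fail early with a clear message when a credential is missing. The sketch below is one possible way to do that. `load_credentials` is a hypothetical helper, not part of the Signal AI API, and its default variable names are simply the ones mentioned in the text above.

```python
import os


def load_credentials(env=None,
                     id_var="SIGNAL_API_CLIENT_ID",
                     secret_var="SIGNAL_API_CLIENT_SECRET"):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper: the default variable names are assumptions taken
    from the text above; pass different names if yours differ.
    """
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    missing = [name for name in (id_var, secret_var) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[id_var], env[secret_var]
```

The returned pair can then be passed straight to `authenticate(client_id, client_secret)`.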


## 1. Proximity Examples

### 1.1. Discovery: entities most associated with a topic (at a certain time)

Example use case:
- Who are currently the main active organisations associated with a certain industry? 


How: 
Using the Affinity endpoints, rank organisations by their proximity score to a topic (representing the industry) in the last month.

Note that you will need the ID of the topic as input. To search for topics and obtain their IDs, please consult the [Getting Started notebook](getting_started.ipynb).

In [29]:
## Use the topics endpoint as per the getting started notebook to search for the ids
##  but we provide them here for convenience
topics = {
    'Wearables': 'f861d9df-5a65-41fe-8077-1714b838f1e1' 
}

**Example**: top 20 organisations in the Wearables industry in the last month

In [11]:
last_month = (pd.to_datetime("today").to_period('M') - 1).strftime('%Y-%m')
topic_id = topics['Wearables']
response = requests.post(
    # call the affinity endpoint
    'https://api.signal-ai.com/affinity',
    json={
        'source-concept': {
            'id': topic_id
        },
        'relationship': {
            'type': 'proximity',
            'date': {
                'start': last_month,
                'end': last_month
            },
            'interval': 'month',
            'limit-per-interval': 20
        },
        'target-concepts': {
            'types': ['entity/organisation']
        }
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

result = response.json()


In [12]:
df = pd.json_normalize(result['results'])
df[['target-concept.name']]

Unnamed: 0,target-concept.name
0,Fitbit
1,Garmin
2,Omega SA
3,TAG Heuer
4,Xiaomi
5,Xiaomi Tech
6,Mobvoi
7,Apple Inc.
8,Huawei
9,The Swatch Group
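
The request body above follows a fixed shape, so it can be assembled by a small function. The sketch below is a hypothetical helper (`build_proximity_request` is not part of the API); it only reproduces the fields that appear in this notebook and makes no claim about other options the endpoint may accept.

```python
def build_proximity_request(source_id, start, end, limit=None,
                            target_types=None, target_ids=None):
    """Build the JSON body for a proximity request to the affinity endpoint.

    Hypothetical helper: it only covers the fields used in this notebook.
    """
    relationship = {
        "type": "proximity",
        "date": {"start": start, "end": end},
        "interval": "month",
    }
    # Only include the limit when the caller asks for one.
    if limit is not None:
        relationship["limit-per-interval"] = limit

    body = {"source-concept": {"id": source_id}, "relationship": relationship}

    # Target concepts are optional and can be given by type or by ID.
    target = {}
    if target_types:
        target["types"] = list(target_types)
    if target_ids:
        target["ids"] = list(target_ids)
    if target:
        body["target-concepts"] = target
    return body


body = build_proximity_request(
    "f861d9df-5a65-41fe-8077-1714b838f1e1",  # Wearables topic ID from above
    start="2022-03",
    end="2022-03",
    limit=20,
    target_types=["entity/organisation"],
)
```

The resulting `body` can be passed as the `json=` argument of `requests.post`, exactly as in the cell above.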


### 1.2. Discovery: topics most associated with an entity (at a certain time)

Example use case:
- What are the topics my brand is currently most associated with?

How: using the Affinity endpoints, rank topics by their proximity score to a certain organisation. The scores are retrieved as well, so that they can be used to generate a visualisation.


**Example**: top 10 topics associated with `Tesla, Inc.` in the last month

In [33]:
## Use the entities endpoint as per the getting started notebook to search for the ids
##  but we provide them here for convenience
entities = {
    'Tesla, Inc.': '11cab8df-4be1-470f-8f49-8f7f0863ec95',
    'General Motors': 'a9cf01c5-751f-4fe5-a529-12e0d297cb63', 
    'Mercedes-Benz': '8d9ee12f-f4f2-3dc7-a8b9-673a07bd7747', 
}

In [34]:
entity_name = 'Tesla, Inc.'
entity_id = entities[entity_name]
response = requests.post(
    # call the affinity endpoint
    'https://api.signal-ai.com/affinity',
    json={
        'source-concept': {
            'id': entity_id
        },
        'relationship': {
            'type': 'proximity',
            'date': {
                'start': last_month,
                'end': last_month
            },
            'interval': 'month',
            'limit-per-interval': 10
        }
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

result = response.json()

In [35]:
df = pd.json_normalize(result['results'])
df[['target-concept.name', 'relationship.proximity-score']]

Unnamed: 0,target-concept.name,relationship.proximity-score
0,Electric Vehicles (EV),0.66093
1,Futures of Transport,0.656709
2,Cleantech,0.607097
3,Automotive Industry,0.60165
4,Lidar,0.584335
5,Touch Free Technology,0.52922
6,Corporate Controversy,0.492055
7,Accounting Irregularities,0.48346
8,Transport,0.470326
9,Autonomous Vehicles,0.466495
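
The cells above call `response.json()` and index `result['results']` directly, which raises an unhelpful `KeyError` if the request failed. A minimal sketch of a more defensive step is shown below. `extract_results` is a hypothetical helper, and treating any 2xx status as success is an assumption, since the notebook does not show what an error response looks like.

```python
def extract_results(status_code, body):
    """Return the list under 'results', or raise with a readable message.

    Hypothetical helper: the success criteria are assumptions, not taken
    from the API documentation.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            f"Affinity request failed with status {status_code}: {body}"
        )
    if not isinstance(body, dict) or "results" not in body:
        raise RuntimeError(f"Unexpected response body: {body}")
    return body["results"]
```

With a `requests` response this would be called as `extract_results(response.status_code, response.json())`, and the returned list passed to `pd.json_normalize`.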


**Top Articles driving the associations**

We can also use the search endpoint to identify the top headlines in the coverage of these topics.

In [49]:
## most relevant articles for the top 3 topics
## change depth to do more than 3
depth = 3

last_day_prev_month = pd.to_datetime("today").replace(day=1) - datetime.timedelta(days=1)
first_day_prev_month = last_day_prev_month.replace(day=1)

for i in range(depth):    
    query = {
        'where': {
            "published-at": {
                "gte": first_day_prev_month.strftime('%Y-%m-%d'), 
                "lte": last_day_prev_month.strftime('%Y-%m-%d')
            },
            'topics': {
                'id': {
                    'eq': df.iloc[i]['target-concept.id']
                },
            },
            'entities': {
                'id': {
                    'eq': entities['Tesla, Inc.'] 
                },
                'salient-only':True,
            }
        },
        'sort': [['score', 'desc']],
        'size': 3
    }
    response = requests.post(
        'https://api.signal-ai.com/search',
        json=query,
        headers={
            "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
            "Content-Type": "application/json",
        },
    )
    documents = response.json()['documents']
    print('Top articles on {} and {} in {}'.format(
        entity_name,
        df.iloc[i]['target-concept.name'],
        df.iloc[i]['relationship.date']        
    ))
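    # use a separate name for the article rank so the outer loop index `i` is not shadowed
    for rank, doc in enumerate(documents, start=1):
        print('{}: ({}) {} {}'.format(rank, doc['source']['name'], doc['title'], doc.get('url')))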
    print()

Top articles on Tesla, Inc. and Electric Vehicles (EV) in 2022-03
1: (SINA) Panasonic announced the start of mass production of Tesla 4680 batteries in fiscal year 2023 https://auto.sina.cn/zz/hy/2022-03-01/detail-imcwiwss3421611.d.html
2: (TMT Post) Musk's super factory "Hydrangea”, to whom is it thrown? https://www.tmtpost.com/6022700.html
3: (News Explorer) Honda S2000 Swaps Screaming VTEC For Deadly Tesla Electric Motor https://newsexplorer.net/honda-s2000-swaps-screaming-vtec-for-deadly-tesla-electric-motor-s1041239.html

Top articles on Tesla, Inc. and Futures of Transport in 2022-03
1: (Eetop) Tesla launches fully autonomous driving test version in Canada https://auto.163.com/22/0301/07/H1BUCJTA000884MM.html
2: (SINA) Panasonic announced the start of mass production of Tesla 4680 batteries in fiscal year 2023 https://auto.sina.cn/zz/hy/2022-03-01/detail-imcwiwss3421611.d.html
3: (TMT Post) Musk's super factory "Hydrangea”, to whom is it thrown? https://www.tmtpost.com/6022700.ht

### 1.3. Comparison: proximity of different entities to a certain topic

Example use case:
- What is my association with a certain topic compared to my competitors? Whose association with X is stronger/weaker?

**Example**: How do the car manufacturers `Mercedes-Benz`, `General Motors` and `Tesla, Inc.` compare in their association with the `Product Recall` topic in the last month?

In [51]:
topic_id = '734a342f-a053-4823-9bda-8abb687182ba' ##  ID for Product Recall
response = requests.post(
    # call the affinity endpoint
    'https://api.signal-ai.com/affinity',
    json={
        'source-concept': {
            'id': topic_id
        },
        'relationship': {
            'type': 'proximity',
            'date': {
                'start': last_month,
                'end': last_month
            },
            'interval': 'month'
        },
        'target-concepts': {
            'ids': [
                entities['Mercedes-Benz'],
                entities['General Motors'],
                entities['Tesla, Inc.'],
            ]
        }
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

result = response.json()

In [52]:
df = pd.json_normalize(result['results'])

In [55]:
df.sort_values(by='relationship.proximity-score', ascending=False)[['target-concept.name','relationship.proximity-score']]

Unnamed: 0,target-concept.name,relationship.proximity-score
1,General Motors,0.527692
0,Mercedes-Benz,0.422225
2,"Tesla, Inc.",0.380587


The proximity scores retrieved from Affinity allow us to order these organisations by their association with the topic. Note, however, that the scale for proximity scores is not linear, so the size of the difference between two scores cannot currently be interpreted. To understand what drives the proximity scores in a particular month, one can use the metrics API to examine the number of salient co-mentions of the entity with the topic, the total number of salient mentions of the entity, and the total number of mentions of the topic.
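
Because only the ordering of the scores is meaningful, one option is to report ranks alongside the scores. The sketch below does this in plain Python. `rank_by_proximity` is a hypothetical helper, the scores are copied from the output above, and the nested record layout is an assumption inferred from the flattened column names (for example `relationship.proximity-score`).

```python
def rank_by_proximity(results):
    """Return (rank, name, score) tuples, strongest association first."""
    ordered = sorted(
        results,
        key=lambda r: r["relationship"]["proximity-score"],
        reverse=True,
    )
    return [
        (rank, r["target-concept"]["name"], r["relationship"]["proximity-score"])
        for rank, r in enumerate(ordered, start=1)
    ]


# Scores copied from the output above; the nesting is an assumed layout.
sample = [
    {"target-concept": {"name": "Mercedes-Benz"},
     "relationship": {"proximity-score": 0.422225}},
    {"target-concept": {"name": "General Motors"},
     "relationship": {"proximity-score": 0.527692}},
    {"target-concept": {"name": "Tesla, Inc."},
     "relationship": {"proximity-score": 0.380587}},
]

ranking = rank_by_proximity(sample)
for rank, name, score in ranking:
    print(rank, name, score)
```

Presenting the rank first keeps the focus on the ordering rather than on the gaps between scores.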

### 1.4. Comparison: proximity of an entity to a topic over time

Example use case:
- How is my/their association with certain topics changing over time? Is my association now stronger/weaker?

**Example**: How has `Tesla, Inc.`'s association with the `Product Recall` topic changed over the past year?

In [60]:
# first month of the twelve-month window that ends with last_month
twelve_months_ago = (
    pd.to_datetime("today").to_period('M') - 12
).strftime('%Y-%m')
response = requests.post(
    # call the affinity endpoint
    'https://api.signal-ai.com/affinity',
    json={
        'source-concept': {
            'id': topic_id
        },
        'relationship': {
            'type': 'proximity',
            'date': {
                'start': twelve_months_ago,
                'end': last_month
            },
            'interval': 'month'
        },
        'target-concepts': {
            'ids': [
                entities['Tesla, Inc.'],
            ]
        }
    },
    # include the access token in the header
    headers={
        "Authorization": f'Bearer {TEMP_ACCESS_TOKEN}',
        "Content-Type": "application/json",
    }
)

result = response.json()
df = pd.json_normalize(result['results'])

In [62]:
df[['relationship.date', 'relationship.proximity-score']]

Unnamed: 0,relationship.date,relationship.proximity-score
0,2021-04,0.266264
1,2021-05,0.302764
2,2021-06,0.635515
3,2021-07,0.262505
4,2021-08,0.23402
5,2021-09,0.233576
6,2021-10,0.375716
7,2021-11,0.479871
8,2021-12,0.619636
9,2022-01,0.46434


As before, the proximity scores retrieved from Affinity allow us to compare the association with the topic month on month. Note, however, that the scale for proximity scores is not linear, so the size of the difference between two scores cannot currently be interpreted. To understand what drives the proximity scores in a particular month, one can use the metrics API to examine the number of salient co-mentions of the entity with the topic, the total number of salient mentions of the entity, and the total number of mentions of the topic.
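
Since the size of a change cannot be interpreted, a cautious way to summarise a time series is to report only the direction of each month-on-month change. The sketch below illustrates this with the first few rows of the output above; `month_on_month_direction` is a hypothetical helper, not part of the API.

```python
def month_on_month_direction(series):
    """Label each month as 'up', 'down' or 'flat' relative to the previous one.

    `series` is a list of (month, score) pairs in chronological order.
    """
    directions = []
    # Pair each month with its predecessor and compare the two scores.
    for (_, prev_score), (month, score) in zip(series, series[1:]):
        if score > prev_score:
            label = "up"
        elif score < prev_score:
            label = "down"
        else:
            label = "flat"
        directions.append((month, label))
    return directions


# First rows of the output above.
scores = [
    ("2021-04", 0.266264),
    ("2021-05", 0.302764),
    ("2021-06", 0.635515),
    ("2021-07", 0.262505),
]

directions = month_on_month_direction(scores)
print(directions)
```

The first month has no predecessor, so the result contains one entry fewer than the input.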