# GraphAI client

## Introduction

Welcome to `graphai-client`. This package is designed to streamline programmatic interaction with the [GraphAI API](https://graphai.epfl.ch/), taking care of logins, authentication tokens, retries, and asynchronous as well as synchronous endpoints.

This notebook walks you through typical uses of the API via the client. To access any of the endpoints you need a GraphAI account, which grants access to some or all of the endpoint groups.

## Installation

### Installing the package

Default installation:

`pip install git+https://github.com/epflgraph/graphai-client.git`

Editable installation (pip version >= 21.3):

`pip install -e "git+https://github.com/epflgraph/graphai-client.git#egg=graphai-client"`

### Preparing to run this notebook

Running this notebook requires a GraphAI account. Create a `config` directory next to this notebook, add a `graphai-api.json` file to it, and fill it in as follows:

```json
{
  "host": "https://graphai.epfl.ch",
  "port": 443,
  "user": "PUT_YOUR_USERNAME_HERE",
  "password": "PUT_YOUR_PASSWORD_HERE"
}
```
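
Before logging in, it can help to confirm that the file is well-formed. The snippet below is a minimal sketch and not part of `graphai-client`: the helper name `check_credentials_file` is hypothetical, and it assumes that the four keys shown above are the ones the client expects.

```python
import json
from pathlib import Path

# Keys taken from the example configuration above.
REQUIRED_KEYS = ("host", "port", "user", "password")


def check_credentials_file(path):
    """Return the required keys that are missing from the JSON file at `path`.

    Hypothetical helper for illustration; it is not part of graphai-client.
    """
    with Path(path).open(encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if key not in config]
```

Calling `check_credentials_file('config/graphai-api.json')` returns an empty list when all four keys are present, and raises `json.JSONDecodeError` if the file is not valid JSON.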

In [1]:
credentials_path = 'config/graphai-api.json'

In [2]:
from graphai_client.client import login

## Use

### Example 1: Translating text

**This section requires your account to have access to the `translation` endpoints. If you encounter a permission error, contact the administrator.**

Let's say you have some text in French that you want to translate to English. Before you get started, you need to log in using the `login` function:

In [3]:
login_info = login(graph_api_json=credentials_path)

Now that you're logged in, let's discuss the translation function. The `graphai_client.client_api.translation.translate_text` function handles translation for you. All you need to provide are the text, the source and target languages, and the login info you just obtained.

This function is part of the direct API functionalities, all of which are found in the `graphai_client.client_api` subpackage.

At the time of this notebook's creation, the supported language pairs (source-target) are EN-FR, FR-EN, IT-EN, and DE-EN.
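
To avoid sending a request for an unsupported pair, the list above can be checked locally first. This is a minimal sketch: `SUPPORTED_PAIRS` and `is_supported_pair` are hypothetical names, not part of `graphai-client`, and the lowercase two-letter codes are assumed to match the ones passed to `translate_text` in this notebook.

```python
# Language pairs listed in this notebook at the time of writing;
# the API itself may support more (or fewer) by now.
SUPPORTED_PAIRS = {("en", "fr"), ("fr", "en"), ("it", "en"), ("de", "en")}


def is_supported_pair(source_language, target_language):
    """Hypothetical helper: check a (source, target) pair against the list above."""
    return (source_language.lower(), target_language.lower()) in SUPPORTED_PAIRS
```

For instance, `is_supported_pair('fr', 'en')` is true, while `is_supported_pair('en', 'de')` is false because only DE-EN is listed, not EN-DE.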

In [4]:
from graphai_client.client_api.translation import translate_text
french_text = "Elle vend des coquillages au bord de la mer."
translated_text = translate_text(text=french_text, source_language='fr', target_language='en', login_info=login_info)

[2024-10-07 16:42:03] [GRAPHAI] [TRANSLATE] [PROCESSING] extracting en translation from fr text (44 characters)...
[2024-10-07 16:42:03] [GRAPHAI] [TRANSLATE] [SUCCESS] en translation has been extracted from fr text (44 characters) (already done in the past)


In [5]:
translated_text

' She sells shellfish by the sea.'

Not exactly "she sells seashells by the seashore", but close enough!

Let's see another translation, this time from English.

In [6]:
english_text = "I could make so many requests to the API and I still would not reach the rate limit."
translated_text = translate_text(text=english_text, source_language='en', target_language='fr', login_info=login_info)

[2024-10-07 16:42:03] [GRAPHAI] [TRANSLATE] [PROCESSING] extracting fr translation from en text (84 characters)...
[2024-10-07 16:42:03] [GRAPHAI] [TRANSLATE] [SUCCESS] fr translation has been extracted from en text (84 characters) (already done in the past)


In [7]:
translated_text

" Je pourrais faire autant de demandes à l'API et je n'atteindrais toujours pas la limite de taux."

### Example 2: Processing a video

**This section requires your account to have access to the `video`, `voice`, and `image` endpoints. If you encounter a permission error, contact the administrator.**

Since video processing is the central functionality of GraphAI, `graphai-client` provides a single function that processes a video end to end: downloading, slide and audio extraction, and finally OCR and audio transcription. This function is `graphai_client.client.process_video`.

This function receives several flags and inputs. Some of the most important ones are:
* `analyze_audio`: if set, the audio is extracted and transcribed. `True` by default.
* `analyze_slides`: if set, slides are extracted and OCR is performed on them. This requires you to provide a Google Vision API token. `True` by default.
* `destination_languages`: list of languages to translate the results to. `['en', 'fr']` by default.
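
The flags above can be collected in one place before calling `process_video`. The helper below is a hypothetical sketch, not part of `graphai-client`; it only uses the parameter names mentioned in this notebook and assumes they can be passed as keyword arguments.

```python
def build_process_video_options(google_api_token=None, destination_languages=("en", "fr")):
    """Hypothetical helper: assemble keyword arguments for process_video.

    Only parameters named in this notebook are used. Slide analysis is
    switched off when no Google API token is available, since OCR needs one.
    """
    return {
        "analyze_audio": True,
        "analyze_slides": google_api_token is not None,
        "destination_languages": list(destination_languages),
        "google_api_token": google_api_token,
    }
```

Under these assumptions the options would be used as `process_video(url, login_info=login_info, **build_process_video_options())`.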

In [8]:
from graphai_client.client import process_video
url = 'http://api.cast.switch.ch/p/113/sp/11300/serveFlavor/entryId/0_00gdquzv/v/2/ev/3/flavorId/0_i0v49s5y/forceproxy/true/name/a.mp4'
# In order to enable slide analysis, provide your own Google Vision API token below.
# We cannot provide our own since this notebook is publicly available.
google_api_token = None
video_info = process_video(url, analyze_slides=google_api_token is not None, login_info=login_info, google_api_token=google_api_token)

[2024-10-07 16:42:03] [GRAPHAI] [DOWNLOAD VIDEO] [PROCESSING] extracting file from http://api.cast.switch.ch/p/113/sp/11300/serveFlavor/entryId/0_00gdquzv/v/2/ev/3/flavorId/0_i0v49s5y/forceproxy/true/name/a.mp4...
[2024-10-07 16:42:11] [GRAPHAI] [DOWNLOAD VIDEO] [SUCCESS] file (0.8 MB) has been extracted from http://api.cast.switch.ch/p/113/sp/11300/serveFlavor/entryId/0_00gdquzv/v/2/ev/3/flavorId/0_i0v49s5y/forceproxy/true/name/a.mp4 (already done in the past) (all are active) (all are fingerprinted)
[2024-10-07 16:42:11] [GRAPHAI] [EXTRACT AUDIO] [PROCESSING] extracting audio from 169770835520421902463099.mp4...
[2024-10-07 16:42:12] [GRAPHAI] [EXTRACT AUDIO] [SUCCESS] audio has been extracted from 169770835520421902463099.mp4 (already done in the past) (all are active) (all are fingerprinted)
[2024-10-07 16:42:12] [GRAPHAI] [AUDIO FINGERPRINT] [PROCESSING] extracting fingerprint from 169770835520421902463099.mp4_audio.ogg...
[2024-10

The log shows that the video was downloaded, its audio was extracted, and the subtitles were generated. The result also includes information on the video and audio streams of the file, plus the internal video and audio tokens. To process the same file further, use these tokens to refer to it in subsequent requests.

In [9]:
video_info

{'url': 'http://api.cast.switch.ch/p/113/sp/11300/serveFlavor/entryId/0_00gdquzv/v/2/ev/3/flavorId/0_i0v49s5y/forceproxy/true/name/a.mp4',
 'video_size': 881966,
 'video_token': '169770835520421902463099.mp4',
 'slides': None,
 'slides_language': None,
 'subtitles': [{'id': 0,
   'start': 0.0,
   'end': 5.0,
   'en': 'These subtitles have been generated automatically\nBacteria GFP Expression',
   'fr': 'Ces sous-titres ont été générés automatiquement\nExpression des bactéries GFP'},
  {'id': 1,
   'start': 5.0,
   'end': 17.0,
   'en': 'The data consists of bacteria images acquired in face contrast and fluorescence across three different conditions, A, B, and C, with five replicates per condition.',
   'fr': 'Les données consistent en des images de bactéries acquises en contraste facial et en fluorescence dans trois conditions différentes, A, B et C, avec cinq répétitions par condition.'},
  {'id': 2,
   'start': 17.0,
   'end': 24.0,
   'en': "The fluorescent channel represents a GFP-
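
The `subtitles` entries lend themselves to export in a standard subtitle format. The following is a minimal sketch that renders them as SubRip (SRT) text. The function names are hypothetical and not part of `graphai-client`; the code assumes each segment is a dictionary with `start` and `end` in seconds and one key per language, as in the output above.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def subtitles_to_srt(subtitles, language="en"):
    """Render subtitle segments as SRT text for one language.

    Hypothetical helper. Segments without text in the requested language
    are skipped, and the remaining cues are numbered from 1.
    """
    blocks = []
    for segment in subtitles:
        text = segment.get(language)
        if not text:
            continue
        blocks.append(
            f"{len(blocks) + 1}\n"
            f"{format_timestamp(segment['start'])} --> {format_timestamp(segment['end'])}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

With the result above, `subtitles_to_srt(video_info['subtitles'], language='fr')` would produce the French subtitles, which can then be written to a `.srt` file.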

### Example 3: Getting embeddings for text

**This section requires your account to have access to the `embedding` endpoints. If you encounter a permission error, contact the administrator.**

Another group of endpoints provided through the client is `embedding`, which allows you to embed a given text as a vector. Here's an example of how to use this functionality:

In [10]:
from graphai_client.client_api.embedding import embed_text
text_to_embed = "The Graph project at the Federal Institute of Technology in Lausanne"
result = embed_text(text_to_embed, login_info)

[2024-10-07 16:42:12] [GRAPHAI] [EMBED] [PROCESSING] extracting embedding from text (68 characters)...
[2024-10-07 16:42:13] [GRAPHAI] [EMBED] [SUCCESS] embedding has been extracted from text (68 characters) (already done in the past)


In [11]:
import numpy as np
result_vector = np.array(result)
result_vector.shape

(384,)
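
A common next step is to compare two embeddings. The sketch below computes cosine similarity using only the standard library. It assumes that `embed_text` returns a flat sequence of floats, which the `np.array(result)` call above suggests but this notebook does not state explicitly; `cosine_similarity` is a name chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate vectors pointing in nearly the same direction. Two texts embedded with `embed_text` could be compared as `cosine_similarity(result_a, result_b)`, where both names stand for results of separate calls.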

### Example 4: Concept detection

**This section requires your account to have access to the `text` endpoints. If you encounter a permission error, contact the administrator.**

Last but definitely not least is concept detection, one of the core functionalities of GraphAI. Let's use it to extract the concepts found in the abstract of the paper that introduced Latent Dirichlet Allocation (one of the foundational topic models).

The function that we will use is `graphai_client.client_api.text.extract_concepts_from_text`.

In [13]:
abstract = """We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of
discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each
item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in
turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of
text modeling, the topic probabilities provide an explicit representation of a document. We present
efficient approximate inference techniques based on variational methods and an EM algorithm for
empirical Bayes parameter estimation. We report results in document modeling, text classification,
and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI
model."""

from graphai_client.client_api.text import extract_concepts_from_text
concepts = extract_concepts_from_text(abstract, login_info=login_info)

In [14]:
concepts

[{'concept_id': 871681,
  'concept_name': 'Mixture model',
  'search_score': 1.0,
  'levenshtein_score': 0.9224137931034483,
  'embedding_local_score': 0.9565356031170075,
  'embedding_global_score': 0.9543518748952157,
  'graph_score': 1.0,
  'ontology_local_score': 0.825,
  'ontology_global_score': 0.9003860961574224,
  'embedding_keywords_score': 0.9896800253874976,
  'graph_keywords_score': 1.0,
  'ontology_keywords_score': 1.0,
  'mixed_score': 0.9521506785812595},
 {'concept_id': 318439,
  'concept_name': 'Text mining',
  'search_score': 0.6468749999999999,
  'levenshtein_score': 0.8669541143082001,
  'embedding_local_score': 0.7541342146712362,
  'embedding_global_score': 0.7360360546032512,
  'graph_score': 0.2891711219958145,
  'ontology_local_score': 0.9240906057884843,
  'ontology_global_score': 0.8535298753647459,
  'embedding_keywords_score': 0.6868573114884935,
  'graph_keywords_score': 0.33322676418164565,
  'ontology_keywords_score': 0.7941865509506046,
  'mixed_score':