<a href="https://colab.research.google.com/github/mafux777/Alation_Article/blob/master/Alation_API_Training_July_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Alation API Training Worksheet

2020-07-21 at 1600 UTC
Url: http://alation.zoom.us/j/296432240
Login: Join by Zoom or phone (US) +1 877-853-5257



We only need to import a small number of libraries to create an `AlationInstance`. You can use this class to test against any official or unofficial Alation API.

In [1]:
import pandas as pd
import os
import requests

import time
import json

import pprint
pp = pprint.PrettyPrinter(indent=4)

The class AlationInstance is created with a URL, username and password.

In [6]:
import urllib

# The AlationInstance class is a handle to an Alation server defined by a URL
# A server admin user name and password needs to be provided and all API actions
# will be run as that user
class AlationInstance():
    # The __init__ method is the constructor used for instantiating
    # host: the base URL of the Alation server, without a trailing slash
    # account: the user name (up to 30 chars), often the email; long emails may be cut off
    # password: could be the LDAP password, as well
    # verify: Requests verifies SSL certificates for HTTPS requests, just like a web browser.
    # By default, SSL verification is enabled, and Requests will throw an SSLError if it's unable to verify the certificate
    def __init__(self, host, account, password, verify=True):
        self.host = host
        self.verify = verify
        self.account = account
        self.password = password
        self.token = self.get_token()
        self.headers = self.login(account, password)

    # The login method is used to obtain a session ID and relevant cookies
    # They are cached in the headers variable
    # account: the user name (up to 30 chars), often the email; long emails may be cut off
    # password: could be the LDAP password, as well
    def login(self, account, password):
        URL = self.host + '/login/'

        s = requests.Session()
        s.get(URL, verify=self.verify)

        # get the cookie token
        csrftoken = s.cookies.get('csrftoken')

        # login with user name and password (and token)
        payload = {"csrfmiddlewaretoken": csrftoken, "ldap_user": account, "password": password}
        headers = {"Referer": URL}
        log_me("Logging in to {}".format(URL))
        r = s.post(URL, data=payload, verify=self.verify, headers=headers)

        # get the session ID and store it for all future API calls
        sessionid = s.cookies.get('sessionid')
        if not sessionid:
            log_me('No session ID, probably wrong user name / password')
        headers = {"X-CSRFToken": csrftoken,
                   "Cookie": f"csrftoken={csrftoken}; sessionid={sessionid}",
                   "Referer": URL
                   }

        return headers

    # The get_token method obtains an API token for the official APIs
    def get_token(self):
        change_token = "/api/v1/changeToken/"  # if you already have a token, use this url
        new_token = "/api/v1/getToken/"  # if you have never generated a token, use this url
        data = dict(username=self.account, password=self.password)
        # pass verify so the SSL setting of this instance is honoured here, too
        response = requests.post(self.host + new_token, data=data, verify=self.verify)
        api_token = response.text
        if api_token == "EXISTING":
            response = requests.post(self.host + change_token, data=data, verify=self.verify)
            api_token = response.text
        return api_token

    # The generic_api_post method posts a request to Alation and returns the raw response body
    def generic_api_post(self, api, params=None, body=None, official=False):
        if official:
            headers_final = dict(token=self.token)
        else:
            # copy the cached headers so the shared dict is not modified
            headers_final = dict(self.headers, Referer=self.host + api)
        r = requests.post(self.host + api, json=body, params=params, headers=headers_final, verify=self.verify)

        return r.content # for testing in July, no parsing attempted
        # NOTE: everything below is unreachable while the return above is in place.
        # It shows how the job status API can be polled once parsing is enabled again.
        if r.status_code in [200, 201]:
            r_parsed = r.json()
            # do we need to ask the job status API for help?
            if 'job_id' in r_parsed:
                params = dict(id=r_parsed['job_id'])
                url_job = "/api/v1/bulk_metadata/job/"
                # Let's wait for the job to finish
                while True:
                    status = self.generic_api_get(api=url_job, params=params, official=True)
                    if status['status'] != 'running':
                        break
                    time.sleep(1)  # pause between polls instead of busy-waiting
                r_parsed = status
            return r_parsed
        else:
            return r.content

    # The generic_api_put method sends a PUT request to Alation and returns the raw response body
    def generic_api_put(self, api, params=None, body=None):
        r = requests.put(self.host + api, json=body, params=params, headers=self.headers, verify=self.verify)
        return r.content

    # The generic_api_patch method sends a PATCH request to Alation and returns the raw response body
    def generic_api_patch(self, api, params=None, body=None):
        r = requests.patch(self.host + api, json=body, params=params, headers=self.headers, verify=self.verify)
        return r.content

    # The generic_api_get method implements a REST GET, with the API token if official or the cookie if not.
    # If the caller sends headers, they need to contain the API token or the cookie
    def generic_api_get(self, api, headers=None, params=None, official=False):
        if headers:
            # caller has supplied the headers
            headers_final = headers
        else:
            if official:
                headers_final = dict(token=self.token)
            else:
                # copy the cached headers so the shared dict is not modified
                headers_final = dict(self.headers, Referer=self.host + api)
        r = requests.get(self.host + api, headers=headers_final, params=params, verify=self.verify)
        if r.status_code in [200, 201]:
            try:
                return r.json()
            except ValueError:
                return r.content # for LogicalMetadata API which does not use standard JSON
        else:
            return r.content

    # The raw_api_get method works like generic_api_get but returns the unparsed Response object,
    # so the caller can inspect the status code and headers (e.g. X-Next-Page)
    def raw_api_get(self, api, headers=None, params=None, official=False):
        if headers:
            # caller has supplied the headers
            headers_final = headers
        else:
            if official:
                headers_final = dict(token=self.token)
            else:
                # copy the cached headers so the shared dict is not modified
                headers_final = dict(self.headers, Referer=self.host + api)
        return requests.get(self.host + api, headers=headers_final, params=params, verify=self.verify)
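
`generic_api_post` contains a loop that waits for a bulk job to finish. The same idea can be written as a small standalone helper. This is only a sketch: the name `wait_for_job`, the `interval` and `timeout` parameters and the `get_status` callable are assumptions for illustration and are not part of any Alation API.

```python
import time

def wait_for_job(get_status, job_id, interval=1.0, timeout=300.0):
    """Poll get_status(job_id) until the job stops running or the timeout expires.

    get_status is a hypothetical callable that returns a dict with a 'status' key.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        # any state other than 'running' is treated as finished
        if status.get('status') != 'running':
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout} seconds")
        # pause between polls so the server is not flooded with requests
        time.sleep(interval)
```

With an `AlationInstance` called `alation`, `get_status` could be `lambda job_id: alation.generic_api_get('/api/v1/bulk_metadata/job/', params=dict(id=job_id), official=True)`, which mirrors the call used inside the class.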


Let's create our first AlationInstance object:

In [7]:
def log_me(text):
  print(text)

alation = AlationInstance('http://r7-sandbox.alationproserv.com',
                         'matthias@alation.com',
                         'REMOVED')

Logging in to http://r7-sandbox.alationproserv.com/login/
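
To avoid typing the password into the notebook, the three constructor arguments can be read from environment variables. The sketch below assumes variables named `ALATION_HOST`, `ALATION_ACCOUNT` and `ALATION_PASSWORD`; these names and the helper `credentials_from_env` are hypothetical, so adapt them to your setup.

```python
import os

def credentials_from_env(prefix="ALATION"):
    """Read host, account and password from environment variables.

    The variable names (<prefix>_HOST, <prefix>_ACCOUNT, <prefix>_PASSWORD)
    are an assumption made for this example.
    """
    names = [f"{prefix}_{part}" for part in ("HOST", "ACCOUNT", "PASSWORD")]
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    host, account, password = (os.environ[name] for name in names)
    # the class builds URLs as host + '/login/', so drop a trailing slash
    return host.rstrip("/"), account, password
```

The instance can then be created with `alation = AlationInstance(*credentials_from_env())`.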


# How to deal with more than 100 results (or some other page size)

You cannot be sure how many results an API call returns. The safest way to deal with this is to iterate over the pages like this.

Note that `get` on the response headers (a dict-like object) returns `None` if the key is not there. So the loop ends when the response has no `X-Next-Page` header.

In [19]:
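# use next_page rather than next, which would shadow the Python builtin
next_page = '/integration/v1/article/'
while next_page:
  r = alation.raw_api_get(next_page, official=True)
  # the X-Next-Page header is missing on the last page, which ends the loop
  next_page = r.headers.get('X-Next-Page')
  for n, a in enumerate(r.json()):
    print(n, a.get('id'), a.get('title', 'No title'))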

0 1 Getting Started for Analysts
1 406 C. Content Layout & Taxonomy
2 2 Quick Links for Analysts
3 3 Getting Started for Data Stewards
4 378 Restricted
5 4 Quick Links for Data Stewards
6 5 test
7 453 C3.2 Data Steward Privacy Policy Guidelines
8 405 A. Data Catalog Principles
9 360 Plan Access
10 374 Data Classification Policies
11 468 Role: Executive Sponsor
12 363 This is a Test Article
13 521 address
14 375 Public
15 529 Test
16 362 Article with course embedded
17 384 소향 - 홀로 아리랑
18 376 Secret
19 365 KPI -- Test
20 364 How To -- Test
21 380 chris test
22 408 A. Content Topics
23 382 Jon iframe test
24 412 B. Understanding UUIDs in Alation Analytics
25 413 A. Understanding Articles in Alation Analytics
26 526 Controlled Public
27 520 PII Policy for Use
28 513 C8.1 Cloud Migration Process Description
29 401 A. Start Here for <User Role>
30 402 E. DRAFT: Data Policy and Access Guidelines
31 403 D. DRAFT: Data Catalog Status
32 404 B. Data Catalog Organization & Guidelines
33 407 B. Co
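
The paging loop above can be wrapped in a generator so that callers get one flat stream of items. This is a minimal sketch: `iter_all_pages` and its `fetch` argument are names made up for this example, and `fetch` can be any callable that returns an object with `.json()` and `.headers`.

```python
def iter_all_pages(fetch, first_page):
    """Yield every item from a paginated API by following X-Next-Page.

    fetch takes a path and returns a response object with .json() and
    .headers, for example a wrapper around alation.raw_api_get.
    """
    page = first_page
    while page:
        response = fetch(page)
        yield from response.json()
        # headers.get returns None on the last page, which ends the loop
        page = response.headers.get('X-Next-Page')
```

For example, `list(iter_all_pages(lambda p: alation.raw_api_get(p, official=True), '/integration/v1/article/'))` would collect all articles into one list.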

# How to search for Articles by name

Refer to the [Django QuerySet documentation](https://docs.djangoproject.com/en/3.0/ref/models/querysets/#id4) to see which field lookups you can use in a query, e.g. "title starts with" or "contains".

In [33]:
#params = dict(title__startswith='Chapter')
params = dict(title__icontains='jon')
art = alation.raw_api_get('/integration/v1/article/', params=params, official=True).json()
# convert result to DataFrame
df = pd.DataFrame(art)
# Index of DataFrame is the Article ID
df.index = df.id
# Print id and title
df.title.sort_values()

id
532    %% (Jon Percent Test)
379                 Jon Test
382          Jon iframe test
Name: title, dtype: object
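
A Django-style lookup is simply the field name and the lookup name joined by a double underscore, and it ends up as a plain query string parameter. The sketch below shows this with the standard library only; the helper `lookup_params` is hypothetical, and which lookups the Alation API accepts beyond the two shown above is not confirmed here.

```python
from urllib.parse import urlencode

def lookup_params(field, lookup, value):
    """Build a one-entry params dict such as {'title__icontains': 'jon'}."""
    return {f"{field}__{lookup}": value}

params = lookup_params('title', 'icontains', 'jon')
# this is the query string that gets appended to the API path
query_string = urlencode(params)
print(query_string)  # title__icontains=jon
```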

In [None]:
# http://r7-sandbox.alationproserv.com/search/?q=data%20governance&otype=article&ff=%7B%22custom_template%22%3A+33%7D
alation.generic_api_get('/download_search_result/?q=data%20governance&otype=article&ff=%7B%22custom_template%22%3A+33%7D')

{'job_id': 5781,
 'success': 'Your export is processing and a link will be sent to matthias@alation.com when complete. '}

On the server, we use `ls -l /opt/alation/alation/opt/alation/site/downloads/search_results/` to find out the file name. Then, we can download the file like this:

In [None]:
filename = '16_2020-07-20T08-20-34-678697.csv'
print(f"http://r7-sandbox.alationproserv.com/download/search_results/{filename}")

http://r7-sandbox.alationproserv.com/download/search_results/16_2020-07-20T08-20-34-678697.csv


Alternatively, we can use something like this: 

`rsync -av ec2-user@184.169.206.190:/opt/alation/alation/opt/alation/site/downloads/search_results/ search_results`

We can also construct a query string from scratch, like this:

In [None]:
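# same parameters as in the URL above: otype (not oytpe) and the custom_template filter
params=dict(
    q="Data Governance",
    otype="article",
    ff='{"custom_template": 33}'  # the space is sent as "+" in the URL
)

alation.generic_api_get('/download_search_result/', params=params)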


{'job_id': 5782,
 'success': 'Your export is processing and a link will be sent to matthias@alation.com when complete. '}
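
To see which parameters a search URL copied from the browser carries, it can be decoded with the standard library. The sketch below takes the search URL from the comment earlier in this section, decodes it, and rebuilds the query string from a dict. It only illustrates the encoding and does not call the server.

```python
import json
from urllib.parse import urlsplit, parse_qs, urlencode

search_url = ('http://r7-sandbox.alationproserv.com/search/'
              '?q=data%20governance&otype=article'
              '&ff=%7B%22custom_template%22%3A+33%7D')

# parse_qs decodes %XX escapes and turns "+" into a space
query = parse_qs(urlsplit(search_url).query)
decoded = {key: values[0] for key, values in query.items()}
print(decoded)
# {'q': 'data governance', 'otype': 'article', 'ff': '{"custom_template": 33}'}

# the ff value is itself a JSON document
filters = json.loads(decoded['ff'])
print(filters)  # {'custom_template': 33}

# going the other way: build the query string from plain Python values
rebuilt = urlencode(dict(q=decoded['q'], otype=decoded['otype'], ff=json.dumps(filters)))
print(rebuilt)
# q=data+governance&otype=article&ff=%7B%22custom_template%22%3A+33%7D
```

Building `ff` with `json.dumps` avoids hand-written quoting mistakes in the filter string.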