# Sprint 4 - Integrating Our Project With Existing Augur Schema

For sprint 4, we took our working prototype code and integrated it with the schema that Sean provided for us on Slack. The SQL that creates this schema is in a separate file in this repository. We also moved all the necessary credentials to a config.json file for greater portability, and we now handle two error cases: a contributor row with no email address, and an email address with no matching GitLab account, in which case the API call returns an empty JSON array.

Very important note: When we tried to create the "contributors" table in the augur_osshealth database on the augur-community-reports server, we received an error when attempting to access the schema "sean". The schema appears to be owned by augur, while the only credentials we have are those of chaoss, which let us view what we need but not make any changes. For that reason, we created a local database with the same schema, imported the data from the "contributors" table that already exists at augur-community-reports, and ran this code against that.

## Library imports

Nothing crazy here. Import the necessary libraries: psycopg2 handles the Python-to-database calls, requests handles the API calls, and json reads the config file.

In [1]:
import psycopg2
import requests
import json

## Connecting to a database

First we need to connect to our database. We create a connection called conn with the necessary details, which are read from config.json. Then psycopg2 requires that we set up a cursor. Finally, we execute a query to pull the email, GitLab username, and GitLab ID of every row in the sean.contributors table.

#### The database connection is now configured via a config file instead of being hardcoded as it was before. This should allow for greater portability and security, while avoiding something sloppier like leaving relevant details as [INSERT YOUR HOST HERE].

The config file also includes the GitLab auth token needed to make the API calls. I have left my own personal token in this repository for the purposes of this project/demo, but for obvious reasons this token will not be valid forever.
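
A minimal sketch of what loading and checking such a config file could look like. The key names match the ones the next cell reads (host, port, database, user, password, gitlab_token); the helper name `load_config` and the `REQUIRED_KEYS` tuple are hypothetical and not part of the notebook.

```python
import json

# Keys this notebook reads from config.json.
REQUIRED_KEYS = ("host", "port", "database", "user", "password", "gitlab_token")

def load_config(path="config.json"):
    """Load the config file and fail early if a required key is missing."""
    with open(path) as config_file:
        config = json.load(config_file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config file is missing keys: " + ", ".join(missing))
    return config
```

A matching config.json would be a flat JSON object with those six keys. Failing early like this gives a clearer message than a KeyError raised halfway through the connection call.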

In [2]:
with open("config.json") as config_file:
    config = json.load(config_file)



conn = psycopg2.connect(
    host=config['host'],
    port=config['port'],
    database=config['database'],
    user=config['user'],
    password=config['password'],
)
cur = conn.cursor()
cur.execute("SELECT contributors.cntrb_email, contributors.gitlab_username, contributors.gitlab_id FROM sean.contributors;")
tuples = cur.fetchall()

## Iteration and the API call

Next we set up a function APIcall to...you guessed it, make the API call. The call still works the way we showed in our sprint 2 prototype, but we've worked to avoid hardcoding anything here. All print statements used for testing have been left in, in case further testing is required, but they are commented out: the Augur sample data has roughly 3,700 tuples, and that seemed excessive to print to the console. If a row in the contributors table has no email address, we skip that entry. Similarly, if no GitLab account exists for the provided email address, GitLab returns "[]", which causes errors when the program expects a populated JSON response. Our code now checks the length of the returned list and, if it is empty, returns (None, None) in place of (id, username).
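
The empty-response check can be illustrated without touching the network. In this sketch, `parse_user_search` is a hypothetical helper (not part of the notebook) that takes the already-decoded JSON list and returns the same kind of (id, username) pair; the sample payloads are made up for illustration. The last line shows why an email address should be URL-encoded before it goes into a query string.

```python
from urllib.parse import urlencode

def parse_user_search(payload):
    """Return (id, username) from a decoded user-search response, or (None, None) if empty."""
    if not payload:
        return (None, None)
    first = payload[0]
    return (first["id"], first["username"])

# Made-up sample payloads, not real GitLab data.
print(parse_user_search([]))
print(parse_user_search([{"id": 42, "username": "octo", "name": "Octo"}]))

# Characters such as '+' and '@' are percent-encoded in a query string.
print(urlencode({"search": "first+last@example.com"}))
```

Keeping the parsing separate from the HTTP request makes the "no account found" case easy to test on its own.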

In [None]:
def APIcall(email):
    # Search GitLab users by email.  Returns (id, username), (None, None) when
    # no account matches, or None when GitLab rejects the token (401).
    baseurl = 'https://gitlab.com/api/v4/users'
    # Passing the email via params lets requests URL-encode characters such as '+'.
    req = requests.get(baseurl, params={'search': email}, headers={'private-token': config['gitlab_token']})
    if req.status_code == 401:
        return None

    #print(req.json())
    #print("\n\n\n")
    j = req.json()
    if len(j) == 0:
        #print("\nEmpty response!\n")
        return (None, None)
    else:
        #print(email)
        #print(j[0]["name"])
        #print(j[0]["id"])
        return (j[0]["id"], j[0]["username"])

print("Now updating table.  Depending on the size of your database, this may take a while.  Make a cup of tea.\n")
for (email, user, labID) in tuples:
    if email is None:
        #print("\nEmail cannot be null!\n")
        continue
    if labID is not None:
        #print("\nGitLabID found!\n")
        continue

    data = APIcall(email)
    if data is None:
        print("GitLab returned status code 401 - Unauthorized.  Please make sure you are using a valid authorization token for GitLab.\n")
        break
    ID, user = data
    # Parameterized query: psycopg2 handles quoting of the values.
    cur.execute("UPDATE sean.contributors SET gitlab_username = %s, gitlab_id = %s WHERE cntrb_email = %s", (user, ID, email))
else:
    # Runs only when the loop finishes without hitting the 401 break above.
    print("\nUpdate complete!  Enjoy your updated table, now featuring gitlab identification!")

Now updating table.  Depending on the size of your database, this may take a while.  Make a cup of tea.



###### To be honest, I wanted to draw a progress bar or a spinning cursor here, but it's a little late to be diving into something new on what is basically a whim. I also wasn't sure if it would be compatible with the output in Jupyter.
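
For what it's worth, a dependency-free progress line might look like the sketch below. The names `progress_line` and `show_progress` are hypothetical, and redrawing the line with a carriage return is an assumption about how the console handles output; it has not been tried in this notebook.

```python
import sys

def progress_line(done, total, width=20):
    """Build a one-line text progress bar for `done` out of `total` rows."""
    filled = width * done // total if total else width
    percent = 100 * done // total if total else 100
    bar = "#" * filled + "-" * (width - filled)
    return "[{}] {:3d}% ({}/{})".format(bar, percent, done, total)

def show_progress(done, total, every=100):
    """Rewrite the same console line every `every` rows and on the last row."""
    if done % every == 0 or done == total:
        sys.stdout.write("\r" + progress_line(done, total))
        sys.stdout.flush()
```

Inside the update loop this could be driven by `enumerate(tuples, start=1)`, calling `show_progress(n, len(tuples))` once per row. Updating only every 100 rows keeps the console output small for a table of a few thousand tuples.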

## Wrapping up and closing loose ends

After all that, we call conn.commit() to make our changes to the database permanent, then we close our cursor and our connection.

In [None]:
conn.commit()
cur.close()
conn.close()
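
One possible hardening, sketched under assumptions: wrap the work so that a failure rolls back instead of leaving a half-applied update, and so the connection is always closed. `run_and_commit` is a hypothetical helper; it relies only on the cursor(), commit(), rollback() and close() methods that DB-API connections such as psycopg2's provide. The demo uses an in-memory sqlite3 database and a made-up table purely so the sketch runs without a PostgreSQL server.

```python
import sqlite3

def run_and_commit(conn, work):
    """Run work(cursor); commit on success, roll back on error, always close."""
    cur = conn.cursor()
    try:
        work(cur)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
        conn.close()

def demo(cur):
    # Made-up table and row, standing in for the real update loop.
    cur.execute("CREATE TABLE contributors (cntrb_email TEXT, gitlab_id INTEGER)")
    cur.execute("INSERT INTO contributors VALUES (?, ?)", ("a@example.com", 1))

run_and_commit(sqlite3.connect(":memory:"), demo)
```

With psycopg2, the same helper would be called with the connection from `psycopg2.connect(...)` and a function containing the update loop.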