# Introduction to Connecting to and Querying the Augur DB

If you made it to this point, welcome! :) This short tutorial shows how to connect to the database and run a simple query. If you need the config file, please email cdolfi@redhat.com.

## Connect to your database

Until the Operate First environment can connect to the DB, use a config file to access it. Do not push the config file to the GitHub repo.

In [7]:
import psycopg2
import pandas as pd
import sqlalchemy as salc
import json
import os

# Load the database credentials from a local config file (keep it out of version control).
with open("../../config.json") as config_file:
    config = json.load(config_file)
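
The connection step further down reads five keys from the config: `user`, `password`, `host`, `port`, and `database`. As a minimal sketch, the loading step could also check that those keys are present before trying to connect. The helper name `load_config` and all values below are placeholders for illustration, not part of the Augur setup.

```python
import json
import os
import tempfile

# Keys the connection string in this tutorial reads from the config.
REQUIRED_KEYS = ("user", "password", "host", "port", "database")


def load_config(path):
    """Load a JSON config file and fail early if a required key is missing."""
    with open(path) as config_file:
        config = json.load(config_file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config file is missing keys: {missing}")
    return config


# Demo with placeholder values written to a temporary file (not real credentials).
placeholder = {
    "user": "augur_user",
    "password": "change-me",
    "host": "localhost",
    "port": 5432,
    "database": "augur",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "config.json")
    with open(demo_path, "w") as handle:
        json.dump(placeholder, handle)
    demo_config = load_config(demo_path)

print(sorted(demo_config))
```

Failing at load time gives a clearer error than a `KeyError` raised in the middle of building the connection string.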

In [8]:
# Build the SQLAlchemy connection URL from the config values.
database_connection_string = 'postgresql+psycopg2://{}:{}@{}:{}/{}'.format(
    config['user'], config['password'], config['host'], config['port'], config['database'])

# Set the default schema so table names do not need the augur_data prefix.
dbschema = 'augur_data'
engine = salc.create_engine(
    database_connection_string,
    connect_args={'options': '-csearch_path={}'.format(dbschema)})
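
One thing to watch for: if the password contains characters such as `@` or `/`, the plain `format` call produces a URL that is parsed incorrectly. A possible workaround, sketched here under the assumption that the config has the same five keys as above, is to URL-encode the user and password with the standard library's `quote_plus`. The helper name `build_connection_string` and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def build_connection_string(config):
    """Build a postgresql+psycopg2 URL, URL-encoding the user and password."""
    return "postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}".format(
        user=quote_plus(str(config["user"])),
        password=quote_plus(str(config["password"])),
        host=config["host"],
        port=config["port"],
        database=config["database"],
    )


# Placeholder credentials; the '@' and '/' in the password get percent-encoded.
example = build_connection_string({
    "user": "augur_user",
    "password": "p@ss/word",
    "host": "localhost",
    "port": 5432,
    "database": "augur",
})
print(example)
```

The resulting string can be passed to `salc.create_engine` in the same way as the one built above.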

### Retrieve Available Repositories

In [9]:
# Join each repository to its repo group and list the results alphabetically.
repo_query = salc.sql.text("""
    SET SCHEMA 'augur_data';
    SELECT a.rg_name,
        a.repo_group_id,
        b.repo_name,
        b.repo_id,
        b.forked_from,
        b.repo_archived
    FROM
        repo_groups a,
        repo b
    WHERE
        a.repo_group_id = b.repo_group_id
    ORDER BY
        rg_name,
        repo_name;
""")
repolist = pd.read_sql(repo_query, con=engine)
display(repolist)
repolist.dtypes

| | rg_name | repo_group_id | repo_name | repo_id | forked_from | repo_archived |
|---|---|---|---|---|---|---|
| 0 | Default Repo Group | 1 | augur | 1 | Parent not available | 0 |
| 1 | konveyor | 101 | | 25437 | Parent not available | 0 |
| 2 | konveyor | 101 | | 25439 | Parent not available | 0 |
| 3 | konveyor | 101 | | 25440 | Parent not available | 0 |
| 4 | konveyor | 101 | | 25441 | Parent not available | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 66 | konveyor | 101 | | 25489 | Parent not available | 0 |
| 67 | konveyor | 101 | | 25490 | Parent not available | 0 |
| 68 | konveyor | 101 | | 25491 | Parent not available | 0 |
| 69 | konveyor | 101 | | 25436 | Parent not available | 0 |


rg_name          object
repo_group_id     int64
repo_name        object
repo_id           int64
forked_from      object
repo_archived     int64
dtype: object
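
The query above joins `repo_groups` and `repo` by listing both tables in `FROM` and matching them in `WHERE`. The same result can be written with an explicit `JOIN ... ON`, which some readers find easier to extend. The sketch below shows both forms side by side on an in-memory SQLite database used as a stand-in for the Augur Postgres DB; the table contents are toy data, and the names `repo-a` and `repo-b` are made up.

```python
import sqlite3

# In-memory SQLite stand-in for the Augur schema (toy data, not real Augur rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo_groups (repo_group_id INTEGER, rg_name TEXT);
    CREATE TABLE repo (repo_id INTEGER, repo_group_id INTEGER, repo_name TEXT);
    INSERT INTO repo_groups VALUES (1, 'Default Repo Group'), (101, 'konveyor');
    INSERT INTO repo VALUES (1, 1, 'augur'), (25437, 101, 'repo-b'), (25439, 101, 'repo-a');
""")

# Implicit join, in the style used in this tutorial.
implicit_sql = """
    SELECT a.rg_name, a.repo_group_id, b.repo_name, b.repo_id
    FROM repo_groups a, repo b
    WHERE a.repo_group_id = b.repo_group_id
    ORDER BY a.rg_name, b.repo_name;
"""

# Equivalent explicit join.
explicit_sql = """
    SELECT a.rg_name, a.repo_group_id, b.repo_name, b.repo_id
    FROM repo_groups AS a
    JOIN repo AS b ON a.repo_group_id = b.repo_group_id
    ORDER BY a.rg_name, b.repo_name;
"""

implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()

# Both forms return the same rows in the same order.
print(implicit_rows == explicit_rows)
for row in explicit_rows:
    print(row)
conn.close()
```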

### Create a Simpler List for Quickly Identifying `repo_group_id`s and `repo_id`s for Other Queries

In [10]:

# Same join as above, keeping only the ID and name columns.
repo_query = salc.sql.text("""
    SET SCHEMA 'augur_data';
    SELECT b.repo_id,
        a.repo_group_id,
        b.repo_name,
        a.rg_name
    FROM
        repo_groups a,
        repo b
    WHERE
        a.repo_group_id = b.repo_group_id
    ORDER BY
        rg_name,
        repo_name;
""")

repolist = pd.read_sql(repo_query, con=engine)

display(repolist)

repolist.dtypes

| | repo_id | repo_group_id | repo_name | rg_name |
|---|---|---|---|---|
| 0 | 1 | 1 | augur | Default Repo Group |
| 1 | 25437 | 101 | | konveyor |
| 2 | 25439 | 101 | | konveyor |
| 3 | 25440 | 101 | | konveyor |
| 4 | 25441 | 101 | | konveyor |
| ... | ... | ... | ... | ... |
| 66 | 25489 | 101 | | konveyor |
| 67 | 25490 | 101 | | konveyor |
| 68 | 25491 | 101 | | konveyor |
| 69 | 25436 | 101 | | konveyor |


repo_id           int64
repo_group_id     int64
repo_name        object
rg_name          object
dtype: object
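
Once a `repo_group_id` or `repo_id` has been picked from this list, a follow-up query can take it as a bind parameter rather than pasting the number into the SQL string. The sketch below illustrates the idea on an in-memory SQLite stand-in with toy data (the repo names are made up). With the setup used in this tutorial, `salc.sql.text` uses the same `:name` placeholder style, and `pd.read_sql` accepts the values through its `params` argument.

```python
import sqlite3

# In-memory SQLite stand-in for the Augur schema (toy data, not real Augur rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo_groups (repo_group_id INTEGER, rg_name TEXT);
    CREATE TABLE repo (repo_id INTEGER, repo_group_id INTEGER, repo_name TEXT);
    INSERT INTO repo_groups VALUES (1, 'Default Repo Group'), (101, 'konveyor');
    INSERT INTO repo VALUES (1, 1, 'augur'), (25437, 101, 'repo-b'), (25439, 101, 'repo-a');
""")

# The :repo_group_id placeholder is filled in by the driver, not by string formatting.
query = """
    SELECT b.repo_id, b.repo_name
    FROM repo_groups a
    JOIN repo b ON a.repo_group_id = b.repo_group_id
    WHERE a.repo_group_id = :repo_group_id
    ORDER BY b.repo_id;
"""

# 101 is the konveyor repo_group_id shown in the list above.
konveyor_repos = conn.execute(query, {"repo_group_id": 101}).fetchall()
print(konveyor_repos)
conn.close()
```

Passing values as parameters avoids quoting mistakes and makes the same query reusable for any repo group.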