# Testing all environment files

Before we start running the other notebooks, let's run this notebook and check that we have successfully set up the environment variables for the workshop.

In [3]:
import os
import boto3
import trino
from dotenv import find_dotenv, load_dotenv
from github import Github

load_dotenv(find_dotenv(), override=True)

True

In [4]:
ORG = os.getenv("GITHUB_ORG")
REPO = os.getenv("GITHUB_REPO")

CEPH_BUCKET_PREFIX = os.getenv("CEPH_BUCKET_PREFIX")
CEPH_BUCKET = os.getenv("CEPH_BUCKET")
CEPH_KEY_ID = os.getenv("CEPH_KEY_ID")
CEPH_SECRET_KEY = os.getenv("CEPH_SECRET_KEY")

S3_BUCKET = os.getenv("S3_BUCKET")
S3_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY_ID")
S3_SECRET_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
S3_ENDPOINT_URL = os.getenv("S3_ENDPOINT_URL")

GITHUB_ACCESS_TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")

TRINO_USER = os.getenv("TRINO_USER")
TRINO_PASSWD = os.getenv("TRINO_PASSWD")
TRINO_HOST = os.getenv("TRINO_HOST")
TRINO_PORT = os.getenv("TRINO_PORT")

CHOSEN_MODEL = os.getenv("CHOSEN_MODEL")
REMOTE = os.getenv("REMOTE")
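
`os.getenv` returns `None` for an unset variable instead of raising an error, so a missing entry in the environment file would only surface in a later cell. A minimal sketch of an up-front check, assuming the variable names read in the cell above; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here and are not part of the workshop code:

```python
import os

# Names read by the cell above; adjust the list if your .env file differs.
REQUIRED_VARS = [
    "GITHUB_ORG", "GITHUB_REPO", "GITHUB_ACCESS_TOKEN",
    "CEPH_BUCKET_PREFIX", "CEPH_BUCKET", "CEPH_KEY_ID", "CEPH_SECRET_KEY",
    "S3_BUCKET", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_ENDPOINT_URL",
    "TRINO_USER", "TRINO_PASSWD", "TRINO_HOST", "TRINO_PORT",
    "CHOSEN_MODEL", "REMOTE",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All expected variables are set.")
```

Only the variable names are printed, never their values, so the check is safe to leave in a shared notebook.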

## GitHub ORG/REPO

You are trying to extract the Pull Request data from: 

In [5]:
print(f"{ORG}/{REPO}")

operate-first/support


## S3 Credentials

To check our S3 credentials, let's list the files in the bucket using the credentials from the environment file.

In [6]:
s3 = boto3.client(
    "s3",
    endpoint_url=S3_ENDPOINT_URL,
    aws_access_key_id=S3_ACCESS_KEY,
    aws_secret_access_key=S3_SECRET_KEY,
)

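# "Contents" is absent from the response when the bucket is empty,
# so fall back to an empty list instead of raising a KeyError.
for obj in s3.list_objects(Bucket=S3_BUCKET).get("Contents", []):
    print(obj["Key"])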

/srcopsmetrics/bot_knowledge/operate-first/support/PullRequest.json
andrewkjankowski/srcopsmetrics/bot_knowledge/operate-first/support/PullRequest.json
attendee/operate-first/support/features/operate-firstsupportFILETYPE.parquet
attendee/operate-first/support/features/operate-firstsupporttitlewords.parquet
attendee/operate-first/support/sql/operate-firstsupportprs.parquet
attendee/operate-first/support/test-data/X_test.parquet
attendee/operate-first/support/test-data/y_test.parquet
attendee/operate-first/support/ttm-model/model.joblib
attendee/operate-first/support/ttm_feature_engineered_dataset.parquet
attendee/srcopsmetrics/bot_knowledge/operate-first/support/PullRequest.json
attendee2/operate-first/support/features/operate-firstsupportFILETYPE.parquet
attendee2/operate-first/support/features/operate-firstsupporttitlewords.parquet
attendee2/operate-first/support/sql/operate-firstsupportprs.parquet
attendee2/operate-first/support/test-data/X_test.parquet
attendee2/operate-first/suppor

If you see the list of files, the credentials you have provided are working properly. If not, please recheck the credentials and try again.
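
A single `list_objects` call returns at most 1000 keys, so a large bucket is listed only partially. The following is a sketch of a paginated listing, assuming the endpoint supports `list_objects_v2`; `iter_keys` is a hypothetical helper name, and `client` is expected to be a boto3 S3 client such as the `s3` object created above:

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key in the bucket, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With the client from the cell above, `for key in iter_keys(s3, S3_BUCKET): print(key)` would print the full listing, and passing a `prefix` narrows it to one folder.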

## CEPH Credentials

Compare the S3 and Ceph credentials. For this workshop they should be identical:

In [7]:
print(S3_BUCKET == CEPH_BUCKET)

True


In [8]:
print(S3_ACCESS_KEY == CEPH_KEY_ID)

True


In [9]:
print(S3_SECRET_KEY == CEPH_SECRET_KEY)

True


If all three of the above results are `True`, you are good to move forward. If not, please make sure the S3 and Ceph credentials are correct and identical.
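
The three comparisons can also be run in one cell that names the mismatched setting without printing any secret. This is only a sketch: `mismatched_settings` is a hypothetical helper, and it reads the same variable names as the cells above. It also flags a pair in which the S3 value is unset, because two unset values would otherwise compare as equal:

```python
import os


def mismatched_settings(pairs):
    """Return the labels of (label, left, right) triples that differ or are unset."""
    return [label for label, left, right in pairs if not left or left != right]


pairs = [
    ("bucket", os.getenv("S3_BUCKET"), os.getenv("CEPH_BUCKET")),
    ("access key", os.getenv("AWS_ACCESS_KEY_ID"), os.getenv("CEPH_KEY_ID")),
    ("secret key", os.getenv("AWS_SECRET_ACCESS_KEY"), os.getenv("CEPH_SECRET_KEY")),
]

bad = mismatched_settings(pairs)
if bad:
    print("Mismatched or unset:", ", ".join(bad))
else:
    print("All S3 and Ceph settings match.")
```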

## Trino Credentials

Here we test our Trino credentials. If we can list the catalogs on the Trino server, the credentials we provided are able to connect to it.

In [11]:
# Create a Trino Client
conn = trino.dbapi.connect(
    auth=trino.auth.BasicAuthentication(
        os.environ["TRINO_USER"], os.environ["TRINO_PASSWD"]
    ),
    host=os.environ["TRINO_HOST"],
    port=int(os.environ["TRINO_PORT"]),
    http_scheme="https",
    verify=True,
)
cur = conn.cursor()

In [12]:
# Check if Trino connection was successful
cur.execute("show catalogs")
cur.fetchall()

[['aiops_tools_workshop'], ['jmx'], ['system']]

Check the list of catalogs. If a catalog named `aiops_tools_workshop` is present in the above list, you are good to proceed to the next section.
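
The same check can be done in code rather than by eye. A minimal sketch, assuming the rows have the one-column shape shown in the output above; `has_catalog` is a hypothetical helper name:

```python
def has_catalog(rows, name):
    """Return True if name appears in the rows returned by SHOW CATALOGS."""
    # Each row is a one-element sequence such as ['jmx'].
    return any(len(row) > 0 and row[0] == name for row in rows)


# Rows copied from the output shown above.
rows = [["aiops_tools_workshop"], ["jmx"], ["system"]]
print(has_catalog(rows, "aiops_tools_workshop"))  # prints True
```

In the notebook, `rows` would be the result of `cur.fetchall()` instead of the hard-coded list.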

## GitHub access token

To check your GitHub access token, we look at the rate limit that applies to your requests for GitHub data.

In [13]:
g = Github(GITHUB_ACCESS_TOKEN)
g.rate_limiting

(5000, 5000)

The output is a pair of the form (requests remaining, request limit). If you can see your rate limit (the maximum value is 5000/hour), your GitHub credentials are working.
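
A short sketch of how the pair could be interpreted in code. It assumes that unauthenticated requests are capped at 60 per hour, which is GitHub's documented default at the time of writing and may change; `describe_rate_limit` and its `unauthenticated_limit` parameter are hypothetical names:

```python
def describe_rate_limit(rate_limiting, unauthenticated_limit=60):
    """Summarise a (remaining, limit) pair such as the value of g.rate_limiting."""
    remaining, limit = rate_limiting
    if limit <= unauthenticated_limit:
        # Assumption: a limit this low means the token was not applied.
        return f"limit {limit}/hour: token may be missing or not applied"
    if remaining == 0:
        return f"authenticated, but quota exhausted ({limit}/hour)"
    return f"authenticated: {remaining} of {limit} requests remaining"


print(describe_rate_limit((5000, 5000)))
```

In the notebook, the argument would be `g.rate_limiting` rather than a hard-coded pair.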

## Your Chosen Model

In [14]:
if CHOSEN_MODEL in ("rf", "egbc", "svc", "gnb"):
    print("Great! You can proceed forward.")
else:
    print(
        "Make sure your chosen model is one of: 'rf', 'egbc', 'svc', 'gnb'."
    )

Great! You can proceed forward.


## REMOTE

Check that you have assigned `REMOTE` the value 1.

In [15]:
print(f"REMOTE : {REMOTE}")

REMOTE : 1


We choose `REMOTE=1` since we will be downloading the data remotely from the S3 bucket. Make sure it is assigned the value `1` or `True`.
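
Environment variables are always read as strings, so `REMOTE` arrives as `"1"` or `"True"` rather than as a number or a boolean. The sketch below shows one way to check for either accepted value; `is_remote_enabled` is a hypothetical helper, and how the later workshop notebooks actually interpret `REMOTE` is not shown here:

```python
def is_remote_enabled(value):
    """Return True if value reads as 1 or True, ignoring case and surrounding spaces."""
    return str(value).strip().lower() in {"1", "true"}


print(is_remote_enabled("1"))   # prints True
print(is_remote_enabled(None))  # prints False (variable not set)
```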

If every code cell runs without warnings or errors, you are ready to proceed to the next section.