# Data collection from opf-datacatlog bucket

This notebook gives a brief overview of the meetup attendee data stored in the `opf-datacatlog` bucket.

In [10]:
import os
import pandas as pd
import boto3
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

True

## Accessing the environment keys:

The credentials for accessing the bucket can be found [here](https://www.operate-first.cloud/apps/content/odh/trino/access_public_bucket.html). 

In [11]:
s3_endpoint_url = os.getenv("S3_ENDPOINT")
s3_access_key = os.getenv("S3_ACCESS_KEY")
s3_secret_key = os.getenv("S3_SECRET_KEY")
s3_bucket = os.getenv("S3_BUCKET")
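
`os.getenv` returns `None` for a variable that is not set, so a missing entry in the `.env` file only surfaces later as a less obvious client error. A minimal sketch of an early check follows; the helper name `missing_env_vars` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names in `names` that are unset or empty in `env`."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# The four variables read in the cell above.
REQUIRED = ["S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY", "S3_BUCKET"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```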

We will use a boto3 S3 client, `s3`, to read content from the bucket.

In [12]:
s3 = boto3.client(
    "s3",
    endpoint_url=s3_endpoint_url,
    aws_access_key_id=s3_access_key,
    aws_secret_access_key=s3_secret_key,
)

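The snippet below lists every object in the bucket. It is wrapped in a triple-quoted string so that it does not run by default; remove the surrounding triple quotes to execute it.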

In [13]:
"""
for key in s3.list_objects(Bucket=s3_bucket)['Contents']:
    print(key['Key'])
"""

"\nfor key in s3.list_objects(Bucket=s3_bucket)['Contents']:\n    print(key['Key'])\n"

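A single listing call typically returns at most 1000 keys, so for a large bucket it can help to page through the results and restrict them to one folder. The sketch below assumes the endpoint supports the `list_objects_v2` operation; the helper name `iter_keys` is hypothetical, and it expects the `s3` client and `s3_bucket` created earlier.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key in `bucket` that starts with `prefix`."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for item in page.get("Contents", []):
            yield item["Key"]


# Usage with the `s3` client and `s3_bucket` defined above:
# prefix = "open-services-group/operate-first-data-science-community/meetup_attendees/"
# for key in iter_keys(s3, s3_bucket, prefix):
#     print(key)
```
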
The meetup attendee data is stored in the `opf-datacatlog` bucket under the folder `open-services-group/operate-first-data-science-community/meetup_attendees`. To load one of the files, fetch the object and read it into a pandas DataFrame as shown below:

In [14]:
obj = s3.get_object(
    Bucket=s3_bucket,
    Key="open-services-group/operate-first-data-science-community/meetup_attendees/obfuscated_2021-12-14_10_58_EYBYEGJGJI_Attendance_Report-Attendees.csv",
)
df = pd.read_csv(obj["Body"])
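# Drop the index column that was written into the CSV; the keyword form is
# required because the positional axis argument was removed in pandas 2.0.
df = df.drop(columns=["Unnamed: 0"])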
df.head()

| | Date | Name# | Email | Duration | Time_joined | Time_exited |
|---|---|---|---|---|---|---|
| 0 | 2021-12-14 | 513fa0b6615335d0bed0a1dcdb40e1e515c07619608650... | redhat.com | 30 min | 11:02 AM | 11:32 AM |
| 1 | 2021-12-14 | c0600eaa67ce4b171cb130cfa4914cc54d163384012e93... | redhat.com | 30 min | 11:02 AM | 11:32 AM |
| 2 | 2021-12-14 | 864042274f41a29a5b7fb860c7ec9fb3c1e6ec0e994731... | redhat.com | 30 min | 11:02 AM | 11:32 AM |
| 3 | 2021-12-14 | c281759f750aa8f477d025643494df52e531935a0bfbb1... | redhat.com | 31 min | 11:02 AM | 11:33 AM |
| 4 | 2021-12-14 | 76ecf41e4da0c8629abadab0ea6751c7a85f0fe632a96d... | redhat.com | 30 min | 11:02 AM | 11:32 AM |


This is a sample of the obfuscated meetup attendee data. Each row contains the meetup date (`Date`), the obfuscated attendee name (`Name#`), the domain of the attendee's email address (`Email`), the attendance duration (`Duration`), and the join and exit times (`Time_joined`, `Time_exited`). We will use this information for exploratory data analysis to gather insights from the data.
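
The `Duration`, `Time_joined`, and `Time_exited` columns arrive as strings, so a likely first step before any analysis is converting them to numeric and datetime types. The following is a minimal sketch under the assumption that every file uses the same formats as the rows displayed above (`"30 min"`, `"11:02 AM"`); the helper `add_parsed_columns` and the new column names are illustrative, and the sample frame copies two of the displayed rows.

```python
import pandas as pd


def add_parsed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a numeric duration and parsed join/exit timestamps."""
    out = df.copy()
    # "30 min" -> 30; assumes every value contains a whole number of minutes.
    out["duration_min"] = out["Duration"].str.extract(r"(\d+)", expand=False).astype(int)
    # Combine the date with the 12-hour clock time, e.g. "2021-12-14 11:02 AM".
    for src, dst in [("Time_joined", "joined_at"), ("Time_exited", "exited_at")]:
        out[dst] = pd.to_datetime(out["Date"] + " " + out[src], format="%Y-%m-%d %I:%M %p")
    return out


# Two rows copied from the sample output above (name and email columns omitted).
sample = pd.DataFrame(
    {
        "Date": ["2021-12-14", "2021-12-14"],
        "Duration": ["30 min", "31 min"],
        "Time_joined": ["11:02 AM", "11:02 AM"],
        "Time_exited": ["11:32 AM", "11:33 AM"],
    }
)
parsed = add_parsed_columns(sample)
print(parsed[["duration_min", "joined_at", "exited_at"]])
```

On the real data the same call would be `add_parsed_columns(df)`; rows with a missing or differently formatted value would raise an error and need handling first.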