**Note:** This script needs a GitHub token to run. If you set an environment variable named `GITHUB_TOKEN` before running `docker-compose up`, or if you manually passed this variable to your container, the script will use it. Otherwise you need to enter the token at line 10, replacing the literal `<YOUR_GITHUB_TOKEN>`.

# Sending GitHub Events To Kafka

Sending events to Kafka means connecting to a Kafka broker and pushing data to one or more topics. 
In this case the brokers live within the same docker-compose environment, so we access them via their container names: `broker:29092` and `broker-2:29092`. 
For the topic name we will be using `github_events`, as we have other components that will be consuming events from that topic. We have configured those consumers to read JSON, so we send the data to Kafka already in JSON format to simplify this example.
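
As a minimal sketch of the JSON payload described above, the helper below reduces a raw GitHub event to the three fields this notebook sends and encodes the result as bytes. The function name `to_kafka_value` and the sample event are hypothetical, introduced only for illustration; the notebook itself builds the dictionary inline and lets the producer serialize it.

```python
import json

def to_kafka_value(event: dict) -> bytes:
    """Keep only the fields sent to the topic and encode them as JSON bytes."""
    payload = {
        'type': event.get('type'),
        # Fall back to placeholder strings when a field is missing
        'repo': (event.get('repo') or {}).get('name', 'None'),
        'actor': (event.get('actor') or {}).get('login', 'Unknown'),
    }
    return json.dumps(payload).encode('utf-8')

# Hypothetical sample event, shaped like an entry from the public events API
sample = {
    'type': 'PushEvent',
    'repo': {'name': 'octocat/hello-world'},
    'actor': {'login': 'octocat'},
}
print(to_kafka_value(sample))
```

Because the value is plain JSON, any consumer configured to read JSON should be able to decode it with a standard parser.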

Data is retrieved from the GitHub public events API. The script requests new data every 10 seconds to avoid hitting the API rate limits.
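
The polling logic can be sketched in isolation, assuming the same approach the script takes: send the previous `ETag` in an `If-None-Match` header so an unchanged feed comes back as `304`, and wait until the rate limit resets when no requests remain. The helper names `build_headers` and `seconds_until_next_fetch` are hypothetical and do not appear in the script.

```python
import time

def build_headers(token, etag=None):
    """Build request headers, adding If-None-Match when an ETag is known."""
    headers = {
        'Authorization': f'token {token}',
        'Accept': 'application/vnd.github.v3+json',
    }
    if etag:
        headers['If-None-Match'] = etag
    return headers

def seconds_until_next_fetch(remaining, reset_epoch, interval=10, now=None):
    """Return how long to sleep before the next request.

    remaining:   requests left in the current rate limit window
    reset_epoch: Unix time (seconds) at which the window resets
    """
    now = time.time() if now is None else now
    if remaining == 0:
        # Out of quota: wait for the reset, but at least one second
        return max(reset_epoch - now, 1)
    return interval

print(build_headers('<YOUR_GITHUB_TOKEN>', etag='"abc123"'))
print(seconds_until_next_fetch(remaining=0, reset_epoch=160, now=100))
```

Keeping the delay calculation separate from the network call makes it easy to check the behavior without contacting GitHub.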

If you prefer to ingest data from outside a Jupyter Notebook, scripts in Python, Go, Rust, Java, and Node.js are available in [this same repository](https://github.com/javier/time-series-streaming-analytics-template?tab=readme-ov-file#ingestion). 

It would also be possible to ingest data directly into QuestDB, skipping Kafka altogether. That would simplify the deployment and decrease latency (although for most use cases the difference would not be noticeable). On the other hand, having a message broker in front of your database gives you more flexibility in your data analytics pipeline. There is a [Jupyter Notebook](./IoTEventsToQuestDB.ipynb) in this repository to ingest data directly into QuestDB.

In [3]:
from github import Github, GithubException
import requests
from kafka import KafkaProducer
import json
import time
from datetime import datetime
import os

# Configuration
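GITHUB_TOKEN = os.getenv('GITHUB_TOKEN', '<YOUR_GITHUB_TOKEN>')  # Fetch GitHub token from environment variable, or replace the placeholder
if not GITHUB_TOKEN or GITHUB_TOKEN.startswith('<YOUR_'):  # Unset, empty, or placeholder left in place
    raise ValueError("GitHub token not found. Set GITHUB_TOKEN or replace the placeholder above.")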

KAFKA_TOPIC = 'github_events'       # Kafka topic to produce messages to
KAFKA_BROKERS = ['broker:29092','broker-2:29092']
FETCH_INTERVAL = 10                 # Time interval between fetches in seconds
GITHUB_EVENTS_URL = 'https://api.github.com/events'


# Initialize GitHub client
g = Github(GITHUB_TOKEN, per_page=100)

# Initialize Kafka producer
producer = KafkaProducer(bootstrap_servers=KAFKA_BROKERS,
                         value_serializer=lambda m: json.dumps(m).encode('ascii'))

# Function to fetch and send public events
def fetch_and_send_events(etag=None):
    headers = {
        'Authorization': f'token {GITHUB_TOKEN}',
        'Accept': 'application/vnd.github.v3+json',
    }
    if etag:
        headers['If-None-Match'] = etag

    response = requests.get(GITHUB_EVENTS_URL, headers=headers, timeout=30)  # Timeout so a stalled request cannot hang the loop

    if response.status_code == 304:  # Not Modified
        print("No new events since last check.")
        return etag
    elif response.status_code != 200:
        raise GithubException(response.status_code, response.json())

    new_etag = response.headers.get('ETag')
    events = response.json()

    for event in events:
        # Uncomment the following lines if you want to send the event timestamp 
        # rather than allow QuestDB to use the server timestamp
        # created_at_datetime = datetime.strptime(event.get('created_at'), '%Y-%m-%dT%H:%M:%SZ')
        # created_at_microseconds = int((created_at_datetime - datetime(1970, 1, 1)).total_seconds() * 1e6)  # created_at is UTC

        event_data = {
            'type': event.get('type'),
            'repo': event.get('repo', {}).get('name', 'None'),
            'actor': event.get('actor', {}).get('login', 'Unknown'),
            # Uncomment the following line if using created_at_microseconds
            # 'created_at': created_at_microseconds
        }
        producer.send(KAFKA_TOPIC, event_data)
        print(f"Sent event: {event.get('type')} from {event.get('repo', {}).get('name', 'None')}")

    return new_etag

# Main loop
etag = None
try:
    while True:
        rate_limit = g.get_rate_limit().core
        if rate_limit.remaining == 0:
            reset_time = rate_limit.reset.timestamp()
            sleep_time = max(reset_time - time.time(), 1)
            print(f"Rate limit exceeded. Sleeping for {sleep_time} seconds.")
            time.sleep(sleep_time)
        else:
            etag = fetch_and_send_events(etag)
            print(f"Sleeping for {FETCH_INTERVAL} seconds....")
            time.sleep(FETCH_INTERVAL)
except KeyboardInterrupt:
    print("Stopping...")
finally:
    producer.flush()  # Deliver any buffered messages before exiting

Sent event: PushEvent from spinalcom/spinal-env-viewer-plugin-graph-manager
Sent event: IssueCommentEvent from status-im/status-go
Sent event: PullRequestEvent from aws-aemilia-pdx/Github-PR-Commit-Integration-Test-DoNotTouch-GitHubAutoBuildPrPreviewCanaryTest-v1-prod-us-west-2
Sent event: ReleaseEvent from budka-tech/snip-common-go
Sent event: IssuesEvent from liberu-crm/crm-laravel
Sent event: PushEvent from brand22/d3
Sent event: PushEvent from go-cinch/argocd-app
Sent event: PushEvent from cguilloteau/cguilloteau.github.io
Sent event: PushEvent from Ayatisonkar/DSA
Sent event: WatchEvent from exo-explore/exo
Sent event: PushEvent from athombv/node-homey
Sent event: PushEvent from drymonsterblack462wj56/1an-Overwatch2n
Sent event: PushEvent from timherreijgers/LoggingVisualizer
Sent event: PushEvent from BurmeseTV/ios
Sent event: IssueCommentEvent from ITISFoundation/osparc-simcore
Sent event: WatchEvent from williamleif/GraphSAGE
Sent event: PullRequestEvent from homewizard/api-doc