data-dynamo

Dataset for the PerfConf 2024 Hackathon

Introduction

This repository contains the instructions and dataset for the PerfConf 2024 Hackathon Challenge #2, which has been given the name Data Dynamo.

Goals

The goal of this challenge is to provide an opportunity to apply Data Science, AI, or Machine Learning-based approaches to a performance-related dataset. Participants are expected to understand the dataset, do some exploratory data analysis to frame a few research questions they'd like to explore in more detail, and use a programming language or tools such as Jupyter Notebooks, RStudio, Power BI, Alteryx, or plain old Google Sheets to derive insights from the data. Participants may choose to use pure data science methods or take it one notch up with AI and Machine Learning (Supervised/Unsupervised Learning).

Deliverables

The end deliverable at the Hackathon presentations is a slide deck and readout covering the research questions explored by the team, the interesting insights generated to answer those questions, an overview of the tools and methodology used, and any supporting visualizations that drive home the narrative. All supporting code and files (notebooks, scripts, source code, data files) need to be uploaded to GitHub and the link referenced within the slide deck. If you have file formats such as Power BI reports (.pbix), please upload them to Google Drive and include a link in the README of your GitHub project.

Dataset

The dataset includes performance results and measurements gathered from an open-source microservices system with injected performance anomalies. The entire dataset is located in the data/trainticket directory.

The dataset comprises CSV files containing pre-processed execution traces. The traces originate from an open-source microservices system called Train-Ticket, which uses the microservices design pattern to build and deploy a cloud-native train ticket booking app from 41 different microservices. The Train-Ticket project can also be deployed easily on Kubernetes using the instructions included in its README.

Each CSV file in the trainticket directory corresponds to a distinct scenario involving specific performance anomalies injected into one or more Remote Procedure Calls (RPCs). In these CSV files, rows represent individual end-to-end requests, and columns give the cumulative response time (in milliseconds) for particular RPCs within those requests. If an RPC is invoked multiple times within a single request, the response times from all invocations are combined. The Latency column indicates the response time of the root RPC, reflecting the end-to-end response time of a request. The anomaly column acts as a marker for performance anomalies, where a value of 0 indicates no issues, while values of 1 or 2 represent specific performance problems. Each scenario includes two distinct performance issue types (1 and 2), each affecting a different subset of RPCs.

For a detailed explanation of how the dataset was created, please refer to Section 5.3 of Traini and Cortellessa, 2023. If you choose to use Supervised Learning on the dataset, feel free to split the available data into training and evaluation sets based on your needs; a minimal sketch of one way to do that is shown below.
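
As a starting point, here is a minimal sketch of loading one scenario file and creating a stratified train/evaluation split. It assumes pandas and scikit-learn, which are just one possible toolchain; the filename scenario_1.csv is hypothetical (use any CSV from data/trainticket), while the Latency and anomaly column names come from the description above.

```python
# Minimal sketch: load one scenario CSV and split it for supervised learning.
# "scenario_1.csv" is a placeholder name; substitute an actual file
# from data/trainticket.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/trainticket/scenario_1.csv")

# Quick look at the end-to-end latency and the anomaly marker (0, 1, or 2).
print(df[["Latency", "anomaly"]].describe())
print(df["anomaly"].value_counts())

# Per-RPC cumulative response times serve as features; the anomaly column
# is the label in a supervised setup.
X = df.drop(columns=["anomaly"])
y = df["anomaly"]

# A stratified split keeps the proportions of anomaly types 0/1/2 similar
# across the training and evaluation sets.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_eval.shape)
```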

The dataset is borrowed from the paper:

L. Traini and V. Cortellessa, DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-Based Systems, in IEEE Transactions on Software Engineering, vol. 49, no. 6, pp. 3554-3580, 1 June 2023, DOI: 10.1109/TSE.2023.3266041.
