GPU Open Analytics Initiative

Accelerating the Scalable Data Science Environment with GPU-enabled Python

KDD'18 Hands-On Tutorial

Tuesday 8:30 am


Software / Hardware Requirements

The tutorial will leverage cloud resources that provide a common environment for all students.

Requirements:

  • Laptop with WiFi

    • We will be using the conference WiFi; please ensure that you can connect before the tutorial
  • Web browser: the latest version of any browser will work; Firefox or Chrome is preferred.

Tutorial Agenda

Introductions

  • Who we are

Getting Connected

  • Connect to Qwiklabs
  • Introduction notebook to validate the environment

Introduction and Background

  • Big Data Ecosystem
  • Challenges in Big Data today
  • Apache Arrow
  • GPUs for compute
  • The GPU Open Analytics Initiative
  • The GPU Data Frame (GDF)
  • Python library for GDF (PyGDF)

Hands-on: Data Loading and Manipulation

  • Lab 1: Data Loading and Manipulation

    • Traditional interface through Pandas
    • Pandas to/from PyGDF
    • Column Function and Basic Transforms
    • Filtering
  • Student Assignment
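
As a rough illustration of the column transform and filtering steps listed for Lab 1, the following minimal sketch uses plain Pandas only. The column names and values are hypothetical and are not taken from the tutorial's dataset; the PyGDF equivalents covered in the lab are not shown here.

```python
import pandas as pd

# Hypothetical sample data; the tutorial's actual dataset is not shown here.
df = pd.DataFrame({
    "src_port": [443, 80, 22, 8080],
    "bytes_sent": [1200, 300, 4500, 50],
})

# Column transform: derive a new column from an existing one.
df["kilobytes_sent"] = df["bytes_sent"] / 1000.0

# Filtering: keep only the rows whose derived value exceeds a threshold.
large = df[df["kilobytes_sent"] > 1.0]

print(large)
```

The boolean mask `df["kilobytes_sent"] > 1.0` selects rows without an explicit loop, which is the style of column-wise operation the lab outline describes.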

Break

Hands-on: Data Science and Machine Learning

  • Lab 3: Classification using XGBoost
    • Familiarize with IoT cyber network data
    • Data ingest and feature extraction
    • Time binning and preparation for classification
    • Building XGBoost model
    • Evaluating the model via ROC curves and AUC
    • Student Assignment:
      • Investigation into other time binnings, aggregations, and XGBoost parameters
      • Using additional features (quantitative and categorical) in the data to build better models
      • Moving beyond connection logs to other log types (e.g., DNS) and building models
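
Two of the steps above, time binning and AUC evaluation, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the lab's implementation: the function names `bin_counts` and `roc_auc`, the `(timestamp, host)` event layout, and the sample values are all hypothetical, and no XGBoost model is trained here.

```python
from collections import defaultdict


def bin_counts(events, bin_seconds):
    """Count events per (host, time bin).

    events: iterable of (timestamp_in_seconds, host) pairs (assumed layout).
    """
    counts = defaultdict(int)
    for ts, host in events:
        # Integer division maps each timestamp to its fixed-width bin index.
        counts[(host, int(ts // bin_seconds))] += 1
    return dict(counts)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical connection events: (timestamp in seconds, host).
events = [(0, "a"), (30, "a"), (65, "a"), (10, "b")]
binned = bin_counts(events, bin_seconds=60)

# Hypothetical labels and classifier scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(binned, auc)
```

The pairwise definition of AUC is quadratic in the number of samples, so it is only suitable for small illustrations; changing `bin_seconds` is one simple way to explore the other time binnings mentioned in the student assignment.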

Break

Wrap-up and Conclusion

  • Roadmap
  • Scaling out to multi-GPU and multi-node
  • Partner Activities
  • Conclusion
