GPU Open Analytics Initiative

Accelerating the Scalable Data Science Environment with GPU-enabled Python

KDD'18 Hands-On Tutorial

Tuesday 8:30 am


Software / Hardware Requirements

The tutorial will leverage cloud resources that provide a common environment for all students.

Requirements:

  • Laptop with WiFi

    • We will be using the conference WiFi; please ensure that you can connect prior to the tutorial
  • Web browser: the latest version of any browser will work; Firefox or Chrome is preferred.

Tutorial Agenda

Introductions

  • Who we are

Getting Connected

  • Connect to Qwiklabs
  • Run the introduction notebook to validate the environment

Introduction and Background

  • Big Data Ecosystem
  • Challenges in Big Data today
  • Apache Arrow
  • GPUs for compute
  • The GPU Open Analytics Initiative
  • The GPU Data Frame (GDF)
  • Python library for GDF (PyGDF)

Hands-on: Data Loading and Manipulation

  • Lab 1: Data Loading and Manipulation

    • Traditional interface through Pandas
    • Pandas to/from PyGDF
    • Column Function and Basic Transforms
    • Filtering
  • Student Assignment
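
As a rough preview of the Lab 1 topics, the sketch below shows a basic column transform and a filter using the traditional Pandas interface. The column names and values are made up for illustration and are not taken from the lab notebooks. The Pandas to/from PyGDF step is only indicated in comments, since it needs a GPU environment, and the call names shown there are an assumption about the PyGDF API rather than something this README confirms.

```python
import pandas as pd

# Hypothetical connection-log rows; column names are illustrative only.
df = pd.DataFrame({
    "dst_port": [443, 80, 22, 443],
    "bytes_sent": [1200, 300, 4500, 800],
})

# Basic column transform: derive kilobytes from bytes.
df["kb_sent"] = df["bytes_sent"] / 1024.0

# Filtering: keep only rows whose destination port is 443.
https_rows = df[df["dst_port"] == 443]

# Assumption (not run here, requires a GPU and PyGDF installed):
#   gdf = pygdf.DataFrame.from_pandas(df)   # Pandas -> PyGDF
#   df_back = gdf.to_pandas()               # PyGDF -> Pandas
```

The same transform-then-filter pattern is what the lab repeats on the GPU side, so it can help to be comfortable with it in Pandas first.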

Break

Hands-on: Data Science and Machine Learning

  • Lab 3: Classification using XGBoost
    • Familiarize yourself with IoT cyber network data
    • Data ingest and feature extraction
    • Time binning and preparation for classification
    • Building XGBoost model
    • Evaluating the model via ROC curves and AUC
    • Student Assignment:
      • Investigation into other time binnings, aggregations, and XGBoost parameters
      • Using additional features (quantitative and categorical) in the data to build better models
      • Moving beyond connection logs to other log types (e.g., DNS) and building models
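
Two of the Lab 3 steps, time binning and AUC evaluation, can be sketched in plain Python to make the ideas concrete. This is a minimal illustration under stated assumptions, not the lab's implementation: the function names `bin_counts` and `roc_auc` are hypothetical, the event format (timestamp in seconds, host) is invented for the example, and no XGBoost model is trained here. The AUC is computed by direct pairwise comparison, which is only practical for small inputs.

```python
from collections import defaultdict


def bin_counts(events, bin_seconds):
    """Count events per (host, time bin).

    events: iterable of (timestamp_seconds, host) pairs (hypothetical format).
    """
    counts = defaultdict(int)
    for ts, host in events:
        # Integer bin index: all timestamps in the same window share it.
        counts[(host, int(ts // bin_seconds))] += 1
    return dict(counts)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Usage: per-host counts in 60-second bins, then AUC for toy scores.
events = [(0, "a"), (30, "a"), (61, "a"), (10, "b")]
features = bin_counts(events, 60)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Changing `bin_seconds` is one simple way to explore the "other time binnings" mentioned in the student assignment; in the lab itself, the scores would come from the trained XGBoost model rather than a hand-written list.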

Break

Wrap-up and Conclusion

  • Roadmap
  • Scaling out to multi-GPU and multi-node
  • Partner Activities
  • Conclusion