Launching iPython Examples
- Python 2.7
Install iPython Notebook
Install pip, a Python package manager (if it's not already installed):
$ sudo easy_install pip
Install iPython using pip install:
$ sudo pip install "ipython[notebook]"
The H2O Python module depends on the requests and tabulate modules, both of which are available on PyPI, the Python Package Index.
$ sudo pip install requests
$ sudo pip install tabulate
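To confirm that both dependencies were installed into the Python environment you intend to use, a small check like the following may help. It is only a sketch: the helper name missing_modules is hypothetical and not part of H2O, and the module names are the ones from the install commands above.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or not on this interpreter's path).
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the pip install commands above.
    absent = missing_modules(["requests", "tabulate"])
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules are importable.")
```

If a module is reported as missing, pip may have installed it for a different Python interpreter than the one running the check.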
Install and Launch H2O
To use H2O in Python, follow the instructions on the Install in Python tab after selecting the H2O version on the H2O Downloads page.
Launch H2O outside of the iPython notebook. You can do this from the top directory of your H2O build download. The running version of H2O must match the version of the H2O Python module, or Python cannot connect to H2O. To access the H2O Web UI, go to http://localhost:54321 in your web browser.
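Before opening a notebook, it can be useful to check that something is actually listening on the H2O port. The sketch below uses only the Python standard library; the function name h2o_port_is_open is hypothetical, and the defaults assume H2O is running locally on port 54321 as described above. It only tests for a TCP listener, so it does not verify that the H2O and Python module versions match.

```python
import socket


def h2o_port_is_open(host="localhost", port=54321, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Nothing is listening, or the host could not be reached in time.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    if h2o_port_is_open():
        print("A service is listening on localhost:54321.")
    else:
        print("Nothing is listening on localhost:54321. Is H2O running?")
```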
Open Demos Notebook
Open the prostate_gbm.ipynb file. The notebook contains a demo that starts H2O, imports a prostate dataset into H2O, builds a GBM model, and predicts on the training set with the newly built model. Use Shift+Return to execute each cell and proceed to the next cell in the notebook.
$ ipython notebook prostate_gbm.ipynb
All demos are available here:
Running Python Examples
To set up your Python environment to run these examples, download and install H2O for Python using the instructions above.
- Predict Airline Delays - Uses historical airlines flight data to build multiple classification models to label any flight as either delayed or not delayed.
- Chicago Crime Rate - Uses weather and city statistics to compare arrest rates with the total crimes for each category.
- NYC Citibike Demand with Weather - Takes monthly bike ride data (~10 million rows) for the past two years to predict bike demand at each bike share station. Weather data is also incorporated to better predict bike usage.
- NYC Citibike Demand with Weather - smaller dataset - Takes monthly bike ride data (~1 million rows) for the past two years to predict bike demand at each bike share station. Weather data is also incorporated to better predict bike usage.
- Confusion Matrix & ROC - Creates GBM and GLM models using the airlines dataset and reports confusion matrices, ROC curves, and scoring histories.
- Imputation - Imputes (substitutes values for) missing data in the airlines dataset.
- Not Equal Factor - Slices the airlines dataset using !=
- Airline Confusion Matrices - Uses the airlines dataset to generate confusion matrices for algorithm performance analysis.
- Deep Learning for Prostate Cancer Analysis - Uses the prostate dataset to build a Deep Learning model.
- Airlines Prep - Conditions the airlines dataset by filtering out rows where the departure delay is unknown (NA). Any delay longer than minutesOfDelayWeTolerate is treated as delayed.
- GBM model using prostate dataset - Creates a GBM model using the prostate dataset.
- Balance Classes - Imports the airlines dataset, parses it, displays a summary, and runs GLM with a binomial link function.
- Clustering with KMeans - Demonstrates kmeans clusters and different diagnostics for selecting the number of clusters. Link to data is provided in the notebook.
- EEG Eye State - Uses EEG data collected from an Emotiv Neuroheadset and classifies eye state (open vs closed) with a GBM.
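The labeling rule described under Airlines Prep in the list above (drop rows with an unknown departure delay, then flag anything over the tolerance as delayed) can be sketched in plain Python as follows. This is not the notebook's code: the column names DepDelay and IsDelayed, the function name label_delays, and the 15-minute tolerance are all assumptions chosen for illustration.

```python
# Hypothetical tolerance in minutes; the notebook defines its own value.
MINUTES_OF_DELAY_WE_TOLERATE = 15


def label_delays(rows, tolerance=MINUTES_OF_DELAY_WE_TOLERATE):
    """Drop rows with an unknown delay and flag the rest as delayed or not.

    `rows` is a list of dicts with an assumed "DepDelay" key holding the
    departure delay in minutes, or None when the delay is unknown.
    """
    labeled = []
    for row in rows:
        delay = row.get("DepDelay")
        if delay is None:
            # Unknown departure delay: filter the row out.
            continue
        # Strictly longer than the tolerance counts as delayed.
        labeled.append(dict(row, IsDelayed=delay > tolerance))
    return labeled


if __name__ == "__main__":
    flights = [{"DepDelay": 5}, {"DepDelay": None}, {"DepDelay": 30}]
    for flight in label_delays(flights):
        print(flight)
```

In the notebook itself, the equivalent filtering and labeling would be done on an H2O frame rather than on a list of dicts.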
Used in NYC Citibike Demand with Weather
NYC Weather Data - Used in NYC Citibike Demand with Weather and NYC Citibike Demand with Weather - smaller dataset