Spark Workshop

In this workshop you will learn how to:

  • use a notebook environment
  • write simple Apache Spark queries to filter and transform a dataset
  • do very simple outlier detection

The example dataset we will use is the Amazon Electronics reviews dataset:

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering
R. He, J. McAuley
WWW, 2016 [http://jmcauley.ucsd.edu/data/amazon/]

Installation instructions (~15 minutes)

To avoid installation problems, I created a Docker container that has everything we need for this workshop pre-installed and isolated from your system.

Please follow the instructions below to start up the container:

  1. If you don't have a Docker ID account yet, go to Docker Hub and create an account.

  2. Install Docker. On Mac OS X, you can download Docker for Mac for an easy-to-install desktop app.

  3. Open Docker and enter your Docker Hub credentials.

  4. Click on the Docker app icon and select "Preferences...". Under "Advanced", increase the memory available to containers to 8.0 GB.


  5. Clone this repository using git clone:
$ git clone https://github.com/stefano-meschiari/spark_workshop.git
  6. Open a terminal, navigate to the spark_workshop directory, and run:
$ sh run.sh

This downloads a JSON dataset and starts the Docker container.

  7. Navigate to http://0.0.0.0:8889 in your browser. You should see a Jupyter notebook instance. The password is spark.

  8. You can exit the Docker session using Ctrl+C.
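
Before cloning the repository and running run.sh, it can help to confirm that the command-line tools these steps rely on are available. The following is a minimal sketch, assuming a POSIX shell; `check_tool` is a hypothetical helper name and is not part of this repository.

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper, not part of the repository.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# git is needed to clone the repository; docker is needed to start the container.
for tool in git docker; do
    check_tool "$tool" || echo "Install $tool before continuing."
done
```

If either tool is reported as missing, install it first and then repeat the installation steps.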

Troubleshooting

If step 6 fails with the error unauthorized: incorrect username or password., run

$ docker login

and enter your Docker Hub credentials (username and password; the username is not your email address).
