# Set up your local laptop for the labs

This guide walks you through setting up a local environment to run the labs.

There are two options:
- Option A (recommended) : run our training Docker image on your computer
- Option B : set up your own computer

## Option A : Run the training Docker image
This is the same environment you used for training.

Here is the [docker repository](https://hub.docker.com/repository/docker/elephantscale/es-training)

### Step A1 : Install Docker
Get it from [docker.com](https://docker.com).

### Step A2 : Download the docker image

```bash
   $   docker pull elephantscale/es-training:prod
   $   docker images 
   #   you should see 'elephantscale/es-training' image
```
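If you prefer to script the check, the sketch below asks the Docker CLI for its local images and looks for the tag pulled above. The function names are illustrative and not part of the labs; the sketch assumes the `docker` command is on your `PATH` and uses the CLI's `--format` option to print one `repository:tag` per line.

```python
import shutil
import subprocess

IMAGE = "elephantscale/es-training:prod"  # image name and tag from the pull command above


def image_in_listing(listing: str, image: str = IMAGE) -> bool:
    """Return True if `image` appears as one line of 'repository:tag' listing text."""
    return image in {line.strip() for line in listing.splitlines()}


def local_image_present(image: str = IMAGE) -> bool:
    """Ask the local Docker CLI whether `image` has already been pulled."""
    if shutil.which("docker") is None:
        return False  # Docker CLI is not installed or not on PATH
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code usually means the Docker daemon is not running.
    return result.returncode == 0 and image_in_listing(result.stdout, image)


if __name__ == "__main__":
    print("image present:", local_image_present())
```

A `False` result can mean the image is missing, Docker is not installed, or the Docker daemon is not running, so check those in turn.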

### Step A3 : Download Data
- Download the data files from [here](https://s3.amazonaws.com/elephantscale-public/data/datasets.zip)
- Unzip this bundle
- The rest of this guide assumes your data directory is `~/datasets`
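The unzip step can also be done from Python. This is a minimal sketch using only the standard library; the function name and the download location in the example call are assumptions. The layout inside the zip is not described in this guide, so the function returns the top-level entries for you to inspect (for instance, to see whether the bundle adds its own nested folder).

```python
import zipfile
from pathlib import Path


def extract_bundle(zip_path: Path, dest: Path) -> list[str]:
    """Extract a zip bundle into `dest` and return the names of its top-level entries."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as bundle:
        bundle.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


# Example call (the Downloads path is an assumption; adjust to where you saved the file):
# extract_bundle(Path.home() / "Downloads" / "datasets.zip", Path.home() / "datasets")
```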


### Step A4 : Download the launch script

- Download the [run-docker-es-training.sh](https://gist.github.com/sujee/71b04e9174d8ca1e017512ba377e401a) script
- Inspect the script and make sure the directories it expects (such as `~/datasets`) exist on your machine

### Step A5 : Launch the Docker container

Run the script from your working directory.

```bash
$  cd /to/your/project/dir

$  bash ./run-docker-es-training.sh   elephantscale/es-training:prod
```

### Step A6 : Things to Note
- The script mounts the current working directory on the host to the `~/dev` directory inside the container
- It also mounts the `~/datasets` directory on the host to `/data` inside the container
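When inspecting the launch script, it helps to know what the two mounts look like as Docker `-v host:container` arguments. The sketch below only builds those arguments; it is not the content of the launch script. The container-side home directory is not stated in this guide, so it is passed in explicitly, and the `/home/student/dev` value in the example is purely hypothetical.

```python
from pathlib import Path


def volume_flags(
    workdir: Path,
    datasets: Path,
    container_dev: str,
    container_data: str = "/data",
) -> list[str]:
    """Build `-v host:container` arguments for the two mounts described above."""
    return [
        "-v", f"{workdir.expanduser().resolve()}:{container_dev}",
        "-v", f"{datasets.expanduser().resolve()}:{container_data}",
    ]


# Hypothetical container path; check the launch script for the real one.
print(volume_flags(Path.cwd(), Path("~/datasets"), "/home/student/dev"))
```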

### Step A7 : Access the container
- In a browser, go to [localhost](http://localhost)
- The default password is : __bingobob123__
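If the page does not load, a quick reachability check can tell you whether anything is answering on `localhost` at all. This is a small standard-library sketch; the function name and the timeout value are illustrative choices.

```python
import urllib.error
import urllib.request


def is_reachable(url: str = "http://localhost", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, for example with a login or error page
    except OSError:
        return False  # connection refused, timed out, or host not resolved
```

Calling `is_reachable()` right after launching may return `False` for a short while, because the container needs time to start its services.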


## Option B : Setting up your own machine

### Operating System
Setup is easier on **macOS or Linux**.
If you are on Windows, download the [docker image](https://hub.docker.com/repository/docker/elephantscale/es-training) and work inside that sandbox instead (see Option A).

### Software Needed
- Java version 8 or later
- Anaconda Python version 3.x
- A few more Python packages
- Spark version 2.3 or later
- Our data files


### B1: Java 8
Download and install JDK (not JRE) v8 or later from [here](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html).  
Verify the installed version by running:
```
    java -version
```
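Reading the version string can be confusing because Java 8 and earlier report themselves as `1.x` (for example `1.8.0_281`), while newer releases report `11`, `17`, and so on. The sketch below parses the output into a major version number; the function names are illustrative.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_text: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_281" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes to stderr, so look at both streams.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=False
    )
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    print("Java major version:", major, "| OK for the labs:", major is not None and major >= 8)
```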

### B2: Anaconda Python
Download and install Anaconda Python version 3.x from [here](https://www.anaconda.com/download/).

### B3: Install the following add-on packages
Open a **new** terminal and run the following commands:

```bash
$   cd /path/to/where/you/unpacked/ml-labs
$   conda install --file=python-modules.txt
$   conda install -c conda-forge findspark
#$  conda install findspark
```
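To confirm the install worked, you can ask Python which modules it can find. This sketch checks only `findspark`, because that is the one package named in this guide; the contents of `python-modules.txt` are not listed here, so add its top-level module names yourself. Run it with the Anaconda Python, not the system Python.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Extend this list with the modules from python-modules.txt.
print("missing:", missing_modules(["findspark"]))
```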


### B4: Download Spark
- Download the latest Spark from [here](https://spark.apache.org/downloads.html)
- Extract the downloaded archive
- The directory where Spark is extracted is your `SPARK_HOME`

(The labs are tested with Spark version 2.3.)
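A common mistake is pointing `SPARK_HOME` at the parent folder instead of the extracted Spark directory. As a rough check, a Spark distribution contains a `bin/spark-submit` launcher, so the sketch below tests for that file. The function name is illustrative, and the check reads `SPARK_HOME` from the current environment.

```python
import os
from pathlib import Path
from typing import Optional


def looks_like_spark_home(path: Optional[str]) -> bool:
    """Return True if `path` contains the `bin/spark-submit` launcher."""
    if not path:
        return False  # SPARK_HOME is unset or empty
    return (Path(path).expanduser() / "bin" / "spark-submit").is_file()


print("SPARK_HOME looks valid:", looks_like_spark_home(os.environ.get("SPARK_HOME")))
```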

### B5: Download data files
- Download our data files from [here](https://s3.amazonaws.com/elephantscale-public/data/datasets.zip)
- Unzip this bundle
- Our labs look for data in the `/data` directory. You can unzip the bundle there, or create a symbolic link like this:
```bash
$   sudo  ln  -s   /path/to/data/dir   /data
```
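After creating the link, it is worth confirming that `/data` resolves to the directory you intended and is not empty. A minimal sketch, with an illustrative function name:

```python
from pathlib import Path


def data_dir_summary(path: str = "/data") -> dict:
    """Report whether `path` is a directory, where it really points, and what it contains."""
    root = Path(path)
    if not root.is_dir():  # is_dir() follows symbolic links
        return {"exists": False, "target": None, "entries": []}
    return {
        "exists": True,
        "target": str(root.resolve()),  # the real directory behind any symlink
        "entries": sorted(p.name for p in root.iterdir()),
    }


print(data_dir_summary())
```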

### B6: Download labs / solutions
Unzip them anywhere on your machine.

### B7: Edit  `run-jupyter.sh`
This file is located in the labs directory. 
Edit this file to match your environment.

```bash
# TODO : Edit the following lines   
export PATH=$HOME/anaconda3/bin:$PATH   
export SPARK_HOME=$HOME/dev/spark   
jupyter lab   
```



### B8: Run the labs
```bash
$   ./run-jupyter.sh
```

### B9: Open and run the `testing123.ipynb` file
This notebook is in the `0-testing` directory and checks your setup.
If it runs without errors, you are good to go.
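If the notebook does report errors, a standalone smoke test can help narrow down whether Spark itself is the problem. The sketch below is not the content of `testing123.ipynb`; it is an assumed minimal check that uses `findspark` to locate Spark through `SPARK_HOME`, starts a local session, and counts ten rows. The function name and app name are illustrative.

```python
def spark_smoke_test() -> tuple[bool, str]:
    """Try to start a local Spark session and run one tiny job."""
    try:
        import findspark

        findspark.init()  # locates Spark, typically through the SPARK_HOME variable
        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder.master("local[1]").appName("setup-check").getOrCreate()
        )
        try:
            count = spark.range(10).count()
            return count == 10, f"Spark {spark.version} counted {count} rows"
        finally:
            spark.stop()
    except Exception as exc:  # broad on purpose: any failure means the setup needs attention
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    ok, detail = spark_smoke_test()
    print("OK" if ok else "FAILED", "-", detail)
```

The failure message usually points at the missing piece: an import error suggests the Python packages from step B3, while an error about locating Spark or starting Java suggests `SPARK_HOME` or the JDK.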

### B10: Open `README.ipynb`
And practice!
