## Method 1 (Docker)
We will be using:
- Docker for Windows 10 with WSL2 Backend (https://www.docker.com/)
- AWS Glue container (https://hub.docker.com/r/amazon/aws-glue-libs)

Steps:  
0. (Pre-Req) Install WSL2.
1. Download and install Docker.
2. Set WSL2 as the Docker backend and restart.
3. Launch WSL2 and run:
```bash
docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
``` 
    - The `glue_libs_1.0.0_image_01` tag was the latest as of 2021.
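4. Create and start the container (replace `YOUR_USERNAME` with your Windows user name). The paths use Windows syntax (`%UserProfile%`, backslashes), so run this from a Windows Command Prompt rather than the WSL2 shell: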
```bash
docker run -itd -p 8888:8888 -p 4040:4040 -v %UserProfile%\.aws:/root/.aws:rw -v C:\Users\YOUR_USERNAME\Documents\GitHub:/home/jupyter/jupyter_default_dir --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh
```
    - `-p` publishes a container port on the host (i.e. Jupyter will be at `http://localhost:8888` and the Spark UI at `http://localhost:4040`)
    - `-v` mounts a host directory into the container (here, your AWS credentials and your project files)
    - `--name` sets the container name (the container ID is still generated by Docker)
5. Check that the container is running with `docker ps`.
6. Open Jupyter Notebook in your browser (`http://localhost:8888`) and start a `PySpark` kernel.
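
If the notebook page does not load, it can help to confirm that the published ports are actually reachable from the host. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and the ports are the ones from the `docker run` command above. Note that the Spark UI on port 4040 is typically only served while a Spark session is active, so "not reachable" there is expected before you start one.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    # Ports published by the `docker run` command (assumed unchanged).
    for name, port in [("Jupyter", 8888), ("Spark UI", 4040)]:
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

If Jupyter is reported as not reachable while `docker ps` lists the container, the port mapping or the container's start script is the first place to look.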

## Method 2 (Preferred)
We will be using:
- Ubuntu 20.04 (WSL2) or macOS.

Steps:  
0. (Pre-Req) Windows 10 users: install WSL2. macOS users: ensure your terminal is set to `bash`.
1. Set up your Python environment (e.g. `pip3 install notebook pandas numpy ...`)
2. Install `Java` and `PySpark`:  
- Linux
```bash
# install java
sudo apt install openjdk-8-jdk -y
# set JAVA_HOME system-wide
echo 'JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"' | sudo tee -a /etc/environment
# load it into the current shell and export it so child processes (Spark) see it
source /etc/environment && export JAVA_HOME
# install spark
pip3 install pyspark
```
- macOS
```bash
# install java
brew install openjdk@8
# set JAVA_HOME (older macOS versions default to bash, newer ones to zsh)
echo 'export JAVA_HOME="$(/usr/libexec/java_home -v1.8)"' | tee -a $HOME/.bashrc $HOME/.zshrc
# reload the shell configuration
source $HOME/.bashrc ; source $HOME/.zshrc
# install spark
pip3 install pyspark
```
    
3. Launch Jupyter Notebook.
4. Start a Spark session:  

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
```
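
If `getOrCreate()` fails with a Java-related error, the usual cause is that the Java setup from step 2 is not visible to the process running the notebook. The sketch below is one way to inspect that from Python using only the standard library; `check_java` is a hypothetical helper, not part of PySpark, and it only reports what it finds rather than fixing anything.

```python
import os
import shutil


def check_java(env=None):
    """Report the Java settings a Spark launch would see in `env`."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    # Look for a `java` executable on the PATH taken from the same environment.
    java_on_path = shutil.which("java", path=env.get("PATH", ""))
    return {
        "java_home": java_home,
        "java_home_exists": bool(java_home) and os.path.isdir(java_home),
        "java_on_path": java_on_path,
    }


if __name__ == "__main__":
    for key, value in check_java().items():
        print(f"{key}: {value}")
```

Run it in the same notebook kernel that starts Spark. If `java_home` is `None` there but set in your terminal, the notebook was probably launched from a shell that had not reloaded its configuration yet.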


If you can run the following code, then it works!

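```python
# findspark is optional when pyspark was installed with pip;
# if you keep these two lines, install it first with `pip3 install findspark`
import findspark

findspark.init()

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.sql("select 'spark' as hello")
df.show()
```

```
+-----+
|hello|
+-----+
|spark|
+-----+
```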



```python
from pyspark.sql import SparkSession

# Create a Spark session (which will run Spark jobs)
spark = SparkSession.builder.getOrCreate()

# spark.conf.set('spark.ui.showConsoleProgress.enabled', True)
# Set some configs; you'll learn about them later on
spark.conf.set('spark.sql.repl.eagerEval.enabled', True)

sdf = spark.read.csv('../data/sample.csv', header=True)
sdf
```

```
VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,pickup_longitude,pickup_latitude,RatecodeID,store_and_fwd_flag,dropoff_longitude,dropoff_latitude,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount
2,1/12/15 0:00,1/12/15 0:05,5,0.96,-73.97994232,40.76538086,1,N,-73.96630859,40.76308823,1,5.5,0.5,0.5,1.0,0.0,0.3,7.8
2,1/12/15 0:00,1/12/15 0:00,2,2.69,-73.97233582,40.76237869,1,N,-73.99362946,40.74599838,1,21.5,0.0,0.5,3.34,0.0,0.3,25.64
2,1/12/15 0:00,1/12/15 0:00,1,2.62,-73.96884918,40.76453018,1,N,-73.97454834,40.79164124,1,17.0,0.0,0.5,3.56,0.0,0.3,21.36
1,1/12/15 0:00,1/12/15 0:05,1,1.2,-73.99393463,40.74168396,1,N,-73.99766541,40.74746704,1,6.5,0.5,0.5,0.2,0.0,0.3,8.0
1,1/12/15 0:00,1/12/15 0:09,2,3.0,-73.98892212,40.72698975,1,N,-73.97559357,40.6968689,2,11.0,0.5,0.5,0.0,0.0,0.3,12.3
1,1/12/15 0:00,1/12/15 0:16,1,6.3,-73.97408295,40.76291275,1,N,-74.01280212,40.70220947,1,20.5,0.5,0.5,4.35,0.0,0.3,26.15
2,1/12/15 0:00,1/12/15 0:02,6,0.63,-73.96831512,40.75532913,1,N,-73.96208191,40.75891495,1,4.0,0.5,0.5,1.06,0.0,0.3,6.36
2,1/12/15 0:00,1/12/15 0:08,2,1.91,-73.99420929,40.74610138,1,N,-74.00424957,40.72180939,1,8.0,0.5,0.5,1.86,0.0,0.3,11.16
2,1/12/15 0:00,1/12/15 0:17,1,4.5,-74.00675964,40.7189064,1,N,-73.98969269,40.77285385,1,16.5,0.5,0.5,3.56,0.0,0.3,21.36
2,1/12/15 0:00,1/12/15 0:10,2,1.42,-73.99963379,40.73477173,1,N,-73.98906708,40.72312164,1,8.5,0.5,0.5,2.45,0.0,0.3,12.25
```
