gofr1/pyspark-learning

Installation

Download and install Python

Download Spark (prebuilt for Hadoop) from https://spark.apache.org/downloads.html and unpack it somewhere

Download Java from https://www.java.com/download/ie_manual.jsp and install it

Download winutils from https://github.com/steveloughran/winutils, picking the build that matches the Hadoop version of your Spark download, and put it into the \bin\ folder of your Spark installation

Add environment variables (System-level is better); for example:

| name | value |
| --- | --- |
| HADOOP_HOME | D:\spark-3.3.1-bin-hadoop3 |
| JAVA_HOME | D:\java |
| SPARK_HOME | D:\spark-3.3.1-bin-hadoop3 |
| PATH | %PATH%;D:\spark-3.3.1-bin-hadoop3\bin;D:\java\bin; |
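If you prefer to set these per-process from Python rather than in the system settings, a minimal sketch looks like the following (the paths are the example values from the table above; adjust them to your own install locations):

```python
import os

# Example paths; change these to wherever you unpacked Spark and installed Java
os.environ["HADOOP_HOME"] = r"D:\spark-3.3.1-bin-hadoop3"
os.environ["JAVA_HOME"] = r"D:\java"
os.environ["SPARK_HOME"] = r"D:\spark-3.3.1-bin-hadoop3"

# Prepend the Spark and Java bin folders to PATH for this process only
os.environ["PATH"] = os.pathsep.join([
    os.path.join(os.environ["SPARK_HOME"], "bin"),
    os.path.join(os.environ["JAVA_HOME"], "bin"),
    os.environ.get("PATH", ""),
])
```

Note this only affects the current Python process and its children, which is why setting the variables at System level is the more robust option.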

Then run `pip install pyspark`

After that you can test that everything is installed and working by running `spark-shell --version`:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Scala version 2.12.15, Java HotSpot(TM) Client VM, 1.8.0_351
Branch HEAD
Compiled by user yumwang on 2022-10-15T09:47:01Z
Revision fbbcf9434ac070dd4ced4fb9efe32899c6db12a9
Url https://github.com/apache/spark
Type --help for more information.
PS C:\WINDOWS\system32>
```
