Download and install Python.
Download Spark (pre-built for Hadoop) from https://spark.apache.org/downloads.html.
Unpack it somewhere, e.g. `D:\spark-3.3.1-bin-hadoop3`.
Download Java from https://www.java.com/download/ie_manual.jsp and install it.
Download `winutils.exe` from https://github.com/steveloughran/winutils for the Hadoop version matching the Spark build you downloaded, and put it into the `bin\` folder of your Spark installation.
Add the environment variables (preferably System rather than User); an example is given below:
| name | value |
|---|---|
| HADOOP_HOME | D:\spark-3.3.1-bin-hadoop3 |
| JAVA_HOME | D:\java |
| SPARK_HOME | D:\spark-3.3.1-bin-hadoop3 |
| PATH | %PATH%;D:\spark-3.3.1-bin-hadoop3\bin;D:\java\bin; |
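Once the variables are set, it can be worth sanity-checking them from a fresh Python session before going further. A minimal sketch (the expected paths are the example values from the table above; adjust them to your layout):

```python
import os

# Print the variables Spark needs, so typos show up immediately.
for var in ("HADOOP_HOME", "JAVA_HOME", "SPARK_HOME"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")

# winutils.exe must sit in %HADOOP_HOME%\bin for Spark to work on Windows.
hadoop_home = os.environ.get("HADOOP_HOME", "")
print("winutils.exe present:",
      os.path.exists(os.path.join(hadoop_home, "bin", "winutils.exe")))
```

Remember to open a new terminal after changing the variables, since running processes do not pick up the new values.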
Then run `pip install pyspark`.
After that you can test if everything is installed and working, for example by running `spark-submit --version`:

```
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.3.1
/_/
Using Scala version 2.12.15, Java HotSpot(TM) Client VM, 1.8.0_351
Branch HEAD
Compiled by user yumwang on 2022-10-15T09:47:01Z
Revision fbbcf9434ac070dd4ced4fb9efe32899c6db12a9
Url https://github.com/apache/spark
Type --help for more information.
PS C:\WINDOWS\system32>
```
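To confirm that the `pyspark` package itself works end to end, you can also run a short local job from Python. A minimal sketch (the app name is arbitrary):

```python
from pyspark.sql import SparkSession

# Start a local Spark session; local[*] uses all available cores.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("install-check")
         .getOrCreate())

# Build a small DataFrame and run an action to exercise the whole stack.
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
df.show()

print("Spark version:", spark.version)
spark.stop()
```

If the DataFrame prints and the session shuts down without winutils-related errors, the setup is working.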