## Installing Spark
- Downloads page: https://spark.apache.org/downloads.html
- Direct download (Spark 3.5.1, prebuilt for Hadoop 3): https://www.apache.org/dyn/closer.lua/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz

In [1]:
!apt-get install openjdk-8-jdk-headless

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libxtst6 openjdk-8-jre-headless
Suggested packages:
  openjdk-8-demo openjdk-8-source libnss-mdns fonts-dejavu-extra fonts-nanum fonts-ipafont-gothic
  fonts-ipafont-mincho fonts-wqy-microhei fonts-wqy-zenhei fonts-indic
The following NEW packages will be installed:
  libxtst6 openjdk-8-jdk-headless openjdk-8-jre-headless
0 upgraded, 3 newly installed, 0 to remove and 45 not upgraded.
Need to get 39.7 MB of archives.
After this operation, 144 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxtst6 amd64 2:1.2.3-1build4 [13.4 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 openjdk-8-jre-headless amd64 8u402-ga-2ubuntu1~22.04 [30.8 MB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 openjdk-8-jdk-headless amd64 8u402-ga-2ubuntu1~22.04 [8,873 kB]

In [2]:
!wget -q https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
!tar -zxf spark-3.5.1-bin-hadoop3.tgz

In [3]:
!ls

sample_data  spark-3.5.1-bin-hadoop3  spark-3.5.1-bin-hadoop3.tgz
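
Before relying on the extracted archive, it can be worth checking that the download was not corrupted. The sketch below assumes you have copied the SHA-512 value published alongside the release into a string. The helper names `sha512_of` and `matches_checksum` are hypothetical and not part of Spark.

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large archive never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_hex):
    """Compare against a published value, ignoring case and embedded whitespace."""
    cleaned = "".join(expected_hex.split()).lower()
    return sha512_of(path) == cleaned
```

In this notebook the call would look like `matches_checksum("spark-3.5.1-bin-hadoop3.tgz", expected)`, where `expected` is the value you pasted in yourself.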


## Setting environment variables
- Adjust the paths below to match the Java and Spark versions you installed.

In [4]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.1-bin-hadoop3"
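
Because these paths change with the installed versions, a quick sanity check can catch a typo before Spark fails with a less obvious error. This is a minimal sketch; `missing_homes` is a hypothetical helper, and it only confirms that each variable points at an existing directory, not that the directory holds a working installation.

```python
import os

def missing_homes(names=("JAVA_HOME", "SPARK_HOME"), env=None):
    """Return the variables that are unset or do not point at a directory."""
    env = os.environ if env is None else env
    return [name for name in names if not os.path.isdir(env.get(name, ""))]

problems = missing_homes()
if problems:
    print("Check these variables before starting Spark:", ", ".join(problems))
```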

## Installing PySpark

In [5]:
!pip install findspark -q

In [6]:
import findspark
findspark.init()

In [7]:
import pyspark
spark_version = pyspark.__version__
print("Apache Spark version: " + spark_version)

Apache Spark version: 3.5.1
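
The version printed here should agree with the Spark distribution that `SPARK_HOME` points at. As a rough cross-check, the version can be read back out of the directory name. The helper `version_from_home` is hypothetical and assumes the directory follows the `spark-X.Y.Z-...` naming used by the official archives.

```python
import os
import re

def version_from_home(spark_home):
    """Extract 'X.Y.Z' from a directory name such as 'spark-3.5.1-bin-hadoop3'."""
    name = os.path.basename(spark_home.rstrip("/"))
    match = re.search(r"spark-(\d+\.\d+\.\d+)", name)
    return match.group(1) if match else None

print(version_from_home("/content/spark-3.5.1-bin-hadoop3"))  # → 3.5.1
```

Comparing `version_from_home(os.environ["SPARK_HOME"])` with `pyspark.__version__` would flag a mismatch between the extracted archive and the imported package.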


## Test

In [8]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('mulCamp28').config('spark.ui.port', '4050').getOrCreate()
spark

In [9]:
sample_text = """
Once upon a time, in a land far, far away, there lived a dragon who loved to collect treasures.
Every day, the dragon would fly across the kingdoms, searching for precious items to add to his collection.
"""

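# Split the text into lines and distribute them as an RDD.
# The leading and trailing newlines yield two empty lines, hence the "" key in the output.
lines = spark.sparkContext.parallelize(sample_text.split("\n"))
# Classic word count: split lines into words, pair each word with 1, then sum per word.
wordCounts = lines.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)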
for word, count in wordCounts.collect():
    print(f"{word}: {count}")

: 2
Once: 1
upon: 1
in: 1
far,: 1
far: 1
there: 1
lived: 1
dragon: 2
loved: 1
Every: 1
would: 1
fly: 1
kingdoms,: 1
precious: 1
items: 1
his: 1
collection.: 1
a: 3
time,: 1
land: 1
away,: 1
who: 1
to: 3
collect: 1
treasures.: 1
day,: 1
the: 2
across: 1
searching: 1
for: 1
add: 1
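
The first output line, `: 2`, is the empty string counted twice. The text starts and ends with a newline, so `split("\n")` yields two empty lines, and `"".split(" ")` returns `[""]`. The plain-Python sketch below reproduces the same counting logic without Spark to make that visible. `count_words` is a hypothetical helper, and passing `sep=None` (whitespace splitting) is one possible way to drop the empty tokens.

```python
from collections import Counter

sample_text = """
Once upon a time, in a land far, far away, there lived a dragon who loved to collect treasures.
Every day, the dragon would fly across the kingdoms, searching for precious items to add to his collection.
"""

def count_words(text, sep=" "):
    """Count tokens per line, mirroring flatMap + map + reduceByKey."""
    counts = Counter()
    for line in text.split("\n"):
        # str.split(" ") keeps an empty token for an empty line;
        # str.split(None) splits on any whitespace and returns [] instead.
        counts.update(line.split(sep))
    return counts

naive = count_words(sample_text)            # same splitting as the Spark job
clean = count_words(sample_text, sep=None)  # whitespace splitting

print(naive[""])    # → 2
print("" in clean)  # → False
print(clean["a"])   # → 3
```

In the Spark job, the analogous change would be `line.split()` inside `flatMap`. Punctuation such as `far,` would still be counted separately from `far`.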


## Stopping Spark

In [10]:
spark.stop()