
# WordCount using Spark's Scala API

## Objective
This notebook demonstrates how to perform the **WordCount** operation using **Apache Spark with Scala**.

### Steps Covered
1. Create a SparkSession  
2. Load a text file into an RDD  
3. Split lines into words  
4. Map each word into (word, 1) pairs  
5. Reduce by key to count occurrences  
6. Collect and print the results
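
Before running anything on Spark, the data flow of steps 3 to 6 can be sketched on plain Scala collections. This is only a local illustration: it assumes made-up input lines, uses a `Seq` in place of an RDD, and uses `groupBy` as a stand-in for `reduceByKey`. None of the names below appear in the notebook itself.

```scala
// Plain Scala collections standing in for an RDD; no Spark required.
// The sample lines are made up for illustration.
val sampleLines: Seq[String] = Seq("to be or not", "to be")

// Step 3: split every line into words and flatten the result.
val sampleWords: Seq[String] = sampleLines.flatMap(line => line.split(" "))

// Step 4: pair each word with an initial count of 1.
val samplePairs: Seq[(String, Int)] = sampleWords.map(word => (word, 1))

// Step 5: local stand-in for reduceByKey: group by word, then sum the 1s.
val sampleCounts: Map[String, Int] =
  samplePairs.groupBy(_._1).map { case (word, pairs) => word -> pairs.map(_._2).sum }

// Step 6: print each (word, count) pair; Map iteration order is not guaranteed.
sampleCounts.foreach(println)
```

The Spark version in the steps below has the same shape, but each stage runs in parallel over the partitions of the RDD.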

### Background
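In Apache Spark, an **RDD (Resilient Distributed Dataset)** is an immutable collection of elements partitioned across the cluster so that it can be processed in parallel.  
WordCount is a common first example because it exercises both kinds of RDD operation: transformations (`flatMap`, `map`, `reduceByKey`), which lazily define a new RDD, and actions (`collect`), which trigger the computation and return a result to the driver.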
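
Transformations in Spark are lazy: they only describe a computation, and nothing runs until an action is called. The sketch below shows the same idea with a plain Scala `view`. It is an analogy rather than Spark's actual mechanism, and the counter and sample words are assumptions made for illustration.

```scala
// Analogy only: a Scala view defers work, much as RDD transformations do.
var evaluated = 0 // counts how many elements the mapping function has touched

// "Transformation": nothing is computed yet because views are lazy.
val lengthsView = Seq("spark", "scala", "rdd").view.map { w => evaluated += 1; w.length }
val touchedBeforeForce = evaluated

// "Action": forcing the view runs the mapping function over every element.
val lengths = lengthsView.toList
val touchedAfterForce = evaluated
```

In the notebook, `flatMap`, `map`, and `reduceByKey` play the role of the view, and `collect()` plays the role of `toList`.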
    

### Step 1: Initialize SparkSession

In [None]:

import org.apache.spark.sql.SparkSession
// Reuse the notebook's active session if one exists; otherwise create one.
val spark = SparkSession.builder.appName("WordCount").getOrCreate()
    

### Step 2: Load text file into an RDD

In [None]:

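// The path is resolved against the configured default file system; adjust it for your setup.
val filePath = "samplefile.txt"
// textFile returns an RDD[String] with one element per line of the file.
val linesRdd = spark.sparkContext.textFile(filePath)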
    

### Step 3: Split lines into words

In [None]:

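// Split on runs of whitespace and drop empty tokens so blanks are not counted as words.
val wordsRdd = linesRdd.flatMap(line => line.split("\\s+")).filter(_.nonEmpty)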
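
Step 3 uses `flatMap` rather than `map` because each line yields several words. The following sketch contrasts the two on a plain Scala `Seq`. The sample lines and variable names are assumptions for illustration, and the same distinction holds for the RDD methods of the same names.

```scala
val demoLines = Seq("hello spark", "hello scala")

// map keeps one output element per line, so the result is nested.
val nested: Seq[Seq[String]] = demoLines.map(line => line.split(" ").toSeq)

// flatMap concatenates the per-line results into a single flat sequence.
val flat: Seq[String] = demoLines.flatMap(line => line.split(" "))
```

With `map`, the next step would receive whole arrays of words instead of individual words, so pairing each word with a count would not work as intended.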
    

### Step 4: Map words to (word, 1) pairs

In [None]:

// Pair every word with an initial count of 1, giving an RDD[(String, Int)].
val wordsMapRdd = wordsRdd.map(x => (x, 1))
    

### Step 5: Reduce by key to count occurrences

In [None]:

// Sum the 1s for each distinct word; the function must be associative and commutative.
val wordCountRdd = wordsMapRdd.reduceByKey((x, y) => x + y)
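
To see what `reduceByKey` computes, here is a rough local equivalent on a plain Scala `Seq` of pairs. It is a sketch under stated assumptions: the sample pairs are made up, and `groupBy` followed by `reduce` only mirrors the result. Spark also combines values within each partition before shuffling, which this sketch does not model.

```scala
val demoPairs = Seq(("spark", 1), ("scala", 1), ("spark", 1))

// Group pairs that share a key, then fold each group's values with the same
// function passed to reduceByKey: (x, y) => x + y.
val demoCounts: Map[String, Int] =
  demoPairs.groupBy(_._1).map { case (word, group) =>
    word -> group.map(_._2).reduce((x, y) => x + y)
  }
```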
    

### Step 6: Collect and print results

In [None]:

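// collect() is an action: it runs the job and brings every (word, count) pair to the driver.
// Call it once and reuse the result instead of triggering the job twice.
val wordCounts = wordCountRdd.collect()
wordCounts.foreach(println)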
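
The collected pairs come back in no particular order. One way to make the output easier to read is to sort the array on the driver after collecting it. The helper `topWords` below is hypothetical and not part of the notebook, and the sample data is made up to match the shape that `wordCountRdd.collect()` returns.

```scala
// Hypothetical helper, not part of the notebook: sort collected counts on the driver.
// Orders by descending count, breaking ties alphabetically, and keeps the first n.
def topWords(counts: Array[(String, Int)], n: Int): Array[(String, Int)] =
  counts.sortBy { case (word, count) => (-count, word) }.take(n)

// Made-up data in the shape that wordCountRdd.collect() returns.
val collected = Array(("be", 2), ("not", 1), ("to", 2), ("or", 1))
topWords(collected, 3).foreach(println)
```

Sorting locally is reasonable only when the collected result fits comfortably in driver memory, which is the same assumption `collect()` already makes.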
    