
# Compute Average of Numbers in an RDD using PySpark with flatMap

This notebook demonstrates how to calculate the average of a list of numbers in PySpark using the `flatMap` transformation.
The `compute_average_flatmap` function uses `flatMap` to emit one keyed `(number, 1)` pair per RDD element; each pair is a partial `(sum, count)` that can then be reduced to overall totals.

## Steps Involved
1. Parallelize the list into an RDD.
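2. Use `flatMap` to convert each number into a `(1, (number, 1))` pair: the constant key `1` puts every element in the same group, and the value holds a partial sum and a count.
3. Use `reduceByKey` to add the partial sums and counts element-wise, yielding a single `(total_sum, total_count)` pair.
4. Calculate the average by dividing the total sum by the total count.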
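
The steps above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the body of `compute_average_flatmap` is assumed from the description, `sc` is assumed to be an existing `SparkContext`, and returning `None` for an empty list is an illustrative choice.

```python
def compute_average_flatmap(sc, numbers):
    """Average a list of numbers with flatMap and reduceByKey (illustrative sketch)."""
    # Step 1: parallelize the list into an RDD
    rdd = sc.parallelize(numbers)

    # Step 2: flatMap expects an iterable per element, so each number is
    # wrapped in a one-item list holding the pair (1, (number, 1))
    pairs = rdd.flatMap(lambda x: [(1, (x, 1))])

    # Step 3: add partial sums and counts element-wise under the shared key
    totals = pairs.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))

    # Step 4: bring the single (key, (sum, count)) pair to the driver and divide
    result = totals.collect()
    if not result:
        return None  # assumption: an empty input has no average
    _, (total, count) = result[0]
    return total / count

# Usage, assuming a working Spark installation:
#   from pyspark import SparkContext
#   sc = SparkContext.getOrCreate()
#   compute_average_flatmap(sc, [1, 2, 3, 4, 5])
```

With the input `[1, 2, 3, 4, 5]`, the reduced pair is `(15, 5)`, so the call returns `3.0`. Because every element shares one key, `flatMap` here emits exactly one pair per number; `map` would behave the same way, and `flatMap` is used only to match the approach this notebook describes.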


In [None]:
from pyspark import SparkContext

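def split_names(sc, full_names):
    """Split each full name into its parts and return them as one flat list."""
    # Parallelize the list into an RDD
    rdd = sc.parallelize(full_names)
    
    # flatMap applies the function to each element and flattens the results,
    # so the output is a single RDD of name parts rather than an RDD of lists.
    # split() with no argument also tolerates repeated or surrounding whitespace.
    split_rdd = rdd.flatMap(lambda name: name.split())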
    
    # Collect the results to the driver as a Python list
    return split_rdd.collect()

# Example usage
if __name__ == "__main__":
    sc = SparkContext.getOrCreate()
    names = ["John Doe", "Jane Smith", "Alice Johnson"]
    result = split_names(sc, names)
    print(result)  # ['John', 'Doe', 'Jane', 'Smith', 'Alice', 'Johnson']
