### Import necessary classes from the PySpark library

In [None]:
from pyspark import SparkConf, SparkContext
import collections


### Spark Configuration and Context

In [None]:
conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
sc = SparkContext(conf = conf)

* `SparkConf().setMaster("local")`: Configures Spark to run locally with a single thread.
* `setAppName("RatingsHistogram")`: Names the Spark application "RatingsHistogram".
* `SparkContext(conf = conf)`: Initializes the Spark context with the given configuration. This context is the main entry point for Spark functionality.

In [None]:
lines = sc.textFile("filepath") # Reads the input data from the specified file path into an RDD called 'lines'

### Extracting Ratings

In [None]:
ratings = lines.map(lambda x: x.split()[2])

* `lines.map(lambda x: x.split()[2])`: Transforms each line in the `lines` RDD by splitting it on whitespace into a list of fields and extracting the third field (index 2), which is assumed to be the rating. The result is a new RDD called `ratings` in which each element is a rating string.

Example line from the `u.data` file:
```
195 242 3 88125
```
* After splitting: `["195", "242", "3", "88125"]`
* Extracted rating: `"3"`
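
The same split-and-index step can be tried in plain Python, without Spark. This is a minimal sketch that reuses the sample line above; `extract_rating` is a hypothetical helper name, not part of the original script.

```python
def extract_rating(line):
    """Return the third whitespace-separated field of a line (index 2)."""
    fields = line.split()  # no argument: split on any run of whitespace
    return fields[2]


sample_line = "195 242 3 88125"
rating = extract_rating(sample_line)
print(rating)
```

Because `split()` returns strings, the rating stays a string unless it is converted explicitly, for example with `int(...)`.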

### Counting Ratings

In [None]:
result = ratings.countByValue()

* `ratings.countByValue()`: Counts the occurrences of each unique rating in the `ratings` RDD. This is an action, so it triggers the computation and returns the result to the driver as a dictionary whose keys are the ratings and whose values are their respective counts.
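
To get a feel for the shape of that dictionary, the counting can be mimicked locally with `collections.Counter`. This is only an analogy for a small in-memory list, using made-up sample ratings; it does not reflect how Spark distributes the work.

```python
from collections import Counter

# Hypothetical sample ratings, standing in for the elements of the `ratings` RDD
sample_ratings = ["3", "5", "3", "1", "5", "3"]

# Counter maps each distinct value to the number of times it appears
local_counts = Counter(sample_ratings)
print(dict(local_counts))
```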

### Sorting Results

In [None]:
sortedResults = collections.OrderedDict(sorted(result.items()))

* `sorted(result.items())`: Sorts the dictionary items (rating, count) by rating.
* `collections.OrderedDict(...)`: Creates an ordered dictionary from the sorted items, preserving the order.
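
One thing to keep in mind is that the keys here are strings, so `sorted` orders them lexicographically. For single-digit ratings this matches numeric order, but it would not for a value such as `"10"`. The sketch below uses a made-up `counts` dictionary to show the difference, along with one possible fix: sorting with an integer key.

```python
from collections import OrderedDict

# Hypothetical counts keyed by string ratings, including a two-digit value
counts = {"10": 4, "2": 7, "1": 3}

# Default sort compares the string keys character by character
lexicographic = OrderedDict(sorted(counts.items()))

# Converting the key to int gives numeric order instead
numeric = OrderedDict(sorted(counts.items(), key=lambda item: int(item[0])))

print(list(lexicographic))  # ['1', '10', '2']
print(list(numeric))        # ['1', '2', '10']
```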

In [None]:
for key, value in sortedResults.items():
    print("%s %i" % (key, value))