# Ingesting JSON Files in Python
## Introduction
Now that you have worked through the JSON intro lab, it is time to try
 your hand at parsing another file.  In this lab you will open and parse
 the reviews.json data file.

The reviews.json file contains fictitious product reviews from a web
 site.  Each review has the following JSON format:
```
  {
    "userId": 1,
    "id": 1,
    "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
    "body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
  }
```
You may (and should) model your approach after our JSON intro solution.  When
 that is not enough, use the documentation at
 https://docs.python.org/3/library/json.html to determine an appropriate
 course of action.
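
As a quick illustration of what deserialization produces for one record, here is a minimal sketch. The record text is shortened for illustration and is not taken from reviews.json; `json.loads()` is used because the input is a string rather than a file.

```python
import json

# A single review record as JSON text (values shortened for illustration).
raw = '{"userId": 1, "id": 1, "title": "sunt aut facere", "body": "quia et suscipit\\nsuscipit recusandae"}'

# loads() parses a str; load() reads from an open file object instead.
review = json.loads(raw)

print(type(review).__name__)   # dict
print(review["userId"])        # 1
print(len(review["body"]))     # character count of the body text
```

A JSON object becomes a Python dict, and the `\n` escapes inside the body become real newline characters that count toward its length.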

## Tasks
1. Use the json module to deserialize the data.
2. Determine which user(s) have completed the most reviews.
3. Create a JSON file that contains the reviews for each user who
 has completed the maximum number of reviews.
4. Which review(s) (by id) have the longest body in terms of the
 number of characters?
5. Are there any duplicate titles?  If so, how many?
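
For the duplicate-title task, one possible approach (not necessarily the intended solution) is to count titles with `collections.Counter`. The sketch below uses a made-up `titles` list; in the lab you would build it from the `"title"` value of each review.

```python
from collections import Counter

# Hypothetical titles standing in for the "title" value of each review.
titles = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]

# Counter maps each distinct title to the number of times it appears.
title_counts = Counter(titles)

# Keep only the titles that appear more than once.
duplicates = {title: n for title, n in title_counts.items() if n > 1}

print(duplicates)       # the duplicated titles and their counts
print(len(duplicates))  # how many distinct titles are duplicated
```

Whether "how many" means the number of distinct duplicated titles or the number of extra copies is a judgment call; the dictionary above supports either reading.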

In [None]:
import json

# Read about the 'with ... as' context manager in the CSV intro lab
#
with open('reviews.json', 'r') as read_file:
    # Since the root of the JSON data is an array
    # load() will return a python list
    reviews = json.load(read_file)

# Create a dictionary (empty) to map userId to number of completed reviews for that user
reviews_by_user = {}

# Create a list (empty) to keep track of the id(s) of the review(s) with the most text
max_text_list = []
max_review_len = 0

# Lists are iterable so we can loop over the data returned by load().
# Increment the completed reviews count for each user.
#
for review in reviews:
    # review will be a dictionary; make sure you understand why that is the case!
    # capture the review's userId
    userId = review["userId"]

    # Using dict.get(), check to see if we have already added this userId
    # to our dictionary.  If so, increment the existing count.
    # If not, add the user, setting their completed review count to 1.
    reviews_by_user[ userId ] = reviews_by_user.get(userId, 0) + 1

    # now, see if the character count of this review exceeds
    # the char count of the prior longest review.  Using a list
    # approach allows us to capture more than one review should
    # there be a tie in the counts.

    # NOTE: Python has a sizeable number of built-in functions (see https://docs.python.org/3/library/functions.html).
    # Many of those functions are thin wrappers that call so-called "dunder" methods.
    # "Dunder" is slang for double underscore (ex: __len__).
    #
    # So, for example, for ANY object to work with the "len" protocol it
    # simply must implement the __len__ method.  The same is true for the other functions that wrap methods:
    # an object must comply with the "protocol" that function requires in order to support it.
    #
    # So len([1,2,3]) returns the same value as [1,2,3].__len__()
    # Keep in mind, it is (normally) taboo to invoke dunder methods directly.
    #
    # Want to understand the polymorphism philosophy of Python?  Read about "duck typing" (quack...)
    # https://en.wikipedia.org/wiki/Duck_typing
    # https://hackernoon.com/python-duck-typing-or-automatic-interfaces-73988ec9037f
    #
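    body_len = len(review["body"])
    if body_len > max_review_len:
        # New longest review: discard the ids recorded for shorter reviews
        max_review_len = body_len
        max_text_list = [ review['id'] ]
    elif body_len == max_review_len:
        # Tie with the current longest review: keep this id as well
        max_text_list.append( review['id'] )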

# Get the maximum number of completed reviews.
# values() returns a view of the dictionary's values (here, the per-user review counts).
# You guessed it, the object returned is an iterable.
# max() essentially loops over the iterable to find the max value.
max_complete = max( reviews_by_user.values() )

# Create a list of all users who have completed the maximum number of reviews.
users = [u for u, v in reviews_by_user.items() if v == max_complete]

# Display the results
print(f"User(s) {users} completed {max_complete} reviews")

# Display the max review text counts
print(f'The longest review was {max_review_len} characters')
print(f'\tThis occurred {len(max_text_list)} time(s).')
print(f'\tThis count was observed for the following review Ids - {max_text_list}')
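
The comments in the cell above describe the `len()` protocol. As a minimal sketch of that idea, the hypothetical `ReviewBatch` class below (not part of the lab) supports `len()` simply by defining `__len__`.

```python
class ReviewBatch:
    """Hypothetical container used only to illustrate the len() protocol."""

    def __init__(self, reviews):
        # Copy the reviews into a private list.
        self._reviews = list(reviews)

    def __len__(self):
        # The built-in len() calls this method on our object.
        return len(self._reviews)


batch = ReviewBatch([{"id": 1}, {"id": 2}, {"id": 3}])
print(len(batch))  # 3
```

No inheritance or registration is needed; having the method is what makes the object usable with `len()`, which is the essence of duck typing.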

In [None]:
# Task: Create a JSON file that contains the completed reviews for each of the users
# who completed the maximum number of reviews.

# Write filtered reviews to file.
# Keep only the reviews whose userId is in the users list created earlier.
#
# List comprehension approach to find all reviews from users who have the max number of reviews
filtered_reviews = [ review for review in reviews if review['userId'] in users ]

with open('filtered_reviews.json', 'w') as data_file:
    json.dump(filtered_reviews, data_file, indent=2)
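
To check that a dump like the one above can be read back unchanged, a round trip is a simple test. The sketch below uses an in-memory `io.StringIO` buffer and a made-up `sample` list instead of the real file, so it runs without reviews.json.

```python
import io
import json

# Made-up data standing in for filtered_reviews.
sample = [{"userId": 1, "id": 1, "title": "t", "body": "line one\nline two"}]

# Serialize to an in-memory text buffer rather than a file on disk.
buffer = io.StringIO()
json.dump(sample, buffer, indent=2)

# Rewind the buffer and deserialize what was just written.
buffer.seek(0)
restored = json.load(buffer)

print(restored == sample)  # True when the round trip preserves the data
```

The same check works against filtered_reviews.json by reopening the file in read mode and comparing the loaded list with `filtered_reviews`.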