### Importing the wine dataset

In [1]:
import json

In [2]:
with open("wine.json", encoding="utf-8") as json_file:
    wine_data = json.load(json_file)

### Exercise 1: How many wine reviews are included in the dataset?

In [3]:
f" The wine dataset includes {len(wine_data)} wine reviews."

' The wine dataset includes 129971 wine reviews.'

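##### Another way, using `index`. It gives the right count only when no earlier review is identical to the last one, because `index` returns the position of the first match.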

In [4]:
f" The wine dataset includes {wine_data.index(wine_data[-1])+1} wine reviews."

' The wine dataset includes 129971 wine reviews.'
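The two approaches agree here, but they can diverge when the list contains duplicates. The following is a minimal sketch on made-up data (the `reviews` list and its values are hypothetical, not taken from `wine.json`) showing how `index` undercounts when the last element also appears earlier:

```python
# Toy data (hypothetical): the last review is an exact duplicate of the first.
reviews = [
    {"title": "A", "points": "90"},
    {"title": "B", "points": "88"},
    {"title": "A", "points": "90"},
]

count_with_len = len(reviews)

# list.index returns the position of the FIRST element equal to its argument,
# so a duplicate of the last review earlier in the list shortens the count.
count_with_index = reviews.index(reviews[-1]) + 1

print(count_with_len)    # 3
print(count_with_index)  # 1
```

`len` is also constant-time, whereas `index` scans the list, so `len` is the safer default for counting.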

### Exercise 2: What's the length of the last review?
Add a new `length` key to each review in the list, containing the number of words in the `description` value.

In [5]:
for review in wine_data:
    review["length"] = len(review["description"].split())

##### To check the result, look at the last wine in the dataset: its review now has a `length` key, and its `description` contains 27 words.

In [6]:
wine_data[-1]

{'points': '90',
 'title': 'Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caroline Gewurztraminer (Alsace)',
 'description': 'Big, rich and off-dry, this is powered by intense spiciness and rounded texture. Lychees dominate the fruit profile, giving an opulent feel to the aftertaste. Drink now.',
 'taster_name': 'Roger Voss',
 'taster_twitter_handle': '@vossroger',
 'price': 21,
 'designation': 'Lieu-dit Harth Cuvée Caroline',
 'variety': 'Gewürztraminer',
 'region_1': 'Alsace',
 'region_2': None,
 'province': 'Alsace',
 'country': 'France',
 'winery': 'Domaine Schoffit',
 'length': 27}

In [7]:
print(f"The length of the last review is {wine_data[-1]['length']}.")

The length of the last review is 27.
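As a quick check of what "number of words" means here, the sketch below applies `str.split()` to the description of the last review (copied from the output above). With no arguments, `split()` breaks on any run of whitespace, so punctuation stays attached to its word and hyphenated terms such as "off-dry," count once. The `messy` string is a made-up example.

```python
description = (
    "Big, rich and off-dry, this is powered by intense spiciness and rounded "
    "texture. Lychees dominate the fruit profile, giving an opulent feel to "
    "the aftertaste. Drink now."
)

words = description.split()
print(len(words))  # 27
print(words[3])    # off-dry,

# Runs of spaces, tabs and newlines are treated as a single separator.
messy = "  Drink \t now.\n"
print(messy.split())  # ['Drink', 'now.']
```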


### Exercise 3: How many different countries have wines reviewed in the dataset?

In [8]:
# Set of unique countries, skipping reviews whose country is missing.
countries = {review["country"] for review in wine_data if review["country"] is not None}

different_countries = len(countries)
f"{different_countries} different countries have wines reviewed in the dataset!"

'43 different countries have wines reviewed in the dataset!'

### Exercise 4: Build a dictionary with the following structure: 
    {country: # of wines reviewed from that country}

In [9]:
country_to_wine_reviews = {}

for review in wine_data:
    country = review["country"]
    # dict.get returns 0 the first time a country is seen.
    country_to_wine_reviews[country] = country_to_wine_reviews.get(country, 0) + 1
country_to_wine_reviews

{'Italy': 19540,
 'Portugal': 5691,
 'US': 54504,
 'Spain': 6645,
 'France': 22093,
 'Germany': 2165,
 'Argentina': 3800,
 'Chile': 4472,
 'Australia': 2329,
 'Austria': 3345,
 'South Africa': 1401,
 'New Zealand': 1419,
 'Israel': 505,
 'Hungary': 146,
 'Greece': 466,
 'Romania': 120,
 'Mexico': 70,
 'Canada': 257,
 None: 63,
 'Turkey': 90,
 'Czech Republic': 12,
 'Slovenia': 87,
 'Luxembourg': 6,
 'Croatia': 73,
 'Georgia': 86,
 'Uruguay': 109,
 'England': 74,
 'Lebanon': 35,
 'Serbia': 12,
 'Brazil': 52,
 'Moldova': 59,
 'Morocco': 28,
 'Peru': 16,
 'India': 9,
 'Bulgaria': 141,
 'Cyprus': 11,
 'Armenia': 2,
 'Switzerland': 7,
 'Bosnia and Herzegovina': 2,
 'Ukraine': 14,
 'Slovakia': 1,
 'Macedonia': 12,
 'China': 1,
 'Egypt': 1}
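The same tally can also be built with `collections.Counter` from the standard library. This is a sketch on a small stand-in list (`sample_reviews` is hypothetical); with the real data, `wine_data` would take its place. Like the loop above, it keeps `None` as a key for reviews with no country.

```python
from collections import Counter

# Toy stand-in for wine_data (hypothetical values).
sample_reviews = [
    {"country": "Italy"},
    {"country": "France"},
    {"country": "Italy"},
    {"country": None},
]

# Counter tallies every value yielded by the generator.
country_counts = Counter(review["country"] for review in sample_reviews)

print(dict(country_counts))  # {'Italy': 2, 'France': 1, None: 1}
```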

### Exercise 5: Build a dictionary with the following structure: 
    {country: avg points of wines coming from that country}

In [10]:
# Unique countries, skipping reviews whose country is missing.
known_countries = {
    review["country"] for review in wine_data if review["country"] is not None
}

countries_to_avg_points = {}

for country in known_countries:
    review_count = 0
    sum_points = 0
    for review in wine_data:
        if review["country"] == country:
            review_count += 1
            sum_points += int(review["points"])  # points are stored as strings

    # Averages are stored as strings formatted to two decimals.
    countries_to_avg_points[country] = format(sum_points / review_count, ".2f")

countries_to_avg_points

{'Brazil': '84.67',
 'Bulgaria': '87.94',
 'Moldova': '87.20',
 'Argentina': '86.71',
 'Serbia': '87.50',
 'Egypt': '84.00',
 'Hungary': '89.19',
 'Lebanon': '87.69',
 'Cyprus': '87.18',
 'Morocco': '88.57',
 'Australia': '88.58',
 'Canada': '89.37',
 'Czech Republic': '87.25',
 'Mexico': '85.26',
 'Slovakia': '87.00',
 'England': '91.58',
 'Chile': '86.49',
 'Italy': '88.56',
 'Macedonia': '86.83',
 'France': '88.85',
 'Peru': '83.56',
 'Portugal': '88.25',
 'Luxembourg': '88.67',
 'South Africa': '88.06',
 'Israel': '88.47',
 'Greece': '87.28',
 'Spain': '87.29',
 'Ukraine': '84.07',
 'Slovenia': '88.07',
 'Romania': '86.40',
 'Georgia': '87.69',
 'Armenia': '87.50',
 'New Zealand': '88.30',
 'Germany': '89.85',
 'China': '89.00',
 'US': '88.56',
 'Bosnia and Herzegovina': '86.50',
 'Uruguay': '86.75',
 'Switzerland': '88.57',
 'Turkey': '88.09',
 'Croatia': '87.22',
 'Austria': '90.10',
 'India': '90.22'}
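The nested loop above rescans the whole dataset once per country. One possible alternative is to accumulate a running sum and count per country in a single pass. The sketch below assumes the same record shape as `wine_data` but runs on a hypothetical `sample_reviews` list, and it keeps the averages as floats instead of formatted strings.

```python
from collections import defaultdict

# Toy stand-in for wine_data (hypothetical values); points are strings, as in the dataset.
sample_reviews = [
    {"country": "Italy", "points": "90"},
    {"country": "Italy", "points": "85"},
    {"country": "France", "points": "88"},
    {"country": None, "points": "80"},
]

# country -> [sum of points, number of reviews]
totals = defaultdict(lambda: [0, 0])

for review in sample_reviews:
    country = review["country"]
    if country is None:
        continue  # skip reviews with no country
    totals[country][0] += int(review["points"])
    totals[country][1] += 1

avg_points = {
    country: round(points_sum / count, 2)
    for country, (points_sum, count) in totals.items()
}

print(avg_points)  # {'Italy': 87.5, 'France': 88.0}
```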

### Exercise 6: What's the country that produces the wines with the highest average rating?

In [11]:
# The averages are stored as strings, so compare them by numeric value.
maximum_country_average = max(countries_to_avg_points.values(), key=float)

for key, value in countries_to_avg_points.items():
    if value == maximum_country_average:
        print(f"{key}'s rating is {value}, it is the country with the highest average rating!")

England's rating is 91.58, it is the country with the highest average rating!
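Because Exercise 5 stores the averages as formatted strings, comparing them directly is a lexicographic comparison. That happens to give the right answer when every value has two digits before the decimal point, but it would not in general. The sketch below uses made-up averages (the `averages` dictionary is hypothetical) to show the difference and one way to compare numerically.

```python
# Hypothetical averages, formatted as strings like in Exercise 5.
averages = {"A": "100.00", "B": "91.58", "C": "90.22"}

# String comparison goes character by character: '9' sorts after '1'.
max_as_string = max(averages.values())

# Converting to float for the comparison gives the numeric maximum.
best_country = max(averages, key=lambda country: float(averages[country]))

print(max_as_string)  # 91.58
print(best_country)   # A
```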


### Exercise 7: What is the resulting description of the last review?

Update each wine's description by appending the following text to it:
    "This is a {designation} from {country} that scored {points} points"

In [12]:
for wine in wine_data:
    wine['description'] += f" This is a {wine['designation']} from {wine['country']} that scored {wine['points']} points."

In [13]:
print("The resulting description of the last review is ==>", wine_data[-1]["description"])

The resulting description of the last review is ==> Big, rich and off-dry, this is powered by intense spiciness and rounded texture. Lychees dominate the fruit profile, giving an opulent feel to the aftertaste. Drink now. This is a Lieu-dit Harth Cuvée Caroline from France that scored 90 points.
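Two things are worth keeping in mind with this update. First, a field that is `None` is rendered as the literal text "None" (Exercise 4 shows that some reviews have a `None` country). Second, re-running the cell appends the sentence again. The sketch below is one possible way to guard against both; the helper name `append_summary` and the fallback wordings are assumptions for illustration, not part of the exercise.

```python
def append_summary(wine):
    """Append the summary sentence once, with fallbacks for missing values."""
    designation = wine["designation"] or "wine"        # assumed fallback text
    country = wine["country"] or "an unknown country"  # assumed fallback text
    sentence = (
        f" This is a {designation} from {country} "
        f"that scored {wine['points']} points."
    )
    # Only append if the sentence is not already at the end of the description.
    if not wine["description"].endswith(sentence):
        wine["description"] += sentence
    return wine


# Hypothetical record with a missing designation.
sample = {
    "description": "Drink now.",
    "designation": None,
    "country": "France",
    "points": "90",
}

append_summary(sample)
append_summary(sample)  # second call leaves the description unchanged

print(sample["description"])
# Drink now. This is a wine from France that scored 90 points.
```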


### Exercise 8: What's the proportion of tasters that have Twitter accounts?

In [14]:
# Unique names of all tasters, and of those with a Twitter handle.
tasters = {
    review["taster_name"]
    for review in wine_data
    if review["taster_name"] is not None
}
tasters_with_twitter = {
    review["taster_name"]
    for review in wine_data
    if review["taster_name"] is not None
    and review["taster_twitter_handle"] is not None
}

proportion = len(tasters_with_twitter) / len(tasters) * 100

print(f"The proportion of tasters that have Twitter accounts is {proportion:.2f}%")

The proportion of tasters that have Twitter accounts is 84.21%
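Deduplicating by name matters here because the question is about tasters, not reviews: a prolific taster with a Twitter handle would otherwise dominate the figure. The sketch below contrasts the two readings on a made-up list (`sample_reviews` and its names are hypothetical).

```python
# Toy stand-in for wine_data (hypothetical values).
sample_reviews = [
    {"taster_name": "Ann", "taster_twitter_handle": "@ann"},
    {"taster_name": "Ann", "taster_twitter_handle": "@ann"},
    {"taster_name": "Ann", "taster_twitter_handle": "@ann"},
    {"taster_name": "Bob", "taster_twitter_handle": None},
    {"taster_name": None, "taster_twitter_handle": None},
]

# Ignore reviews with no taster name.
named = [r for r in sample_reviews if r["taster_name"] is not None]

# Share of REVIEWS written by someone with a handle: 3 of 4.
share_of_reviews = sum(r["taster_twitter_handle"] is not None for r in named) / len(named)

# Share of TASTERS with a handle: sets collapse repeated names, so 1 of 2.
tasters = {r["taster_name"] for r in named}
with_twitter = {r["taster_name"] for r in named if r["taster_twitter_handle"] is not None}
share_of_tasters = len(with_twitter) / len(tasters)

print(share_of_reviews)  # 0.75
print(share_of_tasters)  # 0.5
```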
