# Individual Assignment: Wine!

There are 10 questions in this assignment. Some questions unlock others. If you can't answer a question, you can skip it and come back to it later, and if the question is locked, I'll provide a data sample for you to use, **at a cost of 0.5 points.**

The wine dataset is available in the file `wine.json`. This data contains information about wine reviews. It's a list of dictionaries, where each dictionary represents a wine review. The keys in the dictionary are:

* `points`: how many points the taster gave the wine on a scale of 1-100
* `title`: the title of the wine
* `description`: a description of the wine
* `taster_name`: the name of the taster
* `taster_twitter_handle`: the twitter handle of the taster
* `price`: the cost for a bottle of the wine
* `designation`: the vineyard within the winery where the grapes that made the wine are from
* `variety`: the type of grapes used to make the wine
* `region_1`: the province or state that the wine is from
* `region_2`: a more specific region within a wine growing area
* `province`: the province or state that the wine is from
* `country`: the country that the wine is from
* `winery`: the winery that made the wine



### Rules:
* For each question, print the answer in the cell below the question. You can use `print()` or just type the variable name.
* You can use any resources you like --including the internet, your notes, and the past notebooks-- except ChatGPT/Copilot/other AI writing tools. Using AI-based tools or asking other people for help will result in a 0 for the assignment, an immediate Fail in the course, and a report to the Dean of Students.
* You have 80 minutes to complete the assignment.
* You can submit the assignment as many times as you like, only the last submission will be graded.
* You can't work with other people on the assignment.

### 0. Load the data as a Python object and print the first item

In [5]:
import json
with open('wine-data-set.json', encoding='utf-8') as json_file:
    json_data = json.load(json_file)

json_data[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus'}

### 1. How many wine reviews are included in the dataset? (1 point)

In [6]:
len(json_data)

10000

### 2. Add a new {key:value} pair in each item in the list (1 point)

The new key should be called *length* and it should indicate the amount of words in the *description* value.

For example, the following description:
* "Very strong taste like apple and cinnamon"

should have a *length* value of **7** 

In [7]:
for elem in json_data:
    elem["length"] = len(elem["description"].split())

json_data[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus',
 'length': 19}

In [10]:
def calculate_length_description(review):
    review["length"] = len(review["description"].split())
    return review

new_reviews = [
    calculate_length_description(review)
    for review in json_data
]

new_reviews[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus',
 'length': 19}

In [15]:
map_reviews = map(calculate_length_description, json_data)

list(map_reviews)[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus',
 'length': 19}

### 3. How many different countries have their wines reviewed in the dataset? (1 point)

In [30]:
different_countries = [
    elem["country"] for elem in json_data
]

countries = set(different_countries)

len(countries) - 1 # because the country None still doesnt exist

countries

{'Argentina',
 'Australia',
 'Austria',
 'Brazil',
 'Bulgaria',
 'Canada',
 'Chile',
 'China',
 'Croatia',
 'Czech Republic',
 'England',
 'France',
 'Georgia',
 'Germany',
 'Greece',
 'Hungary',
 'India',
 'Israel',
 'Italy',
 'Lebanon',
 'Mexico',
 'Moldova',
 'New Zealand',
 None,
 'Portugal',
 'Romania',
 'Serbia',
 'Slovenia',
 'South Africa',
 'Spain',
 'Turkey',
 'US',
 'Ukraine',
 'Uruguay'}

In [20]:
different_countries = len(
    set(
        [
            elem["country"]
            for elem in json_data
            if elem["country"] != None
        ]
    )
)

different_countries

33

### 4. Build a dictionary with the following structure: (1 point)

{country: number of wines reviewed coming from that country}

In [25]:
country_count = {}

for review in json_data:
    country = review["country"]
    if country in country_count and country != None:
        country_count[country] += 1
    elif country != None:
        country_count[country] = 1

country_count

{'US': 4317,
 'France': 1716,
 'Chile': 362,
 'Italy': 1432,
 'Germany': 155,
 'New Zealand': 100,
 'South Africa': 111,
 'Argentina': 319,
 'Spain': 462,
 'Portugal': 396,
 'Austria': 287,
 'Greece': 32,
 'Australia': 165,
 'Canada': 17,
 'England': 11,
 'Hungary': 13,
 'Georgia': 7,
 'Mexico': 10,
 'Israel': 32,
 'Bulgaria': 13,
 'Brazil': 6,
 'Uruguay': 3,
 'Slovenia': 4,
 'Ukraine': 2,
 'Turkey': 4,
 'Croatia': 2,
 'Lebanon': 3,
 'Romania': 6,
 'Moldova': 2,
 'Czech Republic': 3,
 'Serbia': 1,
 'India': 1,
 'China': 1}

In [29]:
diff_countries = {}

for country in countries:
    diff_countries[country] = 0
    for element in json_data:
        if country == element["country"]:
            diff_countries[country] += 1


del diff_countries[]

{'Canada': 17,
 'Lebanon': 3,
 'Serbia': 1,
 'Slovenia': 4,
 'Mexico': 10,
 'Turkey': 4,
 'Portugal': 396,
 'Ukraine': 2,
 'New Zealand': 100,
 'Bulgaria': 13,
 'Germany': 155,
 'Croatia': 2,
 'Australia': 165,
 'China': 1,
 'England': 11,
 'India': 1,
 'Brazil': 6,
 'South Africa': 111,
 'Chile': 362,
 'Czech Republic': 3,
 'Israel': 32,
 'Austria': 287,
 'Argentina': 319,
 'US': 4317,
 'Romania': 6,
 'Italy': 1432,
 'France': 1716,
 'Georgia': 7,
 'Uruguay': 3,
 'Moldova': 2,
 'Hungary': 13,
 'Greece': 32,
 'Spain': 462}

### 5. Build a dictionary with the following structure (1 point)
{country: average points of wines coming from that country]

In [36]:
new_dict = {country: [0, 0] for country in countries}

for review in json_data:
    new_dict[review["country"]][0] += 1
    new_dict[review["country"]][1] += review["points"]

avg_dict = {}

for country in new_dict:
    average = new_dict[country][1] / new_dict[country][0]
    avg_dict[country] = average

avg_dict

{'Canada': 88.94117647058823,
 'Lebanon': 87.0,
 'Serbia': 86.0,
 'Slovenia': 87.0,
 'Mexico': 84.5,
 'Turkey': 87.5,
 'Portugal': 88.02525252525253,
 'Ukraine': 84.0,
 'New Zealand': 88.3,
 'Bulgaria': 88.23076923076923,
 'Germany': 89.83225806451613,
 'Croatia': 81.5,
 'Australia': 88.4969696969697,
 'China': 89.0,
 'England': 91.72727272727273,
 'India': 92.0,
 'Brazil': 85.66666666666667,
 'South Africa': 88.03603603603604,
 'Chile': 86.29558011049724,
 'Czech Republic': 88.0,
 'Israel': 88.71875,
 'Austria': 90.82229965156795,
 'Argentina': 86.68025078369907,
 'US': 88.54829742876998,
 'Romania': 86.83333333333333,
 'Italy': 88.27374301675978,
 'France': 88.91433566433567,
 'Georgia': 87.28571428571429,
 'Uruguay': 87.33333333333333,
 'Moldova': 87.0,
 'Hungary': 89.53846153846153,
 'Greece': 87.125,
 None: 89.2,
 'Spain': 87.16666666666667}

### 6. What's the province that produces the wines with the highest rating? (1 point)

Build a dictionary with the following structure:

{province: average points of wines coming from that province}

And then sort the dictionary by the average points, and print the province with the highest average points.

Hint: you can sort a dictionary by value using the following code:

```
sorted_dict = sorted(my_dict.items(), key=lambda x: x[1], reverse=True)
```

In [43]:
provinces = {element["province"] for element in json_data}

new_dict = {province: [0, 0] for province in provinces}

for review in json_data:
    new_dict[review["province"]][0] += 1
    new_dict[review["province"]][1] += review["points"]

avg_dict = {}

for province in new_dict:
    average = new_dict[province][1] / new_dict[province][0]
    avg_dict[province] = average

highest_rating = max(avg_dict.values())

[prov for prov in avg_dict if avg_dict[prov] >= highest_rating]

['Südburgenland', 'Madeira']

### 7. Update each wine's description by adding at the end of each description the following piece of text (1 point):

"This is a {designation} from {country} that scored {points} points"

In [46]:
json_data[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus',
 'length': 19}

In [49]:
for review in json_data:
    designation = review["designation"]
    country = review["country"]
    points = review["points"]
    extra_description = f"This is a {designation} from {country} that scored {points} points"

    review["description"] = review["description"] + extra_description

json_data[0]["description"]

'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.This is a None from US that scored 89 points This is a None from US that scored 89 pointsThis is a None from US that scored 89 points'

### 8. What's the proportion of wine tasters that have a Twitter account? (1 point)

In [50]:
json_data[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.This is a None from US that scored 89 points This is a None from US that scored 89 pointsThis is a None from US that scored 89 points',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus',
 'length': 19}

In [56]:
unique_twitter_handles = {
    review["taster_twitter_handle"] for review in json_data
    if review["taster_twitter_handle"] != None
}

unique_tasters = {
    review["taster_name"] for review in json_data
    if review["taster_name"] != None
}

proportion = len(unique_twitter_handles) / len(unique_tasters)

print(proportion)

0.7777777777777778


### Question 9 (1 point)

* Create a function called `affordable_wines` that receives the wines reviews list and a specific budget, and returns how many wines you can buy with that price. (0.5 points) 
* Create another function called `twitter_presence` that receives the wines reviews list and a wine name and returns True if the wine has a twitter handle for the taster, and False otherwise. (0.5 points)

Prove your functions with these examples:

* `affordable_wines(wines, 10)` should return 423 wines in that budget
* `twitter_presence(wines, "Nicosia 2013 Vulkà Bianco  (Etna)")` should return True, meaning there is a twitter handle for the taster of that wine

In [58]:
def affordable_wines(reviews, budget):
    counter = 0
    for review in reviews:
        if review["price"] != None and review["price"] <= budget:
            counter += 1
    return counter

affordable_wines(json_data, 10)

423

### Question 10 (1 point)

* Which is the most common variety of wine in the dataset? (0.3 points)
* Which is the most expensive wine in the dataset? (0.3 points)
* Which is the taster (other than `None`) that has reviewed the most wines? (0.4 points)