# Practice your skills loading data from a CSV file
In this exercise, you will load a CSV file and perform some operations on it to extract data. If you already know the Pandas library, you can use it. If not, you can use the `csv` module from the standard library with the ready-to-use example this notebook provides.

In [1]:
from csv import DictReader

# Open the CSV file and read it into a list of dictionaries, ignoring Unicode decoding errors

with open('sample_data/wine-ratings-small.csv', encoding='utf-8', errors='ignore') as f:
    reader = DictReader(f)
    wines = list(reader)


# The wines variable is now a list of dictionaries, one for each row in the CSV file. This is the sample output of a single entry:
# {'': '1',
#  'name': 'Laurenz V Charming Gruner Veltliner 2014',
#  'grape': '',
#  'region': 'Kamptal, Austria',
#  'variety': 'White Wine',
#  'rating': '90.0',
#  'notes': ''}



Looping over the list of dictionaries can be tricky with plain Python. Specialized libraries like Pandas make this much easier, but the downside is that you need to learn a new library. The following code is a bit more verbose, but it's a good exercise for learning how to work with dictionaries in Python.


In [2]:
# This example creates a new list that only has wines from Napa Valley. The new list is called napa_wines:
napa_wines = []
for wine in wines:
    if 'Napa' in wine['region']:
        napa_wines.append(wine)

napa_wines

[{'': '24',
  'name': 'Lava Vine Winery Napa Valley Cabernet Sauvignon 2014',
  'grape': '',
  'region': 'Napa Valley, California',
  'variety': 'Red Wine',
  'rating': '91.0',
  'notes': 'A wonderful representation of how amazing the 2014 vintage could be and how to balance Napa Valley’s Intensity. A ripe cherry and cassis entry with dusted cocoa and a touch of graham cracker entice. The silky rich entry is so balanced there seems to be no separation from mid-palate through the lengthy finish with multitudes of fruit and spice to accompany. This Cabernet Sauvignon is fully integrated, super complex and will age beautifully.'},
 {'': '25',
  'name': 'Lava Vine Winery Napa Valley Reserve Cabernet Sauvignon 2012',
  'grape': '',
  'region': 'Napa Valley, California',
  'variety': 'Red Wine',
  'rating': '92.0',
  'notes': 'Black berries and hints of strawberry invite with muddled cherry cola spices. Deep earthy tones on a silk entry proceed to a rich, full mid-palate. Expressive tannins 

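**NOTE**: If you are trying to use ratings, remember that the `csv` library loads every value as a string (for example `'90.0'`), so you will need to convert the ratings to numbers for numerical comparisons. Use `float()` for this, because `int('90.0')` raises a `ValueError`.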
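
As a minimal sketch of that conversion, the following example filters wines by rating using plain Python. The `sample_wines` list is a hypothetical excerpt shaped like the `DictReader` output above, and `parse_rating` is an illustrative helper, not part of the `csv` library.

```python
# Hypothetical sample rows shaped like the DictReader output (all values are strings)
sample_wines = [
    {"name": "Laurenz V Charming Gruner Veltliner 2014",
     "region": "Kamptal, Austria", "rating": "90.0"},
    {"name": "Lava Vine Winery Napa Valley Cabernet Sauvignon 2014",
     "region": "Napa Valley, California", "rating": "91.0"},
    {"name": "Lava Vine Winery Napa Valley Reserve Cabernet Sauvignon 2012",
     "region": "Napa Valley, California", "rating": "92.0"},
]


def parse_rating(raw):
    """Illustrative helper: return the rating as a float, or None if it is empty or malformed."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


# Keep only the wines whose numeric rating is above 90
high_rated = []
for wine in sample_wines:
    rating = parse_rating(wine["rating"])
    if rating is not None and rating > 90:
        high_rated.append(wine)

for wine in high_rated:
    print(wine["name"], wine["rating"])
```

Returning `None` for empty or malformed values is one possible design choice. It keeps the loop from stopping on a row that has no rating.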

## Using Pandas
Alternatively, you can use the Pandas library to load the CSV file and then extract the data. You'll need to install the Pandas library first. You can do this with the following command:

```bash
pip install pandas
```

Then, you can use the following code to load the CSV file and extract the data:

```python
import pandas as pd

df = pd.read_csv('sample_data/wine-ratings-small.csv')
df.head()
```

In [3]:
import pandas as pd
df = pd.read_csv("sample_data/wine-ratings-small.csv", index_col=0) # read the CSV file and use its first column (position 0) as the index
df.head() # show the first 5 rows of the dataframe


Unnamed: 0,name,grape,region,variety,rating,notes
0,Laurenz V Charming Gruner Veltliner 2013,,"Kamptal, Austria",White Wine,90.0,Aromas of ripe apples and a typical Veltliner ...
1,Laurenz V Charming Gruner Veltliner 2014,,"Kamptal, Austria",White Wine,90.0,Aromas of ripe apples and a typical Veltliner ...
2,Laurenz V Singing Gruner Veltliner 2007,,Austria,White Wine,90.0,"A very attractive fruit bouquet yields apple, ..."
3,Laurenz V Singing Gruner Veltliner 2010,,Austria,White Wine,88.0,"A very attractive fruit bouquet yields apple, ..."
4,Laurenz V Singing Gruner Veltliner 2011,,Austria,White Wine,88.0,"A very attractive fruit bouquet yields apple, ..."


## Manipulate data with Pandas or as a dictionary
At this point, you can keep working with Pandas if you know how to use it. Otherwise, you can convert the dataframe to a dictionary and work with plain Python. You can use the following code to do the conversion:

```python
data = df.to_dict()
```

In [4]:
dict_data = df.to_dict()
# You'll get several keys, one for each column in the dataframe. Each column maps to another dictionary whose keys are
# the row indexes, so you reach a single value with dict_data[column_name][row_index].

dict_data['name'] # get the values of the 'name' column
# sample output:
# {0: 'Laurenz V Charming Gruner Veltliner 2013',
# 1: 'Laurenz V Charming Gruner Veltliner 2014', ...}


{0: 'Laurenz V Charming Gruner Veltliner 2013',
 1: 'Laurenz V Charming Gruner Veltliner 2014',
 2: 'Laurenz V Singing Gruner Veltliner 2007',
 3: 'Laurenz V Singing Gruner Veltliner 2010',
 4: 'Laurenz V Singing Gruner Veltliner 2011',
 5: 'Laurenz V Singing Gruner Veltliner 2013',
 6: 'Lava Cap American River Red',
 7: 'Lava Cap Barbera 2010',
 8: 'Lava Cap Battonage Chardonnay 2012',
 9: 'Lava Cap Cabernet Sauvignon 2013',
 10: 'Lava Cap Cabernet Sauvignon 2016',
 11: 'Lava Cap Petite Sirah 2013',
 12: 'Lava Cap Petite Sirah 2014',
 13: 'Lava Cap Petite Sirah 2016',
 14: 'Lava Cap Reserve Chardonnay 2015',
 15: 'Lava Cap Reserve Chardonnay 2018',
 16: 'Lava Cap Reserve Chardonnay 2016',
 17: 'Lava Cap Reserve Merlot 2015',
 18: 'Lava Cap Sauvignon Blanc 2015',
 19: 'Lava Cap Sauvignon Blanc 2017',
 20: 'Lava Cap Syrah 2009',
 21: 'Lava Cap Syrah 2014',
 22: 'Lava Cap Syrah 2013',
 23: 'Lava Vine Winery Knights Valley Reserve Cabernet Sauvignon 2013',
 24: 'Lava Vine Winery Napa Vall

In [5]:
# If you want the values of a row, use the row's index as a key, but you have to repeat it for every column you need. For example:
print(
    dict_data['name'][0],    # get the value of the 'name' column for the row with index 0
    dict_data['rating'][0],  # get the value of the 'rating' column for the row with index 0
    dict_data['region'][0],  # get the value of the 'region' column for the row with index 0
)



Laurenz V Charming Gruner Veltliner 2013 90.0 Kamptal, Austria
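
If repeating the index for every column gets tedious, one option is to collect a whole row in a small helper. The sketch below uses plain Python only. `column_data` is a hypothetical excerpt shaped like the output of `df.to_dict()`, and `get_row` is an illustrative name, not a Pandas function.

```python
# Hypothetical excerpt shaped like df.to_dict(): column name -> {row index -> value}
column_data = {
    "name": {0: "Laurenz V Charming Gruner Veltliner 2013",
             1: "Laurenz V Charming Gruner Veltliner 2014"},
    "rating": {0: 90.0, 1: 90.0},
    "region": {0: "Kamptal, Austria", 1: "Kamptal, Austria"},
}


def get_row(columns, index):
    """Illustrative helper: gather the value of every column for one row index."""
    return {column: values[index] for column, values in columns.items()}


first_row = get_row(column_data, 0)
print(first_row["name"], first_row["rating"], first_row["region"])
```

With Pandas itself, `df.to_dict(orient="index")` returns a dictionary keyed by row index directly, so a helper like this is only needed when you start from the column-oriented form.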


In [62]:
import json

# Select the red wines
df_red_wine = df[df['variety'] == 'Red Wine']
df_red_wine.head()

# Each record is a JSON object
json_red_wine = df_red_wine.to_json(orient="records")

dict_red_wine = df_red_wine.to_dict(orient="records")
#print(json.dumps(dict_red_wine, indent=2))

with open('sample_data/red_wine.json', 'w') as f:
  json.dump(dict_red_wine, f)

In [64]:
# Top wines: ratings strictly between 90 and 92
df_top_wine = df[(df['rating'] > 90) & (df['rating'] < 92)]
df_top_wine.head()

# Create one key for each row index of the dataframe
dict_top_wine = df_top_wine.to_dict(orient="index")
#print(json.dumps(dict_top_wine, indent=2))

with open('sample_data/top_wine.json', "w") as f:
  json.dump(dict_top_wine, f)
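
One thing to keep in mind with the index-keyed form: JSON object keys are always strings, so integer row indexes come back as strings when the file is loaded again. The sketch below shows this with the standard `json` module only. `top_wines` is a hypothetical stand-in for `dict_top_wine`, assuming its keys are plain Python integers.

```python
import json

# Hypothetical stand-in for df_top_wine.to_dict(orient="index"): row index -> row values
top_wines = {
    24: {"name": "Lava Vine Winery Napa Valley Cabernet Sauvignon 2014", "rating": 91.0},
}

# Serialize to a JSON string and load it back, as writing and re-reading a file would do
serialized = json.dumps(top_wines)
restored = json.loads(serialized)

# The integer key 24 is now the string "24"
print(list(restored.keys()))

# Convert the keys back to integers if the rest of the code expects them
restored_int_keys = {int(key): value for key, value in restored.items()}
```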

In [67]:
# Wines from France
df_french_wine = df[df.region.str.contains('France')]
df_french_wine.head()

# Convert the dataframe directly to a JSON file
df_french_wine.to_json('sample_data/french_wine.json', orient="records")