# GeoCache: *Wine Spectator*'s Top 100 Wines, 1988-2020
The lists are available online at *Wine Spectator*'s [Top 100 Lists website](https://top100.winespectator.com/lists/).

## File Setup

In [1]:
# import and initialize main python libraries
import numpy as np
import pandas as pd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns

# import libraries for file navigation
import os
import shutil
import glob
from pandas_ods_reader import read_ods

# import other packages
from scipy import stats
from sklearn import linear_model

# import geo packages
import geopandas as gpd
import descartes
from shapely.geometry import Point, Polygon

# import Geopy packages
import geopy
from geopy.geocoders import Nominatim

In [2]:
# initialize visualization settings
sns.set(style="whitegrid", palette="colorblind", color_codes=True)
sns.mpl.rc("figure", figsize=(10, 6))

# Jupyter Notebook
%matplotlib inline

## Dataframe Exploration

In [3]:
# Note: save CSV files in UTF-8 format to preserve special characters.
df_Wine = pd.read_csv('./CSV_Wines.csv')
df_GeoCache = pd.read_csv('./CSV_GeoCache.csv')
df_GeoList = pd.read_csv('./CSV_GeoList.csv')

In [4]:
df_Wine.shape

(3301, 18)

In [5]:
df_Wine.dtypes

Review_Year           float64
Rank                   object
Vintage                object
Score                 float64
Price                  object
Winemaker              object
Wine                   object
Wine_Style             object
Grape_Blend            object
Blend_List             object
Geography              object
Cases_Made            float64
Cases_Imported        float64
Reviewer               object
Drink_now             float64
Best_Drink_from       float64
Best_Drink_Through    float64
Review                 object
dtype: object

In [6]:
df_GeoCache.shape

(1224, 3)

In [7]:
df_GeoList.shape

(445, 1)

In [8]:
df_Wine.sample(10)

Unnamed: 0,Review_Year,Rank,Vintage,Score,Price,Winemaker,Wine,Wine_Style,Grape_Blend,Blend_List,Geography,Cases_Made,Cases_Imported,Reviewer,Drink_now,Best_Drink_from,Best_Drink_Through,Review
1802,2002.0,3,1997,94.0,50,Castello Banfi,Brunello di Montalcino,Red,Brunello di Montalcino,,Brunello di Montalcino,32500.0,,JS,,2003.0,,"A Brunello for everyone. Solid and focused, wi..."
2391,1997.0,92,1994,92.0,15,Markham,Cabernet Sauvignon Napa Valley,Red,Cabernet Sauvignon,,Napa Valley,13350.0,,JL,,2000.0,,This seductive wine not only grows on you with...
514,2015.0,15,2011,94.0,30,Abadia Retuerta,Viño de la Tierra de Castilla y León Sardon de...,Red,Blend,Cabernet - Syrah – Tempranillo,Sardon de Duero,30000.0,,TM,1.0,2015.0,2031.0,Alluring for its plush texture and impressive ...
642,2014.0,43,2013,91.0,12,Charles Smith,Riesling Columbia Valley Kung Fu Girl Evergreen,White,Riesling,,Columbia Valley,128806.0,,HS,1.0,2014.0,2020.0,"Crisp and sleek, with juicy, expansive nectari..."
2344,1997.0,45,1996,91.0,12,Rosemount,Shiraz South Eastern Australia,Red,Shiraz | Syrah,,South Eastern Australia,150000.0,,HS,1.0,2000.0,,"Bursting with fruit, here's a lively, generous..."
838,2012.0,39,2010,92.0,20,Domaines Schlumberger,Pinot Gris Alsace Les Princes Abbés,White,Pinot Grigio | Pinot Gris,,Alsace,,2100.0,AN,1.0,2012.0,2022.0,"Finely knit, with a vibrancy to the refined ac..."
2377,1997.0,78,1995,92.0,40,Louis Carillon,Puligny-Montrachet,White,Chardonnay,,Puligny-Montrachet,1665.0,,,,1997.0,2005.0,"Amazing quality for a village wine, with almos..."
2260,1998.0,26,1995,97.0,135,Álvaro Palacios,Priorat L'Ermita,Red,Garnacha | Grenache | Garnatxa,,Priorat,450.0,,TM,1.0,1998.0,2005.0,Such a powerful mouthful of wine that after sw...
916,2011.0,17,2006,95.0,60,Tenuta Carlina,Brunello di Montalcino La Togata,Red,Brunello di Montalcino,,Brunello di Montalcino,4000.0,,BS,,2013.0,2026.0,Very pure aromas and flavors of raspberry and ...
1895,2002.0,96,1999,90.0,38,Château Montrose,St.-Estèphe,Red,Blend,Bordeaux Blend Red,St.-Estèphe,18715.0,,JS,,2005.0,,"Very fine indeed, offering complex aromas of b..."


In [9]:
df_GeoCache.sample(10)

Unnamed: 0,Geography,Hierarchy,Address
642,Alicante,Hierarchy_01,"Valencia, Spain"
212,Primitivo di Manduria,Hierarchy_00,Italy
1029,Puligny-Montrachet Sous le Puits,Hierarchy_03,"Puligny-Montrachet, Côte de Beaune, Burgundy, ..."
848,IGP Pays d'Oc,Hierarchy_02,"Pays d'Oc, Languedoc-Roussillon, France"
588,Toscana,Hierarchy_01,"Tuscany, Italy"
653,Paso Robles,Hierarchy_01,"California, USA"
634,Terra Alta,Hierarchy_01,"Catalonia | Catalunya, Spain"
841,St.-Joseph,Hierarchy_02,"St.-Joseph, Rhône, France"
486,Minervois La Livinière,Hierarchy_01,"Languedoc-Roussillon, France"
964,Knights Valley,Hierarchy_02,"North Coast, California, USA"


In [10]:
df_GeoList.sample(10)

Unnamed: 0,Address
238,"Minervois La Livinière, Minervois, Languedoc-R..."
410,"Vallagarina IGT, Trentino,Alto Adige, Italy"
217,"Maconnais, Burgundy, France"
376,"St.-Aubin Premier Cru, St.-Aubin, Côte de Beau..."
213,"Macedonia, Greece, Greece"
68,"Carneros, Napa Valley, Napa County, North Coas..."
128,"Côtes du Jura, Jura, France"
123,"Côte de Beaune, Burgundy, France"
174,"Heathcote, Victoria, Australia"
10,"Alsace, France"


### Geocode the Address dataframe
Reference: [Python’s geocoding — Convert a list of addresses into a map](https://towardsdatascience.com/pythons-geocoding-convert-a-list-of-addresses-into-a-map-f522ef513fd6)

In [11]:
# Initialize Nominatim into geolocator variable.
geolocator = Nominatim(user_agent='wine app')

In [12]:
geolocator.geocode('Castilla y León, Spain').raw

{'place_id': 258252333,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
 'osm_type': 'relation',
 'osm_id': 349041,
 'boundingbox': ['40.0824504', '43.2382034', '-7.077073', '-1.7753716'],
 'lat': '41.8037172',
 'lon': '-4.7471726',
 'display_name': 'Castilla y León, España',
 'class': 'boundary',
 'type': 'administrative',
 'importance': 0.9625997816800999,
 'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png'}

In [13]:
geolocator.geocode('Castilla y León, Spain').point

Point(41.8037172, -4.7471726, 0.0)

In [14]:
# Apply geolocator to the Address column in the GeoList dataframe.
df_GeoList['loc'] = df_GeoList['Address'].apply(geolocator.geocode)
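
Applying `geolocator.geocode` directly sends one request per row as fast as pandas can issue them, while the public Nominatim service asks clients to stay at or below roughly one request per second. The sketch below shows one way to space out calls and reuse results for repeated addresses. `make_polite_geocoder` and `fake_geocode` are hypothetical names introduced for illustration, and the stub stands in for `geolocator.geocode` so the example runs without network access. geopy also ships `geopy.extra.rate_limiter.RateLimiter`, which covers the delay part.

```python
import time


def make_polite_geocoder(geocode_fn, min_delay_seconds=1.0, sleep=time.sleep):
    """Wrap geocode_fn so repeated addresses are cached and real calls are spaced out."""
    cache = {}
    last_call = [None]  # monotonic timestamp of the previous real lookup

    def geocode(address):
        # Serve repeats (including cached failures) without a new request.
        if address in cache:
            return cache[address]
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = time.monotonic()
        cache[address] = geocode_fn(address)
        return cache[address]

    return geocode


# Stand-in for geolocator.geocode (assumption: returns None when nothing matches).
calls = []

def fake_geocode(address):
    calls.append(address)
    return (41.8, -4.7) if "Spain" in address else None


polite = make_polite_geocoder(fake_geocode, min_delay_seconds=0.0)
addresses = ["Castilla y León, Spain", "Nowhere", "Castilla y León, Spain"]
results = [polite(a) for a in addresses]
```

In the notebook this could be used as `df_GeoList['Address'].apply(make_polite_geocoder(geolocator.geocode))`, keeping the default one-second delay.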

In [15]:
# Get .point (lat, long, altitude) from each geocode response; keep None where geocoding failed.
df_GeoList['point'] = df_GeoList['loc'].apply(lambda loc: tuple(loc.point) if loc else None)

In [16]:
# Split the .point column into separate columns for lat, long, and altitude
df_GeoList[['lat', 'long', 'altitude']] = pd.DataFrame(df_GeoList['point'].to_list(), index=df_GeoList.index)

In [17]:
df_GeoList

Unnamed: 0,Address,loc,point,lat,long,altitude
0,"Abruzzo, Italy","(Abruzzo, Italia, (42.227681, 13.854983))","(42.227681, 13.854983, 0.0)",42.227681,13.854983,0.0
1,"Adelaide Hills, South Australia, Australia","(Adelaide Hills Council, South Australia, Aust...","(-34.901351649999995, 138.8293202817461, 0.0)",-34.901352,138.829320,0.0
2,"Aegean Islands, Greece","(Aegean, Σάμη - Αγία Ευφημία, Καραβόμυλος, Δήμ...","(38.2504094, 20.6304217, 0.0)",38.250409,20.630422,0.0
3,"Aglianico del Vulture, Basilicata, Italy",,,,,
4,"Agrelo, Mendoza, Argentina","(Agrelo, Distrito Agrelo, Departamento Luján d...","(-33.1184629, -68.8859261, 0.0)",-33.118463,-68.885926,0.0
5,"Alba, Piedmont | Piemonte, Italy",,,,,
6,"Alentejo, Portugal","(Alentejo, Portugal, (38.0551003, -7.8605799))","(38.0551003, -7.8605799, 0.0)",38.055100,-7.860580,0.0
7,"Alexander Valley, Sonoma County, North Coast, ...",,,,,
8,"Alicante, Valencia, Spain","(Alacant / Alicante, l'Alacantí, Alacant / Ali...","(38.353738, -0.4901846, 0.0)",38.353738,-0.490185,0.0
9,"Almansa, Castilla La Mancha, Spain","(Almansa, Albacete, Castilla-La Mancha, 02640,...","(38.8682065, -1.0978627, 0.0)",38.868206,-1.097863,0.0
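
Several addresses in the output above come back empty (for example "Aglianico del Vulture, Basilicata, Italy"). One possible way to recover an approximate location is to retry with progressively broader versions of the address, dropping the most specific component each time. This is only a sketch of that idea, not something the notebook does: `fallback_queries` and `geocode_with_fallback` are hypothetical helpers, and the lookup table with made-up coordinates stands in for `geolocator.geocode`.

```python
def fallback_queries(address):
    """Yield the full address, then broader versions with leading parts removed."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    for start in range(len(parts)):
        yield ", ".join(parts[start:])


def geocode_with_fallback(address, geocode_fn):
    """Return (query_that_matched, location), or (None, None) if nothing matched."""
    for query in fallback_queries(address):
        loc = geocode_fn(query)
        if loc is not None:
            return query, loc
    return None, None


# Stand-in for geolocator.geocode; the coordinates are invented for the demo.
known = {"Basilicata, Italy": (40.5, 16.0)}

candidates = list(fallback_queries("Alba, Piedmont | Piemonte, Italy"))
matched, loc = geocode_with_fallback(
    "Aglianico del Vulture, Basilicata, Italy", known.get
)
```

Recording which query matched makes it clear when a row has only a region-level or country-level coordinate rather than one for the appellation itself.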


### Append geography details to the GeoCache dataframe
Join the geocoded coordinates onto the GeoCache dataframe to see how well geography is populated at each hierarchy level.

In [18]:
df_GeoCache = pd.merge(df_GeoCache, df_GeoList, on = 'Address', how = 'left' )

In [19]:
df_GeoCache.to_csv(path_or_buf = './GeoCache.csv', index = False)
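
The merge attaches coordinates, but the notebook does not show the coverage check itself. A minimal sketch of one way to measure it, assuming a missing `lat` means the address was not geocoded, is below. The `demo` frame is toy data that only mimics the column names of `df_GeoCache`.

```python
import pandas as pd

# Toy stand-in for the merged df_GeoCache (values are illustrative only).
demo = pd.DataFrame({
    "Geography": ["Lodi", "Rheingau", "Toro", "Alba"],
    "Hierarchy": ["Hierarchy_00", "Hierarchy_00", "Hierarchy_01", "Hierarchy_01"],
    "lat": [39.78, 51.08, 41.80, None],
})

# For each hierarchy level: row count, geocoded count, and geocoded share.
coverage = (
    demo.assign(geocoded=demo["lat"].notna())
        .groupby("Hierarchy")["geocoded"]
        .agg(rows="size", geocoded="sum", share="mean")
)
print(coverage)
```

Replacing `demo` with `df_GeoCache` would give the same summary for the real data.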

### Append Hierarchy 00 details to the df_Wine dataset

In [20]:
# filter df_GeoCache to Hierarchy_00

df_GeoCache00 = df_GeoCache[
    (df_GeoCache.Hierarchy == 'Hierarchy_00')
]

df_GeoCache00.sample(10)

Unnamed: 0,Geography,Hierarchy,Address,loc,point,lat,long,altitude
304,Lodi,Hierarchy_00,USA,"(United States, (39.7837304, -100.4458825))","(39.7837304, -100.4458825, 0.0)",39.78373,-100.445882,0.0
223,Brunello di Montalcino,Hierarchy_00,Italy,"(Italia, (42.6384261, 12.674297))","(42.6384261, 12.674297, 0.0)",42.638426,12.674297,0.0
112,Richebourg,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
145,Châteauneuf-du-Pape,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
116,Vosne-Romanée Cros Parantoux,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
70,Crémant de Bourgogne,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
22,South Australia,Hierarchy_00,Australia,"(Australia, (-24.7761086, 134.755))","(-24.7761086, 134.755, 0.0)",-24.776109,134.755,0.0
198,Rosso Piceno,Hierarchy_00,Italy,"(Italia, (42.6384261, 12.674297))","(42.6384261, 12.674297, 0.0)",42.638426,12.674297,0.0
102,Echézeaux,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
170,Rheingau,Hierarchy_00,Germany,"(Deutschland, (51.0834196, 10.4234469))","(51.0834196, 10.4234469, 0.0)",51.08342,10.423447,0.0


In [21]:
df_Wine00 = pd.merge(df_Wine, df_GeoCache00, on = 'Geography', how = 'left')

df_Wine00.sample(10)

Unnamed: 0,Review_Year,Rank,Vintage,Score,Price,Winemaker,Wine,Wine_Style,Grape_Blend,Blend_List,...,Best_Drink_from,Best_Drink_Through,Review,Hierarchy,Address,loc,point,lat,long,altitude
2080,2000.0,79,1998,90.0,12,Columbia Crest,Chardonnay Columbia Valley Estate Series,White,Chardonnay,,...,2000.0,2003.0,"A complex, creamy Washington white, rich, long...",Hierarchy_00,USA,"(United States, (39.7837304, -100.4458825))","(39.7837304, -100.4458825, 0.0)",39.78373,-100.445882,0.0
409,2016.0,9,2013,96.0,106,Château Smith-Haut-Lafitte,Pessac-Léognan White,White,Blend,"Sauvignon Blanc, Sauvignon Gris and Sémillon",...,2017.0,2023.0,"This has a gorgeous feel, with opulent fruit o...",Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
922,2011.0,21,2009,93.0,15,Georges Duboeuf,Morgon Jean Descombes,Red,Gamay,,...,2011.0,,Light tannins and a smoky mineral note frame t...,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
41,2020.0,42,2013,93.0,40,Gloria Ferrer,Brut Carneros Royal Cuvée Late Disgorged,Sparkling,Blend,Sparkling Blend,...,2020.0,,"A sumptuous and rich style, with vibrant Asian...",Hierarchy_00,USA,"(United States, (39.7837304, -100.4458825))","(39.7837304, -100.4458825, 0.0)",39.78373,-100.445882,0.0
338,2017.0,39,2014,94.0,50,Spring Valley,Frederick Walla Walla Valley,Red,Blend,"Cabernet Sauvignon, Cabernet Franc, Merlot, Pe...",...,2017.0,2023.0,"Refined and impeccably structured, with floral...",Hierarchy_00,USA,"(United States, (39.7837304, -100.4458825))","(39.7837304, -100.4458825, 0.0)",39.78373,-100.445882,0.0
2147,1999.0,46,1995,92.0,30,Giovanni Sordo,Barolo,Red,Blend,Nebbiolo,...,1999.0,2007.0,"Delicate and pure, with gorgeous balance. Ligh...",Hierarchy_00,Italy,"(Italia, (42.6384261, 12.674297))","(42.6384261, 12.674297, 0.0)",42.638426,12.674297,0.0
1070,2010.0,69,2008,91.0,28,Orin Swift,Zinfandel California Saldo,Red,Zinfandel,,...,2010.0,2015.0,"Well-built, yet rich and stylish, with spicy b...",Hierarchy_00,USA,"(United States, (39.7837304, -100.4458825))","(39.7837304, -100.4458825, 0.0)",39.78373,-100.445882,0.0
709,2013.0,8,2010,96.0,120,Château de Beaucastel,Châteauneuf-du-Pape,Red,Châteauneuf-du-Pape,,...,2016.0,2035.0,"Dark, dense and very closed now, this has a tr...",Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
92,2020.0,93,2017,97.0,127,Château Pavie-Decesse,St.-Emilion,Red,Blend,Merlot and Cabernet Franc,...,2022.0,2040.0,This is laden with cassis and warmed plum comp...,Hierarchy_00,France,"(France, (46.603354, 1.8883335))","(46.603354, 1.8883335, 0.0)",46.603354,1.888334,0.0
1297,2008.0,96,2006,93.0,60,Cabreo,Toscana Il Borgo,Red,Blend,Sangiovese and Cabernet Sauvignon,...,2011.0,,"This is dark and rich, with loads of blackberr...",Hierarchy_00,Italy,"(Italia, (42.6384261, 12.674297))","(42.6384261, 12.674297, 0.0)",42.638426,12.674297,0.0


### Append Hierarchy 01 details to the df_Wine dataset

In [22]:
# filter df_GeoCache to Hierarchy_01

df_GeoCache01 = df_GeoCache[
    (df_GeoCache.Hierarchy == 'Hierarchy_01')
]

df_GeoCache01.sample(10)

Unnamed: 0,Geography,Hierarchy,Address,loc,point,lat,long,altitude
444,Le Montrachet,Hierarchy_01,"Burgundy, France","(Bourgogne, France métropolitaine, France, (47...","(47.27808725, 4.222486304306048, 0.0)",47.278087,4.222486,0.0
365,Hunter Valley,Hierarchy_01,"New South Wales, Australia","(New South Wales, Australia, (-31.8759835, 147...","(-31.8759835, 147.2869493, 0.0)",-31.875984,147.286949,0.0
435,Corton Les Renardes,Hierarchy_01,"Burgundy, France","(Bourgogne, France métropolitaine, France, (47...","(47.27808725, 4.222486304306048, 0.0)",47.278087,4.222486,0.0
629,Toro,Hierarchy_01,"Castilla y León, Spain","(Castilla y León, España, (41.8037172, -4.7471...","(41.8037172, -4.7471726, 0.0)",41.803717,-4.747173,0.0
572,Etna,Hierarchy_01,"Sicily, Italy","(Sicilia, Italia, (37.587794, 14.155048))","(37.587794, 14.155048, 0.0)",37.587794,14.155048,0.0
355,Cafayate - Calchaqui Valley,Hierarchy_01,"Cafayate , Calchaqui Valley, Argentina","(Calchaquí, Municipio de Animaná, Cafayate, Sa...","(-26.0051871, -65.8669424, 0.0)",-26.005187,-65.866942,0.0
522,Mosel,Hierarchy_01,"Mosel, Germany","(Mosel, Lützel, Koblenz, Rheinland-Pfalz, 5607...","(50.3659752, 7.5858251, 0.0)",50.365975,7.585825,0.0
455,Musigny,Hierarchy_01,"Burgundy, France","(Bourgogne, France métropolitaine, France, (47...","(47.27808725, 4.222486304306048, 0.0)",47.278087,4.222486,0.0
614,Constantia,Hierarchy_01,"Western Cape, South Africa","(Western Cape, South Africa, (-33.546977, 20.7...","(-33.546977, 20.72753, 0.0)",-33.546977,20.72753,0.0
485,Languedoc,Hierarchy_01,"Languedoc-Roussillon, France","(Languedoc-Roussillon, France métropolitaine, ...","(43.65420305, 3.674669940206605, 0.0)",43.654203,3.67467,0.0


In [23]:
df_Wine01 = pd.merge(df_Wine, df_GeoCache01, on = 'Geography', how = 'left')

df_Wine01.sample(10)

Unnamed: 0,Review_Year,Rank,Vintage,Score,Price,Winemaker,Wine,Wine_Style,Grape_Blend,Blend_List,...,Best_Drink_from,Best_Drink_Through,Review,Hierarchy,Address,loc,point,lat,long,altitude
2149,1999.0,48,1997,94.0,52,J. Moreau & Fils,Chablis Les Clos,White,Chardonnay,,...,2003.0,2015.0,"Incredibly fresh and vibrant in style, the cha...",Hierarchy_01,"Burgundy, France","(Bourgogne, France métropolitaine, France, (47...","(47.27808725, 4.222486304306048, 0.0)",47.278087,4.222486,0.0
1290,2008.0,89,2006,90.0,18,Stadt Krems,Grüner Veltliner Qualitätswein Trocken Kremsta...,White,Grüner,,...,2008.0,2015.0,"Intense, with concentrated flavors of ripe pea...",Hierarchy_01,"Kremstal, Austria","(Inzersdorf im Kremstal, Bezirk Kirchdorf, Obe...","(47.9263917, 14.0780469, 0.0)",47.926392,14.078047,0.0
2868,1992.0,66,1990,88.0,28,Kumeu River,Chardonnay Kumeu,White,Chardonnay,,...,,,"Pulls out all the stops. Ripe, buttery aromas ...",Hierarchy_01,"Auckland, New Zealand","(Auckland, Waitematā, Auckland, 1010, New Zeal...","(-36.852095, 174.7631803, 0.0)",-36.852095,174.76318,0.0
2354,1997.0,53,1995,92.0,15,Flora Springs,Sangiovese Napa Valley,Red,Sangiovese,,...,,,"Openly fruity, with lots of ripe blackberry an...",Hierarchy_01,"California, USA","(California, United States, (36.7014631, -118....","(36.7014631, -118.755997, 0.0)",36.701463,-118.755997,0.0
981,2011.0,80,2008,96.0,150,Continuum,Napa Valley,Red,Blend,"Cabernet Sauvignon, Cabernet Franc, Petit Verd...",...,2013.0,2023.0,"A remarkable effort, offering riveting, expres...",Hierarchy_01,"California, USA","(California, United States, (36.7014631, -118....","(36.7014631, -118.755997, 0.0)",36.701463,-118.755997,0.0
1872,2002.0,71,2000,90.0,24,Talley,Chardonnay Arroyo Grande Valley,White,Chardonnay,,...,2002.0,2005.0,"A wine of finesse and elegance, with intense, ...",Hierarchy_01,"California, USA","(California, United States, (36.7014631, -118....","(36.7014631, -118.755997, 0.0)",36.701463,-118.755997,0.0
1685,2004.0,84,2003,90.0,13,Waterbrook,Mélange Columbia Valley,Red,Blend,"Cabernet Sauvignon, Sangiovese, Merlot, Syrah ...",...,2004.0,2010.0,"Ripe and plump, generous with its spicy, mocha...",Hierarchy_01,"Washington, USA","(Washington, District of Columbia, United Stat...","(38.8949924, -77.0365581, 0.0)",38.894992,-77.036558,0.0
3070,1990.0,68,1988,91.0,20,Saintsbury,Pinot Noir Carneros,Red,Pinot Noir,,...,1990.0,,Difficult to find a better Pinot Noir at this ...,Hierarchy_01,"California, USA","(California, United States, (36.7014631, -118....","(36.7014631, -118.755997, 0.0)",36.701463,-118.755997,0.0
2587,1995.0,86,1993,90.0,25,Chateau Ste. Michelle,Chardonnay Columbia Valley Cold Creek Vineyard,White,Chardonnay,,...,1995.0,1997.0,"Subtle, harmonious and beautifully balanced, b...",Hierarchy_01,"Washington, USA","(Washington, District of Columbia, United Stat...","(38.8949924, -77.0365581, 0.0)",38.894992,-77.036558,0.0
436,2016.0,36,2011,95.0,70,G.D. Vajra,Barolo Bricco delle Viole,Red,Blend,Nebbiolo,...,2019.0,2032.0,"Graphite and iron aromas lead off, with cherry...",Hierarchy_01,"Piedmont | Piemonte, Italy","(Piedmont Properties, 78, SP50, San Marzano Ol...","(44.7605629, 8.2998538, 0.0)",44.760563,8.299854,0.0


### Save files for use in other notebooks

In [28]:
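# Remove duplicates by index: 2017 (46), 2015 (73), 1995 (94)
# Resolve positions to labels on each dataframe's own index, so that dropping
# rows from df_Wine00 does not shift the rows selected for df_Wine01.
# Assumes both merges produced the duplicates at the same positions.
dup_positions = [2596, 574, 346]
df_Wine00 = df_Wine00.drop(df_Wine00.index[dup_positions])
df_Wine01 = df_Wine01.drop(df_Wine01.index[dup_positions])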
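
Hard-coded positions are fragile if the input files change. The extra rows are presumably introduced by the left merge when a `Geography` value appears more than once in the lookup table, so an alternative is to deduplicate the lookup before merging. The sketch below uses toy frames with the same column names. `validate="many_to_one"` is a standard `pd.merge` option that raises an error if the right-hand keys are not unique.

```python
import pandas as pd

# Toy stand-ins for df_Wine and a filtered df_GeoCache (illustrative values).
wines = pd.DataFrame({"Wine": ["A", "B"], "Geography": ["Toro", "Lodi"]})
geo = pd.DataFrame({
    "Geography": ["Toro", "Toro", "Lodi"],
    "Address": ["Spain", "Spain", "USA"],
})

# Keys that occur more than once in the lookup would duplicate wine rows.
dup_keys = geo.loc[geo["Geography"].duplicated(keep=False), "Geography"].unique()

# Keep one lookup row per Geography, then merge; validate guards the row count.
geo_unique = geo.drop_duplicates(subset="Geography", keep="first")
merged = pd.merge(wines, geo_unique, on="Geography", how="left",
                  validate="many_to_one")
```

With the lookup deduplicated, the merged frame keeps exactly one row per wine and no positional drop is needed.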

In [29]:
df_Wine00.shape

(3301, 25)

In [30]:
df_Wine01.shape

(3301, 25)

In [31]:
df_Wine00.to_csv(path_or_buf = './Wine_Hier00.csv', index = False)
df_Wine01.to_csv(path_or_buf = './Wine_Hier01.csv', index = False)