- Create a "Wine" class that will be used to represent each wine
- attributes:
    - search terms
    - size
    - vintage
    - grape
    - region
    - country 
    - producer
    - style (white/red)
    - time added, time changed if applicable (how can I keep track of changes?)
    - tasting notes
    - food pairings
    - KCW staff pick
- functions:
    - scraping functions to add attributes
    - find similar wines (opportunity for modeling)
    - stretch goal: purchase wine function (wooooaahhh)

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('kings_county_wines.csv',index_col=0)

In [3]:
df.head()

Unnamed: 0,name,link,price,description,date_collected,image_link,in_stock,details
0,"Sangria ""Limonada"" NV Palacio de Canedo",https://www.kingscountywines.com/sangria-limon...,14.99,Cinnamon & lemons & oranges & mencia. A natura...,2020-02-05 14:35:30.639523,https://cdn.shoplightspeed.com/shops/603808/fi...,In stock,
1,Rosso NV Paolo Palumbo,https://www.kingscountywines.com/rosso-2016-pa...,12.99,Soft and supple. The OG Neapolitan pizza wine.,2020-02-05 14:35:30.842275,https://cdn.shoplightspeed.com/shops/603808/fi...,In stock,
2,Anjou Blanc 2018 Domaine de Clayou,https://www.kingscountywines.com/anjou-blanc-2...,14.99,"Dry chenin. Orange flower, wintergreen, yellow...",2020-02-05 14:35:31.111292,https://cdn.shoplightspeed.com/shops/603808/fi...,In stock,
3,"Pinot Noir ""Golden"" 2017 Folktale*",https://www.kingscountywines.com/pinot-noir-20...,16.99,"Estate fruit pinot noir, not too chunky, very ...",2020-02-05 14:35:31.302038,https://cdn.shoplightspeed.com/shops/603808/fi...,In stock,
4,Shiraz HP Hydraulic Press 2016 David Franz,https://www.kingscountywines.com/shiraz-h-p-20...,25.99,"Classic bold flavors; rare delicacy. Meaty, pl...",2020-02-05 14:35:31.522531,https://cdn.shoplightspeed.com/shops/603808/fi...,In stock,


In [53]:
test_record = df.loc[df['name']=='Shiraz HP Hydraulic Press 2016 David Franz']

Unnamed: 0,name,link,price,description,date_collected,image_link,in_stock,details
4,Shiraz HP Hydraulic Press 2016 David Franz,https://www.kingscountywines.com/shiraz-h-p-20...,25.99,"Classic bold flavors; rare delicacy. Meaty, pl...",2020-02-05 14:35:31.522531,https://cdn.shoplightspeed.com/shops/603808/fi...,In stock,


In [129]:
import pandas as pd
from datetime import datetime as dt
import re
import unicodedata
class KCW_product():
    """
    This class represents a record from the KCW csv. It has no additional info, 
    but it includes methods for convenience and functionality.
    """
    name = None
    link = None 
    price = None
    description = None 
    date_collected = None 
    image_link = None 
    in_stock = None
    details = None 
    
    def __init__(self,wine_name):
        KCW_csvPath = "/Users/schlinkertc/code/wine_project/wine/kings_county_wines.csv"
        KCW_df = pd.read_csv(KCW_csvPath,index_col=0)
        KCW_record = (
            KCW_df.loc[KCW_df['name']==wine_name]
        )
        #make sure that we have exactly one wine under that name
        if len(KCW_record)!=1:
            # raise instead of returning None, which would leave a half-built object
            raise ValueError(f'expected exactly 1 record for {wine_name!r}, found {len(KCW_record)}')
        KCW_record = KCW_record.iloc[0] #returns a series
        
        self.name = KCW_record['name']
        self.link = KCW_record['link'] # needed by __repr__
        self.price = KCW_record['price']
        self.description = KCW_record['description']
        # read_csv parses empty cells (and the literal 'null') as NaN, so test with pd.isna
        if pd.isna(KCW_record['description']) and not pd.isna(KCW_record['details']):
            self.description = KCW_record['details']
        self.date_collected = dt.strptime(KCW_record['date_collected'],'%Y-%m-%d %H:%M:%S.%f')
        self.image_link = KCW_record['image_link']
        self.in_stock = KCW_record['in_stock']
        self.details = KCW_record['details']
    def __repr__(self):
        return f"KCW_product: Wine={self.name},URL={self.link}"
    
    def search_terms(self):
        terms = [x.replace('"','') for x in self.name.split(' ')]
        # get rid of numbers
        pattern = re.compile('[0-9]')
        parsed_terms = []
        for term in terms:
            if re.search(pattern,term) is None:
                parsed_terms.append(term)
        
        search_terms_bytes = [unicodedata.normalize('NFKD', x).encode('ascii','ignore') 
                        for x in parsed_terms]
        search_terms = [x.decode() for x in search_terms_bytes]
        
        return search_terms
    
    def vintage(self):
        terms = [x.replace('"','') for x in self.name.split(' ')]
        pattern = re.compile('[0-9]{4}')
        vintage = None
        for term in terms:
            if re.search(pattern,term):
                vintage = term
        return vintage

In [130]:
test_name = test_record.iloc[0]['name']

In [131]:
product = KCW_product(test_name)

In [133]:
product.vintage()

'2016'

An instance of the Wine class will be instantiated from the KCW csv that's scraped from their site. 

Maybe we should have a separate class that gets instantiated when a wine changes, e.g. if it's no longer listed, goes out of stock, or its description changes. Attributes would include: field changed, old value, new value, date changed, etc.
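
A minimal sketch of what that change record could look like, assuming each scrape yields one dict per wine keyed by the CSV column names. `WineChange` and `diff_records` are hypothetical names, and the "new" values below are made up for illustration.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class WineChange:
    """One changed field for one wine between two scrapes (hypothetical class)."""
    wine_name: str
    field_changed: str
    old_value: Any
    new_value: Any
    date_changed: datetime


def _is_missing(value: Any) -> bool:
    # None and float NaN both count as "no value"
    return value is None or (isinstance(value, float) and math.isnan(value))


def diff_records(name: str, old: Dict[str, Any], new: Dict[str, Any],
                 when: datetime) -> List[WineChange]:
    """Compare two scraped records and list the fields that differ."""
    changes = []
    for key in sorted(set(old) | set(new)):
        old_val, new_val = old.get(key), new.get(key)
        # Missing on both sides is not a change (NaN != NaN would say otherwise)
        if _is_missing(old_val) and _is_missing(new_val):
            continue
        if old_val != new_val:
            changes.append(WineChange(name, key, old_val, new_val, when))
    return changes


# Illustrative values only: the "new" record is invented, not scraped.
old = {"price": 25.99, "in_stock": "In stock", "details": float("nan")}
new = {"price": 27.99, "in_stock": "Out of stock", "details": float("nan")}
changes = diff_records("Shiraz HP Hydraulic Press 2016 David Franz",
                       old, new, datetime(2020, 3, 1))
for change in changes:
    print(change.field_changed, change.old_value, "->", change.new_value)
```

A wine that disappears from the site could be handled the same way by diffing against an empty dict, so every field shows up with a new value of `None`.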

Should this represent a SQL table? I should have classes that make use of relationships like foreign keys, e.g. relational classes for grapes, regions, producers, and scraped info (wine-searcher, vivino). But this could be a class for convenience and operational uses.
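
To make the foreign-key idea concrete, here is a rough sketch using the standard-library `sqlite3` module with an in-memory database. The table and column names are hypothetical, and treating "David Franz" as the producer is an assumption read off the wine name, not something the CSV states.

```python
import sqlite3

# Hypothetical schema: wines point at producers and regions via foreign keys.
SCHEMA = """
CREATE TABLE producers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE regions (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT
);
CREATE TABLE wines (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    vintage     INTEGER,
    price       REAL,
    producer_id INTEGER REFERENCES producers(id),
    region_id   INTEGER REFERENCES regions(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Assumption: the trailing words of the name are the producer.
conn.execute("INSERT INTO producers (name) VALUES (?)", ("David Franz",))
producer_id = conn.execute(
    "SELECT id FROM producers WHERE name = ?", ("David Franz",)
).fetchone()[0]

# region_id is left NULL because the region is not known from the CSV.
conn.execute(
    "INSERT INTO wines (name, vintage, price, producer_id) VALUES (?, ?, ?, ?)",
    ("Shiraz HP Hydraulic Press 2016 David Franz", 2016, 25.99, producer_id),
)

# Join back through the foreign key to get the producer name.
row = conn.execute(
    "SELECT w.name, p.name FROM wines w JOIN producers p ON p.id = w.producer_id"
).fetchone()
print(row)
```

The convenience class could then stay as a thin wrapper that reads from or writes to these tables, rather than being the table definition itself.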

In [112]:
test_record.iloc[0]['date_collected']

'2020-02-05 14:35:31.522531'

In [113]:
dt.strptime(test_record.iloc[0]['date_collected'],'%Y-%m-%d %H:%M:%S.%f')

datetime.datetime(2020, 2, 5, 14, 35, 31, 522531)

In [117]:
test_record.iloc[0]

name                     Shiraz HP Hydraulic Press 2016 David Franz
link              https://www.kingscountywines.com/shiraz-h-p-20...
price                                                         25.99
description       Classic bold flavors; rare delicacy. Meaty, pl...
date_collected                           2020-02-05 14:35:31.522531
image_link        https://cdn.shoplightspeed.com/shops/603808/fi...
in_stock                                                   In stock
details                                                         NaN
Name: 4, dtype: object

## I want the following class definitions to encompass all info sources. It's a work in progress...

In [15]:

class Wine():
    name = ''
    price = 0.0
    description = ''
    date_collected = ''
    image_link = ''
    in_stock = True
    
    # remember that the attributes don't just reflect what's in our original CSV...
    
    #attributes derived from KCW csv
    staff_pick = False
    search_terms = []
    vintage = 0
    size = 0.0
    
    # attributes scraped from other sources
    # how should I handle multiple, potentially conflicting sources?
    style = "" # red/white/sparkling etc. 
    grape = "" # references the grape class 
    blend = False # would be a list if applicable
    producer = '' # references winery class
    region = "" # references the region class
    taste = {"tasting_notes":[],
            "food_pairings":[]}
    aging = "" #oak vs steel
    
    links = {"KCW":None,
            "wine-searcher":None,
            "vivino":None,
            "KCW_img":None,
            "wine-searcher_img":None,
            "vivino_img":None}
    
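    def __init__(self,wine_name):
        # placeholder body so the cell parses; the remaining attributes are still to be filled in
        self.name = wine_name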



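One possible answer to the question in the comments above about multiple, potentially conflicting sources: keep every scraped value together with the source it came from, and resolve to a single value only when asked. The sketch below is an assumption, not a settled design. `WineSketch`, `SOURCE_PRIORITY`, and the trust order itself are hypothetical, and the recorded values are made up. It also uses `dataclasses.field(default_factory=...)` because list and dict defaults written directly in a class body are shared by every instance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical trust order: earlier sources win when values disagree.
SOURCE_PRIORITY = ["KCW", "wine-searcher", "vivino"]


@dataclass
class WineSketch:
    name: str
    price: float = 0.0
    # default_factory gives each instance its own list/dict
    search_terms: List[str] = field(default_factory=list)
    # attribute name -> {source name -> value}
    observations: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def record(self, attribute: str, source: str, value: Any) -> None:
        """Store what one source says about one attribute."""
        self.observations.setdefault(attribute, {})[source] = value

    def resolve(self, attribute: str) -> Optional[Any]:
        """Return the value from the most trusted source that has one."""
        by_source = self.observations.get(attribute, {})
        for source in SOURCE_PRIORITY:
            if source in by_source:
                return by_source[source]
        return None

    def has_conflict(self, attribute: str) -> bool:
        """True if the sources disagree about this attribute."""
        return len(set(self.observations.get(attribute, {}).values())) > 1


# Illustrative values only: nothing here was actually scraped.
wine = WineSketch("Shiraz HP Hydraulic Press 2016 David Franz", price=25.99)
wine.record("style", "vivino", "red")
wine.record("style", "wine-searcher", "Red")
print(wine.resolve("style"), wine.has_conflict("style"))

other = WineSketch("Rosso NV Paolo Palumbo")
wine.search_terms.append("Shiraz")
print(other.search_terms)  # still empty: the list is not shared
```

Keeping the raw observations around means a conflict can be flagged for review instead of being silently overwritten by whichever scraper ran last.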
In [12]:
class Grape():
    color = ""
    soil = ""
    characteristics = []
    climate = ""
    
    # should associated regions/wines come from our data or another source like wikipedia?
    notable_regions = []
    notable_wines = []
    
    

In [14]:
class Region():
    name = ''
    country = ''
    zone = ''
    region = ''
    
    notable_wines = []

In [16]:
class Winery():
    name = ''
    location = ''
    