# Executive Summary
### Run the last two cells to see the recommender in action
The goal of this project was to build a recommender system. The Scotch whisky dataset was obtained from kaggle.com. It originally consisted of the name of each Scotch, its category, review points, price, currency, and a description. With some cleaning, the dataset as given could already drive a recommender system, but feature engineering and extraction make it more robust.

The first step was cleaning the data: checking for duplicate rows and null values, and dropping those rows from the dataframe. A more interesting problem was making sure that there are no duplicate names in the name column, since a duplicate name causes an error in the recommender system.
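As a rough illustration of the duplicate-name cleanup, the sketch below uses `drop_duplicates` with `subset` on a toy dataframe. The names and prices are made up, and this is a more direct alternative to the NaN-based approach used in the cells further down, not the notebook's own code.

```python
import pandas as pd

# Toy data; the names and prices are invented for illustration only.
toy = pd.DataFrame({
    "name": ["Scotch A, 40%", "Scotch B, 43%", "Scotch A, 40%"],
    "price": [50.0, 60.0, 55.0],
})

# Keep the first row for each name and drop later repeats,
# then rebuild a gap-free 0..n-1 index.
unique_names = toy.drop_duplicates(subset="name", keep="first").reset_index(drop=True)

print(unique_names)
```

Resetting the index with `drop=True` keeps the row labels contiguous, which matters later when dataframes are combined by index.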

The next step was slicing the alcohol percentage from the name column. A function was written to extract the alcohol percentage from the end of each name and add it to a column called 'alc'. If a name did not contain an alcohol percentage, the function added a NaN value instead. The rows that contained NaN values were then dropped.
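The extraction idea can also be sketched with a regular expression instead of fixed-width slicing. This is a swapped-in technique under the assumption that the percentage, when present, is the last thing in the name; `ABV_PATTERN` and `extract_abv` are hypothetical names, and edge cases may behave differently from the slicing function used in the notebook.

```python
import re

# Matches a trailing percentage such as "40%" or "40.5%" at the very end of a name.
ABV_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*$")

def extract_abv(name):
    """Return the trailing alcohol percentage as a float, or NaN if there is none."""
    match = ABV_PATTERN.search(name)
    return float(match.group(1)) if match else float("nan")

print(extract_abv("Johnnie Walker Blue Label, 40%"))
print(extract_abv("Black Bowmore, 1964 vintage, 42 year old, 40.5%"))
print(extract_abv("Ardmore Traditional Cask 1998"))
```

On a dataframe, something like `df['name'].apply(extract_abv)` would produce the whole column in one step.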

The next step was engineering some features with the help of scikit-learn. First, the category column was transformed with LabelEncoder, which assigns a number to each category; the numerical categories were stored in the column 'cat'. Next, the words in the description column were turned into usable numerical data. Using CountVectorizer, the descriptions were converted into word-count vectors. English stop words were removed, so words like "and", "or", and "the" are ignored. Unigrams (one word) and bigrams (two words) are both taken into account. These word vectors were converted into an array and then into a dataframe. The words dataframe was then concatenated to the original dataframe. The concatenation introduced new null values, so more cleaning was done to remove them.
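To make the unigram and bigram counting concrete, here is a simplified pure-Python sketch of the idea. It is not CountVectorizer's implementation: the tokenizer is a plain whitespace split, `STOP_WORDS` is a tiny stand-in for scikit-learn's much longer English list, and `ngram_counts` is a hypothetical helper.

```python
from collections import Counter

# A tiny stand-in stop-word list, for illustration only.
STOP_WORDS = {"and", "or", "the", "a", "of"}

def ngram_counts(text, n_values=(1, 2)):
    """Count n-grams in text after lowercasing and removing stop words."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    counts = Counter()
    for n in n_values:
        # Slide a window of n tokens across the filtered token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

counts = ngram_counts("The smoke and the honey")
print(dict(counts))
```

Because stop words are removed before the n-grams are formed, "smoke" and "honey" end up adjacent and produce the bigram "smoke honey". Each distinct n-gram becomes one column of the feature matrix, which is why the vocabulary grows so quickly once bigrams are included.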

Next was building a dataframe of all numerical features. Before that, the name column was made the index, to be used later by the recommender system. Any column that did not contain numerical entries was excluded from the features. With an all-numerical dataframe, a similarity measure between all the row vectors can be computed. Cosine similarity shows how similar the Scotches are to each other. It was computed on the whole dataframe, and a new dataframe was built with the names as both the index and the columns, giving a cosine similarity matrix from which recommendations can be made.

The method works on the principle of the cosine of the angle between two vectors. Similar vectors have a small angle between them, close to zero, while dissimilar vectors have a large angle, at most 90 degrees here because all the features are non-negative. The cosine of 0 degrees is one, so similar vectors have a cosine value close to one. The cosine of 90 degrees is zero, so dissimilar vectors have a cosine value close to zero. We can see the five most similar Scotches for a given name using the recommender cell.
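The two extremes described above can be checked on toy vectors. The sketch below computes cosine similarity by hand with the standard library; the `cosine` helper and the vectors are illustrative assumptions, whereas the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Parallel vectors: the angle is 0 degrees, so the cosine is (approximately) 1.
same_direction = cosine([1, 2, 0], [2, 4, 0])

# Vectors with no shared non-zero feature: the angle is 90 degrees, so the cosine is 0.
orthogonal = cosine([1, 0, 0], [0, 3, 0])

print(same_direction, orthogonal)
```

Note that cosine similarity ignores vector length: `[1, 2, 0]` and `[2, 4, 0]` count as identical in direction even though one is twice as long.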







In [1]:
#import libraries
import pandas as pd
import numpy as np


# Initial Cleaning

In [2]:
# read in data
df = pd.read_csv('./scotch_review.csv')

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,name,category,review.point,price,currency,description
0,1,"Johnnie Walker Blue Label, 40%",Blended Scotch Whisky,97,225.0,$,"Magnificently powerful and intense. Caramels, ..."
1,2,"Black Bowmore, 1964 vintage, 42 year old, 40.5%",Single Malt Scotch,97,4500.0,$,What impresses me most is how this whisky evol...
2,3,"Bowmore 46 year old (distilled 1964), 42.9%",Single Malt Scotch,97,13500.0,$,There have been some legendary Bowmores from t...
3,4,"Compass Box The General, 53.4%",Blended Malt Scotch Whisky,96,325.0,$,With a name inspired by a 1926 Buster Keaton m...
4,5,"Chivas Regal Ultis, 40%",Blended Malt Scotch Whisky,96,160.0,$,"Captivating, enticing, and wonderfully charmin..."


In [4]:
#drop column Unnamed: 0
df.drop('Unnamed: 0', axis = 1, inplace =True)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2247 entries, 0 to 2246
Data columns (total 6 columns):
name            2247 non-null object
category        2247 non-null object
review.point    2247 non-null int64
price           2247 non-null object
currency        2247 non-null object
description     2247 non-null object
dtypes: int64(1), object(5)
memory usage: 105.4+ KB


In [6]:
# check for duplicates
df.duplicated().sum()

2

In [7]:
#drop duplicates
df.drop_duplicates(inplace=True)

In [8]:
df.isnull().sum()

name            0
category        0
review.point    0
price           0
currency        0
description     0
dtype: int64

In [9]:
# blank out repeated names (they become NaN and are dropped below);
# unique names are important for the recommender to work properly
df.name = df.name.drop_duplicates()

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2245 entries, 0 to 2246
Data columns (total 6 columns):
name            2223 non-null object
category        2245 non-null object
review.point    2245 non-null int64
price           2245 non-null object
currency        2245 non-null object
description     2245 non-null object
dtypes: int64(1), object(5)
memory usage: 122.8+ KB


In [11]:
#reset index
df.reset_index(inplace = True)

In [12]:
#check for nulls
df.isnull().sum()

index            0
name            22
category         0
review.point     0
price            0
currency         0
description      0
dtype: int64

In [13]:
#drop nulls
df.dropna(inplace=True)

In [14]:
#reset index
df.reset_index(inplace=True)

# Feature Extraction

In [15]:
#test code for integers
df.name[0][-4:-1].strip(' ').strip('%')

'40'

In [22]:
#test code for floats
df.name[1][-6:-1].strip(' ').strip('%')

'40.5'

In [50]:
# append the alcohol content from the end of each name to the list alc;
# if no alcohol content is apparent, append a NaN value
alc = []
def extract(my_df):
    for i in range(len(my_df)):
        # integer percentages such as ' 40%'
        if my_df.name[i][-4:-1].strip(' ').strip('%').isdigit():
            alc.append(float(my_df.name[i][-4:-1].strip(' ').strip('%')))
        else:
            # decimal percentages such as ' 40.5%'
            try:
                alc.append(float(my_df.name[i][-6:-1].strip(' ').strip('%')))
            except ValueError:
                alc.append(np.nan)

extract(df)

In [53]:
# make alc the alcohol content column
df['alc'] = alc

In [56]:
# check for nans
df.isnull().sum()

level_0          0
index            0
name             0
category         0
review.point     0
price            0
currency         0
description      0
alc             44
dtype: int64

In [57]:
#drop nans
df.dropna(inplace=True)

In [None]:
df.isnull().sum()

In [59]:
# drop two columns level_0 and index
df.drop(['level_0','index'],axis=1, inplace=True)

In [61]:
df.head()

Unnamed: 0,name,category,review.point,price,currency,description,alc
0,"Johnnie Walker Blue Label, 40%",Blended Scotch Whisky,97,225.0,$,"Magnificently powerful and intense. Caramels, ...",40.0
1,"Black Bowmore, 1964 vintage, 42 year old, 40.5%",Single Malt Scotch,97,4500.0,$,What impresses me most is how this whisky evol...,40.5
2,"Bowmore 46 year old (distilled 1964), 42.9%",Single Malt Scotch,97,13500.0,$,There have been some legendary Bowmores from t...,42.9
3,"Compass Box The General, 53.4%",Blended Malt Scotch Whisky,96,325.0,$,With a name inspired by a 1926 Buster Keaton m...,53.4
4,"Chivas Regal Ultis, 40%",Blended Malt Scotch Whisky,96,160.0,$,"Captivating, enticing, and wonderfully charmin...",40.0


# Feature Engineering

In [100]:
# import libraries for feature building
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import LabelEncoder

In [101]:
# instantiate LabelEncoder to turn the category column into numerical data
le = LabelEncoder()

In [102]:
#fit transform label encoder
cat = le.fit_transform(df.category)

In [103]:
# make the label encoded category the column cat in the dataframe
df['cat'] = cat

In [104]:
df.head()

Unnamed: 0,name,category,review.point,price,currency,description,alc,cat
0,"Johnnie Walker Blue Label, 40%",Blended Scotch Whisky,97,225.0,$,"Magnificently powerful and intense. Caramels, ...",40.0,1
1,"Black Bowmore, 1964 vintage, 42 year old, 40.5%",Single Malt Scotch,97,4500.0,$,What impresses me most is how this whisky evol...,40.5,4
2,"Bowmore 46 year old (distilled 1964), 42.9%",Single Malt Scotch,97,13500.0,$,There have been some legendary Bowmores from t...,42.9,4
3,"Compass Box The General, 53.4%",Blended Malt Scotch Whisky,96,325.0,$,With a name inspired by a 1926 Buster Keaton m...,53.4,0
4,"Chivas Regal Ultis, 40%",Blended Malt Scotch Whisky,96,160.0,$,"Captivating, enticing, and wonderfully charmin...",40.0,0


In [105]:
# need word vectors from the description column, using English stop words and 1-grams and 2-grams
cv = CountVectorizer(stop_words='english', ngram_range=(1,2))

In [106]:
# fit_transform CountVectorizer on the description column and convert the result into an array
words = cv.fit_transform(df.description).toarray()

In [110]:
# make word vector array into a dataframe, words
# (scikit-learn 1.0+ provides get_feature_names_out(); get_feature_names() was removed in 1.2)
words = pd.DataFrame(words, columns=cv.get_feature_names_out())

In [112]:
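# concatenate dataframe with words dataframe
# NOTE: pd.concat aligns rows on the index. df still carries the gaps left by the
# dropna above, while words is indexed 0..n-1, so rows can be mismatched and NaNs appear
# (see the null check further down). The code is left as run so the outputs still match.
df = pd.concat([df, words], axis=1)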
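A likely explanation for the nulls that appear after this concatenation is index alignment: `pd.concat` with `axis=1` matches rows by index label, not by position. The toy sketch below reproduces the effect; the dataframes `left` and `right` are hypothetical stand-ins for `df` (whose index has gaps after `dropna`) and `words` (which has a fresh 0..n-1 index).

```python
import pandas as pd

left = pd.DataFrame({"name": ["A", "B", "C"]})
left = left.drop(index=1)                   # index is now [0, 2], as after a dropna
right = pd.DataFrame({"count": [10, 30]})   # fresh index [0, 1]

# Rows are matched by label, so label 1 has no name and label 2 has no count.
misaligned = pd.concat([left, right], axis=1)

# Resetting the index first makes the rows line up by position instead.
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned)
print(aligned)
```

In the misaligned result, the count `30` that belongs to "C" ends up on a row with no name. If the same thing happens here, resetting the index of `df` before the concatenation would avoid both the extra nulls and the mismatched rows, at the cost of changing the outputs shown below.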

In [113]:
df.head()

Unnamed: 0,name,category,review.point,price,currency,description,alc,cat,00,000,...,zingy yes,zippy,zippy acidity,zippy clean,zone,zone moderate,ìle,ìle 2016,ìle limited,ìle release
0,"Johnnie Walker Blue Label, 40%",Blended Scotch Whisky,97.0,225.0,$,"Magnificently powerful and intense. Caramels, ...",40.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Black Bowmore, 1964 vintage, 42 year old, 40.5%",Single Malt Scotch,97.0,4500.0,$,What impresses me most is how this whisky evol...,40.5,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bowmore 46 year old (distilled 1964), 42.9%",Single Malt Scotch,97.0,13500.0,$,There have been some legendary Bowmores from t...,42.9,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Compass Box The General, 53.4%",Blended Malt Scotch Whisky,96.0,325.0,$,With a name inspired by a 1926 Buster Keaton m...,53.4,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Chivas Regal Ultis, 40%",Blended Malt Scotch Whisky,96.0,160.0,$,"Captivating, enticing, and wonderfully charmin...",40.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [114]:
# make the name column the index of the dataframe
df.index = df.name

In [115]:
df.head()

name (index),name,category,review.point,price,currency,description,alc,cat,00,000,...,zingy yes,zippy,zippy acidity,zippy clean,zone,zone moderate,ìle,ìle 2016,ìle limited,ìle release
"Johnnie Walker Blue Label, 40%","Johnnie Walker Blue Label, 40%",Blended Scotch Whisky,97.0,225.0,$,"Magnificently powerful and intense. Caramels, ...",40.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Black Bowmore, 1964 vintage, 42 year old, 40.5%","Black Bowmore, 1964 vintage, 42 year old, 40.5%",Single Malt Scotch,97.0,4500.0,$,What impresses me most is how this whisky evol...,40.5,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Bowmore 46 year old (distilled 1964), 42.9%","Bowmore 46 year old (distilled 1964), 42.9%",Single Malt Scotch,97.0,13500.0,$,There have been some legendary Bowmores from t...,42.9,4.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Compass Box The General, 53.4%","Compass Box The General, 53.4%",Blended Malt Scotch Whisky,96.0,325.0,$,With a name inspired by a 1926 Buster Keaton m...,53.4,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Chivas Regal Ultis, 40%","Chivas Regal Ultis, 40%",Blended Malt Scotch Whisky,96.0,160.0,$,"Captivating, enticing, and wonderfully charmin...",40.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [128]:
# get columns with numerical data only
features = [col for col in df.columns if col not in ['name','category','currency', 'description']]

In [130]:
# make a dataframe of only numerical features
df = df[features]

In [131]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2223 entries, Johnnie Walker Blue Label, 40% to Distillery Select 'Inchmoan' (distilled at Loch Lomond), Cask #151, 13 year old, 1992 vintage, 45%
Columns: 69415 entries, review.point to ìle release
dtypes: float64(69413), object(2)
memory usage: 1.1+ GB


In [133]:
# get rid of any non-numerical features
df = df.select_dtypes(exclude=object)

In [134]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2223 entries, Johnnie Walker Blue Label, 40% to Distillery Select 'Inchmoan' (distilled at Loch Lomond), Cask #151, 13 year old, 1992 vintage, 45%
Columns: 69413 entries, review.point to ìle release
dtypes: float64(69413)
memory usage: 1.1+ GB


In [139]:
#check for nulls
df.isnull().sum()

review.point           44
price                  44
alc                    44
cat                    44
00                     44
000                    44
000 bottle             44
000 bottled            44
000 bottles            44
000 cases              44
000 extravagantly      44
000 individually       44
000 liters             44
000 special            44
000 strong             44
000 think              44
000 whisky             44
002                    44
002 lemonade           44
011                    44
011 500                44
060                    44
060 bottles            44
076                    44
076 bottles            44
08                     44
08 masterclass         44
080                    44
080 bottles            44
090                    44
                       ..
zing ginger            44
zing maturity          44
zing notes             44
zing palate            44
zing showing           44
zing truly             44
zing whisky            44
zinginess   

In [140]:
#drop null values
df.dropna(inplace=True)

In [141]:
df.isnull().sum()

review.point           0
price                  0
alc                    0
cat                    0
00                     0
000                    0
000 bottle             0
000 bottled            0
000 bottles            0
000 cases              0
000 extravagantly      0
000 individually       0
000 liters             0
000 special            0
000 strong             0
000 think              0
000 whisky             0
002                    0
002 lemonade           0
011                    0
011 500                0
060                    0
060 bottles            0
076                    0
076 bottles            0
08                     0
08 masterclass         0
080                    0
080 bottles            0
090                    0
                      ..
zing ginger            0
zing maturity          0
zing notes             0
zing palate            0
zing showing           0
zing truly             0
zing whisky            0
zinginess              0
zinginess continues    0


In [147]:
df.head()

Unnamed: 0,review.point,price,alc,cat,00,000,000 bottle,000 bottled,000 bottles,000 cases,...,zingy yes,zippy,zippy acidity,zippy clean,zone,zone moderate,ìle,ìle 2016,ìle limited,ìle release
"Johnnie Walker Blue Label, 40%",97.0,0.0,40.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Black Bowmore, 1964 vintage, 42 year old, 40.5%",97.0,0.0,40.5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Bowmore 46 year old (distilled 1964), 42.9%",97.0,0.0,42.9,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Compass Box The General, 53.4%",96.0,0.0,53.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Chivas Regal Ultis, 40%",96.0,0.0,40.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [143]:
# save this dataframe as .csv file
df.to_csv('./df.csv', index_label=False)

In [145]:
# read in .csv as dataframe
df = pd.read_csv('./df.csv')

In [146]:
df.head()

Unnamed: 0,review.point,price,alc,cat,00,000,000 bottle,000 bottled,000 bottles,000 cases,...,zingy yes,zippy,zippy acidity,zippy clean,zone,zone moderate,ìle,ìle 2016,ìle limited,ìle release
"Johnnie Walker Blue Label, 40%",97.0,0.0,40.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Black Bowmore, 1964 vintage, 42 year old, 40.5%",97.0,0.0,40.5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Bowmore 46 year old (distilled 1964), 42.9%",97.0,0.0,42.9,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Compass Box The General, 53.4%",96.0,0.0,53.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Chivas Regal Ultis, 40%",96.0,0.0,40.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Distance Metric - Cosine

In [149]:
#run cosine_similarity on dataframe
cos = cosine_similarity(df)

In [151]:
# create dataframe of cosine similarity values where both the index and the columns are the names
recs = pd.DataFrame(cos, index = df.index, columns = df.index)

In [153]:
recs.head()

Unnamed: 0,"Johnnie Walker Blue Label, 40%","Black Bowmore, 1964 vintage, 42 year old, 40.5%","Bowmore 46 year old (distilled 1964), 42.9%","Compass Box The General, 53.4%","Chivas Regal Ultis, 40%","Ardbeg Corryvreckan, 57.1%","Gold Bowmore, 1964 vintage, 42.4%","Bowmore, 40 year old, 44.8%","The Dalmore, 50 year old, 52.8%","Glenfarclas Family Casks 1954 Cask #1260, 47.2%",...,"Islay Mist, 8 year old, 40%","The Singleton of Dufftown 28 year old, 52.3%","Clan Denny (distilled at Girvan) 1992 21 year old HH9451, 59.6%","Douglas Laing Single Minded (distilled at Jura) 8 year old, 41.5%","Single Malts of Scotland (distilled at Craigellachie) 1996, 52.7%","High Commissioner, 40%","The Arran Malt, 43%","Bowmore, 16 year old, 1990 vintage, 53.8%","Bruichladdich 'Waves', 46%","Inchmurrin 15 year old, 46%"
"Johnnie Walker Blue Label, 40%",1.0,0.99017,0.991265,0.984664,0.990698,0.98129,0.985275,0.989971,0.982504,0.988512,...,0.988273,0.970474,0.95449,0.985258,0.972248,0.987454,0.984687,0.968607,0.980744,0.980909
"Black Bowmore, 1964 vintage, 42 year old, 40.5%",0.99017,1.0,0.991693,0.984868,0.990051,0.983391,0.987846,0.991289,0.985258,0.989199,...,0.989604,0.972514,0.956911,0.987694,0.974304,0.98804,0.986691,0.971231,0.98222,0.983631
"Bowmore 46 year old (distilled 1964), 42.9%",0.991265,0.991693,1.0,0.988023,0.991137,0.986286,0.987828,0.992785,0.986493,0.991293,...,0.990743,0.977084,0.962042,0.989244,0.978558,0.990453,0.989288,0.975499,0.986046,0.986205
"Compass Box The General, 53.4%",0.984664,0.984868,0.988023,1.0,0.985396,0.992465,0.982754,0.989678,0.990245,0.990552,...,0.99172,0.987148,0.977797,0.990216,0.988772,0.991015,0.991874,0.987393,0.991492,0.991133
"Chivas Regal Ultis, 40%",0.990698,0.990051,0.991137,0.985396,1.0,0.982043,0.985321,0.989912,0.982702,0.988357,...,0.988654,0.970978,0.95523,0.985376,0.972369,0.988046,0.984874,0.969178,0.981251,0.98079


In [209]:
# save cosine similarity matrix as .csv
recs.to_csv('./recs.csv', index_label=False)

In [47]:
# run this cell for the recommender
# read in cosine similarity matrix from .csv
import pandas as pd
recs=pd.read_csv('./recs.csv')
recs.head()

Unnamed: 0,"Johnnie Walker Blue Label, 40%","Black Bowmore, 1964 vintage, 42 year old, 40.5%","Bowmore 46 year old (distilled 1964), 42.9%","Compass Box The General, 53.4%","Chivas Regal Ultis, 40%","Ardbeg Corryvreckan, 57.1%","Gold Bowmore, 1964 vintage, 42.4%","Bowmore, 40 year old, 44.8%","The Dalmore, 50 year old, 52.8%","Glenfarclas Family Casks 1954 Cask #1260, 47.2%",...,"Islay Mist, 8 year old, 40%","The Singleton of Dufftown 28 year old, 52.3%","Clan Denny (distilled at Girvan) 1992 21 year old HH9451, 59.6%","Douglas Laing Single Minded (distilled at Jura) 8 year old, 41.5%","Single Malts of Scotland (distilled at Craigellachie) 1996, 52.7%","High Commissioner, 40%","The Arran Malt, 43%","Bowmore, 16 year old, 1990 vintage, 53.8%","Bruichladdich 'Waves', 46%","Inchmurrin 15 year old, 46%"
"Johnnie Walker Blue Label, 40%",1.0,0.99017,0.991265,0.984664,0.990698,0.98129,0.985275,0.989971,0.982504,0.988512,...,0.988273,0.970474,0.95449,0.985258,0.972248,0.987454,0.984687,0.968607,0.980744,0.980909
"Black Bowmore, 1964 vintage, 42 year old, 40.5%",0.99017,1.0,0.991693,0.984868,0.990051,0.983391,0.987846,0.991289,0.985258,0.989199,...,0.989604,0.972514,0.956911,0.987694,0.974304,0.98804,0.986691,0.971231,0.98222,0.983631
"Bowmore 46 year old (distilled 1964), 42.9%",0.991265,0.991693,1.0,0.988023,0.991137,0.986286,0.987828,0.992785,0.986493,0.991293,...,0.990743,0.977084,0.962042,0.989244,0.978558,0.990453,0.989288,0.975499,0.986046,0.986205
"Compass Box The General, 53.4%",0.984664,0.984868,0.988023,1.0,0.985396,0.992465,0.982754,0.989678,0.990245,0.990552,...,0.99172,0.987148,0.977797,0.990216,0.988772,0.991015,0.991874,0.987393,0.991492,0.991133
"Chivas Regal Ultis, 40%",0.990698,0.990051,0.991137,0.985396,1.0,0.982043,0.985321,0.989912,0.982702,0.988357,...,0.988654,0.970978,0.95523,0.985376,0.972369,0.988046,0.984874,0.969178,0.981251,0.98079


In [49]:
# run this cell and input the name of a Scotch to see the top five similar Scotches, based on cosine similarity
search = input()
# regex=False treats the input as plain text, so characters such as '(' are matched literally
matches = recs[recs.columns.str.contains(search, regex=False)].index
if len(matches) == 0:
    print("Not in Directory")
else:
    for sco in matches:
        print(sco)
        print("")
        print("Similar Scotches")
        # position 0 is normally the Scotch itself (similarity 1.0), so take positions 1 to 5
        print(recs[sco].sort_values(ascending=False)[1:6])
        print("")
        print("")

Ardmore Traditional
Ardmore Traditional Cask 1998

Similar Scotches
Gordon & MacPhail (distilled at Glenlossie), 27 year old, 1978 vintage, cask #1815    0.997463
Balvenie 1973 Vintage, 30 year old, Cask #9219                                        0.950143
Adelphi (distilled at Glenrothes) 7 year old, 67.4%                                   0.887637
Caol Ila Unpeated 12 year old Special Release 2011, 64%                               0.877481
Wemyss Malts Fruit Bonbons (distilled at Glen Garioch) 1989, 66%                      0.873136
Name: Ardmore Traditional Cask 1998, dtype: float64


Ardmore Traditional Cask, 46%

Similar Scotches
Arran Single Island Malt, Non-chill-filtered, 46%                             0.993025
Douglas Laing Old Particular (distilled at Blair Athol) 20 year old, 51.5%    0.992914
Bruichladdich 3D, The Big Peat, 50%                                           0.992875
Douglas Laing Provenance (distilled at Caol Ila) 6 year old, 46%              0.992809
Lagavul