# Objective
The goal is to build a demonstrable prototype that mines purchase data and predicts the categories most similar to a given input.
For clarity, we divide the overall problem into two sub-problems.
*  Problem 1 - Given item A, predict the item B most strongly associated with it through purchase patterns.
*  Problem 2 - Given item A, predict the category most similar to the item's category.


# Packages required
* pandas for data frames
* NumPy for arrays
* SciPy for the sparse Jaccardian computation

In [2]:
import pandas as pd
import numpy as np
from scipy import sparse

# Problem 1

# Import Data

In [3]:
#data = pd.read_csv('../data/FMCGSales.csv', names = ['BillId','ItemId','ItemName','Level1','Level2','Level3','Level4','Level5','Level6'] )


In [4]:
data = pd.read_csv('../data/1LakhFMCGSalesWithCategory.csv', names = ['BillId','ItemId','ItemName','Level1','Level2','Level3','Level4','Level5','Level6'] )


In [5]:
# Constant marker column; the pivot table aggregates it to flag which categories appear on each bill
data['dummy'] = 1

In [6]:
data.head()

Unnamed: 0,BillId,ItemId,ItemName,Level1,Level2,Level3,Level4,Level5,Level6,dummy
0,121210,52344,GULABARI ROSE GLOW CLEANSER 100ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
1,121270,59909,J&J BABY WIPES 80PCS,FMCG,FMCG NON FOOD,PERSONAL CARE,BABY CARE,HYGIENE,WIPES,1
2,121321,103829,SAT ISABGOL 100GM,FMCG,AYUSH,AYURVEDIC,POWDER,CHURAN,,1
3,121360,30225,COLGATE GEL MAXFRESH RED 150GM,FMCG,FMCG NON FOOD,PERSONAL CARE,ORAL CARE,TOOTHPASTE,GEL,1
4,121788,91629,PATANJALI DANT KANTI MEDI ORAL GEL 100GM,FMCG,FMCG NON FOOD,PERSONAL CARE,ORAL CARE,TOOTHPASTE,GEL,1


# Data Exploration

In [7]:
data.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
BillId,100000.0,11117620.0,7275864.0,129.0,1533840.0,16334841.5,16439740.75,16544632.0
ItemId,100000.0,65121.84,33269.99,9254.0,33533.0,59909.0,91113.0,127464.0
dummy,100000.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0


In [8]:
# Does any cell contain a null value? Also report the shape.
data.isnull().any().any(), data.shape

(True, (100000, 10))

In [9]:
# Count null values per column
data.isnull().sum(axis=0)

BillId          0
ItemId          0
ItemName        0
Level1          0
Level2          0
Level3          0
Level4          0
Level5         12
Level6      10241
dummy           0
dtype: int64

In [10]:
data = data.dropna()

In [11]:
# Re-check for nulls after dropping; the shape shows how many rows remain
data.isnull().any().any(), data.shape

(False, (89759, 10))

Level5 has 12 null values and Level6 has 10,241.
data.dropna() removes every row with a null in any column, which leaves 89,759 of the 100,000 rows. It is not clear that filling the nulls would help, because a missing category could take many different values. Clustering could perhaps be used to label these products first.
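Because the pivot table further down only uses Level5, one option is to drop rows only where that column is missing, which would keep the rows whose sole gap is Level6. A minimal sketch on a toy frame (the rows below are made up for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the sales file; values are illustrative only.
toy = pd.DataFrame({
    'BillId': [1, 1, 2, 3],
    'Level5': ['SOAP', 'TOOTHPASTE', np.nan, 'CHURAN'],
    'Level6': ['BATHING', 'GEL', 'WIPES', np.nan],
})

# Dropping on every column removes any row with a gap anywhere.
dropped_all = toy.dropna()

# Dropping only on the column the analysis uses keeps the CHURAN row.
dropped_level5 = toy.dropna(subset=['Level5'])

print(len(dropped_all), len(dropped_level5))
```

Whether the extra rows are worth keeping depends on whether Level6 is needed later, for example in Problem 2.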

In [12]:
# Inspect every row of an arbitrary bill (BillId 1238)
data.loc[data['BillId'] == 1238]

Unnamed: 0,BillId,ItemId,ItemName,Level1,Level2,Level3,Level4,Level5,Level6,dummy
56,1238,101287,ROGAN BADAM SIRIN 25ML,FMCG,FMCG FOOD,GROCERY,COOKING MEDIUM,OIL,ALMOND OIL,1
119,1238,52331,GULABARI 250ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,LOTION,BEAUTY & NOURISHMENT,1
200,1238,51605,GOOD KNIGHT ADVANCE REFILL 45ML,FMCG,FMCG NON FOOD,HOME CARE,PEST CONTROL,LIQUID,MOSQUITO REPELLENT,1
234,1238,54581,HIMALAYA FACE WASH PURIFYING NEEM 50ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,NEEM,1
313,1238,89822,PAMPERS BD S 22PCS,FMCG,FMCG NON FOOD,PERSONAL CARE,BABY CARE,HYGIENE,DIAPERS S,1


In [18]:
data.loc[data['Level5'] == 'FACE WASH']

Unnamed: 0,BillId,ItemId,ItemName,Level1,Level2,Level3,Level4,Level5,Level6,dummy
0,121210,52344,GULABARI ROSE GLOW CLEANSER 100ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
8,122996,52344,GULABARI ROSE GLOW CLEANSER 100ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
20,121360,54573,HIMALAYA FACE WASH OIL CLEAR LEMON FOAMING 150ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
25,122996,48324,GARNIER FACE WASH MEN OIL CLEAR 100GM,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
28,12247,48337,GARNIER FACE WASH OC MATCHA D TOX GEL 50GM,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
40,123103,52344,GULABARI ROSE GLOW CLEANSER 100ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
43,121109,23475,CLEAN & CLEAR FW MORNING LEMON 100ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,CLEANSING,1
188,122680,48357,GARNIER WHITE COMPLETE FACE WASH 100GM,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,FAIRNESS,1
197,122277,54521,HIMALAYA FACE WASH CLEAR COMPLEX WHITE 100ML,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,FAIRNESS,1
199,123140,38404,DS GLOW FW 100GM,FMCG,FMCG NON FOOD,PERSONAL CARE,SKIN CARE,FACE WASH,GLOW,1


In [13]:
data.loc[data['Level5'] == 'RTC']

Unnamed: 0,BillId,ItemId,ItemName,Level1,Level2,Level3,Level4,Level5,Level6,dummy
41843,16304795,10377,ACT II CLASSIC SALTED 40GM 1X40GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
45816,16322518,10387,ACT II GOLDEN SIZZLE 35GM 10/-,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
49437,16343795,10377,ACT II CLASSIC SALTED 40GM 1X40GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
53145,16356480,10390,ACT II MAGIC BUTTER 30 GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
53218,16353107,10377,ACT II CLASSIC SALTED 40GM 1X40GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
54123,16354792,10390,ACT II MAGIC BUTTER 30 GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
55328,16366466,10377,ACT II CLASSIC SALTED 40GM 1X40GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
57736,16369702,10377,ACT II CLASSIC SALTED 40GM 1X40GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
61695,16387442,10377,ACT II CLASSIC SALTED 40GM 1X40GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1
63412,16395194,10380,ACT II DIET POP CORN 0.6 LESS SALT 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,SAVOURIES,RTC,POP CORN,1


In [13]:
data[data['ItemName'].str.contains("MAGGI")].sort_values(by='ItemName')

Unnamed: 0,BillId,ItemId,ItemName,Level1,Level2,Level3,Level4,Level5,Level6,dummy
67630,16415038,71542,MAGGI NOODLES 100GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
25232,1539832,71543,MAGGI NOODLES 140GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
69502,16422347,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
78606,16454859,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
78436,16457893,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
78434,16457794,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
77908,16457149,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
77896,16456602,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
77894,16456432,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1
77513,16454033,71553,MAGGI NOODLES 70GM,FMCG,FMCG FOOD,PROCESSED FOOD,NOODLES & PASTA,NOODLES,OTHERS,1


In [14]:
listed = data['Level5'].unique()  


In [15]:
listed_df = pd.DataFrame(listed)

In [16]:
listed_df.sort_values(by = 0).to_csv('../data/listof_level5_values.csv')

# Pivot Table

In [17]:
matrix = data.pivot_table(values='dummy',index ='BillId', columns ='Level5')
matrix.head()

BillId,ACCESSORIES,AEROSOL,BALM,BAR,BATH,BODY WASH,BOURBON,BREAKFAST SPREAD,BUTTER,CAKE,...,TAPE,TOILET TISSUE,TOOTH BRUSH,TOOTH POWDER,TOOTHPASTE,WAFERS,WATER,WAX,WHOLE SPICE,WIPES
129,,,,,,,,,,,...,,,,,,,,,,
177,,,,,,,,1.0,,,...,,,,,,,,,,
179,,,,,,,,,,,...,,,,,,,,,,
1214,,,1.0,,,,,,,,...,,,,,,,,,,
1216,,,,,,,,,,,...,,,,,,,,,,
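To make the pivot step concrete, here is a small sketch of how a constant dummy column turns bill lines into a bill-by-category indicator matrix. The toy rows are invented. `aggfunc='max'` and `fill_value=0` are choices made for this sketch, so that a category bought twice on one bill still yields 1 and absent categories show 0 directly; the notebook itself relies on the default mean followed by fillna(0), which gives the same values when dummy is always 1.

```python
import pandas as pd

# Hypothetical bill lines: bill 10 contains SOAP twice.
toy = pd.DataFrame({
    'BillId': [10, 10, 10, 11, 12, 12],
    'Level5': ['SOAP', 'SOAP', 'TOOTHPASTE', 'SOAP', 'TOOTH BRUSH', 'TOOTHPASTE'],
})
toy['dummy'] = 1

# One row per bill, one column per category, 1 where the category was bought.
basket = toy.pivot_table(values='dummy', index='BillId', columns='Level5',
                         aggfunc='max', fill_value=0)
print(basket)
```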


In [18]:
#matrix_dummy[['TOOTH BRUSH','TOOTH PASTE']].to_csv('../data/toothlistforcheck.csv')

In [19]:
matrix.shape

(65437, 105)

In [20]:
matrix.to_csv('../data/matrixpivottable.csv')

In [21]:
matrix_dummy = matrix.copy().fillna(0)

In [22]:
matrix_dummy.head()

BillId,ACCESSORIES,AEROSOL,BALM,BAR,BATH,BODY WASH,BOURBON,BREAKFAST SPREAD,BUTTER,CAKE,...,TAPE,TOILET TISSUE,TOOTH BRUSH,TOOTH POWDER,TOOTHPASTE,WAFERS,WATER,WAX,WHOLE SPICE,WIPES
129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
177,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
179,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1214,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Skewed Jaccardian
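We define the Jaccardian (Jaccard index) of two sets A and B as the size of their intersection over the size of their union, |A ∩ B| / |A ∪ B|. The skewed Jaccardian divides the intersection by the size of set A alone, |A ∩ B| / |A|, so it is asymmetric: the score for (A, B) generally differs from the score for (B, A).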
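As a small worked illustration of the two definitions, consider the sets of bills on which two categories appear. The bill numbers below are made up.

```python
# Bills containing each category (hypothetical bill ids).
bills_a = {1, 2, 3, 4}      # bills with category A
bills_b = {3, 4, 5}         # bills with category B

intersection = bills_a & bills_b
union = bills_a | bills_b

# Symmetric: the same value for (A, B) and (B, A).
jaccard = len(intersection) / len(union)
# Share of A's bills that also contain B.
skewed_a_b = len(intersection) / len(bills_a)
# Share of B's bills that also contain A.
skewed_b_a = len(intersection) / len(bills_b)

print(jaccard, skewed_a_b, skewed_b_a)
```

The skewed score depends on which set is in the denominator, which is why the results further down differ between TOOTH BRUSH to TOOTHPASTE (0.365524) and TOOTHPASTE to TOOTH BRUSH (0.228571).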

### method from http://na-o-ys.github.io/others/2015-11-07-sparse-vector-similarities.html

In [23]:
# Check the current type; the matrix must be converted with scipy.sparse.csc_matrix before use
type(matrix_dummy)

pandas.core.frame.DataFrame

# Experiment

In [24]:
sparse_matrix = sparse.csc_matrix(matrix_dummy)

In [25]:
type(sparse_matrix)

scipy.sparse.csc.csc_matrix

In [26]:
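def jaccard_similarities(mat):
    """Skewed Jaccardian between columns: |A & B| / |A| (not the symmetric index)."""
    # number of bills containing each category (non-zero entries per column)
    cols_sum = mat.getnnz(axis=0)
    # co-occurrence counts: number of bills containing both categories
    ab = mat.T * mat

    # size of set A, repeated once per stored entry of ab
    aa = np.repeat(cols_sum, ab.getnnz(axis=0))
    # size of set B per stored entry; only needed for the symmetric index, unused here
    bb = cols_sum[ab.indices]

    similarities = ab.copy()
    # divide each intersection count by |A| only
    similarities.data /= (aa)

    return similarities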
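The function above computes bb but never uses it. In the symmetric Jaccard index the union is |A| + |B| - |A ∩ B|, which is where bb would come in. A sketch of that variant follows, under the assumption that the input is a 0/1 bill-by-category matrix; the name `symmetric_jaccard` and the toy matrix are illustrative only.

```python
import numpy as np
from scipy import sparse

def symmetric_jaccard(mat):
    """Column-by-column Jaccard index |A & B| / |A | B| for a 0/1 matrix."""
    mat = sparse.csc_matrix(mat, dtype=np.float64)
    # number of rows (bills) in which each column (category) is non-zero
    cols_sum = mat.getnnz(axis=0)
    # intersection counts, stored row by row
    ab = (mat.T @ mat).tocsr()

    # |A| for each stored entry: the row's count repeated per entry in that row
    aa = np.repeat(cols_sum, ab.getnnz(axis=1))
    # |B| for each stored entry: looked up through the column index
    bb = cols_sum[ab.indices]

    sim = ab.copy()
    # union = |A| + |B| - |A & B|
    sim.data = ab.data / (aa + bb - ab.data)
    return sim

# Four toy bills, three toy categories.
toy = np.array([[1, 1, 0],
                [1, 0, 0],
                [1, 1, 1],
                [0, 0, 1]])
result = symmetric_jaccard(toy).toarray()
print(result)
```

Unlike the skewed version, this result is symmetric, so it cannot express that one category implies another more strongly than the reverse.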

In [27]:
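# Note: this rebinds the name jaccard_similarities from the function to its result,
# so the function cell must be run again before this cell can be re-run.
jaccard_similarities =  jaccard_similarities(sparse_matrix)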

In [28]:
jaccard_similarities

<105x105 sparse matrix of type '<class 'numpy.float64'>'
	with 4267 stored elements in Compressed Sparse Row format>

In [29]:
print(jaccard_similarities)

  (0, 96)	0.00065359477124183
  (0, 65)	0.00065359477124183
  (0, 13)	0.00065359477124183
  (0, 39)	0.00065359477124183
  (0, 90)	0.00065359477124183
  (0, 38)	0.00065359477124183
  (0, 67)	0.00065359477124183
  (0, 92)	0.00196078431372549
  (0, 33)	0.00065359477124183
  (0, 36)	0.00065359477124183
  (0, 27)	0.00065359477124183
  (0, 11)	0.00196078431372549
  (0, 53)	0.00130718954248366
  (0, 64)	0.0032679738562091504
  (0, 48)	0.004575163398692811
  (0, 6)	0.00392156862745098
  (0, 7)	0.0032679738562091504
  (0, 59)	0.004575163398692811
  (0, 71)	0.00718954248366013
  (0, 15)	0.00392156862745098
  (0, 8)	0.00196078431372549
  (0, 26)	0.02287581699346405
  (0, 17)	0.00130718954248366
  (0, 29)	0.01437908496732026
  (0, 81)	0.00130718954248366
  :	:
  (101, 12)	0.0005192107995846313
  (101, 8)	0.010903426791277258
  (101, 16)	0.036344755970924195
  (101, 11)	0.014537902388369679
  (101, 69)	0.0036344755970924196
  (101, 59)	0.0005192107995846313
  (101, 95)	0.0005192107995846313
  (101,

In [31]:
jaccardian = pd.DataFrame(jaccard_similarities.toarray(), index = matrix.columns,columns = matrix.columns)

In [32]:
jaccardian.head()

Level5,ACCESSORIES,AEROSOL,BALM,BAR,BATH,BODY WASH,BOURBON,BREAKFAST SPREAD,BUTTER,CAKE,...,TAPE,TOILET TISSUE,TOOTH BRUSH,TOOTH POWDER,TOOTHPASTE,WAFERS,WATER,WAX,WHOLE SPICE,WIPES
ACCESSORIES,1.0,0.0,0.008497,0.001307,0.0,0.004575,0.003922,0.003268,0.001961,0.016993,...,0.000654,0.000654,0.029412,0.0,0.031373,0.0,0.003268,0.0,0.0,0.0
AEROSOL,0.0,1.0,0.0,0.0,0.0,0.0,0.011299,0.00565,0.0,0.00565,...,0.0,0.0,0.011299,0.0,0.011299,0.0,0.0,0.0,0.0,0.0
BALM,0.009091,0.0,1.0,0.000699,0.0,0.001399,0.000699,0.006294,0.002797,0.009091,...,0.001399,0.0,0.00979,0.0,0.026573,0.0,0.002098,0.0,0.0,0.0
BAR,0.013793,0.0,0.006897,1.0,0.0,0.0,0.02069,0.0,0.013793,0.013793,...,0.0,0.0,0.082759,0.006897,0.17931,0.0,0.0,0.0,0.0,0.0
BATH,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [33]:
jaccardian.shape

(105, 105)

In [34]:
type(jaccardian)

pandas.core.frame.DataFrame

In [35]:
#df = pd.DataFrame(np.triu(jaccardian, 1), columns=jaccardian.columns, index=jaccardian.index)

In [36]:
#type(df), df.shape

(pandas.core.frame.DataFrame, (105, 105))

In [68]:
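# Keep scores above 0.10, excluding the diagonal of ones; stack() turns them into (row, column) pairs
filtered = jaccardian[(jaccardian < 1) & (jaccardian > 0.10)].stack()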

In [69]:
filtered

Level5              Level5             
BAR                 OIL                    0.137931
                    SOAP                   0.386207
                    TOOTHPASTE             0.179310
BODY WASH           ACCESSORIES            0.112903
                    HYGIENE                0.241935
                    LOTION                 0.112903
                    OIL                    0.193548
                    SHAMPOO                0.161290
                    SOAP                   0.112903
BOURBON             CAKE                   0.104024
                    CHOCOLATE              0.123651
                    COOKIES                0.140334
BUTTER              CRACKER                0.103851
CAKE                COOKIES                0.118515
CANDY               CHOCOLATE              0.242248
CASHEW              CAKE                   0.101010
                    COOKIES                0.146465
                    CRACKER                0.151515
                    CREA
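For reading the strongest pairs at a glance, the stacked result can be flattened into a ranked table. This sketch uses a small made-up similarity frame; the axis names `antecedent` and `consequent` are labels chosen here, not names from the notebook.

```python
import pandas as pd

cats = ['BAR', 'SOAP', 'OIL']
# Illustrative scores only; rows are set A, columns are set B.
scores = pd.DataFrame(
    [[1.00, 0.39, 0.14],
     [0.02, 1.00, 0.11],
     [0.01, 0.08, 1.00]],
    index=cats, columns=cats,
)

# Distinct axis names avoid a name clash when the index is reset later.
scores = scores.rename_axis(index='antecedent', columns='consequent')

# Keep scores above the threshold, excluding the diagonal of ones.
mask = (scores < 1) & (scores > 0.10)
pairs = (
    scores.where(mask)
    .stack()
    .dropna()
    .reset_index(name='score')
    .sort_values('score', ascending=False, ignore_index=True)
)
print(pairs)
```

The explicit `dropna()` is there because, depending on the pandas version, `stack()` may or may not discard the masked entries by itself.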

In [67]:
jaccardian.loc[['TOOTH BRUSH'],['TOOTH POWDER']]

Level5,TOOTH POWDER
TOOTH BRUSH,0.001558


In [44]:
jaccardian.loc['TOOTH BRUSH'].sort_values(ascending=False)

Level5
TOOTH BRUSH               1.000000
TOOTHPASTE                0.365524
SOAP                      0.086708
OIL                       0.047767
CREAM                     0.047248
FACE WASH                 0.031672
CHOCOLATE                 0.029076
ACCESSORIES               0.023364
CRACKER                   0.018692
SHAMPOO                   0.018692
COOKIES                   0.016615
SHAVING                   0.015576
HYGIENE                   0.014538
LIQUID                    0.012461
HAND                      0.012461
LOTION                    0.011942
TALC                      0.011423
NAMKEEN                   0.011423
CAKE                      0.010903
FUNCTIONAL BEVERAGE       0.010903
FIRST Aid                 0.010384
CANDY                     0.009865
CONTRACEPTIVE             0.009865
BUTTER                    0.008827
GLUCOSE                   0.008307
DIGESTIVE CARE            0.007788
BALM                      0.007269
CHEWING GUM               0.006750
BAR          
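Lookups of this kind can be wrapped in a small helper that returns the top few associated categories and leaves out the category itself, which always scores 1.0. `top_associated` is a hypothetical helper name, and the scores in the toy frame are illustrative.

```python
import pandas as pd

def top_associated(similarity, category, n=3):
    """Return the n highest-scoring categories for `category`, excluding itself."""
    row = similarity.loc[category].drop(labels=category)
    return row.sort_values(ascending=False).head(n)

cats = ['TOOTH BRUSH', 'TOOTHPASTE', 'SOAP']
# Toy similarity frame: rows are the input category, columns the candidates.
toy = pd.DataFrame(
    [[1.000, 0.366, 0.087],
     [0.229, 1.000, 0.104],
     [0.063, 0.121, 1.000]],
    index=cats, columns=cats,
)

print(top_associated(toy, 'TOOTH BRUSH', n=2))
```

Applied to the real frame, the call would look like `top_associated(jaccardian, 'TOOTH BRUSH')`.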

In [45]:
jaccardian.loc['TOOTHPASTE'].sort_values(ascending=False)

Level5
TOOTHPASTE                1.000000
TOOTH BRUSH               0.228571
SOAP                      0.104221
OIL                       0.065260
CREAM                     0.057143
FACE WASH                 0.028896
SHAMPOO                   0.023052
FUNCTIONAL BEVERAGE       0.022727
HYGIENE                   0.020455
TALC                      0.019805
LIQUID                    0.017532
CHOCOLATE                 0.017208
SHAVING                   0.017208
LOTION                    0.015584
ACCESSORIES               0.015584
FIRST Aid                 0.015260
CRACKER                   0.014610
COOKIES                   0.013961
BALM                      0.012338
FRAGRANCES                0.012013
HAND                      0.011688
CAKE                      0.011364
NAMKEEN                   0.010065
DIGESTIVE CARE            0.009091
CONTRACEPTIVE             0.008766
BAR                       0.008442
GRANULES                  0.008117
BUTTER                    0.008117
CANDY        

In [46]:
jaccardian.loc['SOAP'].sort_values(ascending=False)

Level5
SOAP                      1.000000
TOOTHPASTE                0.121407
OIL                       0.114599
CREAM                     0.081316
TOOTH BRUSH               0.063162
SHAMPOO                   0.051437
HYGIENE                   0.050303
FACE WASH                 0.040847
CHOCOLATE                 0.031014
FIRST Aid                 0.027610
LOTION                    0.026475
COOKIES                   0.023828
TALC                      0.023071
BAR                       0.021180
HAND                      0.020802
FUNCTIONAL BEVERAGE       0.020045
LIQUID                    0.017398
CAKE                      0.016641
SHAVING                   0.016263
CRACKER                   0.015885
ACCESSORIES               0.015885
DETERGENT POWDER          0.015507
GLUCOSE                   0.012481
BALM                      0.010968
NAMKEEN                   0.010590
FRAGRANCES                0.010212
BUTTER                    0.010212
DIGESTIVE CARE            0.009455
GRANULES     

In [47]:
jaccardian.loc['FACE WASH'].sort_values(ascending=False)

Level5
FACE WASH              1.000000
CREAM                  0.094844
SOAP                   0.068746
OIL                    0.064927
TOOTHPASTE             0.056652
LOTION                 0.048377
TOOTH BRUSH            0.038829
FRAGRANCES             0.038192
SHAMPOO                0.036919
HYGIENE                0.032463
CHOCOLATE              0.028644
FUNCTIONAL BEVERAGE    0.023552
LIQUID                 0.020369
FIRST Aid              0.019733
SHAVING                0.019096
TALC                   0.016550
HAND                   0.015277
CAKE                   0.014004
DIGESTIVE CARE         0.012094
FACE GEL               0.011458
COOKIES                0.010821
COLOUR                 0.010185
ACCESSORIES            0.010185
CONDITIONER            0.008912
JUICE DRINK            0.008912
CONTRACEPTIVE          0.008912
CHEWING GUM            0.007638
FACE SCRUB             0.007638
FACE PACK              0.007002
CRACKER                0.007002
                         ...   
C

In [58]:
jaccardian.loc['POPCORN'].sort_values(ascending=False)

Level5
POPCORN                   1.0
RTC                       0.5
CHOCOLATE                 0.5
WIPES                     0.0
CONTRACEPTIVE             0.0
CORNFLAKES                0.0
COTTON                    0.0
CRACKER                   0.0
CREAM                     0.0
CREAM & OINTMENTS         0.0
DAIRY                     0.0
DEODORIZERS               0.0
DETERGENT POWDER          0.0
DIGESTIVE CARE            0.0
DISINFECTANT              0.0
EYE COSMETICS & BEAUTY    0.0
FACE GEL                  0.0
FACE PACK                 0.0
FACE SCRUB                0.0
FACE WASH                 0.0
FACIAL TISSUES            0.0
FIRST Aid                 0.0
FRAGRANCES                0.0
FUNCTIONAL BEVERAGE       0.0
GEL                       0.0
GIFT PACK                 0.0
GLUCOSE                   0.0
COOKIES                   0.0
CONDIMENT                 0.0
CONDITIONER               0.0
                         ... 
WAFERS                    0.0
WATER                     0.0
WAX
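Scores of exactly 1.0, 0.5 and 0.0 suggest that POPCORN appears on very few bills, in which case the ratio says little. One possible guard, sketched here as an assumption rather than as part of the original analysis, is to keep only categories that occur on at least some minimum number of bills before computing similarities. `min_bills` and the toy matrix are hypothetical.

```python
import pandas as pd

# Toy bill-by-category indicator matrix (illustrative values).
basket = pd.DataFrame(
    {'SOAP': [1, 1, 1, 0, 1],
     'TOOTHPASTE': [1, 0, 1, 1, 0],
     'POPCORN': [0, 0, 0, 1, 0]},
    index=[101, 102, 103, 104, 105],
)

min_bills = 2  # hypothetical threshold

# Number of bills on which each category appears.
support = (basket > 0).sum(axis=0)

# Keep only the columns that meet the threshold.
frequent = basket.loc[:, support >= min_bills]
print(list(frequent.columns))
```

A suitable threshold would have to be chosen by looking at the real bill counts per category.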

In [53]:
jaccardian.loc['CHOCOLATE'].sort_values(ascending=False)

Level5
CHOCOLATE                 1.000000
COOKIES                   0.067527
CREAM                     0.047325
CANDY                     0.046764
CAKE                      0.041339
CRACKER                   0.034231
CHEWING GUM               0.032361
NAMKEEN                   0.025814
BOURBON                   0.023569
GLUCOSE                   0.022634
JUICE DRINK               0.018331
BUTTER                    0.017396
SOAP                      0.015339
LOZENGES                  0.014590
DIGESTIVE CARE            0.014029
HYGIENE                   0.013655
OIL                       0.013094
WATER                     0.013094
TOOTH BRUSH               0.010475
TOOTHPASTE                0.009914
ACCESSORIES               0.009727
FUNCTIONAL BEVERAGE       0.008979
FACE WASH                 0.008418
NOODLES                   0.007295
CARBONATED                0.007108
LOTION                    0.006547
CONTRACEPTIVE             0.006547
MARIE                     0.006360
SHAMPOO      