# **Enter user text in the next cell and run all cells.**

In [0]:
user_input = "text, Relaxed, Violet, Aroused, Creative, Happy, Energetic, Flowery, Diesel"

# Medicine Predictor App:
## Compact version of Basilica.ai Medicine Cabinet Build_Group_3 Model
## With user_text input & top_5_recommendations output

To save time, this model relies on stored values:

A. The user text embedding is calculated only once per session, instead of each time a score is computed.

B. The cultivar embeddings are calculated in advance and stored (pickled), instead of being recalculated for each query.

### How this model works internally:
1. The user enters text describing what medicine they want.
2. The user text is given a numerical description: an "embedding value."
3. Comparable embedding values for the medicine descriptions were pre-calculated and stored (pickled).
4. The user text embedding is compared with each medicine description embedding, and each medicine is given a similarity score.
5. The top 5 scoring medicines are returned as text recommendations.
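
The steps above can be sketched with toy data. This is only an illustration under stated assumptions, not the notebook's actual model: the three-dimensional vectors, the strain names, and the `toy_` variables are made up, standing in for real embeddings returned by Basilica.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return the cosine similarity of two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-calculated embeddings, one per medicine (step 3).
toy_embeddings = {
    "Strain-A": [1.0, 0.0, 0.0],
    "Strain-B": [0.0, 1.0, 0.0],
    "Strain-C": [1.0, 1.0, 0.0],
}

# Hypothetical embedding of the user's text (step 2).
toy_user_embedding = [0.9, 0.1, 0.0]

# Score every medicine against the user text (step 4).
toy_scores = {name: cosine_similarity(vec, toy_user_embedding)
              for name, vec in toy_embeddings.items()}

# Keep the best-scoring names, highest first (step 5, with 2 instead of 5).
toy_top = sorted(toy_scores, key=toy_scores.get, reverse=True)[:2]
print(toy_top)
```

A score near 1 means the two vectors point in nearly the same direction, and a score near 0 means they have little in common.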

A less compact version of the model, including how the pickle file was made,
feature engineering, etc.:

https://colab.research.google.com/drive/1VDcJ-Do6Ylg1X3MAz9VhVZswWG8x7LsQ

In [0]:
!pip install basilica

Collecting basilica
  Downloading https://files.pythonhosted.org/packages/68/19/6216f1c0ad6d0f738bd1061cb5c65097021b41f3891046fac87bc4c4e1ae/basilica-0.2.8.tar.gz
Building wheels for collected packages: basilica
  Building wheel for basilica (setup.py) ... done
  Created wheel for basilica: filename=basilica-0.2.8-cp36-none-any.whl size=4710 sha256=de5bda29ad6640f37f42487fc3254289bc8410cb770e99948df896d638c343ab
  Stored in directory: /root/.cache/pip/wheels/31/18/9f/46f6face8baf98e31b52bf91a0d76930ec76860f9e9211104d
Successfully built basilica
Installing collected packages: basilica
Successfully installed basilica-0.2.8


In [0]:
# Import The Libraries & Packages
import basilica
import numpy as np
import pandas as pd
from scipy import spatial

In [0]:
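# Set The Variables
# user_input is set in the first cell and is deliberately not reset here,
# otherwise "run all cells" would overwrite the entered text with an empty string
user_input_embedding = []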

In [0]:
# Website for the current data source: https://www.kaggle.com/kingburrito666/cannabis-strains
# The CSV is also available for download here:
# https://drive.google.com/open?id=15-KMmSgxISrH8WtGGZPSB-C7DANSMLny

# download the data into the current working directory/folder
!wget https://raw.githubusercontent.com/MedCabinet/ML_Machine_Learning_Files/master/med1.csv

--2020-01-03 21:05:44--  https://raw.githubusercontent.com/MedCabinet/ML_Machine_Learning_Files/master/med1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1267451 (1.2M) [text/plain]
Saving to: ‘med1.csv’


2020-01-03 21:05:45 (26.1 MB/s) - ‘med1.csv’ saved [1267451/1267451]



In [0]:
!wget https://github.com/lineality/4.4_Build_files/raw/master/medembedv2.pkl

--2020-01-03 21:05:46--  https://github.com/lineality/4.4_Build_files/raw/master/medembedv2.pkl
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/lineality/4.4_Build_files/master/medembedv2.pkl [following]
--2020-01-03 21:05:46--  https://raw.githubusercontent.com/lineality/4.4_Build_files/master/medembedv2.pkl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16288303 (16M) [application/octet-stream]
Saving to: ‘medembedv2.pkl’


2020-01-03 21:05:46 (132 MB/s) - ‘medembedv2.pkl’ saved [16288303/16288303]



In [0]:
# inspect, see what's loaded
!ls

med1.csv  medembedv2.pkl  sample_data


In [0]:
# put the data into a dataframe named df
df = pd.read_csv('med1.csv')

In [0]:
# if not done above, enter user input
#user_input = "text, Relaxed, Violet, Aroused, Creative, Happy, Energetic, Flowery, Diesel"

## Function to calculate_user_text_embedding (no For-Loop needed)

In [0]:
# a function to calculate the user text embedding once,
# so the value can be kept in session memory and reused

def calculate_user_text_embedding(user_text, user_input_embedding):

  # embed_sentences takes a list of sentences; here it holds only the user text
  sentences = [user_text]

  # calculating the embedding for the user-entered text
  # (the cultivar embeddings come from the pickle file and are not recalculated)
  with basilica.Connection('36a370e3-becb-99f5-93a0-a92344e78eab') as c:
    user_input_embedding = list(c.embed_sentences(sentences))
  
  return user_input_embedding

# run the function to save the embedding value in session memory
user_input_embedding = calculate_user_text_embedding(user_input, user_input_embedding)
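
The cell above writes the API key directly into the source. One common alternative, sketched here as an assumption and not something the original notebook does, is to read the key from an environment variable. The variable name `BASILICA_API_KEY` and the helper `get_basilica_key` are hypothetical.

```python
import os

def get_basilica_key(env_var="BASILICA_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical name)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of a failed API call later.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The result could then be passed to `basilica.Connection(...)` in place of the literal key.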

## Restoring embedding values from Pickle File

In [0]:
#unpickling file
unpickled_df_test = pd.read_pickle("./medembedv2.pkl")
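
For readers who want to see the round trip, here is a minimal sketch of how a dataframe of embedding arrays might be pickled once and restored later. It assumes the layout the scoring function relies on (one array per row, in a column labelled `0`); the two-dimensional vectors, the file name, and the `toy_` names are invented for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for pre-calculated embeddings: one array per row,
# stored in a column labelled 0.
toy_stored = pd.DataFrame({0: [np.array([1.0, 0.0]), np.array([0.0, 1.0])]})

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_embeddings.pkl")
    toy_stored.to_pickle(toy_path)            # done once, in advance
    toy_restored = pd.read_pickle(toy_path)   # done at query time

# Each cell comes back as a NumPy array with the stored values.
print(toy_restored.loc[1, 0])
```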

## Function to score_user_input_from_stored_embedding_from_stored_values (used with a For-Loop)

In [0]:
# score_user_input_from_stored_embedding_from_stored_values (v2, works)
# a function that calculates the similarity score between the user input
# and one cultivar (strain), using the stored embedding for that row

score = 0

def score_user_input_from_stored_embedding_from_stored_values(user_text, score, row1, user_input_embedding):
  # note: user_text and score are not used in the calculation;
  # they are kept so that existing calls keep working

  # obtains the pre-calculated embedding for this row from the pickled dataframe of arrays
  embedding_stored = unpickled_df_test.loc[row1, 0]
  
  # calculates the cosine similarity of user_text vs. product description
  # (np.ravel flattens the one-item list of embeddings into a 1-D vector)
  score = 1 - spatial.distance.cosine(embedding_stored, np.ravel(user_input_embedding))

  # returns the score so it can be used outside of the function
  return score

## For-Loop to score_user_input_from_stored_embedding_from_stored_values

In [0]:
# For the real model this is set to the full dataset size, currently 2351 rows
# (df.shape[0] gives the row count)

for i in range(2351):
  # calls the function to set the value of 'score'
  # which is the score of the user input
  score = score_user_input_from_stored_embedding_from_stored_values(user_input, score, i, user_input_embedding)
  
  #stores the score in the dataframe
  df.loc[i,'score'] = score
  
  # optionally prints the score as verbosity for the user
  print(df.loc[i,'score'])
  print(i)

0.3558788114020005
0
0.32919572719148704
1
0.365484269621424
2
0.3655727577400646
3
0.3795005712499453
4
0.30876512721795013
5
0.38836334068771905
6
0.371575646899987
7
0.3714658815860108
8
0.36506568076411083
9
0.3332568102462974
10
0.3469042264501103
11
0.35611907548223654
12
0.36519998545698473
13
0.3681638325609451
14
0.35869068692869444
15
0.30876512721795013
16
0.3633560680100376
17
0.3704766975485103
18
0.3627291228817475
19
0.36295889308429785
20
0.3703407079678789
21
0.3662296030506813
22
0.38141282424607503
23
0.36893290732999895
24
0.3394615313519026
25
0.3394757631513896
26
0.38463588565437656
27
0.3806390746626016
28
0.3733555725499198
29
0.35796543775886125
30
0.3571313095226356
31
0.34116458839449737
32
0.3751850241238124
33
0.3501339959842049
34
0.35771916733306586
35
0.35343442666722147
36
0.34568156803971717
37
0.34734889442005634
38
0.3705989536962877
39
0.35166857640229976
40
0.3499208797302551
41
0.3478426393671836
42
0.3548638111050977
43
0.3593889398547947
44
0.3
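
The loop above scores one row at a time. As a possible alternative, and only as a sketch under the assumption that all stored embeddings can be stacked into one 2-D array, the same cosine scores could be computed in a single matrix operation. The function name `cosine_scores` and the toy vectors are hypothetical.

```python
import numpy as np

def cosine_scores(stored_matrix, query_vector):
    """Cosine similarity of each row of stored_matrix with query_vector."""
    stored_matrix = np.asarray(stored_matrix, dtype=float)
    # Flatten so a nested one-item list of embeddings is also accepted.
    query_vector = np.ravel(np.asarray(query_vector, dtype=float))
    row_norms = np.linalg.norm(stored_matrix, axis=1)
    query_norm = np.linalg.norm(query_vector)
    return stored_matrix @ query_vector / (row_norms * query_norm)

toy_matrix = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
toy_query = [[1.0, 0.0]]   # nested list, shaped like a one-sentence embedding result
toy_result = cosine_scores(toy_matrix, toy_query)
print(toy_result)
```

In this notebook the stacked matrix might be built with something like `np.vstack(unpickled_df_test[0].to_numpy())`, assuming every stored cell is a 1-D array of the same length.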

In [0]:
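# Note: nlargest is applied to the value_counts result, so (score, Strain) pairs
# that occur in more than one row rank first; among equal counts, keep='last'
# favors the later entries, which have the higher scores because groupby sorts by score
output = df['Strain'].groupby(df['score']).value_counts().nlargest(5, keep='last')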

In [0]:
output

score     Strain         
0.356251  B-Witched          2
0.399902  Monkey-Paw         1
0.398082  Harle-Tsu          1
0.397844  Cbd-Kush           1
0.397239  Turbo-Mind-Warp    1
Name: Strain, dtype: int64
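
In the output above, B-Witched is listed first despite having a lower score than the other rows, apparently because that score and strain pair occurs twice and the ranking is done on the counts. If the intent is simply the highest-scoring cultivars, one possible alternative is to rank on the score column itself. The sketch below uses a made-up dataframe (the `toy_df` rows and scores are invented for illustration), not the notebook's data.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "Strain": ["A", "B", "B", "C", "D"],
    "score": [0.31, 0.35, 0.35, 0.40, 0.38],
})

# Drop repeated strains, then keep the rows with the highest scores.
toy_ranked = (toy_df.drop_duplicates(subset="Strain")
                    .nlargest(3, "score")[["Strain", "score"]])
print(toy_ranked)
```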

In [0]:
# you could select more specific fields to return,
# but here is all of the output as a single string
output_string = str(output)

In [0]:
output_string

'score     Strain         \n0.356251  B-Witched          2\n0.399902  Monkey-Paw         1\n0.398082  Harle-Tsu          1\n0.397844  Cbd-Kush           1\n0.397239  Turbo-Mind-Warp    1\nName: Strain, dtype: int64'