# **Building a Recommendation System for Alternative Medicines or Kits**

This guide builds a recommendation system using two approaches: Collaborative Filtering and Content-Based Filtering. We generate synthetic data for the examples and use Google BigQuery as the data source.

# Approach

## Problem Definition

- Suggest alternative medicines or kits to pharmacies based on purchase history.
- Recommend manufacturing priorities by analyzing demand trends.

## Hypothesis

- Collaborative Filtering Hypothesis: Users (pharmacies) with similar purchasing behavior will likely be interested in similar medicines or kits.
- Content-Based Filtering Hypothesis: If a pharmacy prefers certain attributes (e.g., active ingredients, type of kits), it will prefer items with similar attributes.
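
To make the collaborative filtering hypothesis concrete, here is a minimal sketch that measures how similar pharmacies are based on what they rated. The toy ratings and the pharmacy and medicine names are hypothetical, and treating an unrated medicine as 0 is a simplification made only for illustration.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy ratings: three pharmacies, four medicines.
ratings = pd.DataFrame(
    [
        ("Pharmacy_A", "Medicine_1", 5),
        ("Pharmacy_A", "Medicine_2", 4),
        ("Pharmacy_B", "Medicine_1", 5),
        ("Pharmacy_B", "Medicine_2", 4),
        ("Pharmacy_B", "Medicine_3", 5),
        ("Pharmacy_C", "Medicine_4", 5),
    ],
    columns=["pharmacy", "medicine", "rating"],
)

# Pharmacy x medicine matrix; unrated cells become 0 (a simplification).
matrix = ratings.pivot_table(
    index="pharmacy", columns="medicine", values="rating", fill_value=0
)

# Pairwise cosine similarity between pharmacy rating vectors.
sim = pd.DataFrame(
    cosine_similarity(matrix), index=matrix.index, columns=matrix.index
)
print(sim.round(2))
```

Pharmacy_A and Pharmacy_B overlap on two medicines and score high, so Medicine_3 (bought only by Pharmacy_B) becomes a candidate for Pharmacy_A. Pharmacy_C shares nothing with the others and scores 0.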

## Actions

- Generate synthetic purchase and product data.
- Load data from Google BigQuery.
- Build both Collaborative and Content-Based Filtering models.
- Compare results to choose the best approach.

# Generate Synthetic Data

In [1]:
import pandas as pd
import numpy as np

# Generate synthetic data
np.random.seed(42)

# Pharmacies
pharmacies = [f"Pharmacy_{i}" for i in range(1, 51)]

# Medicines/Kits
medicines = [f"Medicine_{i}" for i in range(1, 101)]

# Generate purchase history (ratings: 1-5)
data = []
for pharmacy in pharmacies:
    for medicine in np.random.choice(medicines, size=np.random.randint(5, 15), replace=False):
        data.append([pharmacy, medicine, np.random.randint(1, 6)])

purchase_data = pd.DataFrame(data, columns=["pharmacy", "medicine", "rating"])
print(purchase_data.head())


     pharmacy     medicine  rating
0  Pharmacy_1  Medicine_84       3
1  Pharmacy_1  Medicine_54       1
2  Pharmacy_1  Medicine_71       3
3  Pharmacy_1  Medicine_46       3
4  Pharmacy_1  Medicine_45       1
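
The problem definition also mentions manufacturing priorities. The synthetic data has no timestamps, so trends over time cannot be shown here; as a rough proxy, the sketch below ranks medicines by how many pharmacies bought them, with mean rating as a tie-breaker. The helper name `rank_demand` and the toy frame are hypothetical.

```python
import pandas as pd

def rank_demand(purchases: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Rank medicines by number of distinct buyers, then by mean rating."""
    summary = (
        purchases.groupby("medicine")
        .agg(n_pharmacies=("pharmacy", "nunique"), mean_rating=("rating", "mean"))
        .sort_values(["n_pharmacies", "mean_rating"], ascending=False)
    )
    return summary.head(top_n)

# Hypothetical toy data with the same columns as the synthetic frame.
toy = pd.DataFrame(
    {
        "pharmacy": ["P1", "P2", "P3", "P1", "P2", "P3"],
        "medicine": ["Medicine_1", "Medicine_1", "Medicine_1",
                     "Medicine_2", "Medicine_3", "Medicine_3"],
        "rating": [4, 5, 3, 5, 2, 4],
    }
)
demand = rank_demand(toy)
print(demand)
```

On the synthetic frame above, the same call would be `rank_demand(purchase_data)`.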


# Load Data from Google BigQuery

In [2]:
from google.cloud import bigquery

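# Initialize BigQuery client (uses Application Default Credentials;
# without them the query below fails, as in the error shown after this cell)
client = bigquery.Client()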

# Query BigQuery
query = """
SELECT pharmacy_id, medicine_id, rating
FROM `your_project.your_dataset.purchase_data`
"""
purchase_data = client.query(query).to_dataframe()
print(purchase_data.head())

RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x78be984baad0>)
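
The `RefreshError` above indicates that the client could not obtain credentials in that environment. Note also that the query returns `pharmacy_id` and `medicine_id`, while the synthetic frame uses `pharmacy` and `medicine`. One way to keep the rest of the notebook running in both situations is a small loader that renames the columns and falls back to the synthetic data when BigQuery is unavailable. The helper `load_purchases` and `COLUMN_MAP` are hypothetical names, not part of the BigQuery client library.

```python
import pandas as pd

# Map the BigQuery column names to the names used by the synthetic data.
COLUMN_MAP = {"pharmacy_id": "pharmacy", "medicine_id": "medicine"}

def load_purchases(query, fallback, client=None):
    """Return purchases from BigQuery when a client works, else the fallback frame."""
    if client is None:
        return fallback
    try:
        df = client.query(query).to_dataframe()
    except Exception as exc:  # e.g. missing credentials or an unknown table
        print(f"BigQuery load failed ({type(exc).__name__}); using fallback data")
        return fallback
    return df.rename(columns=COLUMN_MAP)[["pharmacy", "medicine", "rating"]]

# Without a client, the fallback (here a tiny hypothetical frame) is returned as is.
synthetic = pd.DataFrame(
    {"pharmacy": ["Pharmacy_1"], "medicine": ["Medicine_84"], "rating": [3]}
)
purchases = load_purchases("SELECT 1", synthetic)
print(purchases)
```

Once credentials are configured, passing `client=bigquery.Client()` and the real query string would switch the source to BigQuery without changing downstream code.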

# Collaborative Filtering

We'll use a matrix factorization approach, Singular Value Decomposition (SVD), as implemented in the Surprise library.

In [3]:
# Requires the scikit-surprise package: pip install scikit-surprise
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

# Prepare data for Surprise
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(purchase_data, reader)

# Train-test split
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

# Train model
model = SVD()
model.fit(trainset)

# Evaluate model (verbose=False avoids printing the RMSE twice)
predictions = model.test(testset)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse}")

ModuleNotFoundError: No module named 'surprise'
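
Since the Surprise import failed in the run above, a dependency-light sketch of the underlying idea may help. It trains a biased matrix factorization with stochastic gradient descent, predicting a rating as global mean + pharmacy bias + medicine bias + dot product of latent factors, which is the general form that Surprise documents for its `SVD` algorithm. The function names, hyperparameters, and toy ratings are assumptions for illustration, not the library's implementation.

```python
import numpy as np

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=42):
    """Fit biases and latent factors by SGD. ratings: (user_idx, item_idx, rating)."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)    # user and item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))     # user latent factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))     # item latent factors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu = P[u].copy()  # keep the old value for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(model, u, i):
    """Predicted rating, clipped to the 1-5 scale."""
    mu, bu, bi, P, Q = model
    return float(np.clip(mu + bu[u] + bi[i] + P[u] @ Q[i], 1, 5))

def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    return float(np.sqrt(np.mean([(r - predict(model, u, i)) ** 2 for u, i, r in ratings])))

# Hypothetical toy ratings: (pharmacy index, medicine index, rating).
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 5), (1, 1, 4), (1, 2, 1), (2, 2, 2), (2, 1, 5)]
baseline = train_mf(ratings, n_epochs=0)  # untrained: predicts roughly the global mean
model = train_mf(ratings)
print("baseline RMSE:", rmse(baseline, ratings))
print("trained RMSE:", rmse(model, ratings))
```

This reports training error only; a held-out split, as in the Surprise cell above, is still needed to judge generalization.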

# Make Predictions

In [4]:
pharmacy_id = "Pharmacy_1"
medicine_id = "Medicine_10"

predicted_rating = model.predict(pharmacy_id, medicine_id)
print(predicted_rating)

NameError: name 'model' is not defined
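
A single predicted rating is rarely the end goal; a pharmacy usually needs a short ranked list. The sketch below turns any rating predictor into top-N recommendations while skipping medicines the pharmacy already bought. The helper `top_n_for_pharmacy` and the stub predictor are hypothetical; with a trained Surprise model, the predictor could be `lambda p, m: model.predict(p, m).est`.

```python
def top_n_for_pharmacy(predict, pharmacy, candidates, already_bought, n=5):
    """Score unseen candidates with `predict` and return the n best (medicine, score) pairs."""
    scored = [
        (medicine, predict(pharmacy, medicine))
        for medicine in candidates
        if medicine not in already_bought
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical stand-in for a trained model's predicted rating.
def fake_predict(pharmacy, medicine):
    return {"Medicine_1": 4.5, "Medicine_2": 3.0, "Medicine_3": 4.9}.get(medicine, 1.0)

top = top_n_for_pharmacy(
    fake_predict,
    "Pharmacy_1",
    candidates=["Medicine_1", "Medicine_2", "Medicine_3"],
    already_bought={"Medicine_3"},
    n=2,
)
print(top)
```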

# Content-Based Filtering

For this approach, we use the features of medicines and preferences of pharmacies.

Synthetic Data Generation:

In [5]:
# Generate features for medicines
medicine_features = pd.DataFrame({
    "medicine": medicines,
    "active_ingredient": np.random.choice(["Ingredient_A", "Ingredient_B", "Ingredient_C"], size=100),
    "type": np.random.choice(["Tablet", "Syrup", "Injection"], size=100)
})
print(medicine_features.head())

     medicine active_ingredient       type
0  Medicine_1      Ingredient_A      Syrup
1  Medicine_2      Ingredient_C  Injection
2  Medicine_3      Ingredient_B  Injection
3  Medicine_4      Ingredient_B     Tablet
4  Medicine_5      Ingredient_B     Tablet


# Implementation

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Create text-based features
medicine_features["features"] = (medicine_features["active_ingredient"] + " " + medicine_features["type"])

# Vectorize features
vectorizer = TfidfVectorizer()
feature_matrix = vectorizer.fit_transform(medicine_features["features"])

# Compute similarity
similarity_matrix = cosine_similarity(feature_matrix)

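# Recommend based on similarity
# Note: skipping the first sorted entry assumes the query medicine ranks first.
# When several medicines share identical features (similarity 1.0), the tie can
# leave the query medicine itself in the results.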
def recommend_medicine(medicine_id, top_n=5):
    idx = medicine_features[medicine_features["medicine"] == medicine_id].index[0]
    scores = list(enumerate(similarity_matrix[idx]))
    scores = sorted(scores, key=lambda x: x[1], reverse=True)
    recommended_indices = [i[0] for i in scores[1:top_n+1]]
    return medicine_features.iloc[recommended_indices]["medicine"]

print(recommend_medicine("Medicine_10"))

9     Medicine_10
15    Medicine_16
18    Medicine_19
31    Medicine_32
65    Medicine_66
Name: medicine, dtype: object
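
The output above lists `Medicine_10` among its own recommendations. With only three ingredients and three types, many medicines have identical feature strings and tie at similarity 1.0, so dropping the first sorted entry does not necessarily drop the query medicine. A minimal sketch of a variant that removes the query medicine by label instead is shown below; the toy frame and the function name `recommend_excluding_self` are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue; Medicine_1 and Medicine_2 have identical features.
toy = pd.DataFrame(
    {
        "medicine": ["Medicine_1", "Medicine_2", "Medicine_3", "Medicine_4"],
        "features": [
            "Ingredient_A Tablet",
            "Ingredient_A Tablet",
            "Ingredient_A Syrup",
            "Ingredient_B Injection",
        ],
    }
)
sim = cosine_similarity(TfidfVectorizer().fit_transform(toy["features"]))

def recommend_excluding_self(medicine_id, top_n=2):
    """Return the top_n most similar medicines, never the query medicine itself."""
    idx = toy.index[toy["medicine"] == medicine_id][0]
    scores = pd.Series(sim[idx], index=toy["medicine"]).drop(medicine_id)
    return scores.sort_values(ascending=False, kind="stable").head(top_n)

recs = recommend_excluding_self("Medicine_2")
print(recs)
```

Returning the scores alongside the names also makes ties visible, which is useful when deciding how to break them (for example by demand or stock).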


# Compare and Deploy

- Compare RMSE (Collaborative Filtering) and recommendation accuracy (Content-Based Filtering).
- Deploy the chosen model using Flask or FastAPI with BigQuery as the backend.
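
RMSE measures rating error, while the content-based model produces a ranked list without ratings, so the two are not directly comparable. One option, offered here as a suggestion rather than the method used in this guide, is to score both models' top-k lists against held-out purchases with the same ranking metric, such as precision@k. The function and the example lists are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    top_k = list(recommended)[:k]
    if k <= 0 or not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical example: one of the three recommendations was actually purchased later.
recommended = ["Medicine_3", "Medicine_7", "Medicine_1"]
held_out_purchases = {"Medicine_1", "Medicine_9"}
score = precision_at_k(recommended, held_out_purchases, k=3)
print(round(score, 3))
```

Averaging this score over all pharmacies for each model gives a single number per approach that can be compared side by side.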

In [7]:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/recommend', methods=['GET'])
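def recommend():
    pharmacy_id = request.args.get('pharmacy_id')
    medicine_id = request.args.get('medicine_id')
    # Reject requests that omit either query parameter
    if not pharmacy_id or not medicine_id:
        return jsonify({"error": "pharmacy_id and medicine_id are required"}), 400
    # Use the trained collaborative filtering model (`model` from the SVD cell)
    prediction = model.predict(pharmacy_id, medicine_id)
    return jsonify({"pharmacy": pharmacy_id, "medicine": medicine_id, "predicted_rating": prediction.est})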

if __name__ == '__main__':
    app.run(debug=True)


 * Serving Flask app '__main__'
 * Debug mode: on
 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug: * Restarting with stat
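
The endpoint can be exercised without starting a server by using Flask's built-in test client. The sketch below wraps the route in a factory so a model can be injected; `create_app` and `StubModel` are hypothetical names, and the stub stands in for a trained model by exposing the same `predict(...).est` interface used above.

```python
from collections import namedtuple
from flask import Flask, jsonify, request

Prediction = namedtuple("Prediction", ["est"])

class StubModel:
    """Hypothetical stand-in for a trained model; always predicts 3.5."""
    def predict(self, pharmacy_id, medicine_id):
        return Prediction(est=3.5)

def create_app(model):
    """Build the Flask app around whichever model is passed in."""
    app = Flask(__name__)

    @app.route("/recommend", methods=["GET"])
    def recommend():
        pharmacy_id = request.args.get("pharmacy_id")
        medicine_id = request.args.get("medicine_id")
        if not pharmacy_id or not medicine_id:
            return jsonify({"error": "pharmacy_id and medicine_id are required"}), 400
        prediction = model.predict(pharmacy_id, medicine_id)
        return jsonify(
            {"pharmacy": pharmacy_id, "medicine": medicine_id, "predicted_rating": prediction.est}
        )

    return app

# Issue requests in-process; no network port is opened.
client = create_app(StubModel()).test_client()
ok = client.get("/recommend?pharmacy_id=Pharmacy_1&medicine_id=Medicine_10")
bad = client.get("/recommend?pharmacy_id=Pharmacy_1")
print(ok.status_code, ok.get_json())
print(bad.status_code, bad.get_json())
```

Injecting the model through a factory also makes it straightforward to swap the collaborative model for the content-based one once a choice is made.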
