# INTERNSHIP ASSIGNMENT PRODUCT RECOMMENDATION
Option 3: Recommendation
Complexity: Intermediate 
Task: Recommend an item to a given customer ID for a given date.
User Story: The user should be able to provide a Customer ID and a Date, and the program should recommend an item to purchase.

# Explanation of Problem

This problem involves creating a program to recommend an item to a customer based on their customer ID and a specific date. The program needs access to relevant data that includes information about customers, items, and historical purchasing patterns. The user story describes the desired functionality from the user's perspective, and the program should provide a relevant and personalized recommendation based on the customer's preferences or historical buying behavior.



# Explanation of Approach

The code implements a product recommendation system based on customer ID and recommendation date. Here is how it works:

1. The dataset is loaded from an Excel file and prepared by removing rows that contain missing values.

2. The user provides the customer ID and recommendation date as input.

3. The dataset is filtered to retrieve the relevant data for the given customer and date.

4. If no data is found for the specified customer and date, a message is displayed indicating the absence of relevant information.

5. If data is available, the unique product descriptions of the purchased items by the customer are extracted.

6. K-Means clustering is used to identify similar products based on their descriptions (steps 7 to 9).

7. The product descriptions are first transformed into a sparse matrix using TF-IDF vectorization.

8. K-Means is then fitted on the entire set of product descriptions so that similar products are grouped together.

9. The cluster labels for the purchased products are obtained from the trained K-Means model.

10. Similar products are identified by retrieving other products belonging to the same clusters as the purchased items.

11. The recommended products are compiled by combining the products from the identified clusters, ensuring there are no duplicates.

12. If no recommended products are found, a message is displayed to indicate that there are no recommendations available.

13. If there are recommended products, they are displayed to the user.

By utilizing K-Means clustering on the entire set of product descriptions, the code groups similar products together and recommends items that are similar to those purchased by the customer on the specified date.

# Import Libraries


In [1]:
#This cell imports the necessary libraries for data manipulation, TF-IDF vectorization, and K-Means clustering.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans


# Read and Preprocess the Dataset

In [4]:
# Read the dataset from an Excel file
data = pd.read_excel('Online Retail.xlsx')

# Preprocess the dataset: drop rows that contain missing values
data = data.dropna()

# Display the first 10 rows
print(data.head(10))


  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   
5    536365     22752         SET 7 BABUSHKA NESTING BOXES         2   
6    536365     21730    GLASS STAR FROSTED T-LIGHT HOLDER         6   
7    536366     22633               HAND WARMER UNION JACK         6   
8    536366     22632            HAND WARMER RED POLKA DOT         6   
9    536367     84879        ASSORTED COLOUR BIRD ORNAMENT        32   

          InvoiceDate  UnitPrice  CustomerID         Country  
0 2010-12-01 08:26:00       2.55     17850.0  United Kingdom  
1 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
2 2010-12-01 08:26

# User Input
This cell prompts the user to enter the customer ID and recommendation date.

In [3]:
# User input for customer ID and recommendation date
customer_id = input("Enter the Customer ID: ")
recommendation_date = input("Enter the Recommendation Date (dd/mm/yy): ")


Enter the Customer ID: 17850
Enter the Recommendation Date (dd/mm/yy): 01/12/10
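
Both values arrive from `input()` as plain strings, so they need converting before they can be compared with the `CustomerID` and `InvoiceDate` columns. The sketch below shows one way to do that with the standard library. The helper name `parse_inputs` is hypothetical and not part of the original notebook, and it assumes the date is always typed in the dd/mm/yy format the prompt asks for.

```python
from datetime import date, datetime
from typing import Tuple


def parse_inputs(customer_id_text: str, date_text: str) -> Tuple[int, date]:
    """Convert the raw input strings into an integer ID and a date object.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    customer_id = int(customer_id_text.strip())
    # %d/%m/%y matches the dd/mm/yy format requested by the prompt.
    recommendation_date = datetime.strptime(date_text.strip(), "%d/%m/%y").date()
    return customer_id, recommendation_date


customer_id, recommendation_date = parse_inputs("17850", "01/12/10")
print(customer_id, recommendation_date)  # 17850 2010-12-01
```

A malformed ID or date raises `ValueError`, which can be caught to show the user a friendly message instead of a traceback.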


# Filter Dataset and Extract Purchased Product Descriptions
This cell filters the dataset by the given customer ID and recommendation date and checks whether any matching rows exist. If they do, it extracts the unique descriptions of the products the customer purchased.


In [5]:
# Filter the dataset for the given customer ID and recommendation date
rec_date = pd.to_datetime(recommendation_date, format='%d/%m/%y')
customer_data = data[
    (data['CustomerID'] == int(customer_id))
    & (data['InvoiceDate'].dt.normalize() == rec_date)
]

if customer_data.empty:
    print("No data found for the given customer ID and date.")
else:
    # Get the unique descriptions of the products purchased by the customer
    purchased_product_descriptions = customer_data['Description'].unique()


# TF-IDF Vectorization
This cell creates a sparse matrix representation of all product descriptions in the dataset using TF-IDF vectorization.
 

In [6]:
# Create a sparse matrix of product descriptions using TF-IDF vectorization
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(data['Description'])
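
To make the weighting concrete, here is a minimal pure-Python sketch of the computation, assuming the defaults of `TfidfVectorizer` as I understand them: lowercasing, tokens of two or more word characters, smoothed IDF, and L2 normalisation. The function `tfidf_vectors` is hypothetical and only illustrates the idea. The notebook itself relies on scikit-learn, which returns a sparse matrix rather than dictionaries.

```python
import math
import re
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, L2-normalised."""
    # Lowercase and keep tokens made of two or more word characters.
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        # Scale to unit length; fall back to 1.0 for an empty document.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


docs = [
    "WHITE METAL LANTERN",
    "WHITE HANGING HEART T-LIGHT HOLDER",
    "HAND WARMER UNION JACK",
]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

In the first description, `white` appears in two of the three documents and therefore receives a lower weight than `metal` or `lantern`, which each appear only once. This is why rare, distinctive words dominate the similarity between products.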


# Apply K-Means Clustering and Obtain Cluster Labels for Purchased Products
This cell applies K-Means clustering on the sparse matrix to group similar products and obtains the cluster labels for the purchased products by the customer.

In [None]:
# Apply K-Means clustering to group similar products
num_clusters = 5  # Number of clusters for K-Means
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
kmeans.fit(tfidf_matrix)

# Get the cluster labels for the purchased products
purchased_product_labels = kmeans.predict(vectorizer.transform(purchased_product_descriptions))


In the above code, K-Means clustering is applied on the sparse matrix to group similar products. The specified number of clusters (5 in this case) is used. Then, the cluster labels are predicted for the purchased products by the customer using the trained K-Means model.
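
The fit-then-predict pattern can be tried on a handful of descriptions before running it on the full dataset. The sketch below is a toy illustration, not the notebook's actual run: the descriptions are a small sample (a few are made up for the example), `n_clusters=2` is chosen only because the toy data has two obvious groups, and `n_init=10` is set explicitly because its default differs between scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy descriptions: two obvious groups (hand warmers and t-light holders).
descriptions = [
    "HAND WARMER UNION JACK",
    "HAND WARMER RED POLKA DOT",
    "HAND WARMER OWL DESIGN",
    "GLASS STAR FROSTED T-LIGHT HOLDER",
    "WHITE HANGING HEART T-LIGHT HOLDER",
    "ROSE PINK T-LIGHT HOLDER",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(descriptions)

# Fit K-Means on the TF-IDF matrix of all descriptions.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(tfidf_matrix)

# Reuse the fitted vectorizer so the new text maps onto the same vocabulary.
new_label = kmeans.predict(vectorizer.transform(["HAND WARMER SCOTTY DOG"]))[0]

# Products that share a cluster with the new description.
cluster_mates = [d for d, label in zip(descriptions, kmeans.labels_) if label == new_label]
print(cluster_mates)
```

Words that the vectorizer never saw during fitting are ignored at prediction time, so the new description is placed using only the vocabulary it shares with the training data.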






# Find Products in Same Clusters
This cell finds products in the same clusters as the purchased products.


In [None]:
# Find products in the same clusters as the purchased products
recommended_products = []
for label in set(purchased_product_labels):  # Visit each cluster only once
    cluster_products = data[kmeans.labels_ == label]['Description'].unique()
    recommended_products.extend(cluster_products)

recommended_products = list(set(recommended_products))  # Remove duplicates
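
Because the purchased products belong to their own clusters, the list built above also contains the items the customer has already bought. One possible refinement, sketched here in plain Python, is to filter those out and sort the result for stable output. The function `recommend_from_clusters` and the hand-assigned labels are hypothetical. Unlike the notebook, which gets labels from `kmeans.predict`, this sketch simply looks up the label stored next to each description.

```python
def recommend_from_clusters(descriptions, labels, purchased):
    """Return cluster-mates of the purchased items, excluding the purchases."""
    purchased = set(purchased)
    # Clusters that contain at least one purchased product.
    target_clusters = {
        label for desc, label in zip(descriptions, labels) if desc in purchased
    }
    # Other products from those clusters; the set removes duplicate rows.
    candidates = {
        desc
        for desc, label in zip(descriptions, labels)
        if label in target_clusters and desc not in purchased
    }
    return sorted(candidates)


# One description per row, with a made-up cluster label for each row.
descriptions = [
    "HAND WARMER UNION JACK",
    "HAND WARMER RED POLKA DOT",
    "WHITE METAL LANTERN",
    "GLASS STAR FROSTED T-LIGHT HOLDER",
    "HAND WARMER UNION JACK",
]
labels = [0, 0, 1, 1, 0]

print(recommend_from_clusters(descriptions, labels, ["HAND WARMER UNION JACK"]))
# ['HAND WARMER RED POLKA DOT']
```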


# Display Recommended Products

This cell displays the recommended products.


In [None]:
if not recommended_products:
    print("No recommended products found.")
else:
    # Display the recommended products
    print("Recommended Products:")
    for product in recommended_products:
        print(f"- {product}")


The code applies K-Means clustering on the whole dataset's product descriptions and recommends similar products to the customer based on the clusters to which their purchased products belong.