This notebook follows this [article](https://medium.com/data-science-in-your-pocket/recommendation-systems-using-langchain-and-llms-with-codes-d3c4c4e66732) on building recommendation systems with LangChain and LLMs.

In [3]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [1]:
import numpy as np
import pandas as pd

# Define the number of users and unique items
num_users = 1000
num_items = 20

# Generate random user IDs and item IDs
user_ids = np.arange(1, num_users + 1)
item_ids = np.arange(1, num_items + 1)

# Create random interaction data
data = {
    'user_id': np.random.choice(user_ids, size=num_users * 10),
    'item_id': np.random.choice(item_ids, size=num_users * 10),
}

# Create a pandas DataFrame from the data
df = pd.DataFrame(data).drop_duplicates()

# Display the first few rows of the generated data
print(df.head())

   user_id  item_id
0      355       15
1      787        2
2      651       13
3      119       15
4      129       19


In [2]:
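#grouping all interactions by a user as a list
df = df.groupby('user_id')['item_id'].agg(list).reset_index()

#encoding each user's history as num_items slots:
#slot y holds item id y+1 if the user interacted with it, otherwise 0
df['item_id'] = df['item_id'].transform(lambda x: [y + 1 if y + 1 in x else 0 for y in range(num_items)])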

#save csv
df.to_csv('dummy_data.csv',index=False)
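The cell above stores, for each user, a list of `num_items` slots in which slot y holds the item id y+1 if the user interacted with that item and 0 otherwise, so it is an id-valued multi-hot vector rather than a strict one-hot encoding. The sketch below reproduces that encoding on a hypothetical five-item toy log and, for comparison, builds the conventional 0/1 multi-hot matrix with `pd.crosstab`, an alternative the article does not use.

```python
import pandas as pd

NUM_ITEMS = 5  # hypothetical small catalogue; the notebook uses num_items = 20

# Toy interaction log: user 1 saw items 2 and 4, user 2 saw item 1 twice.
interactions = pd.DataFrame({"user_id": [1, 1, 2, 2], "item_id": [2, 4, 1, 1]})
interactions = interactions.drop_duplicates()

# Same encoding as the notebook: slot y holds y+1 if item y+1 was seen, else 0.
seen = interactions.groupby("user_id")["item_id"].agg(list)
id_encoded = seen.apply(lambda items: [y + 1 if y + 1 in items else 0 for y in range(NUM_ITEMS)])

# Conventional 0/1 multi-hot matrix: count user/item pairs, add missing item
# columns, then cap counts at 1.
multi_hot = (
    pd.crosstab(interactions["user_id"], interactions["item_id"])
    .reindex(columns=range(1, NUM_ITEMS + 1), fill_value=0)
    .clip(upper=1)
)

print(id_encoded.to_dict())
print(multi_hot.values.tolist())
```

Keeping item ids in the slots, as the notebook does, may make the ids easier for the LLM to read straight from the CSV text, while the 0/1 form is what similarity measures such as cosine or Jaccard usually expect.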

In [11]:
from langchain.chains import RetrievalQA
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

In [4]:
from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
llm

Ollama(model='llama3')

In [9]:
# For printing markdown text
# StackOverflow: https://stackoverflow.com/a/32035217
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))

In [8]:
output = llm.invoke("how can langsmith help with testing?")
printmd(output)

Langsmith, as a language model, can be used in various ways to assist with testing:

1. **Automated Testing**: Langsmith can generate test data and scenarios for automated testing frameworks like Selenium or Cypress. This saves developers time and reduces the need for manual testing.
2. **Test Data Generation**: Langsmith can create realistic test data (e.g., user input, product descriptions) to help populate test cases. This ensures that tests cover a wide range of scenarios and edge cases.
3. **Natural Language Processing (NLP) Testing**: Langsmith's NLP capabilities can be used to test the accuracy of natural language processing models, such as text classification, sentiment analysis, or named entity recognition.
4. **Error Message Generation**: Langsmith can generate error messages that mimic real-world scenarios, helping developers test error handling and recovery mechanisms in their code.
5. **Test Case Development**: Langsmith's ability to generate text based on prompts can be used to develop test cases for software applications. For example, generating test cases for a chatbot or virtual assistant.
6. **Code Review**: Langsmith can help with code review by analyzing code snippets and providing feedback on syntax, semantics, and best practices, helping developers catch errors and improve their code quality.

To get started with using Langsmith for testing, you can:

* Use the Langsmith API to integrate it with your testing framework or script.
* Generate test data or scenarios using the Langsmith web interface or command-line tool.
* Train a custom Langsmith model for specific testing purposes (e.g., generating error messages or test cases).

By leveraging Langsmith's language processing capabilities, you can streamline your testing process, reduce errors, and improve code quality.

In [14]:
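#data loader: each CSV row (one user) becomes one Document
loader = CSVLoader(file_path="dummy_data.csv")
data = loader.load()

#text splitter: each row is far shorter than chunk_size, so rows stay whole
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

#embedding model served locally by Ollama (any LangChain embeddings class can be swapped in)
embeddings = OllamaEmbeddings(model='llama3')
#vector store: FAISS index over the row embeddings
docsearch = FAISS.from_documents(texts, embeddings)

#RetrievalQA chain: retrieve similar rows, "stuff" them into one prompt, and query the LLM
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())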
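`CSVLoader` turns every row of `dummy_data.csv` into one `Document`, and, to our understanding, its `page_content` consists of one `column: value` line per column. The standard-library sketch below approximates that conversion on a hypothetical two-row, five-slot CSV, so it is easy to see what text actually gets embedded; it is not LangChain's implementation.

```python
import csv
import io

# Hypothetical two-row stand-in for dummy_data.csv (the real lists have 20 slots).
# pandas quotes the list column because the list text contains commas.
csv_text = 'user_id,item_id\n1,"[0, 2, 0, 4, 0]"\n2,"[1, 0, 0, 0, 0]"\n'

# Approximation of a CSV loader: one text document per row,
# with one "column: value" line per column joined by newlines.
docs = [
    "\n".join(f"{key}: {value}" for key, value in row.items())
    for row in csv.DictReader(io.StringIO(csv_text))
]

for doc in docs:
    print(doc)
    print("---")
```

With 20 slots per list, each row is on the order of 100 characters, far below `chunk_size=1000`, so `CharacterTextSplitter` should leave every row as its own chunk.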

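`FAISS.from_documents` embeds each row and stores the vectors in a FAISS index; as far as we know, LangChain's default is an exact (flat) L2 index, and `as_retriever()` returns the 4 nearest chunks unless `search_kwargs` says otherwise. The NumPy sketch below imitates that brute-force nearest-neighbour search on made-up 3-dimensional vectors; the function `flat_l2_search` is hypothetical and is not FAISS itself.

```python
import numpy as np


def flat_l2_search(index_vectors: np.ndarray, query: np.ndarray, k: int = 4) -> np.ndarray:
    """Return indices of the k stored vectors closest to `query` by squared L2 distance."""
    # Squared Euclidean distance from the query to every stored vector (brute force).
    distances = np.sum((index_vectors - query) ** 2, axis=1)
    # Ascending sort puts the nearest vectors first; "stable" keeps ties in insertion order.
    return np.argsort(distances, kind="stable")[:k]


# Hypothetical 3-d "embeddings" for five documents (real Ollama embeddings are much larger).
vectors = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
])
query = np.array([1.0, 0.0, 0.0])
print(flat_l2_search(vectors, query, k=2))
```

If the default of 4 holds, the LLM sees only a handful of the roughly 1000 user rows per question, and nothing guarantees that the row for the user named in the question is among them, because retrieval ranks rows by embedding similarity rather than by an exact `user_id` match.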
In [15]:
qa

RetrievalQA(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"), llm=Ollama(model='llama3')), document_variable_name='context'), retriever=VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7f4e894d1b50>))
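The repr shows that `chain_type="stuff"` fills a single prompt template: the retrieved chunks become `{context}` and the question becomes `{question}`. The sketch below rebuilds that prompt by hand with the template copied from the repr; joining chunks with a blank line is an assumption about the chain's document separator, and `stuff_prompt` is a hypothetical helper.

```python
# Prompt template copied from the RetrievalQA repr above.
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def stuff_prompt(retrieved_texts: list[str], question: str) -> str:
    """'Stuff' strategy: concatenate all retrieved chunks into one context block."""
    context = "\n\n".join(retrieved_texts)  # assumed separator: one blank line
    return TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt(
    ["user_id: 1\nitem_id: [0, 2, 0, 4, 0]", "user_id: 2\nitem_id: [1, 0, 0, 0, 0]"],
    "Which items has user 1 seen?",
)
print(prompt)
```

Because everything goes into one prompt, the LLM can only reason over the few rows the retriever returned, not over the whole CSV.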

In [17]:
print(qa.run('Suggest 2 articles to user-id 78 using given data which it has not seen. '
             'Follow this approach: 1: find similar users and 2: suggest new articles from similar users. '
             'Also give a reason for suggestion').split('.'))

["To suggest two articles to user-id 78, I'll follow the approach of finding similar users and suggesting new articles from those similar users", "\n\nFirst, I'll identify the most frequent item_id values across all users:\n\n* [1], [2], [6], [7], [8], [9], [12], [13], [15], [16], and [19] are present in multiple user_id values", ' These items seem to be popular among users', "\n\nNext, I'll find similar users to user-id 78 by looking for users who have purchased similar items:\n\n* User-id 98 has purchased item_id [1], which is also present in the data of user-id 2 and user-id 10", '\n* User-id 555 has purchased item_id [6] and [19], both of which are present in the data of user-id 10', "\n\nBased on this analysis, I'll suggest two articles to user-id 78 that have not been seen before:\n\n1", ' Item_id: 14 (reason: Users 98 and 10 have similar tastes, and user-id 98 has not purchased item_id 14)\n2', ' Item_id: 21 (reason: User-id 555 has similar interests to users 10 and 2, and user-
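The prompt asks the LLM to perform user-based collaborative filtering in prose, and the answer is hard to verify: for example, it recommends item_id 21, although only ids 1 to 20 were generated. A deterministic baseline makes such answers easy to check. The sketch below is a swapped-in technique, not the article's method: it ranks neighbours by Jaccard similarity of their item sets and scores each unseen item by the summed similarity of the neighbours who saw it. The helpers `jaccard` and `recommend` and the toy `history` are hypothetical.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(history: dict[int, set[int]], user: int,
              n_neighbors: int = 2, n_items: int = 2) -> list[int]:
    """User-based collaborative filtering.

    1) Rank the other users by Jaccard similarity to `user`.
    2) Score items the user has not seen by the summed similarity of the
       top neighbours who saw them.
    """
    seen = history[user]
    neighbours = sorted(
        (other for other in history if other != user),
        key=lambda other: jaccard(seen, history[other]),
        reverse=True,
    )[:n_neighbors]
    scores: dict[int, float] = {}
    for other in neighbours:
        sim = jaccard(seen, history[other])
        for item in history[other] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by smaller item id so results are deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n_items]


# Hypothetical interaction histories (sets of item ids), not the notebook's random data.
history = {
    78: {1, 2, 3},
    10: {1, 2, 3, 7},
    98: {2, 3, 7, 9},
    555: {15, 16},
}
print(recommend(history, 78))
```

To run it on the notebook's data, a history dict could be built from the encoded frame with something like `{int(u): {int(i) for i in row if i} for u, row in zip(df.user_id, df.item_id)}`, since the non-zero slots hold item ids. The neighbour similarities also give a concrete "reason" for each suggestion.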

In [18]:
########### DONE ####################