<a href="https://colab.research.google.com/github/WazaCraft/framework/blob/main/REL_deal_id_api_01077.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deal-Identification-ID-API 0.1.7.7 (hotfix)
## LLM Knowledge Assistant with Customizable Prompt Wrapper for Deal Identification


> Rel: July 2, 2023 Version: 0.1.7




In [None]:
# LLM Knowledge Assistant with Customizable Prompt Wrapper for Deal Identification
# Author: Johnathan Greenaway
# Organization: StackCommerce Inc.
# Release Date: July 2, 2023
#
# Changelog
# 0.1.7.7
#   - Fixed Flask response route and reference
#   - Serious error revisions
# 0.1.7.5
#   - Fixed embedding naming
# 0.1.7.4
#   - Reintroduced Pickle
# 0.1.7.3
#   - Added commands to enable / disable Flask server for REST API
#   - Added reconfig server port path
# 0.1.7.2
#   - Added Flask
#
# To do: add vector selection
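# The code below uses the pre-1.0 OpenAI SDK interface
# (openai.Embedding.create / openai.ChatCompletion.create), so pin the package below 1.0.
!pip install "openai<1.0"
!pip install requests
!pip install beautifulsoup4
!pip install scikit-learn
!pip install Flask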

Please enter your OpenAI API key: [redacted]
Daily data refreshed. Now browsing 75+ deal feeds.
Enter URL or question or 'deal-id up' or 'deal-id down' (or 'exit' to quit): recommend a deal for beauty experts
Sure, I'd recommend the "FOREO LUNA mini 2 Facial Cleansing Brush, Gentle Exfoliation and Sonic Cleansing for All Skin Types". 

As of now, It's a great deal with a significant discount:

- Current Price: $76.30
- List Price: $119.00
- Avg. Price: ~$97.24

This round shape silicon brush is a revolution in cleaning face and removing hard makeup. Plus, it’s suitable for all skin types. It's an innovative skincare device that delivers deep-cleansing via T-Sonic pulsations and is designed to provide a professional level of skincare with its invigorating one-minute ritual.

You can buy it from [Amazon.com](https://camelcamelcamel.com/product/0UV10HH48P1IXLP4DJLCW/go?context=popular)

Please note that prices are subject to change. Always check th

In [None]:
import os
import requests
import openai
import datetime
import numpy as np
import pickle
import socket
import threading
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from sklearn.metrics.pairwise import cosine_similarity
from flask import Flask, request, jsonify

app = Flask(__name__)

openai_api_key = input("Please enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ['USER_PROMPT'] = 'Here is the info from the text: {content}. Based on this, what is the answer to "{question}"?'

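def chunk_text(text, max_tokens=8000):
    # Note: despite its name, max_tokens limits the chunk length in characters
    # (word lengths plus one separating space each), not in model tokens.
    words = text.split()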
    chunks = []
    current_chunk = []
    current_length = 0
    for word in words:
        if current_length + len(word) + 1 > max_tokens:
            chunks.append(' '.join(current_chunk))
            current_chunk = []
            current_length = 0
        current_chunk.append(word)
        current_length += len(word) + 1
    if current_chunk:
        chunks.append(' '.join(current_chunk))
    return chunks

def get_embedding_for_large_text(text):
    chunks = chunk_text(text)
    embeddings = []
    for chunk in chunks:
        response = openai.Embedding.create(input=chunk, model="text-embedding-ada-002")
        embedding = response['data'][0]['embedding']
        embeddings.append(embedding)
    return embeddings

def create_file_name(url, extension='txt'):
    parsed_url = urlparse(url)
    url_path_parts = parsed_url.path.strip('/').split('/')
    last_part = url_path_parts[-1] if url_path_parts else parsed_url.netloc
    current_date = datetime.datetime.now().strftime("%Y-%m-%d")
    return f"{last_part}-{current_date}.{extension}"

def get_most_similar_text_chunk(question, embeddings_dict):
    question_embedding = get_embedding_for_large_text(question)[0]
    similarity_scores = []
    for text_chunk_embedding in embeddings_dict['embeddings']:
        similarity_scores.append(cosine_similarity([question_embedding], [text_chunk_embedding])[0][0])
    most_similar_index = np.argmax(similarity_scores)
    return embeddings_dict['text_chunks'][most_similar_index]

def generate_response(question, embeddings_dict):
    similar_text_chunk = get_most_similar_text_chunk(question, embeddings_dict)
    user_prompt = os.environ['USER_PROMPT'].format(content=similar_text_chunk, question=question)
    messages = [
        {"role": "system", "content": "You are a knowledgeable assistant."},
        {"role": "user", "content": user_prompt}
    ]
    try:
        response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        assistant_reply = response['choices'][0]['message']['content']
        return assistant_reply
    except Exception as e:
        return str(e)

def extract_and_save_urls(html_content, file):
    soup = BeautifulSoup(html_content, 'html.parser')
    for link in soup.find_all('a'):
        url = link.get('href')
        if url:
            file.write(url + '\n')

def save_embeddings_to_file(embeddings_dict, file_name):
    with open(file_name, 'wb') as file:
        pickle.dump(embeddings_dict, file)

def load_embeddings_from_file(file_name):
    with open(file_name, 'rb') as file:
        return pickle.load(file)

embeddings_dict = {}

url = 'https://www.rssground.com/services/rss-converter/64a0a74cd5ee7/RSS-Payload'
response = requests.get(url)
text = response.text
file_name = create_file_name(url)

with open(file_name, 'w') as file:
    file.write(text)
    extract_and_save_urls(text, file)

embeddings = get_embedding_for_large_text(text)
chunks = chunk_text(text)
embeddings_file_name = create_file_name(url, extension='pkl')
embeddings_dict[embeddings_file_name] = {'text_chunks': chunks, 'embeddings': embeddings}
save_embeddings_to_file(embeddings_dict, embeddings_file_name)

print("Daily data refreshed. Now browsing 75+ deal feeds.")

@app.route('/ask', methods=['GET'])
def ask_question():
    question = request.args.get('question')
    if question:
        responses = []
        for embeddings_file_name in embeddings_dict.keys():
            response = generate_response(question, embeddings_dict[embeddings_file_name])
            responses.append(response)
        return jsonify(responses)
    # Return HTTP 400 so clients can tell a missing parameter from a successful answer.
    return jsonify({"error": "No question provided"}), 400

# Serve the app through Werkzeug's make_server so the API can be stopped from this loop.
# app.run() exposes no shutdown hook, and the app defines no /shutdown route.
from werkzeug.serving import make_server

api_server = None

def run_web_api(server):
    server.serve_forever()

def is_port_in_use(port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex(('localhost', port)) == 0

api_thread = None

while True:
    user_input = input("Enter URL or question or 'deal-id up' or 'deal-id down' (or 'exit' to quit): ")

    if user_input.lower() == 'exit':
        break
    elif user_input.lower() == 'deal-id up':
        if api_thread is None or not api_thread.is_alive():
            port = 5000
            while is_port_in_use(port):
                port = int(input(f"Port {port} is in use. Please enter a different port: "))
            api_server = make_server('127.0.0.1', port, app, threaded=True)
            api_thread = threading.Thread(target=run_web_api, args=(api_server,))
            api_thread.daemon = True
            api_thread.start()
            print(f"Deal ID API running on http://127.0.0.1:{port}")
        else:
            print("Server is already running")

    elif user_input.lower() == 'deal-id down':
        if api_thread and api_thread.is_alive():
            print("Stopping the server.")
            api_server.shutdown()      # ends serve_forever() in the server thread
            api_thread.join()
            api_server.server_close()  # releases the listening socket
        else:
            print("Server is not running")

    elif user_input.lower().startswith('http'):
        url = user_input
        response = requests.get(url)
        text = response.text
        file_name = create_file_name(url)

        with open(file_name, 'w') as file:
            file.write(text)
            extract_and_save_urls(text, file)

        embeddings = get_embedding_for_large_text(text)
        chunks = chunk_text(text)
        embeddings_file_name = create_file_name(url, extension='pkl')
        embeddings_dict[embeddings_file_name] = {'text_chunks': chunks, 'embeddings': embeddings}
        save_embeddings_to_file(embeddings_dict, embeddings_file_name)

    else:
        question = user_input
        for embeddings_file_name in embeddings_dict.keys():
            response = generate_response(question, embeddings_dict[embeddings_file_name])
            print(response)


Please enter your OpenAI API key: [redacted]
Daily data refreshed. Now browsing 75+ deal feeds.
Enter URL or question or 'deal-id up' or 'deal-id down' (or 'exit' to quit): deal-id up
Port 5000 is in use. Please enter a different port: 7000
 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:7000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [03/Jul/2023 06:03:56] "GET /ask?question=What%20are%20the%20latest%20deals%20available? HTTP/1.1" 200 -


KeyboardInterrupt: ignored

# Deal ID API Documentation
## Overview
This API lets you query the Deal ID service, which answers questions from a dataset of 75+ deal feeds. The data is refreshed daily.

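## Endpoint
`GET /ask`

### Description
Answers a question using the indexed text dataset. For each stored set of embeddings, the server selects the text chunk most similar to the question and passes that chunk, together with the question, to the chat model.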
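
The retrieval step behind this endpoint can be sketched with toy vectors. The script itself obtains embeddings from OpenAI and compares them with scikit-learn's `cosine_similarity`; the sketch below swaps in hand-written three-dimensional vectors and a pure-Python cosine so that it runs without an API key. The names `cosine` and `most_similar_chunk` and all sample data are illustrative assumptions, not part of the script.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_chunk(question_vec, chunk_vecs, chunks):
    """Return the chunk whose vector has the highest cosine similarity to the question."""
    scores = [cosine(question_vec, vec) for vec in chunk_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return chunks[best_index]

# Hypothetical toy data standing in for real embeddings.
chunks = ["beauty deals", "laptop deals", "kitchen deals"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 0.9]]
question_vec = [0.8, 0.2, 0.1]

print(most_similar_chunk(question_vec, chunk_vecs, chunks))  # beauty deals
```

Only the single best chunk is forwarded to the chat model, so an answer can draw on at most one chunk per stored dataset.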

### Parameters
| Name | Type | In | Description |
| --- | --- | --- | --- |
| question | string | query | The question you want to ask based on the text dataset. |
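
Because `question` travels in the query string, spaces and punctuation need percent-encoding. A minimal sketch using only the standard library follows; the helper name `build_ask_url` is a hypothetical convenience, not something the script provides.

```python
from urllib.parse import urlencode, quote

def build_ask_url(base_url, question):
    """Return the /ask URL with the question percent-encoded as a query parameter."""
    # quote_via=quote encodes spaces as %20 instead of urlencode's default "+".
    query = urlencode({"question": question}, quote_via=quote)
    return f"{base_url.rstrip('/')}/ask?{query}"

print(build_ask_url("http://localhost:5000", "What are the latest deals available?"))
# http://localhost:5000/ask?question=What%20are%20the%20latest%20deals%20available%3F
```

Clients such as `requests` perform this encoding themselves when the question is passed through `params`, so the helper is mainly useful when a URL has to be built by hand.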
### Responses
#### 200 OK
Successful response.

Content: a JSON array containing the answers to the question based on the data available in the text dataset.

#### 400 Bad Request
Occurs when no `question` parameter is provided in the request.

Content: a JSON object containing an error message.
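
A client can tell the two documented outcomes apart by status code and payload shape. The sketch below assumes the response body is always JSON, as described above; `parse_ask_response` is a hypothetical helper name, and the `error` key matches the message the script returns when no question is given.

```python
import json

def parse_ask_response(status_code, body):
    """Return the list of answers, or raise ValueError carrying the server's error message."""
    payload = json.loads(body)
    if status_code == 200 and isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        message = payload.get("error", "unknown error")
    else:
        message = "unexpected payload"
    raise ValueError(f"/ask failed with status {status_code}: {message}")

print(parse_ask_response(200, '["The latest deals available are ..."]'))

try:
    parse_ask_response(400, '{"error": "No question provided"}')
except ValueError as exc:
    print(exc)
```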
### Example
#### Request

    GET http://localhost:5000/ask?question=What%20are%20the%20latest%20deals%20available?

#### Response

    [
        "The latest deals available are ...",
        "Another source suggests that ..."
    ]

## Usage Instructions
### Python Script
Use the `requests` library in Python:

    import requests

    # Define the endpoint URL
    url = "http://localhost:5000/ask"

    # Define the question you want to ask
    question = "What are the latest deals available?"

    # Make the GET request
    response = requests.get(url, params={"question": question})

    # Check if the request was successful
    if response.status_code == 200:
        # Print the answer
        print(response.json())
    else:
        # Print error message
        print("Error:", response.json())

### Command Line (using curl)
Use the `curl` command in the terminal or command prompt:

    curl "http://localhost:5000/ask?question=What%20are%20the%20latest%20deals%20available?"

### Web Browser
Enter the URL directly in the address bar of your web browser:

    http://localhost:5000/ask?question=What%20are%20the%20latest%20deals%20available?

Make sure the server is running before you send a request: run the script and enter `deal-id up` in the console. If port 5000 is already in use, the script prompts for a different port; use that port in place of 5000 in the URLs above.
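
If you would rather pick a port automatically than answer the prompt, something along these lines could work. It reuses the same connect-based check as the script's `is_port_in_use`; `find_free_port` and its `attempts` parameter are hypothetical additions, and a port reported as free can still be taken by another process before the server binds it.

```python
import socket

def is_port_in_use(port, host="localhost"):
    """True if something accepts TCP connections on host:port (same check as the script)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

def find_free_port(start=5000, attempts=50):
    """Return the first port in [start, start + attempts) that nothing is listening on."""
    for port in range(start, start + attempts):
        if not is_port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")

print(find_free_port())
```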