# ChatBot Using DL / NLP

In [1]:
import urllib.request
from bs4 import BeautifulSoup

# Define the URL of the webpage to read
url = 'https://en.wikipedia.org/wiki/Chatbot'

# Fetch the webpage content; the context manager closes the connection afterwards
with urllib.request.urlopen(url, timeout=10) as response:
    html = response.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Collect the text of every <p> element on the page
text_data = []
for paragraph in soup.find_all('p'):
    text_data.append(paragraph.get_text())

# Join the extracted text into a single string
full_text = '\n'.join(text_data)

# Print the extracted text
print(full_text)




A chatbot (originally chatterbot)[1] is a software application or web interface that is designed to mimic human conversation through text or voice interactions.[2][3][4] Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.

Since late 2022, the field has gained widespread attention due to the popularity of OpenAI's ChatGPT,[5][6] followed by alternatives such as Microsoft's Copilot and Google's Gemini.[7] Such examples reflect the recent practice of basing such products upon broad foundational large language models, such as GPT-4 or the Gemini language model, that get fine-tuned so as to target specific tasks or applications (i.e., simulating human conversation, in the case of chatbot

In [2]:
import urllib.request
from bs4 import BeautifulSoup
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Define the URL of the webpage to read
url = 'https://en.wikipedia.org/wiki/Chatbot'

# Fetch the webpage content; the context manager closes the connection afterwards
with urllib.request.urlopen(url, timeout=10) as response:
    html = response.read()

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all text data within the paragraphs of the webpage
text_data = []
for paragraph in soup.find_all('p'):
    text_data.append(paragraph.get_text())

# Join the extracted text into a single string
full_text = '\n'.join(text_data)

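# Clean the text: strip bracketed citation markers such as [1], then collapse whitespace
cleaned_text = re.sub(r'\[[0-9]+\]', ' ', full_text)  # Remove citation numbers
cleaned_text = re.sub(r'\s+', ' ', cleaned_text)  # Collapse runs of whitespace (including newlines) into one space

# Naively split the text into sentences on '.', '?' and '!'
# (this also splits inside abbreviations such as "i.e."); drop empty fragments
sentences = [s for s in re.split(r'[.?!]', cleaned_text) if s.strip()]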

# Initialize TF-IDF Vectorizer
tfidf_vectorizer = TfidfVectorizer()

# Fit and transform the sentences into TF-IDF vectors
tfidf_matrix = tfidf_vectorizer.fit_transform(sentences)
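
Splitting on every '.', '?' and '!' also breaks text inside abbreviations such as "i.e.". A minimal alternative sketch follows, assuming a sentence ends at terminal punctuation followed by whitespace and then an uppercase letter, digit, or opening quote. The helper name `split_sentences` and the sample text are hypothetical and not part of the notebook; the heuristic will still miss some cases (for example, "Dr. Smith").

```python
import re

def split_sentences(text):
    """Split text into sentences, keeping the terminal punctuation."""
    # Remove bracketed citation markers such as [2], then collapse whitespace
    cleaned = re.sub(r'\[[0-9]+\]', ' ', text)
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()
    # Split only where ., ? or ! is followed by whitespace and a likely sentence start
    parts = re.split(r'(?<=[.?!])\s+(?=[A-Z0-9"])', cleaned)
    return [p.strip() for p in parts if p.strip()]

sample = ('A chatbot mimics human conversation.[2][3] Models get fine-tuned for '
          'specific tasks (i.e., simulating human conversation). Is that new? No! '
          'Simpler chatbots have existed for decades.')

for sentence in split_sentences(sample):
    print(sentence)
```

Because the punctuation is kept, the resulting sentences can be shown to the user unchanged, and the list can be passed to `TfidfVectorizer.fit_transform` in place of `sentences`.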


In [3]:
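def find_matching_response(user_query, sentences, tfidf_vectorizer, tfidf_matrix):
    """Return the sentence most similar to user_query, or None if nothing overlaps."""
    # Transform user query into a TF-IDF vector
    query_vector = tfidf_vectorizer.transform([user_query])

    # Cosine similarity between the query and every sentence; shape (1, n_sentences)
    similarities = cosine_similarity(query_vector, tfidf_matrix)

    # Find the index of the most similar sentence
    most_similar_index = similarities.argmax()

    # A best score of 0 means the query shares no vocabulary with any sentence;
    # without this check argmax would silently return the first sentence
    if similarities[0, most_similar_index] == 0:
        return None

    # Return the most similar sentence as the response
    return sentences[most_similar_index]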

# Example usage
user_query = "What is Computing Machinery and Intelligence?"
matching_response = find_matching_response(user_query, sentences, tfidf_vectorizer, tfidf_matrix)
print("Response:", matching_response)


Response:  In 1950, Alan Turing's famous article "Computing Machinery and Intelligence" was published, which proposed what is now called the Turing test as a criterion of intelligence
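
To make the retrieval step less of a black box, here is a minimal standard-library sketch of the same idea: weight terms by TF-IDF, then rank sentences by cosine similarity. It assumes a simplified weighting (raw term count times `log(N / df) + 1`), so it will not reproduce scikit-learn's exact scores. The names `tokenize`, `build_index`, `cosine` and `best_match` and the three-sentence corpus are hypothetical, for illustration only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens of two or more characters."""
    return re.findall(r'[a-z0-9]{2,}', text.lower())

def build_index(sentences):
    """Return (idf, vectors): IDF weights and one sparse TF-IDF dict per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term occurs
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(query, sentences, idf, vectors):
    """Return the best-scoring sentence, or None when no term overlaps."""
    # Terms unseen at indexing time carry no weight and are ignored
    counts = Counter(t for t in tokenize(query) if t in idf)
    q = {t: tf * idf[t] for t, tf in counts.items()}
    scores = [cosine(q, v) for v in vectors]
    if not scores or max(scores) == 0.0:
        return None
    return sentences[scores.index(max(scores))]

corpus = [
    "A chatbot is a software application designed to mimic human conversation",
    "Such chatbots often use deep learning and natural language processing",
    "Alan Turing proposed what is now called the Turing test as a criterion of intelligence",
]
idf, vectors = build_index(corpus)
print(best_match("Who proposed the Turing test?", corpus, idf, vectors))
print(best_match("quantum gravity", corpus, idf, vectors))
```

The second query shares no vocabulary with the corpus, so the sketch returns `None` instead of an arbitrary sentence; a caller can turn that into a fallback reply.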
