<a href="https://colab.research.google.com/github/hamzaharmanhusni/ProjectSkillAcademyPro/blob/main/Projectlangchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project: Langchain

**Instructions for Students:**

Please carefully follow these steps to complete and submit your assignment:

1. **Completing the Assignment**: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.
   
2. **Creating a Google Drive Folder**: If you don't already have a folder for collecting assignments, create a new one in your Google Drive. This folder will hold all your completed assignment files, keeping your work organized and easy to access.
   
3. **Uploading Completed Assignment**: When you finish the assignment, upload all necessary files, including code, reports, and related documents, to that Google Drive folder. Save the folder link in the 'Student Identity' section and also pass it as the last parameter of the provided `submit` function.
   
4. **Sharing Folder Link**: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.
   
5. **Setting Permission to Public**: Please make sure your **Google Drive folder is set to public**. This allows your instructor to access your solutions and assess your work correctly.

Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.

**Description:**

Welcome to your project assignment on LangChain. This project will give you hands-on experience and a deeper understanding of the concepts you have learned. You will work with the novel `Pride and Prejudice` by Jane Austen:

* In text file format (.txt) as your source of data: https://www.gutenberg.org/cache/epub/1342/pg1342.txt
* Alternatively, you can use the HTML version: http://authorama.com/book/pride-and-prejudice.html

Your task is to:

* Create a chatbot that receives a user query and answers it based on the content of the novel.
* Create a Gradio interface for your chatbot.

Remember, the key to mastering these concepts is practice. So, take your time to understand each task, apply your knowledge, and don't hesitate to ask questions if you encounter any difficulties. Good luck!

**Notes:**

Please take note of the following important points while working on this project:

1. Do not change the Query Space code block. You can make a copy of it for your own inference.

2. Feel free to add new code blocks to separate your code into manageable pieces.

3. We recommend OpenAI, since a trial version is still available. If you want to try another LLM, please feel free to do so.

4. You need to pass OPENAI_API_KEY as an environment variable, because the Google Colab notebook will be public. There are many ways to do this; here is one that you may use:
   - Install python-dotenv
   - Create an env file
   - Fill the env file with the key-value pair for OPENAI_API_KEY
   - Run the following magic command
     - `%load_ext dotenv`
     - `%dotenv ./openai.env`
   - You can check if the API KEY is available using `os.environ`
     - `os.environ['OPENAI_API_KEY']`
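
The steps above can also be sketched in plain Python, without the notebook magic. This is a minimal stand-in, assuming the env file contains simple `KEY=VALUE` lines. `load_env_file` and `mask_secret` are hypothetical helpers written for this illustration, not part of python-dotenv, which handles more env-file syntax than this sketch does.

```python
import os


def load_env_file(path):
    """Copy KEY=VALUE pairs from a simple env file into os.environ.

    Hypothetical stand-in for python-dotenv; returns the list of keys set.
    """
    loaded = []
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            # Drop surrounding whitespace and optional quotes around the value.
            os.environ[key] = value.strip().strip("\"'")
            loaded.append(key)
    return loaded


def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

In the notebook, calling `load_env_file("./openai.env")` and then `print(mask_secret(os.environ["OPENAI_API_KEY"]))` confirms the key is set without writing the full key into the output of a public notebook.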

In [None]:
# @title #### Student Identity
student_id = "REAJGDG4" # @param {type:"string"}
name = "Hamzah Arman Husni" # @param {type:"string"}
drive_link = "https://colab.research.google.com/drive/1sKtIlKDXjq5oVqVXbGmGhV79nEce9cBy?authuser=3"  # @param {type:"string"}
assignment_id = "00_langchain_project"

## Installation and Import `rggrader` Package

In [None]:
%pip install rggrader
from rggrader import submit_image
from rggrader import submit

Collecting rggrader
  Downloading rggrader-0.1.6-py3-none-any.whl (2.5 kB)
Installing collected packages: rggrader
Successfully installed rggrader-0.1.6


## Working Space

In [None]:
!pip install langchain openai tiktoken faiss-cpu chromadb sentence-transformers -q
!pip install --upgrade --quiet huggingface_hub
!pip install gradio

In [None]:
from langchain.document_loaders import TextLoader
import requests

url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
local_path = "/content/pg1342.txt"

# Fetch the content of the text file from the URL
response = requests.get(url)

if response.status_code == 200:
    # Save the novel to disk so that TextLoader can read it below
    with open(local_path, "w", encoding="utf-8") as f:
        f.write(response.text)

    print(response.text[:100])  # Print the first 100 characters as a preview
else:
    print("Failed to fetch the text file from the URL:", url)

# Load the saved file as LangChain documents
loader = TextLoader(local_path, encoding="utf-8")
pages = loader.load()


﻿The Project Gutenberg eBook of Pride and Prejudice
    
This ebook is for the use of anyone anywh


In [None]:
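import os
import openai
from google.colab import userdata

# Secrets are intentionally left blank in this public notebook.
# The Hugging Face token is entered later with getpass().
OPENAI_API_KEY = ""
os.environ["HUGGINGFACEHUB_API_TOKEN"] = ""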

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Load the novel from disk
loader = TextLoader("/content/pg1342.txt")
documents = loader.load()

# Split the novel into chunks of up to 1000 characters with a 50-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
docs = text_splitter.split_documents(documents)

# Sentence-transformers model used to embed each chunk
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
db = FAISS.from_documents(docs, embeddings)
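
The `chunk_size=1000, chunk_overlap=50` settings used for the splitter can be illustrated with a simplified sliding-window splitter. This is only a sketch: `split_with_overlap` is a hypothetical helper, and it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph breaks, newlines, and spaces.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=50):
    """Cut text into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only; it ignores paragraph and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)` yields three chunks in which each chunk starts with the last character of the previous one. The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks.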

In [None]:
# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

··········


In [None]:
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

In [None]:
from langchain_community.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Hosted instruction-tuned model served through the Hugging Face endpoint.
# Note: the output below warns that max_length and token are moved into model_kwargs.
repo_id = 'mistralai/Mixtral-8x7B-Instruct-v0.1'
llm = HuggingFaceEndpoint(
    repo_id=repo_id, max_length=128, temperature=0.5, token=HUGGINGFACEHUB_API_TOKEN
)

# Combine the LLM with the FAISS retriever so answers are grounded in the novel
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

query = "What are the full names of the two main characters in Pride and Prejudice?"

answer = qa.run(query)
answer

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.
                    token was transferred to model_kwargs.
                    Please make sure that token is what you intended.


Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


  warn_deprecated(


' The two main characters in Pride and Prejudice are Elizabeth Bennet and Fitzwilliam Darcy.'
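
Conceptually, the retrieval chain above finds the chunks most similar to the query and places them in the prompt sent to the LLM. The sketch below illustrates that idea with the standard library only. It rests on loud assumptions: `embed` is a toy bag-of-words vector rather than the `all-MiniLM-L6-v2` model, the ranking is a plain cosine-similarity sort rather than a FAISS index, and the prompt wording in `build_prompt` is hypothetical, not LangChain's actual template.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Place the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Because the model only sees the retrieved chunks, the quality of the answer depends on the chunking and on how many chunks the retriever returns.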

## Query Space

In [None]:
# query = "What are the full names of the two main characters in Pride and Prejudice ?"
# answer = qa.run(query)

question_id = "00_langchain_query_answer"
submit(student_id, name, assignment_id, str(answer), question_id, drive_link)

'Assignment successfully submitted'

## Submit Gradio screenshot

![Upload colab](https://storage.googleapis.com/rg-ai-bootcamp/project-3-pipeline-and-gradio/upload-colab.png)

You need to submit a screenshot of your Gradio app. In Google Colab, you can use the "Folder" sidebar and click the upload button.

Make sure your screenshot matches the requirements below:

- It should have an input box for the user to type the query and an output box that displays the answer.
- It should show the query and the answer from the Query Space block in the respective boxes.

Example of Expected Output:

![gradio-result](https://storage.googleapis.com/rg-ai-bootcamp/projects/langchain-gradio.png)

In [None]:
#write your Gradio implementation here
import gradio as gr

# Define a function that answers a user query
def process_input(query):
    # Run the query through the RetrievalQA chain built in the Working Space
    output_text = qa.run(query)
    return output_text

# Create a Gradio interface
iface = gr.Interface(
    fn=process_input,
    inputs=gr.Textbox(lines=5, label="Enter your Question"),
    outputs="text",
    title="Text Processing Interface"
)

# Launch the interface
iface.launch()



Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://f9ce5bd48eb32d9925.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
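
A Gradio callback like `process_input` can be wrapped so that empty input or a failed LLM call shows a readable message in the output box instead of an error. This is a minimal sketch: `make_handler` is a hypothetical helper, and it assumes the answering function takes a query string and returns something convertible to text, as `qa.run` does in this notebook.

```python
def make_handler(answer_fn, empty_message="Please enter a question."):
    """Wrap an answering function for use as a UI callback.

    Hypothetical helper: trims the query, rejects empty input, and turns
    exceptions into a message shown to the user.
    """
    def handler(query):
        query = (query or "").strip()
        if not query:
            return empty_message
        try:
            return str(answer_fn(query)).strip()
        except Exception as exc:  # e.g. network or rate-limit errors
            return f"Sorry, the query failed: {exc}"

    return handler
```

In the interface above, this would be passed as `fn=make_handler(qa.run)` in place of `fn=process_input`. Stripping the answer also removes the leading space seen in the raw model output.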




In [None]:
question_id = "01_langchain_gradio"
submit_image(student_id, question_id, 'submission.jpg')

'Assignment successfully submitted'

# FIN