<a href="https://colab.research.google.com/github/kumar045/Assignment-For-Filed/blob/main/langchain_json.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [JSON](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/json.html#json), [JSONLoader](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/json.html#using-jsonloader) and [JSON Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/json.html)  

## [YouTube video covering this notebook](https://youtu.be/Ldr-ioU_ELo)

## [JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON)
- Many online JSON viewers are available; one example is [JSONViewer](https://jsonformatter.org/json-viewer).
- JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).
- [YAML](https://en.wikipedia.org/wiki/YAML)
- [JSON vs YAML](https://www.perplexity.ai/search/d9d35ab3-1de1-459a-a87f-b5eeb8074bee?s=c)
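
To make the "attribute–value pairs and arrays" description concrete, here is a minimal sketch using only Python's standard `json` module. The `record` dict is an illustrative sample, not data loaded from any file in this notebook.

```python
import json

# Illustrative record: attribute-value pairs, an array, and a nested object.
record = {
    "sender_name": "User 1",
    "is_still_participant": True,
    "magic_words": [],
    "image": {"uri": "image_of_the_chat.jpg", "creation_timestamp": 1675549016},
}

# Serialize the Python object to human-readable JSON text.
text = json.dumps(record, indent=2)
print(text)

# Parse the JSON text back into Python objects.
restored = json.loads(text)
print(restored == record)
```

Note that Python's `True` is written as `true` in the JSON text and converted back on load.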

## ⚙️ Setup

In [None]:
%%capture
!pip install langchain watermark openai jq

In [None]:
%load_ext watermark
%watermark -a "Sudarshan Koirala" -vmp langchain,openai,jq

Author: Sudarshan Koirala

Python implementation: CPython
Python version       : 3.10.11
IPython version      : 7.34.0

langchain: 0.0.186
openai   : 0.27.7
jq       : 1.4.1

Compiler    : GCC 9.4.0
OS          : Linux
Release     : 5.15.107+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit



In [None]:
import os
import openai
import warnings

warnings.filterwarnings("ignore")

In [None]:
# get your openai api key from https://platform.openai.com/account/api-keys 🔑
from getpass import getpass

OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")



## JSON

In [None]:
# download facebook_chat.json from langchain github repo
!wget https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json -O facebook_chat.json

--2023-05-31 09:13:41--  https://raw.githubusercontent.com/hwchase17/langchain/master/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2167 (2.1K) [text/plain]
Saving to: ‘facebook_chat.json’


2023-05-31 09:13:41 (39.4 MB/s) - ‘facebook_chat.json’ saved [2167/2167]



In [None]:
import json
from pathlib import Path
from pprint import pprint


file_path='/content/facebook_chat.json'
data = json.loads(Path(file_path).read_text())

In [None]:
#print(data)

In [None]:
pprint(data)

{'image': {'creation_timestamp': 1675549016, 'uri': 'image_of_the_chat.jpg'},
 'is_still_participant': True,
 'joinable_mode': {'link': '', 'mode': 1},
 'magic_words': [],
 'messages': [{'content': 'Bye!',
               'sender_name': 'User 2',
               'timestamp_ms': 1675597571851},
              {'content': 'Oh no worries! Bye',
               'sender_name': 'User 1',
               'timestamp_ms': 1675597435669},
              {'content': 'No Im sorry it was my mistake, the blue one is not '
                          'for sale',
               'sender_name': 'User 2',
               'timestamp_ms': 1675596277579},
              {'content': 'I thought you were selling the blue one!',
               'sender_name': 'User 1',
               'timestamp_ms': 1675595140251},
              {'content': 'Im not interested in this bag. Im interested in the '
                          'blue one!',
               'sender_name': 'User 1',
               'timestamp_ms': 1675595109305},
   

## Using JSONLoader
- The JSONLoader uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse JSON files.
- It relies on the `jq` Python package. See the [manual](https://jqlang.github.io/jq/manual/) for detailed documentation of the jq syntax.
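
As a rough guide to reading jq schemas, the sketch below shows a plain-Python equivalent of two simple jq expressions. It does not call the `jq` package; the `raw` string is a trimmed, hand-written sample with the same shape as `facebook_chat.json`.

```python
import json

# Trimmed sample with the same shape as facebook_chat.json (assumption: two messages only).
raw = """
{
  "messages": [
    {"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"},
    {"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "Oh no worries! Bye"}
  ]
}
"""
chat = json.loads(raw)

# jq: .messages[]          -> each message object, one result per array element
message_records = list(chat["messages"])

# jq: .messages[].content  -> the "content" field of each message object
contents = [message["content"] for message in chat["messages"]]

print(message_records)
print(contents)
```

Each result of the jq expression becomes one document, which is why the choice between a whole-record schema and a single-field schema matters.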

In [None]:
from langchain.document_loaders import JSONLoader

In [None]:
loader = JSONLoader(
    file_path='/content/facebook_chat.json',
    jq_schema='.messages[].content',
    text_content=False)

data = loader.load()

In [None]:
pprint(data)

[Document(page_content='Bye!', metadata={'source': '/content/facebook_chat.json', 'seq_num': 1}),
 Document(page_content='Oh no worries! Bye', metadata={'source': '/content/facebook_chat.json', 'seq_num': 2}),
 Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/content/facebook_chat.json', 'seq_num': 3}),
 Document(page_content='I thought you were selling the blue one!', metadata={'source': '/content/facebook_chat.json', 'seq_num': 4}),
 Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/content/facebook_chat.json', 'seq_num': 5}),
 Document(page_content='Here is $129', metadata={'source': '/content/facebook_chat.json', 'seq_num': 6}),
 Document(page_content='', metadata={'source': '/content/facebook_chat.json', 'seq_num': 7}),
 Document(page_content='Online is at least $100', metadata={'source': '/content/facebook_chat.json', 'seq_num': 8}),
 Document(page_content='How muc

### Extracting metadata

In [None]:
# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:

    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")

    return metadata


loader = JSONLoader(
    file_path='/content/facebook_chat.json',
    jq_schema='.messages[]',
    content_key="content",
    text_content=False,
    metadata_func=metadata_func
)

data = loader.load()

In [None]:
pprint(data)

[Document(page_content='Bye!', metadata={'source': '/content/facebook_chat.json', 'seq_num': 1, 'sender_name': 'User 2', 'timestamp_ms': 1675597571851}),
 Document(page_content='Oh no worries! Bye', metadata={'source': '/content/facebook_chat.json', 'seq_num': 2, 'sender_name': 'User 1', 'timestamp_ms': 1675597435669}),
 Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/content/facebook_chat.json', 'seq_num': 3, 'sender_name': 'User 2', 'timestamp_ms': 1675596277579}),
 Document(page_content='I thought you were selling the blue one!', metadata={'source': '/content/facebook_chat.json', 'seq_num': 4, 'sender_name': 'User 1', 'timestamp_ms': 1675595140251}),
 Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/content/facebook_chat.json', 'seq_num': 5, 'sender_name': 'User 1', 'timestamp_ms': 1675595109305}),
 Document(page_content='Here is $129', metadata={'source': '/content
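
The output above suggests a simple mental model for how `metadata_func` is used: each record gets a base metadata dict with `source` and `seq_num`, and the function adds extra fields to it. The sketch below imitates that flow in plain Python. `build_documents` is a hypothetical helper written for illustration, not LangChain's actual implementation, and it returns plain dicts rather than `Document` objects.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    # Copy selected fields from the raw record into the document metadata.
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")
    return metadata


def build_documents(records, source, content_key, metadata_func):
    """Hypothetical stand-in for the loader: returns one dict per record."""
    documents = []
    for seq_num, record in enumerate(records, start=1):
        # Base metadata, mirroring the fields visible in the output above.
        metadata = {"source": source, "seq_num": seq_num}
        metadata = metadata_func(record, metadata)
        documents.append(
            {"page_content": record[content_key], "metadata": metadata}
        )
    return documents


# Two records copied from the chat data shown earlier.
records = [
    {"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"},
    {"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "Oh no worries! Bye"},
]

docs = build_documents(records, "/content/facebook_chat.json", "content", metadata_func)
for doc in docs:
    print(doc)
```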

## JSON Agent
- An agent designed to interact with large JSON/dict objects.
- <font color="orange">When is it needed?</font>
    - It is useful when you want to answer questions about a JSON blob that is too large to fit in the context window of an LLM.
    - The agent iteratively explores the blob to find what it needs to answer the user's question.
- Let's use the JSON agent to answer some questions about the OpenAI API spec.
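
The idea of "iteratively exploring" a blob can be sketched without an LLM: list the keys at one level, pick a key, and descend, truncating long values so that no single step returns too much text. The helpers `list_keys` and `get_value` below are hypothetical names written for this illustration, not the toolkit's API, and `spec` is a tiny hand-written stand-in rather than the real OpenAPI file.

```python
def _descend(blob, path):
    # Walk into the nested structure one key at a time.
    node = blob
    for key in path:
        node = node[key]
    return node


def list_keys(blob: dict, path: list) -> list:
    """Return the keys available at the given path (hypothetical helper)."""
    return list(_descend(blob, path).keys())


def get_value(blob: dict, path: list, max_value_length: int = 200) -> str:
    """Return the value at the path as text, truncated if it is too long."""
    text = str(_descend(blob, path))
    if len(text) > max_value_length:
        return text[:max_value_length] + "..."
    return text


# Tiny stand-in spec (assumption: invented for illustration only).
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/completions": {"post": {"summary": "x" * 500}},
        "/models": {"get": {"summary": "short"}},
    },
}

print(list_keys(spec, []))                     # step 1: top-level keys
print(list_keys(spec, ["paths"]))              # step 2: which endpoints exist
print(list_keys(spec, ["paths", "/models"]))   # step 3: drill into one endpoint
print(get_value(spec, ["paths", "/completions", "post", "summary"], max_value_length=20))
```

An agent follows the same loop, except that the LLM decides which key to inspect next based on the previous observation.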

In [None]:
# download the yaml file from openai github page
!wget https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml -O openai_openapi.yml

--2023-05-31 09:27:18--  https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 122995 (120K) [text/plain]
Saving to: ‘openai_openapi.yml’


2023-05-31 09:27:19 (9.89 MB/s) - ‘openai_openapi.yml’ saved [122995/122995]



### Initialization of Agent

In [None]:
import os
import yaml

from langchain.agents import (
    create_json_agent,
    AgentExecutor
)
from langchain.agents.agent_toolkits import JsonToolkit
from langchain.chains import LLMChain
from langchain.llms.openai import OpenAI
from langchain.requests import TextRequestsWrapper
from langchain.tools.json.tool import JsonSpec

In [None]:
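# Parse the OpenAPI spec (YAML) into a plain Python dict.
with open("/content/openai_openapi.yml") as f:
    data = yaml.load(f, Loader=yaml.FullLoader)

# Wrap the dict for the JSON tools; max_value_length limits how long a single returned value may be.
json_spec = JsonSpec(dict_=data, max_value_length=4000)
json_toolkit = JsonToolkit(spec=json_spec)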

json_agent_executor = create_json_agent(
    llm=OpenAI(temperature=0),
    toolkit=json_toolkit,
    verbose=True
)

### Getting the required POST parameters for a request ([openapi.yaml on GitHub](https://github.com/openai/openai-openapi/blob/master/openapi.yaml))

In [None]:
json_agent_executor.run("What are the required parameters in the request body to the /completions endpoint?")



> Entering new AgentExecutor chain...
Action: json_spec_list_keys
Action Input: data
Observation: ['openapi', 'info', 'servers', 'tags', 'paths', 'components', 'x-oaiMeta']
Thought: I should look at the paths key to see what endpoints exist
Action: json_spec_list_keys
Action Input: data["paths"]
Observation: ['/engines', '/engines/{engine_id}', '/completions', '/chat/completions', '/edits', '/images/generations', '/images/edits', '/images/variations', '/embeddings', '/audio/transcriptions', '/audio/translations', '/engines/{engine_id}/search', '/files', '/files/{file_id}', '/files/{file_id}/content', '/answers', '/classifications', '/fine-tunes', '/fine-tunes/{fine_tune_id}', '/fine-tunes/{fine_tune_id}/cancel', '/fine-tunes/{fine_tune_id}/events', '/models', '/models/{model}', '/moderations']
Thought: I should look at the /completions endpoint to see what parameters are required
Action: json_spe

"The required parameters in the request body to the /completions endpoint are 'model'."

**Hope you learned something new today ⛓️👨‍💻**