Google drive integration (lite) (#9999)
My other [pull request](#5135) is too big to be accepted, so I propose a 'lite' version. It updates only the notebook to propose an integration with the external project [`langchain-googledrive`](https://github.com/pprados/langchain-googledrive).

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Showing 3 changed files with 221 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,215 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Google Drive tool\n", | ||
"\n", | ||
"This notebook walks through connecting a LangChain to the Google Drive API.\n", | ||
"\n", | ||
"## Prerequisites\n", | ||
"\n", | ||
"1. Create a Google Cloud project or use an existing project\n", | ||
"1. Enable the [Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com)\n", | ||
"1. [Authorize credentials for desktop app](https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application)\n", | ||
"1. `pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib`\n", | ||
"\n", | ||
"## Instructions for retrieving your Google Docs data\n", | ||
"By default, the `GoogleDriveTools` and `GoogleDriveWrapper` expects the `credentials.json` file to be `~/.credentials/credentials.json`, but this is configurable using the `GOOGLE_ACCOUNT_FILE` environment variable. \n", | ||
"The location of `token.json` use the same directory (or use the parameter `token_path`). Note that `token.json` will be created automatically the first time you use the tool.\n", | ||
"\n", | ||
"`GoogleDriveSearchTool` can retrieve a selection of files with some requests. \n", | ||
"\n", | ||
"By default, If you use a `folder_id`, all the files inside this folder can be retrieved to `Document`, if the name match the query.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"You can obtain your folder and document id from the URL:\n", | ||
"* Folder: https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5 -> folder id is `\"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5\"`\n", | ||
"* Document: https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit -> document id is `\"1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw\"`\n", | ||
"\n", | ||
"The special value `root` is for your personal home." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"folder_id=\"root\"\n", | ||
"#folder_id='1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"By default, all files with these mime-type can be converted to `Document`.\n", | ||
"- text/text\n", | ||
"- text/plain\n", | ||
"- text/html\n", | ||
"- text/csv\n", | ||
"- text/markdown\n", | ||
"- image/png\n", | ||
"- image/jpeg\n", | ||
"- application/epub+zip\n", | ||
"- application/pdf\n", | ||
"- application/rtf\n", | ||
"- application/vnd.google-apps.document (GDoc)\n", | ||
"- application/vnd.google-apps.presentation (GSlide)\n", | ||
"- application/vnd.google-apps.spreadsheet (GSheet)\n", | ||
"- application/vnd.google.colaboratory (Notebook colab)\n", | ||
"- application/vnd.openxmlformats-officedocument.presentationml.presentation (PPTX)\n", | ||
"- application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)\n", | ||
"\n", | ||
"It's possible to update or customize this. See the documentation of `GoogleDriveAPIWrapper`.\n", | ||
"\n", | ||
"But, the corresponding packages must installed." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#!pip install unstructured" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_googledrive.utilities.google_drive import GoogleDriveAPIWrapper\n", | ||
"from langchain_googledrive.tools.google_drive.tool import GoogleDriveSearchTool\n", | ||
"\n", | ||
"# By default, search only in the filename.\n", | ||
"tool = GoogleDriveSearchTool(\n", | ||
" api_wrapper=GoogleDriveAPIWrapper(\n", | ||
" folder_id=folder_id,\n", | ||
" num_results=2,\n", | ||
" template=\"gdrive-query-in-folder\", # Search in the body of documents\n", | ||
" )\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import logging\n", | ||
"logging.basicConfig(level=logging.INFO)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"tool.run(\"machine learning\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"tool.description" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.agents import load_tools\n", | ||
"tools = load_tools([\"google-drive-search\"],\n", | ||
" folder_id=folder_id,\n", | ||
" template=\"gdrive-query-in-folder\",\n", | ||
" )" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Use within an Agent" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain import OpenAI\n", | ||
"from langchain.agents import initialize_agent, AgentType\n", | ||
"llm = OpenAI(temperature=0)\n", | ||
"agent = initialize_agent(\n", | ||
" tools=tools,\n", | ||
" llm=llm,\n", | ||
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"agent.run(\n", | ||
" \"Search in google drive, who is 'Yann LeCun' ?\"\n", | ||
")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.9" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |