# Box loader

The `BoxLoader` class helps you get your unstructured content from Box in LangChain's `Document` format. You can do this with either a `List[str]` containing Box file IDs or a `str` containing a Box folder ID.

You can provide a `bool` that tells the loader whether to try to fetch a text representation of each file, and for these text representations you can specify a character limit on how much text is returned. **We highly recommend using text representations whenever possible.** An additional `bool` controls whether images are fetched or ignored. `BoxLoader` ignores any file that isn't in our list of supported document or image types.

If you load files from a folder by folder ID, you can also set a `bool` that tells the loader to include all sub-folders of that folder.

> [!WARNING]
> A Box instance can contain petabytes of files, and folders can contain millions of files. Be intentional about which folders you index. We recommend never getting all files from folder 0 recursively. Folder ID 0 is your root folder.

For files without a text representation, we rely on Unstructured's community loaders. These are part of the package and all dependencies will be loaded for you. 

> [!IMPORTANT]
> If you plan to include images, you will need Tesseract installed on the system running the application, and the path to Tesseract's `bin` directory must be in the shell's `PATH` environment variable, for example: `PATH=$PATH:/opt/homebrew/Cellar/tesseract/5.4.1/bin`
> For more information on Tesseract, visit their [website](https://tesseract-ocr.github.io/tessdoc/Installation.html).
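
Before enabling image loading, it can help to confirm that the `tesseract` executable is actually visible on `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `tesseract_on_path` is hypothetical and is not part of `langchain-box`.

```python
import shutil
from typing import Optional


def tesseract_on_path(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = tesseract_on_path()
    if location is None:
        print("tesseract not found on PATH; image loading will likely fail")
    else:
        print(f"tesseract found at {location}")
```

If the check reports that nothing was found, revisit the `PATH` entry described above before setting `get_images=True`.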

## Setup

### Installation

The first step is to install the `langchain-box` package.

In [None]:
%pip install --upgrade --quiet langchain-box

### Box setup

In order to use the Box package, you will need a few things:

* A Box account. For the Box AI connector, this must be an Enterprise Plus system. For the other tools, you can use a [free developer account](https://account.box.com/signup/n/developer#ty9l3).
* [A Box app](https://developer.box.com/guides/getting-started/first-application/). This is configured in the [developer console](https://account.box.com/developers/console), and for Box AI, must have the `Manage AI` scope enabled. Here you will also select your authentication method.
* The app must be [enabled by the administrator](https://developer.box.com/guides/authorization/custom-app-approval/#manual-approval). For free developer accounts, this is whoever signed up for the account.

## Examples

For these examples, we will use [token authentication](https://developer.box.com/guides/authentication/tokens/developer-tokens). This works with any [authentication method](https://developer.box.com/guides/authentication/); obtain the token however that method provides it. If you want to learn more about how to use other authentication types with `langchain-box`, visit the [Box provider](/docs/integrations/providers/box) document.

1. Set up your token

In [None]:
import getpass
import os

box_developer_token = getpass.getpass("Enter your Box Developer Token: ")

2. Import the libraries you need

In [None]:
from langchain_box.document_loaders import BoxLoader
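
If you prefer not to type the token on every run, one option is to read it from an environment variable and fall back to a prompt. This is a sketch only: the helper `get_box_token` is hypothetical, and the variable name `BOX_DEVELOPER_TOKEN` is an assumption about your environment, so adjust it to your setup.

```python
import getpass
import os


def get_box_token(env_var: str = "BOX_DEVELOPER_TOKEN") -> str:
    """Return the token from the environment, prompting only if it is unset or empty.

    The default variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to an interactive prompt that does not echo the input.
        token = getpass.getpass("Enter your Box Developer Token: ").strip()
    return token
```

With this in place, `box_developer_token = get_box_token()` could replace the prompt in step 1.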

### Load files

If you wish to load files, you must provide the `List` of file IDs at instantiation time.

This requires one piece of information:

* **box_file_ids** (`List[str]`) - A list of Box file IDs.

In [None]:
box_file_ids = ["1169674971571", "1169680553945"]

loader = BoxLoader(
    box_developer_token=box_developer_token,  # Token collected in step 1
    box_file_ids=box_file_ids,
    character_limit=10000,  # Optional. Defaults to no limit
    get_text_rep=True,  # Get text rep first when available, default True
    get_images=False,  # Download images, defaults to False
)

### Load from folder

If you wish to load files from a folder, you must provide a `str` with the Box folder ID at instantiation time.

This requires one piece of information:

* **box_folder_id** (`str`) - A string containing a Box folder ID.

In [None]:
box_folder_id = "1169674971571"

loader = BoxLoader(
    box_developer_token=box_developer_token,  # Token collected in step 1
    box_folder_id=box_folder_id,
    recursive=False,  # Optional. Return entire tree, defaults to False
    character_limit=10000,  # Optional. Defaults to no limit
    get_text_rep=True,  # Get text rep first when available, default True
    get_images=False,  # Download images, defaults to False
)
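
Given the earlier warning about recursively indexing the root folder, you may want a small guard before constructing the loader. The sketch below is illustrative: `folder_loader_kwargs` is a hypothetical helper, not part of `langchain-box`, and it only builds the `box_folder_id` and `recursive` keyword arguments used in the example above.

```python
from typing import Any, Dict

ROOT_FOLDER_ID = "0"  # Folder ID 0 is the root folder of a Box account


def folder_loader_kwargs(box_folder_id: str, recursive: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for a folder load, refusing a recursive root crawl."""
    if recursive and box_folder_id.strip() == ROOT_FOLDER_ID:
        raise ValueError("Refusing to recursively load folder 0 (the root folder)")
    return {"box_folder_id": box_folder_id, "recursive": recursive}
```

The returned dictionary can be unpacked into the `BoxLoader` constructor alongside the other arguments.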

### Loading the documents

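Now you can lazy load the documents. `lazy_load()` returns an iterator that yields `Document` objects as they are fetched, rather than a list.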

In [None]:
documents = loader.lazy_load()

For completeness, we can now iterate over `documents` and print each one to see the result.

In [None]:
for document in documents:
    print(document)
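
Because `lazy_load()` yields documents one at a time, you can process a large folder in fixed-size batches instead of holding everything in memory. The sketch below uses only the standard library; `batched` is a hypothetical helper, and the generator of strings stands in for the iterator returned by the loader so the example runs without Box access.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next batch_size items without consuming the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for loader.lazy_load(); any iterator of documents works the same way.
fake_documents = (f"document {i}" for i in range(5))

for batch in batched(fake_documents, batch_size=2):
    print(len(batch), batch)
```

In a real pipeline, each batch could be passed to an embedding or indexing step before the next one is fetched.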

## API reference

For detailed documentation of all `BoxLoader` features and configurations, head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_box.document_loaders.BoxLoader.html).

## Help

If you have questions, you can check out our [developer documentation](https://developer.box.com) or reach out to us in our [developer community](https://community.box.com).