# How-To Guide: Steamship Loader
This example shows how to use the Steamship Loader to export Steamship Files as Documents for use in LangChain applications.

For this example, we will create a temporary workspace and load a local file into it to simulate a production Steamship workspace. We will use the File Loader utilities included in `steamship-langchain` for this purpose. We will also attach an `import-id` metadata value to the imported file so that we can demonstrate the query-based loading capabilities of `SteamshipLoader`. Including extra metadata is optional.

In real usage, you would use a persistent workspace and the files would live permanently in that workspace. This would allow creation of Steamship Files to take place over time (via File Importer plugins, client-side uploads, etc.).

In [1]:
from steamship import Steamship
from steamship_langchain.document_loaders import SteamshipLoader
from steamship_langchain.file_loaders import TextFileLoader

In [5]:
# create the temporary workspace and load a simple text file to simulate a production workspace
with Steamship.temporary_workspace() as client:
    file_loader = TextFileLoader(client=client)
    loaded_files = file_loader.load(
        path="../../state_of_the_union.txt", metadata={"import-id": "state-of-union"}
    )

    # use SteamshipLoader to create LangChain documents using the loaded files
    doc_loader = SteamshipLoader(client=client, files=loaded_files)
    loaded_docs_from_files = doc_loader.load()

    # use SteamshipLoader to create LangChain documents using a file query
    query = 'filetag and (kind "metadata" and value("import-id") = "state-of-union")'
    doc_loader_with_query = SteamshipLoader(client=client, query=query)
    loaded_docs_from_query = doc_loader_with_query.load()

Show the results using direct loading.

In [16]:
print(f"Loaded workspace files into {len(loaded_docs_from_files)} documents directly.")
print(f"First characters of document:\n\n{loaded_docs_from_files[0].page_content[0:242]}")

Loaded workspace files into 1 documents directly.
First characters of document:

Madam Speaker, Madam Vice President, and our First Lady and Second Gentleman, members of Congress and the Cabinet, Justices of the Supreme Court, my fellow Americans: Last year, COVID-19 kept us apart. This year, we’re finally together again.


Now show the results from using a query.

In [17]:
print(f"Loaded workspace files into {len(loaded_docs_from_query)} documents via query.")
print(f"First characters of document:\n\n{loaded_docs_from_query[0].page_content[0:242]}")

Loaded workspace files into 1 documents via query.
First characters of document:

Madam Speaker, Madam Vice President, and our First Lady and Second Gentleman, members of Congress and the Cabinet, Justices of the Supreme Court, my fellow Americans: Last year, COVID-19 kept us apart. This year, we’re finally together again.
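
To make the intent of the query string concrete, here is a minimal pure-Python sketch of the selection it expresses: keep files that carry a tag of kind `metadata` whose value has `import-id` equal to `state-of-union`. The `FakeTag` and `FakeFile` classes and the `matches_import_id` helper are hypothetical stand-ins invented for this illustration. The real query is evaluated by Steamship, not by code like this.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeTag:
    # Hypothetical stand-in for a file-level tag: a kind plus a key/value payload.
    kind: str
    value: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeFile:
    # Hypothetical stand-in for a workspace file with its file-level tags.
    handle: str
    tags: List[FakeTag] = field(default_factory=list)


def matches_import_id(file: FakeFile, import_id: str) -> bool:
    """Return True if any tag is of kind "metadata" with the given import-id."""
    return any(
        tag.kind == "metadata" and tag.value.get("import-id") == import_id
        for tag in file.tags
    )


workspace = [
    FakeFile("speech", [FakeTag("metadata", {"import-id": "state-of-union"})]),
    FakeFile("notes", [FakeTag("metadata", {"import-id": "meeting-notes"})]),
    FakeFile("untagged"),
]

# Only the file imported with the matching import-id is selected.
selected = [f for f in workspace if matches_import_id(f, "state-of-union")]
print([f.handle for f in selected])
```

In this sketch only the `speech` file is selected, which mirrors why the query-based loader above returns the same single document as direct loading.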


## Advanced usage

Steamship `Files` can consist of many `Blocks`, allowing users to segment portions of text to meet their modeling needs. For example, blocks may represent separate pages in a PDF.

With `SteamshipLoader`, you may choose how you want to join (or keep separate) different blocks for a file. Below are some examples.

In [18]:
from steamship import Block, File

# create the temporary workspace and create a file made up of multiple blocks
with Steamship.temporary_workspace() as client:
    test_file = File.create(
        client=client,
        blocks=[
            Block(text="There's a lady who's sure"),
            Block(text="All that glitters is gold"),
            Block(text="And she's buying a stairway to heaven"),
        ],
    )

    # collapse the blocks into one document, joined with a single newline
    join_loader = SteamshipLoader(client=client, files=[test_file], join_str="\n")
    joined_docs = join_loader.load()

    # keep blocks separate
    separate_loader = SteamshipLoader(client=client, files=[test_file], collapse_blocks=False)
    separate_docs = separate_loader.load()

Show the results of collapsing the blocks to create a single document.

In [20]:
print(f"Loaded files into {len(joined_docs)} documents.")
print(f"Document:\n\n{joined_docs[0].page_content}")

Loaded files into 1 documents.
Document:

There's a lady who's sure
All that glitters is gold
And she's buying a stairway to heaven


Now show the results of keeping the file's blocks separate.

In [21]:
print(f"Loaded files into {len(separate_docs)} documents.")
print(f"Documents:\n\n{separate_docs}")

Loaded files into 3 documents.
Documents:

[Document(page_content="There's a lady who's sure", lookup_str='', metadata={'source': 'doting-smokescreen-8kmwk'}, lookup_index=0), Document(page_content='All that glitters is gold', lookup_str='', metadata={'source': 'doting-smokescreen-8kmwk'}, lookup_index=0), Document(page_content="And she's buying a stairway to heaven", lookup_str='', metadata={'source': 'doting-smokescreen-8kmwk'}, lookup_index=0)]
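
The two behaviours shown above can be summarised with a small pure-Python sketch that needs no Steamship workspace. The helper `blocks_to_texts` is hypothetical and written only for this illustration. It mimics the observed results (one joined text versus one text per block) and is not the loader's actual implementation. The default `join_str` used here is an assumption of the sketch.

```python
from typing import List


def blocks_to_texts(
    block_texts: List[str],
    collapse_blocks: bool = True,
    join_str: str = "\n",  # assumed default for this sketch only
) -> List[str]:
    """Hypothetical helper: turn one file's block texts into document texts."""
    if collapse_blocks:
        # One document per file: all block texts joined with join_str.
        return [join_str.join(block_texts)]
    # One document per block, in the original block order.
    return list(block_texts)


lyrics = [
    "There's a lady who's sure",
    "All that glitters is gold",
    "And she's buying a stairway to heaven",
]

joined = blocks_to_texts(lyrics, join_str="\n")
separate = blocks_to_texts(lyrics, collapse_blocks=False)

print(len(joined), len(separate))
```

Collapsing suits cases where each file should be treated as one unit of text, while keeping blocks separate preserves a file's existing segmentation, such as the pages of a PDF.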
