# Apify Demo

Modified from the LangChain [Apify example](https://python.langchain.com/en/latest/modules/agents/tools/examples/apify.html).

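Install the Apify client and the vector store dependencies first: `pip install apify-client chromadb tiktoken`. The first cell also imports `dotenv`, which comes from the `python-dotenv` package.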

In [6]:

from langchain.document_loaders.base import Document
from langchain.indexes import VectorstoreIndexCreator
from langchain.utilities import ApifyWrapper
# Option 1: set the keys directly in code.
# import os
# os.environ["OPENAI_API_KEY"] = "Your OpenAI API key"
# os.environ["APIFY_API_TOKEN"] = "Your Apify API token"

# Option 2 (used here): load the keys from a .env file.
from dotenv import load_dotenv

load_dotenv()

True

In [15]:
# Warning: crawling the full docs site produces a large dataset.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://python.langchain.com/en/latest/"}]},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
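
The `dataset_mapping_function` above turns each crawled item into a `Document`. As a minimal sketch of that mapping, the block below applies the same logic to two made-up items. The `Document` dataclass is a stand-in so the sketch runs without LangChain, `map_item` and `sample_items` are hypothetical names, and using `item.get("text")` instead of `item["text"]` is a small variation that also tolerates a missing key.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for LangChain's Document (assumption: only these two fields matter here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def map_item(item: dict) -> Document:
    # Pages with no extracted text may carry None or lack the key entirely;
    # fall back to an empty string so every item still yields a Document.
    return Document(
        page_content=item.get("text") or "",
        metadata={"source": item["url"]},
    )


# Made-up items shaped like the crawler output used in the cell above.
sample_items = [
    {"url": "https://example.com/a", "text": "Hello"},
    {"url": "https://example.com/b", "text": None},
]
docs = [map_item(item) for item in sample_items]
```

Keeping the URL under the `source` metadata key is what lets `query_with_sources` report where an answer came from.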

In [7]:
index = VectorstoreIndexCreator().from_loaders([loader])

Using embedded DuckDB without persistence: data will be transient


In [8]:
query = "What is LangChain?"
result = index.query_with_sources(query)

print(result["answer"])
print(result["sources"])

 LangChain is a framework for developing applications powered by language models. It provides memory components, chains, agents, and use cases for personal assistants.

https://python.langchain.com/en/latest/_sources/index.rst, https://python.langchain.com/en/latest/modules/memory.html, https://python.langchain.com/en/latest/, https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/diffbot.html


In [24]:
query = "Does the word 'hallucinate' appear in the docs?"
result = index.query_with_sources(query)

print(result["answer"])
print(result["sources"])

 The word 'hallucinate' does not appear in the docs.

https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/imsdb.html, https://python.langchain.com/en/latest/_sources/modules/indexes/document_loaders/examples/imsdb.ipynb, https://python.langchain.com/en/latest/_sources/modules/models/llms/integrations/replicate.ipynb
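
An answer like the one above is generated from only the chunks the retriever returned, so it may be worth cross-checking with a plain substring scan over the loaded documents. The sketch below assumes each document exposes `page_content` and `metadata["source"]`, as the documents built by the mapping function do. `sources_containing` is a hypothetical helper, and the two stand-in documents are made up.

```python
from types import SimpleNamespace


def sources_containing(documents, term):
    """Return sorted, de-duplicated sources of documents whose text contains term (case-insensitive)."""
    needle = term.lower()
    return sorted({
        doc.metadata["source"]
        for doc in documents
        if needle in doc.page_content.lower()
    })


# Stand-in documents; in the notebook these would come from loader.load().
documents = [
    SimpleNamespace(
        page_content="Models may hallucinate facts.",
        metadata={"source": "https://example.com/a"},
    ),
    SimpleNamespace(
        page_content="Nothing relevant here.",
        metadata={"source": "https://example.com/b"},
    ),
]
hits = sources_containing(documents, "Hallucinate")
```

An empty result from a scan like this is stronger evidence that a word is absent than a negative answer from the index.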


In [40]:
query = "Are there any references to drug use in the docs?"
result = index.query_with_sources(query)

print(result["answer"])
print(result["sources"])

 There are no references to drug use in the docs.

https://python.langchain.com/en/latest/_sources/modules/indexes/document_loaders/examples/imsdb.ipynb, https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/imsdb.html, https://python.langchain.com/en/latest/_sources/modules/indexes/document_loaders/examples/imsdb.ipynb


In [41]:
query = "Who is sam?"
result = index.query_with_sources(query)

print(result["answer"])
print(result["sources"])

 Sam is the founder of a company called Daimon and is working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. He seems to be very motivated and passionate about his project, and is working hard to make it a success.

https://python.langchain.com/en/latest/modules/memory/types/entity_summary_memory.html


In [26]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

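# Directory where Chroma writes the embeddings to disk.
persist_directory = '../../data/chroma_dbs/langchain_0_0_142'

# Load the crawled pages and split them, targeting 512-character chunks with a 20-character overlap.
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=20)
docs = text_splitter.split_documents(documents)

embedding = OpenAIEmbeddings()

# Embed the chunks, store them in Chroma, and flush the store to disk.
vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory=persist_directory,
)
vectordb.persist()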
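
To give a feel for what `chunk_size=512` and `chunk_overlap=20` mean, here is a simplified sliding-window sketch in plain Python. It is not LangChain's algorithm: `CharacterTextSplitter` splits on a separator and merges the pieces, so its chunk boundaries will differ. `window_chunks` is a hypothetical helper, and the sample text is made up.

```python
from typing import List


def window_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 20) -> List[str]:
    """Cut text into windows of at most chunk_size characters,
    each starting chunk_overlap characters before the previous window ended."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Made-up 1200-character text with a repeating digit pattern.
text = "".join(str(i % 10) for i in range(1200))
chunks = window_chunks(text)
```

The overlap means the tail of one chunk is repeated at the head of the next, which helps a sentence that straddles a boundary stay retrievable from at least one chunk.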

