In [1]:
from langchain_community.document_loaders import WikipediaLoader

In [2]:
loader = WikipediaLoader(query="MKUltra", load_max_docs=1, doc_content_chars_max=50000)
documents = loader.load()

In [3]:
len(documents)

1

In [4]:
len(documents[0].page_content)

44274

In [15]:
from langchain_text_splitters import CharacterTextSplitter
# chunk_size is measured in tiktoken tokens here, not in characters
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)

In [16]:
docs = text_splitter.split_documents(documents=documents)

Created a chunk of size 1202, which is longer than the specified 500
Created a chunk of size 546, which is longer than the specified 500
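
The two warnings above follow from how CharacterTextSplitter works: it splits only on its separator (a blank line, "\n\n", by default) and then merges neighbouring pieces up to chunk_size. A piece that contains no separator is emitted whole, even when it is too long. The sketch below imitates that behaviour in plain Python. It is an illustration, not the library's code: `count_units` and `split_on_separator` are hypothetical names, and words stand in for tiktoken tokens.

```python
def count_units(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokenizer tokens.
    return len(text.split())


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    # Split on the separator only, then greedily merge adjacent pieces.
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and count_units(candidate) > chunk_size:
            # Adding this piece would overflow: close the current chunk.
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Third paragraph has 12 words and no blank line inside it.
sample = "a b c\n\nd e\n\n" + " ".join(["w"] * 12) + "\n\nf g"
chunks = split_on_separator(sample, chunk_size=6)
sizes = [count_units(c) for c in chunks]
oversized = [s for s in sizes if s > 6]
print(sizes, oversized)
```

The long paragraph survives as a single oversized chunk because there is nothing to split it on. A splitter that falls back to finer separators, such as RecursiveCharacterTextSplitter, is one way to keep chunks under the limit.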


In [17]:
len(docs)

22

In [18]:
print(docs[3].page_content)

=== Applications ===
The 1976 Church Committee report found that, in the MKDELTA program, "Drugs were used primarily as an aid to interrogations, but MKULTRA/MKDELTA materials were also used for harassment, discrediting or disabling purposes."


=== Other related projects ===
In 1964, MKSEARCH was the name given to the continuation of the MKULTRA program. The MKSEARCH program was divided into two projects dubbed MKOFTEN and MKCHICKWIT. Funding for MKSEARCH commenced in 1965, and ended in 1971. The project was a joint project between the U.S. Army Chemical Corps and the CIA's Office of Research and Development to find new offensive-use agents, with a focus on incapacitating agents. Its purpose was to develop, test, and evaluate capabilities in the covert use of biological, chemical, and radioactive material systems and techniques of producing predictable human behavioral and/or physiological changes in support of highly sensitive operational requirements.
By March 1971, over 26,000 pote

In [19]:
from langchain_openai import OpenAIEmbeddings
embedding_function = OpenAIEmbeddings()

In [20]:
from langchain_chroma import Chroma
db = Chroma.from_documents(documents=docs,
                           embedding=embedding_function,
                           persist_directory="./some_new_mkultra3")

In [21]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

In [32]:
question = "which drugs were used?"

In [33]:
model = ChatOpenAI(model="gpt-4o-mini")

In [34]:
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=db.as_retriever(),
                                                  llm=model)

In [35]:
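# Log the alternative queries generated by MultiQueryRetriever.
# The logger name must match the module path used in the import above.
import logging
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)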
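
Logger names are matched exactly, so setting a level on a slightly different name configures a separate, unused logger and the generated queries never appear. The sketch below shows this with the standard library only. The `demo.*` names are made up for illustration and are not LangChain loggers.

```python
import logging

# Pin the parent level so the result does not depend on root configuration.
logging.getLogger("demo").setLevel(logging.WARNING)

module_logger = logging.getLogger("demo.retrievers.multi_query")
misspelled = logging.getLogger("demo.retriever.multi_query")

# Setting the level on the misspelled name does not touch the real logger.
misspelled.setLevel(logging.INFO)
enabled_before = module_logger.isEnabledFor(logging.INFO)

# Setting it on the exact name does.
module_logger.setLevel(logging.INFO)
enabled_after = module_logger.isEnabledFor(logging.INFO)

print(enabled_before, enabled_after)
```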

In [42]:
unique_docs = retriever_from_llm.invoke(input=question)
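
MultiQueryRetriever asks the LLM for several rephrasings of the question, runs each one against the base retriever, and returns the de-duplicated union of the hits. The merge step can be sketched in plain Python as follows. `Doc` and `merge_unique` are hypothetical stand-ins written for this example, not LangChain's Document class or its internal implementation, and the result lists are made-up data.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for a retrieved document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def merge_unique(result_lists: list[list[Doc]]) -> list[Doc]:
    # Keep the first occurrence of each document, preserving order.
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            # Assumption: metadata values are hashable and sortable by key.
            key = (doc.page_content, tuple(sorted(doc.metadata.items())))
            if key not in seen:
                seen.add(key)
                merged.append(doc)
    return merged


# One result list per generated query variant (toy data).
per_query_results = [
    [Doc("chunk A"), Doc("chunk B")],
    [Doc("chunk B"), Doc("chunk C")],
    [Doc("chunk A"), Doc("chunk C")],
]
merged = merge_unique(per_query_results)
contents = [d.page_content for d in merged]
print(contents)
```

Because overlapping hits are collapsed, the number of documents returned varies from question to question instead of being a fixed k.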

In [44]:
print(unique_docs[0].page_content)

=== Other drugs ===
Another technique investigated was the intravenous administration of a barbiturate into one arm and an amphetamine into the other. The barbiturates were released into the person first, and as soon as the person began to fall asleep, the amphetamines were released.
Other experiments involved heroin, morphine, temazepam (used under code name MKSEARCH), mescaline, psilocybin, scopolamine, alcohol and sodium pentothal.


=== Hypnosis ===
Declassified MKUltra documents indicate they studied hypnosis in the early 1950s. Experimental goals included creating "hypnotically induced anxieties", "hypnotically increasing ability to learn and recall complex written matter", studying hypnosis and polygraph examinations, "hypnotically increasing ability to observe and recall complex arrangements of physical objects", and studying "relationship of personality to susceptibility to hypnosis". They conducted experiments with drug-induced hypnosis and with anterograde and retrograde amn

In [46]:
len(documents[2].page_content)

12029

In [39]:
len(documents[0].page_content)

20000

In [31]:
# print(documents[0].metadata['title'])
# print("-"*len(documents[0].metadata['title']))
# print()
# print(documents[0].page_content)
# # print(documents[1].page_content)
