### EXTRACT PDF CONTENT

In [1]:
!pip install pdfplumber
!pip install langchain
!pip install langchain_community




In [2]:
from langchain_community.document_loaders import PDFPlumberLoader
import os
import datetime  # used later when attaching metadata

# Collect every PDF in the input directory
pdf_files = [f for f in os.listdir('pdf_directory') if f.endswith('.pdf')]

# Loaded documents are grouped by company, based on the file-name prefix
nvidia_pages = []
tesla_pages = []

for pdf_file in pdf_files:
    file_path = os.path.join('pdf_directory', pdf_file)
    print(f"processing file:{file_path}\n")
    # Load the PDF and split it into Document objects
    loader = PDFPlumberLoader(file_path=file_path)
    pages = loader.load_and_split()
    print(f"loaded len= {len(pages)} pages from {pdf_file}")

    if pdf_file.startswith('NVIDIA'):
        nvidia_pages.extend(pages)
    elif pdf_file.startswith('TESLA'):
        tesla_pages.extend(pages)


processing file:pdf_directory/NVIDIA-Rev_by_Mkt_Qtrly_Trend_Q225.pdf

loaded len= 1 pages from NVIDIA-Rev_by_Mkt_Qtrly_Trend_Q225.pdf
processing file:pdf_directory/TESLA-20231231-gen.pdf

loaded len= 180 pages from TESLA-20231231-gen.pdf
processing file:pdf_directory/NVIDIA-CFO-Commentary.pdf

loaded len= 9 pages from NVIDIA-CFO-Commentary.pdf
processing file:pdf_directory/TESLA-20240630-gen.pdf

loaded len= 93 pages from TESLA-20240630-gen.pdf


In [3]:
# print out the first page of the first document for each category as an example
if nvidia_pages:
    print("=========================================")
    print("First page of the first Nvidia document:")
    print("=========================================\n")
    print(nvidia_pages[0].page_content)
else:
    print("No Nvidia pages found in the PDFs.")

First page of the first Nvidia document:

NVIDIA QUARTERLY REVENUE TREND
REVENUE BY MARKETS
Q2 FY25 Q1 FY25 Q4 FY24 Q3 FY24 Q2 FY24 Q1 FY24 Q4 FY23 Q3 FY23
($ in millions)
Data Center $26,272 $22,563 $18,404 $14,514 $10,323 $4,284 $3,616 $3,833
Gaming 2,880 2,647 2,865 2,856 2,486 2,240 1,831 1,574
Professional
454 427 463 416 379 295 226 200
Visualization
Auto 346 329 281 261 253 296 294 251
OEM & Other 88 78 90 73 66 77 84 73
TOTAL $30,040 $26,044 $22,103 $18,120 $13,507 $7,192 $6,051 $5,931


In [4]:
if tesla_pages:
    print("=========================================")
    print("First page of the first Tesla document:")
    print("=========================================\n")
    print(tesla_pages[0].page_content)
else:
    print("No Tesla pages found in the PDFs.")


First page of the first Tesla document:

UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 10-K
(Mark One)
x ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the fiscal year ended December 31, 2023
OR
o TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the transition period from _________ to _________
Commission File Number: 001-34756
Tesla, Inc.
(Exact name of registrant as specified in its charter)
Delaware 91-2197729
(State or other jurisdiction of (I.R.S. Employer
incorporation or organization) Identification No.)
1 Tesla Road
Austin, Texas 78725
(Address of principal executive offices) (Zip Code)
(512) 516-8177
(Registrant’s telephone number, including area code)
Securities registered pursuant to Section 12(b) of the Act:
Title of each class Trading Symbol(s) Name of each exchange on which registered
Common stock TSLA The Nasdaq Global Select Market
Securities registered p

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)

# Split text into chunks for Nvidia pages
nvidia_text_chunks = []
for page in nvidia_pages:
    chunks = text_splitter.split_text(page.page_content)
    nvidia_text_chunks.extend(chunks)

# Split text into chunks for Tesla pages
tesla_text_chunks = []
for page in tesla_pages:
    chunks = text_splitter.split_text(page.page_content)
    tesla_text_chunks.extend(chunks)
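
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window chunker. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the LangChain splitter tries to break on separators such as paragraph breaks, newlines, and spaces before falling back to raw characters. The function name `sliding_window_chunks` and the sample string are hypothetical, chosen only for illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is what the overlap is for: text that straddles a boundary still appears intact in at least one chunk. With `chunk_size=7500`, as in the cell above, a short page such as the Nvidia revenue table fits in a single chunk.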

In [6]:
nvidia_text_chunks

['NVIDIA QUARTERLY REVENUE TREND\nREVENUE BY MARKETS\nQ2 FY25 Q1 FY25 Q4 FY24 Q3 FY24 Q2 FY24 Q1 FY24 Q4 FY23 Q3 FY23\n($ in millions)\nData Center $26,272 $22,563 $18,404 $14,514 $10,323 $4,284 $3,616 $3,833\nGaming 2,880 2,647 2,865 2,856 2,486 2,240 1,831 1,574\nProfessional\n454 427 463 416 379 295 226 200\nVisualization\nAuto 346 329 281 261 253 296 294 251\nOEM & Other 88 78 90 73 66 77 84 73\nTOTAL $30,040 $26,044 $22,103 $18,120 $13,507 $7,192 $6,051 $5,931',
 'CFO Commentary on Second Quarter Fiscal 2025 Results\nQ2 Fiscal 2025 Summary\nGAAP\n($ in millions, except earnings per\nQ2 FY25 Q1 FY25 Q2 FY24 Q/Q Y/Y\nshare)\nRevenue $30,040 $26,044 $13,507 Up 15% Up 122%\nGross margin 75.1 % 78.4 % 70.1 % Down 3.3 pts Up 5.0 pts\nOperating expenses $3,932 $3,497 $2,662 Up 12% Up 48%\nOperating income $18,642 $16,909 $6,800 Up 10% Up 174%\nNet income $16,599 $14,881 $6,188 Up 12% Up 168%\nDiluted earnings per share $0.67 $0.60 $0.25 Up 12% Up 168%\nNon-GAAP\n($ in millions, except ea

In [7]:
tesla_text_chunks

['UNITED STATES\nSECURITIES AND EXCHANGE COMMISSION\nWashington, D.C. 20549\nFORM 10-K\n(Mark One)\nx ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the fiscal year ended December 31, 2023\nOR\no TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the transition period from _________ to _________\nCommission File Number: 001-34756\nTesla, Inc.\n(Exact name of registrant as specified in its charter)\nDelaware 91-2197729\n(State or other jurisdiction of (I.R.S. Employer\nincorporation or organization) Identification No.)\n1 Tesla Road\nAustin, Texas 78725\n(Address of principal executive offices) (Zip Code)\n(512) 516-8177\n(Registrant’s telephone number, including area code)\nSecurities registered pursuant to Section 12(b) of the Act:\nTitle of each class Trading Symbol(s) Name of each exchange on which registered\nCommon stock TSLA The Nasdaq Global Select Market\nSecurities registered pursuant to Sect

In [8]:


# Example metadata management (customize as needed)
def add_metadata(chunks, doc_title):
    metadata_chunks = []
    for chunk in chunks:
        metadata = {
            "title": doc_title,
            "author": "company",  # Update based on document data
            # Processing date (today), not the report's publication date
            "date": str(datetime.date.today())
        }
        metadata_chunks.append({"text": chunk, "metadata": metadata})
    return metadata_chunks

# Add metadata to Nvidia chunks
nvidia_chunks_with_metadata = add_metadata(nvidia_text_chunks, "NVIDIA Financial Report")

# Add metadata to Tesla chunks
tesla_chunks_with_metadata = add_metadata(tesla_text_chunks, "TESLA Financial Report")
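
The metadata above uses a placeholder author and today's date. One possible refinement, sketched below, is to derive the company and a report date from the file name. This assumes the naming pattern seen in the loader output (`TESLA-20231231-gen.pdf`), where the prefix is the company and an eight-digit `YYYYMMDD` block is the period end date. The helper `metadata_from_filename` is hypothetical and is not part of the original notebook.

```python
import datetime
import re


def metadata_from_filename(pdf_file):
    """Hypothetical helper: build chunk metadata from a PDF file name.

    Assumes names like 'TESLA-20231231-gen.pdf' (company prefix, optional
    YYYYMMDD date). The 'date' key is omitted when no valid date is found.
    """
    company = pdf_file.split("-", 1)[0]
    metadata = {"title": f"{company} Financial Report", "author": company}

    match = re.search(r"\d{8}", pdf_file)
    if match:
        try:
            parsed = datetime.datetime.strptime(match.group(0), "%Y%m%d")
            metadata["date"] = parsed.date().isoformat()
        except ValueError:
            pass  # eight digits that are not a real calendar date
    return metadata


print(metadata_from_filename("TESLA-20231231-gen.pdf"))
print(metadata_from_filename("NVIDIA-CFO-Commentary.pdf"))
```

To use something like this, the source file name would have to be carried alongside each chunk, which the current `add_metadata` function does not do.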


In [9]:
nvidia_chunks_with_metadata

[{'text': 'NVIDIA QUARTERLY REVENUE TREND\nREVENUE BY MARKETS\nQ2 FY25 Q1 FY25 Q4 FY24 Q3 FY24 Q2 FY24 Q1 FY24 Q4 FY23 Q3 FY23\n($ in millions)\nData Center $26,272 $22,563 $18,404 $14,514 $10,323 $4,284 $3,616 $3,833\nGaming 2,880 2,647 2,865 2,856 2,486 2,240 1,831 1,574\nProfessional\n454 427 463 416 379 295 226 200\nVisualization\nAuto 346 329 281 261 253 296 294 251\nOEM & Other 88 78 90 73 66 77 84 73\nTOTAL $30,040 $26,044 $22,103 $18,120 $13,507 $7,192 $6,051 $5,931',
  'metadata': {'title': 'NVIDIA Financial Report',
   'author': 'company',
   'date': '2024-09-14'}},
 {'text': 'CFO Commentary on Second Quarter Fiscal 2025 Results\nQ2 Fiscal 2025 Summary\nGAAP\n($ in millions, except earnings per\nQ2 FY25 Q1 FY25 Q2 FY24 Q/Q Y/Y\nshare)\nRevenue $30,040 $26,044 $13,507 Up 15% Up 122%\nGross margin 75.1 % 78.4 % 70.1 % Down 3.3 pts Up 5.0 pts\nOperating expenses $3,932 $3,497 $2,662 Up 12% Up 48%\nOperating income $18,642 $16,909 $6,800 Up 10% Up 174%\nNet income $16,599 $14,881

In [10]:
tesla_chunks_with_metadata

[{'text': 'UNITED STATES\nSECURITIES AND EXCHANGE COMMISSION\nWashington, D.C. 20549\nFORM 10-K\n(Mark One)\nx ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the fiscal year ended December 31, 2023\nOR\no TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the transition period from _________ to _________\nCommission File Number: 001-34756\nTesla, Inc.\n(Exact name of registrant as specified in its charter)\nDelaware 91-2197729\n(State or other jurisdiction of (I.R.S. Employer\nincorporation or organization) Identification No.)\n1 Tesla Road\nAustin, Texas 78725\n(Address of principal executive offices) (Zip Code)\n(512) 516-8177\n(Registrant’s telephone number, including area code)\nSecurities registered pursuant to Section 12(b) of the Act:\nTitle of each class Trading Symbol(s) Name of each exchange on which registered\nCommon stock TSLA The Nasdaq Global Select Market\nSecurities registered pursuan

### EMBEDDINGS

In [11]:
!pip install ollama



In [12]:
!ollama serve > rocama.log 2>&1 &

In [13]:
!curl https://ollama.ai/install.sh | sh

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13320    0 13320    0     0  24744      0 --:--:-- --:--:-- --:--:-- 24758
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [14]:
!pip show ollama

Name: ollama
Version: 0.3.3
Summary: The official Python client for Ollama.
Home-page: https://ollama.ai
Author: Ollama
Author-email: hello@ollama.com
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: httpx
Required-by: 


In [15]:
!ollama serve > rocama.log 2>&1 &
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90... 100% ▕▏ 274 MB                         
pulling c71d239df917... 100% ▕▏  11 KB                         
pulling ce4a164fc046... 100% ▕▏   17 B                         
pulling 31df23ea7daa... 100% ▕▏  420 B                         
verifying sha256 digest 
writing manifest 
success


In [16]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED               
nomic-embed-text:latest	0a109f422b47	274 MB	Less than a second ago	
llama3:8b              	365c0bd3c000	4.7 GB	43 minutes ago        	
llama3:latest          	365c0bd3c000	4.7 GB	43 minutes ago        	


In [17]:
import ollama

# Function to generate embeddings for text chunks
def generate_embeddings(text_chunks, model_name='nomic-embed-text'):
    embeddings = []
    for chunk in text_chunks:
        # Generate the embedding for each chunk; each response holds the vector under the 'embedding' key
        embedding = ollama.embeddings(model=model_name, prompt=chunk)
        embeddings.append(embedding)
    return embeddings
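
Since `generate_embeddings` returns the raw responses (each with an `'embedding'` key, as the output of the next cell shows), a small amount of post-processing is needed before the vectors can be compared. The sketch below extracts the vectors and computes cosine similarity, one common way to compare embeddings. The toy two-dimensional vectors are made up for illustration, and which distance metric the Chroma collection uses later in the notebook is not stated here.

```python
import math


def extract_vectors(responses):
    """Pull the raw vector out of each {'embedding': [...]} response."""
    return [response["embedding"] for response in responses]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for zero vectors; treat as 0
    return dot / (norm_a * norm_b)


# Toy stand-ins for real embedding responses (illustrative values only)
toy_responses = [
    {"embedding": [1.0, 0.0]},
    {"embedding": [1.0, 1.0]},
    {"embedding": [0.0, 2.0]},
]
vectors = extract_vectors(toy_responses)
print(cosine_similarity(vectors[0], vectors[1]))  # about 0.707
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0
```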

In [18]:
# Example: Embed Nvidia text chunks
nvidia_texts = [chunk["text"] for chunk in nvidia_chunks_with_metadata]
nvidia_embeddings = generate_embeddings(nvidia_texts)

nvidia_embeddings

[{'embedding': [0.24099159240722656,
   1.3321235179901123,
   -4.129232883453369,
   -0.21007049083709717,
   -0.017531363293528557,
   -1.3487530946731567,
   0.878192663192749,
   -1.911118507385254,
   0.7788557410240173,
   0.6508660912513733,
   0.31745702028274536,
   0.44303804636001587,
   0.8489767909049988,
   0.05049687623977661,
   0.8648646473884583,
   0.8768801689147949,
   -1.2956660985946655,
   -0.25778788328170776,
   0.6026367545127869,
   1.0388867855072021,
   -0.27912452816963196,
   -1.5318105220794678,
   -1.3641412258148193,
   -0.9011434316635132,
   1.2099612951278687,
   0.9842431545257568,
   -0.22335359454154968,
   -0.0949820801615715,
   0.3312908411026001,
   0.6393224596977234,
   1.2854816913604736,
   0.7058480381965637,
   -0.34776031970977783,
   -0.6859400272369385,
   -0.43442466855049133,
   -0.7001001238822937,
   0.43211209774017334,
   0.3926382064819336,
   0.7065295577049255,
   -0.0912320613861084,
   1.1675912141799927,
   -1.6150870323

In [19]:
# # Example: Embed Tesla text chunks
# tesla_texts = [chunk["text"] for chunk in tesla_chunks_with_metadata]
# tesla_embeddings = generate_embeddings(tesla_texts)

# tesla_embeddings

### CHROMA DB TO STORE EMBEDDINGS

In [20]:
!ollama serve > rocama.log 2>&1 &

In [21]:
%pip install langchain_community fastembed chromadb



In [22]:
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from langchain_community.embeddings import OllamaEmbeddings

# Wrap Nvidia texts with their respective metadata into Document objects
nvidia_documents = [Document(page_content=chunk['text'], metadata=chunk['metadata']) for chunk in nvidia_chunks_with_metadata]


# Add Nvidia embeddings to the database
nvidia_vector_db = Chroma.from_documents(documents=nvidia_documents,
                      embedding=OllamaEmbeddings(model="nomic-embed-text",show_progress=False),
                      collection_name="nvidia-local-rag")

In [23]:
# # Wrap Tesla texts with their respective metadata into Document objects
# tesla_documents = [Document(page_content=chunk['text'], metadata=chunk['metadata']) for chunk in tesla_chunks_with_metadata]

# # Add Tesla embeddings to the database
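# # (`embedding` expects an embedding model object, not a list of precomputed vectors)
# tesla_vector_db = Chroma.from_documents(documents=tesla_documents,
#                       embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=False),
#                       collection_name="tesla-local-rag")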


### QUERY PROCESSING & MULTI QUERY RETRIEVER

In [24]:
!ollama serve > rocama.log 2>&1 &
!ollama pull llama3

pulling manifest
pulling 6a0746a1ec1a... 100% ▕▏ 4.7 GB                         
pulling 4fa551d4f938... 100% ▕▏  12 KB                         
pulling 8ab4849b038c... 100% ▕▏  254 B                         
pulling 577073ffcc6c... 100% ▕▏  110 B                         
pulling 3f8eb4da87fa... 100% ▕▏  485 B                         
verifying sha256 digest 
writing manifest 
success


In [25]:
! ollama pull llama3:8b

pulling manifest
pulling 6a0746a1ec1a... 100% ▕▏ 4.7 GB                         
pulling 4fa551d4f938... 100% ▕▏  12 KB                         
pulling 8ab4849b038c... 100% ▕▏  254 B                         
pulling 577073ffcc6c... 100% ▕▏  110 B                         
pulling 3f8eb4da87fa... 100% ▕▏  485 B                         
verifying sha256 digest 
writing manifest 
success


In [26]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [27]:
# LLM from Ollama
local_model = "llama3"
llm = ChatOllama(model=local_model)

In [28]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
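
The prompt above asks the model for alternative questions separated by newlines. The idea behind multi-query retrieval can be sketched in plain Python: split the model output into individual queries, run each one against the store, and merge the results without duplicates. This is a simplified illustration of the concept, not LangChain's actual implementation. The names `parse_queries`, `unique_union`, and `toy_index` are hypothetical, and the dictionary stands in for a real vector search.

```python
def parse_queries(llm_output):
    """Split newline-separated model output into non-empty queries."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def unique_union(result_lists):
    """Merge several result lists, keeping first-seen order and dropping repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Stand-in for a vector store lookup: query -> retrieved chunks
toy_index = {
    "data center revenue": ["chunk A", "chunk B"],
    "gaming revenue": ["chunk B", "chunk C"],
}

llm_output = "data center revenue\n\ngaming revenue\n"
queries = parse_queries(llm_output)
merged = unique_union([toy_index.get(query, []) for query in queries])
print(queries)  # ['data center revenue', 'gaming revenue']
print(merged)   # ['chunk A', 'chunk B', 'chunk C']
```

Issuing several phrasings widens the set of retrieved chunks, while the deduplication step keeps the final context from repeating a chunk that more than one query matched.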

In [29]:
retriever = MultiQueryRetriever.from_llm(
    nvidia_vector_db.as_retriever(),
    llm,  # reuse the ChatOllama instance created above
    prompt=QUERY_PROMPT,
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [30]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
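
In the chain above, the retriever output (a list of `Document` objects) is passed straight into the `{context}` slot of the prompt. A common variation is to join only the page text first, so the prompt does not include the list and metadata representation. The sketch below shows that formatting step on its own. `Doc` is a hypothetical stand-in for LangChain's `Document`, used so the example runs without any dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [Doc("Data Center $26,272"), Doc("Gaming 2,880")]
context = format_docs(docs)
print(context)
```

Assuming the usual LangChain composition rules, this would slot into the chain as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. That wiring has not been run against this notebook.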

In [31]:
from IPython.display import Markdown

questions = '''What are the main revenue drivers for Nvidia this fiscal year?'''
display(Markdown(chain.invoke(questions)))

Based on the provided context, the main revenue drivers for Nvidia this fiscal year are:

1. Data Center: With a significant increase in revenue from $14.514 million in Q3 FY24 to $26.272 million in Q2 FY25.
2. Gaming: A steady increase in revenue from $2.856 million in Q3 FY24 to $2.880 million in Q2 FY25.

These two segments, Data Center and Gaming, are the primary drivers of Nvidia's revenue growth this fiscal year.
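
The figures in the answer can be checked against the revenue table printed earlier. That table is stated in "$ in millions", so the Data Center entries are 14,514 and 26,272 million dollars (roughly $14.5 billion and $26.3 billion), not "$14.514 million" as the model wrote. A minimal sketch of the growth arithmetic, using only values from the table:

```python
# Revenue in millions of USD, copied from the "REVENUE BY MARKETS" table
revenue_musd = {
    "Data Center": {"Q3 FY24": 14_514, "Q2 FY25": 26_272},
    "Gaming": {"Q3 FY24": 2_856, "Q2 FY25": 2_880},
}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


growth = {
    segment: round(pct_change(values["Q3 FY24"], values["Q2 FY25"]), 1)
    for segment, values in revenue_musd.items()
}
print(growth)  # {'Data Center': 81.0, 'Gaming': 0.8}
```

On these numbers, Data Center revenue grew by about 81% between Q3 FY24 and Q2 FY25, while Gaming was nearly flat at under 1%.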

In [32]:
questions = '''Can you give some financial advice on Nvidia stock for the future? Should people consider buying it?'''
display(Markdown(chain.invoke(questions)))

Based solely on the provided context, which appears to be a financial report for NVIDIA Corporation, I can offer some general insights that might be helpful.

The report presents the company's performance and outlook for the third quarter of fiscal 2025. Here are a few key points:

1. Revenue: The company expects revenue of $32.5 billion in Q3 FY2025, with a plus or minus 2% margin.
2. Gross margins: Both GAAP and non-GAAP gross margins are expected to be around 74.4% and 75.0%, respectively, with a ±50 basis points margin.
3. Operating expenses: GAAP operating expenses are expected to be approximately $4.3 billion, while non-GAAP operating expenses are expected to be around $3.0 billion.

From these numbers, it seems that NVIDIA's revenue and gross margins are expected to remain strong in the near future. However, operating expenses are expected to increase by a significant margin (mid-to-upper 40% range).

As for whether people should consider buying Nvidia stock based on this report alone, it ultimately depends on individual investment goals and risk tolerance. Here are some points to consider:

* If you're looking for short-term gains, the relatively stable revenue and gross margins might be attractive.
* However, if you're concerned about the company's increasing operating expenses, which could impact profitability in the long run, you might want to wait or reconsider your investment.

Keep in mind that this is just a snapshot of one financial report, and there are many other factors that can influence a stock's performance. It's always important to do thorough research and consult with financial experts before making any investment decisions.