# Challenge 3: Start coding

## Introduction

In this challenge you will interact with the Azure OpenAI and Phi-3 APIs using Python.
You can complete the code in the notebook skeleton below, or create your own notebook from scratch.

The steps to complete the challenge are:
- Play with the vanilla models
- Bring your own data

Make sure your Python environment is activated before you start.




## Step 1: Play with the vanilla models

In this step you connect to the Azure OpenAI and Phi-3 APIs from code.

### Azure OpenAI API

Let's start with the Azure OpenAI API.

- Populate environment variables based on the MaaS deployed in Azure AI Studio.
- Provide the question as prompt (you can use questions from the first part of the challenge)
- Create the OpenAI API client.
- Use the OpenAI API client to generate completions
- Print the completions
- Print the number of tokens used in the prompt and the completion.

<div class="alert alert-block alert-warning">
So that we do not commit any secrets to our Git repository, we use <a href="https://pypi.org/project/python-dotenv/">python-dotenv</a> to manage our environment variables. It will also make things easier when deploying the application in Azure.

At the root of this repository, copy `env.sample.txt` to `.env`, then open `.env` and edit the variables for step 1. You can edit the variables for the other steps later, when you get there.
</div>
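
A typo in `.env` usually surfaces later as a confusing authentication or endpoint error. The sketch below is one possible way to catch that early: it lists which of the Step 1 variable names read by the first cell are unset or empty. The helper name `missing_env_vars` and the `STEP1_VARS` list are illustrative assumptions, not part of the challenge code.

```python
import os

# Variable names read by the Step 1 Azure OpenAI cells of this notebook.
STEP1_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper: `env` defaults to the process environment, so call it
    after dotenv.load_dotenv() has run.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(STEP1_VARS)
if missing:
    print("Missing in .env:", ", ".join(missing))
else:
    print("All Step 1 variables are set.")
```

The same helper can be reused for the Phi-3 and Step 2 variables by passing a different list of names.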

In [1]:
import os, dotenv
dotenv.load_dotenv()

# Setup environment
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_MODEL = os.getenv("AZURE_OPENAI_MODEL")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")

# Libraries
from openai import AzureOpenAI

In [2]:
# Define the question

QUESTION = "What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"

# Create an Azure OpenAI client

client = AzureOpenAI(
  api_key = AZURE_OPENAI_API_KEY,  
  api_version = AZURE_OPENAI_API_VERSION,
  azure_endpoint = AZURE_OPENAI_ENDPOINT
)

# Use the client to generate completions

response = client.chat.completions.create(
    model=AZURE_OPENAI_DEPLOYMENT_NAME, # model = "deployment_name".
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": QUESTION}
    ]
)


# Print the response
print(response.choices[0].message.content)

# Print the number of tokens used by the prompt and by the completion
print(f"PROMPT TOKENS: {response.usage.prompt_tokens} | COMPLETION TOKENS: {response.usage.completion_tokens}")

As of my last update in October 2021, I don't have real-time access to data or the internet to fetch the most current financial statements of Intel or any other company. However, you can easily find the latest financial information for Intel in the following ways:

1. **Intel's Annual Report:** Companies publish their annual reports on their official websites under the "Investor Relations" section. For Intel, you can visit their Investor Relations page to find the latest annual report, which includes revenue, operating margins, and a comparison with previous years.

2. **SEC Filings:** In the United States, publicly traded companies are required to file regular financial reports to the U.S. Securities and Exchange Commission (SEC). You can access these filings through the SEC's EDGAR database.

3. **Financial News Websites:** Websites such as Yahoo Finance, Google Finance, and Bloomberg provide financial summaries and comparisons for publicly traded companies.

4. **Analyst Reports:** 
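
The token counts printed by the cell above come from the `usage` object attached to the response, which also carries a `total_tokens` field. As a small sketch, the formatting can be pulled into a helper so the same line works for every model you test. The helper name `summarize_usage` is an assumption, and the numbers below are made up so the example runs without calling an API.

```python
from types import SimpleNamespace


def summarize_usage(usage):
    """Format the token counts found on a chat completion's `usage` object."""
    return (
        f"prompt={usage.prompt_tokens} "
        f"completion={usage.completion_tokens} "
        f"total={usage.total_tokens}"
    )


# Stand-in for `response.usage`; the values are invented for illustration.
fake_usage = SimpleNamespace(prompt_tokens=40, completion_tokens=160, total_tokens=200)
print(summarize_usage(fake_usage))
```

In the notebook you would pass `response.usage` instead of `fake_usage`.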

### Phi-3 API

Now let's do the same using the Phi-3 API.

The steps are similar to those for the Azure OpenAI API.

- Populate environment variables based on the MaaS deployed in Azure AI Studio.
- Provide the question as prompt (you can use questions from the first part of the challenge)
- Create the OpenAI API client.
- Use the OpenAI API client to generate completions
- Print the completions
- Print the number of tokens used in the prompt and the completion.

**Important:** Make sure you update your `.env` file.

In [3]:
import os, dotenv
dotenv.load_dotenv()

# Setup environment
PHI_API_KEY = os.getenv("PHI_API_KEY")
PHI_ENDPOINT = os.getenv("PHI_ENDPOINT")
PHI_DEPLOYMENT_NAME = os.getenv("PHI_DEPLOYMENT_NAME")

# Libraries
from openai import OpenAI

In [4]:
# Define the question

QUESTION = "What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"

# Create an OpenAI client that points at the Phi-3 endpoint

client = OpenAI(
  base_url = PHI_ENDPOINT,
  api_key = PHI_API_KEY
)

# Use the client to generate completions

response = client.chat.completions.create(
    model=PHI_DEPLOYMENT_NAME, # model = "deployment_name".
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by Microsoft."},
        {"role": "user", "content": QUESTION}
    ]
)

# Print the response
print(response.choices[0].message.content)

# Print the number of tokens used by the prompt and by the completion
print(f"PROMPT TOKENS: {response.usage.prompt_tokens} | COMPLETION TOKENS: {response.usage.completion_tokens}")

 I'm sorry for any confusion, but as of my last update in April 2023, I can't provide real-time or the most recent financial data directly. However, I can guide you on how to find this information.

To compare Intel's revenues and operating margins for 2022 with the previous year, you can:

1. Visit Intel's official website: Companies often publish their annual revenue and financial performance in their Investor Relations section.

2. Check financial news websites: Websites like Bloomberg, CNBC, and Reuters regularly report on company financial results.

3. Look at financial data platforms: Yahoo Finance, Google Finance, and Morningstar provide detailed financial information, including annual revenue and operating margins.

4. Review SEC filings: The U.S. Securities and Exchange Commission (SEC) requires public companies to file annual reports (Form 10-K) and quarterly reports (Form 10-Q), which include detailed financial statements.

Here is a general approach to analyze Intel's finan

## Step 2: Bring your own data

After testing the vanilla models, it's time to bring your own data into the picture.

We will use the LangChain framework and Azure AI Search for this.

Remember what you learned from Challenge 0 regarding the RAG end-to-end process.
- Index
    - Load (Document Loader)
    - Split (Text Splitters)
    - Store (Vector Stores and Embeddings)
- Retrieve
- Generate
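
To make the three stages concrete before wiring up the real services, here is a toy, standard-library-only sketch of the same flow. It is not how LangChain or Azure AI Search work internally: the "embedding" is a plain bag-of-words count, the splitter cuts on periods, and the generate step only builds the prompt instead of calling a model. All names and sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Index: load -> split -> store
documents = [
    "Revenue grew in 2023. Operating margin declined slightly.",
    "The company opened new offices. Headcount increased.",
]
chunks = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
store = [(chunk, embed(chunk)) for chunk in chunks]


# Retrieve: rank stored chunks by similarity to the question
def retrieve(question, k=2):
    query = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Generate: put the retrieved context and the question into one prompt
def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How did revenue change in 2023?"))
```

In the rest of this step, each toy piece is replaced by a real component: a document loader, a text splitter, an embedding model with a vector store, a retriever, and an LLM.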


### Azure OpenAI API

- Populate environment variables based on the MaaS deployed in Azure AI Studio.
- Create the Azure Search vector store, the Azure OpenAI embeddings and the Azure Chat OpenAI objects.
- Index: Load documents from the data source (you can use AzureBlobStorageContainerLoader)
- Index: Split the documents into chunks (you can use the RecursiveCharacterTextSplitter)
- Index: Store the documents in the vector store (you can use the add_documents method)
- Retrieve: Create a retriever from the vector store (for example similarity search with top_k, or a similarity score threshold)
- Generate: Use a LangChain chain to generate completions (get the context from the retriever and format it as a single string alongside the question -> add the proper prompt -> send to the LLM -> parse the output into a string)
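
The retrieve step can select chunks in two common ways: keep a fixed number of best matches (top k), or keep every match whose score reaches a threshold, which is what `search_type="similarity_score_threshold"` asks for in the retriever cell further down. The sketch below contrasts the two on hand-written scores. The function names and the scores are assumptions made for illustration, not LangChain APIs.

```python
def top_k(scored, k):
    """Keep the k highest-scoring chunks, whatever their scores are."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def above_threshold(scored, threshold):
    """Keep every chunk whose score reaches the threshold, best first."""
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented (chunk, relevance score) pairs.
scored = [("chunk C", 0.31), ("chunk A", 0.82), ("chunk D", 0.12), ("chunk B", 0.55)]

print(top_k(scored, k=3))
print(above_threshold(scored, threshold=0.5))
```

Top k always returns something, even when nothing is relevant. A threshold can return an empty list, which lets the model answer "I don't know" instead of working from weak context.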

**Important:** Make sure you update your `.env` file.

In [5]:
import os, dotenv
dotenv.load_dotenv()

# ENVIRONMENT VARIABLES
# OpenAI
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_MODEL = os.getenv("AZURE_OPENAI_MODEL")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
AZURE_OPENAI_EMBEDDING = os.getenv("AZURE_OPENAI_EMBEDDING")

# Azure Search
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT")
AZURE_SEARCH_API_KEY = os.getenv("AZURE_SEARCH_API_KEY")
AZURE_SEARCH_INDEX = os.getenv("AZURE_SEARCH_INDEX")

# Azure Blob Storage
AZURE_STORAGE_CONNECTION_STRING = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
AZURE_STORAGE_CONTAINER = os.getenv("AZURE_STORAGE_CONTAINER")

# Import Libraries
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_community.document_loaders import AzureBlobStorageContainerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

In [6]:
# Create the required objects

# Azure OpenAI Embeddings

embeddings = AzureOpenAIEmbeddings(
    azure_deployment = AZURE_OPENAI_EMBEDDING,
    openai_api_version = AZURE_OPENAI_API_VERSION,
    azure_endpoint = AZURE_OPENAI_ENDPOINT,
    api_key = AZURE_OPENAI_API_KEY
)

# Azure Search Vector Store

vector_store = AzureSearch(
    azure_search_endpoint=AZURE_SEARCH_ENDPOINT,
    azure_search_key=AZURE_SEARCH_API_KEY,
    index_name=AZURE_SEARCH_INDEX,
    embedding_function=embeddings.embed_query,
    # Configure max retries for the Azure client
    additional_search_client_options={"retry_total": 4},
)

# Define the LLM model to use
llm = AzureChatOpenAI(
    azure_deployment=AZURE_OPENAI_DEPLOYMENT_NAME,
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    temperature=0,
    max_retries=2
)

In [7]:
# Index: Load the documents

# Load the document from Azure Blob Storage (AzureBlobStorageContainerLoader)
loader = AzureBlobStorageContainerLoader(
    conn_str=AZURE_STORAGE_CONNECTION_STRING,
    container=AZURE_STORAGE_CONTAINER,
)
documents = loader.load()
print(documents)

[Document(metadata={'source': '/tmp/tmpokvef1u9/data/2023 FY AMZN.pdf'}, page_content='UNITED STATESSECURITIES AND EXCHANGE COMMISSIONWashington, D.C. 20549 ____________________________________FORM 10-K____________________________________ (Mark One)☒ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934For the fiscal year ended December 31, 2023or☐TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934For the transition period from to .Commission File No. 000-22513____________________________________AMAZON.COM, INC.(Exact name of registrant as specified in its charter)Delaware 91-1646860(State or other jurisdiction ofincorporation or organization) (I.R.S. EmployerIdentification No.)410 Terry Avenue NorthSeattle, Washington 98109-5210(206) 266-1000(Address and telephone number, including area code, of registrant’s principal executive offices)Securities registered pursuant to Section 12(b) of the Act:Title of Each ClassTradin

In [8]:
# Index: Split (RecursiveCharacterTextSplitter - 1000 characters - 200 overlap)

text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=1000, 
   chunk_overlap=200,
   length_function=len,
   is_separator_regex=False
)

chunks = text_splitter.split_documents(documents)

print(f"Number of chunks: {len(chunks)}")
print(chunks[20])
print(chunks[len(chunks) - 5])


Number of chunks: 1696
page_content='vehicle safety controls, and engineering ergonomic solutions. Our safety team is dedicated to using the science of safety to solve complex problems and establish new industry best practices. We also provide mentorship and support resources to our employees, and have deployed numerous programs that advance employee engagement, communication, and feedback.' metadata={'source': '/tmp/tmpokvef1u9/data/2023 FY AMZN.pdf'}
page_content='d) Disclosed in this report any change in the registrant’s internal control over financial reporting that occurred during the registrant’s most recent fiscal quarter (the registrant’s fourth fiscal quarter in the case of an annual report) that has materially affected, or is reasonably likely to materially affect, the registrant’s internal control over financial reporting; and

5. The registrant’s other certifying officer and I have disclosed, based on our most recent evaluation of internal control over financial reporting, 
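
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified fixed-window splitter. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, newlines and spaces, so its chunks are often shorter than `chunk_size` and the overlap is approximate. The function name `split_with_overlap` is an assumption.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows; consecutive windows share characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each window repeats the last four characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.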

In [9]:
# Index: Store (add_documents)

vectors = vector_store.add_documents(documents=chunks)

In [10]:
# Retrieve (similarity_score_threshold - score_threshold=0.5)

retriever = vector_store.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5})

retriever.invoke("What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?")


[Document(metadata={'id': 'NWU4NTg1MTAtYzA1My00ZTNiLWIwNDctOWY2ZWNlMmYxMjU1', 'header': '{"Header 2": "Financial Performance", "Header 3": "Key Points:"}', 'source': '/workspaces/ch-hackathons/its-a-rag-solutions/challenge4/../../its-a-rag/data/fsi/pdf/2023 FY INTC.pdf', 'image': None}, page_content='- Revenue increased each year from 2021 to 2023.\n- There was an operating income loss in 2022 and 2023, with the loss being larger in 2023 compared to 2022.\n- In 2021, there was a positive operating income, unlike in the subsequent years." --></figure>  \n2023 vs. 2022  \nRevenue was $952 million, up $483 million from 2022, driven by higher packaging revenue. We had an operating loss of $482 million, compared to an operating loss of $281 million from 2022, primarily due to increased spending to drive strategic growth.  \n2022 vs. 2021  \nRevenue was $469 million, up $122 million from 2021, primarily driven by higher sales of multi-beam mask writer tools. We had an operating loss of $281 

In [11]:
# Generate

# Take all the result documents from the retriever and format them into a single string suitable for input into the language model.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Use ChatPromptTemplate to define the prompt sent to the model; it must contain the {question} and {context} placeholders
prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, just say that you don't know. "
        "Use three sentences maximum and keep the answer concise. "
        "Question: {question} Context: {context} Answer:",
    ),
])

# Define the Chain to get the answer
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
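
The chain in the cell above can look dense at first. The dictionary at its head runs both branches on the same input: the question goes through the retriever and `format_docs` to produce `context`, while `RunnablePassthrough()` forwards it unchanged as `question`. The plain-Python sketch below mirrors that data flow with stand-ins. `fake_retriever`, `fake_llm` and the shortened template are hypothetical and replace the real retriever, model and prompt.

```python
def format_docs(docs):
    """Join the retrieved chunk texts into a single string."""
    return "\n\n".join(docs)


def fake_retriever(question):
    """Hypothetical stand-in for the retriever: returns chunk texts."""
    return ["Chunk one about revenue.", "Chunk two about margins."]


def fake_llm(prompt_text):
    """Hypothetical stand-in for the model: reports what it received."""
    return f"(model received {len(prompt_text)} characters)"


def build_inputs(question):
    # Same role as the dict at the head of the chain: both values are
    # computed from the same incoming question.
    return {"context": format_docs(fake_retriever(question)), "question": question}


def render_prompt(inputs):
    # Same role as the prompt template: fill in the placeholders.
    return "Question: {question} Context: {context} Answer:".format(**inputs)


def rag_chain_plain(question):
    # Dict -> prompt -> model; the result is already a string here, which is
    # what StrOutputParser produces in the real chain.
    return fake_llm(render_prompt(build_inputs(question)))


print(render_prompt(build_inputs("What was the revenue?")))
print(rag_chain_plain("What was the revenue?"))
```

Reading the `|` operators left to right as "feed the output of this step into the next" gives the same picture as the nested function calls here.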

In [12]:
# Test the solution
print(rag_chain.invoke("What are the revenues of Google in the year 2000?"))
print(rag_chain.invoke("What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"))
print(rag_chain.invoke("Can you compare and create a table with the revenue of Alphabet Inc., NVIDIA, MICROSOFT, APPLE and AMAZON in years 2023?"))
print(rag_chain.invoke("Did APPLE repurchase common stock in 2023? create a table of Apple repurchased stock with date, numbers of stocks and values in dollars."))
print(rag_chain.invoke("Can you give me the Fiscal Year 2023 Highlights for Apple, Microsoft, Nvidia and Google?"))

I don't know. The provided context does not include information about Intel Corporation's revenues for the year 2000.
In 2022, Intel's revenue was $63,054 million, down from $79,024 million in 2021. The operating margin in 2022 was 3.7%, a significant decrease from 24.6% in 2021. This decline was primarily due to increased spending to drive strategic growth, leading to an operating loss of $281 million in 2022 compared to an operating income of $19,456 million in 2021.
I don't have the complete revenue data for Intel from 2019 to 2023. However, I can provide the available data:

| Year | Revenue (in billions) |
|------|-----------------------|
| 2023 | $54.2                 |
| 2022 | $63.0                 |
| 2021 | $79.0                 |
| 2020 | Not provided          |
| 2019 | Not provided          |

For the missing years, you would need to refer to Intel's annual financial reports.
In 2023, Intel's GAAP gross margin is 40.0%, down from 42.6% in 2022, and the Non-GAAP gross margi