# Challenge 3: Start coding

## Introduction

In this challenge you will interact with the Azure OpenAI and Phi-3 APIs using Python.
You can use the following notebook schema and complete the code, or you can create your own notebook from scratch.

The steps to complete the challenge are:
- Play with the vanilla models
- Bring your own data

Be sure your Python environment is activated.




## Step 1: Play with the vanilla models

In this step you need to connect to the Azure OpenAI and Phi-3 APIs using code.

### Azure OpenAI API

Let's start with the Azure OpenAI API.

- Populate environment variables based on the MaaS deployed in Azure AI Studio.
- Provide the question as prompt (you can use questions from the first part of the challenge)
- Create the OpenAI API client.
- Use the OpenAI API client to generate completions
- Print the completions
- Print the number of tokens used in the prompt and the completion.

<div class="alert alert-block alert-warning">
So that we do not commit any secrets to our Git repository, we use <a href="https://pypi.org/project/python-dotenv/">python-dotenv</a> to manage our environment variables. It will also make things easier when deploying the application in Azure.

At the root of this repository, copy `env.sample.txt` to `.env`, then open `.env` and edit the variables for step 1. You can edit the variables for the other steps later, when you get there.
</div>
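Before running the next cell, it can help to confirm that the variables were actually loaded. The following is a minimal sketch, not part of the original notebook: the helper name `missing_variables` is hypothetical, and the list of names simply mirrors those read in step 1.

```python
import os

# Names read in step 1 of this notebook.
REQUIRED_VARIABLES = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]

def missing_variables(names, env=None):
    """Return the names that are unset or empty in the mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
example_env = {"AZURE_OPENAI_API_KEY": "xxx", "AZURE_OPENAI_ENDPOINT": ""}
print(missing_variables(REQUIRED_VARIABLES, example_env))
```

In the notebook you would call `missing_variables(REQUIRED_VARIABLES)` after `dotenv.load_dotenv()` and fix `.env` if the list is not empty.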

In [1]:
import os, dotenv
dotenv.load_dotenv()

# Setup environment
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_MODEL = os.getenv("AZURE_OPENAI_MODEL")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")

# Libraries
from openai import AzureOpenAI

In [2]:
# Define the question

QUESTION = "What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"

# Create an Azure OpenAI client

client = AzureOpenAI(
  api_key = AZURE_OPENAI_API_KEY,  
  api_version = AZURE_OPENAI_API_VERSION,
  azure_endpoint = AZURE_OPENAI_ENDPOINT
)

# Use the client to generate completions

response = client.chat.completions.create(
    model=AZURE_OPENAI_DEPLOYMENT_NAME, # model = "deployment_name".
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": QUESTION}
    ]
)


# Print the response
print(response.choices[0].message.content)

# Get the number of tokens in the response
print(f"PROMPT TOKENS: {response.usage.prompt_tokens} | COMPLETION TOKENS: {response.usage.completion_tokens}")

As of my last update, I don't have access to real-time data or updates from after September 2021, so I'm unable to provide the specific revenues and operating margins for Alphabet Inc. for the year 2022 or any direct comparisons with the previous year. However, typically, Alphabet Inc., the parent company of Google, files their financial performance publicly every quarter and annually. You can find the most accurate and up-to-date information on Alphabet's revenues and operating margins in their official SEC filings or on their investor relations website. Additionally, financial news websites and databases like Bloomberg, Reuters, or Yahoo Finance often provide detailed financial analysis of public companies including annual and quarterly reports.
PROMPT TOKENS: 47 | COMPLETION TOKENS: 137
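If you want the total as well as the two separate counts, the usage object can be summarized in a small helper. This is only a sketch: `summarize_usage` is a hypothetical name, and a `SimpleNamespace` stands in for `response.usage` so the example runs without calling the API.

```python
from types import SimpleNamespace

def summarize_usage(usage):
    """Return prompt, completion and total token counts from a usage object."""
    prompt = usage.prompt_tokens
    completion = usage.completion_tokens
    return {"prompt": prompt, "completion": completion, "total": prompt + completion}

# Stand-in for `response.usage`, using the counts printed above.
example_usage = SimpleNamespace(prompt_tokens=47, completion_tokens=137)
print(summarize_usage(example_usage))
```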


### Phi-3 API

Now let's do the same using the Phi-3 API.

The steps are similar to those for the Azure OpenAI API.

- Populate environment variables based on the MaaS deployed in Azure AI Studio.
- Provide the question as prompt (you can use questions from the first part of the challenge)
- Create the client (the `ChatCompletionsClient` from azure-ai-inference, or the OpenAI client).
- Use the client to generate completions
- Print the completions
- Print the number of tokens used in the prompt and the completion.

**Important:** Make sure you update your `.env` file.

In [3]:
import os, dotenv
dotenv.load_dotenv()

# Setup environment
PHI_API_KEY = os.getenv("PHI_API_KEY")
PHI_ENDPOINT = os.getenv("PHI_ENDPOINT")
PHI_DEPLOYMENT_NAME = os.getenv("PHI_DEPLOYMENT_NAME")


This section works with the azure-ai-inference package.

In [4]:
# Libraries
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from azure.ai.inference.models import SystemMessage, UserMessage


# Define the question

QUESTION = "What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"

# Create a ChatCompletionsClient
phiClient = ChatCompletionsClient(
    endpoint=PHI_ENDPOINT,
    credential=AzureKeyCredential(PHI_API_KEY))

model_info = phiClient.get_model_info()
print("Model name:", model_info.model_name)
print("Model type:", model_info.model_type)
print("Model provider name:", model_info.model_provider_name)


# Use the client to generate completions

phiResponse = phiClient.complete(
    model="Phi-3-medium-128k-instruct",
    messages=[
        SystemMessage(content="Assistant is a large language model trained by Microsoft."),
        UserMessage(content=QUESTION)
    ]
)

# Print the response
print(phiResponse.choices[0].message.content)

# Get the number of tokens in the response
print(f"PROMPT TOKENS: {phiResponse.usage.prompt_tokens} | COMPLETION TOKENS: {phiResponse.usage.completion_tokens}")

Model name: phi3-medium-128k
Model type: chat-completion
Model provider name: Phi
 In 2022, Alphabet Inc. reported revenues of approximately $282.8 billion, which represented an increase of roughly 12% compared to the previous year's revenue of about $252.6 billion.

The operative margin (also known as operating margin), a profitability ratio that indicates how much profit a company makes from its operations before interest and taxes, for Alphabet Inc. in 2022 was approximately 24.3%. This figure is a slight increase compared to 2021, where the company's operating margin was about 23.7%.

It's important to note that these figures can be found in the company's annual report or quarterly earnings reports, which are publicly available on the Alphabet Inc. investor relations website.
PROMPT TOKENS: 35 | COMPLETION TOKENS: 180


This section works with the OpenAI library.

In [5]:
from openai import OpenAI
# Define the question

QUESTION = "What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"

# Create an OpenAI client pointed at the Phi-3 endpoint

client = OpenAI(
  base_url= PHI_ENDPOINT,
  api_key= PHI_API_KEY
)

# Use the client to generate completions

response = client.chat.completions.create(
    model="Phi-3-medium-128k-instruct",
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by Microsoft."},
        {"role": "user", "content": QUESTION}
    ]
)

# Print the response
print(response.choices[0].message.content)

# Get the number of tokens in the response
print(f"PROMPT TOKENS: {response.usage.prompt_tokens} | COMPLETION TOKENS: {response.usage.completion_tokens}")

 Alphabet Inc., the parent company of Google, reported total revenues of $257.6 billion in 2022, according to its annual report. This represents an increase of 22% compared to the $212.1 billion reported in 2021.

The operative margin of Alphabet Inc. also improved in 2022, reaching 25.5% compared to 23.6% in 2021. This indicates that the company was more efficient at controlling its operating costs and generating profits from its core business operations.

To put these figures into context, Alphabet Inc.'s strong revenue growth and improved operative margin in 2022 can be attributed to several factors, including the continued growth of its digital advertising business, the expansion of its cloud services, and the ongoing success of its other hardware and software products. Additionally, the company's effective cost management strategies and investments in innovation and talent likely contributed to its improved financial performance.

Overall, Alphabet Inc.'s financial results in 2022

## Step 2: Bring your own data

After testing the vanilla models, it's time to bring your own data into the picture.


We will use the LangChain framework and Azure AI Search for this.

Remember what you learned from Challenge 0 regarding the RAG end-to-end process.
- Index
    - Load (Document Loader)
    - Split (Text Splitters)
    - Store (Vector Stores and Embeddings)
- Retrieve
- Generate
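The stages listed above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions: a bag-of-words count vector stands in for a real embedding model, an in-memory list stands in for the vector store, and the "generate" step only builds the prompt instead of calling an LLM. None of the names below come from LangChain or Azure.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Index: load -> split -> store
document = "Revenue grew in 2023. The company opened two offices. Operating margin improved."
chunks = [s.strip() for s in document.split(".") if s.strip()]  # split into sentences
store = [(chunk, embed(chunk)) for chunk in chunks]             # in-memory "vector store"

# Retrieve: rank stored chunks by similarity to the question
def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Generate: build the prompt that would be sent to the LLM
question = "How did revenue change?"
context = "\n".join(retrieve(question))
prompt = f"Context: {context}\nQuestion: {question}"
print(prompt)
```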


### Azure OpenAI API

- Populate environment variables based on the MaaS deployed in Azure AI Studio.
- Create the Azure Search vector store, the Azure OpenAI embeddings and the Azure Chat OpenAI objects.
- Index: Load documents from the data source (you can use AzureBlobStorageContainerLoader).
- Index: Split the documents into chunks (you can use the RecursiveCharacterTextSplitter).
- Index: Store the documents in the vector store (you can use the add_documents method).
- Retrieve: Create a retriever using the vector store (SimilaritySearch and top_k).
- Generate: Use a LangChain chain to generate completions (get the context from the retriever and format it as a single string together with the question -> add the proper prompt -> send to the LLM -> get structured output).

**Important:** Make sure you update your `.env` file.

<div class="alert alert-block alert-warning">
The preferred endpoint for GPT-4o and the embedding model is the global one, in the format `https://< location >.api.cognitive.microsoft.com`.

Check the API version carefully: it can differ between GPT-4o and the embedding model.
</div>

In [17]:
import os, dotenv
dotenv.load_dotenv()

# ENVIRONMENT VARIABLES
# OpenAI
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AZURE_OPENAI_MODEL = os.getenv("AZURE_OPENAI_MODEL")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
AZURE_OPENAI_EMBEDDING = os.getenv("AZURE_OPENAI_EMBEDDING")
AZURE_OPENAI_EMBEDDING_API_VERSION = os.getenv("AZURE_OPENAI_EMBEDDING_API_VERSION")

# Azure Search
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT")
AZURE_SEARCH_API_KEY = os.getenv("AZURE_SEARCH_API_KEY")
AZURE_SEARCH_INDEX = os.getenv("AZURE_SEARCH_INDEX")

# Azure Blob Storage
AZURE_STORAGE_CONNECTION_STRING = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
AZURE_STORAGE_CONTAINER = os.getenv("AZURE_STORAGE_CONTAINER")

# Import Libraries
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_community.document_loaders import AzureBlobStorageContainerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

In [18]:
# Create the required objects

# Azure OpenAI Embeddings

embeddings = AzureOpenAIEmbeddings(
    azure_deployment = AZURE_OPENAI_EMBEDDING,
    openai_api_version = AZURE_OPENAI_EMBEDDING_API_VERSION,
    azure_endpoint = AZURE_OPENAI_ENDPOINT,
    api_key = AZURE_OPENAI_API_KEY
)

# Azure Search Vector Store

vector_store= AzureSearch (
    azure_search_endpoint=AZURE_SEARCH_ENDPOINT,
    azure_search_key=AZURE_SEARCH_API_KEY,
    index_name=AZURE_SEARCH_INDEX,
    embedding_function=embeddings.embed_query,
    # Configure max retries for the Azure client
    additional_search_client_options={"retry_total": 4},
)

# Define the LLM model to use
llm = AzureChatOpenAI(
    azure_deployment=AZURE_OPENAI_DEPLOYMENT_NAME,
    api_key=AZURE_OPENAI_API_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    temperature=0,
    max_retries=2
)

Make sure that "Allow storage account key access" is enabled under Settings -> Configuration for BOTH storage accounts (stitsaragdataxxx and stitsaragxxx).

The connection string must be in the format: `DefaultEndpointProtocol=https;AccountName=< nameofyourstorageaccount >;AccountKey=< key >;EndpointSuffix=core.windows.net`
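A quick way to check that a connection string has the expected parts is to split it into key/value pairs. The sketch below is illustrative only: `parse_connection_string` is a hypothetical helper, and the example string uses made-up account values.

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict (hypothetical helper)."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue
        # Split on the first '=' only: values such as base64 keys may contain '='.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts

EXPECTED_KEYS = {"DefaultEndpointProtocol", "AccountName", "AccountKey", "EndpointSuffix"}

# Made-up values for illustration.
example = "DefaultEndpointProtocol=https;AccountName=mystorage;AccountKey=abc123==;EndpointSuffix=core.windows.net"
parsed = parse_connection_string(example)
print(EXPECTED_KEYS - set(parsed))  # an empty set means no expected key is missing
```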

In [20]:
# Index: Load the documents

# Load the document from Azure Blob Storage (AzureBlobStorageContainerLoader)
loader = AzureBlobStorageContainerLoader(
    conn_str=AZURE_STORAGE_CONNECTION_STRING,
    container=AZURE_STORAGE_CONTAINER,
)
documents = loader.load()
print(documents)



In [21]:
# Index: Split (RecursiveCharacterTextSplitter - 1000 characters - 200 overlap)

text_splitter = RecursiveCharacterTextSplitter(
   chunk_size=1000, 
   chunk_overlap=200,
   length_function=len,
   is_separator_regex=False
)

chunks = text_splitter.split_documents(documents)

print(f"Number of chunks: {len(chunks)}")
print(chunks[20])
print(chunks[len(chunks) - 5])


Number of chunks: 2381
page_content='vehicle safety controls, and engineering ergonomic solutions. Our safety team is dedicated to using the science of safety to solve complex problems and establish new industry best practices. We also provide mentorship and support resources to our employees, and have deployed numerous programs that advance employee engagement, communication, and feedback.' metadata={'source': '/tmp/tmpf_zly3oh/ragdata/2023 FY AMZN.pdf'}
page_content='Incentive Compensation

Any compensation that is granted, earned or vested based wholly or in part upon the attainment of a Financial Reporting Measure

Lookback Period

The three completed fiscal years immediately preceding the Accounting Restatement Date, as well as any transition period (resulting from a change in the Company’s fiscal year) within or immediately following those three completed fiscal years (except that a transition period of at least nine months shall be deemed a completed fiscal year); provided, that
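To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter. It is an assumption-laden sketch, not the algorithm used by `RecursiveCharacterTextSplitter`, which additionally tries to break on separators such as paragraphs and sentences; `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-size character windows where each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each chunk starts 6 characters after the previous one, so the last 4 characters of one chunk are repeated at the start of the next. The overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.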

In [22]:
# Index: Store (add_documents)

vectors = vector_store.add_documents(documents=chunks)

In [23]:
# Retrieve (similarity_score_threshold - score_threshold=0.5)

retriever = vector_store.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5})

retriever.invoke("What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?")


[Document(metadata={'id': 'MzcwNzU4MmMtYmQ5OC00MzMwLWJiZjEtZjNjYjMyMjIxODM1', 'source': '/tmp/tmpe8vko3u2/ragdata/2023 FY GOOGL.pdf'}, page_content='(1)\n\nTotal income from operations\n\n$\n\n$\n\n82,699 $ (1,922) (4,636) (1,299) 74,842 $\n\n95,858 1,716 (4,095) (9,186) 84,293\n\n(1)\n\nIn addition to the costs included in Alphabet-level activities, hedging gains (losses) related to revenue were $2.0 billion and $236 million in 2022 and 2023, respectively. For the year ended December 31, 2023, Alphabet-level activities include charges related to the reduction in force and our office space optimization efforts totaling $3.9 billion. In addition, for the year ended December 31, 2023, we incurred $269 million in accelerated rent and accelerated depreciation. For additional information relating to our workforce reduction and other initiatives, see Note 8 of the Notes to Consolidated Financial Statements included in Item 8 of this Annual Report on Form 10-K. For additional information rela
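The idea behind a score threshold can be shown without a vector store. The sketch below assumes relevance scores where higher means more relevant and keeps only the results at or above the threshold; the scores are made up, and `filter_by_score` is a hypothetical helper rather than a LangChain function.

```python
def filter_by_score(scored_docs, score_threshold):
    """Keep (document, score) pairs whose score meets the threshold, best first."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Made-up scores for illustration only.
scored = [("chunk A", 0.42), ("chunk B", 0.81), ("chunk C", 0.50)]
print(filter_by_score(scored, score_threshold=0.5))
```

A higher threshold returns fewer but more relevant chunks; a threshold that is too high can leave the model with no context at all.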

In [24]:
# Generate

# Take all the result documents from the retriever and format them into a single string suitable for input into the language model.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Use ChatPromptTemplate to define the prompt sent to the model (a single system message here); it must include the question and the context
prompt = ChatPromptTemplate.from_messages([("system", "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. Question: {question} Context: {context} Answer:"),])

# Define the Chain to get the answer
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
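To make the first step of the chain concrete, the sketch below reproduces in plain Python what the dictionary does: the question goes to the retriever and through `format_docs` to become `context`, and is also passed through unchanged as `question`. The `Doc` class and `fake_retriever` are hypothetical stand-ins so the example runs without LangChain or Azure AI Search.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str

def format_docs(docs):
    """Join the retrieved chunks into one string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

def fake_retriever(question):
    """Hypothetical retriever that ignores the question and returns fixed documents."""
    return [Doc("first chunk"), Doc("second chunk")]

def build_prompt_inputs(question):
    """Mirror of the first chain step: retrieved context plus the untouched question."""
    return {"context": format_docs(fake_retriever(question)), "question": question}

inputs = build_prompt_inputs("What were the revenues?")
print(inputs)
```

The resulting dictionary is what fills the `{context}` and `{question}` placeholders in the prompt template.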

In [25]:
# Test the solution
print(rag_chain.invoke("What are the revenues of Google in the year 2000?"))
print(rag_chain.invoke("What are the revenues and the operative margins of Alphabet Inc. in 2022 and how it compares with the previous year?"))
print(rag_chain.invoke("Can you compare and create a table with the revenue of Alphabet Inc., NVIDIA, MICROSOFT, APPLE and AMAZON in years 2023?"))
print(rag_chain.invoke("Did APPLE repurchase common stock in 2023? create a table of Apple repurchased stock with date, numbers of stocks and values in dollars."))
print(rag_chain.invoke("Can you give me the Fiscal Year 2023 Highlights for Apple, Microsoft, Nvidia and Google?"))

The provided context does not include information about Google's revenues in the year 2000.
In 2022, Alphabet Inc. reported total income from operations of $74,842 million. In 2023, this figure increased to $84,293 million, indicating a growth in operational income. The retrieved context does not provide specific details on the operative margins for 2022 and 2023, nor does it compare these figures directly with the previous year beyond the income from operations.
I don't have the specific revenue figures for all the companies mentioned (Alphabet Inc., NVIDIA, MICROSOFT, APPLE, and AMAZON) for the year 2023. The context provided only includes detailed revenue information for NVIDIA and partial information for Alphabet Inc. for the fiscal year ending in 2023.
Yes, Apple did repurchase common stock in 2023. Below is a table summarizing the repurchased stock:

| Date Range                  | Number of Shares Repurchased (in thousands) | Value in Dollars (in millions) |
|-------------------