# OpenTelemetry Basics

Now that we have an understanding of what is happening in Azure, we need to understand how to get data from our application to Azure. In this section we'll use OpenTelemetry to collect telemetry data from our application and send it to Azure Application Insights.

[OpenTelemetry](https://opentelemetry.io/) is an open-source observability framework that provides a set of APIs, libraries, agents, and instrumentation to enable observability for applications. It allows developers to collect and export telemetry data such as traces, metrics, and logs from their applications to various backends.


Before we can start we need to install the OpenTelemetry SDK and get our connection string from Azure Application Insights. 


We can split OpenTelemetry into the following components:


## SDKs

The OpenTelemetry SDK is the core component of OpenTelemetry that provides the APIs and libraries for collecting telemetry data. It runs as part of your application and is responsible for instrumenting your application code, collecting telemetry data, and exporting it to a backend.

The SDK is available for various programming languages, including Java, Python, JavaScript, Go, and .NET. Each language has its own SDK implementation, but they all follow the same core principles and concepts of OpenTelemetry.

We'll be using OpenTelemetry with the Azure Monitor backend, as it has a built-in exporter.

### Enrichment & extension libraries

OpenTelemetry has libraries for quite a few languages that automatically enrich telemetry.

In .NET this could be `OpenTelemetry.Extensions.Hosting`, or `opentelemetry-instrumentation-httpx` in Python.


You can find the list of Python enrichment libraries here: [Python Enrichment Libraries](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation#readme).


## Collectors

The OpenTelemetry Collector is a vendor-agnostic proxy that can receive, process, and export telemetry data. It supports receiving telemetry data in multiple formats (for example, OTLP, Jaeger, Prometheus, as well as many commercial/proprietary tools) and sending data to one or more backends. It also supports processing and filtering telemetry data before it gets exported.

You can read more here [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/).

## Exporters
Exporters are components that send telemetry data to a backend. OpenTelemetry provides a set of built-in exporters for various backends, including Azure Application Insights, Jaeger, Prometheus, and others. You can also create custom exporters to send data to other backends.

You can read more here [Exporters](https://opentelemetry.io/docs/concepts/components/#exporters).

## Standalone agent

The OpenTelemetry Standalone Agent is a lightweight agent that can be deployed alongside your application to collect telemetry data. It can receive telemetry data from the OpenTelemetry SDK and export it to a backend. The Standalone Agent is useful for scenarios where you want to decouple the telemetry collection from your application code or when you want to use a different language for the agent than the one used in your application. This could also be in a scenario where you have network isolation and need to tunnel all telemetry data through a single host.

You can read more here [OpenTelemetry Standalone Agent](https://opentelemetry.io/docs/collector/standalone-agent/).







## Connecting OpenTelemetry to Azure Application Insights

To connect our application to Azure Application Insights, we need to provide a connection string. This connection string contains the information our application needs to send telemetry data to the Application Insights resource in Azure. In a production environment, you would typically store this connection string in a secure location, such as a configuration management system.

Let's fetch the connection string from Azure Application Insights. We can get it from either the Azure portal or the Azure CLI.

<details>
  <summary>Using the Azure portal</summary>

1. Open the Azure portal and navigate to your Application Insights resource.
2. In the overview blade, copy the "Connection String". 
</details>
<details>
  <summary>Using the Azure CLI</summary>

```powershell
$UserInitials = "ABCD" # Replace with your initials
$SystemName = "msows${UserInitials}" # Microsoft Observability Workshop - msows
$appiName = "appi-${SystemName}-dev" # ${...} expands the variable; {$...} would leave literal braces in the name
az resource show -n $appiName -g "rg-obersvabilityworkshop-dev" --resource-type "microsoft.insights/components" --query properties.ConnectionString
```
</details>


The connection string will look like this:

```bash
InstrumentationKey=1234567890-1234-1234-1234-1234567890;IngestionEndpoint=https://swedencentral-0.in.applicationinsights.azure.com/;LiveEndpoint=https://swedencentral.livediagnostics.monitor.azure.com/;ApplicationId=1234567890-12345-12345-12345-12345-1234567890
```
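
The connection string is a list of `key=value` pairs separated by semicolons. As a minimal sketch (the helper name `parse_connection_string` is ours and not part of any Azure SDK), it can be split into a dictionary to check which endpoint the telemetry will be sent to:

```python
def parse_connection_string(connection_string: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dictionary (hypothetical helper)."""
    pairs = {}
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # Ignore empty segments, e.g. from a trailing semicolon
        key, _, value = part.partition("=")  # Split on the first '=' only
        pairs[key.strip()] = value.strip()
    return pairs


# Shortened version of the example connection string above
example = (
    "InstrumentationKey=1234567890-1234-1234-1234-1234567890;"
    "IngestionEndpoint=https://swedencentral-0.in.applicationinsights.azure.com/"
)
settings = parse_connection_string(example)
print(settings["IngestionEndpoint"])
```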


For this workshop, we will be using an environment file. This is a common practice for local development and testing.

Create a file called `.env` in the root of your project directory. This file will contain the connection string for Application Insights. The format of the file should be as follows:

```bash
APPLICATIONINSIGHTS_CONNECTION_STRING="Paste connection string here"
```

From here, any application that calls the `load_dotenv()` function before configuring the OpenTelemetry SDK will be able to read the connection string from this file.
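
Before configuring the SDK, it can help to check that the variable was actually loaded. A small sketch using only the standard library; the function name `require_connection_string` is an assumption made for illustration, and in the notebook it would be called after `load_dotenv()`:

```python
import os
from collections.abc import Mapping


def require_connection_string(environ: Mapping[str, str] = os.environ) -> str:
    """Return the connection string or raise a clear error (hypothetical helper)."""
    value = environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING", "").strip()
    if not value:
        raise RuntimeError(
            "APPLICATIONINSIGHTS_CONNECTION_STRING is not set; check your .env file"
        )
    return value


# Example with an explicit mapping instead of the real environment
demo_env = {"APPLICATIONINSIGHTS_CONNECTION_STRING": "InstrumentationKey=..."}
print(require_connection_string(demo_env))
```

Passing the environment as a parameter keeps the check easy to try out without touching the real environment variables.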

## Preparing your Python environment

Let's get a Python environment up and running. We'll be using a virtual environment to isolate our dependencies. This is a common practice for Python development.
```bash
# Create a virtual environment
python3 -m venv .venv
# Activate the virtual environment
source .venv/bin/activate  
# Install the required packages
pip install -r requirements.txt
```

In the Jupyter notebook you can select which kernel to use in the upper right corner. Select the kernel that corresponds to the virtual environment you just created.
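
To double-check that the notebook kernel is running inside the virtual environment, a quick standard-library check can be run in a cell. This is a sketch; `in_virtualenv` is a name chosen here for illustration:

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


print(sys.executable)   # Path of the interpreter the kernel is using
print(in_virtualenv())  # Should be True once the .venv kernel is selected
```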


#### Python code

First, let's set up the basic auto-instrumentation for the OpenTelemetry SDK. This will instrument your application to collect telemetry data without requiring major code changes. The OpenTelemetry SDK provides a set of APIs and libraries that can be used to instrument your application and collect telemetry data. We'll take a look at them in a second.

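```python
import logging
import random
import time

from azure.monitor.opentelemetry import configure_azure_monitor
from dotenv import load_dotenv
from opentelemetry import trace

load_dotenv()  # Loads APPLICATIONINSIGHTS_CONNECTION_STRING from the .env file
configure_azure_monitor()  # Reads the connection string from the environment
```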

Now that we have initialized the OpenTelemetry SDK, we can start collecting telemetry data. We'll start with the logger and then move on to the tracer.

The logger is the standard Python logger: it lets us log messages at different levels (debug, info, warning, error, critical) and is used exactly like any other Python logger. The tracer is a bit different: it is used to create spans. A span is a single unit of work within a trace, and a trace is a collection of spans that represents an end-to-end operation.
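
To make the relationship between traces and spans concrete, here is a conceptual sketch using plain dataclasses. It is not the OpenTelemetry API; the `Span` class and its fields are simplified stand-ins chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Simplified stand-in for a span: one unit of work inside a trace."""
    name: str
    trace_id: str                      # Shared by every span in the same trace
    parent_id: Optional[str] = None    # None marks the root span of the trace
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])

    def child(self, name: str) -> "Span":
        # A child span inherits the trace id and points back at its parent
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)


root = Span(name="handle request", trace_id=uuid.uuid4().hex)
query = root.child("query database")
render = root.child("render response")

# The trace is simply every span that carries the same trace id
trace_spans = [root, query, render]
print(all(s.trace_id == root.trace_id for s in trace_spans))
```

The real tracer used in the cells of this notebook takes care of generating these ids and linking child spans to their parents.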


```python
logger = logging.getLogger()  # Logging telemetry is collected from calls made with this logger and all of its child loggers.
logger.setLevel(logging.INFO)  # The root logger defaults to WARNING, which would drop the info message sent below.
tracer = trace.get_tracer("Jupyter_Notebook")  # Tracing telemetry is collected from spans created with this tracer.
```


Now let's generate a log entry to check that everything is working as expected.

```python
logger.info("This is an info message")
```

To test that the OpenTelemetry SDK is working correctly, check that the logs show up in Application Insights. Go to the Application Insights resource in the Azure portal and navigate to Transaction search, where you should be able to see your logs. It should look something like this:

**Note** it might take a minute or two for the logs to show up in Application Insights.

![Transaction Search](.files/appi_Transaction_search.png)

You can also query for logs by running the following command in the Azure CLI:

```bash
az monitor app-insights query --analytics-query "traces" -g "rg-obersvabilityworkshop-dev" -a "appi-CHANGE ME-dev"
```

If we want to enrich logs with metadata, we can do that too. In Python this is done with the `extra` argument, and in Azure the values show up under Custom Properties on the log entries.


```python
logtypes = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]

extra = {
    "workload": "workshop",
    "component": "03-open-telemetry-basics",
    "environment": "Development",  # You'd get this from the environment variables in a real application
}

for logtype in logtypes:
    # getLevelName turns the numeric level (e.g. 20) into its name (e.g. "INFO")
    logger.log(logtype, f"This is a {logging.getLevelName(logtype)} message", extra=extra)
```
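
The keys passed in `extra` become attributes on the standard library's `LogRecord`, which is what a logging handler gets to see. The sketch below uses only the standard library; `CaptureHandler` and the `extra_demo` logger are hypothetical names written for this illustration and do not show how the Azure exporter is implemented.

```python
import logging


class CaptureHandler(logging.Handler):
    """Hypothetical handler that keeps the records it receives in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


demo_logger = logging.getLogger("extra_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # Keep the demo records out of the root logger
handler = CaptureHandler()
demo_logger.addHandler(handler)

demo_logger.info("order created", extra={"workload": "workshop", "environment": "Development"})

record = handler.records[0]
print(record.getMessage(), record.workload, record.environment)
```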

We can also enrich the logs further using enrichers. More about them can be found here:
* https://opentelemetry.io/docs/collector/transforming-telemetry/#enriching-telemetry-with-resource-attributes
* https://opentelemetry.io/docs/collector/transforming-telemetry/#adding-or-deleting-attributes

We can also set custom properties for the application using OpenTelemetry environment variables. Here we would populate the service name and version. Normally these variables are set at deployment by a CI/CD pipeline, using the git commit hash or the application version. This is a good practice for production applications.

Try setting the following environment variable in the `.env` file and see how the values show up in Application Insights. (You'll need to restart the notebook kernel for the changes to take effect.)

```bash
OTEL_RESOURCE_ATTRIBUTES="service.name=msows-application,service.version=1.0.0,service.instance.id=1234567890"
```
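
The value is a comma-separated list of `key=value` pairs. A minimal sketch of how such a string maps to individual attributes (the helper `parse_resource_attributes` is written for this example; the SDK does its own parsing, which this sketch does not try to replicate exactly):

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Split 'key=value,key=value' into a dictionary (hypothetical helper)."""
    attributes = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # Skip entries without '='
        attributes[key.strip()] = value.strip()
    return attributes


raw = "service.name=msows-application,service.version=1.0.0,service.instance.id=1234567890"
attributes = parse_resource_attributes(raw)
print(attributes["service.name"])
```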

You can see the entire list of environment variables that OpenTelemetry reads [here](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/).

They also allow you to set sample rates for the different telemetry types. This is useful if you want to limit the amount of data that is sent to Application Insights, for example if you only want to send 10% of the traces.
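
The idea behind ratio-based sampling can be sketched in a few lines: the decision is derived from the trace id, so all spans of one trace are kept or dropped together. This is a conceptual illustration with a made-up function name (`keep_trace`) and randomly generated trace ids, not the sampler that the SDK or the Azure Monitor exporter actually uses.

```python
import random

TRACE_ID_SPACE = 2**64  # Compare against the low 64 bits of the trace id


def keep_trace(trace_id: int, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of all traces (hypothetical helper)."""
    return (trace_id & (TRACE_ID_SPACE - 1)) < ratio * TRACE_ID_SPACE


rng = random.Random(42)  # Fixed seed so the example is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(keep_trace(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace id, the same trace gets the same answer every time it is evaluated.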


#### Tracers and Spans

Now that we've set up the basic auto-instrumentation, we can start monitoring specific functions or processes within our application. This can be done by using spans. They allow us to measure the duration, track the flow of execution, and capture metadata.

When using spans, we have a few options for enriching the telemetry: setting attributes, setting the status, or adding events.

```python
with tracer.start_as_current_span("Query Example") as span:
    span.set_attribute("Query", "Select * from users where userId = %1") # adding a custom dimension to the log entry
    span.set_status(trace.StatusCode.OK) # setting the status to OK
    span.add_event("query committed", {"Table": "users"}) # logging a separate event in the span linked via the parent traceID
```


Here is a mock example of spans. We aren't really measuring anything here, but think of external APIs, database calls, or any other process that we want to measure.


```python
with tracer.start_as_current_span("Query loop Example"):
    for i in range(10):
        with tracer.start_as_current_span("inner loop") as span:
            span.set_attribute("Query", "Select * from users where userId = %1")
            time.sleep(random.uniform(0.01, 0.1))
            with tracer.start_as_current_span("Query Execution") as child:
                try:
                    child.set_attribute("Query", "Select * from users where userId = %1")
                    time.sleep(random.uniform(0.1, 1))
                    result = [1, 2, 3]  # Simulate a database query result
                except Exception as e:
                    time.sleep(random.uniform(0.1, 0.2))
                    child.set_status(trace.StatusCode.ERROR, "Query Failure")
                    child.record_exception(e)
                else:
                    # Only mark success and report results when no exception occurred
                    child.set_status(trace.StatusCode.OK)
                    child.set_attribute("Results", len(result))
```

Let's try to add monitoring to the example below. It's a small piece of code that inserts data into a database and then queries it.

Try adding monitoring that answers the following:
* How long does it take to insert the data into the database?
* How long does it take to query the data from the database?
* How long does it take to generate all the user data?
* How many results are returned from the query?
* The status of the query (success or failure)
* Did any exceptions occur during the process?

```python
import os
import sqlite3

from faker import Faker


class sql:
    def ExecuteQuery(self, query, parameters=()) -> list:
        if not os.path.isfile("sqlite3.db"):
            self.SetupSQLiteDB()
        result = []  # Returned as-is if the query fails
        con = sqlite3.connect("sqlite3.db")
        try:
            cur = con.cursor()
            resultset = cur.execute(query, parameters)
            result = resultset.fetchall()
        except Exception as e:
            print(e)
        finally:
            con.close()
        return result

    def SetupSQLiteDB(self) -> None:
        con = sqlite3.connect("sqlite3.db")
        cur = con.cursor()
        cur.execute(
            "CREATE TABLE users(id UNIQUE, name, birthday, current_location, email, phone)"
        )
        con.commit()
        con.close()

    def GenerateUserData(self, count=200) -> None:
        if not os.path.isfile("sqlite3.db"):
            self.SetupSQLiteDB()
        fake = Faker()
        for i in range(count):
            con = sqlite3.connect("sqlite3.db")
            try:
                user_id = i
                name = fake.name()
                # Store the date as ISO text so sqlite3 needs no date adapter
                birthday = fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat()
                current_location = fake.city()
                email = fake.email()
                phone = fake.phone_number()
                cur = con.cursor()
                cur.execute(
                    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
                    (user_id, name, birthday, current_location, email, phone),
                )
                con.commit()
            except Exception as e:
                print(e)
            finally:
                con.close()


db = sql()  # Use a separate name so the instance does not shadow the class
db.GenerateUserData()  # Insert the sample users before querying them
db.ExecuteQuery("SELECT * FROM users where name like ?", ('%adam%',))
```


<details>
  <summary>Solution idea</summary>

```python
import os
import sqlite3

from faker import Faker
from opentelemetry import trace


class sql:
    tracer = trace.get_tracer(__name__)

    def ExecuteQuery(self, query, parameters=()) -> list:
        with self.tracer.start_as_current_span("ExecuteQuery") as span:
            if not os.path.isfile("sqlite3.db"):
                self.SetupSQLiteDB()
            span.set_attribute("Query", query)
            with self.tracer.start_as_current_span("Query Execution") as child:
                con = sqlite3.connect("sqlite3.db")
                try:
                    cur = con.cursor()
                    resultset = cur.execute(query, parameters)
                    result = resultset.fetchall()
                except Exception as e:
                    child.set_status(trace.StatusCode.ERROR, "Query Failure")
                    child.record_exception(e)
                    raise
                finally:
                    con.close()
                child.set_status(trace.StatusCode.OK)
                # Record how many rows came back as an attribute on the query span
                child.set_attribute("Results", len(result))

        return result

    def SetupSQLiteDB(self) -> None:
        with self.tracer.start_as_current_span("SetupSQLiteDB") as span:
            con = sqlite3.connect("sqlite3.db")
            try:
                cur = con.cursor()
                cur.execute(
                    "CREATE TABLE users(id UNIQUE, name, birthday, current_location, email, phone)"
                )
                con.commit()
                span.add_event("committed", {"Table": "users"})
                span.set_status(trace.StatusCode.OK)
            except Exception as e:
                span.set_status(trace.StatusCode.ERROR, "Error creating users table")
                span.record_exception(e)
            finally:
                con.close()

    def GenerateUserData(self, count=200) -> None:
        with self.tracer.start_as_current_span("GenerateUserData") as span:
            if not os.path.isfile("sqlite3.db"):
                self.SetupSQLiteDB()
            span.set_attribute("count", count)
            fake = Faker()
            for i in range(count):
                # One child span per insert, so each insert is timed separately
                with self.tracer.start_as_current_span("InsertUser") as child:
                    con = sqlite3.connect("sqlite3.db")
                    try:
                        user_id = i
                        name = fake.name()
                        # Store the date as ISO text so sqlite3 needs no date adapter
                        birthday = fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat()
                        current_location = fake.city()
                        email = fake.email()
                        phone = fake.phone_number()
                        cur = con.cursor()
                        cur.execute(
                            "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
                            (user_id, name, birthday, current_location, email, phone),
                        )
                        con.commit()
                        child.set_status(trace.StatusCode.OK)
                    except Exception as e:
                        child.set_status(trace.StatusCode.ERROR, "Error inserting user data")
                        child.record_exception(e)
                    finally:
                        con.close()


db = sql()  # Use a separate name so the instance does not shadow the class
db.GenerateUserData()  # Insert the sample users before querying them
db.ExecuteQuery("SELECT * FROM users where id = ?", (1,))
```
</details>

When we make calls to external services, these are also auto-instrumented, for example if we connect to a storage account using the Azure Storage SDK.

```python
from azure.storage.blob import BlobServiceClient
from opentelemetry import trace

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span(name="BlobServiceClient"):
    client = BlobServiceClient.from_connection_string('connectionstring')
    client.create_container('data')  # Call will be traced
```


# Instrumenting a FastAPI application

In the 03 folder there is a simple FastAPI application that we can use to test the OpenTelemetry SDK. It is a simple library system: a CRUD-style REST API that lets us create, loan, and return books, with SQLite as the database. It was 'vibe' coded in a few minutes using GitHub Copilot.

The task is:

* Add OpenTelemetry SDK to the application and instrument it to collect telemetry data.
* Add logging to the application and send the logs to Application Insights.
* Add spans when applicable to the application and send the information to Application Insights.
* Set up the `opentelemetry-instrumentation-sqlite3` library to instrument `sqlite3`.


Look at the next section for examples of what to collect, and reflect on whether there's anything else you would also like to collect.

#### Different use cases


Alright, now that we have a basic understanding of logging and telemetry, let's think about a few different use cases and what we might want to monitor. This is a good exercise to do before we start implementing telemetry in our application. We want to make sure that we are collecting the right data for debugging, for operations, and for business purposes.


Let's think about a few different workloads and use-cases.

#### Web application

<details>
  <summary>Logs</summary>

* Who is making requests to the application?
* What requests are being made to the application?
* What errors are being returned by the application?
* What is the response time of the application?
* What is the status code of the application?
* What is the size of the response?
* What is the user agent of the request?
</details>
<details>
  <summary>Traces</summary>

* How long did it take to process the request?
* How long did it take to execute each function in the request?
* How long did it take to execute each SQL query in the request?
* How long did it take to execute each API call in the request?
* How long did it take to execute each external service call in the request, and which dependencies do we have?
</details>
<details>
<summary>Metrics</summary>

* How much RAM and CPU did we utilize?
* How much IO did we use?
* What is the number of requests per second?
* What is the number of errors per second?
  
</details>

#### Data Crawler

<details>
  <summary>Logs</summary>

* How many records were crawled, and how long did it take to crawl them?
* How many records were rejected, and why were they rejected?
* How many records were loaded into the target system, and how long did it take to load them?
* Did we have any errors or exceptions?
* If we used an identity, what identity did each step use?
* Are we being throttled, or rate limited by the target system(s)?
</details>
<details>
  <summary>Traces</summary>

* How long did it take to crawl each record?
* How are we spending the time, e.g. get next record, fetch payload, parse payload, process it, and store it?
</details>
<details>
  <summary>Metrics</summary>

* How much RAM and CPU did we utilize?
* How much IO did we use?
* How many records were crawled per second?
* How many records were rejected per second?
* How many records were loaded into the target system per second?


  **Note:** Some of these metrics could perhaps be calculated from the logs, but that only gives us averages and hides spikes and drops.
</details>

#### Data processor

<details>
  <summary>Logs</summary>

  * What records were processed, and how long did it take to process them?
  * How many records were rejected, and why were they rejected?
  * Metadata about the records, e.g. size, type, format, etc.
</details>
<details>
  <summary>Traces</summary>

  * How long did it take to process each record?
  * How are we spending the time, e.g. waiting for OpenAI or on local CPU/GPU processing?
</details>
<details>
  <summary>Metrics</summary>

  * How much RAM and CPU did we utilize?
  * How many records were processed per second?
  * What is our IO usage?
</details>

#### ETL pipeline
<details>
  <summary>Logs</summary>

* What pipeline is it?
* Where is it running?
* What started the pipeline: a manual run, a scheduler, or a trigger? And if it was a trigger, which one?
* If we used an identity, what identity did each step use?
* How many records were processed, and how long did it take to process them?
* How many records were rejected, and why were they rejected?
* How many records were loaded into the target system, and how long did it take to load them?
* Did we have any errors or exceptions?
  
</details>
<details>
  <summary>Traces</summary>
   
* Data extraction: How long did it take to extract the data from the source system(s), and how did we spend the time?
* Data transformation: How long did it take to transform the data, and how are we spending the time, e.g. what functions were called and how long did they take?
* Data loading: How long did it take to load the data into the target system?

</details>
<details>
  <summary>Metrics</summary>

* How much RAM and CPU did we utilize?
* How much IO did we use?
* How many records were processed per second?
* How many records were rejected per second?
* How many records were loaded into the target system per second? 

**Note:** Some of these metrics could perhaps be calculated from the logs, but that only gives us averages and hides spikes and drops.

</details>
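
The notes about averages hiding spikes and drops can be illustrated with a small sketch. The timestamps below are invented sample data, standing in for the second at which each record was processed:

```python
from collections import Counter

# Invented sample data: the second (since start) at which each record was processed
timestamps = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9]

per_second = Counter(timestamps)
window = max(timestamps) - min(timestamps) + 1   # Length of the observed period in seconds

average_rate = len(timestamps) / window          # What a summary log line would give us
peak_rate = max(per_second.values())             # What a per-second metric reveals
idle_seconds = [s for s in range(window) if per_second[s] == 0]

print(f"average: {average_rate:.1f}/s, peak: {peak_rate}/s, idle seconds: {idle_seconds}")
```

The average alone would suggest a steady trickle of records, while the per-second view shows both a burst and a stretch with no activity at all.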

### Next steps

Let's take a look at a sample application and instrument it with OpenTelemetry, adding some logging and tracing to it.

