# Using Azure OpenAI GPT-4 Vision to extract structured JSON data from Image documents

This notebook demonstrates [how to use GPT-4 Vision](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/gpt-with-vision?tabs=rest) to extract structured JSON data from Image documents, using the [Azure OpenAI Service](https://learn.microsoft.com/en-us/azure/ai-services/openai/overview).

This notebook is a modified version of the code in [Using Azure OpenAI GPT-4 Vision to extract structured JSON data from PDF documents](https://learn.microsoft.com/en-us/samples/azure-samples/azure-openai-gpt-4-vision-pdf-extraction-sample/using-azure-openai-gpt-4-vision-to-extract-structured-json-data-from-pdf-documents/). See that sample for a more detailed walkthrough.

## Pre-requisites

The notebook uses [.NET 8](https://dotnet.microsoft.com/download/dotnet/8.0) to run the C# code that interacts with the Azure OpenAI Service.

### Other Requirements

- Install the latest [**.NET SDK**](https://dotnet.microsoft.com/download).
- Install [**Visual Studio Code**](https://code.visualstudio.com/) with the [**Polyglot Notebooks extension**](https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode).


**Note**: The GPT-4 Vision model is currently in preview and is available in limited capacity (10K per region) in selected regions only. For more information, see the [Azure OpenAI Service documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-preview-model-availability).

## Install .NET dependencies

This notebook uses .NET to interact with the Azure OpenAI Service. It takes advantage of the following NuGet packages:

### DotNetEnv

The [DotNetEnv](https://github.com/tonerdo/dotnet-env) library is used to load environment variables from a `.env` file which can be accessed via the `Environment.GetEnvironmentVariable(string)` method. This library is used to load the Azure OpenAI Service endpoint, key and model deployment name from the [`./config.env`](./config.env) file.

In [34]:
#r "nuget:System.Text.Json, 8.0.1"
#r "nuget:DotNetEnv, 3.0.0"

using System.Net;
using System.Net.Http;
using System.Text; // Encoding.UTF8 is used when building the request body
using System.Text.Json.Nodes;
using System.Text.Json;
using System.IO;

using DotNetEnv;


In [35]:
Env.Load("config.env");

var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");
var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY");
var modelDeployment = Environment.GetEnvironmentVariable("AZURE_OPENAI_VISION_MODEL_DEPLOYMENT_NAME");
var apiVersion = "2023-12-01-preview";
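
If one of these variables is missing from `config.env`, `Environment.GetEnvironmentVariable` returns `null` and the failure only shows up later as a malformed request. A minimal sketch of a fail-fast check follows. The helper name `RequireSetting` and the variable name `EXAMPLE_OPENAI_ENDPOINT` are hypothetical and are not part of DotNetEnv or the original sample.

```csharp
using System;

// Hypothetical helper: returns the trimmed value of an environment variable,
// or throws a descriptive exception when the variable is missing or blank.
string RequireSetting(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Missing required setting '{name}'. Check config.env.");
    }
    return value.Trim();
}

// Made-up variable so the sketch runs on its own; in the notebook you would
// call RequireSetting("AZURE_OPENAI_ENDPOINT") after Env.Load("config.env").
Environment.SetEnvironmentVariable("EXAMPLE_OPENAI_ENDPOINT", "https://example.invalid/");
var exampleEndpoint = RequireSetting("EXAMPLE_OPENAI_ENDPOINT");
Console.WriteLine(exampleEndpoint);
```

Failing at load time keeps configuration errors separate from errors returned by the service.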


## Use GPT-4-Vision-Preview to extract the data from the image

The GPT-4 Vision model can be used to extract structured JSON data from the image. The following code demonstrates how to use the deployed Azure OpenAI Service directly via the API to extract structured JSON data from the image.

In this example, the payload for the Chat completion endpoint is a JSON object with the following details:

### System Prompt

The system prompt is the instruction that prescribes the model's behavior. It lets you constrain the model to a specific task, such as extracting structured JSON data from documents.

In this case, it is to extract structured JSON data from the image. Here is what we have provided:

**You are an AI assistant that extracts data from documents and returns them as structured JSON objects.**

Learn more about [system prompts](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/system-message).
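
As a rough illustration of where the system prompt sits in the request, the sketch below builds it as the first entry of the `messages` array with `System.Text.Json.Nodes`. The variable names `systemMessage` and `messages` are chosen for this example only.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// The system prompt is sent as a message whose role is "system".
var systemMessage = new JsonObject
{
    ["role"] = "system",
    ["content"] = "You are an AI assistant that extracts data from documents and returns them as structured JSON objects."
};

// It goes first in the "messages" array; user messages are appended after it.
var messages = new JsonArray { systemMessage };

Console.WriteLine(messages.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```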

### User Prompt

The user prompt is the input that the model uses to generate its response.

In this case, it is the image of the document plus some additional text context to help the model understand the task. Here is what we have provided:

**Extract the data from this invoice. Provide the Company Details, Invoice For, and Invoices**

> **Note:** For the user prompt, we do not need to specify the response as JSON. This is because the system prompt already specifies that the response should be structured JSON data.

This prompt helps the model understand the task, and the additional text context tells it which fields to extract. For the invoice prompt above, the response would be similar to the following. Note that the code cell in this notebook asks for a different set of fields (Borrower Name, CUSIP, Total Loan Amount, and Date), so its output differs from this sample:

```json
{
  "Company Details": {
    "Name": "Contoso",
    "Address": "1 Redmond way Suite 6000 Redmond, WA 99243"
  },
  "Invoice For": {
    "Name": "Microsoft",
    "Address": "1020 Enterprise Way Sunnyvale, CA 87659"
  },
  "Invoices": [
    {
      "Invoice Number": "34278587",
      "Invoice Date": "6/18/2017",
      "Invoice Due Date": "6/24/2017",
      "Charges": "$56,651.49",
      "VAT ID": "PT"
    }
  ]
}
```
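
Once a response of this shape comes back, its fields can be read with `JsonNode`. The sketch below parses a trimmed copy of the sample above. It assumes the property names match exactly, which the model does not guarantee, so real code should check for missing properties.

```csharp
using System;
using System.Text.Json.Nodes;

// A trimmed copy of the sample response, used here as stand-in input.
var sample = """
{
  "Company Details": { "Name": "Contoso" },
  "Invoices": [ { "Invoice Number": "34278587", "Charges": "$56,651.49" } ]
}
""";

JsonNode root = JsonNode.Parse(sample)!;

// Navigate by property name; the names come from the model's output.
string companyName = root["Company Details"]!["Name"]!.GetValue<string>();
JsonArray invoices = root["Invoices"]!.AsArray();
string firstInvoiceNumber = invoices[0]!["Invoice Number"]!.GetValue<string>();

Console.WriteLine($"{companyName}: {invoices.Count} invoice(s), first is #{firstInvoiceNumber}");
```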

In [39]:
// Replace with the path to your own image file
var imageFileName = "HP Scan Document_3-3.jpg";
var base64Image = Convert.ToBase64String(File.ReadAllBytes(imageFileName));


JsonObject jsonPayload = new JsonObject
{
    {
        "messages", new JsonArray 
        {
            new JsonObject
            {
                { "role", "system" },
                { "content", "You are an AI assistant that extracts data from documents and returns them as structured JSON" }
            },
            new JsonObject
            {
                { "role", "user" },
                { "content",
                    new JsonArray
                    {
                        new JsonObject
                        {
                            { "type", "text" },
                            { "text", "Extract the data from this invoice. Provide the Borrower Name, CUSIP, Total Loan Amount, and Date." }
                        },
                        new JsonObject
                        {
                            { "type", "image_url" },
                            // Embed the image as a base64 data URL (base64Image is defined above)
                            { "image_url", new JsonObject { { "url", $"data:image/jpeg;base64,{base64Image}" } } }
                        }
                    }
                }
            }
        }
    },
    { "model", modelDeployment },
    { "max_tokens", 300 },
    { "temperature", 0.1 },
    { "top_p", 0.1 },
};

string payload = JsonSerializer.Serialize(jsonPayload, new JsonSerializerOptions
{
    WriteIndented = true
});
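
The cell above hardcodes `image/jpeg` in the data URL, which only matches JPEG input. A small sketch of deriving the MIME type from the file extension follows. The helpers `GuessImageMimeType` and `ToDataUrl` are hypothetical, and the list of extensions is an assumption about what your deployment accepts.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps a file extension to the MIME type for the data URL.
string GuessImageMimeType(string fileName) =>
    Path.GetExtension(fileName).ToLowerInvariant() switch
    {
        ".jpg" or ".jpeg" => "image/jpeg",
        ".png" => "image/png",
        ".gif" => "image/gif",
        ".webp" => "image/webp",
        _ => throw new NotSupportedException($"Unsupported image type: {fileName}")
    };

// Hypothetical helper: builds the data URL placed in the "image_url" part.
string ToDataUrl(string fileName, byte[] bytes) =>
    $"data:{GuessImageMimeType(fileName)};base64,{Convert.ToBase64String(bytes)}";

// Three dummy bytes stand in for File.ReadAllBytes(imageFileName).
Console.WriteLine(ToDataUrl("scan.png", new byte[] { 1, 2, 3 }));
// prints "data:image/png;base64,AQID"
```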

In [40]:
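// TrimEnd('/') makes the URL correct whether or not AZURE_OPENAI_ENDPOINT ends with a slash
string visionEndpoint = $"{endpoint.TrimEnd('/')}/openai/deployments/{modelDeployment}/chat/completions?api-version={apiVersion}";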

using (HttpClient httpClient = new HttpClient())
{
    httpClient.BaseAddress = new Uri(visionEndpoint);
    httpClient.DefaultRequestHeaders.Add("api-key", apiKey);
    httpClient.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));

    var stringContent = new StringContent(payload, Encoding.UTF8, "application/json");

    var response = await httpClient.PostAsync(visionEndpoint, stringContent);

    if (response.IsSuccessStatusCode)
    {
        using (var responseStream = await response.Content.ReadAsStreamAsync())
        {
            // Parse the JSON response using JsonDocument
            using (var jsonDoc = await JsonDocument.ParseAsync(responseStream))
            {
                // Access the message content dynamically
                JsonElement jsonElement = jsonDoc.RootElement;
                string messageContent = jsonElement.GetProperty("choices")[0].GetProperty("message").GetProperty("content").GetString();

                // Output the message content
                Console.WriteLine($"Output: {messageContent}");
            }
        }
    }
    else
    {
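        // Print the status code and the service's error body to aid troubleshooting
        var errorBody = await response.Content.ReadAsStringAsync();
        Console.WriteLine($"Error: {(int)response.StatusCode} {response.ReasonPhrase}\n{errorBody}");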
    }
}

Output:

```json
{
  "Borrower Name": "GFL ENVIRONMENTAL INC",
  "CUSIP": "C7052B4CT",
  "Total Loan Amount": "USD 1,330,669,937.23",
  "Date": "22nd MARCH 2021"
}
```
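
The model wrapped its JSON in a Markdown code fence, so the `content` string cannot be passed to a JSON parser as-is. A minimal sketch of unwrapping it first follows. `StripCodeFence` is a hypothetical helper written for this example, and it assumes at most one fenced block in the reply.

```csharp
using System;
using System.Text.Json.Nodes;

// Three backticks, built at runtime to keep this listing easy to embed in Markdown.
string fence = new string('`', 3);

// Hypothetical helper: drops a leading fence line (with or without a language tag)
// and a trailing fence, if present; otherwise returns the trimmed input.
string StripCodeFence(string content)
{
    var text = content.Trim();
    if (!text.StartsWith(fence, StringComparison.Ordinal)) return text;

    int firstNewline = text.IndexOf('\n');
    if (firstNewline < 0) return text; // only a fence marker, nothing to unwrap
    text = text.Substring(firstNewline + 1);

    int closingFence = text.LastIndexOf(fence, StringComparison.Ordinal);
    if (closingFence >= 0) text = text.Substring(0, closingFence);
    return text.Trim();
}

// Stand-in for the messageContent string returned by the service.
string rawContent = fence + "json\n{ \"CUSIP\": \"C7052B4CT\" }\n" + fence;
JsonNode parsed = JsonNode.Parse(StripCodeFence(rawContent))!;
Console.WriteLine(parsed["CUSIP"]!.GetValue<string>());
```

Unfenced replies pass through unchanged, so the helper can be applied to every response before parsing.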
