In [8]:
#r "nuget: Azure.AI.OpenAI"
#r "nuget: Azure.Identity"
#r "nuget: Azure"
#r "nuget: Newtonsoft.Json"
#r "nuget: Microsoft.Azure.Cosmos"

using Azure;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Azure.Cosmos;
using OpenAI.Embeddings;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.Net.Http;
using System.Collections.ObjectModel;

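Create a new container to store the vectors.
First declare a vector embedding policy for the container (vector path, data type, distance function, and number of dimensions).
Then create a vector index on the vector path,
and exclude the vector property from the regular index so it is not also indexed like an ordinary property.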

In [None]:
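// Local Azure Cosmos DB emulator; this is the emulator's well-known default key, not a secret.
var cstring = "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
var client = new CosmosClient(cstring);
// GetDatabase only returns a reference, so create the database if it does not exist yet.
var db = (await client.CreateDatabaseIfNotExistsAsync("StackOverflow")).Database;
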
List<Embedding> embeddings = new List<Embedding>()
{
      new Embedding()
      {
          Path = "/bodyvector",
          DataType = VectorDataType.Float32,
          DistanceFunction = DistanceFunction.Cosine,
          Dimensions = 1536,
      }
};
var collection = new Collection<Embedding>(embeddings);

ContainerProperties props = new ContainerProperties("VectorPosts", "/OwnerUserId"){    
    VectorEmbeddingPolicy = new (collection),
    IndexingPolicy = new IndexingPolicy(){
        VectorIndexes = new Collection<VectorIndexPath>()
        {
            new VectorIndexPath()
            {
                Path = "/bodyvector",
                Type = VectorIndexType.QuantizedFlat
            }
        }
    }
};
props.IndexingPolicy.IncludedPaths.Add(new IncludedPath()
{
    Path = "/*"
});
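// Keep the vector out of the regular index: "/bodyvector/*" covers the array and
// all of its elements, while a "/?" path would match only a scalar value at that path.
props.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath()
{
    Path = "/bodyvector/*"
});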

var postContainer = await db.CreateContainerIfNotExistsAsync(props, throughput: 4000);
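The policy above declares `DistanceFunction.Cosine`, so `VectorDistance` scores documents by cosine similarity: values range from -1 to 1, and values closer to 1 mean more similar. The sketch below spells out that formula in plain C# for intuition only. `CosineSimilarity` is a hypothetical helper, not an SDK method; Cosmos DB computes the score server-side, and with a `QuantizedFlat` index its results may be approximate.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
// Hypothetical helper for illustration; Cosmos DB evaluates VectorDistance itself.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0)
        throw new ArgumentException("Cosine similarity is undefined for a zero vector.");
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 1, 0 })); // 1 (same direction)
Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 0, 1 })); // 0 (orthogonal)
```

Only the direction of the vectors matters, not their length, which is why cosine is a common choice for text embeddings.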

In [9]:
public class Post    
{
    public string id { get; set; }
    public int PostId { get; set; }
    public string PostBody { get; set; }
    public string Title { get; set; }
    public int ViewCount { get; set; }
    public int AnswerCount { get; set; }
    public int CommentCount { get; set; }
    public int FavoriteCount { get; set; }
    public int AcceptedAnswerId { get; set; }
    public DateTime? CreatedOn { get; set; }
    public DateTime? ClosedDate { get; set; }
    public int OwnerUserId { get; set; }
    public string OwnerDisplayName { get; set; }
    public string PostType { get; set; }
    public int Score { get; set; }
    public string Tags { get; set; }
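    public float[] bodyvector { get; set; }   // must match the "/bodyvector" path in the vector policy
    public double? score { get; set; }        // VectorDistance result; filled only by the search query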
}

Declare the client to access Azure OpenAI.
The embedding model is called through its deployment name, "embedding".

In [13]:
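// Read the API key from an environment variable instead of hard-coding it in the notebook
// (AZURE_OPENAI_API_KEY is an assumed name; use whatever your environment defines).
var openAIClient = new AzureOpenAIClient(
    new Uri("https://savranweb.openai.azure.com/"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")));
// "embedding" is the deployment name, not the model name.
var aiclient = openAIClient.GetEmbeddingClient("embedding");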

In [14]:
// Returns the embedding for `text`; its length must match the container's Dimensions (1536).
ReadOnlyMemory<float> GenerateVector(string text)
{
    OpenAIEmbedding newembedding = aiclient.GenerateEmbedding(text);
    return newembedding.ToFloats();
}
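The container's embedding policy fixes `Dimensions = 1536`, so every stored `bodyvector` should have exactly that length; a deployment with a different output size would produce vectors that do not match what the policy declares. A small guard can catch this before anything is upserted. `EnsureDimensions` and its 1536 default are hypothetical, tied to the policy above rather than to any SDK.

```csharp
using System;

// Hypothetical guard: the embedding policy declares 1536 dimensions,
// so reject any vector of a different length before it is written.
float[] EnsureDimensions(float[] vector, int expected = 1536)
{
    if (vector is null)
        throw new ArgumentNullException(nameof(vector));
    if (vector.Length != expected)
        throw new InvalidOperationException(
            $"Embedding has {vector.Length} dimensions; the container policy expects {expected}.");
    return vector;
}

// Usage sketch inside the upload loop:
// post.bodyvector = EnsureDimensions(GenerateVector(post.PostBody).ToArray());
```

Failing fast here surfaces a model or deployment mix-up on the first post instead of after a thousand writes.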

In [None]:
Console.WriteLine(string.Join(",",GenerateVector("This is a test embedding").ToArray()));
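StackOverflow `PostBody` values are HTML (note the `<p>` tag in the search output later in this notebook), and markup adds noise to both the embedding input and the LLM prompt. One optional preprocessing step, sketched here as an assumption rather than part of the original pipeline, is to strip tags and decode entities before embedding. `StripHtml` is hypothetical and regex-based, so it is a rough cleaner, not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical preprocessing step: rough regex-based tag stripping.
// Good enough for embedding text; not a full HTML parser.
string StripHtml(string html)
{
    if (string.IsNullOrEmpty(html)) return string.Empty;
    string noTags = Regex.Replace(html, "<[^>]+>", " ");   // replace each tag with a space
    string decoded = WebUtility.HtmlDecode(noTags);        // &amp; -> &, &lt; -> <
    return Regex.Replace(decoded, @"\s+", " ").Trim();      // collapse runs of whitespace
}

Console.WriteLine(StripHtml("<p>Hello &amp; <b>world</b></p>")); // Hello & world
```

Tags are removed before entities are decoded, so escaped code such as `&lt;b&gt;` survives as literal text instead of being mistaken for markup. In the upload loop this would read `GenerateVector(StripHtml(post.PostBody))`, while the original `PostBody` can still be stored unchanged.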

In [None]:
var json = await new HttpClient().GetStringAsync("https://raw.githubusercontent.com/hsavran/Presentations/refs/heads/main/stackoverflow.json");
var postList = JsonConvert.DeserializeObject<List<Post>>(json);
postList.Count.Display();

// Bulk execution groups operations that are in flight at the same time,
// so start the upserts as tasks and await them together instead of one by one.
client = new CosmosClient(cstring, new CosmosClientOptions() { AllowBulkExecution = true });
Container postContainer = client.GetContainer("StackOverflow", "VectorPosts");
var upserts = new List<Task>();
foreach (var post in postList.Take(1000))
{
    // The embedding call is still synchronous: one request per post.
    post.bodyvector = GenerateVector(post.PostBody).ToArray();
    upserts.Add(postContainer.UpsertItemAsync(post, new PartitionKey(post.OwnerUserId)));
}
await Task.WhenAll(upserts);
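The upload loop makes one embedding call per post, one after another, and Azure OpenAI deployments are rate-limited. To overlap some of that work without flooding the service, one option is to cap the number of in-flight operations with `Parallel.ForEachAsync` (available from .NET 6). In the sketch below, `FakeEmbedAndUpsertAsync` is a hypothetical stand-in that only waits briefly, and the limit of 8 is an arbitrary example, not a recommendation.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var gate = new object();
int inFlight = 0, maxInFlight = 0, processed = 0;

// Hypothetical stand-in for "embed one post, then upsert it"; here it only waits briefly.
async ValueTask FakeEmbedAndUpsertAsync(int item, CancellationToken ct)
{
    lock (gate)
    {
        inFlight++;
        maxInFlight = Math.Max(maxInFlight, inFlight); // highest concurrency seen so far
    }
    await Task.Delay(10, ct);
    lock (gate)
    {
        inFlight--;
        processed++;
    }
}

// At most 8 items run at once; tune this to the deployment's quota.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
await Parallel.ForEachAsync(Enumerable.Range(1, 100), options, FakeEmbedAndUpsertAsync);

Console.WriteLine($"processed {processed} items, at most {maxInFlight} at once");
```

In the notebook, the body would call the async embedding method (`GenerateEmbeddingAsync` in the OpenAI .NET library) and then `postContainer.UpsertItemAsync(...)`; the cap keeps both the embedding service and Cosmos DB within their limits.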

Run a vector search

In [10]:
#r "nuget: Newtonsoft.Json"
#r "nuget: Microsoft.Azure.Cosmos"
using Microsoft.Azure.Cosmos;

In [11]:
var cstring = "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
var client = new CosmosClient(cstring);
var db = client.GetDatabase("StackOverflow");
var vectorContainer = db.GetContainer("VectorPosts");

In [45]:
// Embed the question, then return the 10 questions whose bodies are most similar to it.
// Another question to try: "What are the most common SQL Server problems?"
// With the cosine distance function, ORDER BY VectorDistance(...) returns the most similar items first.
float[] embedding = GenerateVector("Find the common database topics?").ToArray();
var queryDef = new QueryDefinition(
    query: "SELECT TOP 10 c.PostBody, c.Title, VectorDistance(c.bodyvector, @embedding) AS score " +
           "FROM c WHERE c.PostType = 'Question' " +
           "ORDER BY VectorDistance(c.bodyvector, @embedding)"
    ).WithParameter("@embedding", embedding);
FeedIterator<Post> feed = vectorContainer.GetItemQueryIterator<Post>(
    queryDefinition: queryDef
);
string results = "";
while (feed.HasMoreResults)
{
    FeedResponse<Post> response = await feed.ReadNextAsync();
    foreach (Post item in response)
    {
        // Keep the bodies for the LLM step and print each match with its score.
        results = string.Concat(results, item.PostBody, "\n");
        Console.WriteLine($"Found item:\t{item.score}\t{item.Title}");
    }
}
Console.WriteLine(results);

Found item:	0.7777008344870484	Mechanisms for tracking DB schema changes
Found item:	0.7677607888037157	How to export data from SQL Server 2005 to MySQL
Found item:	0.7650628408879497	How big can a MySQL database get before performance starts to degrade
Found item:	0.7635516222422777	Flat file databases
Found item:	0.7620110856707862	Auto Generate Database Diagram MySQL
Found item:	0.7583531478382687	Create a SQLite database based on an XSD Data Set
Found item:	0.7567722506294345	Upgrading SQL Server 6.5
Found item:	0.7556085075533724	Using multiple SQLite databases at once
Found item:	0.7548356251765791	Speed Comparisons - Procedural vs. OO in interpreted languages
Found item:	0.754336419254531	Binary Data in MySQL
<p>What are the best methods for tracking and/or automating DB schema changes?  Our team uses Subversion for version control and we've been able to automate some of our tasks this way (pushing builds up to a staging server, deploying tested code to a production server) but 
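`SELECT TOP 10 ... ORDER BY VectorDistance(...)` returns the ten documents whose vectors are most similar to the query embedding, as the descending scores above show. For intuition, the same ranking can be computed exactly by brute force over in-memory vectors. `TopK`, `Cosine`, and the toy two-dimensional vectors are hypothetical; this is not how Cosmos DB executes the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity (higher = more similar), as with DistanceFunction.Cosine.
double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Exact top-k: score every candidate, sort by score descending, keep the first k.
List<(string Title, double Score)> TopK(float[] query, IEnumerable<(string Title, float[] Vector)> candidates, int k) =>
    candidates.Select(d => (d.Title, Score: Cosine(query, d.Vector)))
              .OrderByDescending(r => r.Score)
              .Take(k)
              .ToList();

// Toy data standing in for stored bodyvector values.
var docs = new List<(string Title, float[] Vector)>
{
    ("Schema changes", new float[] { 0.9f, 0.1f }),
    ("Flat file databases", new float[] { 0.6f, 0.8f }),
    ("Unrelated post", new float[] { -1f, 0f }),
};
foreach (var (title, score) in TopK(new float[] { 1f, 0f }, docs, 2))
    Console.WriteLine($"{score:F3}\t{title}");
```

A brute-force scan touches every vector, so its cost grows linearly with the collection; a vector index exists to avoid that work on large containers.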

Next, call an LLM and ask it about the data: here a Semantic Kernel chat completion summarizes the vector search results.
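The search cell concatenates every returned `PostBody` into `results` and sends it as a single user message. Chat models have a finite context window, so with a larger `TOP` value it can help to cap how much text reaches the model. A minimal sketch, assuming a character budget as a rough stand-in for a token limit (a tokenizer count would be more precise); `BuildContext` and the 8,000-character default are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: join search results into one prompt, stopping before
// the total length would exceed a character budget (a crude proxy for tokens).
string BuildContext(IEnumerable<string> bodies, int maxChars = 8000)
{
    var sb = new StringBuilder();
    foreach (var body in bodies)
    {
        if (string.IsNullOrWhiteSpace(body)) continue;
        // +1 accounts for the newline separator appended after each body.
        if (sb.Length + body.Length + 1 > maxChars) break;
        sb.Append(body).Append('\n');
    }
    return sb.ToString();
}

// Usage sketch: collect item.PostBody values in the search loop, then
// chathistory.AddUserMessage(BuildContext(bodies));
```

Stopping at the first body that does not fit, rather than skipping it, keeps the highest-ranked posts, because the query returns results in similarity order.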

In [2]:
#r "nuget: Microsoft.SemanticKernel"
#r "nuget: Microsoft.SemanticKernel.Connectors.AzureOpenAI"


In [3]:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.ChatCompletion;

In [47]:
var builder = Kernel.CreateBuilder();
// "test4o" is the chat deployment name; the key comes from an environment variable
// (AZURE_OPENAI_API_KEY is an assumed name) instead of being hard-coded.
builder.AddAzureOpenAIChatCompletion("test4o", "https://savranweb.openai.azure.com/", Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY"));
Kernel kernel = builder.Build();
var chatcompservice = kernel.Services.GetRequiredService<IChatCompletionService>();

var chathistory = new ChatHistory();
// FunctionChoiceBehavior.Auto() only matters once plugins are added to the kernel.
var executionsettings = new OpenAIPromptExecutionSettings { FunctionChoiceBehavior = FunctionChoiceBehavior.Auto() };
executionsettings.ChatSystemPrompt = "You are a helpful assistant. The given text is the result of a vector search. Summarize the text.";
// Sample text for trying the prompt on a short passage; this cell summarizes `results` instead.
var texttosummary = "Hasan is an expert in Azure data products and has been recognized as a Microsoft Data Platform MVP. He owns SavranWeb Consulting and holds the Senior Business Intelligence Manager position at Progressive Insurance. His work involves designing advanced business solutions using the latest web and database development technologies. Hasan is a seasoned professional with more than two decades of experience in software as a developer, architect, and manager. He is a global conference speaker and enjoys blogging about SQL Server, Azure Cosmos DB, C#, and front-end development.";

// Send the concatenated post bodies from the vector search as the user message.
chathistory.AddUserMessage(results);
var result = await chatcompservice.GetChatMessageContentAsync(chathistory, executionsettings, kernel);
result.Content.Display();

The text presents a variety of database-related questions and challenges. 

1. **Tracking and Automating DB Schema Changes**: Seeking methods to automate database updates using Subversion version control, possibly integrating auto-update scripts for efficient multi-server management. Questions about implementing solutions using Subversion post-commit hooks or creating a custom solution arise.

2. **Data Transfer Between SQL Server and MySQL**: Discusses difficulties in converting a SQL Server database to MySQL due to CSV formatting issues and a lack of data type information. Seeks tools to facilitate copying tables between databases.

3. **MySQL Database Performance Concerns**: Examines scalability and performance of a large 15M record MySQL database, questioning if cleaning data is necessary and how factors like database size and record count affect performance.

4. **Flat File Database Structures in PHP**: Explores best practices for creating flat file databases in PHP, avoiding SQL-