dluc committed Jul 12, 2023
1 parent 88d67de commit 6e1c134
Showing 79 changed files with 3,980 additions and 0 deletions.
63 changes: 63 additions & 0 deletions services/semantic-memory/README.md
@@ -0,0 +1,63 @@
# Semantic Memory Service

The Semantic Memory service lets you index, store, and query your data using natural language.

The solution is divided into two main areas: **Encoding** and **Retrieval**.

# Encoding

The encoding phase ingests data and indexes it, using Embeddings and LLMs.

Documents are encoded using one or more "data pipelines" where consecutive
"handlers" take the input and process it, turning raw data into memories.

Pipelines can be customized, and they typically consist of:

* **storage**: store a copy of the document (if necessary, copies can be deleted
after processing).
* text **extraction**: extract text from documents, presentations, etc.
* text **partitioning**: chunk the text into small blocks.
* text **indexing**: calculate an embedding for each text block, and store the
embedding with a reference to the original document.
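
The handler chain above can be sketched roughly as follows. This is only an illustration and not the repository's API: `PipelineState`, `BLOB_STORE`, `run_pipeline`, and the handler names are hypothetical, and `toy_embedding` is a letter-frequency placeholder standing in for a real embedding model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for document storage.
BLOB_STORE: Dict[str, bytes] = {}


@dataclass
class PipelineState:
    """Hypothetical state object handed from one handler to the next."""
    document_name: str
    raw_bytes: bytes
    text: str = ""
    partitions: List[str] = field(default_factory=list)
    index: List[dict] = field(default_factory=list)


Handler = Callable[[PipelineState], PipelineState]


def store_copy(state: PipelineState) -> PipelineState:
    # storage: keep a copy of the original document
    BLOB_STORE[state.document_name] = state.raw_bytes
    return state


def extract_text(state: PipelineState) -> PipelineState:
    # extraction: assumes a plain UTF-8 text file; real handlers parse PDF, Word, etc.
    state.text = state.raw_bytes.decode("utf-8")
    return state


def partition_text(state: PipelineState, max_words: int = 5) -> PipelineState:
    # partitioning: split the text into blocks of at most max_words words
    words = state.text.split()
    state.partitions = [
        " ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)
    ]
    return state


def toy_embedding(text: str) -> List[float]:
    # Placeholder only: counts of the letters a-z, not a real embedding.
    vector = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vector[ord(ch) - ord("a")] += 1.0
    return vector


def index_partitions(state: PipelineState) -> PipelineState:
    # indexing: one embedding per block, with a reference to the source document
    state.index = [
        {"document": state.document_name, "partition": i, "embedding": toy_embedding(p)}
        for i, p in enumerate(state.partitions)
    ]
    return state


def run_pipeline(state: PipelineState, handlers: List[Handler]) -> PipelineState:
    # Consecutive handlers each take the output of the previous one.
    for handler in handlers:
        state = handler(state)
    return state
```

A pipeline is then just an ordered list of handlers, e.g. `run_pipeline(PipelineState("note.txt", data), [store_copy, extract_text, partition_text, index_partitions])`, which is what makes it easy to customize.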

## Runtime mode

Encoding can run **in process**, i.e. running all the handlers synchronously,
in real time, as soon as content is loaded or uploaded.
In this case the upload process and the handlers must be written in the same
language, e.g. C#.

Encoding can also run **as a distributed service**, deployed locally or in
the cloud. This mode provides some important benefits:

* **Handlers can be written in different languages**, e.g. extract
data using Python libraries, index using C#, etc. This can be useful when
working with file types that are better supported by libraries available
only in some programming languages, such as Python.
* Content ingestion can be started using a **web service**. The repository
contains a web service ready to use in C# and Python (work in progress).
The web service can also be used by Copilot Chat, to store data and
search for answers.
* Content processing runs **asynchronously** in the background, so several
files can be processed in parallel, with support for retry logic.
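
As a rough illustration of asynchronous background processing with retries, the sketch below uses an in-memory `queue.Queue` and worker threads in a single process. It is an assumption-laden stand-in: the real distributed mode would rely on an external message queue, and `process_with_retry`, `worker`, and `run_background` are hypothetical names, not part of this repository.

```python
import queue
import threading


def process_with_retry(job, handler, max_attempts=3):
    # Call the handler again after a failure, up to max_attempts times in total.
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(job)
        except Exception:
            if attempt == max_attempts:
                raise


def worker(jobs, results, handler):
    # Pull jobs until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        try:
            results[job] = process_with_retry(job, handler)
        except Exception as exc:
            # Record the failure instead of crashing the worker.
            results[job] = f"failed: {exc}"


def run_background(files, handler, worker_count=2):
    jobs = queue.Queue()
    results = {}  # each job writes its own key, so no extra locking is used here
    threads = [
        threading.Thread(target=worker, args=(jobs, results, handler))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for name in files:
        jobs.put(name)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

The uploader only enqueues work and returns; the workers drain the queue in parallel, and a handler that fails transiently is retried before the job is marked as failed.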

# Retrieval

Memories can be retrieved using natural language queries. The service
also supports RAG (retrieval-augmented generation), generating answers
using prompts, relevant memories, and plugins.
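
A minimal sketch of the retrieval idea, assuming memories are stored as text with a precomputed embedding vector: rank memories by cosine similarity to the query embedding, then place the best matches into a prompt. The function names, the memory dictionary layout, and the prompt wording are all hypothetical; the README does not specify how the service ranks memories or builds prompts.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding, memories, top_k=3):
    # Score every memory against the query and keep the top_k most similar.
    scored = [
        (cosine_similarity(query_embedding, memory["embedding"]), memory)
        for memory in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:top_k]]


def build_prompt(question, relevant_memories):
    # RAG step: ground the question in the retrieved memories.
    facts = "\n".join(f"- {memory['text']}" for memory in relevant_memories)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"
```

In a real deployment the query would be embedded with the same model used at encoding time, and the resulting prompt would be sent to an LLM to generate the answer.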

Similar to the encoding process, retrieval is available as a library and
as a web service.

# Repository structure

* handlers: set of reusable handlers for typical data pipelines. You can use
these in production, or use them as a starting point for your custom business
logic.
* lib-dotnet: reusable libraries for C# web services and handlers.
* lib-python: reusable libraries for Python web services and handlers.
* samples: samples showing how to upload files, how to use encoder/retrieval, etc.
* tools: command line tools, e.g. scripts to start RabbitMQ locally.
* webservice-dotnet: C# web service to upload documents and search memories.
* webservice-python: Python web service to upload documents and search memories.
112 changes: 112 additions & 0 deletions services/semantic-memory/SemanticMemory.sln
@@ -0,0 +1,112 @@

Microsoft Visual Studio Solution File, Format Version 12.00

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "handlers", "handlers", "{CD8D1906-11F0-4D16-955F-5D1C7D7EB128}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "TextExtractionHandler", "handlers\extract-text-dotnet\TextExtractionHandler.csproj", "{9A23325F-FCF5-4BF7-9413-D325A779AE42}"
EndProject

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "lib", "lib", "{8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Configuration", "lib-dotnet\Configuration\Configuration.csproj", "{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "DataFormats.Office", "lib-dotnet\DataFormats.Office\DataFormats.Office.csproj", "{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "DataFormats.Pdf", "lib-dotnet\DataFormats.Pdf\DataFormats.Pdf.csproj", "{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Diagnostics", "lib-dotnet\Diagnostics\Diagnostics.csproj", "{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Storage", "lib-dotnet\Storage\Storage.csproj", "{E7906A61-09F5-44D4-ACDC-1486E72B4775}"
EndProject

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "samples", "samples", "{6335B02C-9964-4B39-9795-C5F5F0392515}"
ProjectSection(SolutionItems) = preProject
samples\upload-one-file.sh = samples\upload-one-file.sh
EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "UploadSomeFiles", "samples\UploadSomeFiles\UploadSomeFiles.csproj", "{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "InProcessFileImportSample", "samples\InProcessFileImportSample\InProcessFileImportSample.csproj", "{B166120E-6040-4B7D-8199-18AE55762F59}"
EndProject

Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "WebService", "webservice-dotnet\WebService.csproj", "{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}"
EndProject

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "tools", "tools", "{E097BACD-C329-4FF6-A4AC-260F7BA143FF}"
ProjectSection(SolutionItems) = preProject
tools\run-rabbitmq.sh = tools\run-rabbitmq.sh
EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "root", "root", "{6EF76FD8-4C35-4370-8539-5DDF45357A50}"
ProjectSection(SolutionItems) = preProject
README.md = README.md
EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "WordDocExtract", "samples\WordDocExtract\WordDocExtract.csproj", "{A7078032-3928-476D-B3B9-E35A597D31E7}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "PdfDocExtract", "samples\PdfDocExtract\PdfDocExtract.csproj", "{BF98AFF3-8997-400B-A1DB-E7146A118DB6}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{E7906A61-09F5-44D4-ACDC-1486E72B4775} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
{60130AA3-B980-4DF6-8BFF-6F0CF09AD629} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
{ACD9F6E9-0B7E-4B69-9468-28C42657AB10} = {6335B02C-9964-4B39-9795-C5F5F0392515}
{B166120E-6040-4B7D-8199-18AE55762F59} = {6335B02C-9964-4B39-9795-C5F5F0392515}
{9A23325F-FCF5-4BF7-9413-D325A779AE42} = {CD8D1906-11F0-4D16-955F-5D1C7D7EB128}
{174E9AA0-E379-4B5C-8125-54CD4EE57ADF} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
{E383AB2F-CFCD-4EAF-A612-85E7FC85667F} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
{A7078032-3928-476D-B3B9-E35A597D31E7} = {6335B02C-9964-4B39-9795-C5F5F0392515}
{BF98AFF3-8997-400B-A1DB-E7146A118DB6} = {6335B02C-9964-4B39-9795-C5F5F0392515}
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Debug|Any CPU.Build.0 = Debug|Any CPU
{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Release|Any CPU.ActiveCfg = Release|Any CPU
{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Release|Any CPU.Build.0 = Release|Any CPU
{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Debug|Any CPU.Build.0 = Debug|Any CPU
{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Release|Any CPU.ActiveCfg = Release|Any CPU
{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Release|Any CPU.Build.0 = Release|Any CPU
{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Debug|Any CPU.Build.0 = Debug|Any CPU
{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Release|Any CPU.ActiveCfg = Release|Any CPU
{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Release|Any CPU.Build.0 = Release|Any CPU
{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Debug|Any CPU.Build.0 = Debug|Any CPU
{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Release|Any CPU.ActiveCfg = Release|Any CPU
{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Release|Any CPU.Build.0 = Release|Any CPU
{B166120E-6040-4B7D-8199-18AE55762F59}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{B166120E-6040-4B7D-8199-18AE55762F59}.Debug|Any CPU.Build.0 = Debug|Any CPU
{B166120E-6040-4B7D-8199-18AE55762F59}.Release|Any CPU.ActiveCfg = Release|Any CPU
{B166120E-6040-4B7D-8199-18AE55762F59}.Release|Any CPU.Build.0 = Release|Any CPU
{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Debug|Any CPU.Build.0 = Debug|Any CPU
{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Release|Any CPU.ActiveCfg = Release|Any CPU
{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Release|Any CPU.Build.0 = Release|Any CPU
{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Debug|Any CPU.Build.0 = Debug|Any CPU
{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Release|Any CPU.ActiveCfg = Release|Any CPU
{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Release|Any CPU.Build.0 = Release|Any CPU
{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Release|Any CPU.Build.0 = Release|Any CPU
{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Debug|Any CPU.Build.0 = Debug|Any CPU
{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Release|Any CPU.ActiveCfg = Release|Any CPU
{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Release|Any CPU.Build.0 = Release|Any CPU
{A7078032-3928-476D-B3B9-E35A597D31E7}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A7078032-3928-476D-B3B9-E35A597D31E7}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A7078032-3928-476D-B3B9-E35A597D31E7}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A7078032-3928-476D-B3B9-E35A597D31E7}.Release|Any CPU.Build.0 = Release|Any CPU
{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Debug|Any CPU.Build.0 = Debug|Any CPU
{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Release|Any CPU.ActiveCfg = Release|Any CPU
{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
EndGlobal