Showing 79 changed files with 3,980 additions and 0 deletions.
# Semantic Memory Service

The Semantic Memory service lets you index, store, and query your data using natural language.

The solution is divided into two main areas: **Encoding** and **Retrieval**.

# Encoding

The encoding phase ingests data and indexes it, using embeddings and LLMs.

Documents are encoded by one or more "data pipelines", in which consecutive
"handlers" take the input and process it, turning raw data into memories.
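A minimal sketch of the handler chain described above, assuming each handler is a plain Python function that receives the previous handler's output. `Pipeline`, `add_handler`, and `run` are hypothetical names chosen for illustration, not the repository's API, and the two lambdas are toy stand-ins for real extraction and partitioning handlers.

```python
from typing import Any, Callable, List

# Hypothetical handler shape: receives the previous step's output, returns a new value.
Handler = Callable[[Any], Any]


class Pipeline:
    """Runs handlers in order, feeding each one the output of the previous handler."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def add_handler(self, handler: Handler) -> "Pipeline":
        self._handlers.append(handler)
        return self  # return self so calls can be chained

    def run(self, data: Any) -> Any:
        # Synchronous execution: each handler runs as soon as the previous one returns.
        for handler in self._handlers:
            data = handler(data)
        return data


# Two toy handlers standing in for text extraction and partitioning.
pipeline = (
    Pipeline()
    .add_handler(lambda raw: raw.decode("utf-8"))  # "extract" text from raw bytes
    .add_handler(lambda text: text.split("\n\n"))  # "partition" on blank lines
)
result = pipeline.run(b"First paragraph.\n\nSecond paragraph.")
print(result)  # ['First paragraph.', 'Second paragraph.']
```

Running every handler inside one call, one after another, corresponds to the **in process** runtime mode; a distributed deployment would instead pass each step's output through a queue.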
Pipelines can be customized, and they typically consist of:

* **storage**: store a copy of the document (if necessary, copies can be deleted
  after processing).
* text **extraction**: extract text from documents, presentations, etc.
* text **partitioning**: chunk the text into small blocks.
* text **indexing**: calculate an embedding for each text block and store the
  embedding with a reference to the original document.
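The partitioning and indexing steps can be sketched as follows. This is a hypothetical illustration, not the project's implementation: chunking splits on word counts (real handlers may split by tokens or sentences), `embed` is a toy hashed bag-of-words vector standing in for a real embedding model, and a plain dictionary stands in for the memory store. Each stored record keeps `document_id` as the reference back to the original document.

```python
import math
import zlib
from dataclasses import dataclass
from typing import Dict, List


def partition(text: str, max_words: int = 50) -> List[str]:
    """Chunk text into blocks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str, dims: int = 256) -> List[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized.

    zlib.crc32 is used instead of hash() because str hashing is randomized per process.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class MemoryRecord:
    document_id: str  # reference back to the original document
    chunk_index: int
    text: str
    embedding: List[float]


def index_document(document_id: str, text: str, store: Dict[str, MemoryRecord],
                   max_words: int = 50) -> int:
    """Partition a document, embed each chunk, and store it; returns the chunk count."""
    chunks = partition(text, max_words)
    for i, chunk in enumerate(chunks):
        store[f"{document_id}/{i}"] = MemoryRecord(document_id, i, chunk, embed(chunk))
    return len(chunks)


store: Dict[str, MemoryRecord] = {}
count = index_document("doc-1", "word " * 120, store, max_words=50)
print(count)  # 3
```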
## Runtime mode

Encoding can run **in process**, for example running all the handlers
synchronously, in real time, as soon as content is loaded or uploaded.
In this case the upload process and the handlers must be written in the same
language, e.g. C#.

Encoding can also run **as a distributed service**, deployed locally or in
the cloud. This mode provides some important benefits:

* **Handlers can be written in different languages**, e.g. extract
  data using Python libraries, index using C#, etc. This is useful when
  working with file types that are better supported by libraries available
  only in certain programming languages, such as Python.
* Content ingestion can be started using a **web service**. The repository
  contains a ready-to-use web service in C# and Python (work in progress).
  Copilot Chat can also use the web service to store data and
  search for answers.
* Content processing runs **asynchronously** in the background, allowing
  several files to be processed in parallel, with support for retry logic.
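The asynchronous, retrying behavior of the distributed mode can be sketched with `asyncio`. This is an illustration under explicit assumptions: an in-memory `asyncio.Queue` stands in for a real message broker (the repository's tools mention RabbitMQ, but no broker API is used here), several workers process files in parallel, and `process_all`, `with_retry`, and `fake_handler` are hypothetical names, not the service's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

FileHandler = Callable[[str], Awaitable[None]]


async def with_retry(task: Callable[[], Awaitable[None]], max_attempts: int = 3,
                     base_delay: float = 0.01) -> bool:
    """Await task(), retrying with exponential backoff; False once retries run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            await task()
            return True
        except Exception:
            if attempt == max_attempts:
                return False
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return False


async def worker(queue: "asyncio.Queue[str]", handle: FileHandler,
                 results: Dict[str, bool]) -> None:
    """Background worker: pull file names from the queue until cancelled."""
    while True:
        file_name = await queue.get()
        try:
            results[file_name] = await with_retry(lambda: handle(file_name))
        finally:
            queue.task_done()


async def process_all(files: List[str], handle: FileHandler, workers: int = 3) -> Dict[str, bool]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    for name in files:
        queue.put_nowait(name)
    results: Dict[str, bool] = {}
    tasks = [asyncio.create_task(worker(queue, handle, results)) for _ in range(workers)]
    await queue.join()  # wait until every queued file has been handled
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


# Hypothetical handler: "flaky.pdf" fails once, "broken.pdf" always fails.
attempts: Dict[str, int] = {}


async def fake_handler(file_name: str) -> None:
    attempts[file_name] = attempts.get(file_name, 0) + 1
    await asyncio.sleep(0)
    if file_name == "flaky.pdf" and attempts[file_name] < 2:
        raise IOError("transient failure")
    if file_name == "broken.pdf":
        raise ValueError("unsupported format")


results = asyncio.run(process_all(["a.docx", "flaky.pdf", "broken.pdf"], fake_handler))
print(results)
```

With a real broker, a failed message would typically be requeued rather than retried inside the worker; the in-process loop here only keeps the sketch self-contained.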
# Retrieval

Memories can be retrieved using natural language queries. The service
also supports RAG (retrieval-augmented generation), generating answers
from prompts, relevant memories, and plugins.

As with encoding, retrieval is available both as a library and
as a web service.
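Retrieval can be sketched as: embed the query, rank stored memories by cosine similarity, and, for RAG, place the top matches into a prompt for an LLM. This minimal sketch rests on stated assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, the prompt template in `build_rag_prompt` is invented for illustration, the sample memories are toy data, and no LLM or plugin is called.

```python
import math
import zlib
from typing import List, Tuple


def embed(text: str, dims: int = 256) -> List[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    # Both vectors are L2-normalized, so their dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def search(query: str, memories: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k memories ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_rag_prompt(question: str, hits: List[Tuple[float, str]]) -> str:
    """Hypothetical prompt template: relevant memories first, then the question."""
    facts = "\n".join(f"- {text}" for _, text in hits)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


memories = [
    "paris is the capital of france",
    "the moon orbits the earth",
    "python is a programming language",
]
question = "what is the capital of france"
hits = search(question, memories)
prompt = build_rag_prompt(question, hits)
print(prompt)  # in a RAG flow this prompt would then be sent to an LLM (not shown)
```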
# Repository structure

* handlers: set of reusable handlers for typical data pipelines. You can use
  these in production, or as a starting point for your custom business
  logic.
* lib-dotnet: reusable libraries for C# web services and handlers.
* lib-python: reusable libraries for Python web services and handlers.
* samples: samples showing how to upload files, how to use the encoder and retrieval, etc.
* tools: command line tools, e.g. scripts to start RabbitMQ locally.
* webservice-dotnet: C# web service to upload documents and search memories.
* webservice-python: Python web service to upload documents and search memories.
Microsoft Visual Studio Solution File, Format Version 12.00

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "handlers", "handlers", "{CD8D1906-11F0-4D16-955F-5D1C7D7EB128}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "TextExtractionHandler", "handlers\extract-text-dotnet\TextExtractionHandler.csproj", "{9A23325F-FCF5-4BF7-9413-D325A779AE42}"
EndProject

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "lib", "lib", "{8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Configuration", "lib-dotnet\Configuration\Configuration.csproj", "{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "DataFormats.Office", "lib-dotnet\DataFormats.Office\DataFormats.Office.csproj", "{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "DataFormats.Pdf", "lib-dotnet\DataFormats.Pdf\DataFormats.Pdf.csproj", "{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Diagnostics", "lib-dotnet\Diagnostics\Diagnostics.csproj", "{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Storage", "lib-dotnet\Storage\Storage.csproj", "{E7906A61-09F5-44D4-ACDC-1486E72B4775}"
EndProject

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "samples", "samples", "{6335B02C-9964-4B39-9795-C5F5F0392515}"
	ProjectSection(SolutionItems) = preProject
		samples\upload-one-file.sh = samples\upload-one-file.sh
	EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "UploadSomeFiles", "samples\UploadSomeFiles\UploadSomeFiles.csproj", "{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "InProcessFileImportSample", "samples\InProcessFileImportSample\InProcessFileImportSample.csproj", "{B166120E-6040-4B7D-8199-18AE55762F59}"
EndProject

Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "WebService", "webservice-dotnet\WebService.csproj", "{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}"
EndProject

Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "tools", "tools", "{E097BACD-C329-4FF6-A4AC-260F7BA143FF}"
	ProjectSection(SolutionItems) = preProject
		tools\run-rabbitmq.sh = tools\run-rabbitmq.sh
	EndProjectSection
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "root", "root", "{6EF76FD8-4C35-4370-8539-5DDF45357A50}"
	ProjectSection(SolutionItems) = preProject
		README.md = README.md
	EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "WordDocExtract", "samples\WordDocExtract\WordDocExtract.csproj", "{A7078032-3928-476D-B3B9-E35A597D31E7}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "PdfDocExtract", "samples\PdfDocExtract\PdfDocExtract.csproj", "{BF98AFF3-8997-400B-A1DB-E7146A118DB6}"
EndProject
Global
	GlobalSection(SolutionConfigurationPlatforms) = preSolution
		Debug|Any CPU = Debug|Any CPU
		Release|Any CPU = Release|Any CPU
	EndGlobalSection
	GlobalSection(NestedProjects) = preSolution
		{E7906A61-09F5-44D4-ACDC-1486E72B4775} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
		{60130AA3-B980-4DF6-8BFF-6F0CF09AD629} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
		{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
		{ACD9F6E9-0B7E-4B69-9468-28C42657AB10} = {6335B02C-9964-4B39-9795-C5F5F0392515}
		{B166120E-6040-4B7D-8199-18AE55762F59} = {6335B02C-9964-4B39-9795-C5F5F0392515}
		{9A23325F-FCF5-4BF7-9413-D325A779AE42} = {CD8D1906-11F0-4D16-955F-5D1C7D7EB128}
		{174E9AA0-E379-4B5C-8125-54CD4EE57ADF} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
		{E383AB2F-CFCD-4EAF-A612-85E7FC85667F} = {8E36E21B-00CC-4B85-98FB-6CB2CA5F978C}
		{A7078032-3928-476D-B3B9-E35A597D31E7} = {6335B02C-9964-4B39-9795-C5F5F0392515}
		{BF98AFF3-8997-400B-A1DB-E7146A118DB6} = {6335B02C-9964-4B39-9795-C5F5F0392515}
	EndGlobalSection
	GlobalSection(ProjectConfigurationPlatforms) = postSolution
		{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{E7906A61-09F5-44D4-ACDC-1486E72B4775}.Release|Any CPU.Build.0 = Release|Any CPU
		{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{60130AA3-B980-4DF6-8BFF-6F0CF09AD629}.Release|Any CPU.Build.0 = Release|Any CPU
		{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{DB5CA047-1184-4878-B1FC-0C17C9FAE7AA}.Release|Any CPU.Build.0 = Release|Any CPU
		{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{ACD9F6E9-0B7E-4B69-9468-28C42657AB10}.Release|Any CPU.Build.0 = Release|Any CPU
		{B166120E-6040-4B7D-8199-18AE55762F59}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{B166120E-6040-4B7D-8199-18AE55762F59}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{B166120E-6040-4B7D-8199-18AE55762F59}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{B166120E-6040-4B7D-8199-18AE55762F59}.Release|Any CPU.Build.0 = Release|Any CPU
		{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{9A23325F-FCF5-4BF7-9413-D325A779AE42}.Release|Any CPU.Build.0 = Release|Any CPU
		{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{174E9AA0-E379-4B5C-8125-54CD4EE57ADF}.Release|Any CPU.Build.0 = Release|Any CPU
		{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{E383AB2F-CFCD-4EAF-A612-85E7FC85667F}.Release|Any CPU.Build.0 = Release|Any CPU
		{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{57663B9D-08FD-49F8-B1D3-20F2CA28A39C}.Release|Any CPU.Build.0 = Release|Any CPU
		{A7078032-3928-476D-B3B9-E35A597D31E7}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{A7078032-3928-476D-B3B9-E35A597D31E7}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{A7078032-3928-476D-B3B9-E35A597D31E7}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{A7078032-3928-476D-B3B9-E35A597D31E7}.Release|Any CPU.Build.0 = Release|Any CPU
		{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Debug|Any CPU.Build.0 = Debug|Any CPU
		{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Release|Any CPU.ActiveCfg = Release|Any CPU
		{BF98AFF3-8997-400B-A1DB-E7146A118DB6}.Release|Any CPU.Build.0 = Release|Any CPU
	EndGlobalSection
EndGlobal