Export/Download document support #415
@coryisakson could you merge the latest changes from main and check the build? I can't build the code; there are a bunch of warnings and errors. Thanks
@dluc the branch is updated and unit tests are passing.
I tried fixing the conflicts and reviewing, but the PR is too big. It would really help to exclude changes that are unrelated to the new feature, e.g. spacing and string changes (like the mime type in qdrant). For example, rather than 52 files, maybe bring the PR down to 30 files or so. Given the large number of changes to interfaces and new classes, it's going to take some time. See also the other comment: I would split out the changes to IContentStorage, so we can review the "write" changes first and make this PR easier to manage.
A quick thought about the changes to IContentStorage: we can reuse the current mime detection to determine the mime type, without needing to store it. I understand that storing it would be ideal, but we can do that separately and later. This should make the PR much smaller. Thoughts?
I rebuilt the branch, fixing a merge that had gone wrong and making a few minor changes to namespaces and names. I see the approach taken introduces a new dependency responsible for checking access and downloading files, which I'm not sure about in terms of design. I'll try playing with some changes, reorganizing these responsibilities and how memory/storage/orchestrator work together to provide the same functionality. My preference would be to leverage the orchestrator rather than nesting content access into the validation service (which should just validate). I haven't looked at the download part yet, e.g. the IContent interface, which might actually be more important given it affects the primary API.
7b35d5a to 4b18425
namespace Microsoft.KernelMemory;

public sealed class StreamableFileContent : IDisposable
Wondering if we can reuse the .NET FileInfo class and delete this one.
## Motivation and Context (Why the change? What's the scenario?)

Supporting abstractions for new File Download feature. See also PR #415

## High level description (Approach, Design)

* New version 0.40
* Breaking changes on storage interface
* New methods on orchestration interface
* New methods on memory interface
Motivation and Context (Why the change? What's the scenario?)
Validating AI answers requires access to the source grounding documents and data. The KM solution enables easy ingestion of grounding documents as well as the ability to remove documents. A file download feature gives consumers access to the grounding source materials and allows them to verify the answers presented by the ASK endpoint.
High level description (Approach, Design)
Since KM is a backend service not meant for multi-user direct access (i.e. the KM security model is based on a single key, like a SQL server or any DB), the endpoint provides direct access to all files, similarly to how search allows access to all memory records. For public deployments, a middleware web service should take care of securing links, e.g. adding and validating signatures and user tokens.
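
As a rough illustration of the "adding and validating signatures" idea, here is a minimal sketch of how a middleware web service in front of KM might sign and check download links with an HMAC. Nothing here is part of Kernel Memory: the `SignedLink` class, its method names, and the choice of document ID, file name, and expiry as the signed fields are all assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not a Kernel Memory API): signs and validates download links.
public static class SignedLink
{
    // Payload covered by the signature. Assumes the ID and file name contain no newlines.
    private static byte[] Payload(string documentId, string fileName, long expires)
        => Encoding.UTF8.GetBytes($"{documentId}\n{fileName}\n{expires}");

    // Returns a hex-encoded HMAC-SHA256 signature; 'expires' is a Unix timestamp in seconds.
    public static string Sign(byte[] key, string documentId, string fileName, long expires)
    {
        using var hmac = new HMACSHA256(key);
        return Convert.ToHexString(hmac.ComputeHash(Payload(documentId, fileName, expires)));
    }

    // Rejects expired links, malformed signatures, and signatures that do not match.
    public static bool Validate(byte[] key, string documentId, string fileName,
                                long expires, string signature, long now)
    {
        if (now > expires) { return false; }

        byte[] actual;
        try { actual = Convert.FromHexString(signature); }
        catch (FormatException) { return false; }

        using var hmac = new HMACSHA256(key);
        byte[] expected = hmac.ComputeHash(Payload(documentId, fileName, expires));

        // Constant-time comparison to avoid leaking how many bytes matched.
        return CryptographicOperations.FixedTimeEquals(expected, actual);
    }
}
```

In this sketch the middleware would call `Sign` when handing a link to an authenticated user, append the expiry and signature to the URL, and call `Validate` before forwarding the download request to KM with the service key. Binding the signature to a user token, as the description also suggests, would mean adding that token to the signed payload.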