Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to create completions using full notebook content #777

Merged

Conversation

krassowski
Copy link
Member

Closes #708

@krassowski krassowski added the documentation Improvements or additions to documentation label May 8, 2024
@krassowski krassowski marked this pull request as draft May 8, 2024 10:26
prefix = request.prefix
suffix = request.suffix.strip()

server_ydoc = self.settings.get("jupyter_server_ydoc", None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the provider does not have access to settings. We would need to expose the jupyter_server_ydoc either:

  • on provider instances as an optional attribute,
  • as an argument to generate_inline_completions and stream_inline_completions

if this is a use-case we want to support. Any thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, jupyter-ai could take the responsibility of extracting the document (YNotebook) and pass it to generate_inline_completions/stream_inline_completions. This way the providers would be restricted to accessing collaborative model of the document they were invoked on (rather than of any document user has open).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts @dlqqq?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking back on this one :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both solutions are fine and either would be accepted, that would be helpful to know too - more than radio silence. I just do not want to work on a PR to learn that the maintainers would not accept it anyways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts as I started an implementation (see the new commits below):

  • should we call this server_settings to avoid confusion?
  • should it be a class attribute? It seems easier from implementation perspective because then we can just set it once on BaseProvider level and then we do not need to update it ever; otherwise we have 4 places where providers get initialized so it would seem very repetitive
  • using MappingProxyType over Dict would narrow down the public API
  • ideally though I think we may want to filter which settings get exposed; this is not trivial - we cannot filter it at initialization time because extensions may load in different order so jupyter_server_ydoc may not yet be in the dictionary; instead it would need to be filtered on access (e.g. in a custom __getitem__(self, key: str) -> Any

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krassowski Hey Mike, I just got back from vacation. Let me take a look at this either this afternoon or tomorrow morning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you had a good vacation!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krassowski Got sick yesterday but feeling better.

  • I agree with using the server_settings attribute name.
  • A class attribute is fine since this attribute should only be set by jupyter_ai once per Python process; I agree that using an instance attribute can be tedious because of how many places BaseProvider is being instantiated in.

using MappingProxyType over Dict would narrow down the public API

Does this just serve to hint that this dictionary is read-only? I'm not familiar with when this type should be used.

ideally though I think we may want to filter which settings get exposed; this is not trivial

I think it makes more sense to include the entire server_settings dictionary. I don't see an obvious benefit to only allow-listing some set of keys to BaseProvider. Is this motivated by a data privacy/security concern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this just serve to hint that this dictionary is read-only?

MappingProxyType is in fact read-only. Assigning to it would raise:

TypeError: 'mappingproxy' object does not support item assignment

Is this motivated by a data privacy/security concern?

More that once we add things to API then we should not remove them without making a major release.

@krassowski krassowski force-pushed the document-full-notebook-extraction branch from 09eb901 to ecb6616 Compare May 26, 2024 17:02
@krassowski krassowski marked this pull request as ready for review May 26, 2024 17:14
@dlqqq dlqqq force-pushed the document-full-notebook-extraction branch from ecb6616 to 3ce9931 Compare June 5, 2024 16:46
@@ -265,6 +266,13 @@ class Config:
provider is selected.
"""

server_settings: ClassVar[Optional[MappingProxyType[str, Any]]] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if there's a way to raise an exception when this class attr more than once? This is not a PR blocker, but it would be a good sanity check to include for developers that are accidentally assigning to this class attr again in their own server extension.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this would be nice to have. It would be trivial for an instance attribute with a property descriptor. I think it is feasible for a class instance too by using a metaclass. Taking a look

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge this PR and open a separate follow-up PR to improve this.

Copy link
Collaborator

@dlqqq dlqqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thank you! Left a minor (non-blocking) comment above. 👍

I didn't test the code sample provided in the docs, so please be sure to test that before merging.

@dlqqq dlqqq force-pushed the document-full-notebook-extraction branch from 3ce9931 to ae23db8 Compare June 5, 2024 22:06
Depending on the order in which extensions are initialized
`jupyter_server_ydoc` may or may not be there; instead
we may want to define a custom proxy which would restrict access
on access to `__getitem__()`
@dlqqq dlqqq force-pushed the document-full-notebook-extraction branch from ae23db8 to aa3436f Compare June 6, 2024 15:19
@krassowski krassowski merged commit 6f32acc into jupyterlab:main Jun 10, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants