Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: Document MaD format #16789

Merged
merged 13 commits into from
Jun 25, 2024
Merged

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Jun 19, 2024

Adds documentation for the Python Models as Data format. The structure is heavily based on the JavaScript documentation. Both because that document is quite nice and because it will be easier for users to navigate if the structure is similar.

In addition, this PR

  • adds a few tests reflecting the documentation
  • make the mentioned sink-kinds have an effect on relevant queries

- add a few tests reflecting the documentation
- make the mentioned sink-kinds have an effect on relevant queries
@yoff yoff requested a review from a team as a code owner June 19, 2024 15:01
@sidshank sidshank added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Jun 19, 2024
subatoi
subatoi previously approved these changes Jun 20, 2024
Copy link
Contributor

@subatoi subatoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good docs-wise, thank you! Only some really minor comments

Co-authored-by: Ben Ahmady <32935794+subatoi@users.noreply.github.com>
@yoff yoff requested a review from subatoi June 20, 2024 08:58
subatoi
subatoi previously approved these changes Jun 20, 2024
Copy link
Contributor

@subatoi subatoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good stuff! I have made a bunch of comments/suggestions, but overall I think this is really solid. 👍

data:
- ["invoke.Context", "invoke", "Member[context].Member[Context].Instance"]

- The first column, **"invoke.Context"**, is the name of the type to reach.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The first column, **"invoke.Context"**, is the name of the type to reach.
- The first column, **"invoke.Context"**, is the name of the type that we're defining.

I found "type to reach" to be very vague and confusing. Perhaps this is a better formulation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel that we are defining this type, though. In cases where we make up a name, say to model some internal concept, I would say we are defining a type. But here, we are simply describing how one can obtain (an instance of) the type. Given that this can be used in various contexts, I actually like that it is a bit vague.

data:
- [
"django.db.models.FileField!",
"Call.Argument[upload_to:].Parameter[1,filenam:]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Call.Argument[upload_to:].Parameter[1,filenam:]",
"Call.Argument[upload_to:].Parameter[1,filename:]",

How do we know that the argument is named filename, though? This depends on how the user defined the helper function, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that convention is documented (here, link is also included in our docs). Given the way the documentation is written, I would expect many people to call the parameter 'filename' and I think a good model should recognise that parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also slightly confused about this part. I think it makes more sense to only have Parameter[1] -- no matter the naming convention, that should still cover everything we care to model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no matter the naming convention, that should still cover everything we care to model.

That may be (i.e. if people define the call-back with name-only parameters, we do not care about that case). Still, I think the example is more illustrative and usable as a reference the way it is now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be (i.e. if people define the call-back with name-only parameters, we do not care about that case).

if they define it with keyword only parameters, it wouldn't work, since the arguments are passed positionally (src from django)

Still, I think the example is more illustrative and usable as a reference the way it is now.

I agree it's pretty cool if we had an example where we could show off Parameter[<int>,<keyword>:]. However, I don't find this to be a convincing example since the filename: part doesn't add any value. I fear that people reading this will get confused and think they ALWAYS NEED to add the name of the parameter, but maybe I'm reading too much into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they define it with keyword only parameters, it wouldn't work, since the arguments are passed positionally (src from django)

Hm, in that case it should be removed, thanks for digging that up.



- Since we're adding flow through a function call, we add a tuple to the **summaryModel** extensible predicate.
- The first column, **"builtins"**, begins the search for relevant calls at references to the **builtin** package.
Copy link
Contributor

@tausbn tausbn Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... And also otherwise undefined globals with specific names. Perhaps it would be better to not mention the builtins package explicitly, and just say it finds references to built-in names? (Which includes builtins.whatever.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in the end it became somewhat different...see if you like.

yoff and others added 4 commits June 24, 2024 10:29
Co-authored-by: Taus <tausbn@github.com>
- fix wording on `builtins`
- add named argument/parameter access path components
@yoff yoff requested a review from tausbn June 24, 2024 08:59
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM 👍

Minor detail around code snippets. I've made one suggestion, but you will need to apply it throughout the document

Comment on lines 190 to 191
In this example, we'll show how to add flow through calls to `re.compile`.
`re.compile` returns a compiled regular expression for efficient evaluation, but the pattern to be compiled is stored in the `pattern` attribute of the resulting object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reading through the rendered version, it became clear to me that reStructured text requires double backticks for code snippets (docs).

Suggested change
In this example, we'll show how to add flow through calls to `re.compile`.
`re.compile` returns a compiled regular expression for efficient evaluation, but the pattern to be compiled is stored in the `pattern` attribute of the resulting object.
In this example, we'll show how to add flow through calls to ``re.compile``.
``re.compile`` returns a compiled regular expression for efficient evaluation, but the pattern to be compiled is stored in the ``pattern`` attribute of the resulting object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it should be replaced everywhere now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite what I asked for.

If you look at the rendered version (before, after) the change looks like:

Before

image

After

image

However, both of them still only use single backticks, meaning the code snippets are not rendered properly.

image


The previous version looked better to me, so I would recommend reverting cd7031a and 9d62c21 (the double-qoutes in the examples made them easier to read for me)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite what I asked for.

Indeed, I realised once I started looking at the generated artifact 🤦 I should have commented then, so you did not have to pull it out. Thanks for doing that, though. I will get the double back-tics in, and revert to single quotes in headers to align with the other documentation pages (e.g. the one for javascript)). About the double quotes, I removed them on request, but that was mainly a request for consistency, so perhaps they should be in in a consistent way...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will get the double back-tics in, and revert to single quotes in headers

👍

About the double quotes, I removed them on request, but that was mainly a request for consistency, so perhaps they should be in in a consistent way...

Ah, I see they are present in the JS docs as well, so 🤷

@yoff yoff requested a review from RasmusWL June 24, 2024 10:01
Comment on lines 190 to 191
In this example, we'll show how to add flow through calls to `re.compile`.
`re.compile` returns a compiled regular expression for efficient evaluation, but the pattern to be compiled is stored in the `pattern` attribute of the resulting object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite what I asked for.

If you look at the rendered version (before, after) the change looks like:

Before

image

After

image

However, both of them still only use single backticks, meaning the code snippets are not rendered properly.

image


The previous version looked better to me, so I would recommend reverting cd7031a and 9d62c21 (the double-qoutes in the examples made them easier to read for me)

data:
- [
"django.db.models.FileField!",
"Call.Argument[upload_to:].Parameter[1,filenam:]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be (i.e. if people define the call-back with name-only parameters, we do not care about that case).

if they define it with keyword only parameters, it wouldn't work, since the arguments are passed positionally (src from django)

Still, I think the example is more illustrative and usable as a reference the way it is now.

I agree it's pretty cool if we had an example where we could show off Parameter[<int>,<keyword>:]. However, I don't find this to be a convincing example since the filename: part doesn't add any value. I fear that people reading this will get confused and think they ALWAYS NEED to add the name of the parameter, but maybe I'm reading too much into it.

@yoff yoff force-pushed the python/document-models-as-data branch from 259dbf4 to 6524b8e Compare June 25, 2024 10:12
@yoff
Copy link
Contributor Author

yoff commented Jun 25, 2024

Alright, my quote edits had made a mess. I did a small rebase, things should look reasonable now. I am looking forward to seeing the generated artifacts... :-)

@yoff yoff requested a review from RasmusWL June 25, 2024 10:14
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small typo fix in the change note, otherwise this looks good to me! 👍

---
category: minorAnalysis
---
* A number of Python queries now support sinks defined vi data extensions. The format of data extensions for Python has been documented.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A number of Python queries now support sinks defined vi data extensions. The format of data extensions for Python has been documented.
* A number of Python queries now support sinks defined using data extensions. The format of data extensions for Python has been documented.

Typo, but also maybe a more plain word here will suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I just saw the approval and not this comment..

@yoff yoff merged commit 58b6b3f into github:main Jun 25, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Python ready-for-doc-review This PR requires and is ready for review from the GitHub docs team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants