Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor path_to #3875

Merged
merged 6 commits into from
Sep 25, 2022
Merged

Refactor path_to #3875

merged 6 commits into from
Sep 25, 2022

Conversation

alanmcruickshank
Copy link
Member

This is another relatively separable part of #3847. It's a useful development alone and can be merged before the rest. That will also simplify the review of the other PR.

This changes the behaviour of BaseSegment.path_to().

Previously it would return a list of segments. The first element of that list would be self and the last would be the segment passed as other. The others would be the segments between.

I noticed that many uses of path_to always cut the last element (i.e. slice [:-1]), and that there are use cases where we want to know more than just the lineage, but also where in the parent each child is.

This changes path_to to instead:

  • ...return a list of PathStep (a dataclass), which combines the segment with a new index which states where in the parent the child was found. This is useful for understanding whether the segment was at the start or end of a parent.
  • ...to return one less element in that it will never return the other (i.e. the last element previously returned). This is partly because it was not often used, partly because anyone calling path_to already has access to that and mostly because it doesn't make sense to ask what index something is within itself.

@codecov
Copy link

codecov bot commented Sep 25, 2022

Codecov Report

Base: 100.00% // Head: 100.00% // No change to project coverage 👍

Coverage data is based on head (4b15e81) compared to base (d96f23e).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #3875   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          179       179           
  Lines        13800     13811   +11     
=========================================
+ Hits         13800     13811   +11     
Impacted Files Coverage Δ
src/sqlfluff/rules/L052.py 100.00% <ø> (ø)
src/sqlfluff/utils/analysis/select.py 100.00% <ø> (ø)
src/sqlfluff/core/parser/segments/base.py 100.00% <100.00%> (ø)
src/sqlfluff/core/rules/base.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L016.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L026.py 100.00% <100.00%> (ø)
src/sqlfluff/rules/L042.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/sqlfluff/core/parser/segments/base.py Outdated Show resolved Hide resolved
src/sqlfluff/core/parser/segments/base.py Outdated Show resolved Hide resolved

# Are we in the right ballpark?
# NB: Comparisons have a higher precedence than `not`.
if not self.get_start_loc() <= other.get_start_loc() <= self.get_end_loc():
return None
return []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have sometimes wondered if this result should be indicated a different way. It could be an error in some situations.

Also, does this result become ambiguous now that the last step is removed? It could be interpreted in two ways:

  • There is no path to other from self.
  • other is an immediate child of self.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my interpretation of this is that it's ok for [] to be the only result and not require an additional None result. The reason is that [] should always be interpreted as other is not a descendent of self. That interpretation would include other is self, because something is not a descendent of itself.

I'll update the docstring to be clearer on that - but does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the docstring with the explanation:

The result of this should be interpreted as *the path from `self` to `other`*.
If the return value is `[]` (an empty list), that implies there is no path
from `self` to `other`. This would include the case where the two are the same
segment, as there is no path from a segment to itself.

Does that make more sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see -- if other is an immediate child, it returns a path containing self. The code is definitely good. In mathematical terms, the path we're returning would be described as a "half open interval", which we could mention in the docs as [self, other). I think it would be cool to add that additional explanation in the docstring, because it's formal and precise about what's being returned. Sometimes words are not enough. 😛

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barrywhart - I've added some better tests to spell it out better and a not in the docstring. Reckon that's better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉♥️

Copy link
Member

@barrywhart barrywhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the explanation and the doc changes!

@alanmcruickshank alanmcruickshank merged commit b92b5ae into main Sep 25, 2022
@alanmcruickshank alanmcruickshank deleted the ac/path_to branch September 25, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants