-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add SourceView
which doesn't own source text as base class of Source
#65309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit e3e35f5 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
eeaf99d
to
352fbbc
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
352fbbc
to
29c27c3
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@@ -13,7 +13,8 @@ class SourceRangeUnpickler; | |||
struct SourceRange; | |||
|
|||
// Source represents a code segment. It keeps track of: | |||
// - text : the text of the code segment | |||
// - text or text_view : the text of the code segment, or a view into it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm this makes ownership a little tangled, right? We do not really enforce that the source text sticks around, which makes this change unsafe in general; I don't really love "maybe-owned" style classes when they're not absolutely necessary.
Possibly: we could create a SourceView
class that shares most implementation with Source
but operates over a view, and use that during the startup schema parsing.
If that's too hard we definitely could leave as is and just say it's up to the caller to keep things straight, but ideally we can be more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really love "maybe-owned" style classes when they're not absolutely necessary.
I would support enabling c10::MaybeOwned<std::string>
where the borrow type is c10::string_view
if this turns out to be a good use case for it!
29c27c3
to
15d061f
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@@ -105,15 +113,41 @@ struct Source { | |||
std::shared_ptr<SourceRangeUnpickler> gen_ranges_; | |||
}; | |||
|
|||
// A SourceRange is a view into a Source, that points to a subset of the source, | |||
// specified by `start` and `end` byte offsets into the source text. | |||
struct SourceView : public Source { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a particularly good use of inheritance. A Source
owns its string whereas a SourceView
does not; handing a SourceView
to code that expects a Source
could easily end in tears. CC @ezyang for the Liskov Substitution Principle lecture :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a real possibility but I think it should be fine in this case because the derived class already clearly states that it is a view and does not own the text. Thus caller is responsible for making the lifetime correct. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The derived class already clearly states that it is a view
Code written to expect a Source
knows nothing about SourceView
and thus can't see those statements. It may, for example, try to retain a Source
for later use. Inheritance should not be used merely for implementation reuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I have been on PTO recently.
What if we refactor code to make Source and SourceView both inherit from SourceBase, which makes no guarantee about ownership?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, typically, when you have a SourceView object and a Source object, you write all clients with SourceView and you make Source implicitly convertible to SourceView. Does this work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good plan, making sure we are all on the same page:
SourceView
inherits fromSource
, overriding itstext
method, allowing implicit construction ofSourceView
fromSource
- All current clients of
Source
should useSourceView
, except forErrorReport
, which should always retain a real copy of source text. - Producers of
Source
stay unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SourceView inherits from Source, overriding its text method, allowing implicit construction of SourceView from Source
Inheriting like that won't allow implicit construction of SourceView from Source, it would allow implicit construction of Source from SourceView.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
15d061f
to
729ee5a
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Codecov Report
@@ Coverage Diff @@
## master #65309 +/- ##
==========================================
- Coverage 66.37% 66.37% -0.01%
==========================================
Files 739 738 -1
Lines 94299 94158 -141
==========================================
- Hits 62595 62497 -98
+ Misses 31704 31661 -43 |
@@ -27,7 +27,7 @@ namespace jit { | |||
namespace { | |||
struct SchemaParser { | |||
SchemaParser(const std::string& str) | |||
: L(std::make_shared<Source>(str)), | |||
: L(std::make_shared<SourceView>(c10::string_view(str))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it necessary to heap-allocate and reference-count the SourceView instead of storing it by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly necessary, but changing this would be pretty involved (other structures like SourceRange, importer etc all use shared_ptr), leaving it as a future todo.
@@ -104,7 +104,9 @@ void initTreeViewBindings(PyObject* module) { | |||
return SourceRange(self.source_, start, end); | |||
}) | |||
.def_property_readonly("source", [](const SourceRangeFactory& self) { | |||
return self.source_->text(); | |||
auto text_view = self.source_->text(); | |||
std::string text = std::string(text_view.begin(), text_view.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be more succinctly written as std::string text(text_view.begin(), text_view.end());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, done.
729ee5a
to
76a7d82
Compare
CI Flow Status⚛️ CI FlowRuleset - Version:
You can add a comment to the PR and tag @pytorchbot with the following commands: # ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow For more information, please take a look at the CI Flow Wiki. |
716ea3e
to
c8ad5f4
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
c8ad5f4
to
f705f10
Compare
Source
to not own source code text and operate on string_viewSourceView
which doesn't own source text as base class of Source
f705f10
to
6854f81
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
97150bd
to
d1ef55a
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
1 similar comment
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
d1ef55a
to
1a54437
Compare
instead. This would save the cost copying text from stack to heap in some cases (like parsing function schema during loading phase of libtorch.so)
1a54437
to
e3e35f5
Compare
@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well done!
size_t start_; | ||
size_t end_; | ||
}; | ||
|
||
// OwnedSourceRange is just like a SourceRange except that it owns a `Source` | ||
// instead of `SourceView`. Thus OwnedSourceRange owns a copy of source text. | ||
struct OwnedSourceRange : public SourceRange { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If practical, I would recommend disabling copy & move on SourceRange to avoid accidentally slicing an OwnedSourceRange.
@@ -1641,7 +1641,7 @@ size_t Node::blocksFromGraphBlock() { | |||
} | |||
|
|||
inline const SourceRange& fakeRange() { | |||
static SourceRange range(std::make_shared<Source>(""), 0, 1); | |||
static SourceRange range(std::make_shared<Source>(std::string("")), 0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just std::string()
is slightly more efficient IIRC and it is certainly less for the compiler to figure out
std::string text(text_view.begin(), text_view.end()); | ||
return text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return directly to avoid an extra copy
return std::string(text_view.begin(), text_view.end());
…ce` (#65309) Summary: This would save the cost copying text from stack to heap in some cases (like parsing function schema during loading phase of libtorch.so) Pull Request resolved: #65309 Reviewed By: swolchok Differential Revision: D31060315 Pulled By: gmagogsfm fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
…ce` (#65309) Summary: This would save the cost copying text from stack to heap in some cases (like parsing function schema during loading phase of libtorch.so) Pull Request resolved: #65309 Reviewed By: swolchok Differential Revision: D31060315 Pulled By: gmagogsfm fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
This would save the cost copying text from stack to heap in some cases (like
parsing function schema during loading phase of libtorch.so)