Contributor

@gmagogsfm gmagogsfm commented Sep 19, 2021

This would save the cost of copying text from the stack to the heap in some cases (such as
parsing function schemas during the loading phase of libtorch.so).

@facebook-github-bot facebook-github-bot added the "oncall: jit" and "cla signed" labels Sep 19, 2021
@gmagogsfm gmagogsfm force-pushed the lazy_source_new branch 4 times, most recently from eeaf99d to 352fbbc Compare September 20, 2021 17:52
@facebook-github-bot
Contributor

@gmagogsfm has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gmagogsfm gmagogsfm marked this pull request as ready for review September 20, 2021 19:11
@gmagogsfm gmagogsfm requested review from suo and swolchok September 20, 2021 19:11
@@ -13,7 +13,8 @@ class SourceRangeUnpickler;
 struct SourceRange;

 // Source represents a code segment. It keeps track of:
-// - text : the text of the code segment
+// - text or text_view : the text of the code segment, or a view into it

Member

mmm this makes ownership a little tangled, right? We do not really enforce that the source text sticks around, which makes this change unsafe in general; I don't really love "maybe-owned" style classes when they're not absolutely necessary.

Possibly: we could create a SourceView class that shares most implementation with Source but operates over a view, and use that during the startup schema parsing.

If that's too hard we definitely could leave as is and just say it's up to the caller to keep things straight, but ideally we can be more explicit.

Contributor

> I don't really love "maybe-owned" style classes when they're not absolutely necessary.

I would support enabling c10::MaybeOwned<std::string> where the borrow type is c10::string_view if this turns out to be a good use case for it!
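
For readers unfamiliar with the "maybe-owned" style, here is a minimal sketch of the idea. It is not the `c10::MaybeOwned` API: `MaybeOwnedText` is a hypothetical class built on `std::variant`, and `std::string_view` stands in for `c10::string_view`.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical illustration only; not c10::MaybeOwned.
// Holds either its own copy of the text or a view of text owned elsewhere.
class MaybeOwnedText {
 public:
  static MaybeOwnedText owned(std::string text) {
    return MaybeOwnedText(std::move(text));
  }
  static MaybeOwnedText borrowed(std::string_view text) {
    return MaybeOwnedText(text);
  }

  bool is_owned() const {
    return std::holds_alternative<std::string>(storage_);
  }

  // Uniform read access regardless of which alternative is stored.
  std::string_view view() const {
    if (const auto* s = std::get_if<std::string>(&storage_)) {
      return *s;
    }
    return std::get<std::string_view>(storage_);
  }

 private:
  explicit MaybeOwnedText(std::string text)
      : storage_(std::in_place_type<std::string>, std::move(text)) {}
  explicit MaybeOwnedText(std::string_view text)
      : storage_(std::in_place_type<std::string_view>, text) {}

  std::variant<std::string, std::string_view> storage_;
};

inline void self_check() {
  std::string buffer = "def f(x): return x";
  MaybeOwnedText borrowed = MaybeOwnedText::borrowed(buffer);
  MaybeOwnedText owned = MaybeOwnedText::owned(buffer);
  // The borrowed value aliases the caller's buffer; the owned one has a copy.
  assert(!borrowed.is_owned() && borrowed.view().data() == buffer.data());
  assert(owned.is_owned() && owned.view().data() != buffer.data());
  assert(owned.view() == borrowed.view());
}
```

The concern raised above is visible in the type: nothing in `view()` tells the reader whether the result depends on some other object's lifetime, so every user has to know which case they were handed.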

@@ -105,15 +113,41 @@ struct Source {
std::shared_ptr<SourceRangeUnpickler> gen_ranges_;
};

// A SourceRange is a view into a Source, that points to a subset of the source,
// specified by `start` and `end` byte offsets into the source text.
struct SourceView : public Source {

Contributor

I don't think this is a particularly good use of inheritance. A Source owns its string whereas a SourceView does not; handing a SourceView to code that expects a Source could easily end in tears. CC @ezyang for the Liskov Substitution Principle lecture :)

Contributor Author

Yeah, that's a real possibility, but I think it should be fine in this case because the derived class already clearly states that it is a view and does not own the text. The caller is therefore responsible for keeping the text alive long enough. Let me know what you think.

Contributor

> The derived class already clearly states that it is a view

Code written to expect a Source knows nothing about SourceView and thus can't see those statements. It may, for example, try to retain a Source for later use. Inheritance should not be used merely for implementation reuse.
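
A minimal sketch of the hazard described here, assuming simplified stand-ins for the classes under review. `ErrorLog` and `view_aliases_caller_buffer` are hypothetical names, and `std::string_view` stands in for `c10::string_view`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-ins for the classes under review, not the PR's code.
struct Source {
  explicit Source(std::string text) : owned_(std::move(text)) {}
  virtual ~Source() = default;
  virtual std::string_view text() const { return owned_; }

 protected:
  Source() = default;  // lets a derived class skip owning any text

 private:
  std::string owned_;
};

struct SourceView : Source {
  explicit SourceView(std::string_view text) : view_(text) {}
  std::string_view text() const override { return view_; }

 private:
  std::string_view view_;
};

// Client written against Source: it keeps the pointer for later, which is
// only sound if the text lives as long as the Source object does.
struct ErrorLog {
  void remember(std::shared_ptr<Source> source) { retained_ = std::move(source); }
  const Source& last() const { return *retained_; }

 private:
  std::shared_ptr<Source> retained_;
};

// Returns true when the retained "Source" points into the caller's buffer,
// i.e. nothing was copied and the log now depends on the buffer's lifetime.
inline bool view_aliases_caller_buffer() {
  std::string buffer = "aten::relu(Tensor self) -> Tensor";
  ErrorLog log;
  log.remember(std::make_shared<SourceView>(buffer));  // type-checks as a Source
  assert(log.last().text() == buffer);
  return log.last().text().data() == buffer.data();
}
```

The call to `remember` compiles without any hint that the retained object is a view. If `log` outlived `buffer`, the next call to `last().text()` would read freed memory.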

Contributor Author

Sorry for the delay. I have been on PTO recently.

What if we refactor code to make Source and SourceView both inherit from SourceBase, which makes no guarantee about ownership?

Contributor

So, typically, when you have a SourceView object and a Source object, you write all clients with SourceView and you make Source implicitly convertible to SourceView. Does this work here?
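
A minimal sketch of that pattern, modeled on how `std::string` converts to `std::string_view`. The class bodies and `length_of` are hypothetical, and `std::string_view` stands in for `c10::string_view`; the design that finally landed may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Non-owning: refers to text that someone else keeps alive.
class SourceView {
 public:
  explicit SourceView(std::string_view text) : text_(text) {}
  std::string_view text() const { return text_; }

 private:
  std::string_view text_;
};

// Owning: stores its own copy of the text.
class Source {
 public:
  explicit Source(std::string text) : text_(std::move(text)) {}

  // Implicit conversion: any API that takes a SourceView also accepts a Source.
  operator SourceView() const { return SourceView(text_); }

 private:
  std::string text_;
};

// Clients are written once, against the view type.
inline std::size_t length_of(SourceView source) {
  return source.text().size();
}

inline void self_check() {
  Source owned("def f(): pass");
  assert(length_of(owned) == 13);            // Source converts implicitly
  assert(length_of(SourceView("x")) == 1);   // views pass straight through
}
```

The conversion only goes from owner to view, so code that needs to retain text has to ask for a `Source` explicitly. A view obtained from a `Source` is valid only while that `Source` is alive.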

Contributor Author

Sounds like a good plan. To make sure we are all on the same page:

  1. SourceView inherits from Source, overriding its text method, allowing implicit construction of SourceView from Source
  2. All current clients of Source should use SourceView, except for ErrorReport, which should always retain a real copy of source text.
  3. Producers of Source stay unchanged.

@swolchok @suo sounds good?

Contributor

> SourceView inherits from Source, overriding its text method, allowing implicit construction of SourceView from Source

Inheriting like that won't allow implicit construction of SourceView from Source, it would allow implicit construction of Source from SourceView.
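
The direction of the implicit conversion can be checked at compile time. This is a minimal sketch with hypothetical empty-ish types that mirror only the inheritance relationship from step 1 of the plan, not the real classes.

```cpp
#include <type_traits>

// Hypothetical minimal types: the view derives from the owning class.
struct Source {
  int owned_text_placeholder = 0;
};
struct SourceView : Source {
  int view_placeholder = 0;
};

// Derived-to-base is implicit (by slicing copy or by pointer)...
static_assert(std::is_convertible<SourceView, Source>::value,
              "a SourceView converts implicitly to a Source");
static_assert(std::is_convertible<SourceView*, Source*>::value,
              "a SourceView* converts implicitly to a Source*");

// ...but base-to-derived is not, so a Source cannot be passed where a
// SourceView is expected unless a converting constructor is added.
static_assert(!std::is_convertible<Source, SourceView>::value,
              "a Source does not convert implicitly to a SourceView");
static_assert(!std::is_convertible<Source*, SourceView*>::value,
              "a Source* does not convert implicitly to a SourceView*");
```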

Contributor Author

PTAL

@codecov
codecov bot commented Sep 21, 2021

Codecov Report

Merging #65309 (729ee5a) into master (feefc94) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 729ee5a differs from pull request most recent head e3e35f5. Consider uploading reports for the commit e3e35f5 to get more accurate results

@@            Coverage Diff             @@
##           master   #65309      +/-   ##
==========================================
- Coverage   66.37%   66.37%   -0.01%     
==========================================
  Files         739      738       -1     
  Lines       94299    94158     -141     
==========================================
- Hits        62595    62497      -98     
+ Misses      31704    31661      -43     

@@ -27,7 +27,7 @@ namespace jit {
 namespace {
 struct SchemaParser {
   SchemaParser(const std::string& str)
-      : L(std::make_shared<Source>(str)),
+      : L(std::make_shared<SourceView>(c10::string_view(str))),

Contributor

why is it necessary to heap-allocate and reference-count the SourceView instead of storing it by value?

Contributor Author

Not strictly necessary, but changing this would be pretty involved (other structures such as SourceRange and the importer all use shared_ptr), so I'm leaving it as a future TODO.

@@ -104,7 +104,9 @@ void initTreeViewBindings(PyObject* module) {
         return SourceRange(self.source_, start, end);
       })
       .def_property_readonly("source", [](const SourceRangeFactory& self) {
-        return self.source_->text();
+        auto text_view = self.source_->text();
+        std::string text = std::string(text_view.begin(), text_view.end());

Contributor

nit: this can be more succinctly written as std::string text(text_view.begin(), text_view.end());

Contributor Author

sounds good, done.

@gmagogsfm gmagogsfm force-pushed the lazy_source_new branch 2 times, most recently from 716ea3e to c8ad5f4 Compare October 6, 2021 06:24
@gmagogsfm gmagogsfm changed the title from "Allow Source to not own source code text and operate on string_view" to "Add SourceView which doesn't own source text as base class of Source" Oct 6, 2021
@gmagogsfm gmagogsfm force-pushed the lazy_source_new branch 3 times, most recently from 97150bd to d1ef55a Compare October 15, 2021 21:31
@gmagogsfm gmagogsfm requested a review from zhxchen17 October 16, 2021 05:48
Contributor

@swolchok swolchok left a comment

well done!

size_t start_;
size_t end_;
};

// OwnedSourceRange is just like a SourceRange except that it owns a `Source`
// instead of `SourceView`. Thus OwnedSourceRange owns a copy of source text.
struct OwnedSourceRange : public SourceRange {

Contributor

If practical, I would recommend disabling copy & move on SourceRange to avoid accidentally slicing an OwnedSourceRange.
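
A minimal sketch of the slicing problem and the suggested guard. `Range`, `OwnedRange`, and `GuardedRange` are hypothetical simplified types; the real `SourceRange` and `OwnedSourceRange` have different members, and the virtual function exists here only to make the loss observable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical base type: copyable, so it can be sliced.
struct Range {
  Range(std::size_t start, std::size_t end) : start_(start), end_(end) {}
  virtual ~Range() = default;
  virtual bool owns_text() const { return false; }
  std::size_t start_;
  std::size_t end_;
};

// Hypothetical derived type that additionally owns a copy of the text.
struct OwnedRange : Range {
  OwnedRange(std::string text, std::size_t start, std::size_t end)
      : Range(start, end), text_(std::move(text)) {}
  bool owns_text() const override { return true; }
  std::string text_;
};

// Copying through the base type silently drops the derived part.
inline bool slicing_loses_ownership() {
  OwnedRange owned("x = 1", 0, 5);
  Range sliced = owned;  // compiles: copies only the Range subobject
  assert(owned.owns_text());
  return !sliced.owns_text();
}

// Guarded base: with copy and move deleted, the slicing line above would
// no longer compile.
struct GuardedRange {
  GuardedRange(std::size_t start, std::size_t end) : start_(start), end_(end) {}
  GuardedRange(const GuardedRange&) = delete;
  GuardedRange& operator=(const GuardedRange&) = delete;
  GuardedRange(GuardedRange&&) = delete;
  GuardedRange& operator=(GuardedRange&&) = delete;
  std::size_t start_;
  std::size_t end_;
};

static_assert(!std::is_copy_constructible<GuardedRange>::value,
              "GuardedRange cannot be copy-constructed");
static_assert(!std::is_move_constructible<GuardedRange>::value,
              "GuardedRange cannot be move-constructed");
```

The cost of the guard is that the base type can no longer be passed or stored by value, which is presumably what the "if practical" qualifier refers to.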

@@ -1641,7 +1641,7 @@ size_t Node::blocksFromGraphBlock() {
 }

 inline const SourceRange& fakeRange() {
-  static SourceRange range(std::make_shared<Source>(""), 0, 1);
+  static SourceRange range(std::make_shared<Source>(std::string("")), 0, 1);

Contributor

just std::string() is slightly more efficient IIRC and it is certainly less for the compiler to figure out

Comment on lines +108 to +109
std::string text(text_view.begin(), text_view.end());
return text;

Contributor

return directly to avoid an extra copy

return std::string(text_view.begin(), text_view.end());
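
As a standalone sketch of the suggested form: `to_owned_text` is a hypothetical helper name, and `std::string_view` stands in for `c10::string_view`. The iterator-pair constructor is used because it works for any view type that exposes `begin()` and `end()`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Builds an owning std::string from a non-owning view.
inline std::string to_owned_text(std::string_view text_view) {
  // Construct and return in one expression: the result is a prvalue, so
  // under C++17 it is built directly in the caller's storage.
  return std::string(text_view.begin(), text_view.end());
}

inline void self_check() {
  std::string buffer = "def f(x): return x";
  std::string copy = to_owned_text(buffer);
  assert(copy == buffer);                // same characters
  assert(copy.data() != buffer.data());  // but separate storage
}
```

With a named local (`std::string text(...); return text;`) the compiler may still elide or move the result, so the difference is likely small in practice; the one-line form simply leaves nothing to optimize away.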

facebook-github-bot pushed a commit that referenced this pull request Oct 19, 2021
…ce` (#65309)

Summary:
This would save the cost copying text from stack to heap in some cases (like
parsing function schema during loading phase of libtorch.so)

Pull Request resolved: #65309

Reviewed By: swolchok

Differential Revision: D31060315

Pulled By: gmagogsfm

fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
wconstab pushed a commit that referenced this pull request Oct 20, 2021