Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start cleaning up the string interner #34772

Merged
merged 6 commits into from Jul 13, 2016
Merged

Conversation

jseyfried
Copy link
Contributor

r? @eddyb

}).chain(reader::tagged_docs(item, tag_item_unnamed_field).map(|_| {
let name = intr.intern(&index.to_string());
let name = token::with_ident_interner(|interner| interner.intern(index.to_string()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is calling with_ident_interner even necessary here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is equivalent to token::intern(&index.to_string()) (except that it avoids a clone).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe token::intern should be as generic as Interner::intern?
Eventually this could be a generic Symbol::from, I guess.
I would like it if with_ident_interner was actually private.

Copy link
Contributor Author

@jseyfried jseyfried Jul 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd like token::intern to be generic but it's a syntax-[breaking-change] since token::intern(&my_string) where my_string: String wouldn't typecheck (intern(my_string) or intern(&*my_string) would, though).

I was planning on making token::intern generic in a future PR with other interner-related syntax-[breaking-change]s.

I agree that with_ident_interner should eventually be private and that token::intern should eventually be Symbol::from (or Symbol::intern to contrast with Symbol::gensym).

@eddyb
Copy link
Member

eddyb commented Jul 11, 2016

cc @nrc @cgswords

@eddyb
Copy link
Member

eddyb commented Jul 11, 2016

r=me if none of the interested parties disagree on the changes.

@nrc
Copy link
Member

nrc commented Jul 12, 2016

lgtm

@eddyb
Copy link
Member

eddyb commented Jul 12, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Jul 12, 2016

📌 Commit 060b5c5 has been approved by eddyb

@cgswords
Copy link
Contributor

@jseyfried double points if you also modify the interner to support marking some things as "unclearable" such that clearing the interner preserves them (which will make the clearing we do in the driver more robust and fix a few of the problems with transitioning attributes to tokenstreams)

@jseyfried
Copy link
Contributor Author

@cgswords
Hmm, what needs to be preserved after clearing the interner?
Could we just use InternedString (wrapper around Rc<String>) for what we need to preserve?

We could also use two interners and only clear one of them. We could make this type-safe by making Interner generic in the id type, i.e. Interner<Name> would get cleared and Interner<StaticName> would not (or whatever we want to call the equivalent of Name for stuff that shouldn't get cleared).

@bors
Copy link
Contributor

bors commented Jul 13, 2016

⌛ Testing commit 060b5c5 with merge 0b7fb80...

bors added a commit that referenced this pull request Jul 13, 2016
Start cleaning up the string interner

r? @eddyb
@bors bors merged commit 060b5c5 into rust-lang:master Jul 13, 2016
@cgswords
Copy link
Contributor

@jseyfried

Could we just use InternedString (wrapper around Rc) for what we need to preserve?

That's what the compiler is currently doing, but that breaks down when you need, eg, token literals to stick around in a tokenstream (because literals store Names). Using interned strings would be a big shift in the current Token implementation.

We could also use two interners and only clear one of them. We could make this type-safe by making Interner generic in the id type, i.e. Interner would get cleared and Interner would not (or whatever we want to call the equivalent of Name for stuff that shouldn't get cleared).

Splitting the type out is going to be annoying for the use case I described above---we still want Names for Tokens. An interner that internally keeps a map of 'clearables' (with a intern_unclearable procedure or something) will likely end up much nicer to interact with to this end.

@jseyfried jseyfried deleted the cleanup_interner branch July 14, 2016 17:41
@jseyfried
Copy link
Contributor Author

jseyfried commented Jul 14, 2016

@cgswords Why do we need some Names / interned strings to not get cleared, though?

Afaik, we don't need any interned strings (Names or InternedStrings) after we clear the interner.

@@ -477,17 +477,20 @@ pub type IdentInterner = Interner;
// if an interner exists in TLS, return it. Otherwise, prepare a
// fresh one.
// FIXME(eddyb) #8726 This should probably use a thread-local reference.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks obsolete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants