
[Pytorch Mobile] Remove caching (in code) of interned strings #50390

Closed
wants to merge 1 commit

Conversation

dhruvbird (Contributor) commented Jan 11, 2021

Stack from ghstack:

Currently, a massive switch/case statement is generated in the `InternedStrings::string()` method to speed up `Symbol` -> string conversion without taking a lock (mutex). The relative call rate of this method on mobile is insignificant, so the lookups are unlikely to have any material impact on runtime even if they happen under a lock. In addition, parallelism is almost absent on mobile, which is where locks/mutexes cause the most problems (taking an uncontended mutex is usually very fast and, IIRC, just adds a memory barrier).
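Schematically, the trade-off looks like the following minimal C++ sketch (hypothetical, simplified names and symbol values; not the actual PyTorch source):

```cpp
// Hypothetical, simplified sketch of the two lookup strategies; illustrative
// only, not the real InternedStrings implementation.
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

using Symbol = uint32_t;

class InternedStringsSketch {
 public:
  // Cached path: a build-time-generated switch answers known symbols
  // without ever taking the mutex. This is the kind of code the PR removes.
  const char* string_cached(Symbol sym) {
    switch (sym) {
      case 0: return "prim::profile";
      case 1: return "prim::tolist";
      // ... one generated case per built-in symbol ...
      default: return string_locked(sym);
    }
  }

  // Uncached path: every lookup takes the mutex and consults the map.
  // Cheap when lookups are rare and uncontended, as on mobile.
  const char* string_locked(Symbol sym) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = sym_to_string_.find(sym);
    return it == sym_to_string_.end() ? nullptr : it->second.c_str();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<Symbol, std::string> sym_to_string_{
      {0, "prim::profile"}, {1, "prim::tolist"}};
};
```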

The only impact of caching interned strings is avoiding a lock when they are looked up. Interned strings are not looked up very often during training, and based on basic testing they don't seem to be looked up much during inference either.
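To sanity-check the claim that an uncontended mutex is cheap, a stand-alone micro-benchmark sketch along these lines (not part of this PR; numbers will vary by platform) times single-threaded lock/unlock pairs:

```cpp
// Stand-alone micro-benchmark sketch: times uncontended mutex
// lock/unlock pairs on a single thread.
#include <chrono>
#include <cstdio>
#include <mutex>

int main() {
  std::mutex m;
  constexpr int kIters = 10'000'000;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kIters; ++i) {
    std::lock_guard<std::mutex> guard(m);  // acquire + release, never contended
  }
  const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start);
  std::printf("avg ns per uncontended lock/unlock: %.2f\n",
              static_cast<double>(ns.count()) / kIters);
  return 0;
}
```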

During training, the following strings were looked up at test startup:

```
prim::profile
prim::profile_ivalue
prim::profile_optional
prim::FusionGroup
prim::TypeCheck
prim::FallbackGraph
prim::ChunkSizes
prim::ConstantChunk
prim::tolist
prim::FusedConcat
prim::DifferentiableGraph
prim::MMBatchSide
prim::TensorExprGroup
```

Command used to trigger training: `buck test fbsource//xplat/papaya/client/executor/torch/store/transform/feature/test:test`

During inference, the only symbol that was looked up was `tolist`.

Differential Revision: [D25861786](https://our.internmc.facebook.com/intern/diff/D25861786/)

facebook-github-bot (Contributor) commented Jan 11, 2021

💊 CI failures summary and remediations

As of commit 6f9ec98 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

dhruvbird added a commit that referenced this pull request Jan 11, 2021
ghstack-source-id: 119679831
Pull Request resolved: #50390
facebook-github-bot (Contributor) commented

This pull request has been merged in af968cd.

facebook-github-bot deleted the gh/dhruvbird/24/head branch January 16, 2021 15:18