
[Pytorch Mobile] Remove caching (in code) of interned strings #50390

Closed
wants to merge 1 commit

Conversation

dhruvbird (Contributor) commented Jan 11, 2021

Stack from ghstack:

Currently, a massive switch/case statement is generated in the `InternedStrings::string()` method to speed up `Symbol` -> string conversion without taking a lock (mutex). The relative call rate of this method on mobile is insignificant, so the lookups are unlikely to have any material impact on runtime even if they happen under a lock. In addition, parallelism is almost absent on mobile, which is where locks/mutexes cause the most problems (taking an uncontended mutex is usually very fast and, IIRC, just adds a memory barrier).
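Schematically, the trade-off looks like the following minimal C++ sketch (hypothetical, simplified names and symbol values; not the actual PyTorch source):

```cpp
// Hypothetical, simplified sketch of the two lookup strategies; illustrative
// only, not the real InternedStrings implementation.
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

using Symbol = uint32_t;

class InternedStringsSketch {
 public:
  // Cached path: a build-time-generated switch answers known symbols
  // without ever taking the mutex. This is the kind of code the PR removes.
  const char* string_cached(Symbol sym) {
    switch (sym) {
      case 0: return "prim::profile";
      case 1: return "prim::tolist";
      // ... one generated case per built-in symbol ...
      default: return string_locked(sym);
    }
  }

  // Uncached path: every lookup takes the mutex and consults the map.
  // Cheap when lookups are rare and uncontended, as on mobile.
  const char* string_locked(Symbol sym) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = sym_to_string_.find(sym);
    return it == sym_to_string_.end() ? nullptr : it->second.c_str();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<Symbol, std::string> sym_to_string_{
      {0, "prim::profile"}, {1, "prim::tolist"}};
};
```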

The only impact of caching interned strings is avoiding a lock when they are looked up. Interned strings are not looked up very often during training, and based on basic testing they don't seem to be looked up much during inference either.
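To sanity-check the claim that an uncontended mutex is cheap, a stand-alone micro-benchmark sketch along these lines (not part of this PR; numbers will vary by platform) times single-threaded lock/unlock pairs:

```cpp
// Stand-alone micro-benchmark sketch: times uncontended mutex
// lock/unlock pairs on a single thread.
#include <chrono>
#include <cstdio>
#include <mutex>

int main() {
  std::mutex m;
  constexpr int kIters = 10'000'000;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kIters; ++i) {
    std::lock_guard<std::mutex> guard(m);  // acquire + release, never contended
  }
  const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start);
  std::printf("avg ns per uncontended lock/unlock: %.2f\n",
              static_cast<double>(ns.count()) / kIters);
  return 0;
}
```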

During training, the following strings were looked up at test startup:

```
prim::profile
prim::profile_ivalue
prim::profile_optional
prim::FusionGroup
prim::TypeCheck
prim::FallbackGraph
prim::ChunkSizes
prim::ConstantChunk
prim::tolist
prim::FusedConcat
prim::DifferentiableGraph
prim::MMBatchSide
prim::TensorExprGroup
```

Command used to trigger training: `buck test fbsource//xplat/papaya/client/executor/torch/store/transform/feature/test:test`

During inference, the only symbol that was looked up was `tolist`.

Differential Revision: [D25861786](https://our.internmc.facebook.com/intern/diff/D25861786/)

facebook-github-bot (Contributor) commented Jan 11, 2021

💊 CI failures summary and remediations

As of commit 6f9ec98 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

dhruvbird added a commit that referenced this pull request Jan 11, 2021
ghstack-source-id: 119679831
Pull Request resolved: #50390
facebook-github-bot (Contributor) commented

This pull request has been merged in af968cd.

facebook-github-bot deleted the gh/dhruvbird/24/head branch January 16, 2021 15:18