
modify tracin self influence helpers #994

Closed
wants to merge 1 commit

Conversation

99warriors (Contributor)

Summary:
Change `TracInCP._self_influence_batch_tracincp` and `TracInCP._self_influence_batches_tracincp_fast` to be named `self_influence`, which is now public, and to accept a DataLoader yielding batches (as well as a single batch, as before). The modified helper can be called by external code to compute self influence.

The helper itself is also changed to improve efficiency by reducing the number of times checkpoints are loaded. Although the modified helper can compute self influence scores for a DataLoader yielding batches, it still loads each checkpoint only once per call. This is because the helper now has an outer iteration over checkpoints and an inner iteration over batches (the order of iteration is reversed compared to before). This helper is called by `influence` when running in self influence mode.

The reason we cannot simply increase the batch size to reduce the number of checkpoint loads is that for large models (precisely those for which loading checkpoints is expensive), the model itself takes up too much memory, so the batch size cannot be made large.
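The reversed iteration order described above can be sketched as follows. This is a minimal illustration only: `load_checkpoint` and `batch_self_influence` are hypothetical stand-ins passed in as callables, not Captum's actual API.

```python
# Minimal sketch of the reversed iteration order: checkpoints in the outer
# loop, batches in the inner loop, so each checkpoint is loaded once per call.
# `load_checkpoint` and `batch_self_influence` are hypothetical stand-ins.
def self_influence_sketch(checkpoints, dataloader, load_checkpoint, batch_self_influence):
    totals = None
    for checkpoint in checkpoints:
        load_checkpoint(checkpoint)  # expensive step: happens once per checkpoint
        # inner loop: score every batch under the currently loaded checkpoint
        scores = [batch_self_influence(checkpoint, batch) for batch in dataloader]
        # accumulate per-batch scores across checkpoints
        totals = scores if totals is None else [t + s for t, s in zip(totals, scores)]
    return totals
```

TracIn defines self influence as a sum of per-checkpoint contributions, which the accumulation across the outer loop mimics; iterating batches in the inner loop is what avoids reloading a checkpoint for every batch.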

Minor change: for the `influence_src_dataset` argument of all `__init__`s, add a description of the assumptions we make about the batches yielded by the dataloader.

Reviewed By: NarineK

Differential Revision: D35603078

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D35603078

NarineK (Contributor) commented Jul 21, 2022

cc: @99warriors it looks like some of the tests related to the progress bar are failing. Do you mind looking into it?

99warriors added a commit to 99warriors/captum that referenced this pull request Jul 22, 2022

99warriors added a commit to 99warriors/captum that referenced this pull request Jul 22, 2022
99warriors added a commit to 99warriors/captum that referenced this pull request Jul 23, 2022

@@ -95,7 +96,7 @@ class TracInCPBase(DataInfluence):
     def __init__(
         self,
         model: Module,
-        influence_src_dataset: Union[Dataset, DataLoader],
+        train_dataset: Union[Dataset, DataLoader],
@99warriors, do you mind fixing this naming in the tutorials as well? I remember that we used them explicitly as argument names.
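For call sites such as the tutorials mentioned above, the rename shown in the diff means updating the keyword argument. A minimal sketch with a stand-in class, not the real `TracInCPBase` signature (which takes additional arguments such as checkpoints and a loss function):

```python
# Stand-in class illustrating only the renamed keyword argument; the real
# TracInCPBase.__init__ has additional parameters not shown here.
class TracInCPBase:
    def __init__(self, model, train_dataset):  # formerly `influence_src_dataset`
        self.model = model
        self.train_dataset = train_dataset

# Call sites that passed `influence_src_dataset=...` must switch to:
tracin = TracInCPBase(model=None, train_dataset=[("features", "labels")])
```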

99warriors added a commit to 99warriors/captum that referenced this pull request Jul 26, 2022

99warriors added a commit to 99warriors/captum that referenced this pull request Jul 29, 2022

99warriors added a commit to 99warriors/captum that referenced this pull request Jul 29, 2022
99warriors added a commit to 99warriors/captum that referenced this pull request Jul 31, 2022
99warriors added a commit to 99warriors/captum that referenced this pull request Jul 31, 2022
