Parallel training #133
base: master
Conversation
OK for me! As long as the CI passes.
@raphaelreinauer I see that @yorickbrunet answered all your comments. Are you satisfied, or is there anything else you would like to discuss?
@matteocao thanks for checking in and thanks @yorickbrunet for your answers so far. However, there's one key aspect that still needs attention - the PR description:
@yorickbrunet Could you please add a detailed description of all the features you added? Then I can do a detailed PR review.
Co-authored-by: raphaelreinauer <reinauerr@googlemail.com>
@raphaelreinauer we answered or fixed all comments. Can you have another look, close the issues that can be closed, and maybe approve the PR? Thanks
Hi @yorickbrunet, could you please provide a detailed description of the features added in this PR to aid my review process? Thanks
Hi @raphaelreinauer. I improved the general description of the PR, but the important part was already there: This work introduces the parallelisation of giotto-deep on multiple GPUs via two methods: pytorch's FSDP and pipeline tools. Even though there are 52 files modified, the modifications are quite small in many of them. The most interesting file is
Thanks, @yorickbrunet, for the changes. I'll review them by Friday next week.
Unfortunately, I didn't have time to review the PR last weekend, but I'll do it this weekend. Sorry for the delay.
Hi @yorickbrunet, I've reviewed your additions and added some comments for your consideration. Also, it might be beneficial to include some tests. Thanks!
.gitignore
Outdated
@@ -4,6 +4,7 @@
 *.pyd
 **/__pycache__

+*data*
This might be too restrictive, e.g. it would also include data_processor.py.
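For illustration, a narrower pattern would avoid that, e.g. (assuming the intent is to ignore only data directories, such as the cola_public dataset used in the examples):

data/
**/cola_public/

instead of the catch-all *data*.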
done
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections && \
apt-get update && \
apt-get install -y \
    python3 python3-pip \
Please pin specific versions to ensure reproducibility.
The packages used come from Ubuntu's repositories. There is no version to pin, as they won't change during the life of this version of the distribution.
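For reference, apt does support pinning exact versions if reproducibility across distribution updates were ever needed; a hypothetical example (the version strings below are placeholders, not the real ones):

apt-get install -y \
    python3=3.10.6-1~22.04 \
    python3-pip=22.0.2+dfsg-1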
benchmark/Dockerfile
Outdated
RUN cd giotto-deep && \
    pip3 install --no-cache-dir --disable-pip-version-check -r requirements.txt

COPY ./benchmark/requirements.txt giotto-deep/requirements2.txt
requirements2.txt is not the best name; make it more descriptive.
It will be renamed to requirements.txt, which in the end is not much better.
    return RunData(r.start_time, r.end_time, model, parallel, epochs, batch_size, r.loss, r.accuracy, device_count, device_model)


def uniq(data: typing.List[RunData]):
What is this function doing? Can you simplify it as list(set(data))?
Actually, no. Some elements of the list may be of the same class but have different generation times, e.g. benchmark runs that were restarted; the function keeps the most recent element of each class.
I added a comment in the function.
benchmark/benchmark.py
Outdated
    plt.savefig(str(imgfile))


def plot_csv(run_data: typing.List[RunData], img_dir: pathlib.Path, now: datetime.datetime):
This function has a lot of repeated code. Could you consider refactoring to reduce duplication and complexity?
Yes, the function is complex, but so is the data to fetch.
Some pieces of code are duplicated, mostly because the two blocks cannot share the exact same code due to the different loops that build the data.
Thus I don't think a refactoring would really help.
I still think that four nested loops are too complicated and should be refactored.
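One generic way to flatten such nesting (a sketch only; models, parallels, batch_sizes and gpu_counts are hypothetical iterables guessed from the RunData fields, not the actual variables in plot_csv) is itertools.product:

import itertools

# one loop body instead of four nested loops
for model, parallel, batch_size, gpu_count in itertools.product(
    models, parallels, batch_sizes, gpu_counts
):
    ...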
benchmark/genpods.py
Outdated
with open("pod-template-plot.yml", "r") as fp:
    ymlt = string.Template(fp.read())
ymlv = ymlt.substitute(values)
Add a timestamp or unique identifier to the output filename.
That would result in a mess of files.
People can move the files of the same batch into folders themselves.
# general enough, and backends like XLA can reuse them in Colab notebooks as well.
# Currently we only add this API first, we can consider adding it to documentation as
# needed in the future.
def spawn(fn, args=(), nprocs=1):
Add error handling.
I have no idea how to add error handling here.
This code is heavily based on https://github.com/pytorch/pytorch/blob/v1.13.1/torch/multiprocessing/spawn.py#L178, where there is also no error handling, so I assume we're fine.
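For what it's worth, a minimal sketch of call-site error handling (train_fn, config and world_size are hypothetical names, not part of this PR) could simply catch whatever the child processes re-raise in the parent on join:

import logging

try:
    spawn(train_fn, args=(config,), nprocs=world_size)
except Exception as exc:
    # torch-style spawn re-raises child failures in the parent when joining,
    # so a catch here is the natural place to log and abort cleanly
    logging.error("worker process failed: %s", exc)
    raise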
        self.world_size = len(self.devices)


def setup_env():
Why do we need to set env variables here? This looks very error-prone.
This is necessary for parallel training with RPC.
There is no choice but to define env variables.
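For context, a minimal sketch of what such a function typically has to set so that torch.distributed / RPC workers can rendezvous (the address and port values below are illustrative, not the PR's actual ones):

import os

def setup_env():
    # all workers must agree on a rendezvous address and port;
    # torch.distributed and RPC read these from the environment
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")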
benchmark/Dockerfile
Outdated
RUN ln -snf /usr/share/zoneinfo/Europe/Zurich /etc/localtime && \
    echo Europe/Zurich > /etc/timezone && \
I still think it's better to remove it since specifying the time zone does not seem necessary. But keep it if you prefer.
benchmark/README.md
Outdated
I think it would be easier to specify the node pools as config files.
Such a change would require too much work and retesting for a project that is now closed.
I think it will be OK for this version. Anyone can adapt it later as they wish.
I think this should still be changed.
examples/parallel_bert.py
Outdated
n_sentences_to_consider=4000

tmp_path=os.path.join('./cola_public','raw','in_domain_train.tsv')
df = pd.read_csv(tmp_path, delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])
Put a comma after names=[...]; this will make the line more readable.
@VascoSch92 Could you please explain what you mean exactly?
Instead of writing
df = pd.read_csv(tmp_path, delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])
use
df = pd.read_csv(
tmp_path,
delimiter='\t',
header=None,
names=['sentence_source', 'label', 'label_notes', 'sentence'],
)
Note the comma at the end of names. If you are using an automatic formatter (like yapf or black) and you put a comma at the end, it will format that for you.
The second one is much easier to read: you see exactly which method/class/function you are using and what the parameters are.
Thanks for the clarification, @VascoSch92; I agree with your suggestion. We have a pre-commit hook that runs the black formatter.
@yorickbrunet Could you please use that? See here: https://github.com/giotto-ai/giotto-deep?tab=readme-ov-file#contributing
done
Hey @yorickbrunet and @matteocao, I noticed that there haven't been any responses to the PR review comments yet. Could you add some, so that we can merge this PR?
Hi @raphaelreinauer, I answered all comments with either a modification of the code or a justification why it cannot/won't be modified. Basically, the project is closed and we cannot spend two more weeks adding tests, retesting the setup after modification, etc.
@raphaelreinauer @VascoSch92 Can you please close all comments that you consider OK, so that we know where we stand? Thanks.
@yorickbrunet for me it's good. You can resolve the discussions ;-) (I cannot)
"""Keep the most recent elements of each class. | ||
|
||
Some elements of the list may be of the same class but of different generation | ||
time, e.g. some benchmark runs that were restarted. | ||
""" | ||
data2 = [] | ||
idx = 0 | ||
# parse every element in the list (unless those removed during the process) | ||
while idx < len(data): | ||
jdx = idx + 1 | ||
keep = data[idx] # set current data as kept | ||
# parse every further element in the list (unless those removed during the process) | ||
while jdx < len(data): | ||
# if the currently kept element and the current element are of the same "class" ... | ||
if data[jdx].same(keep): | ||
# ... compare if the current element is greater than the kept one ... | ||
if data[jdx].gt(keep): | ||
# ... and keep and remove the current element if it is greater | ||
keep = data.pop(jdx) | ||
else: | ||
# ... or only remove the current element if it is not greater | ||
del data[jdx] | ||
else: | ||
jdx += 1 | ||
data2.append(keep) | ||
idx += 1 | ||
return data2 |
The naming is terrible. Why is uniq not unique or, better, a more descriptive name? Also, data2 doesn't say anything. The logic is super complicated: two nested while loops with nested if-else statements, and the comments are just repeating what the code already expresses.
This can be written more concisely as:
def get_unique_latest_runs(data: typing.List[RunData]) -> typing.List[RunData]:
    unique_config_latest_run = {}
    for run in data:
        configuration_key = (run.model, run.parallel, run.batch_size, run.gpu_count, run.gpu_model)
        if (configuration_key not in unique_config_latest_run
                or run.end_time > unique_config_latest_run[configuration_key].end_time):
            unique_config_latest_run[configuration_key] = run
    return list(unique_config_latest_run.values())
class Parallelism(enum.Enum):
    none = enum.auto()
    fsdp_full_shard = enum.auto()
    fsdp_shard_grad_op = enum.auto()
    fsdp_no_shard = enum.auto()
    pipeline = enum.auto()
Parallelism is a mixture of a parallelism type and a sharding strategy; instead, one should use a composite of both.
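One possible shape for such a composite (a sketch with invented names, not a design reviewed in this PR):

import dataclasses
import enum
import typing

class ParallelismType(enum.Enum):
    NONE = enum.auto()
    FSDP = enum.auto()
    PIPELINE = enum.auto()

class ShardingStrategy(enum.Enum):
    FULL_SHARD = enum.auto()
    SHARD_GRAD_OP = enum.auto()
    NO_SHARD = enum.auto()

@dataclasses.dataclass
class ParallelismConfig:
    kind: ParallelismType
    # only meaningful when kind is FSDP
    sharding: typing.Optional[ShardingStrategy] = None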
Description
This work introduces the parallelisation of giotto-deep on multiple GPUs via two methods: pytorch's FSDP and pipeline tools.
The version of pytorch is increased to 1.13.1 to support the necessary features of FSDP.
To allow the parallelisation to run efficiently, a new sampler GiottoSampler is defined that combines DistributedSampler and SubsetRandomSampler.
A benchmark tool allows running a model with different batch sizes on different GPUs and numbers of GPUs to compare the parallelised and non-parallelised runs. A generator of Kubernetes pods takes some GKE details as input and outputs the pod configurations, allowing users to build their own configurations for their own cluster.
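For readers unfamiliar with FSDP, the core wrapping step in pytorch 1.13 looks roughly like this (a generic sketch, not the exact giotto-deep code; MyModel and rank are placeholders, and the process group must already be initialised, e.g. via torch.distributed.init_process_group):

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

model = MyModel().to(rank)
model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)
# training then proceeds as usual; FSDP shards parameters, gradients
# and optimizer state across the participating GPUs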
Any other comments?
Checklist
- I ran flake8 to check my Python changes.
- I ran pytest to check this on Python tests.