Pytorch-scheduler

The final aim of this project is to build a full suite of scheduler tools for PyTorch, including tools that predict the latency and bandwidth of individual kernels as well as of full neural networks.

Prerequisites

To replicate the experiments in this research, make sure the following prerequisites are installed before running the code:

Python 3.x
PyTorch
CUDA (if using GPUs)
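The prerequisites can be confirmed with a short Python check like the one below. It is a sketch rather than part of the repository: the function name describe_environment is hypothetical, and PyTorch is imported inside the function so the check still reports something useful when PyTorch is not installed.

```python
import sys


def describe_environment():
    """Report the Python version, PyTorch version, CUDA availability and GPU count."""
    info = {"python": sys.version.split()[0], "torch": None, "cuda": False, "gpus": 0}
    try:
        import torch  # imported lazily so a missing PyTorch is reported, not raised
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.cuda.is_available()
    info["gpus"] = torch.cuda.device_count() if info["cuda"] else 0
    return info


if __name__ == "__main__":
    print(describe_environment())
```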

Getting Started

To get started with the project, follow these steps:

Clone the repository:
Install the required dependencies:
Modify the parameter ranges (widths, heights, in_channels_list, out_channels_list, batch_sizes, kernel_sizes, strides) in the main.py file according to your requirements.
Run the main.py file:
Wait for the program to complete. It generates a separate CSV file for each GPU containing the measured kernel times for the parameter combinations assigned to that GPU.
Once all processes finish, the program will merge all the GPU-specific CSV files into a single file named results_convolution.csv.
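The parameter sweep, the per-GPU split and the final merge can be sketched in plain Python as follows. This is an illustrative sketch, not the code in main.py: the range values are hypothetical examples, the helper names (parameter_grid, split_across_gpus, merge_csvs) are made up for illustration, and the round-robin split is an assumption (main.py may cut the list into contiguous chunks instead).

```python
import csv
import itertools

# Example ranges only (hypothetical values); the real ones are set in main.py.
widths = [32, 64]
heights = [32, 64]
in_channels_list = [3, 16]
out_channels_list = [16, 32]
batch_sizes = [1, 8]
kernel_sizes = [1, 3]
strides = [1, 2]


def parameter_grid():
    """Every combination, in the column order of results_convolution.csv."""
    return list(itertools.product(batch_sizes, in_channels_list, out_channels_list,
                                  kernel_sizes, strides, widths, heights))


def split_across_gpus(combos, num_gpus):
    """Deal the combinations out round-robin so each GPU gets a similar share."""
    return [combos[i::num_gpus] for i in range(num_gpus)]


def merge_csvs(paths, out_path="results_convolution.csv"):
    """Concatenate per-GPU CSV files, writing the header row only once."""
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)
```

Because the sweep size is the product of the list lengths (2^7 = 128 combinations for the example values), it is worth estimating it before widening the ranges.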

Understanding the Code

The code performs the following steps:

It generates all possible combinations of parameters for convolution operations based on the specified ranges.
It splits the parameter combinations into chunks for each available GPU.
It creates separate CSV files for each GPU to record the measured kernel times.
It starts multiple processes in parallel to run Python scripts for each parameter combination on the specified GPUs.
Each Python script performs the convolution operation and measures the kernel time using either CUDA events or an external benchmark, depending on the value of the ltype variable.
The measured kernel times and flop counts are written to the corresponding GPU-specific CSV files.
After all processes finish, the program merges all the GPU-specific CSV files into a single file named results_convolution.csv.
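The CUDA-event measurement described above can be illustrated as follows. This is a sketch under stated assumptions, not the repository's script: time_conv2d is a hypothetical name, it runs torch.nn.functional.conv2d with random inputs and no padding, it covers only the CUDA-event path (not the external benchmark selected through ltype), and it falls back to wall-clock timing on CPU so it can run without a GPU.

```python
import time

import torch
import torch.nn.functional as F


def time_conv2d(batch_size, in_channels, out_channels, kernel_size, stride,
                width, height, device="cuda", warmup=3, iters=10):
    """Mean latency in milliseconds of one conv2d forward pass (no padding, no bias)."""
    # NCHW input and (out, in, kH, kW) weights filled with random values.
    x = torch.randn(batch_size, in_channels, height, width, device=device)
    w = torch.randn(out_channels, in_channels, kernel_size, kernel_size, device=device)

    # Warm-up runs keep one-time costs (e.g. cuDNN algorithm selection) out of the timing.
    for _ in range(warmup):
        F.conv2d(x, w, stride=stride)

    if torch.device(device).type == "cuda":
        # Assumes `device` is the current CUDA device, since the events are
        # recorded on the current stream.
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            F.conv2d(x, w, stride=stride)
        end.record()
        end.synchronize()  # kernels run asynchronously; wait before reading the timer
        return start.elapsed_time(end) / iters  # elapsed_time() reports milliseconds

    # CPU fallback: plain wall-clock timing.
    start_s = time.perf_counter()
    for _ in range(iters):
        F.conv2d(x, w, stride=stride)
    return (time.perf_counter() - start_s) * 1000.0 / iters
```

Averaging over several iterations after a warm-up reduces the influence of launch overhead and one-time setup on a single reading.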

Results Analysis

Once the program completes, you can analyze the results by examining the results_convolution.csv file. The CSV file contains the following columns:

Batch size
In Channels
Out Channels
Kernel Size
Stride
Width
Height
Flops
Latency
Latency Type
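The exact definition behind the Flops column is not documented here. One common convention, shown below as an assumption, counts each multiply-accumulate as two floating-point operations, ignores bias, and assumes a square kernel with no padding; conv2d_flops and conv_output_size are hypothetical helper names.

```python
def conv_output_size(size, kernel_size, stride):
    """Output length along one spatial dimension for a convolution with no padding."""
    return (size - kernel_size) // stride + 1


def conv2d_flops(batch_size, in_channels, out_channels, kernel_size, stride, width, height):
    """FLOPs of one conv2d forward pass: 2 * MACs, square kernel, no padding, no bias."""
    out_w = conv_output_size(width, kernel_size, stride)
    out_h = conv_output_size(height, kernel_size, stride)
    macs = batch_size * out_channels * out_h * out_w * in_channels * kernel_size * kernel_size
    return 2 * macs


print(conv2d_flops(1, 3, 16, 3, 1, 32, 32))  # 777600
```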

You can use this data to analyze the performance of different parameter combinations and compare the measured kernel times across GPUs.
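For example, combining the Flops and Latency columns gives the achieved throughput of each configuration. The sketch below assumes the CSV headers are exactly Flops and Latency and that latency is stored in milliseconds; neither is confirmed here, so the column names and unit conversion may need adjusting. throughput_gflops is a hypothetical helper name.

```python
import csv


def throughput_gflops(csv_path, flops_col="Flops", latency_col="Latency"):
    """Return (row, GFLOP/s) pairs, assuming latency is recorded in milliseconds."""
    results = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            latency_ms = float(row[latency_col])
            if latency_ms <= 0:
                continue  # skip rows without a valid measurement
            seconds = latency_ms / 1000.0
            results.append((row, float(row[flops_col]) / seconds / 1e9))
    return results
```

Sorting the result, e.g. sorted(throughput_gflops("results_convolution.csv"), key=lambda r: r[1], reverse=True), lists the most efficient parameter combinations first.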

Limitations and Future Work

The code assumes the availability of multiple GPUs and assigns a specific index to each GPU. If you have a different setup, you may need to modify the code accordingly.
The code currently focuses on measuring kernel times for convolution operations. For other types of operations or different neural network architectures, additional code modifications may be required.
Further improvements can be made to the prediction models used for estimating kernel times. This research project serves as a starting point and can be extended to include more advanced prediction techniques.
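One way to avoid hard-coding GPU indices is to read them from the environment and pin each worker process to a single device through CUDA_VISIBLE_DEVICES. The sketch below is hypothetical (the helper names are made up, and it is not how the repository currently assigns GPUs); it falls back to torch.cuda.device_count() when the variable is unset.

```python
import os


def physical_gpu_ids():
    """List the GPUs this machine exposes, as strings usable in CUDA_VISIBLE_DEVICES."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        # Entries can be indices or GPU UUIDs; keep them verbatim.
        return [v.strip() for v in visible.split(",") if v.strip()]
    try:
        import torch
    except ImportError:
        return []
    return [str(i) for i in range(torch.cuda.device_count())]


def env_for_gpu(gpu_id):
    """Copy of the current environment that exposes only one GPU to a child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu_id
    return env
```

A driver could then start one worker per GPU, for example with subprocess.Popen([sys.executable, "worker.py"], env=env_for_gpu(gpu_id)), where worker.py stands in for the per-combination script. Inside that worker the selected GPU appears as cuda:0.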

Contributing

Contributions to this research project are welcome! If you have any suggestions, bug fixes, or new features, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

hileamlakB/Pytorch-scheduler

Folders and files

Latest commit

History

Repository files navigation

Pytorch-scheduler

Prerequisites

Getting Started

Understanding the Code

Results Analysis

Limitations and Future Work

Contributing

License

About

Resources

Stars

Watchers

Forks

Languages