PyTorch Trainer Baseline + TPU Integration

I hope you find this repository useful. If you do, please star ⭐ this repository.

Introduction

This repository contains the code for an image classification task. The goal is to classify whether an image shows a Car or a Tank.

I have also made a training notebook that was trained on GPU-T4x2. The code in this repository is for training the model on TPUs.

Data

The dataset is posted on Kaggle by Gateway Adam. It consists of a set of images labeled as Car or Tank. The images are in .jpg format.

Training

If you want to train the model, follow the steps below:

1. Get the data

Download the data from here

Now, extract the downloaded .zip file into a new folder named input/.

Make sure that the input/ folder is at the same directory level as the train.py file.
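Before moving on, it can help to confirm that the data landed where train.py expects it. The snippet below is only a sanity-check sketch and is not part of this repository: count_jpgs is a hypothetical helper, and grouping the images by the folder that holds them is an assumption about how the extracted archive is laid out.

```python
from collections import Counter
from pathlib import Path


def count_jpgs(input_dir="input"):
    """Count .jpg files under input_dir, grouped by the folder that holds them."""
    root = Path(input_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Expected a folder named '{root}' next to train.py")
    counts = Counter()
    # Walk the whole tree so nested class folders are picked up as well.
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() == ".jpg":
            counts[path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    # Run this from the directory that contains train.py.
    try:
        print(count_jpgs("input"))
    except FileNotFoundError as err:
        print(err)
```

If the printed dictionary is empty or the folder is reported missing, the archive was probably extracted to a different location.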

2. Installing the dependencies

To run the code in this repository, a few frameworks need to be installed on your machine. Make sure you have enough disk space and a stable internet connection.

Note: torch_xla can have issues when running on Kaggle or Colab notebooks. Take a look at this Kaggle discussion.

Run the command below to install the torch_xla dependencies (the wheel it points to is built for Python 3.7 on Linux x86_64):

$ pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp37-cp37m-linux_x86_64.whl

Run the command below to install the remaining required dependencies:

$ pip install -r requirements.txt
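Once both commands finish, a quick check that the key packages can be found may save a failed training run later. This is a minimal sketch, not something shipped in this repository: missing_packages is a hypothetical helper, and the list of names checked in the usage part is an assumption (the authoritative list is whatever requirements.txt contains).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package names; adjust them to match requirements.txt.
    missing = missing_packages(["torch", "torch_xla"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All checked packages were found.")
```

The check only looks the packages up and does not import them, so it does not need a TPU to be attached.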

3. Training the model

If the steps above went well, running train.py should not produce any errors. Open a terminal, change to the directory that contains train.py, and run it:

$ python train.py

A progress bar should appear within a few seconds of training starting. If you run into any problem, feel free to open an issue; I will be happy to help.

4. Training Result.

If you want to test the model, the weights of vit_base_patch16_224 after training on TPU are available at this link.

The plot below shows the metrics from the GPU run.

Note:

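When training on a TPU, the work is spread across multiple cores, so the scores from the different cores must be merged. Training is launched on all cores with the code below:

import torch
import torch_xla.distributed.xla_multiprocessing as xmp

def _map_fn(rank, flags):
    # Entry point for each spawned process; rank is the process index.
    torch.set_default_tensor_type("torch.FloatTensor")
    a = _run()  # run training on this core

if __name__ == "__main__":
    FLAGS = {}
    # Start one process per TPU core (8 cores).
    xmp.spawn(_map_fn, args=(FLAGS,), nprocs=8, start_method="fork")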

In the code above, _run() is where each core's results are returned, and _map_fn() is the per-process entry point that xmp.spawn() calls once on each of the 8 TPU cores, passing the process index as rank.
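To make the merging step concrete, here is a minimal sketch of a reduce function that averages one metric across cores. It is an illustration under assumptions, not the code used in this repository: merge_scores is a hypothetical helper, the per-core values are made up, and averaging is only one possible way to combine the scores. The comment at the end shows how such a function is typically passed to xm.mesh_reduce from torch_xla.core.xla_model; that call is left commented out so the sketch runs without a TPU.

```python
def merge_scores(per_core_scores):
    """Average a collection of per-core metric values into a single number."""
    scores = list(per_core_scores)
    if not scores:
        raise ValueError("no scores to merge")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up accuracies standing in for the values from 8 TPU cores.
    fake_scores = [0.90, 0.92, 0.91, 0.89, 0.93, 0.90, 0.91, 0.92]
    print(f"merged accuracy: {merge_scores(fake_scores):.4f}")

    # On a TPU, inside the per-core training function, the same reduce
    # function could be used like this (requires torch_xla):
    #   import torch_xla.core.xla_model as xm
    #   merged = xm.mesh_reduce("val_accuracy", local_accuracy, merge_scores)
```

A plain mean weights every core equally, which is reasonable when each core sees the same number of samples; with uneven shards a weighted average would be the safer choice.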

For some useful tips for running PyTorch on TPU, I would recommend you go through this discussion on Kaggle. 👇 https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification/discussion/159723

If you wish to make any enhancements or changes to the existing code, you are welcome to do so by opening a pull request.

nikhil-xb/PyTorch-Trainer-Baseline
