You can run notebook cells in several ways: with the icons on the notebook toolbar and cell toolbars, with the commands of the code cell context menu (right-click the code cell to open it), or with the Run commands of the main menu. Use the following shortcuts to run code cells quickly:

Ctrl+Enter: Runs the current cell.

Shift+Enter: Runs the current cell and selects the cell below it.

First, let's install the packages required for the project. We are using PyTorch Lightning version 1.5.10.

In [1]:
%%capture
!pip install boto3
!pip install ads
!pip install awscli
!pip install pytorch-lightning==1.5.10
!pip install ipython[notebook]
!pip install matplotlib
!pip install seaborn
!pip install tensorboard pandas
!pip install scikit-learn
!pip install torch torchvision
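
Because the rest of the notebook depends on a pinned PyTorch Lightning release, it can help to confirm what pip actually installed. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the packages this notebook pins or relies on
for name in ("pytorch-lightning", "torch", "torchvision", "boto3"):
    print(name, "->", installed_version(name))
```

A `None` in the output would mean the corresponding `pip install` line did not succeed in the current kernel.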

Let's import the libraries needed for the project. The TensorBoard logger is used to keep a log of the model metrics.

In [2]:
import os
import torch
from pytorch_lightning import LightningModule, Trainer
import pytorch_lightning as pl
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy, F1Score, Precision, Recall, PrecisionRecallCurve
import torchvision
from torchvision import transforms
from torchvision.datasets import MNIST
import numpy as np
import matplotlib.pyplot as plt
from pytorch_lightning.loggers import TensorBoardLogger
from packaging import version
from statistics import mean
import pandas as pd
import tensorboard as tb
import json
from tensorboard.plugins.hparams.plugin_data_pb2 import HParamsPluginData
from datetime import datetime
import pickle
import re

# Define the path for the dataset to store
PATH_DATASETS = os.environ.get("PATH_DATASETS", "./data")

# Check for the GPU if available
AVAIL_GPUS = min(1, torch.cuda.device_count())

# Folder for storing TensorBoard logs and checkpoints
logger = TensorBoardLogger("tb_logs", name="my_model_tensorboard")

Let's print the PyTorch and PyTorch Lightning versions. We are using PyTorch Lightning version 1.5.10.

In [3]:
print("torch version:", torch.__version__)
print("pytorch lightning version:", pl.__version__)

torch version: 1.12.1+cu113
pytorch lightning version: 1.5.10


OCI Object Storage provides an Amazon S3 compatible API, so you can work with it through Boto3, Amazon's Python SDK. To use Boto3, you must first import it and indicate which service or services you're going to use.

In [4]:
import boto3

s3 = boto3.resource(
 's3',
 region_name="REGION_IDENTIFIER",
 aws_secret_access_key="YOUR_OCI_SECRET_KEY",
 aws_access_key_id="YOUR_BUCKETS_OCI_ID",
 endpoint_url="https://YOUR_OBJECT_STORAGE_NAMESPACE.compat.objectstorage.REGION_IDENTIFIER.oraclecloud.com"
)
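
The `endpoint_url` above follows a fixed pattern built from the Object Storage namespace and the region identifier. As a small sketch of that pattern, the hypothetical helper `oci_s3_endpoint` below assembles the URL; the namespace and region values are placeholders chosen only for illustration.

```python
def oci_s3_endpoint(namespace, region):
    """Build the S3-compatible endpoint URL for an Object Storage namespace and region."""
    return f"https://{namespace}.compat.objectstorage.{region}.oraclecloud.com"


# Placeholder values, used only to show the resulting URL shape
print(oci_s3_endpoint("mynamespace", "us-ashburn-1"))
# https://mynamespace.compat.objectstorage.us-ashburn-1.oraclecloud.com
```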


# Here each AI engineer has to define the required parameters for the project.

# Write the version number for the project
VERSION_NO = "v1"

# Define the project name here
PROJECT_NAME = "MNIST"

# Define Object Storage Bucket name here
BUCKET_NAME = "ds-1"

# Define the maximum number of epochs
MAX_EPOCHS = 5

# Write the name of the algorithm that you are using for this project
# (the model defined below is built from fully connected layers only)
ALGO_NAME = "Multilayer Perceptron"

#Define a batch size
BATCH_SIZE = 256 if AVAIL_GPUS else 64

# Folder for storing TensorBoard logs and checkpoints
logger = TensorBoardLogger("tb_logs", name="my_model_tensorboard")


### Note what the following built-in functions are doing:

1. [prepare_data()](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#prepare-data) 💾
    - This is where we can download the dataset. We point to our desired dataset and ask torchvision's `MNIST` dataset class to download if the dataset isn't found there.

2. [setup(stage)](https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#setup) ⚙️
    - Loads in data from file and prepares PyTorch tensor datasets for each split (train, val, test).
    - Setup expects a 'stage' arg which is used to separate logic for 'fit' and 'test'.
    - If you don't mind loading all your datasets at once, you can set up a condition to allow for both 'fit' related setup and 'test' related setup to run whenever `None` is passed to `stage` (or ignore it altogether and exclude any conditionals).
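
The effect of the `stage` argument can be sketched without Lightning. The class below is a plain-Python stand-in (the name `StageAwareData` is hypothetical) that only records which splits would be prepared for each stage value.

```python
class StageAwareData:
    """Plain-Python stand-in that mimics how setup(stage) gates dataset creation."""

    def __init__(self):
        self.loaded = []

    def setup(self, stage=None):
        # 'fit' (or None) prepares the train/val splits
        if stage == "fit" or stage is None:
            self.loaded.append("train/val")
        # 'test' (or None) prepares the test split
        if stage == "test" or stage is None:
            self.loaded.append("test")
        return self.loaded


print(StageAwareData().setup("fit"))   # ['train/val']
print(StageAwareData().setup("test"))  # ['test']
print(StageAwareData().setup())        # ['train/val', 'test']
```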

In [5]:
# Let's initialize the model by creating a class called LitMNIST that inherits from PyTorch Lightning's LightningModule

class LitMNIST(LightningModule):
    def __init__(self, data_dir=PATH_DATASETS, hidden_size=64, learning_rate=2e-5, batch_size=BATCH_SIZE, total_epochs = MAX_EPOCHS, algorithm_name = ALGO_NAME):
        super().__init__()
        # Set our init args as class attributes
        self.data_dir = data_dir
        self.hidden_size = hidden_size
        self.learning_rate = learning_rate
        
        # Save the hyperparameters 
        self.save_hyperparameters()
        
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.dims = (1, 28, 28)
        channels, width, height = self.dims
        self.transform = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )

        # Define PyTorch model
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * width * height, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, self.num_classes),
        )
        self.accuracy = Accuracy()


# The forward method takes the inputs and generates the model's output.
# Inside self.model each hidden linear layer is followed by a ReLU activation; the final output is converted to log-probabilities with log_softmax

    def forward(self, x):
        x = self.model(x)
        return F.log_softmax(x, dim=1)

# Define the training_step method, which takes two inputs: the batch and the index of the batch.
# The per-batch values returned here are written to TensorBoard once per epoch in training_epoch_end
    def training_step(self, batch, batch_idx):
        x, y = batch
        pred=self.forward(x)

        # identifying number of correct predictions in a given batch
        correct=pred.argmax(dim=1).eq(y).sum().item()

        # identifying total number of labels in a given batch
        total=len(y)

        #calculating the loss
        loss = F.nll_loss(pred, y)
        train_loss ={"train_loss": loss.item()}

        f1= F1Score(self.num_classes, threshold=0.5, average='micro')
        f1_score = f1(pred, y)

        pre = Precision(self.num_classes)
        precisionValue = pre(pred, y)

        rec = Recall(self.num_classes)  # not named `re`, which would shadow the imported regex module
        recallValue = rec(pred, y)

        output ={
            "loss": loss,
            "correct": correct,
            "total": total,
            "f1_score": f1_score,
            "precision":precisionValue,
            "recall":recallValue
        }
        return output

    def training_epoch_end(self, outputs):
        # This hook is called at the end of every training epoch
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        avg_precision = torch.stack([x['precision'] for x in outputs]).mean()
        avg_recall = torch.stack([x['recall'] for x in outputs]).mean()
        avg_f1_score = torch.stack([x['f1_score'] for x in outputs]).mean()

        #calculate correct and total predictions
        correct=sum([x["correct"] for  x in outputs])
        total=sum([x["total"] for  x in outputs])


        self.logger.experiment.add_scalar("train_loss", avg_loss, self.current_epoch)
        self.logger.experiment.add_scalar("train_acc", correct/total, self.current_epoch)
        self.logger.experiment.add_scalar("train_recall", avg_recall, self.current_epoch)
        self.logger.experiment.add_scalar("train_precision", avg_precision, self.current_epoch)
        self.logger.experiment.add_scalar("train_f1_score", avg_f1_score, self.current_epoch)

# Now define the validation_step hook to estimate the validation loss during training
    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        val_acc = self.accuracy(preds, y)

        output={'val_loss':loss,
                'val_acc' : val_acc
                }
        return output

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        avg_acc = torch.stack([x['val_acc'] for x in outputs]).mean()

        self.logger.experiment.add_scalar("val_loss", avg_loss, self.current_epoch)
        self.logger.experiment.add_scalar("val_accuracy", avg_acc, self.current_epoch)

        return {'val_loss': avg_loss,
                'val_accuracy': avg_acc}

# In the test step, compute the loss, accuracy, F1 score, precision, and recall for each batch; accuracy is derived from the correct/total counts
    def test_step(self, batch, batch_idx):
        x, y = batch
        pred = self(x)
        correct=pred.argmax(dim=1).eq(y).sum().item()
        total=len(y)*1.0
        test_loss = F.nll_loss(pred, y)
        test_accuracy = correct/total

        f1= F1Score(self.num_classes, threshold=0.5, average='micro')
        f1_score = f1(pred, y)

        pre = Precision(self.num_classes)
        precisionValue = pre(pred, y)

        rec = Recall(self.num_classes)  # not named `re`, which would shadow the imported regex module
        recallValue = rec(pred, y)

        test_output ={
            "test_loss": test_loss,
            "test_f1": f1_score,
            "test_recall": recallValue,
            "test_precision": precisionValue,
            "correct": correct,
            "total": total,
        }

        return test_output

    def test_epoch_end(self, outputs):
        avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
        correct=sum([x["correct"] for  x in outputs])
        total=sum([x["total"] for  x in outputs])

        avg_test_f1_score = torch.stack([x['test_f1'] for x in outputs]).mean()
        avg_test_recall = torch.stack([x['test_recall'] for x in outputs]).mean()
        avg_test_precision = torch.stack([x['test_precision'] for x in outputs]).mean()


        logs = {"test_loss": avg_loss, 
                "test_accuracy": correct/total, 
                "test_f1_score":avg_test_f1_score, 
                "test_recall": avg_test_recall, 
                "test_precision": avg_test_precision}

        self.logger.experiment.add_scalar("test_loss", avg_loss, self.current_epoch)
        self.logger.experiment.add_scalar("test_accuracy", correct/total, self.current_epoch)
        self.logger.experiment.add_scalar("test_f1_score", avg_test_f1_score, self.current_epoch)
        self.logger.experiment.add_scalar("test_recall", avg_test_recall, self.current_epoch)
        self.logger.experiment.add_scalar("test_precision", avg_test_precision, self.current_epoch)

        return {'log': logs, 'progress_bar': logs}


# Use the Adam optimizer with the configured learning rate to minimize the loss
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer

    ####################
    # DATA RELATED HOOKS
    ####################
    
# Download the MNIST dataset provided by the torchvision.datasets library.
    def prepare_data(self):
        # download
        MNIST(self.data_dir, train=True, download=True)
        MNIST(self.data_dir, train=False, download=True)

# Build the datasets for each stage; the dataloaders below serve them for training, validation, and testing
    def setup(self, stage=None):

        # Assign train/val datasets for use in dataloaders
        if stage == "fit" or stage is None:
            mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
            self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])

        # Assign test dataset for use in dataloader(s)
        if stage == "test" or stage is None:
            self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=BATCH_SIZE)

    def val_dataloader(self):
        return DataLoader(self.mnist_val, batch_size=BATCH_SIZE)

    def test_dataloader(self):
        return DataLoader(self.mnist_test, batch_size=BATCH_SIZE)
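
Note that the class aggregates accuracy in two different ways: the training and test hooks divide the total number of correct predictions by the total number of samples, while `validation_epoch_end` averages the per-batch accuracies. The two agree only when every batch has the same size. The sketch below illustrates the difference with made-up counts; none of the numbers come from the recorded run.

```python
from statistics import mean

# Hypothetical (correct, total) counts for three batches; the last batch is smaller
batches = [(60, 64), (58, 64), (4, 8)]

# Pooled accuracy: what training_epoch_end and test_epoch_end compute
pooled = sum(c for c, _ in batches) / sum(t for _, t in batches)

# Mean of per-batch accuracies: what validation_epoch_end computes
per_batch = mean(c / t for c, t in batches)

print(f"pooled={pooled:.4f} mean_of_batches={per_batch:.4f}")
```

With a small final batch, the mean of per-batch accuracies gives that batch as much weight as a full one, so the two figures can drift apart.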
        

Train the model by creating a Trainer and then calling its fit method.

In [6]:
model = LitMNIST()

# Define the lightning trainer
trainer = Trainer(
    gpus=AVAIL_GPUS,
    max_epochs=MAX_EPOCHS,
    progress_bar_refresh_rate=20, logger=logger,
)
trainer.tune(model)
trainer.fit(model)

  f"Setting `Trainer(progress_bar_refresh_rate={progress_bar_refresh_rate})` is deprecated in v1.5 and"
INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw



INFO:pytorch_lightning.callbacks.model_summary:
  | Name     | Type       | Params
----------------------------------------
0 | model    | Sequential | 55.1 K
1 | accuracy | Accuracy   | 0     
----------------------------------------
55.1 K    Trainable params
0         Non-trainable params
55.1 K    Total params
0.220     Total estimated model params size (MB)


Validation sanity check: 0it [00:00, ?it/s]

Training: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]
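
The 55.1 K figure in the model summary can be checked by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The following is a short sketch of that arithmetic; the size estimate assumes 4-byte (float32) parameters.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Layer sizes taken from LitMNIST: 1*28*28 inputs, hidden_size=64, 10 classes
layers = [(1 * 28 * 28, 64), (64, 64), (64, 10)]
total = sum(linear_params(i, o) for i, o in layers)

print(total)            # 55050
print(total * 4 / 1e6)  # size in MB, assuming float32 parameters
```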

Test the model with the trainer.test() function. It triggers test_dataloader and test_step and calculates the overall accuracy on the test dataset.

In [7]:
trainer.test()

  f"`.{fn}(ckpt_path=None)` was called without a model."
INFO:pytorch_lightning.utilities.distributed:Restoring states from the checkpoint path at tb_logs/my_model_tensorboard/version_0/checkpoints/epoch=4-step=4299.ckpt
INFO:pytorch_lightning.utilities.distributed:Loaded model weights from checkpoint at tb_logs/my_model_tensorboard/version_0/checkpoints/epoch=4-step=4299.ckpt


Testing: 0it [00:00, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{}
--------------------------------------------------------------------------------


[{}]
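
The checkpoint name `epoch=4-step=4299` in the log above is consistent with the data split and batch size: 55,000 training samples in batches of 64 (the log reports that no GPU is available) over five epochs. A quick sketch of the arithmetic, assuming the last partial batch is kept, as `DataLoader` does by default:

```python
import math

train_samples = 55000   # training part of the random_split
batch_size = 64         # BATCH_SIZE when no GPU is available
max_epochs = 5          # MAX_EPOCHS

steps_per_epoch = math.ceil(train_samples / batch_size)
last_step_index = steps_per_epoch * max_epochs - 1  # steps are counted from 0

print(steps_per_epoch, last_step_index)  # 860 4299
```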

In [8]:
# Let's train the model again
trainer.fit(model)

INFO:pytorch_lightning.callbacks.model_summary:
  | Name     | Type       | Params
----------------------------------------
0 | model    | Sequential | 55.1 K
1 | accuracy | Accuracy   | 0     
----------------------------------------
55.1 K    Trainable params
0         Non-trainable params
55.1 K    Total params
0.220     Total estimated model params size (MB)
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")


Validation sanity check: 0it [00:00, ?it/s]

Each version of the experiment is stored in the tb_logs directory. You can list the experiments with the following command:

In [9]:
! ls tb_logs/my_model_tensorboard/

version_0


Use EventAccumulator to load the events and accumulate the values from TensorBoard.


In [10]:
from tensorboard.backend.event_processing import event_accumulator
import os

# creating the log folder to store json log file of each AI model version

os.makedirs("formattedjson", exist_ok=True)
os.makedirs("finallogfile", exist_ok=True)
os.makedirs("model", exist_ok=True)


versions = os.listdir('tb_logs/my_model_tensorboard')
if not versions:
    raise FileNotFoundError("No experiment versions found in tb_logs/my_model_tensorboard")

# Sort on the number in the folder name so that version_10 comes after version_9
versions.sort(key=lambda version_string: int(re.findall(r'\d+', version_string)[0]))
latest_version = versions[-1]

print(latest_version)

# Create the per-version output folders (no error if they already exist)
for folder in (
    f"formattedjson/{latest_version}/train_log",
    f"formattedjson/{latest_version}/test_log",
    f"formattedjson/{latest_version}/metrics",
    f"finallogfile/{latest_version}",
    f"model/{latest_version}",
):
    os.makedirs(folder, exist_ok=True)

version_0
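
The numeric sort key matters once there are ten or more runs, because plain string sorting places `version_10` before `version_2`. A minimal sketch with a made-up directory listing:

```python
import re

# Hypothetical directory listing with more than ten runs
versions = ["version_10", "version_2", "version_0"]

# Plain string sorting puts "version_10" before "version_2"
print(sorted(versions))  # ['version_0', 'version_10', 'version_2']

# Sorting on the first integer in the name gives the chronological order
numeric = sorted(versions, key=lambda name: int(re.findall(r"\d+", name)[0]))
print(numeric)           # ['version_0', 'version_2', 'version_10']
```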


Create the function record_log(latest_version) to extract the log file from each version of the experiment. The scalars_var variable has the list of all scalar parameters of the AI model.

In [11]:
# function to write json log file in individual model version
def record_log(latest_version):
  ea = event_accumulator.EventAccumulator(f'tb_logs/my_model_tensorboard/{latest_version}', size_guidance={event_accumulator.SCALARS: 0})
  ea.Reload()
  
  # collect the names of all logged scalar tags
  eventTags = ea.Tags()
  scalars_var = eventTags["scalars"]
  print(f'scalar: ', scalars_var)



  # exporting the logs file in json file inside formattedjson folder
  for i in scalars_var:
    if "train" in i or "val" in i:
      folder_name = "train_log"
    elif "test" in i:
      folder_name = "test_log"
    else:
      folder_name = "metrics"

    # Replace any "/" in the tag name with "_" so that it is a valid file name
    file_path = f"formattedjson/{latest_version}/{folder_name}/{i.replace('/', '_')}.json"

    # Converting pandas dataframe into json file format
    pd.DataFrame(ea.Scalars(i)).to_json(file_path, orient = 'records')


  # Extract the hyperparameters from the hparams plugin data
  data = ea._plugin_to_tag_to_content["hparams"]["_hparams_/session_start_info"]
  hparam_data = HParamsPluginData.FromString(data).session_start_info.hparams
  hparam_dict = {key: hparam_data[key].ListFields()[0][1] for key in hparam_data.keys()}

  # Store the hyperparameters as the hparams.json file
  with open(f"formattedjson/{latest_version}/metrics/hparams.json", "w") as outfile:
    json.dump(hparam_dict, outfile)
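
The routing rule used by `record_log` can be looked at in isolation. The sketch below reproduces only the folder and file-name decision; the helper name `log_target` is hypothetical, and the tag `loss/epoch` is a made-up example of a name containing a slash.

```python
def log_target(tag):
    """Return (folder, file name) for a TensorBoard scalar tag, as record_log does."""
    if "train" in tag or "val" in tag:
        folder = "train_log"
    elif "test" in tag:
        folder = "test_log"
    else:
        folder = "metrics"
    # '/' would create a subdirectory, so it is replaced in the file name
    return folder, tag.replace("/", "_") + ".json"


print(log_target("val_loss"))       # ('train_log', 'val_loss.json')
print(log_target("test_accuracy"))  # ('test_log', 'test_accuracy.json')
print(log_target("hp_metric"))      # ('metrics', 'hp_metric.json')
print(log_target("loss/epoch"))     # ('metrics', 'loss_epoch.json')
```

Because the check is a substring match, any tag that merely contains `val` or `train` (for example a tag named `interval`) would also be routed to `train_log`.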

Let's combine all the model metrics into a single formatted log file.

In [12]:
# function for combining logs into final single log file
def record_final_log(latest_version):
  res = []
  exclude_dir = set(['metrics'])

  dir_path = f'formattedjson/{latest_version}'

  for parent_path, dirs, filenames in os.walk(dir_path):
    dirs[:] = [d for d in dirs if d not in exclude_dir]
    
    for f in filenames:
      res.append(os.path.join(parent_path, f))

  epochs ={}
  test_epochs = {}

  for metrics_path in res:
    epochs_values = pd.read_json(metrics_path)

    for index, row in epochs_values.iterrows():
    
      # Metric name = file name without directory or extension (works with both / and \ separators on Windows)
      splited_text = os.path.splitext(os.path.basename(metrics_path))[0]

      if "test" in splited_text:
        _epoch_value = test_epochs.get(row['step'], {})

      else:
        _epoch_value = epochs.get(row['step'], {})
#         print(f"epoch value:", _epoch_value)

      _value = _epoch_value.get('epochs_values', None)
      final_value = row['value']
      if _value:
        final_value = mean([final_value, _value])

      _epoch_value.update({splited_text: final_value})
        
      if "test" in splited_text:
        test_epochs.update({row['step']: _epoch_value})
      else:
        epochs.update({row['step']: _epoch_value})

  final_epochs={}
  final_test_epochs = {}

  for i in range(MAX_EPOCHS):
    final_epochs.update({i: epochs[i]})
    
  for index, value in enumerate(test_epochs):
    final_test_epochs.update(test_epochs[value])  

  # Read the hyperparameters back; the with block closes the file afterwards
  with open(f'formattedjson/{latest_version}/metrics/hparams.json') as hparam_file:
    hparams_dict = json.load(hparam_file)
    
  # Date time of project log
  log_date = datetime.now()
  formatted_log_date = log_date.strftime("%m/%d/%Y, %H:%M:%S")

  # Derive the experiment number from the TensorBoard version name (e.g. "version_0" -> 0)
  version_number = latest_version.split("_")
  testver = version_number[-1]
  testno = int(testver)


  final_data = [{
      "exp": {
          "exp_no": 'exp_'+ str(testno),
          "datetime": formatted_log_date,
          "hyperparameters": hparams_dict,
          "test_metrics":final_test_epochs,
          "epochs": final_epochs
      }
      
  }]

  with open(f"finallogfile/{latest_version}/log_exp_{testno}.json", "w") as outfile:
    json.dump(final_data, outfile) 

record_log(latest_version)
record_final_log(latest_version)


scalar:  ['hp_metric', 'val_loss', 'val_accuracy', 'train_loss', 'train_acc', 'train_recall', 'train_precision', 'train_f1_score', 'test_loss', 'test_accuracy', 'test_f1_score', 'test_recall', 'test_precision']
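
The core of `record_final_log` is a regrouping step: the per-metric JSON files are turned into one dictionary per epoch. The following is a simplified sketch of that idea with made-up values; it leaves out the file handling and the separate treatment of test metrics.

```python
# Hypothetical per-metric records, shaped like the JSON files written by record_log
records = {
    "train_loss": [{"step": 0, "value": 0.9}, {"step": 1, "value": 0.5}],
    "val_loss": [{"step": 0, "value": 0.8}, {"step": 1, "value": 0.6}],
}

# Regroup by step (epoch) so each epoch lists all of its metrics together
epochs = {}
for metric_name, rows in records.items():
    for row in rows:
        epochs.setdefault(row["step"], {})[metric_name] = row["value"]

print(epochs)
# {0: {'train_loss': 0.9, 'val_loss': 0.8}, 1: {'train_loss': 0.5, 'val_loss': 0.6}}
```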


Let's upload the log file to the OCI bucket named in BUCKET_NAME by using the upload_file_bucket function.

In [None]:
def upload_file_bucket(latest_version):
    # creating experiment number from latest_version
    version_number = latest_version.split("_")
    testver = version_number[-1]
    testno = int(testver)  
    
    json_path = f"finallogfile/{latest_version}/log_exp_{testno}.json"
    print(json_path)
    
    split_name = json_path.split("/")
    splited_text = split_name[-1]
    print(splited_text)

    # Upload a file to your OCI bucket: the 1st argument is the local file path, the 2nd is the bucket name, and the 3rd is the object name including the version folder
    s3.meta.client.upload_file(json_path, BUCKET_NAME, PROJECT_NAME +'/'+VERSION_NO+'/logs/'+splited_text)
    
upload_file_bucket(latest_version)     
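
The third argument of `upload_file` is the object name, and the slash-separated prefix is what makes the file appear inside project and version folders in the bucket. A small sketch of how that name is composed; the helper `object_key` is hypothetical, and `posixpath` is used so that the separator is always `/`, whatever the local operating system.

```python
import posixpath


def object_key(project, version, category, file_name):
    """Join the parts of an object name with '/', regardless of the local OS."""
    return posixpath.join(project, version, category, file_name)


print(object_key("MNIST", "v1", "logs", "log_exp_0.json"))
# MNIST/v1/logs/log_exp_0.json
```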

Let's save the model as a .pkl file and upload it to the OCI bucket by using the following code snippet.

In [16]:
def upload_model(latest_version):
    
    # Derive the experiment number from latest_version (e.g. "version_0" -> 0)
    testno = int(latest_version.split("_")[-1])

    # Save the model to local disk
    model_path = f'model/{latest_version}/{PROJECT_NAME}_model_{testno}.pkl'
    with open(model_path, 'wb') as model_file:
        pickle.dump(model, model_file)
    print(model_path)

    # The object name in the bucket ends with the file name, without its local directory
    model_file_name = os.path.basename(model_path)
    print(model_file_name)

    # Upload the file to your OCI bucket: the 1st argument is the local file path, the 2nd is the bucket name, and the 3rd is the object name including the version folder
    s3.meta.client.upload_file(model_path, BUCKET_NAME, PROJECT_NAME + '/' + VERSION_NO + '/artifacts/model/' + model_file_name)
    
upload_model(latest_version)    

model/version_0/MNIST_model_0.pkl
MNIST_model_0.pkl
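
Before relying on an uploaded pickle file, it is worth confirming that it loads back correctly. The sketch below shows the round trip with a plain dictionary standing in for the model; a real `LitMNIST` instance would additionally need its class definition to be importable at load time.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model; the values are illustrative only
fake_model = {"hidden_size": 64, "learning_rate": 2e-5}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "MNIST_model_0.pkl")
    with open(model_path, "wb") as model_file:
        pickle.dump(fake_model, model_file)
    # Load the file back to confirm that it round-trips
    with open(model_path, "rb") as model_file:
        restored = pickle.load(model_file)

print(restored == fake_model)  # True
```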


Let's upload the MNIST datasets to the artifacts/datasets folder of the OCI bucket.

In [13]:
# Create the folder to store the zipped datasets (no error if it already exists)
os.makedirs("zipdatasets", exist_ok=True)

Let's zip the datasets stored in the data/MNIST/processed directory as test_datasets.zip and train_datasets.zip.

In [14]:
import zipfile

dataset_dir = "data/MNIST/processed"

print("Zip started.....")
# Open both archives once and close them only after every file has been added;
# closing inside the loop would fail as soon as a second file matches
with zipfile.ZipFile('zipdatasets/test_datasets.zip', 'w') as test_zip_files, \
     zipfile.ZipFile('zipdatasets/train_datasets.zip', 'w') as train_zip_files:
    for root, dirs, files in os.walk(dataset_dir):
        for file in files:
            print(file)
            if "test" in file:
                test_zip_files.write(os.path.join(root, file), compress_type=zipfile.ZIP_DEFLATED)
            if "train" in file:
                train_zip_files.write(os.path.join(root, file), compress_type=zipfile.ZIP_DEFLATED)

print("Zip completed!")


Zip started.....
Zip completed!
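
Since the cell prints every file it adds, an output with no file names between the two messages means that no files were found in the directory. Listing the contents of an archive is a quick way to confirm what was actually zipped. A self-contained sketch that uses a temporary directory and placeholder files (the file names are hypothetical):

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder files standing in for the dataset files
    for name in ("training.pt", "test.pt"):
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("placeholder")

    zip_path = os.path.join(tmp_dir, "train_datasets.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(tmp_dir)):
            if "train" in name and not name.endswith(".zip"):
                # arcname keeps only the file name inside the archive
                archive.write(os.path.join(tmp_dir, name), arcname=name)

    # Reopen the archive and list what it contains
    with zipfile.ZipFile(zip_path) as archive:
        archived_names = archive.namelist()

print(archived_names)  # ['training.pt']
```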


Let's upload the zip files created in the "zipdatasets" directory.

In [1]:
# Upload the zip datasets to the Oracle Cloud bucket under artifacts/datasets
data_zip_path = 'zipdatasets'
for parent_path, dirs, filenames in os.walk(data_zip_path):
    for f in filenames:
        zip_path = os.path.join(parent_path, f)
        s3.meta.client.upload_file(zip_path, BUCKET_NAME, PROJECT_NAME + '/' + VERSION_NO + '/artifacts/datasets/' + f)
print("Zip datasets upload completed")

Zip datasets upload completed
