## Downloading and Zipping the Datasets

---


#### 1) Div2k : https://www.kaggle.com/datasets/joe1995/div2k-dataset
#### 2) PASCAL VOC 2012 : https://www.kaggle.com/datasets/huanghanchina/pascal-voc-2012  

## Imports

In [1]:
import os 
import glob
import random

import PIL
from PIL import Image

import skimage 
from skimage.io import imread, imshow, imsave
from skimage.transform import resize

import numpy as np
import matplotlib.pyplot as plt

## Kaggle Configuration

In [2]:
!pip install kaggle



In [3]:
!mkdir -p ~/.kaggle

In [4]:
!cp kaggle.json ~/.kaggle/

In [5]:
!chmod 600 ~/.kaggle/kaggle.json
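
The shell cells above create the config directory, copy the API token into it, and restrict its permissions. The same steps could be written in Python, roughly as sketched below. This is not part of the original notebook; `install_kaggle_token` and its parameters are hypothetical names used only for illustration.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy a Kaggle API token into the config directory with owner-only access.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    # Default to ~/.kaggle, the directory used by the shell cells.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    # Equivalent of `chmod 600`: read and write for the owner only.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Calling `install_kaggle_token("kaggle.json")` from the directory that holds the uploaded token would then mirror cells 3 to 5.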

## Downloading the PASCAL VOC 2012 Dataset

In [6]:
!kaggle datasets download -d huanghanchina/pascal-voc-2012

Downloading pascal-voc-2012.zip to /content
100% 3.62G/3.63G [00:25<00:00, 158MB/s]
100% 3.63G/3.63G [00:25<00:00, 152MB/s]


### Extracting the PASCAL VOC 2012 Dataset

In [7]:
!mkdir pascal_voc_2012

In [None]:
!unzip pascal-voc-2012.zip -d pascal_voc_2012
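
If `unzip` is not available, the extraction can be done with Python's standard `zipfile` module instead. The following is a minimal sketch under that assumption; `extract_archive` is a hypothetical helper name, not something the notebook defines. The same call would apply to the Div2k archive.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member count.

    Hypothetical helper, shown only as an alternative to `!unzip ... -d ...`.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # same role as `mkdir` before `unzip -d`
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return len(archive.namelist())
```

For example, `extract_archive("pascal-voc-2012.zip", "pascal_voc_2012")` would correspond to cells 7 and 8.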

## Downloading the Div2k Dataset

In [9]:
!kaggle datasets download -d joe1995/div2k-dataset

Downloading div2k-dataset.zip to /content
100% 3.70G/3.71G [00:29<00:00, 169MB/s]
100% 3.71G/3.71G [00:29<00:00, 136MB/s]


### Extracting the Div2k Dataset

In [10]:
!mkdir div2k

In [None]:
!unzip div2k-dataset.zip -d div2k

## Creating the Dataset for Colorization 

In [12]:
!mkdir colorDataset

### Selecting the Images from Div2k

In [13]:
!mkdir div2k_selected

In [14]:
fileNames = glob.glob('/content/div2k/DIV2K_train_HR/DIV2K_train_HR/*.png')

In [15]:
fileNames += glob.glob('/content/div2k/DIV2K_valid_HR/DIV2K_valid_HR/*.png')

In [16]:
len(fileNames)

900

In [None]:
for files in fileNames:
    print(files)

In [18]:
imgSize = (256,256)
targetDir = '/content/div2k_selected/'

In [19]:
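for files in fileNames:

    # getting the file name without directory or extension
    fileNo = files.split('/')[-1].split('.')[0]
    newFileName = 'div_' + fileNo + '.jpg'

    # reading the image; converting to RGB because JPEG cannot store
    # an alpha channel or a palette
    img = Image.open(files).convert('RGB')

    # resizing the image to imgSize (the aspect ratio is not preserved)
    img = img.resize(imgSize)

    # saving the image in the target dir
    fileName = targetDir + newFileName
    img.save(fileName)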
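
The loop above builds each output name by splitting the path on `/` and then on `.`. A possible alternative, sketched here with `pathlib`, does the same job without manual string splitting. `selected_name` is a hypothetical helper, and `PurePosixPath` is used on the assumption that paths are POSIX-style, as they are on Colab. One difference to be aware of: for a name with several dots, such as `a.b.png`, `.stem` keeps `a.b`, whereas `split('.')[0]` keeps only `a`.

```python
from pathlib import PurePosixPath


def selected_name(source_path, prefix, target_dir):
    """Build '<target_dir>/<prefix><stem>.jpg' from a source image path.

    Hypothetical helper; the name and parameters are illustrative assumptions.
    """
    stem = PurePosixPath(source_path).stem  # file name without directory or extension
    return str(PurePosixPath(target_dir) / f"{prefix}{stem}.jpg")


# Usage with the notebook's directory layout:
print(selected_name("/content/div2k/DIV2K_train_HR/DIV2K_train_HR/0001.png",
                    "div_", "/content/div2k_selected/"))
# prints /content/div2k_selected/div_0001.jpg
```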

### Selecting the Images from PASCAL VOC 2012

In [20]:
!mkdir pascal2012_selected

In [21]:
_2008 = glob.glob('/content/pascal_voc_2012/VOC2012/JPEGImages/2008_*.jpg')
_2009 = glob.glob('/content/pascal_voc_2012/VOC2012/JPEGImages/2009_*.jpg')
_2010 = glob.glob('/content/pascal_voc_2012/VOC2012/JPEGImages/2010_*.jpg')
_2011 = glob.glob('/content/pascal_voc_2012/VOC2012/JPEGImages/2011_*.jpg')
_2012 = glob.glob('/content/pascal_voc_2012/VOC2012/JPEGImages/2012_*.jpg')

In [22]:
allDataYear = [_2008,_2009,_2010,_2011,_2012]
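
The next cell draws a fixed number of files from each year with `random.sample`, which raises `ValueError` when a list holds fewer items than requested and gives a different selection on every run. A hedged sketch of a more defensive variant follows; `sample_up_to` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random


def sample_up_to(paths, k, seed=None):
    """Return k randomly chosen paths, or all of them (shuffled) if fewer than k exist.

    Hypothetical helper; passing a seed makes the selection repeatable.
    """
    rng = random.Random(seed)  # a private generator leaves the global state untouched
    k = min(k, len(paths))     # random.sample raises ValueError when k > len(paths)
    # Sort first so the result depends only on the seed, not on glob's listing order.
    return rng.sample(sorted(paths), k)
```

With this helper, `sample_up_to(data, 1000, seed=0)` could stand in for `random.sample(data, 1000)` when reproducibility matters.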

In [23]:
for data in allDataYear:
    # selecting 1000 random images from each year
    randomFiles = random.sample(data, 1000)

    for files in randomFiles:
        #  getting the fileName
        fileNo = files.split('/')[-1].split('.')[0]
        newFileName = 'pascal_' + fileNo + '.jpg'
        
        # reading the image
        img = Image.open(files)

        # resizing the image
        img = img.resize(imgSize)
        
        # saving the image in the target dir
        fileName = '/content/pascal2012_selected/' + newFileName

        img.save(fileName)
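
Calling `img.resize(imgSize)` directly stretches non-square images to 256×256. If that distortion is unwanted, one option is to crop the largest centered square first and resize afterwards. The sketch below only computes the crop box; `center_square_box` is a hypothetical helper, and this is an alternative to the notebook's approach, not a description of it.

```python
def center_square_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square.

    Hypothetical helper; the tuple layout matches what PIL's Image.crop expects.
    """
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# Example: a 500x375 landscape image loses 62 or 63 columns on each side.
print(center_square_box(500, 375))  # prints (62, 0, 437, 375)
```

Inside the loop this could be applied as `img.crop(center_square_box(*img.size)).resize(imgSize)`, since `img.size` is `(width, height)`.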

## Making the Dataset for Colorization

In [24]:
targetFolder = '/content/colorDataset/'
source_div2k = '/content/div2k_selected/*.jpg'
source_pascal = '/content/pascal2012_selected/*.jpg'

In [25]:
print(f'Images in Div2k : {len(glob.glob(source_div2k))}')
print(f'Images in Pascal 2012 : {len(glob.glob(source_pascal))}')

Images in Div2k : 900
Images in Pascal 2012 : 5000


In [26]:
count = 0 
for imgFile in glob.glob(source_div2k) + random.sample(glob.glob(source_pascal), len(glob.glob(source_pascal))):

    # new file name for the images in the target dir
    fileName = targetFolder + str(count) + '.jpg' 
    count += 1

    # reading the images 
    img = Image.open(imgFile)
    # saving the image in the target dir
    img.save(fileName)
    if count % 100 == 0:
        print(f'Moved : {count} Files')    
print('Success')

Moved : 100 Files
Moved : 200 Files
Moved : 300 Files
Moved : 400 Files
Moved : 500 Files
Moved : 600 Files
Moved : 700 Files
Moved : 800 Files
Moved : 900 Files
Moved : 1000 Files
Moved : 1100 Files
Moved : 1200 Files
Moved : 1300 Files
Moved : 1400 Files
Moved : 1500 Files
Moved : 1600 Files
Moved : 1700 Files
Moved : 1800 Files
Moved : 1900 Files
Moved : 2000 Files
Moved : 2100 Files
Moved : 2200 Files
Moved : 2300 Files
Moved : 2400 Files
Moved : 2500 Files
Moved : 2600 Files
Moved : 2700 Files
Moved : 2800 Files
Moved : 2900 Files
Moved : 3000 Files
Moved : 3100 Files
Moved : 3200 Files
Moved : 3300 Files
Moved : 3400 Files
Moved : 3500 Files
Moved : 3600 Files
Moved : 3700 Files
Moved : 3800 Files
Moved : 3900 Files
Moved : 4000 Files
Moved : 4100 Files
Moved : 4200 Files
Moved : 4300 Files
Moved : 4400 Files
Moved : 4500 Files
Moved : 4600 Files
Moved : 4700 Files
Moved : 4800 Files
Moved : 4900 Files
Moved : 5000 Files
Moved : 5100 Files
Moved : 5200 Files
Moved : 5300 Files
...
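
The merge cell opens every image with PIL and saves it again, which re-encodes JPEG data that was already compressed once in the selection step. Because the selected files are already resized JPEGs, a plain file copy would give the same numbering without a second round of compression. A minimal sketch, assuming that byte-identical copies are acceptable; `copy_numbered` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_numbered(sources, target_dir):
    """Copy each source file to '<target_dir>/<index>.jpg' and return the count.

    Hypothetical helper; indices start at 0, matching the notebook's naming.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    for count, src in enumerate(sources, start=1):
        # Byte-for-byte copy: the JPEG data is not decoded and re-encoded.
        shutil.copyfile(src, target / f"{count - 1}.jpg")
    return count
```

The list passed as `sources` can be the same concatenation of Div2k and shuffled PASCAL paths that the cell iterates over.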

## Making the Zip file of ColorDataset

In [None]:
!zip -r /content/ColorDataset.zip /content/colorDataset

In [28]:
# adding the zip file to the drive
!cp /content/ColorDataset.zip /content/drive/MyDrive/Datasets/ColorDataset
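
The archive can also be built from Python with `shutil.make_archive`. The sketch below is one way to do it and is not taken from the notebook; `zip_folder` is a hypothetical helper. It stores entries relative to the folder's parent (for example `colorDataset/0.jpg`), so the layout inside the archive may differ from what the `zip -r` call on an absolute path produces.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_base):
    """Zip `folder` into '<archive_base>.zip' and return the archive path.

    Hypothetical helper; `archive_base` is given without the '.zip' suffix.
    """
    folder = Path(folder)
    # root_dir/base_dir keep only the folder name inside the archive,
    # rather than the full absolute path.
    return shutil.make_archive(
        str(archive_base), "zip",
        root_dir=str(folder.parent), base_dir=folder.name,
    )
```

For the paths used here, the call would be `zip_folder("/content/colorDataset", "/content/ColorDataset")`.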