# Data Processing

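This notebook downloads labeled data from Roboflow using the Roboflow API. Because different users label data in the same Roboflow project, the project has multiple dataset versions; the notebook downloads each of them and combines them into one dataset.

This notebook will install the required packages, download every labeled version of the project, and combine the versions into a single dataset.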

### Install required packages


In [1]:
!pip install -q roboflow

### Download labeled image data from Roboflow

This code downloads each version of the `livestalk` project using the Roboflow API and places it in a folder named after its version number (e.g. `livestalk-1`, `livestalk-2`).
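The download cell passes the API key to `Roboflow(...)` as a string literal. As a hedged alternative, the key could be read from an environment variable so that it does not end up in the saved notebook. The helper name `load_api_key` and the variable name `ROBOFLOW_API_KEY` are assumptions made for this sketch; they are not part of the Roboflow package.

```python
import os


def load_api_key(env_var="ROBOFLOW_API_KEY"):
    """Return the API key stored in an environment variable.

    Both the function name and the default variable name are hypothetical;
    pick whatever fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


# Possible use in the download cell (not executed here):
# rf = Roboflow(api_key=load_api_key())
```

With this in place, the key is set once in the shell or in the notebook server's environment, and the notebook itself can be shared without exposing it.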

In [2]:
from roboflow import Roboflow
import os

rf = Roboflow(api_key="NsFcZFTbV5oOF4pZS7MY")
project_name = 'livestalk'
num_versions = 3  # increase as new versions are labeled
project = rf.workspace().project(project_name)

def get_version(project, num):
    """Download one dataset version in YOLOv5 format and print its image counts."""
    project.version(num).download("yolov5")

    # Each version is extracted into <project_name>-<num>/
    train_dir = f'{project_name}-{num}/train/images'
    valid_dir = f'{project_name}-{num}/valid/images'

    train_count = len(os.listdir(train_dir))
    valid_count = len(os.listdir(valid_dir))

    print(f'Version {num} extraction complete: \n- {train_count:,} training records \n- {valid_count:,} validation records')

# Download every version from 1 through num_versions
for num in range(1, num_versions + 1):
    get_version(project, num)

loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in livestalk-1 to yolov5pytorch: 100% [24533636 / 24533636] bytes


Extracting Dataset Version Zip to livestalk-1 in yolov5pytorch:: 100%|█| 1381/13

Version 1 extraction complete: 
- 549 training records 
- 137 validation records





Downloading Dataset Version Zip in livestalk-2 to yolov5pytorch: 100% [26407695 / 26407695] bytes


Extracting Dataset Version Zip to livestalk-2 in yolov5pytorch:: 100%|█| 1487/14

Version 2 extraction complete: 
- 591 training records 
- 148 validation records





Downloading Dataset Version Zip in livestalk-3 to yolov5pytorch: 100% [26434028 / 26434028] bytes


Extracting Dataset Version Zip to livestalk-3 in yolov5pytorch:: 100%|█| 1487/14

Version 3 extraction complete: 
- 591 training records 
- 148 validation records
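
The counts above only cover the `images` folders. Since each version also contains `labels` folders, a quick consistency check can confirm that images and label files line up before the versions are combined. The following is a minimal sketch, assuming that every image has a `.txt` label file with the same base name and that images use one of the listed extensions; the function names `check_split` and `check_version` are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def check_split(split_dir):
    """Return (images without labels, labels without images) for one split."""
    split = Path(split_dir)
    image_stems = {
        p.stem for p in (split / "images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {
        p.stem for p in (split / "labels").iterdir()
        if p.suffix.lower() == ".txt"
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


def check_version(version_dir, splits=("train", "valid")):
    """Run check_split on every split folder that exists in a version folder."""
    report = {}
    for split in splits:
        split_dir = Path(version_dir) / split
        if split_dir.is_dir():
            report[split] = check_split(split_dir)
    return report
```

Calling `check_version("livestalk-1")` would return a dictionary keyed by split; two empty lists for a split mean that every image has a label file and vice versa. An image without a label file is not necessarily an error, since it may simply contain no annotated objects.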





### Combine versions into one

The shell script `combine_data.sh` creates the `livestalk-data` directory and its subfolders, moves the contents of every version's `images` and `labels` folders into it, and creates a new `data.yaml` file. There is currently only one class: `cow`.

**NOTE:** As more versions are added, `combine_data.sh` needs to be modified to include them.

In [3]:
!chmod u+x combine_data.sh
! ./combine_data.sh

new directories created
moving data from livestalk-1 to livestalk-data
moving data from livestalk-2 to livestalk-data
moving data from livestalk-3 to livestalk-data
all versions combined into livestalk-data
There are 643 files in the consolidated training dir
There are 183 files in the consolidated validation dir
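
The contents of `combine_data.sh` are not shown in this notebook. For readers who would rather stay in Python, the steps described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: it copies files instead of moving them, the function name `combine_versions` is hypothetical, and the `data.yaml` layout (`train`, `val`, `nc`, `names`) follows the usual YOLOv5 convention rather than anything confirmed by the script.

```python
import shutil
from pathlib import Path


def combine_versions(version_dirs, dest_dir, class_names=("cow",)):
    """Copy images and labels from several version folders into one dataset."""
    dest = Path(dest_dir)
    for split in ("train", "valid"):
        for kind in ("images", "labels"):
            target = dest / split / kind
            target.mkdir(parents=True, exist_ok=True)
            for version in version_dirs:
                source = Path(version) / split / kind
                if not source.is_dir():
                    continue  # skip versions that lack this folder
                for file in source.iterdir():
                    if file.is_file():
                        # A file with the same name as an earlier one overwrites it
                        shutil.copy2(file, target / file.name)

    # Write a data.yaml in the usual YOLOv5 layout (assumed, see lead-in)
    names = list(class_names)
    yaml_text = (
        f"train: {(dest / 'train' / 'images').as_posix()}\n"
        f"val: {(dest / 'valid' / 'images').as_posix()}\n"
        f"nc: {len(names)}\n"
        f"names: {names}\n"
    )
    (dest / "data.yaml").write_text(yaml_text)

    # Report how many image files ended up in each split
    return {
        split: len(list((dest / split / "images").iterdir()))
        for split in ("train", "valid")
    }
```

A call such as `combine_versions(["livestalk-1", "livestalk-2", "livestalk-3"], "livestalk-data")` would return the image counts per split. Because files are matched by name, an image that appears in more than one version is stored only once in the combined folder.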
