# Preprocessing Video Data
This notebook preprocesses the videos by extracting their frames and creating a CSV file that records, for each extracted frame, its source video, file path, and label.

## Importing necessary modules
cv2 : For reading the video files and writing the extracted frames as images (OpenCV). <br>
os : For interacting with the operating system.<br>
pandas : For data manipulation and analysis. <br>
tqdm : For displaying progress bars during iterations.

In [2]:
import cv2
import os
import pandas as pd
from tqdm import tqdm

## Directories and Paths
Paths specifying where the input video files are located, where the extracted frames will be saved, and where the frames information CSV file will be stored.

In [5]:
# Paths to the directories containing harassment and non-harassment videos
harassment_video_dir = r'C:\SUDHARSHAN\Machine Learning\Application-of-Neural-Networks-for-Detection-of-Sexual-Harassment-in-Workspace-main\Application-of-Neural-Networks-for-Detection-of-Sexual-Harassment-in-Workspace-main\Dataset\Harassment'
non_harassment_video_dir = r'C:\SUDHARSHAN\Machine Learning\Application-of-Neural-Networks-for-Detection-of-Sexual-Harassment-in-Workspace-main\Application-of-Neural-Networks-for-Detection-of-Sexual-Harassment-in-Workspace-main\Dataset\Normal'

# Path to the directory where you want to save the frames
output_dir = r'C:\SUDHARSHAN\Machine Learning\preprocessed_imgs'

# Path to save the CSV file
csv_filename = r'C:\SUDHARSHAN\Machine Learning\frames_info.csv'
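
Because `os.listdir` raises `FileNotFoundError` when a directory is missing, it can help to check the input paths before starting a long extraction run. The following is a minimal sketch; the helper name `missing_directories` is hypothetical and not part of the original notebook, and temporary folders stand in for the real dataset directories.

```python
import os
import tempfile

def missing_directories(*dirs):
    """Return the paths in `dirs` that do not exist as directories."""
    return [d for d in dirs if not os.path.isdir(d)]

# Demonstration: temporary folders stand in for the dataset directories.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "Harassment")
    os.makedirs(existing)
    absent = os.path.join(tmp, "Normal")  # deliberately never created
    missing = missing_directories(existing, absent)

print(missing)
```

In the notebook, the same check could be applied as `missing_directories(harassment_video_dir, non_harassment_video_dir)` before the videos are processed.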

## Preprocessing Videos

In [6]:
def process_videos(video_dir, label):
    """Extract every frame of each .mp4 file in video_dir and return one record per frame."""
    video_files = [file for file in os.listdir(video_dir) if file.endswith('.mp4')]
    frames_info = []
    os.makedirs(output_dir, exist_ok=True)  # create the output directory once, before the loop

    for video_file in tqdm(video_files):
        video_name = os.path.splitext(video_file)[0]  # filename without its extension
        video_path = os.path.join(video_dir, video_file)  # full path to the video
        cap = cv2.VideoCapture(video_path)  # open the video file for reading

        frame_count = 0  # number of frames read so far from this video

        while cap.isOpened():  # skipped entirely if the video could not be opened
            ret, frame = cap.read()  # ret is False once no more frames can be read
            if not ret:
                break

            frame_filename = f"{video_name}_frame_{frame_count:04d}.jpg"  # zero-padded frame name
            frame_path = os.path.join(output_dir, frame_filename)  # full path of the output image

            cv2.imwrite(frame_path, frame)  # save the frame as a JPEG image

            # Record the source video, the saved frame path, and the class label
            frames_info.append({'video_file': video_file, 'frame_filename': frame_path, 'label': label})

            frame_count += 1

        cap.release()  # release the capture once all frames of this video are processed

    return frames_info
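
The frame naming used above can be illustrated without any video files. This is a small sketch of the same f-string pattern; `build_frame_path` is a hypothetical helper, and the folder and file names are made up for the example.

```python
import os

def build_frame_path(output_dir, video_file, frame_count):
    """Build the output path for one frame, using the same pattern as process_videos."""
    video_name = os.path.splitext(video_file)[0]  # drop the .mp4 extension
    return os.path.join(output_dir, f"{video_name}_frame_{frame_count:04d}.jpg")

# Hypothetical folder and video names, used only for illustration.
names = [os.path.basename(build_frame_path("frames", "clip01.mp4", i)) for i in (0, 7, 123)]
print(names)
```

The `:04d` format pads the counter to at least four digits, so file names sort in frame order for videos with fewer than 10,000 frames. Longer videos still get valid names, but their names no longer sort in frame order.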

The function is called separately for the harassment and non-harassment videos.<br>
The label parameter is set to 1 for harassment and 0 for non-harassment.

In [None]:
# Process harassment videos and non-harassment videos separately
harassment_frames = process_videos(harassment_video_dir, label=1)
non_harassment_frames = process_videos(non_harassment_video_dir, label=0) 

## Creating a list of frames information
The extracted frame information from both the categories is combined into a single list.

In [None]:
# Combine the frames information from both categories
frames_info = harassment_frames + non_harassment_frames

## Creating DataFrame and saving CSV
A DataFrame is created from the 'frames_info' list.<br>
For each extracted frame, it contains the video filename, the path of the frame image, and the class label.

In [7]:
# Create a DataFrame from frames_info list
frames_df = pd.DataFrame(frames_info)

# Save the DataFrame to a CSV file
frames_df.to_csv(csv_filename, index=False)

print("Frames extraction and CSV creation completed.")

100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [08:12<00:00,  9.84s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [09:44<00:00, 11.70s/it]

Frames extraction and CSV creation completed.
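
As a quick sanity check on the result, the number of frames per label can be counted from the CSV. The sketch below uses the standard-library `csv` module and a small in-memory sample with made-up rows, so it runs without the dataset. The column names match the keys stored in `frames_info`, and `count_frames_per_label` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up sample rows with the same columns as the generated CSV file.
sample_csv = io.StringIO(
    "video_file,frame_filename,label\n"
    "a.mp4,out/a_frame_0000.jpg,1\n"
    "a.mp4,out/a_frame_0001.jpg,1\n"
    "b.mp4,out/b_frame_0000.jpg,0\n"
)

def count_frames_per_label(csv_file):
    """Count the rows for each label in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["label"]) for row in reader)

counts = count_frames_per_label(sample_csv)
print(counts)
```

To run the check on the real file, it can be opened with `open(csv_filename, newline="")` and passed to the same function. A large imbalance between the two counts would be worth knowing about before training.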



