## Image Extraction

In [1]:
import numpy as np 
import pandas as pd 
import os 

Image Extraction: Procedure

+ Loading the mobile phone image dataset
+ Image Analysis: identify the distribution and the graphical representation of an image
+ Downloading the images
+ Storing the images in separate folders: train, test and validation
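
The last step of the procedure (storing the images in train, test and validation folders) could be prepared with a split like the one sketched below. This is only an illustration: the helper name `assign_splits`, the 70/15/15 ratios and the seed are assumptions, not values taken from this notebook.

```python
import random


def assign_splits(filenames, ratios=(0.7, 0.15, 0.15), seed=42):
    """Assign each filename to 'train', 'test' or 'validation'.

    Hypothetical helper: the ratios and the seed are illustrative assumptions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Sort first so the shuffle is reproducible regardless of input order
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * ratios[0])
    n_test = round(len(shuffled) * ratios[1])

    # Whatever is left after train and test goes to validation
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "validation": shuffled[n_train + n_test:],
    }


# Placeholder filenames, used only to show the resulting split sizes
files = [f"img_{i}.jpg" for i in range(20)]
splits = assign_splits(files)
print({name: len(items) for name, items in splits.items()})
```

Each list could then be copied into its own folder, for example with `shutil.copy`, once the images have been downloaded.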

### Loading mobile image dataset

1. Select a folder

In [2]:
# Implement function for selecting a folder
def select_folder(sel_folder: str) -> str:
    # Initialise project path and source folder name
    proj_path = "C:\\Development\\Projects\\MachineLearning\\Mobile-Image_Classifier-System"
    source_folder_name = "src"

    # Create source folder variable
    source_folder = os.path.join(proj_path, source_folder_name)

    # Build the path of the requested folder.
    # (The previous loop over os.listdir() only checked entries that already
    # exist, so it never created anything and has been dropped.)
    selected_folder_path = os.path.join(source_folder, sel_folder)

    # Raise instead of implicitly returning None when the folder is missing
    if not os.path.isdir(selected_folder_path):
        raise FileNotFoundError(f"Folder not found: {selected_folder_path}")

    return selected_folder_path


2. Select a CSV file from a folder

In [3]:
# Select a CSV file from a folder
def select_csv_file(filename: str, folder: str) -> str:
    # Select a folder
    sel_folder = select_folder(folder)

    # Build the path of the CSV file
    csv_filename = f"{filename}.csv"
    filename_path = os.path.join(sel_folder, csv_filename)

    # Raise instead of implicitly returning None when the file is missing
    if not os.path.isfile(filename_path):
        raise FileNotFoundError(f"CSV file not found: {filename_path}")

    return filename_path

In [6]:
# Load the dataset
csv_file = select_csv_file(filename="mobiles", folder="data")
data = pd.read_csv(csv_file, index_col=0)
data.head(1)

# Rename the columns
# data.columns

Unnamed: 0,Names,Image_Links,Stars,Rating&Reviews,Price_Details,Memory,Camara_Info,Display,Battery,Processor,Warranty
0,"SAMSUNG Galaxy F13 (Waterfall Blue, 64 GB)",https://rukminim1.flixcart.com/image/312/312/x...,4.4,"1,20,759 Ratings & 7,003 Reviews","₹9,699\n₹14,99935% off",4 GB RAM | 64 GB ROM | Expandable Upto 1 TB,50MP + 5MP + 2MP | 8MP Front Camera,16.76 cm (6.6 inch) Full HD+ Display,6000 mAh Lithium Ion Battery,Exynos 850 Processor,1 Year Warranty Provided By the Manufacturer f...


## Image Analysis 

+ Identify the image distribution: number of images per mobile phone brand class
+ Detect irrelevant images and null values
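
The number of images per brand could be estimated as sketched below. It assumes that the first word of the `Names` column is the brand (as in "SAMSUNG Galaxy F13 ..."), which may not hold for every row. The helper name `brand_counts` and the `BrandX` sample rows are made up for illustration.

```python
import pandas as pd


def brand_counts(names: pd.Series) -> pd.Series:
    """Count rows per brand.

    Assumption: the brand is the first whitespace-separated word of the name.
    """
    # Drop missing names, take the first word, and normalise the case
    brands = names.dropna().str.split().str[0].str.upper()
    return brands.value_counts()


# Small sample: the first name comes from the dataset preview,
# the "BrandX" rows are placeholders
sample = pd.Series([
    "SAMSUNG Galaxy F13 (Waterfall Blue, 64 GB)",
    "BrandX Phone 1",
    "brandx Phone 2",
    None,
])
print(brand_counts(sample))
```

On the real data this would be called as `brand_counts(data["Names"])`.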

In [7]:
# Extract image URLs and phone names
image_ds = data[["Image_Links", "Names"]]
image_ds.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1148 entries, 0 to 1147
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Image_Links  1143 non-null   object
 1   Names        943 non-null    object
dtypes: object(2)
memory usage: 26.9+ KB


In [None]:
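# Rows with a missing image link
# Note: `["Image_Links"] == np.nan` compares a Python list with NaN and
# evaluates to False, which caused "KeyError: False". NaN also never compares
# equal to itself, so missing values are detected with isna() instead.
image_feature = image_ds[["Image_Links"]]
detect_nulls = image_feature["Image_Links"].isna()
image_with_nulls = image_feature[detect_nulls]
image_with_nulls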

### Insight of the Image Analysis

The image dataset contains 1148 rows in total, made up of usable image data and irrelevant entries (missing values). Based on the `info()` output above, the image data is distributed as follows:

+ Image links: 1143 non-null values and 5 null values
+ Names: 943 non-null values and 205 null values
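
The counts in this overview can be cross-checked with a per-column summary like the sketch below. The helper name `null_summary` and the toy frame are assumptions for illustration. On the real data it would be called as `null_summary(image_ds)`.

```python
import numpy as np
import pandas as pd


def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return non-null, null and total row counts for every column."""
    summary = pd.DataFrame({
        "non_null": df.notna().sum(),
        "null": df.isna().sum(),
    })
    summary["total"] = len(df)
    return summary


# Toy frame with placeholder values, only to show the shape of the result
toy = pd.DataFrame({
    "Image_Links": ["a", np.nan, "c"],
    "Names": [None, None, "x"],
})
print(null_summary(toy))
```

For each column, `non_null` plus `null` should equal `total`, which makes mismatches between the prose and the `info()` output easy to spot.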