# Introduction to Images
### **1- Image:** 
In Python OpenCV, an image is a NumPy array (ndarray) of pixel values: a 3-dimensional array for a color image, or a 2-dimensional array for a grayscale image.

Specifically:

- The array has shape (height, width, channels), where:
    - height is the number of rows (pixels) in the image.
    - width is the number of columns (pixels) in the image.
    - channels is the number of color channels in the image (typically 3 for color images, 4 for images with an alpha channel). Note that OpenCV stores the color channels in BGR order, not RGB.
- Each pixel is represented by a tuple of values (e.g., (B, G, R) for a color image loaded by OpenCV), where each value ranges from 0 to 255 (inclusive) for the default 8-bit type (uint8).

For example, a 640x480 color image would be represented as a NumPy array with shape (480, 640, 3), where each pixel is a tuple of three values (B, G, R).

In OpenCV, images can be read, manipulated, and displayed using various functions, such as cv2.imread(), cv2.imshow(), and cv2.imwrite().
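
To make the array layout concrete, here is a minimal sketch that builds a synthetic image with NumPy instead of reading a file. The size and the pixel value are arbitrary choices for illustration.

```python
import numpy as np

# A blank 640x480 color image: shape is (height, width, channels)
img = np.zeros((480, 640, 3), dtype=np.uint8)

# Set the pixel at row 10, column 20; OpenCV's channel order is (B, G, R)
img[10, 20] = (255, 0, 0)   # pure blue in BGR order

print(img.shape)    # (480, 640, 3)
print(img.dtype)    # uint8
print(img[10, 20])  # [255   0   0]
```

Note that the shape lists the height first, while a size such as "640x480" lists the width first.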

### **2- Types of Images:** 
Here are the common types of images:

1. RGB (Red, Green, Blue): Color images with 3 channels (R, G, B).
2. RGBA (Red, Green, Blue, Alpha): Color images with 4 channels (R, G, B, A), including an alpha channel for transparency.
3. Grayscale: Images with 1 channel that holds shades of gray (intensity).
4. Binary: Black and white images with 1 channel, where each pixel takes one of two values (0 or 255, or 0 or 1).
5. Indexed Color: Images with a limited palette of colors, stored as an index into a color table.
6. CMYK (Cyan, Magenta, Yellow, Black): Color images used in printing, with 4 channels (C, M, Y, K).
7. YUV (Luminance and Chrominance): Color images with 3 channels (Y, U, V), used in video encoding.
8. HSV (Hue, Saturation, Value): Color images with 3 channels (H, S, V), representing color in a more intuitive way.
9. Depth Image: Images representing depth information, often used in computer vision and robotics.
10. Hyperspectral Image: Images with a large number of channels, each representing a specific wavelength of light.

These types of images can be used in various applications, such as computer vision, graphics, printing, and more.

### **3- Bit:** 
A bit is a unit of information in computing and digital communications. It represents a single binary digit that can have only two values:

- 0 (zero)
- 1 (one)

A bit is the basic building block of binary code, which is used to represent information in computers. It's the smallest unit of information that can be stored or transmitted.

In terms of image processing, bits are used to represent the color depth or pixel depth of an image. For example:

- 1-bit images are black and white (2 colors)
- 4-bit images have 16 colors
- 8-bit images have 256 colors (typical for grayscale images)
- 24-bit images have 16,777,216 colors (true color images)

The more bits used per pixel, the more distinct colors or shades the image can contain: n bits can represent 2^n different values.
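
The counts in the list above all follow from the same power-of-two rule. A minimal sketch (the function name `values_for_bits` is only an illustrative choice):

```python
def values_for_bits(bits):
    """Number of distinct values that the given number of bits can represent."""
    return 2 ** bits

for bits in (1, 4, 8, 24):
    print(f"{bits}-bit: {values_for_bits(bits):,} values")
```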

### **4- Bit Depth:** 
Bit depth, also known as color depth or pixel depth, refers to the number of bits used to represent the color or intensity of a single pixel in a digital image.

Here are some common bit depths:

- 1-bit: Black and white (2 colors)
- 4-bit: 16 colors
- 8-bit: 256 colors or shades of gray
- 16-bit: 65,536 colors (high color)
- 24-bit: 16,777,216 colors (true color, 8 bits per channel)
- 32-bit: usually 24-bit color plus an 8-bit alpha channel for transparency
- 48-bit: 281,474,976,710,656 colors (deep color, 16 bits per channel)
- 64-bit: usually 48-bit color plus a 16-bit alpha channel

A higher bit depth means more colors and a more detailed representation of the image. It also increases the file size and memory requirements.

In image processing, bit depth is important for:

- Color accuracy
- Gradient smoothness
- Shadow and highlight detail
- Printing and publishing

Note that bit depth is not the same as image resolution, which refers to the number of pixels in the image.
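
Resolution and bit depth together determine how much memory the raw pixel data needs. The sketch below computes the uncompressed size; the helper name and the sample sizes are assumptions for illustration, and real files on disk are usually smaller because of compression.

```python
def uncompressed_size_bytes(width, height, channels, bits_per_channel):
    """Raw (uncompressed) size of the pixel data in bytes, rounded up."""
    total_bits = width * height * channels * bits_per_channel
    return (total_bits + 7) // 8

print(uncompressed_size_bytes(640, 480, 3, 8))    # 921600  (8-bit color)
print(uncompressed_size_bytes(640, 480, 1, 8))    # 307200  (8-bit grayscale)
print(uncompressed_size_bytes(640, 480, 3, 16))   # 1843200 (16 bits per channel)
```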


### **5-Bitonal Image:** 
A bitonal image is a digital image that uses only two colors or shades, typically black and white. It is a binary image where each pixel is either:

- 0 (black)
- 1 (white)

Bitonal images are often used in applications where only two colors are needed, such as:

- Black and white printing
- Fax transmission
- Document scanning
- Barcode recognition
- Optical Character Recognition (OCR)

Bitonal images are usually stored with a 1-bit depth, meaning each pixel is represented by a single bit (0 or 1). This results in a very compact image file size.
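
As a rough illustration of that compactness, the sketch below packs a tiny hand-made bitonal image so that eight pixels share one byte. The pixel pattern is made up for the example, and `np.packbits` is only one possible way to store such data.

```python
import numpy as np

# A tiny 4x8 bitonal image: 0 = black, 1 = white
bitonal = np.array([
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
], dtype=np.uint8)

packed = np.packbits(bitonal, axis=1)      # 8 pixels per byte
restored = np.unpackbits(packed, axis=1)   # back to one value per pixel

print(bitonal.nbytes)   # 32 bytes when every pixel uses a full byte
print(packed.nbytes)    # 4 bytes when every pixel uses a single bit

# Scale to 0/255 before display, otherwise 1 looks almost black
displayable = bitonal * 255
```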

Some common examples of bitonal images include:

- Text documents
- Line art
- Logos
- Icons
- Barcode images

Bitonal images can be displayed on any device that supports binary images, and they are often used in applications where color is not necessary or would increase the file size unnecessarily.

### **6- Gray Image:** 
A gray image, also known as a grayscale image, is a digital image that uses only shades of gray, ranging from black (typically represented as 0) to white (typically represented as 255). Gray images do not have any color information, only intensity values.

In a gray image, each pixel is represented by a single value, usually an integer between 0 and 255, which indicates the brightness or intensity of that pixel. The higher the value, the brighter the pixel.

Gray images are often used in applications where color is not necessary or would increase the file size unnecessarily, such as:

- Medical imaging (e.g., X-rays, CT scans)
- Scientific imaging (e.g., astronomy, microscopy)
- Document scanning
- Fax transmission
- Black and white photography

Gray images can be stored with a variety of bit depths, including:

- 8-bit (256 shades of gray)
- 16-bit (65,536 shades of gray)
- 32-bit (often stored as floating-point values, for example in scientific imaging)

Gray images are useful for:

- Reducing file size
- Enhancing contrast and detail
- Improving image processing performance
- Focusing on texture and pattern recognition

Note that gray images are different from bitonal images, which only have two shades (black and white). Gray images have a range of shades, creating a more nuanced and detailed representation of the image.
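
A color pixel is commonly reduced to a single gray value with a weighted sum of its channels. The sketch below uses the standard luma weights (0.299 R + 0.587 G + 0.114 B), which OpenCV documents for its BGR-to-gray conversion; the function name is illustrative, and the rounding here may differ by one level from `cv.cvtColor`.

```python
import numpy as np

def bgr_to_gray(bgr):
    """Weighted sum of the channels; bgr has shape (height, width, 3)."""
    b = bgr[..., 0].astype(np.float64)
    g = bgr[..., 1].astype(np.float64)
    r = bgr[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.round(gray).astype(np.uint8)

# Three sample pixels in BGR order: white, black, pure red
pixels = np.array([[[255, 255, 255], [0, 0, 0], [0, 0, 255]]], dtype=np.uint8)
gray_pixels = bgr_to_gray(pixels)
print(gray_pixels)
```

Green carries the largest weight because the human eye is most sensitive to it.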

### **7- RGB Image:** 
An RGB image is a digital image that uses the RGB color model to represent the colors of the image. RGB stands for Red, Green, and Blue, which are the three primary colors used to create the image.

In an RGB image, each pixel is represented by three values:

- Red (R)
- Green (G)
- Blue (B)

In an 8-bit image, each value ranges from 0 (minimum intensity) to 255 (maximum intensity). The combination of these three values determines the final color of the pixel.

RGB images can have various bit depths, such as:

- 8 bits per channel (24-bit): 256 x 256 x 256 = 16,777,216 colors
- 16 bits per channel (48-bit): 65,536 x 65,536 x 65,536 = 281,474,976,710,656 colors
- 32 bits per channel (96-bit): 4,294,967,296 x 4,294,967,296 x 4,294,967,296 = 2^96, roughly 7.9 x 10^28 colors

RGB images are commonly used in:

- Digital photography
- Computer graphics
- Web design
- Printing (although CMYK is more common in printing)

The advantages of RGB images include:

- Wide color gamut
- High color accuracy
- Easy to edit and manipulate

However, RGB images may not be suitable for printing, as the color model is different from the CMYK model used in printing.
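
One practical detail when working with OpenCV: it keeps the three channels in BGR order, while most other tools expect RGB. A minimal NumPy sketch of the conversion, using a single made-up pixel:

```python
import numpy as np

# One pixel of pure red, written in RGB order
rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps R and B, giving OpenCV's BGR order
bgr = rgb[..., ::-1]
print(bgr[0, 0])   # [  0   0 255]
```

With OpenCV itself, `cv.cvtColor(img, cv.COLOR_BGR2RGB)` performs the same reordering.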

### **8- Tones:** 
Tones refer to the different shades or hues of a color. In the context of imaging and color theory, tones can refer to:

1. Color tones: Different shades of a color, such as light blue, sky blue, and navy blue.
2. Grayscale tones: Different shades of gray, ranging from black to white.
3. Skin tones: The range of colors that represent human skin, from pale to dark.
4. Sepia tones: A warm, brownish-gray color, often used in vintage photography.
5. Pastel tones: Soft, pale colors, often used in design and art.

Tones can be used to:

1. Add depth and dimension to an image
2. Create mood and atmosphere
3. Enhance contrast and visual interest
4. Represent different textures and materials
5. Create a specific aesthetic or style

In image editing and color grading, tones can be adjusted using various tools and techniques, such as:

1. Levels and curves
2. Color balance and grading
3. Exposure and contrast adjustments
4. Color filters and overlays
5. Toning and tinting tools

In [None]:
# pip install opencv-python 

1- Reading an Image and Displaying

In [2]:
# reading the image 

# import libraries
import numpy as np 
import cv2 as cv

# read image
img=cv.imread("Capture.PNG") 

In [6]:
# display image
cv.imshow("Original Image", img)

cv.waitKey(0)
cv.destroyAllWindows()

In [5]:
# Hands on Practice
import numpy as np 
import cv2 as cv

# read image
img=cv.imread("Capture.PNG") 
cv.imshow("First image", img)

cv.waitKey(0)    # 0 waits indefinitely for a key press

-1

2- Resizing the image

In [7]:
import numpy as np 
import cv2 as cv

# read image
img=cv.imread("Capture.PNG") 
img1=cv.resize(img, (800,600))     # target size is (width, height)

cv.imshow("First image", img)      # original image
cv.imshow("Second image", img1)    # resized image

cv.waitKey(0)            # 0 waits indefinitely for a key press
cv.destroyAllWindows()   # close all windows
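
A fixed size such as (800, 600) stretches the image if its proportions differ. One way to avoid that is to compute the height from the desired width. The helper below is a sketch with a made-up name; the sample numbers use the 773x768 size that this notebook prints later for `Capture.PNG`.

```python
def fit_width(original_width, original_height, target_width):
    """Return (width, height) scaled to target_width, keeping the aspect ratio."""
    scale = target_width / original_width
    return target_width, max(1, round(original_height * scale))

new_size = fit_width(773, 768, 800)
print(new_size)   # (800, 795)
```

The resulting tuple is already in the (width, height) order that `cv.resize()` expects.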

3- Converting to Gray Scale image

In [8]:
# import libraries
import numpy as np 
import cv2 as cv

# reading and resizing image
img=cv.imread("Capture.PNG") 
img=cv.resize(img, (800,600))     # target size is (width, height)

# conversion to grayscale (OpenCV loads images in BGR order)
gray_img=cv.cvtColor(img, cv.COLOR_BGR2GRAY) 

# Displaying 
cv.imshow("First image", img)
cv.imshow("Gray image", gray_img)

# wait for a key press, then close
cv.waitKey(0)            # 0 waits indefinitely for a key press
cv.destroyAllWindows()   # close all windows

4- Image to black and White 

In [9]:
import cv2 as cv

img=cv.imread("Capture.PNG") 
gray_img=cv.cvtColor(img, cv.COLOR_BGR2GRAY)


# pixels brighter than 127 become 255 (white); all others become 0 (black)
(thresh, binary)= cv.threshold(gray_img,127,255, cv.THRESH_BINARY) 

# Displaying 
cv.imshow("Original Image", img)
cv.imshow("Gray Image", gray_img)
cv.imshow("Binary Image", binary)

# wait for a key press, then close
cv.waitKey(0)            # 0 waits indefinitely for a key press
cv.destroyAllWindows()   # close all windows

# press any key to close the windows
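
What `cv.THRESH_BINARY` does can be written in one line of NumPy: every pixel strictly greater than the threshold becomes the maximum value, everything else becomes 0. The sketch below is an illustration of that rule, not OpenCV's implementation.

```python
import numpy as np

def threshold_binary(gray, thresh=127, maxval=255):
    """Pixels strictly greater than thresh become maxval; the rest become 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

gray = np.array([[0, 127, 128, 255]], dtype=np.uint8)
binary = threshold_binary(gray)
print(binary)   # [[  0   0 255 255]]
```

Note that a pixel exactly equal to the threshold (127 here) stays black.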

5- Saving or writing Image

In [10]:
import cv2 as cv

# reading and converting image
img=cv.imread("Capture.PNG") 
gray_img=cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# saving the image; imwrite returns True on success
cv.imwrite("screen.png", gray_img)

# the saved file can now be opened from disk

True

6- Basic functions and manipulations in OpenCV

In [11]:
import cv2 as cv 
img=cv.imread("Capture.PNG")

# 1- resize
resize_img=cv.resize(img, (450, 250))  # (width, height)

# 2- gray Image
gray_img=cv.cvtColor(img,cv.COLOR_BGR2GRAY)

# 3- Black and white image (threshold the grayscale image, not the color one)
(thresh, binary)= cv.threshold(gray_img,127,255, cv.THRESH_BINARY) 

# 4- Blurred Image
# (7,7) is the kernel size; both values must be positive and odd.
# 0 for sigmaX lets OpenCV derive the standard deviation from the kernel size.
blurr_img=cv.GaussianBlur(img, (7,7), 0)
# the kernel is a small matrix of weights used to average neighboring pixels


cv.imshow("Original", img)
cv.imshow("Resized", resize_img)
cv.imshow("Black and White", binary)
cv.imshow("Gray Image", gray_img)
cv.imshow("Blurred Image", blurr_img)

cv.waitKey(0)
cv.destroyAllWindows()
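
To see what a Gaussian kernel looks like, the sketch below builds one with NumPy. When sigma is not positive it uses the formula given in the OpenCV documentation for `getGaussianKernel`, sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8. This is only an approximation of what `cv.GaussianBlur` uses internally: OpenCV may rely on precomputed tables for small kernels, so the exact weights can differ slightly.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma=0.0):
    """Normalized 1-D Gaussian weights for an odd, positive kernel size."""
    if ksize <= 0 or ksize % 2 == 0:
        raise ValueError("ksize must be positive and odd")
    if sigma <= 0:
        # Documented default when sigma is not given
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    x = np.arange(ksize) - (ksize - 1) / 2      # distances from the center
    weights = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return weights / weights.sum()              # weights add up to 1

kernel = gaussian_kernel_1d(7)
kernel_2d = np.outer(kernel, kernel)   # the 7x7 matrix of weights
print(np.round(kernel, 3))
```

The center weight is the largest, so each blurred pixel is dominated by its own value and its closest neighbors.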

7- Edge Detection 

In [12]:
import cv2 as cv 
import numpy as np 
img=cv.imread("Capture.PNG")

# 6- edge detection (47 and 47 are the lower and upper hysteresis thresholds)
edge_img=cv.Canny(img, 47,47)

# # 7- a- thicken the edge lines with a larger 23x23 kernel
# dilated_img=cv.dilate(edge_img, np.ones((23,23), np.uint8), iterations=1)


# 7- b- thicken the edge lines with a 7x7 kernel
mat_kernel=np.ones((7,7), np.uint8)
dilated_img=cv.dilate(edge_img, mat_kernel, iterations=1)    # dilation makes the edge lines thicker

# 8- erosion makes the edge lines thinner again
eroded_img=cv.erode(dilated_img, mat_kernel, iterations=1)


cv.imshow("Detected Edges", edge_img)
cv.imshow("Dilated Image", dilated_img)
cv.imshow("Eroded Image", eroded_img)

cv.waitKey(0)
cv.destroyAllWindows()
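
Dilation and erosion are easier to understand on a tiny example. With a square kernel of ones, dilation replaces each pixel by the maximum of its neighborhood and erosion by the minimum. The NumPy sketch below illustrates that idea under these assumptions; it is not how OpenCV implements it and it needs NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate(image, ksize=3):
    """Each output pixel is the maximum of its ksize x ksize neighborhood."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="constant", constant_values=0)
    windows = sliding_window_view(padded, (ksize, ksize))
    return windows.max(axis=(2, 3))

def erode(image, ksize=3):
    """Each output pixel is the minimum of its ksize x ksize neighborhood."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="constant", constant_values=255)
    windows = sliding_window_view(padded, (ksize, ksize))
    return windows.min(axis=(2, 3))

edges = np.zeros((5, 5), dtype=np.uint8)
edges[2, 2] = 255              # a single white "edge" pixel

thick = dilate(edges)          # grows to a 3x3 white block
thin = erode(thick)            # shrinks back to the single pixel
print(thick)
print(thin)
```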

8- Cropping the image (using NumPy slicing, not an OpenCV function)

In [13]:
# before going to cropping printing the dimensions of images  
import cv2 as cv 
import numpy as np 

# 1- original image
img=cv.imread("Capture.PNG")

# 2- resize image
resize_img=cv.resize(img, (450, 250))  # (width, height)

# 3- cropping image
crop_img=resize_img[0:200, 200:300]    # [rows (height), columns (width)], each as start:stop


cv.imshow("Original", img)
cv.imshow("Resized", resize_img)
cv.imshow("Cropped Image", crop_img)


cv.waitKey(0)
cv.destroyAllWindows()

print("Dimension of original image array:", img.shape)
print("Dimension of resized image array:", resize_img.shape)
print("Dimension of Cropped Image:",crop_img.shape)

Dimension of original image array: (768, 773, 3)
Dimension of resized image array: (250, 450, 3)
Dimension of Cropped Image: (200, 100, 3)
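
One thing to keep in mind with NumPy cropping: a slice is a view of the original array, not a copy, so drawing on the crop also changes the source image. The sketch below shows this on a blank synthetic image with the same sizes as above.

```python
import numpy as np

img = np.zeros((250, 450, 3), dtype=np.uint8)

view = img[0:200, 200:300]                 # rows 0-199, columns 200-299
independent = img[0:200, 200:300].copy()   # separate copy of the same region

view[:] = 255                  # writing to the view also changes img
print(view.shape)              # (200, 100, 3)
print(img[0, 200])             # [255 255 255]
print(independent[0, 0])       # [0 0 0]
```

Use `.copy()` whenever the cropped image should be edited independently.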


9- Joining two images (To look them side by side)

In [2]:
# joining two images
import cv2 as cv
import numpy as np

# reading image from folder
img=cv.imread("resources/image1.jpg")

# stacking same image
# 1- horizontal stack
hor_stk=np.hstack((img,img))

# 2- vertical stacking 
ver_stk=np.vstack((img,img))



# Showing image
cv.imshow("Horizontal Stacking:", hor_stk)
cv.imshow("Vertical Stacking:", ver_stk)

cv.waitKey(0)
cv.destroyAllWindows()

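- np.hstack and np.vstack can only stack images with compatible shapes: the same height for horizontal stacking, the same width for vertical stacking. Two copies of the same image, e.g. with shape (600, 500, 3) as (height, width, channels), always work.

- Images of different sizes cannot be stacked directly; they have to be resized or padded first, so we define a function for that.

- The number of color channels must be the same. 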
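
These shape rules can be checked quickly with blank arrays. The sizes below are made up for the illustration.

```python
import numpy as np

color = np.zeros((600, 500, 3), dtype=np.uint8)   # 3-channel image
gray = np.zeros((600, 500), dtype=np.uint8)       # single-channel image

print(np.hstack((color, color)).shape)   # (600, 1000, 3)
print(np.vstack((color, color)).shape)   # (1200, 500, 3)

# A 2-D grayscale image cannot be stacked with a 3-channel image directly
stack_failed = False
try:
    np.hstack((color, gray))
except ValueError as err:
    stack_failed = True
    print("cannot stack:", err)

# Repeating the gray channel three times makes the shapes compatible
gray_3ch = np.stack((gray, gray, gray), axis=-1)
print(np.hstack((color, gray_3ch)).shape)   # (600, 1000, 3)
```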

Here is a Python function that uses NumPy to place multiple images of different sizes on one canvas: 

Read through this function carefully before using it. 

In [4]:
import cv2 as cv
import numpy as np

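def stack_images(images, axis=0):
    """
    Stack multiple 3-channel images of different sizes along a specified axis.

    Smaller images are aligned to the top-left of their slot and the
    remaining area of the canvas stays black.

    Args:
        images (list): List of 3-channel images (NumPy arrays) to stack
        axis (int): Axis to stack along (0 for horizontal, 1 for vertical)

    Returns:
        stacked_image (numpy array): Stacked image
    """
    # Collect the heights and widths of the images
    heights = [img.shape[0] for img in images]
    widths = [img.shape[1] for img in images]

    # Size the blank canvas so that every image fits
    if axis == 0:    # Horizontal: widths add up, height is the maximum
        canvas_h, canvas_w = max(heights), sum(widths)
    elif axis == 1:  # Vertical: heights add up, width is the maximum
        canvas_h, canvas_w = sum(heights), max(widths)
    else:
        raise ValueError("axis must be 0 (horizontal) or 1 (vertical)")
    stacked_image = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)

    # Copy each image into the canvas at a running offset
    offset = 0
    for img in images:
        h, w = img.shape[:2]
        if axis == 0:  # Horizontal stacking
            stacked_image[:h, offset:offset + w] = img
            offset += w
        else:          # Vertical stacking
            stacked_image[offset:offset + h, :w] = img
            offset += h

    return stacked_image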

Here's an explanation of the code:

1. We first measure the widths and heights of the input images.
2. We create a blank image (canvas) with NumPy, sized from those measurements.
3. We iterate through the input images and stack them along the specified axis (0 for horizontal, 1 for vertical).
4. We use NumPy slicing to copy each image into the stacked image array.
5. Finally, we return the stacked image.

You can use this function like this:

In [None]:
images = [cv.imread('image1.jpg'), cv.imread('image2.jpg'), cv.imread('image3.jpg')]
stacked_image = stack_images(images, axis=0)  # Horizontal stacking



cv.imshow('Stacked Image', stacked_image)


cv.waitKey(0)
cv.destroyAllWindows()

10- How to change the perspective of an image: 

In [22]:
# import libraries
import cv2 as cv
import numpy as np

# reading image from folder
img=cv.imread("resources/warp.jpeg")
print(img.shape)

(220, 195, 3)


In [30]:
# import libraries
import cv2 as cv
import numpy as np

# reading image from folder
img=cv.imread("resources/warp.jpeg")

# source points: the four corners of the region to straighten, listed in the
# same order as point2 (top-left, top-right, bottom-left, bottom-right).
# How to get them: check the next topic "Coordinates of image"
point1=np.float32([[58,45],[128,41],[21,112],[171,116]])
width=195
height=220
# or 
# width, height= 195,220

# destination points: the corners of the output image
point2=np.float32([[0,0],[width,0],[0,height],[width,height]])

# compute the 3x3 perspective matrix and warp the image
matrix=cv.getPerspectiveTransform(point1,point2)
out_img=cv.warpPerspective(img, matrix, (width,height))
# resize the output image if needed
# out_img=cv.resize(out_img, (450, 250))  # (width, height)

cv.imshow("Original Image:", img)
cv.imshow("Perspective-Transformed Image:", out_img)

# writing image
cv.imwrite("resources/prespected.jpeg", out_img)

cv.waitKey(0)
cv.destroyAllWindows()

11- 
 - Coordinates of an image 
 - BGR color codes for an image

In [None]:
# import libraries
import cv2 as cv
import numpy as np

# defining a function
def find_cood(event, x,y,flags, params):
    if event==cv.EVENT_LBUTTONDOWN:
        # left mouse click: print the (x, y) coordinates
        print(x, y)
        # draw the coordinates on the image at the clicked position
        font=cv.FONT_HERSHEY_PLAIN
        cv.putText(img,str(x) + "," + str(y),(x,y),font,1,(255,0,179),thickness=2)
        # show the updated image
        cv.imshow("Image",img)
    
    # right mouse click: read the color at the clicked pixel
    if event==cv.EVENT_RBUTTONDOWN:
        print(x, y)
        
        font=cv.FONT_HERSHEY_SIMPLEX
        # NumPy indexing is [row, column], i.e. [y, x]; channels are B, G, R
        b=img[y,x,0]
        g=img[y,x,1]
        r=img[y,x,2]
        
        # each color component must be in the range 0-255
        cv.putText(img,str(b)+","+ str(g) + "," + str(r) , (x,y), font, 1, (255,255,0),2)
        cv.imshow("Image",img)

# final function to read and display
if __name__ == "__main__":
    # reading an image
    img=cv.imread("resources/warp.jpeg",1)
    # display an image
    cv.imshow("Image",img)
    # setting call back function 
    cv.setMouseCallback("Image", find_cood)
    cv.waitKey(0)
    cv.destroyAllWindows()
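
The callback indexes the image as `img[y, x]` even though the mouse reports `(x, y)`. The reason is that NumPy arrays are indexed as [row, column]. A minimal sketch on a blank image with the same shape as `warp.jpeg` (the point and the color are arbitrary):

```python
import numpy as np

img = np.zeros((220, 195, 3), dtype=np.uint8)   # height 220, width 195
x, y = 150, 30                                   # a point as the mouse callback reports it
img[y, x] = (255, 0, 179)                        # row y, column x

b, g, r = (int(v) for v in img[y, x])
print(b, g, r)   # 255 0 179
# img[x, y] would address row 150, column 30, which is a different pixel
```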

12- Object Selection Based On color

In [None]:
import numpy as np
import cv2 as cv

# path of the image; change only this line to try another image
path="resources/image2.jpg"

# callback for the trackbars (required by createTrackbar; it does nothing here)
def slider(value):
    pass

cv.namedWindow("Bars")
cv.resizeWindow("Bars",900,300)

# for 8-bit images OpenCV stores hue in 0-179 and saturation/value in 0-255
cv.createTrackbar("Hue Min","Bars",0,179,slider)
cv.createTrackbar("Hue Max","Bars",179,179,slider)
cv.createTrackbar("Sat Min","Bars",0,255,slider)
cv.createTrackbar("Sat Max","Bars",255,255,slider)
cv.createTrackbar("Val Min","Bars",0,255,slider)
cv.createTrackbar("Val Max","Bars",255,255,slider)

# read the image once and convert it to HSV (Hue, Saturation, Value)
img=cv.imread(path)
hsv_img=cv.cvtColor(img, cv.COLOR_BGR2HSV)

# loop until "q" is pressed
while True:
    hue_min=cv.getTrackbarPos("Hue Min", "Bars")
    hue_max=cv.getTrackbarPos("Hue Max", "Bars")
    sat_min=cv.getTrackbarPos("Sat Min", "Bars")
    sat_max=cv.getTrackbarPos("Sat Max", "Bars")
    val_min=cv.getTrackbarPos("Val Min", "Bars")
    val_max=cv.getTrackbarPos("Val Max", "Bars")
    print(hue_min, hue_max,sat_min,sat_max,val_min,val_max)
    
    # build the lower and upper HSV bounds from the slider positions
    lower=np.array([hue_min, sat_min, val_min])
    upper=np.array([hue_max, sat_max, val_max])
    
    # mask is 255 where the pixel lies inside the range and 0 elsewhere
    mask_img=cv.inRange(hsv_img,lower,upper)
    # keep only the pixels selected by the mask
    out_img=cv.bitwise_and(img, img, mask=mask_img)
    
    cv.imshow("Original", img)
    cv.imshow("HSV Image", hsv_img)
    cv.imshow("Mask", mask_img)
    cv.imshow("Final Image",out_img)
    if cv.waitKey(1) & 0xFF == ord("q"):
        break

cv.destroyAllWindows()

13- Face Detection in Image

In [None]:
# import libraries
import cv2 as cv

face_cascade=cv.CascadeClassifier("resources/haarcascade_frontalface_default.xml")

img=cv.imread("resources/image3.jpg")
# cv.resize(img, (width, height))

# making gray image
gray_img=cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# detect faces in the gray image (scaleFactor=1.1, minNeighbors=4)
faces=face_cascade.detectMultiScale(gray_img, 1.1,4)

# draw a rectangle around each face: top-left (x, y), bottom-right (x+w, y+h)
for (x,y,w,h) in faces:
    cv.rectangle(img, (x,y), (x+w, y+h), (255,0,0), 2)

# show the detected face and save
cv.imshow("Face Detected", img)
cv.imwrite("resources/Detected Faces.png", img)

cv.waitKey(0)
cv.destroyAllWindows()

# 1- download an image that contains faces and try this code on it
# 2- download haarcascade_frontalface_default.xml and place it in the resources folder