This project extracts features from the images in an input folder. The features are mainly intended for ad click prediction and were used in the https://www.kaggle.com/c/avito-demand-prediction contest.
Install all packages listed in requirements.txt:
pip install -r requirements.txt
Run example.py to produce a results.csv file containing all features extracted from the input test_data folder.
image = cv2.imread(image_path)
Calculates the simplicity of the input image.
calculate_image_simplicity(image,c_threshold = 0.01,nchannels=3,nbins =8)
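The README does not describe how simplicity is computed, so the following is only a minimal sketch of one plausible reading of the parameters: build a joint colour histogram with `nbins` bins per channel, then report the share of bins holding at least `c_threshold` times the largest bin's count, plus the share of pixels in the largest bin. The name `simplicity_sketch` and this exact definition are assumptions, not the project's implementation.

```python
import numpy as np


def simplicity_sketch(image, c_threshold=0.01, nchannels=3, nbins=8):
    """Hypothetical simplicity measure; not the project's actual code.

    Returns (share of significant histogram bins,
             share of pixels that fall in the largest bin).
    """
    # Flatten to one row per pixel, one column per channel.
    pixels = image.reshape(-1, nchannels).astype(np.float64)
    # Joint histogram with nbins bins per channel over the 8-bit range.
    hist, _ = np.histogramdd(pixels, bins=nbins, range=[(0, 256)] * nchannels)
    max_count = hist.max()
    # A bin counts as significant if it holds at least
    # c_threshold times as many pixels as the largest bin.
    significant = np.count_nonzero(hist >= c_threshold * max_count)
    return significant / hist.size, float(max_count / pixels.shape[0])


# A single-colour image occupies one bin out of 8**3.
flat = np.full((4, 4, 3), 10, dtype=np.uint8)
bin_share, dominant_share = simplicity_sketch(flat)
```

Under this reading, fewer significant bins and a larger dominant share both indicate a simpler image.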
Extracts basic image segmentation statistics (a tuple of 10 features).
image_basic_segment_stats(image)
Counts the faces in the input image using a pretrained Haar cascade from OpenCV.
image_face_feats(image)
Counts the SIFT keypoints extracted from the input image.
image_sift_feats(image)
Gets the image simplicity features from the RGB image.
image_rgb_simplicity(image)
Gets the image simplicity features from the HSV image.
image_hsv_simplicity(image)
Gets image features from the histogram of the HSV image.
image_hue_histogram(image)
Gets the simplicity features from the grayscale image.
image_grayscale_simplicity(image)
Calculates the image sharpness score.
image_sharpness(image)
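The README does not say how the sharpness score is defined. One common choice is the variance of the Laplacian, sketched below in plain NumPy on a grayscale array. Both the function name `sharpness_sketch` and the choice of metric are assumptions for illustration.

```python
import numpy as np


def sharpness_sketch(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    Assumed metric for illustration; higher values suggest more edges
    and therefore a sharper image.
    """
    g = gray.astype(np.float64)
    # Discrete Laplacian: up + down + left + right - 4 * centre.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1]
        + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


# A uniform image has no edges; a checkerboard has many.
uniform = np.full((8, 8), 100, dtype=np.uint8)
checker = (np.indices((8, 8)).sum(axis=0) % 2 * 255).astype(np.uint8)
scores = sharpness_sketch(uniform), sharpness_sketch(checker)
```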
Calculates the image contrast score.
image_contrast(image)
Calculates the image saturation.
image_saturation(image)
Calculates the image brightness score.
image_brightness(image)
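The exact definitions behind the contrast, saturation, and brightness scores are not given here. As a rough illustration only, the sketch below assumes contrast is the standard deviation of luma, saturation is the mean HSV saturation, and brightness is the mean HSV value, all computed from a BGR array such as the one returned by `cv2.imread`. The function name `basic_scores_sketch` is hypothetical.

```python
import numpy as np


def basic_scores_sketch(image_bgr):
    """Assumed definitions for illustration; the project's may differ."""
    img = image_bgr.astype(np.float64) / 255.0
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    # Luma with the usual BT.601 weights.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    cmax = img.max(axis=2)  # HSV value channel
    cmin = img.min(axis=2)
    # HSV saturation is (max - min) / max, defined as 0 for black pixels.
    sat = np.divide(cmax - cmin, cmax, out=np.zeros_like(cmax), where=cmax > 0)
    return {
        "contrast": float(gray.std()),
        "saturation": float(sat.mean()),
        "brightness": float(cmax.mean()),
    }


# Pure red in BGR order: fully saturated, fully bright, no contrast.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 2] = 255
red_scores = basic_scores_sketch(red)
```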
Calculates the colorfulness score based on the paper.
image_colorfulness(image)
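The paper behind the colorfulness score is not named here. A widely used metric is the one by Hasler and Süsstrunk, which combines the spread and the mean of two opponent colour channels. The sketch below assumes that metric and a BGR input; the project's function may use a different formula, and `colorfulness_sketch` is a hypothetical name.

```python
import numpy as np


def colorfulness_sketch(image_bgr):
    """Hasler-Suesstrunk style colorfulness (assumed, not confirmed by the README)."""
    img = image_bgr.astype(np.float64)
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    # Opponent colour channels.
    rg = r - g
    yb = 0.5 * (r + g) - b
    # Combine the spread and the mean offset of the two channels.
    std_root = np.hypot(rg.std(), yb.std())
    mean_root = np.hypot(rg.mean(), yb.mean())
    return float(std_root + 0.3 * mean_root)


# A neutral grey image has no colour content under this metric.
grey = np.full((4, 4, 3), 128, dtype=np.uint8)
grey_score = colorfulness_sketch(grey)
```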
Calculates all of the features above and stores them in a dataframe that is saved to a CSV file.
extract_image_feats(output file name, list of input image paths, number of parallel jobs)
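To make the flow concrete, here is a minimal sketch of the same idea: run a per-image feature function over a list of paths in parallel and write one CSV row per image. It deliberately swaps the dataframe for the standard-library `csv` module and uses threads, so it is not the project's implementation. `extract_to_csv_sketch` and `feature_fn` are hypothetical names.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def extract_to_csv_sketch(out_path, image_paths, feature_fn, n_jobs=4):
    """Apply feature_fn(path) -> dict to every path and save the rows as CSV.

    Returns the number of rows written.
    """
    image_paths = list(image_paths)
    # pool.map preserves the input order, so rows line up with image_paths.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        rows = list(pool.map(feature_fn, image_paths))
    # Use the union of all feature names so no column is dropped.
    fieldnames = ["image"] + sorted({key for row in rows for key in row})
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for path, row in zip(image_paths, rows):
            writer.writerow({"image": path, **row})
    return len(rows)
```

In practice `feature_fn` would load the image and call the feature functions listed above. For CPU-bound feature code, a process pool may scale better than threads.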
Results on the https://www.kaggle.com/c/avito-demand-prediction contest
Using these features with a simple LightGBM model, I obtained the following Root Mean Squared Error (RMSE):
- 0.2207 on public leaderboard
- 0.2246 on private leaderboard
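For reference, RMSE is the square root of the mean squared difference between predictions and targets. The snippet below is a small standard-library illustration of the metric with made-up numbers; it does not reproduce the leaderboard scores.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


# Illustrative values only: one prediction is off by 1, the other is exact.
example = rmse([0.0, 1.0], [0.0, 0.0])
```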
- Dimitri Ad Clicking prediction paper https://maths-people.anu.edu.au/~johnm/courses/mathdm/talks/dimitri-clickadvert.pdf
- OpenCV Haar cascade model