fabsta/k_hpa

Human protein atlas image classification

competition link

Dataset

Interesting kernels:

fastai v1. starter (github), datablocks api

Useful links: data augmentation

Workflow

  • Import libraries
  • Define data path
  • Define data loader
  • Define focal loss and accuracy
  • Define custom architecture
  • Get learner ready
  • Start training
  • Train head of model
  • Unfreeze all weights and train entire model
  • Test time augmentation
  • Validation, F1 score
  • Submission
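
The "Define focal loss" step in the workflow above can be sketched as follows. This is a minimal plain-Python illustration of a multi-label (per-class binary) focal loss, not the code used in this repo: the function name `binary_focal_loss`, the default `gamma=2.0`, and the nested-list inputs are assumptions made for readability. In a real training loop the same formula would be written with tensor operations.

```python
import math


def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean focal loss over all (sample, class) pairs.

    probs   -- predicted probabilities, one row per sample, one entry per class
    targets -- 0/1 labels with the same shape as probs
    gamma   -- focusing parameter; gamma=0 reduces to binary cross-entropy
    """
    total, count = 0.0, 0
    for p_row, t_row in zip(probs, targets):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)    # clamp to avoid log(0)
            p_t = p if t == 1 else 1.0 - p     # probability given to the true class
            total += -((1.0 - p_t) ** gamma) * math.log(p_t)
            count += 1
    return total / count


# Confident, correct predictions contribute much less than with plain cross-entropy.
easy = binary_focal_loss([[0.9]], [[1]], gamma=2.0)
plain = binary_focal_loss([[0.9]], [[1]], gamma=0.0)
```

The `(1 - p_t) ** gamma` factor is what shifts the loss toward hard, often rare-class, examples, which is why focal loss is a common choice under class imbalance.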

Things to check out: gpu-stats in python notebook

GitHub todo lists; add an overview of important libraries at the top of each file

Needs sorting

Data

Preprocessing

  • Discarding the yellow channel runs faster and doesn't change the result. Could more channels be discarded?
  • Merge two semantically similar channels

Training

Model

Score

  • Check if f1-score is used
  • Threshold selection for multi-label classification, paper
  • Check out code from here
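
The threshold-selection item above can be illustrated with a simple per-class grid search on validation predictions. This is a sketch under assumptions, not the method from the linked paper or code: the helper names `f1` and `best_thresholds`, the candidate grid, and the nested-list inputs are all hypothetical.

```python
def f1(y_true, y_pred):
    """F1 score for one class from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_thresholds(probs, targets, candidates):
    """Pick, for every class, the candidate threshold with the highest F1.

    probs, targets -- one row per validation sample, one entry per class
    candidates     -- thresholds to try; on ties the first candidate wins
    """
    thresholds = []
    for c in range(len(probs[0])):
        y_true = [row[c] for row in targets]
        scores = [row[c] for row in probs]
        best = max(
            candidates,
            key=lambda th: f1(y_true, [int(s >= th) for s in scores]),
        )
        thresholds.append(best)
    return thresholds


val_probs = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.1], [0.2, 0.35]]
val_targets = [[1, 0], [1, 1], [0, 0], [0, 1]]
thresholds = best_thresholds(val_probs, val_targets, [0.1, 0.3, 0.5, 0.7])
```

Thresholds tuned this way are fitted to the validation set, so they can overfit when a class has only a handful of positive samples.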

other competitions

image preprocessing

Filters:

  • Green filter for prediction, others for reference.
  • Merging images improves the score.
  • Discarding yellow runs faster and doesn't change the result. More discards?
  • Green is the protein itself. The other colors are other parts of the cell. While they are not required, they can provide useful information.
  • Does yellow mean endoplasmic reticulum?
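
The two preprocessing options in these notes (dropping a channel, or merging two channels into one) can be sketched on toy data. This is an illustration only: the notes do not say which channels to merge or how, so the pixel-wise maximum, the helper names `merge_channels` and `stack_channels`, and the choice of merging yellow into red are all assumptions. Real code would use array operations on the loaded images rather than nested lists.

```python
def merge_channels(a, b):
    """Pixel-wise maximum of two equally sized 2D channels (assumed merge rule)."""
    return [
        [max(pa, pb) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def stack_channels(channels):
    """Turn a list of 2D channels into one 2D image of per-pixel tuples."""
    return [
        [tuple(pixel) for pixel in zip(*rows)]
        for rows in zip(*channels)
    ]


# Toy 1x2 images, one per filter.
red = [[10, 20]]
green = [[1, 2]]
blue = [[3, 4]]
yellow = [[15, 5]]

# Option 1: discard yellow and keep a 3-channel input.
rgb = stack_channels([red, green, blue])

# Option 2: fold yellow into another channel before stacking (illustrative choice).
rgb_merged = stack_channels([merge_channels(red, yellow), green, blue])
```

Both options keep the input at three channels, which lets a pretrained RGB backbone be used without changing its first layer.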

Train/Test data split:

  • Multilabel stratification Python package
  • Class imbalance
  • “Huge” difference between validation and test score
  • Use cross-validation to get a better understanding of predictions on different validation sets
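
To make the idea of multilabel stratification concrete, here is a simplified greedy sketch. It is not the algorithm of the package mentioned above, and the function name `greedy_multilabel_folds` is hypothetical: each sample goes to the fold that currently has the fewest examples of that sample's rarest label, so rare classes get spread across folds.

```python
from collections import Counter


def greedy_multilabel_folds(labels, n_folds):
    """Assign each sample to a fold, spreading rare labels across folds.

    labels -- list of sets of label ids, one set per sample
    Returns a list with the fold index of every sample.
    """
    freq = Counter(label for sample in labels for label in sample)
    # Handle samples that carry the rarest labels first.
    order = sorted(
        range(len(labels)),
        key=lambda i: min((freq[l] for l in labels[i]), default=float("inf")),
    )
    fold_label_counts = [Counter() for _ in range(n_folds)]
    fold_sizes = [0] * n_folds
    assignment = [None] * len(labels)
    for i in order:
        if labels[i]:
            rare = min(labels[i], key=lambda l: freq[l])
            # Fewest examples of the rare label first, then smallest fold.
            best = min(
                range(n_folds),
                key=lambda f: (fold_label_counts[f][rare], fold_sizes[f]),
            )
        else:
            best = min(range(n_folds), key=lambda f: fold_sizes[f])
        assignment[i] = best
        fold_label_counts[best].update(labels[i])
        fold_sizes[best] += 1
    return assignment


sample_labels = [{0}, {0}, {1}, {1}, {0, 1}, {0, 1}]
folds = greedy_multilabel_folds(sample_labels, n_folds=2)
```

Training once per fold and comparing the validation scores gives a rough sense of how much the score varies with the split, which is the point of the cross-validation note above.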

F1 metric:

  • Order of ids is important!
  • Macro F1 score: sklearn.metrics.f1_score with average="macro"
  • focal loss + soft F1 and focal loss - log(soft F1) for faster convergence
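
The soft F1 term in these notes can be sketched as below. The idea is to compute F1 from predicted probabilities instead of thresholded predictions, so the score changes smoothly with the model output. This is a plain-Python illustration under assumptions: the function name `soft_f1`, the macro averaging over classes, and reading "focal loss + soft F1" as adding a `1 - soft_f1` term are interpretations, not something the notes spell out.

```python
import math


def soft_f1(probs, targets, eps=1e-7):
    """Macro-averaged soft F1: per-class F1 computed on probabilities.

    probs, targets -- one row per sample, one entry per class
    """
    n_classes = len(probs[0])
    per_class = []
    for c in range(n_classes):
        p = [row[c] for row in probs]
        y = [row[c] for row in targets]
        soft_tp = sum(pi * yi for pi, yi in zip(p, y))
        # sum(p) = soft TP + soft FP and sum(y) = TP + FN,
        # so the denominator equals 2*TP + FP + FN.
        per_class.append(2 * soft_tp / (sum(p) + sum(y) + eps))
    return sum(per_class) / n_classes


probs = [[0.5, 0.5], [0.5, 0.5]]
targets = [[1, 0], [0, 1]]
score = soft_f1(probs, targets)

# Two ways to turn the score into a loss term that could be added to focal loss.
soft_f1_loss = 1.0 - score
neg_log_soft_f1 = -math.log(score)
```

For the hard metric itself, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` on thresholded predictions is the call named in the notes.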

LB: LB probing using all labels benchmark

Postprocessing: Threshold selection for multi-label classification

Improvement ideas:

  • split big image into smaller

  • make a network that will give me bounding boxes of cells to process. Then from the large scale images I can get smaller images of cells to train a network on.

  • ensemble ideas (NASNet)

If the image contains e.g. "one" object of type A, it has the label A. If it contains two, three, … objects, it still has the label A.

You can always break the image into smaller images and ensemble them back again. My CDiscount challenge solution gives you a clue on how to do it!

  • Use features from different resolutions
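
The "split the big image into smaller ones and ensemble back" idea can be sketched as follows. This is an assumption-laden illustration, not the CDiscount solution referred to above: the helper names are hypothetical, tiles are non-overlapping, and per-class maximum is used for aggregation because, as noted above, one object of type A is enough for the image to carry label A. Averaging the tile predictions would be another reasonable choice.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Cut a 2D image (list of rows) into non-overlapping tiles.

    Tiles at the right and bottom edges may be smaller than tile_h x tile_w.
    """
    tiles = []
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            tiles.append(
                [row[left:left + tile_w] for row in image[top:top + tile_h]]
            )
    return tiles


def aggregate_max(tile_probs):
    """Combine per-tile class probabilities into one image-level prediction."""
    return [max(class_probs) for class_probs in zip(*tile_probs)]


# A toy 4x4 image split into four 2x2 tiles.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_tiles(image, 2, 2)

# Pretend a model produced these class probabilities for two tiles.
image_probs = aggregate_max([[0.1, 0.8], [0.6, 0.2]])
```

One caveat: a tile inherits the labels of the whole image during training even if the labelled structure is not visible in that tile, so tile-level labels are noisy.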
