star-bits/boostcampAI

boostcampAI

🦆 Study notes from Boostcamp AI Tech (3rd cohort, CV)

Week 1

  • Python stuff: list comprehension, lambda, map, asterisk stuff (variable-length arguments, kwargs, unpacking), OOP, read(), pickle, csv, html parsing, xml, json
  • numpy and pandas
  • GD, Probability, Inference
  • Advanced assignment 1 notes: GD
  • Advanced assignment 2 notes: Backprop
  • Advanced assignment 3 notes: Maximum Likelihood Estimation
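
The GD items above can be illustrated with a minimal sketch, assuming a one-variable linear model and a mean squared error cost. The function name `fit_line`, the learning rate, and the toy data are illustrative choices, not taken from the assignments.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current prediction.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # dJ/dw and dJ/db for J = (1/n) * sum(error^2), via the chain rule.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
```

On this noise-free toy data, `w` and `b` should approach 2 and 1. A larger learning rate can make the updates diverge.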

Week 2

  • AutoGrad stuff: the difference, or lack thereof, between an ordinary equation and forward propagation that includes a cost function; J and the derivative of J in linear regression, and the chain rule. ⭐
  • PyTorch axis: on NumPy and PyTorch axes. TL;DR: axis 0 is always the outermost ('most layered') axis - t.shape: torch.Size([(axis 0), (axis 1), (axis 2)]) ⭐
  • Basic assignment 1 notes: PyTorch Function, Module, Model
  • Basic assignment 2 notes: PyTorch Dataset, DataLoader
  • Advanced assignment 1 notes: loading a pretrained model, modifying the number of output features of a layer, transforming the dataset - Grayscale and ToTensor, hyperparameters, train and test ⭐
  • iterable (object) and iterator
  • generator
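
The last two items (iterable vs. iterator, and generator) can be contrasted in a small sketch. The class and function names (`Countdown`, `CountdownIterator`, `countdown`) are hypothetical and exist only for illustration.

```python
class Countdown:
    """An iterable: every call to iter() hands out a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """An iterator: keeps its own state and implements __next__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators return themselves so they also work in for-loops.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function: the same sequence, with state kept by yield."""
    while start > 0:
        yield start
        start -= 1


c = Countdown(3)
first_pass = list(c)
second_pass = list(c)   # the iterable can be traversed again

gen = countdown(3)
gen_first = list(gen)
gen_second = list(gen)  # a generator is an iterator, so it is now exhausted
```

The design point is that an iterable can be looped over repeatedly, while an iterator (including any generator object) is single-use.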

Week 3

Week 4

Week 5

Week 6-7: Image Classification

Week 8-9

  • Basic assignment 1 notes: resnet34 implementation from scratch: ConvBlock(nn.Sequential(*layers[nn.Conv2d, nn.BatchNorm2d, nn.ReLU])) -> ResBlock(nn.Sequential(*layers[ConvBlock, ConvBlock, residual])) -> ResNet ⭐
  • Basic assignment 2 notes: Data Augmentation - transforms.Compose([RandomCrop, ToTensor, Resize, Normalize]), Channel order: {cv2: BGR, torch: RGB}, Dimension: {cv2: (height, width, channel), torch conv2d layer: (batch_size, channel, height, width)} ⭐
  • Basic assignment 3 notes: vgg11 implementation from scratch, semantic segmentation using vgg11 modified into an FCN by replacing the fc layer with a 1x1 conv layer
  • Advanced assignment 1 notes: visualizing conv1 filters, visualizing model activations using a forward hook, visualizing a saliency map (gradient of the logit with respect to the input image), visualizing Grad-CAM ⭐
  • Basic assignment 4 notes: CGAN - G(concat(emb(z), emb(y))), D(concat(emb(x), emb(y))) ⭐
  • Basic assignment 5 notes: Multi-modal
  • Advanced assignment 2 notes: Hourglass network, torchsummary summary
  • Advanced assignment 3 notes: Depth map
  • More AutoGrad stuff
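
The channel-order and dimension note in Basic assignment 2 (cv2: BGR, height-width-channel; torch conv2d: RGB, batch-channel-height-width) can be sketched with plain nested lists so that the index shuffling is explicit. The helper name `bgr_hwc_to_rgb_chw` is hypothetical. Real code would normally do this on arrays or tensors instead.

```python
def bgr_hwc_to_rgb_chw(image):
    """Convert a nested-list image from (H, W, C) in BGR to (C, H, W) in RGB."""
    height, width = len(image), len(image[0])
    # Output channels are R, G, B; in a BGR pixel those sit at indices 2, 1, 0.
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in (2, 1, 0)
    ]


# A 1 x 2 image; each pixel is [B, G, R].
bgr = [
    [[10, 20, 30], [11, 21, 31]],
]
rgb_chw = bgr_hwc_to_rgb_chw(bgr)

# A conv2d layer expects a leading batch axis: (N, C, H, W).
batch = [rgb_chw]
```

With a NumPy array, the same conversion is `img[:, :, ::-1].transpose(2, 0, 1)`.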

Week 10-12: Object Detection

  • Two-Stage Detectors: R-CNN, SPPNet (ROI projection: projection of selective search results onto a feature map; Spatial Pyramid Pooling: n-by-n grid pooling that gives a fixed fc layer input size; solves R-CNN's problems of running the CNN once per region and of warping images), Fast R-CNN (multi-task loss: classification loss + bounding box regression loss), Faster R-CNN (Region Proposal Network: applies anchor boxes to feature map cells)
  • Feature Pyramid Network: FPN (top-down pathway: mixing low-level and high-level feature maps), PANet (bottom-up path augmentation, adaptive feature pooling: ROI from all stages), Recursive FPN, Bi-directional FPN, NAS-FPN (Neural Architecture Search)
  • One-Stage Detectors: YOLO (loss: localization loss + confidence loss + classification loss), SSD (multi-scale feature maps, no fc layer, uses anchor boxes), RetinaNet (foreground-background class imbalance, addressed by focal loss)
  • More on Two-Stage Detectors: Faster R-CNN (image -> through ConvNet -> feature map -> through RPN -> ROI; (ROI + feature map) -> through ROI pooling -> through (classification head + regressor head) -> output) has 3 networks: (ConvNet, RPN, cls+reg head), RPN: (9 anchor boxes, binary object-or-not classification, NMS), Cascade R-CNN, Deformable convolution, Transformer (Q, K, V created by W_Q, W_K, W_V; attention map from Q, K), Swin ⭐
  • More on One-Stage Detectors: Two-stage detectors: prediction doesn't happen at every pixel (of the final feature map); proposals from the RPN get projected onto a feature map, and after ROI pooling the output is passed to the cls head and reg head. One-stage detectors: a prediction is made at every pixel (of the final feature map); there is no RPN - the detector itself is a variation of the RPN: each pixel gets anchor boxes, and classification and bbox regression follow directly ⭐
  • Basic mission 1 notes: bbox mAP
  • Advanced mission 1 notes: bbox mAP (advanced)
  • Basic mission 2 notes: Faster R-CNN
  • Basic mission 4 notes: FPN
  • Advanced mission 4 notes: Faster R-CNN FPN
  • Basic mission 5 notes: YOLO
  • Advanced mission 5 notes: YOLO inference
  • Advanced mission 6 notes: YOLOv3
  • Basic mission 7 notes: WBF (Weighted Boxes Fusion) ensemble
  • Faster R-CNN with Swin-L backbone: config.py, train.ipynb, infer.ipynb
  • UniverseNet with Swin-L backbone: config.py, train.ipynb, infer.ipynb
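
Two building blocks recur in the detection notes above: box IoU (used when computing bbox mAP) and NMS (used in the RPN). The following is a minimal pure-Python sketch, assuming boxes in (x1, y1, x2, y2) format. It is not the code from the missions, and the threshold and sample boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In practice `torchvision.ops.nms(boxes, scores, iou_threshold)` does the same job on tensors. The loop above only makes the greedy rule visible.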

Week 13-14: Text Detection

Week 15-17: Semantic Segmentation

  • Semantic Segmentation Pipeline (torchvision.models.segmentation.fcn_resnet50)
  • Semantic Segmentation Models: FCN-32s, FCN-16s, FCN-8s, DeconvNet, SegNet, DeepLabV1, DilatedNet, DeepLabV2-VGG16, DeepLabV2-ResNet101, DeepLabV3-ResNet101, DeepLabV3Plus-Xception, UNet, UNet++
  • UNet3Plus.py: UNet3+ from scratch
  • UNet3+ Implementation
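
DilatedNet and the DeepLab family in the model list above rely on dilated (atrous) convolution. As a rough illustration only, here is a 1D sketch in pure Python, assuming no padding and stride 1. The function name `dilated_conv1d` and the toy signal are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation with `dilation - 1` gaps between kernel taps."""
    # Effective receptive field of the dilated kernel.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(k * signal[start + j * dilation] for j, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # receptive field of 3
atrous = dilated_conv1d(signal, kernel, dilation=2)  # receptive field of 5
```

The same three weights cover a wider input window when dilated, which is how these models enlarge the receptive field without adding parameters.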

Week 18-22: Final Project

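  • Virtual mouse using a webcam: controlling the mouse cursor with a hand and a webcam
    • A project that finds hand keypoints from a webcam feed and recognizes gestures to control the mouse cursor.
    • To build a dataset specialized for this task, wrote code that generates data in a variety of environments and code that produces COCO-format annotation files for that data.
    • In MMPose, trained an ImageNet-pretrained MobileNetV3-Large on a large public dataset (FreiHAND), then used that model as the pretrained starting point for retraining on the self-made dataset, improving hand keypoint detection accuracy and stability.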
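
As a rough idea of what a COCO-format keypoint annotation looks like, here is a minimal sketch. It is not the project's actual code: the helper `make_coco_entries`, the file name, the image size, and the use of 21 keypoints with a tight bounding box are all assumptions made for illustration.

```python
import json


def make_coco_entries(image_id, ann_id, points, width, height, file_name):
    """Build COCO-style image and annotation dicts for one hand.

    `points` is a list of (x, y) pixel coordinates, one per keypoint.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    box_w, box_h = max(xs) - x_min, max(ys) - y_min

    flat = []
    for x, y in points:
        flat.extend([x, y, 2])  # visibility 2: labeled and visible

    image = {"id": image_id, "file_name": file_name,
             "width": width, "height": height}
    annotation = {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,
        "keypoints": flat,                 # [x1, y1, v1, x2, y2, v2, ...]
        "num_keypoints": len(points),
        "bbox": [x_min, y_min, box_w, box_h],  # COCO bbox is [x, y, w, h]
        "area": box_w * box_h,
        "iscrowd": 0,
    }
    return image, annotation


# Hypothetical keypoints for one frame (assumption: 21 keypoints per hand).
points = [(float(i), float(2 * i)) for i in range(21)]
image, annotation = make_coco_entries(1, 1, points, 640, 480, "frame_0001.jpg")

dataset = {
    "images": [image],
    "annotations": [annotation],
    "categories": [{"id": 1, "name": "hand"}],
}
coco_json = json.dumps(dataset)
```

A full annotation file would usually also list keypoint names and a skeleton in the category entry. Those are omitted here because the source does not specify them.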
