🦆 Boostcamp AI Tech study notes (3rd cohort, CV)
- python stuff: list comprehension, lambda, map, asterisk stuff(variable-length arguments, kwargs, unpacking), OOP, read(), pickle, csv, html parsing, xml, json
- numpy and pandas
- GD, Probability, Inference
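- Advanced assignment 1 notes: GD
- Advanced assignment 2 notes: Backprop
- Advanced assignment 3 notes: Maximum Likelihood Estimation
- AutoGrad stuff: the difference, or lack thereof, between an ordinary equation and forward propagation that includes a cost function; J and the derivative of J in linear regression, and the chain rule. ⭐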
- PyTorch axis: on numpy and PyTorch axis. TL;DR: axis 0 is always the outermost axis, i.e. the first entry of the shape - t.shape: torch.Size([(axis 0), (axis 1), (axis 2)]) ⭐
- Basic assignment 1 notes: PyTorch Function, Module, Model
- Basic assignment 2 notes: PyTorch Dataset, DataLoader
- Advanced assignment 1 notes: loading pretrained model, modifying the number of output features of a layer, transforming dataset - Grayscale and ToTensor, hyperparameters, train and test ⭐
- iterable (object) and iterator
- generator
- matplotlib.pyplot as plt: fig, ax = plt.subplots(m, n); ax[i].plot(x, y); plt.show()
- mpl: bar, line, scatter
- mpl: text, color, facet, misc.
- seaborn as sns
- mpl: polar, pie
- mpl: missingno, squarify, pywaffle, matplotlib_venn: missing data, treemap (e.g. finviz), waffle chart, venn diagram
- plotly.express as px
- mpl: custom theme
- visualization techniques
- Optimization - Adam: cross validation (k-fold validation), bootstrapping/bagging/boosting, momentum - directions with inertia, RMSprop - adaptive learning rate, Adam, parameter norm penalty (weight decay)
- CNN: AlexNet - ReLU mitigates the vanishing gradient problem, VGGNet - smaller kernel size (3x3), GoogLeNet - 1x1 convolution as a channel-wise dimension reducer, ResNet - skip connection (addition), DenseNet - skip connection (concatenation)
- RNN: vanishing gradient (with sigmoid) and exploding gradient (with ReLU) in RNN, LSTM
- Transformer, ViT
- Generative model
- Basic assignment 1 notes: MLP ⭐
- Basic assignment 2 notes: Optimization
- Basic assignment 3 notes: CNN ⭐
- Basic assignment 4 notes: LSTM
- Basic assignment 5 notes: SDPA
- Advanced assignment 1 notes: ViT
- Advanced assignment 2 notes: AAE
- Mission 2 notes: EDA ⭐
- Mission 3 notes: Augmentation ⭐
- Mission 4 notes: Data Generation ⭐
- Mission 5 notes: Model ⭐
- Mission 6 notes: Pretrained model ⭐
- Mission 7-8 notes: Loss, Optimizer
- Mission 9 notes: Ensemble
- Mission 10 notes: tensorboard, wandb
- Basic assignment 1 notes: resnet34 implementation from scratch: ConvBlock(nn.Sequential(*layers[nn.Conv2d, nn.BatchNorm2d, nn.ReLU])) -> ResBlock(nn.Sequential(*layers[ConvBlock, ConvBlock, residual])) -> ResNet ⭐
- Basic assignment 2 notes: Data Augmentation - transforms.Compose([RandomCrop, ToTensor, Resize, Normalize]), Channel order: {cv2: BGR, torch: RGB}, Dimension: {cv2: (height, width, channel), torch conv2d layer: (batch_size, channel, height, width)} ⭐
- Basic assignment 3 notes: vgg11 implementation from scratch, semantic segmentation using vgg11 modified as FCN by replacing fc layer with 1x1 conv layer
- Advanced assignment 1 notes: visualizing conv1 filters, visualizing model activations using forward hook, visualizing saliency map (gradient_logit/gradient_image), visualizing Grad-CAM ⭐
- Basic assignment 4 notes: CGAN - G(concat(emb(z), emb(y))), D(concat(emb(x), emb(y))) ⭐
- Basic assignment 5 notes: Multi-modal
- Advanced assignment 2 notes: Hourglass network, torchsummary summary
- Advanced assignment 3 notes: Depth map
- More AutoGrad stuff
- Two-Stage Detectors: R-CNN; SPPNet (ROI projection: projecting the selective search result onto a feature map; Spatial Pyramid Pooling: n by n grid pooling that gives a fixed fc layer input size), which removes the need to run the CNN once per region and to warp images; Fast R-CNN (multi-task loss: classification loss + bounding box regression loss); Faster R-CNN (Region Proposal Network: applies anchor boxes on feature map cells)
- Feature Pyramid Network: FPN (top-down pathway: mixing low level and high level feature maps), PANet (bottom-up path augmentation, adaptive feature pooling: ROI from all stages), Recursive FPN, Bi-directional FPN, NAS-FPN (Neural Architecture Search)
- One-Stage Detectors: YOLO (loss: localization loss + confidence loss + classification loss), SSD (multi-scale feature maps, no fc layer, has anchor boxes), RetinaNet (foreground/background class imbalance, addressed by focal loss)
- More on Two-Stage Detectors: Faster R-CNN (image -> through ConvNet -> feature map -> through RPN -> ROI; (ROI + feature map) -> through ROI pooling -> through (classification head + regressor head) -> output) has 3 networks: (ConvNet, RPN, cls+reg head), RPN: (9 anchor boxes, binary object/background classification, NMS), Cascade R-CNN, Deformable convolution, Transformer (Q, K, V created by W_Q, W_K, W_V; Attention map from Q, K), Swin ⭐
- More on One-Stage Detectors: Two-stage detectors: prediction doesn't happen at every pixel (of a final feature map); proposals from the RPN get projected onto a feature map, and after ROI pooling the output is passed to the cls head and reg head. One-stage detectors: a prediction is made at every pixel (of a final feature map); there is no RPN, since the detector itself is a variation of the RPN: each pixel gets anchor boxes, and classification and bbox regression come right after ⭐
- Basic mission 1 notes: bbox mAP
- Advanced mission 1 notes: bbox mAP (advanced)
- Basic mission 2 notes: Faster R-CNN
- Basic mission 4 notes: FPN
- Advanced mission 4 notes: Faster R-CNN FPN
- Basic mission 5 notes: YOLO
- Advanced mission 5 notes: YOLO inference
- Advanced mission 6 notes: YOLOv3
- Basic mission 7 notes: WBF (Weighted Boxes Fusion) ensemble
- Faster R-CNN with Swin-L backbone: config.py, train.ipynb, infer.ipynb ⭐
- UniverseNet with Swin-L backbone: config.py, train.ipynb, infer.ipynb
- EDA.ipynb
- download_ICDAR17.sh
- add_tr_va.sh to rename images
- mlt2ufo_ICDAR17raw2LKJ.py
- mlt2ufo_ICDAR19raw2LKJ.py
- im_mode_test.py
- resize_dataset.py: os.makedirs(), json.load(), json.dump(), cv2.imread(), cv2.resize(), cv2.imwrite()
- cvtPoly2Rect.ipynb: cv2.contourArea(), cv2.minAreaRect(), cv2.boxPoints()
- dataset.py to check the len of dataset
- train.py (lr_scheduler: MultiStepLR | CosineAnnealingLR | CosineAnnealingWarmUpRestarts)
- Semantic Segmentation Pipeline (torchvision.models.segmentation.fcn_resnet50)
- Semantic Segmentation Models: FCN-32s, FCN-16s, FCN-8s, DeconvNet, SegNet, DeepLabV1, DilatedNet, DeepLabV2-VGG16, DeepLabV2-ResNet101, DeepLabV3-ResNet101, DeepLabV3Plus-Xception, UNet, UNet++
- UNet3Plus.py: UNet3+ from scratch
- UNet3+ Implementation ⭐
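- Virtual mouse using a webcam: controlling the mouse cursor with a hand and a webcam
- A project that finds hand keypoints with a webcam and recognizes gestures to control the mouse cursor.
- To build a dataset specialized for this task, wrote code that generates data in a variety of environments, along with code that produces COCO-format annotation files for that data.
- In MMPose, trained an ImageNet-pretrained MobileNetV3-Large on a large public dataset (FreiHAND), then used that model as the pretrained starting point and retrained it on the self-made dataset, improving hand keypoint detection accuracy and stability.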
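
The "PyTorch axis" note in the list above says that axis 0 is the first entry of `t.shape`. A minimal sketch of that idea follows, using plain nested lists so it runs without NumPy or PyTorch installed. The helper names `shape` and `sum_axis0` are hypothetical, written only for this illustration, and the sketch assumes non-empty rectangular nesting.

```python
def shape(t):
    """Return the shape of a rectangular nested list, outermost axis first."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))  # length at the current nesting level
        t = t[0]             # step one level deeper (assumes non-empty lists)
    return tuple(dims)


def sum_axis0(t):
    """Sum over axis 0: add the outermost blocks together element-wise."""
    if not isinstance(t[0], list):
        return sum(t)  # 1-D case: axis 0 is the only axis
    # zip(*t) pairs up the i-th sub-block of every outermost block
    return [sum_axis0(list(group)) for group in zip(*t)]


# A stand-in for a tensor of shape (2, 3, 4): 2 blocks, each 3 rows by 4 columns
t = [[[i * 12 + j * 4 + k for k in range(4)] for j in range(3)] for i in range(2)]

print(shape(t))           # axis 0 has length 2 (the outermost brackets)
reduced = sum_axis0(t)
print(shape(reduced))     # axis 0 is gone; the remaining axes are (3, 4)
```

Reducing over axis 0 removes the outermost level of brackets, which is the same behavior one would expect from a sum over `dim=0` on a tensor of the same shape.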
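
The detector notes in the list above mention NMS as part of the RPN. As a rough illustration, greedy non-maximum suppression can be sketched in plain Python as below. The `(x1, y1, x2, y2)` box format, the function names, and the 0.5 IoU threshold are assumptions made for this example, not values taken from the notes or from any particular library.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept
    (higher-scoring) box does not exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.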