-
Notifications
You must be signed in to change notification settings - Fork 0
/
readQueueP1.mdpp
174 lines (141 loc) · 5.42 KB
/
readQueueP1.mdpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
### First pass papers
This file contains an overview of the papers I recently red (only superficially in one pass). Sorted by time of reading in reverse order. (✓✓) only given if the observed parts of the paper are extremely well written.
#### DATE SUBJECT
##### Notes
##### Q
#### `2018-10-02` SUBJECT
##### Notes
##### Q
```latex
@inproceedings{bhattacharyya2018long,
title={Long-term on-board prediction of people in traffic scenes under uncertainty},
author={Bhattacharyya, Apratim and Fritz, Mario and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={4194--4202},
year={2018}
}
```
#### `2018-10-02` SUBJECT (revisit)
##### Notes
##### Q
```latex
@article{yagi2017future,
title={Future Person Localization in First-Person Videos},
author={Yagi, Takuma and Mangalam, Karttikeya and Yonetani, Ryo and Sato, Yoichi},
journal={arXiv preprint arXiv:1711.11217},
year={2017}
}
```
#### `2018-10-02` Siamese Networks (possibly revisit)
##### Q
- category: proposes Siamese NN for facial verification
- Siamese Networks
- use cases:
- recognition + verification
- number training / class can be very low
- classes do not need to be known during training
- not useful alternatives
1. Distance based methods (compute similarity metric between pattern and lib of stored prototypes)
2. Generative models in reduced dimension space
- idea:
- learn f : INPUT -> TARGET sth \| f(input) \|_1 maintains semantic
distance in input space
- mapping f := convolutional NN
```latex
@inproceedings{chopra2005learning,
title={Learning a similarity metric discriminatively, with application to face verification},
author={Chopra, Sumit and Hadsell, Raia and LeCun, Yann},
booktitle={Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on},
volume={1},
pages={539--546},
year={2005},
organization={IEEE}
}
```
#### `2018-10-01` Siamese Networks
##### Notes
- `one shot learning` means to predict on the basis of only one single example of each new class
- normally; for one shot learning, domain specific features are designed
- works fine on similar inputs
- does however not offer robust solution
- Siamese networks (see DeepFace paper)
- := NN-architecture that contains two or more identical subnetworks model architecture (involves distance-based loss)
- learns similarity of two pairs of inputs
- training thus involves pairwise learning -> quadratic pairs to learn
- advantages
- more robust to class imbalance
- good for one shot learning
- learn similarity
- Feature extraction
- employ unique structure to naturally rank similarity between inputs
##### Q
- contribution:
1. Method for learning in `siamese neural networks`
```latex
@inproceedings{koch2015siamese,
title={Siamese neural networks for one-shot image recognition},
author={Koch, Gregory and Zemel, Richard and Salakhutdinov, Ruslan},
booktitle={ICML Deep Learning Workshop},
volume={2},
year={2015}
}
```
#### `2018-10-01` Gaze Anticipation (revisit)
##### Notes
- point of gaze := point which human is fixating
- method (called Deep Future Gaze := `DFG`):
- GAN
- GENERATOR:
1. (PREDICT FRAMES) Two-stream spatial temporal convolutional net
(3d-CNN); one for background; one for foreground
2. Second 3d-CNN employed for gaze anticipations on top
- DISCRIMINATOR
- This GAN is used for predicting the future frames
- [Experimental results and code](https://github.com/Mengmi/deepfuturegaze_gan )
##### Q
- contribution
1. Aforementioned method outperforms state-of-the-art methods
- prediction of the point of gaze in the future (order of a few seconds)
2. Introduces new egocentric dataset
- reproducibility: ✓✓
- clarity: ✓
- correctness: ✓ even though I am extremely surprised that the prediction of the future frames in the next few seconds can be done reliably (gaze anticipation is stacked on top of that)
```latex
@inproceedings{zhang2017deep,
title={Deep Future Gaze: Gaze Anticipation on Egocentric Videos Using Adversarial Networks.},
author={Zhang, Mengmi and Ma, Keng Teck and Lim, Joo-Hwee and Zhao, Qi and Feng, Jiashi},
booktitle={CVPR},
pages={3539--3548},
year={2017}
}
```
#### `2018-10-01` Video summarization (revisit)
##### Notes
- video summarization := "compress" video by extracting important parts
- viewpoint := yields `important` aspects of the image / video
- applying the video summarization we receive similarity metrics wrt the
viewpoint.
- not one optimal viewpoint
##### Q
- contribution meta:
- viewpoint specified as `similarity` (in a semantic sense) which, once
again is depending on the aspect that is considered important.
- focuses on video-level similarities (supervized)
- core contribution:
1. Proposes a novel video summarization method from multiple groups; inspired
by Fisher's discriminant
2. Introduces a dataset for that purpose
3. Developed optimization algorithm t
- usefulness:
- can be used for feature extraction
- clarity: ✓✓
- correctness: ✓
```latex
@inproceedings{kanehira2018aware,
title={aware Video Summarization},
author={Kanehira, Atsushi and Van Gool, Luc and Ushiku, Yoshitaka and Harada, Tatsuya},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={7435--7444},
year={2018}
}
```