rfield_notes.txt
Notes on rfield.py
I = input size
O = output size
S = stride (or inverse stride, if upsampling)
LW = left wing size (the number of elements to the left of the filter central element)
RW = right wing size (the number of elements to the right of the filter central element)
LP = left padding
RP = right padding
spaced(N, S) := (N-1)*S+1
upsampling : spaced(O, 1) + LW + RW = spaced(I, S) + LP + RP
downsampling: spaced(O, S) + LW + RW = spaced(I, 1) + LP + RP
Illustration:
if upsampling, solving for I:
(I-1)*S+1 + LP + RP = O + LW + RW
IS - S + 1 + LP + RP = O + LW + RW
IS = S - 1 - LP - RP + O + LW + RW
I = (S - 1 - LP - RP + O + LW + RW) / S
if downsampling, solving for I:
(O-1)*S+1 + LW + RW = I + LP + RP
I = (O-1)*S+1 + LW + RW - LP - RP
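The two solutions for I above can be sketched in Python as follows. This is a minimal sketch, not the code in rfield.py: the function names are hypothetical, and the divisibility check in the upsampling case is an added assumption (the notes do not say what should happen when the division is inexact).

```python
def spaced(n, s):
    """Extent of n elements placed s positions apart: (n-1)*s + 1."""
    return (n - 1) * s + 1


def input_size_downsampling(o, s, lw, rw, lp, rp):
    """Solve spaced(O, S) + LW + RW = spaced(I, 1) + LP + RP for I."""
    return spaced(o, s) + lw + rw - lp - rp


def input_size_upsampling(o, s, lw, rw, lp, rp):
    """Solve spaced(O, 1) + LW + RW = spaced(I, S) + LP + RP for I."""
    numerator = s - 1 - lp - rp + o + lw + rw
    # Assumption: an inexact division means no integer input size exists.
    if numerator % s != 0:
        raise ValueError("no integer input size satisfies the equation")
    return numerator // s
```

For example, input_size_downsampling(4, 2, 1, 1, 0, 0) gives 9, and input_size_upsampling(9, 2, 1, 1, 1, 1) gives 5; both can be checked by substituting back into the defining equations.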
               input_stride
C ****         1
B pp*-*-*-*    2
A **********   1
In the above diagram, data A begins one position to the left of data B, in
terms of A's stride. Then, things get more complicated. B's stride is 2, because
LayerAB is downsampling.
Data B begins 1/2 of a position to the left of data C, in terms of B's stride.
But the intuition behind the formula is pretty simple. Each new offset builds on
the previous one additively. At each new step, you are given an offset in terms of
the input, which in general is both padded (self.left_pad) and dilated (self.stride_ratio).
But, we don't want to confuse the stride of the output with the coordinate system that
left_ind spacing pad_stride sr pos pos_formula
C * * * 1 1 1/2 pos(B) + left_ind(C) *
B * - * - * 1 1 3 2 pos(A) + left_ind(B) * pad_stride(A)
A * * * * * * * * * * 0 2/3 1 0 0
|
0
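The additive rule in the pos_formula column, pos(B) = pos(A) + left_ind(B) * pad_stride(A), can be sketched as below. This is only an illustration: the function name and the tuple layout are hypothetical, the numbers in the usage note are made up rather than read from the table, and applying the same pattern to C is an assumption, since C's formula is cut off.

```python
from fractions import Fraction


def positions(layers):
    """Accumulate the starting position of each layer, input first.

    `layers` is a list of (name, left_ind, pad_stride) tuples.
    The first layer is placed at position 0; each later layer is offset
    from the previous one by left_ind * (previous layer's pad_stride).
    """
    pos = {}
    prev_pos = Fraction(0)
    prev_pad_stride = None
    for i, (name, left_ind, pad_stride) in enumerate(layers):
        if i == 0:
            pos[name] = Fraction(0)
        else:
            pos[name] = prev_pos + Fraction(left_ind) * prev_pad_stride
        prev_pos = pos[name]
        prev_pad_stride = Fraction(pad_stride)
    return pos
```

With illustrative values [("A", 0, 1), ("B", 1, 2), ("C", Fraction(1, 2), 1)], B lands at 0 + 1*1 = 1 and C at 1 + (1/2)*2 = 2. Using Fraction keeps half-position offsets exact.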
Central Idea:
An input tensor of elements is transformed in a series of steps, each producing
a new tensor. At each step, all of the tensor elements are regularly spaced and
have an associated physical coordinate (such as 'x').
Tensors may have two types of elements: 'value elements' and 'padding elements'.
The physical distance between any two consecutive elements of a given tensor,
ignoring their type, is called its 'spacing'. The physical distance between two
consecutive value elements (there may be intervening padding elements between
them) is called its 'value_spacing'.
A transformation is characterized by its stride_ratio, which equals
out.value_spacing / in.value_spacing.
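As a small illustration of the stride_ratio definition, the sketch below computes the ratio for one step and then chains ratios to get the value_spacing of each successive tensor. The function names are hypothetical, and representing the spacings as exact fractions is an assumption.

```python
from fractions import Fraction


def stride_ratio(in_value_spacing, out_value_spacing):
    """stride_ratio = out.value_spacing / in.value_spacing."""
    return Fraction(out_value_spacing) / Fraction(in_value_spacing)


def value_spacings(initial_spacing, ratios):
    """value_spacing of every tensor, starting from the input tensor."""
    spacings = [Fraction(initial_spacing)]
    for r in ratios:
        # Each transformation scales the value_spacing by its stride_ratio.
        spacings.append(spacings[-1] * Fraction(r))
    return spacings
```

A downsampling step that doubles the value_spacing has stride_ratio 2; an upsampling step that takes it from 3 to 1 has stride_ratio 1/3.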
Here, there are a few important concepts. First is the spacing_ratio between the
elements of a layer's output and input. This equals
output_spacing / input_spacing. Note, though, that there are two kinds of spacing
for each tensor: unpadded_spacing and padded_spacing.
The initial input spacing should be such that the output stride is a whole number.
So, this reduces to the problem of finding the multiple of the overall stride that
brings the output to a whole number. Actually, not only that, we need every intermediate
stride to be a whole number.
So, how to do that? Store each stride as a reduced ratio. For instance, your strides might be:
1, 2/3, 1/2, 1/4
And, so, you'd need to find the least common multiple of the denominators. Actually,
though, since the strides are all running products of either integers or reciprocal
integers, then we have:
We want to find the LCM of the denominators of all of the strides. This implies we need to
accumulate them as we return. Could they be calculated in another way, on the way down?
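The LCM idea above can be sketched as follows, assuming the strides have already been accumulated as reduced fractions. The function name is hypothetical, and math.lcm requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def whole_number_strides(strides):
    """Scale all strides by the LCM of their denominators.

    Fraction always stores a reduced ratio, so .denominator is the
    denominator the notes refer to.
    """
    multiple = lcm(*(s.denominator for s in strides))
    return multiple, [int(s * multiple) for s in strides]


strides = [Fraction(1), Fraction(2, 3), Fraction(1, 2), Fraction(1, 4)]
multiple, scaled = whole_number_strides(strides)
```

For the example strides 1, 2/3, 1/2, 1/4 the denominators are 1, 3, 2, 4, so the multiple is 12 and the scaled strides are 12, 8, 6, 3.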
A simple rule might be, on the way down:
Start assuming you are at unit stride. On the way down, an upsampling layer is no problem,
because it dilates the stride in the layer below. However, a downsampling layer contracts
the stride. So, you need to increase the multiple. Is there any reason why this wouldn't
work?
I think it works. Essentially, you want your low watermark to be 1. This is the densest
layer, and it could be anywhere. So, if you simply bump up the stride every time it
goes below 1, you will by definition have the densest layer at 1.
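A minimal sketch of the low-watermark rule follows. It assumes each layer met on the way down is summarized by a single factor that multiplies the running stride (greater than 1 when the layer dilates the stride below it, less than 1 when it contracts it); that representation and the function name are hypothetical. Dividing by the minimum at the end is equivalent to bumping the multiple whenever the stride dips below 1.

```python
from fractions import Fraction


def strides_with_densest_at_one(factors):
    """Running strides on the way down, rescaled so the minimum is 1."""
    running = [Fraction(1)]  # start assuming unit stride
    for f in factors:
        running.append(running[-1] * Fraction(f))
    low_watermark = min(running)
    # Rescale so that the densest layer sits exactly at stride 1.
    return [s / low_watermark for s in running]


example = strides_with_densest_at_one([Fraction(1, 2), 3, Fraction(1, 2)])
```

For the factors 1/2, 3, 1/2 the running strides are 1, 1/2, 3/2, 3/4, which rescale to 2, 1, 3, 3/2. Note that this pins the densest layer at 1 but does not by itself make every stride a whole number.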
So, the print function will not print on the way down. It will maintain a density
Several tasks are needed. But, mainly, we need to accumulate the following data for
every layer:
left_pad, right_pad, stride, input_size, local_bounds
All of these are available in one form or another; left_pad and right_pad are easy.
Maintain the min_stride along with them.
At the end, start printing. Simple enough.
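One possible shape for the accumulated per-layer data is sketched below. The class names LayerRecord and PrintAccumulator are hypothetical, and treating local_bounds as a (low, high) pair is an assumption; only the field names come from the list above.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple


@dataclass
class LayerRecord:
    left_pad: int
    right_pad: int
    stride: Fraction
    input_size: int
    local_bounds: Tuple[int, int]  # assumed to be a (low, high) pair


class PrintAccumulator:
    """Collects one record per layer and tracks the minimum stride seen."""

    def __init__(self) -> None:
        self.records: List[LayerRecord] = []
        self.min_stride: Optional[Fraction] = None

    def add(self, record: LayerRecord) -> None:
        self.records.append(record)
        if self.min_stride is None or record.stride < self.min_stride:
            self.min_stride = record.stride

    def normalized_strides(self) -> List[Fraction]:
        # Deferred until all layers are collected, so printing starts at the end.
        return [r.stride / self.min_stride for r in self.records]
```

Keeping the collection step separate from the final pass means nothing is printed during the recursion, and min_stride is known before any layout is computed.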
Another difficulty is, the recursion logic is intermixed with the measurement
logic. But, for the print function we need to use the measurement logic
in different ways.
So, what is the overall blueprint for print?
As we recurse, collect
Another issue is that of keeping track of strides. Each operation applies a stride ratio factor
to the existing stride. Some intermediate calculations involve fractional