Neural correlates of perceptual similarity masking in primate V1

Visual detection is a fundamental natural task. Detection becomes more challenging as the similarity between the target and the background in which it is embedded increases, a phenomenon termed ‘similarity masking’. To test the hypothesis that V1 contributes to similarity masking, we used voltage sensitive dye imaging (VSDI) to measure V1 population responses while macaque monkeys performed a detection task under varying levels of target-background similarity. Paradoxically, we find that during an initial transient phase, V1 responses to the target are enhanced, rather than suppressed, by target-background similarity. This effect reverses in the second phase of the response, so that in this phase V1 signals are positively correlated with the behavioral effect of similarity. Finally, we show that a simple model with delayed divisive normalization can qualitatively account for our findings. Overall, our results support the hypothesis that a nonlinear gain control mechanism in V1 contributes to perceptual similarity masking.


Searching for, and detecting, visual targets in our environment is a ubiquitous natural task that our visual system performs exceptionally well. A key feature of behavioral detection performance is that the texture similarity between the target and the background in which it is embedded profoundly affects target detectability. The more similar the target and the background, the harder it is to detect the target (Campbell & Kulikowski 1966, Foley 1994, Sebastian et al 2017, Stromeyer & Julesz 1972, Watson & Solomon 1997, Wilson et al 1983). This phenomenon, termed "similarity masking", is the foundation of camouflage.

An example of similarity masking is illustrated in Figure 1. Detecting a low-contrast oriented visual target is easy on a uniform gray background (Fig. 1A). Detectability decreases when the target has a similar orientation to the background (Fig. 1B). The neural basis of similarity masking is not well understood. The main goal of the current study was to test the hypothesis that neural interactions between the representations of the target and background in the primary visual cortex (V1) contribute to the perceptual effect of similarity masking. V1 responses are known to display nonlinear contextual interactions between the target and the background, suggesting that they could potentially contribute to behavioral masking effects. If nonlinear computations in V1 contribute to similarity masking, we would predict that the signals evoked by a target will be maximally reduced by a background that is similar to the target.

We tested this hypothesis by measuring V1 population responses in macaque monkeys while they performed a visual detection task under masking conditions (Fig. 1C-E). Because the nature of contextual modulations in V1 is complex, a second goal of our study was to quantitatively characterize the spatiotemporal dynamics of V1 population responses to different combinations of targets and backgrounds.
As a first step, we characterized the behavioral effects of similarity masking in two macaque monkeys, demonstrating clear effects of similarity on target detectability and reaction times. These results confirm that macaque monkeys are a good animal model for human similarity masking.

Second, we used voltage-sensitive dye imaging (VSDI; Grinvald & Hildesheim 2004, Seidemann et al 2002, Shoham et al 1999) to measure V1 population responses at two scales, the scale of the retinotopic map and the scale of orientation columns, while the monkeys performed the similarity-masking detection task. To study the effect of similarity masking on neural detection sensitivity, we constructed a task-specific decoder at each scale. Each decoder first pools the responses using a scale-dependent spatial template and then combines these responses over time to form a decision variable. The distributions of the decision variable in target-present vs. target-absent trials are used to compute a neural sensitivity that can be compared to behavioral sensitivity.

We found that V1 population responses to the target and background display two distinct phases: an initial transient phase that starts at response onset, and a second phase that lasts until stimulus offset or the animal's response. Surprisingly, the first phase displays a paradoxical effect; during this phase, the target-evoked response is strongest when the target and background are similar and is therefore anti-correlated with behavior. This effect reverses in the second phase, so that in this phase the target-evoked response is reduced with increased target-background similarity. V1 responses during this second phase are therefore consistent with behavior.
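The template-decoding scheme described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the array shapes, function name, and pooled-variance d′ estimator are assumptions:

```python
import numpy as np

def neural_dprime(resp_present, resp_absent, template):
    """Sketch of a task-specific template decoder (illustrative names).

    resp_present / resp_absent: (trials, pixels, frames) VSDI responses.
    template: (pixels,) spatial pooling template (retinotopic or columnar).

    Each trial is pooled with the spatial template, then summed over time
    frames to form a scalar decision variable; neural sensitivity is the
    d-prime separating target-present from target-absent distributions.
    """
    dv_p = (resp_present * template[None, :, None]).sum(axis=(1, 2))
    dv_a = (resp_absent * template[None, :, None]).sum(axis=(1, 2))
    pooled_sd = np.sqrt(0.5 * (dv_p.var(ddof=1) + dv_a.var(ddof=1)))
    return (dv_p.mean() - dv_a.mean()) / pooled_sd
```

With a hypothetical 0.2 ΔF/F offset on target-present trials and unit pixel noise, this decoder yields a neural d′ well above zero, which can then be compared directly against the behavioral d′ for the same condition.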
We also observed complex spatiotemporal dynamics of the population response to the target and background stimuli, including a repulsion of the V1 columnar-scale representation of target orientation in the direction away from the background orientation.

Finally, we show that a simple dynamic population gain-control model can qualitatively account for our physiological and behavioral results, and that the estimated properties of the gain-control mechanism are consistent with a principled computational approach to feature encoding and decoding. Overall, our results are consistent with the hypothesis that contextual interactions between the representations of the target and background in V1 are likely to contribute to the perceptual phenomenon of similarity masking.

Behavioral effect of target-background similarity masking
To study the neural basis of visual similarity masking, we trained two monkeys (Macaca mulatta) to perform a visual detection task in which a small horizontal target appeared on a larger background at a known location in half of the trials (Fig. 1E). The monkey indicated target absence by maintaining fixation, and target presence by making a saccadic eye movement to the target location as soon as it was detected. Within a block of trials, the contrast of the target and the background were fixed, while the orientation of the background varied randomly from trial to trial, allowing us to test for the effect of target-background orientation similarity on behavioral and neural detection sensitivities.

We tested the behavioral effect of similarity masking over five combinations of target and background contrasts (Fig. 2A). For each combination, we measured behavioral sensitivity as a function of background orientation. Performance with no background (uniform gray screen) served as a baseline (Fig. 2A).

We also examined the effect of target-background orientation similarity on the monkeys' reaction times (Fig. 2C). We find two distinct effects of orientation similarity on reaction times. At higher target and background contrasts, reaction times are maximal when the background and target have the same orientation (when detectability is lowest and the task is hardest) and monotonically decrease as target-background similarity decreases (detectability increases and the task becomes easier). Surprisingly, at lower target and background contrasts, reaction times are low when the background matches the target orientation, then increase as the background-target orientation difference increases, and then drop again when the background approaches the orientation orthogonal to the target.
Thus, under these conditions, we see an interesting decoupling between difficulty and reaction time, such that reaction times can be shortest in the harder conditions. This surprising effect is present in both monkeys.

Our next goal was to test the hypothesis that contextual interactions between the representations of the target and background in V1 contribute to the observed behavioral similarity-masking results.

Neural population responses to target and background stimuli in macaque V1
While the monkeys performed the similarity-masking detection task, we used VSDI to measure V1 population responses to the target and the background. In each cranial window, we first used a fast and efficient VSDI protocol to obtain a detailed retinotopic map (Yang et al 2007). We then positioned the target so that its neural representation fell at the center of our imaging area.

The target elicits V1 population activity at two fundamental spatial scales. At the large retinotopic scale, the target evokes an activity envelope that spreads over several mm² and is well fitted by a two-dimensional (2D) Gaussian (Fig. 3B, top row; Chen et al 2006, Chen et al 2012, Sit et al 2009). Our 8x8 mm² imaging area allows us to capture this entire target-responsive region. Because the background is much larger than the target and is centered at the same location in the visual field, it produces a relatively uniform response within the imaging area (Fig. 3B, second row). Similarly, the target-plus-background stimulus elicits activity within the entire imaged area, with relatively elevated activity at the retinotopic region corresponding to the target location (Fig. 3B, third row). However, the target-evoked response in the presence of the background (response to target plus background minus response to the background alone) appears significantly weaker than the response to the target alone (Fig. 3B; see Table 1 and Fig. S1 for the full set of tested target/background combinations). When presented on a uniform gray background, the target-related retinotopic signal begins to rise ~40 ms after target onset, reaches its peak ~100 ms after stimulus onset, and remains high for the next 100 ms (Fig. 4B,H, black curve). However, when the same target is added to the background, the target-related retinotopic signals display a wide range of responses that depend on background orientation (Fig. 4B,H, colored curves).

Our main interest here is in the target-evoked response in the presence of the background (Fig. 4C,I), which can be extracted by subtracting the response to the background alone (Fig. 4A,G) from the response to the target plus background (Fig. 4B,H). If V1 contextual interactions at the retinotopic scale contribute to the behavioral effect of similarity masking, we would expect target-evoked responses to be weakest with high target-background similarity (similar target and background orientations) and strongest with low target-background similarity (orthogonal target and background orientations). Surprisingly, we find that the target-evoked response in V1 displays two distinct phases, with an early phase showing a paradoxical neural dependence on target-background orientation similarity that is anti-correlated with the behavioral masking effect, and a later phase that is consistent with the behavioral masking effect. Specifically, in the early phase, which starts at response onset, the target-evoked response is highest when the background matches the target orientation, even though behaviorally this is the condition in which detection performance is worst. However, after this initial phase, the high-similarity target-evoked response starts to drop, while the low-similarity target-evoked response continues to build up, so that in the later stages the target-evoked response is strongest on the dissimilar background and weakest on the similar background, consistent with the behavioral effect of similarity.
To quantify the relation between the neural and behavioral effects of similarity masking, we computed the correlation between the effect of orientation similarity on behavior and its effect on the integrated neural responses. At the retinotopic scale, we find a paradoxical negative correlation between the early neural V1 response and behavior, a weak positive correlation between the late neural V1 response and behavior, and no correlation between behavior and the integrated neural response.

Because the target and background are defined by their orientation, the correspondence between the neural signals in V1 and behavior may be better captured by V1 responses at the columnar scale. Our next step was therefore to examine the dynamics of the columnar-scale target-evoked responses in V1.

Neural effects of target-background similarity masking at the scale of orientation columns
To study the effect of similarity masking on V1 responses at the columnar scale, we developed a linear columnar decoder of the VSDI signals (Fig. 3E). The columnar decoder takes into account the location of the orientation columns within the retinotopic envelope of the target-evoked response. Because the target is horizontal, the output of the columnar template is expected to be positive for the horizontal target and background stimuli and negative for the vertical background stimulus (since the horizontal and vertical columnar maps are anti-correlated).

As with the output of the retinotopic-scale template, the output of the columnar-scale template displays two distinct phases. Figure 5 shows the time course of the columnar template signals in response to the background alone (Fig. 5A,G), the target plus background (Fig. 5B,H), and the target in the presence of the background (Fig. 5C,I). In the early phase, the target-evoked response is highest when the background and target have similar orientations, producing a paradoxical neural response that is anti-correlated with the behavioral masking effect (Fig. 5D,E,J,K). In the second phase the trend reverses, and the target-evoked response is strongest on the dissimilar background and weakest on the similar background, consistent with the behavioral effect of similarity (Fig. 5D,E,J,K). Similar results were obtained with other target and background contrast combinations (Fig. S2).

Because the first phase of the response is shorter than the second phase, when the V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect. Therefore, our results suggest that the neural masking effect at the columnar scale in V1 could play a major role in the behavioral similarity-masking effects.

Dynamics of columnar-scale orientation population trajectories
Our decoding analysis focuses on the columnar-scale orientation signals along the 0°-90° axis and reveals complex columnar-scale dynamic interactions between the target-evoked response and the response evoked by the background (Fig. 5). To examine these dynamics in more detail, we performed two types of population-vector analyses (Fig. 6). First, we assigned each pixel within the retinotopic footprint of the target-evoked response to one of 12 equally spaced preferred orientations, creating 12 orientation-selective clusters of pixels. We then computed for each stimulus the response of each orientation-selective cluster in each frame, and displayed the population orientation tuning curve as a function of time (Fig. 6C) and as a dynamic population-vector trajectory in the polar space spanned by the 12 orientations (Fig. 6D); i.e., the orientation θ and magnitude R_θ of the peak of the population response over time.

The population-vector analysis reveals that V1 responses to the target or background alone are consistent with the stimulus orientation. In background-only trials, shortly after stimulus onset, the peak of the population tuning curve closely matches the background orientation. In target-plus-background trials, however, we observe complex target-background interactions, which can lead to a population tuning peak (white curve) that significantly deviates from the target orientation. For example, in some conditions, we observe an orientation tuning peak that is repelled from the target orientation in the direction away from the background orientation. An interesting goal of future studies would be to examine potential perceptual correlates of these interactions.

To further examine the dynamics of the population vector, we plotted the response trajectories for each stimulus using the vector representation in polar coordinates (Fig. 7B-E, G-J).
After stimulus onset, the population vector for background only moves in the direction corresponding to the background orientation (Fig. 7B,G), and for target only moves in the direction corresponding to the target (Fig. 7E,J). The trajectories in the target-plus-background conditions are more complex. For example, when the background orientation is at ±45° to the target, the population response is initially dominated by the background, but then, in mid-flight, the population response changes direction and turns toward the direction of the target orientation. Such complex interactions can be used to constrain models of V1 population response.
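The population-vector readout over the 12 orientation-selective clusters can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code; because orientation is periodic over 180°, preferred orientations are doubled before the complex vector sum, and the decoded angle is halved afterward:

```python
import numpy as np

def population_vector(cluster_resp, thetas_deg):
    """Decode orientation and magnitude from orientation-cluster responses.

    cluster_resp: (12,) mean response of each orientation-selective cluster.
    thetas_deg: (12,) preferred orientations in degrees (0, 15, ..., 165).

    Returns (decoded orientation in degrees, vector magnitude R).
    """
    ang = np.deg2rad(2.0 * np.asarray(thetas_deg))       # double the angle
    z = np.sum(np.asarray(cluster_resp) * np.exp(1j * ang))
    theta = (np.rad2deg(np.angle(z)) / 2.0) % 180.0      # halve it back
    return theta, np.abs(z)
```

Applying this frame by frame yields the (θ, R) trajectories plotted in polar coordinates; a response profile peaked at 45° decodes to exactly 45° because the uniform (untuned) component of the 12 equally spaced clusters cancels in the vector sum.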

Dynamic gain control model qualitatively captures similarity masking effects in V1
Our next goal was to determine whether the observed interactions between the background- and target-evoked responses can be qualitatively captured by a gain-control model. In this model, each orientation column's response is tuned to one of 12 equally spaced orientations. The responses of each orientation column were specified by the simple normalization model summarized in Figure 8. Specifically, the spatiotemporal input stimulus generates an excitation signal and a normalization signal that are both linear with the input root-mean-square (rms) contrast. The normalization signal is then combined with a normalization constant to obtain the normalization factor. The normalized response is obtained by dividing the excitation signal by the normalization factor. The final response is then obtained by applying a response exponent, which is similar to applying a spiking nonlinearity. Importantly, the excitation and normalization signals can differ in their spatial extent, orientation tuning width, and temporal impulse response (see Methods for model parameters).

We find that this simple model qualitatively captures our key results. First, in response to the background alone (Fig. 8B), the modeled population vector peaked at ~100 ms after stimulus onset and then dropped to a lower amplitude, as in our data (Fig. 5A,G). This reduction in response amplitude was due to a normalization signal that was delayed relative to the excitation signal. Second, as in the real data, the response to the target plus background is less than the sum of the responses to each component separately. Third, as in our results, the target-evoked response in the presence of the background is biphasic, having a brief early component in which the response is enhanced by target-background similarity, and a longer-lasting late component in which the response is suppressed by target-background similarity (Fig. 8D).
This leads to an early phase in which the response is anti-correlated with the behavioral effect of similarity masking, and a late phase that is positively correlated with the behavioral effect of similarity masking (Fig. 8E). Finally, this simple model can also display the curved trajectories of the population vector in response to the target plus background (compare Fig. 8G to Fig. 7C,H).
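The core computation of the model — excitation divided by a delayed normalization signal, then raised to an exponent — can be sketched for a single column as follows. This is a minimal illustration, not the fitted model: the parameter values and function names are assumptions, and the full model's normalization pool is additionally orientation tuned and spatially extended:

```python
import numpy as np

def simulate_column(drive, tau_e=20.0, tau_n=60.0, sigma=0.1, p=2.0, dt=1.0):
    """Delayed divisive normalization for one orientation column (sketch).

    drive: (T,) stimulus drive to the column (e.g. rms contrast weighted
    by the column's orientation tuning), sampled every dt ms.

    Excitation E and normalization N are low-pass-filtered copies of the
    drive; tau_n > tau_e delays the normalization signal, producing a
    transient overshoot followed by a lower sustained response.
    """
    def lowpass(x, tau):
        # First-order (exponential) low-pass filter, Euler-integrated.
        y = np.zeros_like(x)
        for t in range(1, len(x)):
            y[t] = y[t - 1] + (dt / tau) * (x[t] - y[t - 1])
        return y

    E = lowpass(drive, tau_e)      # fast excitatory signal
    N = lowpass(drive, tau_n)      # slower, delayed normalization signal
    return (E / (sigma + N)) ** p  # divide, then apply response exponent
```

For a step in drive, the response overshoots early (while N is still small) and then settles to a lower sustained level, qualitatively reproducing the transient-then-sustained shape of the measured population responses.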

To test the hypothesis that nonlinear computations in V1 contribute to the perceptual effect of similarity masking, we used voltage-sensitive dye imaging (VSDI) to measure neural population responses from V1 in two macaque monkeys while they performed a visual detection task in which a small oriented target was detected in the presence of a larger background of varied orientations. Like human observers, the monkeys were strongly affected by the orientation similarity of the target and the background. Their detection threshold increased with increased target-background orientation similarity, while their reaction times showed a complex, and in some cases non-monotonic, dependency on target-background orientation similarity.

To quantify the neural effects of similarity masking, we measured neural sensitivity to the target at the two fundamental spatial scales of V1 topographic representations: the large scale of the retinotopic map and the finer scale of the columnar orientation map. We discovered that at both scales, V1 population responses to the target and background display two distinct phases. In an initial transient phase, the target-evoked V1 response is strongest when the target and background have similar orientations. At this early phase, V1 responses are therefore paradoxically anti-correlated with the behavioral effect of similarity masking. In the second phase, the masking effect reverses, and the target-evoked response is maximally reduced when the target and background are similar. In this second, sustained phase, V1 population responses are therefore consistent with the behavioral similarity-masking effect.
The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar-scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar-scale signals are a better predictor of behavioral performance in a detection task.

We find that when the target and background have similar orientations, columnar-scale information about the target is restricted to the first phase of the response and then largely disappears during the second phase of the response. These physiological results could be related to the surprising mismatch between task difficulty and reaction times. Rather than having reaction times that monotonically increase with task difficulty, in our masking detection task, reaction times can be shortest when target and background orientations match, even though it is hardest to detect the target under these conditions. The short reaction time to this stimulus may be the consequence of the target information being best represented in the early phase of the response. An important goal for future studies would be to test for this possibility.

Using the population-vector analysis, we find that columnar-scale V1 representations are initially dominated by the orientation of the background. The target orientation then appears in the second phase of the response, which leads to curved population-vector trajectories (Fig. 7C,H). To better understand these nonlinear dynamics of V1 responses, we tested whether a simple dynamic gain-control model could account for our findings (Fig. 8).
We find that a simple gain-control model can qualitatively account for our results, but that in order to do so, the model has to display two important properties. First, to account for the biphasic nature of the V1 response, the divisive normalization signal has to be delayed relative to the excitatory signal. Second, in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective. Because in primates and carnivores, robust orientation selectivity first emerges in V1 (Hubel & Wiesel 1959, Hubel & Wiesel 1968), these results suggest that a significant portion of the nonlinear interactions observed in the current study originate in V1 rather than being inherited from the ascending inputs that V1 receives from the LGN. While our experimental and computational results point to a delayed gain-control signal that operates at the level of V1, they do not directly speak to the circuit and biophysical mechanisms that contribute to the implementation of this gain control in V1. Multiple candidate mechanisms for implementing such gain control have been proposed.

This work was supported by NIH grants EY-016454 to E.S., EY-024662 to W.S.G. and E.S., BRAIN U01-NS099720 to E.S. and W.S.G., and DARPA-NESD0-N66001-17-C-4012 to E.S.

(C-E) Orientation masking was assessed in two awake, behaving macaque monkeys performing a target detection task. Monkeys commence the task by fixating on the small bright square (C). A few moments later, a 4° raised-cosine-masked background grating was flashed at ~3° eccentricity for target detection (D). In 50% of the trials, a small additive horizontal Gabor target was also added to the background (E). The monkey indicated the presence of the target by making a saccade to the target location, and indicated target absence by maintaining gaze at the fixation point.
The Gabor target was always the same: a cosine-centered, horizontal Gabor at 4 cpd with a 0.33° FWHM envelope. The background grating was also cosine-centered at 4 cpd, such that the background completely aligned with the target when they were the same orientation (as in B). The orientation of the grating ranged from 0° to 90° with respect to the Gabor target and was randomized between trials. Bg, background; TBg, target plus background.

(F) Behavioral performance in d' was calculated as described in Figure 2. Marker size indicates the number of trials tested for each orientation. Data were pooled across 8 experiments from both monkeys (see Table 1). (G-L) Same as (A-F) for recordings with 24% target contrast (T24) and 12% background contrast (Bg12). Similar trends were observed.

The correlation between each frame of the retinotopic response in (B) and the overall behavioral performance was computed for each combination of background and target contrasts (Figure 2A, pooled across monkeys). Red dots indicate frames reaching statistical significance (p < 0.05, t-test for correlation coefficient, see Methods). The vertical offset of the gray horizontal line indicates the correlation between the integrated response (50-200 ms) and the behavior. Data were pooled across experiments from both monkeys (see Table 1).

Same as Figure S1, with the averaged response examined at the columnar scale. The columnar template extracts the relative strength of neural activity aligned and orthogonal to the target orientation. A positive response represents relatively stronger activation of the neurons tuned to the target orientation (0°), and a negative response represents stronger activation of neurons tuned to the orthogonal orientation (90°). Data pooling and counts are the same as reported in Figure S1.

All procedures were approved by the University of Texas Institutional Animal Care and Use Committee and conform to NIH standards.

Widefield Voltage-Sensitive Dye Imaging

The experimental technique for widefield voltage-sensitive dye (VSD) imaging of neural responses in awake, behaving macaques was adapted from previous studies (Bai et al.).

Monkeys were trained to detect a small additive horizontal Gabor target (4 cpd, with σ = 0.14°, 0.33° FWHM envelope) centered on a sinusoidal grating background mask of the same spatial frequency (4° raised-cosine windowed). The background grating was oriented at 0°, ±15°, ±30°, ±45°, ±60°, and 90° from the Gabor target orientation. Both the Gabor and the background grating were bright-centered; that is, the 0°-orientation background was completely in phase with the target. The contrast of the target and background were varied in combinations of levels, reported in Michelson contrast.

For each experiment, a Fixation recording block and a Detection recording block were made using the same target and background conditions. In both blocks, the target and background were centered at 1.6° to 3° eccentricity from the fixation point and -40° to -65° (from the right horizontal meridian) in the visual hemifield corresponding to the working cortical chamber. At these coordinates, the spatial extent of the target was fully imaged through the cortical window, and the larger oriented background uniformly activated the entire imaging area.

In the Fixation block, the monkeys were required to remain fixated for each imaging trial while either the target or full-field sinusoidal gratings at 100% contrast were flashed at 5 Hz (60 ms ON, 140 ms OFF) for 1.0 s. These recordings were processed to obtain retinotopic and columnar orientation response maps that were used to decode the detection recording responses from the same experiment day (Fig. 3C-D).
In the Detection recording blocks, a background with random orientation appeared on every trial, with a 50% chance of an accompanying Gabor target (Fig. 1C-E). The monkeys were tasked with reporting the presence of the target. Each trial began with fixation on a bright 0.1° square. An auditory tone and the dimming of the fixation square cued the monkey to the start of the detection task trial. 250 ms later, the background, with or without the target, was presented. The monkeys were trained to maintain gaze at the fixation cue on target-absent trials, or to saccade to and hold gaze (for 150 ms) at the target position to indicate target detection (with a 75 ms minimum allowed reaction time). When the target was present, it remained on screen for a maximum of 250 ms or was extinguished immediately upon the monkey's saccade initiation. The monkey was given 600 ms to make the saccade or to hold fixation, and was subsequently rewarded on correct choices: stay (correct reject) on target-absent trials, or saccade to target (hit) on target-present trials. The target and background contrast levels were fixed for each recording block. The probabilities of each oriented background and of target presence were balanced for each recording block.

A separate target-only Detection block on a uniform gray background was also taken on each experiment day using the same routine as above. This block was later used as the reference data to normalize response amplitudes across experiment days.

Experiments were conducted with custom code using the TEMPO real-time control system (Reflective Computing). The visual stimulus was presented on a Sony CRT (1024x768 @ 100 Hz), at a distance of 108 cm from the animal (50 pixels per degree), with mean luminance 50 cd/m². The visual stimulus was generated using in-house real-time graphics software (glib).
Behavior Performance and Reaction Time

Behavioral performance was calculated for each background orientation and reported as the detection sensitivity index d' (d-prime). D-prime and criterion were estimated as:

d' = Φ⁻¹(p(hit)) − Φ⁻¹(p(fa));  c = −[Φ⁻¹(p(hit)) + Φ⁻¹(p(fa))] / 2

where Φ⁻¹( ) is the inverse of the cumulative normal distribution, and p(hit) and p(fa) represent the proportions of hits and false alarms, respectively. To avoid leaving out the data in some conditions (e.g., when there are no false alarms), we scaled all the proportions to be between 0.005 and 0.995 (p̃ = 0.005 + 0.99 · p).

The mapping between the unbiased percentage of correct responses and d' is as follows and is depicted in Figure 2B:

d' = 2 · Φ⁻¹(PC);  PC = Φ(d'/2)

D-prime performance across orientations was fitted with an inverted, dc-shifted Gaussian. The reaction times of the animals were calculated from stimulus onset to the onset of the saccade. Consequently, there was no measurement of reaction time on trials in which the animal remained fixated at the fixation point.
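The d', criterion, and percent-correct formulas above, together with the 0.005-0.995 proportion rescaling, amount to the following (a sketch using Python's standard-library NormalDist; function names are illustrative):

```python
from statistics import NormalDist

def dprime_criterion(p_hit, p_fa):
    """d-prime and criterion from hit and false-alarm proportions,
    with proportions rescaled to [0.005, 0.995] to avoid infinities."""
    z = NormalDist().inv_cdf          # inverse cumulative normal
    ph = 0.005 + 0.99 * p_hit
    pf = 0.005 + 0.99 * p_fa
    d = z(ph) - z(pf)
    c = -0.5 * (z(ph) + z(pf))
    return d, c

def pc_unbiased(d):
    """Unbiased percent correct for a given d': PC = Phi(d'/2)."""
    return NormalDist().cdf(d / 2.0)
```

For example, a symmetric condition with 90% hits and 10% false alarms yields d' ≈ 2.5 with criterion ≈ 0.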

VSD Imaging
For each trial, an image sequence was captured for a total of 1.2 seconds, including pre-stimulus and post-stimulus frames. The image sequence was analyzed to extract the response using a variant of previously reported routines.

Image stabilization was introduced as the first stage of pre-processing to de-accentuate blood-vessel edges in the ΔF/F response map caused by micro-movements of the camera and/or the cortex during imaging. The image intensity across time at each individual pixel p was modeled with separable motion-free (f_p(t)) and motion-related (a⃗_p · m⃗(t)) components as follows:

I_p(t) = f_p(t) + a⃗_p · m⃗(t)

For each trial, a single global motion vector m⃗(t) was obtained by estimating the translational motion of the center portion of the images (1/4 of the imaging area). The motion coefficients a⃗_p for each pixel were then obtained using least-squares fitting to the model. The motion-corrected image is I_p(t) − a⃗_p · m⃗(t). This approach to image stabilization (compared to the traditional image-registration approach) has the advantage of correcting for non-rigid movements (rotations, expansions/contractions, affine transformations, local distortions, etc.) and sub-pixel motion.
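Given the global motion trace, the per-pixel least-squares fit can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; traces are mean-centered before the fit so that the static part of the motion-free component is not absorbed into the motion coefficients:

```python
import numpy as np

def stabilize(frames, motion):
    """Motion-coefficient image stabilization (sketch).

    frames: (T, H, W) image sequence I_p(t).
    motion: (T, 2) global motion vector m(t) for the trial.

    Fits each pixel's time course to I_p(t) = f_p(t) + a_p . m(t) by least
    squares and returns the corrected sequence I_p(t) - a_p . m(t).
    """
    T = frames.shape[0]
    X = frames.reshape(T, -1)                      # (T, pixels)
    mc = motion - motion.mean(axis=0)              # centered motion trace
    Xc = X - X.mean(axis=0)                        # centered pixel traces
    A, *_ = np.linalg.lstsq(mc, Xc, rcond=None)    # (2, pixels): a_p per pixel
    return (X - mc @ A).reshape(frames.shape)      # remove motion-related part
```

On synthetic data built exactly from this model (a static image plus a motion-driven component), the correction removes essentially all temporal variance at every pixel.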

Retinotopic and Columnar Template Decoding
Template decoding was used to summarize the retinotopic and columnar responses for each image frame. The retinotopic response map of the target and the columnar orientation map of the imaging area were estimated from the Fixation recording blocks, in which responses were stimulated with visual presentations at 5 Hz. The preprocessing steps for the Fixation recording blocks were: image stabilization, 5 Hz FFT response extraction, ΔF/F normalization, then down-sampling to 128x128 pixels (from 512x512). Baseline fluorescence (F₀) was estimated as the average fluorescence over frames -80 to 0 ms relative to stimulus onset. ΔF/F was calculated as:

ΔF/F = (F − F₀) / F₀

The retinotopic response map of the Gabor target was estimated by fitting a 2D Gaussian over the 5 Hz flashing-Gabor amplitude response. The full-ROI (8x8 mm² imaging area) columnar orientation response maps were estimated from the flashing full-field gratings, where the 5 Hz FFT grating response amplitudes were bandpass filtered from 0.8 to 3.0 cycles/mm and the orientation tuning of each pixel was estimated as described previously (Chen et al 2012). Subsequently, the full-ROI orientation map was windowed by the retinotopic map to co-localize the retinotopic and columnar decoders. The columnar map comprises a pixelwise response magnitude (R(x)) and tuning angle (φ(x)), represented in modified Euler form:

M(x) = R(x) · e^(i·2φ(x))

The p-values for the weighted coefficients were estimated using the standard t-score, replacing the degrees of freedom with an entropy-based effective estimate from the weights.

Two animals were examined to verify the consistency of the experimental approach and results. Multiple recordings were made from the same animals. The number of recordings was based on previous experience; no statistical method was used to predetermine sample size.
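The ΔF/F normalization step described above amounts to the following (a minimal sketch; argument names and the time-axis convention are assumptions):

```python
import numpy as np

def delta_f_over_f(frames, t_ms, baseline_win=(-80, 0)):
    """dF/F normalization: F0 is the mean fluorescence over the baseline
    window (default -80 to 0 ms relative to stimulus onset), and the
    normalized response is (F - F0) / F0.

    frames: (T, ...) fluorescence frames; t_ms: (T,) frame times in ms
    relative to stimulus onset.
    """
    mask = (t_ms >= baseline_win[0]) & (t_ms < baseline_win[1])
    f0 = frames[mask].mean(axis=0, keepdims=True)   # baseline fluorescence F0
    return (frames - f0) / f0
```

For example, a pixel resting at 100 fluorescence units that rises to 102 after stimulus onset yields a ΔF/F of 0.02 (2%).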

Data and Code Availability
The data and custom code from this study will be made available upon manuscript acceptance.