Cross-View Image Synthesis using Conditional GANs


[Project] [Paper]


Learning to generate natural scenes has always been a challenging task in computer vision. It is even more painstaking when the generation is conditioned on images with drastically different views. This is mainly because understanding, corresponding, and transforming appearance and semantic information across views is not trivial. In this paper, we attempt to solve the novel problem of cross-view image synthesis, aerial to street view and vice versa, using conditional generative adversarial networks (cGANs). Two new architectures, Crossview Fork (X-Fork) and Crossview Sequential (X-Seq), are proposed to generate scenes at resolutions of 64x64 and 256x256 pixels. The X-Fork architecture has a single discriminator and a single generator; the generator hallucinates both the image and its semantic segmentation in the target view. The X-Seq architecture utilizes two cGANs: the first generates the target image, which is then fed to the second cGAN to generate its corresponding semantic segmentation map. The feedback from the second cGAN helps the first generate sharper images. Both proposed architectures learn to generate natural images as well as their semantic segmentation maps, and they capture and maintain the true semantics of objects in the source and target views better than the traditional image-to-image translation method, which considers only the visual appearance of the scene. Extensive qualitative and quantitative evaluations against two state-of-the-art methods support the effectiveness of our frameworks for natural scene generation across drastically different views.
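The two architectures differ only in how the segmentation map is produced. A minimal dataflow sketch in Python (function names and the arithmetic stand-ins are hypothetical; the real generators are the Lua/Torch networks in models.lua):

```python
import numpy as np

def x_fork_generator(source):
    """X-Fork: a single generator whose shared trunk forks into two heads,
    emitting the target image and its segmentation map together."""
    shared = source * 0.5           # stand-in for the shared encoder/decoder trunk
    target_image = shared + 0.1     # image head
    target_segmap = shared - 0.1    # segmentation head
    return target_image, target_segmap

def x_seq_generators(source):
    """X-Seq: two chained cGAN generators; G1 synthesizes the target image,
    and G2 consumes that image to synthesize its segmentation map. During
    training, G2's feedback pushes G1 toward sharper images."""
    target_image = source * 0.5 + 0.1       # stand-in for G1
    target_segmap = target_image - 0.2      # stand-in for G2, conditioned on G1's output
    return target_image, target_segmap

aerial = np.zeros((3, 256, 256))            # dummy 256x256 aerial view
fork_img, fork_seg = x_fork_generator(aerial)
seq_img, seq_seg = x_seq_generators(aerial)
```

In both cases the target view and its segmentation map are predicted jointly; X-Fork shares parameters in one network, while X-Seq splits the work across two cGANs chained in sequence.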


Our code is borrowed from pix2pix. The data loader is modified to handle images and semantic segmentation maps.


Getting Started

  • Install the dependencies:
luarocks install nngraph
luarocks install
  • Clone this repo:
git clone
cd cross-view-image-synthesis
  • Train the model:
DATA_ROOT=./datasets/AB_AsBs name=sample_images which_direction=a2g phase=sample th train_fork.lua
  • For CPU-only training:
DATA_ROOT=./datasets/AB_AsBs name=sample_images which_direction=a2g phase=sample gpu=0 cudnn=0 th train_fork.lua
  • Test the model:
DATA_ROOT=./datasets/AB_AsBs name=sample_images which_direction=a2g phase=sample which_epoch=35 th test_fork.lua

The test results will be saved to: ./results/sample_images/35_net_G_sample/images/.

Training and Test data


The original datasets are available here:

  1. GT-CrossView
  2. CVUSA

Ground-truth semantic segmentation maps are not available for these datasets. We used RefineNet trained on Cityscapes to generate semantic segmentation maps and used them as ground-truth segmentation maps in our experiments. Please cite their papers if you use the dataset.

Train/test splits for the Dayton dataset can be downloaded here: Dayton.

Generating Pairs

Refer to pix2pix for steps and code to generate pairs of images required for training/testing.

First concatenate the street-view and aerial images, then concatenate their segmentation maps, and finally concatenate the two results along the columns. Each concatenated image file in the dataset will contain {A,B,As,Bs}, where A = street-view image, B = aerial image, As = segmentation map of the street-view image, and Bs = segmentation map of the aerial image.
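Assuming all four views share the same height and width, the column-wise concatenation can be sketched with NumPy (the function name and dummy arrays are illustrative, not part of the repo):

```python
import numpy as np

def make_pair(a, b, a_seg, b_seg):
    """Concatenate the street-view image (A), aerial image (B), and their
    segmentation maps (As, Bs) side by side, yielding one {A,B,As,Bs} image."""
    assert a.shape == b.shape == a_seg.shape == b_seg.shape
    return np.concatenate([a, b, a_seg, b_seg], axis=1)  # axis 1 = columns (width)

# Dummy 256x256 RGB arrays standing in for real dataset files.
a, b, a_s, b_s = (np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4))
combined = make_pair(a, b, a_s, b_s)
print(combined.shape)  # (256, 1024, 3)
```

The data loader then splits each combined file back into its four 256-pixel-wide panels at load time.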


To train a model:

DATA_ROOT=/path/to/data/ name=expt_name which_direction=a2g th train_fork.lua

Switch a2g to g2a to train in opposite direction.

Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoint_dir=your_dir in train_fork.lua).

See opt in train_fork.lua for additional training options.


To test a model:

DATA_ROOT=/path/to/data/ name=expt_name which_direction=a2g phase=val th test_fork.lua

This will run the model named expt_name in direction a2g on all images in /path/to/data/val.

Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir=your_dir in test_fork.lua).

See opt in test_fork.lua for additional testing options.


Pretrained models can be downloaded here.

[X-Pix2pix] [X-Fork] [X-Seq]

Place the models in ./checkpoints/ after the download has finished.


Some qualitative results on GT-CrossView Dataset:


CVPR Poster



If you use this code for your research, please cite our paper:


Please contact: ''
