We have a CSV file with two columns: longitude and latitude. Each coordinate pair marks the center of a volcano somewhere in the world, and the dataset contains 1,509 volcanoes. The original coordinate reference system is geographic (longitude and latitude) on the WGS84 datum. We want to transform these points to World Mercator. Transforming each coordinate by hand, as we did in the earlier notebooks, would take far too long, so the code below reads the CSV file and writes the projected coordinates to a new CSV file.
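
To make concrete what the transformation does to a single point, here is a minimal sketch of the forward ellipsoidal Mercator formula on the WGS84 ellipsoid, which is the projection behind World Mercator (EPSG:3395). The helper name `world_mercator` and the sample coordinate are illustrative assumptions, not part of the notebook; for real work the pyproj-based cells are the tool to rely on.

```python
import math

# WGS84 ellipsoid constants (standard published values)
WGS84_A = 6378137.0                           # semi-major axis in meters
WGS84_F = 1 / 298.257223563                   # flattening
WGS84_E = math.sqrt(WGS84_F * (2 - WGS84_F))  # first eccentricity


def world_mercator(lon_deg, lat_deg):
    """Hypothetical helper: forward ellipsoidal Mercator on WGS84.

    Takes longitude and latitude in degrees, returns (x, y) in meters.
    """
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    e_sin = WGS84_E * math.sin(phi)
    # x grows linearly with longitude
    x = WGS84_A * lam
    # y stretches with latitude; the second factor is the ellipsoid correction
    y = WGS84_A * math.log(
        math.tan(math.pi / 4 + phi / 2)
        * ((1 - e_sin) / (1 + e_sin)) ** (WGS84_E / 2)
    )
    return x, y


# Illustrative input: approximate position of Vesuvius (lon, lat in degrees)
x, y = world_mercator(14.426, 40.821)
print(x, y)
```

The result is in meters, which is why the projected file contains large numbers where the source file has degrees.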

Check that the input and output paths (in_path and out_path in the first cell, src_dir and dst_dir in the second) match the directory that holds the CSV file. In this example, the volcanoes file (volc_longlat.csv) is in the directory data/ch2-5. Run the code; the process is finished when the message "process completed" and the execution time are printed:

In [1]:
# Old API
# Import libraries
import csv, pyproj
from functools import partial
from os import listdir, path

# Time the execution of the code
import time
start_time = time.time()

# Remove warnings
import warnings
warnings.simplefilter('ignore')

# Define some constants at the top

lon = 'LONGITUDE' #name of longitude field in original files
lat = 'LATITUDE' #name of latitude field in original files
f_x = 'x' #name of new x value field in new projected files
f_y = 'y' #name of new y value field in new projected files
in_path = path.abspath('../data/ch2-5') #input directory
out_path = path.abspath('../data/ch2-5') #output directory
input_projection = 'EPSG:4326' #WGS84
output_projection = 'EPSG:3395' #World Mercator

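# Get CSVs to reproject from input path, skipping outputs of earlier runs
# (in_path and out_path are the same directory here)
files = [f for f in listdir(in_path)
         if f.endswith('.csv') and '_project' not in f]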

# Define partial function for use later when reprojecting
project = partial(
    pyproj.transform,
    pyproj.Proj(init=input_projection),
    pyproj.Proj(init=output_projection))

for csvfile in files:
    # Open a writer, appending '_project' onto the base name
    with open(path.join(out_path, csvfile.replace('.csv','_project.csv')), 'w') as w:
        # Open the reader
        with open(path.join( in_path, csvfile), 'r') as r:
            reader = csv.DictReader(r, dialect='excel')
            # Create new fieldnames list from reader
            # replacing lon and lat fields with 
            # x and y fields
            fn = [x for x in reader.fieldnames]
            fn[fn.index(lon)] = f_x
            fn[fn.index(lat)] = f_y
            writer = csv.DictWriter(w, fieldnames=fn)
            # Write the output
            writer.writeheader()
            for row in reader:
                x,y = (float(row[lon]), float(row[lat]))
                try:
                    # Add x,y keys and remove lon, lat keys
                    # project point
                    row[f_x], row[f_y] = project(x, y) 
                    row.pop(lon, None)
                    row.pop(lat, None)
                    writer.writerow(row)
                except Exception as e:
                    # If coordinates are out of bounds, 
                    # skip row and print the error
                    print (e)
print('process completed')
end_time = time.time()
print("it took {} seconds to run the code".format(end_time-start_time))

process completed
it took 162.9448902606964 seconds to run the code


In [2]:
# New API
# Import libraries
import csv, time
from os import path
from pyproj import Transformer, CRS

src_file = 'volc_longlat.csv'
dst_file = 'volc_projected.csv'

src_dir = path.abspath('../data/ch2-5') # input directory
dst_dir = path.abspath('../data/ch2-5') # output directory

src_path = path.join(src_dir, src_file)
dst_path = path.join(dst_dir, dst_file)

src_crs = CRS("EPSG:4326") #WGS84
dst_crs = CRS("EPSG:3395") #World Mercator

# create coordinate transformer
# always_xy=True makes projector.transform() accept lon, lat (GIS order) instead of lat, lon
# for more info see the doc https://pyproj4.github.io/pyproj/stable/api/transformer.html?highlight=transformer#pyproj.transformer.Transformer.from_crs
projector = Transformer.from_crs(src_crs, dst_crs, always_xy=True)

# source csv file has lon, lat columns
src_header = ['LONGITUDE', 'LATITUDE']

# destination csv file will have x, y columns
dst_header = ['x', 'y']

# start benchmark timer
start_time = time.time()

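# open destination file in write mode
# (newline='' lets the csv module control line endings, as its docs recommend)
with open(dst_path, 'w', newline='') as w:
    # open source file in read mode
    with open(src_path, 'r', newline='') as r: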
        reader = csv.reader(r, dialect='excel')
        input_headers = next(reader) # read and skip first header row ['LONGITUDE', 'LATITUDE']        

        writer = csv.writer(w, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(dst_header)   # Write the output header
        for row in reader:
            try:
                # convert string values inside row into float values
                lon, lat = [float(val) for val in row]
                x, y = projector.transform(lon, lat)
                writer.writerow([ x, y ])
            except Exception as e:
                # If coordinates are out of bounds, 
                # skip row and print the error
                print (e)

# stop benchmarking
end_time = time.time()

print('process completed')
print("it took {} seconds to run the code".format(end_time-start_time))

process completed
it took 0.10211896896362305 seconds to run the code


In the recorded runs above, the old-API cell took about 163 seconds and the new-API cell about 0.1 seconds; the new API builds the transformer once and reuses it for every row. Check the newly created CSV file and notice that it now lists coordinates in meters. The EPSG code of the output coordinate reference system is set in output_projection (old API) or dst_crs (new API). You can easily change this variable to another EPSG code and rerun the script. Note that the old-API cell transforms every CSV file in the input directory, whereas the new-API cell transforms only the file named in src_file.
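
If several files need the same treatment, the read, transform, write loop of the second cell can be wrapped in a reusable function. The sketch below is one possible shape, not the notebook's own code: `reproject_csv` and its parameters are hypothetical names, and the transformation is passed in as a plain callable so the function does not depend on a particular pyproj version.

```python
import csv


def reproject_csv(src_path, dst_path, transform,
                  lon_field="LONGITUDE", lat_field="LATITUDE"):
    """Hypothetical helper: write projected x, y for every row of a CSV.

    `transform` is any callable that takes (lon, lat) and returns (x, y),
    for example the `transform` method of a pyproj Transformer.
    Returns a tuple (rows_written, rows_skipped).
    """
    written = skipped = 0
    # newline='' follows the csv module's recommendation for file objects
    with open(src_path, "r", newline="") as r, \
            open(dst_path, "w", newline="") as w:
        reader = csv.DictReader(r)
        writer = csv.writer(w)
        writer.writerow(["x", "y"])  # output header
        for row in reader:
            try:
                x, y = transform(float(row[lon_field]), float(row[lat_field]))
            except Exception as e:  # broad handler, mirroring the notebook cells
                print("skipping line {}: {}".format(reader.line_num, e))
                skipped += 1
                continue
            writer.writerow([x, y])
            written += 1
    return written, skipped
```

With the `projector` from the second cell, a call such as `reproject_csv(src_path, dst_path, projector.transform)` would process one file, and calling it inside a loop over the CSV files in `src_dir` would cover a whole directory. Returning the two counters makes it easy to see whether any rows were dropped.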