This program reads a tab separated file containing the author name affiliation, and generate the coordinates corresponding to their institude.

One line of the file looks like this:     
`Adam	Y.	Université de Liège	Liège	Belgium`      
The city column was added to make easier the localisation.

For each year, two output files are created:
1. a file storing only the coordinates, that can be used directly with leaflet (see [leaflet](./leaflet) folder) 
2. a file that is the same as the input file but with 2 extra columns for the coordinates.

If the coordinates cannot be found for some reasons, a fill value (e.g. -999) is used, and the coordinates will have to be found "manually", i.e., using other resources.

In [1]:
import os
import glob
import logging
from collections import Counter
from liegecolloquium import Participant
from importlib import reload

# Configuration

In [2]:
datadir = "../data/2process/"
outputdir = "../data/"
logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [3]:
filelist = sorted(glob.glob(os.path.join(datadir, "ParticipantList*.tsv")))
logging.info("Working on {0} file(s)".format(len(filelist)))

INFO:root:Working on 2 file(s)


In [4]:
participantlist = []
locationlist = []

for datafiles in filelist:
    np = 0
    logging.info("Working on file {0}".format(datafiles))
    outputbasename = os.path.basename(datafiles).replace(".tsv", ".dat")
    coordbasename = os.path.basename(datafiles).replace(".tsv", ".coord")
    outputfile = os.path.join(outputdir, outputbasename)
    coordfile = os.path.join(outputdir, coordbasename)
    logging.info("Writing info in file {0}".format(outputfile))
    logging.info("Writing coordinates in file {0}".format(outputfile))
    
    with open(coordfile, 'w') as cf:
        cf.write("var coords = [\n")

    with open(datafiles, 'r') as f:
        for line in f:
            
            if line == '\n':
                logger.debug("Empty line")
            else:
                np += 1
                l = line.rstrip().split('\t')
                participant = Participant(l[0], l[1], l[2], l[3], l[4])
                logger.info("Working with {0}".format(participant))

                # Modify country name via dictionary
                participant.replace_country()

                # Find location ; check if in the list before using geopy
                loc1 = ", ".join((participant.affiliation, 
                                  participant.city, 
                                  participant.country))
                # logger.debug(loc1)

                if loc1 in locationlist:
                    logger.debug("Already in the list")
                    ind = locationlist.index(loc1)
                    logger.debug("Position in the list: {0}".format(ind))
                    participant.lat = participantlist[ind].lat
                    participant.lon = participantlist[ind].lon
                    participant.countryname = participantlist[ind].countryname
                    locationlist.append(locationlist.index(loc1))
                else:
                    logger.debug("Not in the list, use geolocator to get coordinates")
                    locationstring, location = participant.get_location()
                    locationlist.append(locationstring)

                # Write info to a text file
                participant.write_to(outputfile)

                # Write coordinates (only) into another file
                participant.write_coords_to(coordfile)

                # Append to list
                participantlist.append(participant)
            
    with open(coordfile, 'a') as cf:
        cf.write("]")
logger.info("\n Job finished")
logger.info("{np} participants".format)

INFO:root:Working on file ../data/2process/ParticipantList-1971.tsv
INFO:root:Writing info in file ../data/ParticipantList-1971.dat
INFO:root:Writing coordinates in file ../data/ParticipantList-1971.dat
INFO:root:Working with Participant Y. Adam (Belgium)
INFO:root:Working with Participant M. Betz (Belgium)
INFO:root:Working with Participant C. Bon (Belgium)
INFO:root:Working with Participant K.F. Bowden (U.K.)
INFO:root:Working with Participant Y. Bozdouganov (Bulgaria)
INFO:root:Working with Participant C. Brofferio (Italy)
INFO:root:Working with Participant A.G. Cavanié (France)
INFO:root:Working with Participant M.  Crepon (France)
INFO:root:Working with Participant P. Defrise (Belgium)
INFO:root:Working with Participant B. Delcourt (Belgium)
INFO:root:Working with Participant C. Demuth (Belgium)
INFO:root:Working with Participant A.E. Distèche (Belgium)
INFO:root:Working with Participant A. Droissart (Belgium)
INFO:root:Working with Participant E. Eskinazi (Belgium)
INFO:root:Work

INFO:root:Working with Participant Kerstin Stelzer (Germany)
INFO:root:Working with Participant Gavin Tilstone (United Kingdom)
INFO:root:Working with Participant Igor Tomazic (Belgium)
INFO:root:Working with Participant Charles Troupin (Belgium)
INFO:root:Working with Participant Marta Umbert (Spain)
INFO:root:Working with Participant Dimitry Van der Zande (Belgium)
INFO:root:Working with Participant Quinten Vanhellemont (Belgium)
INFO:root:Working with Participant James While (United Kingdom)
INFO:root:Working with Participant Yanhong Wu (China)
INFO:root:Working with Participant Dingtian Yang (China)
INFO:root:Working with Participant Chaoyu Yang (China)
INFO:root:Working with Participant Minwei Zhang (China)
INFO:root:
 Job finished
INFO:root:<built-in method format of str object at 0x7f957d812660>
