Add images from the Toronto Public Library #24
oldtoronto/filter_star_images.py
Outdated
    return (
        'TSPA' in record['license'] or
        '-TS-' in record['uniqueID'] or
        'Toronto Star' in (record['provenance'] or []) or
Does the record contain `None`s, or can we get away with `.get(...)`?
Either way, you could collapse the last two lines with an `or`.
It contains `None`s. Not sure I follow the comment about the `or`. I added a comment.
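To make the `None` question concrete, here is a minimal sketch of why `.get(...)` alone would not be enough when a key is present but holds `None`. The function name `is_toronto_star` and the sample records are hypothetical, and the sketch covers only the three clauses visible in the diff above; it is not the PR's actual code.

```python
def is_toronto_star(record):
    """Hypothetical stand-in for the predicate under review."""
    return (
        'TSPA' in (record.get('license') or '') or
        '-TS-' in (record.get('uniqueID') or '') or
        # .get() returns None when the key exists with a None value, and
        # `'x' in None` raises TypeError, hence the `or []` guard.
        'Toronto Star' in (record.get('provenance') or [])
    )


# Made-up records for illustration only.
star_by_license = {'license': 'TSPA', 'uniqueID': 'A-1', 'provenance': None}
star_by_provenance = {'license': None, 'uniqueID': 'B-2',
                      'provenance': ['Toronto Star']}
not_star = {'license': 'Public Domain', 'uniqueID': 'C-3', 'provenance': None}
```

The `or ''` and `or []` fallbacks handle both a missing key and an explicit `None`, which a bare `.get(key, default)` would not.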
        num_star += 1
        continue
    num_output += 1
    print(row, end='')
did you mean to leave this print statement in?
Yes. If I take it out, the program won't do anything! (It's a line filter.)
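For readers unfamiliar with the pattern, a line filter copies its input to its output one row at a time and drops the rows it rejects, so the `print` is the program's whole output. A minimal sketch follows, assuming newline-delimited JSON input; the helper name `filter_lines`, its parameters, and the sample rows are hypothetical and not taken from the PR.

```python
import io
import json


def filter_lines(lines, is_excluded, out):
    """Copy each line to `out` unless its parsed record is excluded."""
    num_excluded = 0
    num_output = 0
    for row in lines:
        if is_excluded(json.loads(row)):
            num_excluded += 1
            continue
        num_output += 1
        # Each row keeps its own trailing newline, so suppress print's.
        print(row, end='', file=out)
    return num_excluded, num_output


# Made-up input rows for illustration only.
rows = ['{"license": "TSPA"}\n', '{"license": "Public Domain"}\n']
buf = io.StringIO()
counts = filter_lines(rows, lambda r: 'TSPA' in r['license'], buf)
```

In a real script, `lines` would typically be `sys.stdin` and `out` would be `sys.stdout`.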
oldtoronto/generate_geojson.py
Outdated
SOURCE_TPL = 'tpl'
SOURCE_ARCHIVES = 'toronto-archives'
SOURCES = {SOURCE_TPL, SOURCE_ARCHIVES}
I think I'd prefer all constants on top of the file
Done
oldtoronto/generate_geojson.py
Outdated
@@ -155,7 +216,7 @@ def load_patch_csv(patch_csv):

     image_url = record.get('imageLink')
     assert image_url
-    dims = path_to_size.get(os.path.basename(image_url))
+    dims = path_to_size.get(os.path.basename(image_url)) or path_to_size.get(id_ + '.jpg')
can you make the second clause of the or be the default for the first get?
Done.
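The two spellings discussed here are close but not identical, which may matter if `path_to_size` can hold falsy values. A minimal sketch, with a made-up `path_to_size` table and hypothetical helper names:

```python
# Made-up lookup table for illustration only.
path_to_size = {
    'a.jpg': (800, 600),
    'DC-1.jpg': (640, 480),
    'x.jpg': None,          # key present, but no usable size
    'X-1.jpg': (10, 10),
}


def dims_with_or(basename, id_):
    # Falls through to the second lookup whenever the first result is
    # falsy, which includes a key that is present with a None value.
    return path_to_size.get(basename) or path_to_size.get(id_ + '.jpg')


def dims_with_default(basename, id_):
    # The default is computed eagerly and is used only when the key is
    # missing; a key stored with None is returned as None.
    return path_to_size.get(basename, path_to_size.get(id_ + '.jpg'))
```

When every stored value is a non-empty tuple, both forms return the same result.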
if __name__ == '__main__':
    urls_file_input, ndjson_output = sys.argv[1:]
here and above, it might be overkill, but I'd consider using argparse
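As a rough idea of what the `argparse` version could look like, here is a minimal sketch. The argument names mirror the two variables in the diff; the description, the help strings, and the `parse_args` wrapper are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments the script currently unpacks."""
    parser = argparse.ArgumentParser(
        description='Hypothetical CLI wrapper; the description is a placeholder.')
    parser.add_argument('urls_file_input',
                        help='input file of URLs (meaning assumed from the name)')
    parser.add_argument('ndjson_output',
                        help='output path for newline-delimited JSON (assumed)')
    # argv=None makes argparse read sys.argv[1:], matching the current code.
    return parser.parse_args(argv)


# Example invocation with made-up file names.
args = parse_args(['urls.txt', 'out.ndjson'])
```

Compared with unpacking `sys.argv[1:]`, this gives a usage message and a clear error when the wrong number of arguments is passed.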
This adds 3,991 images from the Toronto Public Library to OldTO. It excludes images from the Toronto Star, which have a more restrictive copyright. (Including these would get us 3-4x more images.)
We use the same geocoding techniques as for the Toronto Archives and run a separate pipeline to generate another `images.geojson` file. The final step merges `data/toronto-archives/images.geojson` + `data/tpl/images.geojson` → `data/images.geojson`.

Changes:
`data/toronto-archives`
`data/tpl`.
`generate_geocodes.py`
`scripts/check_data.sh`; I'm not sure if it ever worked before.

There are no diffs between `data/images.geojson` on `master` and `data/toronto-archives/images.geojson` on this branch. The changes are entirely to add new images from the TPL.

@DOsinga There's now a `tpl_fields` entry in feature properties for some images. I'm not sure if this will affect the API server database. I only tested with the dev server.

cc @mebreuer
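The merge step mentioned in the description (two per-source `images.geojson` files combined into one) might look roughly like the sketch below. It assumes each file is a GeoJSON `FeatureCollection`; the function name and the sample features are hypothetical, and the actual pipeline may do more, such as de-duplication.

```python
def merge_feature_collections(*collections):
    """Concatenate the features of several GeoJSON FeatureCollections."""
    features = []
    for collection in collections:
        features.extend(collection['features'])
    return {'type': 'FeatureCollection', 'features': features}


def make_feature(id_, lng, lat, properties):
    """Build a minimal GeoJSON point feature (helper for the example only)."""
    return {
        'type': 'Feature',
        'id': id_,
        'geometry': {'type': 'Point', 'coordinates': [lng, lat]},
        'properties': properties,
    }


# Made-up features for illustration only.
archives = {'type': 'FeatureCollection',
            'features': [make_feature('archives-1', -79.38, 43.65, {})]}
tpl = {'type': 'FeatureCollection',
       'features': [make_feature('tpl-1', -79.40, 43.66, {'tpl_fields': {}})]}

merged = merge_feature_collections(archives, tpl)
```

In practice the inputs would be read with `json.load` and the result written with `json.dump`.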
TBD: do we want to host these out of GCS? The medium-size versions are in, e.g., `gs://sidewalk-old-toronto/toronto-public-library/DC-964-6-43.jpg`.

TPL image:
Toronto Archives image: