Subregion analysis for GPS data #44
…s for gps formatted data, also cleanup and some renaming
README.md
Outdated
geo_lng: Counts the languages that make up the image tags, and whether or not they are local to the country the image is from. Also extracts image-level features to compare if locals and tourist portray a country differently

Note: Geography-Based analyses require a mapping from images to location. The 2 primary ways we've encountered these mappings in existing datasets are geography labels (ie. String formatted locations like 'Manhattan'), and GPS labels (latitude and longitude coordinate pairs). Our analyses supports both types of geography mappings. Namely, the user should specify in their dataset class the `geography_info_type` to be one of the following:
"The 2 primary ways we've encountered these mappings in existing datasets are geography labels" -> something like "The 2 formats of geography annotations supported are"
modified, thank you
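For readers of this thread, a minimal sketch of what the README note asks of a dataset class. Only the attribute names `geography_info_type` and `geography_label_string_type` and the string values shown come from this PR's diff; the class `ExampleDataset`, the constant `GEOGRAPHY_INFO_TYPES`, and the validation step are hypothetical, and the README's full list of allowed values is not visible here.

```python
from dataclasses import dataclass
from typing import Optional

# Values that appear in this PR's diff; the README's full list is not shown here.
GEOGRAPHY_INFO_TYPES = {"STRING_FORMATTED_LABEL", "GPS_LABEL"}


@dataclass
class ExampleDataset:
    """Hypothetical dataset class; only the two attribute names come from the diff."""

    geography_info_type: str
    geography_label_string_type: Optional[str] = None  # e.g. "REGION_LABEL"

    def __post_init__(self):
        # Fail early if the annotation format is not one of the known values.
        if self.geography_info_type not in GEOGRAPHY_INFO_TYPES:
            raise ValueError(
                "unsupported geography_info_type: {!r}".format(self.geography_info_type)
            )


gps_dataset = ExampleDataset(geography_info_type="GPS_LABEL")
region_dataset = ExampleDataset("STRING_FORMATTED_LABEL", "REGION_LABEL")
```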
@@ -731,13 +776,52 @@
" subregion_pvalues_over[p] = tag_info\n",
" else:\n",
" subregion_pvalues_under[p] = tag_info\n",
" \n",
"elif (dataset.geography_info_type == \"STRING_FORMATTED_LABEL\" and dataset.geography_label_string_type == \"REGION_LABEL\"):\n",
" print(\"Geo_tag work for region label formatted dataset\")\n",
is this print necessary?
removed, thanks
" \n",
"elif (dataset.geography_info_type == \"STRING_FORMATTED_LABEL\" and dataset.geography_label_string_type == \"REGION_LABEL\"):\n",
" print(\"Geo_tag work for region label formatted dataset\")\n",
" if not os.path.exists(\"results/{0}/6\".format(folder_name)):\n",
I think "6" has been deprecated in favor of the new analysis names; please map accordingly
done, replaced "6" with "geo_tag"
" total_counts = total_counts.astype(int)\n",
" sum_total_counts = int(np.sum(total_counts))\n",
"\n",
" if not os.path.exists('checkpoints/{}/6_a.pkl'.format(folder_name)):\n",
same here. if prerun_geo doesn't have these changes either, please add them there too
changed 6_a to the new naming convention ("geo_tag_6a"). Also added the new changes to prerun_geo
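As an illustration of the rename discussed in the two threads above, the results and checkpoint paths could be built from one analysis-name constant so the old numeric name does not reappear in scattered format strings. This is only a sketch: `ANALYSIS_NAME`, `results_dir`, and `checkpoint_path` are hypothetical helpers, not functions from this repository, and the checkpoint suffix is left to the caller because the exact file name is still being settled in this review.

```python
import os

ANALYSIS_NAME = "geo_tag"  # replaces the old numeric analysis name "6"


def results_dir(folder_name, root="results"):
    """Return results/<folder_name>/geo_tag, creating it if it is missing."""
    path = os.path.join(root, folder_name, ANALYSIS_NAME)
    # exist_ok=True replaces the separate os.path.exists() check.
    os.makedirs(path, exist_ok=True)
    return path


def checkpoint_path(folder_name, suffix, root="checkpoints"):
    """Build a checkpoint file path; the suffix is chosen by the caller."""
    filename = "{}_{}.pkl".format(ANALYSIS_NAME, suffix)
    return os.path.join(root, folder_name, filename)
```

With these helpers the notebook cell would call `results_dir(folder_name)` once instead of checking for the directory by hand.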
measurements/geography_based.py
Outdated
combined_dict["region_to_id"] = region_to_id_map
combined_dict["id_to_region"] = id_to_region_map

pickle.dump(combined_dict, open("results/{}/geo_ctr_region.pkl".format(args.folder), "wb"))
we have a very specific naming convention of geo_{3 letter acronym}, how does geo_ctr_region fit into that? why can't it just be geo_ctr?
geo_ctr is still being used, this is just a private function called by geo_ctr when it detects gps data. I thought this would help keep things organized, so geo_ctr doesn't get bloated with the different options. can put this within geo_ctr though if that would make things clearer
my problem is with the name of the pickle result file, not the function itself
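To make the convention in this thread concrete, here is a small check for result file names of the form geo_{3 letter acronym}. The regular expression is an assumption about how strictly the convention is meant (lowercase letters, `.pkl` extension), and `follows_convention` is a hypothetical helper rather than anything in the repository.

```python
import re

# Assumed reading of the convention: "geo_" + exactly three lowercase letters + ".pkl".
_RESULT_NAME = re.compile(r"geo_[a-z]{3}\.pkl")


def follows_convention(filename):
    """Return True if the pickle file name matches geo_{3 letter acronym}.pkl."""
    return _RESULT_NAME.fullmatch(filename) is not None


for name in ("geo_ctr.pkl", "geo_ctr_region.pkl", "geo_tag_gps.pkl"):
    print(name, follows_convention(name))
```

Under this reading, `geo_ctr.pkl` passes while `geo_ctr_region.pkl` and `geo_tag_gps.pkl` do not, which is the distinction the reviewer is pointing at.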
measurements/geography_based.py
Outdated
print("redirecting to geo_tag_gps()...")
return geo_tag_gps(dataloader, args)
if (dataloader.dataset.geography_info_type == "STRING_FORMATTED_LABEL" and dataloader.dataset.geography_label_string_type == "REGION_LABEL"):
elif for readability
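A sketch of the suggested shape, assuming the preceding branch tests for `"GPS_LABEL"` (the condition itself is not visible in this hunk). The function `pick_geo_tag_branch` and its return labels are hypothetical stand-ins for the real calls; because the first branch returns, `elif` does not change behavior here and only makes the mutually exclusive cases easier to read.

```python
def pick_geo_tag_branch(geography_info_type, geography_label_string_type=None):
    """Return a label for the code path to take (stand-in for the real calls)."""
    if geography_info_type == "GPS_LABEL":
        # In the PR this branch redirects to geo_tag_gps(dataloader, args).
        return "gps"
    elif (geography_info_type == "STRING_FORMATTED_LABEL"
          and geography_label_string_type == "REGION_LABEL"):
        # Region-label formatted datasets.
        return "region"
    else:
        # Any other combination falls through to the existing code path.
        return "default"
```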
measurements/geography_based.py
Outdated
print('running geo_ctr_region() first to get necessary info...')
geo_ctr_region(dataloader, args)

counts = pickle.load(open("results/{}/geo_ctr_region.pkl".format(args.folder), "rb"))
same name comment as before about geo_ctr_region
same, this is a private function, called from geo_ctr()
measurements/prerun_analyzegeo.py
Outdated
pvalues_under, pvalues_over = pickle.load(open('checkpoints/{}/geo_tag_a.pkl'.format(folder_name), 'rb'))

elif dataset.geography_info_type == "GPS_LABEL":
info_stats = pickle.load(open("results/{}/geo_tag_gps.pkl".format(folder_name), "rb"))
same, wondering why the pickle is geo_tag_gps rather than just geo_tag?
Added functionality for (1): users can now perform subregion analysis for GPS data. Subregion boundaries have been updated to match the UN geoscheme boundaries, as in the original analysis.
Also added functionality for (4).