Subregion analysis for GPS data #44

Merged
merged 18 commits into from May 4, 2021

@aliu22 aliu22 commented Apr 20, 2021

Added functionality for (1): users can now perform subregion analysis for GPS data. Subregion boundaries have been updated to match the UN geoscheme boundaries used in the original analysis.

Also added functionality for (4).

image

…s for gps formatted data, also cleanup and some renaming
@aliu22 aliu22 marked this pull request as ready for review April 20, 2021 17:59
@aliu22 aliu22 marked this pull request as draft April 27, 2021 20:06
@aliu22 aliu22 marked this pull request as ready for review April 29, 2021 03:35
@aliu22 aliu22 marked this pull request as draft April 29, 2021 03:54

aliu22 commented Apr 29, 2021

Added automatic distinction between country and region for "string-based" geography labels, (2) and (4).
See line 723 of datasets.py.
image
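
For readers without access to the screenshot, here is one way such detection could work. It is only a sketch and not the code at line 723 of datasets.py: the function name, the tiny `KNOWN_REGIONS` set, and the `"COUNTRY_LABEL"` value are assumptions made for illustration; only `"REGION_LABEL"` appears in this PR.

```python
# Hypothetical, deliberately small set of UN geoscheme subregion names.
# A real list would need to cover every subregion the dataset may use.
KNOWN_REGIONS = frozenset(
    {"Northern Africa", "Eastern Asia", "Western Europe", "South America"}
)


def infer_label_string_type(labels, known_regions=KNOWN_REGIONS):
    """Guess whether string geography labels name regions or countries.

    Returns "REGION_LABEL" only when every label is a known region name;
    otherwise falls back to "COUNTRY_LABEL" (a hypothetical value).
    """
    unique_labels = set(labels)
    if unique_labels and unique_labels <= set(known_regions):
        return "REGION_LABEL"
    return "COUNTRY_LABEL"
```

Requiring every label to be a known region is a conservative choice: a single unrecognized string sends the dataset down the country path rather than silently mixing the two.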

@aliu22 aliu22 marked this pull request as ready for review April 29, 2021 19:10
README.md Outdated

geo_lng: Counts the languages that make up the image tags, and whether or not they are local to the country the image is from. Also extracts image-level features to compare if locals and tourist portray a country differently

Note: Geography-Based analyses require a mapping from images to location. The 2 primary ways we've encountered these mappings in existing datasets are geography labels (ie. String formatted locations like 'Manhattan'), and GPS labels (latitude and longitude coordinate pairs). Our analyses supports both types of geography mappings. Namely, the user should specify in their dataset class the `geography_info_type` to be one of the following:
Collaborator

"The 2 primary ways we've encountered these mappings in existing datasets are geography labels" -> something like "The 2 formats of geography annotations supported are"

Contributor Author

modified, thank you
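
To illustrate the README note under review, a dataset class might declare its geography format roughly as below. This is a sketch under assumptions: the class names and the `check_geography_info` helper are hypothetical, and the two allowed values are simply the ones visible in this PR's diff (the README's own list is cut off above).

```python
# The two values of geography_info_type that appear in this PR's diff.
ALLOWED_INFO_TYPES = {"GPS_LABEL", "STRING_FORMATTED_LABEL"}


class GpsDatasetSketch:
    """Hypothetical dataset whose images carry (latitude, longitude) pairs."""

    def __init__(self):
        self.geography_info_type = "GPS_LABEL"


class RegionDatasetSketch:
    """Hypothetical dataset whose images carry string locations."""

    def __init__(self):
        self.geography_info_type = "STRING_FORMATTED_LABEL"
        # "REGION_LABEL" is the only string type shown in the diff.
        self.geography_label_string_type = "REGION_LABEL"


def check_geography_info(dataset):
    """Return the dataset's geography_info_type, or raise if it is missing or unknown."""
    info_type = getattr(dataset, "geography_info_type", None)
    if info_type not in ALLOWED_INFO_TYPES:
        raise ValueError("unsupported geography_info_type: {!r}".format(info_type))
    return info_type
```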

@@ -731,13 +776,52 @@
" subregion_pvalues_over[p] = tag_info\n",
" else:\n",
" subregion_pvalues_under[p] = tag_info\n",
" \n",
"elif (dataset.geography_info_type == \"STRING_FORMATTED_LABEL\" and dataset.geography_label_string_type == \"REGION_LABEL\"):\n",
" print(\"Geo_tag work for region label formatted dataset\")\n",
Collaborator

is this print necessary?

Contributor Author

removed, thanks

" \n",
"elif (dataset.geography_info_type == \"STRING_FORMATTED_LABEL\" and dataset.geography_label_string_type == \"REGION_LABEL\"):\n",
" print(\"Geo_tag work for region label formatted dataset\")\n",
" if not os.path.exists(\"results/{0}/6\".format(folder_name)):\n",
Collaborator

I think "6" has been deprecated for the new names of the analyses, please map accordingly

Contributor Author

done, replaced '6' with "geo_tag"

" total_counts = total_counts.astype(int)\n",
" sum_total_counts = int(np.sum(total_counts))\n",
"\n",
" if not os.path.exists('checkpoints/{}/6_a.pkl'.format(folder_name)):\n",
Collaborator

same here. if prerun_geo doesn't have these changes either, please add in

Contributor Author

Changed 6_a to the new naming convention ("geo_tag_6a"). Also added the new changes to prerun_geo.
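
The checkpoint pattern in the hunk above (compute only when the pickle is missing) can be sketched as a small helper. This is an illustration, not code from the PR: `load_or_compute` and its parameters are hypothetical names.

```python
import os
import pickle


def load_or_compute(checkpoint_path, compute_fn):
    """Load a pickled checkpoint if it exists; otherwise compute, save, and return it."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            return pickle.load(f)

    result = compute_fn()

    # Create the parent directory if needed; exist_ok avoids a separate
    # os.path.exists check like the one in the notebook code.
    os.makedirs(os.path.dirname(checkpoint_path) or ".", exist_ok=True)
    with open(checkpoint_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

With a helper like this, renaming a checkpoint only requires changing the path passed in at the call site, which would keep the notebook and prerun_geo in step.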

combined_dict["region_to_id"] = region_to_id_map
combined_dict["id_to_region"] = id_to_region_map

pickle.dump(combined_dict, open("results/{}/geo_ctr_region.pkl".format(args.folder), "wb"))
Collaborator

we have a very specific naming convention of geo_{3 letter acronym}, how does geo_ctr_region fit into that? why can't it just be geo_ctr?

Contributor Author

geo_ctr is still being used, this is just a private function called by geo_ctr when it detects gps data. I thought this would help keep things organized, so geo_ctr doesn't get bloated with the different options. can put this within geo_ctr though if that would make things clearer

Collaborator

my problem is with the name of the pickle result file, not the function itself

print("redirecting to geo_tag_gps()...")
return geo_tag_gps(dataloader, args)
if (dataloader.dataset.geography_info_type == "STRING_FORMATTED_LABEL" and dataloader.dataset.geography_label_string_type == "REGION_LABEL"):
Collaborator

elif for readability
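
A minimal sketch of what the suggestion amounts to. Because the first branch returns, `if` and `elif` behave the same here; `elif` just makes it explicit that the branches are mutually exclusive. The function name and the returned strings are placeholders, not names from the repository.

```python
def choose_geo_tag_path(info_type, string_type=None):
    """Return a placeholder name for the code path a dataset would take."""
    if info_type == "GPS_LABEL":
        return "gps"
    elif info_type == "STRING_FORMATTED_LABEL" and string_type == "REGION_LABEL":
        return "region"
    else:
        # Any other combination falls through to the default handling.
        return "default"
```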

print('running geo_ctr_region() first to get necessary info...')
geo_ctr_region(dataloader, args)

counts = pickle.load(open("results/{}/geo_ctr_region.pkl".format(args.folder), "rb"))
Collaborator

same name comment as before about geo_ctr_region

Contributor Author

same, this is a private function, called from geo_ctr()

pvalues_under, pvalues_over = pickle.load(open('checkpoints/{}/geo_tag_a.pkl'.format(folder_name), 'rb'))

elif dataset.geography_info_type == "GPS_LABEL":
info_stats = pickle.load(open("results/{}/geo_tag_gps.pkl".format(folder_name), "rb"))
Collaborator

same, wondering why the pickle is geo_tag_gps rather than just geo_tag?
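
If the `geo_{3 letter acronym}` convention the reviewer describes were enforced for result files, one possible shape is a single helper that builds every result path. This is a sketch only: `result_path` is a hypothetical name, the regular expression is my reading of the review comment, and the thread does not say how the naming question was settled.

```python
import os
import re

# "geo_" followed by exactly three lowercase letters, e.g. geo_ctr or geo_tag.
_ANALYSIS_NAME = re.compile(r"geo_[a-z]{3}")


def result_path(folder, analysis):
    """Build results/<folder>/<analysis>.pkl, rejecting names off the convention."""
    if not _ANALYSIS_NAME.fullmatch(analysis):
        raise ValueError("analysis name does not match geo_xxx: {!r}".format(analysis))
    return os.path.join("results", folder, analysis + ".pkl")
```

Under this reading, `result_path("demo", "geo_ctr")` is accepted, while `"geo_ctr_region"` and `"geo_tag_gps"` would be rejected.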

@Angelina-Wang Angelina-Wang merged commit 5ff94bf into princetonvisualai:master May 4, 2021