Subregion analysis for GPS data #44

aliu22 · 2021-04-20T01:40:15Z

Added functionality for (1): users can now perform subregion analysis on GPS data. Subregion boundaries have been updated to match the UN geoscheme boundaries used in the original analysis.

Also added functionality for (4).
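
For context, assigning a GPS coordinate to a subregion comes down to a point-in-polygon test against the subregion boundaries. The following is only a minimal sketch of that idea, assuming toy rectangular polygons in place of the real UN geoscheme shapefile. `SUBREGIONS`, `point_in_polygon`, and `subregion_for` are hypothetical names for illustration, not functions from this repo.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Hypothetical toy boundaries (assumption): the real analysis would read
# UN geoscheme subregion polygons from a shapefile instead.
SUBREGIONS: Dict[str, List[Point]] = {
    "Toy Region A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "Toy Region B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: flip `inside` for each edge crossed by a ray going in +x."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subregion_for(lat: float, lng: float) -> Optional[str]:
    """Return the first subregion whose polygon contains the point, else None."""
    for name, polygon in SUBREGIONS.items():
        if point_in_polygon((lng, lat), polygon):
            return name
    return None
```

A real implementation would more likely rely on a geometry library for the polygon test, but the control flow (look up each image's coordinate, tally counts per subregion) stays the same.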

datasets.py

measurements/geography_based.py

analysis_notebooks/Geography Analysis.ipynb

aliu22 · 2021-04-29T19:10:11Z

Added automatic detection of whether "string-based" geography labels are country labels or region labels, for (2) and (4).
See line 723 of datasets.py.
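
To illustrate what such a detection step could look like, here is a hedged sketch, not the code at line 723. It assumes a majority vote against a small set of known country names. `KNOWN_COUNTRIES`, `infer_label_string_type`, and the `"COUNTRY_LABEL"` value are assumptions for illustration; only `"REGION_LABEL"` appears in this PR's diff.

```python
from typing import Iterable

# Hypothetical reference set (assumption): a real version would use a full
# list of country names or ISO codes.
KNOWN_COUNTRIES = {"france", "japan", "kenya", "brazil"}


def infer_label_string_type(labels: Iterable[str]) -> str:
    """Guess whether string labels name countries or finer-grained regions.

    Returns "COUNTRY_LABEL" when more than half of the labels are known
    country names, otherwise "REGION_LABEL".
    """
    normalized = [label.strip().lower() for label in labels]
    if not normalized:
        raise ValueError("no labels to inspect")
    country_hits = sum(1 for label in normalized if label in KNOWN_COUNTRIES)
    return "COUNTRY_LABEL" if country_hits * 2 > len(normalized) else "REGION_LABEL"
```

A majority vote tolerates a few unrecognized spellings, at the cost of misclassifying datasets that mix both label kinds.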

Angelina-Wang · 2021-04-29T19:16:52Z

README.md


 geo_lng: Counts the languages that make up the image tags, and whether or not they are local to the country the image is from. Also extracts image-level features to compare if locals and tourist portray a country differently

+Note: Geography-Based analyses require a mapping from images to location. The 2 primary ways we've encountered these mappings in existing datasets are geography labels (ie. String formatted locations like 'Manhattan'), and GPS labels (latitude and longitude coordinate pairs). Our analyses supports both types of geography mappings. Namely, the user should specify in their dataset class the `geography_info_type` to be one of the following:


"The 2 primary ways we've encountered these mappings in existing datasets are geography labels" -> something like "The 2 formats of geography annotations supported are"

modified, thank you

Angelina-Wang · 2021-04-29T19:18:40Z

analysis_notebooks/Geography Analysis.ipynb

@@ -731,13 +776,52 @@
    "                subregion_pvalues_over[p] = tag_info\n",
    "            else:\n",
    "                subregion_pvalues_under[p] = tag_info\n",
+    "                \n",
+    "elif (dataset.geography_info_type == \"STRING_FORMATTED_LABEL\" and dataset.geography_label_string_type == \"REGION_LABEL\"):\n",
+    "    print(\"Geo_tag work for region label formatted dataset\")\n",


is this print necessary?

removed, thanks

Angelina-Wang · 2021-04-29T19:18:58Z

analysis_notebooks/Geography Analysis.ipynb

+    "                \n",
+    "elif (dataset.geography_info_type == \"STRING_FORMATTED_LABEL\" and dataset.geography_label_string_type == \"REGION_LABEL\"):\n",
+    "    print(\"Geo_tag work for region label formatted dataset\")\n",
+    "    if not os.path.exists(\"results/{0}/6\".format(folder_name)):\n",


I think "6" has been deprecated for the new names of the analyses, please map accordingly

done, replaced '6' with "geo_tag"

Angelina-Wang · 2021-04-29T19:19:17Z

analysis_notebooks/Geography Analysis.ipynb

+    "    total_counts = total_counts.astype(int)\n",
+    "    sum_total_counts = int(np.sum(total_counts))\n",
+    "\n",
+    "    if not os.path.exists('checkpoints/{}/6_a.pkl'.format(folder_name)):\n",


same here. if prerun_geo doesn't have these changes either, please add in

changed 6_a to the new naming convention ("geo_tag_6a"). Also added the new changes to prerun_geo

Angelina-Wang · 2021-04-29T19:23:24Z

measurements/geography_based.py

+    combined_dict["region_to_id"] = region_to_id_map
+    combined_dict["id_to_region"] = id_to_region_map
+
+    pickle.dump(combined_dict, open("results/{}/geo_ctr_region.pkl".format(args.folder), "wb"))


we have a very specific naming convention of geo_{3 letter acronym}, how does geo_ctr_region fit into that? why can't it just be geo_ctr?

geo_ctr is still being used, this is just a private function called by geo_ctr when it detects gps data. I thought this would help keep things organized, so geo_ctr doesn't get bloated with the different options. can put this within geo_ctr though if that would make things clearer

my problem is with the name of the pickle result file, not the function itself
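
One way to keep result file names aligned with the `geo_{3 letter acronym}` convention is to build every pickle path through a single helper that rejects other names. This is a sketch only; `result_pickle_path` is a hypothetical helper, not something in the repo, and the `results/{folder}/{name}.pkl` layout is taken from the diff above.

```python
import re

# Matches names such as geo_ctr or geo_tag: "geo_" plus exactly three lowercase letters.
_ANALYSIS_NAME = re.compile(r"geo_[a-z]{3}")


def result_pickle_path(folder: str, analysis: str) -> str:
    """Build the results path for an analysis, enforcing the naming convention."""
    if not _ANALYSIS_NAME.fullmatch(analysis):
        raise ValueError(
            "analysis name {!r} does not follow geo_<3 letters>".format(analysis)
        )
    return "results/{}/{}.pkl".format(folder, analysis)
```

With this in place, a name like `geo_ctr_region` fails fast at the call site instead of producing an off-convention file.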

Angelina-Wang · 2021-04-29T19:23:54Z

measurements/geography_based.py

        print("redirecting to geo_tag_gps()...")
        return geo_tag_gps(dataloader, args)
+    if (dataloader.dataset.geography_info_type == "STRING_FORMATTED_LABEL" and dataloader.dataset.geography_label_string_type == "REGION_LABEL"):


elif for readability
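
A sketch of the suggested shape, assuming the preceding branch checks for `"GPS_LABEL"` (the diff only shows its body). `choose_geo_tag_variant` and its return strings are hypothetical; the attribute names and label values come from the diff.

```python
from types import SimpleNamespace


def choose_geo_tag_variant(dataset) -> str:
    """Pick which geo_tag code path applies; the branches are mutually exclusive."""
    if dataset.geography_info_type == "GPS_LABEL":
        return "gps"
    elif (dataset.geography_info_type == "STRING_FORMATTED_LABEL"
          and dataset.geography_label_string_type == "REGION_LABEL"):
        return "region"
    # Anything else falls through to the original country-based analysis.
    return "default"


# Usage with stand-in dataset objects (assumption: real datasets expose these attributes).
gps_dataset = SimpleNamespace(geography_info_type="GPS_LABEL")
region_dataset = SimpleNamespace(
    geography_info_type="STRING_FORMATTED_LABEL",
    geography_label_string_type="REGION_LABEL",
)
```

Because the first branch returns, `if` and `elif` behave identically here. The `elif` simply signals to the reader that the cases are alternatives.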

Angelina-Wang · 2021-04-29T19:24:17Z

measurements/geography_based.py

+        print('running geo_ctr_region() first to get necessary info...')
+        geo_ctr_region(dataloader, args)
+
+    counts = pickle.load(open("results/{}/geo_ctr_region.pkl".format(args.folder), "rb"))


same name comment as before about geo_ctr_region

same, this is a private function, called from geo_ctr()

Angelina-Wang · 2021-05-03T20:09:39Z

measurements/prerun_analyzegeo.py

+            pvalues_under, pvalues_over = pickle.load(open('checkpoints/{}/geo_tag_a.pkl'.format(folder_name), 'rb'))
+
+    elif dataset.geography_info_type == "GPS_LABEL":
+        info_stats = pickle.load(open("results/{}/geo_tag_gps.pkl".format(folder_name), "rb")) 


same, wondering why the pickle is geo_tag_gps rather than just geo_tag?

add functionality of (1) where user can now perform subregion analysis for gps formatted data, also cleanup and some renaming

fc208c3

aliu22 marked this pull request as ready for review April 20, 2021 17:59

give path for subregions shapefile in comments

ac202e4

Angelina-Wang reviewed Apr 20, 2021

datasets.py (resolved)

Angelina-Wang reviewed Apr 20, 2021

measurements/geography_based.py (resolved)

Angelina-Wang reviewed Apr 20, 2021

analysis_notebooks/Geography Analysis.ipynb (outdated, resolved)

Merge branch 'master' into geo_part2

56b08f6

aliu22 marked this pull request as draft April 27, 2021 20:06

aliu22 added 5 commits April 27, 2021 16:40

added region_labels functionality, subregion analysis for gps, renaming cleanup

9505ad3

update YFCC

d6e5343

Merge branch 'master' into geo_part2

78cd536

testing and minor fixes

b4da2db

readme

ffd84fa

aliu22 marked this pull request as ready for review April 29, 2021 03:35

aliu22 marked this pull request as draft April 29, 2021 03:54

aliu22 added 4 commits April 29, 2021 10:40

reaadd commented out scenemapping

a873540

automate distinction between country and region labels

a12ef8a

bug fix

69e9232

README change

aaf6d58

aliu22 marked this pull request as ready for review April 29, 2021 19:10

Angelina-Wang reviewed Apr 29, 2021

aliu22 added 2 commits May 2, 2021 00:05

readme rewording

9cee230

refactor 6a and 6b to geo_tag as per new naming convention

9c9c6c2

aliu22 added 2 commits May 2, 2021 00:06

add elif instead of if for readability

97eee74

add new gps and redirection changes to prerun as well

08f1dd8

Angelina-Wang reviewed May 3, 2021

aliu22 added 2 commits May 3, 2021 21:26

rename all .pkl files to geo_ctr or geo_tag

03f7131

change pkl file name in prerun as well

e1618ca

Angelina-Wang merged commit 5ff94bf into princetonvisualai:master May 4, 2021
