
0.4.0 - geodict, geoparser fixes

commit 5cc640c6be6b36138ebfa8365d19a70c9da2a5ab (1 parent: a921352)
Chris Johnson-Roberson authored
Showing with 987 additions and 143 deletions.
  1. +5 −8 README.md
  2. +93 −46 chrome/content/papermachines/processors/geoparser.py
  3. +1 −2  chrome/content/papermachines/processors/geoparser_export.py
  4. +23 −3 chrome/content/papermachines/processors/geoparser_flightpaths.py
  5. 0  chrome/content/papermachines/processors/lib/geodict/__init__.py
  6. +91 −0 chrome/content/papermachines/processors/lib/geodict/data.py
  7. +95 −0 chrome/content/papermachines/processors/lib/geodict/db_funcs.py
  8. BIN  chrome/content/papermachines/processors/lib/geodict/geodict.db
  9. +30 −0 chrome/content/papermachines/processors/lib/geodict/geodict_config.py
  10. +433 −0 chrome/content/papermachines/processors/lib/geodict/geodict_lib.py
  11. +95 −0 chrome/content/papermachines/processors/lib/geodict/jsqlite3.py
  12. BIN  chrome/content/papermachines/processors/lib/geodict/sqlite-jdbc-3.7.2.jar
  13. +7 −0 chrome/content/papermachines/processors/lib/geodict/test.py
  14. +70 −35 chrome/content/papermachines/processors/support/flightpaths.js
  15. +40 −0 chrome/content/papermachines/processors/support/heatmap-gmaps.js
  16. +1 −1  chrome/content/papermachines/processors/templates/geoparser_export.html
  17. +1 −3 chrome/content/papermachines/processors/templates/geoparser_flightpaths.html
  18. +1 −44 chrome/content/papermachines/processors/templates/geoparser_heatmap.html
  19. +1 −1  install.rdf
13 README.md
@@ -6,18 +6,18 @@ Paper Machines is an open-source extension for the [Zotero](http://www.zotero.or
This project is a collaboration between historian [Jo Guldi](http://www.joguldi.com) and digital ethnomusicologist [Chris Johnson-Roberson](http://www.chrisjr.org), graciously supported by Google Summer of Code, the William F. Milton Fund, and [metaLAB @ Harvard](http://metalab.harvard.edu/).
-**NOTE:** Paper Machines now bundles Jython 2.7a2 to ensure broader compatibility. If you encounter problems using the extension, please create a Github issue describing what operating system and version of Java you have installed, as well as the nature of the issue.
+**NOTE:** Paper Machines now bundles Jython 2.7a2 to ensure broader compatibility. If you encounter problems using the extension, please create an issue describing what operating system and version of Java you have installed, and the nature of the issue.
## Prerequisites
-In order to run Paper Machines, you will need the following (Java should be installed automatically on Mac OS X 10.6-10.7; if you are running Mac OS 10.8, please download it from the link below):
+In order to run Paper Machines, you will need the following (Java should be installed automatically on Mac OS X 10.6-10.7. If you are running Mac OS 10.8, please download it from the link below):
* [Zotero](http://www.zotero.org/) with PDF indexing tools installed (see the Search pane of Zotero's Preferences)
* a corpus of documents with full text PDF/HTML and high-quality metadata (recommended: at least 1,000 for topic modeling purposes)
* Java ([download page](http://java.com/en/download/index.jsp))
## Installation
-Paper Machines should work either in Zotero for Firefox or Zotero Standalone. To install, you must download the <a href="https://github.com/downloads/chrisjr/papermachines/papermachines-0.4.0pre2.xpi">XPI file</a>. If you wish to use the extension in the Standalone version, right-click on the link and save the XPI file in your Downloads folder. Then, in Zotero Standalone, go to the Tools menu -> Add-Ons. Select the gear icon at the right, then "Install Add-On From File." Navigate to your Downloads folder (or wherever you have saved the XPI file) and open it.
+Paper Machines should work either in Zotero for Firefox or Zotero Standalone. To install, you must download the <a href="http://www.papermachines.org/download/papermachines-0.4.0.xpi">XPI file</a>. If you wish to use the extension in the Standalone version, right-click on the link and save the XPI file in your Downloads folder. Then, in Zotero Standalone, go to the Tools menu -> Add-Ons. Select the gear icon at the right, then "Install Add-On From File." Navigate to your Downloads folder (or wherever you have saved the XPI file) and open it.
## Usage
To begin, right-click (control-click for Mac) on the collection you wish to analyze and select "Extract Texts for Paper Machines." Once the extraction process is complete, this right-click menu will offer several different processes that may be run on a collection, each with an accompanying visualization. Once these processes have been run, selecting "Export Output of Paper Machines..." will allow you to choose which visualizations to export.
@@ -42,19 +42,16 @@ Creates a CSV file with place name, latitude/longitude, the Zotero item ID numbe
Annotates files using the DBpedia Spotlight service, providing a look at what named entities (people, places, organizations, etc.) are mentioned in the texts. Entities are scaled according to the frequency of their occurrence.
### Topic Modeling
-Shows the proportional prevalence of different "topics" (collections of words likely to co-occur) in the corpus, by time or by subcollection. This uses the [MALLET](http://mallet.cs.umass.edu) package to perform [latent Dirichlet allocation](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation), and by default displays the 5 most "coherent" topics, based on a metric devised by [Mimno et al.](http://www.cs.princeton.edu/~mimno/papers/mimno-semantic-emnlp.pdf) A variety of topic model parameters can be specified before the model is created. The default values should be suitable for general purpose use, but they may be adjusted to produce a better model.
+Shows the proportional prevalence of different "topics" (collections of words likely to co-occur) in the corpus, by time or by subcollection. This uses the [MALLET](http://mallet.cs.umass.edu) package to perform [latent Dirichlet allocation](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation), and by default displays the 5 most "coherent" topics, based on a metric devised by [Mimno et al.](http://www.cs.princeton.edu/~mimno/papers/mimno-semantic-emnlp.pdf) A variety of topic model hyperparameters can be specified before the model is created.
After the model is generated, clicking "Save" in the display will open a new window with the graph displayed free of interactive controls; this window may be saved as an ".SVG" file or captured via screenshot. It will also, in the original window, preserve the current selection of topics, search terms, and time scale as a permalink; please bookmark this if you wish to return to a specific view with interactive controls intact.
#### JSTOR Data For Research
The topic model can be supplemented with datasets from [JSTOR Data For Research](http://dfr.jstor.org/). You must first [register](http://dfr.jstor.org/accounts/register/) for an account, after which you may search for additional articles based on keywords, years of publication, specific journals, and so on. Once the search is to your liking, go to the Dataset Requests menu at the upper right and click "Submit New Request." Check the "Citations" and "Word Counts" boxes, select CSV output format, and enter a short job title that describes your query. Once you click "Submit Job", you will be taken to a history of your submitted requests. You will be e-mailed once the dataset is complete. Click "Download (#### docs)" in the Full Dataset column, and a zip file timestamped with the request time will be downloaded. This file (or several files with related queries) may then be incorporated into a model by selecting "By Time (With JSTOR DFR)" in the Topic Modeling submenu of Paper Machines. Multiple dataset zips will be merged and duplicates discarded before analysis begins; be warned, this may take a considerable amount of time before it begins to show progress (~15-30 minutes).
-### Classification
-This allows you to train the computer to infer the common features of the documents under each subcollection; subsequently, a set of texts in a different folder can be sorted automatically based on this training. At the moment, the probability distribution for each text is given in plain text; the ability to automatically generate a new collection according to this sorting is forthcoming.
-
### Preferences
-Currently, the language stoplist in use, types of data to extract, default parameters for topic modeling, and an experimental periodical import feature (intended for PDFs with OCR and correct metadata) may be adjusted in the preference pane.
+Currently, the language stoplist in use, types of data to extract, and default parameters for topic modeling may be adjusted in the preference pane. Any custom stopwords may be added to the "Stop Words" pane, one per line, to help eliminate irrelevant terms from your data.
## Acknowledgements
Special thanks to [Matthew Battles](http://metalab.harvard.edu/people/) for providing space, guidance, and support for me at metaLAB. My gratitude also to the creators of all the open-source projects and services upon which this project relies:
139 chrome/content/papermachines/processors/geoparser.py
@@ -1,48 +1,88 @@
#!/usr/bin/env python2.7
import sys, os, json, logging, traceback, base64, time, codecs, urllib, urllib2
-from xml.etree import ElementTree as ET
+from collections import defaultdict
+from lib.classpath import classPathHacker
+
import textprocessor
class Geoparser(textprocessor.TextProcessor):
"""
- Geoparsing using Europeana service (experimental)
+ Geoparsing using Pete Warden's geodict
"""
- def _basic_params(self):
- self.name = "geoparser"
- self.dry_run = False
- self.require_stopwords = False
-
- def annotate(self, text):
- values = {'freeText': text[0:10000].encode('utf-8', 'ignore')}
- data = urllib.urlencode(values)
- req = urllib2.Request("http://europeana-geo.isti.cnr.it/geoparser/geoparsing/freeText", data)
- response = urllib2.urlopen(req)
- annotation = response.read()
- return annotation
-
- def get_places(self, xml_string):
- xml_string = xml_string.replace("\n", " ")
- elem = ET.fromstring(xml_string)
- annotated = elem.find('annotatedText')
-
- current_length = 0
- for entity in annotated.getiterator():
- if entity.tag == 'PLACE':
- place = {"name": entity.text, "entityURI": entity.get("entityURI"), "latitude": entity.get("latitude"), "longitude": entity.get("longitude")}
- if entity.text is not None:
- reference = [current_length, current_length + len(entity.text)]
- current_length += len(entity.text)
- if entity.tail is not None:
- current_length += len(entity.tail)
- yield place, reference
- else:
- if entity.text is not None:
- current_length += len(entity.text)
- if entity.tail is not None:
- current_length += len(entity.tail)
+ def get_containing_paragraph(self, text, match):
+ start = match[0]
+ end = match[1]
+ chars_added = 0
+ c = text[start]
+ while c != '\n' and chars_added < 50 and start > 0:
+ start -= 1
+ chars_added += 1
+ c = text[start]
+
+ chars_added = 0
+ end = min(len(text) - 1, end)
+ c = text[end]
+
+ while c != '\n' and chars_added < 50 and end < len(text):
+ c = text[end]
+ end += 1
+ chars_added += 1
+
+ return text[start:end]
+
+ def contexts_from_geoparse_obj(self, geoparse_obj, filename):
+ contexts_obj = defaultdict(list)
+ with codecs.open(filename, 'rU', encoding='utf-8') as f:
+ text = f.read()
+
+ for entityURI, matchlist in geoparse_obj.get("references", {}).iteritems():
+ for match in matchlist:
+ paragraph = self.get_containing_paragraph(text, match)
+ geonameid = entityURI.split('/')[-1]
+ contexts_obj[geonameid].append(paragraph)
+
+ contexts_json = filename.replace(".txt", "_contexts.json")
+ contexts_obj = dict(contexts_obj)
+ with file(contexts_json, 'w') as f:
+ json.dump(contexts_obj, f)
+ return contexts_obj
+
+ def get_places(self, string, find_func):
+ try:
+ geodict_locations = find_func(string)
+ for location in geodict_locations:
+ found_tokens = location['found_tokens']
+ start_index = found_tokens[0]['start_index']
+ end_index = found_tokens[len(found_tokens)-1]['end_index']
+ name = string[start_index:(end_index+1)]
+ geonameid = found_tokens[0].get('geonameid', None)
+ entityURI = "http://sws.geonames.org/" + str(geonameid) if geonameid else None
+ geotype = found_tokens[0]['type'].lower()
+ lat = found_tokens[0]['lat']
+ lon = found_tokens[0]['lon']
+
+ if entityURI is None:
+ continue
+
+ place = {"name": name, "entityURI": entityURI, "latitude": lat, "longitude": lon, "type": geotype}
+ reference = [start_index, end_index]
+ yield place, reference
+ except:
+ logging.error(traceback.format_exc())
def run_geoparser(self):
+ import __builtin__
+ jarLoad = classPathHacker()
+ sqlitePath = os.path.join(self.cwd, "lib", "geodict", "sqlite-jdbc-3.7.2.jar")
+ jarLoad.addFile(sqlitePath)
+
+ import lib.geodict.geodict_config
+
+ self.database_path = os.path.join(self.cwd, "lib", "geodict", "geodict.db")
+
+ from lib.geodict.geodict_lib import GeodictParser
+
geo_parsed = {}
places_by_entityURI = {}
@@ -57,11 +97,14 @@ def run_geoparser(self):
self.update_progress()
file_geoparsed = filename.replace(".txt", "_geoparse.json")
+ contexts_json = filename.replace(".txt", "_contexts.json")
if os.path.exists(file_geoparsed):
try:
geoparse_obj = json.load(file(file_geoparsed))
if "places_by_entityURI" in geoparse_obj:
+ if not os.path.exists(contexts_json):
+ self.contexts_from_geoparse_obj(geoparse_obj, filename)
continue
else:
os.remove(file_geoparsed)
@@ -75,24 +118,25 @@ def run_geoparser(self):
id = self.metadata[filename]['itemID']
str_to_parse = self.metadata[filename]['place']
last_index = len(str_to_parse)
- str_to_parse += codecs.open(filename, 'r', encoding='utf8').read()[0:(48000 - last_index)] #50k characters, shortened by initial place string
+ str_to_parse += codecs.open(filename, 'rU', encoding='utf8').read()
city = None
places = set()
- xml_filename = filename.replace('.txt', '_geoparse.xml')
+ json_filename = filename.replace('.txt', '_geodict.json')
- if not os.path.exists(xml_filename):
- annotation = self.annotate(str_to_parse)
- with codecs.open(xml_filename, 'w', encoding='utf8') as xml_file:
- xml_file.write(annotation.decode('utf-8'))
+ if not os.path.exists(json_filename):
+ parser = GeodictParser(self.database_path)
+ places_found = list(self.get_places(str_to_parse, parser.find_locations_in_text))
+ with codecs.open(json_filename, 'w', encoding='utf8') as json_file:
+ json.dump(places_found, json_file)
else:
- with codecs.open(xml_filename, 'r', encoding='utf8') as xml_file:
- annotation = xml_file.read()
+ with codecs.open(json_filename, 'r', encoding='utf8') as json_file:
+ places_found = json.load(json_file)
- for place, reference in self.get_places(annotation):
+ for (place, reference) in places_found:
entityURI = place["entityURI"]
- geoparse_obj['places_by_entityURI'][entityURI] = {'name': place["name"], 'type': 'unknown', 'coordinates': [place["longitude"], place["latitude"]]}
+ geoparse_obj['places_by_entityURI'][entityURI] = {'name': place["name"], 'type': place["type"], 'coordinates': [place["longitude"], place["latitude"]]}
if reference[0] < last_index:
city = entityURI
@@ -133,7 +177,10 @@ def run_geoparser(self):
geoparse_obj['places'] = list(places)
geoparse_obj['city'] = city
- json.dump(geoparse_obj, file(file_geoparsed, 'w'))
+ with file(file_geoparsed, 'w') as f:
+ json.dump(geoparse_obj, f)
+ if not os.path.exists(contexts_json):
+ self.contexts_from_geoparse_obj(geoparse_obj, filename)
time.sleep(0.2)
except (KeyboardInterrupt, SystemExit):
raise
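The rewritten run_geoparser swaps the Europeana web service for a local lookup: it loads the bundled SQLite JDBC driver via classPathHacker, points a GeodictParser at geodict.db, and get_places turns each match's found_tokens into a place dict plus a [start_index, end_index] reference. A minimal standalone sketch of the same flow, assuming Jython 2.7 and that the script lives in the processors/ directory; the sample sentence and printed fields are illustrative:

#!/usr/bin/env python2.7
# Sketch only: resolve place names with the bundled geodict parser.
# Jython is required, since lib/geodict/jsqlite3.py reaches SQLite through JDBC.
import os
from lib.classpath import classPathHacker
from lib.geodict.geodict_lib import GeodictParser

here = os.path.dirname(os.path.abspath(__file__))
# Put the bundled JDBC driver on the classpath before the parser opens the database.
classPathHacker().addFile(os.path.join(here, "lib", "geodict", "sqlite-jdbc-3.7.2.jar"))

parser = GeodictParser(os.path.join(here, "lib", "geodict", "geodict.db"))
text = u"The press was moved to Springfield, Illinois in 1831."
for location in parser.find_locations_in_text(text):
    tokens = location['found_tokens']
    start, end = tokens[0]['start_index'], tokens[-1]['end_index']
    # The same reconstruction get_places performs: matched span plus coordinates.
    print text[start:end + 1], tokens[0]['lat'], tokens[0]['lon'], tokens[0].get('geonameid')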
3  chrome/content/papermachines/processors/geoparser_export.py
@@ -42,7 +42,7 @@ def process(self):
title = os.path.basename(filename)
itemID = self.metadata[filename]['itemID']
year = self.metadata[filename]['year']
- text = codecs.open(filename, 'r', encoding='utf-8', errors='ignore').read()
+ text = codecs.open(filename, 'rU', encoding='utf-8', errors='replace').read()
maximum_length = len(text)
for entityURI, ranges in geoparse_obj["references"].iteritems():
@@ -65,7 +65,6 @@ def process(self):
logging.info(traceback.format_exc())
params = {"CSVPATH": csv_output_filename}
-# "CSVFILEURL": "file://" + urllib.pathname2url(os.path.dirname(csv_output_filename))}
self.write_html(params)
logging.info("finished")
26 chrome/content/papermachines/processors/geoparser_flightpaths.py
@@ -1,5 +1,6 @@
#!/usr/bin/env python2.7
import sys, os, json, logging, traceback, base64, time, codecs
+from collections import defaultdict
import cPickle as pickle
import geoparser
@@ -29,12 +30,23 @@ def process(self):
linksByYear = {}
itemIDToYear = {}
places = {}
+ contexts = defaultdict(dict)
- for rowdict in self.parse_csv(csv_input):
- validEntityURIs.add(rowdict["entityURI"])
+ try:
+ for rowdict in self.parse_csv(csv_input):
+ validEntityURIs.add(rowdict["entityURI"])
+ except:
+ logging.error(traceback.format_exc())
+ sys.exit(1)
+
+ if len(validEntityURIs) == 0: #empty csv file
+ os.remove(csv_input)
+ logging.error("Geoparser output file was empty!")
+ sys.exit(1)
for filename in self.files:
file_geoparsed = filename.replace(".txt", "_geoparse.json")
+ contexts_json = filename.replace(".txt", "_contexts.json")
if os.path.exists(file_geoparsed):
try:
geoparse_obj = json.load(file(file_geoparsed))
@@ -71,6 +83,13 @@ def process(self):
if itemID not in linksByYear[year][edge]:
linksByYear[year][edge][itemID] = 0
linksByYear[year][edge][itemID] += 1
+ if os.path.exists(contexts_json):
+ with file(contexts_json) as f:
+ contexts_obj = json.load(f)
+ else:
+ contexts_obj = self.contexts_from_geoparse_obj(geoparse_obj, filename)
+ for geonameid, paragraphs in contexts_obj.iteritems():
+ contexts[geonameid].update({itemID: paragraphs})
except:
logging.info(traceback.format_exc())
@@ -104,7 +123,8 @@ def process(self):
"ENDDATE": max(linksByYear.keys()),
"ENTITYURIS": places,
"YEARS": years,
- "LINKS_BY_YEAR": groupedLinksByYear
+ "LINKS_BY_YEAR": groupedLinksByYear,
+ "CONTEXTS": dict(contexts)
}
self.write_html(params)
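Context passages now travel with the links: contexts_from_geoparse_obj (in geoparser.py) writes a per-document _contexts.json keyed by geonameid, and this processor folds those files into the CONTEXTS template parameter, keyed by geonameid and then by Zotero itemID, which flightpaths.js reads back as data["CONTEXTS"][geoid(uri)]. A sketch of the assumed shape; the geonameid, itemID, and passages are illustrative:

# Illustrative contents of one <document>_contexts.json (geonameid -> paragraphs)
contexts_for_doc = {
    "4930956": [
        "... the pamphlet was reprinted at Boston later that year ...",
        "... he returned to Boston the following spring ...",
    ],
}

# Illustrative shape of params["CONTEXTS"] after merging across documents:
# geonameid -> { Zotero itemID -> paragraphs drawn from that item }
params_contexts = {
    "4930956": {"12345": contexts_for_doc["4930956"]},
}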
0  chrome/content/papermachines/processors/lib/geodict/__init__.py
No changes.
91 chrome/content/papermachines/processors/lib/geodict/data.py
@@ -0,0 +1,91 @@
+import jsqlite3, string, StringIO
+import geodict_config
+
+def get_database_connection():
+ db=jsqlite3.connect(geodict_config.database+'.db')
+ cursor=db.cursor()
+ return cursor
+
+def get_cities(pulled_word,current_word,country_code,region_code):
+ cursor = get_database_connection()
+ select = 'SELECT * FROM cities WHERE last_word=?'
+ values = (pulled_word, )
+ if country_code is not None:
+ select += ' AND country=?'
+
+ if region_code is not None:
+ select += ' AND region_code=?'
+
+ # There may be multiple cities with the same name, so pick the one with the largest population
+ select += ' ORDER BY population;'
+ # Unfortunately tuples are immutable, so I have to use this logic to set up the correct ones
+ if country_code is None and region_code is None:
+ values = (current_word, )
+ elif country_code is not None and region_code is None:
+ values = (current_word, country_code)
+ elif country_code is None and region_code is not None:
+ values = (current_word, region_code)
+ else:
+ values = (current_word, country_code, region_code)
+
+ values = [v.lower() for v in values]
+
+ cursor.execute(select, values)
+ candidate_rows = cursor.fetchall()
+ # print candidate_rows
+
+ name_map = {}
+ for candidate_row in candidate_rows:
+ # print candidate_row
+ candidate_dict = get_dict_from_row(cursor, candidate_row)
+ # print candidate_dict
+ name = candidate_dict['city'].lower()
+ name_map[name] = candidate_dict
+ return name_map
+
+# Converts the result of a MySQL fetch into an associative dictionary, rather than a numerically indexed list
+def get_dict_from_row(cursor, row):
+ d = {}
+ for idx,col in enumerate(cursor.description):
+ d[col[0]] = row[idx]
+ return d
+
+# Functions that look at a small portion of the text, and try to identify any location identifiers
+
+# Caches the countries and regions tables in memory
+
+def setup_countries_cache():
+ countries_cache = {}
+ cursor = get_database_connection()
+ select = 'SELECT * FROM countries;'
+ cursor.execute(select)
+ candidate_rows = cursor.fetchall()
+
+ for candidate_row in candidate_rows:
+ candidate_dict = get_dict_from_row(cursor, candidate_row)
+ last_word = candidate_dict['last_word'].lower()
+ if last_word not in countries_cache:
+ countries_cache[last_word] = []
+ countries_cache[last_word].append(candidate_dict)
+ return countries_cache
+
+def setup_regions_cache():
+ regions_cache = {}
+ cursor = get_database_connection()
+ select = 'SELECT * FROM regions;'
+ cursor.execute(select)
+ candidate_rows = cursor.fetchall()
+
+ for candidate_row in candidate_rows:
+ candidate_dict = get_dict_from_row(cursor, candidate_row)
+ last_word = candidate_dict['last_word'].lower()
+ if last_word not in regions_cache:
+ regions_cache[last_word] = []
+ regions_cache[last_word].append(candidate_dict)
+ return regions_cache
+
+def is_initialized(name):
+ cursor = get_database_connection()
+ cursor.execute("SELECT COUNT(*) FROM sqlite_master WHERE name = ?;",[name])
+ return cursor.fetchone()[0] > 0
+
95 chrome/content/papermachines/processors/lib/geodict/db_funcs.py
@@ -0,0 +1,95 @@
+import jsqlite3, string, StringIO
+import geodict_config
+
+class GeodictDatabase:
+ def __init__(self, database_path):
+ self.database_path = database_path
+
+ def get_database_connection(self):
+ db=jsqlite3.connect(self.database_path)
+ cursor=db.cursor()
+ return cursor
+
+ def get_cities(self, pulled_word,current_word,country_code,region_code):
+ cursor = self.get_database_connection()
+ select = 'SELECT * FROM cities WHERE last_word=?'
+ values = (pulled_word, )
+ if country_code is not None:
+ select += ' AND country=?'
+
+ if region_code is not None:
+ select += ' AND region_code=?'
+
+ # There may be multiple cities with the same name, so pick the one with the largest population
+ select += ' ORDER BY population;'
+ # Unfortunately tuples are immutable, so I have to use this logic to set up the correct ones
+ if country_code is None and region_code is None:
+ values = (current_word, )
+ elif country_code is not None and region_code is None:
+ values = (current_word, country_code)
+ elif country_code is None and region_code is not None:
+ values = (current_word, region_code)
+ else:
+ values = (current_word, country_code, region_code)
+
+ values = [v.lower() for v in values]
+
+ cursor.execute(select, values)
+ candidate_rows = cursor.fetchall()
+ # print candidate_rows
+
+ name_map = {}
+ for candidate_row in candidate_rows:
+ # print candidate_row
+ candidate_dict = self.get_dict_from_row(cursor, candidate_row)
+ # print candidate_dict
+ name = candidate_dict['city'].lower()
+ name_map[name] = candidate_dict
+ return name_map
+
+ # Converts the result of a MySQL fetch into an associative dictionary, rather than a numerically indexed list
+ def get_dict_from_row(self, cursor, row):
+ d = {}
+ for idx,col in enumerate(cursor.description):
+ d[col[0]] = row[idx]
+ return d
+
+ # Functions that look at a small portion of the text, and try to identify any location identifiers
+
+ # Caches the countries and regions tables in memory
+
+ def setup_countries_cache(self):
+ countries_cache = {}
+ cursor = self.get_database_connection()
+ select = 'SELECT * FROM countries;'
+ cursor.execute(select)
+ candidate_rows = cursor.fetchall()
+
+ for candidate_row in candidate_rows:
+ candidate_dict = self.get_dict_from_row(cursor, candidate_row)
+ last_word = candidate_dict['last_word'].lower()
+ if last_word not in countries_cache:
+ countries_cache[last_word] = []
+ countries_cache[last_word].append(candidate_dict)
+ return countries_cache
+
+ def setup_regions_cache(self):
+ regions_cache = {}
+ cursor = self.get_database_connection()
+ select = 'SELECT * FROM regions;'
+ cursor.execute(select)
+ candidate_rows = cursor.fetchall()
+
+ for candidate_row in candidate_rows:
+ candidate_dict = self.get_dict_from_row(cursor, candidate_row)
+ last_word = candidate_dict['last_word'].lower()
+ if last_word not in regions_cache:
+ regions_cache[last_word] = []
+ regions_cache[last_word].append(candidate_dict)
+ return regions_cache
+
+ def is_initialized(self, name):
+ cursor = self.get_database_connection()
+ cursor.execute("SELECT COUNT(*) FROM sqlite_master WHERE name = ?;",[name])
+ return cursor.fetchone()[0] > 0
+
BIN  chrome/content/papermachines/processors/lib/geodict/geodict.db
Binary file not shown
30 chrome/content/papermachines/processors/lib/geodict/geodict_config.py
@@ -0,0 +1,30 @@
+# Geodict
+# Copyright (C) 2010 Pete Warden <pete@petewarden.com>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# The location of the source data to be loaded into your database
+source_folder = './source_data/'
+
+# The name of the database to create
+database = 'geodict'
+
+# The maximum number of words in any name
+word_max = 3
+
+# Words that provide evidence that what follows them is a location
+location_words = {
+ 'at': True,
+ 'in': True
+}
433 chrome/content/papermachines/processors/lib/geodict/geodict_lib.py
@@ -0,0 +1,433 @@
+# Geodict
+# Copyright (C) 2010 Pete Warden <pete@petewarden.com>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+import string, StringIO
+import geodict_config
+import db_funcs
+# The main entry point. This function takes an unstructured text string and returns a list of all the
+# fragments it could identify as locations, together with lat/lon positions
+
+class GeodictParser:
+ def __init__(self, database_path):
+ self.data = db_funcs.GeodictDatabase(database_path)
+ self.countries_cache = self.data.setup_countries_cache()
+ self.regions_cache = self.data.setup_regions_cache()
+
+ # Characters to ignore when pulling out words
+ self.whitespace = set(string.whitespace+"'\",.-/\n\r<>")
+ self.tokenized_words = {}
+
+ def find_locations_in_text(self, text):
+ current_index = len(text)-1
+ result = []
+
+ # This loop goes through the text string in *reverse* order. Since locations in English are typically
+ # described with the broadest category last, preceded by more and more specific designations towards
+ # the beginning, it simplifies things to walk the string in that direction too
+ while current_index>=0:
+
+ current_word, pulled_index, ignored_skipped = self.pull_word_from_end(text, current_index)
+ lower_word = current_word.lower()
+ could_be_country = lower_word in self.countries_cache
+ could_be_region = lower_word in self.regions_cache
+
+ if not could_be_country and not could_be_region:
+ current_index = pulled_index
+ continue
+
+ # This holds the results of the match function for the final element of the sequence. This lets us
+ # optimize out repeated calls to see if the end of the current string is a country for example
+ match_cache = {}
+
+ # These 'token sequences' describe patterns of discrete location elements that we'll look for.
+ for token_sequence in self.token_sequences:
+
+ # The sequences are specified in the order they'll occur in the text, but since we're walking
+ # backwards we need to reverse them and go through the sequence in that order too
+ token_sequence = token_sequence[::-1]
+
+ # Now go through the sequence and see if we can match up all the tokens in it with parts of
+ # the string
+ token_result = None
+ token_index = current_index
+ for token_position, token_name in enumerate(token_sequence):
+
+ # The token definition describes how to recognize part of a string as a match. Typical
+ # tokens include country, city and region names
+ token_definition = self.token_definitions[token_name]
+ match_function = token_definition['match_function']
+
+ # This logic optimizes out repeated calls to the same match function
+ if token_position == 0 and token_name in match_cache:
+ token_result = match_cache[token_name]
+ else:
+ # The meat of the algorithm, checks the ending of the current string against the
+ # token testing function, eg seeing if it matches a country name
+ token_result = match_function(self, text, token_index, token_result)
+ if token_position == 0:
+ match_cache[token_name] = token_result
+
+ if token_result is None:
+ # The string doesn't match this token, so the sequence as a whole isn't a match
+ break
+ else:
+ # The current token did match, so move backwards through the string to the start of
+ # the matched portion, and see if the preceding words match the next required token
+ token_index = token_result['found_tokens'][0]['start_index']-1
+
+ # We got through the whole sequence and all the tokens match, so we have a winner!
+ if token_result is not None:
+ break
+
+ if token_result is None:
+ # None of the sequences matched, so back up a word and start over again
+ ignored_word, current_index, end_skipped = self.pull_word_from_end(text, current_index)
+ else:
+ # We found a matching sequence, so add the information to the result
+ result.append(token_result)
+ found_tokens = token_result['found_tokens']
+ current_index = found_tokens[0]['start_index']-1
+
+ # Reverse the result so it's in the order that the locations occured in the text
+ result = result[::-1]
+
+ return result
+
+ # Functions that look at a small portion of the text, and try to identify any location identifiers
+
+ # Matches the current fragment against our database of countries
+ def is_country(self, text, text_starting_index, previous_result):
+
+ current_word = ''
+ current_index = text_starting_index
+ pulled_word_count = 0
+ found_row = None
+
+ # Walk backwards through the current fragment, pulling out words and seeing if they match
+ # the country names we know about
+ while pulled_word_count < geodict_config.word_max:
+ pulled_word, current_index, end_skipped = self.pull_word_from_end(text, current_index)
+ pulled_word_count += 1
+ if current_word == '':
+ # This is the first time through, so the full word is just the one we pulled
+ current_word = pulled_word
+ # Make a note of the real end of the word, ignoring any trailing whitespace
+ word_end_index = (text_starting_index-end_skipped)
+
+ # We've indexed the locations by the word they end with, so find all of them
+ # that have the current word as a suffix
+ last_word = pulled_word.lower()
+ if last_word not in self.countries_cache:
+ break
+ candidate_dicts = self.countries_cache[last_word]
+
+ name_map = {}
+ for candidate_dict in candidate_dicts:
+ name = candidate_dict['country'].lower()
+ name_map[name] = candidate_dict
+ else:
+ #
+ current_word = pulled_word+' '+current_word
+
+ # This happens if we've walked backwards all the way to the start of the string
+ if current_word == '':
+ return None
+
+ # If the first letter of the name is lower case, then it can't be the start of a country
+ # Somewhat arbitrary, but for my purposes it's better to miss some ambiguous ones like this
+ # than to pull in erroneous words as countries (eg thinking the 'uk' in .co.uk is a country)
+ if current_word[0:1].islower():
+ continue
+
+ name_key = current_word.lower()
+ if name_key in name_map:
+ found_row = name_map[name_key]
+
+ if found_row is not None:
+ # We've found a valid country name
+ break
+ if current_index < 0:
+ # We've walked back to the start of the string
+ break
+
+ if found_row is None:
+ # We've walked backwards through the current words, and haven't found a good country match
+ return None
+
+ # Were there any tokens found already in the sequence? Unlikely with countries, but for
+ # consistency's sake I'm leaving the logic in
+ if previous_result is None:
+ current_result = {
+ 'found_tokens': [],
+ }
+ else:
+ current_result = previous_result
+
+ country_code = found_row['country_code']
+ lat = found_row['lat']
+ lon = found_row['lon']
+ geonameid = found_row['geonameid']
+
+ # Prepend all the information we've found out about this location to the start of the 'found_tokens'
+ # array in the result
+ current_result['found_tokens'].insert(0, {
+ 'type': 'COUNTRY',
+ 'code': country_code,
+ 'geonameid': geonameid,
+ 'lat': lat,
+ 'lon': lon,
+ 'matched_string': current_word,
+ 'start_index': (current_index+1),
+ 'end_index': word_end_index
+ })
+
+ return current_result
+
+ # Looks through our database of towns and cities around the world to locate any that match the
+ # words at the end of the current text fragment
+ def is_city(self, text, text_starting_index, previous_result):
+
+ # If we're part of a sequence, then use any country or region information to narrow down our search
+ country_code = None
+ region_code = None
+ if previous_result is not None:
+ found_tokens = previous_result['found_tokens']
+ for found_token in found_tokens:
+ type = found_token['type']
+ if type == 'COUNTRY':
+ country_code = found_token['code']
+ elif type == 'REGION':
+ region_code = found_token['code']
+
+ current_word = ''
+ current_index = text_starting_index
+ pulled_word_count = 0
+ found_row = None
+ while pulled_word_count < geodict_config.word_max:
+ pulled_word, current_index, end_skipped = self.pull_word_from_end(text, current_index)
+ pulled_word_count += 1
+
+ if current_word == '':
+ current_word = pulled_word
+ word_end_index = (text_starting_index-end_skipped)
+
+ name_map = self.data.get_cities(pulled_word,current_word,country_code,region_code)
+ if len(name_map) < 1:
+ break
+ else:
+ current_word = pulled_word+' '+current_word
+
+ if current_word == '':
+ return None
+
+ if current_word[0:1].islower():
+ continue
+
+ name_key = current_word.lower()
+ if name_key in name_map:
+ found_row = name_map[name_key]
+
+ if found_row is not None:
+ break
+ if current_index < 0:
+ break
+
+ if found_row is None:
+ return None
+
+ if previous_result is None:
+ current_result = {
+ 'found_tokens': [],
+ }
+ else:
+ current_result = previous_result
+
+ lat = found_row['lat']
+ lon = found_row['lon']
+ geonameid = found_row['geonameid']
+
+ current_result['found_tokens'].insert(0, {
+ 'type': 'CITY',
+ 'lat': lat,
+ 'lon': lon,
+ 'geonameid': geonameid,
+ 'matched_string': current_word,
+ 'start_index': (current_index+1),
+ 'end_index': word_end_index
+ })
+
+ return current_result
+
+ # This looks for sub-regions within countries. At the moment the only values in the database are for US states
+ def is_region(self, text, text_starting_index, previous_result):
+
+ # Narrow down the search by country, if we already have it
+ country_code = None
+ if previous_result is not None:
+ found_tokens = previous_result['found_tokens']
+ for found_token in found_tokens:
+ type = found_token['type']
+ if type == 'COUNTRY':
+ country_code = found_token['code']
+
+ current_word = ''
+ current_index = text_starting_index
+ pulled_word_count = 0
+ found_row = None
+ while pulled_word_count < geodict_config.word_max:
+ pulled_word, current_index, end_skipped = self.pull_word_from_end(text, current_index)
+ pulled_word_count += 1
+ if current_word == '':
+ current_word = pulled_word
+ word_end_index = (text_starting_index-end_skipped)
+
+ last_word = pulled_word.lower()
+ if last_word not in self.regions_cache:
+ break
+ all_candidate_dicts = self.regions_cache[last_word]
+ if country_code is not None:
+ candidate_dicts = []
+ for possible_dict in all_candidate_dicts:
+ candidate_country = possible_dict['country_code']
+ if candidate_country.lower() == country_code.lower():
+ candidate_dicts.append(possible_dict)
+ else:
+ candidate_dicts = all_candidate_dicts
+
+ name_map = {}
+ for candidate_dict in candidate_dicts:
+ name = candidate_dict['region'].lower()
+ name_map[name] = candidate_dict
+ else:
+ current_word = pulled_word+' '+current_word
+
+ if current_word == '':
+ return None
+
+ if current_word[0:1].islower():
+ continue
+
+ name_key = current_word.lower()
+ if name_key in name_map:
+ found_row = name_map[name_key]
+
+ if found_row is not None:
+ break
+ if current_index < 0:
+ break
+
+ if found_row is None:
+ return None
+
+ if previous_result is None:
+ current_result = {
+ 'found_tokens': [],
+ }
+ else:
+ current_result = previous_result
+
+ region_code = found_row['region_code']
+ lat = found_row['lat']
+ lon = found_row['lon']
+ geonameid = found_row['geonameid']
+
+ current_result['found_tokens'].insert(0, {
+ 'type': 'REGION',
+ 'code': region_code,
+ 'geonameid': geonameid,
+ 'lat': lat,
+ 'lon': lon,
+ 'matched_string': current_word,
+ 'start_index': (current_index+1),
+ 'end_index': word_end_index
+ })
+
+ return current_result
+
+ # A special case - used to look for 'at' or 'in' before a possible location word. This helps me be more certain
+ # that it really is a location in this context. Think 'the New York Times' vs 'in New York' - with the latter
+ # fragment we can be pretty sure it's talking about a location
+ def is_location_word(self, text, text_starting_index, previous_result):
+
+ current_index = text_starting_index
+ current_word, current_index, end_skipped = self.pull_word_from_end(text, current_index)
+ word_end_index = (text_starting_index-end_skipped)
+ if current_word == '':
+ return None
+
+ current_word = current_word.lower()
+
+ if current_word not in geodict_config.location_words:
+ return None
+
+ return previous_result
+
+ # Walks backwards through the text from the end, pulling out a single unbroken sequence of non-whitespace
+ # characters, trimming any whitespace off the end
+ def pull_word_from_end(self, text, index, use_cache=True):
+
+ if use_cache and index in self.tokenized_words:
+ return self.tokenized_words[index]
+
+ found_word = ''
+ current_index = index
+ end_skipped = 0
+ while current_index>=0:
+ current_char = text[current_index]
+ current_index -= 1
+
+ if current_char in self.whitespace:
+      if found_word == '':
+ end_skipped += 1
+ continue
+ else:
+ current_index += 1
+ break
+
+ found_word += current_char
+
+ # reverse the result (since we're appending for efficiency's sake)
+ found_word = found_word[::-1]
+
+ result = (found_word, current_index, end_skipped)
+ self.tokenized_words[index] = result
+
+ return result
+
+ token_definitions = {
+ 'COUNTRY': {
+ 'match_function': is_country
+ },
+ 'CITY': {
+ 'match_function': is_city
+ },
+ 'REGION': {
+ 'match_function': is_region
+ },
+ 'LOCATION_WORD': {
+ 'match_function': is_location_word
+ }
+ }
+
+ # Particular sequences of those location words that give us more confidence they're actually describing
+ # a place in the text, and aren't coincidental names (eg 'New York Times')
+ token_sequences = [
+ [ 'CITY', 'COUNTRY' ],
+ [ 'CITY', 'REGION' ],
+ [ 'REGION', 'COUNTRY' ],
+ [ 'COUNTRY' ],
+ [ 'LOCATION_WORD', 'REGION' ], # Regions are too common as words to use without additional evidence
+ ]
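These token_sequences are why bare city names are skipped: a fragment only qualifies when it ends in a country name, pairs a city with a region or country, pairs a region with a country, or is a region preceded by one of the location_words ('at', 'in'). Each element returned by find_locations_in_text mirrors whichever sequence matched; a sketch of the expected shape for a "CITY, COUNTRY" match such as "Toronto, Canada", with illustrative indices, ids, and coordinates:

# One element of the list returned by find_locations_in_text() for a
# [CITY, COUNTRY] sequence; numeric values below are illustrative only.
match = {
    'found_tokens': [
        {'type': 'CITY', 'geonameid': 6167865, 'lat': 43.7, 'lon': -79.4,
         'matched_string': 'Toronto', 'start_index': 10, 'end_index': 16},
        {'type': 'COUNTRY', 'code': 'ca', 'geonameid': 6251999, 'lat': 60.0, 'lon': -96.0,
         'matched_string': 'Canada', 'start_index': 19, 'end_index': 24},
    ],
}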
95 chrome/content/papermachines/processors/lib/geodict/jsqlite3.py
@@ -0,0 +1,95 @@
+#!/usr/bin/env python2.7
+import sys, os, logging
+from java.lang import Class
+
+from java.sql import Connection, DriverManager, ResultSet, ResultSetMetaData, SQLException, Statement, PreparedStatement
+
+def connect(db_name):
+ return SqliteDB(db_name)
+
+def getConnection(jdbc_url, driverName):
+ """
+ Given the name of a JDBC driver class and the url to be used
+ to connect to a database, attempt to obtain a connection to
+ the database.
+ """
+ try:
+ Class.forName(driverName).newInstance()
+ except Exception, msg:
+ logging.error(msg)
+ sys.exit(-1)
+
+ try:
+ dbConn = DriverManager.getConnection(jdbc_url)
+ except SQLException, msg:
+ logging.error(msg)
+ sys.exit(-1)
+
+ return dbConn
+
+class SqliteDB:
+ def __init__(self, name):
+ self.connection = getConnection("jdbc:sqlite:"+name, "org.sqlite.JDBC")
+ def cursor(self):
+ return FakeCursor(self.connection)
+
+class FakeCursor:
+ def __init__(self, connection):
+ self.connection = connection
+ self.statement = None
+ self.results = None
+ self.types = None
+ self.columns = None
+
+ def execute(self, query, values = []):
+ self.results = None
+ self.types = None
+ self.columns = None
+ self.statement = self.connection.prepareStatement(query)
+ for i, v in enumerate(values):
+ i = i + 1
+ if isinstance(v, basestring):
+ self.statement.setString(i, v)
+ else:
+ self.statement.setInt(i, v)
+ self.results = self.statement.executeQuery()
+
+ @property
+ def description(self):
+ if self.results is not None:
+ self.columns = []
+ self.types = []
+ md = self.results.getMetaData()
+ for i in range(md.getColumnCount()):
+ i = i + 1
+ this_type = md.getColumnTypeName(i)
+ self.types.append(this_type)
+ self.columns.append((md.getColumnName(i), this_type))
+ return self.columns
+
+ def fetchall(self):
+ cols = self.description
+ return self.fetchall_iter()
+
+ def fetchall_iter(self):
+ while self.results.next():
+ yield self.row_inner()
+
+ def row_inner(self):
+ row = []
+ for i, t in enumerate(self.types):
+ i = i + 1
+ if t == 'text':
+ row.append(self.results.getString(i))
+ elif t == 'integer':
+ row.append(self.results.getInt(i))
+ elif t == 'float':
+ row.append(self.results.getFloat(i))
+ return row
+
+ def fetchone(self):
+ cols = self.description
+ self.results.next()
+ return self.row_inner()
+
+
BIN  chrome/content/papermachines/processors/lib/geodict/sqlite-jdbc-3.7.2.jar
Binary file not shown
7 chrome/content/papermachines/processors/lib/geodict/test.py
@@ -0,0 +1,7 @@
+#!/usr/bin/python
+import jsqlite3
+
+con = jsqlite3.connect("geodict.db")
+cur = con.cursor()
+cur.execute("SELECT COUNT(*) FROM sqlite_master WHERE name = ?;",['cities'])
+print cur.fetchone()
105 chrome/content/papermachines/processors/support/flightpaths.js
@@ -18,8 +18,21 @@ var legend_m = [30, 30, 30, 30], // margins
legend_w = 240 - legend_m[1] - legend_m[3], // width
legend_h = 340 - legend_m[0] - legend_m[2]; // height
+var yearTaper = d3.scale.linear().domain([0, 20]).range([0,1]);
var link_polylines = {};
+function geoid(uri) {
+ return uri.split('/').slice(-1)[0];
+}
+function yearStillShowing(year){
+ return endDate == year;
+ // return endDate - year >= 0 && endDate - year < 20;
+}
+
+function fadeWithTime(d) {
+ return 0.8; //* yearTaper(endDate - d.year);
+}
+
var playPause = function () {
if (animating) {
clearInterval(animating);
@@ -67,7 +80,7 @@ var timeAction = function () {
entityURIs[d.edge[1]].sources[d.edge[0]] += 1;
}
});
- })(year == endDate);
+ })(yearStillShowing(year)); //(year == endDate);
// })(year >= startDate && ((startDate == endDate && year == endDate) || year < endDate));
}
circleOverlay.draw(true);
@@ -84,8 +97,10 @@ var updateCircleData = function () {
max = d.counts[year];
}
// if (year >= startDate && ((startDate == endDate && year == endDate) || year < endDate)) {
- if (year == endDate) {
+ // if (year == endDate) {
+ if (yearStillShowing(year)) {
sum = d.counts[year];
+ entityURIs[uri].year = year;
}
}
// if (sum > max) { max = sum; }
@@ -301,7 +316,7 @@ ArcOverlay.prototype.draw = function(force) {
.attr("fill", "none")
// .attr("stroke", "url(#fade)")
.attr("stroke", colorArcsByOrigin)
- .attr("stroke-opacity", 0.1)
+ .attr("stroke-opacity", 0.5)
.enter().append("path")
.attr("d", pathFromArc)
.attr("fill", "none")
@@ -310,7 +325,7 @@ ArcOverlay.prototype.draw = function(force) {
.attr("id", function (d) { return "arc" + d.value.id; })
// .attr("stroke","url(#fade)")
.attr("stroke", colorArcsByOrigin)
- .attr("stroke-opacity", 0.1);
+ .attr("stroke-opacity", 0.5);
};
function sanitize(key) {
@@ -419,7 +434,7 @@ CircleOverlay.prototype.draw = function(force) {
.attr("cy", circleY)
.attr("r", circleRadius)
.attr("fill", colorBySources)
- .attr("fill-opacity", "0.3")
+ .attr("fill-opacity", fadeWithTime)
.attr("stroke", "#fff").attr("stroke-opacity", "0.3")
.attr("display", function (d) { return entityURIs[d.key].sum ? "block" : "none";})
.enter().append("circle")
@@ -427,17 +442,17 @@ CircleOverlay.prototype.draw = function(force) {
.attr("cy", circleY)
.attr("r", circleRadius)
.attr("fill", colorBySources)
- .attr("fill-opacity", "0.3")
+ .attr("fill-opacity", fadeWithTime)
.attr("stroke", "#fff").attr("stroke-opacity", "0.3")
.attr("class", "circle")
- .attr("id", function (d) { return "site" + sanitize(d.key); })
+ .attr("id", function (d) { return "circle" + geoid(d.key); })
.attr("display", function (d) { return entityURIs[d.key].sum ? "block" : "none";})
.style("cursor", "pointer")
.on("mouseover", function (d) {
- d3.select("#site" + sanitize(d.key)).attr("fill-opacity", "1");
+ d3.select("#circle" + geoid(d.key)).attr("fill-opacity", "1");
})
.on("mouseout", function (d) {
- d3.select("#site" + sanitize(d.key)).attr("fill-opacity", "0.3");
+ d3.select("#circle" + geoid(d.key)).attr("fill-opacity", "0.8");
})
.on("click", displayCircleInfo);
};
@@ -449,9 +464,30 @@ CircleOverlay.prototype.draw = function(force) {
function displayCircleInfo(d) {
// console.log(entityURIs[d.value.id]);
- d3.json("contexts/" + d.value.id.split('/').slice(-1)[0] + ".json", function (json) {
- console.log(json);
- });
+ var bbox = d3.select("#circle" + geoid(d.key))[0][0].getBBox(),
+ popup_x = bbox.x + bbox.width,
+ popup_y = Math.max(bbox.y + bbox.height, 300);
+ var entity = entityURIs[d.key],
+ name = entity.name,
+ population = entity.population || "?",
+ entity_str = "<b>" + name + "</b><hr/><span class='popupinfo'>loading...</span>";
+ // entity_str = "<b>" + name + "</b><br/><i>pop.</i> " + population.toString() + "<hr/><span class='popupinfo'>loading...</span>";
+ var popup = new Popup(entity_str, popup_x, popup_y);
+ var json = data["CONTEXTS"][geoid(d.key)];
+ var contexts_str = "";
+ for (var text in json) {
+ var text_obj = doc_metadata[text];
+ if (!text_obj || !text_obj.date || !yearStillShowing(text_obj.date.substring(0,4))) continue;
+ contexts_str += "<div>"
+ var title = text_obj.label + ": <a href='zotero://select/" + text + "'>" + text_obj.title + "</a>",
+ date = text_obj.date.split(' ')[0];
+ contexts_str += title + "\n<br/>\n" + date;
+ for (var i in json[text]) {
+ contexts_str += "<blockquote>" + json[text][i] + "</blockquote>";
+ }
+ contexts_str += "</div>";
+ }
+ d3.select(".popupinfo").html(contexts_str);
}
function buildLegend() {
@@ -474,19 +510,21 @@ function buildLegend() {
var origin_box = labels.append("svg:g")
.attr("id", "origin_labels")
- .attr("transform", "translate(120, 50)");
+ .attr("transform", "translate(120, 40)");
origin_box.append("svg:text")
.text("Origin")
- .style("fill", "#000");
+ .style("fill", "#000")
+ .style("font-size", "1em");
origin_labels = origin_box.append("svg:g").selectAll(".originlabel").data(originColors.domain())
.enter().append("svg:g")
.attr("class", "originlabel")
- .attr("transform", function (d, i) { return "translate(0," + ((i+1)*30) +")"});
+ .attr("transform", function (d, i) { return "translate(0," + ((i+1)*20 + 5) +")"});
origin_labels.append("svg:text")
.text(function (d) { return entityURIs[d].name;})
+ .style("font-size", "0.8em")
.attr("fill", function (d) { return originColors(d); });
updateLegend();
@@ -531,31 +569,28 @@ function setGradient(svg) {
.style("stop-color", function (d) { return d; });
}
-function Popup (text, d, i) {
+function closePopup() {
+ d3.select("#popup").remove();
+}
+
+function Popup(text, x, y) {
+ d3.select("#popup").remove();
this.div = d3.select("body").append("div")
- .attr("class", "popup")
- .attr("data-node", i)
- .html("<span class='popupclose' onclick='closePopup(" + i + ")'>x</span>");
- this.node = d;
- this.index = i;
+ .attr("id", "popup")
+ .html("<span class='popupclose' onclick='closePopup()'>x</span>");
this.inner = this.div.append("span").attr("class", "popupText");
- this.showing = false;
- this.display = function(d) {
+ this.display = function() {
this.inner.html(text);
this.div.style("display", "block")
- .style("z-index", popupStack);
-
- this.update(d);
- this.showing = true;
+ .style("position", "absolute")
+ .style("z-index", "999");
return this;
};
- this.update = function (d) {
- this.div.style("left", Math.floor(d.y + 10 ) + "px");
- this.div.style("top", Math.floor(d.x + 10) + "px");
- };
- this.hide = function(d) {
- this.update(d);
- this.div.style("display", "none");
- this.showing = false;
+ this.update = function (x, y) {
+ this.div.style("left", Math.floor(x) + "px");
+ this.div.style("top", Math.floor(y) + "px");
};
+
+ if (x && y) this.update(x, y);
+ this.display();
};
40 chrome/content/papermachines/processors/support/heatmap-gmaps.js
@@ -164,3 +164,43 @@ HeatmapOverlay.prototype.addDataPoint = function(lat, lng, count){
HeatmapOverlay.prototype.toggle = function(){
this.heatmap.toggleDisplay();
}
+
+var map;
+var heatmap;
+
+window.onload = function(){
+
+ var myLatlng = new google.maps.LatLng(-15.6778, -47.4384);
+ var myOptions = {
+ zoom: 2,
+ minZoom: 2,
+ center: myLatlng,
+ mapTypeId: google.maps.MapTypeId.ROADMAP,
+ disableDefaultUI: false,
+ scrollwheel: true,
+ draggable: true,
+ navigationControl: true,
+ mapTypeControl: false,
+ scaleControl: true,
+ disableDoubleClickZoom: false
+ };
+ map = new google.maps.Map(document.getElementById("heatmapArea"), myOptions);
+
+ heatmap = new HeatmapOverlay(map, {"radius":15, "visible":true, "opacity":60, "legend": {
+ "title": "Mentions in Corpus",
+ "position": "br",
+ "offset": 30
+ }});
+
+ document.getElementById("togLegend").onclick = function(){
+ var legend = heatmap.heatmap.get("legend").get("element");
+ legend.hidden = !legend.hidden;
+ };
+
+ var myData = data["INTENSITY"];
+
+ // this is important, because if you set the data set too early, the latlng/pixel projection doesn't work
+ google.maps.event.addListenerOnce(map, "idle", function(){
+ heatmap.setDataSet(myData);
+ });
+};
2  chrome/content/papermachines/processors/templates/geoparser_export.html
@@ -8,7 +8,7 @@
<body>
<p id="path"></p>
<script>
- var p = document.getElementById("extracted");
+ var p = document.getElementById("path");
p.textContent = "The CSV file may be found at the following path: " + data["CSVPATH"];
</script>
</body>
4 chrome/content/papermachines/processors/templates/geoparser_flightpaths.html
@@ -13,15 +13,13 @@
<body>
<div id="body">
<div id="main">
+ <div id="togLegend" class="btn">Toggle Legend</div>
<div id="searches"><label for="searchTime" value="Time:"><input type="range" id="searchTime"/></label><span id="timeDisplay"></div>
<h1>Flight Paths: COLLECTION_NAME</h1>
<div id="heatmapArea">
</div>
- <div id="configArea">
- <div id="togLegend" class="btn">Toggle Legend</div>
- </div>
</div>
</div>
45 chrome/content/papermachines/processors/templates/geoparser_heatmap.html
@@ -4,6 +4,7 @@
<title>Heatmap: COLLECTION_NAME</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link type="text/css" rel="stylesheet" href="support/heatmap.css"/>
+ <script type="text/javascript" src="DATA_PATH"></script>
<script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=false"></script>
</head>
@@ -11,7 +12,6 @@
<div id="main">
<h1>Heatmap: COLLECTION_NAME</h1>
<div id="heatmapArea">
-
</div>
<div id="configArea">
@@ -21,48 +21,5 @@
</div>
<script type="text/javascript" src="support/heatmap.js"></script>
<script type="text/javascript" src="support/heatmap-gmaps.js"></script>
-<script type="text/javascript">
-
-var map;
-var heatmap;
-
-window.onload = function(){
-
- var myLatlng = new google.maps.LatLng(-15.6778, -47.4384);
- var myOptions = {
- zoom: 2,
- minZoom: 2,
- center: myLatlng,
- mapTypeId: google.maps.MapTypeId.ROADMAP,
- disableDefaultUI: false,
- scrollwheel: true,
- draggable: true,
- navigationControl: true,
- mapTypeControl: false,
- scaleControl: true,
- disableDoubleClickZoom: false
- };
- map = new google.maps.Map(document.getElementById("heatmapArea"), myOptions);
-
- heatmap = new HeatmapOverlay(map, {"radius":15, "visible":true, "opacity":60, "legend": {
- "title": "Mentions in Corpus",
- "position": "br",
- "offset": 30
- }});
-
- document.getElementById("togLegend").onclick = function(){
- var legend = heatmap.heatmap.get("legend").get("element");
- legend.hidden = !legend.hidden;
- };
-
- var myData = INTENSITY;
-
- // this is important, because if you set the data set too early, the latlng/pixel projection doesn't work
- google.maps.event.addListenerOnce(map, "idle", function(){
- heatmap.setDataSet(myData);
- });
-};
-
-</script>
</body>
</html>
2  install.rdf
@@ -5,7 +5,7 @@
<Description about="urn:mozilla:install-manifest">
<em:id>papermachines@chrisjr.org</em:id>
<em:name>Paper Machines</em:name>
- <em:version>0.4.0pre2</em:version>
+ <em:version>0.4.0</em:version>
<em:description>A Zotero extension for analysis and visualization in the digital humanities.</em:description>
<em:creator>Chris Johnson-Roberson</em:creator>
<em:homepageURL>http://www.papermachines.org/</em:homepageURL>