Upgraded to Python 3
jwngr committed Jun 14, 2024
1 parent a28ad00 commit 6c874a8
Showing 16 changed files with 12,859 additions and 7,439 deletions.
89 changes: 51 additions & 38 deletions .github/CONTRIBUTING.md
@@ -2,20 +2,20 @@

Thank you for contributing to Six Degrees of Wikipedia!

## Local Setup
## Local setup

There are three main pieces you'll need to set up to run locally:

1. Mock SQLite database of Wikipedia links.
2. Backend Python Flask web server.
3. [Create React App](https://github.com/facebook/create-react-app)-based frontend website.
1. Mock SQLite database of Wikipedia links
2. Backend Python Flask web server
3. [Create React App](https://github.com/facebook/create-react-app)-based frontend website

There is some one-time setup you'll need to perform initially, as well as some recurring setup
every time you want to run the service.

Note: The following instructions have only been tested on macOS.

### Initial Setup
### Initial setup

The first step is to clone the repo and move into the created directory:

@@ -24,40 +24,53 @@ $ git clone git@github.com:jwngr/sdow.git
$ cd sdow/
```

Several global dependencies are required to run the service. Since installation instructions vary
and are decently documented for each project, please refer to the links below on how to install them.
Several dependencies are required to run the service:
1. [`sqlite3`](https://docs.python.org/3/library/sqlite3.html) - Data storage
1. [`nvm`](https://github.com/nvm-sh/nvm) - Manage Node and `npm` versions
1. [`pyenv`](https://github.com/pyenv/pyenv) - Manage Python and `pip` versions
1. [`virtualenv`](https://virtualenv.pypa.io/) - Avoid polluting global environment

1. [Python](https://www.python.org/downloads/) - macOS comes with an older `2.x` version of Python,
but I recommend using [`pyenv`](https://github.com/pyenv/pyenv) to install the latest `2.x`
release.
1. [`pip`](https://pip.pypa.io/en/stable/installing/) - Most recent versions of Python ship with
`pip`
1. [`sqlite3`](https://docs.python.org/3/library/sqlite3.html) - Can be installed via `brew install sqlite3`.
1. [`virtualenv`](https://virtualenv.pypa.io/en/stable/installation/) - Helps avoid polluting your
global environment.
The simplest way to install these is via [Homebrew](https://brew.sh):

Once all the required global dependencies above are installed, run the following commands to get
everything set up:
```bash
## Install SQLite.
$ brew install sqlite

## Install nvm (Node + npm).
$ brew install nvm
$ nvm install node

## Install + configure pyenv (Python + pip).
$ brew install xz
$ brew install pyenv
# Also configure the pyenv path using the instructions in the link above.
$ pyenv install 3

## Install + configure virtualenv.
$ python -m pip install --user virtualenv
# Also configure the virtualenv path using the instructions in the link above.
```
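Since this commit upgrades the project to Python 3, it can be worth sanity-checking that the `pyenv`-installed interpreter is the one that is active before continuing (a minimal sketch, not part of the project's scripts):

```python
import sys

# The project now targets Python 3; fail fast if an old interpreter is active.
assert sys.version_info.major == 3, f"Expected Python 3, got {sys.version}"
print(f"Using Python {sys.version_info.major}.{sys.version_info.minor}")
```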

Once the required global dependencies are installed, install the project dependencies and generate
a mock local database:

```bash
# Run from root of repo.
$ virtualenv env
$ source env/bin/activate
$ pip install -r requirements.txt
$ python scripts/create_mock_databases.py
$ cp sdow.sqlite sdow/
$ cd website/
$ npm install
$ cd ..
```

### Recurring Setup
### Recurring setup

Every time you want to run the service, you need to source your environment, start the backend Flask
app, and start the frontend website. Run the backend and frontend apps in separate tabs.

To run the backend, open a new tab and run the following commands from the repo root:

```bash
# Run from root of repo.
$ source env/bin/activate
$ cd sdow/
$ export FLASK_APP=server.py FLASK_DEBUG=1
@@ -71,24 +84,24 @@ $ cd website/
$ npm start
```

The service should be running at http://localhost:3000.
The service can be found at http://localhost:3000.

## Repo Organization
## Repo organization

Here are some highlights of the directory structure and notable source files:

- `.github/` - Contribution instructions as well as issue and pull request templates.
- `config/` - Configuration files for services like NGINX, Gunicorn, and Supervisord.
- `docs/` - Documentation.
- `.github/` - Contribution instructions as well as issue and pull request templates
- `config/` - Configuration files for services like NGINX, Gunicorn, and Supervisord
- `docs/` - Documentation
- `scripts/` - Scripts to do things like create a new version of the SDOW database, create a mock
- `sdow/` - The Python Flask web server.
- `server.py` - Main entry point which initializes the Flask web server.
- `database.py` - Defines a `Database` class which simplifies querying the SDOW SQLite database.
- `breadth_first_search.py` - The main search algorithm which finds the shortest path between pages.
- `helpers.py` - Miscellaneous helper functions and classes.
- `sketch/` - Sketch logo files.
- `sql/` - SQLite table schemas.
- `website/` - The frontend website, based on [Create React App](https://github.com/facebook/create-react-app).
- `.pylintrc` - Default configuration for `pylint`.
- `requirements.txt` - Requirements specification for installing project dependencies via `pip`.
- `setup.cfg` - Python PEP 8 autoformatting rules.
- `sdow/` - The Python Flask web server
- `server.py` - Main entry point which initializes the Flask web server
- `database.py` - Defines a `Database` class which simplifies querying the SDOW SQLite database
- `breadth_first_search.py` - The main search algorithm which finds the shortest path between pages
- `helpers.py` - Miscellaneous helper functions and classes
- `sketch/` - Sketch logo files
- `sql/` - SQLite table schemas
- `website/` - The frontend website, based on [Create React App](https://github.com/facebook/create-react-app)
- `.pylintrc` - Default configuration for `pylint`
- `requirements.txt` - Requirements specification for installing project dependencies via `pip`
- `setup.cfg` - Python PEP 8 autoformatting rules
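The shortest-path search lives in `breadth_first_search.py`, which this diff does not show. As a rough illustration of the technique its name describes (a sketch over a toy link graph, not the project's actual implementation), a minimal breadth-first search might look like:

```python
from collections import deque

def shortest_path(links, source, target):
    """Return one shortest path from source to target, or None if unreachable.

    `links` maps a page ID to the list of page IDs it links to.
    """
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for neighbor in links.get(page, ()):
            if neighbor not in parents:
                parents[neighbor] = page
                if neighbor == target:
                    # Walk the parent pointers back to the source.
                    path = [neighbor]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None

links = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(shortest_path(links, 1, 5))  # → [1, 2, 4, 5]
```

The real implementation searches from both endpoints of the query at once, but the parent-pointer bookkeeping shown here is the core of any BFS-based path reconstruction.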
14 changes: 7 additions & 7 deletions docs/data-source.md
@@ -1,6 +1,6 @@
# Data Source | Six Degrees of Wikipedia

## Table of Contents
## Table of contents

- [Data Source](#data-source)
- [Get the Data Yourself](#get-the-data-yourself)
@@ -9,7 +9,7 @@
- [Historical Search Results](#historical-search-results)
- [Database Creation Process](#database-creation-process)

## Data Source
## Data source

Data for this project comes from Wikimedia, which creates [gzipped SQL dumps of the English language
Wikipedia database](https://dumps.wikimedia.your.org/enwiki) twice monthly. The Six Degrees of
@@ -29,7 +29,7 @@ For performance reasons, files are downloaded from the
Six Degrees of Wikipedia only deals with actual Wikipedia pages, which in Wikipedia parlance means
pages which belong to [namespace](https://en.wikipedia.org/wiki/Wikipedia:Namespace) `0`.

## Get the Data Yourself
## Get the data yourself

Compressed versions of the Six Degrees of Wikipedia SQLite database (`sdow.sqlite.gz`) are available
for download from ["requester pays"](https://cloud.google.com/storage/docs/requester-pays) Google
@@ -95,7 +95,7 @@ $ pigz -d sdow.sqlite.gz
- `gs://sdow-prod/dumps/20231220/sdow.sqlite.gz` (4.3 GB)
</details>

## Database Schema
## Database schema

The Six Degrees of Wikipedia database is a single SQLite file containing the following three tables:

@@ -114,7 +114,7 @@ The Six Degrees of Wikipedia database is a single SQLite file containing the fol
1. `source_id` - The page ID of the source page, the page that redirects to another page.
2. `target_id` - The page ID of the target page, to which the redirect page redirects.
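The two redirect columns above can be exercised with a small in-memory sketch (the `CREATE TABLE` statement here is hypothetical and only mirrors the two columns described, not the project's actual schema in `sql/`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mirror of the redirects table described above.
conn.execute("CREATE TABLE redirects (source_id INTEGER PRIMARY KEY, target_id INTEGER)")
conn.execute("INSERT INTO redirects VALUES (?, ?)", (123, 456))

# Resolve a redirect: follow source_id to target_id.
row = conn.execute("SELECT target_id FROM redirects WHERE source_id = ?", (123,)).fetchone()
print(row[0])  # → 456
```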

## Historical Search Results
## Historical search results

Historical search results are stored in a separate SQLite database (`searches.sqlite`) which
contains a single `searches` table with the following schema:
@@ -134,7 +134,7 @@ as well as to make it easy to update the `sdow.sqlite` database to a more recent
Historical search results are not available for public download, but they are not required to run
this project yourself.

## Database Creation Script
## Database creation script

A new build of the Six Degrees of Wikipedia database is created using the [database creation shell
script](../scripts/buildDatabase.sh):
@@ -150,7 +150,7 @@ by passing the date of the dump in the format `YYYYMMDD` as a command line argum
$ ./buildDatabase.sh <YYYYMMDD>
```

## Database Creation Process
## Database creation process

Generating the Six Degrees of Wikipedia database from a dump of Wikipedia takes approximately two
hours given the following instructions:
6 changes: 3 additions & 3 deletions docs/miscellaneous.md
@@ -1,11 +1,11 @@
# Miscellaneous | Six Degrees of Wikipedia

## Table of Contents
## Table of contents

* [Noteworthy Searches](#noteworthy-searches)
* [Edge Case Page Titles](#edge-case-page-titles)

## Noteworthy Searches
## Noteworthy searches

The following is a list of noteworthy searches:

@@ -19,7 +19,7 @@ The following is a list of noteworthy searches:
| [Lion Express → Phinney](https://www.sixdegreesofwikipedia.com/?source=Lion%20Express&target=Phinney) | 9 degrees of separation! |
| [2016 French Open → Brachmia melicephala](https://www.sixdegreesofwikipedia.com/?source=2016%20French%20Open&target=Brachmia%20melicephala) | Sparse graph of 6 degrees |

## Edge Case Page Titles
## Edge case page titles

The following is a collection of edge-case page titles, mainly used to ensure the project works given a
wide variety of inputs:
12 changes: 6 additions & 6 deletions docs/web-server-setup.md
@@ -1,13 +1,13 @@
# Web Server Setup | Six Degrees of Wikipedia
# Web server setup | Six Degrees of Wikipedia

## Table of Contents
## Table of contents

- [Initial Setup](#initial-setup)
- [Recurring Setup](#recurring-setup)
- [Updating Data Source](#updating-data-source)
- [Updating Server Code](#updating-server-code)

## Initial Setup
## Initial setup

1. Create a new [Google Compute Engine instance](https://console.cloud.google.com/compute/instances?project=sdow-prod)
from the `sdow-web-server` instance template, which is configured with the following specs:
@@ -208,7 +208,7 @@
$ sudo service stackdriver-agent start
```

## Recurring Setup
## Recurring setup

1. Activate the `virtualenv` environment:

@@ -243,7 +243,7 @@
`gunicorn` is written to `/tmp/gunicorn-stdout---supervisor-<HASH>.log`. Logs are also written to
Stackdriver Logging.

## Updating Data Source
## Updating data source

To update the web server to a more recent `sdow.sqlite` file with minimal downtime, run the
following commands after SSHing into the web server:
@@ -258,7 +258,7 @@ $ cd config/
$ supervisorctl restart gunicorn
```

## Updating Server Code
## Updating server code

To update the Python server code which powers the SDOW backend, run the following commands after
SSHing into the web server:
18 changes: 9 additions & 9 deletions requirements.txt
@@ -1,10 +1,10 @@
flask == 2.3.2
flask-compress == 1.4.0
flask-cors == 3.0.9
litecli == 1.2.0
google-cloud-logging == 1.14.0
flask == 3.0.3
flask-compress == 1.15.0
flask-cors == 4.0.1
litecli == 1.11.0
google-cloud-logging == 3.10.0
google-compute-engine == 2.8.13
gunicorn == 19.9.0
protobuf == 3.18.3
requests == 2.31.0
supervisor == 4.1.0
gunicorn == 22.0.0
protobuf == 4.25.3
requests == 2.32.3
supervisor == 4.2.5
2 changes: 0 additions & 2 deletions scripts/combine_grouped_links_files.py
@@ -4,8 +4,6 @@
Output is written to stdout.
"""

from __future__ import print_function

import io
import sys
import gzip
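The `from __future__ import print_function` lines removed across these scripts were only needed on Python 2, where `print` was a statement; on Python 3, `print` is an ordinary built-in function, so the import is dead weight. A small sketch of the function-style usage these scripts rely on (the `format_row` helper is hypothetical, not from the repo):

```python
def format_row(page_id, page_title):
    # Tab-separated, matching the stdout format these scripts emit.
    return f"{page_id}\t{page_title}"

# print() is a plain function on Python 3: keyword arguments like
# sep, end, and file work without any __future__ import.
print(format_row("123", "Some_Title"))
print("a", "b", sep="\t")
```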
2 changes: 0 additions & 2 deletions scripts/create_mock_databases.py
@@ -1,5 +1,3 @@
from __future__ import print_function

import os
import sqlite3
import subprocess
2 changes: 0 additions & 2 deletions scripts/generate_updated_wikipedia_facts.py
@@ -5,8 +5,6 @@
Generates an updated Wikipedia facts JSON file.
"""

from __future__ import print_function

import os
import json
import sqlite3
2 changes: 0 additions & 2 deletions scripts/lookup_wikipedia_page_info.py
@@ -5,8 +5,6 @@
Looks up Wikipedia page information via the official Wikipedia API given a list of page IDs.
"""

from __future__ import print_function

import requests

WIKIPEDIA_API_URL = 'https://en.wikipedia.org/w/api.php'
3 changes: 0 additions & 3 deletions scripts/prune_pages_file.py
@@ -5,12 +5,9 @@
Output is written to stdout.
"""

from __future__ import print_function

import io
import sys
import gzip
from sets import Set

# Validate input arguments.
if len(sys.argv) < 3:
5 changes: 1 addition & 4 deletions scripts/replace_titles_and_redirects_in_links_file.py
@@ -5,12 +5,9 @@
Output is written to stdout.
"""

from __future__ import print_function

import io
import sys
import gzip
from sets import Set

# Validate inputs
if len(sys.argv) < 4:
@@ -35,7 +32,7 @@
sys.exit()

# Create a set of all page IDs and a dictionary of page titles to their corresponding IDs.
ALL_PAGE_IDS = Set()
ALL_PAGE_IDS = set()
PAGE_TITLES_TO_IDS = {}
for line in io.BufferedReader(gzip.open(PAGES_FILE, 'r')):
[page_id, page_title, _] = line.rstrip('\n').split('\t')
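The `from sets import Set` removals reflect another Python 2-ism: the `sets` module was deprecated in Python 2.6 and removed in Python 3, where the built-in `set` type replaces it. A minimal sketch of the replacement pattern applied above:

```python
# Python 2 (the sets module was removed in Python 3):
#   from sets import Set
#   ALL_PAGE_IDS = Set()
# Python 3: the built-in set type needs no import.
ALL_PAGE_IDS = set()
ALL_PAGE_IDS.add("123")

print("123" in ALL_PAGE_IDS)  # → True
```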
5 changes: 1 addition & 4 deletions scripts/replace_titles_in_redirects_file.py
@@ -4,12 +4,9 @@
Output is written to stdout.
"""

from __future__ import print_function

import io
import sys
import gzip
from sets import Set

# Validate input arguments.
if len(sys.argv) < 3:
@@ -29,7 +26,7 @@
sys.exit()

# Create a set of all page IDs and a dictionary of page titles to their corresponding IDs.
ALL_PAGE_IDS = Set()
ALL_PAGE_IDS = set()
PAGE_TITLES_TO_IDS = {}
for line in io.BufferedReader(gzip.open(PAGES_FILE, 'r')):
[page_id, page_title, _] = line.rstrip('\n').split('\t')
4 changes: 2 additions & 2 deletions sdow/database.py
@@ -4,8 +4,8 @@

import os.path
import sqlite3
import helpers as helpers
from breadth_first_search import breadth_first_search
import sdow.helpers as helpers
from sdow.breadth_first_search import breadth_first_search


class Database(object):
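The import change in `sdow/database.py` is needed because Python 3 removed implicit relative imports: inside a package, `import helpers` no longer finds a sibling `helpers.py`; the module must be addressed by its package-qualified name, as in `import sdow.helpers`. A self-contained sketch of the same rule using a throwaway package (the `demo_pkg` name and `fetch_page_title` helper are hypothetical):

```python
import os
import sys
import tempfile

# Build a throwaway package with a sibling helpers module.
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "helpers.py"), "w") as f:
    f.write("def fetch_page_title():\n    return 'Six_Degrees_of_Wikipedia'\n")

sys.path.insert(0, pkg_root)

# Python 3 style: package-qualified import, as in `import sdow.helpers`.
import demo_pkg.helpers

print(demo_pkg.helpers.fetch_page_title())  # → Six_Degrees_of_Wikipedia
```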
