- What is this?
- Get the data
- Run the project
- What to expect
- Postgres DB analysis
Armslist-analysis was made to clean and summarize data from Armslist.com, a site used as a marketplace for buying and selling guns. It can be used with the data scraped by NPR or in conjunction with the Armslist scraper.
The following things are assumed to be true in this documentation.
- You are running OSX.
- You are using Python 2.7. (Probably the version that came with OSX.)
- You have virtualenv and virtualenvwrapper installed and working.
- You have postgres installed and running.
For more details on the technology stack used with the app-template, see our development environment blog post.
This code should work fine in most recent versions of Linux, but package installation and system dependencies may vary.
Clone the project:
```
git clone git@github.com:nprapps/armslist-analysis.git
cd armslist-analysis
```
The data was scraped from the Armslist.com website in a separate repo; the dataset's filename includes the date when the scraper was run.

Place the dataset into the `data` folder.
Create a virtual environment and install the requirements:
```
mkvirtualenv armslist-analysis
pip install -r requirements.txt
```
Run the script that cleans and geocodes the data.
Note: The current dataset supplied contains about 80,000 records, so cleaning and geocoding can take some time. Patience is a virtue...or so they say.
The geocoding service is sometimes inaccessible, so we always cache and persist the geocoded locations to avoid repeating requests.
Because some cities on the original website were not actually cities but more like regions, we manually updated some geolocations, such as West PA, Pennsylvania (15-20 were updated by hand).
Note: For the final map we did some hand cleaning of place names to make them more consistent.
The script will build an in-memory geocode cache to minimize hits to the actual Nominatim geocoding service API.
Running the script will create two CSV files:

- `data/listings-clean-nominatim.csv` is the bulk of the data with geolocation included. Each row represents a listing and its associated details.
- `data/geocoded-cache-nominatim.csv` is the geocode cache persisted to disk for future runs of the script.
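The caching behavior described above can be sketched roughly like this; the function names are illustrative, not the repo's actual API:

```python
import csv
import os

def load_cache(path):
    """Read previously geocoded places from disk so past lookups are never repeated."""
    cache = {}
    if os.path.exists(path):
        with open(path) as f:
            for place, lat, lng in csv.reader(f):
                cache[place] = (float(lat), float(lng))
    return cache

def save_cache(path, cache):
    """Persist the cache so the next run of the script starts warm."""
    with open(path, "w") as f:
        writer = csv.writer(f)
        for place, (lat, lng) in sorted(cache.items()):
            writer.writerow([place, lat, lng])

def geocode_with_cache(place, cache, geocode_fn):
    """Call the geocoding service (e.g. Nominatim) only on a cache miss."""
    if place not in cache:
        cache[place] = geocode_fn(place)
    return cache[place]
```

Every place is looked up at most once per run, and the persisted cache makes later runs skip the service entirely for known places.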
Make sure your Postgres server is running; if you have followed our development environment setup, start it as described there.
We created a script to insert the cleaned data into a Postgres database for further analysis.
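In rough outline, an import like this uses Postgres's `COPY` to bulk-load the cleaned CSV; the table name and columns below are assumptions for illustration, not the repo's actual schema, and `conn` is a psycopg2 connection:

```python
# Sketch of a Postgres bulk load; the schema and names are hypothetical,
# not taken from the repo's actual import script.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS listings (
    listing_id TEXT,
    title TEXT,
    price TEXT,
    city TEXT,
    state TEXT,
    latitude DOUBLE PRECISION,
    longitude DOUBLE PRECISION
)
"""

def load_listings(conn, csv_path):
    """Create the table and bulk-load the cleaned CSV with COPY,
    which is much faster than row-by-row INSERTs."""
    with conn.cursor() as cur:
        cur.execute(CREATE_SQL)
        with open(csv_path) as f:
            cur.copy_expert("COPY listings FROM STDIN WITH CSV HEADER", f)
    conn.commit()
```

With psycopg2 this would be called as `load_listings(psycopg2.connect(dbname="armslist"), "data/listings-clean-nominatim.csv")` (database name assumed).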
After the script has successfully created the database tables, we can run the script that generates the output data used in our articles.
Running this script will create an `output` folder with all the CSVs used in our analysis.
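As an illustration of the kind of summary those CSVs come from, listings can be tallied per column value like this (the `state` column name is an assumption about the cleaned data, not guaranteed to match the repo's):

```python
import csv
from collections import Counter

def count_listings_by(rows, column):
    """Tally listings by one column of the cleaned data (column names assumed)."""
    return Counter(row[column] for row in rows)

def summarize_csv(csv_path, column):
    """Read the cleaned listings CSV and count listings per value of `column`."""
    with open(csv_path) as f:
        return count_listings_by(csv.DictReader(f), column)
```

For example, `summarize_csv("data/listings-clean-nominatim.csv", "state")` would give a per-state breakdown of listings, assuming such a column exists.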