datajournalism-resources

A compilation of links to datajournalism & OSINT tools, guides and resources I find useful to keep at hand. PRs welcome!

Legend:

  • 🌐 = online tool/service/resource
  • 💻 = software
  • 📖 = guide/tutorial
  • 📝 = list of tools/resources
  • 🐍 = Python module
  • 💲 = paid or paid-only tool/service

Contents

APIs

  • Public APIs 📝 - A categorized list of APIs.
  • Postman 💻 - API development environment offering useful tools for crafting and debugging API requests.
  • ProgrammableWeb 📝 - A large directory of web APIs.

Archival

Breached Data

  • Have I Been Pwned? 🌐 - Check whether an email address appears in a data breach and set up alerts.
  • GhostProject 🌐 - Check whether an email address appears in a breach. The free version shows the first 3 characters of the password.
  • Dehashed 🌐💲 - Find cleartext & hashed passwords from data breaches (paid, $4/week, $11/mo).
  • h8mail 💻 - Find passwords across several breach and reconnaissance services. Can also search the BreachedCompilation torrent.
  • Breach Data Search Engines Comparison 📝 (IntelTechniques)
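Have I Been Pwned also runs a Pwned Passwords range API that checks a password without revealing it (k-anonymity): only the first five hex characters of the password's SHA-1 hash are sent to `https://api.pwnedpasswords.com/range/<prefix>`, and the returned `SUFFIX:COUNT` lines are matched locally. The sketch below covers the hashing and the matching and leaves the HTTP request out; the function names are hypothetical, so check the official API documentation before relying on it.

```python
from __future__ import annotations

import hashlib


def pwned_range_query(password: str) -> tuple[str, str]:
    """Split a password's SHA-1 (uppercase hex) into the 5-char prefix sent
    to the range endpoint and the 35-char suffix that stays on your machine.
    Hypothetical helper, not part of any official client."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, body: str) -> int:
    """Parse a range response (one 'SUFFIX:COUNT' per line) and return the
    breach count for our suffix, or 0 if it does not appear."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = pwned_range_query("password")
    # Only this URL (containing the prefix) would be requested over the network.
    print(f"https://api.pwnedpasswords.com/range/{prefix}")
```

The full hash never leaves the machine: the service only learns a 5-character prefix shared by many unrelated hashes.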

Companies

  • OCCRP Data 🌐 - Fantastic search tool & resources made available by OCCRP. Public records, leaks, scraped business registers, and more.
  • OpenCorporates 🌐 - A very comprehensive companies database. Has an API.
  • ICIJ's Offshore Leaks Database 🌐 - Data on offshore companies, foundations and trusts from the Panama Papers, the Offshore Leaks, the Bahamas Leaks and the Paradise Papers investigations.
  • List of company registers 📝 (Wikipedia) - A list of company registers, by country.
  • CompaniesHouse Short Guide 📖 (Bellingcat) - A guide to Companies House, the UK's online company registry.

Data Analysis & Manipulation

  • OpenRefine 💻 - Clean & transform messy data.
  • csvkit 💻 - A suite of command-line tools for converting to and working with CSV files.
  • pandas 🐍 - Powerful Python data analysis library. Best used in a Jupyter notebook.
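The kind of filter-and-total job that csvkit or pandas handle can be sketched with Python's standard library alone. The example below is a rough illustration, not csvkit's or pandas' implementation: `total_by` and the column names are hypothetical, and it assumes amounts may contain thousands separators or be blank, as messy public data often does.

```python
import csv
import io
from collections import defaultdict


def total_by(rows, key_col, value_col, filter_col=None, filter_value=None):
    """Sum value_col per key_col, optionally keeping only rows where
    filter_col equals filter_value exactly. Hypothetical helper."""
    totals = defaultdict(float)
    for row in rows:
        if filter_col is not None and row[filter_col] != filter_value:
            continue
        raw = row[value_col].strip().replace(",", "")  # "1,200" -> "1200"
        if not raw:
            continue  # skip blank cells instead of crashing on them
        totals[row[key_col].strip()] += float(raw)
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    data = io.StringIO(
        "region,year,amount\n"
        'North,2019,"1,200"\n'
        "South,2019,300\n"
        "North,2020,450\n"
        "South,2020,\n"
    )
    print(total_by(csv.DictReader(data), "region", "amount"))
```

Once the data is clean, the pandas equivalent is a one-liner such as `df.groupby("region")["amount"].sum()`.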

Email

Lists of tools & resources

Location, Maps, Satellite Imagery

Mapping services & software

Tools & techniques

User generated content

  • Social media (see category)
  • Tourism & review websites: Foursquare, TripAdvisor, Yelp... 🌐
  • Wikimapia 🌐 - User-generated locations & descriptions. Has an API. Also lets you switch between Google, Bing and OSM imagery.
  • OpenStreetMap 🌐 - User-generated locations & maps. Use taginfo and/or overpass-turbo.eu to search for locations by key/value tags (see the OSM Wiki).
  • Vkontakte 🌐 - Use near:<coordinates> in a search.
  • EchoSec 🌐💲 - Search and analyze social media data based on location. ($499/mo)
  • GeoCreepy 💻 - Geolocation information gathering through social networking platforms (discontinued).
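overpass-turbo.eu is a web front end to the Overpass API, which searches OpenStreetMap data by key/value tags using the Overpass QL language. A minimal sketch of building such a query for a bounding box follows; the helper names are hypothetical, and the public endpoint URL is an assumption to check against the OSM Wiki, since instances and usage policies change.

```python
from urllib.parse import urlencode

# Public Overpass API instance (assumption: verify current endpoints on the OSM Wiki).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def overpass_query(key, value, south, west, north, east, timeout=25):
    """Overpass QL for nodes, ways and relations tagged key=value in a bbox.

    Overpass expects bounding boxes in (south, west, north, east) order.
    """
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'nwr["{key}"="{value}"]({south},{west},{north},{east});\n'
        # "out center;" also returns a centre point for ways and relations.
        "out center;"
    )


def overpass_url(query):
    """GET URL that passes the query in the `data` parameter."""
    return f"{OVERPASS_URL}?{urlencode({'data': query})}"


if __name__ == "__main__":
    print(overpass_query("amenity", "cafe", 48.85, 2.34, 48.86, 2.36))
```

The printed query can be pasted directly into overpass-turbo.eu to see the results on a map before automating anything.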

Interpretation

Multi-purpose tools

  • Belati 💻 - Command-line OSINT tool with whois, subdomain enumeration, mail harvesting, and more.
  • Maltego CE 💻 - Interactive data mining & mapping tool.
  • DataSploit 💻 - A collection of Python scripts which automate open source intelligence searches about domain names, email addresses, IP addresses and usernames.
  • Buscador 💻 - A very handy VM with plenty of pre-installed & pre-configured OSINT tools.
  • Spiderfoot 💻 - Open source intelligence automation tool. Gathers intelligence about a given target, which may be an IP address, domain name, hostname, network subnet, ASN, e-mail address or person's name.
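Tools like Spiderfoot accept several kinds of target, so a similar script first has to work out which kind it was given. The sketch below is a hypothetical illustration of that step, not Spiderfoot's code: `classify_target` and its labels are invented for this example, the regular expressions are simplified, and telling a registrable domain from a hostname is left out because it needs the Public Suffix List.

```python
import ipaddress
import re

ASN_RE = re.compile(r"^AS\d+$", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
HOST_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)
NAME_RE = re.compile(r"^[^\W\d_]+(?: [^\W\d_]+)+$")  # two or more letter-only words


def classify_target(target):
    """Return a rough target type label (hypothetical categories)."""
    t = target.strip()
    try:
        ipaddress.ip_address(t)
        return "ip_address"
    except ValueError:
        pass
    if "/" in t:
        try:
            ipaddress.ip_network(t, strict=False)
            return "subnet"
        except ValueError:
            pass
    if ASN_RE.match(t):
        return "asn"
    if EMAIL_RE.match(t):
        return "email"
    if HOST_RE.match(t):
        return "domain_or_hostname"
    if NAME_RE.match(t):
        return "person_name"
    return "unknown"


if __name__ == "__main__":
    for t in ["8.8.8.8", "192.168.0.0/24", "AS15169", "jane@example.com", "example.com", "Jane Doe"]:
        print(t, "->", classify_target(t))
```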

Phone numbers

  • NumberWay 🌐 - International directory of white pages and yellow pages phone books.
  • PhoneInfoga 💻 - Information gathering & OSINT reconnaissance tool for phone numbers.
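Phone-number lookups are easier to run and compare when numbers are written in international E.164 form (`+`, country code, subscriber number, at most 15 digits). A rough, hypothetical normaliser is sketched below. It assumes a leading `00` is an international dialling prefix and that a single leading `0` is a trunk prefix to drop when a default country code is supplied; that holds for e.g. France or the UK but not everywhere (Italian numbers keep their leading 0). The `phonenumbers` library handles these rules properly.

```python
import re


def to_e164(raw, default_country_code=None):
    """Normalise a phone number to E.164 (hypothetical helper, simplified rules)."""
    # Remove common separators: spaces, brackets, dots, dashes, slashes.
    cleaned = re.sub(r"[\s().\-/]", "", raw)
    if cleaned.startswith("+"):
        digits = cleaned[1:]
    elif cleaned.startswith("00"):
        digits = cleaned[2:]  # assumption: "00" is the international prefix
    elif default_country_code:
        # Assumption: a single leading "0" is a national trunk prefix.
        national = cleaned[1:] if cleaned.startswith("0") else cleaned
        digits = default_country_code + national
    else:
        raise ValueError(f"no country code for {raw!r}")
    if not re.fullmatch(r"[0-9]{1,15}", digits):
        raise ValueError(f"not a valid E.164 number: {raw!r}")
    return "+" + digits


if __name__ == "__main__":
    print(to_e164("06 12 34 56 78", default_country_code="33"))
```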

Pictures, Photos, Videos

Search

Metadata

Reverse search

Verification & Analysis

Military/Weapons

Social Media

Facebook

Instagram

  • InstaLooter 💻 - Download all pictures & videos from an Instagram profile. No API key needed.

Linkedin

  • raven 💻 - Linkedin information gathering tool. Extracts employee data for a given company.

Reddit

Twitter

  • Tweetdeck 🌐
  • Tweetdeck Location Search Tutorial 📖
  • Tweets Analyzer 💻 - Twitter profile analyzer: tweet activity, locations, most used hashtags, etc. Can save tweets to JSON. Requires a Twitter API key.
  • TWINT (Twitter Intelligence Tool) 💻 - Advanced Twitter scraping tool, no API key needed. Can export to text, CSV, JSON, SQLite, Elasticsearch. Can detect emails, phone numbers, profiles.
  • twarc 💻🐍 - A command line tool and Python library for archiving Twitter in JSON format.
  • tinfoleak 💻 - Very complete open-source tool for Twitter intelligence analysis. Needs API credentials.
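twarc archives tweets as line-delimited JSON (one tweet object per line), which makes simple analyses like Tweets Analyzer's "most used hashtags" easy to reproduce. The sketch below assumes the classic v1.1 tweet layout, where hashtags live under `entities.hashtags[].text`; other exports (v2 API, TWINT's CSV) use different fields, and `top_hashtags` is a hypothetical helper.

```python
import json
from collections import Counter


def top_hashtags(lines, n=10):
    """Count hashtags (case-insensitively) in line-delimited tweet JSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the archive
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    import sys

    # Usage: python top_hashtags.py tweets.jsonl
    with open(sys.argv[1], encoding="utf-8") as fh:
        for tag, count in top_hashtags(fh):
            print(f"#{tag}\t{count}")
```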

Text & Documents

Indexing & searching

  • Aleph 💻 - A toolkit for data search, management and analysis in investigative reporting.
  • Blacklight 💻 - Open-source discovery platform providing a user interface on top of Solr.
  • ICIJ Extract 💻 - A command line tool for parallelized, distributed content-extraction.
  • searchbox 💻 - A simple out-of-the-box web interface to search through thousands of unstructured documents using Solr.
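Search engines such as Solr, which several of these tools build on, answer keyword queries with an inverted index: a map from each term to the documents containing it. The toy sketch below shows only that core idea, with hypothetical function names; real engines add tokenisation rules, stemming, relevance ranking and much more.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    docs = {"a.txt": "Offshore company in Panama", "c.txt": "Panama canal"}
    print(sorted(search(build_index(docs), "panama")))
```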

OCR

  • NewOCR.com 🌐 - Recognizes several languages, can resize images, shortcuts to Google & Bing Translate.
  • Tesseract 💻 - Open-source OCR engine.
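Tesseract is usually run from the command line as `tesseract <image> <outputbase> -l <lang>`; passing `stdout` as the output base prints the recognised text instead of writing a `.txt` file. A minimal Python wrapper, assuming the `tesseract` binary and the requested language data are installed, could look like this (the helper names are hypothetical; `pytesseract` is an existing alternative).

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the CLI call; several languages can be combined, e.g. "eng+fra"."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    print(ocr_image(sys.argv[1], lang="eng"))
```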

Natural Language Processing

  • topia 🐍 - Python module to determine important terms within a given piece of content.
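For a quick first look at a document, a much cruder frequency baseline than topia's extractor can be written with the standard library: drop common stopwords and short words, then count what remains. This is a hypothetical sketch, not topia's algorithm, and the tiny stopword list is an assumption to replace with a proper one for real work.

```python
import re
from collections import Counter

# Small illustrative English stopword list (assumption: extend for real use).
STOPWORDS = frozenset(
    "a an and are as at be by for from has in is it of on or "
    "that the this to was were with".split()
)


def key_terms(text, n=5, min_length=3):
    """Return the n most frequent non-stopword terms as (term, count) pairs."""
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    counts = Counter(
        w for w in words if w not in STOPWORDS and len(w) >= min_length
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print(key_terms("The company registered the company in Panama."))
```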

PDF

Visualization

Maps

Mindmaps & Network graphs

Timelines

  • TimelineJS 💻
  • Tik Tok 💻 - JavaScript tool to easily create simple, mobile-friendly, vertical timelines. Open-source.
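Besides a Google Sheet, TimelineJS can load a timeline from a JSON file, which is handy when the events come out of a script or spreadsheet export. The sketch below assumes the TimelineJS3 JSON layout (a `title` slide plus an `events` list with `start_date` and `text.headline`/`text.text`); verify the field names against the TimelineJS documentation. `to_timelinejs` is hypothetical and only handles ISO-style `YYYY`, `YYYY-MM` or `YYYY-MM-DD` dates.

```python
import json


def to_timelinejs(title, rows):
    """Convert rows like {"date": "YYYY[-MM[-DD]]", "headline": ..., "text": ...}
    into a TimelineJS-style JSON string (assumed TimelineJS3 layout)."""
    events = []
    for row in rows:
        parts = row["date"].split("-")
        # Keep only the date parts provided and drop leading zeros ("04" -> "4").
        start_date = {
            name: str(int(part))
            for name, part in zip(("year", "month", "day"), parts)
        }
        events.append({
            "start_date": start_date,
            "text": {"headline": row["headline"], "text": row.get("text", "")},
        })
    timeline = {"title": {"text": {"headline": title}}, "events": events}
    return json.dumps(timeline, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_timelinejs("Example timeline", [{"date": "2017", "headline": "Event A"}]))
```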

Weather

Websites

Searches, info, related entities

Scraping

Dark Web & Onion services

Whistleblowing software

Misc

  • Shodan 🌐 - Internet of Things search engine.
  • grayhatwarfare 🌐 - Search the contents of open Amazon S3 buckets.
  • awesome-selfhosted 📝 - A list of Free Software network services and web applications which can be hosted locally.
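An open (publicly listable) S3 bucket answers a plain request to `https://<bucket>.s3.amazonaws.com/` with an XML `ListBucketResult` whose `Contents` elements hold each object's `Key` and `Size`. The sketch below parses such a listing with the standard library; `list_keys` is hypothetical, fetching the XML is left out, and long listings are paginated (`IsTruncated`), which this example does not follow.

```python
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listings.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_keys(listing_xml):
    """Return (key, size_in_bytes) pairs from a ListBucketResult document."""
    root = ET.fromstring(listing_xml)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", default="0", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as fh:
        for key, size in list_keys(fh.read()):
            print(f"{size:>12}  {key}")
```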

License

This list is licensed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
