datajournalism-resources

A compilation of links to datajournalism & OSINT tools, guides and resources I find useful to keep at hand. PRs welcome!

by r3mlab | License: CC-BY-NC 4.0

Legend:

  • 🌐 = online tool/service/database
  • 💻 = software
  • 📖 = guide/tutorial
  • 📝 = list of tools/resources
  • 🐍 = Python module
  • 💲 = paid or paid-only tool/service

Contents

APIs

  • Postman 💻 - API development environment offering useful tools for crafting and debugging API requests.
  • ProgrammableWeb 📝 - A good API directory.
  • Public APIs 📝 - A categorized list of APIs.

Archival

Breached Data

  • Breach Data Search Engines Comparison 📝 (IntelTechniques)
  • CardPwn 💻 - Find out if a credit card number appears in a breach.
  • Dehashed 🌐💲 - Find cleartext & hashed passwords from data breaches (paid, $4/week, $11/mo).
  • GhostProject 🌐 - Check if an email appears in a breach. Shows the first 3 characters of the password for free.
  • h8mail 💻 - Find passwords through different breach and reconnaissance services. Can also search the BreachedCompilation torrent.
  • Have I Been Pwned? 🌐 - Check if an email appears in a breach, set up alerts.
  • pwndb.py 💻 - Command-line tool for searching leaked credentials using the Onion service of the same name.
  • WhatBreach 💻 - Search for breached emails and their corresponding database.

Companies

  • CompaniesHouse Short Guide 📖 (Bellingcat) - A guide about the UK online company registry.
  • DocumentCloud Search 🌐 - Search public documents uploaded to DocumentCloud, a publishing platform used by many journalists and media outlets.
  • ICIJ's Offshore Leaks Database 🌐 - Data on offshore companies, foundations and trusts from the Panama Papers, the Offshore Leaks, the Bahamas Leaks and the Paradise Papers investigations.
  • List of company registers 📝 (Wikipedia) - A list of company registers, by country.
  • OCCRP Data 🌐 - Fantastic search tool & resources made available by OCCRP. Public records, leaks, scraped business registers, and more.
  • OCCRP Investigative Dashboard 📝 - Collection of the most useful public data sources for investigative reporting. Many business registries listed.
  • OpenCorporates 🌐 - A very comprehensive companies database. Has an API.
  • Open Ownership Register 🌐 - Explore beneficial ownership data. Aggregates many datasets.

Data Analysis & Manipulation

See also: Visualization

  • csvkit 💻 - A suite of command-line tools for converting to and working with CSV files.
  • OpenRefine 💻 - Clean & transform messy data.
  • pandas 🐍 - Powerful Python data analysis library. Best used in a Jupyter notebook.

Email

See also: Breached Data

  • emailrep.io 🌐 - Public email reputation search & API. Can find social media profiles.
  • Infoga 💻 - Gather email account information (IP, hostname, country, etc.) from different public sources.
  • theHarvester 💻 - Python command-line tool that queries several search engines for email addresses from a particular domain.
  • The most complete guide to finding anyone's email 📖 (Blurbiz)
  • Trumail 🌐 - Free email verification API.

Lists of tools & resources

Location, Maps, Satellite Imagery

Interpretation

Mapping services & software

Tools & techniques

User generated content

See also: Social Networks

Military/Weapons

Multi-purpose tools

  • Buscador 💻 - A very handy VM with plenty of pre-installed & pre-configured OSINT tools.
  • DataSploit 💻 - A collection of Python scripts that automate open source intelligence searches about domain names, email addresses, IP addresses and usernames.
  • IntelligenceX Tools 🌐 - Various search, email and domain tools.
  • Maltego CE 💻 - Interactive data mining & mapping tool.
  • Spiderfoot 💻 - Open source intelligence automation tool. Gathers intelligence about a given target, which may be an IP address, domain name, hostname, network subnet, ASN, e-mail address or person's name.

News

  • AllYouCanRead 📝 - Database of news outlets by country.
  • NewsLookup 🌐 - News search engine with useful filters.
  • NewsNow 🌐 - News search engine with useful filters.
  • NewspaperMap 🌐 - Newspapers world map with feeds and automatic translation.

Phone numbers

Pictures, Photos, Videos

Pictures Metadata

Reverse search

Search

  • How to Conduct Comprehensive Video Collection (Bellingcat) 📖
  • PimEyes 🌐 - Face-recognition matching search engine.
  • SearchFace.ru 🌐 - Face recognition search engine for the Russian VK social network. See this guide from Bellingcat for a tutorial.
  • SocialMapper 🌐 - Social Media Mapping Tool that correlates profiles via facial recognition. Supports LinkedIn, Facebook, Twitter, Instagram, VKontakte, Weibo, Douban.

Verification & Analysis

Social Networks

All/General

  • EagleEye 💻 - Find Instagram, FB and Twitter profiles using image recognition and reverse image search.
  • HashAtIt 🌐 - Hashtag search across Twitter, Instagram, Pinterest, Facebook and YouTube.
  • Sherlock 💻 - Search for a username across 135 social media sites.
  • SocialMapper 🌐 - Social Media Mapping Tool that correlates profiles via facial recognition. Supports LinkedIn, Facebook, Twitter, Instagram, VKontakte, Weibo, Douban.
  • WhatsMyName 💻 - Search for usernames on 180+ web sites.

Discord

  • dis.cool 🌐 - Discord search engine.

Facebook

  • fb-search 🌐 - Simple Graph query crafter. Made after Facebook's sudden closure of Graph Search.
  • FFFF Finds Facebook Friends 💻 - Builds a relationship graph of a target user. Partially reconstructs hidden friend lists. 🔥.

Github

  • gitrob 💻 - Find potentially sensitive files pushed to public repositories on GitHub. Requires a GitHub access token.
  • Zen 💻 - Find emails of GitHub users.

Instagram

  • instaloader 💻 - Download pictures (or videos) along with their captions and other metadata from Instagram.
  • instagram-scraper 💻 - Scrape a user's photos and videos.
  • searchmybio 🌐 - Search Instagram users' biographies.

Linkedin

Reddit

Snapchat

  • Snapdex 🌐 - Searchable database of Snapchat usernames.
  • Snap Map 🌐 - Official Snapchat map.

Telegram

  • Buzz.im 🌐 - Search in open Telegram messages.
  • Lyzem 🌐 - Telegram search engine.
  • Telegago 🌐 - Google Custom Search Engine for Telegram users & content. Can discover private groups.
  • tlgrm.eu 🌐 - Search for Telegram channels.
  • tgstat.ru 🌐 - Telegram analytics & search tool.

Twitter

  • DMI-TCAT 💻 - PHP web interface to retrieve and analyze tweets.
  • SocialBearing 🌐 - Statistics on keywords, hashtags, users.
  • SpoonBill 🌐 - Track changes in Twitter profiles & bios. Requires a Twitter account.
  • tinfoleak 💻 - Very complete open-source tool for Twitter intelligence analysis. Needs API credentials.
  • twarc 💻🐍 - A command line tool and Python library for archiving Twitter in JSON format.
  • Tweetdeck 🌐
  • Tweetdeck Location Search Tutorial 📖
  • Tweet Map 🌐 - Explore the world and find geo-tagged tweets.
  • Tweets Analyzer 💻 - Twitter profile analyzer with tweet activity charts, locations, most used hashtags, etc. Can save tweets to JSON. Requires a Twitter API key.
  • tweetsmapper 💻 - Generates a Leaflet map for a given user or from an existing collection of tweets. Can retrieve full timelines.
  • TWINT (Twitter Intelligence Tool) 💻 - Advanced Twitter scraping tool, no API key needed. Can export to text, CSV, JSON, SQLite, Elasticsearch. Can detect emails, phone numbers, profiles.
  • Who Tweeted It First? 🌐 - Find out who was the first person who tweeted a link, video, quote or any piece of text.

VKontakte

  • SnRadar 🌐 - Search VKontakte content by location.

Youtube

  • Unlisted Videos 🌐 - Search & submit unlisted YouTube videos. No registration required.

Text & Documents

Documents metadata

  • Apache Tika 💻 - Extract metadata and text from over a thousand different file types.
  • FOCA 🌐💻 - Find metadata and hidden information in Microsoft Office, Open Office, or PDF files.
  • ICIJ Extract 💻 - A command line tool for parallelized, distributed content-extraction.

Indexing & searching

  • Aleph 💻 - A toolkit for data search, management and analysis in investigative reporting.
  • Blacklight 💻 - Open source Solr user interface discovery platform.
  • Datashare 💻 - Index & search documents on your computer, automatically detect people, organizations and locations with NLP.
  • DumpsterDiver 💻 - Analyze big volumes of various file types in search of secrets, credentials, etc.
  • ICIJ Extract 💻 - A command line tool for parallelized, distributed content-extraction.
  • searchbox 💻 - A simple out-of-the-box web interface to search through thousands of unstructured documents using Solr.

OCR

  • NewOCR.com 🌐 - Recognizes several languages. Can resize images & has shortcuts to Google & Bing Translate.
  • Tesseract 💻 - Open-source OCR engine.

PDF

  • PDF Text Extraction with PyPDF2, Tika & PDF Miner. 💻
  • tabula 💻 - Tool for liberating data tables trapped inside PDF files.

Text Processing & Analysis

  • topia 🐍 - Python module to determine important terms within a given piece of content.
  • TXM 💻 - Lexicometry and text statistical analysis for large bodies of text.

Transportation

Containers & Shipments

  • BIC Code Register 🌐 - Business Identifier Codes lookup. The website also has other search tools and useful information on container markings.
  • Prefix List 🌐 - Find the owner of a container from its prefix.
  • track-trace 🌐 - Track parcels/shipments, air cargo, containers and post.
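
Before looking up a container in the registers above, it can help to split the number painted on the box into its parts. The sketch below is a minimal illustration that assumes the number follows the ISO 6346 layout (three-letter owner code, a category letter U, J or Z, a six-digit serial and one check digit). The function name `parse_container_number` and the keys of the returned dictionary are hypothetical and do not come from any of the listed tools.

```python
import re


def _letter_values():
    """Build the ISO 6346 letter values: count up from 10, skipping multiples of 11."""
    values, n = {}, 10
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if n % 11 == 0:
            n += 1
        values[letter] = n
        n += 1
    return values


LETTER_VALUES = _letter_values()


def parse_container_number(raw):
    """Split a container number into its parts and verify its check digit.

    Assumes the ISO 6346 layout; spaces and punctuation are ignored.
    """
    code = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    match = re.fullmatch(r"([A-Z]{3})([UJZ])(\d{6})(\d)", code)
    if match is None:
        raise ValueError(f"not an ISO 6346-style container number: {raw!r}")
    owner, category, serial, check = match.groups()

    # Each of the first ten characters is weighted by 2 ** position.
    total = 0
    for position, char in enumerate(code[:10]):
        value = LETTER_VALUES[char] if char.isalpha() else int(char)
        total += value * (2 ** position)
    expected = total % 11 % 10

    return {
        "owner_code": owner,          # three-letter owner code
        "category": category,         # U, J or Z
        "prefix": owner + category,   # four-letter prefix to look up
        "serial": serial,
        "check_digit_ok": expected == int(check),
    }


if __name__ == "__main__":
    print(parse_container_number("CSQU 305438 3"))
```

A failed check digit usually means the number was misread from a photo, so it is worth re-checking the image before searching for the owner.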

Planes

Ships

Visualization

Graphs

  • Data Visualisation Catalogue 📖 - Find which visualisation is right for what you want to show. Plenty of tips & resources.
  • DataWrapper 🌐💲 - Easy to use graph & map tool. Free plan available.
  • Google Fusion Tables - Create maps & charts from data. Will shut down in Dec. 2019.
  • Matplotlib 🐍 - Python 2D plotting library. Best used with pandas in a Jupyter notebook.
  • RawGraph 🌐💻 - Generate static graphs through a very user-friendly interface. Can be run locally.

Maps

  • ArcGIS 💻💲 - Mapping & analysis software (proprietary, paid, 21-day trial).
  • Folium 🐍 - Python library to create Leaflet.js maps. Can be used in a Jupyter Notebook to map data from pandas.
  • Geopy 🐍 - Python geocoding library. Supports OSM Nominatim, Google, Bing, GeoNames & many more.
  • Google:
  • Humanitarian Data Exchange 🌐 - Useful resources of shapefiles, especially for administrative boundaries.
  • KML Interactive Sampler 🌐 - Lots of KML templates.
  • QGIS 💻 - Free & open-source alternative to ArcGIS.

Mindmaps & Network graphs

Timelines

  • Tik Tok 💻 - JavaScript tool to easily create simple, mobile-friendly, vertical timelines. Open-source.
  • TimelineJS 💻

Weather

Websites

See also: Archival

Dark Web & Onion services

Scraping

Searches, info, related entities

Misc

  • awesome-selfhosted 📝 - A list of Free Software network services and web applications which can be hosted locally.
  • grayhatwarfare 🌐 - Search open Amazon S3 buckets content.
  • Shodan 🌐 - Internet of Things search engine.
  • World License Plates 🌐 - Pictures of license plates from all around the world.

License

This list is under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
