CDN Whitelist Parser
This submodule downloads the ASNs owned by CDNs from hackertarget.com, converts the data to CSV, and inserts it into the database. It does this in the following steps:
- Get the list of CDNs from cdns.txt in the submodule
  - Handled in the _get_cdns function
- Make an API call to https://api.hackertarget.com/aslookup/?q=
  - Handled in the _run function
  - This returns the JSON for the ASNs
- Format the data for database insertion
  - Handled in the _run function
- Insert the data into the database
  - Handled by utils.rows_to_db
  - First converts the data to a CSV, then inserts it into the database
  - CSVs are used for fast bulk database insertion
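The first three steps can be sketched roughly as follows. This is an illustration, not the submodule's code: the helper names (read_cdns, lookup_url, build_rows) are hypothetical, cdns.txt is assumed to hold one CDN name per line, and the HTTP call and response parsing are left to a caller-supplied `lookup` function because the response format is not described here.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import quote

# Endpoint taken from the step list above.
API_URL = "https://api.hackertarget.com/aslookup/?q="


def read_cdns(lines: Iterable[str]) -> List[str]:
    """Return stripped, non-empty CDN names (assumes one name per line)."""
    return [line.strip() for line in lines if line.strip()]


def lookup_url(cdn: str) -> str:
    """Build the lookup URL for one CDN name (URL-encoding is an assumption)."""
    return API_URL + quote(cdn)


def build_rows(cdns: Iterable[str],
               lookup: Callable[[str], Iterable]) -> List[Tuple[str, int]]:
    """Pair each CDN with every ASN that `lookup` returns for its URL.

    `lookup` is a hypothetical callable that performs the request and
    extracts the ASNs from the response.
    """
    rows = []
    for cdn in cdns:
        for asn in lookup(lookup_url(cdn)):
            rows.append((cdn, int(asn)))
    return rows
```

Each row is ordered (cdn, asn) to match the column order of the table this parser fills. Passing `lookup` in as a parameter keeps the sketch testable without network access.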
lib_bgp_data --cdn_whitelist
For debugging:
lib_bgp_data --cdn_whitelist --debug
Several variants of the flag (for example, different capitalization) are also accepted.
You can also run it as a module:
python3 -m lib_bgp_data --cdn_whitelist
Initializing the CDN_Whitelist class:
Parameter | Default | Description |
---|---|---|
name | self.__class__.__name__ | The purpose of this is to make sure when we clean up paths at the end it doesn't delete files from other parsers. |
path | "/tmp/bgp_{}".format(name) | Not used |
csv_dir | "/dev/shm/bgp_{}".format(name) | Path for CSV files, located in RAM |
stream_level | logging.INFO | Logging level for printing |
section | "bgp" | Database section to use |
Any of the above parameters can be changed, individually or in any combination.
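To make the relationship between name and the path defaults concrete, here is a minimal sketch that resolves the defaults from the table. The helper default_settings is hypothetical and not part of lib_bgp_data; it only mirrors the defaults listed above.

```python
import logging


def default_settings(name: str, **overrides) -> dict:
    """Resolve the documented defaults, then apply any overrides.

    Hypothetical helper for illustration only.
    """
    settings = {
        "name": name,
        "path": "/tmp/bgp_{}".format(name),
        "csv_dir": "/dev/shm/bgp_{}".format(name),  # /dev/shm lives in RAM
        "stream_level": logging.INFO,
        "section": "bgp",
    }
    unknown = set(overrides) - set(settings)
    if unknown:
        raise TypeError("unexpected parameters: {}".format(sorted(unknown)))
    settings.update(overrides)
    return settings
```

Because both directories embed the parser name, each parser gets its own directories, which is what keeps cleanup from touching files that belong to other parsers.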
To initialize CDN_Whitelist with default values:
from lib_bgp_data import CDN_Whitelist
parser = CDN_Whitelist()
To initialize CDN_Whitelist with a custom path, CSV directory, logging level, and database section:
from logging import DEBUG
from lib_bgp_data import CDN_Whitelist
parser = CDN_Whitelist(path="/my_custom_path",
csv_dir="/my_custom_csv_dir",
stream_level=DEBUG,
section="mydatabasesection")
To run the CDN_Whitelist with defaults (there are no optional parameters):
from lib_bgp_data import CDN_Whitelist
CDN_Whitelist().run()
- hackertarget.com allows 100 free lookups per day. Each company counts as one lookup.
- There are several other tools for this, but most of them don't return all the ASNs for a company, don't show some companies in search, or can't search for a company by name:
  - ultratools.com
  - mxtoolbox.com
  - dnschecker.org
  - spyse.com
  - ipinfo.io
- Using the different IRRs' APIs is convoluted. Each IRR maintains its own. RIPE's database lookup tool says it can look up across all the IRRs, but when I tried it, I just got errors. Also, to get the ASNs, you first need to search by organisation, then get the organisation id, then perform an inverse search for ASNs using that organisation id.
- The list of CDNs is in cdns.txt. It's a handpicked list. Companies are sometimes inconsistent with their branding and register ASNs under a different name.
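Since each company costs one lookup against the free daily limit, it can be worth checking the size of the CDN list before a run. The sketch below is one way to do that; lookups_remaining is a hypothetical helper, and counting one lookup per non-empty name is an assumption based on the limit described above.

```python
DAILY_FREE_LOOKUPS = 100  # free-tier limit stated in the list above


def lookups_remaining(cdns, used_today=0):
    """Return how many free lookups are left after querying every CDN once.

    A negative result means the list would exceed the daily free limit.
    """
    needed = sum(1 for name in cdns if name.strip())
    return DAILY_FREE_LOOKUPS - used_today - needed
```

For example, a list of two CDNs on a fresh day leaves 98 lookups.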
- This table contains the ASNs retrieved from hackertarget.com
- Unlogged tables are used for speed
- asn: The ASN of an AS (bigint)
- cdn: Name of CDN (varchar)
- Create Table SQL:
CREATE UNLOGGED TABLE IF NOT EXISTS {self.name} ( cdn varchar (200), asn bigint );
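To illustrate why CSVs help with bulk insertion, the sketch below serializes (cdn, asn) rows to CSV text and builds a PostgreSQL COPY statement for the table above. This is not the implementation of utils.rows_to_db: the function names, the comma delimiter, and the example table name and file path are all assumptions.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize (cdn, asn) rows as CSV text, quoting fields only when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def copy_sql(table, csv_path):
    """Build a COPY statement that loads the CSV file in a single command.

    The column order (cdn, asn) matches the CREATE TABLE statement above.
    """
    return "COPY {} (cdn, asn) FROM '{}' WITH (FORMAT csv);".format(
        table, csv_path)
```

A single COPY loads the whole file at once, which is generally much faster than issuing one INSERT per row.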