"ip" is a complete stack for the procurement, processing, analysis and visualization of honeypot data.
Geographical visualization of attacker data
Sample of the SQLite database contents
Bubble map of attackers based on continent
- Ingests Cowrie honeypot JSON logs into SQLite at 100,000+ insertions/sec, while adding geolocation data from MaxMind.
- Gets the following information on honeypot attackers:
  - Continent, Country, ISP, Region, City, Timezone, and Postal Code
  - Latitude and Longitude with Accuracy Radius
  - Activity log (login success/fail, logout, credentials used)
  - Log of all access timestamps, plus the timestamps of the first and last sightings of each attacker
  - Number of attacks on the honeypot from each IP
- Visualizes data out of the box in the following ways:
  - Map of IP addresses by geolocation, color-coded and labelled by threat severity
  - Chart of IP addresses by number of attacks conducted
- Exposes all SQLite data as a Pandas/GeoPandas dataframe, which can be directly manipulated and visualized in the included Jupyter Notebook
- Low memory consumption
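A minimal sketch of the ingestion step, using only the standard library. It is not taken from `log_digester.py`: the `events` table, its columns, and the helper names `parse_cowrie_lines` and `ingest` are hypothetical, and only a handful of Cowrie's fields (`eventid`, `src_ip`, `timestamp`, `username`, `password`) are kept. Cowrie writes one JSON object per line, and feeding all rows through `executemany` inside a single transaction is the usual way to reach high insert rates in SQLite, since committing row by row pays for a journal sync on every insert.

```python
import json
import sqlite3

# Hypothetical minimal schema; the project's real schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    timestamp TEXT NOT NULL,
    src_ip    TEXT NOT NULL,
    eventid   TEXT NOT NULL,
    username  TEXT,
    password  TEXT
)
"""


def parse_cowrie_lines(lines):
    """Yield one row tuple per Cowrie JSON log line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        yield (
            event["timestamp"],
            event["src_ip"],
            event["eventid"],
            event.get("username"),  # only present on login events
            event.get("password"),
        )


def ingest(conn, lines):
    """Bulk-insert all events in one transaction; return the table's row count."""
    conn.execute(SCHEMA)
    with conn:  # commits once at the end rather than once per row
        conn.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
            parse_cowrie_lines(lines),
        )
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


sample = [
    '{"eventid": "cowrie.login.failed", "src_ip": "203.0.113.5", '
    '"timestamp": "2019-05-01T12:00:00.000000Z", "username": "root", "password": "123456"}',
    '{"eventid": "cowrie.session.closed", "src_ip": "203.0.113.5", '
    '"timestamp": "2019-05-01T12:00:05.000000Z"}',
]
conn = sqlite3.connect(":memory:")
print(ingest(conn, sample))  # 2
```

Once the data is in SQLite, `pandas.read_sql_query("SELECT * FROM events", conn)` returns it as a DataFrame for notebook analysis.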
- Clone the GitHub repository with `git clone https://github.com/ranguli/ip`.
- Install the prerequisite packages for your OS listed below.
- Within the project root, run `pip install -r requirements.txt`, preferably in a Python virtual environment (venv).
- Debian/Ubuntu: `sudo apt install libpython3-dev proj-bin libgeos-dev libproj-dev`
- Run `log_digester.py`. This will do the following:
  - Create the SQLite schema, including views, needed to store the converted data
  - Perform geolocation on IP addresses
  - Create a SQLite view `wordlist` containing the attackers' credentials (an option to export the wordlist to `.txt` is TBD)
  - Create a profile of each individual attacker, including number of attacks
  - Create a profile of each city and each country, including number of attacks
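The wordlist view and per-attacker profiles map naturally onto SQL views. The sketch below is illustrative only: the `events` table, the `attacker_profile` view, and the helpers `wordlist_lines` and `export_wordlist` are hypothetical (only the `wordlist` name comes from this README). It treats every login attempt as one attack, and `export_wordlist` stands in for the `.txt` export that is still TBD.

```python
import sqlite3

# Hypothetical schema and view definitions, for illustration only.
SETUP = """
CREATE TABLE events (
    timestamp TEXT NOT NULL,   -- ISO 8601 in UTC, as Cowrie writes it
    src_ip    TEXT NOT NULL,
    eventid   TEXT NOT NULL,
    username  TEXT,
    password  TEXT
);

-- Distinct credential pairs tried against the honeypot.
CREATE VIEW wordlist AS
    SELECT DISTINCT username, password
    FROM events
    WHERE eventid IN ('cowrie.login.success', 'cowrie.login.failed');

-- One row per attacker IP; an attack is assumed to be one login attempt.
-- MIN/MAX on text is chronological only because all timestamps share
-- the same UTC ISO 8601 format.
CREATE VIEW attacker_profile AS
    SELECT src_ip,
           COUNT(*)       AS attacks,
           MIN(timestamp) AS first_seen,
           MAX(timestamp) AS last_seen
    FROM events
    WHERE eventid IN ('cowrie.login.success', 'cowrie.login.failed')
    GROUP BY src_ip;
"""


def wordlist_lines(conn):
    """Return the wordlist as sorted 'username:password' strings."""
    rows = conn.execute(
        "SELECT username, password FROM wordlist ORDER BY username, password"
    )
    return [f"{user}:{pw}" for user, pw in rows]


def export_wordlist(conn, path):
    """Hypothetical .txt export: one 'username:password' pair per line."""
    lines = wordlist_lines(conn)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("2019-05-01T12:00:00Z", "203.0.113.5", "cowrie.login.failed", "root", "123456"),
        ("2019-05-01T12:00:03Z", "203.0.113.5", "cowrie.login.success", "root", "admin"),
        ("2019-05-02T08:15:00Z", "198.51.100.7", "cowrie.login.failed", "root", "123456"),
    ],
)
for row in conn.execute("SELECT * FROM attacker_profile ORDER BY attacks DESC"):
    print(row)
```

City and country profiles would follow the same `GROUP BY` pattern once geolocation columns are joined onto each IP.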
- System packages: `libgeos-dev`, `libgdal-dev`, `libproj-dev`
- Python requirements: see `requirements.txt`
- GeoLite2 City and ASN MMDB files in the root directory of the project, freely downloadable from MaxMind
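Before running `log_digester.py`, a quick check that the MaxMind databases are in place can save a failed run. This sketch assumes MaxMind's default file names, `GeoLite2-City.mmdb` and `GeoLite2-ASN.mmdb`; the helper `missing_mmdb_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# Assumed default file names from MaxMind's GeoLite2 City and ASN downloads.
REQUIRED_MMDB = ("GeoLite2-City.mmdb", "GeoLite2-ASN.mmdb")


def missing_mmdb_files(project_root="."):
    """Return the required GeoLite2 databases not found in project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_MMDB if not (root / name).is_file()]


# Example: run from the project root before log_digester.py.
missing = missing_mmdb_files()
if missing:
    print("Missing GeoLite2 databases:", ", ".join(missing))
else:
    print("GeoLite2 City and ASN databases found.")
```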
One day's worth of Cowrie JSON logs is 60MB on average. If the honeypot runs 24/7, you'll collect about 20-30GB of uncompressed raw log data per honeypot per year (60-80MB × 365 days). This is substantially less if you store the logs in compressed tar archives (e.g. `.tar.gz`). The SQLite database turns a 60-80MB daily log into roughly 1MB of processed data, so the uncompressed database grows by roughly 365MB per honeypot per year.
Extrapolating this out to a honeynet containing 5 sensors operated over 3 years:
(Uncompressed)
- Daily log yield: ~300MB
- Yearly log yield: ~110GB
- Total log yield: ~330GB
- Total SQLite yield: ~5.5GB
(Compressed)
- Daily log yield: ~30MB
- Yearly log yield: ~11GB
- Total log yield: ~33GB
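The storage figures can be reproduced with a few lines of arithmetic, which makes every input explicit. The 10:1 compression ratio is an assumption inferred from the compressed daily figure (~300MB down to ~30MB), and sizes are converted with 1GB = 1000MB. At 1MB of SQLite data per sensor per day, five sensors over three years come to roughly 5.5GB of processed data.

```python
# Inputs from the figures above; the 10:1 compression ratio is an
# assumption implied by the compressed daily yield (300MB -> 30MB).
DAILY_LOG_MB = 60       # average raw Cowrie JSON per sensor per day
DAILY_SQLITE_MB = 1     # processed SQLite data per sensor per day
COMPRESSION_RATIO = 10  # raw : compressed
SENSORS = 5
YEARS = 3
DAYS_PER_YEAR = 365


def estimate(sensors=SENSORS, years=YEARS):
    """Return storage estimates in MB for a honeynet of `sensors` over `years`."""
    daily_raw = DAILY_LOG_MB * sensors
    yearly_raw = daily_raw * DAYS_PER_YEAR
    total_raw = yearly_raw * years
    return {
        "daily_raw": daily_raw,
        "yearly_raw": yearly_raw,
        "total_raw": total_raw,
        "daily_compressed": daily_raw / COMPRESSION_RATIO,
        "yearly_compressed": yearly_raw / COMPRESSION_RATIO,
        "total_compressed": total_raw / COMPRESSION_RATIO,
        "total_sqlite": DAILY_SQLITE_MB * sensors * DAYS_PER_YEAR * years,
    }


for key, mb in estimate().items():
    print(f"{key:>18}: {mb / 1000:,.2f} GB")
```

Changing `sensors` or `years` gives the same breakdown for other honeynet sizes.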
- Create a virtualenv with the `requirements.txt` packages installed
- Run `python log_digester.py`, which will use the sample data provided in the repo
- Run `jupyter notebook` to view the data visualizations
- Dockerize
- Write a Prometheus exporter
- Extract and analyze data based on timeframes
  - Get the number of attacks/day, attacks/month, etc.
    - How do we determine the number of attacks for a given day?
      - Need to normalize the timestamps first
  - Get the average frequency of attacks for a timeframe (every minute, twice a day, etc.)
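One possible answer to the open questions on counting attacks per day, sketched with the standard library: normalize every timestamp to UTC first, then bucket by calendar date. The helper names `parse_ts`, `attacks_per_day`, and `mean_interval_seconds` are hypothetical, each timestamp is assumed to represent one attack, and timestamps without an offset are assumed to already be UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse an ISO 8601 timestamp and normalize it to UTC.

    Cowrie timestamps look like '2019-05-01T12:00:00.123456Z'. The trailing
    'Z' is rewritten as '+00:00' because datetime.fromisoformat() only
    accepts 'Z' from Python 3.11 on. Naive timestamps are assumed to be UTC.
    """
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def attacks_per_day(timestamps):
    """Count attacks per UTC calendar day, keyed by 'YYYY-MM-DD'."""
    return Counter(parse_ts(ts).date().isoformat() for ts in timestamps)


def mean_interval_seconds(timestamps):
    """Average gap between consecutive attacks in seconds, or None if fewer than 2."""
    times = sorted(parse_ts(ts) for ts in timestamps)
    if len(times) < 2:
        return None
    # The mean of consecutive gaps telescopes to (last - first) / (n - 1).
    return (times[-1] - times[0]).total_seconds() / (len(times) - 1)


sample = [
    "2019-05-01T23:30:00.000000Z",
    "2019-05-02T01:30:00.000000+02:00",  # local time; 2019-05-01 23:30 in UTC
    "2019-05-02T06:00:00.000000Z",
]
print(dict(attacks_per_day(sample)))  # {'2019-05-01': 2, '2019-05-02': 1}
print(mean_interval_seconds(sample))  # 11700.0
```

Without normalization, the second timestamp would be counted on 2019-05-02. With the Pandas dataframes the project already exposes, `pd.to_datetime(..., utc=True)` followed by `.resample("D").size()` on a timestamp-indexed frame would do the same daily bucketing.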