support /attributes and /events endpoints from MISP feed for Zeek intel generation #336

Closed
mmguero opened this issue Jan 15, 2024 · 3 comments
Labels: enhancement (New feature or request), external (Depends on a bug or feature external to this project), zeek (Relating to Malcolm's use of Zeek)


@mmguero
Collaborator

mmguero commented Jan 15, 2024

We've got some MISP capabilities. The code that handles grabbing MISP indicators is here and says in its comments:

# download the URL and parse as JSON to figure out what it is. it could be:
# - a manifest JSON (https://www.circl.lu/doc/misp/feed-osint/manifest.json)
# - a directory listing containing a manifest.json (https://www.circl.lu/doc/misp/feed-osint/)
# - a directory listing of misc. JSON files without a manifest.json

Some colleagues at USAF have been poking at it and discussing with us how to expand the range of inputs it can handle. They've suggested we look at running MISP with docker compose and pulling from it directly.

I'm going to quote some of that discussion here:


Hey @mmguero I figured out what you should be querying from MISP to integrate with Malcolm. They're called "attributes".

  • Get a list of attributes by sending an HTTP GET to the /attributes resource, as follows:
RESOURCE="/attributes"
MISP_URL="https://misp_url.local"  # yours will be different
MISP_API_KEY="2uKLbs44NsGSx0dsJdM4V4MYj9BrH7tOlcRHLedj"  # yours will be different

echo "requesting $RESOURCE"
curl \
        --header "Authorization:$MISP_API_KEY" \
        --header "Accept: application/json" \
        --header "Content-Type: application/json" \
        "$MISP_URL$RESOURCE"
  • The result will be JSON: a massive list of dictionaries (see the id field below, at 626143!), each of which resembles the following:
    {
        "id": "626143",
        "event_id": "166",
        "object_id": "0",
        "object_relation": null,
        "category": "Network activity",
        "type": "ip-dst",
        "value1": "80.51.7.66",
        "value2": "",
        "to_ids": true,
        "uuid": "37bb0fca-9043-4e09-a758-0efb3eae9937",
        "timestamp": "1704950232",
        "distribution": "5",
        "sharing_group_id": "0",
        "comment": "",
        "deleted": false,
        "disable_correlation": false,
        "first_seen": null,
        "last_seen": null,
        "value": "80.51.7.66"
    },
   { ...

As you might guess, the meat of this is the "value" member of the JSON: that is the IP address MISP is flagging as malicious. I assume the value1 and value2 fields should be parsed as well for similar reasons.

What I'd recommend is standing up a MISP instance, loading the full list of default feeds, enabling them, and fetching from them (a guide is here).

Then wait a good few hours for it to fetch this massive list of attributes.

After you feel you have a good number of attributes (use the bash script at the beginning of this post and note the id, as that is the current size of the list), iterate over each of them and enumerate what members are possible. Then decide which members are relevant to Malcolm and how they should be appropriately parsed. Perhaps that is just the IP addresses ("value" for relevant attributes), but you would know better than me in this regard!
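A minimal sketch of that enumeration step, assuming the attributes have already been downloaded into a Python list of dictionaries. The function name summarize_attributes is hypothetical, and the example.com entry in the sample data is made up for illustration; the two IP addresses are taken from this thread.

```python
from collections import Counter


def summarize_attributes(attributes):
    """Count which keys and which "type" values occur across attribute dicts."""
    key_counts = Counter()
    type_counts = Counter()
    for attribute in attributes:
        # Some responses wrap each attribute in an outer {"Attribute": {...}} dictionary.
        attribute = attribute.get("Attribute", attribute)
        key_counts.update(attribute.keys())
        type_counts[attribute.get("type", "(missing)")] += 1
    return key_counts, type_counts


# Sample data: the domain entry is invented for illustration only.
sample = [
    {"id": "626143", "type": "ip-dst", "value": "80.51.7.66", "to_ids": True},
    {"Attribute": {"id": "1", "type": "ip-dst", "value": "101.32.254.178"}},
    {"id": "2", "type": "domain", "value": "example.com", "comment": ""},
]

key_counts, type_counts = summarize_attributes(sample)
print(key_counts.most_common())
print(type_counts.most_common())
```

Running this over a real dump should show which members appear on every attribute and which types dominate, which is a reasonable starting point for deciding what Malcolm needs to parse.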


I also found a bug, in MISP though: when you pull the attributes list, it doesn't give you the complete list. It would only give me 60, despite there actually being 4,000,000. Thus, just GET-ing /attributes as I described will not give you the full list.

The workaround is to find out how long the list is: pull the attribute list, find the highest id in it, and then individually pull every ID from 1 up to that number. (It appears that the first element has the highest ID, but I do not know this conclusively, so check every ID in the list.)

Code snippet in the comment below to avoid blowing up the channel. Also, note that when we access attributes individually, we now have an outer single-element dictionary of just "Attribute."


#!/usr/bin/env python3

import requests

MISP_URL = "your misp url"
MISP_API_KEY = "your misp api key"

headers = {
    "Authorization": MISP_API_KEY,
    "Accept": "application/json",
    "Content-Type": "application/json",
}

# show the length of the JSON retrieved, which was 60
r = requests.get(f"{MISP_URL}/attributes", headers=headers)
attributes = r.json()
print("length of GET '/attributes' json (which shows a small number): " + str(len(attributes)))

print("===================================")

# get the largest ID number (an entry without an "id" is treated as 0)
largest_id = 0
for attribute in attributes:
    attribute_id = int(attribute.get("id", 0))
    if attribute_id > largest_id:
        print(f"new largest id {attribute_id}!")
        largest_id = attribute_id
    else:
        print(f"not a new largest ID: {attribute_id}")
print("largest id (size of actual list of attributes): " + str(largest_id))

print("===================================")

# iterate over each id individually and request the JSON corresponding to it
# (the "+ 1" makes sure the largest id itself is requested too)
for attribute_id in range(largest_id + 1):
    r = requests.get(f"{MISP_URL}/attributes/view/{attribute_id}", headers=headers)
    item = r.json()
    if "Attribute" in item:
        print(f"id = {item['Attribute']['id']}, value = {item['Attribute']['value']}")
    else:
        print(f"NOT AN ATTRIBUTE: id = {attribute_id}, json={item}")  # this happens on id=0

Sample output:

length of GET '/attributes' json (which shows a small number): 60
===================================
new largest id 4000346!
not a new largest ID: 4000345
not a new largest ID: 4000342
not a new largest ID: 4000336
not a new largest ID: 4000333
not a new largest ID: 4000332
not a new largest ID: 4000330
not a new largest ID: 4000328
not a new largest ID: 4000324
not a new largest ID: 4000321
not a new largest ID: 4000320
not a new largest ID: 4000319
not a new largest ID: 4000318
not a new largest ID: 4000314
not a new largest ID: 4000312
not a new largest ID: 4000309
not a new largest ID: 4000306
not a new largest ID: 4000303
not a new largest ID: 4000301
not a new largest ID: 4000299
not a new largest ID: 4000298
not a new largest ID: 4000294
not a new largest ID: 4000291
not a new largest ID: 4000289
not a new largest ID: 4000288
not a new largest ID: 4000286
not a new largest ID: 4000283
not a new largest ID: 4000281
not a new largest ID: 4000279
not a new largest ID: 4000277
not a new largest ID: 4000276
not a new largest ID: 4000275
not a new largest ID: 4000274
not a new largest ID: 4000273
not a new largest ID: 4000271
not a new largest ID: 4000268
not a new largest ID: 4000265
not a new largest ID: 4000263
not a new largest ID: 4000260
not a new largest ID: 4000256
not a new largest ID: 4000251
not a new largest ID: 4000249
not a new largest ID: 4000247
not a new largest ID: 4000243
not a new largest ID: 4000238
not a new largest ID: 4000234
not a new largest ID: 4000231
not a new largest ID: 4000227
not a new largest ID: 4000225
not a new largest ID: 4000223
not a new largest ID: 4000222
not a new largest ID: 4000219
not a new largest ID: 4000216
not a new largest ID: 4000214
not a new largest ID: 4000212
not a new largest ID: 4000211
not a new largest ID: 4000210
not a new largest ID: 4000209
not a new largest ID: 4000204
not a new largest ID: 4000202
largest id (size of actual list of attributes): 4000346
===================================
NOT AN ATTRIBUTE: id = 0, json={'name': 'Invalid attribute', 'message': 'Invalid attribute', 'url': '/attributes/view/0'}
id = 1, value = 101.32.254.178
id = 2, value = 103.123.62.146
id = 3, value = 103.151.125.131
id = 4, value = 103.193.179.52
id = 5, value = 104.131.72.118

and so on

@mmguero mmguero added external Depends on a bug or feature external to this project zeek Relating to Malcolm's use of Zeek labels Jan 15, 2024
@mmguero mmguero modified the milestones: z.staging, v24.03.0 Jan 15, 2024
@mmguero mmguero added the falcon label Jan 22, 2024
@mmguero mmguero modified the milestones: v24.03.0, v24.02.0 Jan 30, 2024
@mmguero mmguero assigned mmguero and IdahoManny and unassigned mmguero Feb 5, 2024
@mmguero mmguero added the enhancement New feature or request label Feb 12, 2024
@mmguero mmguero modified the milestones: v24.02.0, v24.03.0 Feb 12, 2024
@mmguero mmguero assigned mmguero and unassigned IdahoManny Feb 27, 2024
@mmguero
Collaborator Author

mmguero commented Feb 27, 2024

#!/usr/bin/env bash

RESOURCE="${1:-/attributes}"
# no trailing slash here: RESOURCE already starts with one
MISP_URL="https://localhost:31443"
MISP_API_KEY=xxxxxxxxxxx

echo "requesting $RESOURCE" >&2
curl -fsSLk \
        --header "Authorization:$MISP_API_KEY" \
        --header "Accept: application/json" \
        --header "Content-Type: application/json" \
        "$MISP_URL$RESOURCE"

@mmguero
Collaborator Author

mmguero commented Feb 27, 2024

I believe we can use the page, limit, type, and from|to|last parameters to page over these attributes, rather than going from 1 to the highest ID and pulling them one at a time.
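A rough sketch of what paging could look like. It assumes that MISP's /attributes/restSearch endpoint accepts page, limit and type in a JSON POST body and answers with {"response": {"Attribute": [...]}}; check both against your instance's API documentation. The names iter_attributes, make_rest_search_fetcher and fake_fetch are hypothetical. The paging loop is kept separate from the HTTP call so it can be exercised without a MISP server.

```python
import json
import urllib.request


def iter_attributes(fetch_page, limit=1000):
    """Yield attributes page by page until a page comes back short or empty."""
    page = 1
    while True:
        batch = fetch_page(page, limit)
        yield from batch
        if len(batch) < limit:
            break
        page += 1


def make_rest_search_fetcher(misp_url, api_key, attribute_type=None):
    """Build a fetch_page(page, limit) callable (endpoint and body are assumptions)."""

    def fetch_page(page, limit):
        body = {"returnFormat": "json", "page": page, "limit": limit}
        if attribute_type:
            body["type"] = attribute_type
        request = urllib.request.Request(
            f"{misp_url.rstrip('/')}/attributes/restSearch",
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Authorization": api_key,
                "Accept": "application/json",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        # Assumed response shape: {"response": {"Attribute": [...]}}
        return payload.get("response", {}).get("Attribute", [])

    return fetch_page


# Offline demonstration with a stand-in fetcher instead of a real MISP server.
fake_data = [{"id": str(i)} for i in range(1, 8)]


def fake_fetch(page, limit):
    start = (page - 1) * limit
    return fake_data[start:start + limit]


ids = [attribute["id"] for attribute in iter_attributes(fake_fetch, limit=3)]
print(ids)
```

Against a real instance the call would look something like iter_attributes(make_rest_search_fetcher(MISP_URL, MISP_API_KEY, "ip-dst")), which replaces millions of single-attribute requests with one request per page.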

mmguero added 7 commits to mmguero-dev/Malcolm that referenced this issue Feb 28, 2024
@mmguero
Collaborator Author

mmguero commented Feb 29, 2024

The updated documentation:


MISP

In addition to loading Zeek intelligence files on startup, Malcolm will automatically generate a Zeek intelligence file for all Malware Information Sharing Platform (MISP) JSON files found under ./zeek/intel/MISP.

Additionally, if a special text file named .misp_input.txt is found in ./zeek/intel/MISP, that file will be read and processed as a list of MISP feed URLs, one per line, according to the following format:

misp|misp_url|auth_key (optional)

For example:

misp|https://example.com/data/feed-osint/manifest.json|df97338db644c64fbfd90f3e03ba8870
misp|https://example.com/doc/misp/|
misp|https://example.com/attributes|a943f5ff506ee6198e996333e0b672b1
misp|https://example.com/events|a943f5ff506ee6198e996333e0b672b1
…
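To make the line format concrete, here is a small sketch of how such a line could be split into its parts. It is not Malcolm's actual parser: FeedEntry and parse_feed_line are hypothetical names, and the sketch only handles the misp prefix shown in the examples above.

```python
from typing import NamedTuple, Optional


class FeedEntry(NamedTuple):
    feed_type: str
    url: str
    auth_key: Optional[str]


def parse_feed_line(line):
    """Parse one 'misp|misp_url|auth_key' line; return None for blank or non-misp lines."""
    line = line.strip()
    if not line:
        return None
    parts = line.split("|", 2)
    if len(parts) < 2 or parts[0].lower() != "misp" or not parts[1].strip():
        return None
    # The auth key is optional: a missing or empty third field becomes None.
    auth_key = parts[2].strip() if len(parts) > 2 and parts[2].strip() else None
    return FeedEntry(parts[0].lower(), parts[1].strip(), auth_key)


with_key = parse_feed_line(
    "misp|https://example.com/attributes|a943f5ff506ee6198e996333e0b672b1"
)
without_key = parse_feed_line("misp|https://example.com/doc/misp/|")
print(with_key)
print(without_key)
```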

Malcolm will attempt to connect to the MISP feed(s) and retrieve Attribute objects of MISP events and convert them to the Zeek intelligence format as described above. There are publicly available MISP feeds and communities, or users may run their own MISP instance.

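When Malcolm connects to the URLs for the MISP feeds in .misp_input.txt, it will attempt to determine the format of the data served and process it accordingly. This could be presented as:

  • a manifest JSON file
  • a directory listing containing a manifest.json
  • a directory listing of JSON files without a manifest.json
  • a list of attributes returned from an /attributes endpoint
  • a list of events returned from an /events endpoint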

Note that only a subset of MISP attribute types can be expressed with the Zeek intelligence indicator types. MISP attributes with other types will be silently ignored.
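As an illustration of that filtering, the sketch below maps a handful of MISP attribute types to Zeek Intel framework indicator types and drops everything else. The mapping table is an assumption made for this example and is not taken from Malcolm's code; MISP_TO_ZEEK and to_zeek_intel_lines are hypothetical names. The tab-separated output with a #fields header follows the usual Zeek intelligence file layout.

```python
# Illustrative subset only; the real mapping used by Malcolm may differ.
MISP_TO_ZEEK = {
    "ip-src": "Intel::ADDR",
    "ip-dst": "Intel::ADDR",
    "domain": "Intel::DOMAIN",
    "hostname": "Intel::DOMAIN",
    "md5": "Intel::FILE_HASH",
    "sha1": "Intel::FILE_HASH",
    "sha256": "Intel::FILE_HASH",
    "filename": "Intel::FILE_NAME",
    "email-src": "Intel::EMAIL",
    "email-dst": "Intel::EMAIL",
}


def to_zeek_intel_lines(attributes, source="MISP"):
    """Convert attribute dicts to tab-separated Zeek intel lines, skipping unmapped types."""
    lines = ["#fields\tindicator\tindicator_type\tmeta.source"]
    for attribute in attributes:
        # Individually fetched attributes are wrapped in {"Attribute": {...}}.
        attribute = attribute.get("Attribute", attribute)
        zeek_type = MISP_TO_ZEEK.get(attribute.get("type"))
        value = attribute.get("value")
        if zeek_type is None or not value:
            continue  # types with no Zeek equivalent are silently ignored
        lines.append(f"{value}\t{zeek_type}\t{source}")
    return lines


sample = [
    {"type": "ip-dst", "value": "80.51.7.66"},
    {"type": "text", "value": "free-form analyst note"},  # no Zeek equivalent
    {"Attribute": {"type": "ip-dst", "value": "101.32.254.178"}},
]
intel_lines = to_zeek_intel_lines(sample)
print("\n".join(intel_lines))
```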

@mmguero mmguero closed this as completed Feb 29, 2024
@mmguero mmguero changed the title from "expand MISP usage" to "support /attributes and /events endpoints from MISP feed" Mar 4, 2024
@mmguero mmguero changed the title from "support /attributes and /events endpoints from MISP feed" to "support /attributes and /events endpoints from MISP feed for Zeek intel generation" Mar 4, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Mar 5, 2024
Projects
Status: Released