This repository has been archived by the owner on Jun 2, 2021. It is now read-only.

Project dead? Need takeover? #72

Closed
deadbits opened this issue Dec 27, 2018 · 22 comments
@deadbits

Just want to know if this is officially dead. If so, is it deprecated in favor of another project? Is it just a lack of developers/time? Is there anything the community can do to help?

The initial concept here has really solid potential and I'd hate to see it just disappear to time on GitHub. Lmk how I can help!

CC:
@merces @jseidl @turicas

@merces merces self-assigned this Dec 27, 2018
@merces
Owner

merces commented Dec 27, 2018

Hi @deadbits!

Thank you very much for your offer. It's really appreciated!

I definitely believe in Aleph, and the only reason it seems abandoned is indeed the lack of developers/time. Are you interested in leading the development here? What exactly do you have in mind?

Bringing up this discussion already helps. =)

@deadbits
Author

deadbits commented Dec 27, 2018

Well, initially I have some ideas for a bit of everything. Is this the type of work you see as in line with the project's direction? If so, I can take over leading development here, or at the very least implement some new features and help with PRs and issues.

General:

  • Migrate to Python 3 (EOL for 2.7 is about a year away)
  • Create tests
  • Pull Request and Issue templates
  • Tests before merging new PRs

Collectors (updated):

  • paste hunting via YARA signatures / regex
    • Base64-encoded PE headers, hex-encoded PEs, PowerShell with suspicious keywords, etc.
    • these would go through a decoder plugin to check whether the sample is worth keeping
  • Twitter monitoring for search keywords (hxxp, hashes from researchers' tweets to fetch from VTI, opendir, dailyscriptlet, etc.)
  • VirusTotal Intelligence Hunting notifications
  • REST API submission endpoint for Collector
  • SQS, S3, and/or DigitalOcean Spaces watcher for Collector

Parsing & Enrichment (updated):

  • ability to parse Outlook messages for the Email monitor
  • Plugin to YARA scan any file type
  • adding more information to the PE model
    • Check for commonly suspicious APIs
    • Check for and verify Authenticode
  • adding ELF and Mach-O models
  • Enrichment of extracted IOCs (ASN, geoip, DNS resolution, etc)

Exporting:

  • export data to arbitrary REST endpoint
  • export JSON pipeline result to SQS, S3, or a DigitalOcean Spaces bucket so users can act on pipeline results any way they want to
  • export JSON pipeline result to disk for use in systems other than Elasticsearch, or just plain old viewing of the data
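The disk-export idea could be a very small plugin. Here's a minimal sketch, assuming a hypothetical `export_json` helper and a result dict carrying a `sha256` key (neither is Aleph's actual schema):

```python
import json
import os

def export_json(result: dict, out_dir: str) -> str:
    """Write one pipeline result to disk as JSON, named by sample hash.

    `result` is assumed to carry a 'sha256' key; the key name and the
    flat directory layout are illustrative, not Aleph's real schema.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, result["sha256"] + ".json")
    with open(path, "w") as fh:
        json.dump(result, fh, indent=2, sort_keys=True)
    return path
```

The same function body could back an S3 or Spaces exporter by swapping the `open`/`json.dump` for a `put_object` call.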

@deadbits
Author

deadbits commented Dec 27, 2018

Some more feature thoughts (updated) :

  • tokenizing emails and using those as passwords for protected attachments instead of the hard-coded list that exists now
  • add plug-ins to send files to different online/free or local sandbox services (hybrid, any.run, cuckoo)
  • submit url artifacts to URLScan.io? Idk if we'd want to send everything there
  • VBA extraction from Documents
    • Attempted deobfuscation
  • Check strings for highly suspicious keywords
    • Living-off-the-land binary references, etc.
  • Process scriptlet files for malicious indications and send through pipeline if found
    • HTA, SCT, XML, WS, etc.
  • Reputation DB check for extracted URLs
  • Optional ability to allow for SOCKS5 proxy use for external requests
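The email-tokenizing idea (replacing the hard-coded password list) could be as simple as splitting the message body into distinct tokens, since malicious emails often state the archive password in the body. A sketch, with the function name and length threshold as assumptions:

```python
import re

def candidate_passwords(email_body: str, max_tokens: int = 50) -> list:
    """Tokenize an email body into candidate archive passwords.

    Every distinct token of plausible length is worth trying against a
    protected attachment. Illustrative sketch, not Aleph's code.
    """
    seen, out = set(), []
    for tok in re.findall(r"\S+", email_body):
        tok = tok.strip(".,:;!?\"'()")   # drop surrounding punctuation
        if len(tok) >= 4 and tok not in seen:
            seen.add(tok)
            out.append(tok)
    return out[:max_tokens]
```

The resulting list would be fed to the existing Zip/GZip brute-force step in place of (or in addition to) the static list.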

@jseidl
Contributor

jseidl commented Dec 28, 2018 via email

@deadbits
Author

deadbits commented Dec 28, 2018 via email

@deadbits
Author

deadbits commented Dec 29, 2018

I know we'll likely move this to another ticket, or several, but I've put all my ideas into one list so it's easier to view than my comments above:

General

  • Migrate to Python 3
  • Create tests
  • Create Pull Request and Issue templates
  • GitHub integration tests on Pull Requests

Samples Object

  • Add filenames list
  • Add first seen / last seen
    • If a file is seen more than once, just update last seen
    • the filenames list can be updated too, if available
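The first-seen/last-seen upsert above is a small piece of logic. A sketch, using a plain dict to stand in for the real backend (Elasticsearch in Aleph); names are illustrative:

```python
import datetime

def upsert_sample(db: dict, sha256: str, filename: str, now=None) -> dict:
    """Record a sample sighting: create the entry on first sight,
    otherwise bump last_seen and append any new filename.

    `db` is a stand-in for the storage backend; the field names are
    assumptions sketching the Samples Object idea above.
    """
    now = now or datetime.datetime.utcnow().isoformat()
    entry = db.get(sha256)
    if entry is None:
        entry = {"first_seen": now, "last_seen": now, "filenames": [filename]}
        db[sha256] = entry
    else:
        entry["last_seen"] = now
        if filename not in entry["filenames"]:
            entry["filenames"].append(filename)
    return entry
```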

Collectors

  • Collectors service could be separate so not all collectors are stopped if user stops the service
  • REST API endpoint
    • Submit files directly
    • Submit PCAP, extract files as sample
  • S3 bucket monitor
  • SQS Collector / Polling
  • DigitalOcean Spaces Collector
  • Twitter searches Collector
    • Query and hashtag search of popular researchers and tags; extract hashes and URLs
    • Try to fetch hashes from VTI/Hybrid
    • Check if url is alive, download sample
  • Paste site monitor (this may be more trouble than it's worth and leaning too far away from the Aleph purpose imho, but just a thought either way)
    • Keywords
    • Regex
    • YARA
    • Only keep pastes that look to be malware samples
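The keep-or-discard decision for pastes could start with plain regex before reaching for YARA. A sketch; the signature names and patterns are assumptions (a base64-encoded PE famously begins with `TVqQ`, a hex-encoded one with `4d5a`):

```python
import re

# Illustrative triage signatures, not Aleph's rules.
PASTE_SIGNATURES = {
    "b64_pe": re.compile(r"TVqQAAMAAAAEAAAA"),          # base64 of MZ header
    "hex_pe": re.compile(r"4d5a90000300", re.IGNORECASE),  # hex of MZ header
    "ps_keyword": re.compile(r"(?i)(DownloadString|IEX|FromBase64String)"),
}

def triage_paste(text: str) -> list:
    """Return names of matching signatures; an empty list means the
    paste is probably not worth keeping as a sample."""
    return [name for name, rx in PASTE_SIGNATURES.items() if rx.search(text)]
```

Anything that matches would then go through the decoder step to recover the actual sample.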

Plugins

  • PE file parser enhancements
    • Certificate
    • Authenticode
    • Suspicious API exports
    • Anti-VM & Anti-Analysis checks
  • Accept and parse ELF and Mach-O files
  • Subfile extraction (hachoir-subfile + dd, etc)
  • Extract Base64 from files
    • Decode and, if the MIME type is interesting, create a Sample
  • Extract email attachments for Collectors
  • Outlook email parser
  • Office document parser
    • VBA Macro extraction & deobfuscation
  • YARA scan
  • ELF parser
  • Mach-O parser
  • Strings extraction enhancements
    • Find interesting patterns (emails, URLs, IPs, domains, BTC address, phone number)
    • Enrich extracted interesting patterns (hosts, URLs, IPs, emails)
  • VirusTotal plugin enhancements:
    • Get report if it exists
    • Optionally submit file for analysis if no report found (Aleph might be used in sensitive envs where not all files should be uploaded)
  • VTI daily check (requires paid account)
    • Clear notifications once completed
  • HTML parser enhancements
    • Extract links / urls
      • If URL matches pre-defined MIME types to keep, save as sample
      • Maybe crawl found links for more files with interesting MIME types and create child Samples
  • Check hosts against reputation databases
    • threatexpert, FireHOL lists, VT host check, ShadowServer whitelist check (there are too many choices to list)
  • Submit samples to free online sandboxes or a local installation via python-sandboxapi
  • Zip and GZip enhancements
    • Tokenize emails and use the tokens as a brute-force password list
    • Let user define list of keywords in a config file to try as passwords
  • Ability to define SOCKS5 proxy for web requests
  • Scriptlet file parsers (HTA, SCT, XML, WS, etc)
    • Either as direct submission or as a child
    • If as child, feed back to Collector for pipeline processing
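The "Extract Base64 from files" bullet is straightforward to sketch: find long base64 runs, decode, and keep payloads whose magic bytes look interesting. The run-length threshold and magic-byte set below are assumptions, not Aleph's configuration:

```python
import base64
import re

B64_RUN = re.compile(rb"[A-Za-z0-9+/]{40,}={0,2}")

# Magic-byte prefixes worth re-injecting as child samples (illustrative).
INTERESTING_MAGIC = (b"MZ", b"\x7fELF", b"PK\x03\x04", b"%PDF")

def extract_base64_children(data: bytes) -> list:
    """Find long base64 runs in a buffer, decode them, and return any
    payload whose magic bytes match an interesting file type."""
    children = []
    for match in B64_RUN.finditer(data):
        run = match.group(0)
        run = run[: len(run) - len(run) % 4]  # trim to a decodable length
        try:
            decoded = base64.b64decode(run)
        except Exception:
            continue
        if decoded.startswith(INTERESTING_MAGIC):
            children.append(decoded)
    return children
```

Each returned payload would be fed back to the Collector as a child Sample, as the scriptlet bullet above suggests.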

Decoders (subset of plugin to run under certain conditions?)

  • Base64
  • Reverse Base64
  • Hex to binary decoding
    Note: These are based on the Paste scraper finding these types of encoded files
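The three decoders above can be tried in sequence until one yields bytes. A sketch; the ordering and function name are assumptions:

```python
import base64
import binascii

def try_decoders(text: str):
    """Try base64, reversed base64, then hex decoding in order;
    return (decoder_name, payload) for the first that succeeds,
    or None if nothing decodes. Sketch of the Decoders idea above."""
    attempts = [
        ("base64", lambda s: base64.b64decode(s, validate=True)),
        ("base64_reversed", lambda s: base64.b64decode(s[::-1], validate=True)),
        ("hex", lambda s: binascii.unhexlify(s)),
    ]
    for name, decode in attempts:
        try:
            return name, decode(text.strip())
        except (binascii.Error, ValueError):
            continue
    return None
```

Note that a string can be valid in more than one encoding (e.g. `deadbeef` is both hex and base64), so in practice the winning decoder would likely also be sanity-checked against the decoded payload's magic bytes.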

Export Options

  • Send to Elasticsearch
  • Send to Splunk
  • Save JSON to disk as storage
  • Send to S3 bucket as storage
  • Send to SQS so user can integrate with other systems and workflows
  • Send to DigitalOcean Spaces as storage

@merces
Owner

merces commented Dec 31, 2018

Wow. This is a lot of good ideas indeed! Thanks for that, @deadbits!

@jseidl We can leverage the "Development" branch, as it isn't being used by anyone. Would you be able to upload this new code there by the end of next week? I think we should leverage the energy @deadbits is willing to put into this and start as soon as possible. 🙂

Thank you all!

@deadbits
Author

deadbits commented Jan 3, 2019

@merces
Definitely a lot of ideas, heh. I don't know how many fit with the direction of this project, and imho some would be higher priority than others. Not to mention implementing all of them would take quite some time.

There's also a handful of open-source Python libraries I have in mind to lean on for some of the ideas, so it's not all code written from scratch. I think a decent amount of them are quick wins, while others require more major work.

Regardless, I'm definitely up to help out in any way I can, and to work with you and @jseidl to figure out what should be kept or scrapped, what should be prioritized, etc.

@jseidl
Contributor

jseidl commented Jan 3, 2019 via email

@deadbits
Author

deadbits commented Jan 3, 2019 via email

@jseidl
Contributor

jseidl commented Jan 3, 2019 via email

@jseidl
Contributor

jseidl commented Jan 4, 2019 via email

@jseidl
Contributor

jseidl commented Jan 4, 2019

Ok, attaching PDF from mail didn't work. Uploaded to my Drive here: https://drive.google.com/open?id=1lvNFhJcguHfLgXHm865XXWVnfahTQcOA

@deadbits
Author

deadbits commented Jan 5, 2019

Also, I'd like all the collectors to first save the sample locally and then consume the local file into the transport, to avoid losing the sample in case the connection fails abruptly or something else weird happens during collection.
...
On the processor side, on starting up, reprocess samples left in the temp dir; delete from the temp dir only when making sure all data is stored on the backends.

I built a project with a similar architecture and this is definitely the best approach. I'm guessing you're already planning this, but storing locally by hash is a solid way to avoid collisions (instead of uuid4 or whatnot).

Basically:

  • Receive sample from wherever
  • Store locally with a unique file name
  • Put the file into transport
    • When you're sure it's stored on the backend DB (or at least accepted by the Consumer as an Object), delete it locally
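That flow is short enough to sketch directly. Here `transport` is assumed to be any object with a `send(path)` method returning whether the backend accepted the file; all names are illustrative:

```python
import hashlib
import os

def collect(raw: bytes, spool_dir: str, transport) -> str:
    """Spool a sample to disk (named by sha256 to avoid collisions)
    before handing it to the transport, so an abrupt connection
    failure never loses the sample. Sketch of the flow above."""
    os.makedirs(spool_dir, exist_ok=True)
    digest = hashlib.sha256(raw).hexdigest()
    path = os.path.join(spool_dir, digest)
    with open(path, "wb") as fh:
        fh.write(raw)           # durable local copy first
    if transport.send(path):    # delete only once the backend accepted it
        os.remove(path)
    return digest
```

On failure the file simply stays in the spool directory, which is exactly what lets the processor reprocess leftovers on startup.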

@jseidl We can schedule some time to sync up next week, maybe? Or even this weekend if that works for you. My weekday evenings are typically open; tomorrow I'm out most of the afternoon. Outside of that, I'm ready to get rolling 🚀

@jseidl
Contributor

jseidl commented Jan 5, 2019 via email

@deadbits
Author

deadbits commented Jan 5, 2019

Read through your presentation last night - good stuff! Overall it sounds like a really solid framework and the ideas on how to scale it, create the components separately, etc., are all awesome.

I saw you wrote that plugins would "run in order". I might have misread or skipped a part, but is the idea for plugins to run one at a time on any given Processor, or would the plugins for a MIME type run in parallel via threading/multiprocessing?


These are thoughts for way down the road, but I had them on my mind after reading your PDF:
Another idea could be to give the plugins an order of execution per MIME type, so each plugin can act on the results of the last. For example, maybe a Zip file comes in, so it hits the "brute_zip" plugin; inside is an executable, so the "yara_scan" plugin runs; the results of "yara_scan" say the executable is Trojan ABC, so the "malware_decoder" plugin runs, and then "extract_iocs" runs on the results of malware_decoder, and so on. That way you still get the results of all the plugins, but get to provide deeper levels of context, as opposed to, say, "if file == EXE, run strings and extract_iocs".

Basically, sending files down different plugin "paths" depending on their MIME type and any useful information from the previous plugin.
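The per-MIME-type ordering idea can be sketched as a registry of ordered plugin chains, where each plugin receives the accumulated results of the plugins before it. The registry contents, plugin names, and dispatcher are all hypothetical:

```python
# Hypothetical registry mapping a MIME type to an ordered plugin chain.
PIPELINES = {
    "application/zip": ["brute_zip", "yara_scan", "extract_iocs"],
    "application/x-dosexec": ["yara_scan", "strings", "extract_iocs"],
}

def run_pipeline(sample: dict, plugins: dict) -> dict:
    """Run the chain registered for the sample's MIME type in order,
    feeding each plugin the results so far, so e.g. a decoder can act
    on a YARA verdict. Sketch of the 'plugin paths' idea above."""
    results = {}
    for name in PIPELINES.get(sample["mimetype"], []):
        results[name] = plugins[name](sample, results)
    return results
```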


The malware framework FAME also has a pretty cool feature for its plugins, where a plugin inheriting the base class can use "acts_on", "generates", "triggered_by", and a few others. It's an interesting idea; it might be useful to think about how to implement something similar: "generates" alerts of various types, or "triggered_by" another module, as in my example above.
https://github.com/certsocietegenerale/fame/blob/ab0e9cc3640b2337dbd873a41e03987ba1ba8035/docs/modules.rst#scope
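A minimal sketch of that style of base class (the attribute names mirror FAME's docs linked above; the dispatch logic itself is an assumption, not FAME's or Aleph's code):

```python
class Plugin:
    """FAME-style plugin declaration: `acts_on` lists the MIME types
    the plugin handles, `triggered_by` names an upstream plugin whose
    result must already exist before this one runs."""
    acts_on = []         # MIME types this plugin accepts (empty = all)
    triggered_by = None  # upstream plugin that must have produced a result

    def should_run(self, mimetype: str, results: dict) -> bool:
        if self.acts_on and mimetype not in self.acts_on:
            return False
        if self.triggered_by and self.triggered_by not in results:
            return False
        return True

class MalwareDecoder(Plugin):
    # Hypothetical example: only runs on PEs, and only after yara_scan matched.
    acts_on = ["application/x-dosexec"]
    triggered_by = "yara_scan"
```

A dispatcher would then call `should_run` for each registered plugin after every pipeline step, which naturally produces the chained "paths" described in the previous comment.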

@jseidl
Contributor

jseidl commented Jan 5, 2019 via email

@deadbits
Author

deadbits commented Jan 5, 2019 via email

@jseidl
Contributor

jseidl commented Jan 5, 2019 via email

@deadbits
Author

deadbits commented Jan 5, 2019 via email

@deadbits
Author

deadbits commented Jan 6, 2019

@jseidl we'll have to use Meet, since Duo is mobile-only and doesn't support screen sharing, etc.
I just need your email address, or you can send me an invite to adam@deadbits.org for today at 3PM Eastern if that still works.

@deadbits
Author

We can probably close this at this point 😏
