Testing

Bruce Allen edited this page Apr 1, 2015 · 42 revisions

There are several scripts in the tests directory that are useful for testing bulk_extractor. These need to be documented here.

Regression Testing

Regression testing plan for release

  • Compile with and without optimization and make sure the results are the same.

  • Compare results (with --diff) from last release and current release and report on the differences.
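Both steps in the plan come down to comparing the feature files from two output directories. The following is a minimal sketch of such a comparison, not the project's own tooling. It assumes that each run wrote its feature files as `.txt` files into one directory and that header or comment lines start with `#` (so version banners do not show up as differences). The function names are hypothetical.

```python
from pathlib import Path


def load_features(path):
    """Return the non-comment, non-blank lines of one feature file as a set."""
    lines = set()
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Assumption: header/comment lines start with '#'.
            if line and not line.startswith("#"):
                lines.add(line)
    return lines


def compare_runs(old_dir, new_dir):
    """Map each differing .txt file name to (only_in_old, only_in_new)."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    names = {p.name for p in old_dir.glob("*.txt")}
    names |= {p.name for p in new_dir.glob("*.txt")}
    report = {}
    for name in sorted(names):
        old_path, new_path = old_dir / name, new_dir / name
        # A file missing from one run is treated as having no features.
        old = load_features(old_path) if old_path.exists() else set()
        new = load_features(new_path) if new_path.exists() else set()
        if old != new:
            report[name] = (old - new, new - old)
    return report
```

An empty result from `compare_runs(dir_a, dir_b)` means the two runs produced the same features. Comparing sets ignores line order, which may be preferable when runs are multithreaded and write features in a different order each time.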

Data

The tests/Data directory contains test data files used to verify the behavior of scanners and, in regression testing, to compare the results of running different versions of bulk_extractor on the same test data. Testers should be familiar with all the bulk_extractor scanners and with the feature files to which the scanners write the data they find. For each feature file, we list in parentheses the scanner(s) that write to the file. The next section lists the test files in Data and, in parentheses, the scanner(s) that should find data in each test file. The last section lists each scanner and the feature files and test files associated with it.

Bulk_Extractor 1.5 Scanners

  • accts
  • aes
  • base16 (Disabled by default)
  • base64
  • elf
  • email
  • exif
  • facebook (Disabled by default)
  • find
  • gps
  • gzip
  • hashdb (Disabled by default)
  • hiberfile
  • httplogs
  • json
  • kml
  • net
  • outlook (Disabled by default)
  • pdf
  • rar
  • sceadan (Disabled by default)
  • sqlite
  • vcard
  • windirs
  • winlnk
  • winpe
  • winprefetch
  • wordlist (Disabled by default)
  • xor (Disabled by default)
  • zip

Scanners Not Compiled By Default

To use the lightgrep scanner, you must execute the configure script with --enable-lightgrep. Then execute make to build bulk_extractor with the lightgrep scanner included.

  • lightgrep

Experimental Scanners

The following scanners are experimental and/or not in current use.

  • ascii85
  • bulk (basis of sceadan scanner, but has been deprecated and is out of use)
  • extx (aborted attempt to find ext2/ext3/ext4 directory entries)
  • exiv2 (replaced by our exif scanner; had crashed; is deprecated at present)
  • httpheaders (current implementation causes hangs and possibly crashes)
  • lift (bulk analysis tool prior to sceadan; deprecated and out of use)
  • pipe (not in use)

Feature Files

  • aes_keys.txt (aes)
  • alerts.txt
  • ccn.txt (accts)
  • ccn_track2.txt (accts)
  • domain.txt (email)
  • elf.txt (elf)
  • email.txt (email)
  • ether.txt (net, email)
  • exif.txt (exif, exiv2)
  • facebook.txt (facebook)
  • find.txt (find)
  • gps.txt (gps, exif, exiv2)
  • hex.txt (base16)
  • httplogs.txt (httplogs)
  • identified_blocks.txt (hashdb)
  • ip.txt (net)
  • jpeg_carved.txt (exif)
  • json.txt (json)
  • kml.txt (kml)
  • lightgrep.txt (lightgrep)
  • packets.pcap (net)
  • pii.txt (accts)
  • pii_teamviewer.txt
  • rar.txt (rar)
  • rfc822.txt (email)
  • sceadan.txt (sceadan)
  • sqlite.txt (sqlite)
  • sqlite_carved.txt (sqlite)
  • telephone.txt (accts)
  • unrar_carved.txt (rar)
  • unzip_carved.txt (zip)
  • url.txt (email)
  • url_facebook-address.txt (email)
  • url_facebook-id.txt (email)
  • url_microsoft-live.txt (email)
  • url_searches.txt (email)
  • url_services.txt (email)
  • vcard.txt (vcard)
  • windirs.txt (windirs)
  • winlnk.txt (winlnk)
  • winpe.txt (winpe)
  • winpe_carved.txt (winpe)
  • winprefetch.txt (winprefetch)
  • wordlist.txt (wordlist)
  • zip.txt (zip)
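The list above is keyed by feature file, while the per-scanner section later on this page needs the opposite direction. A minimal sketch of deriving one from the other is shown here. The dictionary holds only a few entries copied from the list above, and the names `FEATURE_FILE_WRITERS` and `by_scanner` are hypothetical.

```python
from collections import defaultdict

# A few entries copied from the list above: feature file -> scanners.
FEATURE_FILE_WRITERS = {
    "ccn.txt": ["accts"],
    "domain.txt": ["email"],
    "ether.txt": ["net", "email"],
    "exif.txt": ["exif", "exiv2"],
    "gps.txt": ["gps", "exif", "exiv2"],
    "ip.txt": ["net"],
}


def by_scanner(writers):
    """Invert a feature-file -> scanners map into scanner -> feature files."""
    index = defaultdict(list)
    for feature_file, scanners in writers.items():
        for scanner in scanners:
            index[scanner].append(feature_file)
    # Sort both levels so the output is stable between runs.
    return {scanner: sorted(files) for scanner, files in sorted(index.items())}
```

Keeping a single table and generating the inverse may help avoid the two lists drifting apart as scanners are added or retired.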

Feature Files Generated by Experimental Scanners

The scanners that generate the following feature files are experimental or no longer in use.

  • bulk.txt (bulk)
  • extx.txt (extx)
  • httpheader.txt (httpheader)
  • lift_tags.txt (lift)
  • pipe.txt (pipe)
  • tcp.txt (created with experimental tcp carving code only on memory structures)

Other Files

Scanners can also write out histogram files. Is there just a subset of scanners that do this?
The histogram files mentioned in the manual include:

  • ccn_histogram.txt
  • ccn_histogram_track2.txt
  • domain_histogram.txt
  • email_domain_histogram.txt
  • email_histogram.txt
  • ether_histogram.txt
  • find_histogram.txt
  • ip_histogram.txt
  • lightgrep_histogram.txt
  • tcp_histogram.txt
  • telephone_histogram.txt
  • url_histogram.txt
  • wordlist_histogram.txt

  • report.sqlite
  • report.xml

Regression Test Files

These are the test files in git repository path bulk_extractor/tests/Data.
I am trying to note the scanner(s) that read each file in parentheses to the right.

Note: The purpose of these test files still needs to be identified and associated with one or more scanners.

  • 5.psd
  • ansi.E01
  • base64.eml (base64)
  • base64.emlx (base64)
  • beth.odt
  • bitcoin.txt
  • bitlocker.tar
  • credit_card_numbers.htm (accts)
  • deployPkg.dll.lnk (winlnk?)
  • FIREFOX.EXE-18ACFCFF.pf
  • german_ansi.E01
  • german_utf8.E01
  • kml_samples.E01 (kml)
  • MEGATRON-psd7909
  • mywinprefetch_cat (winprefetch)
  • nps-2010-emails.E01 (email)
  • NTLM-wenchao.pcap (net)
  • pdf_fragment (pdf)
  • rar_samples.tar (rar)
  • skipped-packets.bin (net)
  • ssn_test.txt (accts)
  • test-acct.txt (accts)
  • test-urls.txt (email)
  • testfile2_ANSI.txt
  • testfile2_UTF-8.txt
  • utf8-examples.txt
  • testpage.bin
  • utf8-examples.html
  • utf8-examples.rtf

Map each scanner to associated feature files and test files

Note: This needs to be corrected and updated still.

  • Scanner accts (looks for phone numbers, credit card numbers, etc.)

    • Feature files: ccn.txt ccn_track2.txt domain.txt pii.txt telephone.txt

    • Test files: credit_card_numbers.htm ssn_test.txt test-acct.txt

  • Scanner aes (detects in-memory AES keys from their key schedules.)

    • Feature files: aes_keys.txt

    • Test files:

  • Scanner base16 (decodes hexadecimal text - disabled by default.)

    • Feature files: hex.txt

    • Test files:

  • Scanner base64 (decodes BASE64 text.)

    • Feature files:

    • Test files: base64.eml base64.emlx

  • Scanner elf (detects and decodes ELF headers.)

    • Feature files: elf.txt

    • Test files:

  • Scanner email (Description Needed)

    • Feature files: domain.txt email.txt ether.txt rfc822.txt url.txt url_facebook-address.txt url_facebook-id.txt url_microsoft-live.txt url_searches.txt url_services.txt

    • Test files: nps-2010-emails.E01 test-urls.txt

  • Scanner exif (decodes EXIF headers in JPEGs using built-in decoder.)

    • Feature files: exif.txt gps.txt jpeg_carved.txt

    • Test files:

  • Scanner facebook (detects Facebook HTML - disabled by default.)

    • Feature files: facebook.txt

    • Test files:

  • Scanner find (searches on keywords.)

    • Feature files: find.txt

    • Test files:

  • Scanner gps (detects XML from Garmin GPS devices.)

    • Feature files: gps.txt

    • Test files:

  • Scanner gzip (detects and decompresses GZIP files and gzip streams.)

    • Feature files: zip.txt

    • Test files:

  • Scanner hashdb (searches for sector hashes or makes a sector hash database - disabled by default.)

    • Feature files: identified_blocks.txt

    • Test files:

  • Scanner hiberfile (detects and decompresses Windows hibernation fragments.)

    • Feature files:

    • Test files:

  • Scanner httplogs (searches for web server logs.)

    • Feature files: httplogs.txt

    • Test files:

  • Scanner json (detects JavaScript Object Notation files.)

    • Feature files: json.txt

    • Test files:

  • Scanner kml (detects KML files.)

    • Feature files: gps.txt kml.txt

    • Test files: kml_samples.E01

  • Scanner lightgrep (Description Needed)

    • Feature files:

    • Test files:

  • Scanner net (IP packet scanning and carving.)

    • Feature files: domain.txt ether.txt ip.txt packets.pcap tcp.txt

    • Test files: NTLM-wenchao.pcap skipped-packets.bin

  • Scanner outlook (decrypts Outlook Compressible Encryption - disabled by default.)

    • Feature files:

    • Test files:

  • Scanner pdf (extracts text from some kinds of PDF files.)

    • Feature files:

    • Test files: pdf_fragment

  • Scanner rar (RAR files)

    • Feature files: rar.txt

    • Test files: rar_samples.tar

  • Scanner sceadan ( - disabled by default)

    • Feature files:

    • Test files:

  • Scanner sqlite (SQLite3 databases - only if they are contiguous.)

    • Feature files: sqlite.txt

    • Test files:

  • Scanner vcard (carves VCARD files.)

    • Feature files: vcard.txt

    • Test files:

  • Scanner windirs (Windows directory entries)

    • Feature files: windirs.txt

    • Test files:

  • Scanner winlnk (Windows LNK files)

    • Feature files: winlnk.txt

    • Test files: deployPkg.dll.lnk

  • Scanner winpe (Extracts information about Windows Portable Executable files.)

    • Feature files: winpe.txt

    • Test files:

  • Scanner winprefetch (extracts fields from Windows prefetch files and file fragments.)

    • Feature files: winprefetch.txt

    • Test files: mywinprefetch_cat

  • Scanner wordlist (builds word list for password cracking - disabled by default.)

    • Feature files: wordlist.txt

    • Test files:

  • Scanner xor (detects XOR obfuscation - disabled by default.)

    • Feature files:

    • Test files:

  • Scanner zip (detects and decompresses ZIP files and zlib streams.)

    • Feature files: zip.txt

    • Test files:
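Once the mapping above is corrected, it can drive a simple check: run bulk_extractor on a scanner's test file, then confirm that each expected feature file contains at least one feature. The sketch below covers only the checking step and is not part of the project's test scripts. It assumes that comment lines in feature files start with `#`, and the `EXPECTED` table holds just three entries copied from the mapping above. All names are hypothetical.

```python
from pathlib import Path

# Entries copied from the mapping above; extend as the mapping is corrected.
EXPECTED = {
    "kml": ["gps.txt", "kml.txt"],
    "rar": ["rar.txt"],
    "winprefetch": ["winprefetch.txt"],
}


def has_features(path):
    """True if the file exists and holds at least one non-comment line."""
    path = Path(path)
    if not path.is_file():
        return False
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # Assumption: header/comment lines start with '#'.
        return any(line.strip() and not line.startswith("#") for line in handle)


def missing_features(out_dir, scanner, expected=EXPECTED):
    """List the expected feature files that are absent or empty in out_dir."""
    return [
        name
        for name in expected[scanner]
        if not has_features(Path(out_dir) / name)
    ]
```

An empty list from `missing_features(out_dir, "kml")` would indicate that every feature file expected for that scanner was populated. A non-empty list points at either a regression or an entry in the mapping that still needs correcting.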

Experimental Scanners

  • Scanner ascii85 ( )

    • Feature files:
  • Scanner bulk (replaced by sceadan)

    • Feature files: bulk.txt
  • Scanner extx (attempts to find ext2/ext3/ext4 directory entries, but is not in use)

    • Feature files: extx.txt
  • Scanner exiv2 (decodes EXIF headers in JPEGs using libexiv2 - for regression testing.)

    • Feature files: exif.txt gps.txt
  • Scanner httpheader (searches for web server logs - hangs and might crash)

    • Feature files: httpheader.txt
  • Scanner lift (analyzes bulk data - prior to Sceadan and is deprecated)

    • Feature files: lift_tags.txt
  • Scanner pipe (is not in use)

    • Feature files: pipe.txt