Commit
Showing 16 changed files with 209 additions and 25 deletions.
@@ -56,6 +56,8 @@ PATH
  specs:
    excel_analyzer (0.0.1)
      activestorage
      mahoro
      rubyXL
      rubyzip
@@ -0,0 +1,7 @@
require "excel_analyzer"

ExcelAnalyzer.on_spreadsheet_received = ->(raw_email_blob) do
  incoming_message = IncomingMessage.joins(raw_email: :file_blob).
    find_by(active_storage_blobs: { id: raw_email_blob })
  incoming_message&.parse_raw_email!
end
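The initializer above maps the analyzed email blob back to the `IncomingMessage` stored in it and re-parses that message; the `&.` operator makes the hook a no-op when no message matches. A minimal plain-Ruby sketch of that lookup, assuming in-memory `Struct` stand-ins: `RawEmail`, `Message` and `find_message_for_blob` are hypothetical and only approximate the Active Record join.

```ruby
# Hypothetical in-memory stand-ins for the Active Record models; only the
# attributes the lookup needs are modelled.
RawEmail = Struct.new(:file_blob_id)

Message = Struct.new(:raw_email, :parsed) do
  # Stand-in for IncomingMessage#parse_raw_email!.
  def parse_raw_email!
    self.parsed = true
  end
end

# Approximates joins(raw_email: :file_blob).find_by(active_storage_blobs: ...):
# returns the first message whose raw email is stored in the given blob, or nil.
def find_message_for_blob(messages, blob_id)
  messages.find { |message| message.raw_email.file_blob_id == blob_id }
end

messages = [
  Message.new(RawEmail.new(1), false),
  Message.new(RawEmail.new(2), false)
]

on_spreadsheet_received = ->(blob_id) do
  # `&.` skips the call entirely when no message owns the blob.
  find_message_for_blob(messages, blob_id)&.parse_raw_email!
end

on_spreadsheet_received.call(2)  # parses the second message
on_spreadsheet_received.call(99) # no owner: returns nil, nothing is parsed
```

Keeping the lookup inside the host application's initializer, rather than in the gem, is what lets the gem stay unaware of `IncomingMessage`.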
@@ -1,3 +1,30 @@
require "excel_analyzer/eml_analyzer"
require "excel_analyzer/xls_analyzer"
require "excel_analyzer/xlsx_analyzer"
require "excel_analyzer/railtie" if defined?(Rails)

##
# This module provides functionality to analyze Excel files, particularly to
# detect hidden data within spreadsheet attachments in emails. It supports .xls
# and .xlsx file formats.
module ExcelAnalyzer
  # A configurable callable that gets executed when an email with a spreadsheet
  # attachment is analyzed. This allows for custom handling of the spreadsheet
  # data.
  #
  # @example Set a custom callable to handle received spreadsheets
  #   ExcelAnalyzer.on_spreadsheet_received = ->(blob) { process(blob) }
  #
  # @!attribute [rw] on_spreadsheet_received
  #   @return [Proc] the callable to run for spreadsheet attachments
  mattr_accessor :on_spreadsheet_received, default: ->(blob) {}

  # Provides the list of content types that the ExcelAnalyzer will attempt to
  # analyze in search of hidden data. It currently includes content types for
  # .xls and .xlsx files.
  #
  # @return [Array<String>] the list of supported spreadsheet content types
  def self.content_types
    [XlsAnalyzer::CONTENT_TYPE, XlsxAnalyzer::CONTENT_TYPE]
  end
end
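`mattr_accessor` with a `default:` lambda gives the module a replaceable hook that does nothing until an application assigns one. The sketch below reproduces that pattern in plain Ruby so it runs without ActiveSupport. `SpreadsheetHooks` is a hypothetical name, and the two MIME strings are assumptions, since the values of `XlsAnalyzer::CONTENT_TYPE` and `XlsxAnalyzer::CONTENT_TYPE` are not shown in this commit.

```ruby
# Hypothetical plain-Ruby analogue of ExcelAnalyzer's configuration surface.
module SpreadsheetHooks
  # Assumed MIME types; the commit does not show the real CONTENT_TYPE values.
  XLS_CONTENT_TYPE = "application/vnd.ms-excel"
  XLSX_CONTENT_TYPE =
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

  # No-op default, like `default: ->(blob) {}` in the mattr_accessor call.
  @on_spreadsheet_received = ->(_blob) {}

  class << self
    # Module-level reader and writer (mattr_accessor additionally defines
    # instance readers and stores the value in a class variable).
    attr_accessor :on_spreadsheet_received

    def content_types
      [XLS_CONTENT_TYPE, XLSX_CONTENT_TYPE]
    end
  end
end

# The default hook accepts a blob and does nothing.
default_result = SpreadsheetHooks.on_spreadsheet_received.call("ignored")

# An application swaps in its own handler, as the Rails initializer does.
received = []
SpreadsheetHooks.on_spreadsheet_received = ->(blob) { received << blob }
SpreadsheetHooks.on_spreadsheet_received.call("blob-1")
```

Defaulting to a no-op callable means analyzers can call the hook unconditionally without a `nil` check.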
@@ -0,0 +1,39 @@
require "mail"

require "active_storage"
require "active_storage/analyzer"

require "mail_handler"

module ExcelAnalyzer
  ##
  # The EmlAnalyzer class extends the ActiveStorage::Analyzer to define a custom
  # analysis process for EML files. It checks for the presence of attachments
  # with content types associated with spreadsheet formats and invokes a
  # callback if necessary.
  class EmlAnalyzer < ActiveStorage::Analyzer
    CONTENT_TYPE = "message/rfc822"

    def self.accept?(blob)
      blob.content_type == CONTENT_TYPE
    end

    def metadata
      download_blob_to_tempfile do |file|
        mail = Mail.read(file.path)

        content_types = MailHandler.get_attachment_attributes(mail).map do
          _1[:content_type]
        end

        if content_types.any? { ExcelAnalyzer.content_types.include?(_1) }
          # rubocop:disable Style/RescueModifier
          ExcelAnalyzer.on_spreadsheet_received.call(blob) rescue nil
          # rubocop:enable Style/RescueModifier
        end
      end

      {}
    end
  end
end
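`EmlAnalyzer#metadata` makes a single yes/no decision per email: if any attachment has a spreadsheet content type, it calls the hook once with the email blob, ignores any exception the hook raises, and always returns empty metadata. A framework-free sketch of that decision, assuming attachments are hashes with a `:content_type` key (as the `_1[:content_type]` mapping implies) and using assumed .xls/.xlsx MIME strings; `notify_if_spreadsheet` is a hypothetical helper, not part of the gem.

```ruby
# Assumed .xls and .xlsx MIME types (the real constants are not in this diff).
SPREADSHEET_CONTENT_TYPES = [
  "application/vnd.ms-excel",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
].freeze

# Hypothetical helper mirroring EmlAnalyzer#metadata without Mail or
# ActiveStorage: `attachments` are hashes with a :content_type key.
def notify_if_spreadsheet(attachments, blob, callback)
  content_types = attachments.map { |attachment| attachment[:content_type] }

  if content_types.any? { |type| SPREADSHEET_CONTENT_TYPES.include?(type) }
    begin
      # One call per email, however many spreadsheets it carries.
      callback.call(blob)
    rescue StandardError
      # Like `rescue nil`: a failing hook must not break blob analysis.
      nil
    end
  end

  {} # the analyzer adds no metadata of its own
end

calls = []
recorder = ->(blob) { calls << blob }
xls_and_xlsx = [
  { content_type: "application/vnd.ms-excel" },
  { content_type: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" }
]

notify_if_spreadsheet(xls_and_xlsx, :email_blob, recorder) # calls recorder once
notify_if_spreadsheet([{ content_type: "text/plain" }], :other, recorder) # no call

failing = ->(_blob) { raise "boom" }
failing_result = notify_if_spreadsheet(xls_and_xlsx, :email_blob, failing)
```

Using `any?` rather than iterating over attachments is what guarantees the "once only" behaviour exercised by the last spec context.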
gems/excel_analyzer/spec/excel_analyzer/eml_analyzer_spec.rb (85 changes: 85 additions, 0 deletions)
@@ -0,0 +1,85 @@
# frozen_string_literal: true

require "spec_helper"
require_relative "../support/helpers"

RSpec.describe ExcelAnalyzer::EmlAnalyzer do
  describe ".accept?" do
    subject { ExcelAnalyzer::EmlAnalyzer.accept?(blob) }

    context "when the blob is an email" do
      let(:blob) { fake_blob(content_type: "message/rfc822") }
      it { is_expected.to eq true }
    end

    context "when the blob is not an email" do
      let(:blob) { fake_blob(content_type: "text/plain") }
      it { is_expected.to eq false }
    end
  end

  describe "#metadata" do
    around do |example|
      original_callback = ExcelAnalyzer.on_spreadsheet_received
      ExcelAnalyzer.on_spreadsheet_received = ->(blob) {}
      example.call
      ExcelAnalyzer.on_spreadsheet_received = original_callback
    end

    let(:mail) do
      Mail.new { add_file File.join(__dir__, "../fixtures/plain.txt") }
    end

    let(:io) { double(:File, path: "blob/path") }
    let(:blob) { fake_blob(io: io, content_type: "message/rfc822") }

    subject(:metadata) { ExcelAnalyzer::EmlAnalyzer.new(blob).metadata }

    before { allow(Mail).to receive(:read).with("blob/path").and_return(mail) }

    it { is_expected.to eq({}) }

    context "when mail contains XLS attachment" do
      let(:mail) do
        Mail.new { add_file File.join(__dir__, "../fixtures/data.xls") }
      end

      it { is_expected.to eq({}) }

      it "calls on_spreadsheet_received callback" do
        expect(ExcelAnalyzer.on_spreadsheet_received).
          to receive(:call).with(blob)
        metadata
      end
    end

    context "when mail contains XLSX attachment" do
      let(:mail) do
        Mail.new { add_file File.join(__dir__, "../fixtures/data.xlsx") }
      end

      it { is_expected.to eq({}) }
      it "calls on_spreadsheet_received callback" do
        expect(ExcelAnalyzer.on_spreadsheet_received).
          to receive(:call).with(blob)
        metadata
      end
    end

    context "when mail contains XLS and XLSX attachment" do
      let(:mail) do
        Mail.new do
          add_file File.join(__dir__, "../fixtures/data.xls")
          add_file File.join(__dir__, "../fixtures/data.xlsx")
        end
      end

      it { is_expected.to eq({}) }
      it "calls on_spreadsheet_received callback once only" do
        expect(ExcelAnalyzer.on_spreadsheet_received).
          to receive(:call).with(blob).once
        metadata
      end
    end
  end
end
@@ -0,0 +1,5 @@
def fake_blob(io: nil, content_type:)
  dbl = double(io: io, content_type: content_type)
  allow(dbl).to receive(:open).and_yield(io)
  dbl
end
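`fake_blob` works because, in recent Rails versions, `ActiveStorage::Analyzer#download_blob_to_tempfile` passes its block to `blob.open(tmpdir: ...)`, so the double only needs `content_type` and an `open` that yields the fake IO. A plain-Ruby equivalent that spells out that contract without RSpec doubles; `FakeBlob` and `FakeIO` are hypothetical names.

```ruby
# Hypothetical stand-ins for the RSpec doubles built by fake_blob.
FakeIO = Struct.new(:path)

class FakeBlob
  attr_reader :io, :content_type

  def initialize(content_type:, io: nil)
    @io = io
    @content_type = content_type
  end

  # Ignores any options (download_blob_to_tempfile passes tmpdir:) and yields
  # the fake IO, like `allow(dbl).to receive(:open).and_yield(io)`.
  def open(**_options)
    yield io
  end
end

blob = FakeBlob.new(io: FakeIO.new("blob/path"), content_type: "message/rfc822")

# What the analyzer effectively does: open the blob and read the file's path.
seen_path = blob.open(tmpdir: "/tmp") { |file| file.path }
```

Because the spec stubs `Mail.read` for `"blob/path"`, the fake IO never has to contain real email bytes.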