Non-ASCII characters can't be output in CSV. #81

Closed
TakayukiTomatsuri opened this issue Sep 29, 2021 · 1 comment · Fixed by #83
Labels: Bug

TakayukiTomatsuri commented Sep 29, 2021

Describe the bug
Non-ASCII characters can't be output to CSV.
HAWK writes its CSV files in ASCII, so only ASCII characters survive.

All non-ASCII characters, such as Chinese, Japanese, or Russian text, are converted to ? symbols.

The cause is that Out-MultipleFileType.ps1 calls the Export-Csv cmdlet without specifying an encoding, so the cmdlet falls back to its default, ASCII.
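
The substitution can be illustrated outside HAWK with .NET's ASCII encoder. This is a minimal sketch, assuming Export-Csv's ASCII output behaves like System.Text.Encoding.ASCII, which is consistent with the ? symbols described here. The sample string (three katakana characters followed by abc1) and the variable names are only for illustration.

```powershell
# Build the sample from code points so the script does not depend on
# how the .ps1 file itself is encoded (U+30C6, U+30B9, U+30C8 are katakana).
$subject = "$([char]0x30C6)$([char]0x30B9)$([char]0x30C8)abc1"

# Encoding.ASCII cannot represent the three katakana characters, so its
# default replacement fallback writes a '?' byte for each of them.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($subject)
$asciiText  = [System.Text.Encoding]::ASCII.GetString($asciiBytes)   # '???abc1'

# UTF-16LE (named "Unicode" in .NET and PowerShell) keeps every character.
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($subject)
$utf16Text  = [System.Text.Encoding]::Unicode.GetString($utf16Bytes)
```

The ASCII round trip loses the original characters for good, which is why the damage cannot be repaired after the CSV file has been written.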

According to the Microsoft Docs and my research, each cmdlet uses the following encodings by default.

| Cmdlet | Default encoding |
| --- | --- |
| Export-Csv | ASCII |
| Export-Csv (with -Append) | UTF-8 without BOM. If the target file already contains a BOM, the existing encoding is matched; if there is no BOM, UTF-8 is used. |
| Export-CliXml | UTF-16LE with BOM |
| Out-File | UTF-16LE with BOM |

Currently, HAWK writes xml and txt output as UTF-16LE (BOM) and csv output as ASCII.

So HAWK's xml and txt files can contain non-ASCII characters, but its csv files can't.
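
One way to check which encoding an output file actually uses is to look at its byte order mark. The helper below is a hypothetical sketch and not part of HAWK; the name Get-BomName is my own. It can only distinguish files that carry a BOM, so an ASCII file and a UTF-8 file without a BOM both report no BOM.

```powershell
function Get-BomName {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Resolve to a full path first: .NET's working directory can differ
    # from PowerShell's current location.
    $bytes = [System.IO.File]::ReadAllBytes((Convert-Path -LiteralPath $Path))

    # UTF-8 BOM: EF BB BF
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return 'UTF-8 (BOM)'
    }
    # UTF-16LE BOM: FF FE (a UTF-32LE file starts the same way; not handled here)
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) {
        return 'UTF-16LE (BOM)'
    }
    # UTF-16BE BOM: FE FF
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) {
        return 'UTF-16BE (BOM)'
    }
    return 'No BOM'
}
```

Running Get-BomName -Path against one of the xml or txt outputs and then against a csv output should show the difference described above.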

To fix it

This can be resolved by calling the Export-Csv cmdlet with the encoding option, -Encoding <encoding name>.

I recommend UTF-16LE (BOM), which is selected with -Encoding Unicode.
(However, UTF-8 (BOM) may work better with Excel.)

The pros and cons of each encoding are as follows.

| Encoding | Pros | Cons |
| --- | --- | --- |
| UTF-16LE (BOM) | Same encoding as the XML and TXT files in HAWK's output. Most tools handle it correctly. | Excel can't open a UTF-16 CSV file properly on double-click; it shows the file as not delimited. This reproduces on the current version, too. However, the From Text/CSV import button imports such CSV files correctly, so it doesn't seem to be a big deal. |
| UTF-8 (BOM) | Excel handles it correctly. | Some tools don't expect a UTF-8 BOM and run into trouble. |
| UTF-16LE (no BOM) | - | Can't be specified as an encoding option in PowerShell v5. |
| UTF-8 (no BOM) | - | Can't be specified as an encoding option in PowerShell v5. |
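
A minimal sketch of the proposed change, assuming the fix amounts to passing -Encoding at each Export-Csv call. The record, the property values, and the output path are hypothetical stand-ins; the actual code in Out-MultipleFileType.ps1 is not reproduced here.

```powershell
# Build the sample subject from code points so the script does not depend
# on how the .ps1 file itself is encoded.
$subject = "$([char]0x30C6)$([char]0x30B9)$([char]0x30C8)abc1"

# Hypothetical stand-in for one audit record; HAWK's real objects have
# many more properties.
$record = [pscustomobject]@{
    Operation   = 'MoveToDeletedItems'
    ItemSubject = $subject
}

$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'hawk_encoding_demo.csv'

# -Encoding Unicode selects UTF-16LE with a BOM.
# For UTF-8 with a BOM, Windows PowerShell 5.1 uses -Encoding UTF8,
# while PowerShell 7 uses -Encoding utf8BOM.
$record | Export-Csv -Path $outPath -NoTypeInformation -Encoding Unicode

# Read the file back with the same encoding to confirm nothing was lost.
$roundTrip = Import-Csv -Path $outPath -Encoding Unicode
$subjectSurvived = ($roundTrip.ItemSubject -ceq $subject)
```

If $subjectSurvived is true, the non-ASCII subject made it through the export unchanged. Swapping in the UTF-8 variant only changes the -Encoding values on the two cmdlet calls.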

To Reproduce
Steps to reproduce the behavior:

  1. Receive or create an email whose subject contains non-ASCII characters, for example テストabc1.
  2. Delete the email.
  3. Wait until the delete operation has been logged.
  4. Run the HAWK command Get-HawkUserInvestigation <your mail address>
  5. Open the result file Exchange_Mailbox_Audit_<username>.csv.

The CSV file is ASCII-encoded, and the log record for the delete operation contains ? symbols.
All non-ASCII characters have been replaced with ?.

"PSComputerName","RunspaceId","PSShowComputerName","Operation","OperationResult","LogonType","ExternalAccess","DestFolderId","DestFolderPathName","FolderId","FolderPathName","FolderName","MemberRights","MemberSid","MemberUpn","ClientInfoString","ClientIPAddress","ClientIP","ClientMachineName","ClientProcessName","ClientVersion","InternalLogonType","MailboxOwnerUPN","MailboxOwnerSid","DestMailboxOwnerUPN","DestMailboxOwnerSid","DestMailboxGuid","CrossMailboxOperation","LogonUserDisplayName","LogonUserSid","SourceItems","SourceFolders","SourceItemIdsList","SourceItemSubjectsList","SourceItemAttachmentsList","SourceItemFolderPathNamesList","SourceFolderPathNamesList","SourceItemInternetMessageIdsList","ItemId","ItemSubject","ItemAttachments","ItemInternetMessageId","DirtyProperties","OriginatingServer","SessionId","OperationProperties","AuditOperationsCountInAggregatedRecord","AggregatedRecordFoldersData","AppId","ClientAppId","ItemIsRecord","ItemComplianceLabel","MailboxGuid","MailboxResolvedOwnerName","LastAccessed","Identity","IsValid","ObjectState"
"outlook.office365.com","1111111-dummy","FALSE","MoveToDeletedItems","Succeeded","Owner","FALSE","LgAAAAAAAAAAAAAAAAAADUMY","\????????","LgAAAAAAAAAAAAAAAAA","\?????","","","","","Client=OWA;Action=ViaProxy","2001:db8::","2001:db8::","","","","Owner","user1@example.com","S-1-1111111111DUMY","","","","FALSE","user1","S-1-1111111111DUMY","RgAAAAAAADUMY","","RgAAAAAAADUMY","???abc1","","?????","","<dumy@mail.gmail.com>","","","","","","OS1P123456 (10.00.000.000)","c1111-1111","","","","00000002-0000-0000-000-000000000000","","","","aa111-0000","user1","2021/9/28 18:08","AAAAA=","TRUE","New"

Expected (better) behavior
HAWK should output CSV files that preserve non-ASCII characters, such as テストabc1.

"PSComputerName","RunspaceId","PSShowComputerName","Operation","OperationResult","LogonType","ExternalAccess","DestFolderId","DestFolderPathName","FolderId","FolderPathName","FolderName","MemberRights","MemberSid","MemberUpn","ClientInfoString","ClientIPAddress","ClientIP","ClientMachineName","ClientProcessName","ClientVersion","InternalLogonType","MailboxOwnerUPN","MailboxOwnerSid","DestMailboxOwnerUPN","DestMailboxOwnerSid","DestMailboxGuid","CrossMailboxOperation","LogonUserDisplayName","LogonUserSid","SourceItems","SourceFolders","SourceItemIdsList","SourceItemSubjectsList","SourceItemAttachmentsList","SourceItemFolderPathNamesList","SourceFolderPathNamesList","SourceItemInternetMessageIdsList","ItemId","ItemSubject","ItemAttachments","ItemInternetMessageId","DirtyProperties","OriginatingServer","SessionId","OperationProperties","AuditOperationsCountInAggregatedRecord","AggregatedRecordFoldersData","AppId","ClientAppId","ItemIsRecord","ItemComplianceLabel","MailboxGuid","MailboxResolvedOwnerName","LastAccessed","Identity","IsValid","ObjectState"
"outlook.office365.com","1111111-dummy","FALSE","MoveToDeletedItems","Succeeded","Owner","FALSE","LgAAAAAAAAAAAAAAAAAADUMY","\削除済みアイテム","LgAAAAAAAAAAAAAAAAA","\受信トレイ","","","","","Client=OWA;Action=ViaProxy","2001:db8::","2001:db8::","","","","Owner","user1@example.com","S-1-1111111111DUMY","","","","FALSE","user1","S-1-1111111111DUMY","RgAAAAAAADUMY","","RgAAAAAAADUMY","テストabc1","","受信トレイ","","<dummy@mail.gmail.com>","","","","","","OS1P123456 (10.00.000.000)","c1111-1111","","","","00000002-0000-0000-000-000000000000","","","","aa111-0000","user1","2021/9/28 18:08","AAAAA=","TRUE","New"

Screenshots
N/A

File (please complete the following information):

  • N/A

Additional context
N/A

TakayukiTomatsuri added the Bug label Sep 29, 2021
T0pCyber added a commit that referenced this issue Jan 21, 2022
Fix encoding of outputs from the Export-Csv cmdlet #81
T0pCyber added a commit that referenced this issue Feb 17, 2022
…_Csv

Revert "Fix encoding of outputs from the Export-Csv cmdlet #81"
T0pCyber added a commit that referenced this issue Apr 7, 2022
…Csv_UTF8BOM

Fix encoding of outputs from the Export-Csv cmdlet with UTF8(BOM). #81
T0pCyber linked a pull request Apr 9, 2022 that will close this issue
T0pCyber mentioned this issue Apr 9, 2022
T0pCyber self-assigned this Apr 9, 2022

T0pCyber commented Apr 9, 2022

Updated in 3.0.0 and merged to master. Also updated in the Gallery.

T0pCyber closed this as completed Apr 9, 2022