Updated MSExcelDecoder to handle blank cell values (#384)
The objective is to support Excel files that have a few blank values within the used range.
Take the Excel file below as an example:

![image](https://github.com/microsoft/kernel-memory/assets/34688460/0b1ddbc2-4338-48c3-b423-88810fea1cb3)

- Currently, the code ignores blank cells, which shifts the remaining values in a row out of their columns.
The output for the Excel file above looks like this:
```
"EmployeeName", "EmployeeSalary", "EmployeeDepartment", "EmployeeCity", "EmployeeId"
"John", 1000, "Accounts", "New York", 1
"Jack", 2000, "Tax", "Bangalore", 2
"Ryan", 3000, "HR", "Tokyo", 3
"Rob", 2000, "Tax", 4
"Pablo", 5000, "Accounts", "Paris", 
```
(Note that Rob's employee city has been replaced by the employee ID 4.)

- With the proposed change, a placeholder value is substituted for every
blank cell inside the used range (`Blank` in the example below; in the
committed code, `DefaultBlankCellValue` is an empty string). Consumers can
override the default through the newly introduced `blankCellValue`
constructor parameter.

```
"EmployeeName", "EmployeeSalary", "EmployeeDepartment", "EmployeeCity", "EmployeeId"
"John", 1000, "Accounts", "New York", 1
"Jack", 2000, "Tax", "Bangalore", 2
"Ryan", 3000, "HR", "Tokyo", 3
"Rob", 2000, "Tax", Blank, 4
"Pablo", 5000, "Accounts", "Paris", 5
```
(Rob's employee city is now replaced by `Blank`, and the employee ID stays in its own column.)
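
The difference between the two outputs above can be sketched without ClosedXML. This is a minimal illustration only: `BlankCellDemo`, `FormatRow`, and `FormatRowSkippingBlanks` are hypothetical names that are not part of `MsExcelDecoder`, and the row is modeled as a plain list of nullable strings rather than as worksheet cells.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of MsExcelDecoder.
public static class BlankCellDemo
{
    // New behavior (sketch): every cell keeps its position, and null/empty
    // cells are replaced by a placeholder so the columns stay aligned.
    public static string FormatRow(
        IReadOnlyList<string?> cells,
        string columnSeparator = ", ",
        string blankCellValue = "")
    {
        return string.Join(
            columnSeparator,
            cells.Select(c => string.IsNullOrEmpty(c) ? blankCellValue : c));
    }

    // Old behavior (sketch): blank cells are dropped, so later values
    // shift left into the wrong column.
    public static string FormatRowSkippingBlanks(
        IReadOnlyList<string?> cells,
        string columnSeparator = ", ")
    {
        return string.Join(
            columnSeparator,
            cells.Where(c => !string.IsNullOrEmpty(c)));
    }
}
```

Calling `FormatRow` on Rob's row with `blankCellValue: "Blank"` keeps five columns, while `FormatRowSkippingBlanks` yields only four. With the real decoder, the placeholder would be supplied through the optional constructor parameter added in this commit, for example `new MsExcelDecoder(blankCellValue: "Blank")`.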

---------

Co-authored-by: Neelambuj Banerjee <Neelambuj.Banerjee@gds.ey.com>
Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
3 people committed Apr 8, 2024
1 parent 58b46eb commit a1b0ea6
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions service/Core/DataFormats/Office/MsExcelDecoder.cs
@@ -1,4 +1,4 @@
-// Copyright (c) Microsoft. All rights reserved.
+// Copyright (c) Microsoft. All rights reserved.

 using System;
 using System.IO;
@@ -15,6 +15,7 @@ public class MsExcelDecoder
     private const string DefaultRowPrefix = "";
     private const string DefaultColumnSeparator = ", ";
     private const string DefaultRowSuffix = "";
+    private const string DefaultBlankCellValue = "";

     private readonly bool _withWorksheetNumber;
     private readonly bool _withEndOfWorksheetMarker;
@@ -24,6 +25,7 @@ public class MsExcelDecoder
     private readonly string _rowPrefix;
     private readonly string _columnSeparator;
     private readonly string _rowSuffix;
+    private readonly string _blankCellValue;

     public MsExcelDecoder(
         bool withWorksheetNumber = true,
@@ -33,7 +35,8 @@ public class MsExcelDecoder
         string? endOfWorksheetMarkerTemplate = null,
         string? rowPrefix = null,
         string? columnSeparator = null,
-        string? rowSuffix = null)
+        string? rowSuffix = null,
+        string? blankCellValue = null)
     {
         this._withWorksheetNumber = withWorksheetNumber;
         this._withEndOfWorksheetMarker = withEndOfWorksheetMarker;
@@ -45,6 +48,7 @@ public class MsExcelDecoder
         this._rowPrefix = rowPrefix ?? DefaultRowPrefix;
         this._columnSeparator = columnSeparator ?? DefaultColumnSeparator;
         this._rowSuffix = rowSuffix ?? DefaultRowSuffix;
+        this._blankCellValue = blankCellValue ?? DefaultBlankCellValue;
     }

     public FileContent ExtractContent(string filename)
@@ -79,7 +83,7 @@ public FileContent ExtractContent(Stream data)
             {
                 if (row == null) { continue; }

-                var cells = row.CellsUsed().ToList();
+                var cells = row.Cells().ToList();

                 sb.Append(this._rowPrefix);
                 for (var i = 0; i < cells.Count; i++)
@@ -94,7 +98,7 @@ public FileContent ExtractContent(Stream data)
                     }
                     else
                     {
-                        sb.Append(cell.Value);
+                        sb.Append(cell.Value.IsBlank ? this._blankCellValue : cell.Value);
                     }

                     if (i < cells.Count - 1)