Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading an excel (xlsx) file takes too much time to complete in debug mode in WPF Net 5 environment #543

Closed
marlonchosky opened this issue Apr 29, 2021 · 9 comments
Labels

Comments

@marlonchosky
Copy link

I have been writing and testing a code that reads an excel file and traverses the rows of an excel sheet, from the first column to the fifth column.

Running the project in non-debug mode makes the reading fast, about 1 second for a file with a total of 10400 rows.

The problem I have is that when I run the project in debug mode, the code that does this reading takes a long, long time to finish reading the excel rows. I have not let that the code finishes its execution, but I have waited more than 1 minute and it still did not finish reading the file.

Please, if anyone can know why is that the behavior of the code, since apparently it is a bug but I am not sure.

I have posted a github repository with a project to replicate the described problem.

The repository is here

Inside is the WPF project called PruebaDeLecturaDeExcel.
The code is inside the DeserealizarArchivo() method of the MainWindow class.

Inside the project there is an excel file to test.

@marlonchosky marlonchosky changed the title Reading an excel (xlsx) file takes too much time in to complete in debug mode in WPF Net 5 environment Reading an excel (xlsx) file takes too much time to complete in debug mode in WPF Net 5 environment Apr 29, 2021
@marlonchosky
Copy link
Author

So, the code executes quickly in debug mode when the line that invokes the ObtenerFecha() method is commented. But, I don't understand why is that?

@tonyqus
Copy link
Member

tonyqus commented May 28, 2021

How many data you have in this Excel? If it's just simple and limited data, can you test the same Excel reading code in console app instead of WPF app. As I know, there should be not much difference. Only the UI thread may be blocked.

@drmason789
Copy link

Possibly related: #564

@drmason789
Copy link

drmason789 commented May 30, 2021

I had a look at this...in debug mode throwing exceptions is very expensive time-wise. Your code is repeatedly trying to get the datetime value of the cell, which throws an InvalidOperationException, which is then caught, then it reads the string value of the cell and parses that.

It' much quicker to simply check the cell type first and act accordingly:

   private static DateTime ObtenerFecha(ICell celda) {
            if (celda.CellType == CellType.String)
                return DateTime.ParseExact(celda.StringCellValue, "dd/MM/yyyy", CultureInfo.InvariantCulture);
            if (celda.CellType == CellType.Numeric)
                return celda.DateCellValue;
            throw new InvalidOperationException("Tipo de celda inesperado."); // from Google translate
        }

@marlonchosky
Copy link
Author

marlonchosky commented Jun 1, 2021

How many data you have in this Excel? If it's just simple and limited data, can you test the same Excel reading code in console app instead of WPF app. As I know, there should be not much difference. Only the UI thread may be blocked.

@tonyqus As I mentioned in the first comment, the excel has around 10400 rows. But the problem is not the quantity, since in non-debug mode it is processed fast (close to 1 second). The problem comes when it runs in debug mode.

@marlonchosky
Copy link
Author

marlonchosky commented Jun 1, 2021

I had a look at this...in debug mode throwing exceptions is very expensive time-wise. Your code is repeatedly trying to get the datetime value of the cell, which throws an InvalidOperationException, which is then caught, then it reads the string value of the cell and parses that.

@drmason789 Interesting solution. I didn't know that you could compare the cell type with CellType.Numeric to get its value with the DateCellValue property. Thank you!

Possibly related: #564

I checked the issue you referenced in the previous comment and yes, they are related.

I had a look at this...in debug mode throwing exceptions is very expensive time-wise.

However, why are exceptions time consuming in debug mode? Is there an article that mentions the impact of using exceptions in debug mode?

@drmason789
Copy link

Approximately, when the runtime encounters the exception, it pauses the application and notifies the debugger which can then decide if it wants to break on that exception or not.

https://devblogs.microsoft.com/devops/performance-improvement-when-debugging-net-code-with-visual-studio-2015/

I believe the problem you are experiencing is not a problem with NPOI, so I think this issue should be closed.

On the point of the numeric cell type to get the date, when Excel knows the cell content is a date, it stores it as a double - the number of days since Jan-01-1900 (or 1904 if you've got that option configured). It is rendered to a cell based on the cell formatting, which can be the user's local date format, meaning the same date in the same workbook can display differently in different UI culture settings.

https://docs.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system

@marlonchosky
Copy link
Author

marlonchosky commented Jun 1, 2021

Approximately, when the runtime encounters the exception, it pauses the application and notifies the debugger which can then decide if it wants to break on that exception or not.

https://devblogs.microsoft.com/devops/performance-improvement-when-debugging-net-code-with-visual-studio-2015/

@drmason789 I will review the link article. Thanks

On the point of the numeric cell type to get the date, when Excel knows the cell content is a date, it stores it as a double - the number of days since Jan-01-1900 (or 1904 if you've got that option configured). It is rendered to a cell based on the cell formatting, which can be the user's local date format, meaning the same date in the same workbook can display differently in different UI culture settings.

https://docs.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system

I will also review the link article.

I believe the problem you are experiencing is not a problem with NPOI, so I think this issue should be closed.

I believe too. I will finish reading both articles and based on them I will close the issue.

@marlonchosky
Copy link
Author

marlonchosky commented Jun 1, 2021

@drmason789 After reading the links and answers so far, I can conclude:

On the point of the numeric cell type to get the date, when Excel knows the cell content is a date, it stores it as a double - the number of days since Jan-01-1900 (or 1904 if you've got that option configured). It is rendered to a cell based on the cell formatting, which can be the user's local date format, meaning the same date in the same workbook can display differently in different UI culture settings.

It is a very good solution that you offered to validate the part of my code that checks if a cell is date, as opposed to doing it with exceptions like I did it.
Thanks for the solution offered 👍

I believe the problem you are experiencing is not a problem with NPOI, so I think this issue should be closed.

You are right, this issue is more related to how Visual Studio behaves when running an application with a debugger attached, and it is not related to NPOI, so I am going to close this issue.

Thanks for your time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants