Assigning large matrices in R takes long when this extension is enabled #122

Open
flying-sheep opened this issue Dec 13, 2019 · 9 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

flying-sheep commented Dec 13, 2019

Hi! Thank you for making this work with my IRkernel, it’s great!

We have a problem with performance though. Please check out IRkernel/IRkernel#571 (comment) for context.

I assume it’s this line:

https://github.com/lckr/jupyterlab-variableInspector/blob/9ee08df7804285a8ef700cc63f7a23f779f3771e/src/inspectorscripts.ts#L265

But it could be something else. Can you figure it out?

flying-sheep commented Dec 16, 2019

I’m having a hard time trying to debug this. @jmdc-dkanazawa created a simpler way to reproduce this, though:

library(dplyr)  # provides bind_rows()
{myiris <- bind_rows(replicate(1000, iris, simplify = FALSE)); NULL}
# With the variable inspector active, there is a 10 second wait here
dim(myiris)

The wait occurs after the output of the first line is displayed and before the second line is executed.

flying-sheep changed the title from "load('thing.rda') takes long when this is enabled" to "Assigning large matrices in R takes long when this extension is enabled" on Dec 16, 2019
jmdc-dkanazawa commented Dec 16, 2019

Thanks @flying-sheep
The JupyterLab debug logs from running the code above are as follows.

DEBUG: Sending msg status
DEBUG: Sending msg execute_input
DEBUG: Executing code: start.time <- Sys.time()
{myiris <- bind_rows(replicate(1000, iris, simplify=F)); NULL}
dim(myiris)
DEBUG: Value output...
DEBUG: Value output...
DEBUG: Sending display_data: List of 1
 $ text/plain: chr "NULL"
DEBUG: Sending msg display_data
DEBUG: Value output...
DEBUG: Sending display_data: List of 4
 $ text/html    : chr "<ol class=list-inline>\n\t<li>150000</li>\n\t<li>5</li>\n</ol>\n"
 $ text/markdown: chr "1. 150000\n2. 5\n\n\n"
 $ text/latex   : chr "\\begin{enumerate*}\n\\item 150000\n\\item 5\n\\end{enumerate*}\n"
 $ text/plain   : chr "[1] 150000      5"
DEBUG: Sending msg display_data
DEBUG: Sending msg execute_reply
DEBUG: Sending msg status
DEBUG: main loop: beginning
DEBUG: main loop: after poll. ZMQ code: 1; Errno: 11
DEBUG: main loop: shell
DEBUG: Sending msg status
DEBUG: Sending msg execute_input
DEBUG: Executing code: .ls.objects()
DEBUG: Stream output:      varType varSize     varShape
1   function   21912     1  x  NA
2   function    2472     1  x  NA
3   function   14504     1  x  NA
4   function    1904     1  x  NA
5   function    2480     1  x  NA
6 data.frame 5401856 150000  x  5
7   function    4088     1  x  NA
8    POSIXct     344     1  x  NA
9   function  137520     1  x  NA
                                                                    varContent
1                                          function (fit, standardized = TRUE)
2                                            function (..., deparse.level = 2)
3                                                          function (x, n = 6)
4                                function (x, n = 60, ncol = 10, byrow = TRUE)
5                                                               function (str)
6 Column names:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
7                                                      function (actual, pred)
8                                                          2019-12-16 14:17:57
9                            function (object, digits = 3, latex = FALSE, ...)
  isMatrix            varName
1    FALSE   bob3_lavaan_plot
2    FALSE         crossTable
3    FALSE              mhead
4    FALSE             mxhead
5    FALSE    myDisplayModify
6    FALSE             myiris
7    FALSE                RSQ
8    FALSE         start.time
9    FALSE summary.CrossTable

DEBUG: Sending msg stream
DEBUG: Value output...
DEBUG: Sending display_data: List of 1
 $ text/plain: chr "[{\"varType\":\"function\",\"varSize\":21912,\"varShape\":\"1  x  NA\",\"varContent\":\"function (fit, standard"| __truncated__
DEBUG: Sending msg display_data
DEBUG: Sending msg execute_reply
DEBUG: Sending msg status
DEBUG: main loop: beginning
  • With the replicate count raised to 10,000, it took about 3 minutes.
{myiris <- bind_rows(replicate(10000, iris, simplify=F)); NULL}
  • My environment
    • Windows10 64bit
    • python 3.7.4
    • jupyter 1.0.0
    • jupyterlab 1.2.4
    • IRkernel 1.1

izahn commented Jan 16, 2021

Does this still happen with the current version? I cannot reproduce it.

sands58 commented Jan 23, 2021

> Does this still happen with the current version? I cannot reproduce it.

Yes

@lckr lckr added bug Something isn't working help wanted Extra attention is needed labels Jan 26, 2021
TTTPOB commented Jul 8, 2021

Same here: the R kernel gets blocked for quite a while after I create a big object.

@ekungurov

Still reproduces. Should it be fixed in IRkernel or in jupyterlab-variableInspector?

@flying-sheep
Author

I’m almost sure it needs to be fixed here, maybe because of https://github.com/lckr/jupyterlab-variableInspector/blob/9ee08df7804285a8ef700cc63f7a23f779f3771e/src/inspectorscripts.ts#L265

IRkernel uses the repr package, which doesn’t create giant representations of things. But that print statement might take a while to execute, for example if its output is captured.
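One way to test this hypothesis outside the extension is to time a captured `print()` of a large data frame against a bounded one-line summary. This is only a sketch in base R: it assumes the slow step is capturing printed output, which this thread has not confirmed. `summarise_bounded` and its `max_chars` parameter are hypothetical names, not part of the extension or of IRkernel.

```r
# Build a 150000 x 5 data frame with base R only (no dplyr needed).
big <- iris[rep(seq_len(nrow(iris)), times = 1000), ]

# Suspected slow path (assumption): capture the printed output of the object.
t_print <- system.time(
  full <- utils::capture.output(print(big))
)[["elapsed"]]

# Hypothetical fast path: build a short summary whose length is bounded.
summarise_bounded <- function(x, max_chars = 100L) {
  txt <- if (is.data.frame(x)) {
    paste("Column names:", paste(names(x), collapse = ", "))
  } else {
    # str() at depth 0 gives a one-line structural overview of other objects.
    paste(
      utils::capture.output(utils::str(x, max.level = 0L, give.attr = FALSE)),
      collapse = " "
    )
  }
  substr(txt, 1L, max_chars)
}

t_bounded <- system.time(
  short <- summarise_bounded(big)
)[["elapsed"]]

# Compare the two elapsed times (in seconds) on your own machine.
cat("captured print:", t_print, "s; bounded summary:", t_bounded, "s\n")
```

If the captured `print()` accounts for most of the delay, the gap between the two timings should grow with the number of rows. If both are fast, the bottleneck is probably elsewhere in the inspector script.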

Collaborator

lckr commented Jul 13, 2022

I'm not too fluent in R. If you could confirm that changing the R script (e.g., to use repr instead of print) solves or at least reduces the problem, I'm happy to incorporate the necessary changes.

flying-sheep commented Jul 14, 2022

I’m not using R these days myself. Maybe @ekungurov can do it.

What we’re trying to do is to find the content that gets huge, and either prevent that or only use fast code to handle it.

You could try using profvis to check which line really takes most time.
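A profiling run along those lines might look like the sketch below. `inspect_stub` is a hypothetical stand-in for the inspector's per-variable work; the extension's real R script is not reproduced here and would have to be pasted in its place. The sketch uses profvis when it is installed and otherwise falls back to base R's `Rprof`.

```r
# Large test object, built with base R only.
big <- iris[rep(seq_len(nrow(iris)), times = 1000), ]

# Hypothetical stand-in for what an inspector might compute per variable.
inspect_stub <- function(x) {
  list(
    size    = as.numeric(utils::object.size(x)),
    shape   = dim(x),
    content = utils::capture.output(print(x))
  )
}

if (requireNamespace("profvis", quietly = TRUE)) {
  # Interactive flame graph; print(p) opens the viewer.
  p <- profvis::profvis(inspect_stub(big))
} else {
  # Base R sampling profiler as a fallback.
  prof_file <- tempfile(fileext = ".out")
  utils::Rprof(prof_file, interval = 0.01)
  invisible(inspect_stub(big))
  utils::Rprof(NULL)
  # Functions ranked by time spent in their own code.
  print(head(utils::summaryRprof(prof_file)$by.self))
}
```

Either output ranks calls by time spent, which should show whether the printing step or something else dominates. Both profilers are sampling-based, so a call that finishes in a few milliseconds may yield no samples and needs a larger input.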

7 participants