
Slow edits and completions when editing Python code #210528

Open
DonJayamanne opened this issue Mar 14, 2024 · 97 comments
Assignees
Labels
bug (Issue identified by VS Code Team member as probable bug) · perf

Comments

@DonJayamanne
Contributor

DonJayamanne commented Mar 14, 2024

Reported by @ale-dg here #206119 (comment)

Yes, but note that I was not using Pylance. Would you like me to test with it? I just enabled it and opened the Python log trace, and indexing alone exhausts its memory.

2024-03-13 17:51:01.092 [info] [Info - 17:51:01] (4192) Heap stats: total_heap_size=1220MB, used_heap_size=1156MB, total_physical_size=1218MB, total_available_size=2900MB, heap_size_limit=4096MB

2024-03-13 17:51:01.108 [info] [Warn - 17:51:01] (4192) Workspace indexing has hit its upper limit: 2000 files

@DonJayamanne DonJayamanne added the bug label (Issue identified by VS Code Team member as probable bug) Mar 14, 2024
@DonJayamanne DonJayamanne changed the title Pylance chewing up resources when running cells in Jupyter notebooks Resource usage of Pylance slowing Jupyter notebooks Mar 14, 2024
@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-jupyter Mar 14, 2024
@rchiodo
Contributor

rchiodo commented Mar 14, 2024

How could Pylance be interfering with Jupyter execution? Unless the user has only a single core and not enough memory? In that case it would be up to the user to decide whether to turn off Pylance or Jupyter.

@ale-dg

ale-dg commented Mar 14, 2024

Hi @rchiodo

I honestly don't know. Everything has been a mystery, as it happens to a lot of users, especially with large notebooks and lots of markdown cells.

These are my Mac specs:

System Info
Item Value
CPUs Apple M1 Pro (10 x 24)
GPU Status 2d_canvas: enabled
canvas_oop_rasterization: enabled_on
direct_rendering_display_compositor: disabled_off_ok
gpu_compositing: enabled
multiple_raster_threads: enabled_on
opengl: enabled_on
rasterization: enabled
raw_draw: disabled_off_ok
skia_graphite: disabled_off
video_decode: enabled
video_encode: enabled
webgl: enabled
webgl2: enabled
webgpu: enabled
Load (avg) 3, 3, 3
Memory (System) 16.00GB (5.96GB free)
Process Argv
Screen Reader no
VM 0%

These are the Jupyter logs I retrieved for @DonJayamanne with Pylance disabled, with and without markdown cells.

1-Jupyter-no-MD.log
1-Jupyter-with-MD.log

With MD it takes a bit longer to run and the computer warms up a little; this was with Pylance deactivated. What most people have been reporting is Python/Jupyter being slow, and only a couple of us have actually found that Pylance was using a lot of resources (see #206119 (comment), #206119 (comment) and microsoft/pylance-release#5614). So, yet again, no idea which one it is.

I also tried to retrieve the Pylance logs; just opening VS Code produces a log file of around 30,000 lines, and some of its final lines are the ones above. Below you can find the log file.

Pylance.log

Let me know if I can be of any help (although I only know Python for DS 😜)

Best

@DonJayamanne
Contributor Author

@rchiodo I assumed that the numbers were off, and based on the comments from @ale-dg, disabling Pylance improved things.
However, there are other things that slow Jupyter down as well; Pylance was reported as being one of them. E.g. if CPU/memory is heavily used, VS Code tends to slow down, which in turn slows down the Jupyter extension and others.

Feel free to close this issue if there's nothing to be done here, we're already looking into other issues reported by the user.

@ale-dg

ale-dg commented Mar 14, 2024

Hi,

Before closing the issue, I was about to comment that I just ran the large notebook with markdowns again, with these extensions:

Extensions (4)
Extension Author (truncated) Version
python ms- 2024.2.1
vscode-pylance ms- 2024.3.1
jupyter ms- 2024.2.0
jupyter-renderers ms- 1.0.17

After getting to where I wanted (around code cell 152), when typing a function (e.g. OrdinalEncoder(), or any other from a library), the mini-window that pops up with hints loads VERY slowly compared with the previous versions I downgraded to, and it begins to lag. Also, just typing the opening parenthesis stalls everything else while the window loads (for around 3-5s). Then, if we are lucky, it doesn't crash or lag further.

The same happens for the "auto-complete" function (not sure what its name is... but it is the one that appears below the code for choosing a previously defined function or variable or something else).

Best

@DonJayamanne
Contributor Author

@ale-dg Please can you confirm that disabling the Pylance extension makes it faster,
i.e. that the following issue no longer exists:

the mini-window that pops up with hints loads VERY slowly compared with the previous versions…

Yes, you will no longer get completions, but that's a different matter. However, if it is still NOT faster, then we know for a fact that Pylance is not causing any delays and we can close this issue.

@DonJayamanne DonJayamanne changed the title Resource usage of Pylance slowing Jupyter notebooks Possibly high Resource usage of Pylance Mar 14, 2024
@ale-dg

ale-dg commented Mar 14, 2024

@DonJayamanne it ran at the same speed, but the lag while typing just went away. Would it be worth closing this issue, as it is actually a bug with Pylance lagging VS Code?

Best

@DonJayamanne
Contributor Author

but the lag with typing just went away.

Let's leave this issue open; basically, completions are slow.

@DonJayamanne DonJayamanne changed the title Possibly high Resource usage of Pylance Slow completions Mar 14, 2024
@ale-dg

ale-dg commented Mar 14, 2024

Well... just for the record, when I turned Pylance back on, I tried restarting the kernel to see what happens, and all of VS Code got stuck for some seconds (I couldn't even scroll) and then it began to lag. This was reported in one of the other issues as well (and sometimes it crashes).

I also tried it without Pylance and it didn't crash. It just took its time to start running the notebook again.

Best

P.S. Also the "Go To" button still doesn't work...

@debonte debonte added the perf label Mar 14, 2024
@heejaechang

@ale-dg can you try this? https://github.com/microsoft/pylance-release/wiki/Collecting-data-for-an-investigation.#collecting-cpuprofiles

and provide us with *.cpuprofiles?

By the way, to make things simpler, first try it with python.analysis.indexing: false and try to repro the slow completion. It will help us see whether it is actually Pylance completion that is slow, and why.

thank you

@heejaechang

Ah, one more thing: if you are seeing the memory pressure log, your workspace might be too big for our default setup (running on VS Code's node).

Heap stats: total_heap_size=1220MB, used_heap_size=1156MB, total_physical_size=1218MB, total_available_size=2900MB, heap_size_limit=4096MB

Try the new setting (https://github.com/microsoft/pylance-release/pull/5602/files) we just added in 2024.3.100; Pylance will get a much larger memory space than the default.

@rchiodo
Contributor

rchiodo commented Mar 14, 2024

My guess is this is a duplicate of the completion problem we found with Jupyter completions:
https://github.com/microsoft/pyrx/issues/4663

@ale-dg

ale-dg commented Mar 14, 2024

My guess is this is a duplicate of the completion problem we found with Jupyter completions: https://github.com/microsoft/pyrx/issues/4663

I'd gladly confirm if it's the same, but it shows me a 404 error when opening.

@ale-dg can you try this? https://github.com/microsoft/pylance-release/wiki/Collecting-data-for-an-investigation.#collecting-cpuprofiles

and provide us with *.cpuprofiles?

By the way, to make things simpler, first try it with python.analysis.indexing: false and try to repro the slow completion. It will help us see whether it is actually Pylance completion that is slow, and why.

thank you

I'll try it after the request from @bschnurr here: microsoft/pylance-release#5614 (comment)

@ale-dg

ale-dg commented Mar 14, 2024

Sorry... this might be quite a simple question, but I am not getting the "Pylance: start profiling" option. How do I get it working?

Screenshot 2024-03-14 at 1 52 41 PM

Forget it... it requires version .100 and I have .1... let me update.

@ale-dg

ale-dg commented Mar 14, 2024

Hi,

I just completed the cpuprofile. I forgot to set python.analysis.indexing: false because I have been making so many changes and tests that I no longer know what I have changed, so my apologies. You will see a long time gap somewhere between 14:00 and 14:12 (or thereabouts); that is how long the notebook takes to run. After that it is all just typing to add some functions, and it lagged.

Just for fun, I tried to open the profiles with VS Code and it crashed 🤣🫠, so that may be something you'd like to check as well.

Best

profiles.zip

@rchiodo
Contributor

rchiodo commented Mar 14, 2024

The profiles don't show completions taking any time. Was it slow while you were typing? It shows about 150ms for completions to come back:

image

This likely means the slowdown is not in Pylance but somewhere else. I'm guessing it's this code here:
https://github.com/microsoft/vscode-jupyter/blob/df25cd4ba2d39227ff186c36b01e1a629e7dee88/src/standalone/intellisense/kernelCompletionProvider.ts#L96

That code gets completions from us but also combines them with ones from the kernel. If the kernel is slow (and I believe you said somewhere that it was slow running cells?), then completions overall would be slow.
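For illustration, a minimal TypeScript sketch (hypothetical names, not the actual vscode-jupyter code) of why combining the two sources makes the slower one dominate: when both providers are awaited together, the visible latency is that of the slowest one, so a busy kernel hides Pylance's ~150ms response.

type CompletionItem = { label: string };

async function pylanceCompletions(): Promise<CompletionItem[]> {
  // fast path: roughly the ~150ms seen in the cpuprofile
  return [{ label: 'fit' }, { label: 'transform' }];
}

async function kernelCompletions(): Promise<CompletionItem[]> {
  // slow path: a kernel busy executing cells may take seconds to reply
  await new Promise(resolve => setTimeout(resolve, 3000));
  return [{ label: 'categories_' }];
}

async function mergedCompletions(): Promise<CompletionItem[]> {
  // Promise.all resolves only once BOTH providers have answered,
  // so the user-visible latency is max(pylance, kernel) ≈ 3000ms here
  const [fromPylance, fromKernel] = await Promise.all([pylanceCompletions(), kernelCompletions()]);
  return [...fromPylance, ...fromKernel];
}

Bounding the kernel provider with a timeout would cap the worst case, but that is a design choice for the extension, not something the profile shows.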

@ale-dg

ale-dg commented Mar 14, 2024

They become slow after executing the cells. So, if I type without executing anything, they feel normal, like in old versions. When I begin executing cells, the lag begins, something like this:

  1. type ord = OrdinalEncoder(
  2. lag lag lag lag lag lag lag
  3. hint window pops
  4. lag lag lag lag
  5. type han
  6. lag lag lag lag lag lag lag
  7. auto complete appears
  8. lag lag lag lag
  9. allows to select or finishes the sentence
  10. lag lag lag lag lag lag

.. and so on for the rest of the time you are working on the file

Hope it makes sense

@ale-dg

ale-dg commented Mar 14, 2024

It makes everything feel off, like there is a complete disconnect between the keyboard, the screen, VS Code, etc.

@rchiodo
Contributor

rchiodo commented Mar 14, 2024

That sounds to me like it's the kernel completions then. If it works fine without running any cells, then just Pylance is involved. If it slows down only when the kernel is involved, then it's both Pylance and the Jupyter Kernel providing the completions. Given the cpu profile you sent, Pylance isn't taking any time to compute completions, so I'm going to transfer this issue to the Jupyter extension.

@rchiodo rchiodo transferred this issue from microsoft/pylance-release Mar 14, 2024
@ale-dg

ale-dg commented Apr 15, 2024

@DonJayamanne @mjbvz I did some last month... would these work? See them in this comment: #210528

Best

EDIT: I meant... I did some regarding Pylance...

@ale-dg

ale-dg commented Apr 15, 2024

Also there are some logs here, should you want to check them as well (coming from another issue microsoft/pylance-release#5614)

microsoft/pylance-release#5614 (comment)

microsoft/pylance-release#5614 (comment)

microsoft/pylance-release#5614 (comment)

@ale-dg

ale-dg commented Apr 16, 2024

Would this be related as well? Please see microsoft/pylance-release#5748 and microsoft/pylance-release#5173

@ale-dg

ale-dg commented Apr 16, 2024

@DonJayamanne @mjbvz @heejaechang as you can see above, I have cross-referenced this issue from another one in Pylance. I attached the same log (from the stable versions of VS Code, Jupyter, Pylance, etc.), since even just opening the same notebook as above made Pylance crash 🫠

Hope it helps

Best

Pylance.log

@DonJayamanne
Contributor Author

@mjbvz
I think I can replicate this (not sure), and I can see 300ms spent rendering the parameter hints on my machine.
I would like to think I have a fairly beefy machine, and 300ms is a lot when typing.

Screenshot 2024-04-17 at 11 30 34

& then I get a message similar to the following in the dev tools:
WARN [perf] Renderer reported VERY LONG TASK (297ms), starting profiling session '44cde74f-3e84-495a-bcca-b1688b64c7cb'

Here are the packages that need to be installed:

pip install ipykernel scikit-learn seaborn statsmodels plotly catppuccin-matplotlib pingouin

Here is the first cell:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
import plotly.express as px
import plotly.graph_objects as go
import mplcatppuccin
import warnings
from scipy.stats import iqr
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pingouin import ttest
from sklearn.preprocessing import (
    StandardScaler,
    Normalizer,
    PowerTransformer,
    QuantileTransformer,
    RobustScaler,
    FunctionTransformer,
    MinMaxScaler,
    MaxAbsScaler,
)
from sklearn.pipeline import Pipeline
from statsmodels.graphics.gofplots import ProbPlot
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
warnings.simplefilter('ignore')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)
pd.set_option('display.colheader_justify', 'left')

mpl.style.use('latte')

and the second cell to replicate the issue:

from sklearn.preprocessing import OrdinalEncoder

ord = OrdinalEncoder()

Adding brackets or parameters to OrdinalEncoder in the last line causes issues.
Note: It's very difficult to replicate this issue; I say that because I can no longer repro it.
Here is the Markdown that I captured for two of the calls.

For some reason it gets called multiple times (I could see it being called almost 20-30 times),
all when I add just the opening ( bracket.

One of the Markdown contents passed into the sanitize call:

<p>Encode categorical features as an integer array.</p><p>The input to this transformer should be an array-like of integers or\nstrings, denoting the values taken on by categorical (discrete) features.\nThe features are converted to ordinal integers. This results in\na single column of integers (0 to n_categories - 1) per feature.</p><p>Read more in the <code>User Guide &lt;preprocessing_categorical_features&gt;</code>.\nFor a comparison of different encoders, refer to:<br><code>sphx_glr_auto_examples_preprocessing_plot_target_encoder.py</code>.</p><h2 id="parameters">Parameters</h2>\n<p>categories : &#39;auto&#39; or a list of array-like, default=&#39;auto&#39;<br>&nbsp;&nbsp;&nbsp;&nbsp;Categories (unique values) per feature:</p><ul>\n<li>&#39;auto&#39; : Determine categories automatically from the training data.</li>\n<li>list : <code>categories[i]</code> holds the categories expected in the ith\ncolumn. The passed categories should not mix strings and numeric\nvalues, and should be sorted in case of numeric values.</li>\n</ul>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;The used categories can be found in the <code>categories_</code> attribute.</p><p>dtype : number type, default=np.float64<br>&nbsp;&nbsp;&nbsp;&nbsp;Desired dtype of output.</p><p>handle_unknown : {&#39;error&#39;, &#39;use_encoded_value&#39;}, default=&#39;error&#39;<br>&nbsp;&nbsp;&nbsp;&nbsp;When set to &#39;error&#39; an error will be raised in case an unknown\ncategorical feature is present during transform. When set to\n&#39;use_encoded_value&#39;, the encoded value of unknown categories will be\nset to the value given for the parameter <code>unknown_value</code>. In<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>inverse_transform</code>, an unknown category will be denoted as None.</p><p>unknown_value : int or np.nan, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;When the parameter handle_unknown is set to &#39;use_encoded_value&#39;, this\nparameter is required and will set the encoded value of unknown\ncategories. It has to be distinct from the values used to encode any of\nthe categories in <code>fit</code>. If set to np.nan, the <code>dtype</code> parameter must\nbe a float dtype.</p><p>encoded_missing_value : int or np.nan, default=np.nan<br>&nbsp;&nbsp;&nbsp;&nbsp;Encoded value of missing categories. If set to <code>np.nan</code>, then the <code>dtype</code>\nparameter must be a float dtype.</p><p>min_frequency : int or float, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies the minimum frequency below which a category will be\nconsidered infrequent.</p><ul>\n<li><p>If <code>int</code>, categories with a smaller cardinality will be considered\ninfrequent.</p></li>\n<li><p>If <code>float</code>, categories with a smaller cardinality than\n<code>min_frequency * n_samples</code>  will be considered infrequent.</p></li>\n</ul>\n<p>max_categories : int, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies an upper limit to the number of output categories for each input\nfeature when considering infrequent categories. If there are infrequent\ncategories, <code>max_categories</code> includes the category representing the\ninfrequent categories along with the frequent categories. If <code>None</code>,\nthere is no limit to the number of output features.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;<code>max_categories</code> do **not** take into account missing or unknown\ncategories. 
Setting <code>unknown_value</code> or <code>encoded_missing_value</code> to an\ninteger will increase the number of unique integer codes by one each.\nThis can result in up to <code>max_categories + 2</code> integer codes.</p><h2 id="attributes">Attributes</h2>\n<p>categories_ : list of arrays<br>&nbsp;&nbsp;&nbsp;&nbsp;The categories of each feature determined during <code>fit</code> (in order of\nthe features in X and corresponding with the output of <code>transform</code>).\nThis does not include categories that weren&#39;t seen during <code>fit</code>.</p><p>n_features_in_ : int<br>&nbsp;&nbsp;&nbsp;&nbsp;Number of features seen during <code>fit</code>.</p><p>feature_names_in_ : ndarray of shape (<code>n_features_in_</code>,)<br>&nbsp;&nbsp;&nbsp;&nbsp;Names of features seen during <code>fit</code>. Defined only when <code>X</code>\nhas feature names that are all strings.</p><p>infrequent_categories_ : list of ndarray<br>&nbsp;&nbsp;&nbsp;&nbsp;Defined only if infrequent categories are enabled by setting\n<code>min_frequency</code> or <code>max_categories</code> to a non-default value.\n<code>infrequent_categories_[i]</code> are the infrequent categories for feature\n<code>i</code>. If the feature <code>i</code> has no infrequent categories\n<code>infrequent_categories_[i]</code> is None.</p><h2 id="see-also">See Also</h2>\n<p>OneHotEncoder : Performs a one-hot encoding of categorical features. This encoding<br>&nbsp;&nbsp;&nbsp;&nbsp;is suitable for low to medium cardinality categorical variables, both in\nsupervised and unsupervised settings.<br>TargetEncoder : Encodes categorical features using supervised signal<br>&nbsp;&nbsp;&nbsp;&nbsp;in a classification or regression pipeline. This encoding is typically\nsuitable for high cardinality categorical variables.<br>LabelEncoder : Encodes target labels with values between 0 and<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>n_classes-1</code>.</p><h2 id="notes">Notes</h2>\n<p>With a high proportion of <code>nan</code> values, inferring categories becomes slow with\nPython versions before 3.10. 
The handling of <code>nan</code> values was improved\nfrom Python 3.10 onwards, (c.f.<br><code>bpo-43475 &lt;https://github.com/python/cpython/issues/87641&gt;</code>_).</p><h2 id="examples">Examples</h2>\n<p>Given a dataset with two features, we let the encoder find the unique\nvalues per feature and transform the data to an ordinal encoding.</p><div class="code" data-code="id#108">&gt;&gt;&gt; from sklearn.preprocessing import OrdinalEncoder\n&gt;&gt;&gt; enc = OrdinalEncoder()\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', 2]]\n&gt;&gt;&gt; enc.fit(X)\nOrdinalEncoder()\n&gt;&gt;&gt; enc.categories_\n[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]\n&gt;&gt;&gt; enc.transform([['Female', 3], ['Male', 1]])\narray([[0., 2.],\n       [1., 0.]])</div><div class="code" data-code="id#109">&gt;&gt;&gt; enc.inverse_transform([[1, 0], [0, 1]])\narray([['Male', 1],\n       ['Female', 2]], dtype=object)</div><p>By default, <code>OrdinalEncoder</code> is lenient towards missing values by\npropagating them.</p><div class="code" data-code="id#110">&gt;&gt;&gt; import numpy as np\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', np.nan]]\n&gt;&gt;&gt; enc.fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., nan]])</div><p>You can use the parameter <code>encoded_missing_value</code> to encode missing values.</p><div class="code" data-code="id#111">&gt;&gt;&gt; enc.set_params(encoded_missing_value=-1).fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., -1.]])</div><p>Infrequent categories are enabled by setting <code>max_categories</code> or <code>min_frequency</code>.\nIn the following example, &quot;a&quot; and &quot;d&quot; are considered infrequent and grouped\ntogether into a single category, &quot;b&quot; and &quot;c&quot; are their own categories, unknown\nvalues are encoded as 3 and missing values are encoded as 4.</p><div class="code" data-code="id#112">&gt;&gt;&gt; X_train = np.array(\n...     [["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3 + [np.nan]],\n...     dtype=object).T\n&gt;&gt;&gt; enc = OrdinalEncoder(\n...     handle_unknown="use_encoded_value", unknown_value=3,\n...     max_categories=3, encoded_missing_value=4)\n&gt;&gt;&gt; _ = enc.fit(X_train)\n&gt;&gt;&gt; X_test = np.array([["a"], ["b"], ["c"], ["d"], ["e"], [np.nan]], dtype=object)\n&gt;&gt;&gt; enc.transform(X_test)\narray([[2.],\n       [0.],\n       [1.],\n       [2.],\n       [3.],\n       [4.]])</div>

Another sample of Markdown content passed into another call:

<p>Encode categorical features as an integer array.</p><p>The input to this transformer should be an array-like of integers or\nstrings, denoting the values taken on by categorical (discrete) features.\nThe features are converted to ordinal integers. This results in\na single column of integers (0 to n_categories - 1) per feature.</p><p>Read more in the <code>User Guide &lt;preprocessing_categorical_features&gt;</code>.\nFor a comparison of different encoders, refer to:<br><code>sphx_glr_auto_examples_preprocessing_plot_target_encoder.py</code>.</p><h2>Parameters</h2>\n<p>categories : 'auto' or a list of array-like, default='auto'<br>&nbsp;&nbsp;&nbsp;&nbsp;Categories (unique values) per feature:</p><ul>\n<li>'auto' : Determine categories automatically from the training data.</li>\n<li>list : <code>categories[i]</code> holds the categories expected in the ith\ncolumn. The passed categories should not mix strings and numeric\nvalues, and should be sorted in case of numeric values.</li>\n</ul>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;The used categories can be found in the <code>categories_</code> attribute.</p><p>dtype : number type, default=np.float64<br>&nbsp;&nbsp;&nbsp;&nbsp;Desired dtype of output.</p><p>handle_unknown : {'error', 'use_encoded_value'}, default='error'<br>&nbsp;&nbsp;&nbsp;&nbsp;When set to 'error' an error will be raised in case an unknown\ncategorical feature is present during transform. When set to\n'use_encoded_value', the encoded value of unknown categories will be\nset to the value given for the parameter <code>unknown_value</code>. In<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>inverse_transform</code>, an unknown category will be denoted as None.</p><p>unknown_value : int or np.nan, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;When the parameter handle_unknown is set to 'use_encoded_value', this\nparameter is required and will set the encoded value of unknown\ncategories. It has to be distinct from the values used to encode any of\nthe categories in <code>fit</code>. If set to np.nan, the <code>dtype</code> parameter must\nbe a float dtype.</p><p>encoded_missing_value : int or np.nan, default=np.nan<br>&nbsp;&nbsp;&nbsp;&nbsp;Encoded value of missing categories. If set to <code>np.nan</code>, then the <code>dtype</code>\nparameter must be a float dtype.</p><p>min_frequency : int or float, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies the minimum frequency below which a category will be\nconsidered infrequent.</p><ul>\n<li><p>If <code>int</code>, categories with a smaller cardinality will be considered\ninfrequent.</p></li>\n<li><p>If <code>float</code>, categories with a smaller cardinality than\n<code>min_frequency * n_samples</code>  will be considered infrequent.</p></li>\n</ul>\n<p>max_categories : int, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies an upper limit to the number of output categories for each input\nfeature when considering infrequent categories. If there are infrequent\ncategories, <code>max_categories</code> includes the category representing the\ninfrequent categories along with the frequent categories. If <code>None</code>,\nthere is no limit to the number of output features.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;<code>max_categories</code> do **not** take into account missing or unknown\ncategories. 
Setting <code>unknown_value</code> or <code>encoded_missing_value</code> to an\ninteger will increase the number of unique integer codes by one each.\nThis can result in up to <code>max_categories + 2</code> integer codes.</p><h2>Attributes</h2>\n<p>categories_ : list of arrays<br>&nbsp;&nbsp;&nbsp;&nbsp;The categories of each feature determined during <code>fit</code> (in order of\nthe features in X and corresponding with the output of <code>transform</code>).\nThis does not include categories that weren't seen during <code>fit</code>.</p><p>n_features_in_ : int<br>&nbsp;&nbsp;&nbsp;&nbsp;Number of features seen during <code>fit</code>.</p><p>feature_names_in_ : ndarray of shape (<code>n_features_in_</code>,)<br>&nbsp;&nbsp;&nbsp;&nbsp;Names of features seen during <code>fit</code>. Defined only when <code>X</code>\nhas feature names that are all strings.</p><p>infrequent_categories_ : list of ndarray<br>&nbsp;&nbsp;&nbsp;&nbsp;Defined only if infrequent categories are enabled by setting\n<code>min_frequency</code> or <code>max_categories</code> to a non-default value.\n<code>infrequent_categories_[i]</code> are the infrequent categories for feature\n<code>i</code>. If the feature <code>i</code> has no infrequent categories\n<code>infrequent_categories_[i]</code> is None.</p><h2>See Also</h2>\n<p>OneHotEncoder : Performs a one-hot encoding of categorical features. This encoding<br>&nbsp;&nbsp;&nbsp;&nbsp;is suitable for low to medium cardinality categorical variables, both in\nsupervised and unsupervised settings.<br>TargetEncoder : Encodes categorical features using supervised signal<br>&nbsp;&nbsp;&nbsp;&nbsp;in a classification or regression pipeline. This encoding is typically\nsuitable for high cardinality categorical variables.<br>LabelEncoder : Encodes target labels with values between 0 and<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>n_classes-1</code>.</p><h2>Notes</h2>\n<p>With a high proportion of <code>nan</code> values, inferring categories becomes slow with\nPython versions before 3.10. 
The handling of <code>nan</code> values was improved\nfrom Python 3.10 onwards, (c.f.<br><code>bpo-43475 &lt;https://github.com/python/cpython/issues/87641&gt;</code>_).</p><h2>Examples</h2>\n<p>Given a dataset with two features, we let the encoder find the unique\nvalues per feature and transform the data to an ordinal encoding.</p><div data-code="id#108">&gt;&gt;&gt; from sklearn.preprocessing import OrdinalEncoder\n&gt;&gt;&gt; enc = OrdinalEncoder()\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', 2]]\n&gt;&gt;&gt; enc.fit(X)\nOrdinalEncoder()\n&gt;&gt;&gt; enc.categories_\n[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]\n&gt;&gt;&gt; enc.transform([['Female', 3], ['Male', 1]])\narray([[0., 2.],\n       [1., 0.]])</div><div data-code="id#109">&gt;&gt;&gt; enc.inverse_transform([[1, 0], [0, 1]])\narray([['Male', 1],\n       ['Female', 2]], dtype=object)</div><p>By default, <code>OrdinalEncoder</code> is lenient towards missing values by\npropagating them.</p><div data-code="id#110">&gt;&gt;&gt; import numpy as np\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', np.nan]]\n&gt;&gt;&gt; enc.fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., nan]])</div><p>You can use the parameter <code>encoded_missing_value</code> to encode missing values.</p><div data-code="id#111">&gt;&gt;&gt; enc.set_params(encoded_missing_value=-1).fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., -1.]])</div><p>Infrequent categories are enabled by setting <code>max_categories</code> or <code>min_frequency</code>.\nIn the following example, "a" and "d" are considered infrequent and grouped\ntogether into a single category, "b" and "c" are their own categories, unknown\nvalues are encoded as 3 and missing values are encoded as 4.</p><div data-code="id#112">&gt;&gt;&gt; X_train = np.array(\n...     [["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3 + [np.nan]],\n...     dtype=object).T\n&gt;&gt;&gt; enc = OrdinalEncoder(\n...     handle_unknown="use_encoded_value", unknown_value=3,\n...     max_categories=3, encoded_missing_value=4)\n&gt;&gt;&gt; _ = enc.fit(X_train)\n&gt;&gt;&gt; X_test = np.array([["a"], ["b"], ["c"], ["d"], ["e"], [np.nan]], dtype=object)\n&gt;&gt;&gt; enc.transform(X_test)\narray([[2.],\n       [0.],\n       [1.],\n       [2.],\n       [3.],\n       [4.]])</div>

@ale-dg

ale-dg commented Apr 17, 2024

@DonJayamanne @mjbvz
Do you need the yml or the requirements.txt? I can send you both if it helps you get the environment running more easily.

EDIT: Well... just in case, here they are, should they be helpful: both the yml for Conda and the requirements.txt for pip.

Best

env_info.zip

@DonJayamanne
Contributor Author

@ale-dg Thank you very much.
However, even without those packages I can see some delays in the display of the parameter hints.

@DonJayamanne
Contributor Author

@mjbvz I have no knowledge of the code, but found the following to be interesting:
element.innerHTML = sanitizeRenderedMarkdown(markdown, markdownHtmlDoc.body.innerHTML) as unknown as string;

Given that the markdown has already been sanitized a few lines earlier, why do we need to sanitize it again?
It feels like we could remove this extra sanitization call.
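For illustration only, a minimal sketch of the pattern being described (the sanitize function below is a stand-in, not the real sanitizeRenderedMarkdown): sanitizing markup that is already clean repeats work proportional to the size of the HTML, which adds up when the docstring is several kilobytes and the hint re-renders on every keystroke.

function sanitize(html: string): string {
  // stand-in for a DOMPurify-style pass over the markup
  return html.replace(/<script[\s\S]*?<\/script>/gi, '');
}

function renderDocs(markdownHtml: string): string {
  const sanitizedOnce = sanitize(markdownHtml);  // first pass, done while rendering the markdown
  // the second call below repeats the same pass on already-clean markup;
  // dropping it (or reusing sanitizedOnce) avoids the redundant work
  return sanitize(sanitizedOnce);
}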

@DonJayamanne
Contributor Author

@ale-dg I'm surprised you do not see these delays when you write the exact same code in a regular Python file (i.e. outside notebooks) pointing to the same Python environment.

@ale-dg

ale-dg commented Apr 17, 2024

@DonJayamanne I haven't tried the exact same code because, when working with ML, I normally use notebooks, not plain py files. I can try it tomorrow, rendering the plots in the interactive window, just to see how it goes.

Your comment is timely, though, because I am now writing a Streamlit interface (in a completely different Python environment) in VS Code Insiders, which has to be a py file, and it SOMETIMES lags: not as frequently as with notebooks and not as dramatically, but it does. It's very light though; I don't think it would even show up in a profile, because I have tried to repro the exact same delay and it didn't happen.

What I can tell you is that if I paste a piece of code from one window to another (I did it from an ipynb to a py), it lags for some seconds while IntelliSense recognises the code, the lag stays for a while, and then it goes back to normal.

Best

@DonJayamanne DonJayamanne changed the title Slow completions when running cells Slow completions when editing Python code Apr 17, 2024
@DonJayamanne
Contributor Author

From my tests, from looking at the code, and taking into account how the Jupyter extension works, this issue has no relation to Jupyter execution.

i.e. if you copy the exact same code (all the cells) into a single Python file, you will run into the same perf issues.

Anyway, that's fine for now; I think we have all the information we need, as I can replicate some of the delays at my end.

@DonJayamanne DonJayamanne removed info-needed Issue requires more information from poster notebook-kernel labels Apr 17, 2024
@DonJayamanne DonJayamanne changed the title Slow completions when editing Python code Slow edits and completions when editing Python code Apr 17, 2024
@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-jupyter Apr 17, 2024
@ale-dg

ale-dg commented Apr 17, 2024

Alright then. Thanks for the help! Hope it also helps in solving the other issue regarding the speed of execution.

Best

@DonJayamanne
Contributor Author

Please can you provide a sample CSV file used in your notebook?

@ale-dg

ale-dg commented Apr 17, 2024

Sure, I've attached the file.

Best

palmerpenguins_extended.csv

@heejaechang

Don't notebook vs. regular hover/signature help use different markdown renderers?

@DonJayamanne
Contributor Author

Don't notebook vs. regular hover/signature help use different markdown renderers?

From what I can tell, they are the same.

@mjbvz
Contributor

mjbvz commented Apr 18, 2024

Yes, the parameter hints widget is the same everywhere; it's only notebook output that uses a different markdown renderer.

@DonJayamanne
Contributor Author

@mjbvz

Sometimes I have seen this take 300ms when working on notebooks: we use the markdown parser to build outlines, and that ends up parsing the markdown, which sometimes results in 300ms.
So this is definitely happening in other places too, but not 100% of the time.
The issue does definitely repro for me (again, not 100% of the time). Please let me know if CPU profiles will help (though we already have these from the user) and what other information you need.
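As an illustration of one possible mitigation (a sketch only, with hypothetical names; this is not how VS Code is implemented), memoizing the rendered result by content would avoid paying the parse/render cost again when the same docstring or outline source is processed repeatedly:

const renderedCache = new Map<string, string>();

function renderMarkdownOnce(markdown: string, render: (md: string) => string): string {
  const cached = renderedCache.get(markdown);
  if (cached !== undefined) {
    return cached;                 // reuse the HTML rendered earlier for identical input
  }
  const html = render(markdown);   // the expensive parse + render happens only once per unique input
  renderedCache.set(markdown, html);
  return html;
}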

@leifwalsh

leifwalsh commented Apr 20, 2024

These are long issues, so forgive me if this isn't new information, but I'm pretty sure I have the same problem described here, and I've noticed something that might help.

When I type code into a cell and then try to execute it, and I get this laggy behavior, I notice inlay error messages like this show up:

image

As I watch (not touching anything, just waiting for the cell to execute), it progressively changes, as if it's analyzing prefixes of the content in the cell, or as if I were typing each character into the cell one by one very slowly. Here's another one (it made progress while I was typing):

image

In this notebook where I just reproduced it, I also had a perfectly fine experience when starting from an empty notebook, for about 13 cells' worth of mixed code and markdown; then around the 14th cell something changed and it started doing the bad thing. It was very sudden, like I'd crossed a threshold or something, which surprised me because previously I'd only seen it sometimes, when opening notebooks I'd already authored (and this reminded me to try reading these issues again).

I'd be happy to share a reproducer (my notebook is just 339KB including some plots), gather diagnostics, etc., but it looks like you already have many? If you want, just point me to a message with instructions for gathering the ones that would help right now; I see you've done that quite a lot already. :)

@DonJayamanne
Contributor Author

Thank you.
We have made a lot of performance improvements in VS Code for editing and executing notebooks.
Please can you install the latest VS Code Insiders and the latest pre-release version of the Jupyter extension and test this out?

Let me know how it goes.

If you still run into issues, please share the CPU logs while you are experiencing them.
Instructions here: #210528 (comment)
