
Slow edits and completions when editing Python code #210528

Open
DonJayamanne opened this issue Mar 14, 2024 · 97 comments
Assignees
Labels
bug (Issue identified by VS Code Team member as probable bug) · perf

Comments

@DonJayamanne
Contributor

DonJayamanne commented Mar 14, 2024

Reported by @ale-dg here #206119 (comment)

Yes, but note that I was not using Pylance. Would you like me to test with it? I just enabled it and opened the Python log trace, and indexing alone exhausts its memory.

2024-03-13 17:51:01.092 [info] [Info - 17:51:01] (4192) Heap stats: total_heap_size=1220MB, used_heap_size=1156MB, total_physical_size=1218MB, total_available_size=2900MB, heap_size_limit=4096MB

2024-03-13 17:51:01.108 [info] [Warn - 17:51:01] (4192) Workspace indexing has hit its upper limit: 2000 files

@DonJayamanne DonJayamanne added the bug label (Issue identified by VS Code Team member as probable bug) Mar 14, 2024
@DonJayamanne DonJayamanne changed the title Pylance chewing up resources when running cells in Jupyter notebooks Resource usage of Pylance slowing Jupyter notebooks Mar 14, 2024
@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-jupyter Mar 14, 2024
@rchiodo
Contributor

rchiodo commented Mar 14, 2024

How could Pylance be interfering with Jupyter execution? Unless the user has only a single core and not enough memory? In that case it would be up to the user to decide whether to turn off Pylance or Jupyter.

@ale-dg

ale-dg commented Mar 14, 2024

Hi @rchiodo

I honestly don't know. Everything has been a mystery, as it happens to a lot of users, especially with large notebooks and lots of markdown cells.

These are my Mac specs:

System Info
Item Value
CPUs Apple M1 Pro (10 x 24)
GPU Status 2d_canvas: enabled
canvas_oop_rasterization: enabled_on
direct_rendering_display_compositor: disabled_off_ok
gpu_compositing: enabled
multiple_raster_threads: enabled_on
opengl: enabled_on
rasterization: enabled
raw_draw: disabled_off_ok
skia_graphite: disabled_off
video_decode: enabled
video_encode: enabled
webgl: enabled
webgl2: enabled
webgpu: enabled
Load (avg) 3, 3, 3
Memory (System) 16.00GB (5.96GB free)
Process Argv
Screen Reader no
VM 0%

These are the Jupyter logs I retrieved for @DonJayamanne with Pylance disabled, with and without markdown cells.

1-Jupyter-no-MD.log
1-Jupyter-with-MD.log

With MD it takes a bit longer to run and the computer warms up a little; this was with Pylance deactivated. What most people have been reporting is Python/Jupyter being slow, and only a couple of us have actually found that Pylance was using a lot of resources (see #206119 (comment), #206119 (comment) and microsoft/pylance-release#5614). So, yet again, no idea which one it is.

I also tried to retrieve the Pylance logs; just opening VS Code produces a log file of around 30,000 lines, and some of its final lines are the ones above. Below you can find the log file.

Pylance.log

Let me know if I can be of any help (although I only know Python for DS 😜)

Best

@DonJayamanne
Contributor Author

@rchiodo I assumed that the numbers were off, and based on the comments from @ale-dg, disabling Pylance improved things.
However, there are other things that slow Jupyter down as well; Pylance was reported as being one of them. E.g. if CPU/memory is heavily used, VS Code tends to slow down, which in turn slows down the Jupyter extension and others.

Feel free to close this issue if there's nothing to be done here, we're already looking into other issues reported by the user.

@ale-dg

ale-dg commented Mar 14, 2024

Hi,

Before closing the issue, I was about to comment that I just ran the large notebook with markdowns again, with these extensions:

Extensions (4)
Extension Author (truncated) Version
python ms- 2024.2.1
vscode-pylance ms- 2024.3.1
jupyter ms- 2024.2.0
jupyter-renderers ms- 1.0.17

After getting to where I wanted (around code cell 152), when typing a function (e.g. OrdinalEncoder(), or any other from a library), the mini-window that pops up with hints loads VERY slowly compared with the previous versions I downgraded to, and it begins to lag. Also, just typing the opening parenthesis stalls everything else while the window loads (for around 3-5s). Then, if we are lucky, it doesn't crash or lag further.

The same happens for the "auto-complete" function (not sure what its name is... but it is the one that appears below the code for choosing a previously defined function or variable or something else).

Best

@DonJayamanne
Contributor Author

@ale-dg Please can you confirm that disabling the Pylance extension makes it faster,
i.e. that the following issue no longer exists:

the mini-window that pops up with hints loads VERY slowly compared with the previous versions…

Yes, you will no longer get completions, but that's a different matter. However, if it is still NOT faster, then we know for a fact that Pylance is not causing any delays and we can close this issue.

@DonJayamanne DonJayamanne changed the title Resource usage of Pylance slowing Jupyter notebooks Possibly high Resource usage of Pylance Mar 14, 2024
@ale-dg

ale-dg commented Mar 14, 2024

@DonJayamanne it ran at the same speed, but the lag while typing just went away. Would it be worth closing this issue, as it is actually a bug with Pylance lagging VS Code?

Best

@DonJayamanne
Contributor Author

but the lag with typing just went away.

Let's leave this issue open; basically, completions are slow.

@DonJayamanne DonJayamanne changed the title Possibly high Resource usage of Pylance Slow completions Mar 14, 2024
@ale-dg

ale-dg commented Mar 14, 2024

Well... just for the record, when I turned Pylance back on, I tried restarting the kernel to see what happens, and all of VS Code got stuck for some seconds (I couldn't even scroll) and then it began to lag. This was reported in one of the other issues as well (and sometimes it crashes).

I also tried it without Pylance and it didn't crash. It just took its time to start running the notebook again.

Best

P.S. Also the "Go To" button still doesn't work...

@debonte debonte added the perf label Mar 14, 2024
@heejaechang

@ale-dg can you try this? https://github.com/microsoft/pylance-release/wiki/Collecting-data-for-an-investigation.#collecting-cpuprofiles

and provide us with *.cpuprofiles?

By the way, to make things simpler, first try it with python.analysis.indexing: false and try to repro the slow completion. It will help us see whether it is actually Pylance completion that is slow, and why.

thank you

@heejaechang

Ah, one more thing: if you are seeing the memory pressure log, your workspace might be too big for our default setup (running on VS Code's node).

Heap stats: total_heap_size=1220MB, used_heap_size=1156MB, total_physical_size=1218MB, total_available_size=2900MB, heap_size_limit=4096MB

Try the new setting (https://github.com/microsoft/pylance-release/pull/5602/files) we just added in 2024.3.100; Pylance will get a much larger memory space than the default.

@rchiodo
Contributor

rchiodo commented Mar 14, 2024

My guess is this is a duplicate of the completion problem we found with Jupyter completions:
https://github.com/microsoft/pyrx/issues/4663

@ale-dg

ale-dg commented Mar 14, 2024

My guess is this is a duplicate of the completion problem we found with Jupyter completions: https://github.com/microsoft/pyrx/issues/4663

I'd gladly confirm if it's the same, but it shows me a 404 error when opening.

@ale-dg can you try this? https://github.com/microsoft/pylance-release/wiki/Collecting-data-for-an-investigation.#collecting-cpuprofiles

and provide us with *.cpuprofiles?

By the way, to make things simpler, first try it with python.analysis.indexing: false and try to repro the slow completion. It will help us see whether it is actually Pylance completion that is slow, and why.

thank you

I'll try it after the request from @bschnurr here: microsoft/pylance-release#5614 (comment)

@ale-dg

ale-dg commented Mar 14, 2024

Sorry... this might be quite a simple question, but I am not getting the "Pylance: start profiling" option. How do I get it working?

Screenshot 2024-03-14 at 1 52 41 PM

Forget it... it requires version .100 and I have .1... let me update.

@ale-dg

ale-dg commented Mar 14, 2024

Hi,

I just completed the cpuprofile. I forgot to set python.analysis.indexing: false because I have been making so many changes and tests that I no longer know what I have changed, so my apologies. You will see a long time gap somewhere between 14:00 and 14:12 (or thereabouts); that is how long the notebook takes to run. After that it is all just typing to add some functions, and it lagged.

Just for fun, I tried to open the profiles with VS Code and it crashed 🤣🫠, so that may be something you'd like to check as well.

Best

profiles.zip

@rchiodo
Contributor

rchiodo commented Mar 14, 2024

The profiles don't show completions taking any time. Was it slow while you were typing? It shows about 150ms for completions to come back:

image

This likely means the slowdown is not in Pylance but somewhere else. I'm guessing it's this code here:
https://github.com/microsoft/vscode-jupyter/blob/df25cd4ba2d39227ff186c36b01e1a629e7dee88/src/standalone/intellisense/kernelCompletionProvider.ts#L96

That code gets completions from us but also combines them with ones from the kernel. If the kernel is slow (and I believe you said somewhere that it was slow running cells?), then completions overall would be slow.
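For illustration, a minimal TypeScript sketch (hypothetical names, not the actual vscode-jupyter code) of why combining the two sources makes the slower one dominate: when both providers are awaited together, the visible latency is that of the slowest one, so a busy kernel hides Pylance's ~150ms response.

type CompletionItem = { label: string };

async function pylanceCompletions(): Promise<CompletionItem[]> {
  // fast path: roughly the ~150ms seen in the cpuprofile
  return [{ label: 'fit' }, { label: 'transform' }];
}

async function kernelCompletions(): Promise<CompletionItem[]> {
  // slow path: a kernel busy executing cells may take seconds to reply
  await new Promise(resolve => setTimeout(resolve, 3000));
  return [{ label: 'categories_' }];
}

async function mergedCompletions(): Promise<CompletionItem[]> {
  // Promise.all resolves only once BOTH providers have answered,
  // so the user-visible latency is max(pylance, kernel) ≈ 3000ms here
  const [fromPylance, fromKernel] = await Promise.all([pylanceCompletions(), kernelCompletions()]);
  return [...fromPylance, ...fromKernel];
}

Bounding the kernel provider with a timeout would cap the worst case, but that is a design choice for the extension, not something the profile shows.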

@ale-dg

ale-dg commented Mar 14, 2024

They become slow after executing the cells. So, if I type without executing anything, they feel normal, like in old versions. When I begin executing cells, the lag begins, something like this:

  1. type ord = OrdinalEncoder(
  2. lag lag lag lag lag lag lag
  3. hint window pops
  4. lag lag lag lag
  5. type han
  6. lag lag lag lag lag lag lag
  7. auto complete appears
  8. lag lag lag lag
  9. allows to select or finishes the sentence
  10. lag lag lag lag lag lag

.. and so on for the rest of the time you are working on the file

Hope it makes sense

@ale-dg

ale-dg commented Mar 14, 2024

It makes everything feel off, like there is a complete disconnect between the keyboard, the screen, VS Code, etc.

@rchiodo
Contributor

rchiodo commented Mar 14, 2024

That sounds to me like it's the kernel completions then. If it works fine without running any cells, then just Pylance is involved. If it slows down only when the kernel is involved, then it's both Pylance and the Jupyter Kernel providing the completions. Given the cpu profile you sent, Pylance isn't taking any time to compute completions, so I'm going to transfer this issue to the Jupyter extension.

@rchiodo rchiodo transferred this issue from microsoft/pylance-release Mar 14, 2024
@ale-dg

ale-dg commented Apr 15, 2024

@DonJayamanne @mjbvz I did some last month... would these work? See them in this comment: #210528

Best

EDIT: I meant... I did some regarding Pylance...

@ale-dg

ale-dg commented Apr 15, 2024

Also there are some logs here, should you want to check them as well (coming from another issue microsoft/pylance-release#5614)

microsoft/pylance-release#5614 (comment)

microsoft/pylance-release#5614 (comment)

microsoft/pylance-release#5614 (comment)

@ale-dg

ale-dg commented Apr 16, 2024

Would this be related as well? Please see microsoft/pylance-release#5748 and microsoft/pylance-release#5173

@ale-dg

ale-dg commented Apr 16, 2024

@DonJayamanne @mjbvz @heejaechang as you can see above, I have cross-referenced this issue from another one in Pylance. I attached the same log (from the stable versions of VS Code, Jupyter, Pylance, etc.), since even just opening the same notebook as above made Pylance crash 🫠

Hope it helps

Best

Pylance.log

@DonJayamanne
Contributor Author

@mjbvz
I think I can replicate this (not sure), and I can see 300ms spent rendering the parameter hints on my machine.
I would like to think I have a fairly beefy machine, and 300ms is a lot when typing.

Screenshot 2024-04-17 at 11 30 34

& then I get a message similar to the following in the dev tools:
WARN [perf] Renderer reported VERY LONG TASK (297ms), starting profiling session '44cde74f-3e84-495a-bcca-b1688b64c7cb'

Here are the packages that need to be installed:

pip install ipykernel scikit-learn seaborn statsmodels plotly catppuccin-matplotlib pingouin

Here is the first cell:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
import plotly.express as px
import plotly.graph_objects as go
import mplcatppuccin
import warnings
from scipy.stats import iqr
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from pingouin import ttest
from sklearn.preprocessing import (
    StandardScaler,
    Normalizer,
    PowerTransformer,
    QuantileTransformer,
    RobustScaler,
    FunctionTransformer,
    MinMaxScaler,
    MaxAbsScaler,
)
from sklearn.pipeline import Pipeline
from statsmodels.graphics.gofplots import ProbPlot
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
warnings.simplefilter('ignore')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)
pd.set_option('display.colheader_justify', 'left')

mpl.style.use('latte')

and the second cell to replicate the issue:

from sklearn.preprocessing import OrdinalEncoder

ord = OrdinalEncoder()

Adding brackets or parameters to OrdinalEncoder in the last line causes issues.
Note: It's very difficult to replicate this issue; I say that because I can no longer repro it.
Here is the Markdown that I captured for two of the calls.

For some reason it gets called multiple times (I could see it being called almost 20-30 times),
all when I add just the opening ( bracket.

One of the Markdown contents passed into the sanitize call:

<p>Encode categorical features as an integer array.</p><p>The input to this transformer should be an array-like of integers or\nstrings, denoting the values taken on by categorical (discrete) features.\nThe features are converted to ordinal integers. This results in\na single column of integers (0 to n_categories - 1) per feature.</p><p>Read more in the <code>User Guide &lt;preprocessing_categorical_features&gt;</code>.\nFor a comparison of different encoders, refer to:<br><code>sphx_glr_auto_examples_preprocessing_plot_target_encoder.py</code>.</p><h2 id="parameters">Parameters</h2>\n<p>categories : &#39;auto&#39; or a list of array-like, default=&#39;auto&#39;<br>&nbsp;&nbsp;&nbsp;&nbsp;Categories (unique values) per feature:</p><ul>\n<li>&#39;auto&#39; : Determine categories automatically from the training data.</li>\n<li>list : <code>categories[i]</code> holds the categories expected in the ith\ncolumn. The passed categories should not mix strings and numeric\nvalues, and should be sorted in case of numeric values.</li>\n</ul>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;The used categories can be found in the <code>categories_</code> attribute.</p><p>dtype : number type, default=np.float64<br>&nbsp;&nbsp;&nbsp;&nbsp;Desired dtype of output.</p><p>handle_unknown : {&#39;error&#39;, &#39;use_encoded_value&#39;}, default=&#39;error&#39;<br>&nbsp;&nbsp;&nbsp;&nbsp;When set to &#39;error&#39; an error will be raised in case an unknown\ncategorical feature is present during transform. When set to\n&#39;use_encoded_value&#39;, the encoded value of unknown categories will be\nset to the value given for the parameter <code>unknown_value</code>. In<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>inverse_transform</code>, an unknown category will be denoted as None.</p><p>unknown_value : int or np.nan, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;When the parameter handle_unknown is set to &#39;use_encoded_value&#39;, this\nparameter is required and will set the encoded value of unknown\ncategories. It has to be distinct from the values used to encode any of\nthe categories in <code>fit</code>. If set to np.nan, the <code>dtype</code> parameter must\nbe a float dtype.</p><p>encoded_missing_value : int or np.nan, default=np.nan<br>&nbsp;&nbsp;&nbsp;&nbsp;Encoded value of missing categories. If set to <code>np.nan</code>, then the <code>dtype</code>\nparameter must be a float dtype.</p><p>min_frequency : int or float, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies the minimum frequency below which a category will be\nconsidered infrequent.</p><ul>\n<li><p>If <code>int</code>, categories with a smaller cardinality will be considered\ninfrequent.</p></li>\n<li><p>If <code>float</code>, categories with a smaller cardinality than\n<code>min_frequency * n_samples</code>  will be considered infrequent.</p></li>\n</ul>\n<p>max_categories : int, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies an upper limit to the number of output categories for each input\nfeature when considering infrequent categories. If there are infrequent\ncategories, <code>max_categories</code> includes the category representing the\ninfrequent categories along with the frequent categories. If <code>None</code>,\nthere is no limit to the number of output features.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;<code>max_categories</code> do **not** take into account missing or unknown\ncategories. 
Setting <code>unknown_value</code> or <code>encoded_missing_value</code> to an\ninteger will increase the number of unique integer codes by one each.\nThis can result in up to <code>max_categories + 2</code> integer codes.</p><h2 id="attributes">Attributes</h2>\n<p>categories_ : list of arrays<br>&nbsp;&nbsp;&nbsp;&nbsp;The categories of each feature determined during <code>fit</code> (in order of\nthe features in X and corresponding with the output of <code>transform</code>).\nThis does not include categories that weren&#39;t seen during <code>fit</code>.</p><p>n_features_in_ : int<br>&nbsp;&nbsp;&nbsp;&nbsp;Number of features seen during <code>fit</code>.</p><p>feature_names_in_ : ndarray of shape (<code>n_features_in_</code>,)<br>&nbsp;&nbsp;&nbsp;&nbsp;Names of features seen during <code>fit</code>. Defined only when <code>X</code>\nhas feature names that are all strings.</p><p>infrequent_categories_ : list of ndarray<br>&nbsp;&nbsp;&nbsp;&nbsp;Defined only if infrequent categories are enabled by setting\n<code>min_frequency</code> or <code>max_categories</code> to a non-default value.\n<code>infrequent_categories_[i]</code> are the infrequent categories for feature\n<code>i</code>. If the feature <code>i</code> has no infrequent categories\n<code>infrequent_categories_[i]</code> is None.</p><h2 id="see-also">See Also</h2>\n<p>OneHotEncoder : Performs a one-hot encoding of categorical features. This encoding<br>&nbsp;&nbsp;&nbsp;&nbsp;is suitable for low to medium cardinality categorical variables, both in\nsupervised and unsupervised settings.<br>TargetEncoder : Encodes categorical features using supervised signal<br>&nbsp;&nbsp;&nbsp;&nbsp;in a classification or regression pipeline. This encoding is typically\nsuitable for high cardinality categorical variables.<br>LabelEncoder : Encodes target labels with values between 0 and<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>n_classes-1</code>.</p><h2 id="notes">Notes</h2>\n<p>With a high proportion of <code>nan</code> values, inferring categories becomes slow with\nPython versions before 3.10. 
The handling of <code>nan</code> values was improved\nfrom Python 3.10 onwards, (c.f.<br><code>bpo-43475 &lt;https://github.com/python/cpython/issues/87641&gt;</code>_).</p><h2 id="examples">Examples</h2>\n<p>Given a dataset with two features, we let the encoder find the unique\nvalues per feature and transform the data to an ordinal encoding.</p><div class="code" data-code="id#108">&gt;&gt;&gt; from sklearn.preprocessing import OrdinalEncoder\n&gt;&gt;&gt; enc = OrdinalEncoder()\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', 2]]\n&gt;&gt;&gt; enc.fit(X)\nOrdinalEncoder()\n&gt;&gt;&gt; enc.categories_\n[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]\n&gt;&gt;&gt; enc.transform([['Female', 3], ['Male', 1]])\narray([[0., 2.],\n       [1., 0.]])</div><div class="code" data-code="id#109">&gt;&gt;&gt; enc.inverse_transform([[1, 0], [0, 1]])\narray([['Male', 1],\n       ['Female', 2]], dtype=object)</div><p>By default, <code>OrdinalEncoder</code> is lenient towards missing values by\npropagating them.</p><div class="code" data-code="id#110">&gt;&gt;&gt; import numpy as np\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', np.nan]]\n&gt;&gt;&gt; enc.fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., nan]])</div><p>You can use the parameter <code>encoded_missing_value</code> to encode missing values.</p><div class="code" data-code="id#111">&gt;&gt;&gt; enc.set_params(encoded_missing_value=-1).fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., -1.]])</div><p>Infrequent categories are enabled by setting <code>max_categories</code> or <code>min_frequency</code>.\nIn the following example, &quot;a&quot; and &quot;d&quot; are considered infrequent and grouped\ntogether into a single category, &quot;b&quot; and &quot;c&quot; are their own categories, unknown\nvalues are encoded as 3 and missing values are encoded as 4.</p><div class="code" data-code="id#112">&gt;&gt;&gt; X_train = np.array(\n...     [["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3 + [np.nan]],\n...     dtype=object).T\n&gt;&gt;&gt; enc = OrdinalEncoder(\n...     handle_unknown="use_encoded_value", unknown_value=3,\n...     max_categories=3, encoded_missing_value=4)\n&gt;&gt;&gt; _ = enc.fit(X_train)\n&gt;&gt;&gt; X_test = np.array([["a"], ["b"], ["c"], ["d"], ["e"], [np.nan]], dtype=object)\n&gt;&gt;&gt; enc.transform(X_test)\narray([[2.],\n       [0.],\n       [1.],\n       [2.],\n       [3.],\n       [4.]])</div>

Another sample of Markdown content passed into another call:

<p>Encode categorical features as an integer array.</p><p>The input to this transformer should be an array-like of integers or\nstrings, denoting the values taken on by categorical (discrete) features.\nThe features are converted to ordinal integers. This results in\na single column of integers (0 to n_categories - 1) per feature.</p><p>Read more in the <code>User Guide &lt;preprocessing_categorical_features&gt;</code>.\nFor a comparison of different encoders, refer to:<br><code>sphx_glr_auto_examples_preprocessing_plot_target_encoder.py</code>.</p><h2>Parameters</h2>\n<p>categories : 'auto' or a list of array-like, default='auto'<br>&nbsp;&nbsp;&nbsp;&nbsp;Categories (unique values) per feature:</p><ul>\n<li>'auto' : Determine categories automatically from the training data.</li>\n<li>list : <code>categories[i]</code> holds the categories expected in the ith\ncolumn. The passed categories should not mix strings and numeric\nvalues, and should be sorted in case of numeric values.</li>\n</ul>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;The used categories can be found in the <code>categories_</code> attribute.</p><p>dtype : number type, default=np.float64<br>&nbsp;&nbsp;&nbsp;&nbsp;Desired dtype of output.</p><p>handle_unknown : {'error', 'use_encoded_value'}, default='error'<br>&nbsp;&nbsp;&nbsp;&nbsp;When set to 'error' an error will be raised in case an unknown\ncategorical feature is present during transform. When set to\n'use_encoded_value', the encoded value of unknown categories will be\nset to the value given for the parameter <code>unknown_value</code>. In<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>inverse_transform</code>, an unknown category will be denoted as None.</p><p>unknown_value : int or np.nan, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;When the parameter handle_unknown is set to 'use_encoded_value', this\nparameter is required and will set the encoded value of unknown\ncategories. It has to be distinct from the values used to encode any of\nthe categories in <code>fit</code>. If set to np.nan, the <code>dtype</code> parameter must\nbe a float dtype.</p><p>encoded_missing_value : int or np.nan, default=np.nan<br>&nbsp;&nbsp;&nbsp;&nbsp;Encoded value of missing categories. If set to <code>np.nan</code>, then the <code>dtype</code>\nparameter must be a float dtype.</p><p>min_frequency : int or float, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies the minimum frequency below which a category will be\nconsidered infrequent.</p><ul>\n<li><p>If <code>int</code>, categories with a smaller cardinality will be considered\ninfrequent.</p></li>\n<li><p>If <code>float</code>, categories with a smaller cardinality than\n<code>min_frequency * n_samples</code>  will be considered infrequent.</p></li>\n</ul>\n<p>max_categories : int, default=None<br>&nbsp;&nbsp;&nbsp;&nbsp;Specifies an upper limit to the number of output categories for each input\nfeature when considering infrequent categories. If there are infrequent\ncategories, <code>max_categories</code> includes the category representing the\ninfrequent categories along with the frequent categories. If <code>None</code>,\nthere is no limit to the number of output features.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;<code>max_categories</code> do **not** take into account missing or unknown\ncategories. 
Setting <code>unknown_value</code> or <code>encoded_missing_value</code> to an\ninteger will increase the number of unique integer codes by one each.\nThis can result in up to <code>max_categories + 2</code> integer codes.</p><h2>Attributes</h2>\n<p>categories_ : list of arrays<br>&nbsp;&nbsp;&nbsp;&nbsp;The categories of each feature determined during <code>fit</code> (in order of\nthe features in X and corresponding with the output of <code>transform</code>).\nThis does not include categories that weren't seen during <code>fit</code>.</p><p>n_features_in_ : int<br>&nbsp;&nbsp;&nbsp;&nbsp;Number of features seen during <code>fit</code>.</p><p>feature_names_in_ : ndarray of shape (<code>n_features_in_</code>,)<br>&nbsp;&nbsp;&nbsp;&nbsp;Names of features seen during <code>fit</code>. Defined only when <code>X</code>\nhas feature names that are all strings.</p><p>infrequent_categories_ : list of ndarray<br>&nbsp;&nbsp;&nbsp;&nbsp;Defined only if infrequent categories are enabled by setting\n<code>min_frequency</code> or <code>max_categories</code> to a non-default value.\n<code>infrequent_categories_[i]</code> are the infrequent categories for feature\n<code>i</code>. If the feature <code>i</code> has no infrequent categories\n<code>infrequent_categories_[i]</code> is None.</p><h2>See Also</h2>\n<p>OneHotEncoder : Performs a one-hot encoding of categorical features. This encoding<br>&nbsp;&nbsp;&nbsp;&nbsp;is suitable for low to medium cardinality categorical variables, both in\nsupervised and unsupervised settings.<br>TargetEncoder : Encodes categorical features using supervised signal<br>&nbsp;&nbsp;&nbsp;&nbsp;in a classification or regression pipeline. This encoding is typically\nsuitable for high cardinality categorical variables.<br>LabelEncoder : Encodes target labels with values between 0 and<br>&nbsp;&nbsp;&nbsp;&nbsp;<code>n_classes-1</code>.</p><h2>Notes</h2>\n<p>With a high proportion of <code>nan</code> values, inferring categories becomes slow with\nPython versions before 3.10. 
The handling of <code>nan</code> values was improved\nfrom Python 3.10 onwards, (c.f.<br><code>bpo-43475 &lt;https://github.com/python/cpython/issues/87641&gt;</code>_).</p><h2>Examples</h2>\n<p>Given a dataset with two features, we let the encoder find the unique\nvalues per feature and transform the data to an ordinal encoding.</p><div data-code="id#108">&gt;&gt;&gt; from sklearn.preprocessing import OrdinalEncoder\n&gt;&gt;&gt; enc = OrdinalEncoder()\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', 2]]\n&gt;&gt;&gt; enc.fit(X)\nOrdinalEncoder()\n&gt;&gt;&gt; enc.categories_\n[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]\n&gt;&gt;&gt; enc.transform([['Female', 3], ['Male', 1]])\narray([[0., 2.],\n       [1., 0.]])</div><div data-code="id#109">&gt;&gt;&gt; enc.inverse_transform([[1, 0], [0, 1]])\narray([['Male', 1],\n       ['Female', 2]], dtype=object)</div><p>By default, <code>OrdinalEncoder</code> is lenient towards missing values by\npropagating them.</p><div data-code="id#110">&gt;&gt;&gt; import numpy as np\n&gt;&gt;&gt; X = [['Male', 1], ['Female', 3], ['Female', np.nan]]\n&gt;&gt;&gt; enc.fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., nan]])</div><p>You can use the parameter <code>encoded_missing_value</code> to encode missing values.</p><div data-code="id#111">&gt;&gt;&gt; enc.set_params(encoded_missing_value=-1).fit_transform(X)\narray([[ 1.,  0.],\n       [ 0.,  1.],\n       [ 0., -1.]])</div><p>Infrequent categories are enabled by setting <code>max_categories</code> or <code>min_frequency</code>.\nIn the following example, "a" and "d" are considered infrequent and grouped\ntogether into a single category, "b" and "c" are their own categories, unknown\nvalues are encoded as 3 and missing values are encoded as 4.</p><div data-code="id#112">&gt;&gt;&gt; X_train = np.array(\n...     [["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3 + [np.nan]],\n...     dtype=object).T\n&gt;&gt;&gt; enc = OrdinalEncoder(\n...     handle_unknown="use_encoded_value", unknown_value=3,\n...     max_categories=3, encoded_missing_value=4)\n&gt;&gt;&gt; _ = enc.fit(X_train)\n&gt;&gt;&gt; X_test = np.array([["a"], ["b"], ["c"], ["d"], ["e"], [np.nan]], dtype=object)\n&gt;&gt;&gt; enc.transform(X_test)\narray([[2.],\n       [0.],\n       [1.],\n       [2.],\n       [3.],\n       [4.]])</div>

@ale-dg

ale-dg commented Apr 17, 2024

@DonJayamanne @mjbvz
Do you need the yml or the requirements.txt? I can send you both if it helps you get the environment running more easily.

EDIT: Well... just in case, here they are, should they be helpful: both the yml for Conda and the requirements.txt for pip.

Best

env_info.zip

@DonJayamanne
Contributor Author

@ale-dg Thank you very much.
However, even without those packages I can see some delays in the display of the parameter hints.

@DonJayamanne
Contributor Author

@mjbvz I have no knowledge of the code, but found the following to be interesting:
element.innerHTML = sanitizeRenderedMarkdown(markdown, markdownHtmlDoc.body.innerHTML) as unknown as string;

Given that the markdown has already been sanitized a few lines earlier, why do we need to sanitize it again?
It feels like we could remove this extra sanitization call.
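For illustration only, a minimal sketch of the pattern being described (the sanitize function below is a stand-in, not the real sanitizeRenderedMarkdown): sanitizing markup that is already clean repeats work proportional to the size of the HTML, which adds up when the docstring is several kilobytes and the hint re-renders on every keystroke.

function sanitize(html: string): string {
  // stand-in for a DOMPurify-style pass over the markup
  return html.replace(/<script[\s\S]*?<\/script>/gi, '');
}

function renderDocs(markdownHtml: string): string {
  const sanitizedOnce = sanitize(markdownHtml);  // first pass, done while rendering the markdown
  // the second call below repeats the same pass on already-clean markup;
  // dropping it (or reusing sanitizedOnce) avoids the redundant work
  return sanitize(sanitizedOnce);
}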

@DonJayamanne
Contributor Author

@ale-dg I'm surprised you do not see these delays when you write the exact same code in a regular Python file (i.e. outside notebooks) pointing to the same Python environment.

@ale-dg

ale-dg commented Apr 17, 2024

@DonJayamanne I haven't tried the exact same code because, when working with ML, I normally use notebooks, not plain py files. I can try it tomorrow, rendering the plots in the interactive window, just to see how it goes.

Your comment is timely, though, because I am now writing a Streamlit interface (in a completely different Python environment) in VS Code Insiders, which has to be a py file, and it SOMETIMES lags: not as frequently as with notebooks and not as dramatically, but it does. It's very light though; I don't think it would even show up in a profile, because I have tried to repro the exact same delay and it didn't happen.

What I can tell you is that if I paste a piece of code from one window to another (I did it from an ipynb to a py), it lags for some seconds while IntelliSense recognises the code, the lag stays for a while, and then it goes back to normal.

Best

@DonJayamanne DonJayamanne changed the title Slow completions when running cells Slow completions when editing Python code Apr 17, 2024
@DonJayamanne
Contributor Author

From my tests, from looking at the code, and taking into account how the Jupyter extension works, this issue has no relation to Jupyter execution.

i.e. if you copy the exact same code (all the cells) into a single Python file, you will run into the same perf issues.

Anyway, that's fine for now; I think we have all the information we need, as I can replicate some of the delays at my end.

@DonJayamanne DonJayamanne removed info-needed Issue requires more information from poster notebook-kernel labels Apr 17, 2024
@DonJayamanne DonJayamanne changed the title Slow completions when editing Python code Slow edits and completions when editing Python code Apr 17, 2024
@DonJayamanne DonJayamanne transferred this issue from microsoft/vscode-jupyter Apr 17, 2024
@ale-dg

ale-dg commented Apr 17, 2024

Alright then. Thanks for the help! Hope it also helps in solving the other issue regarding the speed of execution.

Best

@DonJayamanne
Contributor Author

Please can you provide a sample CSV file used in your notebook?

@ale-dg

ale-dg commented Apr 17, 2024

Sure, I've attached the file.

Best

palmerpenguins_extended.csv

@heejaechang

Don't notebook vs. regular hover/signature help use different markdown renderers?

@DonJayamanne
Contributor Author

Don't notebook vs. regular hover/signature help use different markdown renderers?

From what I can tell, they are the same.

@mjbvz
Contributor

mjbvz commented Apr 18, 2024

Yes, the parameter hints widget is the same everywhere; it's only notebook output that uses a different markdown renderer.

@DonJayamanne
Contributor Author

@mjbvz

Sometimes I have seen this take 300ms when working on notebooks: we use the markdown parser to build outlines, and that ends up parsing the markdown, which sometimes results in 300ms.
So this is definitely happening in other places too, but not 100% of the time.
The issue does definitely repro for me (again, not 100% of the time). Please let me know if CPU profiles will help (though we already have these from the user) and what other information you need.
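As an illustration of one possible mitigation (a sketch only, with hypothetical names; this is not how VS Code is implemented), memoizing the rendered result by content would avoid paying the parse/render cost again when the same docstring or outline source is processed repeatedly:

const renderedCache = new Map<string, string>();

function renderMarkdownOnce(markdown: string, render: (md: string) => string): string {
  const cached = renderedCache.get(markdown);
  if (cached !== undefined) {
    return cached;                 // reuse the HTML rendered earlier for identical input
  }
  const html = render(markdown);   // the expensive parse + render happens only once per unique input
  renderedCache.set(markdown, html);
  return html;
}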

@leifwalsh

leifwalsh commented Apr 20, 2024

These are long issues, so forgive me if this isn't new information, but I'm pretty sure I have the same problem described here, and I've noticed something that might help.

When I type code into a cell and then try to execute it, and I get this laggy behavior, I notice inlay error messages like this show up:

image

As I watch (not touching anything, just waiting for the cell to execute), it progressively changes, as if it's analyzing prefixes of the content in the cell, or as if I were typing each character into the cell one by one very slowly. Here's another one (it made progress while I was typing):

image

In this notebook where I just reproduced it, I also had a perfectly fine experience when starting from an empty notebook, for about 13 cells' worth of mixed code and markdown; then around the 14th cell something changed and it started doing the bad thing. It was very sudden, like I'd crossed a threshold or something, which surprised me because previously I'd only seen it sometimes, when opening notebooks I'd already authored (and this reminded me to try reading these issues again).

I'd be happy to share a reproducer (my notebook is just 339KB including some plots), gather diagnostics, etc., but it looks like you already have many? If you want, just point me to a message with instructions for gathering the ones that would help right now; I see you've done that quite a lot already. :)

@DonJayamanne
Contributor Author

Thank you.
We have made a lot of performance improvements in VS Code for editing and executing notebooks.
Please can you install the latest VS Code Insiders and the latest pre-release version of the Jupyter extension and test this out?

Let me know how it goes.

If you still run into issues, please share the CPU logs while you are experiencing them.
Instructions here: #210528 (comment)
