Streamlit startup time could be reduced from 1s to 400ms #6066

Closed
6 of 7 tasks
sfc-gh-tteixeira opened this issue Feb 6, 2023 · 5 comments · Fixed by #8150
Labels
area:performance type:enhancement Requests for feature enhancements or new features

Comments

@sfc-gh-tteixeira
Contributor

sfc-gh-tteixeira commented Feb 6, 2023

Problem

The time between streamlit run foo.py and having a blank app load in the browser feels a bit slow nowadays!

Based on a quick analysis, a simple way to cut startup time by 60% would be to lazy-load certain imports.

Methodology

  1. On a new environment, install Streamlit
  2. Call python -X importtime -c 'import streamlit' 2> latency.log
  3. Look at latency.log by hand or with Tuna.
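For anyone who prefers to inspect the log programmatically rather than by hand, the steps above can be sketched as a small parser. This is a minimal sketch that assumes the `-X importtime` line layout shown in this issue (`import time: <self us> | <cumulative us> | <indented module>`); the names `ImportRecord`, `parse_importtime`, and `top_imports` are made up for illustration.

```python
import re
from typing import NamedTuple


class ImportRecord(NamedTuple):
    self_us: int        # time spent in the module itself, in microseconds
    cumulative_us: int  # time including nested imports, in microseconds
    depth: int          # nesting level, derived from the indentation
    module: str


# Matches e.g. "import time:      2062 |     240916 |     pandas".
# The header line ("self [us] | cumulative | ...") has no digits and is skipped.
_LINE_RE = re.compile(r"import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s(\s*)(\S+)")


def parse_importtime(log_text: str) -> list[ImportRecord]:
    """Parse the stderr output of `python -X importtime`."""
    records = []
    for line in log_text.splitlines():
        match = _LINE_RE.search(line)
        if match is None:
            continue
        self_us, cumulative_us, indent, module = match.groups()
        # CPython indents nested imports by two spaces per level.
        records.append(
            ImportRecord(int(self_us), int(cumulative_us), len(indent) // 2, module)
        )
    return records


def top_imports(records: list[ImportRecord], n: int = 10) -> list[ImportRecord]:
    """Return the n records with the largest cumulative time."""
    return sorted(records, key=lambda r: r.cumulative_us, reverse=True)[:n]
```

Reading `latency.log` and passing its text to `parse_importtime` gives a list that can be sorted or filtered by depth to find the most expensive imports.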

Interestingly, even if you run this multiple times, there's no real change in the numbers.

Machine: M2 MacBook Pro

Results

The result of step 2 above is attached as latency.log.

Also, here's a pretty chart:
[Image: Screenshot 2023-02-05 at 22 01 09]

Findings

Streamlit takes 1s to initialize:

import time:       641 |    1011102 | streamlit

Major culprits:

  • Importing pandas inside type_util: 240ms

    import time:      2062 |     240916 |                           pandas
    import time:       927 |     281330 |                         streamlit.type_util
    import time:      2104 |     440877 |                       streamlit.runtime.state.session_state
    import time:       320 |     441196 |                     streamlit.runtime.state.safe_session_state
    import time:       175 |     442065 |                   streamlit.runtime.state
    import time:        11 |     442076 |                 streamlit.runtime.state.session_state
    import time:       444 |     465120 |             streamlit.runtime.app_session
    import time:       785 |     506798 |           streamlit.runtime.runtime
    import time:       146 |     506944 |         streamlit.runtime
    import time:        11 |     506954 |       streamlit.runtime.scriptrunner
    import time:       193 |     507147 |     streamlit.cursor
    
  • Importing pandas.io.formats.style in streamlit.elements.arrow: 209ms

    import time:       957 |     209657 |       pandas.io.formats.style
    import time:       278 |     209934 |     streamlit.elements.arrow
    
  • Setting up the plotly theme: 49ms

    import time:     46609 |      49771 |       streamlit.elements.lib.streamlit_plotly_theme
    import time:      2810 |      54590 |     streamlit.elements.plotly_chart
    
  • Importing altair inside of arrow_altair (mostly to set up theme): 48ms

    import time:       345 |      48145 |       altair
    import time:       124 |        124 |           streamlit.elements.lib
    import time:       264 |        388 |         streamlit.elements.lib.dicttools
    import time:       293 |        680 |       streamlit.elements.arrow_vega_lite
    import time:       300 |        300 |         streamlit.elements.form
    import time:       197 |        497 |       streamlit.elements.utils
    import time:       462 |      49782 |     streamlit.elements.arrow_altair
    
  • Loading the validators library: 40ms

    import time:       309 |      40592 |       validators
    import time:       413 |      41005 |     streamlit.elements.media
    
  • Importing requests inside streamlit.version: 52ms

    import time:       364 |      51844 |     requests
    import time:      1632 |      59739 |   streamlit.version
    
  • String util is slow to import probably due to emoji computation: 27ms

    import time:     26409 |      27205 |       streamlit.string_util
    import time:       168 |      27587 |     streamlit.file_util
    import time:       729 |      58916 |   streamlit.config
    

Proposal

  1. For some of the culprits above, we can just move the offending import out of the file's root scope and into the actual scope where it's used.
  2. For the files where we import modules to set up Altair and Plotly theming, stop setting up the themes globally at import time and, instead, change the Altair and Plotly figures to use our theme before marshalling them.
  3. For the emoji computation: move EMOJI_EXTRACTION_REGEX into a functools.cache'd function.
  4. Consider no longer checking the Streamlit version upstream to prompt people to upgrade. We do this with 10% probability on every run, and it requires requests (52ms), but it's unclear whether this actually causes a nontrivial number of people to upgrade.
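
Item 1 of the proposal can be illustrated with a minimal sketch. It uses the standard-library `statistics` module as a stand-in for a heavy dependency such as pandas, and the function name `summarize` is hypothetical, not part of Streamlit.

```python
def summarize(values: list[float]) -> dict[str, float]:
    """Return the mean and median of a list of numbers."""
    # Deferred import: the cost is paid on the first call instead of when
    # this file is imported. Later calls reuse the entry in sys.modules,
    # so the repeated import statement is cheap.
    import statistics

    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }
```

The trade-off is that the first call to such a function becomes slower, and import errors surface at call time rather than at startup.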

Community voting on feature requests enables the Streamlit team to understand which features are most important to our users.

If you'd like the Streamlit team to prioritize this feature request, please use the 👍 (thumbs up emoji) reaction in response to the initial post.

sfc-gh-tteixeira added the type:enhancement label Feb 6, 2023
sfc-gh-tteixeira changed the title from "Streamlit startup time could be reduced from 1s to 600ms" to "Streamlit startup time could be reduced from 1s to 400ms" Feb 6, 2023
@cmayoracurzio

Thanks for analyzing this! Some time ago I opened enhancement request #5798 with the same goal in mind (reduce startup time) but mostly focused on the size and quantity of Streamlit core static files. Sharing here because looking at both (python imports and static files) might be worthwhile.

@LukasMasuch
Collaborator

LukasMasuch commented Feb 7, 2023

Thanks for the investigation 👍 I did a quick check. I think besides the Plotly theme, all other proposed changes are viable.

As far as I remember, in order to apply our Streamlit chart theme in the frontend we need to apply changes to the global Plotly theme before any chart object is created (fyi @willhuang1997).

Regarding Altair theme: removing it might break a few apps (that's why we kept it in), but it was never an official feature anyways. So, I assume this isn't a problem.

@sfc-gh-tteixeira
Contributor Author

We can always do this in parts: first the easy ones, then the themes (if possible).

@LukasMasuch
Collaborator

LukasMasuch commented May 23, 2023

Update: the altair dependency was moved to lazy loading in #6618, and the pandas styler, validators, and requests imports were moved in #6531. This will be part of the 1.23 release, giving us a small boost in startup performance.

LukasMasuch added a commit that referenced this issue Feb 8, 2024
## Describe your changes

The emoji data is the biggest object when running a blank Streamlit app.
Compiling the regex is also slightly expensive. However, the emoji data
is only required if there is a check for emojis; many apps might not
require this. Therefore, this PR lazy-loads the emoji module only if it
is actually required.

This also adds a precheck for emoji checks to make sure that the string
even contains non-alphanumeric characters before using the more
expensive emoji regex.

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Added e2e test to check that some lazy-loaded modules are not imported
in an almost blank Streamlit app.

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
LukasMasuch added a commit that referenced this issue Feb 8, 2024
## Describe your changes

A small refactoring related to the creation of the plotly theme. Instead
of using a module for the plotly theme, we are capturing the creation in
a method so that it behaves better with lazy-loading of plotly.

There aren't any other logical changes in here related to the Plotly
theme.

This refactor also applies a couple of other small refactorings related
to the imports.

Related to #6066

LukasMasuch added a commit that referenced this issue Feb 8, 2024
## Describe your changes

If a user configures `server.fileWatcherType` to none or poll, there
isn't any reason we need to import the watchdog package, even if it is
installed on the system. This PR applies a couple of small refactorings
to make the `event_based_path_watcher` lazy-loaded (only imported if
actually needed).

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Added e2e test to make sure that `watchdog` and
`streamlit.watcher.event_based_path_watcher` are lazy loaded.

LukasMasuch added a commit that referenced this issue Feb 9, 2024
## Describe your changes

The vendored pympler module is only used when someone explicitly
requests the metrics via the metrics endpoint. This PR moves the module
to lazy loading.

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Added e2e test to ensure that vendored `pympler` module is lazy
loaded.
@LukasMasuch
Collaborator

LukasMasuch commented Feb 9, 2024

Removed a couple of dependencies:

And lazy-loaded a couple of other modules:

This reduces the import time to as low as ~200ms:

[Image: Screenshot 2024-02-09 at 15 10 31]

And will enable significant speed ups related to the loading time of stlite 🥳 We also now have an e2e test to make sure that we don't accidentally import any of the lazy-loaded packages.
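
The idea behind such a check can be sketched in plain Python: run the import in a fresh interpreter and report which of the supposedly lazy modules ended up in `sys.modules`. This is only an illustration; the helper name `eagerly_imported` is hypothetical, and the actual e2e test in the repository may work differently.

```python
import subprocess
import sys


def eagerly_imported(import_stmt: str, candidates: list[str]) -> list[str]:
    """Return the candidates that are loaded after running import_stmt.

    The statement runs in a fresh interpreter so that modules already
    imported by the current process do not distort the result.
    """
    probe = "\n".join(
        [
            "import sys",
            import_stmt,
            f"loaded = [m for m in {candidates!r} if m in sys.modules]",
            "print(' '.join(loaded))",
        ]
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # raise if the import itself fails
    )
    return result.stdout.split()
```

For Streamlit, a call along the lines of `eagerly_imported("import streamlit", ["pandas", "pyarrow", "matplotlib"])` would be expected to return an empty list once the modules are lazy-loaded.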

LukasMasuch added a commit that referenced this issue Feb 9, 2024
## Describe your changes

Lazy-load `pandas` and `pyarrow` only when required (e.g. usage of
`st.dataframe`).
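
One common way to keep a heavy library out of the import path while still recognizing its objects is to compare fully qualified type names instead of calling `isinstance`. The sketch below uses `fractions.Fraction` as a stand-in for a pandas type, and the helper names `is_type` and `numerator_or_none` are hypothetical; whether this PR uses exactly this approach is an assumption.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime, so the
    # (stand-in) heavy module is not imported when this file is loaded.
    from fractions import Fraction


def is_type(obj: Any, fully_qualified_name: str) -> bool:
    """Check an object's type by name, without importing its module."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}" == fully_qualified_name


def numerator_or_none(value: "Fraction | float") -> "int | None":
    # If the value really is a Fraction, its module was already imported
    # by whoever created it, so nothing new is loaded here.
    if is_type(value, "fractions.Fraction"):
        return value.numerator
    return None
```

The name comparison does not match subclasses, which is a limitation to keep in mind with this pattern.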

This PR also includes a couple of other small refactorings related to
typing and imports.

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Added e2e test to ensure that `pyarrow` and `pandas` are lazy-loaded. 
LukasMasuch added a commit that referenced this issue Feb 10, 2024
## Describe your changes

The deprecation of `runner.fixMatplotlib` and the decision to always
use the `Agg` backend made it possible to just configure the matplotlib
backend via the config option (also see the previous TODO comment). This
prevents an unnecessary import of matplotlib at the server start and
allows this import to be lazy-loaded.
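
As an illustration of selecting a backend without importing matplotlib, the `MPLBACKEND` environment variable can be set before matplotlib is first imported. Whether this PR uses exactly this mechanism is an assumption, and the function name below is hypothetical.

```python
import os
import sys


def configure_matplotlib_backend(backend: str = "Agg") -> bool:
    """Request a matplotlib backend without importing matplotlib.

    Returns True if matplotlib has not been imported yet, meaning the
    setting is in place before matplotlib reads its configuration.
    """
    # matplotlib consults MPLBACKEND when it is first imported.
    os.environ["MPLBACKEND"] = backend
    return "matplotlib" not in sys.modules
```

Calling this during server startup keeps matplotlib out of `sys.modules` until an app actually imports it.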

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Added unit and e2e tests to make sure that `matplotlib` is properly
lazy-loaded.

LukasMasuch added a commit that referenced this issue Feb 12, 2024
## Describe your changes

Lazy-load `numpy` and `pillow` only when required (e.g. usage of
`st.image`).

This PR also includes a couple of other small refactorings related to
typing and imports.

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Added e2e test to ensure that `numpy` and `pillow` are lazy-loaded. 
LukasMasuch added a commit that referenced this issue Feb 13, 2024
## Describe your changes

Lazy-load `click` and `toml` dependencies. `click` will always be loaded
when Streamlit is started via its CLI which it will be in most use
cases. But it can also run without having `click` installed if the app
is not started via CLI (e.g. in stlite).

This PR also includes a couple of other small refactorings related to
typing and imports.

## GitHub Issue Link (if applicable)

Related to #6066

## Testing Plan

- Update unit tests and add `toml` to e2e test to check that it's not
loaded yet.
- We cannot do the same for `click`, since the e2e tests use the CLI to
start Streamlit.

LukasMasuch added a commit that referenced this issue Feb 13, 2024
## Describe your changes

This is the final lazy-loading PR for now. It lazy-loads the following
modules:

- `unittest`
- `packaging`
- `streamlit.proto.openmetrics_data_model_pb2`

This PR also includes a couple of other refactorings related to typing
and imports.

## GitHub Issue Link (if applicable)

- Closes #6066

## Testing Plan

- Add lazy-loaded modules to e2e test.
