Creating indexes is very slow #23

Closed
AdamGold opened this issue Jun 13, 2019 · 7 comments

Comments

@AdamGold

AdamGold commented Jun 13, 2019

My tests create new DB rows. Before I added the library, the tests took about 20 seconds.
With the library they take about 10 minutes, probably because of index creation. Is there a solution for this?

@AdamGold AdamGold changed the title from "Creating indexes is slow" to "Creating indexes is very slow" Jun 13, 2019
@honmaple
Owner

Can you give some more details? Creating indexes is not slow unless there are a lot of rows.
If there are a lot of rows, you should increase yield_per, for example search.create_index(Model, yield_per=800), when it is first called.
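
To illustrate what a batch size such as yield_per controls, here is a minimal sketch using only the standard library's sqlite3 module. It is an analogy, not flask-msearch's implementation: `iter_in_batches`, the `user` table, and the row count of 2000 are assumptions made up for the example. The idea is that rows are pulled from the database in chunks of a fixed size, so a larger size means fewer round trips when building an index over many rows.

```python
import sqlite3


def iter_in_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Hypothetical in-memory table standing in for a searchable model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO user (username) VALUES (?)",
    [("user{0} name".format(i),) for i in range(2000)],
)

cursor = conn.execute("SELECT id, username FROM user")
# With 2000 rows and a batch size of 800, three fetches are needed.
batch_sizes = [len(batch) for batch in iter_in_batches(cursor, 800)]
```

With only 10-20 rows, as in a small test suite, the batch size is unlikely to matter; it becomes relevant once the table is much larger than one batch.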

@AdamGold
Author

I am not calling create_index explicitly. The tests are set up with a few rows (10-20) and that's it. You can see from stderr that the indexes are constantly being updated throughout the test run: it shows <User ....> index update. When I change SQLALCHEMY_TRACK_MODIFICATIONS to False, the tests are no longer slow.

@honmaple
Owner

Could you test it again with the following code? Or paste your test code so I can see what is happening.

import unittest
from tempfile import mkdtemp
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_msearch import Search


class config:
    SQLALCHEMY_TRACK_MODIFICATIONS = True
    SQLALCHEMY_DATABASE_URI = 'sqlite://'
    DEBUG = True
    TESTING = True
    MSEARCH_INDEX_NAME = mkdtemp()
    MSEARCH_BACKEND = 'whoosh'
    MSEARCH_ENABLE = True


app = Flask(__name__)
app.config.from_object(config)
db = SQLAlchemy(app)
search = Search(app, db=db)


class User(db.Model):
    __searchable__ = ['username']

    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(64))

    def __str__(self):
        return "User({0})".format(self.username)


class SearchTestBase(unittest.TestCase):
    def setUp(self):
        with app.test_request_context():
            db.create_all()

            for i in range(40):
                extra_user = User(username="user{0} name".format(i))
                db.session.add(extra_user)
                db.session.commit()

    def tearDown(self):
        with app.test_request_context():
            db.drop_all()
            db.metadata.clear()

    def test_user(self):
        with app.test_request_context():
            assert User.query.msearch("name", ["username"]).all()


if __name__ == '__main__':
    unittest.main()

@AdamGold
Author

AdamGold commented Jun 16, 2019

Here's my config:

SQLALCHEMY_TRACK_MODIFICATIONS=True,
MSEARCH_INDEX_NAME="msearch",
MSEARCH_BACKEND="whoosh",
MSEARCH_ENABLE=True,
TESTING=True,
DATABASE=db_fd.name,
SQLALCHEMY_DATABASE_URI=f"sqlite:///{db_fd.name}",

I am using pytest, here's the db setup (after create_all):

    user = User.create_from_credentials(
        username="test",
        password="test",
        email="test@test.com",
        verified=True,
        verified_time=datetime.utcnow(),
    )
    get_db().session.add(user)
    extra_user = User.create_from_credentials(
        username="extra_test",
        password="test",
        email="b@test.com",
        verified=True,
        verified_time=datetime.utcnow(),
    )
    get_db().session.add(extra_user)
    get_db().session.commit()

user fixture:

@pytest.fixture
def user(app: flask.Flask):
    with app.app_context():
        yield User.query.filter_by(username="test").one()

User module:

class User(db_instance.Model, UserMixin):  # type: ignore
    """User table"""

    __table_args__ = (
        # named constraint for alembic
        UniqueConstraint("email", name="email_unique"),
    )
    __searchable__ = ["username", "name"]
    __msearch_termclass__ = CustomFuzzyTerm
    MAX_PASSWORD_LENGTH = 100
    MAX_USERNAME_LENGTH = 100
    id = Column(Internal_UID(), primary_key=True, autoincrement=True)
    username = Column(String(MAX_USERNAME_LENGTH), unique=True)
    email = Column(String(255), nullable=True)
    password = Column(String(MAX_PASSWORD_LENGTH))
    name = Column(String(consts.MAX_NAME_LENGTH))
    post_stats = relationship("PostStats", lazy=True, cascade="all")
    networks = relationship("Network", lazy=True, cascade="all")
    created_time = Column(DateTime, default=datetime.utcnow, nullable=False)
    oauth_tokens = relationship("OAuth", cascade="all")
    verified = Column(Boolean, default=False)
    verified_time = Column(DateTime, nullable=True)

The main cause of the slowdown is SQLALCHEMY_TRACK_MODIFICATIONS. When it's set to False, the tests run as usual. Flask-SQLAlchemy once warned that this setting adds significant overhead. Maybe I should set it to False only for tests?

@AdamGold
Author

I have set SQLALCHEMY_TRACK_MODIFICATIONS to False and called create_index manually when I set up the tests. Problem solved.
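
The workaround described above could be organised roughly as follows. This is a sketch under assumptions, not the reporter's actual code: the class names `BaseConfig` and `TestingConfig` and the helper `settings` are hypothetical, and only the configuration keys come from this thread. The test configuration overrides the tracking flag, and the index is then built once during test setup instead of on every commit.

```python
class BaseConfig:
    # Values taken from the configuration posted in this thread.
    SQLALCHEMY_TRACK_MODIFICATIONS = True
    MSEARCH_INDEX_NAME = "msearch"
    MSEARCH_BACKEND = "whoosh"
    MSEARCH_ENABLE = True


class TestingConfig(BaseConfig):
    # Hypothetical test-only override: no per-commit modification tracking.
    TESTING = True
    SQLALCHEMY_TRACK_MODIFICATIONS = False


def settings(config_cls):
    """Collect the upper-case attributes, the same ones Flask's
    app.config.from_object() would pick up."""
    return {
        key: getattr(config_cls, key)
        for key in dir(config_cls)
        if key.isupper()
    }


test_settings = settings(TestingConfig)

# In the real test setup (needs the Flask app, so shown as comments only):
#   app.config.from_object(TestingConfig)
#   ... insert the fixture rows and commit ...
#   search.create_index(User)  # build the index once, after the rows exist
```

Keeping the override in a subclass means the production configuration stays untouched while the tests skip the automatic index updates.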

@honmaple
Owner

Sorry, I haven't found the problem yet. Could you test it again with SQLALCHEMY_TRACK_MODIFICATIONS = True and MSEARCH_ENABLE = False?

@honmaple
Owner

honmaple commented Jun 23, 2019

If the problem persists, you have probably registered some extra signal handlers that are time-consuming.

If the problem goes away, please profile the _index_signal function with cProfile:

import cProfile
import functools
import pstats
from io import StringIO
from flask_msearch.whoosh_backend import WhooshSearch


def profile(func):
    # Profile every call to func and print the 10 entries with the
    # highest cumulative time.
    @functools.wraps(func)
    def _profile(*args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        ret = func(*args, **kwargs)
        pr.disable()
        s = StringIO()
        sortby = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats(10)
        print(s.getvalue())
        return ret

    return _profile


class CustomSearchBackend(WhooshSearch):
    @profile
    def _index_signal(self, sender, changes):
        return super(CustomSearchBackend, self)._index_signal(sender, changes)

search = CustomSearchBackend(app, db=db)
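
Because the index handler may run once per commit, printing a report on every call can be noisy. A possible variation, sketched here with only the standard library, is to accumulate all calls in one profiler and print a single report at the end. The names `make_accumulating_profiler` and `fake_index_signal` are hypothetical; the stand-in function only takes the place of `_index_signal` so that the example runs without Flask.

```python
import cProfile
import functools
import io
import pstats


def make_accumulating_profiler():
    """Return (decorator, report) sharing one cProfile.Profile instance."""
    profiler = cProfile.Profile()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Stats from successive enable/disable cycles accumulate.
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
        return wrapper

    def report(limit=10):
        # Render the top entries, sorted by cumulative time, as a string.
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(limit)
        return stream.getvalue()

    return decorator, report


profiled, report = make_accumulating_profiler()


@profiled
def fake_index_signal(changes):
    # Hypothetical stand-in for the per-commit index update.
    return sorted(changes)


for _ in range(3):
    fake_index_signal([3, 1, 2])

summary = report()
```

In a test suite, `report()` could be called once from a session-level teardown so that the output covers the whole run.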

@honmaple honmaple closed this as completed Mar 6, 2020