Indexation is damned slow with large data #12

Closed
jmrenouard opened this issue Feb 16, 2011 · 3 comments

Comments

@jmrenouard

Hello,

We have been facing issues with the slow indexing process.

We decided to rewrite two methods in FilesystemStore:
rebuildAllIndexes() and rebuild(String name).

Indexing performance is now close to raw Solr/Lucene.

I am putting the code directly in the issue; sorry for not providing a patch.
The getDateTime method is just there to provide some timing information.

private String getDateTime() {
    DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
    Date date = new Date();
    return dateFormat.format(date);
}
public void rebuildAllIndexes() throws Exception {
    Logger.info("Rebuild of all indexes started at " + getDateTime());
    // Stop the store so no writer or searcher touches the index while we wipe it.
    stop();
    // Remove the whole index directory and recreate it empty.
    File fl = new File(DATA_PATH);
    FileUtils.deleteDirectory(fl);
    fl.mkdirs();
    // Rebuild one index per @Indexed application class.
    List<ApplicationClass> classes = Play.classes
            .getAnnotatedClasses(Indexed.class);
    for (ApplicationClass applicationClass : classes) {
        rebuild(applicationClass.javaClass.getName());
    }

    Logger.info("Rebuild of all indexes finished at " + getDateTime());
}

public void rebuild(String name) {
    // Build the new index in a temporary, UUID-suffixed folder and only
    // swap it in place of the old one once it is complete.
    String id = UUID.randomUUID().toString();
    File oldFolder = new File(DATA_PATH, name);
    File newFolder = new File(DATA_PATH, name + id);
    Class<?> cl = Play.classes.getApplicationClass(name).javaClass;
    // Load every entity of this class and index it with a single writer.
    List<JPABase> objects = JPA
            .em()
            .createQuery("select e from " + cl.getCanonicalName() + " as e")
            .getResultList();
    String index = cl.getName() + id;
    IndexWriter indexWriter = getIndexWriter(index);
    // FIXME ensure no other read/writes in here.
    try {
        for (JPABase jpaBase : objects) {
            Document document = ConvertionUtils.toDocument(jpaBase);
            if (document == null)
                continue; // skip objects that cannot be converted to a document
            indexWriter.addDocument(document);
        }

        // Flush the new index once, then drop the cached searcher and writer
        // for the old index and swap the folders on disk.
        indexWriter.flush();
        dirtyReader(index);
        getIndexSearcher(name).close();
        indexSearchers.remove(name);
        getIndexWriter(name).close();
        indexWriters.remove(name);
        FileUtils.deleteDirectory(oldFolder);
        newFolder.renameTo(oldFolder);
    } catch (Exception e) {
        throw new UnexpectedException(e);
    }
}
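
For context, a full rebuild like the one above would typically be kicked off from an administrative action or a background job rather than inline in a request. Below is a minimal sketch using Play 1.x's job API; the store field and its wiring are hypothetical, and the import path for FilesystemStore depends on how the module is laid out in your application:

import play.jobs.Job;

public class RebuildIndexesJob extends Job {

    // Hypothetical reference to the search store; obtain it however your
    // application wires up the search module.
    private final FilesystemStore store;

    public RebuildIndexesJob(FilesystemStore store) {
        this.store = store;
    }

    @Override
    public void doJob() throws Exception {
        // Runs on a job thread, so a long rebuild over large tables does
        // not block incoming HTTP requests.
        store.rebuildAllIndexes();
    }
}

From an admin controller this could then be launched asynchronously with new RebuildIndexesJob(store).now().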
@jfp
Owner

jfp commented Feb 17, 2011

Hi,

Thanks for sharing the code, I'll take a look. (Were you running with play.search.sync=false?)

Regards

@jmrenouard
Author

Yes:

play.search.sync=false

For 45,000 objects we previously needed more than 5 minutes; now it takes 5 to 6 seconds.

About 10 seconds are needed when information about each indexed document is printed.

Best Regards,
Jean-Marie Renouard
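
(As a reference for anyone reproducing this: the thread only confirms that play.search.sync=false was used for the fast run. In a Play 1.x application the flag would normally be set in conf/application.conf; a minimal sketch, assuming only the property name quoted above and the standard config file location:)

# conf/application.conf
# Assumed to control whether documents are indexed synchronously on save;
# the thread above only confirms that false was used for the fast rebuild.
play.search.sync=false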

@jfp
Owner

jfp commented Feb 18, 2011

Hi,

It's integrated in e69fd1e

Thanks again for sharing!

This issue was closed.