Stores occurrences of methods in Lucene #19

Merged
merged 1 commit into scala-ide:master on Mar 25, 2013

4 participants

@mads-hartmann

The plugin now keeps an Index (using Lucene) of all the places
methods are mentioned in the code-base. A separate Lucene index
is created on disk for each project.

The initial indices will be created upon start-up if no index
exists already; the plugin indexes the entire workspace, traversing
the parse trees of all Scala sources.

For each project in the workspace we create a background job that
continuously updates the index, re-indexing files that have changed.

There are two separate test suites: OccurrenceCollectorTest contains tests
that make sure we record the correct information from the parse trees,
and IndexTest makes sure the occurrence information can be stored in
and retrieved from Lucene.

Fix #1001602

@mads-hartmann

Seems to be a problem with the validator, not the source code :)

@dotta
Eclipse Scala IDE member

Seems to be a problem with the validator, not the source code :)

@mads379 I'll check the Jenkins Job used by the validator, but it's good we have it hooked. Thanks @jsuereth!

@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...g/scala/tools/eclipse/search/FileChangeObserver.scala
+import org.eclipse.core.resources.IResourceChangeEvent
+import org.eclipse.core.resources.IResourceChangeListener
+import org.eclipse.core.resources.IResourceDelta
+import org.eclipse.core.resources.IResourceDeltaVisitor
+import org.eclipse.core.resources.ResourcesPlugin
+
+/**
+ * Convenient way to react to changes that happen to
+ * files in eclipse projects in the workspace.
+ */
+class FileChangeObserver(
+ project: IProject,
+ onFileChange: IFile => Unit) extends HasLogger {
+
+ ResourcesPlugin.getWorkspace().addResourceChangeListener(
+ new FileChangeObserver.ChangeListener(project, onFileChange),
@dotta
dotta Mar 21, 2013

What happens if the project is deleted or closed?

@dotta
dotta Mar 21, 2013

Also, I don't see how this class is useful. At the moment, you could inline this code in the apply method of the FileChangeObserver companion object.

@mads-hartmann
mads-hartmann Mar 21, 2013

Oh. Yeah, that would be much better! :)

@dotta dotta commented on an outdated diff Mar 21, 2013
...g/scala/tools/eclipse/search/FileChangeObserver.scala
+ new FileChangeObserver(p,onFileChange)
+ }
+
+ // Hiding the Eclipse implementation here as we don't want the methods that
+ // Eclipse needs to leak into the interface of FileChangeObserver.
+ private class ChangeListener(
+ project: IProject,
+ onFileChange: IFile => Unit) extends IResourceChangeListener with HasLogger {
+
+ override def resourceChanged(event: IResourceChangeEvent): Unit = {
+ if (event == null || event.getDelta() == null)
+ return
+
+ val delta = event.getDelta()
+ delta.accept(new IResourceDeltaVisitor {
+ def visit(delta: IResourceDelta): Boolean = {
@dotta
dotta Mar 21, 2013

Minor point, but you could create an implicit for IResourceDelta => IResourceDeltaVisitor (it's a lot of ceremony - Java crap - for setting up a listener :))
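
For illustration, a minimal sketch of such a conversion (the PR later imports an EclipseImplicits object, but its contents are not shown in this thread, so the exact shape below is an assumption):

import scala.language.implicitConversions
import org.eclipse.core.resources.{ IResourceDelta, IResourceDeltaVisitor }

object EclipseImplicits {
  // Lets a plain function be passed where Eclipse expects an IResourceDeltaVisitor.
  implicit def deltaVisitor(f: IResourceDelta => Boolean): IResourceDeltaVisitor =
    new IResourceDeltaVisitor {
      def visit(delta: IResourceDelta): Boolean = f(delta)
    }
}

With the conversion in scope, delta.accept can be given a function literal directly instead of an anonymous visitor.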

@dotta dotta commented on an outdated diff Mar 21, 2013
...g/scala/tools/eclipse/search/FileChangeObserver.scala
+ // Eclipse needs to leak into the interface of FileChangeObserver.
+ private class ChangeListener(
+ project: IProject,
+ onFileChange: IFile => Unit) extends IResourceChangeListener with HasLogger {
+
+ override def resourceChanged(event: IResourceChangeEvent): Unit = {
+ if (event == null || event.getDelta() == null)
+ return
+
+ val delta = event.getDelta()
+ delta.accept(new IResourceDeltaVisitor {
+ def visit(delta: IResourceDelta): Boolean = {
+ val resource = delta.getResource
+ resource.getType match {
+ case IResource.FILE => {
+ val file = resource.asInstanceOf[IFile]
@dotta
dotta Mar 21, 2013

Casting hurts my brain :)

My suggestion is

if (resource.isInstanceOf[IFile]) onFileChange(resource.asInstanceOf[IFile])
else eclipseLog.error(s"Expected resource to be of type IFile, found ${resource.getClass}")

And you could go one step further and generalize this into a utility method that can be used to cast safely.

(The above code might very well not compile, but I think you'll get the idea)
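
A minimal sketch of such a utility, using a ClassTag-based check (the Casts object and the name asInstanceOfOpt are hypothetical, not from the PR):

import scala.reflect.ClassTag

object Casts {
  // Returns Some(value) when `any` is an instance of A, None otherwise.
  def asInstanceOfOpt[A: ClassTag](any: Any): Option[A] = any match {
    case a: A => Some(a)
    case _    => None
  }
}

The visitor could then do Casts.asInstanceOfOpt[IFile](resource).fold(eclipseLog.error(s"Expected resource to be of type IFile, found ${resource.getClass}"))(onFileChange).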

@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+import org.eclipse.core.resources.ResourcesPlugin
+import org.eclipse.core.runtime.jobs.Job
+import org.eclipse.ui.plugin.AbstractUIPlugin
+import org.osgi.framework.BundleContext
+import org.scala.tools.eclipse.search.indexing.Index
+import org.scala.tools.eclipse.search.indexing.SourceIndexer
+import org.scala.tools.eclipse.search.jobs.ProjectIndexJob
+import scala.collection.concurrent.TrieMap
+import scala.collection.concurrent.Map
+import org.eclipse.core.resources.IProject
+import scala.tools.eclipse.ScalaProject
+import scala.tools.eclipse.ScalaPlugin
+
+object SearchPlugin extends HasLogger {
+
+ val PLUGIN_ID = "org.scala.tools.eclipse.search"
@dotta
dotta Mar 21, 2013

This is a constant, hence you should mark it as final (check the Scala spec, chapter 4.1 if I'm not mistaken :))

@dotta dotta commented on an outdated diff Mar 21, 2013
...g/scala/tools/eclipse/search/FileChangeObserver.scala
+ private class ChangeListener(
+ project: IProject,
+ onFileChange: IFile => Unit) extends IResourceChangeListener with HasLogger {
+
+ override def resourceChanged(event: IResourceChangeEvent): Unit = {
+ if (event == null || event.getDelta() == null)
+ return
+
+ val delta = event.getDelta()
+ delta.accept(new IResourceDeltaVisitor {
+ def visit(delta: IResourceDelta): Boolean = {
+ val resource = delta.getResource
+ resource.getType match {
+ case IResource.FILE => {
+ val file = resource.asInstanceOf[IFile]
+ onFileChange(file)
@dotta
dotta Mar 21, 2013

Shouldn't you also track whether the resource was changed/added/removed? I'd expect the action triggered in reaction to a file addition to be completely different from the one executed on file deletion.

@dotta dotta commented on an outdated diff Mar 21, 2013
...g/scala/tools/eclipse/search/FileChangeObserver.scala
+
+object FileChangeObserver {
+
+ def apply(p: IProject)(onFileChange: IFile => Unit): Unit = {
+ new FileChangeObserver(p,onFileChange)
+ }
+
+ // Hiding the Eclipse implementation here as we don't want the methods that
+ // Eclipse needs to leak into the interface of FileChangeObserver.
+ private class ChangeListener(
+ project: IProject,
+ onFileChange: IFile => Unit) extends IResourceChangeListener with HasLogger {
+
+ override def resourceChanged(event: IResourceChangeEvent): Unit = {
+ if (event == null || event.getDelta() == null)
+ return
@dotta
dotta Mar 21, 2013

This is very Java-ish :)

Why not use a for-comprehension?

for {
  ev <- Option(event)
  delta <- Option(ev.getDelta)
} {
  // your code ... 
}
@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+import org.eclipse.core.resources.IProject
+import scala.tools.eclipse.ScalaProject
+import scala.tools.eclipse.ScalaPlugin
+
+object SearchPlugin extends HasLogger {
+
+ val PLUGIN_ID = "org.scala.tools.eclipse.search"
+
+ @volatile var plugin: SearchPlugin = _
+
+ /**
+ * Checks if the `file` is a type we know how to index.
+ */
+ def isIndexable(file: IFile): Boolean = {
+ // TODO: At some point we want to make the acceptable files extensible.
+ // such that frameworks such as play can have their template files indexed.
@dotta
dotta Mar 21, 2013

Good point. Please file a ticket about this.

@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+import scala.tools.eclipse.ScalaProject
+import scala.tools.eclipse.ScalaPlugin
+
+object SearchPlugin extends HasLogger {
+
+ val PLUGIN_ID = "org.scala.tools.eclipse.search"
+
+ @volatile var plugin: SearchPlugin = _
+
+ /**
+ * Checks if the `file` is a type we know how to index.
+ */
+ def isIndexable(file: IFile): Boolean = {
+ // TODO: At some point we want to make the acceptable files extensible.
+ // such that frameworks such as play can have their template files indexed.
+ file.getFileExtension() == "scala"
@dotta
dotta Mar 21, 2013

Can the file suddenly become null? If yes, you need to protect yourself against this.

@mads-hartmann
mads-hartmann Mar 21, 2013

Heh, the IFile documentation doesn't say anything about it. Maybe add a null check just to be sure?

@dotta
dotta Mar 21, 2013

Yep. Since this method can be called by anyone, I think it's good to be defensive.
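
A minimal sketch of the defensive version being discussed (the surrounding object is only for the example; the real method lives in SearchPlugin):

import org.eclipse.core.resources.IFile

object IndexableCheck {
  def isIndexable(file: IFile): Boolean =
    file != null && file.getFileExtension() == "scala"
}

Note that getFileExtension() can itself return null for files without an extension, which the == "scala" comparison already handles safely.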

@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+
+ @volatile var plugin: SearchPlugin = _
+
+ /**
+ * Checks if the `file` is a type we know how to index.
+ */
+ def isIndexable(file: IFile): Boolean = {
+ // TODO: At some point we want to make the acceptable files extensible.
+ // such that frameworks such as play can have their template files indexed.
+ file.getFileExtension() == "scala"
+ }
+}
+
+class SearchPlugin extends AbstractUIPlugin with HasLogger {
+
+ private val INDEX_DIR_NAME = "lucene-indices"
@dotta
dotta Mar 21, 2013

This is a constant, declare it final

@dotta dotta commented on an outdated diff Mar 21, 2013
....search/src/org/scala/tools/eclipse/search/Util.scala
+import scala.tools.eclipse.javaelements.ScalaSourceFile
+import org.eclipse.core.resources.IFile
+import org.eclipse.core.resources.ResourcesPlugin
+import org.eclipse.core.runtime.Path
+
+/**
+ * Object that contains various utility methods
+ */
+object Util extends HasLogger {
+
+ /**
+ * Will evaluate f and log the message `msg` together with the
+ * number of seconds it took to evaluate `f`. It returns the value
+ * computed by `f`
+ */
+ def timed[A](msg: String)(f: => A) = {
@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
....search/src/org/scala/tools/eclipse/search/Util.scala
+ */
+object Util extends HasLogger {
+
+ /**
+ * Will evaluate f and log the message `msg` together with the
+ * number of seconds it took to evaluate `f`. It returns the value
+ * computed by `f`
+ */
+ def timed[A](msg: String)(f: => A) = {
+ val now = System.currentTimeMillis()
+ f
+ val elapsed = ((System.currentTimeMillis() - now) * 0.001)
+ logger.debug("%s took %s seconds".format(msg, elapsed.toString))
+ }
+
+ def scalaSourceFileFromIFile(file: IFile): Option[ScalaSourceFile] = {
@dotta
dotta Mar 21, 2013

I think it would be better to have this method directly in ScalaSourceFile (@dragos, WDYT?)

@dotta
dotta Mar 21, 2013

Actually, I take it back. We need to keep it here or the plug-in won't be usable for people that are using the IDE 3.0.0.

@dragos
dragos Mar 21, 2013

Doesn't it make sense to use scalaProject.withSourceFile instead? You can get a compilation unit from a path, see TestProjectSetup.

@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+
+ private val INDEX_DIR_NAME = "lucene-indices"
+
+ private var indexLocation: File = _
+ private var index: Index = _
+ private var sourceIndexer: SourceIndexer = _
+
+ override def start(context: BundleContext) {
+ SearchPlugin.plugin = this
+ super.start(context)
+
+ indexLocation = getStateLocation().append(INDEX_DIR_NAME).toFile()
+ index = new Index(indexLocation)
+ sourceIndexer = new SourceIndexer(index)
+
+ ResourcesPlugin.getWorkspace().getRoot().getProjects().foreach( createIndexJob )
@dotta
dotta Mar 21, 2013
  • ResourcesPlugin.getWorkspace() can throw an IllegalStateException, which I think you should catch and maybe print a debug message.
  • Can the projects returned by getProjects contain null values?
@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+ * Checks if the `file` is a type we know how to index.
+ */
+ def isIndexable(file: IFile): Boolean = {
+ // TODO: At some point we want to make the acceptable files extensible.
+ // such that frameworks such as play can have their template files indexed.
+ file.getFileExtension() == "scala"
+ }
+}
+
+class SearchPlugin extends AbstractUIPlugin with HasLogger {
+
+ private val INDEX_DIR_NAME = "lucene-indices"
+
+ private var indexLocation: File = _
+ private var index: Index = _
+ private var sourceIndexer: SourceIndexer = _
@dotta
dotta Mar 21, 2013

Are you sure that the above three fields don't need a synchronization policy? Basically, are you expecting that the thread running the start method is the same as the one calling createIndexJob and indexLocationForProject? I think not, hence you need a synchronization policy (and you need SearchPlugin to be thread-safe, as its instance is going to be accessed concurrently by different threads).

When you reason about this, think about whether you only need to protect yourself against visibility issues or whether you need a stronger synchronization policy (before getting to the code, it would be good if you shared your thoughts ;-))

@mads-hartmann
mads-hartmann Mar 21, 2013

Ah, I see. I'm not used to thinking about visibility issues! :)

Making them volatile would take care of the visibility issues, right? But I might want to fix it in another way. These three instances should really be vals, as we don't expect them to change, but currently we can't make them vals because the start method needs to be invoked before getStateLocation() can be used. So we could do something like this:

@volatile var config = _
// later
config = {
  val indexLocation = ...
  val index = ...
  val indexer = ..
}

Various components depend on the index, indexer, etc., and currently we deal with this by passing the instances along as constructor parameters. I feel like there must be a better way.

@dotta
dotta Mar 21, 2013

Making them volatile would take care of the visibility issues, right?

Yes, that's exactly it!

Then, I don't exactly follow the config business. But I believe you understood the problem, so I'll just wait to see the new commit :)

@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+ indexLocation = getStateLocation().append(INDEX_DIR_NAME).toFile()
+ index = new Index(indexLocation)
+ sourceIndexer = new SourceIndexer(index)
+
+ ResourcesPlugin.getWorkspace().getRoot().getProjects().foreach( createIndexJob )
+ }
+
+ override def stop(context: BundleContext) {
+ super.stop(context)
+ SearchPlugin.plugin = null
+ }
+
+ /**
+ * Creates an indexing job for the specific ScalaProject.
+ */
+ def createIndexJob(project: IProject): Unit = {
@dotta
dotta Mar 21, 2013

Looks like this can be private. Also, why not move this into ProjectIndexJob.apply? To me it looks like this method "describes" how to create instances of ProjectIndexJob, i.e., it's a factory for ProjectIndexJob, so it makes a lot of sense to have this code in the companion object of ProjectIndexJob. WDYT?

If you like the idea, I would also make the constructor of class ProjectIndexJob private. The reason is that you are making it explicit (in the code) that instances of ProjectIndexJob can only be created via the factory method in the companion object. This means that if tomorrow you need to change something in the way ProjectIndexJob instances are created, you can do so confidently, because there is only one place where instances of ProjectIndexJob are created. Locality FTW :)
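
The general shape of that suggestion, with illustrative names only (the real ProjectIndexJob factory and private constructor appear further down in this thread):

class Worker private (val interval: Long)

object Worker {
  // The single place where Worker instances are created and configured.
  def apply(interval: Long = 5000): Worker = new Worker(interval)
}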

@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+ }
+
+ override def stop(context: BundleContext) {
+ super.stop(context)
+ SearchPlugin.plugin = null
+ }
+
+ /**
+ * Creates an indexing job for the specific ScalaProject.
+ */
+ def createIndexJob(project: IProject): Unit = {
+ ScalaPlugin.plugin.asScalaProject(project).foreach { sp =>
+ val job = new ProjectIndexJob(sourceIndexer, sp, interval = 5000)
+ job.schedule()
+ job.setPriority(Job.LONG)
+ logger.debug("Started job for " + project.getName)
@dotta
dotta Mar 21, 2013
  • This line should go in ProjectIndexJob.runInWorkspace.
  • Better to say what kind of job you started, i.e., "Started indexing job ..."
@mads-hartmann
mads-hartmann Mar 21, 2013

If we put this inside of ProjectIndexJob.runInWorkspace it will be printed every interval as the job schedules itself for execution :)

@dotta
dotta Mar 21, 2013

Right you are! :)

@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+
+ ResourcesPlugin.getWorkspace().getRoot().getProjects().foreach( createIndexJob )
+ }
+
+ override def stop(context: BundleContext) {
+ super.stop(context)
+ SearchPlugin.plugin = null
+ }
+
+ /**
+ * Creates an indexing job for the specific ScalaProject.
+ */
+ def createIndexJob(project: IProject): Unit = {
+ ScalaPlugin.plugin.asScalaProject(project).foreach { sp =>
+ val job = new ProjectIndexJob(sourceIndexer, sp, interval = 5000)
+ job.schedule()
@dotta
dotta Mar 21, 2013

The call to schedule should happen after you have called setPriority and set any other additional properties you may need (otherwise I think there is a chance of a race condition).

@dragos dragos commented on an outdated diff Mar 21, 2013
....search/src/org/scala/tools/eclipse/search/Util.scala
+ def timed[A](msg: String)(f: => A) = {
+ val now = System.currentTimeMillis()
+ f
+ val elapsed = ((System.currentTimeMillis() - now) * 0.001)
+ logger.debug("%s took %s seconds".format(msg, elapsed.toString))
+ }
+
+ def scalaSourceFileFromIFile(file: IFile): Option[ScalaSourceFile] = {
+ val path = file.getFullPath().toOSString()
+ ScalaSourceFile.createFromPath(path)
+ }
+
+ def iFileFromAbsolutePath(path: String): Option[IFile] = {
+ val workspace = ResourcesPlugin.getWorkspace()
+ val location= Path.fromOSString(path)
+ Option(workspace.getRoot().getFileForLocation(location))
@dragos
dragos Mar 21, 2013

This worries me. If possible, avoid absolute paths. They are evil. Try to keep everything relative to the workspace. The problem is that an OS file can be mapped to more than one file in the workspace (quite often if you have linked resources and nested projects; Maven does that). The getFileForLocation docs say:

     Warning: This method ignores linked resources and their children.  Since
 linked resources may overlap other resources, a unique mapping from a
 file system location to a single resource is not guaranteed.  To find all 
 resources for a given location, including linked resources, use the method
@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+ ResourcesPlugin.getWorkspace().getRoot().getProjects().foreach( createIndexJob )
+ }
+
+ override def stop(context: BundleContext) {
+ super.stop(context)
+ SearchPlugin.plugin = null
+ }
+
+ /**
+ * Creates an indexing job for the specific ScalaProject.
+ */
+ def createIndexJob(project: IProject): Unit = {
+ ScalaPlugin.plugin.asScalaProject(project).foreach { sp =>
+ val job = new ProjectIndexJob(sourceIndexer, sp, interval = 5000)
+ job.schedule()
+ job.setPriority(Job.LONG)
@dotta
dotta Mar 21, 2013

What happens if the project gets deleted/closed while this job is running? Is your code robust against that?

The solution to the above is usually either having a try..catch or calling setRule on the job (similar to what we do here https://github.com/scala-ide/scala-ide/blob/master/org.scala-ide.sdt.core/src/scala/tools/eclipse/buildmanager/ProjectsCleanJob.scala#L37 - you could use a modifyRule). At the moment I'm not sure which one is best. I'm wondering if it makes sense to prevent the user from deleting/closing a project because we are indexing it. On the other hand, setting a rule makes the code robust and it's clean. Considering indexing doesn't take much time, my current suggestion would be to try setting a rule.

@mads-hartmann
mads-hartmann Mar 21, 2013

The job itself currently uses openAndExists whenever it is about to index a file and before it re-schedules itself. Are rules a better way to deal with this?

@dotta
dotta Mar 21, 2013

Rules actually lock the resource and disallow certain operations. openAndExists is really useful, but it doesn't guarantee you that a project doesn't get deleted/closed right after the condition is checked.

@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+import org.eclipse.core.resources.IProject
+import org.scala.tools.eclipse.search.SearchPlugin
+
+/**
+ * A Lucene based index of all occurrences of Scala entities recorded
+ * in the workspace. See `toDocument` for more information about what
+ * is stored in the index.
+ *
+ * A separate Lucene index is stored on disk for each project in the
+ * Workspace.
+ *
+ * This class is thread-safe.
+ */
+class Index(indicesRoot: File) extends HasLogger {
+
+ private val analyzer = new SimpleAnalyzer(Version.LUCENE_41)
@dotta
dotta Mar 21, 2013

Mads, I couldn't find any information on the threading policy of SimpleAnalyzer. Could you point me to it?

Also, Lucene 4.2 was recently released; is there a Version.LUCENE_42? Should we use it?

@dotta
dotta Mar 21, 2013

What I'm finding is that IndexWriter is thread-safe (http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/IndexWriter.html), but nothing is said about the thread-safety of SimpleAnalyzer and IndexWriterConfig.

In case SimpleAnalyzer and IndexWriterConfig are indeed not thread-safe, my suggestion would be to turn config into a private method, i.e.,

private def config = {
  val analyzer = new SimpleAnalyzer(Version.LUCENE_41) 
  new IndexWriterConfig(Version.LUCENE_41, analyzer)
} 

This way you don't need to worry about visibility issues, since a fresh instance is created every time. However, here I'm assuming that creating fresh SimpleAnalyzer and IndexWriterConfig instances is a cheap operation; I might be (very) wrong.

@mads-hartmann
mads-hartmann Mar 22, 2013

Created a private def for the config. If it's slow we can always make it faster later :) I added a ticket for upgrading the Lucene version

@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...g/scala/tools/eclipse/search/FileChangeObserver.scala
+ }
+
+ // Hiding the Eclipse implementation here as we don't want the methods that
+ // Eclipse needs to leak into the interface of FileChangeObserver.
+ private class ChangeListener(
+ project: IProject,
+ onChanged: IFile => Unit = _ => (),
+ onRemoved: IFile => Unit = _ => (),
+ onAdded: IFile => Unit = _ => ()) extends IResourceChangeListener with HasLogger {
+
+ import EclipseImplicits._
+
+ override def resourceChanged(event: IResourceChangeEvent): Unit = {
+ for {
+ ev <- Option(event)
+ delta <- Option(event.getDelta())
@dotta
dotta Mar 21, 2013

s/event/ev ... or you may get an NPE :)

@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+ /**
+ * Checks if the `file` is a type we know how to index.
+ */
+ def isIndexable(file: IFile): Boolean = {
+ // TODO: https://scala-ide-portfolio.assembla.com/spaces/scala-ide/tickets/1001602
+ file.getFileExtension() == "scala"
+ }
+}
+
+class SearchPlugin extends AbstractUIPlugin with HasLogger {
+
+ private final val INDEX_DIR_NAME = "lucene-indices"
+
+ private var indexLocation: File = _
+ private var index: Index = _
+ private var sourceIndexer: SourceIndexer = _
@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+
+ private final val INDEX_DIR_NAME = "lucene-indices"
+
+ private var indexLocation: File = _
+ private var index: Index = _
+ private var sourceIndexer: SourceIndexer = _
+
+ override def start(context: BundleContext) {
+ SearchPlugin.plugin = this
+ super.start(context)
+
+ indexLocation = getStateLocation().append(INDEX_DIR_NAME).toFile()
+ index = new Index(indexLocation)
+ sourceIndexer = new SourceIndexer(index)
+
+ ResourcesPlugin.getWorkspace().getRoot().getProjects().foreach { proj =>
@dotta dotta commented on an outdated diff Mar 21, 2013
...src/org/scala/tools/eclipse/search/SearchPlugin.scala
+import org.eclipse.core.resources.IProject
+import scala.tools.eclipse.ScalaProject
+import scala.tools.eclipse.ScalaPlugin
+
+object SearchPlugin extends HasLogger {
+
+ final val PLUGIN_ID = "org.scala.tools.eclipse.search"
+
+ @volatile var plugin: SearchPlugin = _
+
+ /**
+ * Checks if the `file` is a type we know how to index.
+ */
+ def isIndexable(file: IFile): Boolean = {
+ // TODO: https://scala-ide-portfolio.assembla.com/spaces/scala-ide/tickets/1001602
+ file.getFileExtension() == "scala"
@dotta dotta commented on an outdated diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ val hits = searcher.search(query, MAX_POTENTIAL_MATCHES).scoreDocs
+ hits.map { hit =>
+ val doc = searcher.doc(hit.doc)
+ fromDocument(doc)
+ }
+ }
+ }
+
+ /**
+ * ARM method for writing to the index.
+ */
+ private def doWithWriter(project: IProject)(f: IndexWriter => Unit): Unit = {
+ val dir = SearchPlugin.plugin.indexLocationForProject(project)
+ val writer = new IndexWriter(FSDirectory.open(dir), config)
+ f(writer)
+ writer.close()
@dotta
dotta Mar 21, 2013

I believe this should be

try f(writer)
finally writer.close()
@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...scala/tools/eclipse/search/jobs/ProjectIndexJob.scala
+ } else {
+ observer.stop
+ }
+ Status.OK_STATUS
+ }
+}
+
+object ProjectIndexJob extends HasLogger {
+
+ def apply(indexer: SourceIndexer, sp: ScalaProject): ProjectIndexJob = {
+
+ logger.debug("Started ProjectIndexJob for " + sp.underlying.getName)
+
+ val job = new ProjectIndexJob(indexer, sp, interval = 5000)
+ job.setPriority(Job.LONG)
+ job.setRule(sp.underlying)
@dotta
dotta Mar 21, 2013

What rule is sp.underlying mapped to?

@mads-hartmann
mads-hartmann Mar 21, 2013

It is my understanding (might very well be wrong) that as long as the job is running, no other job can obtain a rule on the IProject. See example 1 on this page: http://www.eclipse.org/articles/Article-Concurrency/jobs-api.html

@dotta
dotta Mar 21, 2013

That depends on the Rule. There are rules that do not conflict, and hence two Jobs with non-conflicting rules could be running in parallel.

The following comes from the JavaDoc in ISchedulingRule

/**
 * Returns whether this scheduling rule is compatible with another scheduling rule.
 * If <code>true</code> is returned, then no job with this rule will be run at the 
 * same time as a job with the conflicting rule.  If <code>false</code> is returned, 
 * then the job manager is free to run jobs with these rules at the same time.
 * <p>
 * Implementations of this method must be reflexive, symmetric, and consistent,
 * and must return <code>false</code> when compared  to a rule they know 
 * nothing about.
 * <p>
 * This method must return true if calling {@link #contains(ISchedulingRule)} on
 * the same rule also returns true. This is required because it would otherwise
 * allow two threads to be running concurrently with the same rule.
 *
 * @param rule the rule to check for conflicts
 * @return <code>true</code> if the rule is conflicting, and <code>false</code>
 *  otherwise.
 */
public boolean isConflicting(ISchedulingRule rule);
@dotta
dotta Mar 21, 2013

That's why I suggested using the modifyRule. But I'm still wondering what Rule sp.underlying is mapped to. :)

@mads-hartmann
mads-hartmann Mar 21, 2013

sp.underlying is the IProject that is being indexed. IProject implements ISchedulingRule with exclusive write access, according to http://www.eclipse.org/articles/Article-Concurrency/jobs-api.html, but I can't find the source of IResource to check :)

@mads-hartmann
mads-hartmann Mar 21, 2013

Ah, later it states "clients should not make any assumptions about what rules are used for what operations; instead they should always query the rule factory". I'll create a specific rule then ;)

@dotta
dotta Mar 21, 2013

Cool. Thanks for digging into this. The Job API in Eclipse is a fundamental building block, so it's really good you have read about it.

By the way, I'm not sure what you mean by "I'll create a new rule". Is ResourcesPlugin.getWorkspace().getRuleFactory().modifyRule not good enough for some reason?

@mads-hartmann
mads-hartmann Mar 22, 2013

Oh yeah, modifyRule should be just the thing we need :) Just sent a new bunch of commits where the scheduling is done explicitly using a modify rule on the IProject.

@dotta dotta commented on an outdated diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+import org.apache.lucene.document.Field
+import org.scala.tools.eclipse.search.Util
+import org.eclipse.core.resources.IProject
+import org.scala.tools.eclipse.search.SearchPlugin
+
+/**
+ * A Lucene based index of all occurrences of Scala entities recorded
+ * in the workspace. See `toDocument` for more information about what
+ * is stored in the index.
+ *
+ * A separate Lucene index is stored on disk for each project in the
+ * Workspace.
+ *
+ * This class is thread-safe.
+ */
+class Index(indicesRoot: File) extends HasLogger {
@dotta
dotta Mar 21, 2013

This class assumes that the resources passed to the different methods exist and, in the case of a project, that it's open. I think it's good to mention this in the documentation.

@dotta dotta commented on the diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ *
+ * This class is thread-safe.
+ */
+class Index(indicesRoot: File) extends HasLogger {
+
+ private val analyzer = new SimpleAnalyzer(Version.LUCENE_41)
+ private val config = new IndexWriterConfig(Version.LUCENE_41, analyzer)
+ private val MAX_POTENTIAL_MATCHES = 100000
+
+ /**
+ * Add all `occurrences` to the index of the specific project.
+ */
+ def addOccurrences(occurrences: Seq[Occurrence], project: IProject): Unit = {
+ doWithWriter(project) { writer =>
+ val docs = occurrences.map( toDocument )
+ writer.addDocuments(docs.toIterable.asJava)
@dotta
dotta Mar 21, 2013

This method can throw two kinds of exceptions:

  • CorruptIndexException - if the index is corrupt
  • IOException - if there is a low-level IO error

Should we handle them?

@mads-hartmann
mads-hartmann Mar 22, 2013

Should be fixed now as we use Try and pattern match on it.

@dotta
dotta Mar 24, 2013

I don't think it's enough to log the exception. For instance, if the index is corrupted, it probably doesn't make much sense to keep using it. Maybe we should create a new one. Or maybe there are better alternatives. I think we need to read up on the Lucene JavaDoc. Anyway, as I said in this comment, maybe we should postpone this for a new PR.

@dotta dotta commented on the diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ */
+ def addOccurrences(occurrences: Seq[Occurrence], project: IProject): Unit = {
+ doWithWriter(project) { writer =>
+ val docs = occurrences.map( toDocument )
+ writer.addDocuments(docs.toIterable.asJava)
+ }
+ }
+
+ /**
+ * Removed all occurrences from the index that are recorded in the
+ * given file
+ */
+ def removeOccurrencesFromFile(file: File, project: IProject): Unit = {
+ doWithWriter(project) { writer =>
+ val query = new TermQuery(TermQueries.fileTerm(file))
+ writer.deleteDocuments(query)
@dotta
dotta Mar 21, 2013

Same comment as above

@mads-hartmann
mads-hartmann Mar 22, 2013

Should be fixed now as we use Try and pattern match on it.

@dotta dotta commented on an outdated diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ val dir = SearchPlugin.plugin.indexLocationForProject(project)
+ val writer = new IndexWriter(FSDirectory.open(dir), config)
+ f(writer)
+ writer.close()
+ }
+
+ /**
+ * ARM method for searching the index.
+ */
+ private def withSearcher[A](project: IProject)(f: IndexSearcher => A): A = {
+ val dir = SearchPlugin.plugin.indexLocationForProject(project)
+ val reader = DirectoryReader.open(FSDirectory.open(dir))
+ val searcher = new IndexSearcher(reader)
+ val r = f(searcher)
+ reader.close()
+ r
@dotta
dotta Mar 21, 2013

I think this should be

try f(searcher)
finally reader.close()
@dotta dotta commented on an outdated diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ /**
+ * ARM method for writing to the index.
+ */
+ private def doWithWriter(project: IProject)(f: IndexWriter => Unit): Unit = {
+ val dir = SearchPlugin.plugin.indexLocationForProject(project)
+ val writer = new IndexWriter(FSDirectory.open(dir), config)
+ f(writer)
+ writer.close()
+ }
+
+ /**
+ * ARM method for searching the index.
+ */
+ private def withSearcher[A](project: IProject)(f: IndexSearcher => A): A = {
+ val dir = SearchPlugin.plugin.indexLocationForProject(project)
+ val reader = DirectoryReader.open(FSDirectory.open(dir))
@dotta
dotta Mar 21, 2013

This can throw IOException. It's fine if you can't recover from that, but we should fail gracefully and notify the user.

@dotta dotta commented on an outdated diff Mar 21, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ */
+ def fromDocument(doc: Document): Occurrence = {
+ import LuceneFields._
+ (for {
+ word <- Option(doc.get(WORD))
+ path <- Option(doc.get(FILE))
+ offset <- Option(doc.get(OFFSET))
+ occurrenceKind <- Option(doc.get(OCCURRENCE_KIND))
+ } yield {
+ // We're storing the (real) absolute path in Lucene but we need an
+ // Eclipse full path to create a ScalaSource file.
+ val ifile = Util.iFileFromAbsolutePath(path) getOrElse (throw new Exception("Couldn't get ifile from path " + path))
+ val file = Util.scalaSourceFileFromIFile(ifile).getOrElse (throw new Exception("Wasn't able to create ScalaSourceFile from path " + path))
+ Occurrence(word, file, Integer.parseInt(offset), OccurrenceKind.fromString(occurrenceKind))
+ }) getOrElse {
+ throw new Exception("Wasn't able to convert document to occurrence")
@dotta
dotta Mar 21, 2013

I think it would be good to have a ScalaSearchException for generic errors, and never use Exception. Also, we should eventually catch this exception somewhere up in the stack and fail gracefully by informing the user about the problem.
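
One possible shape for such an exception type (a ScalaSearchException does appear later in this PR; this particular definition is an assumption):

class ScalaSearchException(msg: String, cause: Throwable = null)
  extends RuntimeException(msg, cause)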

@dotta dotta commented on the diff Mar 21, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+
+import scala.tools.eclipse.ScalaPresentationCompiler
+import scala.tools.eclipse.javaelements.ScalaSourceFile
+import scala.tools.eclipse.logging.HasLogger
+
+/**
+ * Used to parse and traverse the parse tree of a compilation unit finding
+ * all the occurrence of Scala entities we're interested in.
+ */
+object OccurrenceCollector extends HasLogger {
+
+ /**
+ * Find all occurrences of words we're find interesting in a compilation unit. It
+ * will return Left if it wasn't able to access the source file.
+ */
+ def findOccurrences(file: ScalaSourceFile): Either[String, Seq[Occurrence]] = {
@dotta
dotta Mar 21, 2013

I wonder if Try wouldn't be better than Either here. You don't have to change this, I'm just pointing it out in case you didn't know about Try.

@mads-hartmann
mads-hartmann Mar 22, 2013

I'll keep it as Either for now and see what makes the most sense when I have some more call-sites for the function (Currently it's only used in tests)

@dotta
dotta Mar 22, 2013

Sounds good.
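
A small, self-contained illustration of the difference being discussed; the two functions are hypothetical stand-ins for findOccurrences:

import scala.util.Try

// Either keeps the error channel as an explicit value on the Left...
def wordsEither(source: String): Either[String, Seq[String]] =
  if (source.isEmpty) Left("Couldn't access the source")
  else Right(source.split("\\s+").toSeq)

// ...whereas Try captures whatever exception the body throws as a Failure.
def wordsTry(source: String): Try[Seq[String]] = Try {
  require(source.nonEmpty, "Couldn't access the source")
  source.split("\\s+").toSeq
}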

@dotta dotta commented on the diff Mar 21, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+ // Method definitions
+ case t@DefDef(_, name, _, _, _, body) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Declaration)
+ super.traverse(body) // We recurse in the case of chained invocations, foo.bar().baz()
+
+ case _ => super.traverse(tree)
+ }
+ }
+ }
+ traverser.apply(t)
+ occurrences
+ }
+
+ private def isSynthetic(pc: ScalaPresentationCompiler)
+ (tree: pc.Tree, name: String): Boolean = {
+ val syntheticNames = Set("<init>")
@dotta
dotta Mar 21, 2013

Can you find references to a class' constructor?

@mads-hartmann
mads-hartmann Mar 22, 2013

I'll add a ticket for this :) Because I just found another issue: it doesn't find references used inside string interpolation. E.g.

val t = "Mirco"
val str = s"Hi there, ${t}"
@dotta
dotta Mar 22, 2013

Cool. Didn't think of string interpolation, good catch!

@dotta dotta and 1 other commented on an outdated diff Mar 21, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+
+ // Invoking a method w/o an argument doesn't result in apply, just an Ident node.
+ case t@Ident(fun) if !isSynthetic(pc)(t, fun.toString) =>
+ occurrences += Occurrence(fun.toString, file, t.pos.point, Reference)
+
+ // Invoking a method on an instance w/o an argument doesn't result in an Apply node, simply a Select node.
+ case t@Select(rest,name) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Reference)
+ super.traverse(rest) // recurse in the case of chained selects: foo.baz.bar
+
+ // Method definitions
+ case t@DefDef(_, name, _, _, _, body) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Declaration)
+ super.traverse(body) // We recurse in the case of chained invocations, foo.bar().baz()
+
+ case _ => super.traverse(tree)
@dotta
dotta Mar 21, 2013

It's an interesting strategy to index a method without considering its parameters (& types). On the one hand, it makes the traversal logic much, much simpler (and cleaner) because you don't have to carefully handle cases such as partial method applications. On the other hand, you risk having (many?) false positives. (Just sharing my thoughts here ... :))

@mads-hartmann
mads-hartmann Mar 22, 2013

I just had a fun idea. Let's keep it as simple as possible for now and then collect statistics on hits/misses when we're in beta to see which cases might make sense to make more fine-grained :)

@dotta
dotta Mar 22, 2013

Absolutely!

@dotta dotta commented on an outdated diff Mar 21, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+ // Invoking a method on an instance w/o an argument doesn't result in an Apply node, simply a Select node.
+ case t@Select(rest,name) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Reference)
+ super.traverse(rest) // recurse in the case of chained selects: foo.baz.bar
+
+ // Method definitions
+ case t@DefDef(_, name, _, _, _, body) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Declaration)
+ super.traverse(body) // We recurse in the case of chained invocations, foo.bar().baz()
+
+ case _ => super.traverse(tree)
+ }
+ }
+ }
+ traverser.apply(t)
+ occurrences
@dotta
dotta Mar 21, 2013

occurrences.toList (prefer returning an immutable collection, since you can get it for free).

@dotta dotta commented on an outdated diff Mar 21, 2013
...scala/tools/eclipse/search/jobs/ProjectIndexJob.scala
+ // Potentially changed by several threads. This job and the FileChangeObserver
+ private val changedResources: BlockingQueue[(IFile, FileEvent)] = new LinkedBlockingQueue[(IFile, FileEvent)]
+
+ val changed = (f: IFile) => {
+ if (SearchPlugin.isIndexable(f)) {
+ changedResources.put(f, Changed)
+ }
+ }
+
+ val removed = (f: IFile) => changedResources.put(f, Removed)
+
+ val observer = FileChangeObserver(project.underlying)(
+ onChanged = changed,
+ onAdded = changed,
+ onRemoved = removed
+ )
@dotta
dotta Mar 21, 2013

changed, removed and observer could be private, no?

@dotta dotta commented on an outdated diff Mar 21, 2013
...ala/tools/eclipse/search/indexing/SourceIndexer.scala
+import scala.tools.eclipse.ScalaPlugin
+import scala.tools.eclipse.ScalaProject
+import org.scala.tools.eclipse.search.Util._
+import org.eclipse.core.resources.IFile
+import scala.tools.eclipse.util.Utils
+
+/**
+ * Indexes Scala sources and add all occurrences to the `index`.
+ */
+class SourceIndexer(index: Index) extends HasLogger {
+
+ /**
+ * Indexes all sources in the current workspace. This removes all previous occurrences
+ * recorded in that workspace.
+ */
+ def indexWorkspace(root: IWorkspaceRoot): Unit = {
@dotta
dotta Mar 21, 2013

Is this method used? Looks like you don't need it anymore.

@dotta dotta commented on the diff Mar 21, 2013
...ala/tools/eclipse/search/indexing/SourceIndexer.scala
+
+ /**
+ * Indexes all sources in `project`. This removes all previous occurrences recorded in
+ * that project.
+ */
+ def indexProject(proj: ScalaProject): Unit = {
+ Utils.debugTimed("Indexing project %s".format(proj)) {
+ proj.allSourceFiles.foreach { indexIFile }
+ }
+ }
+
+ /**
+ * Indexes the parse tree of an IFile if the IFile is pointing to a Scala source file.
+ * This removes all previous occurrences recorded in that file.
+ */
+ def indexIFile(file: IFile): Unit = {
@dotta dotta commented on the diff Mar 21, 2013
...ala/tools/eclipse/search/indexing/SourceIndexer.scala
+ }
+ }
+
+ /**
+ * Indexes the parse tree of an IFile if the IFile is pointing to a Scala source file.
+ * This removes all previous occurrences recorded in that file.
+ */
+ def indexIFile(file: IFile): Unit = {
+ scalaSourceFileFromIFile(file).foreach { cu => indexScalaFile(cu) }
+ }
+
+ /**
+ * Indexes the occurrences in a Scala file. This removes all previous occurrences
+ * recorded in that file.
+ */
+ def indexScalaFile(sf: ScalaSourceFile): Unit = {
@dotta dotta commented on the diff Mar 21, 2013
...scala/tools/eclipse/search/jobs/ProjectIndexJob.scala
+ while( !changedResources.isEmpty && !monitor.isCanceled() && openAndExists) {
+ val (file, changed) = changedResources.poll()
+ monitor.subTask(file.getName())
+ changed match {
+ case Changed => indexer.indexIFile(file)
+ case Removed => // TODO: Remove from the index. This class needs access to the index for that.
+ }
+ monitor.worked(1)
+ }
+ monitor.done()
+
+ if (openAndExists) {
+ schedule(interval)
+ } else {
+ observer.stop
+ }
@dotta
dotta Mar 21, 2013

Since we are using a Rule you no longer need to check for openAndExists. However, we do need a way to avoid scheduling the ProjectIndexJob when the project is closed/deleted. Maybe a resource listener?

@dotta
dotta Mar 21, 2013

Yes, I didn't see the ticket.

Another thing I'm wondering about is the following. Now we set a Rule to prevent the user from deleting/closing the project while this Job is executing. That's good. However, we keep re-scheduling this job over and over. What I fear can happen is that the user is never allowed to delete the project because this Job keeps getting re-scheduled, which would of course be bad. If this can happen, then we need to move the Job re-scheduling logic outside of this Job (e.g., we could have a Timer that schedules the Job every N seconds). It would be good to check whether this can indeed happen.

@mads-hartmann
mads-hartmann Mar 21, 2013

The job currently uses schedule(interval) - doesn't that mean that the job is finished and a new one is executed after the interval?

We can also use beginRule/endRule in ProjectIndexJob to be specific about when we release the lock on the resources

@dotta
dotta Mar 21, 2013

The job currently uses schedule(interval) - doesn't that mean that the job is finished and a new one is executed after the interval?

AFAIR, that means the same Job is re-scheduled.

We can also use beginRule/endRule in ProjectIndexJob to be specific about when we release the lock on the resources

Oh, I never used that. I'll check the doc, but looks like a good idea to me!

@dotta dotta commented on the diff Mar 21, 2013
...scala/tools/eclipse/search/jobs/ProjectIndexJob.scala
+ * be running as long as Eclipse is running.
+ *
+ * It uses a FileChangeObserver to keep track of files in the project that
+ * needs to be re-indexed.
+ */
+class ProjectIndexJob private (
+ indexer: SourceIndexer,
+ project: ScalaProject,
+ interval: Long) extends WorkspaceJob("Project Indexing Job: " + project.underlying.getName) with HasLogger {
+
+ private trait FileEvent
+ private case object Changed extends FileEvent
+ private case object Removed extends FileEvent
+
+ // Potentially changed by several threads. This job and the FileChangeObserver
+ private val changedResources: BlockingQueue[(IFile, FileEvent)] = new LinkedBlockingQueue[(IFile, FileEvent)]
@dotta
dotta Mar 21, 2013

Let me just say that I love it :)

@dotta
Eclipse Scala IDE member

Yes, I admit it, I really had a lot of pleasure reviewing this PR :) The main reason is that the code is really clean. Trust me, I wouldn't have been able to go into such depth if you hadn't done a great job in the first place.

I have one more request: would it be possible to add a test checking correct indexing of constructor declarations and references (both user-defined and synthetic ones)?

@dotta
Eclipse Scala IDE member

@mads379 I'm gonna have a look at how to fix the PR validator ;-)

@dotta
Eclipse Scala IDE member

PLS REBUILD ALL

@dotta
Eclipse Scala IDE member

PLS REBUILD ALL

@dotta
Eclipse Scala IDE member

PLS REBUILD ALL

@dotta
Eclipse Scala IDE member

The failure of the scala-search-master-2.10-pr-validator job is correct. To fix it, you need to widen the version range for org.eclipse.jdt.core;bundle-version="[3.6.0,3.7.10)" in your MANIFEST to org.eclipse.jdt.core;bundle-version="[3.6.0,3.8.10)".

@dotta dotta and 1 other commented on an outdated diff Mar 22, 2013
...scala/tools/eclipse/search/jobs/ProjectIndexJob.scala
+ onChanged = changed,
+ onAdded = changed,
+ onRemoved = removed
+ )
+
+ private def openAndExists: Boolean = {
+ project.underlying.exists() && project.underlying.isOpen
+ }
+
+ override def runInWorkspace(monitor: IProgressMonitor): IStatus = {
+
+ val ruleFactory = ResourcesPlugin.getWorkspace().getRuleFactory()
+ val schedulingRule = ruleFactory.modifyRule(project.underlying)
+
+ try {
+ Platform.getJobManager().beginRule(schedulingRule, monitor)
@dotta
dotta Mar 22, 2013

I checked the implementation of Job and how it handles rules, and I believe we can greatly simplify this.

When you create the Job here, before scheduling it, add the following: job.setRule(ResourcesPlugin.getWorkspace().getRuleFactory().modifyRule(project.underlying)). By doing so, you no longer need the beginRule/endRule logic, because it is already taken care of by the platform. The rule is only applied while the Job is running.
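
A sketch of that approach, assuming the factory-style creation used elsewhere in this PR (the helper name is only illustrative):

import org.eclipse.core.resources.{ IProject, ResourcesPlugin }
import org.eclipse.core.runtime.jobs.Job

// Configure the rule and priority before scheduling; beginRule/endRule is no
// longer needed because the platform applies the rule while the job runs.
def scheduleWithModifyRule(job: Job, project: IProject): Unit = {
  job.setRule(ResourcesPlugin.getWorkspace().getRuleFactory().modifyRule(project))
  job.setPriority(Job.LONG)
  job.schedule()
}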

@mads-hartmann
mads-hartmann Mar 22, 2013

Awesome. I'll change it

@mads-hartmann mads-hartmann and 1 other commented on an outdated diff Mar 22, 2013
...scala/tools/eclipse/search/jobs/ProjectIndexJob.scala
+
+ private def openAndExists: Boolean = {
+ project.underlying.exists() && project.underlying.isOpen
+ }
+
+ override def runInWorkspace(monitor: IProgressMonitor): IStatus = {
+
+ val ruleFactory = ResourcesPlugin.getWorkspace().getRuleFactory()
+ val schedulingRule = ruleFactory.modifyRule(project.underlying)
+
+ try {
+ Platform.getJobManager().beginRule(schedulingRule, monitor)
+ if (monitor.isCanceled()) {
+ // I assume that if a user cancels an indexing job they just
+ // want it out of the way for a little while.
+ schedule(interval * 4)
@mads-hartmann
mads-hartmann Mar 22, 2013

Actually we shouldn't do this. It means that we can never stop the job, even with the cancel method

@dotta
dotta Mar 22, 2013

Good catch. I think it would be better to have the scheduling logic outside of this class.

@mads-hartmann

So, seems like I didn't get the version in the MANIFEST quite right. @dotta any idea?

@dotta
Eclipse Scala IDE member

Yeah, I need to check what's wrong there. I'll push a commit for fixing it somewhere. For the moment, just ignore the failure :)

@dragos dragos and 1 other commented on an outdated diff Mar 22, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+ })(err)
+ }
+
+ private def findOccurrences(pc: ScalaPresentationCompiler)
+ (file: ScalaSourceFile, t: pc.Tree): Seq[Occurrence] = {
+ import pc._
+
+ val occurrences = new scala.collection.mutable.ListBuffer[Occurrence]()
+ val traverser = new Traverser {
+ override def traverse(tree: Tree) {
+
+ tree match {
+ // Direct invocations of methods
+ case Apply(t@Ident(fun), args) if !isSynthetic(pc)(t, fun.toString) =>
+ occurrences += Occurrence(fun.toString, file, t.pos.point, Reference)
+ args.foreach { super.traverse } // recurse on the arguments
@dragos
dragos Mar 22, 2013

Better use super.traverseTrees or the like.

@dragos
dragos Mar 22, 2013

One thing to check: since you call super.traverse on each arg, it means that you won't see the first layer of nodes in each argument. I wonder if you get any occurrences for x and bar:

  foo.bar(bar.x)
@mads-hartmann
mads-hartmann Mar 25, 2013

You're completely right. Added a test for the small bit of code that failed. Using super.traverseTrees fixed that :) Pushing the change shortly.

@dragos dragos and 1 other commented on an outdated diff Mar 22, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+ occurrences += Occurrence(fun.toString, file, t.pos.point, Reference)
+ args.foreach { super.traverse } // recurse on the arguments
+
+ // E.g. foo.bar()
+ case Apply(t@Select(rest, name), args) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Reference)
+ args.foreach { super.traverse } // recurse on the arguments
+ super.traverse(rest) // We recurse in the case of chained invocations, foo.bar().baz()
+
+ // Invoking a method w/o an argument doesn't result in apply, just an Ident node.
+ case t@Ident(fun) if !isSynthetic(pc)(t, fun.toString) =>
+ occurrences += Occurrence(fun.toString, file, t.pos.point, Reference)
+
+ // Invoking a method on an instance w/o an argument doesn't result in an Apply node, simply a Select node.
+ case t@Select(rest,name) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Reference)
@dragos
dragos Mar 22, 2013

It may be a bit premature, but one thing to keep in mind is that you'll have tons of little duplicate strings, probably on the order of thousands for things like apply, Seq, List, Object, String, foreach, etc. It would make sense to reuse them (have a standard names object somewhere). The compiler does the same trick, and it saved a lot of memory (and time).
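
A minimal sketch of the string-reuse idea (the object name and shape are assumptions, loosely modelled on the compiler's name table):

object Names {
  private val cache = scala.collection.mutable.HashMap.empty[String, String]
  // Returns a canonical instance for `s`, so equal names share one String object.
  def intern(s: String): String = synchronized { cache.getOrElseUpdate(s, s) }
}

The traverser would then record Names.intern(name.toString) instead of a fresh string per tree node.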

@mads-hartmann
mads-hartmann Mar 25, 2013

I created a ticket for this :)

@dragos dragos and 1 other commented on an outdated diff Mar 22, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+ args.foreach { super.traverse } // recurse on the arguments
+ super.traverse(rest) // We recurse in the case of chained invocations, foo.bar().baz()
+
+ // Invoking a method w/o an argument doesn't result in apply, just an Ident node.
+ case t@Ident(fun) if !isSynthetic(pc)(t, fun.toString) =>
+ occurrences += Occurrence(fun.toString, file, t.pos.point, Reference)
+
+ // Invoking a method on an instance w/o an argument doesn't result in an Apply node, simply a Select node.
+ case t@Select(rest,name) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Reference)
+ super.traverse(rest) // recurse in the case of chained selects: foo.baz.bar
+
+ // Method definitions
+ case t@DefDef(_, name, _, _, _, body) if !isSynthetic(pc)(t, name.toString) =>
+ occurrences += Occurrence(name.toString, file, t.pos.point, Declaration)
+ super.traverse(body) // We recurse in the case of chained invocations, foo.bar().baz()
@dragos
dragos Mar 22, 2013

The comment is wrong, this is a method definition, not application ;-)

@mads-hartmann
mads-hartmann Mar 25, 2013

If we're going to support roughly the same kinds of features as JDT we need declarations as well. Then you can search for declarations of methods called "foobar" for example :)

Also, for refactoring it might be nice to include declarations as well.

@mads-hartmann
mads-hartmann Mar 25, 2013

Ah sorry, completely misread your comment! :)

@dragos dragos and 1 other commented on an outdated diff Mar 22, 2013
...ols/eclipse/search/indexing/OccurrenceCollector.scala
+ private def findOccurrences(pc: ScalaPresentationCompiler)
+ (file: ScalaSourceFile, t: pc.Tree): Seq[Occurrence] = {
+ import pc._
+
+ val occurrences = new scala.collection.mutable.ListBuffer[Occurrence]()
+ val traverser = new Traverser {
+ override def traverse(tree: Tree) {
+
+ tree match {
+ // Direct invocations of methods
+ case Apply(t@Ident(fun), args) if !isSynthetic(pc)(t, fun.toString) =>
+ occurrences += Occurrence(fun.toString, file, t.pos.point, Reference)
+ args.foreach { super.traverse } // recurse on the arguments
+
+ // E.g. foo.bar()
+ case Apply(t@Select(rest, name), args) if !isSynthetic(pc)(t, name.toString) =>
@dragos
dragos Mar 22, 2013

Do you need this case? Wouldn't the Select node be handled by the Select case below? (Same for the other Apply(Ident...) case.)

@mads-hartmann
mads-hartmann Mar 25, 2013

No I don't. I think I had those cases earlier when I tried to distinguish between the entities that were being referenced from the parse tree. I later gave up on that goal ;)

@dragos dragos commented on the diff Mar 22, 2013
.../scala/tools/eclipse/search/indexing/Occurrence.scala
+
+object LuceneFields {
+ val WORD = "word"
+ val FILE = "file"
+ val OFFSET = "offset"
+ val OCCURRENCE_KIND = "occurrenceKind"
+}
+
+/**
+ * Represents an occurrence of a word that we're interested in when
+ * indexing the parse trees.
+ */
+case class Occurrence(
+ word: String,
+ file: ScalaSourceFile,
+ offset: Int, // char offset from beginning of file
@dragos
dragos Mar 22, 2013

Why do you need the offset?

@mads-hartmann
mads-hartmann Mar 25, 2013

When we need to search for the exact matches I want to use the askTypeAt method :)

@dotta
Eclipse Scala IDE member

@mads379 PR #20 should fix the issue with the PR validators (and SHA 41c6a97 should no longer be needed...)

@dotta dotta and 1 other commented on an outdated diff Mar 24, 2013
...c/org/scala/tools/eclipse/search/indexing/Index.scala
+ * ARM method for writing to the index.
+ */
+ private def doWithWriter(project: IProject)(f: IndexWriter => Unit): Try[Unit] = {
+ val loc = SearchPlugin.plugin.indexLocationForProject(project)
+ for {
+ dir <- Try(FSDirectory.open(loc))
+ writer <- Try(new IndexWriter(dir, config))
+ _ <- {
+ try {
+ Success(f(writer))
+ } catch {
+ case e: Exception => Failure(new ScalaSearchException(s"Unable to write to the index: ${e.getStackTraceString}"))
+ } finally {
+ writer.close()
+ dir.close()
+ }
@dotta
dotta Mar 24, 2013

The logic here is broken, I can see at least two problems:

  • if Try(new IndexWriter(dir, config)) fails, the following generator in the for-comprehension won't be executed, hence neither writer nor dir will be disposed.
  • if writer.close() throws an exception, the next expression, i.e., dir.close(), won't be executed.

I've just pushed a commit here with an example of how this could be fixed (you are free to include it, but it would be good to write a couple of tests to make sure it works as expected ;)) - As a side note, I'm still trying to wrap my head around Try. I feel the current usage is not providing any real advantage, and that's why I've removed all calls to Try in the linked commit.

Getting back to the main subject, I think it's important to reason about exceptional behavior up front. However, this PR is big enough as it is, hence my suggestion is to look at exception handling in a new PR (what I'm trying to say here is that we should fix all comments like TODO: Deal with exceptions - or any unhandled exception - in a new PR).
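
For reference, a rough sketch of a nested resource-management helper in the spirit of the "using" module referenced in this thread (the helper's name and signature are assumptions):

def using[A <: java.io.Closeable, B](resource: A)(f: A => B): B =
  try f(resource) finally resource.close()

Nesting two calls, using(FSDirectory.open(loc)) { dir => using(new IndexWriter(dir, config)) { writer => f(writer) } }, ensures dir.close() still runs even if creating the writer or closing it throws.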

@mads-hartmann
mads-hartmann Mar 25, 2013

Oh yeah, I see. I had a look at your commit and I really like it so I've cherry-picked it and will push the changes shortly :) I created a ticket to handle the exceptions: https://scala-ide-portfolio.assembla.com/spaces/scala-ide/tickets/1001629

@dotta
Eclipse Scala IDE member

@mads379 Could you add a couple of tests for the using module? I really haven't tested it, so it's good to make sure it does what it's supposed to do :)

Otherwise, this looks good to me. @dragos, how does it look on your side?

@dotta
Eclipse Scala IDE member

As Iuli pointed out, we should use Eclipse IResource instead of absolute paths. What I suggest is that we create a ticket and we tackle this in a different PR. Ok?

You can read about Eclipse IResource here

@dotta
Eclipse Scala IDE member

Alright, I've merged #20, so now both PR validators should work fine.

@mads379 I believe your PR is ready for inclusion. What you should do now is rebase your branch on the latest master (because this PR can no longer be cleanly merged) and squash/amend commits as you think makes sense. When you are done, just force-push your branch, and if all is good it will get merged :)

@mads-hartmann mads-hartmann Stores occurrences of methods in Lucene (commit bf07aab)
@mads-hartmann

PLS REBUILD ALL

@mads-hartmann

Yay! :)

@dotta dotta merged commit e2d1a10 into scala-ide:master Mar 25, 2013
@dotta
Eclipse Scala IDE member

Congratulations for this really awesome PR!

Keep up the good work.
