An application for efficient full-text searching through a directory of .doc(x) files. Written in Go. GUI made using Fyne.

r3quie/go-docx-search

docx-search

Docx-search is an app written in Go for better and faster full-text searching of .doc(x) files. In my experience, full-text search in the file explorer tends to be quite buggy, especially when dealing with numbers, so I decided to make my own "search engine".

It was made with precision in mind: I needed to find specific provisions that were used in previously written documents, so the search does not support wildcards.

Implementation

The code gets the text from the .docx file via the archive/zip package. A .docx file is just a zip archive containing a bunch of .xml files along with other document data such as images, so the code simply opens the file as a zip archive and looks through it.
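As a rough illustration of this step, here is a minimal sketch that opens a .docx with archive/zip and returns the raw XML of the body. It assumes the text lives in the word/document.xml entry, which is where the format normally keeps the main body. The names readDocumentXML, bodyEntry and writeSampleDocx are made up for this example and are not taken from the repository.

```go
package main

import (
	"archive/zip"
	"fmt"
	"io"
	"os"
)

// bodyEntry is the archive entry that normally holds the main text of a .docx.
const bodyEntry = "word/document.xml"

// readDocumentXML opens path as a zip archive and returns the raw XML of
// bodyEntry. Hypothetical helper, not the repository's own function.
func readDocumentXML(path string) (string, error) {
	r, err := zip.OpenReader(path)
	if err != nil {
		return "", err
	}
	defer r.Close()

	for _, f := range r.File {
		if f.Name != bodyEntry {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		data, err := io.ReadAll(rc)
		if err != nil {
			return "", err
		}
		return string(data), nil
	}
	return "", fmt.Errorf("%s: %s not found", path, bodyEntry)
}

// writeSampleDocx writes a tiny zip archive containing only bodyEntry to a
// temporary file, so the sketch can run without a real document.
func writeSampleDocx(body string) (path string, cleanup func(), err error) {
	f, err := os.CreateTemp("", "sample-*.docx")
	if err != nil {
		return "", nil, err
	}
	cleanup = func() { os.Remove(f.Name()) }
	w := zip.NewWriter(f)
	e, err := w.Create(bodyEntry)
	if err == nil {
		_, err = e.Write([]byte(body))
	}
	if cerr := w.Close(); err == nil {
		err = cerr
	}
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		cleanup()
		return "", nil, err
	}
	return f.Name(), cleanup, nil
}

func main() {
	path, cleanup, err := writeSampleDocx("<w:p><w:r><w:t>Hello</w:t></w:r></w:p>")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer cleanup()

	xml, err := readDocumentXML(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(xml)
}
```

With a real document you would skip writeSampleDocx and pass the path of the .docx straight to readDocumentXML.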

After obtaining the .xml file containing the text body, we use a regular expression to extract the document body without any XML formatting data. It very simply finds every < and the nearest following > (non-greedily) and deletes everything in between, including the angle brackets themselves. We're then left with an unformatted body of plain text, ideal for searching :)
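The tag stripping can be sketched in a few lines with the regexp package. The exact pattern used in the repository may differ. This version uses a non-greedy `<.*?>` with the `(?s)` flag so that a tag spanning a line break is still removed. The name stripTags is an assumption made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern matches "<", then as few characters as possible, then ">".
// (?s) lets "." match newlines too, in case a tag is split across lines.
var tagPattern = regexp.MustCompile(`(?s)<.*?>`)

// stripTags deletes every tag, angle brackets included, and keeps the rest.
func stripTags(xml string) string {
	return tagPattern.ReplaceAllString(xml, "")
}

func main() {
	xml := `<w:p><w:r><w:t>Article 5</w:t></w:r><w:r><w:t xml:space="preserve"> applies.</w:t></w:r></w:p>`
	fmt.Println(stripTags(xml)) // prints "Article 5 applies."
}
```

Two things to keep in mind with this approach: text from neighbouring paragraphs is joined without any separator, and XML entities such as `&amp;` stay encoded in the result.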

After that, it's just basic slice and string work: finding a substring in a string, etc.
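The substring check could look roughly like the sketch below. The helper name containsAll is made up for this example, and the case-sensitive comparison is an assumption. The README only says that matching is exact and that wildcards are not supported.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAll reports whether every term occurs in body as an exact,
// case-sensitive substring (an assumption of this sketch).
// An empty list of terms matches any body.
func containsAll(body string, terms []string) bool {
	for _, term := range terms {
		if !strings.Contains(body, term) {
			return false
		}
	}
	return true
}

func main() {
	body := "Pursuant to Section 2079, the seller shall deliver the goods."
	fmt.Println(containsAll(body, []string{"Section 2079", "seller"})) // true
	fmt.Println(containsAll(body, []string{"Section 2079", "buyer"}))  // false
}
```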

The user input is split into a []string by line (\n). After that, we walk the directory specified in env/env; when the walk finds another directory, it walks that too. If a .docx file is found, the code proceeds as described above.
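A minimal sketch of these two steps, using strings.Split for the input and filepath.WalkDir for the recursive walk. The function names are hypothetical, and the root directory is passed as a parameter here rather than read from env/env. Dropping blank lines, trimming a trailing "\r" and matching the extension case-insensitively are additions of this sketch, not documented behaviour.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// splitTerms turns multi-line user input into one search term per line.
// Blank lines are skipped and a trailing "\r" is trimmed (sketch additions).
func splitTerms(input string) []string {
	var terms []string
	for _, line := range strings.Split(input, "\n") {
		line = strings.TrimRight(line, "\r")
		if line != "" {
			terms = append(terms, line)
		}
	}
	return terms
}

// findDocx walks root recursively and returns the paths of all .docx files.
func findDocx(root string) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // stop on the first unreadable entry
		}
		if !d.IsDir() && strings.EqualFold(filepath.Ext(path), ".docx") {
			paths = append(paths, path)
		}
		return nil
	})
	return paths, err
}

func main() {
	fmt.Printf("%q\n", splitTerms("Section 2079\nseller\n"))

	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	paths, err := findDocx(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```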

Additionally, the search supports a boolean filter. At the moment the search checks one bool to determine whether to apply the filter, and then uses a second bool as the filter value. This was relevant because under Czech law there are two types of subjects. The filter counts every occurrence of two specified strings and compares the counts. A simple logical formula then returns true only if all user input terms were found in the text body AND the string corresponding to the filter value was found more times than the other.
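One possible reading of that logic, as a sketch: the function and parameter names below are hypothetical, and "TYPE-A" and "TYPE-B" are placeholders for the two marker strings, which the README does not name. A tie in the counts is treated as no match for either filter value, which is one interpretation of "found more times than the other".

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether body contains every term and, when useFilter is
// set, whether the marker selected by wantA occurs strictly more often than
// the other marker. All names here are assumptions of this sketch.
func matches(body string, terms []string, useFilter, wantA bool, markerA, markerB string) bool {
	for _, term := range terms {
		if !strings.Contains(body, term) {
			return false
		}
	}
	if !useFilter {
		return true
	}
	a := strings.Count(body, markerA)
	b := strings.Count(body, markerB)
	if wantA {
		return a > b
	}
	return b > a
}

func main() {
	body := "TYPE-A TYPE-A TYPE-B clause 12"
	terms := []string{"clause 12"}

	fmt.Println(matches(body, terms, true, true, "TYPE-A", "TYPE-B"))   // true: A occurs 2 times, B once
	fmt.Println(matches(body, terms, true, false, "TYPE-A", "TYPE-B"))  // false: B does not outnumber A
	fmt.Println(matches(body, terms, false, false, "TYPE-A", "TYPE-B")) // true: filter is off
}
```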

GUI

The GUI is made using Fyne. Every piece of code, however, should work without the main function. You're welcome to implement it however you like.
