Skip to content
This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

Workitem search #395

Merged
merged 50 commits into from Nov 8, 2016
Merged

Workitem search #395

merged 50 commits into from Nov 8, 2016

Conversation

sbose78
Copy link
Member

@sbose78 sbose78 commented Oct 25, 2016

compliance* Add search capabilities to workitems title,description and ID field(s

  • Used postresql full text search capabilities.
  • Supported search mechanisms:

For each search string
"ID:12345" --> Search by ID 12345 only
"12345" ---> Search by ID,title,description
"The cost of the device is 12345" --> Search by ID,title,description

URLs can be searched as -
"http://demo.almighty.io/#/detail/212" ----> Understand that it's a URL , search full text by ID as startsWith(212) and the string startsWith("http://demo.almighty.io/#/detail/212")
So effectively the sql parameter for searching this is :
"212* & http://demo.almighty.io/#/detail/212*"

Eventually extend support for scenarios for work item linking where the workitem content has
"Hi this has been done in http://demo.almighty.io/#/detail/212"
The code should parse out the URL and link the workitem with workitem 212.

  • Pagination (Baiju)
  • REST API endpoint (Pranav)
  • search string parser (Pranav)

@codecov-io
Copy link

codecov-io commented Oct 25, 2016

Current coverage is 65.42% (diff: 71.62%)

Merging #395 into master will increase coverage by 0.56%

@@             master       #395   diff @@
==========================================
  Files            44         46     +2   
  Lines          1986       2195   +209   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           1288       1436   +148   
- Misses          569        609    +40   
- Partials        129        150    +21   

Powered by Codecov. Last update 4c753cb...df47c64

return q, nil
}

func convertFromModel(wiType models.WorkItemType, workItem models.WorkItem) (*app.WorkItem, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some reuse of function between WorkItemRepository and SearchReposiotry would be useful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think so too but the method is unexported in models. That's why I copied it over temporarily but unsure about the permanent solution.

Could you please suggest a way to re-use the same method instead of copying it over?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just export them? Possible move them out of the workitem repository

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aslakknutsen
I will start with exporting it - opening a separate PR for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return &result, nil
}

func (r *GormSearchRepository) loadTypeFromDB(ctx context.Context, name string) (*models.WorkItemType, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some reuse of function between WorkItemRepository and SearchReposiotry would be useful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not adding this code to the WorkItemRepository? It is tightly coupled to the structure of the work items.

Copy link
Member Author

@sbose78 sbose78 Oct 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan to support search for other objects as well, eventually, Users, etc.
But for now we shall support only workitems.

Update: Misunderstood your comment, will do what you suggsted in a separate PR now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsmaeder Did you mean that we add loadTypeFromDb to work item repository as an exported method - I wasn't quite able to figure out the best approach, could you please help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was aimed at why we're not adding search functionality to the work item repository instead of implementing it here. That is a different discussion.
Since we are accessing the database directly in this package anyway, it doesn't really make a difference: there is not much smarts in the loadTypes... function anyway.

}

// Search returns work items for the given query
func (r *GormSearchRepository) SearchFullText(ctx context.Context, sqlQueryString string, sqlQuerytParameter string) ([]*app.WorkItem, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit stuttering on the naming here:

search.GormSearchRepository.SearchFullText

)

const (
fulltextPsqlQuery = `select * from work_items WHERE to_tsvector('english', id::text || ' ' || fields::text) @@ to_tsquery($1)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the to_tsvector should be indexed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbose78 sbose78 force-pushed the workitem_search branch 2 times, most recently from ca14d3d to 5985c65 Compare October 26, 2016 06:27
// This SQL query is used when search is performed across workitem fields and workitem ID
testText = `select * from work_items WHERE
setweight(to_tsvector('english', id::text), 'A') ||
setweight(to_tsvector('english', fields::text), 'B') @@ to_tsquery('english',$1) and deleted_at is NULL`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using fields::text allows me to search for 'field names' in the JSON as well. e.g. "system.title". Not sure if that is desirable.

 '2':1 'id':15 'item':14 'new':3 'null':10,12,16 'system.assignee':9 'system.creator':7 'system.description':11 'system.remote':13 'system.state':2 'system.title':4 'test':5 'tmaeder':8 'wi':6

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @aslakknutsen I will be updating the query to search in "system.description" and "system.title" only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in :
7a9501a

@sbose78
Copy link
Member Author

sbose78 commented Oct 26, 2016

As per discussion,

  • update all current capabilities with "Starts-with" search.
  • URLs can be searched as :
    "http://demo.almighty.io/#/detail/212" ----> Understand that it's a URL , search full text by ID as startsWith(212) and the string startsWith("http://demo.almighty.io/#/detail/212")
    So effectively the sql parameter for searching this is :
    "212* & http://demo.almighty.io/#/detail/212*"
  • "i am looking for ID:42515" --->
    The sql query parameter is "I am looking for 42515" ---> this looks at both text and ID for 42515, instead of ID only. That's a known issue for now, to be addressed later.

@pranavgore09
Copy link
Contributor

@sbose78 thanks for writing it up here. I am taking up URL task.

Action("show", func() {
Routing(
GET("/:q"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description of the "show" action is missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @DhritiShikhar for pointing out. done in 6ed119d

}

// RegexWorkItemDetailClientURL tells us how URL for WI details on front end looks like
const RegexWorkItemDetailClientURL = `^(?P<protocol>http[s]?)://(?P<domain>demo.almighty.io)(?P<path>/#/detail/)(?P<id>\d*)`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pranavgore09
Could you please move the const to the top of the file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbose78 sure, I am making changes: This and other pieces of code are still in progress, and in upcoming commit I will make these changes.
thanks for pointing out things.


//mapURLGroupWithValues accepts slice of group names and slice of values.
//If both slices have different lenghts, empty value will be put for group name.
func mapURLGroupWithValues(groupNames []string, stringToMatch string) map[string]string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very unclear what this method is doing @pranavgore09 .
Could you please add some code comments with an example what we are trying to accomplish here?
URL groups , values, etc need some explanation :)

var compiledWIURL = regexp.MustCompile(RegexWorkItemDetailClientURL)

// First index is ignored because of behaviour of "SubexpNames"
var groupsWIURL = compiledWIURL.SubexpNames()[:1]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to groupsWorkItemUrl.
also add an example what is happening here.

const RegexWorkItemDetailClientURL = `^(?P<protocol>http[s]?)://(?P<domain>demo.almighty.io)(?P<path>/#/detail/)(?P<id>\d*)`

// Use following compiled form in future use
var compiledWIURL = regexp.MustCompile(RegexWorkItemDetailClientURL)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to compiledWorkItemURL.
also add an example what is happening here.

@pranavgore09
Copy link
Contributor

@aslakknutsen @sbose78
When user input is http://demo.almighty.io/#/detail/212 for search, we convert it into following search string 212:* & demo.almighty.io/#/detail/212:*

Now, the problem with this behaviour is, as a user when I enter that URL, I expect at least that single object with id 212. But considering our query string, it performs & operation and does not pick up expected records.

This is the case we need OR in the query string instead of AND.

@sbose78
Copy link
Member Author

sbose78 commented Oct 28, 2016

Hi PranaV
Yes you are right, this is an OR operation.

Convert the URL into
( Url | id ) and then append it to the search query parameter string.

Psql recognizes OR operations written in the form ( url | id )

Shoubhik

On Oct 28, 2016 8:36 AM, "Pranav Gore" notifications@github.com wrote:

@aslakknutsen https://github.com/aslakknutsen @sbose78
https://github.com/sbose78
When user input is http://demo.almighty.io/#/detail/212 for search, we
convert it into following search string 212:* &
demo.almighty.io/#/detail/212:*

Now, the problem with this behaviour is, as a user when I enter that URL,
I expect at least that single object with id 212. But considering our query
string, it performs & operation and does not pick up expected records.

This is the case we need OR in the query string instead of AND.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#395 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAhSAIzddvlUcvDNt5frY1ELC4EgTLkNks5q4WaegaJpZM4KfoQG
.

@@ -355,3 +355,25 @@ var _ = Resource("trackerquery", func() {
})

})

var _ = Resource("search", func() {
BasePath("/search")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are only searching for work items here, should we reflect that in the path? like /search/workitems or /workitems/search?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically search should be able to return 'all type' of objects. I think there are 2 different ways to filter the type, by ?type=workitem|user|project|code and as part of the query it self.

In the UI for the specific use case of linking, the UI would know it can only link workitems(?) so it can auto add that filter.

But in a more general search bar, the user can filter it via e.g. "type:workitem search term a b"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes @aslakknutsen , type will be another query param (not mandatory).
In current case if not provided we will consider type=workitem.

This is helpful for searching user while assigning tasks to users as well.

Param("q", String, "Search Query")
})
Response(OK, func() {
Media(CollectionOf(workItem))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about paging? I'm fine to say "add it later" if the task gets too big.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @tsmaeder , @baijum is working on adding paging.

return transaction.Do(c.ts, func() error {
t, err := c.sRepository.SearchFullText(ctx.Context, ctx.Q)
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, we should handle errors by calling stuff like ctx.BadParameter() instead of just returning the error.

return &result, nil
}

func (r *GormSearchRepository) loadTypeFromDB(ctx context.Context, name string) (*models.WorkItemType, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not adding this code to the WorkItemRepository? It is tightly coupled to the structure of the work items.

// Is "id:45453 id:43234" be valid ? NO, because the no row can have 2 IDs.
searchQuery = testID
}
return searchQuery, idStr + wordStr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoudn't idStr and wordStr be separated by something here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example string that would be passed to the SQL query for search would be :
"2342 & issue & with & network"
The search for all the tokens above will be done on

  • ID ,
  • fields->>system.description ,
  • fields->>system.title.

So if 2342 is an ID as well, then that specific row would be considered as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you're doing "idStr + wordStr". I doubt that is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tsmaeder I get what you mean, let me check this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a commit to address this @tsmaeder .
@pranavgore09 : could you please review

Commit : f86839f

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsmaeder thanks for pointing out. ! I missed that.
@sbose78 checked, that was needed thanks.

parts := strings.Split(rawSearchString, " ")
var res searchKeyword
for _, part := range parts {
if strings.HasPrefix(part, "id:") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the front end know when to put "id:" in front of a word the user enters in the "context assist"? Can't we make this decision here in the back end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the front end know when to put "id:" in front of a word the user enters in the "context assist"? Can't we make this decision here in the back end?

Good question, @tsmaeder ,
The UX for using this is yet to be defined, but one way in which the UI might be putting an "id:" explicitly is when the user types something like #SOME_NUMBER ( like we do here in GH ). In that case, the UI would make an Ajax call with "ID:SOME_NUMBER " asking the API to do a search-by-id explicitly. If the "ID" was not specified by the UI during the API call, the API would have searched by both ID and text containing SOME_NUMBER.

Why are we not adding this code to the WorkItemRepository? It is tightly coupled to the structure of the work items.

Hi @tsmaeder , I didn't find the 'comment' box under your comment and therefore I'm replying here for your comment on #395 (diff)

Are you suggesting I make the loadTypeFromDB an exported method of WorkItemRepository ? I wish we could do something similar..

@baijum baijum force-pushed the workitem_search branch 2 times, most recently from 3fac5fe to 5765868 Compare October 29, 2016 10:28
@@ -0,0 +1,2 @@
CREATE INDEX IF NOT EXISTS fields_title ON work_items USING GIN (to_tsvector('english', fields->>'system.title'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need condition checking as we are using a migration tool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer writing re-runnable sql queries - just doing so as a good practice :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created #422 for adding general indexing for fields

@sbose78 sbose78 force-pushed the workitem_search branch 2 times, most recently from 477527e to d1dbca4 Compare November 2, 2016 10:54
@baijum
Copy link
Contributor

baijum commented Nov 2, 2016

[test]

@baijum baijum changed the title WIP : Workitem search Workitem search Nov 2, 2016
Attribute("first", String)
Attribute("last", String)
var searchPagingLinks = a.Type("searchPagingLinks", func() {
a.Attribute("prev", d.String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up on the PR @kwk did: I think we should standardize on how to do the import renames: I would propose 3-letter lowercase, like "dsl" or "api"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsmaeder yes I would agree with you on this. dsl and api much better than a and d. Still I am not very ok with the name api as it is too generic (I don't have any suggestions either). dsl sounds perfect to use.

*/

// This SQL query is used when search is performed across workitem fields and workitem ID
WhereClauseForSearchByText = `setweight(to_tsvector('english',fields->>'system.title'),'B')||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbose78 : IMO, we shold somehow mention that this where clause is only for workitem relation ?
We can maintain a constants in a table like -

search by ID full text search query
work_item WHERE_CLAUSE_WI_BY_ID WHERE_CLAUSE_WI_BY_FULL_TEXT
users WHERE_CLAUSE_USER_BY_ID WHERE_CLAUSE_USER_BY_FULL_TEXT

Because soon we will have type in query param that will decide which table to query on.
We can take union of multiple query results in the DB itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good observation @pranavgore09 ,
I agree with you that we need to have something like this, but not sure if we should be doing so right now because we are unsure about the implementation at a sql-query level about how to search multiple types. We need to figure out if we could go the single-query path , or multiple-query path.

Tracking it as part of #430

}

// prev link
if offset > 0 && count > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that pagination for work items is in master, let's use the same code in both places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsmaeder is there any function that I can call from here? or copy the same code to here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we would have to factor that code out first.

Copy link
Member Author

@sbose78 sbose78 Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @baijum , @tsmaeder ,
It would be nice to have it factored out and modularize the code for the same instead of repetitive code. Could we fix it as part of a separate PR , merged to master , and then changes taken into this ( and other PRs ) ?

tracking this here #431

}

// Validate ensures that the search string is valid and also ensures its not an injection attack.
func (r *GormSearchRepository) Validate(ctx context.Context, rawSearchString string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this going to be used? Our protection against sql injection should not rely on detecting certain classes of strings. We're going to miss one eventually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tsmaeder ,
I'm still thinking about how to validate the search string. But the thought process behind this is that there should be some validation of the string before we pass it on to be searched. As you mentioned, injection needn't be only of a specific grammar. Could you help with some ideas/implementation around this?

Copy link
Contributor

@tsmaeder tsmaeder Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, injection is not a problem if you always treat user input as values, not part of the query string. In general, this means don't manipulate the user input, pass it as parameters to the query. I'm not sure what we're doing when constructing the query parameter is safe. If we used multiple parameters instead of one, i would be confident that we are injection-safe.
I really don't think we should validate the input string. For a malicious client, it won't make a difference (it simply will not call validate()) and for a normal client, what is the use case? Does Google validate your search?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tsmaeder ,
Your argument sounds valid to me. I'll remove the Validate(..).
db := r.db.Model(&models.WorkItem{}).Where(sqlSearchQuery, sqlSearchQueryParameter)

Should be taking care of query injection issues.

@tsmaeder
Copy link
Contributor

tsmaeder commented Nov 3, 2016

I don't like that fact that we have a service that takes a very unstructured, highly application-specific search string and directly uses it to query the database. I think it would be better if we had a layer where we have clearly defined search operators ("full text search for $1 on field X", "date < 1.1. 2017") and a separate service that turns the "unstructured" queries into structured ones. This will allow us to reuse the search facilities we are building now.
I can live with what we have here at the moment, but I think we should refactor it sooner rather than later.

pranavgore09 and others added 25 commits November 8, 2016 01:33
Updated respective tests.
Added t.Parallel() to all unit tests.
test-all working fine on local system but seems to be breaking on jenkins.
Trying to figure out the difference.
Did not find simple and better way in goadesign for adding usage example to GET request with query params, hence added as a multiline string in parameter value.

Using multiple comments instead of single line comment multiple times.
Added test for empty q value result.
Test cases updated.
tested with http://localhost:8080/api/search?q=%22http://demo.almighty.io/detail/1%22&page[limit]=2 query on local DB.
Usefull while testing locally.

Added tests for the same.
Update wir.create() with creator string value.
* Use ts_vector column on work_items updated via triggers
* Rank results
@aslakknutsen aslakknutsen merged commit 22be2a3 into master Nov 8, 2016
@aslakknutsen
Copy link
Contributor

ugh, damn it! Forgot to add "Contributed by xxx" to the merge commit. Sorry!

God job, all! 👍

@baijum baijum deleted the workitem_search branch November 9, 2016 10:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants