filter endpoint #93

Closed

jnehring opened this issue Nov 20, 2015 · 14 comments

@jnehring
Member

We could implement an endpoint /toolbox/filter/{filter-id}. Each filter is a SPARQL query. When you submit RDF data to the filter, the SPARQL query is executed on that RDF and the result is returned to the user. So the general idea is to filter some triples out of the NIF.

This can have two applications:

a) Simplify the output of an e-Service. When someone says "I don't want to see NIF, I just want to get all entities in a text", we can create a filter that extracts the entities from the NIF. The user can then create a pipeline that first calls e-Entity to extract the entities and then calls the filter to drop everything in the NIF response except the entities.
b) Long pipelines might suffer from the problem that they produce more and more NIF and therefore process a lot of data; some unnecessary data processing might also occur. Using the filter you can remove some of this data inside the pipeline.

This might be a feature of FREME 0.5. It might make #89 obsolete.
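
For application a), such a filter could be a plain SPARQL SELECT query over the submitted NIF graph. A minimal sketch, assuming the annotations use nif:anchorOf for the surface form and the ITS 2.0 RDF property itsrdf:taIdentRef for the linked entity (these property names are assumptions about typical e-Entity output, not a fixed interface):

```sparql
# Hypothetical "entities only" filter submitted to /toolbox/filter/{filter-id}.
# It keeps the annotated text span and the linked entity URI and drops all
# other triples from the NIF document.
PREFIX nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>

SELECT ?anchor ?entity
WHERE {
  ?annotation nif:anchorOf      ?anchor ;   # surface form in the text
              itsrdf:taIdentRef ?entity .   # URI of the linked entity
}
```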

@x-fran
Member

x-fran commented Nov 23, 2015

I like the first solution you've proposed, but we need to talk about this one too.
Personally, I'd like to see what others have to say before deciding anything.

@koidl

koidl commented Nov 25, 2015

@jnehring I am afraid I don't get it. Is your solution proposing that we first receive the NIF-based return and then send that return back to a filter, which then returns the delta that we need?

@fsasaki

fsasaki commented Nov 25, 2015

No. The proposed solution is that you send your content to FREME and then specify what filter you want to be applied before getting the result. No need to send things twice.


@fsasaki

fsasaki commented Nov 25, 2015

P.S.: you could send things twice, but that is not needed.


@jnehring
Member Author

I have two ideas for how the workflow of using this could look. Here is an example for the WRIPL use case of e-Terminology:

First idea: With pipelines

  1. You name the subset of information in NIF that you are interested in, e.g. "return only term URIs and remove all the rest".
  2. We create a filter that extracts your desired information.
  3. We or you create a pipeline that first calls e-Terminology and then the filter.
  4. You do not call e-Terminology directly; instead you call the pipeline. You submit your text and get only the term URIs back, without all the other information. I am not sure yet about the output format. I think it will still be JSON-LD, but with a very simple structure and without NIF.

2nd idea: With an extra parameter

Steps 1) and 2) are the same as above.
3) You attach the parameter "filter=terms-only" to the request you send to e-Terminology; "terms-only" is the id of the filter that we created in step 2) (sketched below). There is no pipeline involved. In this case you also don't use the /toolbox/filter endpoint.

Actually I think that the 2nd approach is easier for the users. There is a little more implementation work, but it saves us the work of creating the pipelines.
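
To make the example concrete, the "terms-only" filter from step 2) might be a query along these lines (a sketch; it assumes e-Terminology marks term candidates with itsrdf:termInfoRef, which is not confirmed here):

```sparql
# Hypothetical "terms-only" filter. With the 2nd idea it would be registered
# under the id "terms-only" and selected via the filter=terms-only parameter.
PREFIX nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>

SELECT ?anchor ?termURI
WHERE {
  ?annotation nif:anchorOf       ?anchor ;    # the term as it appears in the text
              itsrdf:termInfoRef ?termURI .   # URI with further term information
}
```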

@fsasaki

fsasaki commented Nov 26, 2015

Sounds good.


@koidl

koidl commented Nov 26, 2015

@jnehring that sounds great! Some minor questions/comments:

  1. Can we combine separate services? For example, use a filter to get 'terms-only' (in e-Entity the entity labels and in e-Terminology the terms) in one return? Separate calls and filters for each service are fine; I just wonder if it's easier to have one filter option for both services. (Just an idea, not sure if it makes sense.)
  2. It's essential that the output format is JSON-LD or related. If a developer has to study documentation to understand the output, we have basically failed.

@fsasaki

fsasaki commented Nov 26, 2015

+1 to 2) from Kevin. Also, I see that people want to have simple output after a pipeline of services. That could be built in for the individual last service. E.g., looking at http://api.freme-project.eu/doc/0.4/tutorials/translate_EN-NL_including_terminology.html one could say, after e-Translation, via a dedicated query: give me the terms in the source and target language.
About the output: there is a simple standardized result format for SPARQL, http://www.w3.org/TR/sparql11-results-json/. It is not JSON-LD, but it is JSON and very regular. Kevin, could you provide a few example queries? We could then look into how the SPARQL queries and their results would look.

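For reference, a SELECT query with variables ?anchor and ?entity would come back in that standardized format roughly like this (the values are invented for illustration):

```json
{
  "head": { "vars": [ "anchor", "entity" ] },
  "results": {
    "bindings": [
      {
        "anchor": { "type": "literal", "value": "Berlin" },
        "entity": { "type": "uri", "value": "http://dbpedia.org/resource/Berlin" }
      }
    ]
  }
}
```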


@koidl

koidl commented Nov 26, 2015

@fsasaki

That result looks clean. @x-fran, what do you think?

In relation to the queries, we 'simply' need the entity label, the entity reference (URL) and the entity type to start with. The entity type is an interesting one for us because it allows us to filter out, for example, "Type:Thing". I am not sure, however, how to deal with the datasets in the future. The entity type relates to the DBpedia dataset. Things that might trigger questions (or which we should discuss now) are:

  1. Adding relevance
  2. Dealing with returns from datasets that have no entity type (such as the finance taxonomy). However, this might not be a problem because we won't need entity types if we have a domain-specific taxonomy.

Is this helpful? Or shall I try to construct a query example...?

Kevin
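
Not the promised example, but a first sketch of a query along the lines Kevin describes, assuming the usual NIF/ITS properties (nif:anchorOf for the label, itsrdf:taIdentRef for the entity URI, itsrdf:taClassRef for the type); the owl:Thing exclusion is only illustrative:

```sparql
# Hypothetical query returning entity label, entity reference and entity type,
# skipping annotations whose type is the generic owl:Thing.
# Relevance (point 1) could be added later if a confidence score is exposed.
PREFIX nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>
PREFIX owl:    <http://www.w3.org/2002/07/owl#>

SELECT ?label ?entity ?type
WHERE {
  ?annotation nif:anchorOf      ?label ;
              itsrdf:taIdentRef ?entity .
  OPTIONAL { ?annotation itsrdf:taClassRef ?type . }   # point 2: type may be absent
  FILTER ( !bound(?type) || ?type != owl:Thing )
}
```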

@fsasaki

fsasaki commented Nov 26, 2015

That's helpful, thanks. I'll create a query example by next week or before.

@koidl

koidl commented Nov 26, 2015

Thanks @fsasaki, looking forward to it.

@jnehring
Member Author

Can we combine separate services? For example, use a filter to get 'terms-only' (in e-Entity the entity labels and in e-Terminology the terms) in one return? Separate calls and filters for each service are fine; I just wonder if it's easier to have one filter option for both services. (Just an idea, not sure if it makes sense.)

It should be possible to create a pipeline that 1) calls e-Entity, 2) calls e-Terminology, and 3) applies the filter. A dedicated filter needs to be created for this.
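
Such a combined filter could, for instance, union the two annotation types. A sketch, under the same assumed property names as above:

```sparql
# Hypothetical combined filter: returns entity links (from e-Entity) and
# term references (from e-Terminology) in a single result set.
PREFIX nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>

SELECT ?anchor ?reference
WHERE {
  { ?a nif:anchorOf ?anchor ; itsrdf:taIdentRef  ?reference . }   # entities
  UNION
  { ?a nif:anchorOf ?anchor ; itsrdf:termInfoRef ?reference . }   # terms
}
```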

It's essential that the output format is JSON-LD or related. If a developer has to study documentation to understand the output, we have basically failed.

I agree. The output of the filter is RDF so it can be JSON-LD.

@fsasaki

fsasaki commented Nov 26, 2015

I agree. The output of the filter is RDF so it can be JSON-LD.

The output of a SPARQL query would be in the SPARQL result format, which can be stored as JSON or in a different syntax, but it won't be JSON-LD.

@jnehring
Member Author

jnehring commented Jan 8, 2016

Implemented. See the documentation at http://api-dev.freme-project.eu/doc/knowledge-base/filtering.html

jnehring closed this as completed Jan 8, 2016