
Api Web Service #586

Closed
wants to merge 3 commits into from

Conversation

toneiv

@toneiv toneiv commented Jun 6, 2016

Hello there,

I have made a patch that exposes Shaarli's features as a web service. Minor changes were made to add a publishing date to links and to keep track of deleted links.

An Android application to view, add and manage links is being finalized and I plan to publish it to the Play Store soon.

I'm not a PHP developer and I'm not at all familiar with Git, so please tell me if anything needs to be added to complete this request.

Regards,
Toneiv

@ArthurHoaro
Member

ArthurHoaro commented Jun 6, 2016

That's huge, thanks!

I'll review it later, however, I have a few questions:

  • can you explain how your authentication works?
  • what's LinkDeletedDB for?
  • can you provide an API documentation explaining what's exposed and how? (parameters, response code, etc.)

@ArthurHoaro ArthurHoaro added the in progress, feature, documentation, and 3rd party (interoperability with third-party platforms) labels Jun 6, 2016
@nicolasdanelon

Awesome commit!! :O Looks great!! In the future we can create React or Angular themes ;)
Can you add comments to your functions? We definitely need a README for the API.
👍 👍 👍
I was playing with curl

curl    -H "Date: Mon Jun  6 13:50:50 ART 2016" \
        -H "Authorization: oi2u987c89qwkhje2k1" \
        -H "Content-Type: application/json" \
        -d "login=root&password=root&submit=Login" \
        "127.0.0.1:9001?do=api?o=tags"

but I get

Notice: Undefined offset: 1 in /home/nicolasm/ws-api/application/Api.php on line 150
"LOGIN_ERROR_TOKEN_VERIFICATION_FAILED"

@toneiv
Author

toneiv commented Jun 7, 2016

Thanks for your responses. It definitely lacks some examples and explanations...

The authentication is based on the article "Designing a Secure REST (Web) API without OAuth": https://www.ida.liu.se/~TDP024/labs/hmacarticle.pdf.

The principle is as follows:

  • the client creates a unique hash representing the request to the server. For this I use: the HTTP method (GET, POST, DELETE...), the URL of the request, the date of the request, the content type, and the secret key shared with the server.
  • I also send the date of the request.
  • on the server, the hash is generated in the same way and compared with the one sent by the client.
  • furthermore, I compare the client's date with the server's date and accept at most 5 minutes of difference.

Here is an example in Android Java with the Retrofit library:

String method = request.method();
String uriPath = request.url().encodedQuery();
String contentType = request.header("Content-Type");
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss", Locale.getDefault());
String dateValue = sdf.format(new Date());
String authorizationValue = Constants.API_KEY + ":" + hash_hmac_sha256(api_secret, method + "\n" + contentType + "\n" + dateValue + "\n" + uriPath);
// Constants.API_KEY is not used...

Request authorizedRequest = request.newBuilder()
  .addHeader("Authorization", authorizationValue)
  .addHeader("Date", dateValue)
  .build();

return chain.proceed(authorizedRequest);

The hash_hmac_sha256(key, data) helper looks like this:

Mac mac = Mac.getInstance("HmacSHA256");
SecretKeySpec secret = new SecretKeySpec(key.getBytes(), "HmacSHA256");
mac.init(secret);
byte[] digest = mac.doFinal(data.getBytes());
StringBuilder sbHash = new StringBuilder();
for (byte b : digest) {
  sbHash.append(String.format("%02x", b));
}
hash = sbHash.toString();
return hash;
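On the server side, the check described above (regenerate the hash, compare it with the client's, and enforce the 5-minute window) could be sketched as follows. This is an illustrative Java sketch, not the PR's actual PHP code; the class and method names (HmacAuthCheck, sign, verify) are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacAuthCheck {
    static final long MAX_SKEW_MS = 5 * 60 * 1000; // the 5-minute tolerance described above

    // Recompute the signature from the request parts and the shared secret.
    static String sign(String secret, String method, String contentType,
                       String dateValue, String uriPath) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(
                (method + "\n" + contentType + "\n" + dateValue + "\n" + uriPath)
                    .getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Accept a request only if its Date header is fresh and the signature matches.
    static boolean verify(String secret, String method, String contentType,
                          String dateValue, String uriPath,
                          String clientHash, Date serverNow) {
        try {
            Date clientDate = new SimpleDateFormat("yyyyMMddHHmmss").parse(dateValue);
            if (Math.abs(serverNow.getTime() - clientDate.getTime()) > MAX_SKEW_MS) {
                return false; // stale or replayed request
            }
        } catch (java.text.ParseException e) {
            return false; // malformed Date header
        }
        String expected = sign(secret, method, contentType, dateValue, uriPath);
        // Constant-time comparison avoids leaking the expected hash through timing.
        return MessageDigest.isEqual(
            expected.getBytes(StandardCharsets.UTF_8),
            clientHash.getBytes(StandardCharsets.UTF_8));
    }
}
```

MessageDigest.isEqual gives a constant-time comparison, which avoids leaking the expected signature through timing differences.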

LinkDeletedDB is used to store previously deleted links (only linkdate and deleteddate). It's used to sync the client and the server: if a link is deleted from the web interface, the Android client is informed of this deletion with a specific request.

The methods currently implemented are:
GET:

  • checkAccount (?do=api&o=account): returns the name of the account, Shaarli's version, and the number of links on the server.
  • getTags (?do=api&o=tags): returns a list of tags (name and occurrences).
  • getLinksUpdated (?do=api&o=links&m=updated): returns updated links and can use filters (see below).
  • getLinksDeleted (?do=api&o=links&m=deleted): returns deleted links and can use filters (see below).
  • getLinkDeletedByKey (?do=api&o=link&m=deleted&key=20160519_064930): returns the unique link corresponding to the linkdate parameter, or a 404 error.
  • search() (?do=api&o=links&m=search&searchterm=shaarli&searchtags=todo): returns the links resulting from a search by term and/or by tags. It can use filters (see below).
  • getLinksByDay (?do=api&o=links&m=search&FILTER_DAY=20160127): returns links for the day passed as parameter and can use filters (see below).
  • getLinkByKey (?do=api&o=link&m=search&key=20160519_064930): returns the unique link corresponding to the linkdate parameter, or a 404 error.
  • getLinkByHash (?do=api&o=link&m=search&hash=rJKFTg): returns the unique link corresponding to the smallhash parameter, or a 404 error.

filterByDate (?do=api&xxx&mindate=20160127): filters the result to links between the date of the request and the date given as parameter.
filterByPage (?do=api&xxx&nb=20): used for pagination.

POST :

  • restore a previously deleted link
  • edit a link with linkdate or hash
  • post a new link

DELETE :

  • delete a link by linkdate or hash

I hope this helps...

@ArthurHoaro
Member

ArthurHoaro commented Jun 7, 2016

Thanks for the explanations!

Authentication

This looks good. It's kind of the same mechanism as JWT which I'm familiar with. It only needs to be documented. I can help on this.

LinkDeletedDB

I'm not a big fan of storing deleted links in another file, nor of duplicating LinkDB and API method code.
If I understand correctly, you're using this, plus the editdate attribute, to update your local database?

Maybe a better idea would be to add an update feed, whatever the format:

  • datetime
  • action: UPDATED|DELETED
  • link ID
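A minimal sketch of such a feed, assuming only the three fields above (illustrative Java with hypothetical names, not Shaarli code):

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class LinkHistory {
    public enum Action { UPDATED, DELETED }

    // One entry per change, mirroring the proposed feed: datetime, action, link ID.
    public static class Entry {
        public final Date datetime;
        public final Action action;
        public final String linkId; // e.g. Shaarli's linkdate, "20160519_064930"

        Entry(Date datetime, Action action, String linkId) {
            this.datetime = datetime;
            this.action = action;
            this.linkId = linkId;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called by the server whenever a link is edited or deleted.
    public void record(Action action, String linkId) {
        entries.add(new Entry(new Date(), action, linkId));
    }

    // A client syncs by replaying every entry newer than its last sync time.
    public List<Entry> since(Date lastSync) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            if (e.datetime.after(lastSync)) {
                out.add(e);
            }
        }
        return out;
    }
}
```

With such a log, a client only replays the entries it has not yet seen, instead of diffing against a separate deleted-links database.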

Also, deleted link restore is a new feature unrelated to the API. It's probably doable within the client.

What do you think?

API doc

This looks fine. I can work on a proper doc.

Although, I've a few remarks:

  • o=/m=: characters aren't expensive, we can use option (?) / method.
  • The smallhash is generated using the link key (linkdate), is it useful to have both?
  • For the pagination, an offset parameter is missing (nb link from number offset).
  • post: adding a new link with an already shaared URL will update the existing link.

EDIT: also, unit tests.

@toneiv
Author

toneiv commented Jun 7, 2016

Authentication

Yes, you're right, it needs to be documented. I appreciate your help.

LinkDeletedDb

I understand your concern about that. If I understand correctly, you suggest not adding an editdate in the LinkDB, but instead adding the editdate or deleteddate to another database with a reference to the link ID. It has some impact on my application, but it seems doable to me. I can work on that in the next few days.

Also, deleted link restore is a new feature unrelated to the API. It's probably doable within the client.

Sorry I don't understand what you mean. You think that it's not a good thing to offer this option in the API?

API doc

o=/m=: characters aren't expensive, we can use option (?) / method.

OK. o is for "object" actually...

The smallhash is generated using the link key (linkdate), is it useful to have both?

No it's not, I'm not using smallhash in my application. I can remove it.

For the pagination, an offset parameter is missing (nb link from number offset).

Indeed, I forgot to mention that there's a "page" parameter for this purpose.
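A hedged sketch of how the nb and page parameters could combine server-side (illustrative only; Paging and slice are hypothetical names, not Shaarli code):

```java
import java.util.List;

public class Paging {
    // Return page `page` (1-based) of at most `nb` links, mirroring the
    // nb/page query parameters described in this thread.
    public static <T> List<T> slice(List<T> links, int nb, int page) {
        int from = Math.min((page - 1) * nb, links.size());
        int to = Math.min(from + nb, links.size());
        return links.subList(from, to);
    }
}
```

An explicit offset parameter, as suggested above, would be the same computation with `from` supplied directly instead of derived from a page number.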

post: adding a new link with an already shaared URL will update the existent link.

Actually, the API returns a "Duplicate URL" error; it seems to me that this is a much better behaviour, informing the user that this URL is already in his database... Maybe provide an option for that?

Unit test

I'm not familiar with that in PHP... I am looking it up.

@ArthurHoaro
Member

If I understand correctly, you suggest not adding an editdate in the LinkDb but instead add the editdate or deleteddate to another database with the reference to the LinkId.

Yep. A simple log, available through another service (e.g. ?do=api&object=log), which lets you know which links have to be updated/deleted. It also means removing the updated/deleted methods for links, and the additional class.

o is for "object" actually...

That's exactly why I prefer explicit terms. 😃

Actually, the api return an error "Duplicate URL", it seems to me that this is a much better behaviour to inform the user that this url is already in his database... Maybe provide an option about that ?

You're right, that's better. If the error contains the existing link, it's even better: you can then switch to an edit form and send an update request.

I'm not familiar with that in PHP... I am looking that up

Feel free to ask for help if you need to.
See https://github.com/shaarli/Shaarli/wiki/Unit-tests


Also, it looks like you don't expose a WS to list links without any search parameter?

I won't review the code yet, but please use 4 spaces indentation instead of tabs, and refer to #95 (comment) for coding style.

- LinkDeletedDB replaced by LinkHistoryDB (used to track edition and deletion of links)
- Suppression of editdate in LinkDB
- More explicit parameters in the web service (o=object, m=method...)
- Removal of methods using smallhash
- Coding style: tabs replaced by 4 spaces, max chars per line...
- Documentation of the methods
- Redesign of filterByDate with startDate and endDate
@toneiv
Author

toneiv commented Jun 10, 2016

Based on suggestions made in this thread, here are some additions to the web service API:

  • LinkDeletedDB has been replaced by LinkHistoryDB: this file tracks modifications and deletions of links using the structure proposed by @ArthurHoaro
  • Thereby, there's no need to add an editdate in LinkDB anymore. It has been removed.
  • Useless methods using smallhash have been removed
  • New method to fetch the history of links (edition, deletion or both)
  • The methods to fetch deleted and edited links have been updated to use LinkHistoryDB
  • Compliance with the coding style standards
  • Documentation of the different methods, with explanations for using the API
  • Redesign of the filterByDate method with startDate and endDate

Thanks again for all the suggestions. If possible, please give me feedback on this version.

@ArthurHoaro
Member

Thank you! I'll make a first review now, and I'll probably start working on the doc and tests this weekend.

@@ -0,0 +1,820 @@
<?php

class Account {
Member

PHPDoc

@toneiv
Author

toneiv commented Jun 10, 2016

Sorry if it's a bit overwhelming

No problem, thanks for your time. I just pushed a commit taking care of your simplest remarks.
I've got some questions/remarks that I will add directly to the relevant comments.

@@ -170,6 +171,8 @@
require_once 'application/PluginManager.php';
require_once 'application/Router.php';
require_once 'application/Updater.php';
require_once 'application/LinkHistoryDB.php';
Member

import sorting (alphabetical order)

@virtualtam
Member

Hi @toneiv and thanks for your interest in contributing to Shaarli!

It would have been preferable to raise a discussion issue / feature request before submitting a PR.

Discussing first lets people define what needs to be done, and how. This also helps new developers to get started with the development and testing tools, and get PRs reviewed efficiently by splitting development tasks over small, easily-reviewable, targeted commits.

Right now, there is too high a risk to introduce regressions and security vulnerabilities, so I'd suggest archiving this PR and introducing a REST API step-by-step, with atomic changes and proper test coverage.

This would roughly correspond to:

  • adding link edition history tracking to LinkDB
  • adding HTTP query endpoints with:
    • one view (class) per endpoint
      • one PHP method per supported HTTP method
    • adding/modifying utility functions and classes (e.g. application/HttpUtils.php)
  • routing HTTP queries to these new endpoints
  • adding test coverage along each new feature/view/endpoint

@ArthurHoaro
Member

ArthurHoaro commented Jun 11, 2016

Right now, there is too high a risk to introduce regressions

Not really, because it barely touches Shaarli's existing code or features. That's why I was willing to get this PR mergeable as a first step towards a proper API. But I agree with your other points; it's probably a better idea to archive this and get it done step by step.

Before your bullet list, we need to define the API. Something like this.

@toneiv
Author

toneiv commented Jun 13, 2016

Hi @virtualtam ,

I appreciate your comments regarding the fact that I should not have submitted a PR in the first place. As I mentioned in my first post, I am a beginner with PHP and GitHub, so I was not aware of good practices in this area.

It does seem necessary to define the API in a concerted manner. If you can use part of my development as a base, great. I have quickly tried to address the technical remarks by updating my code, and I will follow the evolution of the definition of an API worthy of the name.

I will publish an app on the Play Store in the coming days / weeks, so I will inform those interested in this application that they must use the version from my own repository.

@nodiscc
Member

nodiscc commented Jun 13, 2016

Before your bullet list, we need to define the API. Something like this.

@ArthurHoaro If you'd like to create a new repository for the API specification and start writing it in the format of your choice, I'd be glad to help. I'm sure there would be other contributors interested (ping @mro).

This PR can be kept open for future reference. @toneiv thanks for your work on this.

@mro

mro commented Jun 13, 2016

API specification

@nodiscc thank you for asking – I think the best API is the one that has been around forever, and the best spec is a set of running acceptance tests (à la https://github.com/mro/Shaarli-API-test).

So currently I'm happy (for my use cases) if the refactorings of the HTTP/HTML interface don't break the above tests.

A bit more visionary would be to start with defining goals of the API, security requirements and interoperability with prior art like OStatus, AtomPub, extensibility, restiness, feasibility, etc.

But frankly, I currently don't see a need for a prose API spec, nor any added complexity – be it server or client side.

@dimtion what do you think?

@ArthurHoaro
Member

@mro: The main problem with the current "API" is that it relies on a consistency we may not be able to provide, because it's directly linked to the user experience. Exposing a proper API would allow the UX to evolve without worrying about 3rd party tools, which might have a completely different usage.

Let's say, for the sake of the example, that we change the way the tags are handled. We add a fancy JS lib which handles data in its own way. 3rd parties can no longer POST a list of tags separated by a space, and consistency is broken. Not likely to happen but you get the idea.

Another benefit I see is that it would allow us to expose services specific to 3rd-party tool usage, regardless of the UI. For example, getting the list of all tags, which is not currently easy to do.

@nodiscc: I'll do that.

@mro

mro commented Jun 14, 2016

I think the current API is more powerful and more resilient than you think. From my point of view, it's about core features only, and here it's HTTP POST parameters and very little HTML CSS/XPath. The most annoying part is the various styles of error reports.

All in all, very few conditions have to be met for getting, posting, and deleting posts.

Not likely to happen

You get the idea. Unless compatibility is broken intentionally, I expect this not to happen soon.

getting the list of all tags, which is not currently easy to do

I find https://links.mro.name/?do=atom&nb=all easy. It gets – among other stuff – all tags in a well-defined manner that's not likely to need changing. What other kind of easy do you have in mind?

@mro

mro commented Jun 14, 2016

P.S.: the core features I mean are those tested by #536

@ArthurHoaro
Member

I think the current API is more powerful and more resilient than you think.

It's possible. Anyway, designing a REST API isn't going to break what already exists.

What kind of other easy do you have in mind?

Not downloading the whole database as ATOM, which can be huge (almost 4 MB here), just to retrieve the tag list. There is already a function in Shaarli's core code which retrieves all tags and counts their occurrences. We would only have to create a GET service which calls it and formats the output.
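A rough sketch of such a service's core, illustrative Java rather than Shaarli's actual PHP, with the hypothetical name TagService: count occurrences across each link's space-separated tag field, then emit a small JSON object.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TagService {
    // Count tag occurrences across the space-separated tag fields of all links,
    // as the tag-cloud code in Shaarli's core already does.
    public static Map<String, Integer> countTags(List<String> tagFields) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String field : tagFields) {
            for (String tag : field.trim().split("\\s+")) {
                if (!tag.isEmpty()) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Format the counts as a minimal JSON object, e.g. {"shaarli":2,"todo":1}.
    public static String toJson(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append('}').toString();
    }
}
```

The point is that the response stays tiny (one entry per distinct tag) regardless of how large the link database grows.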

@mro

mro commented Jun 14, 2016

tl;dr: Keeping CRUD posts compatible with all shaarli versions out there is all the API I need (Plus the initial setup GET/POST). Build a greenfield API if you care, but plz don't break what https://github.com/mro/Shaarli-API-test/ tests.

@ArthurHoaro

First of all, you're talking about things I didn't mention. I was talking about CRUD for posts. Now we're in statistics-land ("all tags"), but

downloading the whole database

you asked for ALL tags, you get them (duck)

almost 4 MB here

indeed, the mother of all shaarlis is quite impressive.

We would only have to create a GET service which calls it, and formats the output.

You mean something like http://sebsauvage.net/links/?do=tagcloud, looking only at the CSS selectors *.count and the subsequent [href*="?searchtags="]? It's hacky, but computers are good at such things. And it's available in every Shaarli out there.

IMO the above 2 CSS criteria are a legitimate restriction on future templates. Whoever wants this stability has to provide tests, and that's it.

Read only if bored

I agree that a greenfield-designed API could be more convenient to use and fiddling with HTML markup is more fun the fewer things are to be taken care of.

But my point is that it's a lot of work, on both the provider and consumer side, to change an API – especially if backward compatibility on multiple platforms has to be ensured, which not only @dimtion and I have to do. And the benefit is what, exactly? Machine access is already possible today.

API design is hard to get right. The above example IMO fails in several crucial points and I'd rather have no API than one like the above without any test, to say the least. It's 800 lines of legacy code from day 1.

Read if not so bored

Again, if I wasn't clear yet: I'd rather give some love to what is already there (existing behaviour, nailing the relevant parts with tests like #536 does) than dream of a shiny new API (for years now, see #16) that breaks all 3rd party tools, increases the codebase, adds complexity, brings new bugs, and has to be built in the first place.

@nodiscc
Member

nodiscc commented Jun 14, 2016

@mro you make some good points, but I feel relying on the HTML/CSS layout for clean, reliable and maintainable programmatic access is somehow wrong. I need to sort my thoughts out to clearly explain why. As you said, it works for now and can probably be improved/standardized – but is it a good plan in the long term?

@mro

mro commented Jun 14, 2016

IMO it's about priorities: where to put the effort. And code without tests definitely isn't something I'd call an API or would rely on as a 3rd party client.

I insist on backward compatibility so much because currently the HTTP/HTML interface is all we have – and we have had it in a single, unchanged form since the beginning – and we (3rd party clients) need to support it anyway for several years. So I'd love to see the situation improve here.

@pikzen

pikzen commented Jun 14, 2016

What Shaarli currently has is not an API. Parsing HTML is not an API, it's scraping. It's a hackish way to retrieve data and pretend Shaarli has an API. The flaws of scraping are well known, and the biggest is that you're coupling your "API" with the UI. Call that the current method of retrieving data and acknowledge in the project wiki that there is no stable API, but locking Shaarli to that is technical debt that will be terrible to get rid of.

  • What happens if you want to change the Shaarli theme ? Make sure you have exactly the same tags as the "API" defined years ago because it was practical at the time.
  • What happens if your Shaarli theme loads everything with Javascript ? Your "API" is broken, unless you're willing to run a phantomjs instance.
  • What happens if your Shaarli theme doesn't conform to what is expected by whatever is consuming it? Your "API" is broken. Actually, worse, you don't know your API is broken, it'll probably just send back useless data.

HTML is not a format for data exchange. It's a format for data presentation. It's strictly worse than JSON and XML if your goals are to send data, as your schema is not explicit, and parsing it is much slower than parsing either of those. Send HTML when you want to insert it into a webpage, send a format appropriate for data exchange when you want to do data exchange.

Once again, that is terribly awful coupling, what Shaarli has been trying to get away from for over a year now. Breaking component B because A is changed is bad. Breaking a completely unrelated component because you made a UI change is worse. I should be able to have my UI terribly broken without it impacting other unrelated features.

Backwards compatibility is non-existent because there is NO API. And that problem can be solved by versioning your API endpoints. Either keep the old ones or slowly phase them out, making them return 403s or whatever error code you want.

Define an API. Discuss with the project members. Or don't. Define what endpoints you need (assuming a REST API), what methods they accept, what parameters they take. Implement that, make sure it uses the same code paths as the regular app as much as possible, and version it (shaarli.org/api/v1/ is the commonly used model, and works well). Yes, this takes time. But that's what an API is. It's supposed to make developers' lives easier, not force them to pull out a parser.

Concerning #536, I do not think it is a viable long-term solution. For now, while Shaarli has no stable API, sure. Just don't call it an API ¯\_(ツ)_/¯.
It is also a mix of XSLT, bash scripts, ruby scripts, and various external tools (that may not even be installed on the machine) that painfully check something that should be done server-side. Not only does that add language complexity to the project, it is also hundreds of lines of bash (whywouldyoudothat.png).

I will not comment on @toneiv's API implementation as I did not have the time to review the code. There are probably improvements to make, but nobody ever created a good API on the first try. It is however a step in the right direction.

@mro

mro commented Jun 14, 2016

OK, you guys got me; let's focus the energy forward. What are your expectations for how to evolve an API? What would you consider goals and non-goals?

To start with:

  • how to incubate, refine and finalize call semantics and syntax? What standards and best practices do you find worth considering? (@pikzen: versioning!)
  • what priority do acceptance tests have?
  • how to authenticate?

I would love to see the API defined solely via tests, but I don't know how to write acceptance tests with a technology stack you guys grok.

P.S.: a last wish: plz keep HTTP/HTML stable until a core API is released.

@ArthurHoaro
Member

What would you consider goals and non-goals?

IMHO, the first stable version should be able to retrieve, search, post, edit and delete links. That's more or less what's been proposed in this PR.

how to incubate, refine and finalize call semantics and syntax? What standards and best practices do you find worth considering? (@pikzen versioning!)
how authenticate?

A JSON REST API is probably the better way to go since it's widely used. Details need to be discussed.

what prio do acceptance tests have?

The API will be covered by unit tests.

P.S.: a last wish: plz keep HTTP/HTML stable until a core API is released.

We will.

Labels
3rd party (interoperability with third-party platforms), API (REST API), documentation, feature, in progress, security
8 participants