
provide push API for statical information #35

Closed
Orbiter opened this issue Jun 4, 2015 · 8 comments

Comments


Orbiter commented Jun 4, 2015

To add more sources beyond those harvested from Twitter, we want to add data from other sources, including RSS feeds and geoJSON data. These sources must be added to the message index in the context of a lifetime flag (#33).

The data submitted to the API must therefore include:

  • URL of the source
  • data format of the source (i.e. RSS/GeoRSS/geoJSON etc.)
  • a harvesting frequency (the submitter knows best how often the data changes)
  • a lifetime. The lifetime must be less than or equal to the harvesting frequency. The lifetime is applied to the index, and it may mean that the data disappears from search results after that time. A special lifetime of 2^31-1 can be set to announce that the data is static forever, like a normal 'news' message, or a location that will never change (i.e. the place of a city).
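The constraints in this list can be sketched as a small validation routine. The field names, the accepted format strings, and the function itself are assumptions for illustration, not the actual loklak schema:

```python
# Illustrative sketch of validating a push-API source registration.
# Field names ("url", "format", "harvesting_frequency", "lifetime")
# are assumptions, not loklak's real schema.
FOREVER = 2**31 - 1  # special lifetime: the data is static forever

def validate_source(source: dict) -> dict:
    """Check a submitted source descriptor against the rules above."""
    required = {"url", "format", "harvesting_frequency", "lifetime"}
    missing = required - source.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if source["format"] not in {"RSS", "GeoRSS", "geoJSON"}:
        raise ValueError(f"unknown format: {source['format']}")
    # the lifetime must be less than or equal to the harvesting frequency,
    # unless the special FOREVER value marks data that never expires
    if source["lifetime"] != FOREVER and \
            source["lifetime"] > source["harvesting_frequency"]:
        raise ValueError("lifetime must not exceed harvesting frequency")
    return source

source = validate_source({
    "url": "http://example.org/feed.geojson",  # hypothetical source
    "format": "geoJSON",
    "harvesting_frequency": 3600,  # seconds between harvests
    "lifetime": 3600,              # seconds until the data may expire
})
```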

Orbiter commented Jun 4, 2015

create #37 first


zyzo commented Jul 3, 2015

Hi, what needs to be done to achieve this? A new push API was created (#55), but data is saved as a MessageEntry without a harvesting frequency or a lifetime as described above.


Orbiter commented Jul 6, 2015

implemented with api/push/geojson.json


zyzo commented Jul 7, 2015

I have several blocker questions that are critical for implementing the connect service interface:

  • which field name is the source URL saved to? Same for the harvesting frequency?
  • Is the periodic harvester implemented yet? So I just save the harvesting frequency as a message field, and it will be automatically detected and harvested by the server?
  • What if one source URL contains multiple messages? Is it better to save the common information (URL, harvesting frequency) and the list of messages the source contains as a new data type, rather than duplicating it in each message? It would be much easier to update the source.


Orbiter commented Jul 10, 2015

  • which field name is the source URL saved to? Same for the harvesting frequency?

Nowhere yet. There must be a new data structure to hold this. At this time, data can just be read from that URL, and if the import shall start again, the API must be called again. That is of course not the target design. It is true that the URL must be stored, and then the harvesting frequency must either be submitted as well or be computed by trial.

  • Is the periodic harvester implemented yet? So I just save the harvesting frequency as a message field, and it will be automatically detected and harvested by the server?

We already have a mechanism in loklak which does very much the same thing: the query index. This index stores all words which have been submitted as queries, stores the message frequency, and provides a prediction of when the next message for the query may appear. We need something like this for IoT imports as well. Designing such a thing is somewhat critical because it is difficult to clean up a messed-up data structure later. Therefore I would like to collect some more experience with the API before starting automated imports. From my point of view they can be added later, and meanwhile we can help ourselves with cron jobs calling the API again and again.
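Until automated imports exist, the query-index idea described above could be sketched roughly as follows: estimate the harvesting frequency from the observed update times and schedule the next harvest accordingly. The function name and the one-hour fallback are illustrative assumptions, not loklak code:

```python
from datetime import datetime, timedelta

def predict_next_harvest(update_times: list[datetime]) -> datetime:
    """Predict when a source should next be harvested, query-index style:
    use the mean interval between past updates as the expected frequency."""
    if len(update_times) < 2:
        # no usable history yet: retry after a default interval (assumption)
        return update_times[-1] + timedelta(hours=1)
    times = sorted(update_times)
    intervals = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    mean = sum(intervals) / len(intervals)
    return times[-1] + timedelta(seconds=mean)
```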

  • what if one source URL contains multiple messages?

One source should of course contain several messages! You may be thinking of duplicate messages (that's your next question), but that should never be the case for several messages from one import. However, we must take care of it, see below.

Is it better to save the common information (URL, harvesting frequency) and the list of messages the source contains as a new data type, rather than duplicating it in each message? It would be much easier to update the source.

I don't exactly understand how the several topics you address (re-harvesting, data types, and message duplication) are related; I believe they are unrelated. They should each be considered, but not connected:

  • re-harvesting: I answered that in another issue
  • data types: all the different JSON schemas must be considered separately, but of course they should be handled with the same re-harvesting mechanism.
  • duplicate message detection: if we harvest from IoT sources, the data may have been updated yet be identical, or not updated at all and therefore also the same. I would not distinguish these cases and would just compare the new data with the old data stored in the index. Therefore we must find a way to identify whether an IoT data object refers to the same message-generating entity or not. I believe we can identify the device using the geolocation information and the source URL. That would of course mean that no source may contain several IoT entities at the same place.

We could compute a hash from a string consisting of the harvesting URL and the location. That hash would be stored in a kind of device-hash field in the message, or we could re-use another field for it, e.g. the link field in the form <source-url>#<lat>,<lon>, or create a new field like provider_id.
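That hashing scheme could be sketched like this; the SHA-1 choice, the 16-character truncation, and the function name are illustrative assumptions:

```python
import hashlib

def provider_id(source_url: str, lat: float, lon: float) -> str:
    """Derive a stable device identifier from the harvesting URL and the
    geolocation, in the <source-url>#<lat>,<lon> form proposed above."""
    key = f"{source_url}#{lat},{lon}"
    # truncate the hex digest to keep the field short (arbitrary choice)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
```

Because the identifier depends only on the URL and the coordinates, two pushes of the same device from the same source collapse to one id, which is exactly what the duplicate detection above needs.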


zyzo commented Jul 10, 2015

I don't exactly understand how the several topics you address (re-harvesting, data types and message duplication) are related, I believe they are unrelated.

This is just a question about the data schema. The question, in a cleaner format, is: is it wiser to save the import source information and the list of imported messages in a new data structure (e.g. SourceEntry), rather than saving the import source information inside each imported message? I think you already answered this, and I totally agree that saving it in a new data structure is the way to go:

There must be a new data structure to hold this. At this time, data can just be read from that URL, and if the import shall start again, the API must be called again. That is of course not the target design. It is true that the URL must be stored, and then the harvesting frequency must either be submitted as well or be computed by trial.

And thank you for the high level of detail. This answer definitely helps a lot.

@zyzo zyzo mentioned this issue Jul 13, 2015

Orbiter commented Jul 14, 2015

I think there is a mix-up of two things here:

The question, in a cleaner format, is: is it wiser to save the import source information and the list of imported messages in a new data structure (e.g. SourceEntry), rather than saving the import source information inside each imported message? I think you already answered this, and I totally agree that saving it in a new data structure is the way to go:

The new data structure should hold the source URL and import metadata, not the source content. The imported content must be adjusted to fit our message format. How to do that is already answered: you implemented a mapping for this, and I suggested adding the source content (i.e. the content of the properties object from GeoJSON) as part of the message, in the same fashion as rich texts are stored.
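A minimal sketch of that mapping, assuming hypothetical message field names (source_type, link, location_point, text) rather than loklak's actual message format:

```python
import json

def feature_to_message(source_url: str, feature: dict) -> dict:
    """Map a GeoJSON feature into a message, embedding the feature's
    "properties" object in the message text so no source content is lost
    (similar in spirit to how rich texts are stored)."""
    # GeoJSON point coordinates are [longitude, latitude]
    lon, lat = feature["geometry"]["coordinates"]
    return {
        "source_type": "geojson",
        "link": source_url,
        "location_point": [lon, lat],
        "text": json.dumps(feature.get("properties", {}), sort_keys=True),
    }
```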

zyzo added a commit that referenced this issue Jul 23, 2015
   This commit introduces two new features:
      - save the import profile when pushing custom messages. Currently it is only implemented in /api/push/geojson.json
      - /api/import.json with a source_type parameter to retrieve the list of import profiles by source_type

zyzo commented Jul 31, 2015

Implemented in #83

@zyzo zyzo closed this as completed Jul 31, 2015